# **Anomaly Detection & Time Series**

# **Assignment Code: DA-AG-018**

# **PRACTICAL QUESTIONS:**

**Question 6: Load a time series dataset (e.g., AirPassengers), plot the original series,
and decompose it into trend, seasonality, and residual components.**

**ANSWER:**

    import pandas as pd
    import matplotlib.pyplot as plt
    from statsmodels.tsa.seasonal import seasonal_decompose

    # Load dataset (assumes the CSV has 'Month' and 'Passengers' columns)
    df = pd.read_csv("AirPassengers.csv")

    # Convert to datetime and use it as the index
    df['Month'] = pd.to_datetime(df['Month'])
    df.set_index('Month', inplace=True)

    # Plot original series
    plt.figure(figsize=(10,4))
    plt.plot(df['Passengers'])
    plt.title("AirPassengers Time Series")
    plt.xlabel("Year")
    plt.ylabel("Passengers")
    plt.show()

    # Decomposition (multiplicative model, 12-month seasonal period)
    decomposition = seasonal_decompose(df['Passengers'], model='multiplicative', period=12)

    # Plot trend, seasonal, and residual components
    decomposition.plot()
    plt.show()
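
To make the decomposition step less of a black box, the sketch below walks through roughly what a classical multiplicative decomposition does: estimate the trend with a centered moving average, divide it out, and average the leftover ratios per month. The function names and the synthetic series are made up for illustration and are not part of statsmodels; `seasonal_decompose` is still the tool to use on the real data.

```python
import math

def centered_moving_average(values, period):
    """Centered moving average for an even period; None where the window does not fit."""
    half = period // 2
    trend = [None] * len(values)
    for t in range(half, len(values) - half):
        window = values[t - half : t + half + 1]
        # The two end points get half weight so the window covers exactly one period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend

def multiplicative_decompose(values, period):
    """Split values into trend, seasonal, and residual with y = trend * seasonal * resid."""
    trend = centered_moving_average(values, period)
    # Detrend by division (multiplicative model).
    ratios = [y / tr if tr is not None else None for y, tr in zip(values, trend)]
    # Average the detrended ratios for each position in the seasonal cycle.
    factors = []
    for k in range(period):
        vals = [r for i, r in enumerate(ratios) if r is not None and i % period == k]
        factors.append(sum(vals) / len(vals))
    # Normalize so the seasonal factors average to 1.
    scale = sum(factors) / period
    factors = [f / scale for f in factors]
    seasonal = [factors[i % period] for i in range(len(values))]
    resid = [y / (tr * s) if tr is not None else None
             for y, tr, s in zip(values, trend, seasonal)]
    return trend, seasonal, resid

# Synthetic monthly series (illustrative only): rising trend times a yearly pattern.
period = 12
true_pattern = [1 + 0.2 * math.sin(2 * math.pi * k / period) for k in range(period)]
series = [(100 + 2 * t) * true_pattern[t % period] for t in range(72)]

trend, seasonal, resid = multiplicative_decompose(series, period)
print("Seasonal factors:", [round(s, 3) for s in seasonal[:period]])
```

Because the synthetic series has no noise, the residuals should stay close to 1. On real data such as AirPassengers, the residual component is where unusual months would show up.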



**Question 7: Apply Isolation Forest on a numerical dataset (e.g., NYC Taxi Fare) to
detect anomalies. Visualize the anomalies on a 2D scatter plot.**

**ANSWER:**

    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.ensemble import IsolationForest

    # Load dataset
    df = pd.read_csv("NYC_taxi_fare_data.csv")

    # Select numerical features (copy so new columns can be added safely)
    data = df[['fare_amount', 'trip_distance']].dropna().copy()

    # Train Isolation Forest; fit_predict returns -1 for anomalies and 1 for normal points
    model = IsolationForest(contamination=0.02, random_state=42)
    data['anomaly'] = model.fit_predict(data[['fare_amount', 'trip_distance']])

    # Plot anomalies (with 'coolwarm', -1 is drawn in blue and 1 in red)
    plt.figure(figsize=(8,6))
    plt.scatter(data['trip_distance'], data['fare_amount'],
                c=data['anomaly'], cmap='coolwarm', alpha=0.6)
    plt.xlabel("Trip Distance")
    plt.ylabel("Fare Amount")
    plt.title("Isolation Forest Anomaly Detection")
    plt.show()
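
As a quick sanity check that does not depend on the CSV file, the sketch below runs the same model on synthetic data with a few deliberately extreme points added. The data and the variable names are assumptions for illustration only. It also shows `decision_function`, which gives a continuous score in addition to the -1/1 labels.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for (trip_distance, fare_amount); this is not real taxi data.
rng = np.random.default_rng(0)
normal_trips = rng.normal(loc=[3.0, 12.0], scale=[1.0, 3.0], size=(500, 2))
injected = np.array([[40.0, 150.0], [35.0, 300.0], [50.0, 220.0]])
X = np.vstack([normal_trips, injected])

model = IsolationForest(contamination=0.02, random_state=42)
labels = model.fit_predict(X)        # -1 = anomaly, 1 = normal
scores = model.decision_function(X)  # lower = more anomalous; below 0 is flagged

n_flagged = int((labels == -1).sum())
print(f"Flagged {n_flagged} of {len(X)} points")
print("Mean score, injected points:", scores[-3:].mean())
print("Mean score, normal points:  ", scores[:-3].mean())
```

The number of flagged points follows from `contamination`, so roughly 2% of the rows are labelled as anomalies whether or not that many real anomalies exist. The scores are often more useful than the labels when choosing a threshold.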

**Question 8: Train a SARIMA model on the monthly airline passengers dataset.
Forecast the next 12 months and visualize the results.**

**ANSWER:**

    import pandas as pd
    import matplotlib.pyplot as plt
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Load dataset
    df = pd.read_csv("AirPassengers.csv")
    df['Month'] = pd.to_datetime(df['Month'])
    df.set_index('Month', inplace=True)

    # Train SARIMA model: (p,d,q) = (1,1,1), seasonal (P,D,Q,s) = (1,1,1,12)
    model = SARIMAX(df['Passengers'],
                    order=(1,1,1),
                    seasonal_order=(1,1,1,12))
    results = model.fit()

    # Forecast next 12 months
    forecast = results.forecast(steps=12)

    # Plot results
    plt.figure(figsize=(10,5))
    plt.plot(df['Passengers'], label='Original')
    plt.plot(forecast, label='Forecast', color='red')
    plt.legend()
    plt.title("SARIMA Forecast for Next 12 Months")
    plt.show()
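
A forecast is easier to judge against a simple baseline. The sketch below is one possible check, not part of the SARIMA model itself: it holds out the last 12 months, forecasts them with a seasonal-naive rule (repeat the last observed year), and reports the MAPE. The helper names and the small synthetic series are made up for illustration; the same functions could be applied to the real passenger values and to the SARIMA forecast.

```python
def seasonal_naive_forecast(history, steps, period=12):
    """Forecast by repeating the last full seasonal cycle of the history."""
    last_cycle = history[-period:]
    return [last_cycle[i % period] for i in range(steps)]

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

# Illustrative monthly series: linear growth plus a bump in one month of each year.
series = [100 + 2 * t + (20 if t % 12 == 6 else 0) for t in range(48)]

# Hold out the final 12 months for evaluation.
train, test = series[:-12], series[-12:]

baseline = seasonal_naive_forecast(train, steps=12)
baseline_mape = mape(test, baseline)
print(f"Seasonal-naive MAPE on the held-out year: {baseline_mape:.2f}%")
```

If a SARIMA model fitted on the same training window does not beat this baseline on the held-out year, its orders are probably worth revisiting.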

**Question 9: Apply Local Outlier Factor (LOF) on any numerical dataset to detect
anomalies and visualize them using matplotlib.**

**ANSWER:**

    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.neighbors import LocalOutlierFactor

    # Load dataset
    df = pd.read_csv("NYC_taxi_fare_data.csv")

    # Select features (copy so new columns can be added safely)
    data = df[['fare_amount', 'trip_distance']].dropna().copy()

    # Apply LOF; fit_predict returns -1 for anomalies and 1 for normal points
    lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02)
    data['anomaly'] = lof.fit_predict(data[['fare_amount', 'trip_distance']])

    # Plot (with 'coolwarm', -1 is drawn in blue and 1 in red)
    plt.figure(figsize=(8,6))
    plt.scatter(data['trip_distance'], data['fare_amount'],
                c=data['anomaly'], cmap='coolwarm', alpha=0.6)
    plt.xlabel("Trip Distance")
    plt.ylabel("Fare Amount")
    plt.title("LOF Anomaly Detection")
    plt.show()
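
LOF is distance-based, so features on very different scales (fares in dollars, distances in miles) can let one column dominate. The sketch below is a possible variation, using synthetic data rather than the taxi file, that standardizes the features first and reads the per-point scores from `negative_outlier_factor_`. The synthetic values and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for (trip_distance, fare_amount); this is not real taxi data.
rng = np.random.default_rng(1)
X = rng.normal(loc=[3.0, 12.0], scale=[1.0, 3.0], size=(300, 2))
X = np.vstack([X, [[30.0, 200.0]]])  # one deliberately unusual point at index 300

# Put both features on a comparable scale before computing neighbor distances.
X_scaled = StandardScaler().fit_transform(X)

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02)
labels = lof.fit_predict(X_scaled)           # -1 = anomaly, 1 = normal
lof_scores = lof.negative_outlier_factor_    # more negative = more anomalous

most_unusual = int(np.argmin(lof_scores))
print("Index of the most unusual point:", most_unusual)
print("Number of points flagged:", int((labels == -1).sum()))
```

Note that in this default mode LOF only labels the data it was fitted on. Scoring unseen points requires constructing the estimator with `novelty=True` and calling `predict` on the new data.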

**Question 10: You are working as a data scientist for a power grid monitoring company.
Your goal is to forecast energy demand and also detect abnormal spikes or drops in
real-time consumption data collected every 15 minutes. The dataset includes features
like timestamp, region, weather conditions, and energy usage.
Explain your real-time data science workflow:**

* **How would you detect anomalies in this streaming data (Isolation Forest / LOF / DBSCAN)?**

* **Which time series model would you use for short-term forecasting (ARIMA /
SARIMA / SARIMAX)?**

* **How would you validate and monitor the performance over time?**

* **How would this solution help business decisions or operations?**

**ANSWER:**

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.ensemble import IsolationForest
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # -----------------------------
    # 1. Simulated real-time dataset
    # -----------------------------
    np.random.seed(42)
    # "15min" is used instead of the deprecated "15T" alias
    dates = pd.date_range(start="2023-01-01", periods=500, freq="15min")

    # Simulated features
    energy = 100 + np.sin(np.arange(500)/20)*20 + np.random.normal(0,5,500)
    temperature = 25 + np.random.normal(0,3,500)

    df = pd.DataFrame({
        "timestamp": dates,
        "energy_usage": energy,
        "temperature": temperature
    })

    df.set_index("timestamp", inplace=True)

    # Inject artificial anomalies into energy_usage (column 0): one spike, one drop
    df.iloc[100:105, 0] += 80
    df.iloc[300:305, 0] -= 60

    # -----------------------------
    # 2. Anomaly Detection (Isolation Forest)
    # -----------------------------
    features = df[['energy_usage', 'temperature']]

    iso = IsolationForest(contamination=0.02, random_state=42)
    df['anomaly'] = iso.fit_predict(features)  # -1 = anomaly, 1 = normal

    # Plot anomalies
    plt.figure(figsize=(10,4))
    plt.plot(df.index, df['energy_usage'], label="Energy Usage")
    plt.scatter(df[df['anomaly'] == -1].index,
                df[df['anomaly'] == -1]['energy_usage'],
                color='red', label="Anomaly")
    plt.legend()
    plt.title("Real-time Anomaly Detection")
    plt.show()

    # -----------------------------
    # 3. Forecasting (SARIMAX)
    # -----------------------------
    # Note: a seasonal period of 96 makes the model large, so fitting may take a while
    model = SARIMAX(df['energy_usage'],
                    order=(1,1,1),
                    seasonal_order=(1,1,1,96),  # 96 = 1 day in 15-min data
                    exog=df[['temperature']])

    results = model.fit(disp=False)

    # Forecast next 96 steps (1 day ahead), assuming temperature stays at its mean
    future_temp = np.full((96,1), df['temperature'].mean())
    forecast = results.forecast(steps=96, exog=future_temp)

    # Forecast timestamps start one step after the last observation
    forecast_index = pd.date_range(df.index[-1] + pd.Timedelta(minutes=15),
                                   periods=96, freq="15min")

    # Plot forecast
    plt.figure(figsize=(10,4))
    plt.plot(df['energy_usage'][-200:], label="Recent Usage")
    plt.plot(forecast_index, forecast, label="Forecast", color='green')
    plt.legend()
    plt.title("Short-Term Energy Forecast")
    plt.show()
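
For the question on validating and monitoring performance over time, one possible approach is to compare each forecast with the reading that later arrives and track the rolling error. The sketch below is a minimal illustration under that assumption. The class name, window size, threshold, and sample readings are all hypothetical, and in practice the threshold would be tuned on historical errors.

```python
from collections import deque

class RollingErrorMonitor:
    """Track mean absolute forecast error over the most recent observations."""

    def __init__(self, window=96, threshold=10.0):
        self.errors = deque(maxlen=window)  # keeps only the last `window` errors
        self.threshold = threshold

    def update(self, actual, predicted):
        """Record one (actual, predicted) pair; return (rolling_mae, needs_attention)."""
        self.errors.append(abs(actual - predicted))
        rolling_mae = sum(self.errors) / len(self.errors)
        return rolling_mae, rolling_mae > self.threshold

# Hypothetical (actual, predicted) pairs arriving one 15-minute interval at a time.
monitor = RollingErrorMonitor(window=4, threshold=5.0)
readings = [(100, 101), (102, 100), (98, 99), (130, 100), (135, 101)]

history = [monitor.update(actual, predicted) for actual, predicted in readings]
for (actual, predicted), (mae, alert) in zip(readings, history):
    print(f"actual={actual} predicted={predicted} rolling_mae={mae:.2f} alert={alert}")
```

A sustained alert suggests the forecasting model has drifted and may need retraining on recent data, while a single large error that quickly drops out of the window is more likely an isolated anomaly.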




