# Anomaly Detection in Infrastructure Metrics

## Objectives
- Understand how to frame an unsupervised anomaly detection problem for time-series infrastructure data.
- Implement an **Isolation Forest** to identify spikes or drops in CPU/Memory usage.
- Use **Z-Score** statistical methods for simpler, threshold-based alerting.

## Dataset
- We will generate synthetic "Prometheus-style" time-series data representing server CPU utilization over 7 days at one-minute resolution, complete with daily seasonality, background noise, and injected anomalies.

## Expected Outcome
- A visualization distinguishing normal operational behavior from anomalous spikes.
- A simple function that accepts a new data point and returns whether it should trigger an alert.

## Challenge
- Tune the `contamination` parameter of the Isolation Forest. How does changing it impact the false-positive rate of your alerts?

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import IsolationForest

# Set random state for reproducibility
np.random.seed(42)
sns.set_theme(style="darkgrid")

### 1. Generating Synthetic Prometheus Data
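Real infrastructure metrics combine seasonality (load drops at night while users sleep) with random noise. The generator below models both with a daily sine wave plus Gaussian noise, then injects two anomalies: a short spike and a drop to zero.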

In [None]:
def generate_cpu_metrics(days=7, points_per_hour=60):
    total_points = days * 24 * points_per_hour
    t = np.linspace(0, days * 2 * np.pi, total_points)
    
    # Base daily sine wave (peaking in middle of day)
    base_load = 40 + 20 * np.sin(t - (np.pi/2))
    
    # Add noise
    noise = np.random.normal(0, 3, total_points)
    
    # Combine
    cpu_usage = base_load + noise
    
    # Inject Anomalies
    # 1. Sudden massive spike
    cpu_usage[5000:5010] += 45
    # 2. Complete drop (service down)
    cpu_usage[8000:8050] = 0
    
    # Create DataFrame
    df = pd.DataFrame({
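        # Derive the sampling interval from points_per_hour so timestamps stay correct if it changes
        'timestamp': pd.date_range(start='2025-01-01', periods=total_points, freq=pd.Timedelta(hours=1) / points_per_hour),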
        'cpu_usage': np.clip(cpu_usage, 0, 100) # CPU can't go below 0 or above 100
    })
    return df

df = generate_cpu_metrics()

plt.figure(figsize=(15, 5))
plt.plot(df['timestamp'], df['cpu_usage'], alpha=0.7)
plt.title("Synthetic CPU Usage (7 Days)")
plt.ylabel("CPU Utilization %")
plt.show()

### 2. Isolation Forest
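Isolation Forest suits multidimensional data and data that is not normally distributed, because it makes no assumptions about the underlying distribution. It "isolates" points by recursively partitioning the data at random; anomalies sit far from the bulk of the data, so they are isolated in fewer partitions than normal points. Here the model sees only the raw `cpu_usage` value, so it flags unusually high or low readings regardless of the time of day.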
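
To make the "fewer partitions" idea concrete, here is a minimal one-dimensional sketch. It is not how scikit-learn implements the algorithm (the real model builds an ensemble of trees on subsamples and picks features at random); the helper names `isolation_depth` and `mean_depth` and the toy data are assumptions made for illustration only.

```python
import numpy as np

def isolation_depth(values, target, rng, max_depth=50):
    """Count the random splits needed until `target` is alone in its partition."""
    subset = np.asarray(values, dtype=float)
    depth = 0
    while len(subset) > 1 and depth < max_depth:
        lo, hi = subset.min(), subset.max()
        if lo == hi:  # remaining points are identical; no further split is possible
            break
        split = rng.uniform(lo, hi)
        # Keep only the side of the split that contains the target
        subset = subset[subset < split] if target < split else subset[subset >= split]
        depth += 1
    return depth

rng = np.random.default_rng(0)
normal_points = rng.normal(40, 3, 500)    # toy "typical" CPU readings around 40%
data = np.append(normal_points, 95.0)     # plus one obvious outlier
typical_point = np.sort(normal_points)[250]  # a reading from the middle of the bulk

def mean_depth(target, trials=200):
    """Average the isolation depth over many random partitionings."""
    return np.mean([isolation_depth(data, target, rng) for _ in range(trials)])

outlier_depth = mean_depth(95.0)
typical_depth = mean_depth(typical_point)
print(f"Average splits to isolate the outlier:       {outlier_depth:.1f}")
print(f"Average splits to isolate a typical reading: {typical_depth:.1f}")
```

The outlier should need noticeably fewer splits on average, which is the signal an Isolation Forest turns into an anomaly score.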

In [None]:
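# contamination sets the fraction of points the model will flag as anomalous.
# We use 1%, slightly above the 60 injected points (about 0.6% of 10,080).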
model = IsolationForest(contamination=0.01, random_state=42)

# Fit the model (unsupervised, so we don't pass labels)
# sklearn expects 2D input; the double brackets select a one-column DataFrame
df['anomaly_iforest'] = model.fit_predict(df[['cpu_usage']])

# Isolation forest returns -1 for anomaly, 1 for normal.
# Let's map it to True/False for easier plotting
df['is_anomaly'] = df['anomaly_iforest'] == -1

print(f"Detected {df['is_anomaly'].sum()} anomalies out of {len(df)} points.")
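
Because the anomalies in this dataset were injected at known positions, the effect of `contamination` on false positives can be measured directly. The sketch below is one possible way to do that, under a few assumptions: it rebuilds the same signal with its own random generator (so values differ slightly from `df`), and the names `truth` and `results` are introduced here for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Rebuild a signal with the same shape as the notebook's synthetic data
rng = np.random.default_rng(42)
n = 7 * 24 * 60
t = np.linspace(0, 7 * 2 * np.pi, n)
cpu = 40 + 20 * np.sin(t - np.pi / 2) + rng.normal(0, 3, n)
cpu[5000:5010] += 45   # spike
cpu[8000:8050] = 0     # outage
cpu = np.clip(cpu, 0, 100)

# Ground truth exists only because the anomalies were injected by hand
truth = np.zeros(n, dtype=bool)
truth[5000:5010] = True
truth[8000:8050] = True

X = cpu.reshape(-1, 1)
results = {}
for contamination in (0.002, 0.006, 0.01, 0.05):
    model = IsolationForest(contamination=contamination, random_state=42)
    flagged = model.fit_predict(X) == -1
    # False-positive rate: share of normal points that were flagged
    fp_rate = (flagged & ~truth).sum() / (~truth).sum()
    # Recall: share of injected anomalies that were flagged
    recall = (flagged & truth).sum() / truth.sum()
    results[contamination] = (fp_rate, recall)
    print(f"contamination={contamination}: flagged={flagged.sum()}, "
          f"FP rate={fp_rate:.4f}, recall={recall:.2f}")
```

Since `contamination` fixes roughly how many points get flagged, any value above the true anomaly fraction forces the model to label normal points as anomalies, which is worth keeping in mind when reading the count printed above.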

In [None]:
plt.figure(figsize=(15, 5))
plt.plot(df['timestamp'], df['cpu_usage'], alpha=0.7, label='CPU Usage')

# Highlight anomalies
anomalies = df[df['is_anomaly']]
plt.scatter(anomalies['timestamp'], anomalies['cpu_usage'], color='red', label='Anomaly', zorder=5)

plt.title("Isolation Forest Anomaly Detection")
plt.ylabel("CPU Utilization %")
plt.legend()
plt.show()
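
For the simpler Z-Score approach named in the objectives, an alert check on a new data point could look like the sketch below. It scores each reading as z = (x - mean) / std against a rolling window of recent values rather than the whole series, on the assumption that a global mean would be distorted by the daily cycle. The class name `ZScoreAlerter`, the 60-point window, and the threshold of 4 are illustrative choices, not tuned values.

```python
from collections import deque

import numpy as np

class ZScoreAlerter:
    """Flag a reading whose z-score against recent history exceeds a threshold."""

    def __init__(self, window=60, threshold=4.0):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def should_alert(self, value):
        alert = False
        # Only score once the window is full, so early estimates are not too noisy
        if len(self.history) == self.history.maxlen:
            recent = np.array(list(self.history), dtype=float)
            mean, std = recent.mean(), recent.std()
            if std > 0:
                alert = bool(abs(value - mean) / std > self.threshold)
        # Learn only from non-alerting points so an incident does not become the baseline
        if not alert:
            self.history.append(value)
        return alert

rng = np.random.default_rng(0)
alerter = ZScoreAlerter(window=60, threshold=4.0)
for reading in rng.normal(40, 3, 60):   # warm up on an hour of normal readings
    alerter.should_alert(reading)

print(alerter.should_alert(100.0))  # reading far outside the recent range
print(alerter.should_alert(41.0))   # reading close to the recent mean
```

One trade-off of excluding alerting points from the history is that a permanent shift in load would keep alerting until the detector is reset; appending every point instead would let the baseline adapt, at the cost of masking long incidents.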