# 🔍 Day 3: Anomaly Detection

**🎯 Goal:** Master techniques to find outliers, anomalies, and unusual patterns in data

**⏱️ Time:** 60-75 minutes

**🌟 Why This Matters for AI:**
- Fraud detection: catch unusual transactions before they cause damage
- AI safety: detect unusual or harmful AI agent behavior
- RAG systems: identify low-quality or irrelevant documents
- AI labs such as OpenAI and Anthropic monitor model outputs for safety, and anomaly detection is one tool for that kind of monitoring
- Critical for multimodal AI: detect adversarial inputs (jailbreaks)
- Powers cybersecurity, quality control, and system monitoring

---

## 🤔 What is Anomaly Detection?

**Anomaly (Outlier)** = A data point that differs significantly from the majority.

**Why it matters:**
- 💳 Fraud detection: Normal spending = $50-200, Anomaly = $5,000 purchase in foreign country
- 🏥 Medical diagnosis: Detect unusual vital signs before emergency
- 🤖 AI safety: Detect when AI agent behaves unusually
- 🔒 Cybersecurity: Identify intrusions and attacks
- 🏭 Manufacturing: Catch defective products

**The challenge:**
- Anomalies are rare (often 1% or less of real-world data)
- Usually unlabeled (you don't know what's anomalous in advance)
- Can be subtle or obvious
- Context-dependent (unusual in one context, normal in another)

**We'll learn 3 powerful techniques:**
1. **Isolation Forest** - Fast, tree-based, great for high dimensions
2. **One-Class SVM** - Boundary-based, works well for complex shapes
3. **Autoencoders** - Deep learning approach, learns normal patterns

Let's catch some anomalies! 👇

In [None]:
# Import our tools
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.datasets import make_blobs, make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix

# Deep learning (for autoencoders)
try:
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers
    TF_AVAILABLE = True
except ImportError:
    TF_AVAILABLE = False
    print("⚠️  TensorFlow not installed. Autoencoder section will be skipped.")
    print("   Install with: pip install tensorflow")

# Set random seed and style
np.random.seed(42)
if TF_AVAILABLE:
    tf.random.set_seed(42)
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 8)

print("✅ Libraries imported successfully!")
print("🔍 Ready to detect anomalies!")

## 🎯 Technique 1: Isolation Forest

**Isolation Forest** = Isolate anomalies using random decision trees!

### Core idea:
**Anomalies are few and different** → easier to isolate!

**Analogy:** Finding a purple cow in a field of brown cows:
- Ask: "Is it purple?" → Immediately isolates the anomaly! (1 question)
- Finding a specific brown cow → Need many questions to isolate it

### How it works:
1. **Build trees:** Randomly select features and split points
2. **Measure path length:** How many splits to isolate each point?
3. **Anomaly score:** Short path = easy to isolate = anomaly!
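
The path-length idea above can be turned into a number. The sketch below follows the scoring formula from the original Isolation Forest paper, `s(x, n) = 2 ** (-E[h(x)] / c(n))`, where `E[h(x)]` is the mean path length for a point and `c(n)` is the average path length expected for `n` samples. The function names are illustrative, not part of scikit-learn, and the path lengths passed in are made-up values.

```python
import math

def average_path_length(n: int) -> float:
    """c(n): expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # H(n-1) approximated with Euler's constant
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length: float, n: int) -> float:
    """Paper-style score: near 1 = likely anomaly, well below 0.5 = likely normal."""
    return 2.0 ** (-mean_path_length / average_path_length(n))

n = 256  # sub-sample size used to build each tree (assumed for this example)
print(f"short path (2 splits):  {anomaly_score(2.0, n):.3f}")
print(f"long path (12 splits):  {anomaly_score(12.0, n):.3f}")
```

Note that scikit-learn's `score_samples` returns the opposite of this score, so there lower values mean more anomalous.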

### Strengths:
- ✅ Fast and scalable (works on millions of points)
- ✅ Works well in high dimensions
- ✅ No need to define "normal" (unsupervised)
- ✅ Robust to noise
- ✅ Few hyperparameters

### Weaknesses:
- ❌ May struggle with local anomalies in dense regions
- ❌ Randomized (results vary slightly between runs unless `random_state` is fixed)

### Real AI Use (2024-2025):
- **Fraud detection:** Catch unusual credit card transactions
- **RAG quality:** Identify low-quality or irrelevant documents
- **AI monitoring:** Detect unusual model outputs
- **Cybersecurity:** Identify network intrusions

In [None]:
# Create sample data: normal points + anomalies
np.random.seed(42)

# Normal data: two clusters
X_normal, _ = make_blobs(n_samples=300, n_features=2, centers=2, 
                         cluster_std=0.5, random_state=42)

# Anomalies: scattered outliers
X_anomalies = np.random.uniform(low=-6, high=6, size=(20, 2))

# Combine
X_combined = np.vstack([X_normal, X_anomalies])
y_true = np.array([0]*300 + [1]*20)  # 0=normal, 1=anomaly

print("📊 Dataset:")
print(f"   Normal points: {len(X_normal)}")
print(f"   Anomalies: {len(X_anomalies)}")
print(f"   Anomaly rate: {len(X_anomalies)/len(X_combined):.1%}")

# Visualize original data
plt.figure(figsize=(10, 7))
plt.scatter(X_normal[:, 0], X_normal[:, 1], 
           c='blue', s=50, alpha=0.6, label='Normal', edgecolors='black', linewidths=0.5)
plt.scatter(X_anomalies[:, 0], X_anomalies[:, 1], 
           c='red', s=100, alpha=0.8, label='Anomalies', 
           marker='X', edgecolors='black', linewidths=1)
plt.xlabel('Feature 1', fontsize=12)
plt.ylabel('Feature 2', fontsize=12)
plt.title('Dataset: Normal Points + Anomalies', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

print("\n🎯 Goal: Detect the red anomalies automatically (unsupervised!)")

In [None]:
# Apply Isolation Forest
iso_forest = IsolationForest(
    contamination=0.1,  # Assumed proportion of anomalies (10%; the true rate here is 6.25%)
    random_state=42,
    n_estimators=100
)

# Fit and predict
y_pred = iso_forest.fit_predict(X_combined)
# Note: IsolationForest returns 1 for normal, -1 for anomaly
# Convert to 0/1 for consistency
y_pred_binary = (y_pred == -1).astype(int)

# Get anomaly scores (lower = more anomalous)
anomaly_scores = iso_forest.score_samples(X_combined)

# Visualize results
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Plot 1: Predicted anomalies
normal_mask = (y_pred == 1)
anomaly_mask = (y_pred == -1)

axes[0].scatter(X_combined[normal_mask, 0], X_combined[normal_mask, 1],
               c='blue', s=50, alpha=0.6, label='Normal', edgecolors='black', linewidths=0.5)
axes[0].scatter(X_combined[anomaly_mask, 0], X_combined[anomaly_mask, 1],
               c='red', s=100, alpha=0.8, label='Detected Anomalies',
               marker='X', edgecolors='black', linewidths=1)
axes[0].set_xlabel('Feature 1', fontsize=11)
axes[0].set_ylabel('Feature 2', fontsize=11)
axes[0].set_title('Isolation Forest: Detected Anomalies', fontsize=13, fontweight='bold')
axes[0].legend(fontsize=10)
axes[0].grid(alpha=0.3)

# Plot 2: Anomaly scores (heatmap)
# score_samples is lower for more anomalous points, so use 'RdYlBu' (low = red)
scatter = axes[1].scatter(X_combined[:, 0], X_combined[:, 1],
                         c=anomaly_scores, cmap='RdYlBu', s=50,
                         edgecolors='black', linewidths=0.5)
axes[1].set_xlabel('Feature 1', fontsize=11)
axes[1].set_ylabel('Feature 2', fontsize=11)
axes[1].set_title('Anomaly Scores (Red = More Anomalous)', fontsize=13, fontweight='bold')
plt.colorbar(scatter, ax=axes[1], label='score_samples (lower = more anomalous)')
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

# Evaluate
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

accuracy = accuracy_score(y_true, y_pred_binary)
precision = precision_score(y_true, y_pred_binary)
recall = recall_score(y_true, y_pred_binary)
f1 = f1_score(y_true, y_pred_binary)

print("🎯 ISOLATION FOREST RESULTS")
print("=" * 60)
print(f"Accuracy:  {accuracy:.2%} - Overall correctness")
print(f"Precision: {precision:.2%} - Of detected anomalies, how many were real?")
print(f"Recall:    {recall:.2%} - Of real anomalies, how many did we catch?")
print(f"F1-Score:  {f1:.2%} - Balanced metric")

print("\n💡 Interpretation:")
if recall > 0.8:
    print("   ✅ Excellent! Caught most anomalies")
else:
    print("   ⚠️  Missed some anomalies - consider adjusting contamination parameter")

if precision > 0.8:
    print("   ✅ High precision! Few false alarms")
else:
    print("   ⚠️  Some false positives - flagged normal points as anomalies")

### 🔍 Tuning the Contamination Parameter

**contamination** = Expected proportion of anomalies in your data

**How to choose:**
- If you know: Use domain knowledge (e.g., "fraud is 0.1% of transactions")
- If unknown: Start with 0.05-0.10, then adjust based on results
- Too low: Miss anomalies (low recall)
- Too high: Flag normal points (low precision)
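
The trade-off in the last two bullets can also be checked numerically. The sketch below rebuilds a small synthetic dataset (its own data, not the notebook's variables) and prints precision and recall at three contamination values. The specific values 0.02, 0.06, and 0.20 are assumptions chosen to sit below, near, and above the true anomaly rate of 20/320.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)

# 300 clustered "normal" points plus 20 uniformly scattered outliers
X_in, _ = make_blobs(n_samples=300, centers=2, cluster_std=0.5, random_state=0)
X_out = rng.uniform(-6, 6, size=(20, 2))
X = np.vstack([X_in, X_out])
y = np.array([0] * 300 + [1] * 20)  # 1 = anomaly

flagged_counts = {}
for cont in (0.02, 0.06, 0.20):
    pred = IsolationForest(contamination=cont, random_state=0).fit_predict(X)
    flagged = (pred == -1).astype(int)  # -1 means "anomaly" in scikit-learn
    flagged_counts[cont] = int(flagged.sum())
    p = precision_score(y, flagged, zero_division=0)
    r = recall_score(y, flagged)
    print(f"contamination={cont:.2f}  flagged={flagged_counts[cont]:3d}  "
          f"precision={p:.2f}  recall={r:.2f}")
```

Because `contamination` sets the score threshold as a percentile, the number of flagged points grows roughly in proportion to it, whatever the true anomaly rate is.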

Let's see the effect!

In [None]:
# Try different contamination values
contaminations = [0.05, 0.1, 0.2]
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

for i, cont in enumerate(contaminations):
    iso = IsolationForest(contamination=cont, random_state=42)
    y_pred_temp = iso.fit_predict(X_combined)
    
    normal_mask = (y_pred_temp == 1)
    anomaly_mask = (y_pred_temp == -1)
    
    axes[i].scatter(X_combined[normal_mask, 0], X_combined[normal_mask, 1],
                   c='blue', s=40, alpha=0.6, label='Normal')
    axes[i].scatter(X_combined[anomaly_mask, 0], X_combined[anomaly_mask, 1],
                   c='red', s=80, alpha=0.8, label='Anomaly', marker='X')
    
    n_detected = np.sum(anomaly_mask)
    axes[i].set_title(f'Contamination={cont}\n({n_detected} anomalies detected)', 
                     fontsize=12, fontweight='bold')
    axes[i].set_xlabel('Feature 1')
    axes[i].set_ylabel('Feature 2')
    axes[i].legend(fontsize=9)
    axes[i].grid(alpha=0.3)

plt.suptitle('Effect of Contamination Parameter', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("📊 Contamination Guide:")
print("   0.05 (5%):  Conservative - fewer false alarms, may miss some")
print("   0.10 (10%): Balanced - good starting point")
print("   0.20 (20%): Aggressive - catch more, but more false alarms")

## 🎯 Technique 2: One-Class SVM

**One-Class SVM** = Learn a boundary around normal data!

### Core idea:
- Train on **normal data only**
- Learn a tight boundary around it
- Anything outside the boundary = anomaly!

**Analogy:** Drawing a fence around your house:
- Everything inside fence = normal (your property)
- Everything outside = anomaly (trespasser!)

### How it works:
1. **Map to high dimension:** Use kernel trick (like SVM)
2. **Find hyperplane:** Separate normal data from origin
3. **Maximize margin:** Create tight boundary
4. **Classify:** Inside boundary = normal, outside = anomaly

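### Key parameter: nu (ν)
- **nu** = Upper bound on the fraction of training points treated as outliers (similar in spirit to contamination); it is also a lower bound on the fraction of support vectors
- Range: greater than 0, at most 1 (typical: 0.01 to 0.1)
- Lower = looser boundary that encloses more points, so fewer are flagged
- Higher = tighter boundary, so more points are flagged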
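
A quick way to see what `nu` does is to fit the model at a few values and count how many training points end up outside the boundary. This is a minimal sketch on its own synthetic data. The `nu` values and `gamma="scale"` are assumptions for illustration, not tuned settings.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import OneClassSVM

# Two interleaved half-moons as the "normal" data
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

fractions = {}
for nu in (0.01, 0.05, 0.20):
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(X)
    # predict returns -1 for points outside the learned boundary
    fractions[nu] = float(np.mean(model.predict(X) == -1))
    print(f"nu={nu:.2f}  fraction of training points flagged={fractions[nu]:.3f}")
```

The flagged fraction should roughly track `nu`: raising it makes the boundary leave more training points outside.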

### Strengths:
- ✅ Works well with complex, non-linear boundaries (kernel trick!)
- ✅ Theoretical foundation (strong math)
- ✅ Effective for small to medium datasets

### Weaknesses:
- ❌ Slow on large datasets (doesn't scale well)
- ❌ Sensitive to kernel choice
- ❌ More parameters to tune

### Real AI Use:
- **Medical diagnosis:** Learn normal vital signs, flag abnormal
- **Manufacturing:** Detect defective products
- **Security:** Identify unusual network behavior

In [None]:
# Create more complex data (non-linear)
X_moons, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# Add anomalies
X_outliers = np.random.uniform(low=-2, high=3, size=(20, 2))
X_complex = np.vstack([X_moons, X_outliers])
y_true_complex = np.array([0]*300 + [1]*20)

# Apply One-Class SVM
ocsvm = OneClassSVM(
    kernel='rbf',  # Radial Basis Function (can learn complex boundaries)
    gamma='auto',  # Kernel coefficient
    nu=0.1         # Expected outlier fraction
)

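# Note: this fits on the mixed data, outliers included; nu lets the boundary leave
# some training points outside. With clean data you would fit on normal samples
# only and call predict on new data.
y_pred_svm = ocsvm.fit_predict(X_complex)
y_pred_svm_binary = (y_pred_svm == -1).astype(int)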

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Plot 1: Results
normal_mask_svm = (y_pred_svm == 1)
anomaly_mask_svm = (y_pred_svm == -1)

axes[0].scatter(X_complex[normal_mask_svm, 0], X_complex[normal_mask_svm, 1],
               c='blue', s=50, alpha=0.6, label='Normal')
axes[0].scatter(X_complex[anomaly_mask_svm, 0], X_complex[anomaly_mask_svm, 1],
               c='red', s=100, alpha=0.8, label='Detected Anomalies', marker='X')
axes[0].set_xlabel('Feature 1', fontsize=11)
axes[0].set_ylabel('Feature 2', fontsize=11)
axes[0].set_title('One-Class SVM: Detected Anomalies', fontsize=13, fontweight='bold')
axes[0].legend(fontsize=10)
axes[0].grid(alpha=0.3)

# Plot 2: Decision boundary
# Create mesh grid
xx, yy = np.meshgrid(np.linspace(-2.5, 3.5, 500), np.linspace(-1.5, 2, 500))
Z = ocsvm.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Shade the region outside the boundary (decision function < 0) and draw the boundary itself
axes[1].contourf(xx, yy, Z, levels=np.linspace(Z.min(), 0, 7), cmap='Blues_r', alpha=0.6)
axes[1].contour(xx, yy, Z, levels=[0], linewidths=2, colors='darkblue',
               linestyles='solid')  # contour() does not accept a `label` argument
axes[1].scatter(X_moons[:, 0], X_moons[:, 1], c='blue', s=30, alpha=0.6, label='Normal')
axes[1].scatter(X_outliers[:, 0], X_outliers[:, 1], c='red', s=80, alpha=0.8, 
               marker='X', label='Anomalies')
axes[1].set_xlabel('Feature 1', fontsize=11)
axes[1].set_ylabel('Feature 2', fontsize=11)
axes[1].set_title('One-Class SVM: Decision Boundary', fontsize=13, fontweight='bold')
axes[1].legend(fontsize=10)
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

# Evaluate
accuracy_svm = accuracy_score(y_true_complex, y_pred_svm_binary)
precision_svm = precision_score(y_true_complex, y_pred_svm_binary, zero_division=0)
recall_svm = recall_score(y_true_complex, y_pred_svm_binary)
f1_svm = f1_score(y_true_complex, y_pred_svm_binary, zero_division=0)

print("🎯 ONE-CLASS SVM RESULTS")
print("=" * 60)
print(f"Accuracy:  {accuracy_svm:.2%}")
print(f"Precision: {precision_svm:.2%}")
print(f"Recall:    {recall_svm:.2%}")
print(f"F1-Score:  {f1_svm:.2%}")

print("\n💡 Key Feature:")
print("   - One-Class SVM learned a complex, non-linear boundary!")
print("   - Blue shaded area = outside the boundary (darker = further from normal)")
print("   - Dark blue line = decision boundary")
print("   - Points outside = anomalies")

## 🧠 Technique 3: Autoencoders for Anomaly Detection

**Autoencoder** = Neural network that learns to compress and reconstruct data!

### Core idea:
- Train on **normal data only**
- Network learns to compress → reconstruct normal patterns
- Normal data → reconstructs well (low error)
- Anomalies → reconstructs poorly (high error)!

**Analogy:** Learning to draw faces:
- Train on 1000 normal human faces
- You get good at drawing faces
- Show you a normal face → you can redraw it well (low error)
- Show you an alien face → you can't redraw it well (high error = anomaly!)

### Architecture:
```
Input (784D) → Encoder → Bottleneck (32D) → Decoder → Output (784D)
```

- **Encoder:** Compress input to lower dimension
- **Bottleneck:** Compact representation (latent space)
- **Decoder:** Reconstruct from bottleneck

### How it works:
1. **Train:** Minimize reconstruction error on normal data
2. **Threshold:** Calculate reconstruction errors, set threshold (e.g., 95th percentile)
3. **Detect:** High reconstruction error ‚Üí anomaly!
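
The three steps above can be sketched without a deep learning framework. In the example below a linear encoder/decoder built from PCA stands in for the neural autoencoder (a deliberate simplification, not the model trained later in this notebook), and all data is synthetic and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "normal" data: 20-D points that lie near a 3-D subspace, plus small noise
basis = rng.normal(size=(3, 20))
X_train = rng.normal(size=(500, 3)) @ basis + 0.05 * rng.normal(size=(500, 20))
X_normal_test = rng.normal(size=(100, 3)) @ basis + 0.05 * rng.normal(size=(100, 20))
X_anomalous = 2.0 * rng.normal(size=(100, 20))  # ignores the subspace structure

# Step 1 (train): learn a 3-D linear bottleneck from normal data only
mean = X_train.mean(axis=0)
_, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
components = vt[:3]

def reconstruction_error(X):
    codes = (X - mean) @ components.T         # encode to the bottleneck
    X_hat = codes @ components + mean         # decode back to 20-D
    return np.mean((X - X_hat) ** 2, axis=1)  # per-sample MSE

# Step 2 (threshold): 95th percentile of errors on the normal training data
threshold = np.percentile(reconstruction_error(X_train), 95)

# Step 3 (detect): flag anything that reconstructs worse than the threshold
normal_flagged = float(np.mean(reconstruction_error(X_normal_test) > threshold))
anomalous_flagged = float(np.mean(reconstruction_error(X_anomalous) > threshold))
print(f"threshold={threshold:.5f}")
print(f"normal test points flagged:    {normal_flagged:.0%}")
print(f"anomalous test points flagged: {anomalous_flagged:.0%}")
```

Setting the threshold from training data keeps the test set untouched, at the cost of a false-alarm rate near 5% on normal points by construction.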

### Strengths:
- ✅ Learns complex patterns (deep learning power!)
- ✅ Works well in very high dimensions
- ✅ Can handle images, text, any data type
- ✅ Interpretable (can visualize reconstruction errors)

### Weaknesses:
- ❌ Requires more data to train
- ❌ Slower to train
- ❌ More hyperparameters (architecture, learning rate, etc.)
- ❌ Requires deep learning knowledge

### Real AI Use (2024-2025):
- **AI safety:** Detect unusual/harmful model outputs
- **Content moderation:** Flag unusual images/text
- **Fraud detection:** Detect unusual transaction patterns
- **Manufacturing:** Detect defects in images

In [None]:
if TF_AVAILABLE:
    # Load MNIST digits for demonstration
    from tensorflow.keras.datasets import mnist
    
    (X_train_full, y_train_full), (X_test, y_test) = mnist.load_data()
    
    # Normalize to [0, 1]
    X_train_full = X_train_full.astype('float32') / 255.0
    X_test = X_test.astype('float32') / 255.0
    
    # Flatten images: 28x28 → 784
    X_train_full = X_train_full.reshape(-1, 784)
    X_test = X_test.reshape(-1, 784)
    
    # Train on digits 0-8 only (treat 9 as anomaly!)
    X_train = X_train_full[y_train_full < 9]
    y_train = y_train_full[y_train_full < 9]
    
    print("📊 Autoencoder Anomaly Detection Setup:")
    print(f"   Training on: Digits 0-8 (normal)")
    print(f"   Treating as anomaly: Digit 9")
    print(f"   Training samples: {len(X_train)}")
    print(f"   Test samples: {len(X_test)}")
    
else:
    print("⚠️  TensorFlow not available. Skipping autoencoder section.")
    print("   Install with: pip install tensorflow")

In [None]:
if TF_AVAILABLE:
    # Build autoencoder
    input_dim = 784  # 28x28 pixels
    encoding_dim = 32  # Bottleneck size
    
    # Encoder
    encoder = keras.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(128, activation='relu'),
        layers.Dense(64, activation='relu'),
        layers.Dense(encoding_dim, activation='relu')  # Bottleneck
    ], name='encoder')
    
    # Decoder
    decoder = keras.Sequential([
        layers.Input(shape=(encoding_dim,)),
        layers.Dense(64, activation='relu'),
        layers.Dense(128, activation='relu'),
        layers.Dense(input_dim, activation='sigmoid')  # Reconstruct
    ], name='decoder')
    
    # Full autoencoder
    autoencoder = keras.Sequential([encoder, decoder], name='autoencoder')
    
    # Compile
    autoencoder.compile(optimizer='adam', loss='mse')
    
    print("🧠 Autoencoder Architecture:")
    print("=" * 60)
    autoencoder.summary()
    
    print("\n⏳ Training autoencoder (this may take 1-2 minutes)...")
    history = autoencoder.fit(
        X_train, X_train,  # Train to reconstruct itself!
        epochs=10,
        batch_size=256,
        validation_split=0.1,
        verbose=0
    )
    print("✅ Training complete!")
    
    # Plot training history
    plt.figure(figsize=(10, 5))
    plt.plot(history.history['loss'], label='Training Loss', linewidth=2)
    plt.plot(history.history['val_loss'], label='Validation Loss', linewidth=2)
    plt.xlabel('Epoch', fontsize=12)
    plt.ylabel('Reconstruction Error (MSE)', fontsize=12)
    plt.title('Autoencoder Training: Learning Normal Patterns', fontsize=14, fontweight='bold')
    plt.legend(fontsize=11)
    plt.grid(alpha=0.3)
    plt.tight_layout()
    plt.show()
    
else:
    print("⚠️  TensorFlow not available.")

In [None]:
if TF_AVAILABLE:
    # Reconstruct test data
    X_test_reconstructed = autoencoder.predict(X_test, verbose=0)
    
    # Calculate reconstruction errors
    reconstruction_errors = np.mean(np.square(X_test - X_test_reconstructed), axis=1)
    
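    # Set threshold from the training data (95th percentile of reconstruction error
    # on normal digits), so the test labels are not used to tune the detector
    train_errors = np.mean(np.square(X_train - autoencoder.predict(X_train, verbose=0)), axis=1)
    threshold = np.percentile(train_errors, 95)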
    
    # Detect anomalies
    y_pred_ae = (reconstruction_errors > threshold).astype(int)
    y_true_ae = (y_test == 9).astype(int)  # 1 if digit is 9, else 0
    
    # Visualize reconstruction errors
    plt.figure(figsize=(14, 6))
    
    # Histogram of errors
    plt.subplot(1, 2, 1)
    plt.hist(reconstruction_errors[y_test < 9], bins=50, alpha=0.7, 
            label='Normal (0-8)', color='blue', edgecolor='black')
    plt.hist(reconstruction_errors[y_test == 9], bins=50, alpha=0.7, 
            label='Anomaly (9)', color='red', edgecolor='black')
    plt.axvline(threshold, color='green', linestyle='--', linewidth=2, 
               label=f'Threshold ({threshold:.4f})')
    plt.xlabel('Reconstruction Error', fontsize=12)
    plt.ylabel('Frequency', fontsize=12)
    plt.title('Reconstruction Error Distribution', fontsize=14, fontweight='bold')
    plt.legend(fontsize=10)
    plt.grid(alpha=0.3)
    
    # Box plot
    plt.subplot(1, 2, 2)
    data_to_plot = [reconstruction_errors[y_test < 9], reconstruction_errors[y_test == 9]]
    box = plt.boxplot(data_to_plot, labels=['Normal (0-8)', 'Anomaly (9)'],
                     patch_artist=True, widths=0.6)
    box['boxes'][0].set_facecolor('lightblue')
    box['boxes'][1].set_facecolor('lightcoral')
    plt.axhline(threshold, color='green', linestyle='--', linewidth=2, label='Threshold')
    plt.ylabel('Reconstruction Error', fontsize=12)
    plt.title('Error Distribution by Class', fontsize=14, fontweight='bold')
    plt.legend(fontsize=10)
    plt.grid(alpha=0.3, axis='y')
    
    plt.tight_layout()
    plt.show()
    
    # Evaluate
    accuracy_ae = accuracy_score(y_true_ae, y_pred_ae)
    precision_ae = precision_score(y_true_ae, y_pred_ae, zero_division=0)
    recall_ae = recall_score(y_true_ae, y_pred_ae)
    f1_ae = f1_score(y_true_ae, y_pred_ae, zero_division=0)
    
    print("🧠 AUTOENCODER ANOMALY DETECTION RESULTS")
    print("=" * 60)
    print(f"Accuracy:  {accuracy_ae:.2%}")
    print(f"Precision: {precision_ae:.2%} - Of detected 9s, how many were actually 9s?")
    print(f"Recall:    {recall_ae:.2%} - Of all actual 9s, how many did we detect?")
    print(f"F1-Score:  {f1_ae:.2%}")
    
    print("\n💡 Insights:")
    print(f"   - Threshold: {threshold:.4f}")
    print(f"   - Normal digits (0-8): Low reconstruction error (trained on these!)")
    print(f"   - Anomaly digit (9): High reconstruction error (never seen during training!)")
    print("   - The autoencoder 'knows' what normal looks like, so it spots the unusual!")
    
else:
    print("⚠️  TensorFlow not available.")

In [None]:
if TF_AVAILABLE:
    # Visualize some reconstructions
    n_examples = 10
    
    # Get some normal and anomalous examples
    normal_idx = np.where(y_test < 9)[0][:5]
    anomaly_idx = np.where(y_test == 9)[0][:5]
    indices = np.concatenate([normal_idx, anomaly_idx])
    
    fig, axes = plt.subplots(3, n_examples, figsize=(15, 5))
    
    for i, idx in enumerate(indices):
        # Original (hide ticks instead of the whole axis so the row labels stay visible)
        axes[0, i].imshow(X_test[idx].reshape(28, 28), cmap='gray')
        axes[0, i].set_xticks([])
        axes[0, i].set_yticks([])
        if i == 0:
            axes[0, i].set_ylabel('Original', fontsize=11, fontweight='bold')
        
        # Reconstructed
        axes[1, i].imshow(X_test_reconstructed[idx].reshape(28, 28), cmap='gray')
        axes[1, i].set_xticks([])
        axes[1, i].set_yticks([])
        if i == 0:
            axes[1, i].set_ylabel('Reconstructed', fontsize=11, fontweight='bold')
        
        # Difference (error)
        diff = np.abs(X_test[idx] - X_test_reconstructed[idx])
        axes[2, i].imshow(diff.reshape(28, 28), cmap='hot')
        axes[2, i].set_xticks([])
        axes[2, i].set_yticks([])
        if i == 0:
            axes[2, i].set_ylabel('Error', fontsize=11, fontweight='bold')
        
        # Title
        error = reconstruction_errors[idx]
        is_anomaly = "ANOMALY" if error > threshold else "Normal"
        color = 'red' if error > threshold else 'blue'
        axes[0, i].set_title(f"Digit {y_test[idx]}\n{is_anomaly}\nError: {error:.4f}",
                            fontsize=9, color=color, fontweight='bold')
    
    plt.suptitle('Autoencoder Reconstructions: Normal vs. Anomaly', 
                fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()
    
    print("📊 Visual Analysis:")
    print("   Left 5 columns: Normal digits (0-8)")
    print("      → Low error (dark in error map)")
    print("      → Good reconstruction")
    print("\n   Right 5 columns: Anomaly digit (9)")
    print("      → High error (bright in error map)")
    print("      → Poor reconstruction (autoencoder struggles!)")
    
else:
    print("⚠️  TensorFlow not available.")

## 🤖 Real AI Example: Detecting Unusual AI Agent Behavior

**Scenario:** You're building an agentic AI system (like AutoGPT or BabyAGI).
- Agent performs actions: search, write_file, send_email, etc.
- Goal: Detect unusual/potentially harmful behavior patterns

**Why this matters:**
- AI safety: Catch agent going rogue
- Security: Detect jailbreak attempts
- Quality: Flag unusual patterns for human review

Let's simulate this!

In [None]:
# Simulate AI agent behavior logs
np.random.seed(42)

# Features: [actions_per_minute, API_calls, file_writes, unique_tools_used, 
#            avg_response_time, error_rate]

# Normal behavior patterns
n_normal = 500
X_normal_behavior = np.column_stack([
    np.random.poisson(5, n_normal),      # 5 actions/min average
    np.random.poisson(10, n_normal),     # 10 API calls average
    np.random.poisson(2, n_normal),      # 2 file writes average
    np.random.randint(1, 6, n_normal),   # 1-5 unique tools
    np.random.normal(1.5, 0.3, n_normal), # 1.5s response time
    np.random.beta(2, 20, n_normal)      # Low error rate (mean ~9%)
])

# Anomalous behavior patterns (unusual/potentially harmful)
n_anomalies = 30
anomaly_types = [
    # Type 1: Excessive API spam
    np.column_stack([
        np.random.poisson(50, 10),       # 50 actions/min (10x normal!)
        np.random.poisson(100, 10),      # 100 API calls
        np.random.poisson(2, 10),
        np.random.randint(1, 6, 10),
        np.random.normal(1.5, 0.3, 10),
        np.random.beta(2, 20, 10)
    ]),
    
    # Type 2: Excessive file operations (data exfiltration?)
    np.column_stack([
        np.random.poisson(8, 10),
        np.random.poisson(12, 10),
        np.random.poisson(50, 10),       # 50 file writes (25x normal!)
        np.random.randint(1, 6, 10),
        np.random.normal(1.5, 0.3, 10),
        np.random.beta(2, 20, 10)
    ]),
    
    # Type 3: Very slow responses + high errors (stuck/broken)
    np.column_stack([
        np.random.poisson(2, 10),
        np.random.poisson(5, 10),
        np.random.poisson(1, 10),
        np.random.randint(1, 3, 10),
        np.random.normal(10, 2, 10),     # 10s response (7x normal!)
        np.random.beta(8, 2, 10)         # 80% error rate!
    ])
]

X_anomalies_behavior = np.vstack(anomaly_types)

# Combine
X_agent = np.vstack([X_normal_behavior, X_anomalies_behavior])
y_true_agent = np.array([0]*n_normal + [1]*n_anomalies)

# Standardize (important for anomaly detection!)
scaler = StandardScaler()
X_agent_scaled = scaler.fit_transform(X_agent)

print("🤖 AI AGENT BEHAVIOR MONITORING")
print("=" * 60)
print(f"Total behavior logs: {len(X_agent)}")
print(f"Normal behaviors: {n_normal}")
print(f"Anomalous behaviors: {n_anomalies}")
print(f"Features tracked: {X_agent.shape[1]}")
print("\nFeatures:")
print("   1. Actions per minute")
print("   2. API calls")
print("   3. File writes")
print("   4. Unique tools used")
print("   5. Average response time (seconds)")
print("   6. Error rate")

In [None]:
# Apply Isolation Forest
iso_agent = IsolationForest(contamination=0.08, random_state=42, n_estimators=100)
y_pred_agent = iso_agent.fit_predict(X_agent_scaled)
y_pred_agent_binary = (y_pred_agent == -1).astype(int)

# Evaluate
accuracy_agent = accuracy_score(y_true_agent, y_pred_agent_binary)
precision_agent = precision_score(y_true_agent, y_pred_agent_binary)
recall_agent = recall_score(y_true_agent, y_pred_agent_binary)
f1_agent = f1_score(y_true_agent, y_pred_agent_binary)

print("🎯 ANOMALY DETECTION RESULTS")
print("=" * 60)
print(f"Accuracy:  {accuracy_agent:.2%}")
print(f"Precision: {precision_agent:.2%} - Of flagged behaviors, how many were actually anomalous?")
print(f"Recall:    {recall_agent:.2%} - Of all anomalies, how many did we catch?")
print(f"F1-Score:  {f1_agent:.2%}")

# Confusion matrix
cm = confusion_matrix(y_true_agent, y_pred_agent_binary)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
           xticklabels=['Normal', 'Anomaly'],
           yticklabels=['Normal', 'Anomaly'])
plt.title('AI Agent Behavior: Anomaly Detection', fontsize=14, fontweight='bold')
plt.ylabel('Actual', fontsize=12)
plt.xlabel('Predicted', fontsize=12)
plt.tight_layout()
plt.show()

# Analyze detected anomalies
anomaly_indices = np.where(y_pred_agent_binary == 1)[0]
actual_anomaly_indices = np.where(y_true_agent == 1)[0]

print("\n🔍 DETECTED ANOMALY ANALYSIS")
print("=" * 60)
print(f"Total anomalies detected: {len(anomaly_indices)}")
print(f"True anomalies caught: {np.sum((y_pred_agent_binary == 1) & (y_true_agent == 1))}")
print(f"False alarms: {np.sum((y_pred_agent_binary == 1) & (y_true_agent == 0))}")

# Show examples of detected anomalies
feature_names = ['Actions/min', 'API calls', 'File writes', 
                'Unique tools', 'Response time', 'Error rate']

print("\n📊 Example Detected Anomalies:")
for i, idx in enumerate(anomaly_indices[:5]):
    print(f"\n   Anomaly {i+1} (Index {idx}):")
    for j, feature in enumerate(feature_names):
        value = X_agent[idx, j]
        avg_normal = X_agent[:n_normal, j].mean()
        diff = ((value - avg_normal) / avg_normal) * 100
        print(f"      {feature}: {value:.2f} (normal avg: {avg_normal:.2f}, {diff:+.0f}%)")
    
    actual = "ACTUAL ANOMALY" if y_true_agent[idx] == 1 else "False alarm"
    print(f"      Status: {actual}")

print("\n" + "=" * 60)
print("\n💡 PRODUCTION USE CASE:")
print("\n1️⃣  Real-time Monitoring:")
print("   - Run anomaly detection on agent behavior every minute")
print("   - Alert when anomalies detected")
print("   - Optionally pause agent for human review")

print("\n2️⃣  Safety Measures:")
print("   - High API calls → Rate limiting")
print("   - Excessive file writes → Block file operations")
print("   - High error rate → Restart agent")

print("\n3️⃣  Investigation:")
print("   - Log all anomalies for analysis")
print("   - Review patterns to update safety rules")
print("   - Fine-tune contamination parameter based on false alarm rate")

print("\n4️⃣  Continuous Improvement:")
print("   - Retrain on new normal behaviors")
print("   - Adjust thresholds based on production data")
print("   - Add new features (e.g., sentiment of outputs, tool combinations)")

## 🎯 YOUR TURN: Credit Card Fraud Detection Challenge

**Scenario:** You work for a bank's fraud detection team.
- Dataset: Credit card transactions
- Features: amount, merchant_category, time_of_day, distance_from_home
- Goal: Build an anomaly detector to catch fraud

**Your task:**
1. Generate synthetic transaction data
2. Apply Isolation Forest
3. Evaluate performance
4. Analyze false positives and false negatives
5. Recommend production deployment strategy
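
For step 4, one way to pull out the misclassified rows is with boolean masks over the true and predicted labels. The sketch below uses a tiny made-up label array purely to show the pattern; `y_true` and `y_pred` here are hypothetical and are not the exercise's variables.

```python
import numpy as np

# Hypothetical labels and predictions (1 = fraud, 0 = legitimate)
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([0, 1, 1, 0, 0, 1, 0, 1])

# False positive: flagged as fraud but actually legitimate
false_positives = np.where((y_pred == 1) & (y_true == 0))[0]
# False negative: actual fraud that was not flagged
false_negatives = np.where((y_pred == 0) & (y_true == 1))[0]

print("False positives at rows:", false_positives)
print("False negatives at rows:", false_negatives)
```

Indexing the unscaled feature matrix with these row indices then shows which amounts, times, or distances the detector got wrong.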

In [None]:
# YOUR TURN: Complete this exercise!

# Step 1: Generate transaction data
np.random.seed(42)

# Normal transactions
n_normal_trans = 1000
X_normal_trans = np.column_stack([
    # YOUR CODE - create normal transaction features
    # amount: typical range $10-500
    # merchant_category: 1-10
    # time_of_day: 0-23 hours
    # distance_from_home: 0-50 miles
])

# Fraudulent transactions
n_fraud = 50
X_fraud_trans = np.column_stack([
    # YOUR CODE - create fraudulent patterns
    # Large amounts, unusual times, far from home, etc.
])

# Step 2: Combine and scale
X_transactions = # YOUR CODE
y_true_trans = # YOUR CODE - labels

# Standardize
scaler_trans = StandardScaler()
X_transactions_scaled = # YOUR CODE

# Step 3: Apply Isolation Forest
iso_fraud = IsolationForest(
    contamination=# YOUR CODE - what's the fraud rate?,
    random_state=42
)
y_pred_trans = # YOUR CODE
y_pred_trans_binary = # YOUR CODE

# Step 4: Evaluate
# YOUR CODE - calculate metrics
accuracy_trans = 
precision_trans = 
recall_trans = 
f1_trans = 

print("🏦 FRAUD DETECTION RESULTS")
print("=" * 60)
print(f"Accuracy:  {accuracy_trans:.2%}")
print(f"Precision: {precision_trans:.2%}")
print(f"Recall:    {recall_trans:.2%}")
print(f"F1-Score:  {f1_trans:.2%}")

# Step 5: Analyze errors
# YOUR CODE - find false positives and false negatives
# What do they have in common?

# Step 6: Production strategy
print("\n💡 YOUR RECOMMENDATION:")
print("   Would you deploy this to production?")
print("   What recall/precision trade-off makes sense for fraud?")
print("   How would you handle false positives (blocking legitimate purchases)?")

### ✅ Solution

In [None]:
# SOLUTION: Credit Card Fraud Detection

np.random.seed(42)

# Normal transactions
n_normal_trans = 1000
X_normal_trans = np.column_stack([
    np.random.lognormal(4, 0.8, n_normal_trans),    # Amount: $10-500, skewed
    np.random.randint(1, 11, n_normal_trans),       # Merchant category: 1-10
    np.random.normal(14, 4, n_normal_trans),        # Time: ~2pm ± 4hrs
    np.random.exponential(10, n_normal_trans)       # Distance: mostly close, some far
])

# Clip values to reasonable ranges
X_normal_trans[:, 0] = np.clip(X_normal_trans[:, 0], 10, 500)
X_normal_trans[:, 2] = np.clip(X_normal_trans[:, 2], 0, 23)
X_normal_trans[:, 3] = np.clip(X_normal_trans[:, 3], 0, 50)

# Fraudulent transactions (3 patterns)
n_fraud = 50

# Pattern 1: Large amounts
fraud_pattern1 = np.column_stack([
    np.random.uniform(2000, 5000, 20),  # $2000-5000!
    np.random.randint(1, 11, 20),
    np.random.randint(0, 24, 20),
    np.random.uniform(0, 50, 20)
])

# Pattern 2: Unusual times + far from home
fraud_pattern2 = np.column_stack([
    np.random.uniform(100, 800, 15),
    np.random.randint(1, 11, 15),
    np.random.choice([2, 3, 4, 22, 23], 15),  # Late night/early morning
    np.random.uniform(100, 500, 15)  # 100-500 miles away!
])

# Pattern 3: Rapid sequence (same time of day, dispersed locations)
fraud_pattern3 = np.column_stack([
    np.random.uniform(50, 300, 15),
    np.random.randint(1, 11, 15),
    np.full(15, 14.5),  # All at the same time of day (2:30pm)
    np.random.uniform(20, 200, 15)  # Geographically dispersed
])

X_fraud_trans = np.vstack([fraud_pattern1, fraud_pattern2, fraud_pattern3])

# Combine
X_transactions = np.vstack([X_normal_trans, X_fraud_trans])
y_true_trans = np.array([0]*n_normal_trans + [1]*n_fraud)

# Standardize
scaler_trans = StandardScaler()
X_transactions_scaled = scaler_trans.fit_transform(X_transactions)

print("🏦 CREDIT CARD FRAUD DETECTION DATASET")
print("=" * 60)
print(f"Total transactions: {len(X_transactions)}")
print(f"Normal: {n_normal_trans} ({n_normal_trans/len(X_transactions):.1%})")
print(f"Fraudulent: {n_fraud} ({n_fraud/len(X_transactions):.1%})")
print("\nFeatures:")
print("   1. Amount ($)")
print("   2. Merchant Category (1-10)")
print("   3. Time of Day (0-23)")
print("   4. Distance from Home (miles)")
print("\nFraud Patterns:")
print("   - Pattern 1: Large amounts ($2000-5000)")
print("   - Pattern 2: Late night + far from home")
print("   - Pattern 3: Rapid sequence in different locations")

# Apply Isolation Forest
fraud_rate = n_fraud / len(X_transactions)
iso_fraud = IsolationForest(
    contamination=fraud_rate,  # Use actual fraud rate
    random_state=42,
    n_estimators=100
)
y_pred_trans = iso_fraud.fit_predict(X_transactions_scaled)
y_pred_trans_binary = (y_pred_trans == -1).astype(int)

# Evaluate
accuracy_trans = accuracy_score(y_true_trans, y_pred_trans_binary)
precision_trans = precision_score(y_true_trans, y_pred_trans_binary)
recall_trans = recall_score(y_true_trans, y_pred_trans_binary)
f1_trans = f1_score(y_true_trans, y_pred_trans_binary)

print("\n🎯 FRAUD DETECTION RESULTS")
print("=" * 60)
print(f"Accuracy:  {accuracy_trans:.2%}")
print(f"Precision: {precision_trans:.2%} - Of flagged transactions, how many were fraud?")
print(f"Recall:    {recall_trans:.2%} - Of all fraud, how many did we catch?")
print(f"F1-Score:  {f1_trans:.2%}")

# Confusion matrix
cm_fraud = confusion_matrix(y_true_trans, y_pred_trans_binary)
tn, fp, fn, tp = cm_fraud.ravel()

plt.figure(figsize=(8, 6))
sns.heatmap(cm_fraud, annot=True, fmt='d', cmap='Reds',
           xticklabels=['Legitimate', 'Fraud'],
           yticklabels=['Legitimate', 'Fraud'])
plt.title('Credit Card Fraud Detection: Confusion Matrix', fontsize=14, fontweight='bold')
plt.ylabel('Actual', fontsize=12)
plt.xlabel('Predicted', fontsize=12)
plt.tight_layout()
plt.show()

print("\n📊 DETAILED BREAKDOWN")
print("=" * 60)
print(f"True Positives (TP): {tp} - Correctly caught fraud 🎯")
print(f"True Negatives (TN): {tn} - Correctly approved legitimate 💳")
print(f"False Positives (FP): {fp} - Blocked legitimate (customer frustration!) ⚠️")
print(f"False Negatives (FN): {fn} - Missed fraud (money lost!) ❌")

# Cost analysis (approximation: each missed fraud is priced at the average fraud amount)
avg_fraud_amount = X_fraud_trans[:, 0].mean()
cost_of_fraud = fn * avg_fraud_amount
cost_of_false_alarm = fp * 5  # Assumed $5 customer service cost per false alarm

print("\n💰 COST ANALYSIS")
print("=" * 60)
print(f"Average fraud amount: ${avg_fraud_amount:.2f}")
print(f"Cost of missed fraud (FN): ${cost_of_fraud:.2f} ({fn} × ${avg_fraud_amount:.2f})")
print(f"Cost of false alarms (FP): ${cost_of_false_alarm:.2f} ({fp} × $5)")
print(f"Total cost: ${cost_of_fraud + cost_of_false_alarm:.2f}")

# Analyze errors
fp_indices = np.where((y_pred_trans_binary == 1) & (y_true_trans == 0))[0]
fn_indices = np.where((y_pred_trans_binary == 0) & (y_true_trans == 1))[0]

if len(fp_indices) > 0:
    print("\n⚠️  FALSE POSITIVES (Legitimate flagged as fraud):")
    print("   These customers will be frustrated!")
    for i, idx in enumerate(fp_indices[:3]):
        amount, category, time, distance = X_transactions[idx]
        print(f"   FP {i+1}: ${amount:.0f}, Cat {category:.0f}, {time:.0f}:00, {distance:.0f} miles")

if len(fn_indices) > 0:
    print("\n❌ FALSE NEGATIVES (Fraud that slipped through):")
    print("   These are costly! Money lost!")
    for i, idx in enumerate(fn_indices[:3]):
        amount, category, time, distance = X_transactions[idx]
        print(f"   FN {i+1}: ${amount:.0f}, Cat {category:.0f}, {time:.0f}:00, {distance:.0f} miles")

# Production recommendation
print("\n" + "=" * 60)
print("\n🎯 PRODUCTION DEPLOYMENT RECOMMENDATION")
print("=" * 60)

print("\n✅ DEPLOY WITH THESE SAFEGUARDS:")
print("\n1️⃣  Three-Tier System:")
print("   - High confidence fraud (score < threshold) → Block transaction")
print("   - Medium confidence → Require 2FA/verification")
print("   - Low confidence → Allow but monitor")

print("\n2️⃣  Recall Priority (Catch More Fraud):")
print(f"   - Current recall: {recall_trans:.1%}")
print("   - Target: 95%+ (miss at most 5% of fraud)")
print("   - Adjust contamination parameter higher if needed")
print("   - Accept more false positives (better safe than sorry!)")

print("\n3️⃣  False Positive Mitigation:")
print("   - Don't auto-block, send SMS verification first")
print("   - Learn from customer feedback ('Was this you?')")
print("   - Whitelist trusted merchants")
print("   - Allow quick appeal process")

print("\n4️⃣  Continuous Monitoring:")
print("   - Track false positive rate daily")
print("   - Retrain monthly on new fraud patterns")
print("   - A/B test different contamination values")
print("   - Monitor customer satisfaction (blocked tx = angry customers)")

print("\n5️⃣  Hybrid Approach:")
print("   - Combine with rule-based filters (e.g., amount > $10k → always flag)")
print("   - Use anomaly detection + supervised ML (if you have labeled data)")
print("   - Add behavioral features (typing speed, mouse movements)")

print("\n💡 FINAL VERDICT:")
if recall_trans >= 0.8 and precision_trans >= 0.5:
    print("   ✅ READY FOR PRODUCTION with 2FA fallback for flagged transactions!")
elif recall_trans >= 0.8:
    print("   ⚠️  Good recall, but too many false alarms. Add 2FA tier.")
else:
    print("   ❌ NOT READY. Too much fraud slipping through. Tune parameters.")

print("\n   Remember: In fraud detection, FALSE NEGATIVES are costlier than FALSE POSITIVES!")
print("   Missing fraud = direct money loss")
print("   False alarm = minor inconvenience (SMS verification)")
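
The tiered flagging idea from the recommendation (block / verify / allow) can be sketched with Isolation Forest's `decision_function`, where lower scores mean more anomalous. This is a minimal sketch, not the notebook's own method: the synthetic data, the 2% and 7% percentile cut-offs, and the `assign_tier` helper are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in data (assumption): 1000 normal points, 50 obvious outliers
X_normal = rng.normal(0, 1, size=(1000, 4))
X_outlier = rng.normal(6, 1, size=(50, 4))
X = StandardScaler().fit_transform(np.vstack([X_normal, X_outlier]))

model = IsolationForest(n_estimators=100, random_state=42).fit(X)

# decision_function: the lower the score, the more anomalous the sample
scores = model.decision_function(X)

# Hypothetical cut-offs: most anomalous 2% -> block, next 5% -> verify
block_cut = np.percentile(scores, 2)
verify_cut = np.percentile(scores, 7)

def assign_tier(score):
    """Map an anomaly score to an action tier (illustrative helper)."""
    if score <= block_cut:
        return "block"
    if score <= verify_cut:
        return "verify"
    return "allow"

tiers = np.array([assign_tier(s) for s in scores])
for name in ("block", "verify", "allow"):
    print(f"{name:>6}: {(tiers == name).sum()} transactions")
```

In practice the cut-offs would come from the cost of each action rather than fixed percentiles, and they would be set on a held-out sample instead of the training data.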

## 📋 Algorithm Comparison Cheat Sheet

| Feature | Isolation Forest | One-Class SVM | Autoencoder |
|---------|-----------------|---------------|-------------|
| **Speed** | 🟢 Fast | 🟡 Medium | 🔴 Slow (training) |
| **Scalability** | 🟢 Excellent (millions) | 🔴 Poor (thousands) | 🟡 Good |
| **High dimensions** | 🟢 Excellent | 🟡 Fair | 🟢 Excellent |
| **Interpretability** | 🟡 Medium | 🔴 Low | 🟢 High (see errors) |
| **Hyperparameters** | 🟢 Few | 🟡 Several | 🔴 Many |
| **Training data needed** | 🟢 Small | 🟡 Medium | 🔴 Large |
| **Complex patterns** | 🟡 Good | 🟢 Excellent | 🟢 Excellent |

**Decision tree:**
1. **Large dataset (>100K)?** → Isolation Forest
2. **Small dataset (<10K)?** → One-Class SVM
3. **Images/complex data?** → Autoencoder
4. **Need speed?** → Isolation Forest
5. **Need interpretability?** → Autoencoder (visualize errors)
6. **Default choice?** → Isolation Forest (fast, scalable, works well)
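
To check the cheat sheet against your own data, a quick side-by-side run helps. The sketch below fits Isolation Forest and One-Class SVM on the same synthetic dataset and reports fit time, precision, and recall. The data, the `detectors` dictionary, and the choice to set both `contamination` and `nu` to the known anomaly rate are assumptions for illustration; real data rarely comes with that rate. The autoencoder is left out to keep the example dependency-light.

```python
import time
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)

# Synthetic data (assumption): tight normal cluster plus widely scattered anomalies
X_normal = rng.normal(0, 1, size=(2000, 5))
X_anom = rng.uniform(-6, 6, size=(100, 5))
X = StandardScaler().fit_transform(np.vstack([X_normal, X_anom]))
y_true = np.array([0] * 2000 + [1] * 100)
rate = y_true.mean()  # known here only because the data is synthetic

detectors = {
    "Isolation Forest": IsolationForest(contamination=rate, random_state=0),
    "One-Class SVM": OneClassSVM(kernel="rbf", gamma="scale", nu=rate),
}

results = {}
for name, det in detectors.items():
    start = time.perf_counter()
    # Both estimators return -1 for anomalies and 1 for normal points
    y_pred = (det.fit_predict(X) == -1).astype(int)
    elapsed = time.perf_counter() - start
    results[name] = {
        "seconds": elapsed,
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred),
    }
    r = results[name]
    print(f"{name:<17} time={r['seconds']:.3f}s  "
          f"precision={r['precision']:.2%}  recall={r['recall']:.2%}")
```

Timings on a dataset this small say little about scalability; rerun with larger `size` values to see how the gap between the two methods changes.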

## 🎉 Congratulations!

**You just mastered:**
- ✅ Isolation Forest for fast, scalable anomaly detection
- ✅ One-Class SVM for complex boundary-based detection
- ✅ Autoencoders for deep learning-based anomaly detection
- ✅ Evaluation metrics for anomaly detection (precision, recall trade-offs)
- ✅ Real AI applications: fraud detection, AI safety monitoring
- ✅ Production deployment strategies

**🎯 Key Takeaways:**
1. **Isolation Forest** = Fast, scalable, works great out-of-the-box
2. **One-Class SVM** = Complex boundaries, smaller datasets
3. **Autoencoders** = Best for images/complex data, interpretable
4. **Recall matters** = In fraud/safety, missing anomalies is costly!
5. **Real AI** = Critical for AI safety, fraud detection, quality control
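
The recall takeaway is easiest to see by sweeping Isolation Forest's `contamination` parameter: flagging more points raises recall and usually lowers precision. The following is a small sketch on synthetic data; the dataset and the four contamination values are assumptions picked for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(1)

# Synthetic data (assumption): 1000 normal points and 50 shifted anomalies
X = np.vstack([
    rng.normal(0, 1, size=(1000, 3)),
    rng.normal(4, 1.5, size=(50, 3)),
])
y_true = np.array([0] * 1000 + [1] * 50)

sweep = []
for contamination in (0.02, 0.05, 0.10, 0.20):
    # Same random_state, so only the flagging threshold changes between runs
    model = IsolationForest(contamination=contamination, random_state=1)
    y_pred = (model.fit_predict(X) == -1).astype(int)
    precision = precision_score(y_true, y_pred, zero_division=0)
    recall = recall_score(y_true, y_pred)
    sweep.append((contamination, precision, recall, int(y_pred.sum())))
    print(f"contamination={contamination:.2f}  flagged={int(y_pred.sum()):>4}  "
          f"precision={precision:.2%}  recall={recall:.2%}")
```

Which row is "best" depends on what a missed anomaly costs relative to a false alarm, which is why the fraud example pairs the metrics with a cost analysis.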

**🚀 Practice Challenge:**

Build a complete anomaly detection system:
1. Choose a real dataset (credit card, network traffic, sensor data)
2. Try all three methods
3. Compare results and choose the best
4. Implement a three-tier flagging system (high/medium/low confidence)
5. Calculate cost analysis (false positives vs. false negatives)
6. Write a deployment recommendation
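
For step 5, one possible starting point is a cost helper that uses each missed transaction's actual amount, rather than the average fraud amount used in the solution above. The function name `total_cost`, the toy labels and amounts, and the $5 default false-alarm cost are assumptions for illustration.

```python
import numpy as np

def total_cost(y_true, y_pred, amounts, false_alarm_cost=5.0):
    """Dollar cost of a set of predictions (1 = fraud, 0 = legitimate).

    Missed fraud costs the transaction amount; each false alarm costs
    a flat handling fee (assumed, not measured).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    amounts = np.asarray(amounts, dtype=float)
    missed = (y_true == 1) & (y_pred == 0)        # false negatives
    false_alarm = (y_true == 0) & (y_pred == 1)   # false positives
    return amounts[missed].sum() + false_alarm_cost * false_alarm.sum()

# Toy example: one missed $1,500 fraud and one false alarm
y_true = [0, 0, 1, 1, 0]
y_pred = [0, 1, 1, 0, 0]
amounts = [20, 35, 900, 1500, 60]

cost = total_cost(y_true, y_pred, amounts)
print(f"Total cost: ${cost:,.2f}")  # → Total cost: $1,505.00
```

Running this helper over the predictions of each detector gives a single number to compare them on, which can be more persuasive in a deployment recommendation than F1 alone.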

---

**📚 Week 9 Complete!** You've mastered:
- Day 1: Clustering (K-Means, Hierarchical, DBSCAN)
- Day 2: Dimensionality Reduction (PCA, t-SNE, UMAP)
- Day 3: Anomaly Detection (Isolation Forest, One-Class SVM, Autoencoders)

**Next:** Week 10 - Neural Networks and Deep Learning!

**💬 Questions?** Apply these techniques to your own data - that's when the magic happens!

---

*"Anomalies are where the interesting stuff happens. In data, as in life, it's often the outliers that matter most!"* 🎯