# IoT Anomaly Detection - Complete End-to-End Example

This notebook walks through the IoT Anomaly Detection system end to end, from loading data to a deployable prediction service.

## Table of Contents
1. [Quick Anomaly Detection](#quick)
2. [Complete Training Pipeline](#training)
3. [Using Pre-Trained Models](#pretrained)
4. [Custom Model Training](#custom)
5. [Batch Predictions](#batch)
6. [Production Deployment](#production)

<a id='quick'></a>
## Example 1: Quick Anomaly Detection

**Goal**: Detect anomalies in IoT sensor data using a pre-trained model

**Time**: ~2 minutes

In [None]:
from iot_anomaly_utils import IoTAnomalyDetector
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

%matplotlib inline
pd.set_option('display.max_columns', None)

In [None]:
# Load detector
detector = IoTAnomalyDetector()
detector.load_data("data/smart_manufacturing_data.csv", sample_size=1000)
detector.prepare_features("anomaly_flag")
X_train, X_test, y_train, y_test = detector.prepare_train_test_split()

print("Data loaded and prepared!")
print(f"Training samples: {X_train.shape[0]}")
print(f"Testing samples: {X_test.shape[0]}")
print(f"Features: {X_train.shape[1]}")

In [None]:
# Load pre-trained model
detector.load_model("models/best_anomaly_model.pkl")

# Predict
predictions = detector.predict(X_test)

# Evaluate
metrics = detector.evaluate_classification(y_test, predictions)

print("\n" + "="*60)
print("ANOMALY DETECTION RESULTS")
print("="*60)
print(f"Accuracy:  {metrics['accuracy']:.4f}")
print(f"Precision: {metrics['precision']:.4f}")
print(f"Recall:    {metrics['recall']:.4f}")
print(f"F1 Score:  {metrics['f1']:.4f}")
print("="*60)

In [None]:
# Visualize confusion matrix
fig = detector.plot_confusion_matrix(y_test, predictions)
plt.title("Anomaly Detection - Confusion Matrix")
plt.tight_layout()
plt.show()

<a id='training'></a>
## Example 2: Complete Training Pipeline

Load the results saved by the training pipeline and compare how the candidate models performed.
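
The `*_mean` columns in the result tables below suggest that each model was scored over several cross-validation folds and the fold scores were averaged. The following is a minimal sketch of how such a comparison table could be built, assuming scikit-learn's `cross_validate` and synthetic data. The candidate models and the data are illustrative stand-ins, not the project's actual training script.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the sensor data (assumption)
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=42
)

# Hypothetical candidate models; the real pipeline may use different ones
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=42),
}

rows = []
for name, model in candidates.items():
    # One score per fold for each metric, plus the fit time per fold
    scores = cross_validate(
        model, X, y, cv=3, scoring=["f1", "precision", "recall"]
    )
    rows.append({
        "model": name,
        "f1_mean": scores["test_f1"].mean(),
        "precision_mean": scores["test_precision"].mean(),
        "recall_mean": scores["test_recall"].mean(),
        "train_time_mean": scores["fit_time"].mean(),
    })

# Rank by mean F1 so that the best model comes first
comparison = (
    pd.DataFrame(rows)
    .sort_values("f1_mean", ascending=False)
    .reset_index(drop=True)
)
print(comparison)
```

Averaging over folds gives a steadier estimate than a single train/test split, which matters when anomalies are rare and one split may contain very few of them.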

In [None]:
from iot_anomaly_utils import load_results, plot_model_comparison

# Load all training results
results = load_results("results")

print("Training results loaded for tasks:")
for task in results.keys():
    print(f"  - {task}")

In [None]:
# View anomaly detection results, ranked by mean F1 so that head() shows the top 5
anomaly_results = results['results_anomaly']
print("\nTop 5 Models for Anomaly Detection:")
print("="*80)
(anomaly_results[['model', 'f1_mean', 'precision_mean', 'recall_mean', 'train_time_mean']]
 .sort_values('f1_mean', ascending=False)
 .head())

In [None]:
# Plot model comparison
fig = plot_model_comparison(anomaly_results, metric='f1_mean', 
                            title='Anomaly Detection - F1 Score Comparison')
plt.tight_layout()
plt.show()

In [None]:
# View maintenance prediction results, ranked by mean F1 so that head() shows the top 5
maintenance_results = results['results_maintenance']
print("\nTop 5 Models for Maintenance Prediction:")
print("="*80)
(maintenance_results[['model', 'f1_mean', 'precision_mean', 'recall_mean']]
 .sort_values('f1_mean', ascending=False)
 .head())

In [None]:
# Compare precision across all models
fig = plot_model_comparison(maintenance_results, metric='precision_mean',
                            title='Maintenance Prediction - Precision Comparison')
plt.tight_layout()
plt.show()

<a id='pretrained'></a>
## Example 3: Using Pre-Trained Models

Load each pre-trained classification model and evaluate it against its own target column.
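
Later cells in this notebook read the `.pkl` files with `joblib.load`, so the saved models appear to be joblib-serialized scikit-learn estimators. The round trip below is a sketch of that save/load pattern on a throwaway model. The file name is hypothetical, and what `detector.load_model` does internally is not shown here.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a small throwaway model on synthetic data (assumption)
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_model.pkl"  # hypothetical file name
    joblib.dump(model, path)             # serialize the fitted estimator
    restored = joblib.load(path)         # read it back as a new object

# A restored model should reproduce the original predictions exactly
same = np.array_equal(model.predict(X), restored.predict(X))
print(f"Restored model type: {type(restored).__name__}")
print(f"Predictions identical after reload: {same}")
```

A pickled estimator is tied to the library versions that produced it, so it is generally safest to load models with the same scikit-learn version used for training.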

In [None]:
# Test all classification models
models_to_test = [
    ("anomaly_flag", "models/best_anomaly_model.pkl", "Anomaly Detection"),
    ("maintenance_required", "models/best_maintenance_model.pkl", "Maintenance Prediction"),
    ("downtime_risk", "models/best_downtime_model.pkl", "Downtime Risk"),
]

results_summary = []

for target, model_path, task_name in models_to_test:
    # Prepare data
    detector = IoTAnomalyDetector()
    detector.load_data("data/smart_manufacturing_data.csv", sample_size=1000)
    detector.prepare_features(target)
    X_train, X_test, y_train, y_test = detector.prepare_train_test_split(target_col=target)
    
    # Load model and predict
    detector.load_model(model_path)
    predictions = detector.predict(X_test)
    
    # Evaluate
    metrics = detector.evaluate_classification(y_test, predictions)
    
    results_summary.append({
        'Task': task_name,
        'F1 Score': f"{metrics['f1']:.4f}",
        'Precision': f"{metrics['precision']:.4f}",
        'Recall': f"{metrics['recall']:.4f}",
        'Accuracy': f"{metrics['accuracy']:.4f}"
    })

# Display summary
summary_df = pd.DataFrame(results_summary)
print("\n" + "="*80)
print("PRE-TRAINED MODELS PERFORMANCE SUMMARY")
print("="*80)
print(summary_df.to_string(index=False))
print("="*80)

<a id='custom'></a>
## Example 4: Custom Model Training

Train a custom random forest and tune its hyperparameters with a grid search.

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Prepare data
detector = IoTAnomalyDetector()
detector.load_data("data/smart_manufacturing_data.csv", sample_size=2000)
detector.prepare_features("maintenance_required")
X_train, X_test, y_train, y_test = detector.prepare_train_test_split(
    target_col="maintenance_required"
)

print(f"Training custom model on {X_train.shape[0]} samples...")

In [None]:
# Define custom model with hyperparameter tuning
param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [10, 15],
    'min_samples_split': [2, 5]
}

base_model = RandomForestClassifier(random_state=42, n_jobs=-1)
grid_search = GridSearchCV(
    base_model,
    param_grid,
    cv=3,
    scoring='f1',
    n_jobs=-1,
    verbose=1
)

# Train
print("Training custom model with grid search...")
grid_search.fit(X_train, y_train)

# Get best model
best_model = grid_search.best_estimator_
print(f"\nBest parameters: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.4f}")

In [None]:
# Evaluate on test set
detector.model = best_model
predictions = detector.predict(X_test)
metrics = detector.evaluate_classification(y_test, predictions)

print("\nCustom Model Test Performance:")
print(f"  F1 Score: {metrics['f1']:.4f}")
print(f"  Precision: {metrics['precision']:.4f}")
print(f"  Recall: {metrics['recall']:.4f}")

# Save custom model
detector.save_model("models/custom_maintenance_model.pkl")
print("\nCustom model saved!")

In [None]:
# Plot feature importance
fig = detector.plot_feature_importance(top_n=20)
plt.title("Custom Model - Top 20 Features")
plt.tight_layout()
plt.show()

<a id='batch'></a>
## Example 5: Batch Predictions

Process large datasets in batches for memory efficiency.
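
The core of the approach is `pandas.read_csv(..., chunksize=...)`, which yields one DataFrame per chunk instead of loading the whole file. The sketch below isolates that pattern on synthetic data. The two-column feature set, the logistic regression model, and the in-memory CSV are assumptions made so the example runs on its own.

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the sensor CSV (assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "temperature": rng.normal(75, 10, 250),
    "vibration": rng.normal(45, 8, 250),
})
df["anomaly_flag"] = (df["temperature"] > 85).astype(int)

feature_cols = ["temperature", "vibration"]

# Fit the scaler and model once, up front, as training would
scaler = StandardScaler().fit(df[feature_cols].values)
model = LogisticRegression().fit(
    scaler.transform(df[feature_cols].values), df["anomaly_flag"]
)

# An in-memory buffer plays the role of the CSV file on disk
csv_buffer = io.StringIO(df.to_csv(index=False))

chunk_predictions = []
for chunk in pd.read_csv(csv_buffer, chunksize=100):
    # Reuse the already-fitted scaler; never refit it on a chunk
    X_chunk = scaler.transform(chunk[feature_cols].values)
    chunk_predictions.append(model.predict(X_chunk))

all_predictions = np.concatenate(chunk_predictions)
print(f"Chunks: {len(chunk_predictions)}, predictions: {len(all_predictions)}")
# Chunks: 3, predictions: 250
```

One caveat: if feature engineering uses values from neighbouring rows, such as rolling statistics, rows near a chunk boundary may get different features than they would in a single pass. Whether that applies to `prepare_features` is not established here.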

In [None]:
import joblib

def batch_predict(data_path, model_path, scaler_path="models/scaler.pkl", batch_size=1000):
    """Predict in chunks so the full CSV never has to fit in memory."""
    detector = IoTAnomalyDetector()
    detector.load_model(model_path)

    # Load the scaler that was fitted during training
    detector.scaler = joblib.load(scaler_path)
    
    all_predictions = []
    
    for i, chunk in enumerate(pd.read_csv(data_path, chunksize=batch_size)):
        # Prepare features for chunk
        detector.df_raw = chunk
        detector.prepare_features("anomaly_flag")
        
        # Get features
        feature_cols = [c for c in detector.df_features.columns
                       if c not in ['machine_id', 'anomaly_flag',
                                   'maintenance_required', 'downtime_risk',
                                   'predicted_remaining_life', 'failure_type']]
        
        X_chunk = detector.df_features[feature_cols].values
        X_chunk_scaled = detector.scaler.transform(X_chunk)
        
        # Predict
        predictions = detector.model.predict(X_chunk_scaled)
        all_predictions.append(predictions)
        
        print(f"Processed batch {i+1}: {len(predictions)} predictions")
    
    return np.concatenate(all_predictions)

# Test batch prediction
print("Running batch predictions...")
predictions = batch_predict(
    "data/smart_manufacturing_data.csv",
    "models/best_anomaly_model.pkl",
    batch_size=5000
)

print(f"\nTotal predictions: {len(predictions)}")
print(f"Anomalies detected: {predictions.sum()}")
print(f"Anomaly rate: {predictions.mean():.2%}")

<a id='production'></a>
## Example 6: Production Deployment

Wrap the model and scaler in a service class that scores one sensor reading at a time.
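
A service that accepts raw dictionaries usually benefits from checking the input before it reaches the model. The sketch below shows one way to do that. `validate_reading` and `REQUIRED_FIELDS` are hypothetical helpers, not part of the project, and the field list is an assumption based on the sensor fields in the example readings later in this section.

```python
import math

# Assumed raw sensor fields, taken from this notebook's example readings
REQUIRED_FIELDS = (
    "temperature", "vibration", "humidity", "pressure", "energy_consumption",
)

def validate_reading(sensor_data: dict) -> list:
    """Return a list of problems; an empty list means the reading looks usable."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in sensor_data:
            problems.append(f"missing field: {field}")
            continue
        value = sensor_data[field]
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"non-numeric value for {field}: {value!r}")
        elif math.isnan(value) or math.isinf(value):
            problems.append(f"non-finite value for {field}")
    return problems

good = {
    "temperature": 75.2, "vibration": 45.5, "humidity": 55.3,
    "pressure": 2.8, "energy_consumption": 3.2,
}
bad = {"temperature": "hot", "vibration": float("nan"), "humidity": 55.3}

print(validate_reading(good))  # []
print(validate_reading(bad))
```

Returning the list of problems, rather than raising on the first one, lets a caller report everything wrong with a reading in a single response.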

In [None]:
import joblib
from scripts.feature_engineering import compute_features

class AnomalyDetectionService:
    """Production service for anomaly detection."""
    
    def __init__(self, model_path, scaler_path):
        self.model = joblib.load(model_path)
        self.scaler = joblib.load(scaler_path)
        print(f"Service initialized with model: {type(self.model).__name__}")
    
    def predict_single(self, sensor_data: dict) -> dict:
        """Predict for a single sensor reading."""
        # Convert to DataFrame
        df = pd.DataFrame([sensor_data])
        
        # Engineer features
        df_features = compute_features(df)
        
        # Extract features
        feature_cols = [c for c in df_features.columns
                       if c not in ['machine_id', 'anomaly_flag',
                                   'maintenance_required', 'downtime_risk',
                                   'predicted_remaining_life', 'failure_type']]
        
        X = df_features[feature_cols].values
        X_scaled = self.scaler.transform(X)
        
        # Predict
        prediction = self.model.predict(X_scaled)[0]
        
        # Get probability if available
        if hasattr(self.model, 'predict_proba'):
            probability = self.model.predict_proba(X_scaled)[0]
        else:
            probability = None
        
        return {
            'prediction': int(prediction),
            'is_anomaly': bool(prediction == 1),
            'probability': probability.tolist() if probability is not None else None,
            'confidence': float(max(probability)) if probability is not None else None
        }

# Initialize service
service = AnomalyDetectionService(
    "models/best_anomaly_model.pkl",
    "models/scaler.pkl"
)

In [None]:
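# Test with example sensor readings.
# Label fields (anomaly_flag, failure_type, ...) are filtered out inside
# predict_single and are not part of the feature matrix passed to the model.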
normal_reading = {
    'timestamp': '2025-01-09 10:30:00',
    'machine_id': 15,
    'temperature': 75.2,
    'vibration': 45.5,
    'humidity': 55.3,
    'pressure': 2.8,
    'energy_consumption': 3.2,
    'machine_status': 1,
    'anomaly_flag': 0,
    'predicted_remaining_life': 100,
    'failure_type': 'Normal',
    'downtime_risk': 0.0,
    'maintenance_required': 0
}

anomalous_reading = {
    'timestamp': '2025-01-09 14:45:00',
    'machine_id': 22,
    'temperature': 125.8,  # High temperature
    'vibration': 85.2,     # High vibration
    'humidity': 25.1,      # Low humidity
    'pressure': 5.2,       # High pressure
    'energy_consumption': 7.5,  # High energy
    'machine_status': 1,
    'anomaly_flag': 1,
    'predicted_remaining_life': 10,
    'failure_type': 'Overheating',
    'downtime_risk': 0.9,
    'maintenance_required': 1
}

print("Testing Normal Reading:")
print("="*60)
result1 = service.predict_single(normal_reading)
print(f"Prediction: {'ANOMALY' if result1['is_anomaly'] else 'NORMAL'}")
print(f"Confidence: {result1['confidence']:.2%}" if result1['confidence'] is not None else "Confidence: N/A")
print()

print("Testing Anomalous Reading:")
print("="*60)
result2 = service.predict_single(anomalous_reading)
print(f"Prediction: {'ANOMALY' if result2['is_anomaly'] else 'NORMAL'}")
print(f"Confidence: {result2['confidence']:.2%}" if result2['confidence'] is not None else "Confidence: N/A")

## Results Interpretation Guide

### Classification Metrics

**F1 Score**: Harmonic mean of precision and recall
- `> 0.9`: Excellent
- `0.8 - 0.9`: Good
- `0.7 - 0.8`: Fair
- `< 0.7`: Needs improvement

**Precision**: Of all predicted anomalies, how many were actual anomalies?
- High precision = Few false alarms

**Recall**: Of all actual anomalies, how many did we detect?
- High recall = Few missed anomalies
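
The definitions above can be checked with a small worked example. The confusion-matrix counts below are made up for illustration and are not results from this project.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the four headline metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Illustrative counts: 120 true anomalies among 1000 readings
m = classification_metrics(tp=90, fp=10, fn=30, tn=870)
for name, value in m.items():
    print(f"{name:<10}{value:.4f}")
# precision 0.9000
# recall    0.7500
# f1        0.8182
# accuracy  0.9600
```

In this example accuracy is 0.96 even though a quarter of the anomalies are missed, which is why F1 is the more informative headline metric when anomalies are rare.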

### Our Results
```
Anomaly Detection:      F1 = 99.98% (Excellent)
Maintenance Prediction: F1 = 98.21% (Excellent)
Downtime Risk:          F1 = 99.98% (Excellent)
Failure Type:           F1 = 93.00% (Excellent)
```

## Summary

This notebook demonstrated:
1. ✅ Quick anomaly detection with pre-trained models
2. ✅ Complete training pipeline analysis
3. ✅ Using multiple pre-trained models
4. ✅ Custom model training with hyperparameter tuning
5. ✅ Batch processing for production
6. ✅ Production deployment service pattern

**Key Achievements:**
- 67 engineered features from 5 raw sensors
- 49 model configurations evaluated
- 99.98% F1 score for anomaly detection
- Production-ready API and service class

**For API reference, see [iot_anomaly.API.md](iot_anomaly.API.md)**

**For API demo, see [iot_anomaly.API.ipynb](iot_anomaly.API.ipynb)**