# 🔍 LSTM Autoencoder Anomaly Detection

This notebook demonstrates time series anomaly detection using LSTM Autoencoders.

## Contents
1. Generate Sample Data
2. Preprocess Data
3. Create & Train Model
4. Detect Anomalies
5. Visualize Results
6. Compare Threshold Methods
7. Save Model

In [None]:
import sys
sys.path.append('..')

import numpy as np
import matplotlib.pyplot as plt

from src import (
    LSTMAutoencoder,
    Trainer,
    AnomalyDetector,
    DataPreprocessor,
    AnomalyVisualizer,
    TimeSeriesGenerator,
    ThresholdMethod
)

import warnings
warnings.filterwarnings('ignore')

print('✓ Modules loaded!')

## 1. Generate Sample Data

In [None]:
# Generate synthetic taxi ride data with anomalies
generator = TimeSeriesGenerator(seed=42)
data, injected_anomalies = generator.generate_taxi_data(
    n_points=5000,
    anomaly_ratio=0.02
)

print(f"Generated {len(data)} data points")
print(f"Injected {len(injected_anomalies)} anomaly windows")

In [None]:
# Visualize raw data
plt.figure(figsize=(14, 4))
plt.plot(data, linewidth=0.8)
plt.xlabel('Time (hours)')
plt.ylabel('Taxi Rides')
plt.title('Synthetic Taxi Ride Data')
plt.show()

## 2. Preprocess Data

In [None]:
# Scale data
preprocessor = DataPreprocessor(scaler_type='minmax')
data_scaled = preprocessor.fit_transform(data)

# Create sequences
SEQ_LENGTH = 50
sequences = DataPreprocessor.create_sequences(data_scaled, seq_length=SEQ_LENGTH)

print(f"Sequence shape: {sequences.shape}")
print(f"  - {sequences.shape[0]} samples")
print(f"  - {sequences.shape[1]} timesteps per sample")
print(f"  - {sequences.shape[2]} features")
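
The windowing step can be pictured with plain NumPy. This is a minimal sketch, assuming `create_sequences` slides a window of `seq_length` points forward one step at a time and returns an array of shape `(samples, timesteps, features)`. The helper name `sliding_windows` is hypothetical, and the actual implementation in `src` may differ.

```python
import numpy as np


def sliding_windows(series, seq_length):
    """Return overlapping windows with shape (samples, seq_length, features)."""
    arr = np.asarray(series, dtype=float)
    if arr.ndim == 1:
        arr = arr.reshape(-1, 1)  # treat a 1-D series as a single feature
    n_windows = arr.shape[0] - seq_length + 1
    if n_windows <= 0:
        raise ValueError("series is shorter than seq_length")
    # Window i covers rows i .. i + seq_length - 1 (stride of one step)
    return np.stack([arr[i:i + seq_length] for i in range(n_windows)])


demo = sliding_windows(np.arange(10), seq_length=4)
print(demo.shape)
```

Under this stride-one assumption, 5000 points with `SEQ_LENGTH = 50` would give 5000 - 50 + 1 = 4951 windows.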

In [None]:
# Split into train/test
train_size = int(0.8 * len(sequences))
train_data = sequences[:train_size]
test_data = sequences[train_size:]

print(f"Training samples: {len(train_data)}")
print(f"Test samples: {len(test_data)}")

## 3. Create & Train Model

In [None]:
# Create LSTM Autoencoder
model = LSTMAutoencoder(
    input_size=1,
    hidden_size=32,
    num_layers=1
)

print(model)

In [None]:
# Train the model
trainer = Trainer(model, learning_rate=0.001)

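# Note: the test split doubles as validation data here, so early stopping
# is tuned on the same windows that are later scored for anomalies.
history = trainer.fit(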
    train_data,
    val_data=test_data,
    epochs=50,
    batch_size=32,
    early_stopping_patience=10
)
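
Early stopping with a patience of 10 usually means that training halts once the validation loss has failed to improve for 10 consecutive epochs. The sketch below shows that rule on a list of made-up loss values. `stopping_epoch` is a hypothetical helper, and `Trainer` may apply a different criterion (for example, a minimum improvement delta).

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which patience-based early stopping halts.

    Training stops once `patience` consecutive epochs pass without a new
    best validation loss. If that never happens, all epochs are run.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


# Illustrative values only, not output from this notebook
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.34, 0.38, 0.39, 0.40]
print(stopping_epoch(losses, patience=3))
```

A larger patience tolerates longer plateaus at the cost of extra epochs. A smaller one stops sooner but may halt before a late improvement.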

In [None]:
# Plot training history
viz = AnomalyVisualizer()
viz.plot_training_history(history)

## 4. Detect Anomalies

In [None]:
# Initialize detector
detector = AnomalyDetector(model)

# Fit threshold on training-data reconstruction errors (assumed mostly normal;
# the training split may still contain some injected anomalies)
detector.fit(train_data, method=ThresholdMethod.MEAN_STD, n_std=2)
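
The scoring step behind a mean-plus-standard-deviation threshold can be written out in a few lines. This is a sketch under the assumption that the anomaly score is the mean squared reconstruction error per window. `window_mse` is a hypothetical name, the random arrays stand in for real windows and model output, and `AnomalyDetector` may compute its scores differently.

```python
import numpy as np


def window_mse(windows, reconstructions):
    """Mean squared error per window, averaged over timesteps and features."""
    diff = np.asarray(windows) - np.asarray(reconstructions)
    return np.mean(diff ** 2, axis=(1, 2))


rng = np.random.default_rng(0)
windows = rng.random((200, 50, 1))                    # stand-in for train_data
recon = windows + rng.normal(0, 0.05, windows.shape)  # stand-in for model output

errors = window_mse(windows, recon)
# Assumed form of the rule: mean of the training errors plus 2 standard deviations
threshold = errors.mean() + 2 * errors.std()
flags = errors > threshold  # True where a window would be flagged
print(errors.shape, float(threshold), int(flags.sum()))
```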

In [None]:
# Detect anomalies in test data
results = detector.detect(test_data)

print("\n📊 Detection Results:")
print(f"   Total samples: {len(test_data)}")
print(f"   Anomalies found: {results.is_anomaly.sum()}")
print(f"   Anomaly ratio: {results.anomaly_ratio:.2%}")

## 5. Visualize Results

In [None]:
# Plot anomaly scores
viz.plot_anomaly_scores(results)

In [None]:
# Plot error distribution
viz.plot_error_distribution(results)

In [None]:
# Get test data as flat array for visualization
test_flat = test_data[:, -1, 0]  # Last timestep of each sequence

# Plot time series with anomalies
viz.plot_time_series_with_anomalies(
    test_flat,
    results,
    title='Test Data with Detected Anomalies'
)
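
Because each test window is plotted by its last timestep, a flagged window index can be traced back to a position in the original series. The sketch below assumes stride-one windows and the 80/20 split used above. `window_end_index` is a hypothetical helper, not part of `src`.

```python
def window_end_index(test_window, train_size, seq_length):
    """Index in the original series of the last point of a test window.

    Assumes window i of the full set covers points i .. i + seq_length - 1
    and that the test windows start at position `train_size`.
    """
    return train_size + test_window + seq_length - 1


# Values implied by 5000 points, SEQ_LENGTH = 50 and an 80% split,
# assuming stride-one windowing
n_points, seq_length = 5000, 50
n_windows = n_points - seq_length + 1  # 4951
train_size = int(0.8 * n_windows)      # 3960

first = window_end_index(0, train_size, seq_length)
last = window_end_index(n_windows - train_size - 1, train_size, seq_length)
print(first, last)
```

With these numbers the first test window ends at index 4009 and the last at index 4999, the final point of the series.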

In [None]:
# Create comprehensive dashboard (make sure the output directory exists first)
import os
os.makedirs('../output', exist_ok=True)

viz.create_dashboard(
    test_flat,
    results,
    history,
    save_path='../output/dashboard.png'
)

## 6. Compare Threshold Methods

In [None]:
# Compare different threshold methods
methods = [
    (ThresholdMethod.MEAN_STD, {'n_std': 2}),
    (ThresholdMethod.MEAN_STD, {'n_std': 3}),
    (ThresholdMethod.PERCENTILE, {'percentile': 95}),
    (ThresholdMethod.IQR, {'k': 1.5}),
]

print("Threshold Method Comparison:")
print("-" * 50)

for method, params in methods:
    detector.fit(train_data, method=method, **params)
    results = detector.detect(test_data)
    print(f"{method.value:12} (params={params}): "
          f"{results.is_anomaly.sum():3} anomalies "
          f"({results.anomaly_ratio:.2%})")
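
For reference, the three kinds of rule compared above are commonly defined as follows. The sketch assumes the usual textbook forms: mean plus `n_std` standard deviations, a percentile of the training errors, and the third quartile plus `k` times the interquartile range. The `ThresholdMethod` implementations in `src` may differ, the function names are hypothetical, and the error values are synthetic.

```python
import numpy as np


def threshold_mean_std(errors, n_std=2.0):
    """Mean of the errors plus n_std standard deviations."""
    return float(np.mean(errors) + n_std * np.std(errors))


def threshold_percentile(errors, percentile=95.0):
    """Given percentile of the errors."""
    return float(np.percentile(errors, percentile))


def threshold_iqr(errors, k=1.5):
    """Third quartile plus k times the interquartile range."""
    q1, q3 = np.percentile(errors, [25, 75])
    return float(q3 + k * (q3 - q1))


rng = np.random.default_rng(42)
# Synthetic, right-skewed stand-ins for reconstruction errors
train_errors = rng.gamma(shape=2.0, scale=0.01, size=1000)
test_errors = rng.gamma(shape=2.0, scale=0.01, size=250)

rules = {
    "mean_std (n_std=2)": threshold_mean_std(train_errors, 2),
    "mean_std (n_std=3)": threshold_mean_std(train_errors, 3),
    "percentile (95)": threshold_percentile(train_errors, 95),
    "iqr (k=1.5)": threshold_iqr(train_errors, 1.5),
}
for name, thr in rules.items():
    flagged = int(np.sum(test_errors > thr))
    print(f"{name:20} threshold={thr:.4f} flagged={flagged}")
```

A higher threshold flags fewer windows, so `n_std=3` is stricter than `n_std=2`. How the percentile and IQR rules rank against them depends on how skewed the error distribution is.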

## 7. Save Model

In [None]:
# Save trained model (create the target directory if it is missing)
import os
os.makedirs('../models', exist_ok=True)
trainer.save_checkpoint('../models/lstm_autoencoder.pt')

---
## Summary

This notebook demonstrated:
- ✅ Synthetic data generation with injected anomalies
- ✅ LSTM Autoencoder training with early stopping
- ✅ Multiple threshold methods for anomaly detection
- ✅ Comprehensive visualization of results

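The LSTM Autoencoder is trained to reconstruct normal patterns, and windows whose reconstruction error exceeds the fitted threshold are flagged as anomalies. The notebook does not score these detections against the injected anomaly windows, so detection accuracy is not measured here.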