# Anomaly Detection Framework Demo

This notebook provides an end-to-end demonstration of three anomaly detection techniques on two distinct datasets: financial transactions and IoT sensor readings.

**Models Used:**
1. **Isolation Forest**
2. **Local Outlier Factor (LOF)**
3. **Autoencoder (Deep Learning)**

**Workflow:**
1. **Load & Prepare Data:** Use custom utility functions to load and scale both datasets.
2. **Run Models:** Apply each anomaly detection model to the prepared data.
3. **Analyze Results:** Identify the top anomalies flagged by each model and inspect their characteristics.
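The `data_utils` loaders are not shown in this notebook. The sketch below is one guess at what the "load and scale" step might involve. The helper `prepare_features` is hypothetical: it keeps the numeric columns, fills gaps with column medians, and standardizes each feature with scikit-learn's `StandardScaler`. The column names, the median fill, and the choice of scaler are all assumptions, not the notebook's actual preprocessing. Scaling matters most for the distance-based LOF and for autoencoder training. Isolation Forest is largely insensitive to per-feature rescaling because it splits on one feature at a time.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def prepare_features(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: select numeric columns and standardize them.

    This is an assumption about what the notebook's data_utils loaders do;
    the real functions may select, encode, or scale columns differently.
    """
    numeric = df.select_dtypes(include="number")
    # Fill missing values with each column's median so the scaler sees no NaNs
    numeric = numeric.fillna(numeric.median())
    # Standardize each feature to zero mean and unit variance
    return StandardScaler().fit_transform(numeric.to_numpy())


# Usage with a tiny made-up frame (not the notebook's data)
demo = pd.DataFrame({
    "amount": [10.0, 12.0, 11.0, 500.0],
    "hour": [9, 10, 11, 3],
    "merchant": ["a", "b", "a", "c"],  # non-numeric, so the helper drops it
})
X_demo = prepare_features(demo)
print(X_demo.shape)  # (4, 2)
```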

In [None]:
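import pandas as pd
import numpy as np
import sys
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import display  # built into Jupyter; explicit import for use outside notebooks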

# Add the src directory to the Python path to import our modules
sys.path.append('../src')

from data_utils import load_and_prepare_finance_data, load_and_prepare_iot_data
from anomaly_models import (
    run_isolation_forest,
    run_local_outlier_factor,
    build_autoencoder,
    get_reconstruction_errors
)

# Set plotting style
sns.set_style('whitegrid')

## 1. Load and Prepare Datasets

In [None]:
# Load financial data
finance_df, X_finance = load_and_prepare_finance_data('../data/finance_transactions.csv')
print("Financial Data Shape:", X_finance.shape)
display(finance_df.head())

# Load IoT data
iot_df, X_iot = load_and_prepare_iot_data('../data/iot_sensor_readings.csv')
print("\nIoT Data Shape:", X_iot.shape)
display(iot_df.head())

## 2. Model 1: Isolation Forest on Financial Data

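Isolation Forest builds random trees that split on randomly chosen features and thresholds. Anomalies tend to be isolated in fewer splits, so they have shorter average path lengths than normal points. Because each tree is fit on a small subsample, the method scales well to large datasets. We will apply it to the financial transaction data to find the most suspicious transactions.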
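The source of `run_isolation_forest` is not shown. It may wrap scikit-learn's `IsolationForest`, but that is an assumption. The hypothetical stand-in `isolation_forest_scores` below fits the model and negates `score_samples`, which in scikit-learn returns lower values for more abnormal points. After negation, higher scores mean more anomalous, which is the direction the descending sort in the next cell relies on. The synthetic data and the single planted outlier are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def isolation_forest_scores(X, n_estimators=100, random_state=42):
    """Hypothetical stand-in for run_isolation_forest.

    Returns one score per row where HIGHER means MORE anomalous.
    scikit-learn's score_samples uses the opposite convention
    (lower = more abnormal), so the sign is flipped here.
    """
    model = IsolationForest(n_estimators=n_estimators, random_state=random_state)
    model.fit(X)
    return -model.score_samples(X)


# Made-up data: 200 normal points plus one obvious outlier at index 200
rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(200, 3))
X_outlier = np.array([[8.0, 8.0, 8.0]])
X = np.vstack([X_normal, X_outlier])

scores = isolation_forest_scores(X)
print(int(np.argmax(scores)))  # index of the highest-scoring (most anomalous) row
```

Applied here, the analogous call would be `isolation_forest_scores(X_finance)`, with the result sorted in descending order as in the next cell.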

In [None]:
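# Get one anomaly score per transaction (assumes run_isolation_forest returns
# scores where higher means more anomalous, as the descending sort below requires)
finance_df['anomaly_score_isoforest'] = run_isolation_forest(X_finance)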

# Inspect the top 10 most anomalous transactions
top_anomalies_isoforest = finance_df.sort_values(by='anomaly_score_isoforest', ascending=False)
print("Top 10 Anomalies (Isolation Forest):")
display(top_anomalies_isoforest.head(10))

## 3. Model 2: Local Outlier Factor (LOF) on IoT Data

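LOF compares the local density around each point with the densities around its nearest neighbors. This lets it flag points that are sparse relative to their own neighborhood, even when density varies across the dataset. We will use it on the IoT sensor data to identify potential equipment faults.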
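To illustrate the "local" part of LOF, the sketch below builds a tight cluster and a loose cluster, then adds one point just outside the tight cluster. That point is closer to its neighbors than many loose-cluster points are to theirs. However, it is far less dense than its immediate neighborhood, so LOF flags it. The wrapper `lof_flags_and_scores` is a hypothetical stand-in for `run_local_outlier_factor` built on scikit-learn's `LocalOutlierFactor`. Its `n_neighbors=20` and `contamination="auto"` defaults are assumptions. Besides the -1/1 flags used in the next cell, it returns the continuous LOF scores (`-negative_outlier_factor_`), which could be used to rank anomalies instead of only counting them.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor


def lof_flags_and_scores(X, n_neighbors=20, contamination="auto"):
    """Hypothetical stand-in for run_local_outlier_factor.

    Returns (flags, scores): flags follow scikit-learn's convention
    (-1 = anomaly, 1 = normal); scores are -negative_outlier_factor_,
    so higher means more anomalous (values near 1 are typical).
    """
    lof = LocalOutlierFactor(n_neighbors=n_neighbors, contamination=contamination)
    flags = lof.fit_predict(X)
    scores = -lof.negative_outlier_factor_
    return flags, scores


# Made-up data with two clusters of very different density
rng = np.random.default_rng(1)
dense = rng.normal(0.0, 0.1, size=(100, 2))   # tight cluster around (0, 0)
sparse = rng.normal(5.0, 1.0, size=(100, 2))  # loose cluster around (5, 5)
point = np.array([[0.8, 0.8]])                # just outside the tight cluster
X = np.vstack([dense, sparse, point])

flags, scores = lof_flags_and_scores(X)
print(flags[-1], round(float(scores[-1]), 2))  # flag and LOF score of the added point
```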

In [None]:
# Get anomaly flags (-1 for anomaly, 1 for normal)
iot_df['anomaly_flag_lof'] = run_local_outlier_factor(X_iot)

# Count all flagged anomalies and preview the first 10
anomalies_lof = iot_df[iot_df['anomaly_flag_lof'] == -1]
print(f"Total Anomalies Found (LOF): {len(anomalies_lof)}")
display(anomalies_lof.head(10))

## 4. Model 3: Autoencoder on Financial Data

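An autoencoder learns to compress its input into a lower-dimensional representation and reconstruct it. If most training rows are normal, the model reconstructs typical patterns well and unusual rows poorly, so a high reconstruction error marks a candidate anomaly. We will train an autoencoder on the full financial dataset, which assumes anomalies are rare enough not to dominate training.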
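The body of `get_reconstruction_errors` is not shown. A common choice is the per-row mean squared error between the input and the model's reconstruction, and the hypothetical `reconstruction_errors` below computes exactly that. It assumes only that the model has a Keras-style `predict(X)` returning an array shaped like `X`. The toy `ShrinkModel` is made up so that the example runs without a deep learning framework. It "reconstructs" every row as `0.9 * x`, so rows with larger values get larger errors.

```python
import numpy as np


def reconstruction_errors(model, X):
    """Hypothetical stand-in for get_reconstruction_errors.

    Assumes `model` exposes a Keras-style predict(X) that returns an array
    with the same shape as X. The error is the mean squared error per row.
    """
    X = np.asarray(X, dtype=float)
    X_hat = np.asarray(model.predict(X), dtype=float)
    return np.mean((X - X_hat) ** 2, axis=1)


class ShrinkModel:
    """Toy 'autoencoder' for illustration: reconstructs every row as 0.9 * x."""

    def predict(self, X):
        return 0.9 * np.asarray(X)


X = np.array([[1.0, 1.0], [10.0, 0.0]])
errors = reconstruction_errors(ShrinkModel(), X)
print(errors)  # per-row MSE; the second row has the larger error
```

Mean squared error is only one option. Mean absolute error, for example, is less dominated by a single badly reconstructed feature.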

In [None]:
# Build and train the autoencoder (input and target are both X_finance, so it learns to reproduce its input)
autoencoder = build_autoencoder(input_dim=X_finance.shape[1])
history = autoencoder.fit(
    X_finance, X_finance,
    epochs=20,
    batch_size=32,
    shuffle=True,
    validation_split=0.1,
    verbose=1
)

In [None]:
# Get reconstruction errors
finance_df['reconstruction_error'] = get_reconstruction_errors(autoencoder, X_finance)

# Plot the distribution of errors
plt.figure(figsize=(12, 6))
sns.histplot(finance_df['reconstruction_error'], bins=50, kde=True)
plt.title('Distribution of Reconstruction Errors')
plt.show()

# Inspect the top 10 anomalies based on reconstruction error
top_anomalies_autoencoder = finance_df.sort_values(by='reconstruction_error', ascending=False)
print("Top 10 Anomalies (Autoencoder):")
display(top_anomalies_autoencoder.head(10))
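The histogram shows how errors are distributed, but turning them into anomaly flags requires a cutoff. One simple option, sketched below as an assumption rather than the notebook's method, is to flag every row whose error exceeds a chosen percentile. The helper name `flag_by_percentile` and the percentile values are illustrative. In practice the cutoff would be tuned to the expected anomaly rate, or to labeled examples if any exist. The error values here are made up.

```python
import numpy as np
import pandas as pd


def flag_by_percentile(errors, pct=99.0):
    """Hypothetical helper: flag rows whose error exceeds the pct-th percentile.

    Returns (flags, threshold) where flags is a boolean Series
    (True = flagged as anomalous). The default of 99.0 is an
    illustrative assumption, not a value from the notebook.
    """
    errors = pd.Series(errors)
    threshold = float(np.percentile(errors, pct))
    return errors > threshold, threshold


errors = np.arange(100, dtype=float)  # made-up errors 0.0 .. 99.0
flags, threshold = flag_by_percentile(errors, pct=95.0)
print(threshold, int(flags.sum()))  # cutoff and number of flagged rows
```

Applied to this notebook, `flag_by_percentile(finance_df['reconstruction_error'])` would return a boolean Series aligned with `finance_df`'s index, because wrapping an existing Series in `pd.Series` keeps its index.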