# Autoencoder – Anomaly Detection

This notebook implements an autoencoder-based anomaly detection model
for fraud detection.

The model is trained only on normal transactions and uses reconstruction
error to identify anomalous (fraudulent) transactions. Performance is
evaluated on a validation set containing both normal and fraud samples.


In [7]:
# -----------------------------
# Imports
# -----------------------------

import numpy as np

from sklearn.neural_network import MLPRegressor
from sklearn.metrics import (
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
    classification_report
)

print("MLPRegressor imported successfully")


MLPRegressor imported successfully


In [6]:
# -----------------------------
# Load Preprocessed Data
# -----------------------------
import numpy as np

ARTIFACT_DIR = "../models"

X_train_scaled = np.load(f"{ARTIFACT_DIR}/X_train_scaled.npy")
X_val_scaled = np.load(f"{ARTIFACT_DIR}/X_val_scaled.npy")
y_val = np.load(f"{ARTIFACT_DIR}/y_val.npy")

print("Data loaded:")
print("X_train_scaled:", X_train_scaled.shape)
print("X_val_scaled:", X_val_scaled.shape)
print("y_val:", y_val.shape)


Data loaded:
X_train_scaled: (227452, 38)
X_val_scaled: (57355, 38)
y_val: (57355,)


In [9]:
# -----------------------------
# Train Autoencoder (sklearn)
# -----------------------------
# An autoencoder is implemented by training MLPRegressor to reconstruct its own
# input (X -> X); the 16-unit middle layer acts as the bottleneck.

autoencoder = MLPRegressor(
    hidden_layer_sizes=(32, 16, 32),
    activation="relu",
    solver="adam",
    max_iter=100,
    batch_size=256,
    random_state=42,
    verbose=True
)

print("Training autoencoder on normal transactions...")
autoencoder.fit(X_train_scaled, X_train_scaled)

print("Autoencoder training completed.")


Training autoencoder on normal transactions...
Iteration 1, loss = 0.28493462
Iteration 2, loss = 0.18193088
Iteration 3, loss = 0.15714369
Iteration 4, loss = 0.14417441
Iteration 5, loss = 0.13515417
Iteration 6, loss = 0.12939991
Iteration 7, loss = 0.12405058
Iteration 8, loss = 0.11488047
Iteration 9, loss = 0.10836408
Iteration 10, loss = 0.10562839
Iteration 11, loss = 0.10382273
Iteration 12, loss = 0.10240649
Iteration 13, loss = 0.10103771
Iteration 14, loss = 0.09983008
Iteration 15, loss = 0.09878532
Iteration 16, loss = 0.09778978
Iteration 17, loss = 0.09704785
Iteration 18, loss = 0.09651044
Iteration 19, loss = 0.09614237
Iteration 20, loss = 0.09575935
Iteration 21, loss = 0.09548136
Iteration 22, loss = 0.09519488
Iteration 23, loss = 0.09488328
Iteration 24, loss = 0.09461577
Iteration 25, loss = 0.09460022
Iteration 26, loss = 0.09415551
Iteration 27, loss = 0.09397365
Iteration 28, loss = 0.09390484
Iteration 29, loss = 0.09380569
Iteration 30, loss = 0.09366143
...
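The same recipe can be tried end to end on a toy problem. The sketch below is self-contained and uses synthetic data rather than this project's artifacts: the latent-factor data generator, the `(16, 4, 16)` layer sizes, and the `reconstruction_error` helper are illustrative assumptions. It fits an `MLPRegressor` with the input as its own target and checks that off-pattern points are reconstructed worse than the normal training points.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic "normal" data (assumption): 8 features driven by 2 latent factors
# plus small noise, so normal rows lie close to a 2-dimensional subspace.
n_normal, n_features, n_latent = 2000, 8, 2
mixing = rng.normal(size=(n_latent, n_features))
X_normal = (rng.normal(size=(n_normal, n_latent)) @ mixing
            + 0.05 * rng.normal(size=(n_normal, n_features)))

# Synthetic anomalies (assumption): larger-scale points that ignore the latent structure.
X_anomal = 3.0 * rng.normal(size=(50, n_features))

# Autoencoder = MLPRegressor trained to reproduce its own input (target == input).
# The 4-unit middle layer is a bottleneck narrower than the 8 input features.
ae = MLPRegressor(hidden_layer_sizes=(16, 4, 16), activation="relu",
                  solver="adam", max_iter=300, random_state=0)
ae.fit(X_normal, X_normal)

def reconstruction_error(model, X):
    """Per-row mean squared error between X and its reconstruction."""
    return np.mean((X - model.predict(X)) ** 2, axis=1)

err_normal = reconstruction_error(ae, X_normal)
err_anomal = reconstruction_error(ae, X_anomal)
print(f"Mean error, normal rows:    {err_normal.mean():.4f}")
print(f"Mean error, anomalous rows: {err_anomal.mean():.4f}")
```

The helper computes the same per-row mean squared error as the reconstruction-error cell that follows, so it could be reused as `reconstruction_error(autoencoder, X_val_scaled)`. scikit-learn may emit a `ConvergenceWarning` if 300 iterations are not enough; that does not change the comparison.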

In [10]:
# -----------------------------
# Reconstruction Error
# -----------------------------
# The autoencoder tries to reconstruct normal transactions.
# Higher reconstruction error => more anomalous => more likely fraud.

# Reconstruct validation data
X_val_recon = autoencoder.predict(X_val_scaled)

# Mean Squared Error per transaction
recon_error = np.mean((X_val_scaled - X_val_recon) ** 2, axis=1)

print("Reconstruction error computed successfully")
print("Reconstruction error statistics:")
print("Min:", recon_error.min())
print("Max:", recon_error.max())
print("Mean:", recon_error.mean())
print("Std:", recon_error.std())


Reconstruction error computed successfully
Reconstruction error statistics:
Min: 0.005939563733622087
Max: 38.55350006530182
Mean: 0.20513079251561092
Std: 0.6157717350340094


## Reconstruction Error Analysis

The reconstruction error was computed for every validation transaction and serves as the autoencoder's anomaly score.

- **Minimum error (~0.006):** these transactions are reconstructed almost perfectly, i.e. they closely match the normal patterns learned during training.
- **Mean error (~0.205):** the average over all validation transactions. Because about 99.1% of them are normal, it mostly reflects reconstruction quality on normal data.
- **Standard deviation (~0.616):** about three times the mean, which indicates a right-skewed, heavy-tailed distribution: most errors are small, while a minority are much larger.
- **Maximum error (~38.55):** roughly 190 times the mean, so a small number of transactions deviate strongly from normal behavior. These are fraud candidates, although summary statistics alone do not show which of them are actually fraudulent.

These statistics suggest that reconstruction error is a usable anomaly signal for fraud scoring, further evaluation, and (after calibration) probability estimation. Whether it actually separates fraud from normal transactions has to be checked against the labels; the ROC-AUC of 0.9357 reported below supports that it does.
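Statistics over the whole validation set mix both classes. A minimal sketch for checking separation directly, assuming only NumPy: it groups the errors by label and reports per-class statistics. The helper name `summarize_error_by_class` and the toy numbers are hypothetical.

```python
import numpy as np

def summarize_error_by_class(recon_error, y):
    """Return count, mean, median and 99th percentile of the error for each label."""
    recon_error = np.asarray(recon_error, dtype=float)
    y = np.asarray(y)
    summary = {}
    for label in np.unique(y):
        e = recon_error[y == label]
        summary[int(label)] = {
            "count": int(e.size),
            "mean": float(e.mean()),
            "median": float(np.median(e)),
            "p99": float(np.percentile(e, 99)),
        }
    return summary

# Toy example with made-up errors (0 = normal, 1 = fraud).
errors = np.array([0.01, 0.02, 0.03, 0.05, 2.0, 5.0])
labels = np.array([0, 0, 0, 0, 1, 1])
for label, stats in summarize_error_by_class(errors, labels).items():
    print(label, stats)
```

In this notebook the call would be `summarize_error_by_class(recon_error, y_val)`. A fraud-class median well above the normal-class 99th percentile would indicate strong separation.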


In [11]:
# -----------------------------
# Thresholding for Autoencoder
# -----------------------------
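# Flag the highest reconstruction errors as anomalies: everything above the
# THRESHOLD_PERCENTILE-th percentile. No labels are needed to set this cut-off,
# which is standard practice in unsupervised anomaly detection.

# The 99.5th percentile flags 0.5% of transactions, slightly below the
# validation fraud rate (492 / 57,355 ≈ 0.86%).
THRESHOLD_PERCENTILE = 99.5  # top 0.5% most anomalous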

threshold = np.percentile(recon_error, THRESHOLD_PERCENTILE)

# Binary predictions: 1 = fraud, 0 = normal
y_pred_auto = (recon_error > threshold).astype(int)

print("Autoencoder thresholding completed")
print("Threshold value:", threshold)
print("Predicted fraud count:", y_pred_auto.sum())


Autoencoder thresholding completed
Threshold value: 1.9491261095898802
Predicted fraud count: 287
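The cell above takes the percentile of the validation errors themselves. A common alternative, sketched here as an assumption rather than what this notebook does, is to fix the cut-off on the errors of the normal-only training set, so the threshold is set before any validation data is scored. The helper names and the exponential toy errors are hypothetical.

```python
import numpy as np

def percentile_threshold(reference_errors, percentile):
    """Cut-off above which roughly (100 - percentile)% of the reference errors fall."""
    return float(np.percentile(reference_errors, percentile))

def flag_anomalies(errors, threshold):
    """1 = flagged as anomalous (candidate fraud), 0 = treated as normal."""
    return (np.asarray(errors) > threshold).astype(int)

# Hypothetical stand-ins: training errors come from normal data only,
# validation errors mix mostly normal rows with a few high-error rows.
rng = np.random.default_rng(42)
train_errors = rng.exponential(scale=0.2, size=10_000)
val_errors = np.concatenate([rng.exponential(scale=0.2, size=990),
                             rng.exponential(scale=5.0, size=10)])

thr = percentile_threshold(train_errors, 99.5)
y_flag = flag_anomalies(val_errors, thr)
print(f"threshold = {thr:.4f}, flagged {y_flag.sum()} of {y_flag.size}")
```

With the notebook's objects, the reference errors would be `np.mean((X_train_scaled - autoencoder.predict(X_train_scaled)) ** 2, axis=1)`.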


In [12]:
# -----------------------------
# Autoencoder Evaluation
# -----------------------------
from sklearn.metrics import (
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
    classification_report,
    confusion_matrix
)

# True labels
y_true = y_val

# Binary predictions from autoencoder
y_pred = y_pred_auto

# Metrics
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
roc_auc = roc_auc_score(y_true, recon_error)

print("Autoencoder Performance:")
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1-score:  {f1:.4f}")
print(f"ROC-AUC:   {roc_auc:.4f}")

print("\nClassification Report:")
print(classification_report(y_true, y_pred))

# Confusion Matrix
cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:")
print(cm)


Autoencoder Performance:
Precision: 0.7491
Recall:    0.4370
F1-score:  0.5520
ROC-AUC:   0.9357

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     56863
           1       0.75      0.44      0.55       492

    accuracy                           0.99     57355
   macro avg       0.87      0.72      0.77     57355
weighted avg       0.99      0.99      0.99     57355

Confusion Matrix:
[[56791    72]
 [  277   215]]


In [13]:
# -----------------------------
# Threshold Sensitivity Analysis
# -----------------------------
from sklearn.metrics import precision_score, recall_score, f1_score

# Re-threshold the same reconstruction errors at several percentiles.
# Loop-local names (thr, prec_p, rec_p, f1_p) avoid overwriting `threshold`
# and the metric variables defined in earlier cells.
for p in [99.5, 99.3, 99.1]:
    thr = np.percentile(recon_error, p)
    y_pred_tmp = (recon_error > thr).astype(int)

    prec_p = precision_score(y_val, y_pred_tmp)
    rec_p = recall_score(y_val, y_pred_tmp)
    f1_p = f1_score(y_val, y_pred_tmp)

    print(f"\nThreshold Percentile: {p}")
    print("Predicted fraud count:", y_pred_tmp.sum())
    print(f"Precision: {prec_p:.4f}")
    print(f"Recall:    {rec_p:.4f}")
    print(f"F1-score:  {f1_p:.4f}")



Threshold Percentile: 99.5
Predicted fraud count: 287
Precision: 0.7491
Recall:    0.4370
F1-score:  0.5520

Threshold Percentile: 99.3
Predicted fraud count: 402
Precision: 0.7214
Recall:    0.5894
F1-score:  0.6488

Threshold Percentile: 99.1
Predicted fraud count: 517
Precision: 0.6441
Recall:    0.6768
F1-score:  0.6601


## Threshold Sensitivity Analysis

The autoencoder produces a continuous reconstruction error score, which must be converted into a binary fraud decision using a threshold.
Since anomaly detection models do not learn an explicit decision boundary, the choice of threshold directly controls the trade-off between **precision** (false positives) and **recall** (missed frauds).

To analyze this trade-off, three percentile-based thresholds were evaluated:

- **99.5th percentile** (top 0.5% of scores flagged):
  - High precision (0.75)
  - Lower recall (0.44)
  - F1-score 0.55
  - Conservative behavior with fewer false positives

- **99.3rd percentile** (top 0.7% of scores flagged):
  - Improved recall (0.59)
  - Slight reduction in precision (0.72)
  - F1-score 0.65
  - Better balance between false positives and missed frauds

- **99.1st percentile** (top 0.9% of scores flagged):
  - Highest recall of the three (0.68)
  - Lower precision (0.64)
  - Highest F1-score of the three (0.66)
  - More aggressive fraud detection with more false positives
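Three percentiles give only three points on the trade-off. A sketch of scanning every candidate threshold at once with scikit-learn's `precision_recall_curve` and picking the F1-maximizing one; the helper `best_f1_threshold` and the toy data are hypothetical. Note that `precision_recall_curve` treats scores `>= threshold` as positive, whereas this notebook uses a strict `>`, and that choosing a threshold with validation labels makes validation metrics optimistic.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_f1_threshold(y_true, scores):
    """Return (threshold, precision, recall, f1) at the F1-maximizing threshold.

    precision_recall_curve counts a score >= threshold as positive.
    """
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # The final (precision=1, recall=0) pair has no threshold; drop it.
    p, r = precision[:-1], recall[:-1]
    denom = p + r
    f1 = np.divide(2 * p * r, denom, out=np.zeros_like(denom), where=denom > 0)
    i = int(np.argmax(f1))
    return float(thresholds[i]), float(p[i]), float(r[i]), float(f1[i])

# Toy example: higher score = more anomalous; 1 = fraud.
y_toy = np.array([0, 0, 0, 0, 1, 0, 1, 1])
scores_toy = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
print(best_f1_threshold(y_toy, scores_toy))
```

In this notebook it would be called as `best_f1_threshold(y_val, recon_error)`.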

The trained autoencoder is identical in all three settings; only the **decision threshold** changes to reflect different operational preferences.
Lowering the threshold increases recall at the cost of additional false positives. This is a common and acceptable trade-off in fraud detection, where a missed fraudulent transaction is often more costly than reviewing an extra alert.
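The point that missed fraud is costlier than an extra review can be made explicit by choosing the threshold that minimizes a weighted error count. In this minimal sketch the per-error costs `cost_fn` and `cost_fp` are placeholder assumptions that would have to come from the business, and the helper `min_cost_threshold` is hypothetical; it searches the same kind of percentile grid used above.

```python
import numpy as np

def min_cost_threshold(y_true, scores, percentiles, cost_fn, cost_fp):
    """Return (percentile, threshold, cost) minimizing cost_fn * FN + cost_fp * FP.

    A transaction is flagged when its score is strictly greater than the
    threshold, matching the `recon_error > threshold` rule in this notebook.
    """
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    best = None
    for p in percentiles:
        thr = float(np.percentile(scores, p))
        flagged = scores > thr
        fn = int(np.sum(y_true & ~flagged))   # frauds not flagged
        fp = int(np.sum(~y_true & flagged))   # normal transactions flagged
        cost = cost_fn * fn + cost_fp * fp
        if best is None or cost < best[2]:
            best = (float(p), thr, float(cost))
    return best

# Toy example: 10 transactions, the two highest scores are fraud.
y_toy = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
scores_toy = np.arange(1, 11) / 10.0
# Hypothetical costs: a missed fraud is assumed to cost 10x a false alert.
print(min_cost_threshold(y_toy, scores_toy, [50, 70, 80, 90], cost_fn=10.0, cost_fp=1.0))
```

Applied here, the grid could be `[99.5, 99.3, 99.1]` with `y_val` and `recon_error`, and with whatever per-error costs apply operationally.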


In [14]:
# -----------------------------
# Final Autoencoder Evaluation (Selected Threshold)
# -----------------------------
# Using 99.3 percentile based on threshold sensitivity analysis

from sklearn.metrics import (
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
    classification_report,
    confusion_matrix
)

# Selected threshold
FINAL_THRESHOLD_PERCENTILE = 99.3
final_threshold = np.percentile(recon_error, FINAL_THRESHOLD_PERCENTILE)

# Final binary predictions
y_pred_final = (recon_error > final_threshold).astype(int)

# Metrics
precision = precision_score(y_val, y_pred_final)
recall = recall_score(y_val, y_pred_final)
f1 = f1_score(y_val, y_pred_final)
roc_auc = roc_auc_score(y_val, recon_error)

print(f"Final Autoencoder Evaluation (Threshold = {FINAL_THRESHOLD_PERCENTILE} percentile)")
print(f"Threshold value: {final_threshold:.4f}")
print(f"Predicted fraud count: {y_pred_final.sum()}")

print("\nPerformance Metrics:")
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1-score:  {f1:.4f}")
print(f"ROC-AUC:   {roc_auc:.4f}")

print("\nClassification Report:")
print(classification_report(y_val, y_pred_final))

print("Confusion Matrix:")
print(confusion_matrix(y_val, y_pred_final))


Final Autoencoder Evaluation (Threshold = 99.3 percentile)
Threshold value: 1.4625
Predicted fraud count: 402

Performance Metrics:
Precision: 0.7214
Recall:    0.5894
F1-score:  0.6488
ROC-AUC:   0.9357

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     56863
           1       0.72      0.59      0.65       492

    accuracy                           0.99     57355
   macro avg       0.86      0.79      0.82     57355
weighted avg       0.99      0.99      0.99     57355

Confusion Matrix:
[[56751   112]
 [  202   290]]


## Final Autoencoder Evaluation (Selected Threshold)

Based on the threshold sensitivity analysis, the **99.3rd percentile** of reconstruction error was selected as the final decision threshold for the autoencoder.

### Threshold Details
- **Threshold value:** 1.4625
- **Predicted fraud transactions:** 402

### Performance Metrics
- **Precision:** 0.72
- **Recall:** 0.59
- **F1-score:** 0.65
- **ROC-AUC:** 0.94

### Confusion Matrix Interpretation
- **True Positives (TP):** 290 fraud transactions correctly identified
- **False Positives (FP):** 112 normal transactions incorrectly flagged
- **False Negatives (FN):** 202 fraud transactions missed
- **True Negatives (TN):** 56,751 normal transactions correctly classified
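These four counts determine every metric reported above. As a worked check using the standard definitions, with the false-positive rate (FPR) added to quantify the alert burden on normal transactions:

$$
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP} = \frac{290}{290 + 112} = \frac{290}{402} \approx 0.7214 \\
\text{Recall} &= \frac{TP}{TP + FN} = \frac{290}{290 + 202} = \frac{290}{492} \approx 0.5894 \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2\,TP}{2\,TP + FP + FN} = \frac{580}{894} \approx 0.6488 \\
\text{FPR} &= \frac{FP}{FP + TN} = \frac{112}{112 + 56751} = \frac{112}{56863} \approx 0.0020
\end{aligned}
$$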

### Interpretation
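The selected threshold balances precision and recall. Compared with the 99.5th percentile, recall improves from 0.44 to 0.59 while precision drops only from 0.75 to 0.72, raising the F1-score from 0.55 to 0.65. The 99.1st percentile gives a marginally higher F1-score (0.66) but at noticeably lower precision (0.64); the 99.3rd percentile keeps precision above 0.7 at almost the same F1-score.

With this threshold, the autoencoder detects 59% of fraudulent transactions while flagging only 112 of 56,863 normal ones (a false-positive rate of about 0.2%). Because the threshold was chosen on the same validation set used to report these metrics, the figures are likely somewhat optimistic; a held-out test set would give an unbiased estimate.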


In [1]:
# -----------------------------
# Environment & Library Check
# -----------------------------
import sys
import torch
import numpy as np

print("Python executable:", sys.executable)
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("NumPy version:", np.__version__)


Microsoft Visual C++ Redistributable is not installed, this may lead to the DLL load failure.
It can be downloaded at https://aka.ms/vs/17/release/vc_redist.x64.exe


OSError: [WinError 126] The specified module could not be found. Error loading "C:\Users\rajth\OneDrive\Desktop\Fraud_Anomaly_Detection\.venv\Lib\site-packages\torch\lib\c10.dll" or one of its dependencies.

In [5]:
# -----------------------------
# Imports & Load Preprocessing Artifacts
# -----------------------------
import sys
import os

# Add project root to Python path
sys.path.append(os.path.abspath(".."))

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

from sklearn.metrics import (
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
    classification_report
)

# Device configuration
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

# Load preprocessing artifacts
ARTIFACT_DIR = "../models"

X_train_scaled = np.load(f"{ARTIFACT_DIR}/X_train_scaled.npy")
X_val_scaled = np.load(f"{ARTIFACT_DIR}/X_val_scaled.npy")
y_val = np.load(f"{ARTIFACT_DIR}/y_val.npy")

print("Artifacts loaded successfully")
print("Train shape:", X_train_scaled.shape)
print("Validation shape:", X_val_scaled.shape)
print("Validation labels shape:", y_val.shape)


Microsoft Visual C++ Redistributable is not installed, this may lead to the DLL load failure.
It can be downloaded at https://aka.ms/vs/17/release/vc_redist.x64.exe


OSError: [WinError 126] The specified module could not be found. Error loading "C:\Users\rajth\OneDrive\Desktop\Fraud_Anomaly_Detection\.venv\Lib\site-packages\torch\lib\c10.dll" or one of its dependencies.