# 🚀 Production Model Training with Optimized Hyperparameters

## MLZoomcamp Capstone Project - Final Model Training

---

### 📋 Overview

This notebook trains production-ready models using the hyperparameters found during the Optuna optimization phase. We will:

1. **Load the optimized hyperparameters** from the optimization phase
2. **Train each model with its best configuration** for up to 100 epochs, with early stopping
3. **Build an optimized ensemble** using the best weights
4. **Evaluate comprehensively** on the test set
5. **Save production-ready models** with all artifacts

### 🎯 Goals

- Achieve best possible performance with optimized settings
- Create deployment-ready model packages
- Generate final performance report
- Prepare models for API deployment

---

## 1. Environment Setup

In [2]:
import os
import sys
import warnings
import json
import pickle
import time
from pathlib import Path
from datetime import datetime
import hashlib
warnings.filterwarnings('ignore')

# Enable MPS fallback for M4 Mac
os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'

# Data manipulation
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torch.optim import Adam, AdamW
from torch.optim.lr_scheduler import ReduceLROnPlateau, CosineAnnealingLR

# Scikit-learn
from sklearn.metrics import (
    roc_auc_score, average_precision_score,
    precision_score, recall_score, f1_score,
    confusion_matrix, classification_report,
    precision_recall_curve, roc_curve
)

# Model tracking
from collections import defaultdict
import joblib

# Set device for M4 Mac
device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')
print(f"🔥 Using device: {device}")

# Set random seeds
def set_seed(seed=42):
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(42)

# Custom colors
COLORS = {
    'normal': '#2E7D32',
    'fraud': '#C62828',
    'primary': '#1565C0',
    'secondary': '#FF6F00',
    'success': '#4CAF50'
}

print("✅ Environment setup complete")
print(f"📅 Training date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

🔥 Using device: mps
✅ Environment setup complete
📅 Training date: 2025-12-21 21:50:53


## 2. Load Data and Optimization Results

In [3]:
# Load preprocessed data
artifacts_dir = Path('artifacts')

X_train_scaled = np.load(artifacts_dir / 'X_train_scaled.npy')
X_val_scaled = np.load(artifacts_dir / 'X_val_scaled.npy')
X_test_scaled = np.load(artifacts_dir / 'X_test_scaled.npy')
y_train = np.load(artifacts_dir / 'y_train.npy')
y_val = np.load(artifacts_dir / 'y_val.npy')
y_test = np.load(artifacts_dir / 'y_test.npy')

# Load configuration
with open(artifacts_dir / 'config.json', 'r') as f:
    config = json.load(f)

n_features = config['n_features']

print(f"✅ Data loaded: {n_features} features")
print(f"   Train: {X_train_scaled.shape[0]:,} samples ({y_train.sum():,} fraud)")
print(f"   Val: {X_val_scaled.shape[0]:,} samples ({y_val.sum():,} fraud)")
print(f"   Test: {X_test_scaled.shape[0]:,} samples ({y_test.sum():,} fraud)")

✅ Data loaded: 88 features
   Train: 199,364 samples (384 fraud)
   Val: 42,721 samples (56 fraud)
   Test: 42,722 samples (52 fraud)


In [4]:
# Load optimization results (if available)
optim_dir = Path('optimization_results')

if optim_dir.exists() and (optim_dir / 'optimization_results.json').exists():
    with open(optim_dir / 'optimization_results.json', 'r') as f:
        optim_results = json.load(f)
    print("✅ Loaded optimization results")
else:
    print("⚠️ No optimization results found, using default hyperparameters")
    # Default hyperparameters
    optim_results = {
        'autoencoder': {
            'best_params': {
                'encoding_dim': 32,
                'n_layers': 2,
                'hidden_dim_0': 64,
                'hidden_dim_1': 48,
                'dropout_rate': 0.2,
                'activation': 'relu',
                'batch_size': 256,
                'learning_rate': 0.001,
                'optimizer': 'adamw'
            }
        },
        'vae': {
            'best_params': {
                'latent_dim': 20,
                'n_layers': 2,
                'hidden_dim_0': 64,
                'hidden_dim_1': 48,
                'dropout_rate': 0.2,
                'activation': 'relu',
                'beta': 1.0,
                'batch_size': 256,
                'learning_rate': 0.001,
                'optimizer': 'adamw'
            }
        },
        'svdd': {
            'best_params': {
                'rep_dim': 20,
                'n_layers': 2,
                'hidden_dim_0': 64,
                'hidden_dim_1': 48,
                'dropout_rate': 0.2,
                'activation': 'relu',
                'batch_size': 256,
                'learning_rate': 0.001,
                'optimizer': 'adamw'
            }
        },
        'ensemble': {
            'best_weights': [0.3, 0.4, 0.3]
        },
        'threshold': {
            'best_percentile': 95.0
        }
    }

print("\n📊 Hyperparameters to use:")
print(f"   Autoencoder: {optim_results['autoencoder']['best_params'].get('encoding_dim', 32)}D encoding_dim")
print(f"   VAE: {optim_results['vae']['best_params'].get('latent_dim', 20)}D latent space")
print(f"   SVDD: {optim_results['svdd']['best_params'].get('rep_dim', 20)}D representation space")
print(f"   Threshold: {optim_results['threshold'].get('best_percentile', 95.0)}th percentile")

✅ Loaded optimization results

📊 Hyperparameters to use:
   Autoencoder: 24D encoding_dim
   VAE: 16D latent space
   SVDD: 8D representation space
   Threshold: 96.52521733675086th percentile
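
The `ensemble` weights and `threshold` percentile loaded above apply to anomaly scores, not to any single network. The sketch below shows one way they might be combined. It assumes rank normalization to put the three models' scores on a common scale, which matters because their raw scores differ by orders of magnitude. The normalization choice, the weight order (autoencoder, VAE, SVDD), the synthetic scores, and the helper names `rank_normalize` and `ensemble_scores` are all assumptions for illustration, not code from this project.

```python
import numpy as np

def rank_normalize(scores):
    """Map scores to (0, 1] by rank so models with different scales are comparable."""
    ranks = np.argsort(np.argsort(scores))  # 0-based rank of each score
    return (ranks + 1) / len(scores)

def ensemble_scores(per_model_scores, weights):
    """Weighted average of rank-normalized scores (higher = more anomalous)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()       # make the weights sum to 1
    normalized = np.stack([rank_normalize(s) for s in per_model_scores])
    return weights @ normalized

# Synthetic stand-ins for per-model anomaly scores on very different scales
rng = np.random.default_rng(0)
ae_scores = rng.gamma(2.0, 15.0, size=1000)
vae_scores = rng.gamma(2.0, 5.0, size=1000)
svdd_scores = rng.gamma(2.0, 1e-6, size=1000)

combined = ensemble_scores([ae_scores, vae_scores, svdd_scores],
                           weights=[0.3, 0.4, 0.3])

# Flag everything above a percentile of the combined scores
threshold = np.percentile(combined, 95.0)
flagged = combined > threshold
```

Rank normalization discards the magnitude of each score, so a min-max or z-score scaling fitted on validation data may be preferable when those magnitudes carry information.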


In [5]:
# Create PyTorch datasets
class FraudDataset(Dataset):
    def __init__(self, X, y=None):
        self.X = torch.FloatTensor(X)
        self.y = torch.FloatTensor(y) if y is not None else None
        
    def __len__(self):
        return len(self.X)
    
    def __getitem__(self, idx):
        if self.y is not None:
            return self.X[idx], self.y[idx]
        return self.X[idx]

# Create datasets for normal training samples only
normal_idx = y_train == 0
X_train_normal = X_train_scaled[normal_idx]
y_train_normal = y_train[normal_idx]

train_dataset_normal = FraudDataset(X_train_normal, y_train_normal)
train_dataset_all = FraudDataset(X_train_scaled, y_train)
val_dataset = FraudDataset(X_val_scaled, y_val)
test_dataset = FraudDataset(X_test_scaled, y_test)

print(f"\n📦 Datasets ready:")
print(f"   Normal training: {len(train_dataset_normal):,} samples")
print(f"   All training: {len(train_dataset_all):,} samples")
print(f"   Validation: {len(val_dataset):,} samples")
print(f"   Test: {len(test_dataset):,} samples")


📦 Datasets ready:
   Normal training: 198,980 samples
   All training: 199,364 samples
   Validation: 42,721 samples
   Test: 42,722 samples


## 3. Define Optimized Model Architectures

In [6]:
class OptimizedAutoencoder(nn.Module):
    """Autoencoder with optimized architecture"""
    
    def __init__(self, input_dim, params):
        super().__init__()
        
        encoding_dim = params.get('encoding_dim', 32)
        dropout_rate = params.get('dropout_rate', 0.2)
        activation = params.get('activation', 'relu')
        
        # Build hidden layers from params
        hidden_layers = []
        n_layers = params.get('n_layers', 2)
        for i in range(n_layers):
            hidden_dim = params.get(f'hidden_dim_{i}', 64)
            hidden_layers.append(hidden_dim)
        
        # Select activation
        if activation == 'relu':
            self.activation = nn.ReLU()
        elif activation == 'leaky_relu':
            self.activation = nn.LeakyReLU(0.1)
        elif activation == 'elu':
            self.activation = nn.ELU()
        else:
            self.activation = nn.Tanh()
        
        # Encoder
        encoder_layers = []
        prev_dim = input_dim
        
        for hidden_dim in hidden_layers:
            encoder_layers.extend([
                nn.Linear(prev_dim, hidden_dim),
                nn.BatchNorm1d(hidden_dim),
                self.activation,
                nn.Dropout(dropout_rate)
            ])
            prev_dim = hidden_dim
        
        encoder_layers.append(nn.Linear(prev_dim, encoding_dim))
        self.encoder = nn.Sequential(*encoder_layers)
        
        # Decoder (mirror)
        decoder_layers = []
        prev_dim = encoding_dim
        
        for hidden_dim in reversed(hidden_layers):
            decoder_layers.extend([
                nn.Linear(prev_dim, hidden_dim),
                nn.BatchNorm1d(hidden_dim),
                self.activation,
                nn.Dropout(dropout_rate)
            ])
            prev_dim = hidden_dim
        
        decoder_layers.append(nn.Linear(prev_dim, input_dim))
        self.decoder = nn.Sequential(*decoder_layers)
        
    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded
    
    def get_embedding(self, x):
        return self.encoder(x)

In [7]:
class OptimizedVAE(nn.Module):
    """VAE with optimized architecture"""
    
    def __init__(self, input_dim, params):
        super().__init__()
        
        latent_dim = params.get('latent_dim', 20)
        dropout_rate = params.get('dropout_rate', 0.2)
        activation = params.get('activation', 'relu')
        
        # Build hidden layers
        hidden_layers = []
        n_layers = params.get('n_layers', 2)
        for i in range(n_layers):
            hidden_dim = params.get(f'hidden_dim_{i}', 64)
            hidden_layers.append(hidden_dim)
        
        # Select activation
        if activation == 'relu':
            self.activation = nn.ReLU()
        elif activation == 'leaky_relu':
            self.activation = nn.LeakyReLU(0.1)
        else:
            self.activation = nn.ELU()
        
        # Encoder
        encoder_layers = []
        prev_dim = input_dim
        
        for hidden_dim in hidden_layers:
            encoder_layers.extend([
                nn.Linear(prev_dim, hidden_dim),
                nn.BatchNorm1d(hidden_dim),
                self.activation,
                nn.Dropout(dropout_rate)
            ])
            prev_dim = hidden_dim
        
        self.encoder = nn.Sequential(*encoder_layers)
        self.fc_mu = nn.Linear(prev_dim, latent_dim)
        self.fc_logvar = nn.Linear(prev_dim, latent_dim)
        
        # Decoder
        decoder_layers = []
        prev_dim = latent_dim
        
        for hidden_dim in reversed(hidden_layers):
            decoder_layers.extend([
                nn.Linear(prev_dim, hidden_dim),
                nn.BatchNorm1d(hidden_dim),
                self.activation,
                nn.Dropout(dropout_rate)
            ])
            prev_dim = hidden_dim
        
        decoder_layers.append(nn.Linear(prev_dim, input_dim))
        self.decoder = nn.Sequential(*decoder_layers)
        
    def encode(self, x):
        h = self.encoder(x)
        mu = self.fc_mu(h)
        logvar = self.fc_logvar(h)
        logvar = torch.clamp(logvar, min=-20, max=2)
        return mu, logvar
    
    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std
    
    def decode(self, z):
        return self.decoder(z)
    
    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        recon = self.decode(z)
        return recon, mu, logvar

In [9]:
class OptimizedDeepSVDD(nn.Module):
    """Deep SVDD network for tabular anomaly detection"""
    def __init__(self, input_dim, params):
        super().__init__()
        rep_dim = params.get('rep_dim', 16)
        dropout_rate = params.get('dropout_rate', 0.2)  
        activation = params.get('activation', 'relu')
        
        # Select activation
        if activation == 'relu':
            self.activation = nn.ReLU()
        elif activation == 'leaky_relu':       
            self.activation = nn.LeakyReLU(0.1)
        elif activation == 'elu':
            self.activation = nn.ELU()
        else:
            self.activation = nn.Tanh()

        layers = []
        prev = input_dim    

        # Build hidden layers from params
        n_layers = params.get('n_layers', 2)
        # If params doesn't have hidden_dim_i, check for 'hidden_layers' list fallback
        if not any(f'hidden_dim_{i}' in params for i in range(n_layers)) and 'hidden_layers' in params:
            hidden_dims = params['hidden_layers']
        else:
            hidden_dims = [params.get(f'hidden_dim_{i}', 64) for i in range(n_layers)]

        for h in hidden_dims:
            layers.extend([
                nn.Linear(prev, h),
                nn.BatchNorm1d(h),
                self.activation,
                nn.Dropout(dropout_rate)
            ])
            prev = h
            
        layers.append(nn.Linear(prev, rep_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def init_svdd_center(model, loader, device, eps=1e-3):
    model.eval()
    reps = []
    for x, _ in loader:
        x = x.to(device)
        z = model(x)
        reps.append(z.detach().cpu())
    reps = torch.cat(reps, dim=0)
    c = reps.mean(dim=0)
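    # Push near-zero coordinates to +/-eps, keeping their sign (exact zeros become +eps)
    near_zero = c.abs() < eps
    signs = torch.sign(c[near_zero])
    signs[signs == 0] = 1
    c[near_zero] = eps * signs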
    return c.to(device)

## 4. Enhanced Training Pipeline

In [10]:
class ProductionTrainer:
    """Enhanced trainer for production models (AE / VAE / DeepSVDD)"""

    def __init__(self, model, model_type="autoencoder", params=None, device=None):
        self.device = device or torch.device(
            "mps" if torch.backends.mps.is_available()
            else "cuda" if torch.cuda.is_available()
            else "cpu"
        )

        self.model = model.to(self.device)
        self.model_type = model_type
        self.params = params or {}

        # Optimizer
        lr = float(self.params.get("learning_rate", 0.001))
        opt_name = str(self.params.get("optimizer", "adamw")).lower()

        if opt_name == "adamw":
            self.optimizer = AdamW(self.model.parameters(), lr=lr, weight_decay=1e-5)
        else:
            self.optimizer = Adam(self.model.parameters(), lr=lr)

        # Scheduler
        self.scheduler = ReduceLROnPlateau(self.optimizer, mode="min", patience=10, factor=0.5)

        # History tracking
        self.history = defaultdict(list)
        self.best_model_state = None
        self.best_metrics = {}

        # Deep SVDD center (trainer-level fallback)
        self.center_c = None

    @torch.no_grad()
    def init_svdd_center(self, train_loader, eps=1e-3, max_batches=None):
        """
        Initialize SVDD center c as mean representation over training data.
        Assumes the loader contains NORMAL transactions only (recommended).
        """
        self.model.eval()
        reps = []
        for b_idx, data in enumerate(train_loader):
            if max_batches is not None and b_idx >= max_batches:
                break
            if isinstance(data, (list, tuple)):
                x = data[0]
            else:
                x = data
            x = x.to(self.device)

            z = self.model(x)  # DeepSVDD forward returns representation
            reps.append(z.detach().cpu())

        if len(reps) == 0:
            raise ValueError("Could not initialize SVDD center: empty loader?")
        reps = torch.cat(reps, dim=0)
        c = reps.mean(dim=0)

        # Push near-zero coordinates to +/-eps, keeping their sign (exact zeros become +eps)
        mask = c.abs() < eps
        if mask.any():
            signs = torch.sign(c[mask])
            signs[signs == 0] = 1
            c[mask] = eps * signs

        # Prefer model buffer if present
        if hasattr(self.model, "center_c") and torch.is_tensor(getattr(self.model, "center_c")):
            self.model.center_c.data = c.to(self.device)
        else:
            # fallback
            self.center_c = c.to(self.device)

    def _svdd_center(self):
        if hasattr(self.model, "center_c") and torch.is_tensor(getattr(self.model, "center_c")):
            return self.model.center_c
        if self.center_c is None:
            raise RuntimeError("SVDD center not initialized. Call init_svdd_center() before training/eval.")
        return self.center_c

    def train_epoch(self, train_loader):
        self.model.train()
        total_loss = 0.0
        batch_count = 0
        for batch_idx, data in enumerate(train_loader):
            if isinstance(data, (list, tuple)):
                x = data[0]  # (x, y) or (x, _)
            else:
                x = data

            x = x.to(self.device)
            self.optimizer.zero_grad()
            if self.model_type == "vae":
                recon_x, mu, logvar = self.model(x)
                beta = float(self.params.get("beta", 1.0))
                recon_loss = F.mse_loss(recon_x, x, reduction="sum")
                kld_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp().clamp(max=1e10))
                loss = (recon_loss + beta * kld_loss) / x.size(0)

            elif self.model_type in ("svdd", "deep_svdd"):
                z = self.model(x)
                c = self._svdd_center()
                loss = torch.mean(torch.sum((z - c) ** 2, dim=1))

            else:  # autoencoder
                recon_x = self.model(x)
                loss = F.mse_loss(recon_x, x)

            if torch.isnan(loss) or torch.isinf(loss):
                continue

            loss.backward()
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)
            self.optimizer.step()
            total_loss += float(loss.item())
            batch_count += 1
        return total_loss / max(batch_count, 1)

    @torch.no_grad()
    def _batch_scores(self, x):
        """
        Returns anomaly scores for a batch.
        Higher score = more anomalous (consistent across models).
        """
        if self.model_type == "vae":
            recon_x, _, _ = self.model(x)
            scores = F.mse_loss(recon_x, x, reduction="none").mean(dim=1)

        elif self.model_type in ["svdd", "deep_svdd"]:
            z = self.model(x)
            c = self._svdd_center()
            scores = torch.sum((z - c) ** 2, dim=1)

        else:
            recon_x = self.model(x)
            scores = F.mse_loss(recon_x, x, reduction="none").mean(dim=1)

        # stabilize
        scores = torch.nan_to_num(scores, nan=1000.0, posinf=1000.0, neginf=0.0)
        return scores

    def evaluate(self, data_loader):
        self.model.eval()
        all_scores = []
        all_labels = []
        total_loss = 0.0
        batch_count = 0

        for data in data_loader:
            x, y = data
            x = x.to(self.device)

            scores = self._batch_scores(x)
            scores_np = scores.detach().cpu().numpy()
            scores_np = np.clip(scores_np, 0, 1000)

            all_scores.extend(scores_np)
            all_labels.extend(y.numpy())

            total_loss += float(scores.mean().item())
            batch_count += 1

        all_labels = np.asarray(all_labels).astype(int)
        all_scores = np.asarray(all_scores).astype(float)
        metrics = {}
        if len(np.unique(all_labels)) > 1:
            try:
                metrics["auroc"] = float(roc_auc_score(all_labels, all_scores))
                metrics["auprc"] = float(average_precision_score(all_labels, all_scores))

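                # Threshold = percentile of the NORMAL-class scores in this same set.
                # Falls back to 95 unless params contains "threshold_percentile".
                threshold_pct = float(self.params.get("threshold_percentile", 95))
                threshold = float(np.percentile(all_scores[all_labels == 0], threshold_pct))
                predictions = (all_scores > threshold).astype(int)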

                metrics["precision"] = float(precision_score(all_labels, predictions, zero_division=0))
                metrics["recall"] = float(recall_score(all_labels, predictions, zero_division=0))
                metrics["f1"] = float(f1_score(all_labels, predictions, zero_division=0))
                metrics["threshold"] = threshold

            except Exception as e:
                print(f"Warning: Could not compute metrics: {e}")
                metrics = {"auroc": 0.0, "auprc": 0.0, "precision": 0.0, "recall": 0.0, "f1": 0.0}

        avg_loss = total_loss / max(batch_count, 1)
        return avg_loss, metrics, all_scores, all_labels

    def train(self, train_loader, val_loader, epochs=100, patience=20, verbose=True):
        """
        NOTE: For DeepSVDD you should pass train_loader built ONLY from normal transactions.
        """
        # Initialize SVDD center once (before training)
        if self.model_type in ["svdd", "deep_svdd"]:
            try:
                _ = self._svdd_center()
            except Exception:
                self.init_svdd_center(train_loader)

        best_auroc = 0.0
        patience_counter = 0

        for epoch in range(epochs):
            start_time = time.time()
            train_loss = self.train_epoch(train_loader)
            val_loss, val_metrics, _, _ = self.evaluate(val_loader)
            epoch_time = time.time() - start_time
            self.scheduler.step(val_loss)

            # history
            self.history["epoch"].append(epoch + 1)
            self.history["train_loss"].append(train_loss)
            self.history["val_loss"].append(val_loss)
            for k, v in val_metrics.items():
                self.history[f"val_{k}"].append(v)
            self.history["epoch_time"].append(epoch_time)

            current_auroc = float(val_metrics.get("auroc", 0.0))

            if current_auroc > best_auroc:
                best_auroc = current_auroc
                self.best_model_state = {k: v.detach().cpu().clone() for k, v in self.model.state_dict().items()}
                self.best_metrics = val_metrics.copy()
                patience_counter = 0
            else:
                patience_counter += 1

            if verbose and (epoch + 1) % 5 == 0:
                print(
                    f"Epoch [{epoch+1:3d}/{epochs}] | "
                    f"Train Loss: {train_loss:.6f} | "
                    f"Val Loss: {val_loss:.6f} | "
                    f"AUROC: {current_auroc:.4f} | "
                    f"Time: {epoch_time:.1f}s"
                )
            if patience_counter >= patience:
                if verbose:
                    print(f"\nEarly stopping at epoch {epoch + 1}")
                break

            if self.device.type == "mps" and epoch % 10 == 0:
                torch.mps.empty_cache()

        if self.best_model_state:
            self.model.load_state_dict(self.best_model_state)
        return self.history
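
The three branches of `train_epoch` minimize different per-batch objectives, which is worth keeping in mind when reading the training logs. Written out from the code above for a batch of $B$ samples with $d$ features and a $K$-dimensional latent space (the symbols $B$, $d$, $K$, $\hat{x}$, $z$, and $\sigma^2 = \exp(\text{logvar})$ are notation introduced here, not names from the notebook):

```latex
\begin{aligned}
\mathcal{L}_{\text{AE}} &= \frac{1}{B\,d}\sum_{i=1}^{B}\sum_{j=1}^{d}\left(x_{ij}-\hat{x}_{ij}\right)^2 \\
\mathcal{L}_{\text{VAE}} &= \frac{1}{B}\left[\sum_{i=1}^{B}\sum_{j=1}^{d}\left(x_{ij}-\hat{x}_{ij}\right)^2
  \;-\; \frac{\beta}{2}\sum_{i=1}^{B}\sum_{k=1}^{K}\left(1+\log\sigma_{ik}^2-\mu_{ik}^2-\sigma_{ik}^2\right)\right] \\
\mathcal{L}_{\text{SVDD}} &= \frac{1}{B}\sum_{i=1}^{B}\left\lVert z_i - c\right\rVert_2^2
\end{aligned}
```

The VAE training loss sums the squared error over features and adds the KL term, whereas the `Val Loss` returned by `evaluate` is the mean anomaly score (per-feature mean squared error for the autoencoder and VAE). For the VAE, the two columns are therefore on different scales and should not be compared directly.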

## 5. Train Optimized Models

### 5.1 Train Optimized Autoencoder

In [13]:
print("🚀 Training Optimized Autoencoder")
print("=" * 60)

# Get parameters
ae_params = optim_results['autoencoder']['best_params']
batch_size = ae_params.get('batch_size', 256)

# Create model
autoencoder_model = OptimizedAutoencoder(n_features, ae_params)
print(f"Model parameters: {sum(p.numel() for p in autoencoder_model.parameters()):,}")

# Create data loaders
train_loader = DataLoader(train_dataset_normal, batch_size=batch_size, shuffle=True, num_workers=0)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=0)

# Train
ae_trainer = ProductionTrainer(autoencoder_model, model_type='autoencoder', params=ae_params)
ae_history = ae_trainer.train(
    train_loader, val_loader,
    epochs=100,
    patience=20,
    verbose=True
)

print(f"\n✅ Best AUROC: {ae_trainer.best_metrics.get('auroc', 0):.4f}")
print(f"✅ Best F1: {ae_trainer.best_metrics.get('f1', 0):.4f}")

🚀 Training Optimized Autoencoder
Model parameters: 25,648
Epoch [  5/100] | Train Loss: 313.623086 | Val Loss: 35.492490 | AUROC: 0.9584 | Time: 4.1s
Epoch [ 10/100] | Train Loss: 309.191105 | Val Loss: 35.663969 | AUROC: 0.9508 | Time: 4.1s
Epoch [ 15/100] | Train Loss: 305.981955 | Val Loss: 34.173264 | AUROC: 0.9540 | Time: 3.9s
Epoch [ 20/100] | Train Loss: 303.378719 | Val Loss: 33.465468 | AUROC: 0.9547 | Time: 4.1s

Early stopping at epoch 24

✅ Best AUROC: 0.9599
✅ Best F1: 0.0446


### 5.2 Train Optimized VAE

In [14]:
print("\n🚀 Training Optimized VAE")
print("=" * 60)

# Get parameters
vae_params = optim_results['vae']['best_params']
batch_size = vae_params.get('batch_size', 256)

# Create model
vae_model = OptimizedVAE(n_features, vae_params)
print(f"Model parameters: {sum(p.numel() for p in vae_model.parameters()):,}")

# Create data loaders
train_loader = DataLoader(train_dataset_normal, batch_size=batch_size, shuffle=True, num_workers=0)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=0)

# Train
vae_trainer = ProductionTrainer(vae_model, model_type='vae', params=vae_params)
vae_history = vae_trainer.train(
    train_loader, val_loader,
    epochs=100,
    patience=20,
    verbose=True
)

print(f"\n✅ Best AUROC: {vae_trainer.best_metrics.get('auroc', 0):.4f}")
print(f"✅ Best F1: {vae_trainer.best_metrics.get('f1', 0):.4f}")


🚀 Training Optimized VAE
Model parameters: 25,656
Epoch [  5/100] | Train Loss: 25385.298143 | Val Loss: 26.763980 | AUROC: 0.9538 | Time: 4.7s
Epoch [ 10/100] | Train Loss: 24233.465621 | Val Loss: 10.681532 | AUROC: 0.9512 | Time: 4.7s
Epoch [ 15/100] | Train Loss: 23577.098972 | Val Loss: 7.277839 | AUROC: 0.9575 | Time: 4.6s
Epoch [ 20/100] | Train Loss: 22944.951773 | Val Loss: 13.407018 | AUROC: 0.9530 | Time: 4.6s
Epoch [ 25/100] | Train Loss: 22625.811007 | Val Loss: 10.215022 | AUROC: 0.9489 | Time: 4.7s
Epoch [ 30/100] | Train Loss: 22190.093313 | Val Loss: 16.284673 | AUROC: 0.9483 | Time: 4.7s
Epoch [ 35/100] | Train Loss: 22078.887454 | Val Loss: 10.686662 | AUROC: 0.9507 | Time: 4.6s
Epoch [ 40/100] | Train Loss: 21861.115612 | Val Loss: 15.140689 | AUROC: 0.9505 | Time: 4.6s

Early stopping at epoch 42

✅ Best AUROC: 0.9576
✅ Best F1: 0.0446


### 5.3 Train Deep SVDD

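Deep SVDD is trained on **normal** transactions only. We initialize the center `c` as the mean of the untrained network's representations of the training data, then minimize the mean squared distance between each representation and `c`. At inference time, that squared distance is the anomaly score.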
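
The center initialization and scoring can be illustrated without a network. The NumPy sketch below uses a small hand-written array in place of the encoder's output. The function names `init_center` and `svdd_scores` and the toy values are hypothetical, chosen only for this illustration.

```python
import numpy as np

def init_center(reps, eps=1e-3):
    """Mean representation, with near-zero coordinates pushed to +/-eps."""
    c = reps.mean(axis=0)
    near_zero = np.abs(c) < eps
    # Keep the sign of each small coordinate; exact zeros become +eps
    c[near_zero] = np.where(c[near_zero] < 0, -eps, eps)
    return c

def svdd_scores(reps, c):
    """Squared Euclidean distance to the center (higher = more anomalous)."""
    return np.sum((reps - c) ** 2, axis=1)

# Toy "representations": two samples, three dimensions
reps = np.array([[1.0, 0.0002, -0.0004],
                 [3.0, 0.0002, -0.0004]])

c = init_center(reps)           # mean is [2.0, 0.0002, -0.0004]
scores = svdd_scores(reps, c)   # one score per sample
print(c, scores)
```

Coordinates of `c` that sit very close to zero are moved away from it because a zero target can be matched trivially by a unit whose weights are all zero, which pulls the model toward a degenerate solution.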


In [11]:
print("\n🚀 Training Optimized SVDD")
print("=" * 60)

# Get parameters
svdd_params = optim_results['svdd']['best_params']
batch_size = svdd_params.get('batch_size', 256)

# Create model
svdd_model = OptimizedDeepSVDD(input_dim=n_features, params=svdd_params).to(device)
print(f"Model parameters: {sum(p.numel() for p in svdd_model.parameters()):,}")

# Create data loaders
train_loader = DataLoader(train_dataset_normal, batch_size=batch_size, shuffle=True, num_workers=0)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=0)

# Train
svdd_trainer = ProductionTrainer(svdd_model, model_type='svdd', params=svdd_params)
svdd_trainer.init_svdd_center(train_loader)
svdd_history = svdd_trainer.train(
    train_loader, val_loader,
    epochs=100,
    patience=20,
    verbose=True
)

print(f"\n✅ Best AUROC: {svdd_trainer.best_metrics.get('auroc', 0):.4f}")
print(f"✅ Best F1: {svdd_trainer.best_metrics.get('f1', 0):.4f}")



🚀 Training Optimized SVDD
Model parameters: 5,672
Epoch [  5/100] | Train Loss: 0.003449 | Val Loss: 0.000019 | AUROC: 0.9520 | Time: 2.5s
Epoch [ 10/100] | Train Loss: 0.000002 | Val Loss: 0.000000 | AUROC: 0.9395 | Time: 2.5s
Epoch [ 15/100] | Train Loss: 0.000000 | Val Loss: 0.000000 | AUROC: 0.7145 | Time: 2.5s
Epoch [ 20/100] | Train Loss: 0.000000 | Val Loss: 0.000000 | AUROC: 0.7472 | Time: 2.5s
Epoch [ 25/100] | Train Loss: 0.000001 | Val Loss: 0.000000 | AUROC: 0.9063 | Time: 2.5s

Early stopping at epoch 28

✅ Best AUROC: 0.9613
✅ Best F1: 0.0446


## 7. Model Comparison

In [16]:
# Evaluate all models on test set
test_loader = DataLoader(test_dataset, batch_size=256, shuffle=False, num_workers=0)

print("\n📊 Final Model Comparison on Test Set")
print("=" * 70)

comparison_results = []

# Evaluate individual models
for name, trainer in [('Autoencoder', ae_trainer), ('VAE', vae_trainer), ('SVDD', svdd_trainer)]:
    _, metrics, scores, labels = trainer.evaluate(test_loader)
    
    comparison_results.append({
        'Model': name,
        'AUROC': metrics.get('auroc', 0),
        'AUPRC': metrics.get('auprc', 0),
        'Precision': metrics.get('precision', 0),
        'Recall': metrics.get('recall', 0),
        'F1': metrics.get('f1', 0)
    })

# Create comparison dataframe
comparison_df = pd.DataFrame(comparison_results)
comparison_df = comparison_df.round(4)

print(comparison_df.to_string(index=False))

# Find best model
best_idx = comparison_df['AUROC'].idxmax()
print(f"\n🏆 Best Model: {comparison_df.loc[best_idx, 'Model']} (AUROC={comparison_df.loc[best_idx, 'AUROC']:.4f})")


📊 Final Model Comparison on Test Set
      Model  AUROC  AUPRC  Precision  Recall     F1
Autoencoder 0.9190 0.0813     0.0198  0.8269 0.0386
        VAE 0.9551 0.1044     0.0198  0.8269 0.0386
       SVDD 0.9428 0.3078     0.0198  0.8269 0.0386

🏆 Best Model: VAE (AUROC=0.9551)
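
The Precision, Recall, and F1 columns can be reproduced by hand, because `evaluate` sets its threshold at a fixed percentile of the normal-class scores. That fixes the false-positive count regardless of the model, and a recall of 0.8269 corresponds to 43 of the 52 test frauds. The worked check below assumes the default 95th percentile, NumPy's default linear interpolation, no tied scores, and the test-set counts printed earlier. The helper name `metrics_at_percentile` is made up for this sketch.

```python
import math

def metrics_at_percentile(n_normal, n_fraud, true_positives, percentile=95.0):
    """Precision/recall/F1 when the threshold is a percentile of normal scores."""
    # np.percentile (linear interpolation) places the threshold at fractional
    # position p/100 * (n - 1) in the sorted normal scores; with no ties,
    # every normal score at a higher position is flagged as a false positive.
    position = percentile / 100.0 * (n_normal - 1)
    false_positives = (n_normal - 1) - math.floor(position)
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / n_fraud
    f1 = 2 * precision * recall / (precision + recall)
    return false_positives, precision, recall, f1

fp, precision, recall, f1 = metrics_at_percentile(
    n_normal=42_722 - 52, n_fraud=52, true_positives=43
)
print(fp, round(precision, 4), round(recall, 4), round(f1, 4))
# 2134 0.0198 0.8269 0.0386
```

Because the threshold is derived from the labels of the set being scored, a deployed system would instead reuse a threshold fixed in advance, for example on validation data. AUROC and AUPRC do not depend on the threshold, so they are the columns that actually separate the three models here.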
