# ⚡ Hyperparameter Optimization for Fraud Detection Models

## MLZoomcamp Capstone Project - Model Optimization

---

### 📋 Overview

This notebook implements systematic hyperparameter optimization using Optuna to:
1. **Optimize individual models** (Autoencoder, VAE, Deep SVDD)
2. **Optimize ensemble weights** for best combination
3. **Optimize decision thresholds** for business metrics
4. **Run multi-objective optimization** balancing precision and recall

### 🎯 Objectives

- Find optimal architectures and training parameters
- Maximize AUROC and business metrics
- Create Pareto-optimal solutions for different business needs
- Reduce training time through pruning
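
As a rough illustration of what "Pareto-optimal" means for the precision/recall trade-off, the sketch below keeps only the operating points that no other point beats on both metrics. The helper name `pareto_front` and the candidate values are hypothetical and chosen for illustration; they are not results from this project.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (precision, recall)


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that no other point matches or beats on both metrics."""
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] >= p[0] and q[1] >= p[1]
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)


# Hypothetical operating points, e.g. one per threshold or per trial
candidates = [(0.90, 0.60), (0.80, 0.75), (0.70, 0.85), (0.75, 0.70), (0.60, 0.80)]
front = pareto_front(candidates)
print(front)
```

Each point left in `front` represents a different business preference: higher precision means fewer false alarms, higher recall means fewer missed frauds.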

---

## 1. Environment Setup

In [None]:
# Install Optuna if not already installed
# !pip install optuna optuna-dashboard plotly scikit-optimize

In [1]:
import os
import sys
import warnings
import json
import pickle
import time
from pathlib import Path
from datetime import datetime
warnings.filterwarnings('ignore')

# Enable MPS fallback for M4 Mac
os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'

# Data manipulation
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torch.optim import Adam, AdamW, SGD
from torch.optim.lr_scheduler import ReduceLROnPlateau, CosineAnnealingLR

# Scikit-learn
from sklearn.metrics import (
    roc_auc_score, average_precision_score,
    precision_score, recall_score, f1_score,
    confusion_matrix
)

# Optuna for hyperparameter optimization
import optuna
from optuna import Trial
from optuna.pruners import MedianPruner
from optuna.samplers import TPESampler
from optuna.visualization import (
    plot_optimization_history,
    plot_param_importances,
    plot_contour,
    plot_parallel_coordinate,
    plot_pareto_front
)

# Set device for M4 Mac
device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')
print(f"🔥 Using device: {device}")

# Set random seeds
def set_seed(seed=42):
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(42)

# Custom colors
COLORS = {
    'normal': '#2E7D32',
    'fraud': '#C62828',
    'primary': '#1565C0',
    'secondary': '#FF6F00',
    'success': '#4CAF50',
    'warning': '#FFC107'
}

print("✅ Environment setup complete")
print(f"📦 Optuna version: {optuna.__version__}")

🔥 Using device: mps
✅ Environment setup complete
📦 Optuna version: 4.6.0


## 2. Load Data and Previous Models

In [2]:
# Load preprocessed data
artifacts_dir = Path('artifacts')

X_train_scaled = np.load(artifacts_dir / 'X_train_scaled.npy')
X_val_scaled = np.load(artifacts_dir / 'X_val_scaled.npy')
X_test_scaled = np.load(artifacts_dir / 'X_test_scaled.npy')
y_train = np.load(artifacts_dir / 'y_train.npy')
y_val = np.load(artifacts_dir / 'y_val.npy')
y_test = np.load(artifacts_dir / 'y_test.npy')

# Load configuration
with open(artifacts_dir / 'config.json', 'r') as f:
    config = json.load(f)

n_features = config['n_features']

print(f"✅ Data loaded: {n_features} features")
print(f"   Train: {X_train_scaled.shape[0]} samples")
print(f"   Val: {X_val_scaled.shape[0]} samples")
print(f"   Test: {X_test_scaled.shape[0]} samples")

✅ Data loaded: 88 features
   Train: 199364 samples
   Val: 42721 samples
   Test: 42722 samples


In [3]:
# Create PyTorch datasets
class FraudDataset(Dataset):
    def __init__(self, X, y=None):
        self.X = torch.FloatTensor(X)
        self.y = torch.FloatTensor(y) if y is not None else None
        
    def __len__(self):
        return len(self.X)
    
    def __getitem__(self, idx):
        if self.y is not None:
            return self.X[idx], self.y[idx]
        return self.X[idx]

# Create datasets
normal_idx = y_train == 0
X_train_normal = X_train_scaled[normal_idx]
y_train_normal = y_train[normal_idx]

train_dataset_normal = FraudDataset(X_train_normal, y_train_normal)
val_dataset = FraudDataset(X_val_scaled, y_val)
test_dataset = FraudDataset(X_test_scaled, y_test)

print(f"\n📊 Datasets created:")
print(f"   Normal training samples: {len(train_dataset_normal)}")
print(f"   Validation samples: {len(val_dataset)}")
print(f"   Test samples: {len(test_dataset)}")


📊 Datasets created:
   Normal training samples: 198980
   Validation samples: 42721
   Test samples: 42722
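
The models are trained on normal transactions only, so fraud rows are filtered out of the training set with a boolean mask. The sketch below shows the same masking on a toy array (the toy values are made up) and then uses the two counts printed above to work out how many fraud rows were held out of training.

```python
import numpy as np

# Toy stand-ins for X_train_scaled / y_train (values are illustrative only)
X = np.arange(12, dtype=np.float32).reshape(6, 2)
y = np.array([0, 0, 1, 0, 1, 0])

normal_mask = (y == 0)        # True for normal rows, False for fraud rows
X_normal = X[normal_mask]     # keeps 4 of the 6 toy rows

# Counts taken from the notebook output above
n_train_total = 199_364
n_train_normal = 198_980
n_train_fraud = n_train_total - n_train_normal
fraud_rate = n_train_fraud / n_train_total

print(X_normal.shape, n_train_fraud, f"{fraud_rate:.2%}")
```

With roughly 0.19% of training rows labelled as fraud, threshold-free metrics such as AUROC and AUPRC are computed on the validation set, which still contains both classes.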


## 3. Define Model Architectures for Optimization

In [4]:
class OptimizableAutoencoder(nn.Module):
    """Autoencoder with configurable architecture"""
    
    def __init__(self, input_dim, encoding_dim, hidden_layers, dropout_rate=0.2, activation='relu'):
        super().__init__()
        
        # Select activation function
        if activation == 'relu':
            self.activation = nn.ReLU()
        elif activation == 'leaky_relu':
            self.activation = nn.LeakyReLU(0.1)
        elif activation == 'elu':
            self.activation = nn.ELU()
        else:
            self.activation = nn.Tanh()
        
        # Build encoder
        encoder_layers = []
        prev_dim = input_dim
        
        for hidden_dim in hidden_layers:
            encoder_layers.extend([
                nn.Linear(prev_dim, hidden_dim),
                nn.BatchNorm1d(hidden_dim),
                self.activation,
                nn.Dropout(dropout_rate)
            ])
            prev_dim = hidden_dim
        
        encoder_layers.append(nn.Linear(prev_dim, encoding_dim))
        self.encoder = nn.Sequential(*encoder_layers)
        
        # Build decoder (mirror of encoder)
        decoder_layers = []
        prev_dim = encoding_dim
        
        for hidden_dim in reversed(hidden_layers):
            decoder_layers.extend([
                nn.Linear(prev_dim, hidden_dim),
                nn.BatchNorm1d(hidden_dim),
                self.activation,
                nn.Dropout(dropout_rate)
            ])
            prev_dim = hidden_dim
        
        decoder_layers.append(nn.Linear(prev_dim, input_dim))
        self.decoder = nn.Sequential(*decoder_layers)
        
    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded
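
To make the mirrored layout concrete, the sketch below lists the `(in_features, out_features)` pair of every `nn.Linear` layer that the constructor above would create. The helper `layer_shapes` is hypothetical and only does the bookkeeping. The example values (88 inputs, hidden layers `[48, 112]`, encoding size 24) are the feature count and the best autoencoder parameters reported in this notebook's outputs.

```python
from typing import List, Tuple

Shape = Tuple[int, int]  # (in_features, out_features)


def layer_shapes(input_dim: int, hidden_layers: List[int],
                 encoding_dim: int) -> Tuple[List[Shape], List[Shape]]:
    """List the Linear layer shapes for the encoder and the mirrored decoder."""
    enc_dims = [input_dim, *hidden_layers, encoding_dim]
    dec_dims = list(reversed(enc_dims))
    encoder = list(zip(enc_dims[:-1], enc_dims[1:]))
    decoder = list(zip(dec_dims[:-1], dec_dims[1:]))
    return encoder, decoder


encoder, decoder = layer_shapes(88, [48, 112], 24)
print("encoder:", encoder)
print("decoder:", decoder)
```

Because `hidden_dim_0` and `hidden_dim_1` are sampled independently, the hidden layers do not have to shrink monotonically; here the encoder narrows to 48 units and widens to 112 before the 24-unit bottleneck.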

In [5]:
class OptimizableVAE(nn.Module):
    """VAE with configurable architecture"""
    
    def __init__(self, input_dim, latent_dim, hidden_layers, dropout_rate=0.2, activation='relu'):
        super().__init__()
        
        # Select activation
        if activation == 'relu':
            self.activation = nn.ReLU()
        elif activation == 'leaky_relu':
            self.activation = nn.LeakyReLU(0.1)
        else:
            self.activation = nn.ELU()
        
        # Encoder
        encoder_layers = []
        prev_dim = input_dim
        
        for hidden_dim in hidden_layers:
            encoder_layers.extend([
                nn.Linear(prev_dim, hidden_dim),
                nn.BatchNorm1d(hidden_dim),
                self.activation,
                nn.Dropout(dropout_rate)
            ])
            prev_dim = hidden_dim
        
        self.encoder = nn.Sequential(*encoder_layers)
        self.fc_mu = nn.Linear(prev_dim, latent_dim)
        self.fc_logvar = nn.Linear(prev_dim, latent_dim)
        
        # Decoder
        decoder_layers = []
        prev_dim = latent_dim
        
        for hidden_dim in reversed(hidden_layers):
            decoder_layers.extend([
                nn.Linear(prev_dim, hidden_dim),
                nn.BatchNorm1d(hidden_dim),
                self.activation,
                nn.Dropout(dropout_rate)
            ])
            prev_dim = hidden_dim
        
        decoder_layers.append(nn.Linear(prev_dim, input_dim))
        self.decoder = nn.Sequential(*decoder_layers)
        
    def encode(self, x):
        h = self.encoder(x)
        mu = self.fc_mu(h)
        logvar = self.fc_logvar(h)
        logvar = torch.clamp(logvar, min=-20, max=2)  # Numerical stability
        return mu, logvar
    
    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std
    
    def decode(self, z):
        return self.decoder(z)
    
    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        recon = self.decode(z)
        return recon, mu, logvar
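
The reparameterization step and the KL term that it pairs with can be checked in isolation. The following is a minimal NumPy sketch, not the notebook's training code: the function names are hypothetical, and the batch size (4) and latent size (16) are arbitrary. It uses the same closed-form KL expression as the VAE loss in the training function.

```python
import numpy as np

rng = np.random.default_rng(0)


def reparameterize(mu: np.ndarray, logvar: np.ndarray,
                   rng: np.random.Generator) -> np.ndarray:
    """Sample z = mu + eps * std with eps ~ N(0, I)."""
    std = np.exp(0.5 * logvar)
    eps = rng.standard_normal(mu.shape)
    return mu + eps * std


def kl_to_standard_normal(mu: np.ndarray, logvar: np.ndarray) -> np.ndarray:
    """KL(N(mu, exp(logvar)) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)


mu = np.zeros((4, 16))
logvar = np.zeros((4, 16))

z = reparameterize(mu, logvar, rng)
kl_zero = kl_to_standard_normal(mu, logvar)           # posterior equals the prior
kl_shifted = kl_to_standard_normal(mu + 1.0, logvar)  # every mean shifted by 1

print(z.shape, kl_zero, kl_shifted)
```

The KL term is zero when the posterior equals the prior and grows by 0.5 per latent dimension for a unit shift in the mean, which is the penalty that `beta` scales during training.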

In [6]:
class OptimizableDeepSVDD(nn.Module):
    """Deep SVDD with configurable architecture"""
    def __init__(self, input_dim, rep_dim, hidden_layers, dropout_rate=0.0, activation='relu'):
        super().__init__()

        if activation == 'relu':
            act = nn.ReLU()
        elif activation == 'leaky_relu':
            act = nn.LeakyReLU(0.1)
        else:
            act = nn.ELU()

        layers = []
        prev_dim = input_dim
        for h in hidden_layers:
            layers += [nn.Linear(prev_dim, h), act]
            if dropout_rate > 0:
                layers += [nn.Dropout(dropout_rate)]
            prev_dim = h
        layers += [nn.Linear(prev_dim, rep_dim)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


@torch.no_grad()
def init_svdd_center(model, loader, device, eps=1e-3):
    model.eval()
    reps = []
    for batch in loader:
        x, _ = batch
        x = x.to(device)
        z = model(x)
        reps.append(z.detach().cpu())
    reps = torch.cat(reps, dim=0)
    c = reps.mean(dim=0)
    # Push near-zero coordinates away from 0 while keeping their sign
    small = c.abs() < eps
    c[small & (c < 0)] = -eps
    c[small & (c >= 0)] = eps
    return c.to(device)
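
The center initialization and the distance-based anomaly score can be illustrated without PyTorch. This NumPy sketch assumes one common convention, in which near-zero center coordinates are moved to `±eps` with their sign preserved. The function names and the toy representations are hypothetical.

```python
import numpy as np


def init_center(reps: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Mean representation, with near-zero coordinates pushed to +/- eps."""
    c = reps.mean(axis=0)
    small = np.abs(c) < eps
    c[small & (c < 0)] = -eps
    c[small & (c >= 0)] = eps
    return c


def anomaly_scores(reps: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Squared Euclidean distance of each representation to the center."""
    return np.sum((reps - c) ** 2, axis=1)


# Toy representations of "normal" training points (made-up values)
train_reps = np.array([[1.0, 0.0002, -0.0004],
                       [3.0, 0.0002, -0.0004]])
c = init_center(train_reps)

# Two new points: one at the center, one 3 units away along the first axis
new_reps = np.array([[2.0, 0.001, -0.001],
                     [5.0, 0.001, -0.001]])
scores = anomaly_scores(new_reps, c)
print(c, scores)
```

Points that land far from the center receive large scores and are flagged as more anomalous, which is the quantity the Deep SVDD objective evaluates with AUROC.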


## 4. Training and Evaluation Functions

In [7]:
def train_model(model, train_loader, val_loader, epochs, learning_rate, 
                optimizer_name='adam', model_type='autoencoder', beta=1.0):
    """Train and evaluate a model"""
    
    # Select optimizer
    if optimizer_name == 'adam':
        optimizer = Adam(model.parameters(), lr=learning_rate)
    elif optimizer_name == 'adamw':
        optimizer = AdamW(model.parameters(), lr=learning_rate, weight_decay=1e-5)
    else:
        optimizer = SGD(model.parameters(), lr=learning_rate, momentum=0.9)
    
    scheduler = ReduceLROnPlateau(optimizer, mode='min', patience=5, factor=0.5)
    
    best_val_loss = float('inf')
    patience_counter = 0
    patience = 10
    
    for epoch in range(epochs):
        # Training
        model.train()
        train_loss = 0
        
        for batch_idx, data in enumerate(train_loader):
            if isinstance(data, (list, tuple)):
                x, _ = data
            else:
                x = data
            
            x = x.to(device)
            optimizer.zero_grad()
            
            if model_type == 'vae':
                recon_x, mu, logvar = model(x)
                recon_loss = F.mse_loss(recon_x, x, reduction='sum')
                kld_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp().clamp(max=1e10))
                loss = (recon_loss + beta * kld_loss) / x.size(0)
            else:
                recon_x = model(x)
                loss = F.mse_loss(recon_x, x)
            
            if torch.isnan(loss) or torch.isinf(loss):
                continue
            
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
            
            train_loss += loss.item()
        
        # Validation
        model.eval()
        val_loss = 0
        all_scores = []
        all_labels = []
        
        with torch.no_grad():
            for data in val_loader:
                x, y = data
                x = x.to(device)
                
                if model_type == 'vae':
                    recon_x, mu, logvar = model(x)
                    scores = F.mse_loss(recon_x, x, reduction='none').mean(dim=1)
                else:
                    recon_x = model(x)
                    scores = F.mse_loss(recon_x, x, reduction='none').mean(dim=1)
                
                scores_np = scores.cpu().numpy()
                scores_np = np.nan_to_num(scores_np, nan=1000.0, posinf=1000.0, neginf=0.0)
                scores_np = np.clip(scores_np, 0, 1000)
                
                all_scores.extend(scores_np)
                all_labels.extend(y.numpy())
                val_loss += scores.mean().item()
        
        val_loss = val_loss / len(val_loader)
        scheduler.step(val_loss)
        
        # Early stopping
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            patience_counter = 0
        else:
            patience_counter += 1
            if patience_counter >= patience:
                break
    
    # Calculate final metrics
    all_labels = np.array(all_labels)
    all_scores = np.array(all_scores)
    
    if len(np.unique(all_labels)) > 1:
        try:
            auroc = roc_auc_score(all_labels, all_scores)
            auprc = average_precision_score(all_labels, all_scores)
        except ValueError:  # raised when roc_auc_score rejects the inputs
            auroc = 0.0
            auprc = 0.0
    else:
        auroc = 0.0
        auprc = 0.0
    
    return auroc, auprc, best_val_loss
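
The early-stopping rule inside `train_model` can be studied on its own. The sketch below replays the same patience logic over a list of validation losses; the function name and the loss values are hypothetical.

```python
from typing import List, Tuple


def early_stopping_epoch(val_losses: List[float],
                         patience: int = 10) -> Tuple[int, float]:
    """Return (last epoch executed, best validation loss seen)."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            bad_epochs = 0      # any improvement resets the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best
    return len(val_losses) - 1, best


# Made-up loss curve: improves for three epochs, then stalls
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
stop_epoch, best_loss = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_loss)
```

Note that `train_model` computes AUROC and AUPRC from the scores of the last executed epoch and does not restore a best-epoch checkpoint, so the returned metrics can lag slightly behind the best validation loss.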

## 5. Optuna Optimization for Baseline Autoencoder

In [8]:
def objective_autoencoder(trial: Trial) -> float:
    """Objective function for optimizing autoencoder"""
    
    # Hyperparameters to optimize
    encoding_dim = trial.suggest_int('encoding_dim', 8, 64, step=4)
    n_layers = trial.suggest_int('n_layers', 1, 3)
    
    hidden_layers = []
    for i in range(n_layers):
        hidden_dim = trial.suggest_int(f'hidden_dim_{i}', 32, 128, step=16)
        hidden_layers.append(hidden_dim)
    
    dropout_rate = trial.suggest_float('dropout_rate', 0.1, 0.5)
    activation = trial.suggest_categorical('activation', ['relu', 'leaky_relu', 'elu', 'tanh'])
    
    batch_size = trial.suggest_categorical('batch_size', [128, 256, 512])
    learning_rate = trial.suggest_float('learning_rate', 1e-4, 1e-2, log=True)
    optimizer_name = trial.suggest_categorical('optimizer', ['adam', 'adamw'])
    
    # Create model
    model = OptimizableAutoencoder(
        input_dim=n_features,
        encoding_dim=encoding_dim,
        hidden_layers=hidden_layers,
        dropout_rate=dropout_rate,
        activation=activation
    ).to(device)
    
    # Create data loaders
    train_loader = DataLoader(train_dataset_normal, batch_size=batch_size, shuffle=True, num_workers=0)
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=0)
    
    # Train model
    auroc, auprc, val_loss = train_model(
        model, train_loader, val_loader,
        epochs=30,  # Reduced for faster optimization
        learning_rate=learning_rate,
        optimizer_name=optimizer_name,
        model_type='autoencoder'
    )
    
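    # NOTE: train_model() has already finished at this point, so this single
    # report at step 0 cannot shorten training. With n_warmup_steps=10 the
    # MedianPruner also skips step 0, so should_prune() never fires here.
    # Effective pruning requires trial.report() inside the epoch loop.
    trial.report(auroc, 0)
    
    if trial.should_prune():
        raise optuna.TrialPruned()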
    
    return auroc  # Maximize AUROC

# Create and run study
print("🚀 Optimizing Baseline Autoencoder...")
print("=" * 60)

study_ae = optuna.create_study(
    direction='maximize',
    sampler=TPESampler(seed=42),
    pruner=MedianPruner(n_startup_trials=5, n_warmup_steps=10)
)

study_ae.optimize(
    objective_autoencoder,
    n_trials=20,  # Upper bound; the study stops earlier if the timeout is reached
    timeout=600,  # 10-minute budget; a trial that is already running may finish
    show_progress_bar=True
)

print(f"\n✅ Best AUROC: {study_ae.best_value:.4f}")
print("\n📊 Best parameters:")
for key, value in study_ae.best_params.items():
    print(f"   {key}: {value}")

[I 2025-12-21 19:59:41,207] A new study created in memory with name: no-name-2b814046-9690-4ee5-acac-e638ea172b06


🚀 Optimizing Baseline Autoencoder...


Best trial: 0. Best value: 0.948543:   5%|▌         | 1/20 [02:17<43:30, 137.40s/it, 137.40/600 seconds]

[I 2025-12-21 20:01:58,622] Trial 0 finished with value: 0.9485432606184392 and parameters: {'encoding_dim': 28, 'n_layers': 3, 'hidden_dim_0': 112, 'hidden_dim_1': 96, 'hidden_dim_2': 48, 'dropout_rate': 0.16239780813448107, 'activation': 'leaky_relu', 'batch_size': 256, 'learning_rate': 0.00026587543983272726, 'optimizer': 'adamw'}. Best is trial 0 with value: 0.9485432606184392.


Best trial: 1. Best value: 0.950824:  10%|█         | 2/20 [05:51<54:45, 182.54s/it, 351.53/600 seconds]

[I 2025-12-21 20:05:32,752] Trial 1 finished with value: 0.9508243207044917 and parameters: {'encoding_dim': 24, 'n_layers': 2, 'hidden_dim_0': 80, 'hidden_dim_1': 64, 'dropout_rate': 0.3447411578889518, 'activation': 'tanh', 'batch_size': 128, 'learning_rate': 0.0015304852121831476, 'optimizer': 'adamw'}. Best is trial 1 with value: 0.9508243207044917.


Best trial: 1. Best value: 0.950824:  15%|█▌        | 3/20 [06:43<34:49, 122.93s/it, 403.52/600 seconds]

[I 2025-12-21 20:06:24,744] Trial 2 finished with value: 0.950276029197569 and parameters: {'encoding_dim': 16, 'n_layers': 1, 'hidden_dim_0': 128, 'dropout_rate': 0.4862528132298238, 'activation': 'relu', 'batch_size': 512, 'learning_rate': 0.00011715937392307068, 'optimizer': 'adam'}. Best is trial 1 with value: 0.9508243207044917.


Best trial: 1. Best value: 0.950824:  20%|██        | 4/20 [07:33<25:08, 94.30s/it, 453.94/600 seconds]

[I 2025-12-21 20:07:15,162] Trial 3 finished with value: 0.9493728131121193 and parameters: {'encoding_dim': 44, 'n_layers': 1, 'hidden_dim_0': 80, 'dropout_rate': 0.31868411173731187, 'activation': 'leaky_relu', 'batch_size': 512, 'learning_rate': 0.00015030900645056822, 'optimizer': 'adam'}. Best is trial 1 with value: 0.9508243207044917.


Best trial: 4. Best value: 0.955474:  25%|██▌       | 5/20 [09:38<26:15, 105.05s/it, 578.06/600 seconds]

[I 2025-12-21 20:09:19,282] Trial 4 finished with value: 0.9554739163918233 and parameters: {'encoding_dim': 24, 'n_layers': 2, 'hidden_dim_0': 48, 'hidden_dim_1': 112, 'dropout_rate': 0.2427013306774357, 'activation': 'tanh', 'batch_size': 256, 'learning_rate': 0.00024970737145052745, 'optimizer': 'adamw'}. Best is trial 4 with value: 0.9554739163918233.


Best trial: 4. Best value: 0.955474:  30%|███       | 6/20 [10:35<24:43, 105.99s/it, 635.96/600 seconds]

[I 2025-12-21 20:10:17,181] Trial 5 finished with value: 0.9515473539703003 and parameters: {'encoding_dim': 48, 'n_layers': 3, 'hidden_dim_0': 112, 'hidden_dim_1': 32, 'hidden_dim_2': 64, 'dropout_rate': 0.1463476238100519, 'activation': 'relu', 'batch_size': 512, 'learning_rate': 0.0018841476921545091, 'optimizer': 'adam'}. Best is trial 4 with value: 0.9554739163918233.

✅ Best AUROC: 0.9555

📊 Best parameters:
   encoding_dim: 24
   n_layers: 2
   hidden_dim_0: 48
   hidden_dim_1: 112
   dropout_rate: 0.2427013306774357
   activation: tanh
   batch_size: 256
   learning_rate: 0.00024970737145052745
   optimizer: adamw





## 6. Optuna Optimization for VAE

In [9]:
def objective_vae(trial: Trial) -> float:
    """Objective function for optimizing VAE"""
    
    # Hyperparameters to optimize
    latent_dim = trial.suggest_int('latent_dim', 8, 32, step=4)
    n_layers = trial.suggest_int('n_layers', 1, 3)
    
    hidden_layers = []
    for i in range(n_layers):
        hidden_dim = trial.suggest_int(f'hidden_dim_{i}', 32, 128, step=16)
        hidden_layers.append(hidden_dim)
    
    dropout_rate = trial.suggest_float('dropout_rate', 0.1, 0.5)
    activation = trial.suggest_categorical('activation', ['relu', 'leaky_relu', 'elu'])
    beta = trial.suggest_float('beta', 0.1, 2.0)  # KL divergence weight
    
    batch_size = trial.suggest_categorical('batch_size', [128, 256, 512])
    learning_rate = trial.suggest_float('learning_rate', 1e-4, 5e-3, log=True)
    optimizer_name = trial.suggest_categorical('optimizer', ['adam', 'adamw'])
    
    # Create model
    model = OptimizableVAE(
        input_dim=n_features,
        latent_dim=latent_dim,
        hidden_layers=hidden_layers,
        dropout_rate=dropout_rate,
        activation=activation
    ).to(device)
    
    # Create data loaders
    train_loader = DataLoader(train_dataset_normal, batch_size=batch_size, shuffle=True, num_workers=0)
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=0)
    
    # Train model
    auroc, auprc, val_loss = train_model(
        model, train_loader, val_loader,
        epochs=30,
        learning_rate=learning_rate,
        optimizer_name=optimizer_name,
        model_type='vae',
        beta=beta
    )
    
    return auroc

# Create and run study
print("\n🚀 Optimizing VAE...")
print("=" * 60)

study_vae = optuna.create_study(
    direction='maximize',
    sampler=TPESampler(seed=42),
    # Note: objective_vae never calls trial.report(), so this pruner has no effect
    pruner=MedianPruner(n_startup_trials=5, n_warmup_steps=10)
)

study_vae.optimize(
    objective_vae,
    n_trials=20,
    timeout=600,
    show_progress_bar=True
)

print(f"\n✅ Best AUROC: {study_vae.best_value:.4f}")
print("\n📊 Best parameters:")
for key, value in study_vae.best_params.items():
    print(f"   {key}: {value}")

[I 2025-12-21 20:22:17,281] A new study created in memory with name: no-name-107c5ba1-9acc-46f8-8444-a14e54352f5c



🚀 Optimizing VAE...


Best trial: 0. Best value: 0.94744:   5%|▌         | 1/20 [02:21<44:41, 141.12s/it, 141.12/600 seconds]

[I 2025-12-21 20:24:38,400] Trial 0 finished with value: 0.9474395623712979 and parameters: {'latent_dim': 16, 'n_layers': 3, 'hidden_dim_0': 112, 'hidden_dim_1': 96, 'hidden_dim_2': 48, 'dropout_rate': 0.16239780813448107, 'activation': 'leaky_relu', 'beta': 1.4453378978124864, 'batch_size': 256, 'learning_rate': 0.00022948683681130568, 'optimizer': 'adamw'}. Best is trial 0 with value: 0.9474395623712979.


Best trial: 1. Best value: 0.947975:  10%|█         | 2/20 [06:14<58:41, 195.61s/it, 374.88/600 seconds]

[I 2025-12-21 20:28:32,161] Trial 1 finished with value: 0.9479750883126015 and parameters: {'latent_dim': 16, 'n_layers': 2, 'hidden_dim_0': 80, 'hidden_dim_1': 64, 'dropout_rate': 0.3447411578889518, 'activation': 'elu', 'beta': 0.9665329700123683, 'batch_size': 128, 'learning_rate': 0.0010150667045928582, 'optimizer': 'adamw'}. Best is trial 1 with value: 0.9479750883126015.


Best trial: 1. Best value: 0.947975:  15%|█▌        | 3/20 [07:12<37:36, 132.72s/it, 432.75/600 seconds]

[I 2025-12-21 20:29:30,028] Trial 2 finished with value: 0.9444553498183522 and parameters: {'latent_dim': 12, 'n_layers': 1, 'hidden_dim_0': 128, 'dropout_rate': 0.4862528132298238, 'activation': 'relu', 'beta': 1.400042750373098, 'batch_size': 512, 'learning_rate': 0.00011439974749291271, 'optimizer': 'adam'}. Best is trial 1 with value: 0.9479750883126015.


Best trial: 1. Best value: 0.947975:  20%|██        | 4/20 [08:03<26:45, 100.35s/it, 483.48/600 seconds]

[I 2025-12-21 20:30:20,758] Trial 3 finished with value: 0.9435941973179756 and parameters: {'latent_dim': 24, 'n_layers': 1, 'hidden_dim_0': 80, 'dropout_rate': 0.31868411173731187, 'activation': 'leaky_relu', 'beta': 1.8850479889719591, 'batch_size': 512, 'learning_rate': 0.00014136637008121868, 'optimizer': 'adam'}. Best is trial 1 with value: 0.9479750883126015.


Best trial: 4. Best value: 0.948984:  25%|██▌       | 5/20 [09:55<26:06, 104.45s/it, 595.21/600 seconds]

[I 2025-12-21 20:32:12,488] Trial 4 finished with value: 0.9489835679965177 and parameters: {'latent_dim': 16, 'n_layers': 2, 'hidden_dim_0': 48, 'hidden_dim_1': 112, 'dropout_rate': 0.2427013306774357, 'activation': 'leaky_relu', 'beta': 1.6241742634326752, 'batch_size': 256, 'learning_rate': 0.00021757649801197563, 'optimizer': 'adamw'}. Best is trial 4 with value: 0.9489835679965177.


Best trial: 4. Best value: 0.948984:  30%|███       | 6/20 [10:57<25:33, 109.52s/it, 657.10/600 seconds]

[I 2025-12-21 20:33:14,385] Trial 5 finished with value: 0.9485204500175789 and parameters: {'latent_dim': 24, 'n_layers': 3, 'hidden_dim_0': 112, 'hidden_dim_1': 32, 'hidden_dim_2': 64, 'dropout_rate': 0.1463476238100519, 'activation': 'relu', 'beta': 0.2207608655434449, 'batch_size': 512, 'learning_rate': 0.001211130719559933, 'optimizer': 'adam'}. Best is trial 4 with value: 0.9489835679965177.

✅ Best AUROC: 0.9490

📊 Best parameters:
   latent_dim: 16
   n_layers: 2
   hidden_dim_0: 48
   hidden_dim_1: 112
   dropout_rate: 0.2427013306774357
   activation: leaky_relu
   beta: 1.6241742634326752
   batch_size: 256
   learning_rate: 0.00021757649801197563
   optimizer: adamw
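
Only 6 of the 20 requested VAE trials ran because the 600-second budget was exhausted first. The sketch below is a simplified model of that behaviour, assuming the clock is checked before each trial starts and a trial that is already running is allowed to finish. The function `trials_run` is hypothetical; the elapsed times are the cumulative seconds shown in the log above, and the 60-second padding trials are invented to show that no further trial would start.

```python
from typing import List


def trials_run(durations: List[float], n_trials: int, timeout: float) -> int:
    """Count trials started when the clock is checked before each trial."""
    elapsed = 0.0
    started = 0
    for duration in durations[:n_trials]:
        if elapsed >= timeout:
            break               # budget already spent: do not start another trial
        elapsed += duration     # a running trial is allowed to finish
        started += 1
    return started


# Cumulative elapsed seconds after each VAE trial, read from the log above
finish_times = [141.12, 374.88, 432.75, 483.48, 595.21, 657.10]
durations = [b - a for a, b in zip([0.0] + finish_times[:-1], finish_times)]

# Hypothetical 60-second trials that would have followed
durations += [60.0] * 14

n_run = trials_run(durations, n_trials=20, timeout=600)
print(n_run)
```

The sixth trial started at 595.21 s, just inside the budget, and finished at 657.10 s. To explore more of the search space, either raise `timeout` or reduce the cost of each trial, for example with fewer epochs or effective pruning.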





## 7. Optuna Optimization for Deep SVDD


In [10]:
def objective_svdd(trial: Trial) -> float:
    """Objective function for optimizing Deep SVDD"""

    rep_dim = trial.suggest_int('rep_dim', 8, 32, step=4)
    n_layers = trial.suggest_int('n_layers', 1, 3)

    hidden_layers = []
    for i in range(n_layers):
        hidden_dim = trial.suggest_int(f'hidden_dim_{i}', 32, 128, step=16)
        hidden_layers.append(hidden_dim)

    dropout_rate = trial.suggest_float('dropout_rate', 0.0, 0.3)
    activation = trial.suggest_categorical('activation', ['relu', 'leaky_relu', 'elu'])
    learning_rate = trial.suggest_float('learning_rate', 1e-4, 1e-2, log=True)
    batch_size = trial.suggest_categorical('batch_size', [128, 256, 512])

    model = OptimizableDeepSVDD(
        input_dim=n_features,
        rep_dim=rep_dim,
        hidden_layers=hidden_layers,
        dropout_rate=dropout_rate,
        activation=activation
    ).to(device)

    # loaders: train on NORMAL only
    train_loader = DataLoader(train_dataset_normal, batch_size=batch_size, shuffle=True)
    val_loader_local = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

    # init center on normal train
    c = init_svdd_center(model, train_loader, device)

    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    # Train a few epochs (fast)
    model.train()
    for epoch in range(15):
        for x, _ in train_loader:
            x = x.to(device)
            z = model(x)
            loss = torch.mean(torch.sum((z - c) ** 2, dim=1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    # Evaluate on val (score = dist to center)
    model.eval()
    scores = []
    labels = []
    with torch.no_grad():
        for x, y in val_loader_local:
            x = x.to(device)
            z = model(x)
            s = torch.sum((z - c) ** 2, dim=1)
            scores.append(s.detach().cpu().numpy())
            labels.append(y.numpy())

    scores = np.concatenate(scores)
    labels = np.concatenate(labels)

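    # An AUROC of exactly 0.5 here usually means every score is tied,
    # e.g. the network has collapsed to a constant output
    auroc = roc_auc_score(labels, scores)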
    return auroc

print("🚀 Optimizing Deep SVDD...")
print("=" * 60)

study_svdd = optuna.create_study(
    direction='maximize',
    sampler=TPESampler(seed=42),
    # Note: objective_svdd never calls trial.report(), so this pruner has no effect
    pruner=MedianPruner(n_startup_trials=5, n_warmup_steps=10)
)

study_svdd.optimize(
    objective_svdd,
    n_trials=20,
    timeout=600,
    show_progress_bar=True
)

print(f"\n✅ Best AUROC: {study_svdd.best_value:.4f}")
print("\n📊 Best parameters:")
for key, value in study_svdd.best_params.items():
    print(f"   {key}: {value}")


[I 2025-12-21 20:38:23,600] A new study created in memory with name: no-name-b65fcf95-870e-4506-a7a6-e8892e04b251


🚀 Optimizing Deep SVDD...


Best trial: 0. Best value: 0.920607:   5%|▌         | 1/20 [00:20<06:28, 20.45s/it, 20.45/600 seconds]

[I 2025-12-21 20:38:44,050] Trial 0 finished with value: 0.9206069712544575 and parameters: {'rep_dim': 16, 'n_layers': 3, 'hidden_dim_0': 112, 'hidden_dim_1': 96, 'hidden_dim_2': 48, 'dropout_rate': 0.04679835610086079, 'activation': 'leaky_relu', 'learning_rate': 0.0026070247583707684, 'batch_size': 256}. Best is trial 0 with value: 0.9206069712544575.


Best trial: 0. Best value: 0.920607:  10%|█         | 2/20 [00:31<04:33, 15.18s/it, 31.94/600 seconds]

[I 2025-12-21 20:38:55,543] Trial 1 finished with value: 0.4425622792184963 and parameters: {'rep_dim': 12, 'n_layers': 1, 'hidden_dim_0': 48, 'dropout_rate': 0.09127267288786131, 'activation': 'relu', 'learning_rate': 0.0016738085788752138, 'batch_size': 512}. Best is trial 0 with value: 0.9206069712544575.


Best trial: 0. Best value: 0.920607:  15%|█▌        | 3/20 [01:12<07:32, 26.60s/it, 72.13/600 seconds]

[I 2025-12-21 20:39:35,727] Trial 2 finished with value: 0.5 and parameters: {'rep_dim': 20, 'n_layers': 3, 'hidden_dim_0': 48, 'hidden_dim_1': 80, 'hidden_dim_2': 96, 'dropout_rate': 0.013935123815999317, 'activation': 'relu', 'learning_rate': 0.007902619549708232, 'batch_size': 128}. Best is trial 0 with value: 0.9206069712544575.


Best trial: 0. Best value: 0.920607:  20%|██        | 4/20 [01:35<06:47, 25.48s/it, 95.90/600 seconds]

[I 2025-12-21 20:39:59,505] Trial 3 finished with value: 0.5425310977549346 and parameters: {'rep_dim': 8, 'n_layers': 3, 'hidden_dim_0': 80, 'hidden_dim_1': 32, 'hidden_dim_2': 80, 'dropout_rate': 0.010316556334565519, 'activation': 'relu', 'learning_rate': 0.00042016720543725303, 'batch_size': 256}. Best is trial 0 with value: 0.9206069712544575.


Best trial: 4. Best value: 0.934813:  25%|██▌       | 5/20 [01:51<05:27, 21.84s/it, 111.29/600 seconds]

[I 2025-12-21 20:40:14,886] Trial 4 finished with value: 0.9348131623445113 and parameters: {'rep_dim': 32, 'n_layers': 3, 'hidden_dim_0': 128, 'hidden_dim_1': 128, 'hidden_dim_2': 96, 'dropout_rate': 0.27656227050693505, 'activation': 'leaky_relu', 'learning_rate': 0.0004473636174621265, 'batch_size': 512}. Best is trial 4 with value: 0.9348131623445113.


Best trial: 4. Best value: 0.934813:  30%|███       | 6/20 [02:02<04:16, 18.30s/it, 122.70/600 seconds]

[I 2025-12-21 20:40:26,298] Trial 5 finished with value: 0.9036193517603925 and parameters: {'rep_dim': 16, 'n_layers': 1, 'hidden_dim_0': 80, 'dropout_rate': 0.04227726749242879, 'activation': 'elu', 'learning_rate': 0.0035033984911586884, 'batch_size': 512}. Best is trial 4 with value: 0.9348131623445113.


Best trial: 4. Best value: 0.934813:  35%|███▌      | 7/20 [02:16<03:38, 16.78s/it, 136.36/600 seconds]

[I 2025-12-21 20:40:39,961] Trial 6 finished with value: 0.5466014297433492 and parameters: {'rep_dim': 24, 'n_layers': 3, 'hidden_dim_0': 112, 'hidden_dim_1': 32, 'hidden_dim_2': 64, 'dropout_rate': 0.03476071785753891, 'activation': 'relu', 'learning_rate': 0.00013400367243354819, 'batch_size': 512}. Best is trial 4 with value: 0.9348131623445113.


Best trial: 4. Best value: 0.934813:  40%|████      | 8/20 [02:55<04:46, 23.89s/it, 175.47/600 seconds]

[I 2025-12-21 20:41:19,072] Trial 7 finished with value: 0.9197543988883495 and parameters: {'rep_dim': 24, 'n_layers': 3, 'hidden_dim_0': 80, 'hidden_dim_1': 32, 'hidden_dim_2': 96, 'dropout_rate': 0.22823551458506922, 'activation': 'leaky_relu', 'learning_rate': 0.0011103647313054626, 'batch_size': 128}. Best is trial 4 with value: 0.9348131623445113.


Best trial: 8. Best value: 0.935126:  45%|████▌     | 9/20 [03:15<04:10, 22.73s/it, 195.65/600 seconds]

[I 2025-12-21 20:41:39,251] Trial 8 finished with value: 0.9351262326095328 and parameters: {'rep_dim': 8, 'n_layers': 2, 'hidden_dim_0': 64, 'hidden_dim_1': 80, 'dropout_rate': 0.2722699421778279, 'activation': 'elu', 'learning_rate': 0.00028681134821030097, 'batch_size': 256}. Best is trial 8 with value: 0.9351262326095328.


Best trial: 8. Best value: 0.935126:  50%|█████     | 10/20 [03:59<04:52, 29.21s/it, 239.36/600 seconds]

[I 2025-12-21 20:42:22,964] Trial 9 finished with value: 0.5 and parameters: {'rep_dim': 32, 'n_layers': 3, 'hidden_dim_0': 96, 'hidden_dim_1': 128, 'hidden_dim_2': 112, 'dropout_rate': 0.05597101766581075, 'activation': 'relu', 'learning_rate': 0.006197015748809144, 'batch_size': 128}. Best is trial 8 with value: 0.9351262326095328.


Best trial: 10. Best value: 0.945341:  55%|█████▌    | 11/20 [04:19<03:58, 26.52s/it, 259.79/600 seconds]

[I 2025-12-21 20:42:43,390] Trial 10 finished with value: 0.9453407778205623 and parameters: {'rep_dim': 8, 'n_layers': 2, 'hidden_dim_0': 32, 'hidden_dim_1': 80, 'dropout_rate': 0.16968392720442094, 'activation': 'elu', 'learning_rate': 0.00010744583847151158, 'batch_size': 256}. Best is trial 10 with value: 0.9453407778205623.


Best trial: 10. Best value: 0.945341:  60%|██████    | 12/20 [04:40<03:17, 24.69s/it, 280.29/600 seconds]

[I 2025-12-21 20:43:03,889] Trial 11 finished with value: 0.9425587216018483 and parameters: {'rep_dim': 8, 'n_layers': 2, 'hidden_dim_0': 32, 'hidden_dim_1': 80, 'dropout_rate': 0.16362107695598174, 'activation': 'elu', 'learning_rate': 0.00010817040430024159, 'batch_size': 256}. Best is trial 10 with value: 0.9453407778205623.


Best trial: 12. Best value: 0.950978:  65%|██████▌   | 13/20 [05:00<02:44, 23.44s/it, 300.85/600 seconds]

[I 2025-12-21 20:43:24,445] Trial 12 finished with value: 0.9509781353066247 and parameters: {'rep_dim': 8, 'n_layers': 2, 'hidden_dim_0': 32, 'hidden_dim_1': 64, 'dropout_rate': 0.16094075105530872, 'activation': 'elu', 'learning_rate': 0.00011724148210767077, 'batch_size': 256}. Best is trial 12 with value: 0.9509781353066247.


Best trial: 12. Best value: 0.950978:  70%|███████   | 14/20 [05:21<02:14, 22.50s/it, 321.18/600 seconds]

[I 2025-12-21 20:43:44,777] Trial 13 finished with value: 0.9330464917714419 and parameters: {'rep_dim': 12, 'n_layers': 2, 'hidden_dim_0': 32, 'hidden_dim_1': 64, 'dropout_rate': 0.15866206108670258, 'activation': 'elu', 'learning_rate': 0.00018591582929250583, 'batch_size': 256}. Best is trial 12 with value: 0.9509781353066247.


Best trial: 12. Best value: 0.950978:  75%|███████▌  | 15/20 [05:41<01:49, 21.95s/it, 341.87/600 seconds]

[I 2025-12-21 20:44:05,470] Trial 14 finished with value: 0.9302058395138203 and parameters: {'rep_dim': 12, 'n_layers': 2, 'hidden_dim_0': 48, 'hidden_dim_1': 64, 'dropout_rate': 0.215182610900573, 'activation': 'elu', 'learning_rate': 0.00024335201193979744, 'batch_size': 256}. Best is trial 12 with value: 0.9509781353066247.


Best trial: 12. Best value: 0.950978:  80%|████████  | 16/20 [05:58<01:21, 20.44s/it, 358.80/600 seconds]

[I 2025-12-21 20:44:22,402] Trial 15 finished with value: 0.9369301535216219 and parameters: {'rep_dim': 16, 'n_layers': 1, 'hidden_dim_0': 32, 'dropout_rate': 0.11549037093671591, 'activation': 'elu', 'learning_rate': 0.000588705649689643, 'batch_size': 256}. Best is trial 12 with value: 0.9509781353066247.


Best trial: 12. Best value: 0.950978:  85%|████████▌ | 17/20 [06:19<01:01, 20.49s/it, 379.41/600 seconds]

[I 2025-12-21 20:44:43,006] Trial 16 finished with value: 0.9371787681438449 and parameters: {'rep_dim': 8, 'n_layers': 2, 'hidden_dim_0': 64, 'hidden_dim_1': 64, 'dropout_rate': 0.20104179631290428, 'activation': 'elu', 'learning_rate': 0.00011148983987926683, 'batch_size': 256}. Best is trial 12 with value: 0.9509781353066247.


Best trial: 12. Best value: 0.950978:  90%|█████████ | 18/20 [06:41<00:42, 21.08s/it, 401.84/600 seconds]

[I 2025-12-21 20:45:05,443] Trial 17 finished with value: 0.9396527766151579 and parameters: {'rep_dim': 12, 'n_layers': 2, 'hidden_dim_0': 64, 'hidden_dim_1': 96, 'dropout_rate': 0.11832220835595152, 'activation': 'elu', 'learning_rate': 0.0008695124160413602, 'batch_size': 256}. Best is trial 12 with value: 0.9509781353066247.


Best trial: 12. Best value: 0.950978:  95%|█████████▌| 19/20 [07:02<00:20, 20.81s/it, 422.03/600 seconds]

[I 2025-12-21 20:45:25,626] Trial 18 finished with value: 0.9267959685925232 and parameters: {'rep_dim': 20, 'n_layers': 2, 'hidden_dim_0': 32, 'hidden_dim_1': 48, 'dropout_rate': 0.18586079650625612, 'activation': 'elu', 'learning_rate': 0.0001817641488996693, 'batch_size': 256}. Best is trial 12 with value: 0.9509781353066247.


Best trial: 12. Best value: 0.950978: 100%|██████████| 20/20 [07:30<00:00, 22.51s/it, 450.12/600 seconds]

[I 2025-12-21 20:45:53,719] Trial 19 finished with value: 0.9398160084378295 and parameters: {'rep_dim': 28, 'n_layers': 1, 'hidden_dim_0': 48, 'dropout_rate': 0.2338894031594811, 'activation': 'elu', 'learning_rate': 0.00030867472356706834, 'batch_size': 128}. Best is trial 12 with value: 0.9509781353066247.

✅ Best AUROC: 0.9510

📊 Best parameters:
   rep_dim: 8
   n_layers: 2
   hidden_dim_0: 32
   hidden_dim_1: 64
   dropout_rate: 0.16094075105530872
   activation: elu
   learning_rate: 0.00011724148210767077
   batch_size: 256
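
The flat parameter names in this output (`n_layers`, `hidden_dim_0`, `hidden_dim_1`, ...) need to be turned back into an ordered list of layer widths before a final model can be rebuilt. A minimal sketch, assuming the per-layer widths are always named `hidden_dim_<i>` as in the trial logs; the helper name `hidden_dims_from_params` is hypothetical and not part of the notebook:

```python
def hidden_dims_from_params(params: dict) -> list:
    """Collect hidden_dim_0 .. hidden_dim_{n_layers - 1} into an ordered list."""
    return [params[f"hidden_dim_{i}"] for i in range(params["n_layers"])]


# Best parameters, copied from the output above.
best_params = {
    "rep_dim": 8,
    "n_layers": 2,
    "hidden_dim_0": 32,
    "hidden_dim_1": 64,
    "dropout_rate": 0.16094075105530872,
    "activation": "elu",
    "learning_rate": 0.00011724148210767077,
    "batch_size": 256,
}

hidden_dims = hidden_dims_from_params(best_params)
print(hidden_dims)  # [32, 64]
```

Trials with fewer layers simply omit the unused `hidden_dim_<i>` keys (see Trial 15 and Trial 19 in the log), which is why the helper reads only the first `n_layers` entries.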





## 8. Threshold Optimization for Business Metrics

In [11]:
def calculate_business_metric(threshold, scores, labels, avg_fraud_amount=122.21, investigation_cost=2):
    """Return the net financial benefit of flagging every score above `threshold`."""
    
    predictions = (scores > threshold).astype(int)
    # labels=[0, 1] keeps the matrix 2x2 even when only one class is present
    tn, fp, fn, tp = confusion_matrix(labels, predictions, labels=[0, 1]).ravel()
    
    # Financial calculations
    fraud_prevented = tp * avg_fraud_amount               # value of caught fraud
    fraud_missed = fn * avg_fraud_amount                  # loss from undetected fraud
    investigation_costs = (tp + fp) * investigation_cost  # every flagged transaction is reviewed
    
    net_benefit = fraud_prevented - fraud_missed - investigation_costs
    
    return net_benefit

def objective_threshold(trial: Trial) -> float:
    """Optimize threshold for maximum business value"""
    
    # Suggest threshold percentile
    threshold_percentile = trial.suggest_float('threshold_percentile', 90, 99.5)
    
    # Note: In practice, you would use actual model scores
    # For demonstration, we'll use simulated scores
    np.random.seed(42)
    n_samples = 1000
    scores = np.random.exponential(1, n_samples)
    labels = np.random.binomial(1, 0.002, n_samples)  # 0.2% fraud rate
    
    threshold = np.percentile(scores[labels == 0], threshold_percentile)
    net_benefit = calculate_business_metric(threshold, scores, labels)
    
    return net_benefit

# Create and run study
print("\n🚀 Optimizing Decision Threshold...")
print("=" * 60)

study_threshold = optuna.create_study(
    direction='maximize',
    sampler=TPESampler(seed=42)
)

study_threshold.optimize(
    objective_threshold,
    n_trials=50,
    show_progress_bar=True
)

print(f"\n✅ Best net benefit: ${study_threshold.best_value:.2f}")
print(f"📊 Best threshold percentile: {study_threshold.best_params['threshold_percentile']:.2f}%")

[I 2025-12-21 20:46:01,862] A new study created in memory with name: no-name-20d65eb3-83b4-44f8-89e8-b75fd0fb4985



🚀 Optimizing Decision Threshold...


Best trial: 34. Best value: -72: 100%|██████████| 50/50 [00:00<00:00, 453.04it/s]

[I 2025-12-21 20:46:01,868] Trial 0 finished with value: -132.0 and parameters: {'threshold_percentile': 93.55813112904994}. Best is trial 0 with value: -132.0.
[I 2025-12-21 20:46:01,872] Trial 1 finished with value: -264.41999999999996 and parameters: {'threshold_percentile': 99.0317859108942}. Best is trial 0 with value: -132.0.
[I 2025-12-21 20:46:01,875] Trial 2 finished with value: -306.41999999999996 and parameters: {'threshold_percentile': 96.95394244720835}. Best is trial 0 with value: -132.0.
[I 2025-12-21 20:46:01,877] Trial 3 finished with value: -88.0 and parameters: {'threshold_percentile': 95.68725559987185}. Best is trial 3 with value: -88.0.
[I 2025-12-21 20:46:01,880] Trial 4 finished with value: -172.0 and parameters: {'threshold_percentile': 91.48217708420314}. Best is trial 3 with value: -88.0.
[I 2025-12-21 20:46:01,882] Trial 5 finished with value: -172.0 and parameters: {'threshold_percentile': 91.48194794319393}. Best is trial 3 with value: -88.0.
[I 2025-12-21
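
To make the net-benefit formula in `calculate_business_metric` concrete, the same arithmetic can be applied directly to confusion-matrix counts. This is a sketch only: the counts below are hypothetical and chosen for illustration, and `net_benefit_from_counts` is not part of the notebook.

```python
def net_benefit_from_counts(tp, fp, fn, avg_fraud_amount=122.21, investigation_cost=2):
    """Same formula as calculate_business_metric, starting from counts."""
    fraud_prevented = tp * avg_fraud_amount
    fraud_missed = fn * avg_fraud_amount
    investigation_costs = (tp + fp) * investigation_cost  # all flagged cases are reviewed
    return fraud_prevented - fraud_missed - investigation_costs


# Hypothetical outcome: 1 of 2 frauds caught, 43 legitimate transactions flagged.
print(net_benefit_from_counts(tp=1, fp=43, fn=1))  # -88.0

# Hypothetical outcome: both frauds missed, 10 legitimate transactions flagged.
print(net_benefit_from_counts(tp=0, fp=10, fn=2))
```

With a 0.2% fraud rate, 1,000 simulated samples contain only about two positives on average, so the cost of investigating false positives can outweigh any fraud prevented. That is consistent with the negative best value (-72) reported in the progress bar above.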




## 9. Multi-Objective Optimization

In [12]:
def objective_multi(trial: Trial) -> tuple:
    """Multi-objective optimization for precision and recall"""
    
    # Suggest threshold
    threshold_percentile = trial.suggest_float('threshold_percentile', 85, 99)
    
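    # Simulated scores and labels (replace with actual model outputs)
    # NOTE: reseeding with trial.number scores every trial on a different simulated
    # dataset, so precision/recall are not directly comparable across trials
    np.random.seed(trial.number)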
    n_samples = 1000
    scores = np.random.exponential(1, n_samples)
    labels = np.random.binomial(1, 0.002, n_samples)
    
    threshold = np.percentile(scores[labels == 0], threshold_percentile)
    predictions = (scores > threshold).astype(int)
    
    # Calculate metrics
    precision = precision_score(labels, predictions, zero_division=0)
    recall = recall_score(labels, predictions, zero_division=0)
    
    return precision, recall  # Maximize both

# Create multi-objective study
print("\n🚀 Multi-Objective Optimization (Precision vs Recall)...")
print("=" * 60)

study_multi = optuna.create_study(
    directions=['maximize', 'maximize'],
    sampler=TPESampler(seed=42)
)

study_multi.optimize(
    objective_multi,
    n_trials=100,
    show_progress_bar=True
)

print(f"\n✅ Found {len(study_multi.best_trials)} Pareto optimal solutions")

# Display Pareto front
print("\n📊 Sample Pareto optimal solutions:")
for i, trial in enumerate(study_multi.best_trials[:5]):
    print(f"   Solution {i+1}: Precision={trial.values[0]:.3f}, Recall={trial.values[1]:.3f}, "
          f"Threshold={trial.params['threshold_percentile']:.1f}%")

[I 2025-12-21 20:46:06,436] A new study created in memory with name: no-name-81c2a98f-d8dc-498b-9de8-0e1c2b2e5dcf



🚀 Multi-Objective Optimization (Precision vs Recall)...


 64%|██████▍   | 64/100 [00:00<00:00, 319.15it/s]

[I 2025-12-21 20:46:06,442] Trial 0 finished with values: [0.0, 0.0] and parameters: {'threshold_percentile': 90.24356166386308}.
[I 2025-12-21 20:46:06,447] Trial 1 finished with values: [0.0, 0.0] and parameters: {'threshold_percentile': 98.31000028973882}.
[I 2025-12-21 20:46:06,450] Trial 2 finished with values: [0.0, 0.0] and parameters: {'threshold_percentile': 95.24791518535967}.
[I 2025-12-21 20:46:06,454] Trial 3 finished with values: [0.0, 0.0] and parameters: {'threshold_percentile': 93.38121877875851}.
[I 2025-12-21 20:46:06,458] Trial 4 finished with values: [0.0, 0.0] and parameters: {'threshold_percentile': 87.18426096619412}.
[I 2025-12-21 20:46:06,461] Trial 5 finished with values: [0.0, 0.0] and parameters: {'threshold_percentile': 87.18392328470684}.
[I 2025-12-21 20:46:06,464] Trial 6 finished with values: [0.013888888888888888, 0.6666666666666666] and parameters: {'threshold_percentile': 85.8131705703548}.
[I 2025-12-21 20:46:06,466] Trial 7 finished with values: [

 66%|██████▌   | 66/100 [00:00<00:00, 329.86it/s]

[I 2025-12-21 20:46:06,639] Trial 65 finished with values: [0.0, 0.0] and parameters: {'threshold_percentile': 93.93734332358481}.
[I 2025-12-21 20:46:06,642] Trial 66 finished with values: [0.0, 0.0] and parameters: {'threshold_percentile': 98.14062695225466}.


100%|██████████| 100/100 [00:00<00:00, 317.70it/s]

[I 2025-12-21 20:46:06,644] Trial 67 finished with values: [0.0, 0.0] and parameters: {'threshold_percentile': 93.026199561299}.
[I 2025-12-21 20:46:06,649] Trial 68 finished with values: [0.0, 0.0] and parameters: {'threshold_percentile': 98.63786655161272}.
[I 2025-12-21 20:46:06,653] Trial 69 finished with values: [0.0, 0.0] and parameters: {'threshold_percentile': 96.32190380186331}.
[I 2025-12-21 20:46:06,657] Trial 70 finished with values: [0.0, 0.0] and parameters: {'threshold_percentile': 96.85712063739388}.
[I 2025-12-21 20:46:06,661] Trial 71 finished with values: [0.0, 0.0] and parameters: {'threshold_percentile': 89.78170132909689}.
[I 2025-12-21 20:46:06,665] Trial 72 finished with values: [0.0, 0.0] and parameters: {'threshold_percentile': 97.56942518074909}.
[I 2025-12-21 20:46:06,669] Trial 73 finished with values: [0.0, 0.0] and parameters: {'threshold_percentile': 91.77207898018483}.
[I 2025-12-21 20:46:06,673] Trial 74 finished with values: [0.0, 0.0] and parameters:
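
`study_multi.best_trials` returns the trials on the Pareto front, meaning those not dominated by any other trial. The idea can be sketched in plain Python without Optuna. The function names and the sample (precision, recall) points are illustrative and are not taken from the run above:

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (both objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative (precision, recall) pairs.
trials = [(0.0, 0.0), (0.014, 0.667), (0.05, 0.5), (0.2, 0.25), (0.01, 0.5)]
front = pareto_front(trials)
print(front)  # [(0.014, 0.667), (0.05, 0.5), (0.2, 0.25)]
```

A trial that scores `[0.0, 0.0]`, as many do in the log above, is dominated by any trial with non-zero precision and recall, so it never appears on the front.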




## 10. Visualization of Optimization Results

In [13]:
# Optimization history for Autoencoder
fig_ae = plot_optimization_history(study_ae)
fig_ae.update_layout(title="Autoencoder Optimization History")
fig_ae.show()

# Parameter importance for Autoencoder
try:
    fig_importance = plot_param_importances(study_ae)
    fig_importance.update_layout(title="Autoencoder Parameter Importance")
    fig_importance.show()
except Exception:
    print("Could not plot parameter importance (may need more trials)")

In [15]:
# Optimization history for VAE
fig_vae = plot_optimization_history(study_vae)
fig_vae.update_layout(title="VAE Optimization History")
fig_vae.show()

# Parallel coordinate plot for VAE
try:
    fig_parallel = plot_parallel_coordinate(study_vae)
    fig_parallel.update_layout(title="VAE Hyperparameter Relationships")
    fig_parallel.show()
except Exception:
    print("Could not create parallel coordinate plot")

In [14]:
# Pareto front for multi-objective optimization
try:
    fig_pareto = plot_pareto_front(study_multi, target_names=['Precision', 'Recall'])
    fig_pareto.update_layout(title="Pareto Front: Precision vs Recall Trade-off")
    fig_pareto.show()
except Exception:
    # Fall back to a manual Pareto front plot
    precisions = [t.values[0] for t in study_multi.trials]
    recalls = [t.values[1] for t in study_multi.trials]
    
    fig = go.Figure()
    fig.add_trace(go.Scatter(
        x=recalls, y=precisions,
        mode='markers',
        marker=dict(size=8, color=COLORS['primary'], opacity=0.6),
        name='All Trials'
    ))
    
    # Highlight Pareto optimal solutions
    pareto_precisions = [t.values[0] for t in study_multi.best_trials]
    pareto_recalls = [t.values[1] for t in study_multi.best_trials]
    
    fig.add_trace(go.Scatter(
        x=pareto_recalls, y=pareto_precisions,
        mode='markers',
        marker=dict(size=12, color=COLORS['fraud'], symbol='star'),
        name='Pareto Optimal'
    ))
    
    fig.update_layout(
        title="Pareto Front: Precision vs Recall Trade-off",
        xaxis_title="Recall",
        yaxis_title="Precision",
        template='plotly_white'
    )
    fig.show()

## 11. Save Optimization Results

In [19]:
# Create optimization results directory
optim_dir = Path('optimization_results')
optim_dir.mkdir(exist_ok=True)

# Save study results
optimization_results = {
    'autoencoder': {
        'best_params': study_ae.best_params,
        'best_value': study_ae.best_value,
        'n_trials': len(study_ae.trials)
    },
    'vae': {
        'best_params': study_vae.best_params,
        'best_value': study_vae.best_value,
        'n_trials': len(study_vae.trials)
    },
    'svdd': {
        'best_params': study_svdd.best_params,
        'best_value': study_svdd.best_value,
        'n_trials': len(study_svdd.trials)
    },

    'threshold': {
        'best_percentile': study_threshold.best_params['threshold_percentile'],
        'best_value': study_threshold.best_value,
        'n_trials': len(study_threshold.trials)
    },
    'multi_objective': {
        'n_pareto_solutions': len(study_multi.best_trials),
        'n_trials': len(study_multi.trials)
    }
}

# Save to JSON
with open(optim_dir / 'optimization_results.json', 'w') as f:
    json.dump(optimization_results, f, indent=2, default=str)

# Save Optuna studies
import joblib
joblib.dump(study_ae, optim_dir / 'study_autoencoder.pkl')
joblib.dump(study_vae, optim_dir / 'study_vae.pkl')
joblib.dump(study_svdd, optim_dir / 'study_svdd.pkl')
joblib.dump(study_threshold, optim_dir / 'study_threshold.pkl')
joblib.dump(study_multi, optim_dir / 'study_multi.pkl')

print("\n💾 Optimization results saved to:", optim_dir.absolute())


💾 Optimization results saved to: /Users/inigo_ocariz/src/ml-zoomcamp-2025/ml-zoomcamp-2025/fraud-detection/notebooks/optimization_results
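
The JSON summary can be read back later to choose a model for retraining. A minimal sketch, assuming the file keeps the structure written above. The values in `example` are placeholders rather than results from this run, and `best_model` is a hypothetical helper:

```python
import json
import tempfile
from pathlib import Path


def best_model(results: dict, models=("autoencoder", "vae", "svdd")) -> str:
    """Return the model key with the highest stored best_value."""
    return max(models, key=lambda name: results[name]["best_value"])


# Placeholder summary with the same layout as optimization_results.json.
example = {
    "autoencoder": {"best_params": {}, "best_value": 0.90, "n_trials": 20},
    "vae": {"best_params": {}, "best_value": 0.92, "n_trials": 20},
    "svdd": {"best_params": {}, "best_value": 0.95, "n_trials": 20},
}

# Round-trip through a temporary file to mimic loading the saved summary.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "optimization_results.json"
    path.write_text(json.dumps(example, indent=2))
    loaded = json.loads(path.read_text())

print(best_model(loaded))  # svdd
```

The pickled studies can be restored in the same spirit with `joblib.load(optim_dir / 'study_svdd.pkl')`, which gives back the full `Study` object, including every trial.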


## 12. Final Recommendations

## 13. Next Steps

### 🚀 Immediate Actions:

1. **Retrain Best Models**:
   - Use the optimized hyperparameters to train final models
   - Train for more epochs with the best configurations
   - Use full dataset if time permits

2. **Production Deployment**:
   - Implement the optimized ensemble with best weights
   - Set threshold based on business requirements
   - Create monitoring for model performance

3. **Further Optimization**:
   - Run more trials for better convergence
   - Try Bayesian optimization with Gaussian Processes
   - Implement automated retraining pipeline

4. **Business Integration**:
   - Present Pareto front to stakeholders
   - Let business choose precision/recall trade-off
   - Implement A/B testing framework

### 📈 Advanced Techniques to Try:

- **Neural Architecture Search (NAS)**
- **Population-based Training**
- **Hyperband optimization**
- **Multi-fidelity optimization**

### ✅ Optimization Complete!

You've successfully optimized:
- Model architectures
- Training hyperparameters
- Ensemble weights
- Decision thresholds
- Multi-objective trade-offs

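Your fraud detection system now has tuned hyperparameters ready for final retraining! 🎉 The threshold and multi-objective studies in this notebook ran on simulated scores, so rerun them on real model outputs before relying on those results.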