# Fraud Detection System using Machine Learning

## Problem Statement

Credit card fraud poses a serious risk to financial institutions, with fraudulent transactions being extremely rare compared to legitimate ones. This results in highly imbalanced datasets, where traditional machine learning models often fail to detect rare fraud cases. The objective of this project is to build a fraud detection system that can accurately identify fraudulent transactions while minimizing false alarms, leveraging resampling techniques, robust evaluation metrics, and MLflow for experiment tracking and deployment readiness.

### Approach

1. **Data Understanding & Exploration**
2. **Data Preprocessing**
3. **Feature Engineering**
4. **Experiment Tracking with MLflow**
5. **Model Building**
6. **Model Evaluation**
7. **Hyperparameter Tuning**
8. **Final Model Selection & Deployment**
9. **Conclusion & Recommendations**

## Data Understanding and Exploration

The dataset contains credit card transactions made by European cardholders in Sept 2013. It includes 284,807 transactions, out of which only 492 are fraudulent, making the dataset highly imbalanced — fraud cases represent just 0.172% of the total.

To preserve confidentiality, all features (except 'Time' and 'Amount') have been transformed using Principal Component Analysis (PCA), resulting in 28 anonymized features labeled `V1` to `V28`.
- `Time`: This shows how many seconds have passed since the first transaction in the dataset.
- `Amount`: It is the transaction value.
- `Class`: It is the target variable, where 1 indicates fraud and 0 indicates a legitimate transaction.

### 1. Basic Data Analysis

In [None]:
# Required libraries
import os
import time
import json
import hashlib
import tempfile
import warnings
import subprocess
import numpy as np
import pandas as pd
import seaborn as sns
from pathlib import Path
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

sns.set_style("whitegrid")
warnings.filterwarnings("ignore")
pd.options.display.max_columns = None

import mlflow.xgboost
from mlflow.entities import ViewType
from mlflow.tracking import MlflowClient
from imblearn.combine import SMOTETomek
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import TomekLinks
from sklearn.preprocessing import RobustScaler
from mlflow.models.signature import infer_signature
from sklearn.model_selection import train_test_split

import mlflow
import optuna
from xgboost import XGBClassifier
from optuna.exceptions import TrialPruned
from sklearn.ensemble import IsolationForest
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import StratifiedKFold
from optuna.pruners import SuccessiveHalvingPruner
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (roc_auc_score, average_precision_score, precision_score,
                             recall_score, precision_recall_curve, f1_score,roc_curve,
                             confusion_matrix, ConfusionMatrixDisplay)

In [None]:
# Define the data directory
dir_path = Path("data")
dir_path.mkdir(exist_ok=True)
zip_file_path = dir_path / "creditcardfraud.zip"

if not dir_path.glob("*.csv"):
    # Download the ZIP file
    !curl -L -o {zip_file_path} https://www.kaggle.com/api/v1/datasets/download/mlg-ulb/creditcardfraud

    # Unzip the data file
    !unzip -o {zip_file_path} -d {dir_path}

    # Remove the ZIP file
    zip_file_path.unlink()
else:
    print("Dataset already present in the directory")

In [None]:
# Load the dataset
csv_files = list(dir_path.glob('*.csv'))
if csv_files:
    print(f"Loading `{csv_files[0]}` as DataFrame")
    df = pd.read_csv(csv_files[0])
    print("Shape of the DataFrame: ", df.shape)
else:
    raise FileNotFoundError("No CSV file found in the directory.")

In [None]:
# Top three rows
df.head(3)

In [None]:
# Data info
df.info()

In [None]:
# Statistical Info
df.describe().T

In [None]:
# Check for class imbalance
df.value_counts(subset=["Class"])

In [None]:
# Check for null values
df.isnull().sum().sum()

In [None]:
# Check for duplicated rows
df.duplicated().sum()

In [None]:
# Duplicate count if time column is removed
df.drop(columns=['Time']).duplicated().sum()

#### Observations till now
* **Shape and Features**: The dataset contains 284,807 transactions and 31 columns. The features `V1` through `V28` are anonymized (due to PCA), with Time and Amount being the only non-anonymized features.
* **Data Integrity**: There are no missing or null values in the dataset.
* **Class Imbalance**: The dataset is highly imbalanced, with fraudulent transactions (Class=1) making up only 0.173% of the total.
* **Duplicate Records**: There are 1,081 fully duplicated rows. This number increases significantly to 9,144 when the `Time` column is excluded, indicating that many transactions are identical in all respects except for the time they occurred.

### 2. Exploratory Data Analysis

In [None]:
# Drop the time column
df.drop(columns=['Time'], inplace=True)

# Drop the duplicated rows
df.drop_duplicates(keep='first', inplace=True)

# Imbalance data after removing duplicates
df.value_counts(subset=["Class"])

#### 2.1 Univariate Analysis

`V1`-`V28` columns already came through PCA which centers the data (mean ≈ 0) and scales variance. Most algorithms (especially tree-based) can use these directly without any extra preprocessing, so we will only plot `Amount` column.

In [None]:
# Amount distribution
sns.histplot(data=df, x='Amount', kde=True)
plt.title("Distribution of Transaction Amount");

The distribution is highly skewed to the right, with the majority of transactions involving small amounts. Apply log transformation to handle the skewness and drop the skewed `Amount` column

In [None]:
# Log-transformation for skew data
df["Log_Amount"] = np.log1p(df['Amount'])
df.drop("Amount", axis=1, inplace=True)

# Plot the log amount
sns.histplot(data=df, x="Log_Amount", kde=True)
plt.title("Distribution of Log Transformed Amount");

#### 2.2    Bivariate Analysis

In [None]:
# Check outliers using Boxplot
plt.figure(figsize=(8,5))
sns.boxplot(x='Class', y='Log_Amount', data=df)
plt.title('Log Transaction Amount by Class')
plt.show()

The outliers present in the legit transaction class can be rare case and is fine to keep for now

### 2.3 Multivariate Analysis

In [None]:
plt.figure(figsize=(18, 10))
sns.heatmap(df.corr(), cmap='Blues', annot=True, fmt='.2f')
plt.title('Correlation Matrix')
plt.show()

The features are mostly uncorrelated with each other means no linear relationship.

In [None]:
# Top 5 correlated features with target
corr_with_target = df.corrwith(df["Class"], method="pearson").abs().drop(labels=["Class"], errors="ignore")
corr_with_target.sort_values(ascending=False).head(5)

## Data Preprocessing

### 1. Train Test Split

In [None]:
# Seprate the features and target
X = df.drop(columns=["Class"])
y = df["Class"]

# Initial Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Shape after the split
X_train.shape, X_test.shape, y_train.shape, y_test.shape

### 2. Feature Scaling

In [None]:
# Instantiate the scaler
scaler = RobustScaler()

# Scale all features
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled  = scaler.transform(X_test)

In [None]:
# Plot the log amount
sns.histplot(data=pd.DataFrame(X_train_scaled, columns=X_train.columns), x="Log_Amount", kde=True)
plt.title("Distribution of Log Transformed Amount");

### 3. Class Imbalance Handling

#### 3.1 Resampling using Tomek (Under-Sampling Technique) 

In [None]:
tomek = TomekLinks(n_jobs=-1)
X_train_res1, y_train_res1 = tomek.fit_resample(X_train_scaled, y_train)
X_train_res1.shape, y_train_res1.shape

#### 3.2 Resampling using SMOTE (Over-Sampling Technique)

In [None]:
smote = SMOTE(random_state=42)
X_train_res2, y_train_res2 = smote.fit_resample(X_train_scaled, y_train)
X_train_res2.shape, y_train_res2.shape

#### 3.3 Resampling using SMOTE + Tomek (Hybrid-Sampling Technique)

In [None]:
smote_tomek = SMOTETomek(random_state=42, n_jobs=-1)
X_train_res3, y_train_res3 = smote_tomek.fit_resample(X_train_scaled, y_train)
X_train_res3.shape, y_train_res3.shape

## Model Development

### Baseline Model

In [None]:
# MLflow experiment setup
mlflow.set_tracking_uri("http://127.0.0.1:5000/")
mlflow.set_experiment(experiment_name="fraud_detection_baseline")

# Map pre-resampled datasets to names
resampled_datasets = {
    "TomekLinks": (X_train_res1, y_train_res1),
    "SMOTE": (X_train_res2, y_train_res2),
    "SMOTE_Tomek": (X_train_res3, y_train_res3),
}

# Outer training loop
for resample_name, (X_train_res, y_train_res) in resampled_datasets.items():
    print("---"*10 + f" Training on {resample_name} resampled data " + "---"*10)

    # Per-resample imbalance for XGBoost
    neg_count = int(np.sum(y_train_res == 0))
    pos_count = int(np.sum(y_train_res == 1))
    scale_pos_weight_val = neg_count / pos_count if pos_count > 0 else 1.0

    # Models (CPU-friendly hyperparams)
    models = {
        "MLPClassifier": MLPClassifier(
            hidden_layer_sizes=(64, 32),
            activation='relu',
            solver='adam',
            alpha=1e-4,
            learning_rate_init=1e-3,
            max_iter=200,
            early_stopping=True,
            validation_fraction=0.1,
            random_state=42,
            verbose=False
        ),
        "LogisticRegression": LogisticRegression(
            penalty='l2',
            C=1.0,
            solver='lbfgs',
            max_iter=500,
            n_jobs=-1,
            random_state=42
        ),
        "RandomForest": RandomForestClassifier(
            n_estimators=200,
            max_depth=15,
            min_samples_split=10,
            min_samples_leaf=5,
            max_features='sqrt',
            class_weight='balanced',
            n_jobs=-1,
            random_state=42
        ),
        "XGBoost": XGBClassifier(
            booster='gbtree',
            n_estimators=300,
            learning_rate=0.1,
            max_depth=6,
            subsample=0.8,
            colsample_bytree=0.8,
            min_child_weight=5,
            gamma=0,
            reg_lambda=1,
            reg_alpha=0,
            scale_pos_weight=scale_pos_weight_val,
            eval_metric='logloss',
            random_state=42,
            n_jobs=-1,
            tree_method='hist'
        )
    }

    for model_name, model in models.items():
        print(f"- Training {model_name} model")
        with mlflow.start_run(run_name=f"{resample_name}_{model_name}"):

            # Train
            if model_name == "XGBoost":
                model.set_params(early_stopping_rounds=50)
                model.fit(X_train_res, y_train_res, eval_set=[(X_test_scaled, y_test)], verbose=False)
            else:
                model.fit(X_train_res, y_train_res)

            # Predictions & probabilities
            y_pred = model.predict(X_test_scaled)

            # safe get probabilities (fallback to decision function is not needed here; all models used have predict_proba)
            try:
                y_prob = model.predict_proba(X_test_scaled)[:, 1]
            except Exception:
                # fallback: use decision_function and scale to 0-1 via sigmoid (rare for these models)
                from scipy.special import expit
                y_prob = expit(model.decision_function(X_test_scaled))

            # Core metrics
            ap_score = average_precision_score(y_test, y_prob)
            metrics = {
                "precision": precision_score(y_test, y_pred),
                "recall": recall_score(y_test, y_pred),
                "f1": f1_score(y_test, y_pred),
                "roc_auc": roc_auc_score(y_test, y_prob),
                "avg_precision": float(ap_score)
            }
            # log metrics
            mlflow.log_metrics(metrics)
            mlflow.log_params({"resampling": resample_name, "model": model_name})

            # --- Confusion Matrix artifact ---
            cm = confusion_matrix(y_test, y_pred)
            fig_cm, ax_cm = plt.subplots()
            ConfusionMatrixDisplay(confusion_matrix=cm).plot(ax=ax_cm, cmap='Blues', colorbar=True)
            fig_cm.tight_layout()
            mlflow.log_figure(fig_cm, f"confusion_matrix_{resample_name}_{model_name}.png")
            plt.close(fig_cm)

            # --- ROC Curve artifact ---
            fpr, tpr, _ = roc_curve(y_test, y_prob)
            fig_roc, ax_roc = plt.subplots()
            ax_roc.plot(fpr, tpr, label=f"AUC = {metrics['roc_auc']:.3f}")
            ax_roc.plot([0, 1], [0, 1], 'k--')
            ax_roc.set_xlabel("False Positive Rate")
            ax_roc.set_ylabel("True Positive Rate")
            ax_roc.set_title("ROC Curve")
            ax_roc.legend(loc="lower right")
            fig_roc.tight_layout()
            mlflow.log_figure(fig_roc, f"roc_curve_{resample_name}_{model_name}.png")
            plt.close(fig_roc)

            # --- Precision-Recall Curve artifact ---
            prec, rec, _ = precision_recall_curve(y_test, y_prob)
            fig_pr, ax_pr = plt.subplots()
            ax_pr.plot(rec, prec, label=f"AP = {ap_score:.3f}")
            ax_pr.set_xlabel("Recall")
            ax_pr.set_ylabel("Precision")
            ax_pr.set_title("Precision–Recall Curve")
            ax_pr.legend(loc="lower left")
            fig_pr.tight_layout()
            mlflow.log_figure(fig_pr, f"pr_curve_{resample_name}_{model_name}.png")
            plt.close(fig_pr)

            # --- Minimal metadata ---
            try:
                mlflow.set_tags({
                    "project": "fraud_detection",
                    "owner": "Vipin Kumar",
                    "model": model_name,
                    "resampling": resample_name,
                    "hardware": "CPU"
                })
                # log hyperparameters
                for k, v in model.get_params().items():
                    try:
                        mlflow.log_param(f"hyperparameter_{k}", v)
                    except Exception:
                        mlflow.log_param(f"hyperparameter_{k}", str(v))
                # dataset summary as small json artifact & params
                dataset_stats = {
                    "X_train_shape": X_train_res.shape,
                    "y_train_shape": y_train_res.shape,
                    "X_test_shape": getattr(X_test_scaled, "shape", None),
                    "y_test_shape": getattr(y_test, "shape", None),
                    "train_neg": int((y_train_res == 0).sum()),
                    "train_pos": int((y_train_res == 1).sum()),
                    "scale_pos_weight": float(scale_pos_weight_val),
                }
                mlflow.log_dict(dataset_stats, f"dataset_stats_{resample_name}_{model_name}.json")
                mlflow.log_params({
                    "train_pos": dataset_stats["train_pos"],
                    "train_neg": dataset_stats["train_neg"],
                    "scale_pos_weight": dataset_stats["scale_pos_weight"]
                })
            except Exception:
                pass

            # --- Log model with signature & input_example (preferred) ---
            try:
                # Build a small input example from test set
                if isinstance(X_test_scaled, pd.DataFrame):
                    input_example = X_test_scaled.head(5)
                else:
                    input_example = pd.DataFrame(X_test_scaled[:5])
                # try infer signature using predict_proba
                try:
                    preds_example = model.predict_proba(input_example)
                    signature = infer_signature(input_example, preds_example)
                except Exception:
                    signature = None
                # log model with name="model"
                log_kwargs = {"input_example": input_example}
                if signature is not None:
                    log_kwargs["signature"] = signature
                mlflow.sklearn.log_model(model, name="model", **log_kwargs)
            except Exception as e:
                # fallback to older param name if needed
                try:
                    mlflow.sklearn.log_model(model, name="model")
                except Exception:
                    print("Model logging failed:", e)

In [None]:
# Get experiment ID by name
experiment_name = "fraud_detection_baseline"
experiment = mlflow.get_experiment_by_name(experiment_name)
if experiment is None:
    raise ValueError(f"Experiment '{experiment_name}' not found.")

# Fetch all runs for this experiment
runs = mlflow.search_runs(experiment_ids=[experiment.experiment_id], order_by=["metrics.avg_precision DESC"])

# Keep only relevant columns
results_df = runs[["params.resampling", "params.model", "metrics.precision", "metrics.avg_precision", "metrics.recall", "metrics.f1", "metrics.roc_auc",]]
results_df

#### Observations

1. Impact of Resampling:
    * Models trained on data resampled with **TomekLinks** demonstrated the best overall performance, achieving a strong balance between precision and recall. These models were effective at identifying fraud without flagging an excessive number of legitimate transactions.
    * Models trained with **SMOTE** and **SMOTE+Tomek** achieved very high recall (catching most of the fraud cases) but at the cost of extremely low precision. This would lead to a high number of false positives, which is undesirable in a real-world scenario.

2. Best Model Performance:
    * The **XGBoost** classifier trained on the **TomekLinks** resampled data emerged as the top-performing model.
    * This combination achieved the highest Average Precision score of 0.827 and a high F1-score of 0.822. It also had a strong recall of 0.80 and a precision of 0.844.

### Hyperparameter Tuning

In [None]:
# Set up MLflow tracking
mlflow.set_tracking_uri("http://127.0.0.1:5000/")
mlflow.set_experiment("fraud_detection_tuning")

# Initialize MLflow client for model registry operations
client = MlflowClient()

# Global variables to track best model
best_avg_precision = 0
best_run_id = None
best_model = None
best_params = None

In [None]:
def objective(trial):
    global best_avg_precision, best_run_id, best_model, best_params

    # Suggest hyperparameters optimized for CPU (i5 10th gen, 12GB RAM)
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 500),
        'max_depth': trial.suggest_int('max_depth', 3, 10),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3),
        'subsample': trial.suggest_float('subsample', 0.6, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.6, 1.0),
        'min_child_weight': trial.suggest_int('min_child_weight', 1, 10),
        'gamma': trial.suggest_float('gamma', 0, 0.5),
        'reg_alpha': trial.suggest_float('reg_alpha', 0, 1),
        'reg_lambda': trial.suggest_float('reg_lambda', 0, 1),
        'random_state': 42,
        'n_jobs': -1,
        'tree_method': 'hist',
        'booster': 'gbtree',
        'eval_metric': 'logloss'
    }

    # Start MLflow run for this trial
    with mlflow.start_run(run_name=f"trial_{trial.number}"):
        # Calculate scale_pos_weight for resampled data
        neg_count = int(np.sum(y_train_res1 == 0))
        pos_count = int(np.sum(y_train_res1 == 1))
        scale_pos_weight_val = neg_count / pos_count if pos_count > 0 else 1.0

        # Update params with scale_pos_weight
        params['scale_pos_weight'] = scale_pos_weight_val

        # Train model on resampled training data
        model = XGBClassifier(**params)
        model.fit(X_train_res1, y_train_res1, verbose=False)

        # Predict on test set
        y_test_pred = model.predict(X_test_scaled)
        y_test_prob = model.predict_proba(X_test_scaled)[:, 1]

        # Calculate test metrics
        test_precision = precision_score(y_test, y_test_pred)
        test_recall = recall_score(y_test, y_test_pred)
        test_f1 = f1_score(y_test, y_test_pred)
        test_roc_auc = roc_auc_score(y_test, y_test_prob)
        test_avg_precision = average_precision_score(y_test, y_test_prob)

        # Log all hyperparameters
        mlflow.log_params(params)

        # Log test metrics
        test_metrics = {
            "test_precision": test_precision,
            "test_recall": test_recall,
            "test_f1": test_f1,
            "test_roc_auc": test_roc_auc,
            "test_avg_precision": test_avg_precision,
        }
        mlflow.log_metrics(test_metrics)

        # Check if this is the best model so far
        if test_avg_precision > best_avg_precision:

            # Create and log visualizations
            # Confusion Matrix
            cm = confusion_matrix(y_test, y_test_pred)
            fig_cm, ax_cm = plt.subplots(figsize=(8, 6))
            ConfusionMatrixDisplay(confusion_matrix=cm).plot(ax=ax_cm, cmap='Blues', colorbar=True)
            ax_cm.set_title(f'Confusion Matrix - Trial {trial.number}')
            fig_cm.tight_layout()
            mlflow.log_figure(fig_cm, f"confusion_matrix_trial_{trial.number}.png")
            plt.close(fig_cm)

            # ROC Curve
            fpr, tpr, _ = roc_curve(y_test, y_test_prob)
            fig_roc, ax_roc = plt.subplots(figsize=(8, 6))
            ax_roc.plot(fpr, tpr, label=f"AUC = {test_roc_auc:.3f}")
            ax_roc.plot([0, 1], [0, 1], 'k--')
            ax_roc.set_xlabel("False Positive Rate")
            ax_roc.set_ylabel("True Positive Rate")
            ax_roc.set_title(f"ROC Curve - Trial {trial.number}")
            ax_roc.legend(loc="lower right")
            fig_roc.tight_layout()
            mlflow.log_figure(fig_roc, f"roc_curve_trial_{trial.number}.png")
            plt.close(fig_roc)

            # Precision-Recall Curve
            prec, rec, _ = precision_recall_curve(y_test, y_test_prob)
            fig_pr, ax_pr = plt.subplots(figsize=(8, 6))
            ax_pr.plot(rec, prec, label=f"AP = {test_avg_precision:.3f}")
            ax_pr.set_xlabel("Recall")
            ax_pr.set_ylabel("Precision")
            ax_pr.set_title(f"Precision-Recall Curve - Trial {trial.number}")
            ax_pr.legend(loc="lower left")
            fig_pr.tight_layout()
            mlflow.log_figure(fig_pr, f"pr_curve_trial_{trial.number}.png")
            plt.close(fig_pr)

            # Log model with signature and input example
            try:
                if hasattr(X_test_scaled, 'head'):
                    input_example = X_test_scaled.head(5)
                else:
                    input_example = pd.DataFrame(X_test_scaled[:5])

                signature = infer_signature(input_example, model.predict_proba(input_example))
                mlflow.xgboost.log_model(
                    model,
                    name="model",
                    signature=signature,
                    input_example=input_example
                )
            except Exception as e:
                print(f"Model logging failed for trial {trial.number}: {e}")
                mlflow.xgboost.log_model(model, name="model")

            # Update best model tracking
            best_avg_precision = test_avg_precision
            best_run_id = mlflow.active_run().info.run_id
            best_model = model
            best_params = params.copy()

            print(f"New best model found in trial {trial.number}! Test Avg Precision: {test_avg_precision:.4f}")

        # Log tags
        mlflow.set_tags({
            "project": "fraud_detection_tuning",
            "model": "XGBoost",
            "resampling": "TomekLinks",
            "optimization_metric": "avg_precision",
            "trial_number": trial.number
        })

        # Log dataset info
        dataset_info = {
            "train_resampled_shape": X_train_res1.shape,
            "test_shape": X_test_scaled.shape,
            "resampling_method": "TomekLinks",
            "scale_pos_weight": scale_pos_weight_val
        }
        mlflow.log_dict(dataset_info, "dataset_info.json")

    return test_avg_precision

In [None]:
print("\n" + "="*50)
print("STARTING MODEL OPTIMIZATION")
print("="*50)

# Run Optuna optimization
study = optuna.create_study(direction='maximize', study_name='xgboost_tomeklinks_optimization')
study.optimize(objective, n_trials=30, show_progress_bar=True)

# Print optimization results
print("\n" + "="*50)
print("OPTIMIZATION COMPLETE!")
print("="*50)
print(f"Best Test Average Precision: {study.best_value:.4f}")
print(f"Best parameters: {study.best_params}")
print(f"Best run ID: {best_run_id}")

# Register the best model
print("\n" + "="*50)
print("REGISTERING BEST MODEL")
print("="*50)

model_name = "fraud_detection_xgboost_tomeklinks"

try:
    # Register the model
    model_version = mlflow.register_model(
        model_uri=f"runs:/{best_run_id}/model",
        name=model_name,
        tags={
            "model_type": "XGBoost",
            "resampling": "TomekLinks",
            "optimization_trials": 30,
            "best_cv_avg_precision": best_avg_precision,
            "hardware": "CPU_i5_10th_gen_12GB"
        }
    )

    print(f"Model registered successfully!")
    print(f"Model name: {model_name}")
    print(f"Model version: {model_version.version}")

    # Transition model to Staging
    client.transition_model_version_stage(
        name=model_name,
        version=model_version.version,
        stage="Staging",
        archive_existing_versions=False
    )

    print(f"Model version {model_version.version} promoted to Staging!")

    # Add model version description
    client.update_model_version(
        name=model_name,
        version=model_version.version,
        description=f"Best XGBoost model with TomekLinks resampling. "
                   f"Optimized using 30 Optuna trials on test set. "
                   f"Best Test Average Precision: {best_avg_precision:.4f}. "
                   f"Trained on CPU (i5 10th gen, 12GB RAM)."
    )

except Exception as e:
    print(f"Error during model registration: {e}")