# Stacking Ensemble

## Setup and load data

### Imports + config

In [9]:

import pandas as pd
import numpy as np

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
    average_precision_score,
    confusion_matrix
)
from sklearn.model_selection import StratifiedKFold

import matplotlib.pyplot as plt
import seaborn as sns

from pathlib import Path
import joblib
import warnings

warnings.filterwarnings("ignore")

# ------------------------------
# Global config
# ------------------------------
SEED = 42
np.random.seed(SEED)

ENSEMBLE_DIR = Path("../experiments/ensemble")
ENSEMBLE_DIR.mkdir(parents=True, exist_ok=True)

def savefig(name):
    plt.savefig(ENSEMBLE_DIR / name, dpi=300, bbox_inches="tight")
    plt.close()

print("Day 7 — Stacking Ensemble")
print("Artifacts will be saved to:", ENSEMBLE_DIR)

Day 7 — Stacking Ensemble
Artifacts will be saved to: ..\experiments\ensemble


### Load final processed data

In [10]:
train = pd.read_csv("../data/processed/train.csv")
test  = pd.read_csv("../data/processed/test.csv")

print("Train shape:", train.shape)
print("Test shape:", test.shape)




Train shape: (227845, 72)
Test shape: (56962, 72)


In [11]:
print(train.columns.tolist())

['Time', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10', 'V11', 'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19', 'V20', 'V21', 'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28', 'Amount', 'Class', 'timestamp', 'hour', 'dayofweek', 'amount_log', 'amount_scaled', 'merchant_id', 'device_type', 'geo_bucket', 'account_id', 'account_age_days', 'merchant_freq', 'account_txn_count', 'device_freq', 'last_5_mean_amount', 'last_5_count', 'merchant_id_fe', 'device_type_fe', 'geo_bucket_fe', 'account_id_fe', 'amount_times_age', 'is_new_merchant', 'merchant_id_missing', 'device_type_missing', 'geo_bucket_missing', 'account_age_days_missing', 'pca_x', 'pca_y', 'anomaly_score', 'is_anomaly', 'cluster_id', 'mlp_proba', 'ae_latent_1', 'ae_latent_2', 'ae_latent_3', 'ae_latent_4', 'ae_latent_5', 'ae_latent_6', 'ae_latent_7', 'ae_latent_8', 'ae_recon_error', 'xgb_proba']


Everything looks good: all expected columns are present, including `xgb_proba` now that the earlier issue is fixed.

## Define Meta-Feature Set

### Explicitly define meta-features

In [12]:
META_FEATURES = [
    "xgb_proba",
    "anomaly_score",
    "ae_recon_error",
    "mlp_proba",
    "cluster_id",
    "amount_log",
    "merchant_freq",
    "account_txn_count",
    "last_5_mean_amount",
]

print("Meta-features used for stacking:")
for f in META_FEATURES:
    print(" -", f)

Meta-features used for stacking:
 - xgb_proba
 - anomaly_score
 - ae_recon_error
 - mlp_proba
 - cluster_id
 - amount_log
 - merchant_freq
 - account_txn_count
 - last_5_mean_amount


### Meta-Feature Selection for Stacking

The stacking model does not consume raw transaction features. Instead, it operates on a curated set of high-level signals that summarize different perspectives on fraud risk.

**Supervised probability signals**
- `xgb_proba`: Primary fraud probability from a strong tree-based model trained on full engineered features.
- `mlp_proba`: Secondary probability capturing nonlinear interactions learned by a neural network.

**Unsupervised anomaly signals**
- `anomaly_score`: IsolationForest score measuring deviation from population-level behavior.
- `ae_recon_error`: Autoencoder reconstruction error capturing deep nonlinear abnormality.

**Behavioral context features**
- `cluster_id`: Coarse behavioral grouping from unsupervised clustering.
- `amount_log`: Log-scaled transaction amount for stable magnitude comparison.
- `merchant_freq`: Merchant occurrence frequency providing rarity context.
- `account_txn_count`: Account activity level indicating behavioral maturity.
- `last_5_mean_amount`: Short-term spending behavior for drift detection.

Raw identifiers, visualization-only features, and thresholded anomaly flags are explicitly excluded to avoid leakage and instability. This feature set forms the final, frozen input contract for the ensemble model.
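
The exclusion rule above can be turned into a small guard that fails fast if an excluded column ever slips into the meta-feature list. This is a minimal sketch: the grouping of columns into `raw_identifiers`, `visualization_only`, and `thresholded_flags` is an assumption inferred from the column names printed earlier, and `check_meta_contract` is a hypothetical helper, not part of the notebook.

```python
# Assumed exclusion groups, inferred from column names (not confirmed by the notebook).
EXCLUDED_GROUPS = {
    "raw_identifiers": {"merchant_id", "account_id"},
    "visualization_only": {"pca_x", "pca_y"},
    "thresholded_flags": {"is_anomaly"},
}

META_FEATURES = [
    "xgb_proba", "anomaly_score", "ae_recon_error", "mlp_proba", "cluster_id",
    "amount_log", "merchant_freq", "account_txn_count", "last_5_mean_amount",
]

def check_meta_contract(features, excluded_groups):
    """Return {group_name: [offending columns]} for excluded columns found in features."""
    if len(set(features)) != len(features):
        raise ValueError("Duplicate column in meta-feature list")
    violations = {}
    for group, columns in excluded_groups.items():
        found = sorted(set(features) & columns)
        if found:
            violations[group] = found
    return violations

violations = check_meta_contract(META_FEATURES, EXCLUDED_GROUPS)
assert not violations, f"Excluded columns in META_FEATURES: {violations}"
```

Running the check right after `META_FEATURES` is defined keeps the "frozen input contract" enforceable rather than only documented.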


### Build meta-feature matrix

In [15]:

# Targets
y_meta_train = train["Class"].astype("int32")
y_meta_test  = test["Class"].astype("int32") if "Class" in test.columns else None

# Meta feature matrices
X_meta_train = train[META_FEATURES].copy()
X_meta_test  = test[META_FEATURES].copy()

# Sanity checks
assert list(X_meta_train.columns) == META_FEATURES, "Train meta-feature order mismatch"
assert list(X_meta_test.columns) == META_FEATURES, "Test meta-feature order mismatch"
assert X_meta_train.isnull().sum().sum() == 0, "NaNs in X_meta_train"
assert X_meta_test.isnull().sum().sum() == 0, "NaNs in X_meta_test"

print("Meta feature matrices constructed.\n")

print("X_meta_train shape:", X_meta_train.shape)
print("y_meta_train shape:", y_meta_train.shape)

print("X_meta_test shape:", X_meta_test.shape)
if y_meta_test is not None:
    print("y_meta_test shape:", y_meta_test.shape)

print("\nMeta feature columns:")
for col in X_meta_train.columns:
    print(" -", col)

Meta feature matrices constructed.

X_meta_train shape: (227845, 9)
y_meta_train shape: (227845,)
X_meta_test shape: (56962, 9)
y_meta_test shape: (56962,)

Meta feature columns:
 - xgb_proba
 - anomaly_score
 - ae_recon_error
 - mlp_proba
 - cluster_id
 - amount_log
 - merchant_freq
 - account_txn_count
 - last_5_mean_amount


## Cross-Validated Stacking

### CV setup

In [16]:

skf = StratifiedKFold(
    n_splits=5,
    shuffle=True,
    random_state=SEED
)

print("StratifiedKFold configured:")
print(" - n_splits:", skf.n_splits)
print(" - shuffle:", skf.shuffle)
print(" - random_state:", SEED)

StratifiedKFold configured:
 - n_splits: 5
 - shuffle: True
 - random_state: 42


We evaluate the stacking model with stratified cross-validation before training on the full dataset, to check that any performance gain is real and not an artifact of overfitting.
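
One complementary check, sketched here under assumptions, is to measure how well each meta-feature ranks the classes on its own. If a single input already separates fraud from non-fraud almost perfectly, the stacker's cross-validated score mostly reflects that input. The helper `rank_auc` and the toy columns `feature_a` and `feature_b` are hypothetical. In the notebook, the same idea would apply to each column of `X_meta_train` against `y_meta_train`, and `sklearn.metrics.roc_auc_score` could replace the hand-written function.

```python
def rank_auc(scores, labels):
    """ROC AUC of one score column via the Mann-Whitney U statistic (ties share an average rank)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("Both classes must be present")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Toy data for illustration only (not taken from the notebook).
toy_labels = [0, 0, 1, 1]
toy_columns = {
    "feature_a": [0.1, 0.2, 0.8, 0.9],  # separates the classes perfectly
    "feature_b": [0.1, 0.9, 0.2, 0.8],  # no better than chance
}
per_feature_auc = {name: rank_auc(col, toy_labels) for name, col in toy_columns.items()}
```

A per-feature value close to 1.0 for a base-model probability would suggest checking whether that probability was produced out-of-fold for the training rows.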

### Cross-validated evaluation

In [17]:

cv_results = []

for fold, (train_idx, val_idx) in enumerate(skf.split(X_meta_train, y_meta_train), 1):
    print(f"\nFold {fold}")

    X_tr = X_meta_train.iloc[train_idx]
    y_tr = y_meta_train.iloc[train_idx]

    X_val = X_meta_train.iloc[val_idx]
    y_val = y_meta_train.iloc[val_idx]

    stacker = LogisticRegression(
        class_weight="balanced",
        penalty="l2",
        solver="liblinear",
        random_state=SEED
    )

    # Train
    stacker.fit(X_tr, y_tr)

    # Predict probabilities
    y_val_proba = stacker.predict_proba(X_val)[:, 1]

    # Metrics
    pr_auc = average_precision_score(y_val, y_val_proba)
    roc_auc = roc_auc_score(y_val, y_val_proba)

    # Temporary threshold for inspection only
    y_val_pred = (y_val_proba >= 0.5).astype(int)

    recall = recall_score(y_val, y_val_pred)
    f1 = f1_score(y_val, y_val_pred)

    fold_metrics = {
        "fold": fold,
        "pr_auc": pr_auc,
        "roc_auc": roc_auc,
        "recall@0.5": recall,
        "f1@0.5": f1
    }

    cv_results.append(fold_metrics)

    print(fold_metrics)


Fold 1
{'fold': 1, 'pr_auc': 1.0, 'roc_auc': 1.0, 'recall@0.5': 1.0, 'f1@0.5': 1.0}

Fold 2
{'fold': 2, 'pr_auc': 1.0000000000000002, 'roc_auc': 1.0, 'recall@0.5': 1.0, 'f1@0.5': 1.0}

Fold 3
{'fold': 3, 'pr_auc': 1.0000000000000002, 'roc_auc': 1.0, 'recall@0.5': 1.0, 'f1@0.5': 1.0}

Fold 4
{'fold': 4, 'pr_auc': 1.0000000000000002, 'roc_auc': 1.0, 'recall@0.5': 1.0, 'f1@0.5': 1.0}

Fold 5
{'fold': 5, 'pr_auc': 0.9988484747438061, 'roc_auc': 0.9999980521522326, 'recall@0.5': 1.0, 'f1@0.5': 0.9937106918238994}


In [18]:
cv_results

[{'fold': 1, 'pr_auc': 1.0, 'roc_auc': 1.0, 'recall@0.5': 1.0, 'f1@0.5': 1.0},
 {'fold': 2,
  'pr_auc': 1.0000000000000002,
  'roc_auc': 1.0,
  'recall@0.5': 1.0,
  'f1@0.5': 1.0},
 {'fold': 3,
  'pr_auc': 1.0000000000000002,
  'roc_auc': 1.0,
  'recall@0.5': 1.0,
  'f1@0.5': 1.0},
 {'fold': 4,
  'pr_auc': 1.0000000000000002,
  'roc_auc': 1.0,
  'recall@0.5': 1.0,
  'f1@0.5': 1.0},
 {'fold': 5,
  'pr_auc': 0.9988484747438061,
  'roc_auc': 0.9999980521522326,
  'recall@0.5': 1.0,
  'f1@0.5': 0.9937106918238994}]
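
The per-fold dictionaries are easier to compare once reduced to a mean and standard deviation per metric. The sketch below uses only the standard library and the fold values printed above. `summarize_cv` is a hypothetical helper name, not something defined elsewhere in the notebook.

```python
from statistics import mean, stdev

# Fold metrics copied from the output above.
cv_results = [
    {"fold": 1, "pr_auc": 1.0, "roc_auc": 1.0, "recall@0.5": 1.0, "f1@0.5": 1.0},
    {"fold": 2, "pr_auc": 1.0000000000000002, "roc_auc": 1.0, "recall@0.5": 1.0, "f1@0.5": 1.0},
    {"fold": 3, "pr_auc": 1.0000000000000002, "roc_auc": 1.0, "recall@0.5": 1.0, "f1@0.5": 1.0},
    {"fold": 4, "pr_auc": 1.0000000000000002, "roc_auc": 1.0, "recall@0.5": 1.0, "f1@0.5": 1.0},
    {"fold": 5, "pr_auc": 0.9988484747438061, "roc_auc": 0.9999980521522326,
     "recall@0.5": 1.0, "f1@0.5": 0.9937106918238994},
]

def summarize_cv(results, skip=("fold",)):
    """Return {metric: (mean, sample standard deviation)} across folds."""
    metrics = [key for key in results[0] if key not in skip]
    summary = {}
    for metric in metrics:
        values = [fold[metric] for fold in results]
        summary[metric] = (mean(values), stdev(values))
    return summary

summary = summarize_cv(cv_results)
for metric, (avg, spread) in summary.items():
    print(f"{metric}: {avg:.6f} +/- {spread:.6f}")
```

Reporting the spread alongside the mean makes it easier to see that only fold 5 deviates from the other four.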