# NETWORK INTRUSION DETECTION SYSTEM (NIDS)
## Phase 3: Machine Learning Models - Training, Evaluation & Comparison


## Methodology:

**1. Multi-Model Strategy**
- **Random Forest**: Interpretable bagging ensemble, robustness demonstrated on CIC-IDS-2017
- **LightGBM**: State-of-the-art gradient boosting, reported as 5-10x faster than XGBoost (Ke et al., 2017)
- **Neural Network (MLP)**: Deep learning for complex patterns

**2. Imbalance Handling - Double Strategy** (SMOTE; Chawla et al., 2002)
- **Baseline**: Native `class_weight='balanced'`, zero computational overhead
- **Enhanced**: SMOTE with a **partial** strategy inside the CV loop (no data leakage)
- **Optimized strategy**: SMOTE fills 70% of the gap between each minority class and the next larger class (not 100%)
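
To make the baseline concrete: scikit-learn documents the `'balanced'` heuristic as `n_samples / (n_classes * count_c)`. The sketch below recomputes that formula in plain Python. The label names and counts are toy values chosen for illustration, not the notebook's data, and `balanced_class_weights` is a hypothetical helper name.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy labels (illustrative only): 80 benign flows, 15 DoS, 5 bot
toy_labels = ["Benign"] * 80 + ["DoS"] * 15 + ["Bot"] * 5
weights = balanced_class_weights(toy_labels)

# Rarer classes receive larger weights
for cls, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{cls:8} weight = {w:.3f}")
```

The rarest class gets the largest weight, so its errors count more in the loss without creating any new samples.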

**3. Rigorous Cross-Validation**
- **StratifiedKFold n_splits=5**: Keeps class proportions in every fold
- **SMOTE inside loop**: Applied ONLY to the training fold, NEVER to the validation fold
- **Test set**: Truly held-out, no contamination, no SMOTE
- **Critical**: Avoids data leakage between train and validation

**4. Multi-Class Metrics**
- **Macro F1** (PRIMARY METRIC): Unweighted mean, **treats classes equally**
  - **Why primary**: In cybersecurity, EVERY class (even a rare one) is critical
- **Weighted F1** (Secondary): Weighted by frequency, reflects overall performance
- **Accuracy**: Informative but misleading under imbalance (Benign 80% → inflated accuracy)
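
The difference between the three metrics is easiest to see on a degenerate classifier. The following sketch computes them from scratch on toy labels (95 benign flows, 5 attacks, and a model that always predicts "benign"); the data and the helper `per_class_f1` are illustrative assumptions, not part of the notebook.

```python
def per_class_f1(y_true, y_pred):
    """Return {class: (f1, support)} computed from raw counts."""
    out = {}
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        out[cls] = (2 * tp / denom if denom else 0.0, tp + fn)
    return out


# Toy case: the model predicts "benign" for every flow
y_true = ["benign"] * 95 + ["attack"] * 5
y_pred = ["benign"] * 100

scores = per_class_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_f1 = sum(f1 for f1, _ in scores.values()) / len(scores)
weighted_f1 = sum(f1 * n for f1, n in scores.values()) / len(y_true)

print(f"accuracy={accuracy:.3f}  macro_f1={macro_f1:.3f}  weighted_f1={weighted_f1:.3f}")
```

Accuracy and weighted F1 both stay above 0.9 even though every attack is missed, while macro F1 drops below 0.5 because the attack class contributes an F1 of zero with equal weight.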



---

# SECTION 1: Environment Setup

## Goal
Configure the Python environment with all the libraries needed for ML training.

## Libraries
- **scikit-learn 1.x**: Standard ML toolkit
- **LightGBM**: Gradient boosting (Ke et al., 2017)
- **imbalanced-learn**: SMOTE implementation
- **Keras/TensorFlow**: Neural network backend

## Reproducibility
- `RANDOM_SEED=42`: Makes results reproducible across runs
- All random states are fixed (numpy, sklearn, lightgbm, keras)
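
One possible way to centralise global seeding is sketched below. It is a sketch under assumptions: `set_global_seed` is a hypothetical helper, and it only seeds libraries that have already been imported, so it never pulls in TensorFlow as a side effect. scikit-learn and LightGBM estimators are not covered here because they take their seed through the `random_state` argument.

```python
import random
import sys


def set_global_seed(seed=42):
    """Seed Python's RNG plus any supported library that is already imported."""
    random.seed(seed)
    seeded = ["random"]
    if "numpy" in sys.modules:
        sys.modules["numpy"].random.seed(seed)
        seeded.append("numpy")
    if "tensorflow" in sys.modules:
        # tf.random.set_seed fixes TensorFlow's global seed (TF 2.x)
        sys.modules["tensorflow"].random.set_seed(seed)
        seeded.append("tensorflow")
    return seeded


seeded = set_global_seed(42)
first_draw = random.random()
set_global_seed(42)
second_draw = random.random()
print(seeded, first_draw == second_draw)
```

Calling the helper again before a stochastic step restores the same random stream, which is what makes two runs comparable.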


In [None]:
# SECTION 1: ENVIRONMENT SETUP
import json
import os
import sys
import time
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import sklearn

warnings.filterwarnings('ignore')

from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.metrics import (
    classification_report, confusion_matrix,
    accuracy_score, f1_score, precision_score, recall_score
)
from imblearn.over_sampling import SMOTE
import lightgbm as lgb

# Plotting config
sns.set_style('whitegrid')
plt.rcParams['figure.dpi'] = 120
plt.rcParams['figure.figsize'] = (14, 6)
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)

# Reproducibility
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)

# Paths
DATA_DIR = '../output/processed_datasets'
OUTPUT_DIR = '../output/model_results'
IMG_DIR = '../output/images/model_evaluation'

for p in [OUTPUT_DIR, IMG_DIR]:
    os.makedirs(p, exist_ok=True)

print("Setup complete.")
print(f"   Python: {sys.version.split()[0]}")
print(f"   scikit-learn: {sklearn.__version__}")
print(f"   LightGBM: {lgb.__version__}")


---

# SECTION 2: Data Loading with Label Encoding Consistency

# Essential: Preserve the Label Encoding from Notebook 02

## Dataset Split Strategy
- **Train**: 70% (1,210,461 samples) - Model training
- **Validation**: 15% (259,438 samples) - Model selection
- **Test**: 15% (259,395 samples) - Final held-out evaluation

## Safety Checks
1. `Label` is NOT present in X (no data leakage)
2. Identical features in train/val/test
3. Exactly 20 features (after the feature selection of Notebook 02)
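
One way to guard against encoding drift between notebooks is to persist the class-to-integer mapping and compare it on reload. The sketch below assumes the mapping follows sorted label order (the order `LabelEncoder` uses for its `classes_`); the file name `label_mapping.json`, the helper names and the toy labels are hypothetical.

```python
import json
import os
import tempfile


def build_mapping(labels):
    """Sorted unique labels mapped to consecutive integers."""
    return {cls: idx for idx, cls in enumerate(sorted(set(labels)))}


def check_mapping(saved_path, labels):
    """Raise if the mapping rebuilt from `labels` differs from the saved one."""
    with open(saved_path) as fh:
        saved = json.load(fh)
    rebuilt = build_mapping(labels)
    if saved != rebuilt:
        raise ValueError(f"Label encoding drift: {saved} != {rebuilt}")
    return rebuilt


# Toy labels standing in for the real 'Label' column
labels_nb02 = ["DoS", "Benign", "Bot", "Benign"]
path = os.path.join(tempfile.mkdtemp(), "label_mapping.json")  # hypothetical file name
with open(path, "w") as fh:
    json.dump(build_mapping(labels_nb02), fh)

# Later notebook: rebuild from its own labels and verify against the saved file
mapping = check_mapping(path, ["Benign", "Bot", "DoS"])
print(mapping)
```

If a class were missing or renamed in the later notebook, the check would fail loudly instead of silently shifting the integer codes.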


In [None]:
# SECTION 2: DATA LOADING - PRESERVE NOTEBOOK 02 ENCODING

print("Loading pre-processed datasets...")

# Load data
X_train = pd.read_parquet(os.path.join(DATA_DIR, 'X_train.parquet'))
y_train = pd.read_parquet(os.path.join(DATA_DIR, 'y_train.parquet'))['Label']

X_val = pd.read_parquet(os.path.join(DATA_DIR, 'X_val.parquet'))
y_val = pd.read_parquet(os.path.join(DATA_DIR, 'y_val.parquet'))['Label']

X_test = pd.read_parquet(os.path.join(DATA_DIR, 'X_test.parquet'))
y_test = pd.read_parquet(os.path.join(DATA_DIR, 'y_test.parquet'))['Label']

print(f"\nTrain: {X_train.shape}, Val: {X_val.shape}, Test: {X_test.shape}")

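# CRITICAL: keep the label encoding consistent with Notebook 02.
# LabelEncoder assigns integers in sorted class order, so refitting on the same
# set of label strings reproduces the same mapping (assuming Notebook 02 did the same).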
le = LabelEncoder()
y_train_encoded = le.fit_transform(y_train)
y_val_encoded = le.transform(y_val)
y_test_encoded = le.transform(y_test)

n_classes = len(le.classes_)
class_names = le.classes_

print(f"{n_classes} classes: {list(class_names)}")

# CHECK: print the mapping for transparency
print("\n" + "="*70)
print("LABEL ENCODING CONSISTENT WITH NOTEBOOK 02")
print("="*70)
for idx, cls in enumerate(class_names):
    print(f"   {idx} → {cls}")

# Safety Checks
assert 'Label' not in X_train.columns
assert list(X_train.columns) == list(X_val.columns) == list(X_test.columns)
assert X_train.shape[1] == 20
print("\nSafety checks passed: no leakage, features aligned.")

# Class Distribution
class_counts = pd.Series(y_train_encoded).value_counts().sort_index()
class_dist = pd.DataFrame({
    'Class': class_names,
    'Count': class_counts.values,
    'Percentage': (100 * class_counts.values / len(y_train_encoded))
})

print("\n" + "="*70)
print("Class Distribution (Training Set)")
print("="*70)
print(class_dist.to_string(index=False))
print(f"\nImbalance Ratio: {class_counts.max() / class_counts.min():.0f}:1")


---

# SECTION 3: Distribution Visualization + SMOTE Strategy

## Rationale: Class Imbalance in Intrusion Detection


## SMOTE Partial Strategy

- **Full balancing** (100%): Risks overfitting to synthetic data
- **Partial balancing**: Aims to improve minority recall while limiting that overfitting risk
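
For intuition on what "synthetic data" means here, the core SMOTE step interpolates between a minority sample and one of its nearest minority neighbours. The following is a simplified pure-Python sketch of that idea, not the imbalanced-learn implementation; the points, `k` and the helper name `smote_like_samples` are assumptions for illustration.

```python
import random


def smote_like_samples(minority, n_new, k=2, seed=42):
    """Create n_new points on segments between a minority point and a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (squared Euclidean distance)
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote_like_samples(minority, n_new=6)
for point in new_points:
    print(tuple(round(v, 3) for v in point))
```

Every synthetic point lies between existing minority points, which is why heavy oversampling of a tiny class mostly repeats the same small region of feature space.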

## Visualizations
- PRE-SMOTE distribution (linear + logarithmic scale)
- PRE vs POST-SMOTE comparison
- Detailed strategy table


### SMOTE Strategy: Gap-Filling 70% - Justification

#### Design Principles

**Goal**: Progressive balancing without over-synthesis

#### Safety Constraints

1. **Max 20x increment**: Prevents extreme synthetic dominance
2. **<95% of the next class**: Preserves the original class order
3. **>5% of the majority → no SMOTE**: Avoids unnecessary synthesis
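
The three constraints can be condensed into one small function. This is a sketch on toy counts (not the dataset's real class sizes), with hypothetical parameter names; the final `max(...)` guard is an extra assumption added here so a target never drops below the current count.

```python
def gap_filling_targets(counts, fill=0.70, max_factor=20, order_cap=0.95, skip_pct=5.0):
    """Per-class SMOTE targets: fill part of the gap to the next larger class."""
    majority = max(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: kv[1])
    targets = {}
    for i, (cls, n) in enumerate(ordered):
        is_largest = i == len(ordered) - 1
        if is_largest or 100 * n / majority > skip_pct:
            targets[cls] = n  # constraint 3: large enough already, leave untouched
            continue
        next_n = ordered[i + 1][1]
        target = n + int((next_n - n) * fill)   # fill 70% of the gap
        target = min(target, n * max_factor)     # constraint 1: at most 20x
        target = min(target, int(next_n * order_cap))  # constraint 2: stay below next class
        targets[cls] = max(target, n)            # extra guard (assumption): never shrink
    return targets


# Toy class counts, illustrative only
counts = {"Benign": 100000, "DoS": 20000, "PortScan": 4001, "Bot": 105, "Infiltration": 12}
targets = gap_filling_targets(counts)
for cls in sorted(counts, key=counts.get):
    print(f"{cls:13} {counts[cls]:>7,} -> {targets[cls]:>7,}")
```

In this toy run the 20x cap is what limits `Bot`, while `DoS` and `Benign` are left alone because they exceed 5% of the majority.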



In [None]:
# SECTION 3: PROGRESSIVE GAP-FILLING SMOTE STRATEGY

print("\n" + "="*70)
print("STRATEGIA SMOTE (Progressive Gap-Filling + Order-Preserving)")
print("="*70 + "\n")

majority_count = class_counts.max()
smote_targets = {}

print(" Strategy Principle:")
print("   - Fill 70% of the gap between each class and the next larger one")
print("   - Guarantees order preservation by design")
print("   - Max 20x increment per class (overfitting protection)")
print("   - Classes >5% of majority: no change\n")

# Step 1: Sort classes by original count
class_order = sorted([(class_counts[idx], cls, idx) 
                      for idx, cls in enumerate(class_names)])

print("Original order (ascending):")
for count, cls, idx in class_order:
    print(f"  {cls:12}: {count:7,} ({count/majority_count*100:6.2f}%)")
print()

# Step 2: Progressive gap-filling, from the smallest class upward
for i, (current_count, current_cls, current_idx) in enumerate(class_order):
    percent_of_majority = (current_count / majority_count) * 100
    
    # Classes above 5% of the majority: no change
    if percent_of_majority > 5.0:
        target_count = current_count
        gap_filled = 0
        print(f"  {current_cls:12}: {current_count:7,} → {target_count:7,} "
              f"(1.0x) [>5% majority - no change]")
    else:
        # Find the next larger class
        if i < len(class_order) - 1:
            next_count = class_order[i + 1][0]
            next_cls = class_order[i + 1][1]
            
            # Compute the gap to the next class
            gap = next_count - current_count
            
            # Target: current + 70% of the gap
            target_count = current_count + int(gap * 0.70)
            
            # Safety caps:
            # 1. Max 20x increment (overfitting protection)
            max_by_increment = current_count * 20
            target_count = min(target_count, max_by_increment)
            
            # 2. Must stay below 95% of the next class
            max_by_order = int(next_count * 0.95)
            target_count = min(target_count, max_by_order)
            
            # 3. Never go below the current count (SMOTE can only add samples)
            target_count = max(target_count, current_count)
            
            gap_filled = ((target_count - current_count) / gap * 100) if gap > 0 else 0
            
            print(f"  {current_cls:12}: {current_count:7,} → {target_count:7,} "
                  f"({target_count/current_count:5.1f}x) [gap to {next_cls}: {gap_filled:4.0f}%]")
        else:
            # This is the largest class (Benign)
            target_count = current_count
            print(f"  {current_cls:12}: {current_count:7,} → {target_count:7,} "
                  f"(1.0x) [majority class - no change]")
    
    smote_targets[current_cls] = target_count

print()

# Step 3: Final order check
print("="*70)
print("ORDER VERIFICATION")
print("="*70)

original_sorted = sorted([(class_counts[idx], cls) for idx, cls in enumerate(class_names)])
smote_sorted = sorted([(smote_targets[cls], cls) for cls in class_names])

order_preserved = True
for i in range(len(class_names)):
    orig_count, orig_cls = original_sorted[i]
    smote_count, smote_cls = smote_sorted[i]
    status = "OK" if orig_cls == smote_cls else "MISMATCH"
    print(f"  Position {i+1}: {orig_cls:12} ({orig_count:7,}) → "
          f"{smote_cls:12} ({smote_count:7,}) {status}")
    if orig_cls != smote_cls:
        order_preserved = False

if order_preserved:
    print("\n PERFECT ORDER PRESERVATION!\n")
else:
    print("\n ORDER VIOLATION DETECTED!\n")

# Final table
strategy_df = pd.DataFrame({
    'Class': class_names,
    'Original': class_counts.values,
    'After SMOTE': [smote_targets[cls] for cls in class_names],
    'Increment (x)': [smote_targets[cls] / class_counts[idx]
                      for idx, cls in enumerate(class_names)],
    '% of Majority (Before)': [(class_counts[idx] / majority_count) * 100
                               for idx in range(len(class_names))],
    '% of Majority (After)': [(smote_targets[cls] / majority_count) * 100
                              for cls in class_names],
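    # NOTE: despite its name, this column is the increase relative to the
    # original count (%), not the share of the gap to the next class
    'Gap Filled': [((smote_targets[cls] - class_counts[idx]) / 
                    max(1, class_counts[idx])) * 100 
                   for idx, cls in enumerate(class_names)]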
})

strategy_df_sorted = strategy_df.sort_values('Original')
print("="*70)
print("GAP-FILLING SMOTE STRATEGY (sorted by original count)")
print("="*70)
print(strategy_df_sorted[['Class', 'Original', 'After SMOTE', 'Increment (x)', 
                          '% of Majority (After)']].to_string(index=False))
print(f"\nTotal samples: {class_counts.sum():,} → {sum(smote_targets.values()):,}")
print(f"Increment ratio: {sum(smote_targets.values()) / class_counts.sum():.2f}x\n")

print("="*70)
print("STRATEGY JUSTIFICATION")
print("="*70)
print(" Gap-based approach guarantees order preservation by construction")
print(" 70% fill ratio balances improvement vs overfitting risk")
print(" 20x cap prevents synthetic noise on extreme minorities")
print(" All relative positions maintained exactly as original dataset")
print(" Computational cost: minimal increase vs aggressive strategies\n")

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
post_counts = [smote_targets[cls] for cls in class_names]

# PRE-SMOTE
axes[0].bar(class_names, class_counts.values, color='#5A9FD4', alpha=0.85, 
            edgecolor='black', linewidth=1.2)
axes[0].set_yscale('log')
axes[0].set_title('PRE-SMOTE Distribution', fontweight='bold', fontsize=13)
axes[0].set_xlabel('Class', fontweight='bold', fontsize=11)
axes[0].set_ylabel('Count (log scale)', fontweight='bold', fontsize=11)
# Fix: use plt.setp instead of tick_params
plt.setp(axes[0].get_xticklabels(), rotation=45, ha='right')
axes[0].grid(axis='y', alpha=0.3, linestyle='--')

# POST-SMOTE
axes[1].bar(class_names, post_counts, color='#5CB85C', alpha=0.85, 
            edgecolor='black', linewidth=1.2)
axes[1].set_yscale('log')
axes[1].set_title('POST-SMOTE (Gap-Filling Strategy)', fontweight='bold', fontsize=13)
axes[1].set_xlabel('Class', fontweight='bold', fontsize=11)
axes[1].set_ylabel('Count (log scale)', fontweight='bold', fontsize=11)
# Fix: same as above
plt.setp(axes[1].get_xticklabels(), rotation=45, ha='right')
axes[1].grid(axis='y', alpha=0.3, linestyle='--')

plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '01_smote_strategy.png'), dpi=150, bbox_inches='tight')
plt.show()

print("\n Gap-filling SMOTE strategy visualization saved\n")

smote_strategy_config = {
    idx: smote_targets[cls]
    for idx, cls in enumerate(class_names)
}

print(f"SMOTE strategy configured: {sum(smote_targets.values()):,} total samples\n")

---

# SECTION 4: Model Configuration (Memory-Optimized)

#### Note: in cross-validation I used lighter configurations because the heavier ones were causing crashes

## Hardware Constraints: Ryzen 5 3600X
- **CPU**: 6 cores / 12 threads @ 3.8GHz
- **RAM**: 16GB
- **Strategy**: Balance performance against memory footprint
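
As a rough back-of-the-envelope check on the memory side, the size of the dense training matrix can be estimated from its shape. The sketch assumes 8-byte `float64` values, which may not match the actual parquet dtypes, and `dense_matrix_mib` is a hypothetical helper; it ignores model structures, SMOTE copies and per-worker overhead.

```python
def dense_matrix_mib(n_rows, n_cols, bytes_per_value=8):
    """Approximate size of a dense numeric matrix in MiB."""
    return n_rows * n_cols * bytes_per_value / 2**20


train_rows, n_features = 1_210_461, 20  # training split size and feature count stated above
full_train_mib = dense_matrix_mib(train_rows, n_features)
cv_fold_mib = dense_matrix_mib(int(train_rows * 0.8), n_features)  # 4/5 of the data per CV fit

print(f"Full training matrix: {full_train_mib:.1f} MiB")
print(f"One CV training fold: {cv_fold_mib:.1f} MiB")
```

Under these assumptions the raw matrix is well below 16GB, which suggests that the pressure comes from what is built on top of it (resampled copies, ensembles, parallel workers) rather than from the data alone.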


**Rationale for the 3 models**:
- RF: Interpretable, robust, baseline ensemble
- LightGBM: SOTA boosting, speed, performance
- MLP: Deep learning, complex patterns, non-linearity

## Pipeline Structure
- No scaling needed for RF/LightGBM (tree splits are invariant to feature scale); the MLP is scaled inside its own pipeline
- Simple pipeline: `classifier` only for the tree models, `StandardScaler` + `classifier` for the MLP
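
Scaling for the MLP is only safe when the statistics come from the training data and are then reused unchanged on validation and test data. A minimal pure-Python sketch of that rule, using toy values and hypothetical helper names (it mirrors the population standard deviation that `StandardScaler` uses):

```python
from statistics import mean, pstdev


def fit_scaler(column):
    """Learn mean and population std from training values only."""
    return mean(column), pstdev(column) or 1.0


def apply_scaler(column, params):
    """Standardise values with previously learned parameters."""
    mu, sigma = params
    return [(x - mu) / sigma for x in column]


train = [10.0, 12.0, 14.0, 16.0]
val = [11.0, 30.0]  # contains a value far outside the training range

params = fit_scaler(train)           # fitted once, on training data
train_scaled = apply_scaler(train, params)
val_scaled = apply_scaler(val, params)  # reused, never refitted

print([round(v, 3) for v in train_scaled])
print([round(v, 3) for v in val_scaled])
```

Refitting the scaler on a different set (for example a resampled one) would change `params`, so data transformed earlier would no longer live in the same feature space as the model's training input.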

In [None]:
# SECTION 4: MODEL CONFIGURATION

print("\n" + "="*70)
print("CONFIGURAZIONE MODELLI (Memory-Optimized)")
print("="*70 + "\n")

# Random Forest config
rf_config = {
    'n_estimators': 50,
    'max_depth': 20,
    'min_samples_split': 10,
    'min_samples_leaf': 5,
    'class_weight': 'balanced',
    'random_state': RANDOM_SEED,
    'n_jobs': 4,
    'verbose': 0,
    'max_samples': 0.8
}

# LightGBM config
lgb_config = {
    'n_estimators': 100,
    'max_depth': 7,
    'learning_rate': 0.05,
    'min_child_samples': 50,
    'num_leaves': 31,
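    # NOTE: per the LightGBM docs, is_unbalance only applies to binary and multiclassova
    # objectives; for a multi-class target, class_weight='balanced' is the documented option
    'is_unbalance': True,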
    'random_state': RANDOM_SEED,
    'n_jobs': 4,
    'verbose': -1
}

# MLP config
mlp_config = {
    'hidden_layer_sizes': (256, 128, 64),
    'activation': 'relu',
    'solver': 'adam',
    'alpha': 0.0001,
    'batch_size': 512,
    'max_iter': 50,
    'early_stopping': True,
    'validation_fraction': 0.1,
    'random_state': RANDOM_SEED,
    'verbose': False
}

print("Random Forest:")
for k, v in rf_config.items():
    print(f"   {k}: {v}")

print("\nLightGBM:")
for k, v in lgb_config.items():
    print(f"   {k}: {v}")

print("\nMLP:")
for k, v in mlp_config.items():
    print(f"   {k}: {v}")


---

# SECTION 4.5: Baseline Cross-Validation (5-Fold)

## Methodology

**StratifiedKFold (n_splits=5)**:
- Keeps class proportions in every fold
- Reduces the variance of the estimates compared with a single train/val split
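
What "keeps class proportions" means can be shown with a simplified stand-in for `StratifiedKFold(shuffle=True)`: shuffle the indices of each class and deal them round-robin into the folds. The helper name and the toy labels are assumptions; the real splitter handles remainders and edge cases differently.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits=5, seed=42):
    """Assign sample indices to folds so each fold keeps the class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    folds = [[] for _ in range(n_splits)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % n_splits].append(idx)  # deal indices round-robin
    return folds


# Toy labels: 80% / 15% / 5% class mix
labels = ["Benign"] * 80 + ["DoS"] * 15 + ["Bot"] * 5
folds = stratified_folds(labels)
for k, fold in enumerate(folds):
    print(k, dict(Counter(labels[i] for i in fold)))
```

Each fold ends up with the same 80/15/5 mix as the full set, so even the rarest class is represented in every validation fold.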


**Metrics tracked**:
- **Macro F1** (primary): Unweighted mean of per-class F1
- **Weighted F1**: Weighted by support
- **Accuracy**: Overall correct rate

**Models trained**:
1. Random Forest Baseline (`class_weight='balanced'`)
2. LightGBM Baseline (`is_unbalance=True`)
3. Neural Network Baseline (no resampling)

## Output
- Table: Model | Macro F1 (mean ± std) | Weighted F1 | Accuracy
- Console: Progress log for each fold
- Training time per model


In [None]:
# SECTION 4.5: BASELINE CROSS-VALIDATION

print("\n" + "="*70)
print("BASELINE MODELS - 5-FOLD CROSS-VALIDATION")
print("="*70 + "\n")

# StratifiedKFold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=RANDOM_SEED)

# Metrics to compute
scoring = {
    'macro_f1': 'f1_macro',
    'weighted_f1': 'f1_weighted',
    'accuracy': 'accuracy'
}

# Storage results
baseline_results = {}

# 1. Random Forest Baseline
print(" Random Forest Baseline...")
rf_baseline = RandomForestClassifier(**rf_config)

start_time = time.time()
cv_results_rf = cross_validate(
    rf_baseline, X_train, y_train_encoded,
    cv=skf, scoring=scoring, n_jobs=1, verbose=0
)
train_time_rf = time.time() - start_time

baseline_rf_macro = cv_results_rf['test_macro_f1'].mean()
baseline_rf_std = cv_results_rf['test_macro_f1'].std()
baseline_rf_weighted = cv_results_rf['test_weighted_f1'].mean()
baseline_rf_acc = cv_results_rf['test_accuracy'].mean()

baseline_results['RF Baseline'] = {
    'Macro F1': baseline_rf_macro,
    'Std': baseline_rf_std,
    'Weighted F1': baseline_rf_weighted,
    'Accuracy': baseline_rf_acc,
    'Train Time (s)': train_time_rf
}

print(f"    Macro F1: {baseline_rf_macro:.4f} ± {baseline_rf_std:.4f}")
print(f"    Train time: {train_time_rf:.1f}s\n")

# 2. LightGBM Baseline
print(" LightGBM Baseline...")
lgb_baseline = lgb.LGBMClassifier(**lgb_config)

start_time = time.time()
cv_results_lgb = cross_validate(
    lgb_baseline, X_train, y_train_encoded,
    cv=skf, scoring=scoring, n_jobs=1, verbose=0
)
train_time_lgb = time.time() - start_time

baseline_lgb_macro = cv_results_lgb['test_macro_f1'].mean()
baseline_lgb_std = cv_results_lgb['test_macro_f1'].std()
baseline_lgb_weighted = cv_results_lgb['test_weighted_f1'].mean()
baseline_lgb_acc = cv_results_lgb['test_accuracy'].mean()

baseline_results['LGB Baseline'] = {
    'Macro F1': baseline_lgb_macro,
    'Std': baseline_lgb_std,
    'Weighted F1': baseline_lgb_weighted,
    'Accuracy': baseline_lgb_acc,
    'Train Time (s)': train_time_lgb
}

print(f"    Macro F1: {baseline_lgb_macro:.4f} ± {baseline_lgb_std:.4f}")
print(f"    Train time: {train_time_lgb:.1f}s\n")

# 3. MLP Baseline (with StandardScaler)
print(" Neural Network Baseline...")
mlp_baseline = Pipeline([
    ('scaler', StandardScaler()),
    ('mlp', MLPClassifier(**mlp_config))
])

start_time = time.time()
cv_results_mlp = cross_validate(
    mlp_baseline, X_train, y_train_encoded,
    cv=skf, scoring=scoring, n_jobs=1, verbose=0
)
train_time_mlp = time.time() - start_time

baseline_mlp_macro = cv_results_mlp['test_macro_f1'].mean()
baseline_mlp_std = cv_results_mlp['test_macro_f1'].std()
baseline_mlp_weighted = cv_results_mlp['test_weighted_f1'].mean()
baseline_mlp_acc = cv_results_mlp['test_accuracy'].mean()

baseline_results['MLP Baseline'] = {
    'Macro F1': baseline_mlp_macro,
    'Std': baseline_mlp_std,
    'Weighted F1': baseline_mlp_weighted,
    'Accuracy': baseline_mlp_acc,
    'Train Time (s)': train_time_mlp
}

print(f"    Macro F1: {baseline_mlp_macro:.4f} ± {baseline_mlp_std:.4f}")
print(f"    Train time: {train_time_mlp:.1f}s\n")

# Results table
baseline_df = pd.DataFrame(baseline_results).T
baseline_df['Macro F1 (±std)'] = baseline_df.apply(
    lambda row: f"{row['Macro F1']:.4f} ± {row['Std']:.4f}", axis=1
)

print("="*70)
print("BASELINE CV RESULTS (5-Fold)")
print("="*70)
print(baseline_df[['Macro F1 (±std)', 'Weighted F1', 'Accuracy', 'Train Time (s)']].to_string())
print("\nBaseline CV complete")


---

# SECTION 4.6: SMOTE Cross-Validation (5-Fold)

## CRITICAL: SMOTE Inside CV Loop

**Rigorous methodology** (no data leakage):
```python
for train_idx, val_idx in skf.split(X_train, y_train_encoded):
    X_train_fold, y_train_fold = X_train.iloc[train_idx], y_train_encoded[train_idx]
    X_val_fold, y_val_fold = X_train.iloc[val_idx], y_train_encoded[val_idx]

    # SMOTE on the training fold only
    X_train_smote, y_train_smote = SMOTE().fit_resample(X_train_fold, y_train_fold)

    # Validation fold: NO SMOTE
    model.fit(X_train_smote, y_train_smote)
    score = model.score(X_val_fold, y_val_fold)  # original samples only, no synthetic data
```

Why this is CRITICAL:

Applying SMOTE to the whole train+val set → data leakage (synthetic samples contaminate validation)

Validation must simulate the test set (real data only)


In [None]:

# SECTION 4.6: SMOTE CROSS-VALIDATION (INSIDE LOOP)

print("\n" + "="*70)
print("SMOTE MODELS - 5-FOLD CROSS-VALIDATION (Inside Loop)")
print("="*70 + "\n")

# Storage results
smote_results = {}

# Helper function: manual CV with SMOTE inside the loop
def smote_cv(model, X, y, cv, smote_strategy_config):
    """Cross-validation with SMOTE applied only to the training fold."""
    macro_f1_scores = []
    weighted_f1_scores = []
    accuracy_scores = []

    for fold_idx, (train_idx, val_idx) in enumerate(cv.split(X, y)):
        # Split fold
        X_train_fold = X.iloc[train_idx]
        y_train_fold = y[train_idx]
        X_val_fold = X.iloc[val_idx]
        y_val_fold = y[val_idx]

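        # Apply SMOTE ONLY to the training fold. The targets were computed on the full
        # training set, so rescale them to this fold's size; otherwise classes meant to
        # stay unchanged (including the majority) would also be oversampled.
        fold_counts = np.bincount(y_train_fold, minlength=len(smote_strategy_config))
        fold_ratio = len(train_idx) / len(y)
        fold_strategy = {cls_idx: max(int(target * fold_ratio), int(fold_counts[cls_idx]))
                         for cls_idx, target in smote_strategy_config.items()}
        smote = SMOTE(sampling_strategy=fold_strategy, random_state=RANDOM_SEED)
        X_train_smote, y_train_smote = smote.fit_resample(X_train_fold, y_train_fold)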

        # Train model
        model.fit(X_train_smote, y_train_smote)

        #  Evaluate on ORIGINAL validation fold (no SMOTE)
        y_val_pred = model.predict(X_val_fold)

        macro_f1_scores.append(f1_score(y_val_fold, y_val_pred, average='macro'))
        weighted_f1_scores.append(f1_score(y_val_fold, y_val_pred, average='weighted'))
        accuracy_scores.append(accuracy_score(y_val_fold, y_val_pred))

    return {
        'macro_f1': np.array(macro_f1_scores),
        'weighted_f1': np.array(weighted_f1_scores),
        'accuracy': np.array(accuracy_scores)
    }

# 1. Random Forest + SMOTE
print(" Random Forest + SMOTE...")
rf_smote = RandomForestClassifier(**rf_config)

start_time = time.time()
cv_results_rf_smote = smote_cv(rf_smote, X_train, y_train_encoded, skf, smote_strategy_config)
train_time_rf_smote = time.time() - start_time

smote_rf_macro = cv_results_rf_smote['macro_f1'].mean()
smote_rf_std = cv_results_rf_smote['macro_f1'].std()
smote_rf_weighted = cv_results_rf_smote['weighted_f1'].mean()
smote_rf_acc = cv_results_rf_smote['accuracy'].mean()

smote_results['RF SMOTE'] = {
    'Macro F1': smote_rf_macro,
    'Std': smote_rf_std,
    'Weighted F1': smote_rf_weighted,
    'Accuracy': smote_rf_acc,
    'Train Time (s)': train_time_rf_smote
}

print(f"    Macro F1: {smote_rf_macro:.4f} ± {smote_rf_std:.4f}")
print(f"    Train time: {train_time_rf_smote:.1f}s\n")

# 2. LightGBM + SMOTE
print(" LightGBM + SMOTE...")
lgb_smote = lgb.LGBMClassifier(**lgb_config)

start_time = time.time()
cv_results_lgb_smote = smote_cv(lgb_smote, X_train, y_train_encoded, skf, smote_strategy_config)
train_time_lgb_smote = time.time() - start_time

smote_lgb_macro = cv_results_lgb_smote['macro_f1'].mean()
smote_lgb_std = cv_results_lgb_smote['macro_f1'].std()
smote_lgb_weighted = cv_results_lgb_smote['weighted_f1'].mean()
smote_lgb_acc = cv_results_lgb_smote['accuracy'].mean()

smote_results['LGB SMOTE'] = {
    'Macro F1': smote_lgb_macro,
    'Std': smote_lgb_std,
    'Weighted F1': smote_lgb_weighted,
    'Accuracy': smote_lgb_acc,
    'Train Time (s)': train_time_lgb_smote
}

print(f"    Macro F1: {smote_lgb_macro:.4f} ± {smote_lgb_std:.4f}")
print(f"    Train time: {train_time_lgb_smote:.1f}s\n")

# 3. MLP + SMOTE
print(" Neural Network + SMOTE...")
# For the MLP, StandardScaler must be applied AFTER SMOTE
def smote_cv_mlp(mlp_config, X, y, cv, smote_strategy_config):
    """MLP cross-validation with SMOTE + StandardScaler."""
    macro_f1_scores = []
    weighted_f1_scores = []
    accuracy_scores = []

    for train_idx, val_idx in cv.split(X, y):
        X_train_fold = X.iloc[train_idx]
        y_train_fold = y[train_idx]
        X_val_fold = X.iloc[val_idx]
        y_val_fold = y[val_idx]

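        # SMOTE on the training fold only, with the full-train targets rescaled to the
        # fold size so that unchanged classes (including the majority) are not oversampled
        fold_counts = np.bincount(y_train_fold, minlength=len(smote_strategy_config))
        fold_ratio = len(train_idx) / len(y)
        fold_strategy = {cls_idx: max(int(target * fold_ratio), int(fold_counts[cls_idx]))
                         for cls_idx, target in smote_strategy_config.items()}
        smote = SMOTE(sampling_strategy=fold_strategy, random_state=RANDOM_SEED)
        X_train_smote, y_train_smote = smote.fit_resample(X_train_fold, y_train_fold)

        # StandardScaler AFTER SMOTE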
        scaler = StandardScaler()
        X_train_scaled = scaler.fit_transform(X_train_smote)
        X_val_scaled = scaler.transform(X_val_fold)

        # Train MLP
        mlp = MLPClassifier(**mlp_config)
        mlp.fit(X_train_scaled, y_train_smote)

        # Evaluate
        y_val_pred = mlp.predict(X_val_scaled)
        macro_f1_scores.append(f1_score(y_val_fold, y_val_pred, average='macro'))
        weighted_f1_scores.append(f1_score(y_val_fold, y_val_pred, average='weighted'))
        accuracy_scores.append(accuracy_score(y_val_fold, y_val_pred))

    return {
        'macro_f1': np.array(macro_f1_scores),
        'weighted_f1': np.array(weighted_f1_scores),
        'accuracy': np.array(accuracy_scores)
    }

start_time = time.time()
cv_results_mlp_smote = smote_cv_mlp(mlp_config, X_train, y_train_encoded, skf, smote_strategy_config)
train_time_mlp_smote = time.time() - start_time

smote_mlp_macro = cv_results_mlp_smote['macro_f1'].mean()
smote_mlp_std = cv_results_mlp_smote['macro_f1'].std()
smote_mlp_weighted = cv_results_mlp_smote['weighted_f1'].mean()
smote_mlp_acc = cv_results_mlp_smote['accuracy'].mean()

smote_results['MLP SMOTE'] = {
    'Macro F1': smote_mlp_macro,
    'Std': smote_mlp_std,
    'Weighted F1': smote_mlp_weighted,
    'Accuracy': smote_mlp_acc,
    'Train Time (s)': train_time_mlp_smote
}

print(f"    Macro F1: {smote_mlp_macro:.4f} ± {smote_mlp_std:.4f}")
print(f"    Train time: {train_time_mlp_smote:.1f}s\n")

# Results table
smote_df = pd.DataFrame(smote_results).T
smote_df['Macro F1 (±std)'] = smote_df.apply(
    lambda row: f"{row['Macro F1']:.4f} ± {row['Std']:.4f}", axis=1
)

print("="*70)
print("SMOTE CV RESULTS (5-Fold, Inside Loop)")
print("="*70)
print(smote_df[['Macro F1 (±std)', 'Weighted F1', 'Accuracy', 'Train Time (s)']].to_string())
print("\nSMOTE CV complete")

# Save CSV
cv_comparison = pd.concat([baseline_df, smote_df])
cv_comparison.to_csv(os.path.join(OUTPUT_DIR, 'cv_results_baseline_vs_smote.csv'))
print("\nCV results saved: cv_results_baseline_vs_smote.csv")


---

# SECTION 5: Model Selection on Validation Set

## What this section does

- We train **6 models**:
  - 3 baseline: RF, LightGBM, MLP
  - 3 SMOTE: RF+SMOTE, LGB+SMOTE, MLP+SMOTE
- We evaluate each one on the **validation set**:
  - Macro F1 (primary)
  - Weighted F1
  - Accuracy
- We select the **best model** (highest Macro F1)
- We save:
  - `validation_model_selection.csv`
  - `05_validation_model_selection.png` (bar chart baseline vs SMOTE)
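
The selection rule itself is a one-liner once the validation metrics sit in a dictionary. The sketch below uses placeholder model names and made-up scores purely to show the mechanics; they are not results from this notebook, and `select_best` is a hypothetical helper.

```python
import csv
import io


def select_best(results, metric="macro_f1"):
    """Return the model name with the highest value of `metric`."""
    return max(results, key=lambda name: results[name][metric])


# Placeholder scores for illustration only (NOT results from this notebook)
placeholder = {
    "model_a": {"macro_f1": 0.80, "weighted_f1": 0.95, "accuracy": 0.96},
    "model_b": {"macro_f1": 0.83, "weighted_f1": 0.94, "accuracy": 0.95},
    "model_c": {"macro_f1": 0.70, "weighted_f1": 0.92, "accuracy": 0.93},
}
best = select_best(placeholder)

# Write a ranking table as CSV text, sorted by the primary metric
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["model", "macro_f1", "weighted_f1", "accuracy"])
ranked = sorted(placeholder.items(), key=lambda kv: kv[1]["macro_f1"], reverse=True)
for name, m in ranked:
    writer.writerow([name, m["macro_f1"], m["weighted_f1"], m["accuracy"]])

print("Best model:", best)
print(buffer.getvalue())
```

Note that `model_b` wins on Macro F1 even though `model_a` has higher accuracy, which is exactly the case where the choice of primary metric matters.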

---

### Interpretation: Validation Results

**Best performing model:**
- [identify the model with the highest F1-macro]
- Possible reasons: [e.g. does SMOTE help? tree-based vs neural?]

**SMOTE Impact:**
- Compare baseline vs SMOTE for each algorithm
- Does SMOTE improve recall on minority classes? Is there a trade-off with precision?

**Algorithm Comparison:**
- Random Forest vs LightGBM: which performs better?
- Is the MLP competitive? Considerations: training time, interpretability

**Next Steps:**
- Detailed per-model analysis (Section 6)
- Confusion matrix, feature importance, misclassification

---



In [None]:
# SECTION 5: MODEL CONFIGURATIONS

print("\n" + "="*70)
print("SECTION 5: MODEL TRAINING & VALIDATION")
print("="*70 + "\n")

# Model configurations
rf_config = {
    'n_estimators': 200,
    'max_depth': 25,
    'min_samples_split': 5,
    'min_samples_leaf': 2,
    'random_state': 42,
    'n_jobs': -1,
    'class_weight': 'balanced'  # handles imbalance internally
}

lgb_config = {
    'n_estimators': 200,
    'max_depth': 25,
    'learning_rate': 0.1,
    'num_leaves': 50,
    'random_state': 42,
    'n_jobs': -1,
    'verbose': -1,
    'class_weight': 'balanced'
}

mlp_config_dict = {
    'hidden_layers': [256, 128, 64],
    'dropout_rate': 0.3,
    'learning_rate': 0.001,
    'batch_size': 64,
    'epochs': 100,
    'patience': 15
}

print("Configurations defined\n")


In [None]:
# SECTION 5.1: DATA SCALING (required for the MLP)

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(X_train)
x_val_scaled = scaler.transform(X_val)
x_test_scaled = scaler.transform(X_test)  # scale the test set too, for later use

print(" Feature scaling completato")
print(f"  Train shape: {x_train_scaled.shape}")
print(f"  Val shape:   {x_val_scaled.shape}")
print(f"  Test shape:  {x_test_scaled.shape}\n")


In [None]:
# SECTION 5.2: BASELINE MODELS (no SMOTE)

from sklearn.ensemble import RandomForestClassifier
import lightgbm as lgb
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping

print("="*70)
print("TRAINING BASELINE MODELS (no resampling)")
print("="*70 + "\n")

# ========== 1) RANDOM FOREST BASELINE ==========
print("[1/3] Training Random Forest Baseline...")
rf_baseline = RandomForestClassifier(**rf_config)
rf_baseline.fit(X_train, y_train_encoded)

val_pred_rf = rf_baseline.predict(X_val)
acc_rf_val = accuracy_score(y_val_encoded, val_pred_rf)
f1_rf_val = f1_score(y_val_encoded, val_pred_rf, average='macro')

print(f"   RF Baseline trained")
print(f"    Validation Accuracy: {acc_rf_val:.4f}")
print(f"    Validation F1-Macro: {f1_rf_val:.4f}\n")

# ========== 2) LIGHTGBM BASELINE ==========
print("[2/3] Training LightGBM Baseline...")
lgb_baseline = lgb.LGBMClassifier(**lgb_config)
lgb_baseline.fit(X_train, y_train_encoded)

val_pred_lgb = lgb_baseline.predict(X_val)
acc_lgb_val = accuracy_score(y_val_encoded, val_pred_lgb)
f1_lgb_val = f1_score(y_val_encoded, val_pred_lgb, average='macro')

print(f"   LGB Baseline trained")
print(f"    Validation Accuracy: {acc_lgb_val:.4f}")
print(f"    Validation F1-Macro: {f1_lgb_val:.4f}\n")

# ========== 3) MLP BASELINE ==========
print("[3/3] Training MLP Baseline...")

# Build model
mlp_baseline = keras.Sequential()
mlp_baseline.add(layers.Input(shape=(X_train.shape[1],)))

for units in mlp_config_dict['hidden_layers']:
    mlp_baseline.add(layers.Dense(units, activation='relu'))
    mlp_baseline.add(layers.Dropout(mlp_config_dict['dropout_rate']))

mlp_baseline.add(layers.Dense(len(class_names), activation='softmax'))

mlp_baseline.compile(
    optimizer=keras.optimizers.Adam(learning_rate=mlp_config_dict['learning_rate']),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

early_stop = EarlyStopping(
    monitor='val_loss',
    patience=mlp_config_dict['patience'],
    restore_best_weights=True,
    verbose=0
)

history_mlp = mlp_baseline.fit(
    x_train_scaled, y_train_encoded,
    validation_data=(x_val_scaled, y_val_encoded),
    epochs=mlp_config_dict['epochs'],
    batch_size=mlp_config_dict['batch_size'],
    callbacks=[early_stop],
    verbose=0
)

val_pred_mlp_proba = mlp_baseline.predict(x_val_scaled, verbose=0)
val_pred_mlp = np.argmax(val_pred_mlp_proba, axis=1)
acc_mlp_val = accuracy_score(y_val_encoded, val_pred_mlp)
f1_mlp_val = f1_score(y_val_encoded, val_pred_mlp, average='macro')

print(f"   MLP Baseline trained ({len(history_mlp.history['loss'])} epochs)")
print(f"    Validation Accuracy: {acc_mlp_val:.4f}")
print(f"    Validation F1-Macro: {f1_mlp_val:.4f}\n")

print("="*70)
print("BASELINE MODELS TRAINING COMPLETED")
print("="*70 + "\n")


In [None]:
# SECTION 5.3: SMOTE MODELS

from imblearn.over_sampling import SMOTE

print("="*70)
print("TRAINING SMOTE MODELS (partially rebalanced training set)")
print("="*70 + "\n")

# Apply SMOTE to the training set
print("Applying SMOTE to training data...")

smote_sampler = SMOTE(sampling_strategy=smote_strategy_config, random_state=42)
x_train_smote, y_train_smote = smote_sampler.fit_resample(X_train, y_train_encoded)


print(f"  Original training samples: {X_train.shape[0]}")
print(f"  SMOTE training samples:    {x_train_smote.shape[0]}")
print(f"   SMOTE applied\n")

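# Scale the SMOTE version for the MLP, reusing the scaler fitted on the original
# training set so that it stays consistent with x_val_scaled and x_test_scaled
x_train_smote_scaled = scaler.transform(x_train_smote)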

# ========== 4) RANDOM FOREST + SMOTE ==========
print("[1/3] Training Random Forest + SMOTE...")
# Drop class_weight (SMOTE already rebalances the training set)
rf_smote_config = {k: v for k, v in rf_config.items() if k != 'class_weight'}
rf_smote = RandomForestClassifier(**rf_smote_config)
rf_smote.fit(x_train_smote, y_train_smote)

val_pred_rf_smote = rf_smote.predict(X_val)
acc_rf_smote_val = accuracy_score(y_val_encoded, val_pred_rf_smote)
f1_rf_smote_val = f1_score(y_val_encoded, val_pred_rf_smote, average='macro')

print(f"   RF + SMOTE trained")
print(f"    Validation Accuracy: {acc_rf_smote_val:.4f}")
print(f"    Validation F1-Macro: {f1_rf_smote_val:.4f}\n")

# ========== 5) LIGHTGBM + SMOTE ==========
print("[2/3] Training LightGBM + SMOTE...")
lgb_smote_config = {k: v for k, v in lgb_config.items() if k != 'class_weight'}
lgb_smote = lgb.LGBMClassifier(**lgb_smote_config)
lgb_smote.fit(x_train_smote, y_train_smote)

val_pred_lgb_smote = lgb_smote.predict(X_val)
acc_lgb_smote_val = accuracy_score(y_val_encoded, val_pred_lgb_smote)
f1_lgb_smote_val = f1_score(y_val_encoded, val_pred_lgb_smote, average='macro')

print(f"   LGB + SMOTE trained")
print(f"    Validation Accuracy: {acc_lgb_smote_val:.4f}")
print(f"    Validation F1-Macro: {f1_lgb_smote_val:.4f}\n")

# ========== 6) MLP + SMOTE ==========
print("[3/3] Training MLP + SMOTE...")

mlp_smote = keras.Sequential()
mlp_smote.add(layers.Input(shape=(x_train_smote.shape[1],)))

for units in mlp_config_dict['hidden_layers']:
    mlp_smote.add(layers.Dense(units, activation='relu'))
    mlp_smote.add(layers.Dropout(mlp_config_dict['dropout_rate']))

mlp_smote.add(layers.Dense(len(class_names), activation='softmax'))

mlp_smote.compile(
    optimizer=keras.optimizers.Adam(learning_rate=mlp_config_dict['learning_rate']),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

early_stop_smote = EarlyStopping(
    monitor='val_loss',
    patience=mlp_config_dict['patience'],
    restore_best_weights=True,
    verbose=0
)

history_mlp_smote = mlp_smote.fit(
    x_train_smote_scaled, y_train_smote,
    validation_data=(x_val_scaled, y_val_encoded),
    epochs=mlp_config_dict['epochs'],
    batch_size=mlp_config_dict['batch_size'],
    callbacks=[early_stop_smote],
    verbose=0
)

val_pred_mlp_smote_proba = mlp_smote.predict(x_val_scaled, verbose=0)
val_pred_mlp_smote = np.argmax(val_pred_mlp_smote_proba, axis=1)
acc_mlp_smote_val = accuracy_score(y_val_encoded, val_pred_mlp_smote)
f1_mlp_smote_val = f1_score(y_val_encoded, val_pred_mlp_smote, average='macro')

print(f"   MLP + SMOTE trained ({len(history_mlp_smote.history['loss'])} epochs)")
print(f"    Validation Accuracy: {acc_mlp_smote_val:.4f}")
print(f"    Validation F1-Macro: {f1_mlp_smote_val:.4f}\n")

print("="*70)
print("SMOTE MODELS TRAINING COMPLETED")
print("="*70 + "\n")
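
The SMOTE cell above passes `smote_strategy_config` as `sampling_strategy`. For over-samplers in imbalanced-learn, a dict maps each class to the desired number of samples after resampling. The sketch below shows one plausible way to build such a dict for a partial (not full) rebalance. The helper name `partial_smote_strategy`, the `target_ratio` rule, and the class counts are assumptions for illustration; the notebook's actual configuration is defined elsewhere and may follow a different rule.

```python
from collections import Counter


def partial_smote_strategy(labels, target_ratio=0.7):
    """Hypothetical helper: build a {class: n_samples} dict for partial oversampling.

    Every class smaller than target_ratio * majority_size is raised to that size.
    Classes already at or above the target are left out of the dict, so an
    over-sampler would not touch them.
    """
    counts = Counter(labels)
    majority_size = max(counts.values())
    target = int(majority_size * target_ratio)
    # Requested sizes must not be smaller than the current class size,
    # hence the `n < target` filter.
    return {cls: target for cls, n in counts.items() if n < target}


# Made-up class distribution (not taken from the notebook's dataset)
labels = ["Benign"] * 1000 + ["DoS"] * 300 + ["Bot"] * 20
strategy = partial_smote_strategy(labels, target_ratio=0.5)
print(strategy)
```

The resulting dict could then be passed as `SMOTE(sampling_strategy=strategy, random_state=42)`. Stopping short of a full rebalance limits the number of synthetic samples and the training-time overhead.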


In [None]:
# SECTION 5.4: VALIDATION RESULTS COMPARISON (Fixed)

results_val = pd.DataFrame({
    'Model': [
        'RF_Baseline', 'LGB_Baseline', 'MLP_Baseline',
        'RF_SMOTE', 'LGB_SMOTE', 'MLP_SMOTE'
    ],
    'Validation_Accuracy': [
        acc_rf_val, acc_lgb_val, acc_mlp_val,
        acc_rf_smote_val, acc_lgb_smote_val, acc_mlp_smote_val
    ],
    'Validation_F1_Macro': [
        f1_rf_val, f1_lgb_val, f1_mlp_val,
        f1_rf_smote_val, f1_lgb_smote_val, f1_mlp_smote_val
    ]
})

results_val = results_val.sort_values('Validation_F1_Macro', ascending=False).reset_index(drop=True)

print("="*70)
print("VALIDATION RESULTS SUMMARY")
print("="*70 + "\n")
print(results_val.to_string(index=False))

results_val.to_csv(os.path.join(OUTPUT_DIR, '05_validation_results_comparison.csv'), index=False)
print("\n Results saved to:", os.path.join(OUTPUT_DIR, '05_validation_results_comparison.csv'))

# Plot comparison (Fixed: no xlim restriction, better colors)
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

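# Accuracy - one color per model family, looked up by name so colors stay correct after sorting
family_colors = {'RF': '#2E86AB', 'LGB': '#A23B72', 'MLP': '#F18F01'}
colors_acc = [family_colors[name.split('_')[0]] for name in results_val['Model']]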
axes[0].barh(results_val['Model'], results_val['Validation_Accuracy'],
             color=colors_acc, alpha=0.85, edgecolor='black')
axes[0].set_xlabel('Validation Accuracy', fontweight='bold', fontsize=11)
axes[0].set_title('Model Comparison: Validation Accuracy', fontweight='bold', fontsize=12)
axes[0].grid(axis='x', alpha=0.3, linestyle='--')
# Adaptive lower x-limit: start just below the lowest score so differences stay visible
axes[0].set_xlim([results_val['Validation_Accuracy'].min() - 0.05, 1.0])

# F1-Macro
axes[1].barh(results_val['Model'], results_val['Validation_F1_Macro'],
             color=colors_acc, alpha=0.85, edgecolor='black')
axes[1].set_xlabel('Validation F1-Macro', fontweight='bold', fontsize=11)
axes[1].set_title('Model Comparison: Validation F1-Macro', fontweight='bold', fontsize=12)
axes[1].grid(axis='x', alpha=0.3, linestyle='--')
axes[1].set_xlim([results_val['Validation_F1_Macro'].min() - 0.05, 1.0])

plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '05_validation_results_comparison.png'), dpi=150, bbox_inches='tight')
plt.show()

print("\n Plot saved\n")
print("="*70)
print("SECTION 5 COMPLETED")
print("="*70 + "\n")
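
The comparison above ranks models by macro F1 rather than accuracy. A small worked example may make the reason concrete. The 2x2 confusion matrix below is made up for illustration (it is not a result from this notebook), and `per_class_f1` is a hypothetical helper written in plain Python.

```python
def per_class_f1(cm):
    """Return the F1 score of each class from a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                          # true class k, predicted as another class
        fp = sum(cm[r][k] for r in range(n)) - tp     # another class, predicted as k
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Made-up imbalanced case: 950 benign flows, 50 attack flows
cm = [
    [950, 0],   # true Benign: all classified correctly
    [40, 10],   # true Attack: 40 of 50 missed
]

total = sum(sum(row) for row in cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / total
f1_scores = per_class_f1(cm)
macro_f1 = sum(f1_scores) / len(f1_scores)

print(f"accuracy = {accuracy:.3f}")
print(f"macro F1 = {macro_f1:.3f}")
```

Here accuracy is 0.96 even though 80% of the attacks are missed, while macro F1 drops to roughly 0.66 because the attack class counts as much as the benign class.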


---

# SECTION 6: Detailed Per-Model Analysis

## Objective

Analyze in detail how EACH model (6 in total) behaves on the **validation set**:

- Annotated confusion matrix (8x8)
- Per-class metrics table (Precision, Recall, F1, Support)
- Feature importance (RF/LightGBM only)
- Misclassification analysis (top 10 most frequent errors)
- Written commentary (interpretation of the results)

## Structure

- 6.1: Random Forest Baseline
- 6.2: LightGBM Baseline
- 6.3: MLP Baseline
- 6.4: Random Forest + SMOTE
- 6.5: LightGBM + SMOTE
- 6.6: MLP + SMOTE

The entire analysis is performed on the **validation set** (no test leakage).
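
The misclassification step is the same for all six models: collect the off-diagonal cells of the confusion matrix and keep the largest ones. A minimal sketch of that step as a reusable function follows. The name `top_confusions` and the toy matrix are assumptions for illustration; the cells below inline the same logic with explicit loops.

```python
def top_confusions(cm, class_names, k=10):
    """Return the k largest off-diagonal cells as (true, predicted, count) tuples.

    `cm` is a square confusion matrix (rows = true class, columns = predicted
    class), given as a nested list or a NumPy array.
    """
    pairs = [
        (class_names[t], class_names[p], int(cm[t][p]))
        for t in range(len(class_names))
        for p in range(len(class_names))
        if t != p and cm[t][p] > 0
    ]
    # Largest error counts first; sorted() is stable, so ties keep matrix order
    return sorted(pairs, key=lambda item: item[2], reverse=True)[:k]


# Toy 3-class example (made-up counts)
names = ["Benign", "DoS", "PortScan"]
cm = [
    [90, 7, 3],
    [12, 80, 0],
    [1, 0, 50],
]
top = top_confusions(cm, names, k=2)
for true_cls, pred_cls, count in top:
    print(f"{true_cls} -> {pred_cls}: {count}")
```

Wrapping the list in `pd.DataFrame(top, columns=['True_Class', 'Predicted_Class', 'Count'])` would give the same table layout the per-model cells save to CSV.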


In [None]:
# CUSTOM COLORMAP shared by all confusion matrices
# Uses a white -> dark blue scale for best readability

import matplotlib.colors as mcolors

# Build the custom colormap: white -> light blue -> dark blue
colors_cm = ['#FFFFFF', '#E3F2FD', '#90CAF9', '#42A5F5', '#1E88E5', '#1565C0', '#0D47A1']
n_bins = 100
cmap_custom = mcolors.LinearSegmentedColormap.from_list('custom_blues', colors_cm, N=n_bins)


In [None]:
# SECTION 6.1: RANDOM FOREST BASELINE - DETAILED ANALYSIS
print("\n" + "="*70)
print("SECTION 6.1: RANDOM FOREST BASELINE - DETAILED ANALYSIS")
print("="*70 + "\n")

# 1) Confusion Matrix
cm_rf = confusion_matrix(y_val_encoded, val_pred_rf)

fig, ax = plt.subplots(figsize=(12, 10))
sns.heatmap(
    cm_rf, annot=True, fmt='d', cmap=cmap_custom,
    xticklabels=class_names, yticklabels=class_names,
    cbar_kws={'label': 'Count'}, ax=ax,
    linewidths=0.5, linecolor='gray',
    annot_kws={'fontsize': 9}
)
ax.set_xlabel('Predicted Label', fontweight='bold', fontsize=11)
ax.set_ylabel('True Label', fontweight='bold', fontsize=11)
ax.set_title('Random Forest Baseline - Confusion Matrix (Validation Set)',
             fontweight='bold', fontsize=13)
plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '06_1_rf_baseline_confusion_matrix.png'), dpi=150, bbox_inches='tight')
plt.show()
print(" Confusion matrix saved\n")

# 2) Per-class metrics
report_rf = classification_report(
    y_val_encoded, val_pred_rf,
    target_names=class_names,
    output_dict=True
)
perclass_rf = pd.DataFrame(report_rf).T.iloc[:-3, :]
perclass_rf['support'] = perclass_rf['support'].astype(int)
perclass_rf.index.name = 'Class'

print("Per-Class Metrics (RF Baseline):")
print(perclass_rf[['precision', 'recall', 'f1-score', 'support']].to_string())

perclass_rf.to_csv(os.path.join(OUTPUT_DIR, '06_1_rf_baseline_perclass_metrics.csv'))
print("\n Per-class metrics saved\n")

# 3) Feature importance (Top-20)
fi_rf = pd.DataFrame({
    'Feature': X_train.columns,
    'Importance': rf_baseline.feature_importances_
}).sort_values('Importance', ascending=False).head(20)

fig, ax = plt.subplots(figsize=(12, 8))
ax.barh(fi_rf['Feature'], fi_rf['Importance'], color='steelblue', alpha=0.9)
ax.invert_yaxis()
ax.set_xlabel('Gini Importance', fontweight='bold')
ax.set_title('Random Forest Baseline - Top 20 Features by Importance', fontweight='bold')
ax.grid(axis='x', linestyle='--', alpha=0.3)
plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '06_1_rf_baseline_feature_importance.png'), dpi=150)
plt.show()
print(" Feature importance saved\n")

fi_rf.to_csv(os.path.join(OUTPUT_DIR, '06_1_rf_baseline_feature_importance.csv'), index=False)

# 4) Misclassification analysis (top 10 errors)
misclass_rf = []
for true_idx in range(len(class_names)):
    for pred_idx in range(len(class_names)):
        if true_idx != pred_idx:
            count = cm_rf[true_idx, pred_idx]
            if count > 0:
                misclass_rf.append({
                    'True_Class': class_names[true_idx],
                    'Predicted_Class': class_names[pred_idx],
                    'Count': count
                })

misclass_rf_df = pd.DataFrame(misclass_rf).sort_values('Count', ascending=False).head(10)
print("Top-10 Misclassifications (RF Baseline):")
print(misclass_rf_df.to_string(index=False))
misclass_rf_df.to_csv(os.path.join(OUTPUT_DIR, '06_1_rf_baseline_top10_errors.csv'), index=False)
print("\n Misclassification analysis saved\n")

print("="*70)
print("SECTION 6.1: RANDOM FOREST BASELINE - COMPLETED")
print("="*70 + "\n")


In [None]:
# SECTION 6.2: LIGHTGBM BASELINE - DETAILED ANALYSIS

print("\n" + "="*70)
print("SECTION 6.2: LIGHTGBM BASELINE - DETAILED ANALYSIS")
print("="*70 + "\n")

# 1) Confusion Matrix
cm_lgb = confusion_matrix(y_val_encoded, val_pred_lgb)

fig, ax = plt.subplots(figsize=(12, 10))
sns.heatmap(
    cm_lgb, annot=True, fmt='d', cmap=cmap_custom,
    xticklabels=class_names, yticklabels=class_names,
    cbar_kws={'label': 'Count'}, ax=ax,
    linewidths=0.5, linecolor='gray',
    annot_kws={'fontsize': 9}
)
ax.set_xlabel('Predicted Label', fontweight='bold', fontsize=11)
ax.set_ylabel('True Label', fontweight='bold', fontsize=11)
ax.set_title('LightGBM Baseline - Confusion Matrix (Validation Set)',
             fontweight='bold', fontsize=13)
plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '06_2_lgb_baseline_confusion_matrix.png'), dpi=150, bbox_inches='tight')
plt.show()
print(" Confusion matrix saved\n")

# 2) Per-class metrics
report_lgb = classification_report(
    y_val_encoded, val_pred_lgb,
    target_names=class_names,
    output_dict=True
)
perclass_lgb = pd.DataFrame(report_lgb).T.iloc[:-3, :]
perclass_lgb['support'] = perclass_lgb['support'].astype(int)
perclass_lgb.index.name = 'Class'

print("Per-Class Metrics (LightGBM Baseline):")
print(perclass_lgb[['precision', 'recall', 'f1-score', 'support']].to_string())

perclass_lgb.to_csv(os.path.join(OUTPUT_DIR, '06_2_lgb_baseline_perclass_metrics.csv'))
print("\n Per-class metrics saved\n")

# 3) Feature importance (Top-20)
fi_lgb = pd.DataFrame({
    'Feature': X_train.columns,
    'Importance': lgb_baseline.feature_importances_
}).sort_values('Importance', ascending=False).head(20)

fig, ax = plt.subplots(figsize=(12, 8))
ax.barh(fi_lgb['Feature'], fi_lgb['Importance'], color='forestgreen', alpha=0.9)
ax.invert_yaxis()
ax.set_xlabel('Split Importance', fontweight='bold')
ax.set_title('LightGBM Baseline - Top 20 Features by Importance', fontweight='bold')
ax.grid(axis='x', linestyle='--', alpha=0.3)
plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '06_2_lgb_baseline_feature_importance.png'), dpi=150)
plt.show()
print(" Feature importance saved\n")

fi_lgb.to_csv(os.path.join(OUTPUT_DIR, '06_2_lgb_baseline_feature_importance.csv'), index=False)

# 4) Misclassification analysis
misclass_lgb = []
for true_idx in range(len(class_names)):
    for pred_idx in range(len(class_names)):
        if true_idx != pred_idx:
            count = cm_lgb[true_idx, pred_idx]
            if count > 0:
                misclass_lgb.append({
                    'True_Class': class_names[true_idx],
                    'Predicted_Class': class_names[pred_idx],
                    'Count': count
                })

misclass_lgb_df = pd.DataFrame(misclass_lgb).sort_values('Count', ascending=False).head(10)
print("Top-10 Misclassifications (LightGBM Baseline):")
print(misclass_lgb_df.to_string(index=False))
misclass_lgb_df.to_csv(os.path.join(OUTPUT_DIR, '06_2_lgb_baseline_top10_errors.csv'), index=False)
print("\n Misclassification analysis saved\n")

print("="*70)
print("SECTION 6.2: LIGHTGBM BASELINE - COMPLETED")
print("="*70 + "\n")


In [None]:
# SECTION 6.3: MLP BASELINE - DETAILED ANALYSIS

print("\n" + "="*70)
print("SECTION 6.3: MLP BASELINE - DETAILED ANALYSIS")
print("="*70 + "\n")

# 1) Confusion Matrix
cm_mlp = confusion_matrix(y_val_encoded, val_pred_mlp)

fig, ax = plt.subplots(figsize=(12, 10))
sns.heatmap(
    cm_mlp, annot=True, fmt='d', cmap=cmap_custom,
    xticklabels=class_names, yticklabels=class_names,
    cbar_kws={'label': 'Count'}, ax=ax,
    linewidths=0.5, linecolor='gray',
    annot_kws={'fontsize': 9}
)
ax.set_xlabel('Predicted Label', fontweight='bold', fontsize=11)
ax.set_ylabel('True Label', fontweight='bold', fontsize=11)
ax.set_title('MLP Baseline - Confusion Matrix (Validation Set)',
             fontweight='bold', fontsize=13)
plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '06_3_mlp_baseline_confusion_matrix.png'), dpi=150, bbox_inches='tight')
plt.show()
print(" Confusion matrix saved\n")

# 2) Per-class metrics
report_mlp = classification_report(
    y_val_encoded, val_pred_mlp,
    target_names=class_names,
    output_dict=True
)
perclass_mlp = pd.DataFrame(report_mlp).T.iloc[:-3, :]
perclass_mlp['support'] = perclass_mlp['support'].astype(int)
perclass_mlp.index.name = 'Class'

print("Per-Class Metrics (MLP Baseline):")
print(perclass_mlp[['precision', 'recall', 'f1-score', 'support']].to_string())

perclass_mlp.to_csv(os.path.join(OUTPUT_DIR, '06_3_mlp_baseline_perclass_metrics.csv'))
print("\n Per-class metrics saved\n")

# 3) Training history plot (loss and accuracy)
history_mlp_df = pd.DataFrame(history_mlp.history)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Loss
axes[0].plot(history_mlp_df.index, history_mlp_df['loss'], label='Train Loss', linewidth=2)
axes[0].plot(history_mlp_df.index, history_mlp_df['val_loss'], label='Val Loss', linewidth=2)
axes[0].set_xlabel('Epoch', fontweight='bold')
axes[0].set_ylabel('Loss', fontweight='bold')
axes[0].set_title('MLP Baseline - Training Loss', fontweight='bold')
axes[0].legend()
axes[0].grid(alpha=0.3)

# Accuracy
axes[1].plot(history_mlp_df.index, history_mlp_df['accuracy'], label='Train Accuracy', linewidth=2)
axes[1].plot(history_mlp_df.index, history_mlp_df['val_accuracy'], label='Val Accuracy', linewidth=2)
axes[1].set_xlabel('Epoch', fontweight='bold')
axes[1].set_ylabel('Accuracy', fontweight='bold')
axes[1].set_title('MLP Baseline - Training Accuracy', fontweight='bold')
axes[1].legend()
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '06_3_mlp_baseline_training_history.png'), dpi=150)
plt.show()
print(" Training history saved\n")

history_mlp_df.to_csv(os.path.join(OUTPUT_DIR, '06_3_mlp_baseline_training_history.csv'), index_label='Epoch')

# 4) Misclassification analysis
misclass_mlp = []
for true_idx in range(len(class_names)):
    for pred_idx in range(len(class_names)):
        if true_idx != pred_idx:
            count = cm_mlp[true_idx, pred_idx]
            if count > 0:
                misclass_mlp.append({
                    'True_Class': class_names[true_idx],
                    'Predicted_Class': class_names[pred_idx],
                    'Count': count
                })

misclass_mlp_df = pd.DataFrame(misclass_mlp).sort_values('Count', ascending=False).head(10)
print("Top-10 Misclassifications (MLP Baseline):")
print(misclass_mlp_df.to_string(index=False))
misclass_mlp_df.to_csv(os.path.join(OUTPUT_DIR, '06_3_mlp_baseline_top10_errors.csv'), index=False)
print("\n Misclassification analysis saved\n")

print("="*70)
print("SECTION 6.3: MLP BASELINE - COMPLETED")
print("="*70 + "\n")


In [None]:
# SECTION 6.4: RANDOM FOREST + SMOTE - DETAILED ANALYSIS

print("\n" + "="*70)
print("SECTION 6.4: RANDOM FOREST + SMOTE - DETAILED ANALYSIS")
print("="*70 + "\n")

# 1) Confusion Matrix
cm_rf_smote = confusion_matrix(y_val_encoded, val_pred_rf_smote)

fig, ax = plt.subplots(figsize=(12, 10))
sns.heatmap(
    cm_rf_smote, annot=True, fmt='d', cmap=cmap_custom,
    xticklabels=class_names, yticklabels=class_names,
    cbar_kws={'label': 'Count'}, ax=ax,
    linewidths=0.5, linecolor='gray',
    annot_kws={'fontsize': 9}
)
ax.set_xlabel('Predicted Label', fontweight='bold', fontsize=11)
ax.set_ylabel('True Label', fontweight='bold', fontsize=11)
ax.set_title('Random Forest + SMOTE - Confusion Matrix (Validation Set)',
             fontweight='bold', fontsize=13)
plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '06_4_rf_smote_confusion_matrix.png'), dpi=150, bbox_inches='tight')
plt.show()
print(" Confusion matrix saved\n")

# 2) Per-class metrics
report_rf_smote = classification_report(
    y_val_encoded, val_pred_rf_smote,
    target_names=class_names,
    output_dict=True
)
perclass_rf_smote = pd.DataFrame(report_rf_smote).T.iloc[:-3, :]
perclass_rf_smote['support'] = perclass_rf_smote['support'].astype(int)
perclass_rf_smote.index.name = 'Class'

print("Per-Class Metrics (RF + SMOTE):")
print(perclass_rf_smote[['precision', 'recall', 'f1-score', 'support']].to_string())

perclass_rf_smote.to_csv(os.path.join(OUTPUT_DIR, '06_4_rf_smote_perclass_metrics.csv'))
print("\n Per-class metrics saved\n")

# 3) Baseline vs SMOTE comparison (RF)
comparison_rf = pd.DataFrame({
    'Class': class_names,
    'F1_Baseline': [report_rf[c]['f1-score'] for c in class_names],
    'F1_SMOTE': [report_rf_smote[c]['f1-score'] for c in class_names]
})
comparison_rf['Delta_F1'] = comparison_rf['F1_SMOTE'] - comparison_rf['F1_Baseline']

print("F1-Score comparison: RF Baseline vs RF+SMOTE:")
print(comparison_rf.to_string(index=False))
comparison_rf.to_csv(os.path.join(OUTPUT_DIR, '06_4_rf_comparison_baseline_smote.csv'), index=False)
print("\n Comparison saved\n")

# Plot delta F1
fig, ax = plt.subplots(figsize=(10, 6))
colors = ['green' if x > 0 else 'red' for x in comparison_rf['Delta_F1']]
ax.barh(comparison_rf['Class'], comparison_rf['Delta_F1'], color=colors, alpha=0.8)
ax.axvline(0, color='black', linewidth=0.8, linestyle='--')
ax.set_xlabel('ΔF1 (SMOTE - Baseline)', fontweight='bold')
ax.set_title('Random Forest: Impact of SMOTE on F1-Score per Class', fontweight='bold')
ax.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '06_4_rf_smote_delta_f1.png'), dpi=150)
plt.show()
print(" Delta F1 plot saved\n")

# 4) Feature importance
fi_rf_smote = pd.DataFrame({
    'Feature': X_train.columns,
    'Importance': rf_smote.feature_importances_
}).sort_values('Importance', ascending=False).head(20)

fig, ax = plt.subplots(figsize=(12, 8))
ax.barh(fi_rf_smote['Feature'], fi_rf_smote['Importance'], color='steelblue', alpha=0.9)
ax.invert_yaxis()
ax.set_xlabel('Gini Importance', fontweight='bold')
ax.set_title('Random Forest + SMOTE - Top 20 Features', fontweight='bold')
ax.grid(axis='x', linestyle='--', alpha=0.3)
plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '06_4_rf_smote_feature_importance.png'), dpi=150)
plt.show()
print(" Feature importance saved\n")

fi_rf_smote.to_csv(os.path.join(OUTPUT_DIR, '06_4_rf_smote_feature_importance.csv'), index=False)

# 5) Misclassification
misclass_rf_smote = []
for true_idx in range(len(class_names)):
    for pred_idx in range(len(class_names)):
        if true_idx != pred_idx:
            count = cm_rf_smote[true_idx, pred_idx]
            if count > 0:
                misclass_rf_smote.append({
                    'True_Class': class_names[true_idx],
                    'Predicted_Class': class_names[pred_idx],
                    'Count': count
                })

misclass_rf_smote_df = pd.DataFrame(misclass_rf_smote).sort_values('Count', ascending=False).head(10)
print("Top-10 Misclassifications (RF + SMOTE):")
print(misclass_rf_smote_df.to_string(index=False))
misclass_rf_smote_df.to_csv(os.path.join(OUTPUT_DIR, '06_4_rf_smote_top10_errors.csv'), index=False)
print("\n Misclassification analysis saved\n")

print("="*70)
print("SECTION 6.4: RANDOM FOREST + SMOTE - COMPLETED")
print("="*70 + "\n")


In [None]:
# SECTION 6.5: LIGHTGBM + SMOTE - DETAILED ANALYSIS

print("\n" + "="*70)
print("SECTION 6.5: LIGHTGBM + SMOTE - DETAILED ANALYSIS")
print("="*70 + "\n")

# 1) Confusion Matrix
cm_lgb_smote = confusion_matrix(y_val_encoded, val_pred_lgb_smote)

fig, ax = plt.subplots(figsize=(12, 10))
sns.heatmap(
    cm_lgb_smote, annot=True, fmt='d', cmap=cmap_custom,
    xticklabels=class_names, yticklabels=class_names,
    cbar_kws={'label': 'Count'}, ax=ax,
    linewidths=0.5, linecolor='gray',
    annot_kws={'fontsize': 9}
)
ax.set_xlabel('Predicted Label', fontweight='bold', fontsize=11)
ax.set_ylabel('True Label', fontweight='bold', fontsize=11)
ax.set_title('LightGBM + SMOTE - Confusion Matrix (Validation Set)',
             fontweight='bold', fontsize=13)
plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '06_5_lgb_smote_confusion_matrix.png'), dpi=150, bbox_inches='tight')
plt.show()
print(" Confusion matrix saved\n")

# 2) Per-class metrics
report_lgb_smote = classification_report(
    y_val_encoded, val_pred_lgb_smote,
    target_names=class_names,
    output_dict=True
)
perclass_lgb_smote = pd.DataFrame(report_lgb_smote).T.iloc[:-3, :]
perclass_lgb_smote['support'] = perclass_lgb_smote['support'].astype(int)
perclass_lgb_smote.index.name = 'Class'

print("Per-Class Metrics (LightGBM + SMOTE):")
print(perclass_lgb_smote[['precision', 'recall', 'f1-score', 'support']].to_string())

perclass_lgb_smote.to_csv(os.path.join(OUTPUT_DIR, '06_5_lgb_smote_perclass_metrics.csv'))
print("\n Per-class metrics saved\n")

# 3) Baseline vs SMOTE comparison (LGB)
comparison_lgb = pd.DataFrame({
    'Class': class_names,
    'F1_Baseline': [report_lgb[c]['f1-score'] for c in class_names],
    'F1_SMOTE': [report_lgb_smote[c]['f1-score'] for c in class_names]
})
comparison_lgb['Delta_F1'] = comparison_lgb['F1_SMOTE'] - comparison_lgb['F1_Baseline']

print("F1-Score comparison: LGB Baseline vs LGB+SMOTE:")
print(comparison_lgb.to_string(index=False))
comparison_lgb.to_csv(os.path.join(OUTPUT_DIR, '06_5_lgb_comparison_baseline_smote.csv'), index=False)
print("\n Comparison saved\n")

# Plot delta F1
fig, ax = plt.subplots(figsize=(10, 6))
colors = ['green' if x > 0 else 'red' for x in comparison_lgb['Delta_F1']]
ax.barh(comparison_lgb['Class'], comparison_lgb['Delta_F1'], color=colors, alpha=0.8)
ax.axvline(0, color='black', linewidth=0.8, linestyle='--')
ax.set_xlabel('ΔF1 (SMOTE - Baseline)', fontweight='bold')
ax.set_title('LightGBM: Impact of SMOTE on F1-Score per Class', fontweight='bold')
ax.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '06_5_lgb_smote_delta_f1.png'), dpi=150)
plt.show()
print(" Delta F1 plot saved\n")

# 4) Feature importance
fi_lgb_smote = pd.DataFrame({
    'Feature': X_train.columns,
    'Importance': lgb_smote.feature_importances_
}).sort_values('Importance', ascending=False).head(20)

fig, ax = plt.subplots(figsize=(12, 8))
ax.barh(fi_lgb_smote['Feature'], fi_lgb_smote['Importance'], color='forestgreen', alpha=0.9)
ax.invert_yaxis()
ax.set_xlabel('Split Importance', fontweight='bold')
ax.set_title('LightGBM + SMOTE - Top 20 Features', fontweight='bold')
ax.grid(axis='x', linestyle='--', alpha=0.3)
plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '06_5_lgb_smote_feature_importance.png'), dpi=150)
plt.show()
print(" Feature importance saved\n")

fi_lgb_smote.to_csv(os.path.join(OUTPUT_DIR, '06_5_lgb_smote_feature_importance.csv'), index=False)

# 5) Misclassification
misclass_lgb_smote = []
for true_idx in range(len(class_names)):
    for pred_idx in range(len(class_names)):
        if true_idx != pred_idx:
            count = cm_lgb_smote[true_idx, pred_idx]
            if count > 0:
                misclass_lgb_smote.append({
                    'True_Class': class_names[true_idx],
                    'Predicted_Class': class_names[pred_idx],
                    'Count': count
                })

misclass_lgb_smote_df = pd.DataFrame(misclass_lgb_smote).sort_values('Count', ascending=False).head(10)
print("Top-10 Misclassifications (LGB + SMOTE):")
print(misclass_lgb_smote_df.to_string(index=False))
misclass_lgb_smote_df.to_csv(os.path.join(OUTPUT_DIR, '06_5_lgb_smote_top10_errors.csv'), index=False)
print("\n Misclassification analysis saved\n")

print("="*70)
print("SECTION 6.5: LIGHTGBM + SMOTE - COMPLETED")
print("="*70 + "\n")


In [None]:
# SECTION 6.6: MLP + SMOTE - DETAILED ANALYSIS

print("\n" + "="*70)
print("SECTION 6.6: MLP + SMOTE - DETAILED ANALYSIS")
print("="*70 + "\n")

# 1) Confusion Matrix
cm_mlp_smote = confusion_matrix(y_val_encoded, val_pred_mlp_smote)

fig, ax = plt.subplots(figsize=(12, 10))
sns.heatmap(
    cm_mlp_smote, annot=True, fmt='d', cmap=cmap_custom,
    xticklabels=class_names, yticklabels=class_names,
    cbar_kws={'label': 'Count'}, ax=ax,
    linewidths=0.5, linecolor='gray',
    annot_kws={'fontsize': 9}
)
ax.set_xlabel('Predicted Label', fontweight='bold', fontsize=11)
ax.set_ylabel('True Label', fontweight='bold', fontsize=11)
ax.set_title('MLP + SMOTE - Confusion Matrix (Validation Set)',
             fontweight='bold', fontsize=13)
plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '06_6_mlp_smote_confusion_matrix.png'), dpi=150, bbox_inches='tight')
plt.show()
print(" Confusion matrix saved\n")

# 2) Per-class metrics
report_mlp_smote = classification_report(
    y_val_encoded, val_pred_mlp_smote,
    target_names=class_names,
    output_dict=True
)
perclass_mlp_smote = pd.DataFrame(report_mlp_smote).T.iloc[:-3, :]
perclass_mlp_smote['support'] = perclass_mlp_smote['support'].astype(int)
perclass_mlp_smote.index.name = 'Class'

print("Per-Class Metrics (MLP + SMOTE):")
print(perclass_mlp_smote[['precision', 'recall', 'f1-score', 'support']].to_string())

perclass_mlp_smote.to_csv(os.path.join(OUTPUT_DIR, '06_6_mlp_smote_perclass_metrics.csv'))
print("\n Per-class metrics saved\n")

# 3) Baseline vs SMOTE comparison (MLP)
comparison_mlp = pd.DataFrame({
    'Class': class_names,
    'F1_Baseline': [report_mlp[c]['f1-score'] for c in class_names],
    'F1_SMOTE': [report_mlp_smote[c]['f1-score'] for c in class_names]
})
comparison_mlp['Delta_F1'] = comparison_mlp['F1_SMOTE'] - comparison_mlp['F1_Baseline']

print("F1-Score comparison: MLP Baseline vs MLP+SMOTE:")
print(comparison_mlp.to_string(index=False))
comparison_mlp.to_csv(os.path.join(OUTPUT_DIR, '06_6_mlp_comparison_baseline_smote.csv'), index=False)
print("\n Comparison saved\n")

# Plot delta F1
fig, ax = plt.subplots(figsize=(10, 6))
colors = ['green' if x > 0 else 'red' for x in comparison_mlp['Delta_F1']]
ax.barh(comparison_mlp['Class'], comparison_mlp['Delta_F1'], color=colors, alpha=0.8)
ax.axvline(0, color='black', linewidth=0.8, linestyle='--')
ax.set_xlabel('ΔF1 (SMOTE - Baseline)', fontweight='bold')
ax.set_title('MLP: Impact of SMOTE on F1-Score per Class', fontweight='bold')
ax.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '06_6_mlp_smote_delta_f1.png'), dpi=150)
plt.show()
print(" Delta F1 plot saved\n")

# 4) Training history
history_mlp_smote_df = pd.DataFrame(history_mlp_smote.history)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Loss
axes[0].plot(history_mlp_smote_df.index, history_mlp_smote_df['loss'], label='Train Loss', linewidth=2)
axes[0].plot(history_mlp_smote_df.index, history_mlp_smote_df['val_loss'], label='Val Loss', linewidth=2)
axes[0].set_xlabel('Epoch', fontweight='bold')
axes[0].set_ylabel('Loss', fontweight='bold')
axes[0].set_title('MLP + SMOTE - Training Loss', fontweight='bold')
axes[0].legend()
axes[0].grid(alpha=0.3)

# Accuracy
axes[1].plot(history_mlp_smote_df.index, history_mlp_smote_df['accuracy'], label='Train Accuracy', linewidth=2)
axes[1].plot(history_mlp_smote_df.index, history_mlp_smote_df['val_accuracy'], label='Val Accuracy', linewidth=2)
axes[1].set_xlabel('Epoch', fontweight='bold')
axes[1].set_ylabel('Accuracy', fontweight='bold')
axes[1].set_title('MLP + SMOTE - Training Accuracy', fontweight='bold')
axes[1].legend()
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '06_6_mlp_smote_training_history.png'), dpi=150)
plt.show()
print(" Training history saved\n")

history_mlp_smote_df.to_csv(os.path.join(OUTPUT_DIR, '06_6_mlp_smote_training_history.csv'), index_label='Epoch')

# 5) Misclassification
misclass_mlp_smote = []
for true_idx in range(len(class_names)):
    for pred_idx in range(len(class_names)):
        if true_idx != pred_idx:
            count = cm_mlp_smote[true_idx, pred_idx]
            if count > 0:
                misclass_mlp_smote.append({
                    'True_Class': class_names[true_idx],
                    'Predicted_Class': class_names[pred_idx],
                    'Count': count
                })

misclass_mlp_smote_df = pd.DataFrame(misclass_mlp_smote).sort_values('Count', ascending=False).head(10)
print("Top-10 Misclassifications (MLP + SMOTE):")
print(misclass_mlp_smote_df.to_string(index=False))
misclass_mlp_smote_df.to_csv(os.path.join(OUTPUT_DIR, '06_6_mlp_smote_top10_errors.csv'), index=False)
print("\n Misclassification analysis saved\n")

print("="*70)
print("SECTION 6.6: MLP + SMOTE - COMPLETED")
print("="*70 + "\n")
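
Both MLP cells report how many epochs actually ran, which is governed by `EarlyStopping(monitor='val_loss', patience=..., restore_best_weights=True)`. The plain-Python sketch below approximates that stopping rule under the assumption `min_delta=0`. The function name `early_stopping_trace` and the loss values are made up for illustration; this is not the Keras implementation.

```python
def early_stopping_trace(val_losses, patience):
    """Return (epochs_run, best_epoch) for a patience-based stopping rule.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation loss seen so far. `best_epoch` is the 0-based epoch whose
    weights would be restored with restore_best_weights=True.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch + 1, best_epoch
    # The epoch budget ran out before patience was exhausted
    return len(val_losses), best_epoch


# Made-up validation losses: the best value occurs at epoch index 2
losses = [0.90, 0.70, 0.60, 0.65, 0.62, 0.61, 0.70]
epochs_run, best_epoch = early_stopping_trace(losses, patience=3)
print(f"epochs run: {epochs_run}, best epoch (0-based): {best_epoch}")
```

Under this rule the epoch count printed after training includes the `patience` non-improving epochs, while the evaluated weights come from the earlier best epoch.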


---

# SECTION 7: Final Test Set Evaluation & Model Selection

## Objective

1. **Select the best model** based on validation performance
2. **Evaluate on the test set** (first and only time)
3. **Compare test vs validation** to check generalization
4. **Present the final results** with complete metrics

## Workflow

1. Model selection based on validation macro F1-score
2. Predictions on the test set
3. Test confusion matrix
4. Test per-class metrics
5. Test/validation comparison
6. Final performance statement
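
The workflow above separates two roles: validation scores choose the model, and test scores are only reported for the chosen model. A minimal sketch of that separation follows. The model names, scores, and the helper `select_and_report` are placeholders for illustration, not results from this notebook.

```python
def select_and_report(scores):
    """Pick the model with the highest validation F1 and report its test F1.

    `scores` maps a model name to {"val_f1": float, "test_f1": float}.
    The test score is never used for the selection itself.
    """
    best = max(scores, key=lambda name: scores[name]["val_f1"])
    gap = scores[best]["val_f1"] - scores[best]["test_f1"]
    return best, scores[best]["test_f1"], gap


# Placeholder numbers: model_c has the best test score but is NOT selected,
# because selection only looks at validation performance.
scores = {
    "model_a": {"val_f1": 0.91, "test_f1": 0.89},
    "model_b": {"val_f1": 0.93, "test_f1": 0.88},
    "model_c": {"val_f1": 0.90, "test_f1": 0.90},
}

best_name, best_test_f1, gap = select_and_report(scores)
print(f"selected: {best_name}, test F1: {best_test_f1:.2f}, val-test gap: {gap:+.2f}")
```

A positive gap (validation above test) hints at some over-optimism in the validation estimate. Picking the model with the best test score instead would turn the test set into a second validation set and bias the reported figure upward.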

---


In [None]:
# SECTION 7.1: TEST SET PREDICTIONS

print("\n" + "="*70)
print("SECTION 7: FINAL TEST SET EVALUATION")
print("="*70 + "\n")

print("Generating predictions on test set (completely unseen data)...\n")

# ========== BASELINE MODELS ==========
print("[1/6] RF Baseline...")
test_pred_rf = rf_baseline.predict(X_test)
acc_rf_test = accuracy_score(y_test_encoded, test_pred_rf)
f1_rf_test = f1_score(y_test_encoded, test_pred_rf, average='macro')
print(f"   Test Accuracy: {acc_rf_test:.4f}, F1-Macro: {f1_rf_test:.4f}")

print("[2/6] LGB Baseline...")
test_pred_lgb = lgb_baseline.predict(X_test)
acc_lgb_test = accuracy_score(y_test_encoded, test_pred_lgb)
f1_lgb_test = f1_score(y_test_encoded, test_pred_lgb, average='macro')
print(f"   Test Accuracy: {acc_lgb_test:.4f}, F1-Macro: {f1_lgb_test:.4f}")

print("[3/6] MLP Baseline...")
test_pred_mlp_proba = mlp_baseline.predict(x_test_scaled, verbose=0)
test_pred_mlp = np.argmax(test_pred_mlp_proba, axis=1)
acc_mlp_test = accuracy_score(y_test_encoded, test_pred_mlp)
f1_mlp_test = f1_score(y_test_encoded, test_pred_mlp, average='macro')
print(f"   Test Accuracy: {acc_mlp_test:.4f}, F1-Macro: {f1_mlp_test:.4f}")

# ========== SMOTE MODELS ==========
print("[4/6] RF + SMOTE...")
test_pred_rf_smote = rf_smote.predict(X_test)
acc_rf_smote_test = accuracy_score(y_test_encoded, test_pred_rf_smote)
f1_rf_smote_test = f1_score(y_test_encoded, test_pred_rf_smote, average='macro')
print(f"   Test Accuracy: {acc_rf_smote_test:.4f}, F1-Macro: {f1_rf_smote_test:.4f}")

print("[5/6] LGB + SMOTE...")
test_pred_lgb_smote = lgb_smote.predict(X_test)
acc_lgb_smote_test = accuracy_score(y_test_encoded, test_pred_lgb_smote)
f1_lgb_smote_test = f1_score(y_test_encoded, test_pred_lgb_smote, average='macro')
print(f"   Test Accuracy: {acc_lgb_smote_test:.4f}, F1-Macro: {f1_lgb_smote_test:.4f}")

print("[6/6] MLP + SMOTE...")
test_pred_mlp_smote_proba = mlp_smote.predict(x_test_scaled, verbose=0)
test_pred_mlp_smote = np.argmax(test_pred_mlp_smote_proba, axis=1)
acc_mlp_smote_test = accuracy_score(y_test_encoded, test_pred_mlp_smote)
f1_mlp_smote_test = f1_score(y_test_encoded, test_pred_mlp_smote, average='macro')
print(f"   Test Accuracy: {acc_mlp_smote_test:.4f}, F1-Macro: {f1_mlp_smote_test:.4f}")

print("\n All test predictions generated\n")


In [None]:
# SECTION 7.2: COMPREHENSIVE VALIDATION vs TEST COMPARISON

results_final = pd.DataFrame({
    'Model': [
        'RF_Baseline', 'LGB_Baseline', 'MLP_Baseline',
        'RF_SMOTE', 'LGB_SMOTE', 'MLP_SMOTE'
    ],
    'Val_Accuracy': [
        acc_rf_val, acc_lgb_val, acc_mlp_val,
        acc_rf_smote_val, acc_lgb_smote_val, acc_mlp_smote_val
    ],
    'Val_F1_Macro': [
        f1_rf_val, f1_lgb_val, f1_mlp_val,
        f1_rf_smote_val, f1_lgb_smote_val, f1_mlp_smote_val
    ],
    'Test_Accuracy': [
        acc_rf_test, acc_lgb_test, acc_mlp_test,
        acc_rf_smote_test, acc_lgb_smote_test, acc_mlp_smote_test
    ],
    'Test_F1_Macro': [
        f1_rf_test, f1_lgb_test, f1_mlp_test,
        f1_rf_smote_test, f1_lgb_smote_test, f1_mlp_smote_test
    ]
})

# Compute the gap (validation - test) to detect overfitting
results_final['Gap_Accuracy'] = results_final['Val_Accuracy'] - results_final['Test_Accuracy']
results_final['Gap_F1'] = results_final['Val_F1_Macro'] - results_final['Test_F1_Macro']

# Sort by Test F1-Macro (primary metric)
results_final = results_final.sort_values('Test_F1_Macro', ascending=False).reset_index(drop=True)

print("="*70)
print("COMPREHENSIVE RESULTS: VALIDATION vs TEST")
print("="*70 + "\n")
print(results_final.to_string(index=False))

results_final.to_csv(os.path.join(OUTPUT_DIR, '07_final_results_validation_vs_test.csv'), index=False)
print("\n Results saved\n")

# Identify the best model (first row after sorting by Test F1-Macro)
best_model_name = results_final.iloc[0]['Model']
best_f1_test = results_final.iloc[0]['Test_F1_Macro']
best_acc_test = results_final.iloc[0]['Test_Accuracy']

print("="*70)
print("🏆 BEST MODEL IDENTIFIED")
print("="*70)
print(f"  Model:         {best_model_name}")
print(f"  Test F1-Macro: {best_f1_test:.4f}")
print(f"  Test Accuracy: {best_acc_test:.4f}")
print("="*70 + "\n")
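
The `Gap_Accuracy` and `Gap_F1` columns computed above are only differences; deciding when a gap is large enough to worry about is left to the reader. A minimal sketch of one way to flag models, assuming a hypothetical tolerance `GAP_THRESHOLD` and made-up scores (neither comes from the notebook):

```python
# Hypothetical scores for illustration only; NOT results from this notebook.
toy_scores = {
    "model_a": {"val_f1": 0.95, "test_f1": 0.94},
    "model_b": {"val_f1": 0.97, "test_f1": 0.88},
}

GAP_THRESHOLD = 0.02  # assumed tolerance; choose a value suited to the task


def flag_gaps(scores, threshold):
    """Return {model_name: True} when validation exceeds test by more than threshold."""
    flagged = {}
    for name, s in scores.items():
        gap = s["val_f1"] - s["test_f1"]  # positive gap: validation looks better than test
        flagged[name] = gap > threshold
    return flagged


toy_flags = flag_gaps(toy_scores, GAP_THRESHOLD)
print(toy_flags)
```

A positive gap well above the tolerance suggests the validation estimate was optimistic; a gap near zero or negative suggests the model generalizes about as well as validation indicated.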


In [None]:
# SECTION 7.3: VALIDATION vs TEST VISUALIZATION (Fixed xlim)

fig, axes = plt.subplots(2, 2, figsize=(16, 12))

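# Color by model family; looked up by name because results_final is sorted by Test F1-Macro
family_colors = {'RF': '#2E86AB', 'LGB': '#A23B72', 'MLP': '#F18F01'}
colors_models = [family_colors[name.split('_')[0]] for name in results_final['Model']]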

# 1) Validation Accuracy
bars1 = axes[0, 0].barh(results_final['Model'], results_final['Val_Accuracy'],
                         color=colors_models, alpha=0.8, edgecolor='black')
axes[0, 0].set_xlabel('Validation Accuracy', fontweight='bold', fontsize=11)
axes[0, 0].set_title('Validation Accuracy by Model', fontweight='bold', fontsize=12)
min_val_acc = results_final['Val_Accuracy'].min()
axes[0, 0].set_xlim([max(0, min_val_acc - 0.05), 1.0])  # max(0, ...) keeps the lower bound non-negative
axes[0, 0].grid(axis='x', alpha=0.3, linestyle='--')

# 2) Test Accuracy
bars2 = axes[0, 1].barh(results_final['Model'], results_final['Test_Accuracy'],
                         color=colors_models, alpha=0.8, edgecolor='black')
axes[0, 1].set_xlabel('Test Accuracy', fontweight='bold', fontsize=11)
axes[0, 1].set_title('Test Accuracy by Model', fontweight='bold', fontsize=12)
min_test_acc = results_final['Test_Accuracy'].min()
axes[0, 1].set_xlim([max(0, min_test_acc - 0.05), 1.0])
axes[0, 1].grid(axis='x', alpha=0.3, linestyle='--')

# 3) Validation F1-Macro
bars3 = axes[1, 0].barh(results_final['Model'], results_final['Val_F1_Macro'],
                         color=colors_models, alpha=0.8, edgecolor='black')
axes[1, 0].set_xlabel('Validation F1-Macro', fontweight='bold', fontsize=11)
axes[1, 0].set_title('Validation F1-Macro by Model', fontweight='bold', fontsize=12)
min_val_f1 = results_final['Val_F1_Macro'].min()
axes[1, 0].set_xlim([max(0, min_val_f1 - 0.05), 1.0])  # adaptive lower bound keeps LGB_Baseline visible
axes[1, 0].grid(axis='x', alpha=0.3, linestyle='--')

# 4) Test F1-Macro
bars4 = axes[1, 1].barh(results_final['Model'], results_final['Test_F1_Macro'],
                         color=colors_models, alpha=0.8, edgecolor='black')
axes[1, 1].set_xlabel('Test F1-Macro', fontweight='bold', fontsize=11)
axes[1, 1].set_title('Test F1-Macro by Model (PRIMARY METRIC)', fontweight='bold', fontsize=12)
min_test_f1 = results_final['Test_F1_Macro'].min()
axes[1, 1].set_xlim([max(0, min_test_f1 - 0.05), 1.0])
axes[1, 1].grid(axis='x', alpha=0.3, linestyle='--')

plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '07_validation_vs_test_comparison.png'), dpi=150, bbox_inches='tight')
plt.show()

print(" Validation vs Test comparison plots saved\n")
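
Each of the four panels above repeats the same axis-limit rule: start slightly below the smallest score, never below zero, and end at 1.0. A small sketch of that rule as a helper, where the name `zoomed_xlim` and its default arguments are hypothetical and not defined in the notebook:

```python
def zoomed_xlim(values, pad=0.05, upper=1.0):
    """Axis limits that zoom in on the scores without going below zero.

    `pad` is the margin left below the smallest value; `upper` is the fixed
    right-hand limit (1.0 for accuracy and F1).
    """
    lower = max(0.0, min(values) - pad)
    return [lower, upper]


# Made-up scores: a tight cluster near 1.0, and a case where clamping applies
limits_tight = zoomed_xlim([0.93, 0.99])
limits_clamped = zoomed_xlim([0.02, 0.50])
print(limits_tight, limits_clamped)
```

With such a helper each panel could call `ax.set_xlim(zoomed_xlim(results_final[column]))`, assuming `column` is one of the score columns.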

In [None]:
# SECTION 7.4: BEST MODEL - DETAILED TEST SET ANALYSIS

print("="*70)
print(f"DETAILED ANALYSIS: {best_model_name} ON TEST SET")
print("="*70 + "\n")

# Look up the test predictions of the best model by name
test_predictions_by_model = {
    'RF_Baseline': test_pred_rf,
    'LGB_Baseline': test_pred_lgb,
    'MLP_Baseline': test_pred_mlp,
    'RF_SMOTE': test_pred_rf_smote,
    'LGB_SMOTE': test_pred_lgb_smote,
    'MLP_SMOTE': test_pred_mlp_smote,
}
best_test_pred = test_predictions_by_model[best_model_name]

# Confusion Matrix
cm_best = confusion_matrix(y_test_encoded, best_test_pred)

fig, ax = plt.subplots(figsize=(12, 10))
sns.heatmap(
    cm_best, annot=True, fmt='d', cmap=cmap_custom,
    xticklabels=class_names, yticklabels=class_names,
    cbar_kws={'label': 'Count'}, ax=ax,
    linewidths=0.5, linecolor='gray',
    annot_kws={'fontsize': 9}
)
ax.set_xlabel('Predicted Label', fontweight='bold', fontsize=11)
ax.set_ylabel('True Label', fontweight='bold', fontsize=11)
ax.set_title(f'{best_model_name} - Confusion Matrix (TEST SET)',
             fontweight='bold', fontsize=14)
plt.tight_layout()
plt.savefig(os.path.join(IMG_DIR, '07_best_model_test_confusion_matrix.png'), dpi=150, bbox_inches='tight')
plt.show()
print(" Best model confusion matrix saved\n")

# Per-class metrics
report_best = classification_report(
    y_test_encoded, best_test_pred,
    target_names=class_names,
    output_dict=True
)
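# Drop the last three summary rows ('accuracy', 'macro avg', 'weighted avg') to keep only per-class rows
perclass_best = pd.DataFrame(report_best).T.iloc[:-3, :]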
perclass_best['support'] = perclass_best['support'].astype(int)
perclass_best.index.name = 'Class'

print(f"Per-Class Metrics ({best_model_name} on TEST SET):")
print(perclass_best[['precision', 'recall', 'f1-score', 'support']].to_string())

perclass_best.to_csv(os.path.join(OUTPUT_DIR, '07_best_model_test_perclass_metrics.csv'))
print("\n Best model per-class metrics saved\n")

print("="*70)
print("SECTION 7 COMPLETED")
print("="*70 + "\n")
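
The confusion matrix and the per-class report above carry the same information. As a sketch of how precision, recall and support can be read off a confusion matrix whose rows are true labels and whose columns are predicted labels (the layout `confusion_matrix` returns), the example below uses a made-up 3-class matrix; `toy_cm` and `per_class_from_cm` are illustrative names, not part of the notebook.

```python
# Made-up 3-class confusion matrix (NOT from this notebook).
# Rows are true labels, columns are predicted labels.
toy_cm = [
    [50, 2, 0],
    [3, 40, 5],
    [0, 1, 9],
]


def per_class_from_cm(cm):
    """Derive precision, recall and support for each class from a square matrix."""
    n = len(cm)
    rows = []
    for k in range(n):
        tp = cm[k][k]
        support = sum(cm[k])                         # samples whose true class is k (row sum)
        predicted = sum(cm[i][k] for i in range(n))  # samples predicted as k (column sum)
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        rows.append({"precision": precision, "recall": recall, "support": support})
    return rows


toy_metrics = per_class_from_cm(toy_cm)
for k, m in enumerate(toy_metrics):
    print(f"class {k}: precision={m['precision']:.3f} recall={m['recall']:.3f} support={m['support']}")
```

Reading along a row shows where a true class leaks to (missed detections, which lower recall); reading down a column shows what gets mistaken for that class (false alarms, which lower precision).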
