# Model Training & Comparative Evaluation
**Goal:** Comparative evaluation of models trained on baseline data (`simple`) vs. enriched experimental data (`solid`).

**Methodology**
* **Experiment Design:** Parallel testing of two algorithms (XGBoost and Neural Networks) across both data variants.
* **Consistency:** Each algorithm is run through a dedicated manager class (`XgboostManager` for XGBoost, `ModelManager` for the neural network), so both data variants go through the same preparation and training code.
* **Target:** Determining whether advanced feature engineering consistently improves classification performance.

In [1]:
import sys
from pathlib import Path

data_path = Path.cwd().parent / "data"
simple_data_path = data_path / "data_simple_eda.csv"
solid_data_path = data_path / "data_solid_eda.csv"


### XGBoost Model Analysis
**Goal:** Evaluating tree-based model performance and the impact of the "Solid" feature set.

**Methodology**
* **Simple Variant:** Training on 7 baseline features to establish a performance benchmark (Accuracy: 0.80, F1: 0.61).
* **Solid Variant:** Training on 16 engineered features. This version showed an improvement in accuracy (0.81) and a reduction in False Positives (from 299 to 279).
* **Comparison:** Validating whether non-linear features such as `product_density` increase the model's precision in detecting churn.

In [2]:
# Add ../src_tree to the import path so that xgboost_manager can be imported
current_dir = Path.cwd().parent / "src_tree"
if current_dir.exists() and str(current_dir) not in sys.path:
    sys.path.append(str(current_dir))

from xgboost_manager import XgboostManager


In [3]:
tree_manager = XgboostManager(simple_data_path, "churn")
tree_manager.prepare_data()
tree_manager.train_model()


--- Metrics for XGBoost (expanded_eda=False) ---
accuracy : 0.801
f1_score : 0.6074950690335306
precision : 0.5074135090609555
recall : 0.7567567567567568
roc_auc : 0.8694835050767256

--- Confusion Matrix ---
[[1294  299]
 [  99  308]]


{'accuracy': 0.801,
 'f1_score': 0.6074950690335306,
 'precision': 0.5074135090609555,
 'recall': 0.7567567567567568,
 'roc_auc': 0.8694835050767256}

In [4]:
tree_manager = XgboostManager(solid_data_path, "churn", expanded_eda=True)
tree_manager.prepare_data()
tree_manager.train_model()


--- Metrics for XGBoost (expanded_eda=True) ---
accuracy : 0.811
f1_score : 0.6197183098591549
precision : 0.524701873935264
recall : 0.7567567567567568
roc_auc : 0.8665861547217478

--- Confusion Matrix ---
[[1314  279]
 [  99  308]]


{'accuracy': 0.811,
 'f1_score': 0.6197183098591549,
 'precision': 0.524701873935264,
 'recall': 0.7567567567567568,
 'roc_auc': 0.8665861547217478}
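The reported XGBoost metrics can be re-derived from the two confusion matrices above. The sketch below assumes the matrices follow the `[[TN, FP], [FN, TP]]` layout (the scikit-learn convention), which is consistent with the printed precision and recall. The helper name `metrics_from_confusion` is introduced here for illustration and is not part of `XgboostManager`.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Derive threshold-based metrics from the four confusion-matrix counts."""
    precision = tp / (tp + fp)          # share of predicted churners that really churned
    recall = tp / (tp + fn)             # share of real churners that were caught
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "precision": precision,
        "recall": recall,
        "f1_score": 2 * precision * recall / (precision + recall),
    }

# Counts copied from the two XGBoost outputs above
# (assumed layout: [[TN, FP], [FN, TP]])
simple = metrics_from_confusion(tn=1294, fp=299, fn=99, tp=308)
solid = metrics_from_confusion(tn=1314, fp=279, fn=99, tp=308)

for name in simple:
    print(f"{name:10s} simple={simple[name]:.4f}  solid={solid[name]:.4f}")
```

Because the false negatives and true positives are identical in both runs, recall is unchanged. The entire gain in accuracy, precision and F1 comes from the 20 fewer false positives.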

### Neural Network Analysis
**Goal:** Testing a Deep Learning architecture (4-layer MLP) on both datasets.

**Methodology**
* **Architecture:** A network with Batch Normalization and Dropout (0.2) to stabilize training and reduce overfitting.
* **Simple Variant Results:** Accuracy of 0.725 with high recall (0.823) and a ROC AUC of 0.872 in the run shown below, suggesting the baseline features already carry most of the signal for this architecture.
* **Solid Variant Results:** Slightly higher accuracy (0.737) in the run shown below, but lower recall (0.803) and ROC AUC (0.857), indicating that the expanded feature set might introduce noise for this specific neural network setup.
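To make the architecture description concrete, here is a minimal, framework-free sketch of a forward pass through an MLP with Batch Normalization and Dropout (0.2). It is not the code inside `ModelManager`. The hidden sizes (64, 32, 16), the ReLU activation, the reading of "4-layer" as four dense layers, and all function names are assumptions made for illustration. The 7 inputs match the feature count of the simple variant.

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_norm(x, eps=1e-5):
    # Simplified: normalises each feature with the current batch statistics,
    # without learned scale/shift or running averages.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def dropout(x, p, training):
    # Inverted dropout: zero a fraction p of activations and rescale the rest.
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def init_layers(sizes):
    # One (weights, bias) pair per dense layer, He-style initialisation.
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(x, layers, p_drop=0.2, training=True):
    h = x
    for w, b in layers[:-1]:
        h = h @ w + b                       # dense layer
        h = batch_norm(h)                   # stabilise activations
        h = np.maximum(h, 0.0)              # ReLU (assumed activation)
        h = dropout(h, p_drop, training)    # regularisation, training only
    w, b = layers[-1]
    logits = h @ w + b
    return 1.0 / (1.0 + np.exp(-logits))    # churn probability per row

# Hypothetical layer sizes: 7 inputs, three hidden layers, 1 output
layers = init_layers([7, 64, 32, 16, 1])
x = rng.normal(size=(32, 7))                # one batch of 32 rows
probs = forward(x, layers, training=False)
print(probs.shape)
```

Dropout is active only when `training=True`, so predictions at evaluation time are deterministic for a given set of weights.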

In [5]:
# Add ../src_nn to the import path so that model_manager can be imported
current_dir = Path.cwd().parent / "src_nn"
if current_dir.exists() and str(current_dir) not in sys.path:
    sys.path.append(str(current_dir))

from model_manager import ModelManager

In [6]:
manager = ModelManager(simple_data_path, "churn", 32)  # 32 = batch size
manager.preparation()
manager.train()


--- Metrics for Neural Network (expanded_eda=False) ---
accuracy : 0.725
f1_score : 0.548440065681445
precision : 0.41133004926108374
recall : 0.8226600985221675
roc_auc : 0.871621412810354

--- Confusion Matrix ---
[[558 239]
 [ 36 167]]


{'accuracy': 0.725,
 'f1_score': 0.548440065681445,
 'precision': 0.41133004926108374,
 'recall': 0.8226600985221675,
 'roc_auc': 0.871621412810354}

In [7]:
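# Note: expanded_eda is not passed here, so the log header below reports
# expanded_eda=False even though the solid dataset is loaded.
manager = ModelManager(solid_data_path, "churn", 32)  # 32 = batch size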
manager.preparation()
manager.train()


--- Metrics for Neural Network (expanded_eda=False) ---
accuracy : 0.737
f1_score : 0.5534804753820034
precision : 0.422279792746114
recall : 0.8029556650246306
roc_auc : 0.8570377833130397

--- Confusion Matrix ---
[[574 223]
 [ 40 163]]


{'accuracy': 0.737,
 'f1_score': 0.5534804753820034,
 'precision': 0.422279792746114,
 'recall': 0.8029556650246306,
 'roc_auc': 0.8570377833130397}

### Statistical Benchmarking & Final Validation
**Goal:** Establishing a robust performance baseline by averaging multiple independent training cycles.

**Methodology**
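* **Repeated-Run Averaging:** Each configuration is trained n=5 times and the results are averaged to reduce the effect of run-to-run randomness. This repeats the training; it is not k-fold cross-validation, despite the function name `run_5fold_benchmark`.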
* **Isolation:** A fresh `Manager` instance is created for every run.
* **Metric Aggregation:** Final performance is calculated as the mean of all metrics (Accuracy, F1, Precision, Recall, ROC AUC).
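Averaging alone hides how much the runs differ from each other. As a possible extension, the sketch below reports the sample standard deviation next to the mean for each metric. The helper `aggregate_runs` and the numbers in `example_runs` are hypothetical placeholders for illustration, not results from this notebook.

```python
from statistics import mean, stdev

def aggregate_runs(runs):
    """Return the mean and sample standard deviation of every metric across runs."""
    summary = {}
    for metric in runs[0]:
        values = [run[metric] for run in runs]
        summary[metric] = {"mean": mean(values), "std": stdev(values)}
    return summary

# Placeholder values only: three made-up runs with two metrics each
example_runs = [
    {"accuracy": 0.70, "recall": 0.80},
    {"accuracy": 0.72, "recall": 0.82},
    {"accuracy": 0.74, "recall": 0.84},
]

summary = aggregate_runs(example_runs)
for metric, stats in summary.items():
    print(f"{metric}: mean={stats['mean']:.3f}, std={stats['std']:.3f}")
```

A standard deviation of zero, as the five identical XGBoost runs in the log would produce, indicates that the configuration is deterministic and that repeating it adds no information.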

In [8]:
import pandas as pd

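def run_5fold_benchmark(manager_type, data_path, name, **kwargs):
    """Train one configuration 5 times and return the mean of each metric.

    Despite the name, this repeats training; it does not perform
    k-fold cross-validation.
    """
    results_list = []
    print(f"\n>>> Starting Benchmark: {name}")

    for i in range(5):
        # Build a fresh manager on every run so no state carries over
        if manager_type == "NN":
            manager = ModelManager(
                data_path,
                kwargs.get("churn_col", "churn"),
                kwargs.get("batch_size", 32),
                expanded_eda=kwargs.get("expanded_eda", False)
            )
            manager.preparation()
            res = manager.train()
        elif manager_type == "XGB":
            manager = XgboostManager(
                data_path,
                kwargs.get("churn_col", "churn"),
                expanded_eda=kwargs.get("expanded_eda", False)
            )
            manager.prepare_data()
            res = manager.train_model()
        else:
            # Fail early instead of hitting an undefined `res` below
            raise ValueError(f"Unknown manager_type: {manager_type!r}")

        results_list.append(res)
        print(f"Run {i+1}/5 completed.")

    # Average every metric over the runs and label the row with the experiment name
    df = pd.DataFrame(results_list)
    avg_results = df.mean().to_dict()
    avg_results['experiment'] = name
    return avg_results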

In [9]:
summary_results = []

summary_results.append(run_5fold_benchmark("XGB", simple_data_path, "XGB_Simple", expanded_eda=False))
summary_results.append(run_5fold_benchmark("XGB", solid_data_path, "XGB_Solid", expanded_eda=True))
summary_results.append(run_5fold_benchmark("NN", simple_data_path, "NN_Simple", expanded_eda=False))
summary_results.append(run_5fold_benchmark("NN", solid_data_path, "NN_Solid", expanded_eda=True))



>>> Starting Benchmark: XGB_Simple

--- Metrics for XGBoost (expanded_eda=False) ---
accuracy : 0.801
f1_score : 0.6074950690335306
precision : 0.5074135090609555
recall : 0.7567567567567568
roc_auc : 0.8694835050767256

--- Confusion Matrix ---
[[1294  299]
 [  99  308]]
Run 1/5 completed.

--- Metrics for XGBoost (expanded_eda=False) ---
accuracy : 0.801
f1_score : 0.6074950690335306
precision : 0.5074135090609555
recall : 0.7567567567567568
roc_auc : 0.8694835050767256

--- Confusion Matrix ---
[[1294  299]
 [  99  308]]
Run 2/5 completed.

--- Metrics for XGBoost (expanded_eda=False) ---
accuracy : 0.801
f1_score : 0.6074950690335306
precision : 0.5074135090609555
recall : 0.7567567567567568
roc_auc : 0.8694835050767256

--- Confusion Matrix ---
[[1294  299]
 [  99  308]]
Run 3/5 completed.

--- Metrics for XGBoost (expanded_eda=False) ---
accuracy : 0.801
f1_score : 0.6074950690335306
precision : 0.5074135090609555
recall : 0.7567567567567568
roc_auc : 0.8694835050767256

--- Macier

In [10]:
comparison_df = pd.DataFrame(summary_results).set_index('experiment')
display(comparison_df.sort_values(by='recall', ascending=False))

| experiment | accuracy | f1_score | precision | recall | roc_auc |
|---|---|---|---|---|---|
| NN_Solid | 0.7352 | 0.559338 | 0.422507 | 0.827586 | 0.863767 |
| NN_Simple | 0.7452 | 0.567955 | 0.433215 | 0.824631 | 0.869802 |
| XGB_Solid | 0.811 | 0.619718 | 0.524702 | 0.756757 | 0.866586 |
| XGB_Simple | 0.801 | 0.607495 | 0.507414 | 0.756757 | 0.869484 |


### Final Results Overview
**Goal:** Comparing the averaged performance of all experimental variants to identify the optimal model-data configuration.

**Key Findings**
* **XGBoost Performance:** The `XGB_Solid` variant achieved the highest overall Accuracy (0.811) and F1-score (0.620), up from 0.801 and 0.607 for `XGB_Simple`. This suggests that the tree-based model benefits modestly from the "Solid" feature set, although its ROC AUC dipped slightly (0.869 to 0.867).
* **Neural Network Behavior:** `NN_Simple` slightly outperformed `NN_Solid` in Accuracy (0.745 vs. 0.735) and ROC AUC (0.870 vs. 0.864), suggesting that for this specific MLP architecture, baseline features provide a cleaner training signal.
* **Feature Engineering Impact:** The "Solid" features raised XGBoost Precision from 0.507 to 0.525 by reducing false positives (299 to 279) while Recall stayed at 0.757. For the Neural Network, Precision fell slightly (0.433 to 0.423), so the benefit is not consistent across models.
* **Recall Dominance:** Neural Networks maintained a higher Recall (~0.82) than XGBoost (~0.76), identifying more of the actual churners, at the cost of lower Precision (~0.42-0.43 vs. ~0.51-0.52).
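The effect of the "Solid" feature set can be read directly as per-metric differences between the two variants of each model. The sketch below copies the averaged values from the comparison table and subtracts Simple from Solid. The names `averaged` and `solid_minus_simple` are introduced here for illustration only.

```python
# Averaged metrics copied from the comparison table
averaged = {
    "XGB_Simple": {"accuracy": 0.801, "f1_score": 0.607495, "precision": 0.507414,
                   "recall": 0.756757, "roc_auc": 0.869484},
    "XGB_Solid": {"accuracy": 0.811, "f1_score": 0.619718, "precision": 0.524702,
                  "recall": 0.756757, "roc_auc": 0.866586},
    "NN_Simple": {"accuracy": 0.7452, "f1_score": 0.567955, "precision": 0.433215,
                  "recall": 0.824631, "roc_auc": 0.869802},
    "NN_Solid": {"accuracy": 0.7352, "f1_score": 0.559338, "precision": 0.422507,
                 "recall": 0.827586, "roc_auc": 0.863767},
}

def solid_minus_simple(model):
    """Per-metric change when moving from the Simple to the Solid data variant."""
    simple = averaged[f"{model}_Simple"]
    solid = averaged[f"{model}_Solid"]
    return {metric: solid[metric] - simple[metric] for metric in simple}

deltas = {model: solid_minus_simple(model) for model in ("XGB", "NN")}

for model, change in deltas.items():
    formatted = ", ".join(f"{metric}: {value:+.4f}" for metric, value in change.items())
    print(f"{model}: {formatted}")
```

Positive values mean the "Solid" variant scored higher. The signs differ between the two models for accuracy, F1 and precision, and ROC AUC is lower for both.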