# ▸ Model Training & Performance Evaluation


This notebook covers model training for turbulence-risk classification using stratified 10-fold cross-validation. The focus is on building models that are both accurate and robust to class imbalance, and comparing multiple ML algorithms to identify the most effective one.


---


## 1. Why 10-Fold Stratified Cross-Validation?

In datasets with class imbalance (as in this project, where severe turbulence is rare), stratified CV ensures that each fold maintains the original class distribution. This prevents folds that contain too few severe-turbulence samples, which would make per-fold metrics unstable, and gives a more realistic view of model performance.

Ten folds were used to strike a balance between training time and robust performance metrics.
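
To make the stratification concrete, here is a minimal pure-Python sketch of how class-preserving folds could be built. The function name `stratified_folds` and the toy labels are assumptions for illustration only; the notebook's actual splitting code is not shown here, and in practice a library splitter such as scikit-learn's `StratifiedKFold` would normally handle this.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits=10, seed=42):
    """Split sample indices into n_splits folds that keep the class ratio."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's shuffled indices round-robin across the folds.
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


# Toy data (hypothetical): 10% of samples are severe-turbulence positives.
labels = [1] * 50 + [0] * 450
folds = stratified_folds(labels)
fold_class_counts = [Counter(labels[i] for i in fold) for fold in folds]
```

Each fold then serves once as the validation set while the remaining nine form the training set, so every validation fold sees the same share of rare positives as the full dataset.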

Each model was evaluated using:

- Accuracy
- Precision
- Recall
- F1-score

All results reflect the average across the 10 folds.
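
As a rough illustration of how the four metrics can be computed per fold and then averaged, the sketch below works from plain lists of true and predicted labels. The helper names (`classification_metrics`, `average_over_folds`) and the two toy folds are hypothetical and are not taken from the notebook.

```python
from statistics import mean


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Guard against division by zero when a fold has no predicted/actual positives.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def average_over_folds(fold_results):
    """Average each metric over a list of (y_true, y_pred) pairs, one per fold."""
    per_fold = [classification_metrics(t, p) for t, p in fold_results]
    return {name: mean(m[name] for m in per_fold) for name in per_fold[0]}


# Two hypothetical folds, just to show the shape of the computation.
fold_results = [
    ([1, 0, 0, 0, 1], [1, 0, 0, 1, 0]),
    ([1, 1, 0, 0, 0], [1, 1, 0, 0, 0]),
]
averaged = average_over_folds(fold_results)
```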

---

## 2. Training the Models

The following models were trained and evaluated:

- XGBoost (main model)
- Random Forest
- LightGBM
- CatBoost
- TabNet (deep learning)
- Logistic Regression
- Naive Bayes
- K-Nearest Neighbors (KNN)
- Support Vector Machine (SVM)

All models followed the same pipeline:

- Train using stratified CV
- Predict probabilities
- Convert to binary predictions using an optimal threshold (0.45 - chosen by best F1)
- Save the model and optimal threshold

Here’s how it looked in code:

```python
# For each model:
model, best_thresh = train_model_cv(X, y, model_type=model_name, threshold=0.45)
trained_models[model_name] = model
optimal_thresholds[model_name] = best_thresh
```
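
The internals of `train_model_cv` are not shown, so the following is only a sketch of how a threshold could be picked by best F1 from predicted probabilities. The function names, the candidate grid, and the toy data are all assumptions made for illustration.

```python
def f1_at_threshold(y_true, probabilities, threshold):
    """F1 score after converting probabilities to 0/1 labels at the threshold."""
    y_pred = [1 if p >= threshold else 0 for p in probabilities]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_f1_threshold(y_true, probabilities, candidates):
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    return max(candidates, key=lambda t: f1_at_threshold(y_true, probabilities, t))


# Hypothetical out-of-fold probabilities for eight samples.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
probabilities = [0.05, 0.10, 0.30, 0.48, 0.46, 0.20, 0.70, 0.90]

# Candidate grid from 0.05 to 0.95 in steps of 0.05.
candidates = [round(0.05 * k, 2) for k in range(1, 20)]
best = best_f1_threshold(y_true, probabilities, candidates)
```

Sweeping on out-of-fold predictions rather than training predictions keeps the chosen threshold from being tuned to data the model has already memorized.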

---

## 3. Cross-Validation Performance
The models were compared across Accuracy, Precision, Recall, and F1-score to understand their trade-offs.

**Performance Comparison Across Models**

![comparison_output_report](images/comparison_output_report.png)

- **XGBoost, Random Forest, LightGBM, and CatBoost** performed consistently well across all metrics.
- **TabNet** also gave strong results, especially in Recall, highlighting its strength on rare events.
- **Naive Bayes, KNN, and SVM** underperformed, especially in Precision and F1, which makes them poor fits for imbalanced classification.

---

## 4. ROC-AUC Analysis
The ROC curve shows the trade-off between the true positive rate and the false positive rate across all thresholds. AUC (Area Under the Curve) was used as a key comparison metric.

![roc_auc_curve](images/roc_auc_curve.png)

- **XGBoost, CatBoost, LightGBM, and Random Forest** all reached an AUC of ~0.97.
- **TabNet** followed closely with an AUC of 0.96.
- **SVM** lagged behind significantly, making it the weakest fit for this task.
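
For intuition about what the AUC values measure, AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition on toy data; the function name `roc_auc` is hypothetical, and the notebook presumably relies on a library routine (for example scikit-learn's `roc_auc_score`) rather than this pairwise loop.

```python
def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy scores: three of the four positive/negative pairs are ordered correctly.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
```

Because it depends only on ranking, AUC is unaffected by the choice of decision threshold, which is why it complements the threshold-dependent metrics in the previous section.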

---

## 5. Why XGBoost?
XGBoost was chosen as the primary model based on:

- Consistent top-tier performance in all metrics
- Flexibility in tuning for class imbalance (via scale_pos_weight)
- Robustness to overfitting due to inbuilt regularization
- Fast training even on large feature sets

It gave the best balance of Recall and F1, which is critical for detecting rare but severe turbulence events without triggering too many false alarms.
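
As a hedged sketch of the class-imbalance tuning mentioned above, a common starting value for `scale_pos_weight` is the ratio of negative to positive samples. The helper name, the toy labels, and the regularization values below are placeholders, not the settings used in this project.

```python
def suggested_scale_pos_weight(labels):
    """Ratio of negative to positive samples, a common starting value."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive samples; the ratio is undefined")
    return negatives / positives


# Toy labels (hypothetical): 50 severe-turbulence cases out of 500 samples.
labels = [1] * 50 + [0] * 450

xgb_params = {
    "objective": "binary:logistic",
    "scale_pos_weight": suggested_scale_pos_weight(labels),
    # Regularization terms; placeholder values, not tuned settings.
    "reg_lambda": 1.0,
    "reg_alpha": 0.0,
}
```

A dictionary like this could then be passed to the classifier, for example as `XGBClassifier(**xgb_params)`, with the weight treated as a starting point to tune rather than a fixed answer.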

---

## ▸ In conclusion
The final set of models highlights how different algorithms handle turbulence prediction differently. Training and validating every model with the same pipeline ensured a fair comparison.
- Business impact: Choosing the right model directly affects flight safety insights. A model with high Recall ensures fewer missed warnings, while strong Precision avoids unnecessary alerts.
- Technical reliability: Robust cross-validation helps ensure that the results generalize well and aren't artifacts of overfitting or sampling errors.

The top models from this step are carried forward into downstream testing and risk map visualizations.