# Model Comparison

## Purpose

The goal of this notebook is to summarize and compare all models to clarify the trade-offs between predictive accuracy, stability, and interpretability.

## Models Compared

The following models were trained and evaluated using the same processed dataset
and train–test splits:

- Linear Regression / Multiple Linear Regression
- Decision Tree Regressor
- Bagging Regressor
- XGBoost Regressor

Each model represents an increase in flexibility and complexity, allowing us to
observe how performance changes as modeling assumptions are relaxed.
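
The shared-split setup described above can be sketched as follows. This is a minimal illustration, not the notebook's training code: the data is synthetic (`make_regression` stands in for the processed airfare dataset), the hyperparameters are arbitrary, and scikit-learn's `GradientBoostingRegressor` is swapped in for XGBoost so the example runs without the `xgboost` package.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic data standing in for the processed airfare dataset.
X, y = make_regression(n_samples=400, n_features=8, noise=15.0, random_state=0)

# One split, reused by every model, so the test scores are directly comparable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "Bagging": BaggingRegressor(n_estimators=50, random_state=0),
    # Stand-in for XGBoost; xgboost.XGBRegressor could be used here instead.
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model on the same training rows and score it on the same test rows.
test_r2 = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_r2[name] = r2_score(y_test, model.predict(X_test))

for name, score in test_r2.items():
    print(f"{name}: R2 = {score:.3f}")
```

Because the synthetic target is linear in the features, the ranking printed here will not match the ranking reported for the airfare data; the point is only that every model sees identical training and test rows.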

## Overall Performance Comparison

Across models, predictive performance improved monotonically as model complexity increased.

- Linear regression served as a baseline and achieved an R² of approximately 0.60.
- The decision tree captured nonlinear patterns and improved R² to around 0.77 after pruning.
- Bagging further reduced variance and increased R² to approximately 0.82.
- XGBoost achieved the strongest performance, with R² values close to 0.85 and the lowest RMSE.

This progression shows the importance of nonlinear modeling and ensemble methods
for airfare pricing data.
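
For reference, the two metrics quoted above follow their standard definitions, which the sketch below spells out in plain Python. The function names and the toy numbers are illustrative assumptions; the notebook's own metric code is not shown in this summary.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: typical prediction error in target units."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Share of target variance explained: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only (not airfare data).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

print(f"RMSE = {rmse(y_true, y_pred):.4f}")
print(f"R2   = {r_squared(y_true, y_pred):.4f}")
```

An R² of 0.60 therefore means the model explains about 60% of the variance in price on the evaluation data, while RMSE expresses the typical error in the same units as the target.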

## Accuracy vs. Stability Trade-off

While more complex models generally improved accuracy, they also differed in stability.

- Linear regression was sensitive to outliers and required manual trimming of extreme values.
- Decision trees improved flexibility but exhibited high variance, making them sensitive
to tree depth and data splits.
- Bagging addressed this issue by averaging predictions across many trees trained on
bootstrap samples. The resulting model showed strong stability.
- XGBoost further improved accuracy through sequential error correction. Despite its
higher complexity, its residual plots and cross-validation curves showed stable
convergence and well-behaved errors, which suggests good generalization.
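
One way to put a number on the stability discussed above is to compare the spread of cross-validated R² across folds for a single tree and for a bagged ensemble of trees. The sketch below assumes synthetic data and arbitrary settings (5 folds, 50 estimators); it is not the notebook's evaluation code.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic data standing in for the processed airfare dataset.
X, y = make_regression(n_samples=300, n_features=8, noise=20.0, random_state=1)

# The same folds are used for both models so the spreads are comparable.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

candidates = {
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    # Bagging averages many trees, each fit on a bootstrap sample of the rows.
    "Bagging": BaggingRegressor(n_estimators=50, random_state=0),
}

stability = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    # Mean measures accuracy; standard deviation across folds measures stability.
    stability[name] = (scores.mean(), scores.std())
    print(f"{name}: mean R2 = {scores.mean():.3f}, std = {scores.std():.3f}")
```

A smaller standard deviation across folds indicates that the model's performance depends less on which rows happen to land in the training set.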

## Interpretability Considerations

Each model presents a different balance between interpretability and predictive power.

- Linear regression provides the most transparent interpretation through coefficients.
- Decision trees offer intuitive explanations and clear feature importance.
- Bagging retains partial interpretability through aggregated feature importance.
- XGBoost delivers the highest predictive accuracy but is less transparent due to
complex interactions and boosting dynamics.

As model complexity increases, interpretability gradually decreases, requiring a
trade-off depending on the intended use case.
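
The different kinds of explanation listed above can be read off fitted scikit-learn models roughly as follows. This is a sketch under assumptions: the data is synthetic, the feature names are hypothetical labels, and the bagging importance is computed by averaging over the individual trees, since `BaggingRegressor` does not expose a `feature_importances_` attribute of its own.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Hypothetical column names; the data itself is synthetic.
feature_names = ["duration", "stops", "airline_code", "journey_day", "journey_month"]
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

linear = LinearRegression().fit(X, y)
tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)
bagging = BaggingRegressor(n_estimators=25, random_state=0).fit(X, y)

# Linear model: one signed coefficient per feature.
for name, coef in zip(feature_names, linear.coef_):
    print(f"coef[{name}] = {coef:.2f}")

# Single tree: impurity-based importances that sum to 1.
tree_importance = tree.feature_importances_

# Bagging: average the per-tree importances. This lines up column by column
# only because max_features is left at its default, so every tree sees all features.
bagging_importance = np.mean(
    [est.feature_importances_ for est in bagging.estimators_], axis=0
)

for name, t_imp, b_imp in zip(feature_names, tree_importance, bagging_importance):
    print(f"{name}: tree = {t_imp:.3f}, bagging = {b_imp:.3f}")
```

Coefficients carry a sign and a unit, so they describe direction and size of an effect; tree-based importances only rank features by how much they reduce error, which is part of why interpretability thins out as models grow more complex.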

## Consistency of Key Features

Despite differences in modeling approach, all models consistently identified
the same core drivers of flight prices:

- Flight duration
- Number of stops
- Airline
- Journey timing (day and month)

This consistency strengthens confidence in the learned pricing patterns and suggests
that these features capture fundamental aspects of airfare formation.
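
A consistency check of this kind can be expressed as the overlap between each model's top-ranked features. In the sketch below the importance scores are made-up placeholders chosen only to exercise the code; they are not results from this notebook, and the helper name `top_k` is hypothetical.

```python
def top_k(importances, k):
    """Return the names of the k highest-scoring features."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:k])


# Placeholder scores for illustration only; NOT values produced by the notebook.
importance_by_model = {
    "Decision Tree": {
        "duration": 0.40, "stops": 0.30, "airline": 0.15,
        "journey_month": 0.10, "journey_day": 0.05,
    },
    "Bagging": {
        "duration": 0.35, "stops": 0.25, "airline": 0.24,
        "journey_day": 0.10, "journey_month": 0.06,
    },
    "Boosting": {
        "stops": 0.32, "duration": 0.28, "airline": 0.20,
        "journey_month": 0.15, "journey_day": 0.05,
    },
}

# Features that appear in every model's top 3.
shared = set.intersection(
    *(top_k(scores, 3) for scores in importance_by_model.values())
)
print(sorted(shared))
```

Comparing ranked sets rather than raw scores sidesteps the fact that coefficients and tree-based importances are not on a common scale.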

## Final Takeaways

- Model performance improves steadily as complexity increases.
- Ensemble methods substantially outperform single-model approaches.
- Bagging provides a strong balance between accuracy, stability, and interpretability.
- XGBoost offers the best overall predictive performance, particularly for practical
forecasting applications.
- Simpler models remain valuable for interpretability and understanding pricing structure.

Ultimately, the optimal model choice depends on whether the priority lies in
explainability or predictive accuracy.

In [1]:
import pandas as pd

pd.DataFrame({
    "Model": ["Linear Regression", "Decision Tree", "Bagging", "XGBoost"],
    "R2 (approx.)": [0.60, 0.77, 0.82, 0.85],
    "Primary Strength": [
        "Interpretability",
        "Nonlinear structure",
        "Stability",
        "Predictive accuracy"
    ]
})


| | Model | R2 (approx.) | Primary Strength |
|---|---|---|---|
| 0 | Linear Regression | 0.60 | Interpretability |
| 1 | Decision Tree | 0.77 | Nonlinear structure |
| 2 | Bagging | 0.82 | Stability |
| 3 | XGBoost | 0.85 | Predictive accuracy |
