# Baseline Model Evaluation

**Goal:** Establish performance benchmarks for ML models

**Deliverables:**
- Heuristic baseline models (naive mean, feature-based)
- Target variable analysis
- Performance evaluation (RMSE, MAE, R²)
- Problem difficulty assessment

In [2]:
import pandas as pd
import numpy as np

y_train = pd.read_csv('../data/processed/y_train.csv')['ARRIVAL_DELAY']

print("="*60)
print("TARGET VARIABLE ANALYSIS")
print("="*60)
print(f"Mean:   {y_train.mean():.2f} min")
print(f"Std:    {y_train.std():.2f} min")
print(f"Median: {y_train.median():.2f} min")
print(f"Min:    {y_train.min():.2f} min")
print(f"Max:    {y_train.max():.2f} min")

print("\nPercentiles:")
print(f"25%: {y_train.quantile(0.25):.2f} min")
print(f"50%: {y_train.quantile(0.50):.2f} min")
print(f"75%: {y_train.quantile(0.75):.2f} min")
print(f"95%: {y_train.quantile(0.95):.2f} min")

# What share of flights is delayed by more than 30 / 60 minutes?
print(f"\nFlights delayed >30 min: {(y_train > 30).sum() / len(y_train) * 100:.1f}%")
print(f"Flights delayed >60 min: {(y_train > 60).sum() / len(y_train) * 100:.1f}%")

TARGET VARIABLE ANALYSIS
Mean:   3.51 min
Std:    31.93 min
Median: -5.00 min
Min:    -87.00 min
Max:    167.00 min

Percentiles:
25%: -13.00 min
50%: -5.00 min
75%: 8.00 min
95%: 66.00 min

Flights delayed >30 min: 11.1%
Flights delayed >60 min: 5.6%


## Key Observations

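1. **Skewed Distribution**: Median (-5.00 min) < Mean (3.51 min) indicates right skew: at least half of all flights arrive 5 or more minutes early, while a minority of long delays pulls the mean up
2. **High Variance**: STD (31.93 min) is roughly 9x the mean (3.51 min), so the average delay says little about any individual flight
3. **Long Tail**: 5.6% of flights are delayed >60 minutes, and the 95th percentile is 66 min (business-critical segment)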
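
The three observations above can be checked numerically. The following is a minimal sketch on synthetic data, not the real `y_train`; the gamma-distributed sample and the variable name `delays` are assumptions chosen only to mimic a right-skewed delay distribution.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for y_train (NOT the real data): a right-skewed
# sample shifted so that many values are negative, like early arrivals.
rng = np.random.default_rng(0)
delays = pd.Series(rng.gamma(shape=2.0, scale=15.0, size=10_000) - 25.0)

mean, median, std = delays.mean(), delays.median(), delays.std()

# Right skew: the long tail of large delays pulls the mean above the median.
print(f"mean - median: {mean - median:.2f} min")
print(f"skewness:      {delays.skew():.2f}")
print(f"std:           {std:.2f} min")

# Share of observations beyond fixed thresholds; the mean of a boolean
# mask is the fraction of True values.
for threshold in (30, 60):
    print(f">{threshold} min: {(delays > threshold).mean() * 100:.1f}%")
```

A positive `mean - median` gap together with a positive `Series.skew()` value is the usual signature of a right-skewed target.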

## Key Findings

### 1. RMSE ≈ STD → Problem is Hard
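- Naive mean RMSE (32.06) ≈ STD (31.93): predicting one constant for every flight yields an RMSE equal to the spread of the target, so this is the no-information reference score rather than a random guess
- Feature-based baseline is only 0.10 min better (RMSE ≈ 31.96) → destination alone is insufficient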
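
The code for the two heuristic baselines is not shown in this section, so the following is only a sketch of how such baselines are commonly built. The data is synthetic, and the column name `DESTINATION_AIRPORT`, the airport codes, and the `rmse` helper are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def rmse(y_true, y_pred):
    """Root mean squared error (arrays are compared by position)."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Synthetic stand-in data (NOT the real dataset). The column name
# DESTINATION_AIRPORT is an assumption made for illustration.
rng = np.random.default_rng(42)
n = 5_000
airports = rng.choice(["AAA", "BBB", "CCC"], size=n)
airport_effect = pd.Series(airports).map({"AAA": -2.0, "BBB": 0.0, "CCC": 3.0})
delay = airport_effect.to_numpy() + rng.normal(0.0, 30.0, size=n)
df = pd.DataFrame({"DESTINATION_AIRPORT": airports, "ARRIVAL_DELAY": delay})

train, test = df.iloc[:4_000], df.iloc[4_000:]

# Baseline 1: predict the training mean for every flight.
global_mean = train["ARRIVAL_DELAY"].mean()
naive_pred = np.full(len(test), global_mean)

# Baseline 2: predict the training mean of the flight's destination,
# falling back to the global mean for destinations unseen in training.
dest_means = train.groupby("DESTINATION_AIRPORT")["ARRIVAL_DELAY"].mean()
feature_pred = test["DESTINATION_AIRPORT"].map(dest_means).fillna(global_mean)

print(f"Naive mean RMSE:    {rmse(test['ARRIVAL_DELAY'], naive_pred):.2f}")
print(f"Feature-based RMSE: {rmse(test['ARRIVAL_DELAY'], feature_pred):.2f}")

# On the training data itself, the mean predictor's RMSE equals the
# population standard deviation (ddof=0) exactly.
train_rmse = rmse(train["ARRIVAL_DELAY"], np.full(len(train), global_mean))
print(np.isclose(train_rmse, train["ARRIVAL_DELAY"].std(ddof=0)))
```

The last check illustrates why a naive mean RMSE close to the target's standard deviation is expected: on the data the mean was computed from, the two quantities coincide.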

### 2. R² ≈ 0 → High Inherent Noise
- R² = 0.006 means the baseline explains only 0.6% of the variance
- Factors absent from the features, such as weather, ATC, and mechanical issues, likely drive much of the remaining variance
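
The reported R² can be cross-checked against the two RMSE figures. This sketch assumes that both RMSE values were measured on the same evaluation set and that the naive mean predictor is the reference model, in which case R² = 1 - (RMSE_model / RMSE_baseline)². The function name `r2_from_rmse` is hypothetical.

```python
def r2_from_rmse(model_rmse: float, baseline_rmse: float) -> float:
    """R^2 relative to a constant-mean baseline: 1 - MSE_model / MSE_baseline."""
    return 1.0 - (model_rmse / baseline_rmse) ** 2

naive_rmse = 32.06                # naive mean baseline, from the findings
feature_rmse = naive_rmse - 0.10  # feature-based baseline is 0.10 min better

r2 = r2_from_rmse(feature_rmse, naive_rmse)
print(f"R^2 = {r2:.3f}")  # prints "R^2 = 0.006"
```

A 0.10 min RMSE gain over a 32.06 min baseline is therefore consistent with R² = 0.006.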

### 3. Realistic ML Expectations
- Linear Regression target: R² = 0.10-0.20 (roughly 17-33x the baseline R² of 0.006)
- Random Forest target: R² = 0.15-0.30
- Focus on **relative improvement**, not absolute R²
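
To make the R² targets more tangible, they can be converted into the RMSE they imply. This is a rough sketch that assumes R² = 1 - (RMSE / baseline RMSE)² with the naive mean RMSE of 32.06 as the baseline; the helper name `rmse_for_r2` is hypothetical.

```python
import math

BASELINE_RMSE = 32.06  # naive mean RMSE reported in the findings

def rmse_for_r2(r2: float, baseline_rmse: float = BASELINE_RMSE) -> float:
    """RMSE implied by a given R^2, assuming R^2 = 1 - (RMSE / baseline)^2."""
    return baseline_rmse * math.sqrt(1.0 - r2)

for r2 in (0.006, 0.10, 0.20, 0.30):
    rmse = rmse_for_r2(r2)
    reduction = (1.0 - rmse / BASELINE_RMSE) * 100
    print(f"R^2 = {r2:.3f} -> RMSE = {rmse:.2f} min ({reduction:.1f}% below baseline)")
```

Under this assumption, an R² of 0.20 corresponds to an RMSE of about 28.7 min, roughly 10.6% below the baseline, because RMSE shrinks with the square root of the unexplained variance.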

### 4. Business Value
- Even a 10-20% RMSE reduction (about 3.2-6.4 min off the 32.06 min baseline) can have significant operational benefits
- Examples: better gate management, crew scheduling, passenger communication