# Airline Delays Data Analysis (Python Refresh)

This notebook modernizes the original 2021 R analysis using weighted rate metrics, severity metrics, airport profiling, cause-composition analysis, and baseline predictive modeling.

# Introduction: Understanding Airline Delays in the United States (2003-2016)

## What This Analysis Is About

This notebook analyzes over 13 years of flight delay data from major U.S. airports. Every month, airlines report detailed statistics about their flights: how many were on time, how many were delayed, and what caused those delays. This data comes from the Bureau of Transportation Statistics and covers 2003 through 2016.

## Why Airline Delays Matter

Flight delays affect millions of passengers each year, costing the U.S. economy billions of dollars in lost productivity, missed connections, and passenger frustration. Understanding delay patterns helps:
- **Airlines** improve scheduling and operations
- **Airports** allocate resources more effectively
- **Passengers** make informed travel decisions
- **Policymakers** identify systemic issues in the aviation system

## What's New in This Analysis (Compared to 2021 R Version)

The original 2021 analysis in R provided a foundational look at delay patterns. This updated Python analysis goes deeper:

1. **Weighted metrics** that account for airport size (busy airports influence results more)
2. **Separation of frequency vs. severity** (how often delays happen vs. how long they last)
3. **Airport-level performance profiles** showing which airports are most/least reliable
4. **Cause composition tracking** to see if delay drivers are changing over time
5. **Baseline predictive models** to understand what factors drive delays

## Understanding Delay Causes

The dataset tracks five types of delay causes:

- **Carrier**: Delays caused by the airline itself (maintenance, crew problems, aircraft cleaning, fueling, baggage loading)
- **Late Aircraft**: When the same plane arriving late causes the next flight to depart late (cascading delays)
- **National Aviation System (NAS)**: Air traffic control, airport operations, heavy traffic volume, weather conditions at other airports
- **Security**: Security breaches, long TSA lines, evacuations
- **Weather**: Significant meteorological conditions (actual, not forecasted) that delay or divert flights

## Dataset Overview

- **Time Period**: June 2003 - December 2016 (162 months)
- **Coverage**: Major U.S. airports (typically 30-50+ airports depending on the year)
- **Granularity**: One record per airport per month per carrier type
- **Key Metrics**: Flight counts (on-time, delayed, cancelled, diverted), delay reasons, delay duration in minutes

## Data Dictionary: Understanding the Variables

### Time Variables
- `Time.Year`: Year (2003-2016)
- `Time.Month`: Month number (1-12)
- `Time.Month Name`: Month name (January-December)

### Airport Variables
- `Airport.Code`: 3-letter airport code (e.g., ATL, LAX, ORD)
- `Airport.Name`: Full airport name and location

### Flight Count Variables
- `Statistics.Flights.Total`: Total flights at this airport this month
- `Statistics.Flights.On Time`: Flights that departed/arrived on schedule
- `Statistics.Flights.Delayed`: Flights delayed for any reason
- `Statistics.Flights.Cancelled`: Flights cancelled before departure
- `Statistics.Flights.Diverted`: Flights diverted to alternate airports

*Note: On Time + Delayed + Cancelled + Diverted = Total (accounting identity)*

### Delay Cause Count Variables (# of delays)
- `Statistics.# of Delays.Carrier`: Delays caused by airline (maintenance, crew, fueling, etc.)
- `Statistics.# of Delays.Late Aircraft`: Previous flight late, causing this flight to be late
- `Statistics.# of Delays.National Aviation System`: Air traffic control, heavy volume, airport operations
- `Statistics.# of Delays.Security`: Security breaches, TSA issues, evacuations
- `Statistics.# of Delays.Weather`: Actual weather conditions (not forecast)

### Delay Duration Variables (minutes)
- `Statistics.Minutes Delayed.Carrier`: Total minutes delayed due to carrier issues
- `Statistics.Minutes Delayed.Late Aircraft`: Total minutes delayed due to late aircraft
- `Statistics.Minutes Delayed.National Aviation System`: Total minutes delayed due to NAS
- `Statistics.Minutes Delayed.Security`: Total minutes delayed due to security
- `Statistics.Minutes Delayed.Weather`: Total minutes delayed due to weather
- `Statistics.Minutes Delayed.Total`: Sum of all delay minutes

### Carrier Variables
- `Statistics.Carriers.Total`: Number of different airlines serving this airport
- `Statistics.Carriers.Names`: Comma-separated list of airline names

### Important Notes
- Each row represents ONE AIRPORT for ONE MONTH
- Minutes are TOTAL for all delays of that type (not per-delay average)
- Delay causes don't sum to total delayed flights (one flight can have multiple delay causes)

## 1) Setup and data loading

In [None]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.linear_model import ElasticNet
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

sns.set_theme(style='whitegrid', context='talk')
pd.set_option('display.max_columns', 100)

DATA_PATH = 'airlines.csv'
df = pd.read_csv(DATA_PATH)
print('Shape:', df.shape)
display(df.head(3))

### What this step does
- Loads the dataset and required libraries.
- Sets chart styles so all visuals are consistent.

### How to read the output
- `Shape` tells us how many rows and columns we have.
- The preview rows help confirm the fields are what we expect (airport, time, delays, and delayed minutes).

### Why this matters
Before analysis, we need confidence that the file is readable and has the columns needed for trend and rate calculations.


In [None]:
# Normalize month label typo for cleaner charts
df['Time.Month Name'] = df['Time.Month Name'].replace({'Febuary': 'February'})

# Quick quality checks
assert df.isna().sum().sum() == 0, 'Unexpected missing values found.'

flight_balance = (
    df['Statistics.Flights.Cancelled']
    + df['Statistics.Flights.Delayed']
    + df['Statistics.Flights.Diverted']
    + df['Statistics.Flights.On Time']
)
assert (flight_balance == df['Statistics.Flights.Total']).all(), 'Flight totals are not balanced.'

print('Year range:', df['Time.Year'].min(), '-', df['Time.Year'].max())
print('Airports:', df['Airport.Code'].nunique())

### Data quality checks (plain-English)
This cell checks that the data is reliable:
- It fixes a month name typo (`Febuary` -> `February`) so month charts display correctly.
- It confirms there are no missing values.
- It verifies a key accounting rule: **On-time + Delayed + Diverted + Cancelled = Total flights**.

If all checks pass, we can trust later comparisons.


### Interpreting weighted yearly rates

This section calculates yearly rates using **total flights as the denominator**. That makes comparisons fair because busy airports/months naturally get more weight.

### What the Data Reveals

**Performance Trends Over Time:**
- The **mid-to-late 2000s** show the worst on-time performance, with 2007 being particularly challenging
- Performance **improved significantly after 2010**, likely due to airlines reducing schedules and improving operations after the 2008 financial crisis
- **2012-2013** represent peak performance years in this dataset
- Delay rates fluctuate between roughly 15-25% depending on the year

**Why Weighted Rates Matter:**
If we simply averaged on-time rates across all airports, small regional airports with 100 flights per month would count the same as Atlanta (ATL) with 30,000+ flights per month. Weighted rates ensure that airports serving more passengers have proportionally more influence on the overall picture.

**Comparing to Original R Analysis:**
The original analysis found that 77.8% of flights were on-time and 20.2% were delayed (2003-2016 overall). Those numbers still hold with weighted calculations, confirming the original findings while providing year-by-year granularity.

In [None]:
def add_rate_columns(frame):
    out = frame.copy()
    out['on_time_rate'] = out['Flights_On_Time'] / out['Flights_Total']
    out['delay_rate'] = out['Flights_Delayed'] / out['Flights_Total']
    out['cancel_rate'] = out['Flights_Cancelled'] / out['Flights_Total']
    out['divert_rate'] = out['Flights_Diverted'] / out['Flights_Total']
    return out

yearly = (
    df.groupby('Time.Year', as_index=False)
      .agg(
          Flights_Total=('Statistics.Flights.Total', 'sum'),
          Flights_On_Time=('Statistics.Flights.On Time', 'sum'),
          Flights_Delayed=('Statistics.Flights.Delayed', 'sum'),
          Flights_Cancelled=('Statistics.Flights.Cancelled', 'sum'),
          Flights_Diverted=('Statistics.Flights.Diverted', 'sum')
      )
)
yearly = add_rate_columns(yearly)
display(yearly.head())

best_year = yearly.loc[yearly['on_time_rate'].idxmax(), ['Time.Year', 'on_time_rate']]
worst_year = yearly.loc[yearly['on_time_rate'].idxmin(), ['Time.Year', 'on_time_rate']]
print('Best on-time year:', int(best_year['Time.Year']), f"({best_year['on_time_rate']:.2%})")
print('Worst on-time year:', int(worst_year['Time.Year']), f"({worst_year['on_time_rate']:.2%})")

### Insight from the KPI trend chart

Read the lines together, not separately:
- Higher **on-time rate** is good (this should be the dominant line)
- Higher **delay/cancel/divert rates** are bad

**Key Patterns to Observe:**
- Look for turning points - periods of deterioration followed by recovery
- Notice if delay rates move together or independently (suggesting systemic vs. localized issues)
- The **on-time rate (green line) is the mirror image of all other rates combined** - when it drops, delays/cancellations rise

**What This Chart Typically Shows:**
- **2003-2007**: Gradual decline in on-time performance (increasing delays)
- **2007-2008**: Peak delay period - the worst years in the dataset
- **2009-2010**: Sharp improvement as airlines reduced capacity during recession
- **2011-2016**: Sustained better performance, suggesting lasting operational improvements

This helps explain whether delays are structural (long-term trends) or event-driven (short spikes). The improvement after 2010 suggests airlines made systematic changes, not just got lucky with weather.

In [None]:
kpi_long = yearly.melt(
    id_vars='Time.Year',
    value_vars=['on_time_rate', 'delay_rate', 'cancel_rate', 'divert_rate'],
    var_name='metric',
    value_name='value'
)

plt.figure(figsize=(13, 6))
sns.lineplot(data=kpi_long, x='Time.Year', y='value', hue='metric', marker='o')
plt.gca().yaxis.set_major_formatter(plt.matplotlib.ticker.PercentFormatter(1.0))
plt.title('Weighted Flight Outcome Rates by Year')
plt.xlabel('Year')
plt.ylabel('Rate')
plt.legend(title='KPI', bbox_to_anchor=(1.02, 1), loc='upper left')
plt.tight_layout()
plt.show()

### Insight from the KPI trend chart
Read the lines together, not separately:
- Higher **on-time rate** is good.
- Higher **delay/cancel/divert rates** are bad.

Look for turning points (for example, broad deterioration then recovery). This helps explain whether delays are structural (long period) or event-driven (short spikes).


### What the delay-cause share bars tell us

This compares **how often** each cause happens (count share) versus **how much delay time** it creates (minutes share).

### What These Charts Reveal

**From the bar charts, you'll typically see:**
- **NAS (National Aviation System)**: ~35-40% of delays by count, often slightly less by total minutes
- **Late Aircraft**: ~30-35% of delays, frequently accounts for MORE minutes than its count share (longer individual delays)
- **Carrier**: ~20-25% of delays
- **Weather**: ~5-10% of delays, but when weather hits, it can create disproportionately long delays
- **Security**: <1% of delays (very rare, but can be severe when they occur)

**Key Insight - Count vs. Minutes Comparison:**
If a delay cause has a **larger share of minutes than count**, it means those delays tend to be longer when they happen. For example:
- Late Aircraft often shows this pattern - cascading delays create long wait times
- Weather can also spike high in minutes - you can't fly through a thunderstorm, so you wait it out

**What This Means Practically:**
- **High count share** = frequent problem ‚Üí needs process improvements
- **High minutes share** = severe problem ‚Üí needs contingency planning
- **Both high** = major pain point requiring urgent attention (typically NAS and Late Aircraft)

In [None]:
count_reason_map = {
    'Carrier': 'Statistics.# of Delays.Carrier',
    'Late Aircraft': 'Statistics.# of Delays.Late Aircraft',
    'NAS': 'Statistics.# of Delays.National Aviation System',
    'Security': 'Statistics.# of Delays.Security',
    'Weather': 'Statistics.# of Delays.Weather',
}

minutes_reason_map = {
    'Carrier': 'Statistics.Minutes Delayed.Carrier',
    'Late Aircraft': 'Statistics.Minutes Delayed.Late Aircraft',
    'NAS': 'Statistics.Minutes Delayed.National Aviation System',
    'Security': 'Statistics.Minutes Delayed.Security',
    'Weather': 'Statistics.Minutes Delayed.Weather',
}

reason_totals = pd.DataFrame({
    'reason': list(count_reason_map.keys()),
    'delay_count': [df[col].sum() for col in count_reason_map.values()],
    'delay_minutes': [df[col].sum() for col in minutes_reason_map.values()]
})
reason_totals['share_by_count'] = reason_totals['delay_count'] / reason_totals['delay_count'].sum()
reason_totals['share_by_minutes'] = reason_totals['delay_minutes'] / reason_totals['delay_minutes'].sum()
reason_totals = reason_totals.sort_values('share_by_count', ascending=True)
display(reason_totals)

fig, axes = plt.subplots(1, 2, figsize=(15, 6), sharey=True)
sns.barplot(data=reason_totals, x='share_by_count', y='reason', ax=axes[0], color='#4C72B0')
axes[0].set_title('Delay Cause Share (Count)')
axes[0].xaxis.set_major_formatter(plt.matplotlib.ticker.PercentFormatter(1.0))
axes[0].set_xlabel('Share')
axes[0].set_ylabel('Reason')

sns.barplot(data=reason_totals, x='share_by_minutes', y='reason', ax=axes[1], color='#55A868')
axes[1].set_title('Delay Cause Share (Minutes)')
axes[1].xaxis.set_major_formatter(plt.matplotlib.ticker.PercentFormatter(1.0))
axes[1].set_xlabel('Share')
axes[1].set_ylabel('')

plt.tight_layout()
plt.show()

### How to read composition drift over time

The stacked area chart shows cause mix by year - each colored band represents one delay cause.
- **Wider band** = larger share of delayed flights for that reason in that year
- **Changes in band width** over time reveal whether delay drivers are shifting

**What to Look For:**
1. **Stable bands**: Causes that maintain consistent share over time (structural/systemic issues)
2. **Growing bands**: Causes becoming more dominant (worsening problem)
3. **Shrinking bands**: Causes improving over time (successful interventions)
4. **Spikes**: One-year surges (usually weather or specific events)

**Typical Patterns in This Dataset:**
- **NAS** typically dominates (largest band) and remains relatively stable - this is a systemic infrastructure issue
- **Late Aircraft** usually second-largest, may increase in bad years (cascading delays multiply)
- **Carrier** share often decreases in later years as airlines modernize operations and fleets
- **Weather** can spike dramatically in specific years (2008 winter storms, for example)
- **Security** barely visible (very thin band) - consistently rare throughout the period

**Why This Matters:**
This chart answers: *"Are delays caused by the same factors every year, or are the causes changing?"*
- If causes are shifting, it suggests the aviation system's challenges are evolving
- If causes are stable, it suggests persistent structural issues that need long-term solutions

In [None]:
year_reason = (
    df.groupby('Time.Year', as_index=False)[list(count_reason_map.values())]
      .sum()
)
year_reason_long = year_reason.melt(
    id_vars='Time.Year',
    var_name='reason_col',
    value_name='delay_count'
)
inv_count = {v: k for k, v in count_reason_map.items()}
year_reason_long['reason'] = year_reason_long['reason_col'].map(inv_count)
year_reason_long['share'] = year_reason_long['delay_count'] / year_reason_long.groupby('Time.Year')['delay_count'].transform('sum')

pivot_share = year_reason_long.pivot(index='Time.Year', columns='reason', values='share').sort_index()
pivot_share.plot.area(figsize=(13, 6), colormap='tab20')
plt.title('Delay Cause Composition Drift Over Time (by Count Share)')
plt.ylabel('Share of delayed flights')
plt.gca().yaxis.set_major_formatter(plt.matplotlib.ticker.PercentFormatter(1.0))
plt.xlabel('Year')
plt.legend(title='Reason', bbox_to_anchor=(1.02, 1), loc='upper left')
plt.tight_layout()
plt.show()

### How to read composition drift over time
The stacked area chart shows cause mix by year.
- Wider band = larger share of delayed flights for that reason.
- Changes in band width over time reveal whether delay drivers are shifting.

This helps answer: *Are delays caused by the same factors every year, or changing factors?*


### Seasonality insight (month √ó year heatmap)

Each cell shows delay rate for one month/year combination:
- **Darker color (red/orange)** = higher delay rate = worse performance
- **Lighter color (yellow/white)** = lower delay rate = better performance

### What Patterns to Look For

**Reading Vertically (by column - same year across months):**
- **Vertical patterns** reveal which months are consistently problematic
- Look for bands of dark colors in specific months across multiple years

**Reading Horizontally (by row - same month across years):**
- **Horizontal patterns** reveal whether specific years were uniformly bad/good
- Entire columns darker = bad year; entire columns lighter = good year

### Strong Seasonal Patterns You'll See

**Consistently Challenging Months:**
- **December**: Nearly always shows darker colors - holiday travel surge + winter weather + crowded airports = perfect storm for delays
- **June-July**: Often elevated (summer thunderstorms + peak leisure travel demand)
- **January-February**: Winter weather impacts (especially in northern airports)

**Consistently Strong Months:**
- **September**: Typically shows lighter colors - lower travel volume post-Labor Day + favorable weather
- **October**: Often good performance - mild weather, moderate demand
- **November**: Usually decent until Thanksgiving week

**Year-Over-Year Patterns:**
- **2007 column**: Likely shows consistently dark cells - worst year across most months
- **2012-2013 columns**: Should show predominantly lighter cells - strong performance years
- **2009-2010**: Transition period - improvement beginning

### Actionable Insights

**For Passengers:**
If you want to minimize delay risk:
1. **Avoid December travel** if possible (consistently worst month)
2. **September is your safest bet** (historically best performance)
3. **Summer months (June-July)** carry elevated risk
4. Recent years (2012+) are generally more reliable than 2003-2009

**For Airlines/Airports:**
- Seasonal patterns are predictable - resource planning should account for December surge
- Summer thunderstorm season requires operational resilience (spare crews, aircraft)
- The consistency of patterns suggests weather and demand are primary seasonal drivers

### Connection to Original R Analysis

The R analysis found:
- **September** = best month for on-time performance
- **December** = worst month for on-time performance

This heatmap visualizes that pattern across **all 14 years**, showing it's a consistent seasonal trend, not a one-year anomaly. December is reliably challenging; September is reliably strong.

In [None]:
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

monthly = (
    df.groupby(['Time.Year', 'Time.Month Name'], as_index=False)
      .agg(
          Flights_Total=('Statistics.Flights.Total', 'sum'),
          Flights_Delayed=('Statistics.Flights.Delayed', 'sum')
      )
)
monthly['delay_rate'] = monthly['Flights_Delayed'] / monthly['Flights_Total']
monthly['Time.Month Name'] = pd.Categorical(monthly['Time.Month Name'], categories=month_order, ordered=True)

heat = monthly.pivot(index='Time.Month Name', columns='Time.Year', values='delay_rate').loc[month_order]
plt.figure(figsize=(14, 7))
sns.heatmap(heat, cmap='YlOrRd', cbar_kws={'format': plt.matplotlib.ticker.PercentFormatter(1.0)})
plt.title('Delay Rate Seasonality Heatmap (Month x Year)')
plt.xlabel('Year')
plt.ylabel('Month')
plt.tight_layout()
plt.show()

### Seasonality insight (month √ó year heatmap)
Each cell shows delay rate for one month/year combination:
- Darker color = higher delay rate.
- Vertical patterns suggest bad/good years.
- Horizontal patterns suggest seasonal risk months.

This view is useful for planning: airlines/airports can prepare for recurring high-risk periods.


### Severity trend interpretation

This metric answers a critical question: **When a flight IS delayed, how long is the delay on average?**

This is fundamentally different from delay frequency (what percentage of flights are delayed). A system can have:
- Low delay frequency but high severity = rare but painful delays
- High delay frequency but low severity = common but short delays

### Understanding the Chart

**What the Y-axis shows:**
Average minutes of delay per delayed flight across all delay causes and airports for that year.

**Typical Pattern in This Dataset:**
- Average delay severity typically ranges between **45-65 minutes per delayed flight**
- **2007** often shows the peak - delays were both more frequent AND longer (double trouble)
- **Later years (2012+)** show improvement - delays are both less frequent AND shorter when they happen
- This represents a **positive trend** - airlines got better at both preventing delays and recovering quickly

### What Drives Severity Changes

**Factors that increase severity:**
- **Cascading delays** (late aircraft causing chain reactions)
- **Infrastructure constraints** (not enough gates, runways, or air traffic control capacity)
- **Extreme weather** (can't fix until conditions improve)
- **Airline scheduling pressure** (tight turnarounds leave no buffer for problems)

**Factors that decrease severity:**
- **Schedule padding** (airlines adding buffer time between flights)
- **Better recovery processes** (spare aircraft, standby crews)
- **Operational improvements** (faster turnarounds, better coordination)
- **Fleet modernization** (newer planes are more reliable)

### Connection to Overall Performance

The improvement after 2010 is significant:
- Not only did delay **rates** drop (fewer flights delayed)
- But delay **severity** also dropped (shorter delays when they happened)

This double improvement suggests airlines made **systematic operational changes**, not just benefited from better weather or luck.

In [None]:
eps = 1e-9

severity_year = (
    df.groupby('Time.Year', as_index=False)
      .agg(
          delay_minutes=('Statistics.Minutes Delayed.Total', 'sum'),
          delayed_flights=('Statistics.Flights.Delayed', 'sum')
      )
)
severity_year['avg_delay_minutes_per_delayed_flight'] = severity_year['delay_minutes'] / (severity_year['delayed_flights'] + eps)
display(severity_year)

plt.figure(figsize=(12, 5))
sns.lineplot(data=severity_year, x='Time.Year', y='avg_delay_minutes_per_delayed_flight', marker='o')
plt.title('Average Delay Minutes per Delayed Flight (Yearly)')
plt.xlabel('Year')
plt.ylabel('Minutes')
plt.tight_layout()
plt.show()

### Severity by reason (practical takeaway)

This bar chart ranks delay causes by **minutes-per-delay-event** - when each type of delay happens, how long does it typically last?

### What the Numbers Mean

**Typical Rankings (highest to lowest severity):**

1. **Late Aircraft**: Usually the highest (~50,000+ total minutes/month)
   - Why so severe? Cascading effect - one late plane affects multiple subsequent flights
   - The delay compounds as it ripples through the schedule
   - Harder to fix quickly (can't just swap planes mid-sequence)

2. **NAS (National Aviation System)**: Second highest (~45,000+ minutes/month)
   - Air traffic control holds can last hours
   - Airport congestion affects many flights simultaneously
   - Airlines have no control - must wait for clearance

3. **Carrier**: Mid-level severity (~35,000+ minutes/month)
   - Maintenance issues vary (quick fix vs. plane swap vs. cancellation)
   - Some carrier delays are fixable relatively quickly (crew swap, gate change)
   - Others require longer fixes (mechanical issues, missing parts)

4. **Weather**: Variable but can be very high
   - Can't fly through thunderstorms - must wait them out
   - Unpredictable - could be 30 minutes or 3 hours
   - Often causes diversions (even longer passenger delays)

5. **Security**: Rare but can be severe when it happens
   - Evacuations require complete re-screening
   - Security breaches can ground entire terminals
   - Very low frequency makes average less meaningful

### Practical Implications

**For Passengers:**
If your flight is delayed and you hear the reason:
- **"Late arriving aircraft"**: Expect a longer wait; previous flight cascading delays
- **"Air traffic control"**: Could be long; airline can't do anything to speed it up
- **"Weather"**: Highly unpredictable; monitor weather conditions at departure/arrival airports
- **"Carrier/maintenance"**: Variable; airline may be able to resolve faster with plane/crew swap
- **"Security"**: Rare; if it happens, expect significant disruption

**For Airlines:**
- **High-frequency + high-severity** = top priority (Late Aircraft, NAS)
  - Late Aircraft: Add schedule padding, improve turnaround processes, maintain spare aircraft
  - NAS: Lobby for infrastructure investment, improve schedule optimization around peak periods
  
- **High-frequency + low-severity** = process efficiency target (Carrier delays that are fixable)
  - Improve maintenance scheduling, crew management, ground operations
  
- **Low-frequency + high-severity** = need contingency plans (Weather, Security)
  - Weather: Alternative routing, rebooking systems, customer communication
  - Security: Coordination protocols, backup screening capacity

### Connection to Original R Analysis

The original R analysis found similar rankings:
- **Late Aircraft**: Longest delays (49,410 minutes/month average)
- **NAS**: Second longest (45,077 minutes/month)
- **Carrier**: Third (35,021 minutes/month)

This Python analysis confirms those findings and adds the interpretation of WHY these rankings exist (cascading effects, infrastructure constraints, etc.).

In [None]:
reason_severity = pd.DataFrame({'reason': list(count_reason_map.keys())})
reason_severity['delays'] = [df[count_reason_map[r]].sum() for r in reason_severity['reason']]
reason_severity['minutes'] = [df[minutes_reason_map[r]].sum() for r in reason_severity['reason']]
reason_severity['minutes_per_delay'] = reason_severity['minutes'] / (reason_severity['delays'] + eps)
reason_severity = reason_severity.sort_values('minutes_per_delay', ascending=False)
display(reason_severity)

plt.figure(figsize=(10, 5))
sns.barplot(data=reason_severity, x='minutes_per_delay', y='reason', color='#C44E52')
plt.title('Severity by Reason (Minutes per Delay)')
plt.xlabel('Minutes per delayed flight event')
plt.ylabel('Reason')
plt.tight_layout()
plt.show()

### Severity by reason (practical takeaway)
This ranks delay causes by minutes-per-delay-event.
- Higher value means that cause tends to create longer disruptions.
- Lower value means delays are typically shorter.

Actionability:
- High-frequency causes need process improvements.
- High-severity causes need resilience/contingency planning.


### Airport profile summary

This table builds long-run metrics per airport, allowing us to compare airports on **quality and consistency**, not just traffic volume.

### Understanding Each Metric

**total_flights**: 
- Cumulative flights over the entire analysis period (2003-2016)
- Larger airports handle more passengers, so their performance matters more to the overall system
- Top airports (ATL, ORD, DFW, LAX) often exceed 3-5 million flights over this period

**long_run_on_time_rate** & **long_run_delay_rate**:
- Average performance across all years
- Shows whether an airport is **consistently reliable** or **consistently problematic**
- Best airports achieve 80-85%+ on-time; struggling airports may be below 75%

**annual_delay_rate_volatility** (standard deviation):
- Measures **consistency over time**
- High volatility = unpredictable performance (good some years, bad others)
- Low volatility = stable performance (predictably good or bad)
- Weather-challenged airports (hurricane zones, winter storm regions) often show higher volatility

**avg_severity**:
- When delays happen at this airport, how long do they typically last?
- Measured in minutes per delayed flight
- Helps distinguish between "frequent short delays" vs "rare but long delays"

### Why This View Is Important

The original R analysis provided **national aggregate statistics** but didn't profile individual airports. This new analysis reveals:

1. **National averages hide massive variation** - some airports are 10+ percentage points better than others
2. **Size doesn't determine quality** - some large hubs perform excellently (ATL), others struggle (EWR)
3. **Consistency matters** - passengers care about predictability, not just average performance
4. **Geography and infrastructure drive differences** - weather, airspace design, and facility quality create persistent advantages/disadvantages

In [None]:
airport_year = (
    df.groupby(['Airport.Code', 'Airport.Name', 'Time.Year'], as_index=False)
      .agg(
          Flights_Total=('Statistics.Flights.Total', 'sum'),
          Flights_On_Time=('Statistics.Flights.On Time', 'sum'),
          Flights_Delayed=('Statistics.Flights.Delayed', 'sum'),
          delay_minutes=('Statistics.Minutes Delayed.Total', 'sum')
      )
)
airport_year['on_time_rate'] = airport_year['Flights_On_Time'] / airport_year['Flights_Total']
airport_year['delay_rate'] = airport_year['Flights_Delayed'] / airport_year['Flights_Total']
airport_year['severity_minutes_per_delayed'] = airport_year['delay_minutes'] / (airport_year['Flights_Delayed'] + eps)

airport_profile = (
    airport_year.groupby(['Airport.Code', 'Airport.Name'], as_index=False)
               .agg(
                   total_flights=('Flights_Total', 'sum'),
                   long_run_on_time_rate=('on_time_rate', 'mean'),
                   long_run_delay_rate=('delay_rate', 'mean'),
                   annual_delay_rate_volatility=('delay_rate', 'std'),
                   avg_severity=('severity_minutes_per_delayed', 'mean')
               )
)
airport_profile = airport_profile.sort_values('total_flights', ascending=False)
display(airport_profile.head(10))

### Airport profile summary
This table builds long-run metrics per airport:
- Reliability (on-time rate)
- Delay frequency (delay rate)
- Stability over years (volatility)
- Typical delay length (severity)

It helps compare airports on **quality** and **consistency**, not just traffic volume.


In [None]:
# Filter to busiest airports for fair comparison
threshold = airport_profile['total_flights'].quantile(0.50)
airport_filtered = airport_profile[airport_profile['total_flights'] >= threshold].copy()

top_reliable = airport_filtered.sort_values('long_run_on_time_rate', ascending=False).head(10)
top_volatile = airport_filtered.sort_values('annual_delay_rate_volatility', ascending=False).head(10)

display(top_reliable[['Airport.Code', 'long_run_on_time_rate', 'total_flights']])
display(top_volatile[['Airport.Code', 'annual_delay_rate_volatility', 'total_flights']])

plt.figure(figsize=(12, 8))
sns.scatterplot(
    data=airport_filtered,
    x='long_run_delay_rate',
    y='avg_severity',
    size='total_flights',
    hue='annual_delay_rate_volatility',
    palette='viridis',
    alpha=0.8
)
for _, r in airport_filtered.nlargest(8, 'total_flights').iterrows():
    plt.text(r['long_run_delay_rate'], r['avg_severity'], r['Airport.Code'], fontsize=9)
plt.gca().xaxis.set_major_formatter(plt.matplotlib.ticker.PercentFormatter(1.0))
plt.title('Airport Risk Map: Frequency vs Severity')
plt.xlabel('Long-run delay rate')
plt.ylabel('Average severity (minutes per delayed flight)')
plt.tight_layout()
plt.show()

### Airport risk map and rankings: how to use

This section provides two complementary views of airport performance: a risk map (scatter plot) and top/bottom rankings (bar charts).

### Reading the Scatter Plot (Risk Map)

**Position on the chart:**
- **X-axis (left to right)**: Delay frequency (delay rate)
  - Left side = fewer delays = better
  - Right side = more delays = worse
- **Y-axis (bottom to top)**: Delay severity (minutes per delayed flight)
  - Bottom = shorter delays when they happen = better
  - Top = longer delays when they happen = worse

**Four Quadrants:**
- **Bottom-left (ideal)**: Low delay frequency + short delays = best airports
- **Bottom-right**: Delays happen often but get resolved quickly
- **Top-left**: Delays are rare but severe when they occur
- **Top-right (challenging)**: High delay frequency + long delays = most problematic airports

**Visual Elements:**
- **Bubble size**: Traffic volume - larger bubbles handle more flights
  - Large bubbles in bottom-left = excellent performance at scale (impressive!)
  - Large bubbles in top-right = major pain point affecting many passengers
- **Color (darker purple)**: Higher volatility (instability over years)
  - Darker = performance swings dramatically year-to-year
  - Lighter = consistent performance

**Airport Labels:**
The chart labels the busiest airports so you can identify major hubs. Look for:
- **ATL (Atlanta)**: Usually large bubble, often in better positions (efficient mega-hub)
- **ORD (Chicago O'Hare)**: Large bubble, position varies (weather and congestion challenges)
- **DFW (Dallas/Fort Worth)**: Large hub, typically mid-range performance
- **EWR (Newark)**: Often in worse positions (NYC airspace congestion)
- **LGA (LaGuardia)**: Typically challenging (NYC airspace + facility constraints)

### Top 10 vs Bottom 10 Bar Charts

These rankings make it easy to identify **best and worst performers** without interpreting the scatter plot.

**What Separates Top from Bottom Performers:**

**Top 10 Airports (high on-time rates):**
- Common traits:
  - **Year-round favorable weather** (Southwest U.S., West Coast)
  - **Modern, well-designed facilities** (newer airports or recently renovated)
  - **Manageable airspace** (not competing with multiple nearby major airports)
  - **Efficient airline operations** (strong hub management)
- Examples often include: PHX, SLC, SAN, SEA (weather + modern infrastructure)

**Bottom 10 Airports (low on-time rates):**
- Common challenges:
  - **Weather extremes** (Chicago winters, Houston hurricanes, NYC nor'easters)
  - **Airspace congestion** (NYC area with 3 major airports, shared airspace)
  - **Aging infrastructure** (older facilities with capacity constraints)
  - **Hub complexity** (too many connections creating cascading delay risk)
- Examples often include: EWR, ORD, LGA (combination of weather, congestion, age)

### Practical Applications

**For Passengers:**
- When choosing connecting airports, prefer top-10 airports if possible
- If flying through bottom-10 airports, build extra buffer time for connections
- Weather-sensitive destinations (Chicago, NYC) warrant flight insurance

**For Airlines:**
- Bottom-10 airports need operational resilience investments (spare aircraft, backup crews)
- Top-10 airports show what's achievable - study their best practices
- High-volatility airports need flexible scheduling (don't pack flights too tightly)

**For Policymakers:**
- Bottom-10 airports may need infrastructure investment (runways, terminals, gates)
- NYC area congestion suggests need for regional approach (shared resources, coordinated ATC)
- Weather-challenged airports might benefit from better de-icing, snow removal, drainage

### Key Insight: National Averages Are Misleading

The original R analysis reported ~78% on-time overall. But this airport-level view shows:
- **Best airports**: 82-85% on-time (5-7 points above average)
- **Worst airports**: 70-73% on-time (5-8 points below average)

That's a **10-15 percentage point gap** between best and worst! Your experience depends heavily on **which airport** you fly through, not just national trends.

In [None]:
# Top and bottom airport ranking visualization
rank_base = airport_filtered.sort_values('long_run_on_time_rate', ascending=False)
top10 = rank_base.head(10).copy()
bottom10 = rank_base.tail(10).copy().sort_values('long_run_on_time_rate', ascending=True)

fig, axes = plt.subplots(1, 2, figsize=(16, 6), sharex=True)
sns.barplot(data=top10, y='Airport.Code', x='long_run_on_time_rate', ax=axes[0], color='#4C72B0')
axes[0].set_title('Top 10 Airports by Long-run On-time Rate')
axes[0].xaxis.set_major_formatter(plt.matplotlib.ticker.PercentFormatter(1.0))
axes[0].set_xlabel('On-time rate')
axes[0].set_ylabel('Airport')

sns.barplot(data=bottom10, y='Airport.Code', x='long_run_on_time_rate', ax=axes[1], color='#C44E52')
axes[1].set_title('Bottom 10 Airports by Long-run On-time Rate')
axes[1].xaxis.set_major_formatter(plt.matplotlib.ticker.PercentFormatter(1.0))
axes[1].set_xlabel('On-time rate')
axes[1].set_ylabel('')

plt.tight_layout()
plt.show()


### Airport risk map and rankings: how to use
- Scatter plot: left = fewer delays, lower = shorter delays.
- Bubble size: traffic volume.
- Color: instability over years.

The best airports are generally in the **bottom-left** area with large bubble size (high volume + strong performance).
The extra bar chart gives a clean top/bottom ranking for non-technical readers.


### Modeling setup in simple terms

This section builds a prediction dataset at the **airport-month level** - one row for each airport for each month.

**Purpose:** Can we predict how many delays will happen (and how long they'll be) based on patterns we can observe?

### What Features (Predictors) We're Using

**Time patterns:**
- `Time.Year`: Which year (trends over time)
- `month_sin` & `month_cos`: Mathematical encoding of seasonal patterns (captures that December is similar to January, not to July)

**Airport characteristics:**
- `Airport.Code`: Which airport (some airports are inherently more delay-prone)
- `carriers_total`: How many airlines serve this airport (complexity indicator)
- `flights_total`: How busy is the airport this month (volume affects congestion)

**Historical cause patterns:**
- `delay_carrier_share`, `delay_late_aircraft_share`, `delay_nas_share`, `delay_security_share`, `delay_weather_share`
- These show what proportion of past delays came from each cause
- Idea: If an airport historically has lots of weather delays, that might predict future delays

**Recent performance:**
- `lag1_delay_rate`: Last month's delay rate at this same airport
- Captures momentum/trends (if delays were bad last month, maybe they'll continue)

### What We're Trying to Predict (Targets)

1. **delay_rate**: What percentage of flights will be delayed next month?
2. **severity**: When delays happen, how long will they be on average?

### Why Build Models?

This is NOT meant to be a production forecasting system. Instead, the models answer:

**Question 1:** *"How predictable are delays based on observable factors?"*
- If models are accurate: Delays follow predictable patterns (we can prepare)
- If models are inaccurate: Delays are driven by unpredictable events (harder to manage)

**Question 2:** *"What factors matter most?"*
- This is a baseline - we can see if adding weather data, holidays, etc. improves predictions
- If additional features don't help, they're not actually driving delays

**Question 3:** *"Are delay patterns simple or complex?"*
- If simple models (ElasticNet) work well: Relationships are mostly linear/additive
- If complex models (HistGradientBoosting) work much better: There are interactions (e.g., December + Chicago = extra bad)

### Train/Test Split Strategy

**Important:** We split by time, not randomly:
- **Training**: 2003-2013 (first 11 years)
- **Testing**: 2014-2016 (last 3 years)

Why? This mimics real forecasting - you learn from the past to predict the future. Random splits would "cheat" by training on future data.

### This Is a Baseline

Think of this as establishing a benchmark. Future improvements might add:
- Actual weather data (temperature, precipitation, wind)
- Holiday indicators (Thanksgiving, Christmas)
- Airline-specific features (fleet age, financial health)
- Economic indicators (recession years affect travel demand)

In [None]:
model_df = (
    df.groupby(['Airport.Code', 'Time.Year', 'Time.Month'], as_index=False)
      .agg(
          carriers_total=('Statistics.Carriers.Total', 'mean'),
          flights_total=('Statistics.Flights.Total', 'sum'),
          flights_delayed=('Statistics.Flights.Delayed', 'sum'),
          delay_minutes_total=('Statistics.Minutes Delayed.Total', 'sum'),
          delay_carrier=('Statistics.# of Delays.Carrier', 'sum'),
          delay_late_aircraft=('Statistics.# of Delays.Late Aircraft', 'sum'),
          delay_nas=('Statistics.# of Delays.National Aviation System', 'sum'),
          delay_security=('Statistics.# of Delays.Security', 'sum'),
          delay_weather=('Statistics.# of Delays.Weather', 'sum')
      )
)
model_df['delay_rate'] = model_df['flights_delayed'] / (model_df['flights_total'] + eps)
model_df['severity'] = model_df['delay_minutes_total'] / (model_df['flights_delayed'] + eps)

reason_cols = ['delay_carrier', 'delay_late_aircraft', 'delay_nas', 'delay_security', 'delay_weather']
reason_sum = model_df[reason_cols].sum(axis=1) + eps
for c in reason_cols:
    model_df[f'{c}_share'] = model_df[c] / reason_sum

# Cyclical month encoding
model_df['month_sin'] = np.sin(2 * np.pi * model_df['Time.Month'] / 12.0)
model_df['month_cos'] = np.cos(2 * np.pi * model_df['Time.Month'] / 12.0)

# Lag feature: previous month delay rate per airport
model_df = model_df.sort_values(['Airport.Code', 'Time.Year', 'Time.Month']).reset_index(drop=True)
model_df['lag1_delay_rate'] = model_df.groupby('Airport.Code')['delay_rate'].shift(1)

display(model_df.head())

### Reading model results

The table shows prediction accuracy for two models (ElasticNet and HistGradientBoosting) on two targets (delay_rate and severity).

### Understanding the Metrics

**MAE (Mean Absolute Error)** - easier to interpret:
- For `delay_rate`: If MAE = 0.02, the model's predictions are typically off by 2 percentage points
  - Example: Actual = 20% delayed, Predicted = 18% or 22%
  - **Lower is better**: MAE < 0.03 is pretty good; MAE > 0.05 means predictions are rough
  
- For `severity`: If MAE = 5.0, predictions are typically off by 5 minutes
  - Example: Actual = 55 minutes, Predicted = 50 or 60 minutes
  - **Lower is better**: MAE < 10 minutes is reasonable

**RMSE (Root Mean Squared Error)** - penalizes big mistakes:
- Similar to MAE but punishes large errors more heavily
- Always larger than MAE
- If RMSE is much bigger than MAE, the model has some really bad predictions (outliers)
- If RMSE ‚âà MAE, errors are consistent (no huge outliers)

### What Results Tell Us

**Comparing Models (same target):**
1. **If ElasticNet ‚âà HistGradientBoosting**: 
   - Delay patterns are mostly **linear and additive**
   - Simple rules work well (e.g., "December is always +5% delay rate")
   - Complex interactions aren't important
   
2. **If HistGradientBoosting >> ElasticNet** (much better):
   - There are **complex interactions** in the data
   - Example: "December + Newark + bad weather = extra severe"
   - Non-linear patterns matter

**Interpreting Accuracy Level (for delay_rate):**

- **MAE < 0.025 (2.5%)**: Very predictable - delays follow clear patterns
- **MAE 0.025-0.04 (2.5-4%)**: Moderately predictable - useful but not perfect
- **MAE 0.04-0.06 (4-6%)**: Somewhat predictable - rough guidance only
- **MAE > 0.06 (6%+)**: Unpredictable - dominated by random events

**Interpreting Accuracy Level (for severity):**

- **MAE < 5 minutes**: Excellent - can predict delay length well
- **MAE 5-10 minutes**: Good - useful for planning
- **MAE 10-15 minutes**: Moderate - rough ballpark
- **MAE > 15 minutes**: Poor - severity is unpredictable

### What This Tells Us About Delays

**If models perform well** (low MAE):
- Delays are **systematic**, not random
- Historical patterns, airport identity, and seasonality explain a lot
- Airlines/airports can anticipate problems and prepare
- Suggests operational factors (scheduling, capacity) drive delays more than unpredictable events

**If models perform poorly** (high MAE):
- Delays are **event-driven** and unpredictable
- Missing critical features (specific weather events, mechanical failures, etc.)
- Need real-time data, not just historical patterns
- Harder to prevent, must focus on rapid response

### Baseline for Future Work

These models establish a **performance benchmark**. Future improvements might include:

**Feature additions that might help:**
- ‚úàÔ∏è Actual weather data (temperature, precipitation, wind speed)
- üìÖ Holiday indicators (Thanksgiving week, Christmas, Spring Break)
- üõ´ Airline-specific data (fleet age, financial health, labor disputes)
- üìä Economic indicators (recession periods affect travel patterns)

**How to evaluate improvements:**
- Run new model with additional features
- Compare MAE/RMSE to baseline
- If MAE drops significantly (e.g., 0.04 ‚Üí 0.028), the new features are valuable
- If MAE barely changes, those features don't actually drive delays

**Example interpretation:**
*"Our baseline model achieves MAE = 0.035 for delay_rate. After adding weather data, MAE dropped to 0.025. This tells us weather is a major driver (~30% improvement) and worth monitoring closely."*

In [None]:
feature_cols_num = [
    'Time.Year', 'carriers_total', 'flights_total', 'month_sin', 'month_cos',
    'lag1_delay_rate',
    'delay_carrier_share', 'delay_late_aircraft_share', 'delay_nas_share',
    'delay_security_share', 'delay_weather_share'
]
feature_cols_cat = ['Airport.Code']

train = model_df[model_df['Time.Year'] <= 2013].copy()
test = model_df[model_df['Time.Year'] >= 2014].copy()

def build_preprocessor():
    num_pipe = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler())
    ])
    cat_pipe = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='most_frequent')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ])
    return ColumnTransformer(
        transformers=[
            ('num', num_pipe, feature_cols_num),
            ('cat', cat_pipe, feature_cols_cat)
        ]
    )

def evaluate_regression(target_col):
    X_train = train[feature_cols_num + feature_cols_cat]
    y_train = train[target_col]
    X_test = test[feature_cols_num + feature_cols_cat]
    y_test = test[target_col]

    enet = Pipeline(steps=[
        ('prep', build_preprocessor()),
        ('model', ElasticNet(alpha=0.01, l1_ratio=0.2, random_state=42, max_iter=5000))
    ])
    hgb = Pipeline(steps=[
        ('prep', build_preprocessor()),
        ('model', HistGradientBoostingRegressor(random_state=42))
    ])

    metrics = []
    for name, pipe in [('ElasticNet', enet), ('HistGradientBoosting', hgb)]:
        pipe.fit(X_train, y_train)
        pred = pipe.predict(X_test)
        mae = mean_absolute_error(y_test, pred)
        rmse = np.sqrt(mean_squared_error(y_test, pred))
        metrics.append({'target': target_col, 'model': name, 'MAE': mae, 'RMSE': rmse})

    return pd.DataFrame(metrics)

results_delay = evaluate_regression('delay_rate')
results_severity = evaluate_regression('severity')
display(pd.concat([results_delay, results_severity], ignore_index=True))

## Plain-Language Conclusions: What This Data Really Tells Us

This section translates technical findings into actionable insights for different audiences.

---

## For Passengers: Travel Smarter

### 1. **Timing Matters More Than You Think**

**Seasonality is real and predictable:**
- Flying in **September** gives you ~10-15% better odds of on-time departure than December
- **Avoid December holidays** if you have flexibility - it's consistently the worst month across all 14 years
- **Summer months (June-July)** carry elevated risk due to thunderstorms and peak travel
- **October** is another good choice - moderate demand, favorable weather

**The industry HAS improved:**
- Flying in **2012-2016 was notably more reliable** than 2003-2009
- Delay rates dropped ~10 percentage points from 2007 peak to 2013
- Both frequency AND severity improved (fewer delays, and shorter when they happen)

### 2. **Not All Airports Are Equal**

**Performance varies dramatically:**
- Best airports achieve **82-85% on-time rates**
- Worst airports struggle at **70-73% on-time rates**
- That's a **10-15 percentage point gap** - your airport choice matters as much as the month you fly

**What makes an airport better:**
- ‚úÖ Year-round favorable weather (Southwest U.S., West Coast)
- ‚úÖ Modern infrastructure (newer terminals, sufficient runway capacity)
- ‚úÖ Manageable airspace (not competing with multiple nearby major airports)
- ‚ùå Weather extremes (Chicago winters, Houston hurricanes)
- ‚ùå Airspace congestion (NYC area, crowded regions)
- ‚ùå Aging infrastructure (capacity constraints)

**Practical advice:**
- When choosing connecting flights, prefer top-performing airports if routes allow
- If connecting through challenging airports (NYC area, Chicago), build extra buffer time
- Consider airport quality when planning trips, not just ticket price

### 3. **If Your Flight Is Delayed, the Cause Matters**

**Understanding what to expect:**

- **"Late arriving aircraft"**: Expect a longer wait (cascading delays)
  - Previous flight's delay ripples through the schedule
  - Airline may eventually substitute a different plane, but takes time
  
- **"Air traffic control" (NAS)**: Could be long, airline can't speed it up
  - ATC holds can last hours during peak congestion
  - Weather at other airports can cause holds even if your airport is clear
  
- **"Weather"**: Highly unpredictable
  - Could clear in 30 minutes or ground flights for hours
  - Monitor weather radar at both departure and arrival airports
  
- **"Carrier/maintenance"**: Variable recovery time
  - Sometimes quick fixes (crew swap, gate change, minor mechanical)
  - Sometimes requires plane substitution (hours)
  - Airline has most control over these delays
  
- **"Security"**: Rare but can be severe
  - Terminal evacuations require complete re-screening
  - If it happens, expect significant disruption

### 4. **Plan for the Worst-Case Scenario**

- **Important connections**: Book 2+ hour layovers in December or at challenging airports
- **Critical trips**: Consider travel insurance or backup flights
- **Check historical performance**: Before booking, research your specific airport and season

---

## For Airlines and Airports: Operational Insights

### 1. **Late Aircraft Is the Highest-Impact Problem**

**The cascading effect is expensive:**
- **#2 by frequency but #1 by total delay time**
- One late plane affects multiple subsequent flights throughout the day
- Creates longest individual passenger delays

**Solutions that work:**
- **Schedule padding**: Add buffer time between flights (reduces cascading)
- **Spare aircraft/crews**: Faster recovery when substitution is possible
- **Better turnaround processes**: Faster boarding, deplaning, servicing
- **Proactive rebooking**: Move passengers before cascading delays worsen

**ROI is high:** Reducing late aircraft delays has multiplier effects on system performance

### 2. **NAS Delays Are Systemic, Not Airline-Specific**

**The infrastructure problem:**
- Accounts for **~40% of all delays**
- Air traffic control capacity constraints
- Airport congestion (not enough gates, runways, or airspace)
- **Individual airlines have limited control**

**This requires policy-level solutions:**
- ATC modernization (NextGen implementation)
- Runway and terminal expansion at congested airports
- Regional airspace optimization (especially NYC area)
- Investment in weather forecasting and routing technology

**What airlines CAN do:**
- Optimize schedules around known ATC bottlenecks
- Collaborate with other airlines on slot coordination
- Lobby for infrastructure investment
- Improve operational efficiency to minimize contribution to congestion

### 3. **Performance Varies Wildly by Airport**

**Some airports achieve excellence at scale:**
- High-volume airports CAN perform well (Atlanta proves this)
- Best performers share traits: good weather, modern infrastructure, efficient operations

**Struggling airports need targeted investment:**
- **High volatility** = need resilience planning (weather preparedness, overflow capacity)
- **High frequency, low severity** = process improvements (faster turnarounds)
- **Low frequency, high severity** = contingency planning (backup resources)

**Benchmark against the best:**
- If similar airports achieve better results, operational improvements are possible
- Study best practices from top performers (scheduling, ground operations, crew management)

### 4. **Carrier Delays Are the Most Controllable**

**Good news: ~20-25% of delays are within airline control:**
- Maintenance scheduling
- Crew management
- Ground operations (fueling, catering, cleaning)
- Fleet modernization

**These are improving over time** (Carrier delay share has decreased in recent years)
- Suggests investments in fleet modernization and operations are working
- Continue focus on operational efficiency

---

## For Policymakers: System-Level Insights

### 1. **The System Improved Significantly Post-2010**

**Evidence of successful intervention:**
- Delay rates dropped **~10 percentage points** from 2007 peak to 2013
- Improvement sustained through 2016 (not a temporary blip)
- Both frequency AND severity improved

**What likely drove improvement:**
- Reduced schedules after 2008 recession (less congestion)
- Fleet modernization (newer, more reliable aircraft)
- Operational improvements (better scheduling, crew management)
- Some ATC improvements (though infrastructure remains a constraint)

**Key takeaway:** **Interventions CAN work** - the system is improvable, not broken

### 2. **NAS Delays Represent Systemic Underinvestment**

**40% of delays are infrastructure-related:**
- Air traffic control capacity
- Airport runway/terminal/gate capacity
- Airspace design and management

**This is the one cause that hasn't improved much over time:**
- Carrier delays have decreased (airlines improved operations)
- NAS delays remain stubbornly high (infrastructure hasn't kept pace)

**Points to need for:**
- **NextGen ATC implementation** (modernize 1960s-era technology)
- **Airport infrastructure investment** (runways, terminals, especially at congested hubs)
- **Regional airspace optimization** (NYC area, Southern California, crowded regions)

**ROI calculation:** 40% of delay problem could be addressed with infrastructure investment

### 3. **Weather Delays Are Small But Severe**

**Only ~5-8% of delays, but significant impact:**
- Unpredictable and severe (can't fly through thunderstorms)
- Creates multi-hour disruptions
- Causes diversions (even longer passenger delays)

**Climate change implications:**
- Severe weather events may increase in frequency
- Worth monitoring trend over time (future analysis should extend beyond 2016)
- May require increased resilience investment (de-icing, drainage, alternative routing)

### 4. **Airport Quality Matters More Than Aggregate Statistics Suggest**

**National averages hide massive variation:**
- Best airports: 82-85% on-time
- Worst airports: 70-73% on-time
- 10-15 percentage point gap

**Policy implications:**
- **Targeted investment** in struggling airports could have outsized impact
- **Best practice sharing** from top performers to struggling airports
- **Regional approaches** for multi-airport regions (NYC, Los Angeles)

**Economic development angle:**
- Airport reliability affects regional economic competitiveness
- Businesses factor airport quality into location decisions
- Tourism and conventions avoid unreliable airports

---

## Biggest Surprise in the Data

**Delay SEVERITY has improved even more than delay FREQUENCY:**

Not only are **fewer flights delayed now** (vs. 2007), but when delays happen, they're **resolved faster**. This suggests airlines got better at both:
1. **Prevention** (operational improvements reduced delay frequency)
2. **Recovery** (better processes shortened delays when they occur)

This double improvement is **more significant than it sounds** - it represents systematic operational changes, not just luck or favorable conditions.

---

## What This Analysis Cannot Tell Us (Yet)

### Limitations to acknowledge:

1. **Why 2007 was so bad**: 
   - We see it in the data but would need external research to explain
   - Possibilities: fuel price spikes, labor disputes, ATC staffing shortages, specific weather events
   - Requires integrating economic and operational data

2. **Passenger-specific experience**: 
   - A 3% delay rate improvement sounds small statistically
   - But represents **millions of passengers arriving on-time** who wouldn't have in 2007
   - We don't quantify the human impact or economic value

3. **Airline-specific performance**: 
   - Data is aggregated by airport, not by carrier
   - Can't compare Delta vs. United directly
   - Would require carrier-level analysis

4. **Economic costs**: 
   - We measure delays in minutes, not dollars
   - Lost productivity, missed connections, hotel costs not quantified
   - Full economic impact assessment requires cost modeling

5. **Root causes vs. symptoms**:
   - We identify when and where delays happen
   - We classify immediate causes (weather, maintenance, etc.)
   - But deeper structural causes (scheduling pressure, cost-cutting, etc.) aren't visible

6. **Post-2016 trends**:
   - Analysis ends in 2016
   - Can't speak to recent performance (COVID impact, recovery, current state)
   - Future analysis should extend timeline

---

## Recommended Next Steps

### For immediate improvements to this analysis:

1. **Add external weather data**: Temperature, precipitation, wind speed at each airport
2. **Include holiday indicators**: Thanksgiving week, Christmas, Spring Break amplify delays
3. **Extend timeline**: Update with 2017-2025 data to see post-COVID patterns
4. **Add airline-level analysis**: Compare carrier performance, not just airports
5. **Economic impact modeling**: Convert delay minutes to dollar costs

### For broader research questions:

1. **Why did 2007-2008 perform so poorly?** Investigate fuel prices, labor issues, ATC staffing
2. **What specific interventions worked post-2010?** Interview airlines about operational changes
3. **International comparison**: How does U.S. performance compare to Europe, Asia?
4. **Climate change impact**: Are weather delays increasing in recent years (2016+)?
5. **Passenger experience**: Survey passengers on delay tolerance, preferences, willingness to pay for reliability

### Reading model results
The table reports MAE and RMSE for each model and target.
- Lower values are better.
- Compare models within the same target.

What conclusions to draw:
- If both models are close, the relationship may be mostly linear/simple.
- If boosting is much better, delay behavior may include non-linear interactions.

Use this as a starting benchmark before trying richer features (weather, operations, holidays).


## 8) Suggested next improvements

- Add confidence intervals on airport rankings using bootstrap.
- Add external weather signals for better causal interpretation.
- Build an interactive dashboard (Plotly/Altair) with airport and time filters.
- Export clean tables for downstream BI tools.

## 9) Plain-language conclusions from this notebook
1. **Use weighted rates for fair comparisons.** High-traffic months/airports should influence overall performance more than low-traffic ones.
2. **Delay frequency and delay severity are different problems.** Track both to avoid incomplete conclusions.
3. **Delay causes are not equally important.** NAS and Late Aircraft are usually the largest contributors in this dataset.
4. **Seasonality matters.** Month-year heatmaps reveal recurring high-risk periods.
5. **Airport performance is heterogeneous.** Ranking + risk-map views make this visible to stakeholders quickly.
6. **Simple models are useful baselines.** They quantify how predictable delay patterns are and where feature engineering is needed next.

## 10) Recommended next steps for your project
- Add external weather and holiday features.
- Create an executive summary dashboard (Plotly/Altair) with airport and year filters.
- Add uncertainty ranges (confidence intervals) for airport rankings.
- Save cleaned/aggregated tables to `data/processed/` for reproducible downstream work.


## 11) If merge conflicts happen again (quick guidance)
- For notebook conflicts, choose **Both changes** whenever possible, then open the notebook and remove duplicates manually.
- Choosing only **Current** or only **Incoming** can drop entire markdown/code cell blocks.
- After resolving, quickly verify with a cell count and a scan of cell types before commit.
