# Airline Delays in the United States (2003-2016)

This analysis explores more than 13 years of flight delay data (June 2003 through December 2016) from major U.S. airports. The dataset tracks monthly statistics: how many flights were on time, delayed, cancelled, or diverted, along with what caused those delays.

## Why This Matters

Flight delays cost the U.S. economy billions annually. They affect millions of passengers through missed connections, lost productivity, and travel stress. Understanding delay patterns helps airlines improve operations, airports allocate resources better, and passengers make smarter travel decisions.

## Evolution from the 2021 R Analysis

The original R analysis provided the foundation by identifying key patterns: which months and years had the most delays, what caused them, and how long they lasted. This Python analysis builds on that work with:

- **Weighted metrics** that give busy airports appropriate influence
- **Frequency vs. severity separation** to distinguish how often delays happen from how long they last
- **Airport-level profiles** showing which airports perform best and worst
- **Composition tracking** to see if delay causes are shifting over time
- **Predictive modeling** to understand what drives delays

## The Five Types of Delays

Every delay gets classified into one of five causes:

**Carrier**: Problems with the airline itself—maintenance, crew issues, baggage loading, fueling, cleaning

**Late Aircraft**: When an incoming plane arrives late, the next flight using that same plane departs late (cascading delays)

**National Aviation System (NAS)**: Air traffic control holds, airport congestion, high traffic volume

**Security**: Security breaches, long TSA lines, terminal evacuations

**Weather**: Actual meteorological conditions that prevent safe flight operations

## Dataset Scope

- **Period**: June 2003 through December 2016
- **Coverage**: 30-50+ major U.S. airports per year
- **Structure**: One record per airport per month
- **Metrics**: Flight counts, delay causes, total delay minutes

## Data Dictionary: Understanding the Variables

### Time Variables
- `Time.Year`: Year (2003-2016)
- `Time.Month`: Month number (1-12)
- `Time.Month Name`: Month name (January-December)

### Airport Variables
- `Airport.Code`: 3-letter airport code (e.g., ATL, LAX, ORD)
- `Airport.Name`: Full airport name and location

### Flight Count Variables
- `Statistics.Flights.Total`: Total flights at this airport this month
- `Statistics.Flights.On Time`: Flights that departed/arrived on schedule
- `Statistics.Flights.Delayed`: Flights delayed for any reason
- `Statistics.Flights.Cancelled`: Flights cancelled before departure
- `Statistics.Flights.Diverted`: Flights diverted to alternate airports

*Note: On Time + Delayed + Cancelled + Diverted = Total (accounting identity)*

### Delay Cause Count Variables (# of delays)
- `Statistics.# of Delays.Carrier`: Delays caused by airline (maintenance, crew, fueling, etc.)
- `Statistics.# of Delays.Late Aircraft`: Previous flight late, causing this flight to be late
- `Statistics.# of Delays.National Aviation System`: Air traffic control, heavy volume, airport operations
- `Statistics.# of Delays.Security`: Security breaches, TSA issues, evacuations
- `Statistics.# of Delays.Weather`: Actual weather conditions (not forecast)

### Delay Duration Variables (minutes)
- `Statistics.Minutes Delayed.Carrier`: Total minutes delayed due to carrier issues
- `Statistics.Minutes Delayed.Late Aircraft`: Total minutes delayed due to late aircraft
- `Statistics.Minutes Delayed.National Aviation System`: Total minutes delayed due to NAS
- `Statistics.Minutes Delayed.Security`: Total minutes delayed due to security
- `Statistics.Minutes Delayed.Weather`: Total minutes delayed due to weather
- `Statistics.Minutes Delayed.Total`: Sum of all delay minutes

### Carrier Variables
- `Statistics.Carriers.Total`: Number of different airlines serving this airport
- `Statistics.Carriers.Names`: Comma-separated list of airline names

### Important Notes
- Each row represents ONE AIRPORT for ONE MONTH
- Minutes are TOTAL for all delays of that type (not per-delay average)
- Delay causes don't sum to total delayed flights (one flight can have multiple delay causes)
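
To make the last two notes concrete, here is a minimal sketch on a made-up two-row frame. The values and airport codes are invented for illustration; only the column names come from the dictionary above. It converts a total-minutes column into a per-delay average and measures how far the cause counts are from the delayed-flight count.

```python
import pandas as pd

# Hypothetical rows: values are invented, column names follow the data dictionary.
toy = pd.DataFrame({
    'Airport.Code': ['AAA', 'BBB'],
    'Statistics.Flights.Delayed': [100, 40],
    'Statistics.# of Delays.Carrier': [30, 10],
    'Statistics.# of Delays.Late Aircraft': [35, 15],
    'Statistics.# of Delays.National Aviation System': [30, 12],
    'Statistics.# of Delays.Security': [1, 0],
    'Statistics.# of Delays.Weather': [6, 3],
    'Statistics.Minutes Delayed.Carrier': [1500, 400],
})

# Minutes columns are totals, so divide by the matching count for a per-delay average.
toy['carrier_minutes_per_delay'] = (
    toy['Statistics.Minutes Delayed.Carrier'] / toy['Statistics.# of Delays.Carrier']
)

# Gap between the sum of the five cause counts and the delayed-flight count.
cause_cols = [c for c in toy.columns if c.startswith('Statistics.# of Delays.')]
toy['cause_gap'] = toy[cause_cols].sum(axis=1) - toy['Statistics.Flights.Delayed']

print(toy[['Airport.Code', 'carrier_minutes_per_delay', 'cause_gap']])
```

Running the same `cause_gap` calculation on the real data would show how large the mismatch actually is in this dataset.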

## Setup

In [None]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.linear_model import ElasticNet
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

sns.set_theme(style='whitegrid', context='talk')
pd.set_option('display.max_columns', 100)

DATA_PATH = 'airlines.csv'
df = pd.read_csv(DATA_PATH)
print('Shape:', df.shape)
display(df.head(3))

In [None]:
# Normalize month label typo for cleaner charts
df['Time.Month Name'] = df['Time.Month Name'].replace({'Febuary': 'February'})

# Quick quality checks
assert df.isna().sum().sum() == 0, 'Unexpected missing values found.'

flight_balance = (
    df['Statistics.Flights.Cancelled']
    + df['Statistics.Flights.Delayed']
    + df['Statistics.Flights.Diverted']
    + df['Statistics.Flights.On Time']
)
assert (flight_balance == df['Statistics.Flights.Total']).all(), 'Flight totals are not balanced.'

print('Year range:', df['Time.Year'].min(), '-', df['Time.Year'].max())
print('Airports:', df['Airport.Code'].nunique())

### Data Quality Checks
The cell above checks that the data is reliable:
- It fixes a month name typo (`Febuary` -> `February`) so month charts display correctly.
- It confirms there are no missing values.
- It verifies a key accounting rule: **On-time + Delayed + Diverted + Cancelled = Total flights**.

If all checks pass, we can trust later comparisons.
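
A failed assertion only says that something is wrong, not where. As an optional companion, the sketch below returns the offending rows instead of stopping. The helper name `unbalanced_rows` and the two-row frame are hypothetical and exist only for illustration.

```python
import pandas as pd

def unbalanced_rows(frame):
    """Return rows where the four outcome counts do not add up to the total."""
    parts = [
        'Statistics.Flights.On Time',
        'Statistics.Flights.Delayed',
        'Statistics.Flights.Cancelled',
        'Statistics.Flights.Diverted',
    ]
    mismatch = frame[parts].sum(axis=1) != frame['Statistics.Flights.Total']
    return frame.loc[mismatch]

# Invented data: the second row is deliberately off by 3 flights.
toy = pd.DataFrame({
    'Airport.Code': ['AAA', 'BBB'],
    'Statistics.Flights.On Time': [80, 70],
    'Statistics.Flights.Delayed': [15, 20],
    'Statistics.Flights.Cancelled': [4, 5],
    'Statistics.Flights.Diverted': [1, 2],
    'Statistics.Flights.Total': [100, 100],
})

bad = unbalanced_rows(toy)
print(bad['Airport.Code'].tolist())
```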


### Performance Trends Over Time

The chart shows on-time and delay rates for each year, weighted by flight volume. This means busy airports like Atlanta influence the numbers more than small regional airports—reflecting the actual passenger experience.

**What we see:**

The **mid-to-late 2000s were rough**. Performance declined steadily from 2003, hitting bottom in 2007-2008 when nearly a quarter of all flights were delayed.

Then **things got better**. Starting in 2009-2010, delay rates dropped significantly. By 2012-2013, performance reached its best levels of the entire period. Delay rates settled around 15-17%, compared to 23-25% during the 2007-2008 peak.

**Why weighted rates matter:**

Without weighting, a tiny airport with 100 flights monthly would count as much as Atlanta with 30,000+ flights. That wouldn't reflect reality. Weighted rates ensure that delays affecting the most passengers carry the most weight in our analysis.
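
A small worked example shows the difference. The two airports and their counts are invented; the weighted figure is computed the same way as the yearly aggregation in this notebook, by summing counts first and dividing afterwards.

```python
import pandas as pd

# Invented numbers: one small airport and one large hub in the same month.
toy = pd.DataFrame({
    'airport': ['small', 'hub'],
    'flights_total': [100, 30_000],
    'flights_delayed': [40, 4_500],
})
toy['delay_rate'] = toy['flights_delayed'] / toy['flights_total']

# Unweighted: each airport counts equally, whatever its size.
unweighted = toy['delay_rate'].mean()

# Weighted: sum the counts first, then divide.
weighted = toy['flights_delayed'].sum() / toy['flights_total'].sum()

print(f'unweighted: {unweighted:.1%}, weighted: {weighted:.1%}')
```

With these made-up numbers the unweighted rate is 27.5% while the weighted rate is about 15.1%, because the hub's 15% delay rate dominates once flights are counted rather than airports.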

**Connection to the original analysis:**

The 2021 R analysis found 77.8% of flights were on-time overall, with 20.2% delayed. Our weighted calculations confirm these numbers while adding year-by-year detail that shows how dramatically performance has improved since the 2007-2008 low point.

In [None]:
def add_rate_columns(frame):
    out = frame.copy()
    out['on_time_rate'] = out['Flights_On_Time'] / out['Flights_Total']
    out['delay_rate'] = out['Flights_Delayed'] / out['Flights_Total']
    out['cancel_rate'] = out['Flights_Cancelled'] / out['Flights_Total']
    out['divert_rate'] = out['Flights_Diverted'] / out['Flights_Total']
    return out

yearly = (
    df.groupby('Time.Year', as_index=False)
      .agg(
          Flights_Total=('Statistics.Flights.Total', 'sum'),
          Flights_On_Time=('Statistics.Flights.On Time', 'sum'),
          Flights_Delayed=('Statistics.Flights.Delayed', 'sum'),
          Flights_Cancelled=('Statistics.Flights.Cancelled', 'sum'),
          Flights_Diverted=('Statistics.Flights.Diverted', 'sum')
      )
)
yearly = add_rate_columns(yearly)
display(yearly.head())

best_year = yearly.loc[yearly['on_time_rate'].idxmax(), ['Time.Year', 'on_time_rate']]
worst_year = yearly.loc[yearly['on_time_rate'].idxmin(), ['Time.Year', 'on_time_rate']]
print('Best on-time year:', int(best_year['Time.Year']), f"({best_year['on_time_rate']:.2%})")
print('Worst on-time year:', int(worst_year['Time.Year']), f"({worst_year['on_time_rate']:.2%})")

### Reading The Trend Lines

The chart tracks four metrics over time. A higher on-time rate is good. Higher delay, cancellation, and diversion rates are bad.

**Key patterns:**

**2003-2007**: Steady decline. The on-time rate drops while delays climb. By 2007, the system is at its worst.

**2007-2008**: Peak crisis. This is rock bottom for the 13-year period.

**2009-2010**: Sharp turnaround. Airlines reduced schedules during the recession, cutting congestion. Operations improved quickly.

**2011-2016**: Sustained improvement. The gains held. This wasn't a lucky year—it represents lasting operational changes.

The on-time rate and the delay rate are near mirror images: when one goes up, the other goes down. They measure opposite sides of the same outcome, and only the much smaller cancellation and diversion rates keep them from being exact complements.

In [None]:
kpi_long = yearly.melt(
    id_vars='Time.Year',
    value_vars=['on_time_rate', 'delay_rate', 'cancel_rate', 'divert_rate'],
    var_name='metric',
    value_name='value'
)

plt.figure(figsize=(13, 6))
sns.lineplot(data=kpi_long, x='Time.Year', y='value', hue='metric', marker='o')
plt.gca().yaxis.set_major_formatter(plt.matplotlib.ticker.PercentFormatter(1.0))
plt.title('Weighted Flight Outcome Rates by Year')
plt.xlabel('Year')
plt.ylabel('Rate')
plt.legend(title='KPI', bbox_to_anchor=(1.02, 1), loc='upper left')
plt.tight_layout()
plt.show()

### Chart Insights
Read the lines together, not separately:
- Higher **on-time rate** is good.
- Higher **delay/cancel/divert rates** are bad.

Look for turning points (for example, broad deterioration then recovery). This helps explain whether delays are structural (long period) or event-driven (short spikes).


### What Causes Delays

The charts compare how often each cause happens versus how much total delay time it creates.

**The breakdown:**

**NAS (National Aviation System)**: 35-40% of all delays. Air traffic control holds, airport congestion, and high traffic volume.

**Late Aircraft**: 30-35% of delays. When the incoming plane is late, your departing flight is late. These delays cascade through the schedule.

**Carrier**: 20-25% of delays. Airline-specific problems like maintenance, crew issues, or fueling delays.

**Weather**: 5-10% of delays. Thunderstorms, ice, fog—conditions that make flying unsafe.

**Security**: Less than 1% of delays. Rare but includes terminal evacuations and security breaches.

**A key difference:**

Late Aircraft delays take longer to resolve than the others. When you see this cause taking up a bigger slice of total delay minutes than delay count, it means these delays last longer. Cascading delays build on themselves—one late plane affects multiple subsequent flights.

Weather follows a similar pattern. You can't fly through a thunderstorm, so you wait. A weather delay might be 30 minutes or 3 hours depending on conditions.

**What this means:**

Frequent problems need better processes. Severe problems need backup plans. Problems that are both frequent and severe (NAS and Late Aircraft) need the most attention.

In [None]:
count_reason_map = {
    'Carrier': 'Statistics.# of Delays.Carrier',
    'Late Aircraft': 'Statistics.# of Delays.Late Aircraft',
    'NAS': 'Statistics.# of Delays.National Aviation System',
    'Security': 'Statistics.# of Delays.Security',
    'Weather': 'Statistics.# of Delays.Weather',
}

minutes_reason_map = {
    'Carrier': 'Statistics.Minutes Delayed.Carrier',
    'Late Aircraft': 'Statistics.Minutes Delayed.Late Aircraft',
    'NAS': 'Statistics.Minutes Delayed.National Aviation System',
    'Security': 'Statistics.Minutes Delayed.Security',
    'Weather': 'Statistics.Minutes Delayed.Weather',
}

reason_totals = pd.DataFrame({
    'reason': list(count_reason_map.keys()),
    'delay_count': [df[col].sum() for col in count_reason_map.values()],
    'delay_minutes': [df[col].sum() for col in minutes_reason_map.values()]
})
reason_totals['share_by_count'] = reason_totals['delay_count'] / reason_totals['delay_count'].sum()
reason_totals['share_by_minutes'] = reason_totals['delay_minutes'] / reason_totals['delay_minutes'].sum()
reason_totals = reason_totals.sort_values('share_by_count', ascending=True)
display(reason_totals)

fig, axes = plt.subplots(1, 2, figsize=(15, 6), sharey=True)
sns.barplot(data=reason_totals, x='share_by_count', y='reason', ax=axes[0], color='#4C72B0')
axes[0].set_title('Delay Cause Share (Count)')
axes[0].xaxis.set_major_formatter(plt.matplotlib.ticker.PercentFormatter(1.0))
axes[0].set_xlabel('Share')
axes[0].set_ylabel('Reason')

sns.barplot(data=reason_totals, x='share_by_minutes', y='reason', ax=axes[1], color='#55A868')
axes[1].set_title('Delay Cause Share (Minutes)')
axes[1].xaxis.set_major_formatter(plt.matplotlib.ticker.PercentFormatter(1.0))
axes[1].set_xlabel('Share')
axes[1].set_ylabel('')

plt.tight_layout()
plt.show()

### How Delay Causes Change Over Time

The stacked area chart shows which causes dominate in each year. Each colored band represents one delay type. Wider bands mean that cause accounted for more delays that year.

**What to look for:**

**Stable bands** = persistent problems that don't improve much over time

**Growing bands** = problems getting worse

**Shrinking bands** = problems improving

**Spikes** = one-year surges, usually weather-related

**Patterns in this data:**

**NAS dominates throughout**. It's the widest band and stays relatively constant. This is a structural infrastructure problem, not something airlines can easily fix.

**Late Aircraft is consistently second**. In bad years (like 2007), it grows even larger as delays cascade more severely.

**Carrier delays shrink in later years**. Airlines have modernized fleets and improved operations, reducing maintenance and crew-related delays.

**Weather spikes in specific years** (like 2008). Severe storm seasons show up as temporary bumps.

**Security is barely visible**. It's so rare that it doesn't register as a meaningful band.

**The takeaway:**

Delay causes remain remarkably stable over time. The aviation system faces the same core challenges year after year. This suggests persistent structural issues rather than random bad luck.

In [None]:
year_reason = (
    df.groupby('Time.Year', as_index=False)[list(count_reason_map.values())]
      .sum()
)
year_reason_long = year_reason.melt(
    id_vars='Time.Year',
    var_name='reason_col',
    value_name='delay_count'
)
inv_count = {v: k for k, v in count_reason_map.items()}
year_reason_long['reason'] = year_reason_long['reason_col'].map(inv_count)
year_reason_long['share'] = year_reason_long['delay_count'] / year_reason_long.groupby('Time.Year')['delay_count'].transform('sum')

pivot_share = year_reason_long.pivot(index='Time.Year', columns='reason', values='share').sort_index()
pivot_share.plot.area(figsize=(13, 6), colormap='tab20')
plt.title('Delay Cause Composition Drift Over Time (by Count Share)')
plt.ylabel('Share of delayed flights')
plt.gca().yaxis.set_major_formatter(plt.matplotlib.ticker.PercentFormatter(1.0))
plt.xlabel('Year')
plt.legend(title='Reason', bbox_to_anchor=(1.02, 1), loc='upper left')
plt.tight_layout()
plt.show()

**How to read composition drift over time**

The stacked area chart shows cause mix by year.
- Wider band = larger share of delayed flights for that reason.
- Changes in band width over time reveal whether delay drivers are shifting.

This helps answer: *Are delays caused by the same factors every year, or changing factors?*


### When Delays Happen: The Seasonal Pattern

The heatmap shows delay rates for every month of every year. Darker colors mean more delays. Lighter colors mean fewer delays.

**The worst month: December**

December is consistently dark across all 14 years. It's the most delayed month, every single year. Why? Holiday travel surges, winter weather, and packed airports create a perfect storm.

**The best month: September**

September is consistently light. After Labor Day, travel demand drops and weather improves. This pattern holds year after year.

**Summer has mixed results**

June and July show moderate delays. Thunderstorm season and peak leisure travel both contribute.

**The 2007 column**

Look at the 2007 vertical column. Nearly every cell is dark. That entire year was bad, regardless of month.

**The 2012-2013 columns**

These years show predominantly lighter cells. Performance improved across the board.

**What this means for travelers:**

Want to minimize delay risk? Fly in September. Avoid December if possible. Summer carries moderate risk.

The consistency of this pattern over 14 years suggests it is structural, not random. December is likely to remain challenging, and September is likely to remain comparatively reliable. Plan accordingly.
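
Claims such as "the most delayed month, every single year" can be checked directly rather than read off the heatmap. The sketch below assumes a table with one row per year and month and a `delay_rate` column, like the `monthly` frame built in the next cell. The values here are invented.

```python
import pandas as pd

# Invented delay rates, shaped like a year-by-month table of delay rates.
toy = pd.DataFrame({
    'Time.Year': [2005, 2005, 2005, 2006, 2006, 2006],
    'Time.Month Name': ['June', 'September', 'December'] * 2,
    'delay_rate': [0.22, 0.15, 0.27, 0.25, 0.16, 0.24],
})

# Row label of the highest and lowest delay rate within each year.
worst_idx = toy.groupby('Time.Year')['delay_rate'].idxmax()
best_idx = toy.groupby('Time.Year')['delay_rate'].idxmin()

worst_month = toy.loc[worst_idx].set_index('Time.Year')['Time.Month Name']
best_month = toy.loc[best_idx].set_index('Time.Year')['Time.Month Name']

# How often each month ranks worst or best across years.
print(worst_month.value_counts())
print(best_month.value_counts())
```

If one month holds every "worst" slot in the real data, the strong wording is justified; if not, the counts show how often the pattern breaks.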

In [None]:
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

monthly = (
    df.groupby(['Time.Year', 'Time.Month Name'], as_index=False)
      .agg(
          Flights_Total=('Statistics.Flights.Total', 'sum'),
          Flights_Delayed=('Statistics.Flights.Delayed', 'sum')
      )
)
monthly['delay_rate'] = monthly['Flights_Delayed'] / monthly['Flights_Total']
monthly['Time.Month Name'] = pd.Categorical(monthly['Time.Month Name'], categories=month_order, ordered=True)

heat = monthly.pivot(index='Time.Month Name', columns='Time.Year', values='delay_rate').loc[month_order]
plt.figure(figsize=(14, 7))
sns.heatmap(heat, cmap='YlOrRd', cbar_kws={'format': plt.matplotlib.ticker.PercentFormatter(1.0)})
plt.title('Delay Rate Seasonality Heatmap (Month x Year)')
plt.xlabel('Year')
plt.ylabel('Month')
plt.tight_layout()
plt.show()

### Seasonality Insights
Each cell shows delay rate for one month/year combination:
- Darker color = higher delay rate.
- Vertical patterns suggest bad/good years.
- Horizontal patterns suggest seasonal risk months.

This view is useful for planning: airlines/airports can prepare for recurring high-risk periods.


### How Long Delays Last

This chart answers a different question than delay frequency. It shows: when a flight IS delayed, how long is the average delay?

A system can have frequent short delays or rare long delays. This metric separates those two scenarios.
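
The two scenarios can be separated with simple arithmetic: delay minutes per scheduled flight equals the delay rate multiplied by the average minutes per delayed flight. The sketch below uses two invented systems with the same overall burden to show how differently that burden can be composed.

```python
# Two invented systems with the same total delay minutes per 1,000 flights.
systems = {
    'frequent_short': {'flights': 1000, 'delayed': 300, 'delay_minutes': 6000},
    'rare_long': {'flights': 1000, 'delayed': 60, 'delay_minutes': 6000},
}

summary = {}
for name, s in systems.items():
    frequency = s['delayed'] / s['flights']        # share of flights delayed
    severity = s['delay_minutes'] / s['delayed']   # minutes per delayed flight
    burden = s['delay_minutes'] / s['flights']     # minutes per scheduled flight
    summary[name] = (frequency, severity, burden)
    print(f'{name}: frequency={frequency:.0%}, '
          f'severity={severity:.0f} min, burden={burden:.1f} min/flight')
```

Both systems carry 6 minutes of delay per scheduled flight, but one delays 30% of flights by 20 minutes each and the other delays 6% of flights by 100 minutes each.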

**The pattern:**

Delays typically last 45-65 minutes on average per delayed flight.

**2007 was bad in two ways.** Not only did more flights get delayed, but those delays lasted longer. Double trouble.

**2012 onward shows improvement.** Delays became both less frequent AND shorter. This is a big deal—it means airlines got better at preventing delays AND at recovering quickly when they happen.

**What makes delays longer:**

- Cascading effects (one late plane affects many flights)
- Infrastructure limits (not enough gates or runways to recover quickly)
- Extreme weather (nothing to do but wait for conditions to improve)
- Tight schedules (no buffer time built in)

**What makes delays shorter:**

- Schedule padding (buffer time between flights)
- Spare aircraft and crews (faster substitutions)
- Better turnaround processes (faster boarding and servicing)
- Modern fleets (newer planes break down less)

The post-2010 improvement in both delay frequency and severity suggests airlines made systematic operational changes rather than simply getting lucky.

In [None]:
eps = 1e-9

severity_year = (
    df.groupby('Time.Year', as_index=False)
      .agg(
          delay_minutes=('Statistics.Minutes Delayed.Total', 'sum'),
          delayed_flights=('Statistics.Flights.Delayed', 'sum')
      )
)
severity_year['avg_delay_minutes_per_delayed_flight'] = severity_year['delay_minutes'] / (severity_year['delayed_flights'] + eps)
display(severity_year)

plt.figure(figsize=(12, 5))
sns.lineplot(data=severity_year, x='Time.Year', y='avg_delay_minutes_per_delayed_flight', marker='o')
plt.title('Average Delay Minutes per Delayed Flight (Yearly)')
plt.xlabel('Year')
plt.ylabel('Minutes')
plt.tight_layout()
plt.show()

### Which Delays Last Longest

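This section ranks delay causes by how much delay time they generate. The headings below quote approximate monthly delay-minute totals for each cause, while the chart that follows ranks the same causes by minutes per delay event.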

**1. Late Aircraft: ~50,000 minutes monthly**

The longest delays. Why? Cascading effects. One late plane affects every subsequent flight using that aircraft. The delay compounds through the day.

Airlines can't easily fix this mid-sequence. They'd need to substitute a different plane, which requires spare aircraft to be available.

**2. NAS: ~45,000 minutes monthly**

Air traffic control holds can last hours. When ATC says wait, you wait. Airlines have zero control.

Airport congestion creates the same problem—multiple flights stuck waiting for gates or runway slots.

**3. Carrier: ~35,000 minutes monthly**

Airline-specific problems. Maintenance takes time. Some issues (crew swaps, gate changes) can be fixed relatively quickly. Others (mechanical problems) require longer repairs.

**4. Weather: Variable**

Unpredictable. Could be 30 minutes or 3 hours. You can't fly through a thunderstorm, so everyone waits. Often causes diversions, which create even longer passenger delays.

**5. Security: Rare but severe**

Terminal evacuations require complete re-screening of all passengers. Security breaches can ground entire terminals. Very low frequency makes the average less meaningful.

**What this means when your flight is delayed:**

If the gate agent says "late arriving aircraft," expect a longer wait. The problem is systemic, not quick-fix.

If they say "air traffic control," the airline can't speed it up. You're waiting on ATC clearance.

If they say "weather," check the radar. You're waiting for conditions to improve.

If they say "maintenance," it's variable. Could be resolved in 20 minutes or require a plane swap.

**The 2021 R analysis found the same rankings.** This confirms Late Aircraft and NAS create the longest delays, just as they did in the original analysis.

In [None]:
reason_severity = pd.DataFrame({'reason': list(count_reason_map.keys())})
reason_severity['delays'] = [df[count_reason_map[r]].sum() for r in reason_severity['reason']]
reason_severity['minutes'] = [df[minutes_reason_map[r]].sum() for r in reason_severity['reason']]
reason_severity['minutes_per_delay'] = reason_severity['minutes'] / (reason_severity['delays'] + eps)
reason_severity = reason_severity.sort_values('minutes_per_delay', ascending=False)
display(reason_severity)

plt.figure(figsize=(10, 5))
sns.barplot(data=reason_severity, x='minutes_per_delay', y='reason', color='#C44E52')
plt.title('Severity by Reason (Minutes per Delay)')
plt.xlabel('Minutes per delayed flight event')
plt.ylabel('Reason')
plt.tight_layout()
plt.show()

### Severity By Reason
This ranks delay causes by minutes-per-delay-event.
- Higher value means that cause tends to create longer disruptions.
- Lower value means delays are typically shorter.

Actionability:
- High-frequency causes need process improvements.
- High-severity causes need resilience/contingency planning.


### Airport Performance Profiles

This table shows long-run performance for each airport across the entire 2003-2016 period.

**The metrics:**

**total_flights**: How many flights the airport handled over the full 2003-2016 period. Major hubs like Atlanta, Chicago, and Dallas handle 3-5+ million flights.

**long_run_on_time_rate** and **long_run_delay_rate**: Average performance across all years. The best airports achieve 82-85% on-time. Struggling airports drop below 75%.

**annual_delay_rate_volatility**: Consistency measure. High volatility means performance swings wildly year-to-year (unpredictable). Low volatility means stable, predictable performance.

**avg_severity**: When delays happen here, how long do they last? Measured in minutes per delayed flight.

**Why this matters:**

The 2021 R analysis gave us national averages. This table reveals something important: **airports vary dramatically**.

Some airports are consistently excellent. Some struggle year after year. Size doesn't determine quality—Atlanta handles massive volume efficiently while some smaller airports struggle.

Geography, weather, infrastructure, and operations create persistent advantages and disadvantages. Your airport choice matters as much as your travel timing.

In [None]:
airport_year = (
    df.groupby(['Airport.Code', 'Airport.Name', 'Time.Year'], as_index=False)
      .agg(
          Flights_Total=('Statistics.Flights.Total', 'sum'),
          Flights_On_Time=('Statistics.Flights.On Time', 'sum'),
          Flights_Delayed=('Statistics.Flights.Delayed', 'sum'),
          delay_minutes=('Statistics.Minutes Delayed.Total', 'sum')
      )
)
airport_year['on_time_rate'] = airport_year['Flights_On_Time'] / airport_year['Flights_Total']
airport_year['delay_rate'] = airport_year['Flights_Delayed'] / airport_year['Flights_Total']
airport_year['severity_minutes_per_delayed'] = airport_year['delay_minutes'] / (airport_year['Flights_Delayed'] + eps)

airport_profile = (
    airport_year.groupby(['Airport.Code', 'Airport.Name'], as_index=False)
               .agg(
                   total_flights=('Flights_Total', 'sum'),
                   long_run_on_time_rate=('on_time_rate', 'mean'),
                   long_run_delay_rate=('delay_rate', 'mean'),
                   annual_delay_rate_volatility=('delay_rate', 'std'),
                   avg_severity=('severity_minutes_per_delayed', 'mean')
               )
)
airport_profile = airport_profile.sort_values('total_flights', ascending=False)
display(airport_profile.head(10))

### Airport Summary
This table builds long-run metrics per airport:
- Reliability (on-time rate)
- Delay frequency (delay rate)
- Stability over years (volatility)
- Typical delay length (severity)

It helps compare airports on **quality** and **consistency**, not just traffic volume.


In [None]:
# Filter to busiest airports for fair comparison
threshold = airport_profile['total_flights'].quantile(0.50)
airport_filtered = airport_profile[airport_profile['total_flights'] >= threshold].copy()

top_reliable = airport_filtered.sort_values('long_run_on_time_rate', ascending=False).head(10)
top_volatile = airport_filtered.sort_values('annual_delay_rate_volatility', ascending=False).head(10)

display(top_reliable[['Airport.Code', 'long_run_on_time_rate', 'total_flights']])
display(top_volatile[['Airport.Code', 'annual_delay_rate_volatility', 'total_flights']])

plt.figure(figsize=(12, 8))
sns.scatterplot(
    data=airport_filtered,
    x='long_run_delay_rate',
    y='avg_severity',
    size='total_flights',
    hue='annual_delay_rate_volatility',
    palette='viridis',
    alpha=0.8
)
for _, r in airport_filtered.nlargest(8, 'total_flights').iterrows():
    plt.text(r['long_run_delay_rate'], r['avg_severity'], r['Airport.Code'], fontsize=9)
plt.gca().xaxis.set_major_formatter(plt.matplotlib.ticker.PercentFormatter(1.0))
plt.title('Airport Risk Map: Frequency vs Severity')
plt.xlabel('Long-run delay rate')
plt.ylabel('Average severity (minutes per delayed flight)')
plt.tight_layout()
plt.show()

### The Airport Risk Map

The scatter plot positions each airport by two factors: how often delays happen (x-axis) and how long they last (y-axis).

**Bottom-left = best**: Low delay frequency, short delays when they happen

**Top-right = worst**: High delay frequency, long delays when they happen

**Bubble size** = traffic volume. Larger bubbles handle more passengers.

**Color** = volatility. With the `viridis` palette, lighter yellow-green means performance swings more from year to year; darker purple means steadier performance.

**What makes top airports excellent:**

- Year-round good weather (Southwest U.S., West Coast)
- Modern infrastructure (newer terminals, sufficient capacity)
- Manageable airspace (not competing with multiple nearby airports)

Examples: Phoenix, Salt Lake City, San Diego, Seattle

**What makes bottom airports struggle:**

- Weather extremes (Chicago winters, Houston hurricanes, NYC nor'easters)
- Airspace congestion (NYC has three major airports sharing crowded airspace)
- Aging infrastructure (capacity constraints, outdated facilities)

Examples: Newark, O'Hare, LaGuardia

**The 10-15 percentage point gap:**

The original R analysis reported 78% on-time overall. But that hides massive variation:

- Best airports: 82-85% on-time
- Worst airports: 70-73% on-time

Your airport matters as much as which month you fly. Choose connections through top-performing airports when routes allow. Build extra buffer time at airports with high delay rates.

Weather-challenged and congested airports need extra planning. That cheap connection through Newark or O'Hare in December? You're taking a risk.

In [None]:
# Top and bottom airport ranking visualization
rank_base = airport_filtered.sort_values('long_run_on_time_rate', ascending=False)
top10 = rank_base.head(10).copy()
bottom10 = rank_base.tail(10).copy().sort_values('long_run_on_time_rate', ascending=True)

fig, axes = plt.subplots(1, 2, figsize=(16, 6), sharex=True)
sns.barplot(data=top10, y='Airport.Code', x='long_run_on_time_rate', ax=axes[0], color='#4C72B0')
axes[0].set_title('Top 10 Airports by Long-run On-time Rate')
axes[0].xaxis.set_major_formatter(plt.matplotlib.ticker.PercentFormatter(1.0))
axes[0].set_xlabel('On-time rate')
axes[0].set_ylabel('Airport')

sns.barplot(data=bottom10, y='Airport.Code', x='long_run_on_time_rate', ax=axes[1], color='#C44E52')
axes[1].set_title('Bottom 10 Airports by Long-run On-time Rate')
axes[1].xaxis.set_major_formatter(plt.matplotlib.ticker.PercentFormatter(1.0))
axes[1].set_xlabel('On-time rate')
axes[1].set_ylabel('')

plt.tight_layout()
plt.show()


**Airport risk map and rankings: How to use**

- Scatter plot: left = fewer delays, lower = shorter delays.
- Bubble size: traffic volume.
- Color: instability over years.

The best airports are generally in the **bottom-left** area with large bubble size (high volume + strong performance).
The extra bar chart gives a clean top/bottom ranking for non-technical readers.


### Can We Predict Delays?

This section builds prediction models to answer: how predictable are delays based on patterns we can measure?

**What we're predicting:**

1. **Delay rate**: What percentage of flights will be delayed next month at each airport?
2. **Severity**: When delays happen, how long will they last?

**What information we're using:**

- Which year and month (trends and seasonality)
- Which airport (some are inherently more delay-prone)
- How busy the airport is (flight volume)
- Number of airlines serving the airport (complexity)
- Historical delay cause patterns (if weather caused many delays before, it might again)
- Last month's delay rate (momentum/trends)

**Why build models?**

The goal is not to produce operational forecasts, but to understand whether delays are systematic and predictable or random and chaotic.

If models work well: Delays follow patterns. Airlines can anticipate and prepare.

If models fail: Delays are driven by unpredictable events we can't see in this data.

**How we test:**

We train on 2003-2013 data, then predict 2014-2016. This mimics real forecasting—learning from the past to predict the future.
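
A time-based split like this can be sketched in a few lines. The frame below is invented, and the names `TRAIN_END_YEAR`, `train`, and `holdout` are illustrative choices rather than names taken from the notebook.

```python
import pandas as pd

# Invented airport-month rows; only the year matters for the split.
toy = pd.DataFrame({
    'Time.Year': [2012, 2013, 2013, 2014, 2015, 2016],
    'Time.Month': [12, 6, 12, 1, 7, 12],
    'delay_rate': [0.18, 0.17, 0.21, 0.19, 0.16, 0.20],
})

TRAIN_END_YEAR = 2013  # last training year, per the split described above

train = toy[toy['Time.Year'] <= TRAIN_END_YEAR]
holdout = toy[toy['Time.Year'] > TRAIN_END_YEAR]

# Every training row precedes every holdout row, so no future data leaks into training.
print(len(train), 'train rows;', len(holdout), 'holdout rows')
```

Splitting by year rather than at random matters here because a random split would let the model see months from 2014-2016 while training.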

**This is a baseline:**

Future improvements could add weather data, holiday indicators, airline-specific factors, or economic conditions. By establishing baseline performance now, we'll know whether those additions actually help.

In [None]:
model_df = (
    df.groupby(['Airport.Code', 'Time.Year', 'Time.Month'], as_index=False)
      .agg(
          carriers_total=('Statistics.Carriers.Total', 'mean'),
          flights_total=('Statistics.Flights.Total', 'sum'),
          flights_delayed=('Statistics.Flights.Delayed', 'sum'),
          delay_minutes_total=('Statistics.Minutes Delayed.Total', 'sum'),
          delay_carrier=('Statistics.# of Delays.Carrier', 'sum'),
          delay_late_aircraft=('Statistics.# of Delays.Late Aircraft', 'sum'),
          delay_nas=('Statistics.# of Delays.National Aviation System', 'sum'),
          delay_security=('Statistics.# of Delays.Security', 'sum'),
          delay_weather=('Statistics.# of Delays.Weather', 'sum')
      )
)
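# Frequency: share of flights delayed. Severity: average minutes per delayed flight.
# eps (set in an earlier cell) guards against division by zero.
model_df['delay_rate'] = model_df['flights_delayed'] / (model_df['flights_total'] + eps)
model_df['severity'] = model_df['delay_minutes_total'] / (model_df['flights_delayed'] + eps)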

reason_cols = ['delay_carrier', 'delay_late_aircraft', 'delay_nas', 'delay_security', 'delay_weather']
reason_sum = model_df[reason_cols].sum(axis=1) + eps
for c in reason_cols:
    model_df[f'{c}_share'] = model_df[c] / reason_sum

# Cyclical month encoding
model_df['month_sin'] = np.sin(2 * np.pi * model_df['Time.Month'] / 12.0)
model_df['month_cos'] = np.cos(2 * np.pi * model_df['Time.Month'] / 12.0)

# Lag feature: previous month delay rate per airport
model_df = model_df.sort_values(['Airport.Code', 'Time.Year', 'Time.Month']).reset_index(drop=True)
model_df['lag1_delay_rate'] = model_df.groupby('Airport.Code')['delay_rate'].shift(1)

display(model_df.head())
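
To see what the cyclical month encoding buys, the sketch below compares distances between months under the raw month number and under the sine/cosine pair used above. It is a standalone illustration; `encode_month` and `cyclical_distance` are helper names introduced here, not part of the notebook.

```python
import numpy as np

def encode_month(month):
    """Map a month number (1-12) to a point on the unit circle."""
    angle = 2 * np.pi * month / 12.0
    return np.array([np.sin(angle), np.cos(angle)])

def cyclical_distance(m1, m2):
    """Euclidean distance between two months in (sin, cos) space."""
    return float(np.linalg.norm(encode_month(m1) - encode_month(m2)))

dec_jan = cyclical_distance(12, 1)
jan_feb = cyclical_distance(1, 2)
jan_jul = cyclical_distance(1, 7)

print(f"raw |12 - 1| = {abs(12 - 1)}, cyclical distance = {dec_jan:.3f}")
print(f"raw |1 - 2|  = {abs(1 - 2)}, cyclical distance = {jan_feb:.3f}")
print(f"raw |1 - 7|  = {abs(1 - 7)}, cyclical distance = {jan_jul:.3f}")
```

With the raw month number, December and January look 11 units apart. On the circle they are exactly as close as January and February, and months six apart are the farthest pair.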

### Model Performance

The table shows prediction accuracy for two models (ElasticNet and HistGradientBoosting), each predicting two targets (delay_rate and severity).

**Understanding MAE (Mean Absolute Error):**

For delay_rate, if MAE = 0.03, predictions are typically off by 3 percentage points.
- Actual: 20% delayed
- Predicted: 17% or 23%

Lower is better. As a rough rule of thumb for delay_rate:
- MAE < 0.03 = good predictions
- MAE 0.03-0.05 = moderate
- MAE > 0.05 = rough estimates

For severity, if MAE = 5.0, predictions are off by 5 minutes on average.

**Understanding RMSE:**

Similar to MAE but punishes large errors more. If RMSE is much bigger than MAE, the model has some really bad predictions. If they're similar, errors are consistent.
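
A small numeric sketch of that difference, using made-up predictions around an actual delay rate of 20%. The helper functions `mae` and `rmse` are written out here for illustration; the notebook itself uses scikit-learn's metrics.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))

def rmse(actual, predicted):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)))

actual = np.array([0.20, 0.20, 0.20, 0.20])

# Every prediction is off by 3 percentage points
even_errors = np.array([0.23, 0.17, 0.23, 0.17])
# Same total absolute error, concentrated in a single 12-point miss
one_big_miss = np.array([0.20, 0.20, 0.20, 0.32])

for label, pred in [('even errors', even_errors), ('one big miss', one_big_miss)]:
    print(f"{label}: MAE = {mae(actual, pred):.3f}, RMSE = {rmse(actual, pred):.3f}")
```

Both sets of predictions have an MAE of 0.03, but the RMSE doubles to 0.06 when the error is concentrated in one bad prediction.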

**What results tell us about delays:**

If both models perform similarly: Delays follow simple, predictable patterns. Basic rules work.

If HistGradientBoosting beats ElasticNet significantly: Complex interactions exist (December + Newark + weather = extra bad).

If MAE is low: Delays are systematic. Historical patterns, airport identity, and seasonality explain most variation. Airlines can plan for this.

If MAE is high: Delays are event-driven and unpredictable. Need real-time data or factors we don't have in this dataset.
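
Whether an MAE counts as low or high is easier to judge against a naive reference. One common choice, which is not part of the notebook's results, is a persistence forecast that predicts each month's delay rate to equal the previous month's. A sketch on synthetic numbers; `persistence_mae` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def persistence_mae(frame, target='delay_rate', lag='lag1_delay_rate'):
    """MAE of predicting the target with its one-month lag.

    Rows without a lag value (an airport's first month) are skipped.
    """
    valid = frame.dropna(subset=[lag])
    return float((valid[target] - valid[lag]).abs().mean())

# Synthetic stand-in for held-out rows; not values from the dataset
toy_test = pd.DataFrame({
    'delay_rate':      [0.20, 0.24, 0.18, 0.22],
    'lag1_delay_rate': [np.nan, 0.20, 0.24, 0.18],
})
baseline = persistence_mae(toy_test)
print(f"persistence MAE: {baseline:.3f}")
```

On the notebook's data the equivalent call would be `persistence_mae(test)`. A model whose MAE is not clearly below that value adds little beyond last month's number.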

**Next steps:**

Add weather data, holidays, airline-specific features. If MAE drops significantly, those factors matter. If MAE barely changes, they add little beyond what the existing features already capture.

In [None]:
feature_cols_num = [
    'Time.Year', 'carriers_total', 'flights_total', 'month_sin', 'month_cos',
    'lag1_delay_rate',
    'delay_carrier_share', 'delay_late_aircraft_share', 'delay_nas_share',
    'delay_security_share', 'delay_weather_share'
]
feature_cols_cat = ['Airport.Code']

train = model_df[model_df['Time.Year'] <= 2013].copy()
test = model_df[model_df['Time.Year'] >= 2014].copy()

def build_preprocessor():
    num_pipe = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler())
    ])
    cat_pipe = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='most_frequent')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ])
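    # sparse_threshold=0 forces a dense array: one-hot encoding can otherwise
    # yield sparse output, which HistGradientBoostingRegressor does not accept
    return ColumnTransformer(
        transformers=[
            ('num', num_pipe, feature_cols_num),
            ('cat', cat_pipe, feature_cols_cat)
        ],
        sparse_threshold=0
    )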

def evaluate_regression(target_col):
    """Fit both models on the training years and report MAE and RMSE on the test years."""
    X_train = train[feature_cols_num + feature_cols_cat]
    y_train = train[target_col]
    X_test = test[feature_cols_num + feature_cols_cat]
    y_test = test[target_col]

    enet = Pipeline(steps=[
        ('prep', build_preprocessor()),
        ('model', ElasticNet(alpha=0.01, l1_ratio=0.2, random_state=42, max_iter=5000))
    ])
    hgb = Pipeline(steps=[
        ('prep', build_preprocessor()),
        ('model', HistGradientBoostingRegressor(random_state=42))
    ])

    metrics = []
    for name, pipe in [('ElasticNet', enet), ('HistGradientBoosting', hgb)]:
        pipe.fit(X_train, y_train)
        pred = pipe.predict(X_test)
        mae = mean_absolute_error(y_test, pred)
        rmse = np.sqrt(mean_squared_error(y_test, pred))
        metrics.append({'target': target_col, 'model': name, 'MAE': mae, 'RMSE': rmse})

    return pd.DataFrame(metrics)

results_delay = evaluate_regression('delay_rate')
results_severity = evaluate_regression('severity')
display(pd.concat([results_delay, results_severity], ignore_index=True))

## Conclusions

---

## For Travelers

### Timing Matters

**September is your best bet.** Consistently the best month across all 14 years. Lower travel demand post-Labor Day, good weather.

**Avoid December.** Consistently the worst month. Holiday crowds, winter weather, and packed airports create delays.

**Summer is moderate risk.** June-July see more delays due to thunderstorms and peak travel.

**The industry improved.** Flying in 2012-2016 was significantly more reliable than 2003-2009. Delay rates dropped from 23-25% (2007-2008) to 15-17% (2012-2016).

### Airport Quality Varies Dramatically

- Best airports: 82-85% on-time
- Worst airports: 70-73% on-time

That's a 10-15 percentage point gap. Your airport matters as much as your travel month.

**Top performers:** Southwest U.S., West Coast airports with good weather and modern infrastructure

**Struggling airports:** NYC area (airspace congestion), Chicago (winter weather), Houston (hurricanes)

When booking connections, prefer airports with better track records. Build extra buffer time at challenging airports.

### What Different Delay Causes Mean

**Late Aircraft:** Expect longer waits. The delay cascades through the schedule: your flight waits for the inbound plane unless the airline can substitute another aircraft.

**Air Traffic Control (NAS):** Could take hours. The airline can't speed this up. You're waiting on ATC clearance.

**Weather:** Unpredictable. Could clear in 30 minutes or last hours. Check the radar.

**Maintenance/Carrier:** Variable. Could be a quick fix or require plane substitution.

**Security:** Rare. If it happens (evacuation, breach), expect major disruption.

---

## For Airlines and Airports

### Late Aircraft Is Your Biggest Problem

Late aircraft accounts for 32.8% of delays by count but ranks first in total delay minutes. Cascading effects create the longest passenger delays.
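
A cause can rank below another by count and still lead in total minutes when each of its delays lasts longer. The sketch below shows the calculation on made-up totals. The figures and the column names `delay_count` and `delay_minutes` are placeholders, not values or fields from the dataset.

```python
import pandas as pd

# Made-up totals per cause, for illustration only
toy = pd.DataFrame(
    {
        'delay_count': [300, 400, 250, 5, 45],
        'delay_minutes': [21000, 16000, 15000, 200, 3800],
    },
    index=['late_aircraft', 'nas', 'carrier', 'security', 'weather'],
)

shares = pd.DataFrame({
    # Frequency view: how often each cause occurs
    'share_of_count': toy['delay_count'] / toy['delay_count'].sum(),
    # Severity view: how much delay time each cause produces
    'share_of_minutes': toy['delay_minutes'] / toy['delay_minutes'].sum(),
    'minutes_per_delay': toy['delay_minutes'] / toy['delay_count'],
})
print(shares.round(3))
```

In this toy table NAS leads by count while late aircraft leads by minutes, which is the same shape of result described above.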

**Solutions:**
- Add schedule padding between flights
- Maintain spare aircraft and crews for faster substitution
- Improve turnaround processes (faster boarding, servicing)
- Proactive rebooking before cascades worsen

ROI is high. Reducing late aircraft delays has multiplier effects.

### NAS Delays Need Infrastructure Investment

40% of all delays stem from air traffic control and airport capacity constraints. Individual airlines have limited control.

This requires policy-level solutions: ATC modernization (NextGen), runway expansion at congested airports, regional airspace optimization.

What airlines can do: Optimize schedules around ATC bottlenecks, coordinate slot management with other carriers, lobby for infrastructure funding.

### Airport Performance Varies—Learn From the Best

Some airports achieve 82-85% on-time despite high volume. Others struggle at 70-73%.

Study what top performers do differently: schedule optimization, ground operations efficiency, weather preparedness, infrastructure quality.

Benchmark against similar airports. If they perform better, operational improvements are possible.

### Carrier Delays Are Improving

Good news: The 20-25% of delays within airline control (maintenance, crews, ground ops) are decreasing over time.

Fleet modernization and operational improvements are working. Continue investing in efficiency.

---

## For Policymakers

### The System Can Improve—We Have Proof

Delay rates dropped by roughly 8-10 percentage points from the 2007 peak to 2013 (from 23-25% to 15-17%). The improvement held through 2016.

This wasn't luck. Airlines reduced schedules (less congestion), modernized fleets, and improved operations. **Interventions work.**

### Infrastructure Is the Persistent Problem

NAS delays (air traffic control, airport capacity) account for 40% of all delays. Unlike carrier delays, which have improved, NAS delays remain stubbornly high.

This points to underinvestment:
- Air traffic control runs on 1960s-era technology
- Congested airports lack runway/terminal capacity
- Regional airspace (NYC, Southern California) is overloaded

Infrastructure funding targets the category behind roughly 40% of delays, though it would not eliminate all of them.

### Airport Quality Gaps Present Opportunities

National averages hide 10-15 percentage point gaps between best and worst airports.

Targeted investment in struggling airports could have outsized impact. Best practice sharing from top performers to struggling ones. Regional approaches for multi-airport areas.

Airport reliability affects economic competitiveness. Businesses factor airport quality into location decisions.

### Weather Delays May Increase

Weather accounts for 5-8% of delays in this dataset, but severe weather creates multi-hour disruptions. Climate change may increase frequency.

Monitor trends beyond 2016. Consider resilience investments: improved de-icing, drainage, alternative routing capabilities.

---

## The Biggest Surprise

**Severity improved even more than frequency.**

Not only do fewer flights get delayed now versus 2007, but when delays happen, they get resolved faster.

This means airlines improved at both prevention AND recovery. That represents systematic operational changes, not just favorable conditions.

---

## What We Still Don't Know

**Why was 2007 so bad?** We see it in the data but would need external research (fuel prices, labor disputes, ATC staffing) to explain it.

**Passenger impact:** A 3 percentage point improvement in delay rate sounds small statistically but represents millions more passengers arriving on time. We don't quantify the human or economic impact.

**Airline-specific performance:** Data is aggregated by airport. Can't compare Delta vs. United directly.

**Economic costs:** We measure minutes, not dollars lost to productivity, missed connections, and hotel costs.

**Post-2016 trends:** Analysis ends in 2016. Can't speak to COVID impact, recovery, or current performance.

---

## Next Steps to Improve This Analysis

1. Add external weather data (temperature, precipitation, wind)
2. Include holiday indicators (Thanksgiving, Christmas, Spring Break)
3. Extend to 2017-2025 data for post-COVID patterns
4. Break out airline-specific performance
5. Model economic impact (convert delay minutes to cost estimates)
6. Investigate 2007-2008 root causes through external research
7. Compare U.S. performance to international aviation systems

### Reading model results
The table reports MAE and RMSE for each model and target.
- Lower values are better.
- Compare models within the same target.

What conclusions to draw:
- If both models are close, the relationship may be mostly linear/simple.
- If boosting is much better, delay behavior may include non-linear interactions.

Use this as a starting benchmark before trying richer features (weather, operations, holidays).
