# 🛫 US Holiday Travel Dashboard Analysis

This notebook processes airport passenger data, flight delay statistics, and climate data to create an interactive dashboard about US holiday travel patterns.

## Data Sources
- **Airport Passengers**: Based on BTS T-100 domestic market statistics (Nov-Dec 2024)
- **Flight Delays**: Based on BTS Airline On-Time Performance data
- **Climate Data**: NOAA Climate Divisional Database (5-year December averages)


In [1]:
# Import required libraries
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)


## 1. Define State Mappings

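NOAA's climate divisional data does not use FIPS codes. It identifies states with its own numeric codes, which run alphabetically across the contiguous 48 states and use 50 for Alaska. We map these codes to state abbreviations and names.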


In [2]:
# NOAA climate-division state codes -> (abbreviation, name); despite the variable name, these are NOT FIPS codes
STATE_FIPS = {
    '01': ('AL', 'Alabama'), '02': ('AZ', 'Arizona'), '03': ('AR', 'Arkansas'),
    '04': ('CA', 'California'), '05': ('CO', 'Colorado'), '06': ('CT', 'Connecticut'),
    '07': ('DE', 'Delaware'), '08': ('FL', 'Florida'), '09': ('GA', 'Georgia'),
    '10': ('ID', 'Idaho'), '11': ('IL', 'Illinois'), '12': ('IN', 'Indiana'),
    '13': ('IA', 'Iowa'), '14': ('KS', 'Kansas'), '15': ('KY', 'Kentucky'),
    '16': ('LA', 'Louisiana'), '17': ('ME', 'Maine'), '18': ('MD', 'Maryland'),
    '19': ('MA', 'Massachusetts'), '20': ('MI', 'Michigan'), '21': ('MN', 'Minnesota'),
    '22': ('MS', 'Mississippi'), '23': ('MO', 'Missouri'), '24': ('MT', 'Montana'),
    '25': ('NE', 'Nebraska'), '26': ('NV', 'Nevada'), '27': ('NH', 'New Hampshire'),
    '28': ('NJ', 'New Jersey'), '29': ('NM', 'New Mexico'), '30': ('NY', 'New York'),
    '31': ('NC', 'North Carolina'), '32': ('ND', 'North Dakota'), '33': ('OH', 'Ohio'),
    '34': ('OK', 'Oklahoma'), '35': ('OR', 'Oregon'), '36': ('PA', 'Pennsylvania'),
    '37': ('RI', 'Rhode Island'), '38': ('SC', 'South Carolina'), '39': ('SD', 'South Dakota'),
    '40': ('TN', 'Tennessee'), '41': ('TX', 'Texas'), '42': ('UT', 'Utah'),
    '43': ('VT', 'Vermont'), '44': ('VA', 'Virginia'), '45': ('WA', 'Washington'),
    '46': ('WV', 'West Virginia'), '47': ('WI', 'Wisconsin'), '48': ('WY', 'Wyoming'),
    '50': ('AK', 'Alaska'), '51': ('HI', 'Hawaii')
}

STATE_NAMES = {v[0]: v[1] for v in STATE_FIPS.values()}
print(f"Defined mappings for {len(STATE_FIPS)} states")


Defined mappings for 50 states


## 2. Load and Process All Data


In [3]:
# Load NOAA Temperature Data
def load_noaa_temperature_data(filepath):
    """Load NOAA nClimDiv statewide temperatures and return 2020-2024 Nov/Dec averages per state."""
    data_rows = []
    with open(filepath, 'r') as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 13:
                code = parts[0]
                # Statewide record ID: 3-digit NOAA state code, 1-digit division (0 = statewide),
                # 2-digit element code, 4-digit year
                state_fips = code[:3]
                division = code[3]
                year = int(code[6:10])
                # Treat negative missing-value sentinels (e.g. -99.90) as NaN so they don't skew the means
                temps = [v if v > -99 else np.nan for v in map(float, parts[1:13])]
                data_rows.append({
                    'state_fips': state_fips, 'division': division, 'year': year,
                    'nov': temps[10], 'dec': temps[11]
                })
    
    df = pd.DataFrame(data_rows)
    # Keep statewide records for the 5 years 2020-2024; mean() skips NaN months
    statewide = df[(df['division'] == '0') & (df['year'].isin(range(2020, 2025)))].copy()
    dec_avg = statewide.groupby('state_fips').agg({'dec': 'mean', 'nov': 'mean'}).reset_index()
    # Normalize '001' -> '01' to match the STATE_FIPS keys; codes above 100 find no match and are dropped
    dec_avg['state_fips_2'] = dec_avg['state_fips'].str.lstrip('0').str.zfill(2)
    dec_avg['state_code'] = dec_avg['state_fips_2'].map(lambda x: STATE_FIPS.get(x, (None, None))[0])
    dec_avg['state_name'] = dec_avg['state_fips_2'].map(lambda x: STATE_FIPS.get(x, (None, None))[1])
    dec_avg = dec_avg[dec_avg['state_code'].notna()]
    dec_avg = dec_avg.rename(columns={'dec': 'avg_dec_temperature', 'nov': 'avg_nov_temperature'})
    
    # The NOAA file may not include Hawaii; fall back to hardcoded placeholder values (not from NOAA)
    if 'HI' not in dec_avg['state_code'].values:
        hawaii_row = pd.DataFrame([{'state_code': 'HI', 'state_name': 'Hawaii', 
                                     'avg_dec_temperature': 73.0, 'avg_nov_temperature': 75.0}])
        dec_avg = pd.concat([dec_avg, hawaii_row], ignore_index=True)
    
    return dec_avg[['state_code', 'state_name', 'avg_dec_temperature', 'avg_nov_temperature']]

# Load all data
temp_df = load_noaa_temperature_data('data/climdiv-tmpcst.txt')
passengers_df = pd.read_csv('data/airport_passengers.csv')
delays_df = pd.read_csv('data/flight_delays.csv')

print(f"✓ Loaded temperature data for {len(temp_df)} states")
print(f"✓ Loaded {len(passengers_df)} airport records")
print(f"✓ Loaded {len(delays_df)} delay records")


✓ Loaded temperature data for 50 states
✓ Loaded 87 airport records
✓ Loaded 174 delay records
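
The string slicing in `load_noaa_temperature_data` assumes NOAA's statewide nClimDiv layout. Each record starts with a 10-character ID made of a 3-digit state code, a 1-digit division (`0` for statewide), a 2-digit element code and a 4-digit year. Twelve monthly values follow the ID. Below is a minimal stdlib sketch of parsing one record under that assumption. `parse_climdiv_record` is a hypothetical helper and the sample line is made up. Treating values at or below `-99` as missing is an assumption based on NOAA's negative sentinels such as `-99.90`.

```python
import math


def parse_climdiv_record(line):
    """Parse one statewide nClimDiv line (hypothetical helper; record layout assumed as described above)."""
    parts = line.split()
    code = parts[0]
    # Twelve monthly values; anything at or below -99 is treated as a missing-value sentinel
    monthly = [v if v > -99 else math.nan for v in map(float, parts[1:13])]
    return {
        'state_code': code[:3],                       # e.g. '001'
        'division': code[3],                          # '0' for statewide records
        'element': code[4:6],                         # '02' is assumed to mean average temperature
        'year': int(code[6:10]),
        'state_key': code[:3].lstrip('0').zfill(2),   # '001' -> '01', matching STATE_FIPS keys
        'nov': monthly[10],
        'dec': monthly[11],
    }


# Made-up sample: state '001' (Alabama in STATE_FIPS), statewide, year 2023, December missing
sample = "0010022023   45.1   50.2   55.3   62.4   70.5   78.6   81.7   80.8   75.9   64.0   55.1  -99.90"
record = parse_climdiv_record(sample)
print(record['state_key'], record['year'], record['nov'], record['dec'])
# prints: 01 2023 55.1 nan
```

Codes above 100, which NOAA uses for regional and national aggregates, keep three digits after normalization (e.g. `'110'`). They therefore find no match in `STATE_FIPS` and drop out, just as in the notebook.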


In [4]:
# Aggregate passenger data by state
state_passengers = passengers_df.groupby('state_code').agg({
    'nov_passengers': 'sum', 'dec_passengers': 'sum'
}).reset_index()
state_passengers['holiday_travel_volume'] = state_passengers['nov_passengers'] + state_passengers['dec_passengers']

# Aggregate delay data by state
delays_df['weighted_delay'] = delays_df['delay_pct'] * delays_df['total_flights']
state_delays = delays_df.groupby('state_code').agg({
    'total_flights': 'sum', 'delayed_flights': 'sum', 'weighted_delay': 'sum'
}).reset_index()
state_delays['avg_delay_pct'] = state_delays['weighted_delay'] / state_delays['total_flights']

# Combine all datasets
combined_df = temp_df.merge(state_passengers[['state_code', 'holiday_travel_volume']], on='state_code', how='left')
combined_df = combined_df.merge(state_delays[['state_code', 'avg_delay_pct']], on='state_code', how='left')
# States missing from the passenger or delay data get 0; here 0 means "no data", not "no delays"
combined_df['holiday_travel_volume'] = combined_df['holiday_travel_volume'].fillna(0)
combined_df['avg_delay_pct'] = combined_df['avg_delay_pct'].fillna(0)

print(f"✓ Combined data for {len(combined_df)} states")
combined_df.head(10)


✓ Combined data for 50 states


| | state_code | state_name | avg_dec_temperature | avg_nov_temperature | holiday_travel_volume | avg_delay_pct |
|---|---|---|---|---|---|---|
| 0 | AL | Alabama | 49.82 | 56.36 | 390000.0 | 17.0 |
| 1 | AZ | Arizona | 44.72 | 51.28 | 4970000.0 | 14.88 |
| 2 | AR | Arkansas | 46.3 | 52.28 | 310000.0 | 17.56 |
| 3 | CA | California | 45.8 | 50.56 | 19660000.0 | 17.638677 |
| 4 | CO | Colorado | 29.42 | 36.18 | 6630000.0 | 22.566038 |
| 5 | CT | Connecticut | 34.8 | 43.42 | 670000.0 | 19.076923 |
| 6 | DE | Delaware | 40.68 | 48.7 | 0.0 | 0.0 |
| 7 | FL | Florida | 61.5 | 68.14 | 19200000.0 | 16.601111 |
| 8 | GA | Georgia | 50.26 | 57.56 | 10390000.0 | 17.935802 |
| 9 | ID | Idaho | 27.06 | 32.34 | 480000.0 | 18.076923 |
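
In the aggregation cell, each record's `delay_pct` is weighted by its `total_flights` before dividing, which makes `avg_delay_pct` a flight-weighted mean. Under the assumption that `delay_pct` in `flight_delays.csv` is itself `delayed_flights / total_flights * 100`, that mean should equal the pooled rate computed from the summed `delayed_flights` column. A small consistency check on made-up numbers for two hypothetical airports in one state:

```python
import pandas as pd

# Made-up delay records for two hypothetical airports in the same state
delays = pd.DataFrame({
    'state_code': ['XX', 'XX'],
    'total_flights': [1000, 250],
    'delayed_flights': [200, 100],
})
delays['delay_pct'] = delays['delayed_flights'] / delays['total_flights'] * 100  # 20.0 and 40.0

# Flight-weighted mean of the per-airport percentages, as in the notebook
delays['weighted_delay'] = delays['delay_pct'] * delays['total_flights']
by_state = delays.groupby('state_code')[['total_flights', 'delayed_flights', 'weighted_delay']].sum()
by_state['avg_delay_pct'] = by_state['weighted_delay'] / by_state['total_flights']

# Pooled rate straight from the raw counts
by_state['pooled_pct'] = by_state['delayed_flights'] / by_state['total_flights'] * 100

# Both columns come out to 24.0; a plain mean of the two airport rates would give 30.0
print(by_state[['avg_delay_pct', 'pooled_pct']])
```

If the two columns disagree on the real data, `delay_pct` was probably computed some other way, for example over a different flight count.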

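Because `fillna(0)` makes a state with no records look the same as one with zero passengers or zero delays (Delaware shows `0.0` in both columns above), it can help to list the unmatched states before filling. The sketch below uses `DataFrame.merge(..., indicator=True)`, which adds a `_merge` column marking rows found only in the left frame. The two small frames are made-up stand-ins for `temp_df` and `state_passengers`:

```python
import pandas as pd

# Made-up stand-ins for temp_df and state_passengers
temps = pd.DataFrame({'state_code': ['AL', 'DE', 'FL'], 'avg_dec_temperature': [50.0, 41.0, 61.0]})
passengers = pd.DataFrame({'state_code': ['AL', 'FL'], 'holiday_travel_volume': [400000, 19000000]})

# indicator=True records where each row came from: 'both', 'left_only' or 'right_only'
merged = temps.merge(passengers, on='state_code', how='left', indicator=True)

# States with no passenger records at all
missing = merged.loc[merged['_merge'] == 'left_only', 'state_code'].tolist()
print(missing)  # ['DE']

# Drop the helper column once the check is done
merged = merged.drop(columns='_merge')
```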

## 3. Summary Statistics


In [5]:
print("=" * 50)
print("HOLIDAY TRAVEL SUMMARY")
print("=" * 50)
# Sum of Nov + Dec passenger counts across states (passenger counts, not unique travelers)
print(f"\nTotal Holiday Travelers: {combined_df['holiday_travel_volume'].sum() / 1e6:.1f} million")
# Unweighted mean of state-level rates, skipping states filled with 0 (no delay data)
print(f"Average Delay Rate: {combined_df[combined_df['avg_delay_pct'] > 0]['avg_delay_pct'].mean():.1f}%")

busiest = combined_df.loc[combined_df['holiday_travel_volume'].idxmax()]
warmest = combined_df.loc[combined_df['avg_dec_temperature'].idxmax()]
coldest = combined_df.loc[combined_df['avg_dec_temperature'].idxmin()]

print(f"\n🏆 Busiest: {busiest['state_name']} ({busiest['holiday_travel_volume']/1e6:.1f}M)")
print(f"🌴 Warmest: {warmest['state_name']} ({warmest['avg_dec_temperature']:.0f}°F)")
print(f"❄️ Coldest: {coldest['state_name']} ({coldest['avg_dec_temperature']:.0f}°F)")


HOLIDAY TRAVEL SUMMARY

Total Holiday Travelers: 167.2 million
Average Delay Rate: 18.8%

🏆 Busiest: California (19.7M)
🌴 Warmest: Hawaii (73°F)
❄️ Coldest: Alaska (9°F)
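
The "Average Delay Rate" above is an unweighted mean of state-level rates, so a small state counts as much as California. A flight-weighted national figure can instead be computed from the summed counts in `state_delays`. The sketch below uses made-up totals for two hypothetical states to show how far the two figures can diverge:

```python
import pandas as pd

# Made-up state totals: one busy state and one small state
state_totals = pd.DataFrame({
    'state_code': ['AA', 'BB'],
    'total_flights': [9000, 1000],
    'weighted_delay': [9000 * 15.0, 1000 * 35.0],  # sum of delay_pct * total_flights per state
})
state_totals['avg_delay_pct'] = state_totals['weighted_delay'] / state_totals['total_flights']

# Every state counts equally
unweighted = state_totals['avg_delay_pct'].mean()
# Every flight counts equally
flight_weighted = state_totals['weighted_delay'].sum() / state_totals['total_flights'].sum()

print(f"Unweighted: {unweighted:.1f}%, flight-weighted: {flight_weighted:.1f}%")
# prints: Unweighted: 25.0%, flight-weighted: 17.0%
```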


## 4. Interactive Visualizations


In [6]:
# Holiday Travel Volume Map
fig1 = go.Figure(data=go.Choropleth(
    locations=combined_df['state_code'],
    z=combined_df['holiday_travel_volume'] / 1_000_000,
    locationmode='USA-states',
    colorscale='Blues',
    colorbar_title="Passengers (M)",
    hovertemplate="<b>%{text}</b><br>Travel Volume: %{z:.2f}M<extra></extra>",
    text=combined_df['state_name'],
    marker_line_color='white'
))
fig1.update_layout(
    title='Holiday Travel Volume by State (Nov-Dec 2024)',
    geo=dict(scope='usa', projection=dict(type='albers usa')),
    height=500
)
fig1.show()


In [7]:
# Flight Delays Map
fig2 = go.Figure(data=go.Choropleth(
    locations=combined_df['state_code'],
    z=combined_df['avg_delay_pct'],
    locationmode='USA-states',
    colorscale='Reds',
    colorbar_title="Delay %",
    hovertemplate="<b>%{text}</b><br>Delay Rate: %{z:.1f}%<extra></extra>",
    text=combined_df['state_name'],
    marker_line_color='white'
))
fig2.update_layout(
    title='Average Flight Delay Rate by State (Nov-Dec 2024)',
    geo=dict(scope='usa', projection=dict(type='albers usa')),
    height=500
)
fig2.show()


In [8]:
# December Temperature Map
colorscale = [[0, '#08306b'], [0.3, '#2171b5'], [0.5, '#c6dbef'], [0.7, '#fee8c8'], [1, '#b30000']]
fig3 = go.Figure(data=go.Choropleth(
    locations=combined_df['state_code'],
    z=combined_df['avg_dec_temperature'],
    locationmode='USA-states',
    colorscale=colorscale,
    colorbar_title="Temp (°F)",
    hovertemplate="<b>%{text}</b><br>Avg Dec Temp: %{z:.1f}°F<extra></extra>",
    text=combined_df['state_name'],
    marker_line_color='white'
))
fig3.update_layout(
    title='Average December Temperature by State (5-Year Average)',
    geo=dict(scope='usa', projection=dict(type='albers usa')),
    height=500
)
fig3.show()


## 5. Save Data & Generate Dashboard

Run `python analysis.py` from the terminal to generate the complete standalone HTML dashboard with all three maps and interactive tabs.


In [9]:
# Save the combined dataset
combined_df.to_csv('data/combined_state_data.csv', index=False)
print("✓ Saved combined data to data/combined_state_data.csv")
print("\nTo generate the full HTML dashboard, run: python analysis.py")


✓ Saved combined data to data/combined_state_data.csv

To generate the full HTML dashboard, run: python analysis.py
