# ✈️  Exploratory Data Analysis for the Kaggle Airline Delays dataset
## 📌 Introduction
This dataset contains information on **airline delays in the United States** for **December 2019 and December 2020**.  
The focus is on arrival flights, cancellations, diversions, and delays, along with their causes (carrier, weather, airspace system, security, and late aircraft).  

## 📊 Description
- **Timeframe**: December 2019 and December 2020  
- **Scope**: U.S. domestic flights  
- **Unit of analysis**: Flights aggregated **per carrier per airport** (one row per carrier, airport, and month)  
- **Features included**:  
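  - `arr_flights`: total arrival flights  
  - `arr_del15`: arrivals delayed by 15 minutes or more  
  - `carrier_ct`: delayed arrivals attributed to the carrier  
  - `weather_ct`: delayed arrivals attributed to weather  
  - `nas_ct`: delayed arrivals attributed to the National Airspace System (NAS)  
  - `security_ct`: delayed arrivals attributed to security  
  - `late_aircraft_ct`: delayed arrivals attributed to a late-arriving aircraft  
  - `arr_cancelled`: cancelled flights  
  - `arr_diverted`: diverted flights  
  - `arr_delay`, `carrier_delay`, `weather_delay`, `nas_delay`, `security_delay`, `late_aircraft_delay`: delay minutes, in total and by cause  
  - The `*_ct` columns can be fractional (e.g., 1.63), which suggests a delayed flight is apportioned across causes.  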


The dataset allows us to explore delay patterns, identify the main contributors to disruptions, and compare performance across carriers and cities.  
Source: Kaggle, Airline Delays CSV file.


In [1]:
# -------------------------------
# Import necessary libraries
# -------------------------------
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from ydata_profiling import ProfileReport
from scipy import stats
import plotly.express as px
import plotly.graph_objects as go


# Load the dataset
# -------------------------------
df_airline_delays = pd.read_csv("airline_delay.csv")

# Data set display
# -------------------------------
print("═" * 50)
print("FIRST 10 ROWS OF DATASET:")
print("═" * 50)
df_airline_delays.head(10)


══════════════════════════════════════════════════
FIRST 10 ROWS OF DATASET:
══════════════════════════════════════════════════


Unnamed: 0,year,month,carrier,carrier_name,airport,airport_name,arr_flights,arr_del15,carrier_ct,weather_ct,...,security_ct,late_aircraft_ct,arr_cancelled,arr_diverted,arr_delay,carrier_delay,weather_delay,nas_delay,security_delay,late_aircraft_delay
0,2020,12,9E,Endeavor Air Inc.,ABE,"Allentown/Bethlehem/Easton, PA: Lehigh Valley ...",44.0,3.0,1.63,0.0,...,0.0,1.25,0.0,1.0,89.0,56.0,0.0,3.0,0.0,30.0
1,2020,12,9E,Endeavor Air Inc.,ABY,"Albany, GA: Southwest Georgia Regional",90.0,1.0,0.96,0.0,...,0.0,0.0,0.0,0.0,23.0,22.0,0.0,1.0,0.0,0.0
2,2020,12,9E,Endeavor Air Inc.,AEX,"Alexandria, LA: Alexandria International",88.0,8.0,5.75,0.0,...,0.0,0.65,0.0,1.0,338.0,265.0,0.0,45.0,0.0,28.0
3,2020,12,9E,Endeavor Air Inc.,AGS,"Augusta, GA: Augusta Regional at Bush Field",184.0,9.0,4.17,0.0,...,0.0,3.0,0.0,0.0,508.0,192.0,0.0,92.0,0.0,224.0
4,2020,12,9E,Endeavor Air Inc.,ALB,"Albany, NY: Albany International",76.0,11.0,4.78,0.0,...,0.0,1.0,1.0,0.0,692.0,398.0,0.0,178.0,0.0,116.0
5,2020,12,9E,Endeavor Air Inc.,ATL,"Atlanta, GA: Hartsfield-Jackson Atlanta Intern...",5985.0,445.0,142.89,11.96,...,1.0,127.79,5.0,0.0,30756.0,16390.0,1509.0,5060.0,16.0,7781.0
6,2020,12,9E,Endeavor Air Inc.,ATW,"Appleton, WI: Appleton International",142.0,14.0,5.36,0.0,...,0.0,0.94,1.0,0.0,436.0,162.0,0.0,182.0,0.0,92.0
7,2020,12,9E,Endeavor Air Inc.,AVL,"Asheville, NC: Asheville Regional",147.0,10.0,6.04,1.0,...,0.0,1.96,0.0,1.0,1070.0,838.0,141.0,24.0,0.0,67.0
8,2020,12,9E,Endeavor Air Inc.,AZO,"Kalamazoo, MI: Kalamazoo/Battle Creek Internat...",84.0,14.0,6.24,0.96,...,0.0,0.0,1.0,1.0,2006.0,1164.0,619.0,223.0,0.0,0.0
9,2020,12,9E,Endeavor Air Inc.,BDL,"Hartford, CT: Bradley International",150.0,19.0,5.7,0.0,...,0.0,1.23,3.0,0.0,846.0,423.0,0.0,389.0,0.0,34.0


In [2]:
# -------------------------------
# Basic information about the dataset
# -------------------------------
print("\n" + "═" * 50)
print("BASIC INFORMATION ABOUT THE DATASET:")
print("═" * 50)

print(f"\nDataset dimensions: {df_airline_delays.shape}")
print(f"\nNumber of rows: {df_airline_delays.shape[0]}")
print(f"\nNumber of columns: {df_airline_delays.shape[1]}")




══════════════════════════════════════════════════
BASIC INFORMATION ABOUT THE DATASET:
══════════════════════════════════════════════════

Dataset dimensions: (3351, 21)

Number of rows: 3351

Number of columns: 21


In [3]:
# -------------------------------
# Information on data types
# -------------------------------

print("\n" + "═" * 50)
print("INFORMATION ON DATA TYPES:")
print("═" * 50)
df_airline_delays.info()



══════════════════════════════════════════════════
INFORMATION ON DATA TYPES:
══════════════════════════════════════════════════
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3351 entries, 0 to 3350
Data columns (total 21 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   year                 3351 non-null   int64  
 1   month                3351 non-null   int64  
 2   carrier              3351 non-null   object 
 3   carrier_name         3351 non-null   object 
 4   airport              3351 non-null   object 
 5   airport_name         3351 non-null   object 
 6   arr_flights          3343 non-null   float64
 7   arr_del15            3343 non-null   float64
 8   carrier_ct           3343 non-null   float64
 9   weather_ct           3343 non-null   float64
 10  nas_ct               3343 non-null   float64
 11  security_ct          3343 non-null   float64
 12  late_aircraft_ct     3343 non-null   float64
 13  arr_canc

In [4]:
# -------------------------------
# Descriptive statistics
# -------------------------------
print("\n" + "═" * 50)
print("DESCRIPTIVE STATISTICS:")
print("═" * 50)
print(df_airline_delays.describe().round(2))


══════════════════════════════════════════════════
DESCRIPTIVE STATISTICS:
══════════════════════════════════════════════════
          year   month  arr_flights  arr_del15  carrier_ct  weather_ct  \
count  3351.00  3351.0      3343.00    3343.00     3343.00     3343.00   
mean   2019.46    12.0       298.27      51.00       16.07        1.44   
std       0.50     0.0       852.44     146.48       41.76        4.82   
min    2019.00    12.0         1.00       0.00        0.00        0.00   
25%    2019.00    12.0        35.00       5.00        1.49        0.00   
50%    2019.00    12.0        83.00      12.00        4.75        0.06   
75%    2020.00    12.0       194.50      33.00       12.26        1.01   
max    2020.00    12.0     19713.00    2289.00      697.00       89.42   

        nas_ct  security_ct  late_aircraft_ct  arr_cancelled  arr_diverted  \
count  3343.00      3343.00           3343.00        3343.00       3343.00   
mean     16.18         0.14             17.17     

# ✈️ Descriptive Statistics Analysis of Flight Delays

## 🔎 General Overview
- **Year**: Data only covers **2019 and 2020**.  
- **Month**: All records are for **December (12)** → dataset is focused only on December 2019 and December 2020.  
- **Count**: 3,351 rows (3,343 with complete delay metrics), each one a carrier-airport pair for a single month.  

---

## 🛫 Flights
- **arr_flights (arrival flights)**  
  - **Mean**: ~298  
  - **Median (50%)**: 83  
  - **Max**: 19,713  
  - ➝ A few very large carrier-airport pairs (major hubs) dominate, while half of all rows have 83 or fewer arrivals.  

- **arr_cancelled (canceled flights)**  
  - **Median**: 0  
  - **Max**: 224  
  - ➝ At least half of the rows had no cancellations, but a few outliers show large numbers.  

- **arr_diverted (diverted flights)**  
  - **Median**: 0  
  - **Max**: 42  
  - ➝ Diversions are rare.  

---

## ⏱️ Delays and Causes
- **arr_del15 (flights delayed by 15 minutes or more)**  
  - **Mean**: 51  
  - **Median**: 12  
  - **Max**: 2,289  
  - ➝ Half of the carrier-airport rows have 12 or fewer delayed arrivals, but the largest exceed 2,000.  

- **Delay causes (counts of delayed flights):**
  - **Carrier_ct (airline responsibility)** → mean 16, **max ~697**  
  - **Late_aircraft_ct (previous aircraft delayed)** → mean 17, **max ~820**  
  - **NAS_ct (airspace system issues)** → mean 16, **max ~1,039**  
  - **Weather_ct** → mean 1.4, **max ~89**  
  - **Security_ct** → mean 0.14, **max ~17**  

👉 **Conclusion**: The top three causes are **airline responsibility, late aircraft, and NAS issues**. Weather and security play a much smaller role.  
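
The ranking of causes can be checked with a little arithmetic on the column means printed in the `describe()` output. The sketch below copies those five means by hand (it does not reload the CSV) and converts them to shares. A useful side effect is that the five means add up to the `arr_del15` mean of 51.00, which suggests the cause counts partition the delayed arrivals.

```python
import pandas as pd

# Column means copied from the describe() output above.
cause_means = pd.Series({
    "carrier_ct": 16.07,
    "weather_ct": 1.44,
    "nas_ct": 16.18,
    "security_ct": 0.14,
    "late_aircraft_ct": 17.17,
})

# Share of each cause in the average number of delayed arrivals per row.
cause_share = (cause_means / cause_means.sum()).sort_values(ascending=False)

print(f"Sum of cause means: {cause_means.sum():.2f}")  # compare with arr_del15 mean
print((cause_share * 100).round(1))
```

On these means, late aircraft, NAS, and carrier causes each account for roughly a third of delayed arrivals, while weather is under 3% and security well under 1%.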

---

## ⏳ Length of Delays
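- **arr_delay (total arrival delay minutes)**  
  - Values are totals per carrier-airport row, not per-flight delays.  
  - In the first ten rows alone they range from **23** to **30,756** minutes (Endeavor Air at ATL).  
  - ➝ Totals scale with traffic volume, so they should be normalized by `arr_del15` or `arr_flights` before rows are compared.  

- **Other delays (minutes):**  
  - **Carrier delay, NAS delay, late aircraft delay** → nonzero in most of the rows shown, with very large totals at hubs.  
  - **Weather delay** → 0 in most of the rows shown, with occasional spikes.  
  - **Security delay** → nearly always 0.  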

---

## 📊 Key Insights
1. **Highly skewed distribution** → Most carrier-airport rows have few delayed flights, but a few high-volume rows pull the means far above the medians.  
2. **Main drivers of delays**:  
   - Airline responsibility  
   - Late aircraft from previous flights  
   - NAS (air traffic system) issues  
3. **Weather and security delays are rare**, though extreme weather can cause spikes.  
4. **Cancellations and diversions are uncommon**, but some airports show clusters of cancellations.  
5. **Median values (50%) give a clearer picture** than means, since the distributions are heavily right-skewed.  
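
The gap between mean and median can be illustrated with the `arr_flights` values from the first ten rows displayed earlier. This is a minimal sketch on those ten values only, not on the full dataset; the single ATL row is enough to drag the mean far above the median.

```python
import pandas as pd

# arr_flights from the first ten rows shown in the head() output.
arr_flights = pd.Series(
    [44, 90, 88, 184, 76, 5985, 142, 147, 84, 150], name="arr_flights"
)

mean_value = arr_flights.mean()
median_value = arr_flights.median()
skewness = arr_flights.skew()  # positive values indicate a long right tail

print(f"mean={mean_value:.1f}, median={median_value:.1f}, skew={skewness:.2f}")
```

The same three calls can be applied to any numeric column of `df_clean` to decide whether the median is the safer summary.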


In [5]:
# -----------------------------------------------
# Check unique values for categorical columns
# -----------------------------------------------
print("UNIQUE VALUES FOR CATEGORICAL COLUMNS:")
print("═" * 50)
print("Number of Unique Carriers:", df_airline_delays['carrier'].nunique())
print("Number of Unique Airports:", df_airline_delays['airport'].nunique())
print("Months in Data:", df_airline_delays['month'].unique())
print("Years in Data:", df_airline_delays['year'].unique())


UNIQUE VALUES FOR CATEGORICAL COLUMNS:
══════════════════════════════════════════════════
Number of Unique Carriers: 17
Number of Unique Airports: 360
Months in Data: [12]
Years in Data: [2020 2019]


In [6]:
# -------------------------------
# Missing values in dataset
# -------------------------------
print("\n" + "═" * 50)
print("MISSING VALUES IN DATASET:")
print("═" * 50)
missing_values = df_airline_delays.isnull().sum()
missing_percent = (df_airline_delays.isnull().sum() / len(df_airline_delays)) * 100
missing_df = pd.DataFrame({
    'Missing Values': missing_values,
    'Percentage (%)': missing_percent
})
print(missing_df[missing_df['Missing Values'] > 0])


══════════════════════════════════════════════════
MISSING VALUES IN DATASET:
══════════════════════════════════════════════════
                     Missing Values  Percentage (%)
arr_flights                       8        0.238735
arr_del15                         8        0.238735
carrier_ct                        8        0.238735
weather_ct                        8        0.238735
nas_ct                            8        0.238735
security_ct                       8        0.238735
late_aircraft_ct                  8        0.238735
arr_cancelled                     8        0.238735
arr_diverted                      8        0.238735
arr_delay                         8        0.238735
carrier_delay                     8        0.238735
weather_delay                     8        0.238735
nas_delay                         8        0.238735
security_delay                    8        0.238735
late_aircraft_delay               8        0.238735
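
Every metric column reports the same count of 8 missing values, which hints that the same rows are empty across all metrics. That can be checked before choosing between dropping and imputing. The sketch below uses a small hypothetical frame and a helper name, `missing_row_overlap`, that is not part of the notebook.

```python
import numpy as np
import pandas as pd

def missing_row_overlap(df, cols):
    """Count rows missing any of `cols` and rows missing all of them."""
    missing = df[cols].isna()
    return {
        "rows_missing_any": int(missing.any(axis=1).sum()),
        "rows_missing_all": int(missing.all(axis=1).sum()),
    }

# Hypothetical toy frame: two complete rows and two rows with no metrics.
toy = pd.DataFrame({
    "arr_flights": [44.0, np.nan, 88.0, np.nan],
    "arr_del15": [3.0, np.nan, 8.0, np.nan],
    "arr_delay": [89.0, np.nan, 338.0, np.nan],
})

overlap = missing_row_overlap(toy, ["arr_flights", "arr_del15", "arr_delay"])
print(overlap)
```

If both counts match on the real data, the affected rows carry no metric information at all, and dropping them loses nothing that imputation could recover.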


In [7]:
# --------------------------------------------------------------
# Cleaning: drop rows with missing delay metrics
# --------------------------------------------------------------
# Each of the 15 metric columns has 8 missing values (about 0.24% of rows).
# Rows are dropped rather than median-filled so that no flight counts are
# invented for carrier-airport pairs with no reported data.
metric_cols = [
    "arr_flights", "arr_del15", "carrier_ct", "weather_ct",
    "nas_ct", "security_ct", "late_aircraft_ct", "arr_cancelled",
    "arr_diverted", "arr_delay", "carrier_delay", "weather_delay",
    "nas_delay", "security_delay", "late_aircraft_delay"
]

# .copy() gives an independent frame, so adding columns later does not
# trigger SettingWithCopyWarning.
df_clean = df_airline_delays.dropna(subset=metric_cols).copy()

In [8]:
# -----------------------------------------------
# Checking for NaN values after cleanup
# -----------------------------------------------
print("\n" + "═" * 50)
print("CHECKING FOR NAN VALUES AFTER CLEANUP:")
print("═" * 50)

missing_after = df_clean.isnull().sum()
print(missing_after)


══════════════════════════════════════════════════
CHECKING FOR NAN VALUES AFTER CLEANUP:
══════════════════════════════════════════════════
year                   0
month                  0
carrier                0
carrier_name           0
airport                0
airport_name           0
arr_flights            0
arr_del15              0
carrier_ct             0
weather_ct             0
nas_ct                 0
security_ct            0
late_aircraft_ct       0
arr_cancelled          0
arr_diverted           0
arr_delay              0
carrier_delay          0
weather_delay          0
nas_delay              0
security_delay         0
late_aircraft_delay    0
dtype: int64


In [9]:
print(df_clean.columns)


Index(['year', 'month', 'carrier', 'carrier_name', 'airport', 'airport_name',
       'arr_flights', 'arr_del15', 'carrier_ct', 'weather_ct', 'nas_ct',
       'security_ct', 'late_aircraft_ct', 'arr_cancelled', 'arr_diverted',
       'arr_delay', 'carrier_delay', 'weather_delay', 'nas_delay',
       'security_delay', 'late_aircraft_delay'],
      dtype='object')


In [10]:
# -----------------------------------------------
# Correlation matrix and heatmap visualization
# -----------------------------------------------

# ----- Select numeric columns ----- 
numeric_cols = df_clean.select_dtypes(include='number')


# ----- Compute correlation ----- 
correlation_matrix = numeric_cols.corr()


#  -----Filter strong correlations (0.7 < abs(corr) < 1) ----- 
abs_corr = correlation_matrix.abs()
top_cols = abs_corr[(abs_corr > 0.7) & (abs_corr < 1.0)] \
    .dropna(axis=0, how='all') \
    .dropna(axis=1, how='all') \
    .columns
filtered_corr = numeric_cols[top_cols].corr() if len(top_cols) > 0 else correlation_matrix


# ----- Plotly heatmap ----- 
fig = px.imshow(
    filtered_corr,
    text_auto=".2f",
    color_continuous_scale="RdBu",
    zmin=-1,
    zmax=1,
    aspect="auto",
    title="Correlation Matrix Heatmap"
)

fig.update_layout(
    plot_bgcolor="white",
    paper_bgcolor="lightgray"
)

fig.show()


In [11]:
# -----------------------------------------
# Display of the strongest correlations
# ----------------------------------------
print("\n" + "═" * 50)
print("STRONGEST CORRELATIONS:")
print("═" * 50)


corr_pairs = correlation_matrix.unstack().sort_values(key=abs, ascending=False)
corr_pairs = corr_pairs[corr_pairs != 1.0]  
print(corr_pairs.head(10))



══════════════════════════════════════════════════
STRONGEST CORRELATIONS:
══════════════════════════════════════════════════
late_aircraft_ct     late_aircraft_delay    0.962579
late_aircraft_delay  late_aircraft_ct       0.962579
arr_delay            arr_del15              0.960250
arr_del15            arr_delay              0.960250
late_aircraft_ct     arr_del15              0.949273
arr_del15            late_aircraft_ct       0.949273
arr_delay            late_aircraft_delay    0.948945
late_aircraft_delay  arr_delay              0.948945
arr_flights          arr_del15              0.928553
arr_del15            arr_flights            0.928553
dtype: float64
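
Each pair in the listing above appears twice, once per ordering, so the ten lines cover only five distinct pairs. One way to list each pair once is to keep only the upper triangle of the correlation matrix before stacking. The sketch below is one possible approach, with a hypothetical helper name `top_unique_correlations`, run on three columns from the first ten rows shown earlier.

```python
import numpy as np
import pandas as pd

def top_unique_correlations(df, n=10):
    """Return the n strongest correlations, listing each column pair once."""
    corr = df.select_dtypes(include="number").corr()
    # True strictly above the diagonal: drops self-pairs and mirrored pairs.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs.sort_values(key=abs, ascending=False).head(n)

# Three columns from the first ten rows of the head() output.
sample = pd.DataFrame({
    "arr_flights": [44, 90, 88, 184, 76, 5985, 142, 147, 84, 150],
    "arr_del15": [3, 1, 8, 9, 11, 445, 14, 10, 14, 19],
    "arr_delay": [89, 23, 338, 508, 692, 30756, 436, 1070, 2006, 846],
})

pairs = top_unique_correlations(sample)
print(pairs)
```

Masking the diagonal also avoids relying on an exact `!= 1.0` comparison, which can miss self-correlations that differ from 1 by floating-point error.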


In [12]:
# ----------------------------------------
# Define delay types for count analysis
# ----------------------------------------
delay_count_columns = ['carrier_ct', 'weather_ct', 'nas_ct', 'security_ct', 'late_aircraft_ct']

In [13]:
# ------------------------
# Summary statistics
# -------------------------

# Ensure df_clean is a proper copy
# -----------------------------------------
df_clean = df_clean.copy()

# ----- Calculate delay rate ----- 
df_clean['delay_rate'] = df_clean['arr_del15'] / df_clean['arr_flights']



# ----- Airlines with highest delay rate (rows with at least 100 arrivals; unweighted mean of per-row rates) -----
significant_airlines = df_clean[df_clean['arr_flights'] >= 100]
airline_delay_rates = significant_airlines.groupby('carrier_name')['delay_rate'].mean().sort_values(ascending=False)

print("Airlines with Highest Delay Rates (min 100 flights):")
print(airline_delay_rates.head(10))

# ----- Airports with highest delay rate ----- 
significant_airports = df_clean[df_clean['arr_flights'] >= 100]
airport_delay_rates = significant_airports.groupby('airport_name')['delay_rate'].mean().sort_values(ascending=False)

print("\nAirports with Highest Delay Rates (min 100 flights):")
print(airport_delay_rates.head(10))


# ----- Define the correct delay count columns ----- 
delay_count_columns = ['carrier_ct', 'weather_ct', 'nas_ct', 'security_ct', 'late_aircraft_ct']

# ----- Most common delay type by airline ----- 
delay_by_airline = df_clean.groupby('carrier_name')[delay_count_columns].sum()
delay_by_airline['total_delays'] = delay_by_airline.sum(axis=1)

for delay_type in delay_count_columns:
    delay_by_airline[f'{delay_type}_pct'] = delay_by_airline[delay_type] / delay_by_airline['total_delays']

print("\nPrimary Delay Type by Airline:")
for airline in delay_by_airline.index:
    pct_columns = [f'{dt}_pct' for dt in delay_count_columns]
    primary_delay = delay_by_airline.loc[airline, pct_columns].idxmax()
    primary_delay = primary_delay.replace('_pct', '')
    print(f"{airline}: {primary_delay}")

# ----- Summary statistics ----- 
print(f"\nTotal Flights: {df_clean['arr_flights'].sum():,}")
print(f"Total Delayed Flights: {df_clean['arr_del15'].sum():,}")
print(f"Overall Delay Rate: {df_clean['arr_del15'].sum()/df_clean['arr_flights'].sum()*100:.2f}%")


Airlines with Highest Delay Rates (min 100 flights):
carrier_name
JetBlue Airways            0.273524
Allegiant Air              0.264285
ExpressJet Airlines LLC    0.248437
Mesa Airlines Inc.         0.237746
PSA Airlines Inc.          0.214595
Frontier Airlines Inc.     0.188769
Alaska Airlines Inc.       0.178865
Envoy Air                  0.171796
Southwest Airlines Co.     0.167964
United Air Lines Inc.      0.165303
Name: delay_rate, dtype: float64

Airports with Highest Delay Rates (min 100 flights):
airport_name
Concord, NC: Concord Padgett Regional                                     0.418605
Redding, CA: Redding Municipal                                            0.322581
Bakersfield, CA: Meadows Field                                            0.314571
Aguadilla, PR: Rafael Hernandez                                           0.298507
Akron, OH: Akron-Canton Regional                                          0.295359
Trenton, NJ: Trenton Mercer                                
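
The rankings above average the per-row `delay_rate`, so a small station counts as much as a large hub. A flight-weighted rate, total delayed arrivals divided by total arrivals, can rank carriers differently. The sketch below contrasts the two on hypothetical numbers; the carrier names and values are invented for illustration and do not come from the dataset.

```python
import pandas as pd

# Hypothetical rows: Carrier A has one small, badly delayed station
# and one large, mostly punctual hub.
toy = pd.DataFrame({
    "carrier_name": ["Carrier A", "Carrier A", "Carrier B", "Carrier B"],
    "arr_flights": [100.0, 900.0, 200.0, 200.0],
    "arr_del15": [40.0, 90.0, 40.0, 40.0],
})
toy["delay_rate"] = toy["arr_del15"] / toy["arr_flights"]

# Unweighted: mean of per-row rates (the approach used in the cell above).
unweighted = toy.groupby("carrier_name")["delay_rate"].mean()

# Weighted: total delayed arrivals over total arrivals per carrier.
totals = toy.groupby("carrier_name")[["arr_del15", "arr_flights"]].sum()
weighted = totals["arr_del15"] / totals["arr_flights"]

print(pd.DataFrame({"unweighted": unweighted, "weighted": weighted}))
```

Here Carrier A looks worse on the unweighted mean (0.25 against 0.20) but better on the weighted rate (0.13 against 0.20), so it is worth stating which definition a ranking uses.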

In [14]:
# ------------------------------------------------------
# Top airlines and airports interactive visualization
# ------------------------------------------------------

# ----- Prepare data -----
top_airlines = airline_delay_rates.sort_values(ascending=False)
top_airports = airport_delay_rates.sort_values(ascending=False)

# ----- Plot top airlines -----
fig_airlines = go.Figure()
fig_airlines.add_trace(go.Bar(
    x=top_airlines.values[:10],
    y=top_airlines.index[:10],
    orientation='h',
    marker=dict(color=top_airlines.values[:10], colorscale='Reds'),
    name='Airlines'
))

# ----- Dropdown buttons for top N airlines ----- 
buttons_airlines = []
max_airlines = min(20, len(top_airlines))
for n in range(1, max_airlines + 1):
    buttons_airlines.append(dict(
        label=f"Top {n}",
        method="update",
        args=[{"x": [top_airlines.values[:n]],
               "y": [top_airlines.index[:n]],
               "marker.color": [top_airlines.values[:n]]},
              {"title": f"Top {n} Airlines by Delay Rate",
               "xaxis": {"title": "Delay Rate"},
               "yaxis": {"title": "Airline", "autorange": "reversed"}}]
    ))

fig_airlines.update_layout(
    title="Top 10 Airlines by Delay Rate",
    xaxis=dict(title="Delay Rate"),
    yaxis=dict(title="Airline", autorange="reversed"),
    plot_bgcolor='lightgray',
    paper_bgcolor='lightgray',
    updatemenus=[dict(
        active=9,  # default top 10
        buttons=buttons_airlines,
        direction="down",
        showactive=True,
        x=0,
        y=1.15
    )]
)

# ----- Plot top airports -----
fig_airports = go.Figure()
fig_airports.add_trace(go.Bar(
    x=top_airports.values[:10],
    y=top_airports.index[:10],
    orientation='h',
    marker=dict(color=top_airports.values[:10], colorscale='Blues'),
    name='Airports'
))

# ----- Dropdown buttons for top N airports ----- 
buttons_airports = []
max_airports = min(20, len(top_airports))
for n in range(1, max_airports + 1):
    buttons_airports.append(dict(
        label=f"Top {n}",
        method="update",
        args=[{"x": [top_airports.values[:n]],
               "y": [top_airports.index[:n]],
               "marker.color": [top_airports.values[:n]]},
              {"title": f"Top {n} Airports by Delay Rate",
               "xaxis": {"title": "Delay Rate"},
               "yaxis": {"title": "Airport", "autorange": "reversed"}}]
    ))

fig_airports.update_layout(
    title="Top 10 Airports by Delay Rate",
    xaxis=dict(title="Delay Rate"),
    yaxis=dict(title="Airport", autorange="reversed"),
    plot_bgcolor='lightgray',
    paper_bgcolor='lightgray',
    updatemenus=[dict(
        active=9,  # default top 10
        buttons=buttons_airports,
        direction="down",
        showactive=True,
        x=0,
        y=1.15
    )]
)

# ----- Show plots -----
fig_airlines.show()
fig_airports.show()


In [15]:
# ---------------------------------
# Pie chart visualization
# ---------------------------------

# ----- Sum total delays per type ----- 
delay_type_totals = df_clean[delay_count_columns].sum().reset_index()
delay_type_totals.columns = ['Delay Type', 'Total']

# ----- Create interactive pie chart----- 
fig_pie = px.pie(
    delay_type_totals,
    names='Delay Type',
    values='Total',
    hole=0.3  # donut chart
)

# ----- Show labels on pie slices: percent + name ----- 
fig_pie.update_traces(
    textinfo='percent+label',  # show label and percentage
    textposition='inside',
    pull=[0.05]*len(delay_type_totals)
)

# ----- Update layout to make title visible ----- 
fig_pie.update_layout(
    title=dict(
        text='Proportion of Delay Types',
        font=dict(color='black', size=24),  # use black to be visible
        x=0.5  # center title
    ),
    plot_bgcolor='lightgray',
    paper_bgcolor='lightgray',
    width=1200,
    height=500
)

fig_pie.show()


# ✈️ Conclusion and Insights

## 📊 Key Findings from Airline Delay Analysis

### 1. 🎯 Overall Performance
- **Total Flights:** 997,120  
- **Delayed Flights:** 170,477  
- **Overall Delay Rate:** 17.10%  
- **Interpretation:** Roughly 1 in 6 flights arrived at least 15 minutes late  
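
The headline rate follows directly from the two totals listed here. As a quick worked check:

```python
total_flights = 997_120
delayed_flights = 170_477

delay_rate = delayed_flights / total_flights
one_in_n = 1 / delay_rate  # "1 in N flights" framing

print(f"Overall delay rate: {delay_rate:.2%}")
print(f"About 1 in {one_in_n:.1f} flights")
```

The reciprocal comes out a little under 6, which is where the "1 in 6" reading comes from.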

### 2. 🏆 Carrier Performance Ranking
- **Highest Delay Rate:** JetBlue Airways (27.35%, unweighted mean across airports with at least 100 arrivals)  
- **Other high-delay carriers:** Allegiant Air (26.43%), ExpressJet (24.84%)  
- **Lowest among top 10:** United Airlines (16.53%)  
- **Major national carriers** (United, Southwest, Alaska): 16–18% delay rate  

### 3. 🛫 Airport Performance Ranking
- **Worst Performer:** Concord, NC (Concord Padgett Regional) – 41.86%  
- **Other high-delay airports:** Redding, CA (32.26%), Bakersfield, CA (31.46%)  
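- The highest-rate airports listed are small or regional fields, where a handful of delays moves the rate sharply  
- Large hubs appear to run lower rates (e.g., Endeavor Air at ATL: 445 of 5,985 arrivals, about 7.4%), though this was not tested systematically  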

### 4. ⚠️ Delay Distribution Patterns
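- The highest delay *rates* occur at smaller carriers and airports, while delay *counts* track flight volume (correlation of 0.93 between `arr_flights` and `arr_del15`)  
- Regional airports with limited infrastructure or high weather exposure may account for the most extreme rates  
- JetBlue and Allegiant have the highest mean delay rates; their share of total delays was not computed here  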

### 5. 📈 Operational Insights
- Smaller/low-cost carriers face higher operational challenges vs. major airlines  
- Hub airports manage large volumes better than small airports with fewer flights  
- Delay rates vary widely: the worst carrier (27.35%) is about 1.6 times the overall rate of 17.10%, and the worst airport (41.86%) about 2.4 times  
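
The comparison against the system average can be reproduced from figures printed earlier in the notebook. Note that this is only a rough check: the carrier and airport rates are unweighted means over rows with at least 100 arrivals, while the overall rate is flight-weighted, so the ratios are indicative rather than exact.

```python
overall_rate = 170_477 / 997_120   # total delayed / total flights
worst_carrier_rate = 0.273524      # JetBlue Airways, from the ranking output
worst_airport_rate = 0.418605      # Concord Padgett Regional, from the ranking output

carrier_ratio = worst_carrier_rate / overall_rate
airport_ratio = worst_airport_rate / overall_rate

print(f"Worst carrier vs. overall: {carrier_ratio:.2f}x")
print(f"Worst airport vs. overall: {airport_ratio:.2f}x")
```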

### 6. 🎯 Recommendations for Improvement
- Focus on operational reliability for regional/low-cost carriers (JetBlue, Allegiant)  
- Infrastructure/ATC support for regional airports with 30%+ delay rates  
- Prioritize predictive scheduling and turnaround efficiency  
- Investigate geographic/weather-specific mitigation strategies (e.g., Aspen)  

### 7. 🔍 Areas for Further Analysis
- Seasonal delay patterns (summer vs. winter)  
- Time-of-day effects (morning vs. evening flights)  
- Route-specific performance (origin–destination pairs)  
- Delay cause breakdown (carrier vs. weather vs. NAS vs. late aircraft)  
- Cost and passenger impact of delays at high-delay airports  

---

## 📝 Summary
In this dataset (December 2019 and December 2020), **17.1%** of arrivals were delayed by 15 minutes or more, but certain airlines and airports significantly exceed this benchmark.  
JetBlue, Allegiant, and regional airports like Concord and Redding show the highest delay rates.  
Improvements in carrier operations and regional airport infrastructure could reduce disruptions at these outliers, though the effect on system-wide totals depends on how much traffic they carry.
