## Choosing Variables
I want to focus on the number of arriving flights delayed by 15 minutes or more (`arr_del15`), because that is the point where customers start getting frustrated. It is also worth looking at the individual delay causes and at flight counts to see how each contributes to overall performance. Finally, canceled and diverted flights help capture larger disruptions.

In [None]:
delay_var = [
    'carrier_delay',
    'weather_delay',
    'nas_delay',
    'security_delay',
    'late_aircraft_delay'
]

print(delays_df[delay_var].describe())

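plt.figure(figsize=(15,6))

# One boxplot per delay cause, so the grid needs len(delay_var) columns
for i, col in enumerate(delay_var, 1):
    plt.subplot(1, len(delay_var), i)
    # Cap at 7200 minutes (5 days), then convert minutes to hours
    sns.boxplot(y=delays_df[col].clip(upper=7200) / 60)
    plt.title(col)
    plt.ylabel('Hours (capped at 5 days)')
plt.tight_layout()
plt.show()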
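The boxplots show the spread of each delay cause, but not how much each cause contributes to the total. A minimal sketch of that comparison follows, assuming the five delay columns are all in the same unit. The `toy_df` frame is a made-up stand-in for `delays_df`, so the numbers are illustrative only.

```python
import pandas as pd

delay_var = [
    'carrier_delay',
    'weather_delay',
    'nas_delay',
    'security_delay',
    'late_aircraft_delay'
]

# Toy stand-in for delays_df; values are invented for illustration.
toy_df = pd.DataFrame({
    'carrier_delay':       [120, 300, 80],
    'weather_delay':       [0, 60, 40],
    'nas_delay':           [90, 150, 60],
    'security_delay':      [0, 0, 0],
    'late_aircraft_delay': [190, 390, 120],
})

# Total delay per cause, then each cause's share of all delay time.
cause_totals = toy_df[delay_var].sum()
cause_share = (cause_totals / cause_totals.sum()).sort_values(ascending=False)

print(cause_share.round(3))
```

Swapping `toy_df` for `delays_df` should give the same breakdown on the real data, provided the columns contain no unexpected missing values.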

In [None]:
carrier_delays = delays_df.groupby('carrier')['arr_del15'].median().sort_values(ascending=False)

plt.figure(figsize=(10,8))
sns.barplot(x=carrier_delays.index, y=carrier_delays.values)
plt.xlabel("Carrier")
plt.ylabel("Median Number of Delayed Arrivals (15+ min)")
plt.title("Median Number of Delayed Arrivals by Carrier")
plt.tight_layout()
plt.show()
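Raw delay counts tend to favor small carriers, since a carrier with more flights will usually have more delayed flights. One possible way to account for flight counts is to compare the share of arrivals that were delayed. The sketch below assumes a flight-count column named `arr_flights`, which does not appear in the cells above and may be named differently in the actual data. The carrier codes and values are hypothetical.

```python
import pandas as pd

# Toy stand-in for delays_df; 'arr_flights' is an assumed column name
# and the carrier codes 'X1' and 'X2' are hypothetical.
toy_df = pd.DataFrame({
    'carrier':     ['X1', 'X1', 'X2', 'X2'],
    'arr_flights': [1000, 500, 100, 100],
    'arr_del15':   [150, 150, 40, 20],
})

# Sum flights and delayed flights per carrier, then take the ratio.
totals = toy_df.groupby('carrier')[['arr_flights', 'arr_del15']].sum()
delay_rate = (totals['arr_del15'] / totals['arr_flights']).sort_values(ascending=False)

print(delay_rate)
```

Here `X1` has more delayed flights in absolute terms, but `X2` has the higher delay rate, which is the kind of difference the raw counts can hide.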

In [None]:
# Use arr_del15 so the series matches the axis label and title below
monthly_delays = delays_df.groupby(['year', 'month'])['arr_del15'].median()

monthly_delays.plot(figsize=(15,8))
plt.xlabel("Year, Month")
plt.ylabel("Median Number of Delayed Arrivals (15+ min)")
plt.title("Median Number of Delayed Arrivals Over Time")
plt.margins(x=0)
plt.show()