# Does weather affect flights?

The dataset is taken from [a DataCamp lab](https://projects.datacamp.com/projects/1962).

- `flights_weather2022.csv` contains flight information together with weather conditions. Some of its variables are described below:
 
| Variable    | Description                                                                      |
|-------------|----------------------------------------------------------------------------------|
| `dep_time`  | Departure time (in the format hhmm), where `NA` corresponds to a canceled flight |
| `dep_delay` | Departure delay, in minutes (negative for early)                                 |
| `origin`    | Origin airport where the flight starts (IATA code)                               |
| `airline`   | Carrier/airline name                                                             |
| `dest`      | Destination airport where the flight lands (IATA code)                           |
| `visib`     | Visibility (in miles)                                                            |
| `wind_gust` | Wind gust speed (in mph)                                                         |

The goal of this notebook is to check whether the weather (wind, temperature, humidity, visibility, pressure, precipitation) has an impact on flight cancellations and delayed departures.

In [None]:
import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns 
import numpy as np 

df = pd.read_csv('../data/flights_weather2022.csv')

df.head()

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
df['canceled'] = df['dep_time'].isna()
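
As a quick illustration of the flag above, the sketch below uses a small made-up frame (the values are hypothetical, not taken from the dataset) to show how `isna()` turns missing departure times into a boolean column, and how the mean of that column gives the cancellation rate.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two of the five flights have no departure time
toy = pd.DataFrame({"dep_time": [1530.0, np.nan, 845.0, np.nan, 2210.0]})

# A missing departure time marks the flight as canceled
toy["canceled"] = toy["dep_time"].isna()

# The mean of a boolean column is the share of True values
cancel_rate = toy["canceled"].mean()
print(f"Cancellation rate: {cancel_rate:.0%}")
```

On the real data, the same `df['canceled'].mean()` call would give the overall share of canceled flights.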

In [None]:
df.columns

In [None]:

columns = ['temp', 'dewp', 'humid', 'wind_dir', 'wind_speed',
       'wind_gust', 'precip', 'pressure', 'visib']

def impact_of_weather(df, in_col):
    # Set up the figure with subplots arranged in a grid of three plots per row
    num_cols = 3  # Number of plots per row
    num_rows = -(-len(columns) // num_cols)  # Calculate rows needed
    
    fig, axes = plt.subplots(num_rows, num_cols, figsize=(5 * num_cols, 5 * num_rows))
    
    # Flatten axes array for easier indexing
    axes = axes.flatten()
    
    # Loop over columns to create a boxplot for each
    for i, column in enumerate(columns):
        sns.boxplot(data=df, y=column, hue=in_col, ax=axes[i])  # Use `ax=axes[i]` for each subplot
        axes[i].set_title(f'Boxplot of {column}')
    
    # Hide any unused subplots
    for j in range(i + 1, len(axes)):
        axes[j].axis('off')
    
    plt.tight_layout()  # Adjust layout to prevent overlap
    plt.show()


impact_of_weather(df,'canceled')
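
The row count in `impact_of_weather` relies on a ceiling-division idiom, `-(-n // d)`. The short sketch below (the helper name `ceil_div` is only for illustration) shows how it behaves for the nine weather columns used here and for a hypothetical tenth one.

```python
import math

def ceil_div(n, d):
    # Floor-dividing the negated numerator and negating again rounds up
    return -(-n // d)

# Nine weather columns at three plots per row fill exactly three rows
rows_for_nine = ceil_div(9, 3)
# A tenth column would spill into a fourth row
rows_for_ten = ceil_div(10, 3)
print(rows_for_nine, rows_for_ten)
```

This matches `math.ceil(n / d)` for positive integers while staying in integer arithmetic.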

In [None]:
df['delayed'] = df['dep_delay']>0
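
One detail worth keeping in mind: if `dep_delay` is missing for canceled flights (an assumption about the dataset, not verified here), the comparison `> 0` evaluates to `False` for those rows, so they end up in the "not delayed" group. A minimal sketch on made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the NaN row stands for a canceled flight
toy = pd.DataFrame({"dep_delay": [12.0, -3.0, np.nan, 0.0, 45.0]})

# NaN > 0 evaluates to False, so the canceled flight is labeled "not delayed"
toy["delayed"] = toy["dep_delay"] > 0

# One possible alternative: keep only flights that actually departed
departed = toy.dropna(subset=["dep_delay"])
print(toy["delayed"].tolist())
print(len(departed))
```

Whether to exclude canceled flights from the delay comparison is a design choice; the sketch only shows how to do it if desired.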

In [None]:
impact_of_weather(df,'delayed')

In [None]:
sns.kdeplot(df.loc[df['canceled'], 'visib'])

In [None]:
sns.kdeplot(df.loc[~df['canceled'], 'visib'])

In [None]:
from scipy.stats import mannwhitneyu

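# Split visibility by cancellation status; drop missing values so they
# cannot propagate into the test statistic
visib_canceled = df.loc[df['canceled'], 'visib'].dropna()
visib_not_canceled = df.loc[~df['canceled'], 'visib'].dropna()

# Mann-Whitney U test: do the two visibility distributions differ?
stat, p_value = mannwhitneyu(visib_canceled, visib_not_canceled, alternative='two-sided')
print("Mann-Whitney U Test: p-value =", p_value)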

With such a small p-value, we can conclude that visibility differs significantly between canceled and non-canceled flights, i.e. visibility has a statistically significant association with flight cancellations.
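
With a large number of flights, even a tiny difference can yield a very small p-value, so it can help to look at an effect size as well. The sketch below runs on synthetic samples (not the real visibility data) and assumes SciPy 1.7 or later, where `mannwhitneyu` returns the U statistic of the first sample. It derives the common-language effect size and the rank-biserial correlation from U.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Synthetic stand-ins for the two visibility samples (not real data)
group_a = rng.normal(loc=8.0, scale=2.0, size=500)
group_b = rng.normal(loc=9.0, scale=2.0, size=5000)

u_stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")

# U / (n1 * n2) estimates P(value from A > value from B), ties counting half
cles = u_stat / (len(group_a) * len(group_b))
# Rank-biserial correlation rescales it to the range [-1, 1]
rank_biserial = 2 * cles - 1
print(f"p = {p_value:.3g}, CLES = {cles:.3f}, r = {rank_biserial:.3f}")
```

A value of `cles` close to 0.5 (or `rank_biserial` close to 0) would indicate that the two groups overlap heavily despite a small p-value.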

In [None]:

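# Split visibility by delay status; drop missing values so they
# cannot propagate into the test statistic
visib_delayed = df.loc[df['delayed'], 'visib'].dropna()
visib_not_delayed = df.loc[~df['delayed'], 'visib'].dropna()

# Mann-Whitney U test: do the two visibility distributions differ?
stat, p_value = mannwhitneyu(visib_delayed, visib_not_delayed, alternative='two-sided')
print("Mann-Whitney U Test: p-value =", p_value)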

Visibility seems to have an impact on delays too.
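
The same test could be repeated for the other weather columns. The following is a minimal sketch, not part of the original analysis: the helper name `mannwhitney_by_flag` is hypothetical, and the demo frame is synthetic rather than the real dataset.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu

def mannwhitney_by_flag(data, value_cols, flag_col):
    """Run a two-sided Mann-Whitney U test per column, split by a boolean flag."""
    rows = []
    for col in value_cols:
        # Drop missing values so they do not propagate into the statistic
        flagged = data.loc[data[flag_col], col].dropna()
        unflagged = data.loc[~data[flag_col], col].dropna()
        stat, p = mannwhitneyu(flagged, unflagged, alternative="two-sided")
        rows.append({"column": col, "U": stat, "p_value": p})
    return pd.DataFrame(rows)

# Synthetic demo data (not the real dataset)
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "visib": rng.uniform(0, 10, 400),
    "wind_gust": rng.uniform(0, 40, 400),
    "canceled": rng.random(400) < 0.3,
})
result = mannwhitney_by_flag(demo, ["visib", "wind_gust"], "canceled")
print(result)
```

On the notebook's data, this would be called as `mannwhitney_by_flag(df, columns, 'canceled')` or with `'delayed'`. Since several columns are tested at once, some correction for multiple comparisons may be appropriate when reading the p-values.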