## Root cause analysis (RCA) of flight stats during Hurricane Sandy

![Hurricane Sandy](hurricane-sandy-nasa-image.jpg)

Hurricane Sandy hit the northeast coast of the United States on the 29th of October 2012 and dissipated on the 2nd of November 2012. The most affected cities were New York, Philadelphia, Boston, and Washington D.C. We therefore analyze the following airports:

**New York City:**
* John F. Kennedy International Airport (JFK)
* LaGuardia Airport (LGA)
* Newark Liberty International Airport (EWR)

**Philadelphia:**
* Philadelphia International Airport (PHL)

**Boston:**
* Logan International Airport (BOS)

**Washington D.C.:**
* Ronald Reagan Washington National Airport (DCA)
* Washington Dulles International Airport (IAD)
* Baltimore/Washington International Thurgood Marshall Airport (BWI)

### Hypotheses: 
1. Hurricane Sandy caused a drastic increase in flight cancellations from the 29th of October 2012 to the 2nd of November 2012 at the following airports: JFK, LGA, EWR, PHL, BOS, DCA, IAD, BWI.
2. On the 29th of October 2012, most flights scheduled to land at JFK, LGA, EWR, PHL, BOS, DCA, IAD, and BWI were diverted.
3. Comparing weather and flight data from 2011 and 2012 will show that Hurricane Sandy was the primary cause of the large number of flight cancellations in 2012.

**To be found:**
* What percentage of flights were actually cancelled?
* What's the cutoff wind speed (wspd) and precipitation (prcp) that leads to flight delays or cancellations?
    * 30-35 km/h is the cutoff, so plot the cancellations in relation to this wind speed.
* Does the combination of extreme temperatures and precipitation affect cancellations?  
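
The first two questions can be approached with a small pandas sketch. The `flights` frame, its column names (`flight_date`, `origin`, `cancelled`) and all sample values below are hypothetical placeholders, not the real flight data. The `wspd` column follows the Meteostat daily column name used later in this notebook, and the 30 km/h threshold is the lower bound of the cutoff named above.

```python
import pandas as pd

# Hypothetical flight records: one row per flight, cancelled is 0 or 1.
flights = pd.DataFrame({
    "flight_date": pd.to_datetime(
        ["2012-10-28", "2012-10-28", "2012-10-29", "2012-10-29"]
    ),
    "origin": ["JFK", "JFK", "JFK", "JFK"],
    "cancelled": [0, 0, 1, 1],
})

# Hypothetical daily weather rows, shaped like the Meteostat data (wspd in km/h).
weather = pd.DataFrame({
    "date": pd.to_datetime(["2012-10-28", "2012-10-29"]),
    "airport_code": ["JFK", "JFK"],
    "wspd": [22.0, 48.0],
})

WIND_CUTOFF_KMH = 30  # assumption: lower bound of the 30-35 km/h cutoff

# Count total and cancelled flights per airport and day.
daily = (
    flights.groupby(["origin", "flight_date"], as_index=False)
    .agg(total_flights=("cancelled", "size"), cancelled_flights=("cancelled", "sum"))
)
daily["cancelled_pct"] = 100 * daily["cancelled_flights"] / daily["total_flights"]

# Attach the weather of the same airport and day, then flag days above the cutoff.
daily = daily.merge(
    weather,
    left_on=["origin", "flight_date"],
    right_on=["airport_code", "date"],
    how="left",
)
daily["above_cutoff"] = daily["wspd"] >= WIND_CUTOFF_KMH

print(daily[["origin", "flight_date", "cancelled_pct", "wspd", "above_cutoff"]])
```

Plotting `cancelled_pct` against `wspd`, with a vertical line at the cutoff, would then show whether cancellations cluster above it.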


In [4]:
import pandas as pd
import requests
from dotenv import load_dotenv
import os
import matplotlib.pyplot as plt
import seaborn as sns
import sqlalchemy
import time
import json
load_dotenv()

True

### Weather data - 2012

In [None]:
#Weather data for 2012
#API URL and headers
url = 'https://meteostat.p.rapidapi.com/point/daily'
headers = {
   "x-rapidapi-host": 'meteostat.p.rapidapi.com',
   "x-rapidapi-key": os.getenv('x-rapidapi-key')  # Ensure this environment variable is set
}

#airports and their coordinates
airports = {
    "JFK": {"lat": 40.6413, "lon": -73.7781},  # John F. Kennedy International Airport
    "LGA": {"lat": 40.7769, "lon": -73.8740},  # LaGuardia Airport
    "EWR": {"lat": 40.6895, "lon": -74.1745},  # Newark Liberty International Airport
    "PHL": {"lat": 39.8729, "lon": -75.2437},  # Philadelphia International Airport
    "BOS": {"lat": 42.3656, "lon": -71.0096},  # Boston Logan International Airport
    "DCA": {"lat": 38.8512, "lon": -77.0402},  # Ronald Reagan Washington National Airport
    "IAD": {"lat": 38.9531, "lon": -77.4565},  # Washington Dulles International Airport
    "BWI": {"lat": 39.1754, "lon": -76.6684}   # Baltimore/Washington International Thurgood Marshall Airport
}

#date range
start_date = "2012-10-01"
end_date = "2012-11-30"
weather_data_2012 = []

for airport_code, coordinates in airports.items():
    parameters = {
        "lat": coordinates["lat"],
        "lon": coordinates["lon"],
        "start": start_date,
        "end": end_date,
        "units": "metric"
    }
    time.sleep(1)  # pause between requests to avoid hitting the API rate limit
    response = requests.get(url, headers=headers, params=parameters, timeout=30)

    if response.status_code == 200:
        data = response.json()
        for daily_data in data['data']:
            daily_data['airport_code'] = airport_code
            weather_data_2012.append(daily_data)
    else:
        print(f"Error fetching data for {airport_code}: {response.status_code} - {response.text}") 

#list of dictionaries to a DataFrame
weather_df_2012 = pd.DataFrame(weather_data_2012)


print(weather_df_2012.head())
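
The request loop above could be split into small helpers so that the same logic serves any date range. This is only a sketch, not the notebook's actual code: `build_params` and `tag_rows` are hypothetical names, and the sample payload contains made-up values that merely mimic the `{"data": [...]}` shape the loop reads from the API response.

```python
import pandas as pd


def build_params(coordinates, start, end):
    """Build the query parameters for one airport and date range."""
    return {
        "lat": coordinates["lat"],
        "lon": coordinates["lon"],
        "start": start,
        "end": end,
        "units": "metric",
    }


def tag_rows(payload, airport_code):
    """Return the daily rows of one response, each tagged with its airport code."""
    return [{**row, "airport_code": airport_code} for row in payload.get("data", [])]


# Hypothetical payload with invented values, standing in for response.json().
sample_payload = {
    "data": [
        {"date": "2012-10-29", "wspd": 48.0, "prcp": 30.2},
        {"date": "2012-10-30", "wspd": 35.5, "prcp": 12.0},
    ]
}

params = build_params({"lat": 40.6413, "lon": -73.7781}, "2012-10-01", "2012-11-30")
rows = tag_rows(sample_payload, "JFK")
sample_df = pd.DataFrame(rows)

print(params)
print(sample_df)
```

Calling the two helpers once per date range would avoid repeating the whole loop for each year.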

In [None]:
#Cleaning steps for 2012
#Dropping empty columns
weather_df_2012 = weather_df_2012.drop(columns=['wpgt', 'tsun'])

#Filling missing values in wspd with 0 (assign back instead of a chained inplace call)
weather_df_2012['wspd'] = weather_df_2012['wspd'].fillna(0)

#Convert date column to datetime
weather_df_2012['date'] = pd.to_datetime(weather_df_2012['date']) 
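
Before dropping or filling columns, it can help to check how much is actually missing. The following is a minimal sketch on a made-up frame: the values are invented, and only the column names `wspd`, `wpgt` and `tsun` come from the cleaning cell above.

```python
import numpy as np
import pandas as pd

# Made-up sample with the same column names as the weather frame.
sample = pd.DataFrame({
    "date": ["2012-10-28", "2012-10-29", "2012-10-30"],
    "wspd": [22.0, np.nan, 35.5],
    "wpgt": [np.nan, np.nan, np.nan],
    "tsun": [np.nan, np.nan, np.nan],
})

# Fraction of missing values per column.
missing_share = sample.isna().mean()
print(missing_share)

# Columns that contain no data at all are candidates for dropping.
empty_columns = missing_share[missing_share == 1.0].index.tolist()
cleaned = sample.drop(columns=empty_columns)

# Same steps as the cleaning cell: fill wspd gaps with 0, parse dates.
cleaned["wspd"] = cleaned["wspd"].fillna(0)
cleaned["date"] = pd.to_datetime(cleaned["date"])
print(cleaned.dtypes)
```

Filling `wspd` with 0 records a calm day where the reading is simply unknown, which may matter when searching for a wind-speed cutoff.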

In [None]:
# Write records stored in a dataframe to SQL database
table_name = 'weather_data_2012'
schema = 'cgn_analytics_24_3'
engine = get_engine()  # get_engine() is not defined in this notebook; it is assumed to come from a helper module

if engine is not None:
    try:
        weather_df_2012.to_sql(table_name, # Name of SQL table
                        con=engine, # Engine or connection
                        if_exists='replace', # Drop the table before inserting new values 
                        schema=schema, # your class schema
                        index=False, # Do not write the DataFrame index as a column
                        chunksize=5000, # Specify the number of rows in each batch to be written at a time
                        method='multi') # Pass multiple values in a single INSERT clause
        print(f"The {table_name} table was imported successfully.")
    # Error handling
    except Exception as error:  # Exception already covers psycopg2.DatabaseError, which is never imported here
        print(error)
        engine = None
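
After writing, reading the row count back is a cheap sanity check. The sketch below uses an in-memory SQLite engine purely as a stand-in so that it runs without credentials. The notebook itself writes to the database returned by `get_engine()` and passes a schema, which this SQLite example leaves out. The sample rows are invented.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# In-memory SQLite engine as a stand-in for the real database connection.
engine = create_engine("sqlite://")

# Invented rows with a few of the weather columns.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2012-10-29", "2012-10-30"]),
    "airport_code": ["JFK", "JFK"],
    "wspd": [48.0, 35.5],
})

table_name = "weather_data_2012"
sample.to_sql(table_name, con=engine, if_exists="replace", index=False)

# Read the row count back and compare it with the DataFrame length.
with engine.connect() as connection:
    row_count = connection.execute(
        text(f"SELECT COUNT(*) FROM {table_name}")
    ).scalar()

print(f"{row_count} rows written, {len(sample)} expected")
```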

### Weather data - 2011

In [None]:
#Weather data for 2011
#API URL and headers
url = 'https://meteostat.p.rapidapi.com/point/daily'
headers = {
   "x-rapidapi-host": 'meteostat.p.rapidapi.com',
   "x-rapidapi-key": os.getenv('x-rapidapi-key')  # Ensure this environment variable is set
}

#airports and their coordinates
airports = {
    "JFK": {"lat": 40.6413, "lon": -73.7781},  # John F. Kennedy International Airport
    "LGA": {"lat": 40.7769, "lon": -73.8740},  # LaGuardia Airport
    "EWR": {"lat": 40.6895, "lon": -74.1745},  # Newark Liberty International Airport
    "PHL": {"lat": 39.8729, "lon": -75.2437},  # Philadelphia International Airport
    "BOS": {"lat": 42.3656, "lon": -71.0096},  # Boston Logan International Airport
    "DCA": {"lat": 38.8512, "lon": -77.0402},  # Ronald Reagan Washington National Airport
    "IAD": {"lat": 38.9531, "lon": -77.4565},  # Washington Dulles International Airport
    "BWI": {"lat": 39.1754, "lon": -76.6684}   # Baltimore/Washington International Thurgood Marshall Airport
}

#date range
start_date = "2011-10-01"
end_date = "2011-11-30"
weather_data_2011 = []

for airport_code, coordinates in airports.items():
    parameters = {
        "lat": coordinates["lat"],
        "lon": coordinates["lon"],
        "start": start_date,
        "end": end_date,
        "units": "metric"
    }
    time.sleep(1)  # pause between requests to avoid hitting the API rate limit
    response = requests.get(url, headers=headers, params=parameters, timeout=30)

    if response.status_code == 200:
        data = response.json()
        for daily_data in data['data']:
            daily_data['airport_code'] = airport_code
            weather_data_2011.append(daily_data)
    else:
        print(f"Error fetching data for {airport_code}: {response.status_code} - {response.text}") 

#list of dictionaries to a DataFrame
weather_df_2011 = pd.DataFrame(weather_data_2011)

In [None]:
#Cleaning steps for 2011
#Dropping empty columns
weather_df_2011 = weather_df_2011.drop(columns=['wpgt', 'tsun'])

#Filling missing values in wspd with 0 (assign back instead of a chained inplace call)
weather_df_2011['wspd'] = weather_df_2011['wspd'].fillna(0)

#Convert date column to datetime
weather_df_2011['date'] = pd.to_datetime(weather_df_2011['date'])  

In [None]:
# Write records stored in a dataframe to SQL database
table_name = 'weather_data_2011'
schema = 'cgn_analytics_24_3'
engine = get_engine()  # get_engine() is not defined in this notebook; it is assumed to come from a helper module

if engine is not None:
    try:
        weather_df_2011.to_sql(table_name, # Name of SQL table
                        con=engine, # Engine or connection
                        if_exists='replace', # Drop the table before inserting new values 
                        schema=schema, # your class schema
                        index=False, # Do not write the DataFrame index as a column
                        chunksize=5000, # Specify the number of rows in each batch to be written at a time
                        method='multi') # Pass multiple values in a single INSERT clause
        print(f"The {table_name} table was imported successfully.")
    # Error handling
    except Exception as error:  # Exception already covers psycopg2.DatabaseError, which is never imported here
        print(error)
        engine = None
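
With both years available, the third hypothesis calls for lining up the same calendar days across 2011 and 2012. The following is one possible sketch, not the notebook's method: the helper `add_month_day` is a hypothetical name and the wind speeds are invented. The real inputs would be `weather_df_2011` and `weather_df_2012`.

```python
import pandas as pd

# Invented values standing in for weather_df_2011 and weather_df_2012.
w2011 = pd.DataFrame({
    "date": pd.to_datetime(["2011-10-29", "2011-10-30"]),
    "airport_code": ["JFK", "JFK"],
    "wspd": [20.0, 15.0],
})
w2012 = pd.DataFrame({
    "date": pd.to_datetime(["2012-10-29", "2012-10-30"]),
    "airport_code": ["JFK", "JFK"],
    "wspd": [48.0, 35.5],
})


def add_month_day(df):
    """Return a copy with a year-independent 'MM-DD' key column."""
    out = df.copy()
    out["month_day"] = out["date"].dt.strftime("%m-%d")
    return out


# Pair each 2011 row with the 2012 row of the same airport and calendar day.
comparison = add_month_day(w2011).merge(
    add_month_day(w2012),
    on=["airport_code", "month_day"],
    suffixes=("_2011", "_2012"),
)
comparison["wspd_diff"] = comparison["wspd_2012"] - comparison["wspd_2011"]

print(comparison[["airport_code", "month_day", "wspd_2011", "wspd_2012", "wspd_diff"]])
```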