In [61]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# BNA Airport Flight Delay Analysis: Pre-COVID, During-COVID, and Post-COVID Impact

## Project Overview
This capstone project analyzes flight performance at **Nashville International Airport (BNA)** with a focus on **departure delays, arrival delays, and cancellations** across major U.S. airlines. The goal is to understand how airline operations at BNA were affected **before, during, and after the COVID-19 pandemic**, and how recovery patterns varied by airline.

## Key Questions
- How did departure and arrival delays at BNA change during COVID?
- Were cancellations more volatile than delays at BNA?
- Did low-cost and legacy carriers operating at BNA recover at different rates?
- Which airlines showed faster post-COVID operational stability at BNA?
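
Each of these questions depends on assigning every flight date to a pre-, during-, or post-COVID period. The snippet below is a minimal sketch of one way to do that. The cutoff dates `COVID_START` and `COVID_END` and the helper `label_period` are hypothetical and chosen only for illustration; the notebook does not state its own period boundaries at this point.

```python
import pandas as pd

# Hypothetical cutoffs for illustration only; adjust to the boundaries used in the analysis.
COVID_START = pd.Timestamp("2020-03-01")
COVID_END = pd.Timestamp("2021-06-30")


def label_period(dates: pd.Series) -> pd.Series:
    """Label each MM/DD/YYYY date string as Pre-, During-, or Post-COVID."""
    parsed = pd.to_datetime(dates, format="%m/%d/%Y")
    # Start with the middle period, then overwrite the dates outside it.
    labels = pd.Series("During-COVID", index=parsed.index)
    labels[parsed < COVID_START] = "Pre-COVID"
    labels[parsed > COVID_END] = "Post-COVID"
    return labels


# Small made-up sample in the same format as the `Date (MM/DD/YYYY)` column.
sample = pd.Series(["1/1/2018", "4/15/2020", "9/1/2022"])
periods = label_period(sample)
print(periods.tolist())
```

The resulting labels could then be used as a `groupby` key when comparing delays or cancellations across periods.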


In [91]:
departures = pd.read_csv('../data/Departure.csv', encoding='latin1', low_memory=False)

In [92]:
departures.shape

(497696, 16)

In [93]:
departures.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 497696 entries, 0 to 497695
Data columns (total 16 columns):
 #   Column                                    Non-Null Count   Dtype 
---  ------                                    --------------   ----- 
 0   Carrier Code                              497696 non-null  object
 1   Airlines                                  497696 non-null  object
 2   Date (MM/DD/YYYY)                         497696 non-null  object
 3   Flight Number                             497696 non-null  int64 
 4   Destination Airport                       497696 non-null  object
 5   Destination City                          497696 non-null  object
 6   Scheduled departure time                  497696 non-null  object
 7   Actual departure time                     497673 non-null  object
 8   Scheduled elapsed time (Minutes)          497696 non-null  int64 
 9   Actual elapsed time (Minutes)             497696 non-null  int64 
 10  Departure delay (Minutes)       

In [94]:
departures.isna().sum()

Carrier Code                                 0
Airlines                                     0
Date (MM/DD/YYYY)                            0
Flight Number                                0
Destination Airport                          0
Destination City                             0
Scheduled departure time                     0
Actual departure time                       23
Scheduled elapsed time (Minutes)             0
Actual elapsed time (Minutes)                0
Departure delay (Minutes)                    0
Delay Carrier (Minutes)                      0
Delay Weather (Minutes)                      0
Delay National Aviation System (Minutes)     0
Delay Security (Minutes)                     0
Delay Late Aircraft Arrival (Minutes)        0
dtype: int64

In [95]:
departures.head(2)

Unnamed: 0,Carrier Code,Airlines,Date (MM/DD/YYYY),Flight Number,Destination Airport,Destination City,Scheduled departure time,Actual departure time,Scheduled elapsed time (Minutes),Actual elapsed time (Minutes),Departure delay (Minutes),Delay Carrier (Minutes),Delay Weather (Minutes),Delay National Aviation System (Minutes),Delay Security (Minutes),Delay Late Aircraft Arrival (Minutes)
0,AA,American Airlines,1/1/2018,469,PHL,"Philadelphia, PA",9:24:00 AM,9:35:00 AM,124,109,11,0,0,0,0,0
1,AA,American Airlines,1/1/2018,602,DFW,"DallasFort Worth, TX",8:51:00 PM,9:13:00 PM,139,132,22,0,0,0,0,15


**Note:** A BOM (byte order mark) appeared before the first column name, so `encoding="utf-8-sig"` is used to remove it for all DataFrames.
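
To illustrate the BOM problem, here is a small sketch using an in-memory CSV rather than the project files. The sample bytes and variable names are made up for the demonstration. Decoding the BOM bytes as `latin1` turns them into three stray characters at the start of the header, while `utf-8-sig` removes them.

```python
import io

import pandas as pd

# Encoding with utf-8-sig prepends the UTF-8 BOM bytes (EF BB BF) to the data.
raw = "Carrier Code,Airlines\nAA,American Airlines\n".encode("utf-8-sig")

# latin1 maps each BOM byte to a visible character, so the header is polluted.
header_latin1 = raw.decode("latin1").splitlines()[0]

# utf-8-sig recognizes the BOM and drops it while decoding.
header_sig = raw.decode("utf-8-sig").splitlines()[0]

# The same codec passed to read_csv yields a clean first column name.
bom_demo = pd.read_csv(io.BytesIO(raw), encoding="utf-8-sig")

print(repr(header_latin1))
print(repr(header_sig))
print(list(bom_demo.columns))
```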

In [102]:
arrivals = pd.read_csv('../data/Arrivals.csv', encoding='utf-8-sig', low_memory=False)

In [103]:
arrivals.shape

(513001, 16)

In [104]:
arrivals.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 513001 entries, 0 to 513000
Data columns (total 16 columns):
 #   Column                                    Non-Null Count   Dtype 
---  ------                                    --------------   ----- 
 0   Carrier Code                              513001 non-null  object
 1   Airlines                                  513001 non-null  object
 2   Date (MM/DD/YYYY)                         513001 non-null  object
 3   Flight Number                             513001 non-null  int64 
 4   Origin Airport                            513001 non-null  object
 5   Origin City                               513001 non-null  object
 6   Scheduled Arrival Time                    513001 non-null  object
 7   Actual Arrival Time                       512744 non-null  object
 8   Scheduled Elapsed Time (Minutes)          513001 non-null  int64 
 9   Actual Elapsed Time (Minutes)             513001 non-null  int64 
 10  Arrival Delay (Minutes)         

In [105]:
origin_nulls = arrivals[arrivals['Origin City'].isna()]
origin_nulls.head()

Unnamed: 0,Carrier Code,Airlines,Date (MM/DD/YYYY),Flight Number,Origin Airport,Origin City,Scheduled Arrival Time,Actual Arrival Time,Scheduled Elapsed Time (Minutes),Actual Elapsed Time (Minutes),Arrival Delay (Minutes),Delay Carrier (Minutes),Delay Weather (Minutes),Delay National Aviation System (Minutes),Delay Security (Minutes),Delay Late Aircraft Arrival (Minutes)


In [106]:
arrivals.isna().sum()

Carrier Code                                  0
Airlines                                      0
Date (MM/DD/YYYY)                             0
Flight Number                                 0
Origin Airport                                0
Origin City                                   0
Scheduled Arrival Time                        0
Actual Arrival Time                         257
Scheduled Elapsed Time (Minutes)              0
Actual Elapsed Time (Minutes)                 0
Arrival Delay (Minutes)                       0
Delay Carrier (Minutes)                       0
Delay Weather (Minutes)                       0
Delay National Aviation System (Minutes)      0
Delay Security (Minutes)                      0
Delay Late Aircraft Arrival (Minutes)         0
dtype: int64

In [107]:
arrivals.head(2)

Unnamed: 0,Carrier Code,Airlines,Date (MM/DD/YYYY),Flight Number,Origin Airport,Origin City,Scheduled Arrival Time,Actual Arrival Time,Scheduled Elapsed Time (Minutes),Actual Elapsed Time (Minutes),Arrival Delay (Minutes),Delay Carrier (Minutes),Delay Weather (Minutes),Delay National Aviation System (Minutes),Delay Security (Minutes),Delay Late Aircraft Arrival (Minutes)
0,AA,American Airlines,1/1/2018,829,PHL,"Philadelphia, PA",9:34:00 PM,9:29:00 PM,139,129,-5,0,0,0,0,0
1,AA,American Airlines,1/1/2018,851,CLT,"Charlotte, NC",12:09:00 PM,12:05:00 PM,89,90,-4,0,0,0,0,0


In [108]:
cancellations = pd.read_csv('../data/Airlines_Cancellation.csv', encoding='utf-8-sig', low_memory=False)

In [109]:
cancellations.shape

(10370, 6)

In [110]:
cancellations.isna().sum()

Carrier Code           0
Airlines               0
Date (MM/DD/YYYY)      0
Flight_Number          0
Destination Airport    0
Destination City       0
dtype: int64

### Data Cleaning & Preparation (Excel + Python)

- The original dataset contained several missing (null) values.
- Initial data cleaning was performed in **Excel** before importing the data into Python.
- The following columns were removed in Excel due to high null counts and limited analytical value:
  - `Tail Number`
  - `Taxi-Out Time`
  - `Wheels-Off Time`
- After cleaning, the updated CSV files were imported into the Jupyter Notebook for further analysis.
- A new `City` column was created by mapping each **airport code** to its corresponding **city and state** through a lookup.
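
The column removal was done in Excel, but the same two steps can be sketched in pandas. Everything below is illustrative: the `AIRPORT_TO_CITY` dictionary is a hypothetical lookup holding only two codes taken from the sample rows, and the small `raw` frame stands in for the original export.

```python
import pandas as pd

# Hypothetical lookup table; a real one would cover every airport code in the data.
AIRPORT_TO_CITY = {
    "PHL": "Philadelphia, PA",
    "CLT": "Charlotte, NC",
}

# Tiny stand-in for the raw export, with one of the sparse columns still present.
raw = pd.DataFrame({
    "Destination Airport": ["PHL", "CLT", "XXX"],
    "Tail Number": [None, None, None],
})

# Drop the sparse columns; errors="ignore" skips names that are already absent.
cleaned = raw.drop(
    columns=["Tail Number", "Taxi-Out Time", "Wheels-Off Time"],
    errors="ignore",
)

# Series.map leaves NaN for codes missing from the lookup, which makes gaps easy to find.
cleaned["City"] = cleaned["Destination Airport"].map(AIRPORT_TO_CITY)
unmapped = cleaned.loc[cleaned["City"].isna(), "Destination Airport"].unique()

print(cleaned)
print("Codes without a city:", list(unmapped))
```

Checking `unmapped` after the lookup is a cheap way to confirm that every code received a city before moving on.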

### DataFrames Used for Analysis

- **Departure DataFrame**  
  Contains scheduled vs. actual departure times, departure delays, and the delay minutes attributed to each cause (carrier, weather, National Aviation System, security, late aircraft arrival).


In [111]:
departures.head(2)

Unnamed: 0,Carrier Code,Airlines,Date (MM/DD/YYYY),Flight Number,Destination Airport,Destination City,Scheduled departure time,Actual departure time,Scheduled elapsed time (Minutes),Actual elapsed time (Minutes),Departure delay (Minutes),Delay Carrier (Minutes),Delay Weather (Minutes),Delay National Aviation System (Minutes),Delay Security (Minutes),Delay Late Aircraft Arrival (Minutes)
0,AA,American Airlines,1/1/2018,469,PHL,"Philadelphia, PA",9:24:00 AM,9:35:00 AM,124,109,11,0,0,0,0,0
1,AA,American Airlines,1/1/2018,602,DFW,"DallasFort Worth, TX",8:51:00 PM,9:13:00 PM,139,132,22,0,0,0,0,15


- **Arrival DataFrame**  
  Includes scheduled vs. actual arrival times, arrival delays, and the same delay-cause breakdown for inbound flights.

In [112]:
arrivals.head(2)

Unnamed: 0,Carrier Code,Airlines,Date (MM/DD/YYYY),Flight Number,Origin Airport,Origin City,Scheduled Arrival Time,Actual Arrival Time,Scheduled Elapsed Time (Minutes),Actual Elapsed Time (Minutes),Arrival Delay (Minutes),Delay Carrier (Minutes),Delay Weather (Minutes),Delay National Aviation System (Minutes),Delay Security (Minutes),Delay Late Aircraft Arrival (Minutes)
0,AA,American Airlines,1/1/2018,829,PHL,"Philadelphia, PA",9:34:00 PM,9:29:00 PM,139,129,-5,0,0,0,0,0
1,AA,American Airlines,1/1/2018,851,CLT,"Charlotte, NC",12:09:00 PM,12:05:00 PM,89,90,-4,0,0,0,0,0


- **Cancellation DataFrame**  
  Lists cancelled flights with their airline, date, flight number, and destination airport and city.

In [113]:
cancellations.head(2)

Unnamed: 0,Carrier Code,Airlines,Date (MM/DD/YYYY),Flight_Number,Destination Airport,Destination City
0,AA,American Airlines,1/4/2018,469,PHL,"Philadelphia, PA"
1,AA,American Airlines,1/8/2018,1899,PHL,"Philadelphia, PA"
