# Analysis of Flight Arrival Delays
## Author: Ranil Rai

## Introduction

This Jupyter Notebook analyzes flight arrival data for two airlines, ALASKA and AMWEST, to compare their on-time performance and delay patterns across five destinations. Using pandas, we transcribed the dataset from a chart into a CSV file and compared the delay rates of the two airlines. The results give a clearer picture of each airline's punctuality and can help travelers and stakeholders make informed decisions.

The dataset includes the number of on-time and delayed arrivals, providing a basis for calculating the delay rate. Our primary objective is to identify which airline has a higher rate of delays and to present our findings in a clear, concise manner.


# Flight Delay Data Creation

In this section, we create a DataFrame from the flight delay data for ALASKA and AMWEST airlines. The data includes the number of on-time and delayed flights to five destinations: Los Angeles, Phoenix, San Diego, San Francisco, and Seattle.

We then convert this DataFrame into CSV format and save it as 'flight_delays.csv'. This CSV file will be used for further analysis to compare the arrival delays between the two airlines.
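The dataset is stored in a wide layout, with one column per destination. If a long (tidy) layout with one row per airline, status, and destination were preferred, `DataFrame.melt` can produce it. The following is a minimal sketch; the column names `Destination` and `Flights` are our own illustrative choices, not part of the original data.

```python
import pandas as pd

# Same counts as the notebook's dataset (wide layout: one column per destination)
data = {
    'Airline': ['ALASKA', 'ALASKA', 'AMWEST', 'AMWEST'],
    'Status': ['on time', 'delayed', 'on time', 'delayed'],
    'Los Angeles': [497, 62, 694, 117],
    'Phoenix': [221, 12, 4840, 415],
    'San Diego': [212, 20, 383, 65],
    'San Francisco': [503, 102, 320, 129],
    'Seattle': [1841, 305, 201, 61],
}
wide = pd.DataFrame(data)

# Reshape to long layout: one row per (airline, status, destination).
# 'Destination' and 'Flights' are illustrative column names.
long_df = wide.melt(
    id_vars=['Airline', 'Status'],
    var_name='Destination',
    value_name='Flights',
)

print(long_df.head())
```

The long layout makes grouping by destination a one-liner, at the cost of being less readable than the original chart-style table.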


In [2]:
```python
import pandas as pd

# Data from the project image
data = {
    'Airline': ['ALASKA', 'ALASKA', 'AMWEST', 'AMWEST'],
    'Status': ['on time', 'delayed', 'on time', 'delayed'],
    'Los Angeles': [497, 62, 694, 117],
    'Phoenix': [221, 12, 4840, 415],
    'San Diego': [212, 20, 383, 65],
    'San Francisco': [503, 102, 320, 129],
    'Seattle': [1841, 305, 201, 61]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Save the DataFrame as CSV, without writing the integer index as a column
df.to_csv('flight_delays.csv', index=False)
```


## Load and Verify Data

The CSV file `flight_delays.csv` has been created and saved. Now, we will load this file back into a pandas DataFrame to ensure that the data has been saved correctly. This step mimics the process of reading a dataset from a file, which is a common practice in data analysis workflows.
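Displaying `df.head()` is a visual check. A stricter sketch compares the original and re-read frames with `pandas.testing.assert_frame_equal`, which raises an error if values, dtypes, columns, or index differ. Writing to a temporary directory is our own choice here, so the example does not overwrite the notebook's file.

```python
import os
import tempfile

import pandas as pd

data = {
    'Airline': ['ALASKA', 'ALASKA', 'AMWEST', 'AMWEST'],
    'Status': ['on time', 'delayed', 'on time', 'delayed'],
    'Los Angeles': [497, 62, 694, 117],
    'Phoenix': [221, 12, 4840, 415],
    'San Diego': [212, 20, 383, 65],
    'San Francisco': [503, 102, 320, 129],
    'Seattle': [1841, 305, 201, 61],
}
original = pd.DataFrame(data)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'flight_delays.csv')
    # index=False keeps an extra unnamed index column out of the file
    original.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# Raises AssertionError if the round trip changed anything
pd.testing.assert_frame_equal(original, reloaded)
print("Round trip OK:", reloaded.shape)
```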


In [3]:
```python
# Read the CSV file back into a DataFrame
df = pd.read_csv('flight_delays.csv')

# Display the DataFrame to verify it's loaded correctly
df.head()
```


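|   | Airline | Status | Los Angeles | Phoenix | San Diego | San Francisco | Seattle |
|---|---------|--------|-------------|---------|-----------|---------------|---------|
| 0 | ALASKA | on time | 497 | 221 | 212 | 503 | 1841 |
| 1 | ALASKA | delayed | 62 | 12 | 20 | 102 | 305 |
| 2 | AMWEST | on time | 694 | 4840 | 383 | 320 | 201 |
| 3 | AMWEST | delayed | 117 | 415 | 65 | 129 | 61 |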


In [4]:
```python
# Destination columns that hold flight counts
destinations = ['Los Angeles', 'Phoenix', 'San Diego', 'San Francisco', 'Seattle']

# Total flights per row (one airline/status pair), summed across all destinations
df['Total Flights'] = df[destinations].sum(axis=1)

# Total flights (on time + delayed) for each airline
total_flights_alaska = df[df['Airline'] == 'ALASKA']['Total Flights'].sum()
total_flights_amwest = df[df['Airline'] == 'AMWEST']['Total Flights'].sum()

# Total delayed flights for each airline
delayed_flights_alaska = df[(df['Airline'] == 'ALASKA') & (df['Status'] == 'delayed')]['Total Flights'].sum()
delayed_flights_amwest = df[(df['Airline'] == 'AMWEST') & (df['Status'] == 'delayed')]['Total Flights'].sum()

# Delay rate = delayed flights / total flights
delay_rate_alaska = delayed_flights_alaska / total_flights_alaska
delay_rate_amwest = delayed_flights_amwest / total_flights_amwest

# Print out the results
print(f"Total flights (ALASKA): {total_flights_alaska}")
print(f"Total delayed flights (ALASKA): {delayed_flights_alaska}")
print(f"Delay rate (ALASKA): {delay_rate_alaska:.2%}")

print(f"Total flights (AMWEST): {total_flights_amwest}")
print(f"Total delayed flights (AMWEST): {delayed_flights_amwest}")
print(f"Delay rate (AMWEST): {delay_rate_amwest:.2%}")
```


```
Total flights (ALASKA): 3775
Total delayed flights (ALASKA): 501
Delay rate (ALASKA): 13.27%
Total flights (AMWEST): 7225
Total delayed flights (AMWEST): 787
Delay rate (AMWEST): 10.89%
```


## Flight Delay Analysis

We have calculated the total number of flights and the number of delayed flights for each airline to compare their performance in terms of delay rates. The delay rate is calculated as the proportion of delayed flights out of the total number of flights.
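Written as a formula, with the per-airline totals obtained by summing the five destination columns (ALASKA: 3,274 on time and 501 delayed; AMWEST: 6,438 on time and 787 delayed):

$$
\text{delay rate} = \frac{\text{delayed}}{\text{on time} + \text{delayed}}
$$

$$
\text{ALASKA: } \frac{501}{3274 + 501} = \frac{501}{3775} \approx 0.1327,
\qquad
\text{AMWEST: } \frac{787}{6438 + 787} = \frac{787}{7225} \approx 0.1089
$$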

Based on our calculations:
- **ALASKA** had a total of **3,775** flights, with **501** of them being delayed, resulting in a delay rate of **13.27%**.
- **AMWEST** had a total of **7,225** flights, with **787** of them being delayed, resulting in a delay rate of **10.89%**.
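The same figures can be reproduced without filtering each airline by hand. The following is a minimal sketch using `groupby` and `unstack`; the names `flights` and `counts` and the added `total` and `delay_rate` columns are our own.

```python
import pandas as pd

data = {
    'Airline': ['ALASKA', 'ALASKA', 'AMWEST', 'AMWEST'],
    'Status': ['on time', 'delayed', 'on time', 'delayed'],
    'Los Angeles': [497, 62, 694, 117],
    'Phoenix': [221, 12, 4840, 415],
    'San Diego': [212, 20, 383, 65],
    'San Francisco': [503, 102, 320, 129],
    'Seattle': [1841, 305, 201, 61],
}
df = pd.DataFrame(data)
destinations = ['Los Angeles', 'Phoenix', 'San Diego', 'San Francisco', 'Seattle']

# Sum each row across destinations, then total by airline and status;
# unstack turns the Status level into 'delayed' and 'on time' columns
flights = df[destinations].sum(axis=1)
counts = flights.groupby([df['Airline'], df['Status']]).sum().unstack('Status')

# Delay rate = delayed / (on time + delayed), one row per airline
counts['total'] = counts['on time'] + counts['delayed']
counts['delay_rate'] = counts['delayed'] / counts['total']

print(counts)
print(counts['delay_rate'].map('{:.2%}'.format))
```

Because the grouping is driven by the data, this version would also handle additional airlines without new variables.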

This preliminary analysis indicates that AMWEST, despite operating nearly twice as many flights, had a lower overall percentage of delays than ALASKA. This could suggest that AMWEST is more efficient at managing its schedule and minimizing delays. However, these are aggregate figures pooled across all five destinations, and the two airlines' route mixes differ sharply: 5,255 of AMWEST's 7,225 flights arrive in Phoenix, while 2,146 of ALASKA's 3,775 flights arrive in Seattle. Further analysis is needed to understand the context of these delays, including the destination mix, weather conditions, logistical challenges, and other factors that could influence these figures.
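The aggregate rates mix together all five destinations, so they also reflect where each airline flies. A minimal sketch of a destination-level comparison computes the delay rate for each airline within each city; the intermediate names (`long_df`, `by_dest`, `rates`) are our own.

```python
import pandas as pd

data = {
    'Airline': ['ALASKA', 'ALASKA', 'AMWEST', 'AMWEST'],
    'Status': ['on time', 'delayed', 'on time', 'delayed'],
    'Los Angeles': [497, 62, 694, 117],
    'Phoenix': [221, 12, 4840, 415],
    'San Diego': [212, 20, 383, 65],
    'San Francisco': [503, 102, 320, 129],
    'Seattle': [1841, 305, 201, 61],
}
df = pd.DataFrame(data)
destinations = ['Los Angeles', 'Phoenix', 'San Diego', 'San Francisco', 'Seattle']

# Long layout: one row per airline, status, and destination
long_df = df.melt(id_vars=['Airline', 'Status'], var_name='Destination', value_name='Flights')

# One row per (destination, airline) with 'delayed' and 'on time' columns
by_dest = long_df.pivot_table(
    index=['Destination', 'Airline'], columns='Status', values='Flights', aggfunc='sum'
)
by_dest['delay_rate'] = by_dest['delayed'] / (by_dest['delayed'] + by_dest['on time'])

# One row per destination, one column per airline
rates = by_dest['delay_rate'].unstack('Airline')
print(rates.round(3))
```

If the per-destination ranking differs from the aggregate one (a pattern known as Simpson's paradox), the difference in route mix, rather than operational efficiency, is a likely explanation for the aggregate result.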


## Conclusion

In conclusion, based on the aggregate delay rates alone, AMWEST shows better performance in managing flight schedules and minimizing delays. This analysis can serve as a starting point for further investigation into the causes of delays, which could help airlines make targeted improvements to their service.
