# Flight Arrival Delays: ALASKA vs. AM WEST

This project analyzes arrival delay data for two airlines, ALASKA and AM WEST, across five major destinations: Los Angeles, Phoenix, San Diego, San Francisco, and Seattle. The goal is to structure the raw data into a format suitable for analysis, import it into pandas, and compare delay patterns between the two airlines. By calculating totals, delay rates, and other basic metrics, we aim to identify which airline experiences more delays overall and whether certain destinations see higher delay frequencies. The findings will help illustrate how simple data management and analysis techniques in Python can reveal meaningful insights from real-world data.


| Airline   | Status    | Los Angeles | Phoenix | San Diego | San Francisco | Seattle |
|-----------|-----------|-------------|---------|-----------|---------------|---------|
| ALASKA    | on time   | 497         | 221     | 212       | 503           | 1841    |
| ALASKA    | delayed   | 62          | 12      | 20        | 102           | 605     |
| AM WEST   | on time   | 694         | 4840    | 383       | 320           | 201     |
| AM WEST   | delayed   | 117         | 415     | 65        | 129           | 61      |



## 1. Data Summary

First, we will examine the arrival delay data to get a sense of the overall patterns. 
This includes looking at total flights, the number of on-time vs. delayed flights, 
and the general performance of each airline across the five destinations.


In [97]:
import pandas as pd

df = pd.read_csv("is362_project1.csv")
df

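|   | Airline | Status  | Los Angeles | Phoenix | San Diego | San Francisco | Seattle |
|---|---------|---------|-------------|---------|-----------|---------------|---------|
| 0 | ALASKA  | on time | 497         | 221     | 212       | 503           | 1841    |
| 1 | ALASKA  | delayed | 62          | 12      | 20        | 102           | 605     |
| 2 | AM WEST | on time | 694         | 4840    | 383       | 320           | 201     |
| 3 | AM WEST | delayed | 117         | 415     | 65        | 129           | 61      |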


In [98]:
# Total flights per airline/status across all destinations
df_total = df.copy()
df_total['Total'] = df_total.iloc[:, 2:].sum(axis=1)
df_total[['Airline', 'Status', 'Total']]



|   | Airline | Status  | Total |
|---|---------|---------|-------|
| 0 | ALASKA  | on time | 3274  |
| 1 | ALASKA  | delayed | 801   |
| 2 | AM WEST | on time | 6438  |
| 3 | AM WEST | delayed | 787   |


Here we sum across all destinations to see the total number of on-time and delayed flights for each airline. 
This helps us compare the airlines at a glance.
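
The positional slice `iloc[:, 2:]` works here because the destination columns start at position 2. As a hedged alternative, the sketch below selects the count columns by dtype with `select_dtypes`, so the row sum does not depend on column order. The data is rebuilt inline from the table at the top so the snippet runs without the CSV; `flights`, `row_totals`, and `summary` are illustrative names.

```python
import pandas as pd

# Same counts as the project table, rebuilt inline (assumption: no CSV needed).
flights = pd.DataFrame({
    "Airline": ["ALASKA", "ALASKA", "AM WEST", "AM WEST"],
    "Status": ["on time", "delayed", "on time", "delayed"],
    "Los Angeles": [497, 62, 694, 117],
    "Phoenix": [221, 12, 4840, 415],
    "San Diego": [212, 20, 383, 65],
    "San Francisco": [503, 102, 320, 129],
    "Seattle": [1841, 605, 201, 61],
})

# Keep only the numeric (count) columns, then sum across each row.
row_totals = flights.select_dtypes(include="number").sum(axis=1)

# Attach the totals to the two label columns.
summary = flights[["Airline", "Status"]].assign(Total=row_totals)
print(summary)
```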


In [99]:
# Total flights per airline (on time + delayed)
airline_totals = df_total.groupby('Airline')['Total'].sum()

# Delayed flights per airline
delayed_totals = df_total[df_total['Status'] == 'delayed'].groupby('Airline')['Total'].sum()

# Percentage of each airline's flights that were delayed
percent_delayed = (delayed_totals / airline_totals * 100).round(2)
percent_delayed

Airline
ALASKA     19.66
AM WEST    10.89
Name: Total, dtype: float64

We calculate the percentage of flights that were delayed for each airline by dividing the number of 
delayed flights by the total flights and multiplying by 100. This gives us a single overall measure of performance: AM West was delayed on 10.89% of its flights, compared with 19.66% for Alaska, so overall AM West is on time more often.
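
The same rate can be cross-checked by putting the on-time and delayed counts side by side. This is a minimal sketch that starts from the per-airline totals printed earlier (3274, 801, 6438, 787) and uses `DataFrame.pivot`; the names `totals`, `by_status`, and `Delay %` are illustrative assumptions.

```python
import pandas as pd

# Row totals taken from the output of the earlier cell.
totals = pd.DataFrame({
    "Airline": ["ALASKA", "ALASKA", "AM WEST", "AM WEST"],
    "Status": ["on time", "delayed", "on time", "delayed"],
    "Total": [3274, 801, 6438, 787],
})

# One row per airline, one column per status.
by_status = totals.pivot(index="Airline", columns="Status", values="Total")

# Delay rate = delayed / (delayed + on time), expressed as a percentage.
by_status["Delay %"] = (
    by_status["delayed"] / (by_status["delayed"] + by_status["on time"]) * 100
).round(2)

print(by_status)
```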


In [100]:
# Delayed flights and total flights per destination (both airlines combined), sorted ascending
delays_by_dest = df[df['Status'] == 'delayed'].iloc[:, 2:].sum().sort_values()
total_by_dest = df.iloc[:, 2:].sum().sort_values()

delays_by_dest, total_by_dest


(San Diego         85
 Los Angeles      179
 San Francisco    231
 Phoenix          427
 Seattle          666
 dtype: int64,
 San Diego         680
 San Francisco    1054
 Los Angeles      1370
 Seattle          2708
 Phoenix          5488
 dtype: int64)

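Here we sum the delayed flights for each destination across both airlines. 
This shows which destinations have the most delays: Seattle has the most (666) and San Diego the fewest (85). We also find the total flights by destination, which we need in order to turn these counts into rates.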


In [101]:

# Delay rate per destination (the two Series align on destination name, not position)
percent_delayed_dest = (delays_by_dest / total_by_dest * 100).round(2).sort_values()
percent_delayed_dest

Phoenix           7.78
San Diego        12.50
Los Angeles      13.07
San Francisco    21.92
Seattle          24.59
dtype: float64

Here we calculate the percentage of delayed flights at each destination by dividing the delayed flights 
by the total flights at that destination. This helps identify which airports have the highest delay rates. It's useful to compare this to the count of delayed flights, as high-volume airports might show an alarming number of delayed flights while having a lower percentage. For instance, Phoenix has the second-highest count of delayed flights (427), but this seems to be due to high overall traffic (5,488 flights), as its percentage of delayed flights is the lowest at 7.78%.
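
To make the count-versus-rate contrast concrete, the sketch below puts both measures in one table and ranks each of them. It rebuilds the data inline from the table at the top; `summary`, `Count rank`, and `Rate rank` are illustrative names rather than anything defined in the cells above.

```python
import pandas as pd

# Same counts as the project table, rebuilt inline (assumption: no CSV needed).
flights = pd.DataFrame({
    "Airline": ["ALASKA", "ALASKA", "AM WEST", "AM WEST"],
    "Status": ["on time", "delayed", "on time", "delayed"],
    "Los Angeles": [497, 62, 694, 117],
    "Phoenix": [221, 12, 4840, 415],
    "San Diego": [212, 20, 383, 65],
    "San Francisco": [503, 102, 320, 129],
    "Seattle": [1841, 605, 201, 61],
})
dest_cols = ["Los Angeles", "Phoenix", "San Diego", "San Francisco", "Seattle"]

# Sum both airlines together, keeping on-time and delayed flights separate.
by_status = flights.groupby("Status")[dest_cols].sum()

summary = pd.DataFrame({
    "Delayed": by_status.loc["delayed"],
    "Total": by_status.sum(),
})
summary["Delay %"] = (summary["Delayed"] / summary["Total"] * 100).round(2)

# Rank 1 = most delayed flights (count) or highest delay rate.
summary["Count rank"] = summary["Delayed"].rank(ascending=False).astype(int)
summary["Rate rank"] = summary["Delay %"].rank(ascending=False).astype(int)

print(summary.sort_values("Delayed", ascending=False))
```

A destination whose two ranks differ a lot, as Phoenix's do here, is one where traffic volume rather than delay rate is driving the raw count.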


In [102]:

numeric_cols = df.columns[2:]
airline_totals = df.groupby('Airline')[numeric_cols].transform('sum')


# divide each cell by its airline's total flights to that destination (on time + delayed)
percent_df = df.copy()
percent_df[numeric_cols] = df[numeric_cols] / airline_totals * 100

percent_df[numeric_cols] = percent_df[numeric_cols].round(2)

percent_df

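|   | Airline | Status  | Los Angeles | Phoenix | San Diego | San Francisco | Seattle |
|---|---------|---------|-------------|---------|-----------|---------------|---------|
| 0 | ALASKA  | on time | 88.91       | 94.85   | 91.38     | 83.14         | 75.27   |
| 1 | ALASKA  | delayed | 11.09       | 5.15    | 8.62      | 16.86         | 24.73   |
| 2 | AM WEST | on time | 85.57       | 92.10   | 85.49     | 71.27         | 76.72   |
| 3 | AM WEST | delayed | 14.43       | 7.90    | 14.51     | 28.73         | 23.28   |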



Here, each cell shows the percentage of an airline's flights to a given destination that were on time or delayed, so the two rows for each airline add up to 100% in every column. This shows which airline runs on time more often and where delays are more common, and makes it easy to compare relative performance across airlines and destinations. For example, both airlines run relatively on time at Phoenix (94.85% for Alaska and 92.10% for AM West), while Alaska significantly outperforms AM West in San Francisco (83.14% vs. 71.27% on time).
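
For a direct airline-to-airline comparison, it can help to line up the delayed percentages with one row per destination. The sketch below is one possible way to do that, with the data rebuilt inline from the table at the top; `rate`, `comparison`, and the `Gap` column (AM WEST minus ALASKA, in percentage points) are illustrative assumptions.

```python
import pandas as pd

# Same counts as the project table, rebuilt inline (assumption: no CSV needed).
flights = pd.DataFrame({
    "Airline": ["ALASKA", "ALASKA", "AM WEST", "AM WEST"],
    "Status": ["on time", "delayed", "on time", "delayed"],
    "Los Angeles": [497, 62, 694, 117],
    "Phoenix": [221, 12, 4840, 415],
    "San Diego": [212, 20, 383, 65],
    "San Francisco": [503, 102, 320, 129],
    "Seattle": [1841, 605, 201, 61],
})
dest_cols = ["Los Angeles", "Phoenix", "San Diego", "San Francisco", "Seattle"]

# Flights per airline and destination (on time + delayed).
totals = flights.groupby("Airline")[dest_cols].sum()

# Delayed flights per airline and destination, indexed by airline.
delayed = flights[flights["Status"] == "delayed"].set_index("Airline")[dest_cols]

# Delayed percentage per airline and destination.
rate = (delayed / totals * 100).round(2)

# One row per destination; Gap > 0 means AM WEST is delayed more often there.
comparison = rate.T.assign(
    Gap=lambda t: (t["AM WEST"] - t["ALASKA"]).round(2)
)

print(comparison)
```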


## 2. Conclusion

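We can draw a few conclusions from the data above. Overall, Alaska is more likely to be delayed than AM West (19.66% vs. 10.89% of flights). Despite having the second-largest number of delayed flights, Phoenix has the best on-time percentage of the five destinations. Finally, although Alaska is on time noticeably more often than AM West in San Francisco, the destination generally makes more of a difference to whether a flight is delayed than the airline does: delay rates range from 7.78% in Phoenix to 24.59% in Seattle, while within most destinations the two airlines are only a few percentage points apart. Alaska's higher overall delay rate largely reflects where it flies, since most of its flights go to Seattle, the destination with the highest delay rate, whereas most AM West flights go to Phoenix, which has the lowest.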
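
One way to probe the claim that destination matters more than airline is to look at where each airline flies. The sketch below computes each airline's flight mix across destinations and checks that the overall delay rate equals the mix-weighted average of the per-destination rates. The data is rebuilt inline from the table at the top, and `mix`, `rate`, and `overall` are illustrative names; this is a simple decomposition under those assumptions, not a formal statistical test.

```python
import pandas as pd

# Same counts as the project table, rebuilt inline (assumption: no CSV needed).
flights = pd.DataFrame({
    "Airline": ["ALASKA", "ALASKA", "AM WEST", "AM WEST"],
    "Status": ["on time", "delayed", "on time", "delayed"],
    "Los Angeles": [497, 62, 694, 117],
    "Phoenix": [221, 12, 4840, 415],
    "San Diego": [212, 20, 383, 65],
    "San Francisco": [503, 102, 320, 129],
    "Seattle": [1841, 605, 201, 61],
})
dest_cols = ["Los Angeles", "Phoenix", "San Diego", "San Francisco", "Seattle"]

# Flights per airline and destination (on time + delayed).
totals = flights.groupby("Airline")[dest_cols].sum()
delayed = flights[flights["Status"] == "delayed"].set_index("Airline")[dest_cols]

# Delay rate at each destination, as a fraction between 0 and 1.
rate = delayed / totals

# Share of each airline's flights that go to each destination.
mix = totals.div(totals.sum(axis=1), axis=0)

# Overall delay rate = sum over destinations of (share of flights * delay rate).
overall = (mix * rate).sum(axis=1) * 100

print((mix * 100).round(2))
print(overall.round(2))
```

With this data, roughly 60% of Alaska's flights go to Seattle and roughly 73% of AM West's go to Phoenix, so each airline's overall rate is dominated by a single destination.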