# IS 362 – Week 5 Assignment: U.S. Airport Analysis with Python and Pandas

This assignment focuses on applying Python and pandas to analyze real-world flight data from the [nycflights13 dataset](https://github.com/hadley/nycflights13/tree/master/data-raw). The main goal is to answer three specific questions:

1. What is the northernmost airport in the United States?
2. What is the easternmost airport in the United States?
3. On February 12, 2013, which New York area airport experienced the windiest weather?

To complete the assignment, I will use Python and pandas to process the dataset, research external sources to confirm findings, and clearly document both my code and assumptions. The final deliverable will include the analysis, supporting research, and verified answers for each question in a well-organized Jupyter notebook.


## Downloading and processing the data


In [14]:
import pandas as pd

airports_df = pd.read_csv('airports.csv', header=0)
weather_df = pd.read_csv('weather.csv', header=0)

print(airports_df.head(5))
print(weather_df.head(5))

   faa                           name        lat        lon   alt  tz dst  \
0  04G              Lansdowne Airport  41.130472 -80.619583  1044  -5   A   
1  06A  Moton Field Municipal Airport  32.460572 -85.680028   264  -6   A   
2  06C            Schaumburg Regional  41.989341 -88.101243   801  -6   A   
3  06N                Randall Airport  41.431912 -74.391561   523  -5   A   
4  09J          Jekyll Island Airport  31.074472 -81.427778    11  -5   A   

              tzone  
0  America/New_York  
1   America/Chicago  
2   America/Chicago  
3  America/New_York  
4  America/New_York  
  origin  year  month  day  hour   temp   dewp  humid  wind_dir  wind_speed  \
0    EWR  2013      1    1     1  39.02  26.06  59.37     270.0    10.35702   
1    EWR  2013      1    1     2  39.02  26.96  61.63     250.0     8.05546   
2    EWR  2013      1    1     3  39.02  28.04  64.43     240.0    11.50780   
3    EWR  2013      1    1     4  39.92  28.04  62.21     250.0    12.65858   
4    EWR  

The data looks well-organized and usable as is.

## Finding the Northernmost and Easternmost Airports

To identify these airports, we can use the built-in pandas function `nlargest()`.

- **Northernmost airport:** Taking the rows with the largest **latitude** values gives the airports farthest north.
- **Easternmost airport:** We can take a similar approach with **longitude**. U.S. longitudes are west of Greenwich and therefore negative, so the easternmost airport should have the largest value, that is, the longitude **closest to 0**.

### My Predictions

I am not super familiar with too many airports, but I definitely would bet that the northernmost airport would be in Alaska somewhere, and the easternmost is probably in Maine.


In [15]:
northern = airports_df.nlargest(5, "lat")
eastern = airports_df.nlargest(5, 'lon')
print(northern)
print(eastern)

     faa                                        name        lat         lon  \
417  EEN                     Dillant Hopkins Airport  72.270833   42.898333   
230  BRW                  Wiley Post Will Rogers Mem  71.285446 -156.766003   
110  AIN                          Wainwright Airport  70.638056 -159.994722   
708  K03                               Wainwright As  70.613378 -159.860350   
152  ATK  Atqasuk Edward Burnell Sr Memorial Airport  70.467300 -157.436000   

     alt  tz dst              tzone  
417  149  -5   A                NaN  
230   44  -9   A  America/Anchorage  
110   41  -9   A  America/Anchorage  
708   35  -9   A  America/Anchorage  
152   96  -9   A  America/Anchorage  
      faa                           name        lat         lon   alt  tz dst  \
1290  SYA                   Eareckson As  52.712275  174.113620    98  -9   A   
942   MYF               Montgomery Field  32.475900  117.759000    17   8   A   
396   DVT  Deer Valley Municipal Airport  33.411700  1

## Examining the Top 5 Northernmost and Easternmost Airports

Generating the top 5 northernmost and easternmost airports reveals a few hidden issues in the dataset.

### Northernmost Airports

First, it appears that the latitude and longitude for **Dillant Hopkins Airport** were entered incorrectly, most likely swapped, with the sign of the longitude dropped. The actual coordinates are approximately **42.9° N, 72.27° W** (a longitude of about -72.27). Setting this entry aside, **Wiley Post–Will Rogers Memorial Airport** in Alaska is the northernmost airport in the United States, which is confirmed by a quick Google search.
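
One way to check this conclusion without editing the CSV by hand is to patch the suspect row and rerun the query. The sketch below uses a three-row sample copied from the output above rather than the full `airports_df`, and the corrected coordinates for `EEN` are an assumption (latitude and longitude swapped, longitude made negative), not values taken from an authoritative source.

```python
import pandas as pd

# Three rows copied from the printed output; the real analysis would use airports_df.
sample = pd.DataFrame({
    "faa": ["EEN", "BRW", "AIN"],
    "lat": [72.270833, 71.285446, 70.638056],
    "lon": [42.898333, -156.766003, -159.994722],
})

# Assumed correction: EEN's lat/lon were swapped and the longitude lost its sign.
fixed = sample.copy()
is_een = fixed["faa"] == "EEN"
fixed.loc[is_een, "lat"] = 42.898333
fixed.loc[is_een, "lon"] = -72.270833

# With the patched row, take the single largest latitude.
northernmost = fixed.nlargest(1, "lat")
print(northernmost)
```

With the patched row, `BRW` (Wiley Post–Will Rogers Memorial) comes out on top in this sample.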

### Easternmost Airports

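Some issues also appear in the list of easternmost airports. Dillant Hopkins’ incorrect data shows up here as well. Additionally, if we simply take the largest longitude values, the top entries have longitudes well above 100°, some approaching **180°**, almost directly opposite Greenwich, England (0° longitude). These are not East Coast airports: Eareckson Air Station, for example, is on Shemya Island near the far western end of Alaska's Aleutian chain, which lies across the 180° meridian, while Montgomery Field (San Diego) and Deer Valley (Phoenix) look like western longitudes recorded without their negative sign.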

To exclude these entries, we need a different approach. One solution is to take the **absolute value** of the longitude column with `abs()`, select the 5 smallest values with `nsmallest()`, and pass their `index` labels to `.loc[]` to retrieve the corresponding rows:


In [16]:
eastern = airports_df.loc[airports_df['lon'].abs().nsmallest(5).index]
print(eastern)

      faa                                 name        lat        lon  alt  tz  \
417   EEN              Dillant Hopkins Airport  72.270833  42.898333  149  -5   
444   EPM           Eastport Municipal Airport  44.910111 -67.012694   45  -5   
624   HUL                         Houlton Intl  46.123083 -67.792056  489  -5   
259   CAR                         Caribou Muni  46.871500 -68.017917  626  -5   
1101  PQI  Northern Maine Rgnl At Presque Isle  46.688958 -68.044797  534  -5   

     dst             tzone  
417    A               NaN  
444    A  America/New_York  
624    A  America/New_York  
259    A  America/New_York  
1101   A  America/New_York  


Once again, **Dillant Hopkins'** incorrect values appear in our list. We can safely ignore this entry and determine that **Eastport Municipal Airport** in Maine is the easternmost domestic airport in the US.
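
Another option, sketched here as an alternative rather than as the method used in this notebook, is to keep only airports with a negative (western) longitude before calling `nlargest()`. This assumes every airport of interest lies west of Greenwich, which also drops the mis-entered `EEN` row. The sample rows are copied from the outputs above.

```python
import pandas as pd

# Longitudes copied from the printed outputs; the real analysis would use airports_df.
sample = pd.DataFrame({
    "faa": ["EEN", "EPM", "HUL", "SYA", "MYF"],
    "lon": [42.898333, -67.012694, -67.792056, 174.113620, 117.759000],
})

# Assumption: valid entries of interest all have a negative (western) longitude.
western = sample[sample["lon"] < 0]

# Among western longitudes, the largest value is the one farthest east.
easternmost = western.nlargest(1, "lon")
print(easternmost)
```

In this sample the filter leaves only `EPM` and `HUL`, and `EPM` (Eastport) has the larger longitude, which agrees with the absolute-value approach.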


## Windiest Airport on 02/12/2013 in New York

The dataset includes only New York area airports, so we do not need to filter by location.

Rather than looking at individual hours, we can calculate the **average wind speed for each airport** on February 12, 2013. Using pandas’ `groupby()` function, we can group the data by the `origin` column, compute the mean of the `wind_speed` column, and sort the result to identify the **airport with the highest average wind speed** for that day.


In [20]:
day_filter = weather_df[(weather_df['year'] == 2013) & (weather_df['month'] == 2) & (weather_df['day'] == 12)]


windiest_grouped = day_filter.groupby('origin')['wind_speed'].mean().sort_values(ascending=False)

print(windiest_grouped)

origin
EWR    56.38822
LGA    14.96014
JFK    14.38475
Name: wind_speed, dtype: float64


Based on these averages, **Newark** (EWR) appears to have experienced the windiest weather among the three New York area airports on February 12th, 2013.

One thing to note is that using the **average** alone does not account for potential outliers or incorrectly entered data. To identify any unusual values, we can examine the **top 5 largest and smallest `wind_speed` rows** to check for exceptional entries.


In [21]:
windiest = day_filter.nlargest(5, 'wind_speed')
least_windy = day_filter.nsmallest(5, 'wind_speed')
print(windiest)
print(least_windy)

      origin  year  month  day  hour   temp   dewp  humid  wind_dir  \
1009     EWR  2013      2   12     3  39.02  26.96  61.63     260.0   
18417    LGA  2013      2   12     2  42.98  26.06  50.94     290.0   
1018     EWR  2013      2   12    12  44.06  26.06  48.87     270.0   
18428    LGA  2013      2   12    13  44.06  23.00  43.02     300.0   
1008     EWR  2013      2   12     2  39.92  28.04  62.21     270.0   

       wind_speed  wind_gust  precip  pressure  visib             time_hour  
1009   1048.36058        NaN     0.0    1008.3   10.0  2013-02-12T08:00:00Z  
18417    23.01560   31.07106     0.0    1007.1   10.0  2013-02-12T07:00:00Z  
1018     21.86482   31.07106     0.0    1012.5   10.0  2013-02-12T17:00:00Z  
18428    21.86482   25.31716     0.0    1011.7   10.0  2013-02-12T18:00:00Z  
1008     20.71404   25.31716     0.0    1007.8   10.0  2013-02-12T07:00:00Z  
      origin  year  month  day  hour   temp   dewp  humid  wind_dir  \
1029     EWR  2013      2   12    

We can see that there is one data point in **Newark** that is far too high and clearly an outlier. The next highest `wind_speed` is around 23, so we can create a **mask** to filter out values above, say, 50.

Using this filtered data, we can then **recalculate the mean** to obtain a more accurate representation of the windiest airport overall.


In [22]:
mask = day_filter['wind_speed'] <= 50
windiest_grouped = day_filter[mask].groupby('origin')['wind_speed'].mean().sort_values(ascending=False)

print(windiest_grouped)

origin
LGA    14.960140
JFK    14.384750
EWR    13.258987
Name: wind_speed, dtype: float64


This gives us a more reliable reading of the values and shows that **LaGuardia** was the windiest airport on February 12, 2013.
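
A fixed cutoff such as 50 is a judgment call. As a cross-check, the median is much less sensitive to a single bad reading than the mean. The sketch below compares the two on the five largest readings printed earlier; because it uses only those five rows, the numbers illustrate the effect and are not the day's actual per-airport statistics.

```python
import pandas as pd

# The five largest wind_speed rows from the output above (not the full day).
top_rows = pd.DataFrame({
    "origin": ["EWR", "LGA", "EWR", "LGA", "EWR"],
    "wind_speed": [1048.36058, 23.01560, 21.86482, 21.86482, 20.71404],
})

# Compare mean, median, and max per airport to see how one outlier skews the mean.
summary = top_rows.groupby("origin")["wind_speed"].agg(["mean", "median", "max"])
print(summary)
```

On the full data, the equivalent check would be `day_filter.groupby('origin')['wind_speed'].median()`, which needs no hand-picked threshold.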
