### Assignment:
Your task in this week’s assignment is to answer three questions:
1. What is the northernmost airport in the United States?
2. What is the easternmost airport in the United States?
3. On February 12th, 2013, which New York area airport had the windiest weather?

For each of the three questions listed above, you’ll need to verify your answers (in two cases by searching for additional confirming information, and in the third case being alert for possible outliers).

Use the information in the .CSV files found at https://github.com/hadley/nycflights13/tree/master/data-raw for your source data.

### Python Code for Imports

In [1]:
# standard numpy and pandas imports
import numpy as np
import pandas as pd

### Setting Up the Data Files

In [2]:
# url for airports csv file
airports_csv = 'https://raw.githubusercontent.com/hadley/nycflights13/master/data-raw/airports.csv'

# url for weather csv file
weather_csv = 'https://raw.githubusercontent.com/hadley/nycflights13/master/data-raw/weather.csv'

### Reading the Data
We will first read the data from the csv files into DataFrames for later use and view the first 5 rows of each to get an idea of what the data looks like.

In [3]:
# create DataFrame with airport data
df_airports = pd.read_csv(airports_csv)

# create DataFrame with weather data
df_weather = pd.read_csv(weather_csv)

# view the first 5 rows of airport data
df_airports.head()

Unnamed: 0,faa,name,lat,lon,alt,tz,dst,tzone
0,04G,Lansdowne Airport,41.130472,-80.619583,1044,-5,A,America/New_York
1,06A,Moton Field Municipal Airport,32.460572,-85.680028,264,-6,A,America/Chicago
2,06C,Schaumburg Regional,41.989341,-88.101243,801,-6,A,America/Chicago
3,06N,Randall Airport,41.431912,-74.391561,523,-5,A,America/New_York
4,09J,Jekyll Island Airport,31.074472,-81.427778,11,-5,A,America/New_York


In [4]:
# view the first 5 rows of weather data
df_weather.head()

Unnamed: 0,origin,year,month,day,hour,temp,dewp,humid,wind_dir,wind_speed,wind_gust,precip,pressure,visib,time_hour
0,EWR,2013,1,1,1,39.02,26.06,59.37,270.0,10.35702,,0.0,1012.0,10.0,2013-01-01T06:00:00Z
1,EWR,2013,1,1,2,39.02,26.96,61.63,250.0,8.05546,,0.0,1012.3,10.0,2013-01-01T07:00:00Z
2,EWR,2013,1,1,3,39.02,28.04,64.43,240.0,11.5078,,0.0,1012.5,10.0,2013-01-01T08:00:00Z
3,EWR,2013,1,1,4,39.92,28.04,62.21,250.0,12.65858,,0.0,1012.2,10.0,2013-01-01T09:00:00Z
4,EWR,2013,1,1,5,39.02,28.04,64.43,260.0,12.65858,,0.0,1011.9,10.0,2013-01-01T10:00:00Z


### Latitude and Longitude Definitions
1. Latitude runs 0–90° north and south of the equator. Positive values are in the northern hemisphere, and values with a negative ("-") sign are in the southern hemisphere.
2. Longitude runs 0–180° east and west of the prime meridian. Positive values are in the eastern hemisphere, and values with a negative ("-") sign are in the western hemisphere.
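
The sign conventions above can be sketched as a small helper. The function name `hemispheres` is hypothetical and is not part of the assignment; it only illustrates how the signs of "lat" and "lon" map to hemispheres.

```python
def hemispheres(lat, lon):
    """Return (north/south, east/west) labels for signed decimal degrees."""
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    ns = "northern" if lat >= 0 else "southern"
    ew = "eastern" if lon >= 0 else "western"
    return ns, ew


# Lansdowne Airport, the first row of the airports data shown earlier
print(hemispheres(41.130472, -80.619583))  # ('northern', 'western')
```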

### Question 1 - Northernmost Airport in the United States
When looking at a map, the **northernmost part of the United States is the state of Alaska**. Alaska is located in the northern hemisphere, so the **latitude values will be positive**. It is also located in the western hemisphere, so the **longitude values will be negative**.

We'll start by finding the five airports with the highest latitude.

In [5]:
# make a copy of initial DataFrame
airports = df_airports.copy()

# sort DataFrame by "lat" column and view first 5 rows
airports.sort_values('lat', ascending=False).head()

Unnamed: 0,faa,name,lat,lon,alt,tz,dst,tzone
417,EEN,Dillant Hopkins Airport,72.270833,42.898333,149,-5,A,
230,BRW,Wiley Post Will Rogers Mem,71.285446,-156.766003,44,-9,A,America/Anchorage
110,AIN,Wainwright Airport,70.638056,-159.994722,41,-9,A,America/Anchorage
708,K03,Wainwright As,70.613378,-159.86035,35,-9,A,America/Anchorage
152,ATK,Atqasuk Edward Burnell Sr Memorial Airport,70.4673,-157.436,96,-9,A,America/Anchorage


According to the data, **Dillant Hopkins Airport** is the northernmost airport. However, its "lon" value is positive, which places it in the eastern hemisphere and not where we expect Alaska to be. Researching this airport shows that Dillant Hopkins Airport is actually located in New Hampshire, United States, at a latitude of about 42.9° and a longitude of about -72.3°. The data has the two coordinates swapped and is missing the negative sign on the "lon". Site for reference: https://latitude.to/articles-by-country/us/united-states/179120/dillanthopkins-airport.
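
A quick plausibility check can surface rows like this one before sorting. The sketch below uses three rows copied from the outputs above in place of the full data. The rule it applies (a positive longitude or a missing "tzone" marks a row for manual review) is an assumption made for illustration. A flagged row is not necessarily wrong, since some U.S. locations, such as the western Aleutian Islands, do lie at positive longitudes.

```python
import pandas as pd

# Three rows copied from the outputs above; a stand-in for the full airports data.
sample = pd.DataFrame(
    {
        "faa": ["EEN", "BRW", "04G"],
        "lat": [72.270833, 71.285446, 41.130472],
        "lon": [42.898333, -156.766003, -80.619583],
        "tzone": [None, "America/Anchorage", "America/New_York"],
    }
)

# Assumed rule: positive longitude or missing time zone means "check by hand".
suspect = sample[(sample["lon"] > 0) | sample["tzone"].isna()]
print(suspect["faa"].tolist())  # ['EEN']
```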

The data also includes a time zone column, "tzone", which we can use to narrow down our search. **Alaska airports have "America/Anchorage"** as the "tzone".

We will first correct the Dillant Hopkins Airport data and then view the top 5 airports that have the highest latitude using our new query.

In [6]:
# update the lat and lon values at index 417
airports.at[417, 'lat'] = 42.898333
airports.at[417, 'lon'] = -72.270833
airports.at[417, 'tzone'] = 'America/New_York'

# search by Anchorage timezone, sort DataFrame by "lat" column and view first 5 rows
airports[airports['tzone'] == 'America/Anchorage'].sort_values('lat', ascending=False).head()

Unnamed: 0,faa,name,lat,lon,alt,tz,dst,tzone
230,BRW,Wiley Post Will Rogers Mem,71.285446,-156.766003,44,-9,A,America/Anchorage
110,AIN,Wainwright Airport,70.638056,-159.994722,41,-9,A,America/Anchorage
708,K03,Wainwright As,70.613378,-159.86035,35,-9,A,America/Anchorage
152,ATK,Atqasuk Edward Burnell Sr Memorial Airport,70.4673,-157.436,96,-9,A,America/Anchorage
1363,UUK,Ugnu-Kuparuk Airport,70.330833,-149.5975,67,-9,A,America/Anchorage


Looking at this new data, we can see that the northernmost airport is now **Wiley Post Will Rogers Mem**. Its coordinates have the positive latitude and negative longitude that we were expecting. Researching this airport confirms that the coordinates are correct and that it is the northernmost airport in the United States. Site for reference: https://latitude.to/articles-by-country/us/united-states/27649/wiley-postwill-rogers-memorial-airport

**ANSWER**: Wiley Post Will Rogers Mem

### Question 2 - Easternmost Airport in the United States
When looking at a map, the **easternmost part of the United States is the state of Maine**. Maine is located in the northern hemisphere, so the **latitude values will be positive**. It is also located in the western hemisphere, so the **longitude values will be negative**. Maine is in the Eastern Time Zone, so the "tzone" value will be **"America/New_York"**.

We'll start by filtering for airports in the New York time zone and sorting them from high to low on the longitude value. Because these longitudes are negative, the larger (less negative) the "lon" value, the farther east the airport will be.

In [7]:
# search by New York timezone, sort DataFrame by "lon" column and view first 5 rows
airports[airports['tzone'] == 'America/New_York'].sort_values('lon', ascending=False).head()

Unnamed: 0,faa,name,lat,lon,alt,tz,dst,tzone
444,EPM,Eastport Municipal Airport,44.910111,-67.012694,45,-5,A,America/New_York
624,HUL,Houlton Intl,46.123083,-67.792056,489,-5,A,America/New_York
259,CAR,Caribou Muni,46.8715,-68.017917,626,-5,A,America/New_York
1101,PQI,Northern Maine Rgnl At Presque Isle,46.688958,-68.044797,534,-5,A,America/New_York
1398,WFK,Northern Aroostook Regional Airport,47.285556,-68.312778,988,-5,A,America/New_York


According to the data, **Eastport Municipal Airport** is the easternmost airport in the United States. The coordinates match the signs that we were expecting, and some research shows that the values of "lat" and "lon" look to be correct. Site for reference: https://latitude.to/articles-by-country/us/united-states/209992/eastport-municipal-airport.

Doing some more research, we do indeed find that Eastport Municipal Airport is the easternmost airport in the USA. Site for reference: https://www.eastport-me.gov/eastport-municipal-airport-kepm
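
As a cross-check on the sort, the single largest "lon" value can also be looked up directly with `idxmax`. The sketch below uses three of the rows shown above in place of the full data. The name `maine` is hypothetical.

```python
import pandas as pd

# Three rows copied from the output above; not the full airports data.
maine = pd.DataFrame(
    {
        "faa": ["EPM", "HUL", "CAR"],
        "name": ["Eastport Municipal Airport", "Houlton Intl", "Caribou Muni"],
        "lon": [-67.012694, -67.792056, -68.017917],
    }
)

# idxmax returns the index label of the largest (least negative) longitude.
easternmost = maine.loc[maine["lon"].idxmax()]
print(easternmost["faa"], easternmost["name"])  # EPM Eastport Municipal Airport
```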

**ANSWER**: Eastport Municipal Airport

### Question 3 - Windiest Weather on February 12, 2013
To find the windiest weather on February 12, 2013, we will first need to filter our data to only include data from that day.

In [8]:
# make a copy of initial DataFrame
weather = df_weather.copy()

# store data that fits our date criteria 
filtered = weather[(weather['year'] == 2013) & (weather['month'] == 2) & (weather['day'] == 12)]

Now that we have the data for the specific day, we can group the data by airports and find the average wind speed for each.

In [9]:
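# group by airport and average the numeric columns only
# (numeric_only=True skips the text column "time_hour", which recent pandas versions refuse to average)
filtered.groupby('origin').mean(numeric_only=True)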

origin,year,month,day,hour,temp,dewp,humid,wind_dir,wind_speed,wind_gust,precip,pressure,visib
EWR,2013.0,2.0,12.0,11.5,40.5575,26.42,58.179167,268.75,56.38822,25.173313,0.0,1011.477273,10.0
JFK,2013.0,2.0,12.0,11.5,41.2475,25.295,53.634583,279.166667,14.38475,24.584845,0.0,1011.479167,9.875
LGA,2013.0,2.0,12.0,11.5,41.9375,24.305,49.937917,279.130435,14.96014,24.885618,0.0,1011.113043,10.0


Looking at the data, **Newark (EWR) airport looks to have the highest average wind speed**. Its average (56.39) is nearly four times that of JFK (14.38) and LGA (14.96), which suggests that one or more anomalous readings could be inflating the number.

To check for outliers that would explain this, we'll first sort the wind speeds from high to low.

In [10]:
# sort DataFrame by wind speed
filtered.sort_values('wind_speed', ascending=False).head()

Unnamed: 0,origin,year,month,day,hour,temp,dewp,humid,wind_dir,wind_speed,wind_gust,precip,pressure,visib,time_hour
1009,EWR,2013,2,12,3,39.02,26.96,61.63,260.0,1048.36058,,0.0,1008.3,10.0,2013-02-12T08:00:00Z
18417,LGA,2013,2,12,2,42.98,26.06,50.94,290.0,23.0156,31.07106,0.0,1007.1,10.0,2013-02-12T07:00:00Z
1018,EWR,2013,2,12,12,44.06,26.06,48.87,270.0,21.86482,31.07106,0.0,1012.5,10.0,2013-02-12T17:00:00Z
18428,LGA,2013,2,12,13,44.06,23.0,43.02,300.0,21.86482,25.31716,0.0,1011.7,10.0,2013-02-12T18:00:00Z
18429,LGA,2013,2,12,14,44.06,23.0,43.02,300.0,20.71404,25.31716,0.0,1011.5,10.0,2013-02-12T19:00:00Z


We can now see that EWR has one entry with an extremely high wind speed (1048.36) that could be causing the unexpected average. To fix this, we will remove this entry from our DataFrame and then run our average again.

In [11]:
# drop row from DataFrame and save results
filtered = filtered.drop(1009)

# group by airport and average the numeric columns only
filtered.groupby('origin').mean(numeric_only=True)

origin,year,month,day,hour,temp,dewp,humid,wind_dir,wind_speed,wind_gust,precip,pressure,visib
EWR,2013.0,2.0,12.0,11.869565,40.624348,26.396522,58.02913,269.130435,13.258987,25.173313,0.0,1011.628571,10.0
JFK,2013.0,2.0,12.0,11.5,41.2475,25.295,53.634583,279.166667,14.38475,24.584845,0.0,1011.479167,9.875
LGA,2013.0,2.0,12.0,11.5,41.9375,24.305,49.937917,279.130435,14.96014,24.885618,0.0,1011.113043,10.0


Without the bad data, the wind speed averages are now much closer together. LaGuardia Airport (LGA) has the highest average wind speed at 14.96, so it had the windiest weather according to the data.
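
Dropping the row by its index label (1009) works for this snapshot of the data. A threshold filter is one possible alternative that does not depend on row labels. The sketch below uses only the five highest readings shown earlier, so its averages will not match the full-day figures. The cutoff `MAX_PLAUSIBLE_SPEED` is an assumed value chosen for illustration, in the same units as the "wind_speed" column.

```python
import pandas as pd

# The five highest readings shown above; not the full day's data.
top = pd.DataFrame(
    {
        "origin": ["EWR", "LGA", "EWR", "LGA", "LGA"],
        "wind_speed": [1048.36058, 23.0156, 21.86482, 21.86482, 20.71404],
    },
    index=[1009, 18417, 1018, 18428, 18429],
)

MAX_PLAUSIBLE_SPEED = 200  # assumed cutoff, not taken from the assignment

# Keep only readings at or below the cutoff, then average per airport.
plausible = top[top["wind_speed"] <= MAX_PLAUSIBLE_SPEED]
print(plausible.groupby("origin")["wind_speed"].mean())
```

Whatever cutoff is used should be justified separately, since a threshold that is too low would also discard genuine strong-wind readings.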

**ANSWER**: LGA - LaGuardia Airport