# Week 5 Problem 3

If you are not using the `Assignments` tab on the course JupyterHub server to read this notebook, read [Activating the assignments tab](https://github.com/UI-DataScience/info490-fa16/blob/master/Week2/assignments/README.md).

A few things you should keep in mind when working on assignments:

1. Make sure you fill in any place that says `YOUR CODE HERE`. Do **not** write your answer anywhere other than where it says `YOUR CODE HERE`. Anything you write anywhere else will be removed or overwritten by the autograder.

2. Before you submit your assignment, make sure everything runs as expected. In the menubar, select _Kernel_ → _Restart & Run All_ to restart the kernel and run all cells.

3. Do not change the title (i.e. file name) of this notebook.

4. Make sure that you save your work (in the menubar, select _File_ → _Save and Checkpoint_).

5. You are allowed to submit an assignment multiple times, but only the most recent submission will be graded.

In [1]:
import requests
from nose.tools import assert_equal

# Problem 1. Requests.

In this problem, we are going to use Requests to look up the airports that appear in the [airline on-time performance data](http://stat-computing.org/dataexpo/2009/), which uses [IATA codes](https://en.wikipedia.org/wiki/International_Air_Transport_Association_airport_code) to identify airports. For example,

```shell
$ head /home/data_scientist/data/2001.csv
```
```
Year,Month,DayofMonth,DayOfWeek,DepTime,CRSDepTime,ArrTime,CRSArrTime,UniqueCarrier,FlightNum,TailNum,ActualElapsedTime,CRSElapsedTime,AirTime,ArrDelay,DepDelay,Origin,Dest,Distance,TaxiIn,TaxiOut,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay
2001,1,17,3,1806,1810,1931,1934,US,375,N700��,85,84,60,-3,-4,BWI,CLT,361,5,20,0,NA,0,NA,NA,NA,NA,NA
2001,1,18,4,1805,1810,1938,1934,US,375,N713��,93,84,64,4,-5,BWI,CLT,361,9,20,0,NA,0,NA,NA,NA,NA,NA
2001,1,19,5,1821,1810,1957,1934,US,375,N702��,96,84,80,23,11,BWI,CLT,361,6,10,0,NA,0,NA,NA,NA,NA,NA
2001,1,20,6,1807,1810,1944,1934,US,375,N701��,97,84,66,10,-3,BWI,CLT,361,4,27,0,NA,0,NA,NA,NA,NA,NA
2001,1,21,7,1810,1810,1954,1934,US,375,N768��,104,84,62,20,0,BWI,CLT,361,4,38,0,NA,0,NA,NA,NA,NA,NA
2001,1,22,1,1807,1810,1931,1934,US,375,N722��,84,84,61,-3,-3,BWI,CLT,361,12,11,0,NA,0,NA,NA,NA,NA,NA
2001,1,23,2,1802,1810,1924,1934,US,375,N732��,82,84,61,-10,-8,BWI,CLT,361,5,16,0,NA,0,NA,NA,NA,NA,NA
2001,1,24,3,1804,1810,1922,1934,US,375,N737��,78,84,60,-12,-6,BWI,CLT,361,4,14,0,NA,0,NA,NA,NA,NA,NA
2001,1,25,4,1812,1810,1925,1934,US,375,N767��,73,84,52,-9,2,BWI,CLT,361,6,15,0,NA,0,NA,NA,NA,NA,NA
```

Here, BWI and CLT are the IATA codes for Baltimore-Washington International Airport and Charlotte Douglas International Airport.

We need a way to look up the IATA codes and match them with the city names or the airport names. You could use a supplementary data set such as [airports.csv](http://stat-computing.org/dataexpo/2009/supplemental-data.html) that contains all this information, but let's pretend that no such file exists and we have to gather this information ourselves.

The FAA provides a [web service](http://services.faa.gov/docs/services/airport/#airportStatus) that lets us obtain various information on U.S. airports, including known delays and weather data. You should read http://services.faa.gov/docs/services/airport/ and try a few sample requests to make sure you understand how it works. When you make a query, the FAA web service responds with text in either XML or JSON format. We will learn more about how to work with these data formats later in the course. Since we haven't covered them yet, all you have to do is use Requests to make an HTTP request and get a text response. Once you have a text response, I will provide code that converts this text into a Python dictionary. (A JSON object maps naturally onto a Python dictionary, so if you understand Python dictionaries, you already understand most of JSON.)
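
To make the JSON-to-dictionary correspondence concrete, here is a small sketch that parses a shortened, hand-written JSON string modeled on the airport status response. The string and its values are illustrative assumptions, not live data; the parsing itself uses the standard library's `json.loads()`.

```python
import json

# A shortened, hand-written JSON string modeled on the airport status response.
# The values are illustrative only; a live request returns more fields.
text = '{"IATA": "ORD", "city": "Chicago", "weather": {"temp": "72.0 F (22.2 C)", "visibility": 6.00}}'

# json.loads() turns a JSON object into a Python dict;
# nested JSON objects become nested dicts.
airport = json.loads(text)

print(type(airport).__name__)        # dict
print(airport["city"])               # Chicago
print(airport["weather"]["temp"])    # 72.0 F (22.2 C)
```

Once parsed, the data is accessed with ordinary dictionary indexing, including chained lookups for nested objects such as `weather`.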

## 1. Function: get_airport()

In the following code cell, write a function named `get_airport()` that takes an IATA code and returns the airport status response as text.

- As mentioned above, this function makes a request to the [airport service](http://services.faa.gov/docs/services/airport/).
- The first argument (`iata`) is a string that represents the IATA code of an airport, e.g. `"ORD"`, `"SFO"`, or `"JFK"`.
- The function should return the [response content](http://www.python-requests.org/en/latest/user/quickstart/#response-content) (the `text` attribute) of the response object returned by `requests.get()`. For example,
  ```python
  >>> ord_json = get_airport("ORD")
  >>> ord_json
  ```
  ```
  '{"delay":"true","IATA":"ORD","state":"Illinois","name":"Chicago OHare International","weather":{"visibility":6.00,"weather":"Thunderstorm Light Rain Fog/Mist","meta":{"credit":"NOAA\'s National Weather Service","updated":"6:51 PM Local","url":"http://weather.gov/"},"temp":"72.0 F (22.2 C)","wind":"North at 0.0mph"},"ICAO":"KORD","city":"Chicago","status":{"reason":"WEATHER / THUNDERSTORMS","closureBegin":"","endTime":"","minDelay":"","avgDelay":"2 hours and 44 minutes","maxDelay":"","closureEnd":"","trend":"","type":"Ground Delay"}} '
  ```
  ```python
  >>> type(ord_json)
  ```
  ```
  str
  ```
 
- The response from the [airport service](http://services.faa.gov/docs/services/airport/) can be in either JSON or XML format, but our function should always return a **JSON** string.
- The [airport service](http://services.faa.gov/docs/services/airport/) page provides **a sample XML request**, and a JSON request is in the **same format**.
- Read the [Requests](http://docs.python-requests.org/en/latest/user/quickstart/) documentation for more information.
- If the [airport service](http://services.faa.gov/docs/services/airport/) is down (which is quite unlikely), make a post in our [Moodle forum](https://learn.illinois.edu/mod/forum/view.php?id=1571626) if nobody has done that already. 
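
The request URL can be thought of as two parts: the status address ending in the IATA code, and a query string that selects the response format. The sketch below builds such a URL offline with the standard library's `urllib.parse.urlencode()`. The helper name `build_status_url` is hypothetical, and the assumption that the format is chosen with a `format` query parameter should be checked against the FAA documentation page.

```python
from urllib.parse import urlencode

# Hypothetical helper: builds an airport-status URL for the FAA service.
# Assumes the response format is selected with a "format" query parameter.
BASE_URL = "http://services.faa.gov/airport/status/"

def build_status_url(iata, fmt="application/json"):
    # urlencode() percent-encodes the value, so "/" becomes "%2F".
    query = urlencode({"format": fmt})
    # Normalize the IATA code to upper case before appending it.
    return "{}{}?{}".format(BASE_URL, iata.upper(), query)

print(build_status_url("ORD"))
# http://services.faa.gov/airport/status/ORD?format=application%2Fjson
```

Passing `params={'format': 'application/json'}` to `requests.get()` should produce an equivalent, percent-encoded query string, which web servers normally decode back to `application/json`.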

In [2]:
def get_airport(iata):
    '''
    Makes an HTTP request to http://services.faa.gov/airport/status/{iata}
    and returns the JSON response. (You need to figure out the correct url
    by reading http://services.faa.gov/docs/services/airport/.)
    
    Parameters
    ----------
    iata: String. Three-letter airport IATA code.
    
    Returns
    -------
    A JSON string.
    '''
    
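    # YOUR CODE HERE
    # requests is already imported in the first code cell.
    # Ask the airport status service for a JSON (rather than XML) response.
    r = requests.get('http://services.faa.gov/airport/status/{}?format=application/json'.format(iata))
    # r.text is the body of the response, decoded as a string.
    return r.text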

As promised, the following code cell converts the JSON strings into dictionaries. The `json.loads()` function parses a JSON string; when the string holds a JSON object, the result is a Python dictionary.
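
`json.loads()` maps each JSON type onto a Python type: objects become `dict`, strings become `str`, numbers become `int` or `float`, and `true`/`false` become `True`/`False`. One detail worth noticing in the sample ORD response shown earlier: `delay` arrives as the *string* `"true"`, not as a JSON boolean, so comparing it with `True` will not work. This short example uses hand-written strings to show the difference.

```python
import json

# "true" in quotes is a JSON string; bare true is a JSON boolean.
as_string = json.loads('{"delay": "true", "visibility": 6.00}')
as_bool = json.loads('{"delay": true}')

print(repr(as_string["delay"]))     # 'true'  (a str)
print(as_bool["delay"])             # True    (a bool)
print(as_string["visibility"])      # 6.0     (a float)

# A status string like "true"/"false" has to be converted explicitly.
delayed = as_string["delay"] == "true"
print(delayed)                      # True
```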

In [3]:
import json

ord_json = json.loads(get_airport("ORD"))
sfo_json = json.loads(get_airport("SFO"))

The following code cell checks whether `ord_json` and `sfo_json` are dictionaries and checks their values. Make sure that the code cell produces no errors.

In [4]:
assert_equal(isinstance(ord_json, dict), True)
assert_equal(ord_json["IATA"], "ORD")
assert_equal(ord_json["city"], "Chicago")
assert_equal(ord_json["state"], "Illinois")
assert_equal(ord_json["name"], "Chicago OHare International")

assert_equal(isinstance(sfo_json, dict), True)
assert_equal(sfo_json["IATA"], "SFO")
assert_equal(sfo_json["city"], "San Francisco")
assert_equal(sfo_json["state"], "California")
assert_equal(sfo_json["name"], "San Francisco International")

The [airports.csv](http://stat-computing.org/dataexpo/2009/supplemental-data.html) file that I mentioned earlier looks like this:

```shell
$ head /home/data_scientist/data/airports.csv
```
```
"iata","airport","city","state","country","lat","long"
"00M","Thigpen ","Bay Springs","MS","USA",31.95376472,-89.23450472
"00R","Livingston Municipal","Livingston","TX","USA",30.68586111,-95.01792778
"00V","Meadow Lake","Colorado Springs","CO","USA",38.94574889,-104.5698933
"01G","Perry-Warsaw","Perry","NY","USA",42.74134667,-78.05208056
"01J","Hilliard Airpark","Hilliard","FL","USA",30.6880125,-81.90594389
"01M","Tishomingo County","Belmont","MS","USA",34.49166667,-88.20111111
"02A","Gragg-Wade ","Clanton","AL","USA",32.85048667,-86.61145333
"02C","Capitol","Brookfield","WI","USA",43.08751,-88.17786917
"02G","Columbiana County","East Liverpool","OH","USA",40.67331278,-80.64140639
```

With our `get_airport()` function, we can reproduce every column of the `airports.csv` file except the latitude and the longitude.
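
The column correspondence can be written down as a small mapping. This is one possible sketch: the helper name `to_airports_row` is hypothetical, and the sample dictionary is hand-written from the ORD values that appear in this notebook. Note that `airports.csv` stores two-letter state codes (`MS`, `TX`), while the FAA service returns full state names (`Illinois`), so the `state` column matches in meaning but not in format.

```python
# Hypothetical helper: maps an FAA airport-status dictionary onto the
# airports.csv columns (iata, airport, city, state, country).
# lat and long are not available from the status service.
FAA_TO_CSV = [("iata", "IATA"), ("airport", "name"), ("city", "city"), ("state", "state")]

def to_airports_row(status):
    row = {csv_col: status[faa_key] for csv_col, faa_key in FAA_TO_CSV}
    # The service covers U.S. airports only, so country is fixed.
    row["country"] = "USA"
    return row

# Hand-written sample using fields seen in the ORD response.
sample = {"IATA": "ORD", "name": "Chicago OHare International",
          "city": "Chicago", "state": "Illinois"}
print(to_airports_row(sample))
```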

## 2. Function: write_airports()

Write a function named `write_airports()` that takes a list of dictionaries and writes the columns `iata`, `airport`, `city`, `state`, and `country` to a CSV file such as `top_20_airports.csv`.

Here are the IATA codes for the top 20 U.S. airports.

In [5]:
airports = ['ORD', 'DFW', 'ATL', 'LAX', 'PHX',
            'STL', 'DTW', 'MSP', 'LAS', 'BOS',
            'DEN', 'IAH', 'CLT', 'SFO', 'EWR',
            'PHL', 'LGA', 'PIT', 'SEA', 'BWI']

Using the above list, we can build a list of dictionaries by reading the JSON strings that we get from `get_airport()`.

In [6]:
list_of_dictionaries = [json.loads(get_airport(a)) for a in airports]

print(list_of_dictionaries[0])

{'ICAO': 'KORD', 'delay': 'false', 'name': 'Chicago OHare International', 'weather': {'meta': {'updated': '9:51 PM Local', 'url': 'http://weather.gov/', 'credit': "NOAA's National Weather Service"}, 'wind': 'Northeast at 10.4mph', 'visibility': 10.0, 'temp': '69.0 F (20.6 C)', 'weather': 'Mostly Cloudy'}, 'city': 'Chicago', 'state': 'Illinois', 'IATA': 'ORD', 'status': {'closureEnd': '', 'minDelay': '', 'maxDelay': '', 'endTime': '', 'reason': 'No known delays for this airport.', 'closureBegin': '', 'trend': '', 'type': '', 'avgDelay': ''}}
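
With the list in hand, the lookup described at the start of this problem (IATA code to airport name and city) is a one-line dictionary comprehension. This sketch uses two hand-written dictionaries as stand-ins for entries of `list_of_dictionaries` so that it runs without network access; with the real data you would iterate over `list_of_dictionaries` instead.

```python
# Two hand-written stand-ins for entries of list_of_dictionaries.
responses = [
    {"IATA": "BWI", "name": "Baltimore-Washington International", "city": "Baltimore"},
    {"IATA": "CLT", "name": "Charlotte Douglas International", "city": "Charlotte"},
]

# Build an IATA -> (airport name, city) lookup table.
lookup = {d["IATA"]: (d["name"], d["city"]) for d in responses}

print(lookup["CLT"])   # ('Charlotte Douglas International', 'Charlotte')
```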


Now, in the following code cell, write a function named `write_airports()` that

- Takes a list of dictionaries as the first argument,
- Takes a string (output file name, e.g. `top_20_airports.csv`) as the second argument,
- Iterates through the list and, for each airport, formats the `IATA`, `name`, `city`, and `state` values followed by the country (always `USA`) as one line, separated by commas (no spaces), and
- Writes the result to a file whose name is specified by the second argument (`filename`).
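
Building each line with string formatting works for these airports, but if a name ever contained a comma, that row would split into extra columns. Here is a sketch of the same row layout using the standard library's `csv.writer`, which quotes such fields. The function name `write_airports_csv` is hypothetical, it writes to an in-memory `io.StringIO` buffer so it can be tried in isolation, and the second record is fictitious, included only to trigger quoting.

```python
import csv
import io

def write_airports_csv(list_of_dictionaries, f):
    # Hypothetical variant of write_airports() that writes to any open
    # text file object. csv.writer quotes fields containing commas.
    writer = csv.writer(f, lineterminator="\n")
    for d in list_of_dictionaries:
        writer.writerow([d["IATA"], d["name"], d["city"], d["state"], "USA"])

buf = io.StringIO()
write_airports_csv([
    {"IATA": "ORD", "name": "Chicago OHare International", "city": "Chicago", "state": "Illinois"},
    # Fictitious record, not a real airport, used to show quoting.
    {"IATA": "XXX", "name": "Example, Field", "city": "Nowhere", "state": "Ohio"},
], buf)
print(buf.getvalue())
```

The second row comes out as `XXX,"Example, Field",Nowhere,Ohio,USA`. When writing to a real file, the `csv` documentation recommends opening it with `newline=''`. For the 20 airports in this problem, none of the names contain commas or quotes, so the output should match the plain string-formatting version.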

In [7]:
def write_airports(list_of_dictionaries, filename):
    '''
    Takes a list of dictionaries and creates a CSV file from the list.
    Dictionaries have keys: IATA, name, city, and state.
    The last column of the resulting CSV file is always USA.
    
    Parameters
    ----------
    list_of_dictionaries: a list of dictionaries.
    filename: output file name.
    
    Returns
    -------
    None.
    '''
    
    # YOUR CODE HERE
    with open(filename, 'w') as f:
        # One line per airport: IATA,name,city,state,USA
        for d in list_of_dictionaries:
            f.write('{},{},{},{},USA\n'.format(d['IATA'], d['name'], d['city'], d['state']))

    return None

Check that the following code cell does not produce any errors.

In [8]:
write_airports(list_of_dictionaries, 'top_20_airports.csv')

%cat top_20_airports.csv

ORD,Chicago OHare International,Chicago,Illinois,USA
DFW,Dallas/Ft Worth International,Dallas-Ft Worth,Texas,USA
ATL,Hartsfield-Jackson Atlanta International,Atlanta,Georgia,USA
LAX,Los Angeles International,Los Angeles,California,USA
PHX,Phoenix Sky Harbor International,Phoenix,Arizona,USA
STL,Lambert-St Louis International,St Louis,Missouri,USA
DTW,Detroit Metropolitan Wayne County,Detroit,Michigan,USA
MSP,Minneapolis-St Paul International/Wold-Chamberlain,Minneapolis,Minnesota,USA
LAS,Las Vegas McCarran International,Las Vegas,Nevada,USA
BOS,General Edward Lawrence Logan International,Boston,Massachusetts,USA
DEN,Denver International,Denver,Colorado,USA
IAH,George Bush Intercontinental/Houston,Houston,Texas,USA
CLT,Charlotte Douglas International,Charlotte,North Carolina,USA
SFO,San Francisco International,San Francisco,California,USA
EWR,Newark International,Newark,New Jersey,USA
PHL,Philadelphia International,Philadelphia,Pennsylvania,USA
LGA,La Guardia,New York,Ne

In [9]:
answer = '''
ORD,Chicago OHare International,Chicago,Illinois,USA
DFW,Dallas/Ft Worth International,Dallas-Ft Worth,Texas,USA
ATL,Hartsfield-Jackson Atlanta International,Atlanta,Georgia,USA
LAX,Los Angeles International,Los Angeles,California,USA
PHX,Phoenix Sky Harbor International,Phoenix,Arizona,USA
STL,Lambert-St Louis International,St Louis,Missouri,USA
DTW,Detroit Metropolitan Wayne County,Detroit,Michigan,USA
MSP,Minneapolis-St Paul International/Wold-Chamberlain,Minneapolis,Minnesota,USA
LAS,Las Vegas McCarran International,Las Vegas,Nevada,USA
BOS,General Edward Lawrence Logan International,Boston,Massachusetts,USA
DEN,Denver International,Denver,Colorado,USA
IAH,George Bush Intercontinental/Houston,Houston,Texas,USA
CLT,Charlotte Douglas International,Charlotte,North Carolina,USA
SFO,San Francisco International,San Francisco,California,USA
EWR,Newark International,Newark,New Jersey,USA
PHL,Philadelphia International,Philadelphia,Pennsylvania,USA
LGA,La Guardia,New York,New York,USA
PIT,Pittsburgh International,Pittsburgh,Pennsylvania,USA
SEA,Seattle-Tacoma International,Seattle,Washington,USA
BWI,Baltimore-Washington International,Baltimore,Maryland,USA
'''.strip().split('\n')

with open('top_20_airports.csv') as f:
    for i, line in enumerate(f):
        assert_equal(line.strip(), answer[i])

## Cleaning up

In [10]:
!rm top_20_airports.csv  # remove the csv file 