# Extracting API Data Part II: This time with _requests_ and _json_

We're going to revisit API data using a different API, one that returns a more conventional JSON response (the _arXiv_ API returns XML). JSON makes it easier to extract raw data from the response, and working with it will also pay off with our database choice in later sessions (and the IMA). Let's start by importing some packages:

In [1]:
import requests
import pandas as pd 
import json
import time

We'll use _requests_ here, rather than _urllib_ as in the Web Scraping tutorial. We're also importing Python's built-in _json_ library for decoding the data we receive.
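To make "decoding" concrete, here is a minimal sketch of what the _json_ library does with a piece of JSON text. The string below is hand-written for illustration and is not a real API response.

```python
import json

# Hand-written example text, not a real API response.
raw = '{"city": "London", "temps": [4.5, 5.0], "sunny": false}'

data = json.loads(raw)        # JSON text -> Python objects
print(type(data).__name__)    # dict
print(data["temps"][0])       # 4.5
print(data["sunny"])          # False (JSON false becomes Python False)
```

JSON objects become dictionaries, arrays become lists, and `true`/`false` become `True`/`False`, which is why the decoded data is easy to work with in ordinary Python code.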

Next we'll set up an API call:

In [2]:
location_url = "https://www.metaweather.com/api/location/search/?query="

def location_generator(query="London"):
    query = query.replace(" ", "%20")
    r = requests.get(location_url + query)
    return r

As you may be able to infer, our API is based on weather data. You can read the API documentation [here](https://www.metaweather.com/api/). Our overall goal is to write some code to find the average temperature in several international cities and store the results somewhere. The first step is finding the ID of each city we want to query.

In terms of the code, we're being a little cuter than before by building a function to make multiple searches easier. First, we start with a base URL (everything apart from the search query), "location_url". Second, we handle city names that contain a space. (Recall from the previous Extracting API Data session that spaces need to be encoded as "%20".) Our _replace()_ call finds any spaces in the text and replaces them with "%20". Lastly, we give the query a default value, "London": if the user doesn't pass a city, the function searches for London. Let's test this:

In [3]:
r = location_generator()
r.content

b'[{"title":"London","location_type":"City","woeid":44418,"latt_long":"51.506321,-0.12714"}]'

Everything seems to have worked! Let's also test our "%20" encoding with a two-word city name:

In [4]:
r = location_generator("New York")
r.content

b'[{"title":"New York","location_type":"City","woeid":2459115,"latt_long":"40.71455,-74.007118"}]'
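The _replace()_ approach only handles spaces. As a hedged alternative, the standard library's _urllib.parse.quote_ percent-encodes spaces and other unsafe characters (accented letters, for example) in one step. The sketch below only builds the URL and makes no request; the name `build_location_url` is made up for this example.

```python
from urllib.parse import quote

# Same base URL as in the tutorial above.
location_url = "https://www.metaweather.com/api/location/search/?query="

def build_location_url(query="London"):
    # quote() percent-encodes spaces and any other unsafe characters
    return location_url + quote(query)

print(build_location_url("New York"))
# https://www.metaweather.com/api/location/search/?query=New%20York
print(quote("São Paulo"))
# S%C3%A3o%20Paulo
```

For plain city names with spaces this gives the same result as the _replace()_ call, so either approach works for the examples in this session.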

Finally we will settle on the world's favourite city ...

In [5]:
r = location_generator("Coventry")
r.content

b'[{"title":"Coventry","location_type":"City","woeid":17044,"latt_long":"52.406979,-1.507760"}]'

The element we need from this is the location ID ("woeid"). The _json()_ method of the response provides an easy solution for converting our data into a dictionary:

In [6]:
response_list = r.json()
response_list

[{'title': 'Coventry',
  'location_type': 'City',
  'woeid': 17044,
  'latt_long': '52.406979,-1.507760'}]

So now we should be able to just get the value by using the key ... right?

In [7]:
response_list["woeid"]

TypeError: list indices must be integers or slices, not str

Oh dear, an error! Why is this? If we look at the output of [6] we can see that our dictionary is actually inside a list, marked by the "[ ]". We can get the data by first requesting the 0th item of the list:

In [8]:
response_list[0]["woeid"]

17044
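Indexing with `[0]` fails with an IndexError if the list is empty. A minimal sketch of a more defensive lookup is below, assuming (not confirmed here) that a search with no matches returns an empty list. The helper name `first_woeid` is made up for this example, and the sample data is copied from the Coventry output above.

```python
def first_woeid(response_list):
    """Return the 'woeid' of the first match, or None if there are no matches."""
    if not response_list:  # assumption: no matches means an empty list
        return None
    return response_list[0]["woeid"]

# Data copied from the Coventry output above.
coventry = [{'title': 'Coventry',
             'location_type': 'City',
             'woeid': 17044,
             'latt_long': '52.406979,-1.507760'}]

print(first_woeid(coventry))  # 17044
print(first_woeid([]))        # None
```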

Success! We can now move on to the second part of the task which is to get the weather on a particular day. Again we will do this with a function as below:

In [8]:
weather_url = "https://www.metaweather.com/api/location/"

def weather_generator(woeid=44418, day="2022/01/07"):
    r = requests.get(weather_url + str(woeid) + "/" + day)
    return r
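The function above joins the location ID and a date string in year/month/day form. If the date starts life as a Python _date_ object rather than a string, a minimal sketch of building the same URL could look like this. The name `build_weather_url` is made up for this example and no request is made.

```python
from datetime import date

# Same base URL as in the tutorial above.
weather_url = "https://www.metaweather.com/api/location/"

def build_weather_url(woeid=44418, day=date(2022, 1, 7)):
    # strftime gives the zero-padded YYYY/MM/DD form used in the tutorial
    return weather_url + str(woeid) + "/" + day.strftime("%Y/%m/%d")

print(build_weather_url(17044))
# https://www.metaweather.com/api/location/17044/2022/01/07
```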

The API call needs a location ID and a date. Again we add defaults for each in our function. We can try to get our Coventry weather using the default date (7th Jan 2022):

In [9]:
w = weather_generator(woeid=17044)
w.content

b'[{"id":5475309177536512,"weather_state_name":"Light Rain","weather_state_abbr":"lr","wind_direction_compass":"WSW","created":"2022-01-07T23:22:04.486780Z","applicable_date":"2022-01-07","min_temp":0.615,"max_temp":4.779999999999999,"the_temp":4.535,"wind_speed":9.12643789097537,"wind_direction":250.4997332413852,"air_pressure":1009.0,"humidity":78,"visibility":11.757896385110952,"predictability":75},{"id":4666185120481280,"weather_state_name":"Light Rain","weather_state_abbr":"lr","wind_direction_compass":"WSW","created":"2022-01-07T20:22:04.053026Z","applicable_date":"2022-01-07","min_temp":0.32999999999999996,"max_temp":4.64,"the_temp":4.505,"wind_speed":9.13679407751266,"wind_direction":250.4997332413852,"air_pressure":1009.0,"humidity":79,"visibility":11.757896385110952,"predictability":75},{"id":6324366129233920,"weather_state_name":"Light Rain","weather_state_abbr":"lr","wind_direction_compass":"WSW","created":"2022-01-07T17:22:05.848284Z","applicable_date":"2022-01-07","min_te

A very long and hard-to-read list. Let's see if _json()_ can help:

In [10]:
weather_list = w.json()
weather_list

[{'id': 5475309177536512,
  'weather_state_name': 'Light Rain',
  'weather_state_abbr': 'lr',
  'wind_direction_compass': 'WSW',
  'created': '2022-01-07T23:22:04.486780Z',
  'applicable_date': '2022-01-07',
  'min_temp': 0.615,
  'max_temp': 4.779999999999999,
  'the_temp': 4.535,
  'wind_speed': 9.12643789097537,
  'wind_direction': 250.4997332413852,
  'air_pressure': 1009.0,
  'humidity': 78,
  'visibility': 11.757896385110952,
  'predictability': 75},
 {'id': 4666185120481280,
  'weather_state_name': 'Light Rain',
  'weather_state_abbr': 'lr',
  'wind_direction_compass': 'WSW',
  'created': '2022-01-07T20:22:04.053026Z',
  'applicable_date': '2022-01-07',
  'min_temp': 0.32999999999999996,
  'max_temp': 4.64,
  'the_temp': 4.505,
  'wind_speed': 9.13679407751266,
  'wind_direction': 250.4997332413852,
  'air_pressure': 1009.0,
  'humidity': 79,
  'visibility': 11.757896385110952,
  'predictability': 75},
 {'id': 6324366129233920,
  'weather_state_name': 'Light Rain',
  'weather_st

Much easier to read, but still a very long list. The value we want is 'the_temp', but we can see there are many recordings over the course of the day. We can build a for loop to get the average:

In [11]:
count_temp = 0
n = 0

for result in weather_list:
    count_temp += result['the_temp']
    n += 1

avg_temp = count_temp / n
print(avg_temp)

4.621875000000003
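The same average can be written more compactly with a list comprehension and _statistics.mean_ from the standard library. This is a sketch using only the first two readings copied from the output above, so the result differs from the full-day average.

```python
from statistics import mean

# First two readings copied from the output above; the real list is longer.
weather_sample = [{'the_temp': 4.535}, {'the_temp': 4.505}]

temps = [result['the_temp'] for result in weather_sample]
avg_temp = mean(temps) if temps else None  # guard against an empty list
print(round(avg_temp, 2))  # 4.52
```

The guard matters because both _mean()_ and the manual `count_temp / n` version raise an error when there are no readings at all.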


Great! Let's put all that together and search for the average temperatures in three cities. Note that, as the docs specify we should make only one request a minute, we are using _time.sleep(60)_ to make the script wait 60 seconds after each call. With two calls per city, the script will take at least 6 minutes to run. Make a coffee.

In [12]:
output = []

city_list = ["London", "New York", "Beijing"]
day = "2022/01/07"

for city in city_list:
    # get location
    r = location_generator(city)
    response_list = r.json()
    time.sleep(60)
    
    # get weather 
    w = weather_generator(response_list[0]["woeid"], day)
    weather_list = w.json()
    time.sleep(60)
    
    # get average temp
    count_temp = 0
    n = 0

    for result in weather_list:
        count_temp += result['the_temp']
        n += 1

    avg_temp = count_temp / n
    avg_temp = round(avg_temp, 2)
    
    print(f"The average temp in {city} on {day} was {avg_temp}")
    
    output.append({"City": city, "Date": day, "Avg Temp": avg_temp})

The average temp in London on 2022/01/07 was 6.05
The average temp in New York on 2022/01/07 was 2.61
The average temp in Beijing on 2022/01/07 was 2.45
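The loop above pauses after every request, including the final one. One possible refinement, sketched here with made-up names, is a small helper that pauses only between calls. The `sleep` parameter is there so the helper can be tried without actually waiting, and the stand-in lambdas replace real API calls so the sketch runs offline.

```python
import time

def throttled(calls, delay=60, sleep=time.sleep):
    """Run each zero-argument callable in turn, pausing `delay` seconds between them."""
    results = []
    for i, call in enumerate(calls):
        if i > 0:
            sleep(delay)  # no pause is needed before the first call
        results.append(call())
    return results

# Stand-in callables so the sketch runs without network access.
demo = throttled([lambda: "first", lambda: "second"], delay=0)
print(demo)  # ['first', 'second']
```

In the real script each callable might wrap a call such as `location_generator(city)`, so six requests would need five pauses rather than six.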


Now we have our data, we can move it into something more familiar like _pandas_:

In [13]:
cityfacts_df = pd.DataFrame(output) 
cityfacts_df.head()

|   | City | Date | Avg Temp |
|---|------|------|----------|
| 0 | London | 2022/01/07 | 6.05 |
| 1 | New York | 2022/01/07 | 2.61 |
| 2 | Beijing | 2022/01/07 | 2.45 |
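Because _output_ is a list of dictionaries, it is already JSON-shaped, which is convenient when the results need to be stored. A minimal sketch of a round trip through the _json_ library is below, with the values copied from the printed results above.

```python
import json

# Values copied from the printed results above.
output = [
    {"City": "London", "Date": "2022/01/07", "Avg Temp": 6.05},
    {"City": "New York", "Date": "2022/01/07", "Avg Temp": 2.61},
    {"City": "Beijing", "Date": "2022/01/07", "Avg Temp": 2.45},
]

text = json.dumps(output, indent=2)  # Python objects -> JSON text
restored = json.loads(text)          # JSON text -> Python objects
print(restored == output)            # True
```

The `text` string could be written to a file or passed to a database that accepts JSON documents.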


And that's it! Hopefully you can see that the use of _json_ makes the output much easier to work with. Remember this in the IMA!