# Using APIs for data

An API is one way of getting data from a web resource. Typically you get data by forming a URL - the URL is basically your 'question' (**query** or **request**), and the address you send it to (the **endpoint**) delivers the 'response' with the data, often in JSON format.

The [postcodes API](http://api.postcodes.io/), for example, can be queried by putting a postcode (without spaces) at the *end* of this URL:

`http://api.postcodes.io/postcodes/`

To ask about the postcode B42 2SU, then, you would add it to the end to form the URL:

`http://api.postcodes.io/postcodes/b422su`

If you go to that URL you will get a block of text in **JSON** format - this is the data for that postcode. To make it easier to read, use the browser extension [JSONView](https://chrome.google.com/webstore/detail/jsonview/chklaanhfefbnpoihckbnefhakgolnmc?hl=en).
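Forming that URL is just string handling, so it can be sketched in a few lines of Python. The function name `postcode_url` and the constant `BASE_URL` are made up for this illustration; they are not part of the postcodes API.

```python
# The 'base' of the query, taken from the URL shown above
BASE_URL = "http://api.postcodes.io/postcodes/"

def postcode_url(postcode, base=BASE_URL):
    """Build the query URL for one postcode (hypothetical helper)."""
    # Remove spaces and lowercase, to match the example URL above
    cleaned = postcode.replace(" ", "").lower()
    return base + cleaned

print(postcode_url("B42 2SU"))
# prints http://api.postcodes.io/postcodes/b422su
```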



## Importing `pandas` to fetch the data from the API

To import JSON data we can use the `pandas` library. It can load data directly from a URL, which we will generate to query the API for JSON data.

In [1]:
#import the pandas library and call it 'pd' for the rest of the notebook
import pandas as pd

## Reading data from an online source

The `.read_json()` function from the `pandas` library can be used to import JSON from an online source - you just need to use the URL of the file.

Below we import JSON [from data.police.uk](https://data.police.uk/docs/method/forces/)

In [None]:
policedata = pd.read_json("https://data.police.uk/api/forces")
print(policedata.head())

                  id                            name
0  avon-and-somerset  Avon and Somerset Constabulary
1       bedfordshire             Bedfordshire Police
2     cambridgeshire     Cambridgeshire Constabulary
3           cheshire           Cheshire Constabulary
4     city-of-london           City of London Police


This particular data 'request' is very simple: the police API [describes it](https://data.police.uk/docs/method/forces/) as:

> "A list of all the police forces available via the API except the British Transport Police, which is excluded from the list returned. Unique force identifiers obtained here are used in requests for force-specific data via other methods."

The request doesn't require any particularly specific information - just one URL fetches all the data. That's quite unusual, however - you'll notice most APIs require information about the data you want.

## Querying the postcodes API

Back to the postcodes API we introduced at the beginning. 

Let's store a URL like the one we mentioned (this time for the postcode B4 7AP) in an object in Python - we'll call it `url`:

In [2]:
url = "http://api.postcodes.io/postcodes/b47ap"

Note that this is only a string of characters - it is *not* the contents that can be found *at* that URL. But now that we've stored that URL address, we are going to grab some data from it.

In [3]:
json = pd.read_json(url)
print(json)

                            status                                             result
admin_county                   200                                               None
admin_district                 200                                         Birmingham
admin_ward                     200                                           Nechells
ccg                            200                        NHS Birmingham and Solihull
ced                            200                                               None
codes                          200  {'admin_district': 'E08000025', 'admin_county'...
country                        200                                            England
eastings                       200                                             407842
european_electoral_region      200                                      West Midlands
incode                         200                                                7AP
latitude                       200                    

## Drilling into the JSON

It's a good idea to have the URL open in a browser at the same time so you can see the structure and work out how to access the bit you're after. Again, you should use Chrome or Firefox with the extension [JSONView](https://chrome.google.com/webstore/detail/jsonview/chklaanhfefbnpoihckbnefhakgolnmc?hl=en) installed, as this makes it a lot easier to understand. (*Tip: hover over any element to see the 'path' to that element in the bottom left corner of the browser*).

The JSON itself has a tree-like structure with many different branches. Some parts are actually branches-of-branches. 

`pandas` handles those branches-of-branches by storing them as dictionaries.

If we print the contents of that object, you can see those by looking for curly brackets: the `codes` branch, for example, contains the data `{'admin_district': 'E08000025', 'admin_county'...}`

Let's try drilling down into the 'codes' part of the data frame to look further:


In [None]:
json['codes']

KeyError: 'codes'

We get an error, specifically a `KeyError` for 'codes', meaning that it cannot find a key with that name. Why?

If you check the page of JSON we grabbed this data from, you will see that actually the first two branches of the JSON data are 'status' and 'result' - the 'codes' branch doesn't come until *within* the 'result' branch.

What has happened is that `pandas` has treated those first two branches as the two columns of data. That's why our data has two problems: first, a column full of `200` which we don't need; and secondly, a data structure which is not ideal: what we would like to be column headings are actually at the start of each row.
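To see that nesting without `pandas` in the way, here is a minimal sketch that parses a hand-trimmed copy of the response with Python's built-in `json` module. The `sample` string is an assumption for illustration: it keeps only a few of the fields shown above. The module is imported as `jsonlib` because this notebook already uses `json` as a variable name.

```python
# Imported under another name so it doesn't clash with our 'json' variable
import json as jsonlib

# A hand-trimmed copy of the response, for illustration only
sample = """
{
  "status": 200,
  "result": {
    "admin_district": "Birmingham",
    "codes": {"admin_district": "E08000025", "ccg": "E38000220"}
  }
}
"""

parsed = jsonlib.loads(sample)

# The top level only has two branches...
print(list(parsed.keys()))          # prints ['status', 'result']
# ...so 'codes' is not found there
print("codes" in parsed)            # prints False
# It sits one level down, inside 'result'
print("codes" in parsed["result"])  # prints True
```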

Let's try to drill down into the 'result' branch instead:

In [None]:
print(json['result'])

admin_county                                                               None
admin_district                                                       Birmingham
admin_ward                                                             Nechells
ccg                                                 NHS Birmingham and Solihull
ced                                                                        None
codes                         {'admin_district': 'E08000025', 'admin_county'...
country                                                                 England
eastings                                                                 407842
european_electoral_region                                         West Midlands
incode                                                                      7AP
latitude                                                                52.4828
longitude                                                              -1.88596
lsoa                                    

Then let's try to go from there to 'codes':

In [None]:
json['result']['codes']

{'admin_county': 'E99999999',
 'admin_district': 'E08000025',
 'admin_ward': 'E05011155',
 'ccg': 'E38000220',
 'ccg_id': '15E',
 'ced': 'E99999999',
 'lau2': 'E08000025',
 'lsoa': 'E01033561',
 'msoa': 'E02001876',
 'nuts': 'TLG31',
 'parish': 'E43000250',
 'parliamentary_constituency': 'E14000564'}

And further still into the CCG code stored in 'ccg':

In [None]:
json['result']['codes']['ccg']

'E38000220'

If we wanted to grab the CCG code for a bunch of postcodes, this is how we might do it:

* Loop through the postcodes
* Generate a URL by adding that postcode to the end of the 'base' API query
* Fetch the JSON generated at that URL
* Drill down into the 'result > codes > ccg' branch of that JSON to get the data we need
* Add it to a data frame alongside the postcode
* Repeat!
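Those steps could be sketched roughly as follows. This is one possible approach rather than the only one: the names `fetch_json`, `ccg_codes` and `fake_fetch` are invented for this example, and the sketch uses Python's built-in `urllib` and `json` modules instead of `pd.read_json()` so that each response arrives as a plain dictionary. The demonstration at the end uses a stand-in fetcher so that it runs without a network connection.

```python
# Imported under another name so it doesn't clash with our 'json' variable
import json as jsonlib
from urllib.request import urlopen

BASE_URL = "http://api.postcodes.io/postcodes/"

def fetch_json(url):
    """Download a URL and parse the response as JSON."""
    with urlopen(url) as response:
        return jsonlib.load(response)

def ccg_codes(postcodes, fetch=fetch_json):
    """Return one row (a dictionary) per postcode with its CCG code."""
    rows = []
    # Loop through the postcodes
    for postcode in postcodes:
        # Generate a URL by adding the postcode to the 'base' query
        url = BASE_URL + postcode.replace(" ", "").lower()
        # Fetch the JSON at that URL
        data = fetch(url)
        # Drill down into result > codes > ccg
        ccg = data["result"]["codes"]["ccg"]
        # Store it alongside the postcode
        rows.append({"postcode": postcode, "ccg": ccg})
    return rows

def fake_fetch(url):
    """Stand-in for the real API: always returns the same trimmed response."""
    return {"status": 200, "result": {"codes": {"ccg": "E38000220"}}}

rows = ccg_codes(["B4 7AP"], fetch=fake_fetch)
print(rows)
# prints [{'postcode': 'B4 7AP', 'ccg': 'E38000220'}]
```

Leaving out the `fetch=fake_fetch` argument would query the live API instead, and `pd.DataFrame(rows)` would turn the resulting list into a data frame. A real run would probably also need to deal with postcodes that the API does not recognise.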

## Forming a 'request' for the police data API

The [Police API documentation](https://data.police.uk/docs/) has a number of 'methods' that you can use to request data from their API. 

The '[crimes at location](https://data.police.uk/docs/method/crimes-at-location/)' method allows you to ask for data on crimes based on the location ID, or a latitude and longitude. An example is given for data from February 2017:

`https://data.police.uk/api/crimes-at-location?date=2017-02&lat=52.629729&lng=-1.131592`

However, that date is now so long ago that the URL doesn't actually work. Instead, change the year to 2021 to see a working example:

`https://data.police.uk/api/crimes-at-location?date=2021-02&lat=52.629729&lng=-1.131592`
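The part of that URL after the `?` is a set of named parameters (`date`, `lat` and `lng`) joined by `&`. As a small sketch, the same URL can be assembled from its parts with `urlencode()` from Python's standard library, which makes it easier to swap in a different month or location. The variable names here are just for illustration.

```python
from urllib.parse import urlencode

# The method's address, before any parameters are added
endpoint = "https://data.police.uk/api/crimes-at-location"

# The parameters from the example URL above
params = {"date": "2021-02", "lat": "52.629729", "lng": "-1.131592"}

# urlencode() joins the parameters as name=value pairs separated by &
crime_url = endpoint + "?" + urlencode(params)
print(crime_url)
# prints https://data.police.uk/api/crimes-at-location?date=2021-02&lat=52.629729&lng=-1.131592
```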

Let's fetch that data.

In [None]:
crimedata = pd.read_json("https://data.police.uk/api/crimes-at-location?date=2021-02&lat=52.629729&lng=-1.131592")
print(crimedata)

        category location_type  ... location_subtype    month
0          drugs         Force  ...                   2021-02
1  violent-crime         Force  ...                   2021-02

[2 rows x 9 columns]


Now let's try to drill down into it to see the 'location' branch, because we know from the webpage at that URL it contains further sub-branches.

In [None]:
print(crimedata['location'])

0    {'latitude': '52.629909', 'street': {'id': 883...
1    {'latitude': '52.629909', 'street': {'id': 883...
Name: location, dtype: object


Then we try to drill down further...

In [None]:
print(crimedata['location']['street'])

KeyError: 'street'

We get an error here - a `KeyError`. Why? Look at the output of `print(crimedata['location'])` - it was two rows, one starting with 0 and another with 1 before the keys appeared. Perhaps we should try an index instead?

In [None]:
print(crimedata['location'][0])

{'latitude': '52.629909', 'street': {'id': 883345, 'name': 'On or near Marquis Street'}, 'longitude': '-1.132073'}


That seems to work. Now instead of seeing two rows of data we just see the first one, and there are no indices at the start of each line.

Can we drill down further into that 'street' branch *for that row* now?

In [None]:
print(crimedata['location'][0]['street'])

{'id': 883345, 'name': 'On or near Marquis Street'}


Yes. And drill down once more into the final level of data?

In [None]:
print(crimedata['location'][0]['street']['name'])

On or near Marquis Street
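To get the street name for every row rather than just row 0, the same drilling can be repeated in a loop. Below is a minimal sketch using a hypothetical helper called `street_names`. The `sample_locations` list is made up for the demonstration: its first entry copies the row printed above, while the second is an invented placeholder.

```python
def street_names(locations):
    """Drill into location > street > name for each row (hypothetical helper)."""
    return [location["street"]["name"] for location in locations]

# Stand-in data: first entry copied from the output above,
# second entry invented purely for illustration
sample_locations = [
    {"latitude": "52.629909",
     "street": {"id": 883345, "name": "On or near Marquis Street"},
     "longitude": "-1.132073"},
    {"latitude": "0.0",
     "street": {"id": 0, "name": "On or near Example Street"},
     "longitude": "0.0"},
]

print(street_names(sample_locations))
# prints ['On or near Marquis Street', 'On or near Example Street']
```

Because looping over a `pandas` column gives you its values one at a time, `street_names(crimedata['location'])` should work the same way on the real data.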
