# Data acquisition with APIs

---

<a id="learning-objectives"></a>
## Learning Objectives
*After completing this notebook, you will be able to:*

- Make API requests with the `requests` library
- Work with JSON data in a similar way to dictionaries
- Convert JSON data to a `pandas` dataframe

## Contents:
* [Making API requests](#apis)
* [Converting JSON to a pandas dataframe](#pandas)

<a id="apis"></a>

# <font color='blue'> Making API requests 

Let's start by importing the **requests** library, which we'll be using to make API requests.

In [None]:
import requests

Let's make a request to the astronauts API and view the resulting JSON.

In [None]:
astro_request = requests.get('http://api.open-notify.org/astros.json')
astro_json = astro_request.json()
print(astro_json)
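The `.json()` call decodes the response body into ordinary Python objects, so the result can be explored with dictionary and list syntax. The sketch below illustrates this offline with the standard-library `json` module. The `sample_text` payload is made up for illustration and is not output from the astronauts API.

```python
import json

# Hypothetical JSON text, invented for illustration (not a real API response)
sample_text = '{"message": "success", "count": 2, "items": [{"id": 1}, {"id": 2}]}'

# json.loads turns JSON text into nested Python dictionaries and lists
sample_json = json.loads(sample_text)

print(type(sample_json))              # the top level is a dictionary
print(sample_json["message"])         # look up a value by its key
print(sample_json["items"][0]["id"])  # keys and list positions can be chained
```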

## <font color='red'> Now you try: Astronaut API
    
1. Can you access the `status_code` attribute of the `astro_request` object? What does the result mean?


2. Access the `people` element of `astro_json`. What type of data is the result?


3. Use your knowledge of list functions to find the total number of people on the International Space Station, **without using the `number` element of the dictionary**.


---
## <font color='red'> Now you try: Share price API
    
1. Make an API request to: `https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=MSFT&interval=5min&apikey=demo` 


2. Check the status code of the result to confirm the request has been successful.


3. Inspect the JSON visually.


4. We can see that the JSON consists of two top-level keys: `Meta Data`, which provides summary information about the data, and `Time Series (5min)`, which provides the actual share price information we're after. Use the correct key to retrieve `Time Series (5min)` only, discarding the `Meta Data` contained in the JSON.


5. Find the number of time points contained in the JSON.


6. Use the AlphaVantage documentation to figure out how to construct an API request (i.e. a URL) that will give you **daily** share prices for **Google**. **You'll need to sign up for a free API key and include this in your API request.**
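An API request URL like the ones in this exercise is a base address followed by a query string of `key=value` pairs. As a rough sketch, the standard-library `urllib.parse` module can split the demo URL from step 1 into those parts. The variable names here are illustrative only.

```python
from urllib.parse import urlsplit, parse_qs

# The demo URL from step 1, split over two lines for readability
demo_url = ("https://www.alphavantage.co/query"
            "?function=TIME_SERIES_INTRADAY&symbol=MSFT&interval=5min&apikey=demo")

parts = urlsplit(demo_url)

# Everything before the "?" is the base address of the API
base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"

# parse_qs returns a list of values per key; each key appears once here
query = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(base_url)
for key, value in query.items():
    print(key, "=", value)
```

Changing a request then amounts to changing one or more of these values.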



## <font color='red'> Now you try: Police API
    
Let's experiment with the Police UK API. Try to complete the following tasks:

1. Using the documentation for the Police API, find out how to construct an API request to retrieve details of **stop and searches** in **London** during **May 2019** (hint: depending on which **API endpoint** you decide to use, you'll either need to look up the latitude and longitude of London using Google Maps or specify that you're interested in the `metropolitan` police force)

2. Make this API request in your browser and inspect the results visually. What information is being returned?


3. Using the `len` function, determine the total number of stop and searches recorded for this period.


4. Construct a `for` loop to build one big JSON object that contains stop and search data for London for the whole of 2019. 


5. How many stop and search incidents happened in London during this period? 


<a id="pandas"></a>

# <font color='blue'> Introducing Pandas

JSON itself isn't very intuitive to work with; a better way of viewing and manipulating this data would be to use a table format. 

The Python library **pandas** is well suited to data manipulation, cleaning and exploratory analysis. Let's try it out:

In [None]:
import pandas as pd

The main data structure in the pandas library is the `DataFrame`; think of this as the Python equivalent of a table in Excel.

It's very straightforward to convert JSON into a `DataFrame` using `pd.DataFrame`. First, let's again request stop and search data in London for May 2019:

In [None]:
latitude = '51.535142'
longitude = '-0.124971'
date = '2019-05'
police_api_url = f'https://data.police.uk/api/stops-street?lat={latitude}&lng={longitude}&date={date}'
police_json = requests.get(police_api_url).json()
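As an alternative to assembling the query string by hand, `requests` can build it from a dictionary passed as `params`. The sketch below prepares the request without sending it, just to show the resulting URL. The name `police_params` is an illustrative choice, and the values are the ones used in the cell above.

```python
import requests

# Same endpoint and values as the cell above, expressed as a params dictionary
police_params = {"lat": "51.535142", "lng": "-0.124971", "date": "2019-05"}

# Preparing a request builds the final URL without sending anything over the network
prepared = requests.Request(
    "GET", "https://data.police.uk/api/stops-street", params=police_params
).prepare()

print(prepared.url)
```

`requests.get` accepts the same `params` argument, so passing `params=police_params` to it should send an equivalent request.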

Then in one single step, we convert our JSON into a `DataFrame` using the `pd.DataFrame` function.

In [None]:
police_dataframe = pd.DataFrame(police_json)

In [None]:
type(police_dataframe)
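To see what the conversion does, it can help to try it on a tiny list of dictionaries. In this sketch the `records` list is made up for illustration (it is not police data): each dictionary becomes a row, and the dictionary keys become column names.

```python
import pandas as pd

# Hypothetical records, invented for illustration (not real API data)
records = [
    {"name": "Ana", "city": "Leeds", "score": 7},
    {"name": "Ben", "city": "York", "score": 9},
]

# One dictionary per row; keys become the column names
records_dataframe = pd.DataFrame(records)

print(records_dataframe)
print(type(records_dataframe))
```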

We can preview the first 5 rows of our `DataFrame` using the `head()` method.

In [None]:
police_dataframe.head()

We can preview the first `n` rows by passing a parameter to the `head()` method.

In [None]:
police_dataframe.head(10)

We can also use the `tail()` method to view the last rows of a `DataFrame`.

In [None]:
police_dataframe.tail()
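The behaviour of `head()` and `tail()` is easy to check on a small, predictable table. The sketch below uses a made-up eight-row `DataFrame` (`numbers_dataframe` is an illustrative name) rather than the police data.

```python
import pandas as pd

# A made-up table with eight rows, numbered 0 to 7
numbers_dataframe = pd.DataFrame({"n": range(8)})

first_five = numbers_dataframe.head()    # no argument: the first 5 rows
first_three = numbers_dataframe.head(3)  # the first 3 rows
last_two = numbers_dataframe.tail(2)     # the last 2 rows

print(len(first_five), len(first_three), len(last_two))
print(last_two)
```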

Notice that there's an **index** on the left-hand side. This isn't part of the underlying data; Pandas adds it so that we can access rows by their position in the DataFrame, just as we did with lists.

Column names are automatically detected and formatted by Pandas.
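The comparison with lists can be made concrete. In this sketch, which uses a made-up `fruit_dataframe`, the `iloc` accessor selects a row by position in the same way that square brackets select a list element.

```python
import pandas as pd

fruit_list = ["apple", "banana", "cherry"]

# A made-up one-column table built from the same values
fruit_dataframe = pd.DataFrame({"fruit": fruit_list})

print(fruit_list[1])                     # second element of the list
print(fruit_dataframe.iloc[1])           # second row of the DataFrame
print(fruit_dataframe.iloc[1]["fruit"])  # a single value from that row
print(list(fruit_dataframe.index))       # the default index counts from 0
```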

We can easily access the column names and the index of a `DataFrame`.

In [None]:
police_dataframe.columns

In [None]:
police_dataframe.index

We can also access its dimensions, in `(row,column)` format.

In [None]:
police_dataframe.shape
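Because `shape` is a tuple, it can be unpacked into separate row and column counts. A minimal sketch on a made-up table (`scores_dataframe` is an illustrative name):

```python
import pandas as pd

# A made-up table with three rows and two columns
scores_dataframe = pd.DataFrame({"name": ["Ana", "Ben", "Cy"], "score": [7, 9, 8]})

# shape is a (rows, columns) tuple, so it unpacks into two variables
n_rows, n_columns = scores_dataframe.shape

print(n_rows, n_columns)
print(list(scores_dataframe.columns))   # column names as a plain list
print(len(scores_dataframe) == n_rows)  # len() counts rows
```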

## <font color='red'> Now you try: Share price API
    
Let's convert our Google share price JSON into a `DataFrame` with Pandas.

1. Use the `pd.DataFrame()` function to convert the time series **only** in your Google share price JSON into a DataFrame. 


2. What goes wrong if you try to convert the entire JSON response (including the `Meta Data` part) into a `DataFrame`? (Try it out!)


3. What looks weird about our time series `DataFrame`? Figure out what the correct `pandas` function is to reformat the `DataFrame` so that each row corresponds to a single time interval, and the columns correspond to the opening, closing, high and low share prices. 
