# Data acquisition with APIs

---

## <font color='red'> Now you try: Astronaut API
    
1. Can you access the `status_code` attribute of the `astro_request` object? What does the result mean? 


2. Access the `people` element of `astro_json`. What type of data is the result?


3. Use your knowledge of list functions to find the total number of people on the International Space Station, **without using the `number` element of the dictionary**.


In [7]:
import requests

astro_url = 'http://api.open-notify.org/astros.json'
astro_request = requests.get(astro_url)

astro_request.status_code

200
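A status code of `200` means the request succeeded. As a minimal sketch, the standard-library `http.HTTPStatus` enum can translate a numeric code into its standard phrase. The `describe_status` helper is a hypothetical name introduced here for illustration; it is not part of `requests`.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable description of an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for codes it does not know
    # Codes in the 200-299 range indicate that the request succeeded.
    outcome = "success" if 200 <= code < 300 else "not a success"
    return f"{code} {status.phrase} ({outcome})"


print(describe_status(200))
print(describe_status(404))
```

In the notebook, the same helper could be called as `describe_status(astro_request.status_code)`.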

In [None]:
astro_json = astro_request.json()
astro_json

In [None]:
len(astro_json['people'])
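The steps above can also be tried without a network connection. The sketch below uses a hand-written dictionary that is assumed to have the same shape as the API response (a `number` element and a `people` list whose entries carry `name` and `craft` fields); the names are placeholders, not real data.

```python
# Hypothetical sample, assumed to mirror the shape of the astros.json response.
sample_astro_json = {
    "message": "success",
    "number": 3,
    "people": [
        {"name": "Person A", "craft": "ISS"},
        {"name": "Person B", "craft": "ISS"},
        {"name": "Person C", "craft": "Tiangong"},
    ],
}

people = sample_astro_json["people"]
print(type(people))  # the 'people' element is a list of dictionaries

# Count everyone in the list, without touching the 'number' element.
total_people = len(people)

# Count only the entries whose craft is the ISS (assumes a 'craft' field).
on_iss = [person for person in people if person["craft"] == "ISS"]

print(total_people, len(on_iss))
```

If the response lists people on more than one craft, filtering on `craft` gives a more accurate count for the space station than `len` alone.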

---
## <font color='red'> Now you try: Share price API
    
1. Make an API request to: `https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=MSFT&interval=5min&apikey=demo` 


2. Check the status code of the result to confirm the request has been successful.


3. Inspect the JSON visually.


4. We can see that the JSON consists of two top-level keys: `Meta Data`, which provides summary information about the data, and `Time Series (5min)`, which provides the actual share price information we're after. Use the correct key to retrieve `Time Series (5min)` only, discarding the `Meta Data` contained in the JSON.


5. Find the number of time points contained in the JSON.


6. Use the AlphaVantage documentation to figure out how to construct an API request (i.e. a URL) that will give you **daily** share prices for **Google**. **You'll need to sign up for a free API key and include this in your API request.**



In [8]:
msft_url = 'https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=MSFT&interval=5min&apikey=demo'

msft_request = requests.get(msft_url)

msft_json = msft_request.json()
msft_json

{'Meta Data': {'1. Information': 'Intraday (5min) open, high, low, close prices and volume',
  '2. Symbol': 'MSFT',
  '3. Last Refreshed': '2020-02-05 14:25:00',
  '4. Interval': '5min',
  '5. Output Size': 'Compact',
  '6. Time Zone': 'US/Eastern'},
 'Time Series (5min)': {'2020-02-05 14:25:00': {'1. open': '179.2173',
   '2. high': '179.2700',
   '3. low': '179.0450',
   '4. close': '179.2050',
   '5. volume': '219168'},
  '2020-02-05 14:20:00': {'1. open': '179.0550',
   '2. high': '179.2200',
   '3. low': '179.0550',
   '4. close': '179.2200',
   '5. volume': '182582'},
  '2020-02-05 14:15:00': {'1. open': '178.9500',
   '2. high': '179.0900',
   '3. low': '178.9150',
   '4. close': '179.0600',
   '5. volume': '149093'},
  '2020-02-05 14:10:00': {'1. open': '178.8700',
   '2. high': '179.0100',
   '3. low': '178.7800',
   '4. close': '178.9550',
   '5. volume': '282058'},
  '2020-02-05 14:05:00': {'1. open': '179.0156',
   '2. high': '179.0400',
   '3. low': '178.8600',
   '4. clos

In [9]:
msft_json['Time Series (5min)']

{'2020-02-05 14:25:00': {'1. open': '179.2173',
  '2. high': '179.2700',
  '3. low': '179.0450',
  '4. close': '179.2050',
  '5. volume': '219168'},
 '2020-02-05 14:20:00': {'1. open': '179.0550',
  '2. high': '179.2200',
  '3. low': '179.0550',
  '4. close': '179.2200',
  '5. volume': '182582'},
 '2020-02-05 14:15:00': {'1. open': '178.9500',
  '2. high': '179.0900',
  '3. low': '178.9150',
  '4. close': '179.0600',
  '5. volume': '149093'},
 '2020-02-05 14:10:00': {'1. open': '178.8700',
  '2. high': '179.0100',
  '3. low': '178.7800',
  '4. close': '178.9550',
  '5. volume': '282058'},
 '2020-02-05 14:05:00': {'1. open': '179.0156',
  '2. high': '179.0400',
  '3. low': '178.8600',
  '4. close': '178.8700',
  '5. volume': '175378'},
 '2020-02-05 14:00:00': {'1. open': '179.0310',
  '2. high': '179.1700',
  '3. low': '179.0050',
  '4. close': '179.0100',
  '5. volume': '151933'},
 '2020-02-05 13:55:00': {'1. open': '179.3518',
  '2. high': '179.3700',
  '3. low': '179.0300',
  '4. clo

In [10]:
len(msft_json['Time Series (5min)'])

100
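Note that every price in the JSON is a string, so it needs converting before any arithmetic. A small sketch using the first three time points copied from the output above; the variable names are illustrative only.

```python
# Three time points copied from the 'Time Series (5min)' output above.
sample_series = {
    "2020-02-05 14:25:00": {"1. open": "179.2173", "2. high": "179.2700",
                            "3. low": "179.0450", "4. close": "179.2050",
                            "5. volume": "219168"},
    "2020-02-05 14:20:00": {"1. open": "179.0550", "2. high": "179.2200",
                            "3. low": "179.0550", "4. close": "179.2200",
                            "5. volume": "182582"},
    "2020-02-05 14:15:00": {"1. open": "178.9500", "2. high": "179.0900",
                            "3. low": "178.9150", "4. close": "179.0600",
                            "5. volume": "149093"},
}

# Number of time points is the number of keys in the dictionary.
n_points = len(sample_series)

# Timestamps in this year-month-day format sort chronologically as strings.
latest = max(sample_series)

# Convert the closing prices from strings to floats.
closes = {ts: float(values["4. close"]) for ts, values in sample_series.items()}
highest_close = max(closes.values())

print(n_points, latest, highest_close)
```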

In [None]:
google_url = 'https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=GOOG&apikey=YOUR_API_KEY_HERE'

google_request = requests.get(google_url)

google_json = google_request.json()
google_json

In [None]:
google_json['Time Series (Daily)']

In [None]:
len(google_json['Time Series (Daily)'])
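Rather than typing the full URL by hand, the query string can be assembled from a dictionary of parameters. The sketch below uses the standard-library `urllib.parse.urlencode`; `build_query_url` is a hypothetical helper name, and the parameter names are the ones that appear in the URLs above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"


def build_query_url(function, symbol, apikey, **extra):
    """Assemble a query URL from its parameters (illustrative helper)."""
    # Dictionaries keep insertion order, so the URL reads the same as above.
    params = {"function": function, "symbol": symbol, **extra, "apikey": apikey}
    return BASE_URL + "?" + urlencode(params)


daily_url = build_query_url("TIME_SERIES_DAILY", "GOOG", "YOUR_API_KEY_HERE")
intraday_url = build_query_url("TIME_SERIES_INTRADAY", "MSFT", "demo", interval="5min")

print(daily_url)
print(intraday_url)
```

`requests.get` also accepts a `params` dictionary and builds the query string itself, so `requests.get(BASE_URL, params={...})` is an alternative to building the URL first.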

## <font color='red'> Now you try: Police API
    
Let's experiment with the Police UK API. Try to complete the following tasks:

1. Using the documentation for the Police API, find out how to construct an API request to retrieve details of **stop and searches** in **London** during **May 2019** (hint: depending on which **API endpoint** you decide to use, you'll either need to look up the latitude and longitude of London using Google Maps or specify that you're interested in the `metropolitan` police force).

2. Make this API request in your browser and inspect the results visually. What information is being returned?


3. Using the `len` function, determine the total number of stop and searches recorded for this period.


4. Construct a `for` loop to build one big JSON object that contains stop and search data for London for the whole of 2019. 


5. How many stop and search incidents happened in London during this period? 


In [None]:
latitude = '51.535142'
longitude = '-0.124971'
date = '2019-05'

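police_api_url = 'https://data.police.uk/api/stops-street?lat=' + latitude + '&lng=' + longitude + '&date=' + date

# Request the data and count the stop and searches returned for May 2019
police_json = requests.get(police_api_url).json()
len(police_json)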

In [None]:
latitude = '51.535142'
longitude = '-0.124971'

police_json = []

for month in ['01','02','03','04','05','06','07','08','09','10','11','12']:
    
    date = '2019-'+month
    police_api_url = 'https://data.police.uk/api/stops-street?lat=' + latitude + '&lng=' + longitude +'&date=' + date
    
    police_json = police_json + requests.get(police_api_url).json()

# Total number of stop and searches recorded across all twelve months
len(police_json)
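The accumulation pattern in this loop can be checked without calling the API. In the sketch below, the request is replaced by a stand-in function, `fake_fetch`, that returns two placeholder records per month; both `collect_year` and `fake_fetch` are hypothetical names, and the record contents are invented purely to make the example runnable.

```python
MONTHS = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']


def collect_year(fetch_month, year='2019'):
    """Combine the monthly lists returned by fetch_month into one list."""
    records = []
    for month in MONTHS:
        date = year + '-' + month
        # extend() appends each monthly record to the same list in place.
        records.extend(fetch_month(date))
    return records


def fake_fetch(date):
    """Stand-in for the API call: two placeholder records per month."""
    return [{'date': date, 'record': 1}, {'date': date, 'record': 2}]


all_records = collect_year(fake_fetch)
print(len(all_records))
```

To use the real API, `fake_fetch` could be swapped for a function that builds `police_api_url` for the given date and returns `requests.get(police_api_url).json()`.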
    

Writing out `['01','02','03','04','05','06','07','08','09','10','11','12']` by hand is clunky and easy to get wrong. There are other ways to generate a list of strings that we can then iterate over. Two more concise ways of generating this list of months are shown below. They both use the `zfill()` method.

The `zfill()` method can be called on a string to 'pad' it with leading zeroes until the whole string reaches a given length. 

For example, to represent the number `1` as `01`, I would first convert it to a string and then use `zfill()` with the input `2` to specify that I want Python to add zeroes in front of the string until its total length is 2.

In [13]:
str(1).zfill(2)

'01'

If I used `zfill(2)` on a string that already had a length of 2, the function would have no effect. For example:

In [15]:
str(12).zfill(2)

'12'

Similarly, to represent the number `1` as `001`, I would first convert it to a string and then use `zfill()` with the input `3` to specify that I want Python to add zeroes in front of the string until its total length is 3.

In [14]:
str(1).zfill(3)

'001'
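For comparison, Python's built-in string formatting can produce the same zero-padded results without converting to a string first. This is an alternative shown only as a sketch; the solutions in this notebook use `zfill()`.

```python
# zfill() pads an existing string with leading zeroes.
padded_zfill = str(7).zfill(2)

# The format specification '02d' pads an integer to width 2 with zeroes.
padded_fstring = f"{7:02d}"
padded_format = format(7, "03d")

# zfill() never shortens a string that is already longer than the width.
longer = "123".zfill(2)

# A leading minus sign stays in front of the padding.
negative = "-5".zfill(3)

print(padded_zfill, padded_fstring, padded_format, longer, negative)
```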

In [12]:
for month in range(1,13):
    
    date = '2019-' + str(month).zfill(2)
    print(date)

2019-01
2019-02
2019-03
2019-04
2019-05
2019-06
2019-07
2019-08
2019-09
2019-10
2019-11
2019-12


This next example is a more compact way of writing the `for` loop above. This way of looping over the elements in a list to create a new list is called a **list comprehension**. Don't worry too much about this at the moment. 

If you want to learn more about list comprehensions (they're a neat trick if you can get your head around them) try this guide: https://www.programiz.com/python-programming/list-comprehension

In [6]:
[str(i).zfill(2) for i in range(1,13)]

['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']
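To see that the list comprehension and an ordinary `for` loop produce the same result, the sketch below builds the list both ways and then uses a second comprehension to create the `2019-MM` date strings needed for the Police API requests. The variable names are illustrative.

```python
# Build the list of months with an explicit loop...
months_loop = []
for i in range(1, 13):
    months_loop.append(str(i).zfill(2))

# ...and with the equivalent list comprehension.
months_comprehension = [str(i).zfill(2) for i in range(1, 13)]

print(months_loop == months_comprehension)

# A comprehension can also transform an existing list into a new one.
dates = ['2019-' + month for month in months_comprehension]
print(dates[0], dates[-1])
```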

## <font color='red'> Now you try: Share price API
    
Let's convert our Google share price JSON into a `DataFrame` with Pandas.

1. Use the `pd.DataFrame()` function to convert the time series **only** in your Google share price JSON into a DataFrame. 


2. What goes wrong if you try to convert the entire JSON response (including the `Meta Data` part) into a `DataFrame`? (Try it out!)


3. What looks weird about our time series `DataFrame`? Figure out what the correct `pandas` function is to reformat the `DataFrame` so that each row corresponds to a single time interval, and the columns correspond to the opening, closing, high and low share prices. 

**This solution uses a demo API URL rather than one that requires an API key, because it's a VERY bad idea to share an API key with anyone else, let alone push it to GitHub!**
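One common way to keep a key out of shared code is to read it from an environment variable. A minimal sketch, assuming a variable named `ALPHAVANTAGE_API_KEY` (a hypothetical name chosen here, not something AlphaVantage prescribes) and falling back to the public `demo` key when it is not set:

```python
import os


def get_api_key(env_var="ALPHAVANTAGE_API_KEY", default="demo"):
    """Read the API key from the environment, or fall back to the demo key."""
    # env_var is an assumed name; set it in your shell, never in the notebook.
    return os.environ.get(env_var, default)


api_key = get_api_key()
url = ("https://www.alphavantage.co/query"
       "?function=TIME_SERIES_DAILY&symbol=MSFT&apikey=" + api_key)

# Avoid printing the URL in a shared notebook, since it contains the key.
print("Using demo key" if api_key == "demo" else "Using key from environment")
```

Because the key lives outside the notebook, the file can be pushed to GitHub without exposing it.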

In [17]:
import pandas as pd

msft_url = 'https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=MSFT&apikey=demo'
msft_json = requests.get(msft_url).json()

pd.DataFrame(msft_json['Time Series (Daily)']).transpose()


| | 1. open | 2. high | 3. low | 4. close | 5. volume |
|---|---|---|---|---|---|
| 2020-02-05 | 184.0300 | 184.2000 | 178.4101 | 179.2400 | 26137349 |
| 2020-02-04 | 177.1400 | 180.6400 | 176.3100 | 180.1200 | 36238402 |
| 2020-02-03 | 170.4300 | 174.5000 | 170.4000 | 174.3800 | 30149052 |
| 2020-01-31 | 172.2100 | 172.4000 | 169.5800 | 170.2300 | 36142690 |
| 2020-01-30 | 174.0500 | 174.0500 | 170.7900 | 172.7800 | 51597470 |
| ... | ... | ... | ... | ... | ... |
| 2019-09-19 | 140.3000 | 142.3700 | 140.0736 | 141.0700 | 36095413 |
| 2019-09-18 | 137.3600 | 138.6700 | 136.5299 | 138.5200 | 24473386 |
| 2019-09-17 | 136.9600 | 137.5200 | 136.4250 | 137.3900 | 17976285 |
| 2019-09-16 | 135.8300 | 136.7000 | 135.6600 | 136.3300 | 16731440 |
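The values in this `DataFrame` are still strings, because that is how they arrive in the JSON. The sketch below, which assumes `pandas` is installed and uses three rows copied from the output above, shows one possible clean-up: `pd.DataFrame.from_dict(..., orient='index')` as an alternative to `transpose()`, shorter column names, numeric values and a date index. The shortened column names are a choice made here for illustration.

```python
import pandas as pd

# Three rows copied from the daily MSFT output above.
sample_daily = {
    "2020-02-05": {"1. open": "184.0300", "2. high": "184.2000",
                   "3. low": "178.4101", "4. close": "179.2400",
                   "5. volume": "26137349"},
    "2020-02-04": {"1. open": "177.1400", "2. high": "180.6400",
                   "3. low": "176.3100", "4. close": "180.1200",
                   "5. volume": "36238402"},
    "2020-02-03": {"1. open": "170.4300", "2. high": "174.5000",
                   "3. low": "170.4000", "4. close": "174.3800",
                   "5. volume": "30149052"},
}

# orient='index' makes each outer key (a date) a row, so no transpose is needed.
df = pd.DataFrame.from_dict(sample_daily, orient="index")

# Drop the '1. ', '2. ', ... prefixes from the column names.
df.columns = [name.split(". ", 1)[1] for name in df.columns]

# Convert the string values to numbers and the index to real dates.
df = df.astype(float)
df.index = pd.to_datetime(df.index)
df = df.sort_index()  # oldest date first

print(df)
```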
