# Introduction to JSON 

# Using APIs and JSON Data

## Objectives
You will be able to:
* Access and manipulate data inside a JSON file
* Pull data from an API

## Agenda

* Review JSON schemas
* Introduce APIs
* Walk through how to make an API request
* Practice making API requests and parsing the data


### What is the difference between JSON and a Python dictionary?

https://realpython.com/python-json/
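
One way to see the difference: JSON is a text format, while a dictionary is an in-memory Python object. The short sketch below serializes a dictionary to a JSON string and parses it back. The `record` dictionary is made up for illustration.

```python
import json

# A made-up Python dictionary, used only for illustration.
record = {"name": "ISS", "crew": 3, "active": True, "notes": None, "tags": ("space", "station")}

# json.dumps serializes the dictionary to JSON text, which is a plain Python str.
as_json = json.dumps(record)
print(type(as_json))
print(as_json)

# json.loads parses JSON text back into Python objects.
restored = json.loads(as_json)
print(type(restored))

# The round trip is not always lossless: JSON has no tuple type,
# so the tuple comes back as a list.
print(restored["tags"])
```

Note how `True` and `None` appear as `true` and `null` in the JSON text, and are converted back when the text is parsed.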

## Starting Off

Run the cell of code below, which opens a JSON file and loads it into Python. Investigate the resulting `data` variable and learn all you can about the object.


In [1]:
import json

# Open the file with a context manager so it is closed automatically
with open('output.json') as f:
    data = json.load(f)

In [2]:
type(data)

dict

In [3]:
data.keys()

dict_keys(['albums'])

In [4]:
data['albums'].keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

In [5]:
data['albums']['items']

[{'album_type': 'single',
  'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/2RdwBSPQiwcmiDo9kixcl8'},
    'href': 'https://api.spotify.com/v1/artists/2RdwBSPQiwcmiDo9kixcl8',
    'id': '2RdwBSPQiwcmiDo9kixcl8',
    'name': 'Pharrell Williams',
    'type': 'artist',
    'uri': 'spotify:artist:2RdwBSPQiwcmiDo9kixcl8'}],
  'available_markets': ['AD',
   'AR',
   'AT',
   'AU',
   'BE',
   'BG',
   'BO',
   'BR',
   'CA',
   'CH',
   'CL',
   'CO',
   'CR',
   'CY',
   'CZ',
   'DE',
   'DK',
   'DO',
   'EC',
   'EE',
   'ES',
   'FI',
   'FR',
   'GB',
   'GR',
   'GT',
   'HK',
   'HN',
   'HU',
   'ID',
   'IE',
   'IS',
   'IT',
   'JP',
   'LI',
   'LT',
   'LU',
   'LV',
   'MC',
   'MT',
   'MX',
   'MY',
   'NI',
   'NL',
   'NO',
   'NZ',
   'PA',
   'PE',
   'PH',
   'PL',
   'PT',
   'PY',
   'SE',
   'SG',
   'SK',
   'SV',
   'TR',
   'TW',
   'US',
   'UY'],
  'external_urls': {'spotify': 'https://open.spotify.com/album/5ZX4m5aVSmWQ5iHAPQpT71'},
  

In [6]:
new_dict = {'a': 1, 'b':2, }
new_dict

{'a': 1, 'b': 2}

In [7]:
json.dumps(new_dict)

'{"a": 1, "b": 2}'

## Loading the JSON file

As before, we begin by importing the `json` package, opening a file with Python's built-in `open` function, and then loading that data in.

In [8]:
import json

# Open the file with a context manager so it is closed automatically
with open('output.json') as f:
    data = json.load(f)

In [9]:
json.dumps({'a': None})

'{"a": null}'

## Exploring JSON Schemas  

Recall that JSON files have a nested structure. The most granular level of raw data will be individual numbers (float/int), strings, booleans, and null values. These in turn will be stored in the equivalent of Python lists and dictionaries (arrays and objects in JSON terms). Because these can be nested inside one another, we'll start exploring by checking the type of our root object, and then map out the hierarchy of the JSON file.

In [10]:
type(data)

dict

In [11]:
data

{'albums': {'href': 'https://api.spotify.com/v1/browse/new-releases?country=SE&offset=0&limit=20',
  'items': [{'album_type': 'single',
    'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/2RdwBSPQiwcmiDo9kixcl8'},
      'href': 'https://api.spotify.com/v1/artists/2RdwBSPQiwcmiDo9kixcl8',
      'id': '2RdwBSPQiwcmiDo9kixcl8',
      'name': 'Pharrell Williams',
      'type': 'artist',
      'uri': 'spotify:artist:2RdwBSPQiwcmiDo9kixcl8'}],
    'available_markets': ['AD',
     'AR',
     'AT',
     'AU',
     'BE',
     'BG',
     'BO',
     'BR',
     'CA',
     'CH',
     'CL',
     'CO',
     'CR',
     'CY',
     'CZ',
     'DE',
     'DK',
     'DO',
     'EC',
     'EE',
     'ES',
     'FI',
     'FR',
     'GB',
     'GR',
     'GT',
     'HK',
     'HN',
     'HU',
     'ID',
     'IE',
     'IS',
     'IT',
     'JP',
     'LI',
     'LT',
     'LU',
     'LV',
     'MC',
     'MT',
     'MX',
     'MY',
     'NI',
     'NL',
     'NO',
     'NZ',
    

As you can see, in this case, the first level of the hierarchy is a dictionary. Let's explore what keys are within this:

In [92]:
data.keys()

dict_keys(['albums'])

In this case, there is only a single key, 'albums', so we'll continue on down the pathway exploring and mapping out the hierarchy. Once again, let's start by checking the type of this nested data structure.

In [14]:
type(data['albums'])

dict

Another dictionary! So thus far, we have a dictionary within a dictionary. Once again, let's investigate what's within this dictionary. (JSON calls the equivalent of Python dictionaries *objects*.)

In [15]:
data['albums'].keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

At this point, things are starting to look something like this: 
<img src="json_diagram1.JPG" width=550>

At this point, if we were to continue checking individual data types, we have a lot to go through. To simplify this, let's use a for loop:

In [16]:
for key in data['albums'].keys():
    print(key, type(data['albums'][key]))

href <class 'str'>
items <class 'list'>
limit <class 'int'>
next <class 'str'>
offset <class 'int'>
previous <class 'NoneType'>
total <class 'int'>


In [40]:
json.dumps({1: 2})

'{"1": 2}'

In [94]:
json.loads('{"a":1, 1: 2}')

JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 9 (char 8)
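
The two cells above show one difference between the formats: JSON object keys must be strings. `json.dumps` converts the integer key `1` to the string `"1"`, while `json.loads` rejects JSON text that contains an unquoted key. A small sketch of what this means for a round trip:

```python
import json

original = {1: 2}

# Serializing converts the integer key to a string key.
text = json.dumps(original)
print(text)

# Parsing the text back yields a string key, so the round trip changes the dictionary.
round_tripped = json.loads(text)
print(round_tripped)
print(round_tripped == original)

# JSON text with an unquoted key is invalid and raises JSONDecodeError.
try:
    json.loads('{"a": 1, 1: 2}')
except json.JSONDecodeError as err:
    print("Invalid JSON:", err.msg)
```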

Adding this to our diagram we now have something like this:
<img src="json_diagram2.JPG" width=550>

Normally, you may not draw out the full diagram as done here, but it's a useful picture to have in mind, and for complex schemas it can be worth mapping out. At this point, you probably also have a good idea of the general structure of the JSON file. However, there is still the list of items, which we could investigate further.
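
To look inside the list of items without printing everything, one option is a small recursive helper that prints each key with its type. This is a sketch rather than part of the original lesson: `describe` is a hypothetical helper, and `sample` is a trimmed stand-in that reuses a few fields from the output shown earlier.

```python
def describe(obj, indent=0, max_items=1):
    """Print an outline of nested dictionaries and lists, with the type of each value."""
    pad = " " * indent
    if isinstance(obj, dict):
        for key, value in obj.items():
            print(f"{pad}{key}: {type(value).__name__}")
            describe(value, indent + 2, max_items)
    elif isinstance(obj, list):
        # Only look at the first few elements; items in a list usually share a structure.
        for element in obj[:max_items]:
            print(f"{pad}[element]: {type(element).__name__}")
            describe(element, indent + 2, max_items)


# Trimmed stand-in for the loaded file, reusing a few values from the output above.
sample = {
    "albums": {
        "items": [
            {
                "album_type": "single",
                "artists": [{"name": "Pharrell Williams", "type": "artist"}],
                "available_markets": ["AD", "AR", "AT"],
            }
        ],
        "limit": 20,
        "previous": None,
    }
}

describe(sample)

# Once the structure is known, a value can be reached by chaining keys and indexes.
first_artist = sample["albums"]["items"][0]["artists"][0]["name"]
print(first_artist)
```

With the real file loaded, calling `describe(data)` should print the same kind of outline for the full hierarchy.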

## What is an API?

**Application Programming Interfaces**, or APIs, are commonly used to retrieve data from remote websites. Sites like Reddit, Twitter, and Facebook all offer certain data through their APIs.

To use an API, you make a request to a remote web server, and retrieve the data you need.

Python ships with built-in modules to handle these requests (`urllib` and `urllib2` in Python 2, combined into the `urllib` package in Python 3), but they can be confusing to use and their documentation is not always clear.

To make things simpler, most developers prefer an easy-to-use third-party library known as `Requests` instead of urllib/urllib2. With this library, you can access content like web page headers, form data, files, and parameters via simple Python commands. It also allows you to access the response data in a simple way.

![](logo.png)

Below is how you would install and import the requests library before making any requests. 
```python
# Uncomment and install requests if you don't have it already
# !pip install requests

# Import requests into the working environment
# import requests
```

In [12]:
pip install requests

Note: you may need to restart the kernel to use updated packages.


In [13]:
import requests


## The `.get()` Method

Now that we have the requests library ready in our working environment, we can start making requests using the `.get()` method. We'll use a simple GET request to retrieve information from the OpenNotify API.




OpenNotify has several API endpoints. An endpoint is a server route that is used to retrieve a particular kind of data from the API. For example, the /comments endpoint on the Reddit API might retrieve information about comments, whereas the /users endpoint might retrieve data about users. To access an endpoint, you add it to the base URL of the API.
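
As a small illustration of adding an endpoint to a base URL, the sketch below uses `urljoin` from the standard library. The endpoint names are the OpenNotify ones used in this lesson; the variable names are only for illustration.

```python
from urllib.parse import urljoin

base_url = "http://api.open-notify.org/"

# Endpoint names used in this lesson.
endpoints = ["iss-now.json", "astros.json", "iss-pass.json"]

# urljoin appends each endpoint path to the base URL.
full_urls = [urljoin(base_url, endpoint) for endpoint in endpoints]

for url in full_urls:
    print(url)
```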



In [14]:
# Make a get request to get the latest position of the international space station from the opennotify api.
response = requests.get("http://api.open-notify.org/iss-now.json")

# When is the ISS overhead?
#response = requests.get("http://api.open-notify.org/iss-pass.json?lat=41.4984174&lon=-81.6937287")

# Print the status code of the response.
print(response.status_code)

200


In [15]:
response_dict = response.json()

In [16]:
response_dict

{'iss_position': {'longitude': '71.5838', 'latitude': '22.4829'},
 'timestamp': 1563205057,
 'message': 'success'}

In [17]:
response_dict.keys()

dict_keys(['iss_position', 'timestamp', 'message'])
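
Because the parsed response is an ordinary dictionary, nested values can be read by chaining keys. The sketch below works on a copy of the output shown above, so it runs without a network connection. Note that the coordinates arrive as strings.

```python
# Copy of the response dictionary shown above.
response_dict = {
    "iss_position": {"longitude": "71.5838", "latitude": "22.4829"},
    "timestamp": 1563205057,
    "message": "success",
}

position = response_dict["iss_position"]

# The coordinates are strings, so convert them before doing arithmetic.
latitude = float(position["latitude"])
longitude = float(position["longitude"])

print(f"The ISS was at latitude {latitude}, longitude {longitude}")
```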

In [95]:
# Get the response from the API endpoint.
response = requests.get("http://api.open-notify.org/astros.json")
data = response.json()
# Print the number of people currently in space.
print(data["number"])
print(data)

3
{'people': [{'craft': 'ISS', 'name': 'Alexey Ovchinin'}, {'craft': 'ISS', 'name': 'Nick Hague'}, {'craft': 'ISS', 'name': 'Christina Koch'}], 'number': 3, 'message': 'success'}
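
The `people` value is a list of dictionaries, so a list comprehension is a convenient way to pull out one field. The sketch below uses a copy of the response printed above, stored under the illustrative name `astros`.

```python
# Copy of the astros.json response shown above.
astros = {
    "people": [
        {"craft": "ISS", "name": "Alexey Ovchinin"},
        {"craft": "ISS", "name": "Nick Hague"},
        {"craft": "ISS", "name": "Christina Koch"},
    ],
    "number": 3,
    "message": "success",
}

# Collect the names from the list of dictionaries.
names = [person["name"] for person in astros["people"]]
print(names)

# The 'number' field should agree with the length of the list.
print(len(names) == astros["number"])
```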


GET is by far the most commonly used HTTP method. We can use a GET request to retrieve data from any endpoint that accepts it.

## Status Codes
The requests we make may not always be successful. The best way to find out is to check the status code that is returned with the response. Here is how you would do this.


In [18]:
response.status_code == requests.codes.ok

True

In [19]:
requests.codes.IM_A_TEAPOT


418

So this is a good check to see if our request was successful. Depending on the status of the web server, the access rights of the client, and the availability of the requested information, a web server may return a number of different status codes within the response. Wikipedia has an exhaustive list of these codes. [Check them out here](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)

### Common status codes

* 200: everything went okay, and the result has been returned (if any).
* 301: the server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.
* 400: the server thinks you made a bad request. This can happen when you don’t send along the right data, among other things.
* 401: the server thinks you’re not authenticated. This happens when you don’t send the right credentials to access an API (we’ll talk about authentication in a later post).
* 403: the resource you’re trying to access is forbidden; you don’t have the right permissions to see it.
* 404: the resource you tried to access wasn’t found on the server.
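
The codes in this list fall into ranges: 2xx for success, 3xx for redirects, 4xx for client errors, and 5xx for server errors. As a rough sketch, the hypothetical helper `describe_status` below uses the standard library's `http.HTTPStatus` to look up the official phrase for a code and label its range.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short description of an HTTP status code (hypothetical helper)."""
    # HTTPStatus raises ValueError for codes the standard library does not know.
    status = HTTPStatus(code)
    if 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirect"
    elif 400 <= code < 500:
        category = "client error"
    elif code >= 500:
        category = "server error"
    else:
        category = "informational"
    return f"{code} {status.phrase} ({category})"


for code in [200, 301, 400, 401, 403, 404]:
    print(describe_status(code))
```

With requests, the value to pass in would be `response.status_code`. The library also provides `response.raise_for_status()`, which raises an exception for 4xx and 5xx responses.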

In [20]:
response = requests.get("http://api.open-notify.org/iss-pass")
print(response.status_code)

404


### Hitting the right endpoint
iss-pass wasn’t a valid endpoint, so we got a 404 status code in response. We forgot to add `.json` at the end, as the API documentation states.

We’ll now make a GET request to http://api.open-notify.org/iss-pass.json.

In [21]:
response = requests.get("http://api.open-notify.org/iss-pass.json?lat=41.4984174&lon=-81.6937287")
print(response.status_code)

200



## Response Contents
Once we know that our request was successful and we have a valid response, we can check the returned information using the `.text` property of the response object.
```python
print(response.text)
```

In [22]:
response = requests.get('http://api.open-notify.org/iss-pass.json?lat=10&lon=20')
print(response.status_code)
print(response.text)

200
{
  "message": "success", 
  "request": {
    "altitude": 100, 
    "datetime": 1563205978, 
    "latitude": 10.0, 
    "longitude": 20.0, 
    "passes": 5
  }, 
  "response": [
    {
      "duration": 457, 
      "risetime": 1563209976
    }, 
    {
      "duration": 634, 
      "risetime": 1563215672
    }, 
    {
      "duration": 621, 
      "risetime": 1563251349
    }, 
    {
      "duration": 477, 
      "risetime": 1563257206
    }, 
    {
      "duration": 155, 
      "risetime": 1563293573
    }
  ]
}



In [23]:
response = requests.get('http://api.open-notify.org/iss-now.json')
print(response.status_code)
print(response.text)

200
{"iss_position": {"longitude": "169.5447", "latitude": "46.8056"}, "timestamp": 1563206329, "message": "success"}


### Query parameters

If you look at the documentation for the OpenNotify API, you'll see that the ISS Pass endpoint requires two parameters:

* `lat`: the latitude of the location we want.
* `lon`: the longitude of the location we want.

We can supply them by adding an optional keyword argument, `params`, to our request. We make a dictionary with these parameters and then pass it into the `requests.get` function.

We can also do the same thing directly by adding the query parameters to the URL, like this: `http://api.open-notify.org/iss-pass.json?lat=40.71&lon=-74`.

It’s almost always preferable to set up the parameters as a dictionary, because requests takes care of details such as properly formatting and escaping the query parameters.
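
To get a feel for the formatting work involved, the sketch below builds a query string with the standard library's `urlencode`. It is not a copy of what requests does internally; it only illustrates the kind of encoding that the `params` argument handles for us. The `city` and `note` parameters are hypothetical and not part of the OpenNotify API.

```python
from urllib.parse import urlencode

parameters = {"lat": 40.71, "lon": -74}

# urlencode turns the dictionary into a query string.
query = urlencode(parameters)
url = "http://api.open-notify.org/iss-pass.json?" + query
print(url)

# Values with spaces or special characters are escaped automatically.
# These parameters are hypothetical and only demonstrate the escaping.
escaped = urlencode({"city": "New York", "note": "a&b"})
print(escaped)
```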

We’ll make a request using the coordinates of New York City, and see what response we get.

In [24]:
# Set up the parameters we want to pass to the API.
# This is the latitude and longitude of New York City.
parameters = {"lat": 40.71, "lon": -74}


# Make a get request with the parameters.
response = requests.get("http://api.open-notify.org/iss-pass.json", params=parameters)


# Print the content of the response (the data the server returned)
print(response.content)
resp_dict = response.json()
print(resp_dict)

b'{\n  "message": "success", \n  "request": {\n    "altitude": 100, \n    "datetime": 1563206008, \n    "latitude": 40.71, \n    "longitude": -74.0, \n    "passes": 5\n  }, \n  "response": [\n    {\n      "duration": 470, \n      "risetime": 1563238418\n    }, \n    {\n      "duration": 648, \n      "risetime": 1563244094\n    }, \n    {\n      "duration": 605, \n      "risetime": 1563249934\n    }, \n    {\n      "duration": 552, \n      "risetime": 1563255819\n    }, \n    {\n      "duration": 601, \n      "risetime": 1563261653\n    }\n  ]\n}\n'
{'message': 'success', 'request': {'altitude': 100, 'datetime': 1563206008, 'latitude': 40.71, 'longitude': -74.0, 'passes': 5}, 'response': [{'duration': 470, 'risetime': 1563238418}, {'duration': 648, 'risetime': 1563244094}, {'duration': 605, 'risetime': 1563249934}, {'duration': 552, 'risetime': 1563255819}, {'duration': 601, 'risetime': 1563261653}]}


In [25]:
json.loads(response.content)

{'message': 'success',
 'request': {'altitude': 100,
  'datetime': 1563206008,
  'latitude': 40.71,
  'longitude': -74.0,
  'passes': 5},
 'response': [{'duration': 470, 'risetime': 1563238418},
  {'duration': 648, 'risetime': 1563244094},
  {'duration': 605, 'risetime': 1563249934},
  {'duration': 552, 'risetime': 1563255819},
  {'duration': 601, 'risetime': 1563261653}]}

This returns a lot of information. The raw `response.content` is a bytes object, which is hard to read as printed; parsing it with `response.json()` or `json.loads` gives nested Python dictionaries and lists that are much easier to work with. Responses from other sites, such as HTML pages, also contain tags and styling information that only a web browser can truly render. In later lessons we shall look at how we can use **Regular Expressions** to clean such responses and extract the required bits and pieces for analysis.
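
As a sketch of working with the parsed result, the block below starts from a shortened copy of the bytes shown above and pulls out the pass durations. Treating `risetime` as a Unix timestamp in seconds is an assumption based on the size of the values.

```python
import json
from datetime import datetime, timezone

# Shortened copy of the raw bytes returned above (first two passes only).
raw = (
    b'{"message": "success", "response": ['
    b'{"duration": 470, "risetime": 1563238418}, '
    b'{"duration": 648, "risetime": 1563244094}]}'
)

# json.loads accepts bytes as well as str.
parsed = json.loads(raw)

durations = [p["duration"] for p in parsed["response"]]
print(durations)

for p in parsed["response"]:
    # Assumption: risetime is a Unix timestamp in seconds.
    rise = datetime.fromtimestamp(p["risetime"], tz=timezone.utc)
    print(rise.isoformat(), "for", p["duration"], "seconds")
```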

## Response Headers
The response to an HTTP request can contain many headers that hold different bits of information. We can use the `.headers` property of the response object to access the header information as shown below:


In [26]:
# Convert the headers to a plain dictionary for display
dict(response.headers)

{'Server': 'nginx/1.10.3',
 'Date': 'Mon, 15 Jul 2019 15:58:50 GMT',
 'Content-Type': 'application/json',
 'Content-Length': '519',
 'Connection': 'keep-alive',
 'Via': '1.1 vegur'}

The headers are returned as key-value pairs holding various pieces of information about the resource and the request. Let's read some of these values using the requests library:

```python
print(response.headers['Content-Length'])  # Length of the response body in bytes
print(response.headers['Date'])  # Date the response was sent
print(response.headers['server'])  # Server software (nginx here); header lookups are case-insensitive
```
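
Header values always arrive as strings, so they usually need converting before use. The sketch below works on a plain-dict copy of the headers shown above and parses the length and date with the standard library. Unlike `response.headers`, a plain dictionary is case-sensitive, so the keys here must match exactly.

```python
from email.utils import parsedate_to_datetime

# Plain-dict copy of some of the headers shown above.
headers = {
    "Server": "nginx/1.10.3",
    "Date": "Mon, 15 Jul 2019 15:58:50 GMT",
    "Content-Type": "application/json",
    "Content-Length": "519",
}

# Convert the string values into a number and a datetime.
content_length = int(headers["Content-Length"])
sent_at = parsedate_to_datetime(headers["Date"])

print(content_length)
print(sent_at.isoformat())

# Check the content type before trying to parse a body as JSON.
is_json = headers["Content-Type"].startswith("application/json")
print(is_json)
```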