<div class="alert alert-block alert-info"><b>IAB303</b> - Data Analytics for Business Insight</div>

## PRACTICAL :: Open data for business environment assessment

1. Basics
2. Connect to an API
3. Experiment with JSON from APIs

---
### [1] Basics

* JSON objects are loaded as Python dicts
* JSON arrays are loaded as Python lists
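
As a quick sketch of these two mappings, the snippet below loads a small JSON document and prints the Python type of each value. The `sample` string is a made-up example, not data from this practical; it also shows how JSON `true` and `null` are loaded.

```python
import json

# Hypothetical sample document, invented for illustration only
sample = '{"name": "andrew", "tags": ["a", "b"], "active": true, "score": null, "count": 3}'

# The outer JSON object becomes a Python dict
loaded = json.loads(sample)

# Print the Python type that each JSON value was loaded as
for key, value in loaded.items():
    print(key, "->", type(value).__name__)
```

The JSON array under `tags` arrives as a Python list, `true` as `True`, and `null` as `None`.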

---
#### JSON objects

In [4]:
# import the python json library
import json

#  a simple json object
simpleJsonObj = '{ "lastname": "gibson","firstname": "andrew"}'
print("The type of simpleJsonObj:",type(simpleJsonObj))
loadedObj = json.loads(simpleJsonObj)

# print some information about the loaded object
print("loadedObj looks like:",loadedObj)
print("The type of loadedObj:",type(loadedObj))

The type of simpleJsonObj: <class 'str'>
loadedObj looks like: {'lastname': 'gibson', 'firstname': 'andrew'}
The type of loadedObj: <class 'dict'>


**DISCUSSION:** 
- What is a string?
- What happened when we loaded the string with the function `json.loads()`?
- A python `dict` is a dictionary which has `keys` and `values`. Using your common knowledge of a dictionary, what do you think keys and values are?
- What are the keys and values in the example above?

In [14]:
# Access dict values by key
value1 = loadedObj.get("firstname")
print("First name:",value1)
value2 = loadedObj["lastname"]
print("Last name:",value2)

First name: andrew
Last name: gibson


In [16]:
# Or a shortcut for the above
print("First name:",loadedObj['firstname'])
print("Last name:",loadedObj['lastname'])

First name: andrew
Last name: gibson
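
The two access styles used above behave differently when a key is missing. The following is a minimal sketch of that difference; the key `middlename` is a hypothetical key chosen because it is not in the object.

```python
import json

loadedObj = json.loads('{ "lastname": "gibson","firstname": "andrew"}')

# .get() returns None when the key is missing ("middlename" is hypothetical)
middle = loadedObj.get("middlename")
# .get() can also return a default value that you supply
middle_default = loadedObj.get("middlename", "unknown")
print("With .get():", middle, middle_default)

# Square brackets raise a KeyError when the key is missing
try:
    loadedObj["middlename"]
except KeyError as err:
    print("Missing key:", err)
```

Using `.get()` can be convenient when an API sometimes omits a field, while square brackets make a missing field fail loudly.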


In [10]:
# Get all of the keys
print("keys:",loadedObj.keys())

# Get all of the values
print("values:",loadedObj.values())

# Get all of the items
print("items:",loadedObj.items())

keys: dict_keys(['lastname', 'firstname'])
values: dict_values(['gibson', 'andrew'])
items: dict_items([('lastname', 'gibson'), ('firstname', 'andrew')])


We can iterate over any of these collections (keys, values, items). This enables us to do something useful with each item in the dictionary. For now, we are using `print`, but we could do anything that is possible in Python with each item by using the same approach. The basic pattern is:

```
for ITEM in LIST:
    do something
```

In [13]:
# Iterate over the dictionary items
for key,value in loadedObj.items():
    print("The",key,"is",value)

The lastname is gibson
The firstname is andrew


**EXPERIMENT:** 
- try iterating over keys and values separately
- try changing the key and value above
- try adding more keys and values to the json object (hint: `mydict[key] = "value"`)
- try sorting the dictionary (hint: `sorted(mydict)` or `sorted(mydict.values())`)
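
One possible way to follow the hints above is sketched here. The `role` key and its value are made up for illustration, and this is only one of many ways to approach the experiment.

```python
import json

mydict = json.loads('{ "lastname": "gibson","firstname": "andrew"}')

# Add a new key and value (the "role" entry is hypothetical)
mydict["role"] = "lecturer"

# sorted() on a dict returns a sorted list of its keys
sorted_keys = sorted(mydict)
print("Sorted keys:", sorted_keys)

# sorted() on the values returns a sorted list of the values
sorted_values = sorted(mydict.values())
print("Sorted values:", sorted_values)

# json.dumps() turns the dict back into a JSON string
as_json = json.dumps(mydict)
print("Back to JSON:", as_json)
```

Note that `sorted()` returns a new list and leaves the dictionary itself unchanged.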

---
#### JSON arrays

In [20]:
# try loading a json array
simpleJsonArray = '["element1",2,"el 3",432, { "name": "andrew" }]'

loadedArray = json.loads(simpleJsonArray)
print("loadedArray looks like:",loadedArray)
print("loadedArray type is:",type(loadedArray))

loadedArray looks like: ['element1', 2, 'el 3', 432, {'name': 'andrew'}]
loadedArray type is: <class 'list'>


In [22]:
# Access an element of a list using its index
elem = loadedArray[1]
print("The first element is:",elem)

The first element is: 2


- Why is this incorrect?
- Modify the code to make it correct

In [24]:
# get the last element in the list
nameObj = loadedArray[-1]
print(nameObj)
print(type(nameObj))

{'name': 'andrew'}
<class 'dict'>


- What do you notice about this element type?
- What does this tell you that we can do with Python lists and dicts?
- Why is this important?
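
Because a list can hold a dict (and a dict can hold a list), index and key lookups can be chained. A small sketch using the same array as above:

```python
import json

loadedArray = json.loads('["element1",2,"el 3",432, { "name": "andrew" }]')

# Chain a list index and a dict key to reach a nested value
name = loadedArray[-1]["name"]
print("Nested name:", name)

# Pick out only the elements that are dicts
dicts = [entry for entry in loadedArray if isinstance(entry, dict)]
print("Dict elements:", dicts)
```

Data returned by APIs is often nested in this way, so chained lookups are a common pattern.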

In [25]:
# Iterate over the list the same as we did for dicts
for entry in loadedArray:
    print(entry)

element1
2
el 3
432
{'name': 'andrew'}


In [27]:
# Test if an element exists
432 in loadedArray

True

In [29]:
# Make sure the type is correct
'el 3' in loadedArray

True

In [30]:
# What if the element isn't there?
'route66' in loadedArray

False

In [31]:
# What if the index is bigger than the list?
loadedArray[14]

IndexError: list index out of range
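
An `IndexError` stops the program unless it is handled. The sketch below shows two common ways to guard against it: checking the length first, or catching the exception. The fallback value `None` is an arbitrary choice for illustration.

```python
import json

loadedArray = json.loads('["element1",2,"el 3",432, { "name": "andrew" }]')
index = 14

# Option 1: check the index against the length of the list
if index < len(loadedArray):
    print(loadedArray[index])
else:
    print("Index", index, "is out of range for a list of length", len(loadedArray))

# Option 2: try the lookup and handle the error if it occurs
try:
    elem = loadedArray[index]
except IndexError:
    elem = None  # fallback value chosen for this sketch
print("elem:", elem)
```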

**EXPERIMENT:** 
- try adding elements to the original json array
- try adding json objects to the json array
- try adding a json array as an element in a json object and add that to the original array

---
### [2] Connect to an API

Starting with the workshop example, let's retrieve some JSON from XKCD and load it into a Python dict. For each code block, try to explain in your own words what the code is doing.

In [35]:
from urllib import request

# Fetch the data for the latest xkcd comic
comicRequest = request.Request('http://xkcd.com/info.0.json')
comicResponse = request.urlopen(comicRequest)
print(comicResponse.status)                   

200


- change `json` in the request to `js` and see what happens
- what does the code `200` mean? (hint: search `http status codes`)
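
As a complement to searching online, Python's standard library can look up the standard phrase for a status code. This sketch uses `http.HTTPStatus`; the codes other than `200` are examples chosen for illustration and do not come from the request above.

```python
from http import HTTPStatus

# Look up the standard reason phrase for a few status codes
for code in (200, 404, 500):
    status = HTTPStatus(code)
    print(code, "means", status.phrase)

ok_phrase = HTTPStatus(200).phrase
```

When the server answers with an error status such as `404`, `request.urlopen` raises a `urllib.error.HTTPError` rather than returning a response.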

In [36]:
# Get the body of the response
responseBody = comicResponse.read().decode('utf8')
print(responseBody)
print(type(responseBody))

{"month": "3", "num": 2121, "link": "", "year": "2019", "news": "", "safe_title": "Light Pollution", "transcript": "", "alt": "It's so sad how almost no one alive today can remember seeing the galactic rainbow, the insanity nebula, or the skull and glowing eyes of the Destroyer of Sagittarius.", "img": "https://imgs.xkcd.com/comics/light_pollution.png", "title": "Light Pollution", "day": "8"}
<class 'str'>


- This is a string, but what kind of data does it look like?
- What can we do with it?

In [37]:
# Read the JSON
jsonFromApi = json.loads(responseBody)
print(jsonFromApi)
print(type(jsonFromApi))

{'month': '3', 'num': 2121, 'link': '', 'year': '2019', 'news': '', 'safe_title': 'Light Pollution', 'transcript': '', 'alt': "It's so sad how almost no one alive today can remember seeing the galactic rainbow, the insanity nebula, or the skull and glowing eyes of the Destroyer of Sagittarius.", 'img': 'https://imgs.xkcd.com/comics/light_pollution.png', 'title': 'Light Pollution', 'day': '8'}
<class 'dict'>


In [38]:
# Now get the title of the comic
title = jsonFromApi["title"]
print(title)

Light Pollution


**EXPERIMENT:**
- we now have data in the same format as we were experimenting with before
- try and print out all of the keys
- try and get different values by using their key

---
### [3] Experiment with JSON from APIs

- First create a function that gets JSON from a URL and returns a Python list or dict
- Try your function on: `https://dog.ceo/api/breeds/list/all` - how many breeds of hound are there?

In [41]:
# Function
def getDogBreeds(url):
    req = request.Request(url)
    resp = request.urlopen(req)
    body = resp.read().decode('utf8')
    return json.loads(body)

In [48]:
# Get the breeds
data = getDogBreeds("https://dog.ceo/api/breeds/list/all")
print("Data from server:\n",data)

# Get just the info we need
breeds = data["message"]
print("\nBreeds:\n",breeds) #Note the new lines to help with formatting

Data from server:
 {'status': 'success', 'message': {'affenpinscher': [], 'african': [], 'airedale': [], 'akita': [], 'appenzeller': [], 'basenji': [], 'beagle': [], 'bluetick': [], 'borzoi': [], 'bouvier': [], 'boxer': [], 'brabancon': [], 'briard': [], 'bulldog': ['boston', 'french'], 'bullterrier': ['staffordshire'], 'cairn': [], 'cattledog': ['australian'], 'chihuahua': [], 'chow': [], 'clumber': [], 'cockapoo': [], 'collie': ['border'], 'coonhound': [], 'corgi': ['cardigan'], 'cotondetulear': [], 'dachshund': [], 'dalmatian': [], 'dane': ['great'], 'deerhound': ['scottish'], 'dhole': [], 'dingo': [], 'doberman': [], 'elkhound': ['norwegian'], 'entlebucher': [], 'eskimo': [], 'frise': ['bichon'], 'germanshepherd': [], 'greyhound': ['italian'], 'groenendael': [], 'hound': ['afghan', 'basset', 'blood', 'english', 'ibizan', 'walker'], 'husky': [], 'keeshond': [], 'kelpie': [], 'komondor': [], 'kuvasz': [], 'labrador': [], 'leonberg': [], 'lhasa': [], 'malamute': [], 'malinois': [], 'm

In [52]:
# Answer the question: how many breeds of hound are there?
breed2find = "hound"
selected = breeds[breed2find]
# how many?
print("There are",len(selected),"breeds of",breed2find,":")
for breed in selected:
    print("\t",breed)

There are 6 breeds of hound :
	 afghan
	 basset
	 blood
	 english
	 ibizan
	 walker


**EXPERIMENT:**
- Try parsing the data from this API in different ways to answer different questions
- Try getting data from a different API and parsing it
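
As a starting point for the first experiment, here is a sketch of a few other questions that could be asked of the breeds data. To keep it runnable offline, it uses a small hand-copied subset of the `message` dict from the output above rather than the full API response, so the counts apply to this subset only.

```python
# Hand-copied subset of the "message" dict shown in the output above
breeds = {
    "affenpinscher": [],
    "bulldog": ["boston", "french"],
    "collie": ["border"],
    "hound": ["afghan", "basset", "blood", "english", "ibizan", "walker"],
    "husky": [],
}

# Which breeds have at least one sub-breed?
with_sub = {breed: subs for breed, subs in breeds.items() if subs}
print("Breeds with sub-breeds:", list(with_sub.keys()))

# Which breed has the most sub-breeds?
most = max(breeds, key=lambda breed: len(breeds[breed]))
print("Most sub-breeds:", most)

# How many sub-breeds are there in total (in this subset)?
total_sub = sum(len(subs) for subs in breeds.values())
print("Total sub-breeds:", total_sub)
```

Replacing the hard-coded `breeds` dict with `data["message"]` from the cell above should let the same questions be asked of the full response.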