<img src="https://github.com/christopherhuntley/BUAN5405-docs/blob/master/Slides/img/Dolan.png?raw=true" width="180px" align="right">

# Lesson 12: Odds and Ends
_Tuples inside of lists inside of dictionaries ..._

# Learning Objectives

## Theory / Be able to explain ...
- aaa

## Skills / Know how to  ...
- aaa

**This is new content you won't find in the Py4E book. It is, however, somewhat related to the JSON materials in Chapter 13.**

## So long, farewell, auf Wiedersehen, good night
> This is the end, beautiful friend -- _The Doors_

Now that we have pandas in our quiver, you might wonder what else there is to know. Quite a lot, actually. DataFrames and Series bring together many of the best practices that programmers have earned and learned over the past decades. However, pandas says little about the types of data we pour into our DataFrames. For that we will often need to pull in other libraries with even more data types, many of which build on top of pandas or numpy to make our lives that much easier. We will briefly cover two such data types: JSON trees and time series.

Except for your final project, with this lesson we have reached the end of the course. I hope you have gotten what you needed while you were here. It's been fun putting all this together for you and I hope to see you in person sometime soon. 

Now it's your turn. You know just enough Python to learn the rest. If you want to get better, keep your hands on the keyboard working on your own projects. If you happen to be on our Slack, try out the `#python-slashes` channel to post your best one-liners. I heard a rumour that there might even be a prize for the best one.

## ReST APIs and JSON Data
In order to work with _really big data_ one usually has to gather it straight from the cloud. There is no concept of files, filenames, and filepaths in such an environment. Instead, we use **APIs** that provide **endpoints**, **query strings**, and ... **JSON data**. 
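To make the idea concrete: where a file-based workflow would name a file path, an API request names a URL built from an endpoint path plus a query string. The sketch below assembles one with the standard library's `urllib.parse`. The helper `build_endpoint` is hypothetical, and the `/repos/{owner}/{repo}/commits` pattern is borrowed from the GitHub request used later in this lesson rather than taken from GitHub's documentation.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://api.github.com"  # base URL of the GitHub ReST API

def build_endpoint(owner, repo, **query):
    """Hypothetical helper: fill in a commits URL pattern and append a query string."""
    # The path names the resource; quote() escapes characters that are unsafe in a URL path
    path = f"/repos/{quote(owner)}/{quote(repo)}/commits"
    # urlencode() turns keyword arguments into key=value pairs joined by '&'
    query_string = urlencode(query)
    return f"{API_ROOT}{path}?{query_string}" if query_string else f"{API_ROOT}{path}"

url = build_endpoint("christopherhuntley", "BUAN5405-lessons", path="L12_Odds_Ends.ipynb")
print(url)
```

In practice, `requests.get()` accepts a `params` dictionary and builds the query string for you, so a hand-rolled helper like this is mostly useful for seeing what actually goes over the wire.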

An Application Programming Interface (API) provides a standard set of functions (and perhaps a few configuration constants like security keys) for delivering a service. While there are APIs for lots of uses, the relevant ones for a data scientist are likely to be [ReST APIs](https://en.wikipedia.org/wiki/Representational_state_transfer). A ReST API works over HTTPS (i.e., the web), using specially crafted request **endpoints** that tell the service what you want. Each endpoint combines an HTTP method (GET, POST, PUT, PATCH, or DELETE) with a URL pattern indicating what data or other resource is being accessed; a **query string** after the `?` can narrow the request further. For example, the following ReST API call asks GitHub for the commit history of this very notebook.

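```python
import requests

# GET the commits endpoint, filtered to this notebook by the `path` query string.
# Note: raw_json holds the HTTP Response object, not the JSON itself.
raw_json = requests.get("https://api.github.com/repos/christopherhuntley/BUAN5405-lessons/commits?path=L12_Odds_Ends.ipynb")
raw_json.text  # the response body, as one long JSON string
```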

```
'[{"sha":"81943548d82515962b3d01b8c2774393804fff15","node_id":"MDY6Q29tbWl0MjY5NzA5NTA2OjgxOTQzNTQ4ZDgyNTE1OTYyYjNkMDFiOGMyNzc0MzkzODA0ZmZmMTU=","commit":{"author":{"name":"Christopher Huntley","email":"christopher.huntley@gmail.com","date":"2020-06-20T05:42:17Z"},"committer":{"name":"Christopher Huntley","email":"christopher.huntley@gmail.com","date":"2020-06-20T05:42:17Z"},"message":"Swapped assignment order and added intro to Lesson 12.","tree":{"sha":"56519bc74c9a60615fd8af348d1e4144c77f3beb","url":"https://api.github.com/repos/christopherhuntley/BUAN5405-lessons/git/trees/56519bc74c9a60615fd8af348d1e4144c77f3beb"},"url":"https://api.github.com/repos/christopherhuntley/BUAN5405-lessons/git/commits/81943548d82515962b3d01b8c2774393804fff15","comment_count":0,"verification":{"verified":false,"reason":"unsigned","signature":null,"payload":null}},"url":"https://api.github.com/repos/christopherhuntley/BUAN5405-lessons/commits/81943548d82515962b3d01b8c2774393804fff15","html_url":"https://
```


Lovely, isn't it? This is **JavaScript Object Notation (JSON)**, which has become the lingua franca of data over the web. We can make it a little prettier using Python's built-in `json` library.

```python
# raw_json was retrieved above.
import json

github_data = json.loads(raw_json.text)  # convert to native Python objects
print(json.dumps(github_data, indent=4)) # pretty-print the JSON tree
```

```
[
    {
        "sha": "81943548d82515962b3d01b8c2774393804fff15",
        "node_id": "MDY6Q29tbWl0MjY5NzA5NTA2OjgxOTQzNTQ4ZDgyNTE1OTYyYjNkMDFiOGMyNzc0MzkzODA0ZmZmMTU=",
        "commit": {
            "author": {
                "name": "Christopher Huntley",
                "email": "christopher.huntley@gmail.com",
                "date": "2020-06-20T05:42:17Z"
            },
            "committer": {
                "name": "Christopher Huntley",
                "email": "christopher.huntley@gmail.com",
                "date": "2020-06-20T05:42:17Z"
            },
            "message": "Swapped assignment order and added intro to Lesson 12.",
            "tree": {
                "sha": "56519bc74c9a60615fd8af348d1e4144c77f3beb",
                "url": "https://api.github.com/repos/christopherhuntley/BUAN5405-lessons/git/trees/56519bc74c9a60615fd8af348d1e4144c77f3beb"
            },
            "url": "https://api.github.com/repos/christopherhuntley/BUAN5405-lessons/git/commits/81
```

While still not easy on the eyes, we can now at least start to make out the data structure. It looks to be a hierarchy (or **tree**) of **nested lists and dictionaries,** one inside another. In fact, that's exactly what it is. The `json` library's `loads()` function converted everything into native Python for us.  (Strangely, we had to convert it back to JSON using `dumps()` in order to pretty print it.)
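To see that conversion without calling the API, here is a small sketch using a made-up JSON string that contains every JSON value type. It shows which Python type each value becomes after `loads()`, and that `dumps()` (here with its `indent` and `sort_keys` options) turns the tree back into text.

```python
import json

# A made-up JSON document covering every JSON value type
sample = '{"name": "Ada", "age": 36, "score": 9.5, "tags": ["math", "code"], "active": true, "manager": null}'

tree = json.loads(sample)  # JSON text -> native Python objects (here, a dict)

# Show the Python type that each JSON value became
for key, value in tree.items():
    print(f"{key:8} -> {type(value).__name__}")

# dumps() goes the other way: Python objects -> JSON text
# indent adds line breaks and spaces; sort_keys orders dict keys alphabetically
print(json.dumps(tree, indent=2, sort_keys=True))
```

JSON's `true`, `false`, and `null` come back as Python's `True`, `False`, and `None`, and a `loads()`/`dumps()` round trip gives back an equal tree.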

Once we have the data in a Python-native tree format, it's pretty easy to traverse it. We can refer to each **node** (item in the tree) using `[]` notation: integer indexes for lists and keys for dictionaries. We can determine what type of data a node holds (`dict`, `list`, `str`, `int`, `float`, `bool`, or `None` for JSON's `null`) using the `type()` or `isinstance()` functions. We can select just the parts of the tree we want, using slices for lists and keys for dictionaries. Though we won't do it here, we can even iterate through the tree searching for nodes of interest.

```python
print("The root of the tree is a", type(github_data), "with", len(github_data), "items.")     # ReST JSONs are usually dicts or lists ...
print("The first branch is a", type(github_data[0]), "with", len(github_data[0]), "items.")  # ... with nested dicts/lists inside; each list/dict is a branch of the tree
print("The commit message for the last commit was", github_data[0]['commit']['message'])     # The commit message is a leaf
print("The committer was \n", json.dumps(github_data[0]['committer'], indent=4))            # We can also take a cutting from the tree home with us.
```

```
The root of the tree is a <class 'list'> with 1 items.
The first branch is a <class 'dict'> with 9 items.
The commit message for the last commit was Swapped assignment order and added intro to Lesson 12.
The committer was 
 {
    "login": "christopherhuntley",
    "id": 6188254,
    "node_id": "MDQ6VXNlcjYxODgyNTQ=",
    "avatar_url": "https://avatars1.githubusercontent.com/u/6188254?v=4",
    "gravatar_id": "",
    "url": "https://api.github.com/users/christopherhuntley",
    "html_url": "https://github.com/christopherhuntley",
    "followers_url": "https://api.github.com/users/christopherhuntley/followers",
    "following_url": "https://api.github.com/users/christopherhuntley/following{/other_user}",
    "gists_url": "https://api.github.com/users/christopherhuntley/gists{/gist_id}",
    "starred_url": "https://api.github.com/users/christopherhuntley/starred{/owner}{/repo}",
    "subscriptions_url": "https://api.github.com/users/christopherhuntley/subscriptions",
    "organizations_url
```

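The lesson skips tree iteration, but for the curious, here is a minimal sketch of the recursive approach. `find_values` is a hypothetical helper, not part of the `json` library, and `sample_tree` is a made-up miniature shaped like the commit data above so that the example runs without a network call.

```python
def find_values(node, key):
    """Hypothetical helper: recursively collect every value stored under `key` in a JSON tree."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)                # a match at this level
            found.extend(find_values(v, key))  # keep searching inside the value
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values(item, key))
    # Leaves (str, int, float, bool, None) have no children to search
    return found

# A made-up miniature of the commit list returned by the GitHub API
sample_tree = [
    {
        "sha": "abc123",
        "commit": {
            "author": {"name": "Ada", "date": "2020-06-20T05:42:17Z"},
            "committer": {"name": "Ada", "date": "2020-06-20T05:42:17Z"},
            "message": "First commit",
        },
        "parents": [],
    }
]

print(find_values(sample_tree, "message"))
print(find_values(sample_tree, "name"))
```

Because it searches by key alone, the helper returns every match, so `name` turns up under both `author` and `committer`. A hardwired path like `github_data[0]['commit']['message']` is the better choice when you already know exactly where a value lives.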
## Time Series

---
## Exercises
**1. The code below retrieves a JSON tree with information about _Space Jam_, the 1996 Looney Tunes and MJ classic. Write your own code using the preset `spacejam` variable to answer the following questions:**
- What was the film's budget?
- Is the movie a thriller?
- What character does Michael Jordan play?
- How many crew members were there? 

For an extra challenge answer each of these questions by traversing the tree programmatically (with iteration or recursion) instead of using hardwired lookups like `spacejam[0]['cast'][0]`. 

```python
import json
import requests

# spacejam.json is adapted from data downloaded from the TMDB API
spacejam = json.loads(requests.get("https://raw.githubusercontent.com/christopherhuntley/BUAN5405-lessons/master/spacejam.json").text)
```

```python
# YOUR CODE HERE
```

**2.**