# Lab 7
This week's topic continues to be getting data, in this case using an API to access structured data.

In this lab notebook you will gain experience reading data from and posting to an API. 


## Lab Setup

In [None]:
import requests
import json
import datetime
import time
from io import StringIO
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl 
%matplotlib inline  

import os
if os.environ.get("HOME") == '/home/jovyan':  # .get avoids a KeyError when HOME is not set
    !pip install --upgrade otter-grader
    
import otter
grader = otter.Notebook()

## API Getting Data

So far we have seen examples of getting data from an API. These examples send GET requests to the API server.

An HTTP GET request can be made with several Python libraries, including:

* http.client (named httplib in Python 2)
* urllib 
* requests 

We have been using the `requests` module.
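
To make the mechanics of a GET request concrete, here is a minimal sketch of how query parameters end up in the request URL. It uses only the standard library's `urllib.parse`; `build_get_url` is a hypothetical helper name introduced for illustration, and the endpoint and ISBN are the ones used in the Google Books example in this lab.

```python
from urllib.parse import urlencode


def build_get_url(base_url, params):
    """Return the URL a GET request would use for the given query parameters."""
    # urlencode percent-encodes reserved characters (":" becomes "%3A")
    # and turns spaces into "+".
    return base_url + "?" + urlencode(params)


full_url = build_get_url(
    "https://www.googleapis.com/books/v1/volumes",
    {"q": "isbn:0553386794"},
)
print(full_url)
```

Passing a dictionary as `params` to `requests.get` performs a comparable encoding step before the request is sent, which is why the parameters can be written as plain Python values.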

Let's look at another example.

## Example: Google Books

Here we will examine using the Google Books API:  
https://developers.google.com/books/docs/overview


We will be using the "volumes" resource which does not require authentication.  
https://developers.google.com/books/docs/v1/getting_started#background-operations

Specifically, we will use the query function to search by ISBN (International Standard Book Number). 
https://developers.google.com/books/docs/v1/using#PerformingSearch




In [None]:
# api-endpoint 
url = "https://www.googleapis.com/books/v1/volumes"
  
isbn = "isbn:0553386794"

# set the parameters to be sent to the API
params = {'q': isbn}

resp = requests.get(url, params=params)

What does the response look like? 

How do we then extract the data?

In [None]:
resp

In [None]:
dat = resp.json()
#dat

# Pretty-print the first 800 characters of the JSON so it is easier to read
print(json.dumps(dat, indent=4)[:800])

There is a lot of information here.  Explore the structure of the JSON information. 
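
One way to explore a nested JSON structure without printing all of it is to list only the keys and value types down to a fixed depth. The sketch below is an illustration under stated assumptions: `outline` is a hypothetical helper (not part of `json` or `requests`), and `sample` is made-up data shaped loosely like an API response, not real output.

```python
def outline(obj, depth=0, max_depth=2):
    """Return indented lines describing the keys and value types of nested JSON."""
    lines = []
    pad = "  " * depth
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            if depth < max_depth:
                lines.extend(outline(value, depth + 1, max_depth))
    elif isinstance(obj, list) and obj:
        # Describe only the first element; list items usually share a shape.
        lines.append(f"{pad}[0]: {type(obj[0]).__name__} (of {len(obj)})")
        if depth < max_depth:
            lines.extend(outline(obj[0], depth + 1, max_depth))
    return lines


# Made-up sample for illustration only.
sample = {
    "kind": "books#volumes",
    "totalItems": 1,
    "items": [{"id": "abc", "volumeInfo": {"title": "Example", "pageCount": 100}}],
}
outline_lines = outline(sample)
print("\n".join(outline_lines))
```

Calling `outline(dat)` on the parsed response should give a similar summary; raise `max_depth` to look further into the structure.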

In [None]:
dat.keys()

In [None]:
dat['kind']

In [None]:
dat['totalItems']

In [None]:
type(dat['items'])

In [None]:
# We can look at the first item on the list 
dat['items'][0]

In [None]:
'''We can investigate the keys where information is stored for each item'''
dat['items'][0].keys()

In [None]:
# You can build fairly long expressions to access information deep 
#  in the structure. 
# Look up the first industry identifier for the book (ISBN_10 or ISBN_13; 
#  check the accompanying 'type' field to see which one it is) 
dat['items'][0]['volumeInfo']['industryIdentifiers'][0]['identifier']
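
Long chains of keys and indexes raise an exception as soon as one step is missing. A minimal sketch of a more forgiving lookup follows; `dig` is a hypothetical helper written for this illustration, and `sample` is made-up data rather than a real API response.

```python
def dig(obj, *path, default=None):
    """Follow dict keys and list indexes in order; return default if a step is missing."""
    current = obj
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Made-up sample for illustration only.
sample = {
    "items": [
        {
            "volumeInfo": {
                "title": "Example",
                "industryIdentifiers": [
                    {"type": "ISBN_10", "identifier": "0000000000"}
                ],
            }
        }
    ]
}

print(dig(sample, "items", 0, "volumeInfo", "title"))
print(dig(sample, "items", 0, "volumeInfo", "pageCount", default=-1))
```

The trade-off is that a typo in a key silently yields the default, so direct indexing is still the better choice while you are exploring.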

## Exercise 1 

Which of the Game of Thrones books is longest?

Get information about each book and print out the title and number of pages. Then, report the title and number of pages for the longest book.  

*Note: the API may return multiple entries for each ISBN. You may use the first entry for information. If that entry is missing a page number, it is likely an audiobook, and you should then use the next entry for information. If no entry has both the title and the page number, return the title as "no title" and the number of pages as '-1'.*
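
The fallback rule in the note can be sketched as a small function. This is an illustration rather than the required solution: `first_title_and_pages` is a hypothetical name, the field names `title` and `pageCount` are assumptions you should confirm by exploring `volumeInfo` yourself, the integer `-1` is one reading of the default, and the entries below are made up.

```python
def first_title_and_pages(items):
    """Return (title, pages) from the first entry that has both fields."""
    for item in items:
        info = item.get("volumeInfo", {})
        # Skip entries (for example audiobooks) that lack either field.
        if "title" in info and "pageCount" in info:
            return info["title"], info["pageCount"]
    # No usable entry was found.
    return "no title", -1


# Made-up entries for illustration only.
entries = [
    {"volumeInfo": {"title": "Example (audio)"}},
    {"volumeInfo": {"title": "Example", "pageCount": 321}},
]
print(first_title_and_pages(entries))
print(first_title_and_pages([]))
```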

Collect the book information -- title, number of pages -- in a nested list, `ex1list` in the for loop. 

Create a DataFrame `ex1df` from this nested list with columns of `Title` and `NumPages`. 

For the book with the most pages, report its title `longestBookTitle` and number of pages `longestBookNumPages`. 

In [None]:
''' The following are the ISBN codes for the Game of Thrones books. '''

isbns = ['0553386794', '0345535421', '9780345543981', '0553390570', '1101886048']

In [None]:
'''
Iterate over the ISBNs to find the title and page count for each book. 
Collect this information in a list. 
Look to use "volumeInfo" to gather the information needed.
Print the title + the number of pages in the loop. 

Outside the loop:
- Convert the list to a DataFrame, ex1df, column names 'Title' and 'NumPages' 
- Report longestBookTitle and longestBookNumPages.
'''

ex1list = [] 

for i in isbns: 
    params = {'q': 'isbn:' + i}
    resp = ...
    
    
    print(title + " has " + str(pages) + " pages.")
    
ex1df = ... 

longestBookTitle = ... 
longestBookNumPages = ... 

In [None]:
grader.check("q1")

## Example: iTunes Content 

Apple has a simple [API](https://developer.apple.com/library/archive/documentation/AudioVideo/Conceptual/iTuneSearchAPI/Searching.html#//apple_ref/doc/uid/TP40017632-CH5-SW1) for looking up iTunes content.

In [None]:
# api-endpoint
url = 'https://itunes.apple.com/search'

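# For example, let's search for Lord of the Rings ebooks.
# requests URL-encodes the parameters, so write the term with plain spaces
# (a literal '+' would be sent as '%2B' rather than as a space).
params = {'term': 'lord of the rings', 'entity': 'ebook', 
         'limit': 3}

resp = requests.get(url, params=params)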

In [None]:
resp

In [None]:
resp.json()

## Exercise 2

Search for up to 50 "The Expanse" e-books (the search may return fewer or slightly more). 

Create a DataFrame from the responses containing the `TrackName`, `TrackID`, `Price`, `AveRating`, `NumRating`. 

Sort the results from highest to lowest `AveRating`, then by `NumRating`.

If any of the information you are meant to collect is missing, replace it with `NaN`.
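
As a small illustration of sorting on two columns with missing values, the sketch below uses a made-up table (not iTunes data) with the column names from this exercise. By default `sort_values` places `NaN` rows last.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only.
toy = pd.DataFrame({
    "TrackName": ["A", "B", "C", "D"],
    "AveRating": [4.5, np.nan, 4.5, 3.0],
    "NumRating": [10, 5, 250, 7],
})

# Sort by AveRating first, breaking ties with NumRating, both descending.
ranked = toy.sort_values(
    ["AveRating", "NumRating"], ascending=False
).reset_index(drop=True)
print(ranked)
```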

In [None]:
url = 'https://itunes.apple.com/search'

# Search for "The Expanse" ebooks

params = {'term': 'expanse', 'entity': 'ebook', 'limit': 50}
resp = requests.get(url, params=params) 

#resp.json()

In [None]:
obj = json.loads(resp.text)
#obj       # uncomment to explore; leave commented before submission

Try using at least two approaches to create the DataFrame, e.g., 

* *Method 1* - Keep track of rows in a list, then convert the nested list to a DataFrame. Note: do not create an empty DataFrame and append rows to it inside a loop (this does not scale)  
https://stackoverflow.com/questions/13784192/creating-an-empty-pandas-dataframe-and-then-filling-it/41529411#41529411
* *Method 2* - Use pandas `read_json` function to convert JSON to pandas object
* *Method 3* - Use `json_normalize` function to read in JSON to a flat table. 
The `json_normalize` function normalizes a semi-structured JSON data object into a flat table.   
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html
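
To show how two of these approaches differ, here is a minimal sketch on a made-up payload. The keys `name`, `info`, `price`, and `rating` are invented for illustration and are not the iTunes field names; you will need to find the real ones in `obj`.

```python
import numpy as np
import pandas as pd

# Made-up payload for illustration; the keys are NOT the iTunes field names.
payload = {
    "results": [
        {"name": "Book A", "info": {"price": 9.99, "rating": 4.5}},
        {"name": "Book B", "info": {"price": 4.99}},  # rating is missing
    ]
}

# Approach 1: collect one list per record, then build the DataFrame once.
rows = []
for rec in payload["results"]:
    info = rec.get("info", {})
    rows.append([
        rec.get("name", np.nan),
        info.get("price", np.nan),
        info.get("rating", np.nan),
    ])
from_rows = pd.DataFrame(rows, columns=["Name", "Price", "Rating"])

# Approach 2: json_normalize flattens nested keys into dotted column names.
flat = pd.json_normalize(payload["results"])

print(from_rows)
print(flat)
```

With the list approach you choose the columns and their names up front; `json_normalize` keeps every field it finds, so you would typically select and rename columns afterwards.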

<!-- BEGIN QUESTION -->



In [None]:
# State which method you are using: 
#  Method ???



q2df1 = ...

print(q2df1.shape)
q2df1.head()

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->



In [None]:
# State which method you are using: 
#  Method ??? 

q2df2 = ...


print(q2df2.shape)
q2df2.head()

<!-- END QUESTION -->

## Example: TV Shows 

Here we use an API for TV show information:  
http://api.tvmaze.com/

In [None]:
# We can find the tvmaze id for a show based on the IMDB id. 
id_bcs = 'tt3032476'
resp = requests.get('http://api.tvmaze.com/lookup/shows?imdb=' + id_bcs)

In [None]:
resp.json()

## Exercise 3

Let's consider the 5 most viewed shows on Netflix (from their [2024 engagement report](https://www.tvguide.com/galleries/the-most-watched-netflix-shows-2024/)) as well as several shows that won Emmys in 2024. 

For each show get information on the episodes. 

Consider using the endpoint - http://www.tvmaze.com/api#show-episode-list

Create a DataFrame, `q3df`, that reports for each show and season the number of episodes, the min, mean, and max running time as well as the min, mean, and max rating over the episodes that season. 

The DataFrame should have columns: `ShowName`, `Season`, `Num_Eps`, `Min_Run`, `Mean_Run`, `Max_Run`, `Min_Rating`, `Mean_Rating`, `Max_Rating`.  
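
A per-show, per-season summary of this shape can be produced with pandas named aggregation. The sketch below runs on a made-up episode table; the input column names `runtime` and `rating` are assumptions for illustration, and the real episode data may need reshaping before it looks like this.

```python
import pandas as pd

# Made-up episode-level data for illustration only.
episodes = pd.DataFrame({
    "ShowName": ["Show A", "Show A", "Show A", "Show B", "Show B"],
    "Season": [1, 1, 2, 1, 1],
    "runtime": [50, 60, 55, 30, 32],
    "rating": [8.0, 9.0, 7.5, 6.0, 7.0],
})

# One output row per (ShowName, Season); each keyword names an output column.
summary = (
    episodes.groupby(["ShowName", "Season"])
    .agg(
        Num_Eps=("runtime", "size"),
        Min_Run=("runtime", "min"),
        Mean_Run=("runtime", "mean"),
        Max_Run=("runtime", "max"),
        Min_Rating=("rating", "min"),
        Mean_Rating=("rating", "mean"),
        Max_Rating=("rating", "max"),
    )
    .reset_index()
)
print(summary)
```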

In your solution, put in a cooling period of 2-5 seconds between API calls. You may want to look at using `time.sleep`.
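
One possible shape for the cooling period is a loop that pauses before every call except the first. In this sketch `fetch_all` and its parameters are hypothetical names, and the demonstration passes a stand-in function with very short waits so that it runs quickly; in the lab, `fetch` would wrap a `requests.get` call and the defaults of 2 to 5 seconds would apply.

```python
import random
import time


def fetch_all(keys, fetch, min_wait=2.0, max_wait=5.0):
    """Call fetch(key) for each key, sleeping a random interval between calls."""
    results = {}
    for position, key in enumerate(keys):
        if position > 0:
            # Pause before every call except the first.
            time.sleep(random.uniform(min_wait, max_wait))
        results[key] = fetch(key)
    return results


# Stand-in fetch function and tiny waits, for demonstration only.
demo = fetch_all(["a", "b", "c"], fetch=str.upper, min_wait=0.01, max_wait=0.02)
print(demo)
```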


In [None]:
imdb_ids = ['tt5611024', 'tt8740790', 'tt13649112', 'tt13210838', 'tt9018736', 
           'tt11815682', 'tt2788316', 'tt5875444', 'tt14452776']

In [None]:
# Create a DataFrame "q3df"

q3df

In [None]:
grader.check("q3")

## Congratulations! You have finished Lab 7! 

### Submission Instructions

Below, you will see a cell. Running this cell will automatically generate a zip file with your autograded answers. Submit this file to the Lab 7 assignment on Gradescope. 



## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False, run_tests=True)