<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Practice Using APIs

_Authors: Dave Yerrington (SF), Sam Stack (DC)_

---

In this lab, we'll practice using some popular APIs to retrieve and store data.

In [1]:
# Imports at the top.
import json
import urllib
import pandas as pd
import numpy as np
import requests
import re
import matplotlib.pyplot as plt
%matplotlib inline

## Exercise 2: IMDb TV Shows

---

Sometimes an API doesn't provide all of the information we'd like and we need to get creative.

Here we'll use a combination of scraping and API calls to find the ratings and networks of famous television shows.

### 3.A Get the Top TV Shows

IMDb contains data about movies and TV shows. Unfortunately, it doesn't have a public API.

The page http://www.imdb.com/chart/toptv/?ref_=nv_tp_tv250_2 contains the list of the top 250 television shows of all time. Retrieve the page using the `requests` library and then parse the HTML to obtain a list of the `television_ids` for these shows. You can parse it with a regular expression or with a library like `BeautifulSoup`.

> **Hint:** television_ids look like this: `tt2582802`.
> _Everything after "/title/" and before "/?"_
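
As a minimal sketch of the regular-expression approach, the snippet below pulls IDs of the form `tt` followed by digits out of a small HTML fragment. The fragment and the `extract_ids` helper are made up for illustration (the real IMDb markup may differ), and the pattern assumes every ID appears as `/title/tt<digits>/` inside a link.

```python
import re

# Hypothetical HTML fragment; the real page markup may look different.
sample_html = '''
<td class="posterColumn"><a href="/title/tt0944947/?ref_=chttvtp_i_1"><img></a></td>
<td class="titleColumn"><a href="/title/tt0944947/?ref_=chttvtp_tt_1">Game of Thrones</a></td>
<td class="titleColumn"><a href="/title/tt2582802/?ref_=chttvtp_tt_2">Example</a></td>
<a href="/title/?ref_=nv_footer">Not a show link</a>
'''

def extract_ids(html):
    """Return the unique title IDs found in html, in order of first appearance."""
    # Only accept IDs that look like "tt" followed by digits.
    ids = re.findall(r'/title/(tt\d+)/', html)
    # dict.fromkeys removes duplicates while keeping the original order.
    return list(dict.fromkeys(ids))

print(extract_ids(sample_html))
```

Restricting the capture group to `tt\d+` skips links that contain `/title/` without a show ID, and `dict.fromkeys` keeps the chart order, which `set()` does not.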

In [13]:
response = requests.get('http://www.imdb.com/chart/toptv/?ref_=nv_tp_tv250_2')

In [14]:
def get_top_250():
    response = requests.get('http://www.imdb.com/chart/toptv/?ref_=nv_tp_tv250_2')
    html = response.text
    # Use a non-greedy match to capture everything between "/title/" and the next forward slash in each <a href> element.
    entries = re.findall("<a href.*?/title/(.*?)/", html)
    # Deduplicate the matches. Note that set() does not preserve the chart order.
    return list(set(entries))

In [15]:
entries = get_top_250()

In [16]:
len(entries)

251

In [17]:
entries[0]

'tt1227926'

### 3.B Get Data on the Top TV Shows

Although IMDb doesn't have a public API, an open API exists at http://www.tvmaze.com/api.

Use this API to retrieve information about each of the 250 TV shows you extracted in the previous step.

- Check the documentation of TVmaze's API to learn how to request show data by ID.
- Define a function that returns a Python object with select information for a given ID:
    - Show name.
    - Rating (avg).
    - Genre(s).
    - Network name.
    - Premiere date.
    - Status.
- Store the gathered information in a Pandas DataFrame.

> Tip: The JSON object can easily be converted into a Python dictionary.

Because the response body is JSON, you can parse it into a Python dictionary with `json.loads(res.text)` (`res.json()` is an equivalent shortcut provided by `requests`).

Here's an example of the information and how we can interact with it:

In [18]:
# Example URL.
res=requests.get('http://api.tvmaze.com/lookup/shows?imdb=tt0944947')

# Status code.
print(res.status_code)

# Just the contents of the name element.
print(json.loads(res.text).get('name'))

# The entire contents.
print(json.loads(res.text))

200
Game of Thrones
{'id': 82, 'url': 'http://www.tvmaze.com/shows/82/game-of-thrones', 'name': 'Game of Thrones', 'type': 'Scripted', 'language': 'English', 'genres': ['Drama', 'Adventure', 'Fantasy'], 'status': 'Running', 'runtime': 60, 'premiered': '2011-04-17', 'officialSite': 'http://www.hbo.com/game-of-thrones', 'schedule': {'time': '21:00', 'days': ['Sunday']}, 'rating': {'average': 9.3}, 'weight': 99, 'network': {'id': 8, 'name': 'HBO', 'country': {'name': 'United States', 'code': 'US', 'timezone': 'America/New_York'}}, 'webChannel': {'id': 22, 'name': 'HBO Go', 'country': {'name': 'United States', 'code': 'US', 'timezone': 'America/New_York'}}, 'externals': {'tvrage': 24493, 'thetvdb': 121361, 'imdb': 'tt0944947'}, 'image': {'medium': 'http://static.tvmaze.com/uploads/images/medium_portrait/143/359013.jpg', 'original': 'http://static.tvmaze.com/uploads/images/original_untouched/143/359013.jpg'}, 'summary': '<p>Based on the bestselling book series <i>A Song of Ice and Fire</i> 
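
Some fields in the response are nested (`rating`, `network`), and a nested object can be null. The sketch below shows one way to read such fields without raising an error. The JSON string is a hand-written stand-in, not a real API response, and "Example Show" is a made-up title.

```python
import json

# Hand-written stand-in for a response body; note that "network" is null.
raw = '{"name": "Example Show", "genres": ["Drama"], "rating": {"average": 8.7}, "network": null}'

# json.loads turns the JSON text into a Python dictionary.
show = json.loads(raw)

# Top-level and nested values are read with ordinary dictionary indexing.
print(show['name'])
print(show['rating']['average'])

# show['network'] is None here, so show['network']['name'] would raise TypeError.
# Falling back to an empty dict makes the lookup return None instead.
network_name = (show.get('network') or {}).get('name')
print(network_name)
```

The `or {}` fallback covers both a missing key and an explicit null, which is the same situation the `try`/`except` blocks in the next cell guard against.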

In [20]:
# Pull a show's information from the API, converting the JSON response into a Python dictionary.
# Note: this function appends to the global `shows_df`, which must be defined before it is called.
def get_entry(entry):
    res = requests.get('http://api.tvmaze.com/lookup/shows?imdb=' + entry)
    if res.status_code == 200:
        results = json.loads(res.text)

        # A missing key raises KeyError; indexing into a null (None) value raises TypeError.
        try:
            status = results['status']
        except (KeyError, TypeError):
            status = 'NA'
        try:
            rating = results['rating']['average']
        except (KeyError, TypeError):
            rating = 'NA'
        try:
            network = results['network']['name']
        except (KeyError, TypeError):
            network = 'NA'
        try:
            title = results['name']
        except (KeyError, TypeError):
            title = 'NA'
        try:
            genres = results['genres']
        except (KeyError, TypeError):
            genres = 'NA'
        try:
            premier = results['premiered']
        except (KeyError, TypeError):
            premier = 'NA'
        # Append the new row at the end of the DataFrame.
        shows_df.loc[len(shows_df)] = [title, rating, genres, network, premier, status]

In [21]:
# In the function above, we look up specific elements. If an element is missing, the lookup raises an error, hence the
# try and except statements.

In [22]:
shows_df = pd.DataFrame(columns=['show_name', 'rating_avg', 'genres', 'network', 'premiere_date', 'status'])

for entry in entries:
    get_entry(entry)
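
The loop above appends rows to a global DataFrame and silently skips any ID whose lookup does not return status 200, which may be part of why the DataFrame ends up with fewer rows than there are IDs. As an alternative sketch, the code below builds a list of plain records and keeps track of the IDs that failed. The names `to_record`, `collect_records`, and `fake_fetch` are hypothetical, and `fake_fetch` stands in for a real `requests.get` call so the example runs offline.

```python
import json

def to_record(results):
    """Flatten one show dictionary into the six fields used in this lab."""
    rating = results.get('rating') or {}
    network = results.get('network') or {}
    return {
        'show_name': results.get('name'),
        'rating_avg': rating.get('average'),
        'genres': results.get('genres'),
        'network': network.get('name'),
        'premiere_date': results.get('premiered'),
        'status': results.get('status'),
    }

def collect_records(ids, fetch):
    """Call fetch(show_id) -> (status_code, text) for each ID.

    Returns the records for successful lookups and the list of IDs that failed.
    """
    records, failed = [], []
    for show_id in ids:
        status_code, text = fetch(show_id)
        if status_code == 200:
            records.append(to_record(json.loads(text)))
        else:
            failed.append(show_id)
    return records, failed

def fake_fetch(show_id):
    """Stand-in for an HTTP request: knows one show, returns 404 for the rest."""
    if show_id == 'tt0944947':
        body = {'name': 'Game of Thrones', 'rating': {'average': 9.3},
                'genres': ['Drama', 'Adventure', 'Fantasy'],
                'network': {'name': 'HBO'}, 'premiered': '2011-04-17',
                'status': 'Running'}
        return 200, json.dumps(body)
    return 404, ''

records, failed = collect_records(['tt0944947', 'tt0000000'], fake_fetch)
print(records)
print(failed)
```

A list of dictionaries like `records` can be passed directly to `pd.DataFrame(records)`, and `failed` shows which IDs to inspect or retry.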

In [23]:
shows_df.head()

| | show_name | rating_avg | genres | network | premiere_date | status |
|---|---|---|---|---|---|---|
| 0 | Dr. Horrible's Sing-Along Blog | 8.7 | [Comedy, Music, Science-Fiction] | | 2008-07-15 | Ended |
| 1 | Chef's Table | 8.8 | [Food] | | 2015-04-26 | Running |
| 2 | Altered Carbon | 8.8 | [Drama, Action, Crime, Science-Fiction, Thriller] | | 2018-02-02 | Running |
| 3 | Happy Valley | 8.2 | [Drama, Crime] | BBC One | 2014-04-29 | Running |
| 4 | True Detective | 8.5 | [Drama, Crime, Thriller, Mystery] | HBO | 2014-01-12 | Running |


In [24]:
shows_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 235 entries, 0 to 234
Data columns (total 6 columns):
show_name        235 non-null object
rating_avg       229 non-null float64
genres           235 non-null object
network          235 non-null object
premiere_date    235 non-null object
status           235 non-null object
dtypes: float64(1), object(5)
memory usage: 12.9+ KB
