# TMDB API

I will pass the same list of cleaned show names through the TMDB API, using a Python wrapper for it (tmdbsimple). As with the OMDB API, I acquired a key from the website.

Source: https://www.themoviedb.org/documentation/api?language=en-US

In [1]:
import pickle
import json
import requests
import re
import time
import pandas as pd
import numpy as np 
import pprint
from collections import defaultdict

pp = pprint.PrettyPrinter(indent=2)

# Loading in the list and cleaning the show names

To keep this step from being so verbose, I load the already-cleaned list of show names from a pickle file.

In [2]:
with open('../0_Assets_&_Data/clean_show_list.pickle', 'rb') as fp:
    clean_show_list = pickle.load(fp)

# Getting series info from OMDB

As a first step, I'm iterating through the show list and making a GET request for each show. I want to get the full plot for some NLP/topic modeling (fullplot=True) and to see whether RottenTomatoes provides enough information to be used in modeling (tomatoes=True).

# TMDB

In [3]:
import tmdbsimple as tmdb

In [4]:
tmdb.API_KEY = 'a317708aa6c2e5d9aa0213f98af91cd7'
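
Since the key above sits in plain text in the notebook, one option is to read it from the environment instead. The following is a minimal sketch, not part of the original workflow; the helper name `load_api_key` and the variable name `TMDB_API_KEY` are assumptions made for illustration.

```python
import os


def load_api_key(env_var="TMDB_API_KEY"):
    """Return the API key stored in the given environment variable.

    Both the helper and the default variable name are hypothetical.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Possible usage in this notebook (not executed here):
# tmdb.API_KEY = load_api_key()
```

This keeps the key out of any version of the notebook that gets shared or committed.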

In [5]:
search = tmdb.Search()

Next, I create a 'for' loop to search TMDB for each name in the list of ABC shows.

# Function to search TMDB and get IDs

For this API wrapper, I'll need to use the search function to check that each name in my list of shows is in the TMDB database, then grab the TMDB ID to query for the full series data. TMDB appears to use its own internal ID format, so the IMDB IDs that I previously collected cannot be used directly here.

For the first step, I will grab the search results from the database.

In [6]:
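def search_tmdb(link_list):
    """Search TMDB for each show name and return {name: raw search response}."""
    tmdb_dict = {}
    for count, name in enumerate(link_list, start=1):
        # Each response is a dict whose 'results' key lists the matching shows
        tmdb_dict[name] = search.tv(query=name)
        if count % 1000 == 0:
            print("Currently pulling: ", count)
    return tmdb_dict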
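
The loop above issues one request per show and stops at the first error. As a rough sketch of a more forgiving variant, the function below retries failed lookups and records the names that never succeed. The name `search_shows` and the parameters `search_fn`, `pause`, and `max_retries` are hypothetical, and the search call is passed in as an argument so the logic does not depend on any particular wrapper.

```python
import time


def search_shows(names, search_fn, pause=0.25, max_retries=2):
    """Call search_fn(name) for every name, retrying failures.

    Returns (results, failed): a dict of successful responses and a
    list of names that still failed after all retries.
    """
    results, failed = {}, []
    for name in names:
        for attempt in range(max_retries + 1):
            try:
                results[name] = search_fn(name)
                break
            except Exception:  # broad on purpose for this sketch
                if attempt == max_retries:
                    failed.append(name)
                else:
                    time.sleep(pause)  # brief wait before retrying
    return results, failed


# Possible usage with the wrapper (not executed here):
# results, failed = search_shows(clean_show_list, lambda n: search.tv(query=n))
```

Keeping the failed names in a separate list makes it easy to rerun only those shows later.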

In [8]:
tmdb_show_search = search_tmdb(clean_show_list)

In [None]:
search.tv(query='The Good Doctor')

# Grabbing TMDB IDs in a dict

With the dictionary of TMDB's search results prepared, I can now iterate through those results and grab the TMDB ID.

In [None]:
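def get_tmdb_id(tmdb_dict):
    """Map each show name to the TMDB ID of its first search result."""
    tmdb_id = {}
    bad_shows = []  # names whose search returned no results
    count = 0
    for i in tmdb_dict:
        if tmdb_dict[i]['results']:
            tmdb_id[i] = tmdb_dict[i]['results'][0]['id']
            count += 1
            if count % 250 == 0:
                print("Currently pulling: ", count)
        else:
            bad_shows.append(i)
    # Report the unmatched names so they can be reviewed
    print("Shows with no TMDB match: ", len(bad_shows))
    return tmdb_id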

One issue with searching by name to grab the ID is duplicate show/movie names. It would be too tedious to check each result to ensure that the correct show is pulled, so I decided to default to the first result. This is not ideal, of course; a better approach would be to add a branch to the conditional that flags searches returning more than one result.
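
One way to flag those multi-result searches is sketched below. It is only an illustration: the helper name `pick_tmdb_id` is hypothetical, and it assumes each entry in 'results' carries the 'id' and 'name' fields that TMDB returns for TV searches. It prefers an exact (case-insensitive) name match and otherwise falls back to the first result, as the function above does.

```python
def pick_tmdb_id(show_name, search_response):
    """Return (tmdb_id, is_ambiguous) for one search response."""
    results = search_response.get("results", [])
    if not results:
        return None, False
    wanted = show_name.strip().lower()
    # Results whose title matches the queried name exactly
    exact = [r for r in results if r.get("name", "").strip().lower() == wanted]
    chosen = exact[0] if exact else results[0]
    # Ambiguous: several results and not exactly one exact match
    is_ambiguous = len(results) > 1 and len(exact) != 1
    return chosen["id"], is_ambiguous
```

Shows flagged as ambiguous could then be collected in a separate list and checked by hand, which is far less work than reviewing every search.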

In [None]:
tmdb_show_id = get_tmdb_id(tmdb_show_search)

With the TMDB IDs in hand, the next step is to pull the series/season information. A plain list of IDs might suffice, but the dictionary keeps each ID tied to its show name. To capture every season, the function will need to iterate through 'season_number'. One option is to call the API directly and keep looping until res.status_code signals that a season does not exist.

Otherwise, I may be able to read the number of seasons from the series info (the 'number_of_seasons' field) and use it as the upper bound, i.e. range(1, number_of_seasons + 1).
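
The second option can be sketched as follows. This is an illustration under stated assumptions rather than the code used in this notebook: the helper name `fetch_all_seasons` is hypothetical, and the actual request is passed in as `fetch_season` so that the loop does not depend on a specific wrapper call.

```python
def fetch_all_seasons(series_info, fetch_season):
    """Fetch every numbered season listed in a series-info dictionary.

    fetch_season is any callable taking (series_id, season_number).
    """
    n_seasons = series_info.get("number_of_seasons") or 0
    seasons = {}
    # range() excludes its stop value, hence the + 1
    for season_number in range(1, n_seasons + 1):
        seasons[season_number] = fetch_season(series_info["id"], season_number)
    return seasons


# Possible usage, assuming the wrapper's TV_Seasons class (not executed here):
# seasons = fetch_all_seasons(info, lambda sid, n: tmdb.TV_Seasons(sid, n).info())
```

Starting at 1 skips any season 0, which TMDB typically uses for specials.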

# Grabbing TMDB info

Using the TMDB ID, I can now query the API to pull the series information. The following function iterates through the previously created dictionary of IDs and stores each show's series info in a new dictionary. Capturing season-level detail would additionally require iterating through 'season_number'. This is a situation in which querying the API directly would have been beneficial, as res.status_code would be a very easy condition to loop on.

In [None]:
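def get_tmdb_show(link_dict):
    """Fetch the series info for every {show name: TMDB ID} pair."""
    tmdb_show_info = {}
    for key in link_dict:
        try:
            tmdb_show_info[key] = tmdb.TV(id=link_dict[key]).info()
        except Exception as err:
            # Skip shows whose request fails, but report which ones;
            # unlike a bare except, this still allows a keyboard interrupt
            print("Skipping", key, "-", err)
            continue
    return tmdb_show_info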

In [None]:
tmdb_series_dict = get_tmdb_show(tmdb_show_id)

In [None]:
tmdb_series_dict.keys()

In [None]:
with open('../0_Assets_&_Data/tmdb_series_dict.json', 'w') as fp:
    json.dump(tmdb_series_dict, fp)

In [None]:
with open('../0_Assets_&_Data/tmdb_show_info.json', 'w') as fp:
    # tmdb_show_info is local to get_tmdb_show(); its returned value is tmdb_series_dict
    json.dump(tmdb_series_dict, fp)