# TVDB API

The last API I'll be using is the TVDB API. This API doesn't have a Python wrapper, so I'll access it directly with the `requests` library.

In [1]:
import pickle
import json
import requests
import re
import time
import pandas as pd
import numpy as np 
import pprint
from collections import defaultdict

pp = pprint.PrettyPrinter(indent=2)

# Loading the cleaned list of show names

In [2]:
with open('../0_Assets_&_Data/clean_show_list.pickle', 'rb') as fp:
    clean_show_list = pickle.load(fp)

# API

This API issues authentication tokens that are valid for only 24 hours, so I either need to make a request each day to refresh the current token or request a new one. Because I access the API so infrequently, I will request a new token every time I use it rather than scheduling a script to refresh it daily.
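Since the token returned by `/login` is a JSON Web Token, its expiry time can be read from the token itself instead of tracking when it was issued. The sketch below assumes the payload carries a standard `exp` claim (Unix seconds); `jwt_expiry`, `token_needs_renewal`, and the five-minute margin are my own illustrative choices, not part of the TVDB API. It only decodes the payload and does not verify the signature.

```python
import base64
import json
import time


def jwt_expiry(token):
    """Return the `exp` claim (Unix seconds) from a JWT's payload.

    Only decodes the middle segment; it does NOT verify the signature.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are unpadded base64url, so restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["exp"]


def token_needs_renewal(token, margin_seconds=300, now=None):
    """True if the token expires within `margin_seconds` of `now`."""
    now = time.time() if now is None else now
    return jwt_expiry(token) - now <= margin_seconds
```

Before a long batch of requests, a check such as `token_needs_renewal(json.loads(token)["token"])` would signal that it is time to log in again.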

In [17]:
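tvdb_url = "https://api.thetvdb.com"

# Login payload for POST /login (sent as the JSON body, not as HTTP headers).
# Placeholders stand in for real credentials, which should not be published.
headers = {
  "apikey": "YOUR_API_KEY",
  "userkey": "YOUR_USER_KEY",
  "username": "YOUR_USERNAME"
}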

In [18]:
login_url = tvdb_url + "/login"
y = requests.post(login_url, json=headers)
y.status_code

200

In [19]:
token = y.text
token

'{"token":"eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE1NDAwMzk2NjAsImlkIjoiIiwib3JpZ19pYXQiOjE1Mzk5NTMyNjAsInVzZXJpZCI6NTEyMTc1LCJ1c2VybmFtZSI6ImVyaWN5eW9vMDAxNnRmIn0.sZjG2c1mu2r8Zo1y91WX2FrK9kUe5MmzBBiXYW4ACC6CSlXUga3CrFF6FP_yIv5eqQNKG-5cWEN0e8vV0FmhL2GyC5IC8AcS8eili5DTrhfIc6rNPIVG3DRRJZzDHYdmUbhTqs1NhB6U_La600u6V3R1P8aup-C5U_7KhyyJuGBxDNOkCGkZ8M-31aW52AC182OCWcuM8hMhkgi_VHC3SdSH4f9Co2sMNPva_mC1d0UWan1TCW9ui7pMpMXHbKMP9ecWAD43jx0qI3rpkiHKJyxClQGiY91tUg1olYU7k1HTx2KHqID8SfO55RorW_N2NlkX9t30fPOnPn8tHMMVcw"}'

In [20]:
# Build the auth header from the login response instead of pasting the token,
# since the token expires after 24 hours. `token` holds the raw JSON text.
headers2 = {"Authorization": "Bearer " + json.loads(token)["token"]}


## Searching titles to grab ID

As with the TMDB API, I first need to search for each show by name to get its TVDB ID, which the rest of the API requires. The function below loops over the show names, passes each one as the `name` parameter to `requests.get`, and stores each JSON response in a dictionary keyed by show name.
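A variation on the same loop, sketched under a few assumptions: a `requests.Session` carries the Authorization header once and reuses the connection, and a short pause between calls keeps the request rate polite. The half-second `pause` is an arbitrary value, not a documented TVDB rate limit, and `search_names` and `make_search_fetch` are hypothetical helpers. The request function is passed in so the loop can be exercised without calling the API.

```python
import time


def make_search_fetch(session, base_url):
    """Build fetch(name) that queries /search/series through `session`."""
    def fetch(name):
        res = session.get(base_url + "/search/series", params={"name": name})
        return res.json()
    return fetch


def search_names(names, fetch, pause=0.5, report_every=1000):
    """Call fetch(name) for every show name; return {name: decoded JSON}."""
    results = {}
    for n, name in enumerate(names, start=1):
        results[name] = fetch(name)
        if n % report_every == 0:
            print(f"Current progress: {n} out of {len(names)}")
        if pause:
            time.sleep(pause)  # spacing between requests (arbitrary value)
    return results
```

With the real API this might look like `session = requests.Session()`, `session.headers.update(headers2)`, then `search_names(clean_show_list, make_search_fetch(session, tvdb_url))`.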

In [21]:
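def search_tvdb(link_list):
    """Search TVDB for each show name; return {show name: JSON response}."""
    tvdb_dict = {}
    search_url = tvdb_url + "/search/series"

    for count, name in enumerate(link_list, start=1):
        # Each search sends the show name as the `name` query parameter
        # and the bearer token in the Authorization header.
        tvdb_dict[name] = requests.get(search_url,
                                       params={"name": name},
                                       headers=headers2
                                       ).json()
        if count % 1000 == 0:
            print("Current progress: ", count, " out of ", len(link_list))
    return tvdb_dict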

In [23]:
# One request per show, so this takes a while to run
tvdb_search = search_tvdb(clean_show_list)

In [24]:
len(tvdb_search)

3733

In [None]:
tvdb_search['The Good Doctor']

# Grab TVDB ID

Once I have the search results, I can extract the relevant field, which in this case is `id`. I can then iterate through the IDs to pull the full show information.

In [25]:
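def get_tvdb_id(link_dict):
    """Map each show name to the ID of its first search result."""
    tvdb_id = {}

    for item in link_dict:
        # Failed searches come back as {'Error': ...} with no usable 'data' list.
        if link_dict[item].get('Error') or not link_dict[item].get('data'):
            continue
        tvdb_id[item] = link_dict[item]['data'][0]['id']
    return tvdb_id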

In [26]:
tvdb_show_id = get_tvdb_id(tvdb_search)

Because these IDs come from a search, some shows returned multiple results, and for some of those the show I wanted was the second result rather than the first. For now, I will skip this additional cleaning step and keep the first result.
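One way to handle the multiple-result cases later, sketched as a hypothetical heuristic: prefer the hit whose `seriesName` matches the searched name exactly (ignoring case and surrounding whitespace), and fall back to the first hit otherwise. This assumes each search hit carries a `seriesName` field like the series record below; an exact-name rule will still miss near matches, so it is a starting point rather than a complete fix.

```python
def pick_series_id(search_result, wanted_name):
    """Pick a series ID from one /search/series response.

    Prefers a hit whose seriesName equals `wanted_name` (case-insensitive),
    otherwise the first hit. Returns None when there are no hits.
    """
    candidates = search_result.get("data") or []
    if not candidates:
        return None
    target = wanted_name.strip().casefold()
    for candidate in candidates:
        name = (candidate.get("seriesName") or "").strip().casefold()
        if name == target:
            return candidate["id"]
    return candidates[0]["id"]
```

Applied across all shows, this would be `{name: pick_series_id(res, name) for name, res in tvdb_search.items()}`, with `None` marking shows that still need a manual look.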

## Series using ID

Once I have an ID from the dictionary above, I can use it to query the API for that show's series information.

In [27]:
series_url = tvdb_url + "/series/" + '271683'
series_url

'https://api.thetvdb.com/series/271683'

In [28]:
res2 = requests.get(series_url, headers=headers2)

In [29]:
ga_series = res2.json()
pp.pprint(ga_series)

{ 'data': { 'added': '2013-07-21 02:12:20',
            'addedBy': 235,
            'airsDayOfWeek': 'Monday',
            'airsTime': '22:00',
            'aliases': ['The Good Doctor (KR) '],
            'banner': 'graphical/271683-g2.jpg',
            'firstAired': '2013-08-05',
            'genre': ['Drama', 'Family'],
            'id': 271683,
            'imdbId': 'tt3184708',
            'lastUpdated': 1539733721,
            'network': 'KBS TV2',
            'networkId': '',
            'overview': 'A medical drama based in the pediatrics '
                        'department.\r\n'
                        'This drama will draw the story of a young man, Park '
                        'Shi-on with Idiot Savant Syndrome who overcomes '
                        'obstacles to become a pediatric surgeon. He is a '
                        'pediatrician who despite his developmental '
                        'disabilities is a medical genius. He is a gifted '
                        'do

In [30]:
ga_series['data']

{'id': 271683,
 'seriesName': 'Good Doctor',
 'aliases': ['The Good Doctor (KR) '],
 'banner': 'graphical/271683-g2.jpg',
 'seriesId': '',
 'status': 'Ended',
 'firstAired': '2013-08-05',
 'network': 'KBS TV2',
 'networkId': '',
 'runtime': '65',
 'genre': ['Drama', 'Family'],
 'overview': 'A medical drama based in the pediatrics department.\r\nThis drama will draw the story of a young man, Park Shi-on with Idiot Savant Syndrome who overcomes obstacles to become a pediatric surgeon. He is a pediatrician who despite his developmental disabilities is a medical genius. He is a gifted doctor but has trouble with other areas of his life, such as relating to people socially. \r\nMeanwhile, Cha Yoon-seo is a pediatric surgical fellow and Kim Do-han is the best pediatric surgeons in Korea. He will finds himself frequently in confrontation with Park Shi-on.',
 'lastUpdated': 1539733721,
 'airsDayOfWeek': 'Monday',
 'airsTime': '22:00',
 'rating': '',
 'imdbId': 'tt3184708',
 'zap2itId': '',
 'a

Information to keep:
- id (for tracking, unless it ends up serving as the dictionary key)
- seriesName
- status
- firstAired
- network
- runtime
- genre
- overview
- airsDayOfWeek
- airsTime
- rating
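The fields above can be pulled out of each series record with a small helper. A minimal sketch: `trim_series` and `KEEP_FIELDS` are hypothetical names, and treating empty strings as missing is my own assumption, based on fields such as `rating` and `networkId` coming back as `''` in the record above.

```python
KEEP_FIELDS = ("id", "seriesName", "status", "firstAired", "network",
               "runtime", "genre", "overview", "airsDayOfWeek",
               "airsTime", "rating")


def trim_series(series_data, fields=KEEP_FIELDS):
    """Return only the listed fields, in order; missing or '' become None."""
    trimmed = {}
    for field in fields:
        value = series_data.get(field)
        # The API returns '' for unknown values, so treat those as missing.
        trimmed[field] = None if value == "" else value
    return trimmed
```

For example, `trim_series(ga_series['data'])` would return just these eleven fields for the Good Doctor record.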

# Cleaning the TVDB dictionary

Some of the search results I pulled earlier were empty, so the function below keeps only the entries that contain a `data` field.

In [31]:
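def get_clean_tvdb(link_dict):
    """Keep only search responses that contain a 'data' list of hits."""
    clean_tvdb_dict = {}
    count = 0

    for i in link_dict:
        try:
            clean_tvdb_dict[i] = link_dict[i]['data']
        except KeyError:
            # No 'data' key: the search returned an error instead of results.
            continue
        count += 1
        if count % 250 == 0:
            print("Currently pulling: ", count)
    return clean_tvdb_dict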

In [32]:
tvdb_search_clean = get_clean_tvdb(tvdb_search)

In [33]:
len(tvdb_search_clean)

0
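Keeping a record of which searches failed, and with what message, would make an empty result like the one above easier to diagnose: if every show returned the same error, the problem is more likely the request itself (authorization, for example) than the show names. A sketch, assuming failed responses carry an `Error` key as the check in `get_tvdb_id` expects; `split_search_results` is a hypothetical helper.

```python
from collections import Counter


def split_search_results(search_dict):
    """Split raw search responses into usable hits and failures.

    Returns (found, failed): `found` maps show name -> list of hits and
    `failed` maps show name -> the error message (None if there was none).
    """
    found, failed = {}, {}
    for name, response in search_dict.items():
        hits = response.get("data")
        if hits:
            found[name] = hits
        else:
            failed[name] = response.get("Error")
    return found, failed


def error_summary(failed):
    """Count how often each error message occurs, most common first."""
    return Counter(failed.values()).most_common()
```

Running `error_summary(split_search_results(tvdb_search)[1])` would show at a glance whether the failures share a single cause.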

# Pulling series information for each show

In [34]:
def get_clean_tvdb_dict(link_dict):
    """Fetch the full series record for the first search hit of each show."""
    tvdb_series_ = {}
    for i in link_dict:  # i = show name; link_dict[i] = list of search hits
        series_url = tvdb_url + '/series/' + str(link_dict[i][0]['id'])
        tvdb_series_[i] = requests.get(series_url, headers=headers2).json()
    return tvdb_series_

In [35]:
tvdb_show_info = get_clean_tvdb_dict(tvdb_search_clean)

In [36]:
with open('../0_Assets_&_Data/tvdb_show_info.json', 'w') as fp:
    json.dump(tvdb_show_info, fp)

# TVDB Episode Info

The next step is to get the individual episode information for each series. Because the result will be large, I plan to run these queries on AWS and store the results in a PostgreSQL database rather than locally.
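Episode lists can run to many pages, so a single request per show may not return every episode. A sketch, assuming the v2 `/series/{id}/episodes` endpoint accepts a `page` query parameter and that its response includes `links.next` holding the next page number (or null on the last page); both should be checked against the API documentation. `fetch_all_episodes` and `make_episode_page_getter` are hypothetical names, and the page fetcher is passed in so the paging logic can be tested offline.

```python
def make_episode_page_getter(session, base_url):
    """Build get_page(series_id, page) on top of a requests.Session."""
    def get_page(series_id, page):
        url = f"{base_url}/series/{series_id}/episodes"
        return session.get(url, params={"page": page}).json()
    return get_page


def fetch_all_episodes(series_id, get_page):
    """Collect episodes across pages until links.next is missing or null."""
    episodes = []
    page = 1
    while page is not None:
        response = get_page(series_id, page)
        episodes.extend(response.get("data") or [])
        # An error response has no 'links', which also ends the loop.
        page = (response.get("links") or {}).get("next")
    return episodes
```

Paired with the session from earlier, `fetch_all_episodes(271683, make_episode_page_getter(session, tvdb_url))` would gather every page for one series before it is written to the database.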

In [38]:
def get_tvdb_series(show_dict):
    """Fetch the episode list for the first search hit of each show."""
    tvdb_series = {}
    for i in show_dict:  # i = show name; show_dict[i] = list of search hits
        series_id = show_dict[i][0]['id']
        # /episodes/{id} looks up a single episode by episode ID; a series'
        # episodes are listed under /series/{id}/episodes (first page only).
        ep_url = tvdb_url + '/series/' + str(series_id) + '/episodes'
        # A separate dict per show, keyed by series ID.
        tvdb_series[i] = {series_id: requests.get(ep_url, headers=headers2).json()}
    return tvdb_series

In [39]:
tvdb_series_info = get_tvdb_series(tvdb_search_clean)

In [40]:
len(tvdb_series_info)

0

# Next Steps

Once I have obtained all the relevant information from the APIs, I will save each result as its own dictionary or sparse dataframe.
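Whichever format is used, the nested show dictionaries have to be flattened into rows before they fit a dataframe or a SQL table. A minimal sketch with a hypothetical `to_rows` helper; joining list fields such as `genre` with `|` is one arbitrary choice, and a separate genre table would be the more normalized option for Postgres.

```python
def to_rows(records, key_name="show_name"):
    """Turn {key: {field: value}} into a list of flat row dicts.

    List values (e.g. 'genre') are joined into one '|'-separated string
    so that every column holds a scalar.
    """
    rows = []
    for key, fields in records.items():
        row = {key_name: key}
        for field, value in fields.items():
            if isinstance(value, list):
                value = "|".join(map(str, value))
            row[field] = value
        rows.append(row)
    return rows
```

`pd.DataFrame(to_rows(...))` would then give one row per show, with one column per field.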

In [None]:
with open('../0_Assets_&_Data/tvdb_dict.json', 'w') as fp:
    json.dump(tvdb_search, fp)

In [None]:
with open('../0_Assets_&_Data/tvdb_series.json', 'w') as fp:
    json.dump(tvdb_series_info, fp)

In [None]:
with open('../0_Assets_&_Data/tvdb_episodes.json', 'w') as fp:
    json.dump(tvdb_episodes, fp)