## About Me 

This notebook focuses on querying TMDB (The Movie Database) efficiently for movie metadata.

We start with the logic in `eda1.parse_netflix` to split each title into sections (show name, season, episode name) and to flag each title as a TV show or a movie.

Next we take the movie list, look up each title with the TMDB movie search API, and keep only the fields we want from the response.

The final logic is in the **get_movie_API_results** function.

Alternate idea:
* Keep a local database of previously searched titles, so archived results are looked up there instead of calling the TMDB API again.
  - Not necessary right now.
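Here is a minimal sketch of that cache. It assumes Python's standard-library `shelve` module as the on-disk store. `cached_lookup`, its `fetch` argument, and the default file name `tmdb_cache.db` are hypothetical. They are not part of this notebook or of `tmdbsimple`.

```python
import shelve


def cached_lookup(title, fetch, cache_path="tmdb_cache.db"):
    """Return fetch(title), reusing a result archived on disk when available.

    `fetch` is any function that queries the API for one title (hypothetically
    get_movie_API_results). Results must be picklable, because shelve pickles values.
    """
    with shelve.open(cache_path) as cache:
        if title in cache:
            return cache[title]      # already archived: skip the API call
        result = fetch(title)
        cache[title] = result        # archive it for the next run
        return result
```

In the notebook, `fetch` would be `get_movie_API_results`. Its `pd.Series` results are picklable, so they can be stored as-is.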

In [49]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import tmdbsimple as tmdb
import os
import sys
import pickle
import time
from tqdm import tqdm, tqdm_notebook
tqdm.pandas()

In [16]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [17]:
sys.path.append(os.path.abspath('../src'))

In [18]:
with open('../Data/api_key.pkl', 'rb') as hnd:
    tmdb.API_KEY = pickle.load(hnd)['api_key']

In [19]:
data = pd.read_csv('../Data/NetflixViewingHistory.csv')

In [20]:
import gather_data_deprecated as eda1

In [21]:
netflix_df = eda1.parse_netflix(data)

Total number of TV Show + Movies:  1405
TV Show vs Movie
Dataframe shape:  (1405, 6)
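`eda1.parse_netflix` itself is not shown in this notebook. Judging from the columns it produces, its title split can be sketched as below, assuming Netflix separates title parts with `": "`. For example, `Trevor Noah: Son of Patricia` becomes Show Name `Trevor Noah` and Season `Son of Patricia`. `split_title` is a hypothetical helper, not the module's code, and it does not reproduce the TV Show vs Movie flag.

```python
def split_title(title, sep=": "):
    """Split a Netflix viewing-history title into (show, season, episode).

    Series titles look like 'Show: Season: Episode'. Movies are often
    'Name' or 'Name: Subtitle'. Missing parts are returned as None.
    """
    parts = title.split(sep, 2)          # at most three pieces; the rest stays in the episode
    parts += [None] * (3 - len(parts))   # pad to exactly three fields
    return tuple(parts)
```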


In [22]:
shows = netflix_df[netflix_df['TV_Show_flag'] == 'TV Show']
movies = netflix_df[netflix_df['TV_Show_flag'] == 'Movie']

In [23]:
search = tmdb.Search()

## Start with Movies

In [24]:
movies.head()

| | Title | Date | Show Name | Season | Episode Name | TV_Show_flag |
|---|---|---|---|---|---|---|
| 3 | Trevor Noah: Son of Patricia | 2018-11-23 | Trevor Noah | Son of Patricia | | Movie |
| 19 | Captain Underpants: The First Epic Movie | 2018-06-07 | Captain Underpants | The First Epic Movie | | Movie |
| 66 | Saving Capitalism | 2017-12-07 | Saving Capitalism | | | Movie |
| 67 | Betting on Zero | 2017-12-07 | Betting on Zero | | | Movie |
| 81 | Banking on Bitcoin | 2017-11-04 | Banking on Bitcoin | | | Movie |


In [25]:
row1 = movies.iloc[0]
row1

Title           Trevor Noah: Son of Patricia
Date                     2018-11-23 00:00:00
Show Name                        Trevor Noah
Season                       Son of Patricia
Episode Name                            None
TV_Show_flag                           Movie
Name: 3, dtype: object

In [26]:
row2 = movies.iloc[1] #6
row2

Title           Captain Underpants: The First Epic Movie
Date                                 2018-06-07 00:00:00
Show Name                             Captain Underpants
Season                              The First Epic Movie
Episode Name                                        None
TV_Show_flag                                       Movie
Name: 19, dtype: object

In [27]:
search_results = search.movie(query=row2['Title'])
n_results = len(search_results['results'])
print(n_results)
temp_id = search_results['results'][0]['id']
full_movie_results = tmdb.Movies(temp_id)

1
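Taking `search_results['results'][0]` trusts TMDB's ordering of hits. When a search returns several hits, one alternative is to prefer a hit whose title matches the query exactly. The sketch assumes each hit dict carries a `title` key, as TMDB movie search results do. `pick_best_result` is a hypothetical helper.

```python
def pick_best_result(results, title):
    """Choose one TMDB search hit for `title`.

    Prefers a case-insensitive exact match on the hit's 'title' field and
    otherwise falls back to the first hit (TMDB's own ordering).
    Returns None when there are no hits.
    """
    if not results:
        return None
    wanted = title.casefold().strip()
    for hit in results:
        if (hit.get("title") or "").casefold().strip() == wanted:
            return hit
    return results[0]
```

It could replace `search_results['results'][0]['id']` with `pick_best_result(search_results['results'], row2['Title'])['id']`. The empty-results case still needs a separate guard, because the helper returns `None` there.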


In [28]:
full_movie_results.info()

{'adult': False,
 'backdrop_path': '/5ceIqlRzWH4x1IPU9OfjXHO2Kz6.jpg',
 'belongs_to_collection': None,
 'budget': 38000000,
 'genres': [{'id': 28, 'name': 'Action'},
  {'id': 16, 'name': 'Animation'},
  {'id': 35, 'name': 'Comedy'},
  {'id': 10751, 'name': 'Family'}],
 'homepage': 'http://www.foxmovies.com/movies/captain-underpants-the-first-epic-movie',
 'id': 268531,
 'imdb_id': 'tt2091256',
 'original_language': 'en',
 'original_title': 'Captain Underpants: The First Epic Movie',
 'overview': 'Two mischievous kids hypnotize their mean elementary school principal and turn him into their comic book creation, the kind-hearted and elastic-banded Captain Underpants.',
 'popularity': 26.675,
 'poster_path': '/AjHZIkzhPXrRNE4VSLVWx6dirK9.jpg',
 'production_companies': [{'id': 521,
   'logo_path': '/kP7t6RwGz2AvvTkvnI1uteEwHet.png',
   'name': 'DreamWorks Animation',
   'origin_country': 'US'}],
 'production_countries': [{'iso_3166_1': 'US',
   'name': 'United States of America'}],
 'releas
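The `genres` value above is a list of `{'id', 'name'}` dicts. That is presumably why it is left out of the fields returned by `get_movie_API_results` (see the genre-parsing TODO there). One way to flatten it into a single string column is sketched below. The helper `genre_names` is hypothetical, and the `|` separator is an arbitrary choice.

```python
def genre_names(genres, sep="|"):
    """Flatten TMDB's [{'id': ..., 'name': ...}, ...] genre list into one string."""
    return sep.join(g["name"] for g in genres or [])  # None or [] -> ""


# Example using the genres shown in the output above
captain_underpants_genres = [
    {"id": 28, "name": "Action"},
    {"id": 16, "name": "Animation"},
    {"id": 35, "name": "Comedy"},
    {"id": 10751, "name": "Family"},
]
print(genre_names(captain_underpants_genres))  # Action|Animation|Comedy|Family
```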

In [29]:
normal_movie_fields = ['budget', 'genres', 'homepage', 'imdb_id', 'overview', 'popularity',
                       'release_date', 'revenue', 'runtime', 'vote_average', 'vote_count']

In [30]:
set(normal_movie_fields).difference(set(full_movie_results.info().keys()))

set()

In [31]:
full_movie_results.info().keys()  # note: every .info() call sends a new request to TMDB

dict_keys(['adult', 'backdrop_path', 'belongs_to_collection', 'budget', 'genres', 'homepage', 'id', 'imdb_id', 'original_language', 'original_title', 'overview', 'popularity', 'poster_path', 'production_companies', 'production_countries', 'release_date', 'revenue', 'runtime', 'spoken_languages', 'status', 'tagline', 'title', 'video', 'vote_average', 'vote_count'])

In [55]:
def get_movie_API_results(movie_title):
    """Search TMDB for `movie_title` and return selected fields as a pd.Series.

    Uses the first search hit. 'Number of Search Results' records how many hits
    there were, in case there are several and the first one is the wrong movie.
    """
    # Fields to keep from the movie details response
    # TODO: fix genre parsing ('genres' is left out of this list for now)
    normal_movie_fields = ['budget', 'homepage', 'imdb_id', 'overview', 'popularity',
                           'release_date', 'revenue', 'runtime', 'vote_average', 'vote_count']

    # Find the movie in TMDB (uses the module-level `search = tmdb.Search()`)
    search_results = search.movie(query=movie_title)
    n_results = len(search_results['results'])

    if n_results == 0:
        # No match: return the same schema filled with NaN
        movie_results = {key: np.nan for key in normal_movie_fields}
    else:
        temp_id = search_results['results'][0]['id']
        # Call info() outside any assert: it performs the request, and assert
        # statements are skipped entirely when Python runs with -O
        info = tmdb.Movies(temp_id).info()
        missing = set(normal_movie_fields).difference(info)
        if missing:
            raise KeyError(f'Movie result schema is missing fields: {missing}')
        movie_results = {field: info[field] for field in normal_movie_fields}

    # Number of search results, in case there are multiple and we chose the wrong one
    movie_results['Number of Search Results'] = n_results
    movie_results['title_query'] = movie_title

    time.sleep(0.3)  # pause before the next request, on both the hit and no-hit paths
    return pd.Series(movie_results)

In [38]:
r1 = get_movie_API_results(row1['Title'])
r1 

budget                                                                      0
homepage                               https://www.netflix.com/title/80239932
imdb_id                                                             tt9170648
overview                    Trevor Noah gets out from behind the "Daily Sh...
popularity                                                              6.902
release_date                                                       2018-11-20
revenue                                                                     0
runtime                                                                    63
vote_average                                                              7.1
vote_count                                                                 64
Number of Search Results                                                    1
title_query                                      Trevor Noah: Son of Patricia
dtype: object

In [39]:
r2 = get_movie_API_results(row2['Title'])
r2

budget                                                               38000000
homepage                    http://www.foxmovies.com/movies/captain-underp...
imdb_id                                                             tt2091256
overview                    Two mischievous kids hypnotize their mean elem...
popularity                                                             26.675
release_date                                                       2017-06-01
revenue                                                             125289450
runtime                                                                    89
vote_average                                                              6.1
vote_count                                                                774
Number of Search Results                                                    1
title_query                          Captain Underpants: The First Epic Movie
dtype: object

In [62]:
demo_df = movies['Title'].iloc[:20].apply(get_movie_API_results)
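`get_movie_API_results` sleeps a fixed 0.3 s per call, even when the request itself was slow. An alternative is sketched below: a throttle that only waits for whatever remains of a minimum interval since the previous call. `Throttle` and `min_interval` are hypothetical names. The right interval depends on TMDB's current rate-limit guidance, which is not stated in this notebook.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between successive wait() calls."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous wait()

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)  # only sleep for the part of the interval not yet elapsed
        self._last = time.monotonic()
```

With this, `throttle = Throttle(0.3)` would be created once. `throttle.wait()` would then be called immediately before each `search.movie(...)` or `.info()` request, in place of the fixed `time.sleep(0.3)`.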

In [63]:
demo_df.head()

| | budget | homepage | imdb_id | overview | popularity | release_date | revenue | runtime | vote_average | vote_count | Number of Search Results | title_query |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 0.0 | https://www.netflix.com/title/80239932 | tt9170648 | Trevor Noah gets out from behind the "Daily Sh... | 6.902 | 2018-11-20 | 0.0 | 63.0 | 7.1 | 64.0 | 1 | Trevor Noah: Son of Patricia |
| 19 | 38000000.0 | http://www.foxmovies.com/movies/captain-underp... | tt2091256 | Two mischievous kids hypnotize their mean elem... | 26.675 | 2017-06-01 | 125289450.0 | 89.0 | 6.1 | 774.0 | 1 | Captain Underpants: The First Epic Movie |
| 66 | 0.0 | http://www.netflix.com/savingcapitalism | tt6185286 | Former Secretary of Labor Robert Reich meets w... | 6.267 | 2017-08-25 | 0.0 | 73.0 | 6.9 | 27.0 | 1 | Saving Capitalism |
| 67 | 0.0 | | tt3762912 | Controversial hedge fund titan Bill Ackman is ... | 9.855 | 2017-03-17 | 0.0 | 99.0 | 7.3 | 71.0 | 1 | Betting on Zero |
| 81 | 100000.0 | https://invisiblemoneydocumentary.wordpress.com/ | tt5033790 | Not since the invention of the Internet has th... | 7.893 | 2016-12-30 | 0.0 | 90.0 | 6.5 | 64.0 | 2 | Banking on Bitcoin |


In [65]:
with open('../Data/all_movies_results_df.pkl', 'rb') as hnd:
    all_movies_results_df = pickle.load(hnd)

In [66]:
all_movies_results_df.head()

| | Number of Search Results | budget | homepage | imdb_id | overview | popularity | release_date | revenue | runtime | title_query | vote_average | vote_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.0 | https://www.netflix.com/title/80239932 | tt9170648 | Trevor Noah gets out from behind the "Daily Sh... | 3.145 | 2018-11-20 | 0.0 | 63.0 | Trevor Noah: Son of Patricia | 7.1 | 35.0 |
| 1 | 1 | 38000000.0 | http://www.foxmovies.com/movies/captain-underp... | tt2091256 | Two mischievous kids hypnotize their mean elem... | 11.473 | 2017-06-01 | 125289450.0 | 89.0 | Captain Underpants: The First Epic Movie | 6.0 | 565.0 |
| 2 | 1 | 0.0 | http://www.netflix.com/savingcapitalism | tt6185286 | Former Secretary of Labor Robert Reich meets w... | 1.285 | 2017-11-21 | 0.0 | 73.0 | Saving Capitalism | 7.1 | 17.0 |
| 3 | 1 | 0.0 | | tt3762912 | Controversial hedge fund titan Bill Ackman is ... | 2.099 | 2017-03-17 | 0.0 | 99.0 | Betting on Zero | 7.4 | 45.0 |
| 4 | 1 | 100000.0 | https://invisiblemoneydocumentary.wordpress.com/ | tt5033790 | Not since the invention of the Internet has th... | 2.766 | 2016-12-30 | 0.0 | 90.0 | Banking on Bitcoin | 6.5 | 54.0 |


In [67]:
all_movies_results_df.describe()

| | Number of Search Results | budget | popularity | revenue | runtime | vote_average | vote_count |
|---|---|---|---|---|---|---|---|
| count | 48.0 | 45.0 | 45.0 | 45.0 | 45.0 | 45.0 | 45.0 |
| mean | 3.729167 | 33602220.0 | 12.186178 | 154248200.0 | 99.977778 | 6.802222 | 2826.733333 |
| std | 5.804399 | 57572900.0 | 12.541833 | 314480600.0 | 30.735299 | 1.348482 | 4509.87172 |
| min | 0.0 | 0.0 | 0.6 | 0.0 | 25.0 | 0.0 | 0.0 |
| 25% | 1.0 | 0.0 | 3.356 | 0.0 | 87.0 | 6.3 | 56.0 |
| 50% | 1.0 | 8000000.0 | 10.409 | 3142154.0 | 99.0 | 7.0 | 565.0 |
| 75% | 3.0 | 39000000.0 | 17.455 | 161025600.0 | 120.0 | 7.6 | 4015.0 |
| max | 20.0 | 220000000.0 | 74.008 | 1519558000.0 | 165.0 | 8.4 | 18827.0 |


In [68]:
all_movies_results_df.isna().sum()

Number of Search Results     0
budget                       3
homepage                    27
imdb_id                      4
overview                     3
popularity                   3
release_date                 3
revenue                      3
runtime                      3
title_query                  0
vote_average                 3
vote_count                   3
dtype: int64

In [38]:
# Missing Rate
all_movies_results_df.isna().sum()/all_movies_results_df.shape[0]

Number of Search Results    0.000000
budget                      0.062500
homepage                    0.562500
imdb_id                     0.083333
overview                    0.062500
popularity                  0.062500
release_date                0.062500
revenue                     0.062500
runtime                     0.062500
title_query                 0.000000
vote_average                0.062500
vote_count                  0.062500
dtype: float64
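Each missing rate above is a column's NaN count from the previous cell divided by the number of rows (48):

$$
\text{missing rate}(c) = \frac{\text{number of NaN values in column } c}{\text{number of rows}},
\qquad \text{homepage: } \frac{27}{48} = 0.5625,
\quad \text{imdb\_id: } \frac{4}{48} \approx 0.0833,
\quad \text{budget: } \frac{3}{48} = 0.0625
$$

The three rows missing every TMDB field are presumably the titles for which the search returned zero results. In that case `get_movie_API_results` fills all fields with NaN.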

In [39]:
missing_movies = all_movies_results_df[all_movies_results_df['budget'].isna()]

In [40]:
missing_movies['title_query']

12    BoJack Horseman Christmas Special: Sabrina's C...
31                       Samurai Champloo: Unholy Union
45                     House of Cards: Season 1 (Recap)
Name: title_query, dtype: object
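All three unmatched titles look like TV content (a special, an episode, a recap) rather than movies. One possible fallback is sketched below: when the movie search finds nothing, search the TV endpoint for the show name before the first `": "`. The two search callables are passed in so the logic can be tested offline. With `tmdbsimple` they could be `lambda q: search.movie(query=q)['results']` and `lambda q: search.tv(query=q)['results']`. `lookup_with_tv_fallback` is a hypothetical helper.

```python
def lookup_with_tv_fallback(title, movie_search, tv_search, sep=": "):
    """Try the movie endpoint first; on zero hits, try the show name on the TV endpoint.

    movie_search / tv_search: callables taking a query string and returning a
    list of result dicts. Returns (endpoint, query_used, hits), where endpoint
    is 'movie', 'tv', or None when neither search finds anything.
    """
    hits = movie_search(title)
    if hits:
        return "movie", title, hits
    show_name = title.split(sep, 1)[0]   # e.g. 'Samurai Champloo: Unholy Union' -> 'Samurai Champloo'
    hits = tv_search(show_name)
    if hits:
        return "tv", show_name, hits
    return None, title, []
```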

In [41]:
os.path.isfile('../Data/all_movies_results_df.pkl')

True

In [42]:
movie_df_raw = eda1.generate_movie_df(netflix_df=netflix_df)

Existing pickle exists
Number of movies:  48
Number of missing movies:  3
12    BoJack Horseman Christmas Special: Sabrina's C...
31                       Samurai Champloo: Unholy Union
45                     House of Cards: Season 1 (Recap)
Name: title_query, dtype: object
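The "Existing pickle exists" message suggests that `eda1.generate_movie_df` reuses `../Data/all_movies_results_df.pkl` when it is present, instead of re-querying TMDB. A minimal sketch of that load-or-build pattern follows. `load_or_build` is hypothetical and is not the module's actual code.

```python
import os
import pickle


def load_or_build(path, build):
    """Return the object pickled at `path`; otherwise call build(), pickle its result there, and return it."""
    if os.path.isfile(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)      # cached result: no API calls
    obj = build()                       # expensive step, e.g. one TMDB lookup per title
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)
    return obj
```

Here `build` might be `lambda: movies['Title'].progress_apply(get_movie_API_results)`. A stale pickle has to be deleted by hand to force a refresh.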


In [44]:
movie_df_raw.head()

| | Number of Search Results | budget | homepage | imdb_id | overview | popularity | release_date | revenue | runtime | title_query | vote_average | vote_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.0 | https://www.netflix.com/title/80239932 | tt9170648 | Trevor Noah gets out from behind the "Daily Sh... | 3.145 | 2018-11-20 | 0.0 | 63.0 | Trevor Noah: Son of Patricia | 7.1 | 35.0 |
| 1 | 1 | 38000000.0 | http://www.foxmovies.com/movies/captain-underp... | tt2091256 | Two mischievous kids hypnotize their mean elem... | 11.473 | 2017-06-01 | 125289450.0 | 89.0 | Captain Underpants: The First Epic Movie | 6.0 | 565.0 |
| 2 | 1 | 0.0 | http://www.netflix.com/savingcapitalism | tt6185286 | Former Secretary of Labor Robert Reich meets w... | 1.285 | 2017-11-21 | 0.0 | 73.0 | Saving Capitalism | 7.1 | 17.0 |
| 3 | 1 | 0.0 | | tt3762912 | Controversial hedge fund titan Bill Ackman is ... | 2.099 | 2017-03-17 | 0.0 | 99.0 | Betting on Zero | 7.4 | 45.0 |
| 4 | 1 | 100000.0 | https://invisiblemoneydocumentary.wordpress.com/ | tt5033790 | Not since the invention of the Internet has th... | 2.766 | 2016-12-30 | 0.0 | 90.0 | Banking on Bitcoin | 6.5 | 54.0 |
