# Movies Recommender System

This is the second part of my Springboard Capstone Project on Movie Data Analysis and Recommendation Systems. In my first notebook ( The Story of Film ) https://www.kaggle.com/rounakbanik/the-story-of-film/, I attempted at narrating the story of film by performing an extensive exploratory data analysis on Movies Metadata collected from TMDB. I also built two extremely minimalist predictive models to predict movie revenue and movie success and visualise which features influence the output (revenue and success respectively).

In this notebook, I will attempt at implementing a few recommendation algorithms (content based, popularity based and collaborative filtering) and try to build an ensemble of these models to come up with our final recommendation system. With us, we have two MovieLens datasets.

**This dataset is an ensemble of data collected from TMDB and GroupLens.** The Movie Details, Credits and Keywords have been collected from the TMDB Open API. This product uses the TMDb API but is not endorsed or certified by TMDb. Their API also provides access to data on many additional movies, actors and actresses, crew members, and TV shows. You can try it for yourself 

**The Full Dataset:**
Consists of 26,000,000 ratings and 750,000 tag applications applied to 45,000 movies by 270,000 users. Includes tag genome data with 12 million relevance scores across 1,100 tags.

**The Small Dataset:**
Comprises of 100,000 ratings and 1,300 tag applications applied to 9,000 movies by 700 users.
I will build a Simple Recommender using movies from the Full Dataset whereas all personalised recommender systems will make use of the small dataset (due to the computing power I possess being very limited). As a first step, I will build my simple recommender system.

#### Inspiration

This dataset was assembled as part of my second Capstone Project for Springboard's Data Science Career Track. I wanted to perform an extensive EDA on Movie Data to narrate the history and the story of Cinema and use this metadata in combination with MovieLens ratings to build various types of Recommender Systems.

Both my notebooks are available as kernels with this dataset: The Story of Film and Movie Recommender Systems

Some of the things you can do with this dataset: Predicting movie revenue and/or movie success based on a certain metric. What movies tend to get higher vote counts and vote averages on TMDB? Building Content Based and Collaborative Filtering Based Recommendation Engines.

**Context**

These files contain metadata for all 45,000 movies listed in the Full MovieLens Dataset. The dataset consists of movies released on or before July 2017. Data points include cast, crew, plot keywords, budget, revenue, posters, release dates, languages, production companies, countries, TMDB vote counts and vote averages.

This dataset also has files containing 26 million ratings from 270,000 users for all 45,000 movies. Ratings are on a scale of 1-5 and have been obtained from the official GroupLens website.


**Content**

This dataset consists of the following files:

movies_metadata.csv: 
The main Movies Metadata file. Contains information on 45,000 movies featured in the Full MovieLens dataset. Features include posters, backdrops, budget, revenue, release dates, languages, production countries and companies.

keywords.csv: 
Contains the movie plot keywords for our MovieLens movies. Available in the form of a stringified JSON Object.

credits.csv: 
Consists of Cast and Crew Information for all our movies. Available in the form of a stringified JSON Object.

links.csv: 
The file that contains the TMDB and IMDB IDs of all the movies featured in the Full MovieLens dataset.

links_small.csv: 
Contains the TMDB and IMDB IDs of a small subset of 9,000 movies of the Full Dataset.

ratings_small.csv: 
The subset of 100,000 ratings from 700 users on 9,000 movies.

The Full MovieLens Dataset consisting of 26 million ratings and 750,000 tag applications from 270,000 users on all the 45,000 movies in this dataset can be accessed here https://grouplens.org/datasets/movielens/latest/

In [1]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from ast import literal_eval
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity
from nltk.stem.snowball import SnowballStemmer
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.corpus import wordnet
from surprise import Reader, Dataset, SVD, evaluate

import warnings; warnings.simplefilter('ignore')

## 1) Simple Recommender
The Simple Recommender offers **generalized recommendations to every user based on movie popularity and (sometimes) genre.** 

The basic idea behind this recommender is that movies that are more popular and more critically acclaimed will have a higher probability of being liked by the average audience. This model does not give personalized recommendations based on the user.

The implementation of this model is extremely trivial. **All we have to do is sort our movies based on ratings and popularity and display the top movies of our list. As an added step, we can pass in a genre argument to get the top movies of a particular genre.**


In [2]:
md = pd. read_csv('the-movies-dataset/movies_metadata.csv')
print(md.columns)
md.head(2)


Index(['adult', 'belongs_to_collection', 'budget', 'genres', 'homepage', 'id',
       'imdb_id', 'original_language', 'original_title', 'overview',
       'popularity', 'poster_path', 'production_companies',
       'production_countries', 'release_date', 'revenue', 'runtime',
       'spoken_languages', 'status', 'tagline', 'title', 'video',
       'vote_average', 'vote_count'],
      dtype='object')


Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0


In [3]:
md['genres'][0]

"[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]"

In [4]:
md['genres'].apply(lambda x: print(x))

[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
[{'id': 12, 'name': 'Adventure'}, {'id': 14, 'name': 'Fantasy'}, {'id': 10751, 'name': 'Family'}]
[{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 28, 'name': 'Action'}, {'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}]
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 12, 'name': 'Adventure'}, {'id': 28, 'name': 'Action'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 35, 'name

[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 28, 'name': 'Action'}, {'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 28, 'name': 'Action'}, {'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 18, 'name': 'Drama'}, {'id': 28, 'name': 'Action'}, {'id': 80, 'name': 'Crime'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 80, 'name': 'Crime'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 80, 'name': 'Crime'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 35, 'name': 'Comedy'}]
[]
[{'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 28, 'name': 'Action'}, {'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}]
[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 

[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 10402, 'name': 'Music'}, {'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 12, 'name': 'Adventure'}, {'id': 10751, 'name': 'Family'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10749, 'name': 'Romance'}, {'id': 18, 'name': 'Drama'}]
[]
[{'id': 12, 'name': 'Adventure'}, {'id': 28, 'name': 'Action'}, {'id': 878, 'name': 'Science Fiction'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10769, 'name': 'Foreign'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}, {'id': 14, 'name': 'Fantasy'}]
[{'id': 35

[{'id': 35, 'name': 'Comedy'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10769, 'name': 'Foreign'}]
[]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 878, 'name': 'Science Fiction'}, {'id': 18, 'name': 'Drama'}]
[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 80, 'name': 'Crime'}, {'id': 35, 'name': 'Comedy'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 28, 'name': 'Action'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}, {'id': 9648, 'name': 'Mystery'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 53, 'name': 'Thriller'}, {'id': 35, 'name': 'Comedy'}, {'id': 80, 'name': 'Crime'}, {'id': 9648, 'name': 'Mystery'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 80, 'name': 'Crime'}]


[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 80, 'name': 'Crime'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 53, 'name': 'Thriller'}, {'id': 37, 'name': 'Western'}, {'id': 9648, 'name': 'Mystery'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10752, 'name': 'War'}, {'id': 37, 'name': 'Western'}]
[{'id': 10752, 'name': 'War'}, {'id': 18, 'name': 'Drama'}, {'id': 36, 'name': 'History'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 28, 'name': 'Action'}, {'id': 35, 'name': 'Comedy'}, {'id': 37, 'name': 'Western'}]
[{'id': 27, 'name': 'Horror'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 28, 'name': 'Action'}, {'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}]
[{'id': 12, 'name': 'Adventure'}, {'id': 18, 'name': 'Drama'}, {'id': 28, 'name': 'Action'}, {'id': 35, 'name': 'Comedy'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}, {'id': 10749, 'name': 'Romance'}, {'id': 10402, 'name': 'Music'}]
[{'id': 35, 'name': 'Comedy'}, {'id'

[{'id': 28, 'name': 'Action'}, {'id': 53, 'name': 'Thriller'}, {'id': 18, 'name': 'Drama'}]
[{'id': 28, 'name': 'Action'}, {'id': 878, 'name': 'Science Fiction'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 80, 'name': 'Crime'}]
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}, {'id': 10402, 'name': 'Music'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}, {'id': 10402, 'name': 'Music'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 53, 'name': 'Thriller'}, {'id': 18, 'name': 'Drama'}, {'id': 27, 'name': 'Horror'}]
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Ad

[{'id': 12, 'name': 'Adventure'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 80, 'name': 'Crime'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 80, 'name': 'Crime'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 27, 'name': 'Horror'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 10752, 'name': 'War'}, {'id': 12, 'name': 'Adventure'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 28, 'name': 'Action'}, {'id': 80, 'name': 'Crime'}, {'id': 18, 

[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 28, 'name': 'Action'}, {'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}, {'id': 878, 'name': 'Science Fiction'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 28, 'name': 'Action'}, {'id': 35, 'name': 'Comedy'}, {'id': 80, 'name': 'Crime'}]
[{'id': 27, 'name': 'Horror'}]
[{'id': 28, 'name': 'Action'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 37, 'name': 'Western'}]
[{'id': 28, 'name': 'Action'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10769, 'name': 'Foreign'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 28, 'name': 'Action'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}]
[{'id': 35, 'name': 'Comedy

[{'id': 28, 'name': 'Action'}, {'id': 18, 'name': 'Drama'}, {'id': 10769, 'name': 'Foreign'}]
[{'id': 9648, 'name': 'Mystery'}, {'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 27, 'name': 'Horror'}, {'id': 9648, 'name': 'Mystery'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 18, 'name': 'Drama'}, {'id': 27, 'name': 'Horror'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 27, 'name': 'Horror'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 16, 'name': 'Animation'}, {'id': 878, 'name': 'Science Fiction'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 27, 'name': 'Horror'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10752, 'name': 'War'}]
[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 53, 'name': 'Thriller'}, {'id': 80, 'name':

[{'id': 14, 'name': 'Fantasy'}, {'id': 27, 'name': 'Horror'}]
[{'id': 37, 'name': 'Western'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 14, 'name': 'Fantasy'}, {'id': 878, 'name': 'Science Fiction'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 28, 'name': 'Action'}, {'id': 18, 'name': 'Drama'}, {'id': 10752, 'name': 'War'}]
[{'id': 18, 'name': 'Drama'}, {'id': 35, 'name': 'Comedy'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 10402, 'name': 'Music'}, {'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 53, 'name': 'Thriller'}, {'id': 18, 'name': 'Drama'}]
[{'id': 10749, 'name': 'Romance'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 10752, 'name'

[{'id': 10749, 'name': 'Romance'}, {'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 80, 'name': 'Crime'}, {'id': 37, 'name': 'Western'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10402, 'name': 'Music'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10402, 'name': 'Music'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 18, 'name': 'Drama'}, {'id': 14, 'name': 'Fantasy'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 28, 'name': 'Action'}, {'id': 35, 'name': 'Comedy'}, {'id': 10402, 'name': 'Music'}, {'id': 10749, 'name': 'Romance'}, {'id': 37, 'name': 'Western'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 37, 'name': 'Western'}]
[{'id': 28, 'n

[{'id': 18, 'name': 'Drama'}]
[{'id': 10749, 'name': 'Romance'}, {'id': 18, 'name': 'Drama'}]
[{'id': 16, 'name': 'Animation'}, {'id': 10751, 'name': 'Family'}]
[{'id': 12, 'name': 'Adventure'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}, {'id': 14, 'name': 'Fantasy'}, {'id': 9648, 'name': 'Mystery'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 16, 'name': 'Animation'}, {'id': 10751, 'name': 'Family'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 37, 'name': 'Western'}, {'id': 18, 'name': 'Drama'}, {'id': 10769, 'name': 'Foreign'}, {'id': 36, 'name': 'History'}]
[{'id': 14, 'name': 'Fantasy'}, {'id': 18, 'name': 'Drama'}, {'id': 878, 'name': 'Science Fiction'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 9648, 'name': 'Mystery'}, {'id': 27, 'name': 'Horror'}]
[{'id': 18, 'name': 'Drama'}, {'id': 27, 'name': 'Horror'}, {'id': 53, 'name': 'Thriller'}, {'id': 9648, 'name': 'Mystery'}]
[{'

[]
[{'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 27, 'name': 'Horror'}, {'id': 878, 'name': 'Science Fiction'}]
[{'id': 28, 'name': 'Action'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}, {'id': 10749, 'name': 'Romance'}, {'id': 10769, 'name': 'Foreign'}]
[{'id': 36, 'name': 'History'}, {'id': 10752, 'name': 'War'}, {'id': 18, 'name': 'Drama'}]
[{'id': 28, 'name': 'Action'}, {'id': 35, 'name': 'Comedy'}, {'id': 80, 'name': 'Crime'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10402, 'name': 'Music'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10402, 'name': 'Music'}]
[{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}, {'id': 10402, 'name': 'Music'}]
[{'id': 18, 'name': 'Drama'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Rom

[{'id': 12, 'name': 'Adventure'}, {'id': 18, 'name': 'Drama'}, {'id': 36, 'name': 'History'}, {'id': 10752, 'name': 'War'}]
[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}]
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}, {'id': 37, 'name': 'Western'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10402, 'name': 'Music'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 10752, 'name': 'War'}, {'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 27, 'name': 'Horror'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 18, 'name': 'Drama'}, {'id': 14, 'name': 'Fantasy'}, {'id': 878, 'name': 'Science Fiction'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 35, 'name': 'Comedy

[{'id': 18, 'name': 'Drama'}, {'id': 36, 'name': 'History'}, {'id': 80, 'name': 'Crime'}, {'id': 53, 'name': 'Thriller'}, {'id': 28, 'name': 'Action'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 28, 'name': 'Action'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}, {'id': 37, 'name': 'Western'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10402, 'name': 'Music'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 99, 'name': 'Documentary'}]
[{'id': 27, 'name': 'Horror'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 16, 'name': 'Animation'}, {'id': 10751, 'name': 'Family'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[

[{'id': 35, 'name': 'Comedy'}]
[{'id': 14, 'name': 'Fantasy'}, {'id': 28, 'name': 'Action'}, {'id': 27, 'name': 'Horror'}]
[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
[{'id': 80, 'name': 'Crime'}, {'id': 27, 'name': 'Horror'}, {'id': 9648, 'name': 'Mystery'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 36, 'name': 'History'}, {'id': 18, 'name': 'Drama'}, {'id': 10402, 'name': 'Music'}]
[{'id': 28, 'name': 'Action'}, {'id': 18, 'name': 'Drama'}, {'id': 9648, 'name': 'Mystery'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 80, 'name': 'Crime'}, {'id': 53, 'name': 'Thriller'}

[{'id': 28, 'name': 'Action'}, {'id': 35, 'name': 'Comedy'}, {'id': 80, 'name': 'Crime'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 27, 'name': 'Horror'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 18, 'name': 'Drama'}, {'id': 878, 'name': 'Science Fiction'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 28, 'name': 'Action'}, {'id': 878, 'name': 'Science Fiction'}]
[{'id': 18, 'name': 'Drama'}, {'id': 37, 'name': 'Western'}, {'id': 80, 'name': 'Crime'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10752, 'name': 'War'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 10751, 'name': 'Family'}, {'id': 18, 'name': 'Drama'}]
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}, {'id': 27, 'name': 'Horror'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10752, 'name': 'War'}]
[{'id': 18, 'name': 'Drama'}]


[{'id': 10751, 'name': 'Family'}]
[{'id': 16, 'name': 'Animation'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 37, 'name': 'Western'}]
[{'id': 18, 'name': 'Drama'}, {'id': 27, 'name': 'Horror'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 18, 'name': 'Drama'}, {'id': 878, 'name': 'Science Fiction'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 28, 'name': 'Action'}, {'id': 18, 'name': 'Drama'}]
[]
[{'id': 16, 'name': 'Animation'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 99, 'name': 'Documentary'}]
[]
[{'id': 35, 'name': 'Comedy'}]
[]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 80, 'name': 'Crime'}]
[{'id': 27, 'name': 'Horror'}, {'id': 878, 'name': 'Science Fiction'}]
[{'id': 28, 'name': 'Action'}, {'id': 35, 'name': 'Comedy'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10749, 'name': 'Romance'}, {'id': 18, 'name': 'Dram

[{'id': 99, 'name': 'Documentary'}, {'id': 10769, 'name': 'Foreign'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}, {'id': 10752, 'name': 'War'}]
[{'id': 18, 'name': 'Drama'}, {'id': 35, 'name': 'Comedy'}]
[]
[{'id': 35, 'name': 'Comedy'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 18, 'name': 'Drama'}]
[]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 14, 'name': 'Fantasy'}, {'id': 27, 'name': 'Horror'}, {'id': 9648, 'name': 'Mystery'}]
[{'id': 18, 'name': 'Drama'}, {'id': 27, 'name': 'Horror'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 27, 'name': 'Horror'}, {'id': 9648, 'name': 'Mystery'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 53, 'name': 'Thriller'}, {'id': 9648, 'name': 'Mystery'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 16, 'name': 'Animation'}, {'id': 18, 'name': 'Drama'}, {'id': 10752, 'name': 'War'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Com

[{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}, {'id': 27, 'name': 'Horror'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 10749, 'name': 'Romance'}, {'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10752, 'name': 'War'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}, {'id': 10752, 'name': 'War'}]
[]
[{'id': 80, 'name': 'Crime'}]
[{'id': 28, 'name': 'Action'}, {'id': 35, 'name': 'Comedy'}, {'id': 53, 'name': 'Thriller'}, {'id': 80, 'name': 'Crime'}]
[{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 27, 'name': 'Horror'}, {'id': 9648, 'name': 'Mystery'}, {'id': 878, 'name': 'Science Fiction'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 27, 'name': 'Horror'}, {'id': 878, 'name': 'Science Fiction'}, {'id': 10769, 'name': 'Foreign'}]
[{'id': 16, 'name': 'Animation'}, {'id': 10751, 'name': 'Family

[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}, {'id': 10769, 'name': 'Foreign'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}, {'id': 28, 'name': 'Action'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 99, 'name': 'Documentary'}, {'id': 16, 'name': 'Animation'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 12, 'name': 'Adventure'}, {'id': 18, 'name': 'Drama'}, {'id': 28, 'name': 'Action'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 28, 'name': 'Action'}, {'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10402, 'name': 'Music'}]
[{'id': 18, 'name': 'Drama'}]
[]
[{'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 28, 'name': 'Action'}, {'id': 35, 'name': 'Comedy'}, {'id': 878, 'name': 'Science Fiction'}]
[{'id': 12, 'name': 'Adventure'}, {'id': 14,

[{'id': 18, 'name': 'Drama'}, {'id': 10752, 'name': 'War'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 10749, 'name': 'Romance'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 10402, 'name': 'Music'}, {'id': 99, 'name': 'Documentary'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 10749, 'name': 'Romance'}, {'id': 18, 'name': 'Drama'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 99, 'name': 'Documentary'}, {'id': 10402, 'name': 'Music'}, {'id': 10769, 'name': 'Foreign'}]
[{'id': 28, 'name': 'Action'}]
[{'id': 10752, 'name': 'War'}, {'id': 28, 'name': 'Action'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 53, 'name': 'Thriller'}, {'id': 28, 'name': 'Action'}, {'id': 878, 'name': 'Science Fiction'}, {'id': 27, 'name': 'Horror'}, {'id': 9648, 'name': 'Mystery'}]
[{'id': 99, 'name': 'Documentary'}]
[]
[{'id': 1

[{'id': 28, 'name': 'Action'}, {'id': 53, 'name': 'Thriller'}, {'id': 10749, 'name': 'Romance'}, {'id': 10769, 'name': 'Foreign'}]
[{'id': 12, 'name': 'Adventure'}, {'id': 28, 'name': 'Action'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 28, 'name': 'Action'}, {'id': 80, 'name': 'Crime'}, {'id': 35, 'name': 'Comedy'}, {'id': 10749, 'name': 'Romance'}]
[]
[{'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Comedy'}]
[]
[{'id': 10749, 'name': 'Romance'}, {'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 10770, 'name': 'TV Movie'}, {'id': 18, 'name': 'Drama'}, {'id': 36, 'name': 'History'}, {'id': 10752, 'name': 'War'}]
[{'id': 18, 'name': 'Drama'}, {'id': 80, 'name': 'Crime'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 37, 'name': 'Western'}]
[{'id': 80, 'name': 'Crime'}, {'id': 28, 'name': 'Action'}]
[{'id': 18, 'name': 'Drama'}, {'id': 28, 'name': 'Action'}, {'id': 53, 'name': 

[{'id': 9648, 'name': 'Mystery'}]
[{'id': 80, 'name': 'Crime'}, {'id': 12, 'name': 'Adventure'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}, {'id': 9648, 'name': 'Mystery'}]
[{'id': 18, 'name': 'Drama'}, {'id': 14, 'name': 'Fantasy'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 10770, 'name': 'TV Movie'}, {'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 37, 'name': 'Western'}]
[{'id': 27, 'name': 'Horror'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 14, 'name': 'Fantasy'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10769, 'name': 'Foreign'}, {'id': 36, 'name': 'History'}, {'id': 10751, 'name': 'Family'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 28, 'name': 'Action'}, {'id': 878, 'name': 'Science Fiction'}]
[{'id': 18, 'name': 'Drama'

[{'id': 35, 'name': 'Comedy'}]
[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}, {'id': 9648, 'name': 'Mystery'}, {'id': 35, 'name': 'Comedy'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 27, 'name': 'Horror'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}, {'id': 9648, 'name': 'Mystery'}, {'id': 27, 'name': 'Horror'}]
[{'id': 18, 'name': 'Drama'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 27, 'name': 'Horror'}]
[{'id': 14, 'name': 'Fantasy'}, {'id': 28, 'name': 'Action'}, {'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 18, 'name': 'Drama'}, {'id': 35, 'name': 'Comedy'}, {'id': 10402, 'name': 'Music'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10402, 'name': 'Music'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 80, 'name': 'Crime'}, {'id': 18, '

[{'id': 35, 'name': 'Comedy'}]
[{'id': 27, 'name': 'Horror'}, {'id': 53, 'name': 'Thriller'}, {'id': 878, 'name': 'Science Fiction'}]
[{'id': 10752, 'name': 'War'}, {'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10769, 'name': 'Foreign'}]
[{'id': 28, 'name': 'Action'}, {'id': 37, 'name': 'Western'}]
[{'id': 80, 'name': 'Crime'}]
[{'id': 37, 'name': 'Western'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 10770, 'name': 'TV Movie'}, {'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10769, 'name': 'Foreign'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10402, 'name': 'Music'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}]
[{'id': 35, 'name': 'Co

[]
[{'id': 99, 'name': 'Documentary'}]
[]
[{'id': 36, 'name': 'History'}, {'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 99, 'name': 'Documentary'}, {'id': 36, 'name': 'History'}]
[{'id': 27, 'name': 'Horror'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 37, 'name': 'Western'}]
[{'id': 53, 'name': 'Thriller'}]
[{'id': 878, 'name': 'Science Fiction'}]
[{'id': 10751, 'name': 'Family'}, {'id': 878, 'name': 'Science Fiction'}]
[{'id': 878, 'name': 'Science Fiction'}]
[]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10752, 'name': 'War'}]
[{'id': 36, 'name': 'History'}, {'id': 10752, 'name': 'War'}]
[{'id': 28, 'name': 'Action'}, {'id': 37, 'name': 'Western'}, {'id': 10769, 'name': 'Foreign'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 18, 'name': 'Drama'}, {'id': 9648, 'name': 'Mystery'}, {'id': 27, 'name': 'Horror'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 53, 'name'

[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}, {'id': 35, 'name': 'Comedy'}, {'id': 10402, 'name': 'Music'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 53, 'name': 'Thriller'}, {'id': 80, 'name': 'Crime'}, {'id': 9648, 'name': 'Mystery'}]
[{'id': 18, 'name': 'Drama'}, {'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}]
[{'id': 10770, 'name': 'TV Movie'}, {'id': 28, 'name': 'Action'}, {'id': 878, 'name': 'Science Fiction'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 53, 'name': 'Thriller'}, {'id': 80, 'name': 'Crime'}]
[{'id': 18, 'name': 'Drama'}, {'id': 12, 'name': 'Adventure'}, {'id': 99, 'name': 'Documentary'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
[{'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 14, 'name': 'Fantasy'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, '

[{'id': 14, 'name': 'Fantasy'}, {'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 10770, 'name': 'TV Movie'}, {'id': 16, 'name': 'Animation'}, {'id': 10751, 'name': 'Family'}, {'id': 14, 'name': 'Fantasy'}]
[{'id': 10770, 'name': 'TV Movie'}, {'id': 12, 'name': 'Adventure'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
[{'id': 10770, 'name': 'TV Movie'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}, {'id': 14, 'name': 'Fantasy'}]
[{'id': 10751, 'name': 'Family'}]
[{'id': 16, 'name': 'Animation'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 53, 'name': 'Thriller'}, {'id': 27, 'name': 'Horror'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 28, 'name': 'Action'}, {

[{'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}, {'id': 878, 'name': 'Science Fiction'}]
[{'id': 28, 'name': 'Action'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 27, 'name': 'Horror'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 53, 'name': 'Thriller'}, {'id': 27, 'name': 'Horror'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 27, 'name': 'Horror'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 53, 'name': 'Thriller'}]
[{'id': 27, 'name': 'Horror'}]
[{'id': 27, 'name': 'Horror'}]
[{'id': 27, 'name': 'Horror'}]
[{'id': 28, 'name': 'Action'}, {'id': 80, 'name': 'Crime'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 9648, 'name': 'Mystery'}, {'id': 10769, 'name': 'Foreign'}]
[{'id': 53, 'name': 'Thriller'}, {'id': 27, 'name': 'Horror'}, {'id': 878, 'name': 'Science Fiction'}]
[

[{'id': 18, 'name': 'Drama'}, {'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}]
[{'id': 12, 'name': 'Adventure'}, {'id': 878, 'name': 'Science Fiction'}, {'id': 18, 'name': 'Drama'}]
[{'id': 9648, 'name': 'Mystery'}, {'id': 27, 'name': 'Horror'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10402, 'name': 'Music'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 35, 'name': 'Comedy'}]
[]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 10751, 'name': 'Family'}, {'id': 16, 'name': 'Animation'}, {'id': 10402, 'name': 'Music'}]
[{'id': 16, 'name': 'Animation'}, {'id': 10402, 'name': 'Music'}]
[{'id': 27, 'name': 'Horror'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 99, 'name': 'Documentary'}, {'id': 10402, 'name': 'Music'}]
[{'id': 18, 'name': 'Drama'}, {'id': 9648, 'name': 'Mystery'}, {'id':

[{'id': 18, 'name': 'Drama'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 27, 'name': 'Horror'}, {'id': 18, 'name': 'Drama'}]
[]
[{'id': 53, 'name': 'Thriller'}, {'id': 80, 'name': 'Crime'}]
[{'id': 28, 'name': 'Action'}, {'id': 80, 'name': 'Crime'}]
[]
[{'id': 18, 'name': 'Drama'}, {'id': 12, 'name': 'Adventure'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 10752, 'name': 'War'}, {'id': 18, 'name': 'Drama'}]
[{'id': 99, 'name': 'Documentary'}]
[]
[{'id': 28, 'name': 'Action'}, {'id': 878, 'name': 'Science Fiction'}]
[{'id': 28, 'name': 'Action'}, {'id': 53, 'name': 'Thriller'}, {'id': 10769, 'name': 'Foreign'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 10752, 'name': 'War'}, {'id': 36, 'name': 'History'}, {'id': 18, 'name': 'Drama'}]
[{'id': 16, 'name': 'Animation'}]
[{'id': 10751, 'name': 'Family'}, {'id': 16, 'name': 'Animation'}, {'id': 12, 'name': 'Adventure'}]
[{'id': 27, 'name': 'Horror'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}, {'id': 28, 'name': 'Action'}, {'id

[{'id': 18, 'name': 'Drama'}]
[{'id': 10751, 'name': 'Family'}, {'id': 35, 'name': 'Comedy'}, {'id': 80, 'name': 'Crime'}]
[{'id': 10751, 'name': 'Family'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 27, 'name': 'Horror'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 27, 'name': 'Horror'}, {'id': 18, 'name': 'Drama'}]
[{'id': 10751, 'name': 'Family'}, {'id': 35, 'name': 'Comedy'}, {'id': 80, 'name': 'Crime'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 10749, 'name': 'Romance'}, {'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 27, 'name': 'Horror'}, {'id': 878, 'name': 'Science Fiction'}, {'id': 16, 'name': 'Animation'}, {'id': 12, 'name': 'Adventure'}, {'id': 14, 'name': 'Fantasy'}]
[{'id': 14, 'name': 'Fantasy'}, {

[]
[{'id': 35, 'name': 'Comedy'}, {'id': 14, 'name': 'Fantasy'}, {'id': 27, 'name': 'Horror'}, {'id': 878, 'name': 'Science Fiction'}]
[{'id': 9648, 'name': 'Mystery'}]
[{'id': 53, 'name': 'Thriller'}]
[{'id': 28, 'name': 'Action'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 27, 'name': 'Horror'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}]
[]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 53, 'name': 'Thriller'}, {'id': 27, 'name': 'Horror'}]
[{'id': 18, 'name': 'Drama'}, {'id': 878, 'name': 'Science Fiction'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
[{'id': 18, 'name': 'Drama'}, {'id': 35, 'name': 'Comedy'}, {'id': 10769, 'name': 'Foreign'}]
[{'id': 18, 'name': 'Drama'}, {'id': 35, 'name': 'Comedy'}, {'i

[{'id': 35, 'name': 'Comedy'}]
[{'id': 28, 'name': 'Action'}, {'id': 16, 'name': 'Animation'}, {'id': 14, 'name': 'Fantasy'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 16, 'name': 'Animation'}, {'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 27, 'name': 'Horror'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 18, 'name': 'Drama'}, {'id': 10769, 'name': 'Foreign'}]
[{'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}]
[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 18, 'name': 'Drama'}, {'id': 35, 'name': 'Comedy'}, {'id': 80, 'name': 'Crime'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10769, 'name': 'Foreign'}]
[]
[]
[{'id': 28, 'name': 'Action'}, {'id': 80, 'name': 'Crime'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 35, 'name': 'Comedy'},

[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 10752, 'name': 'War'}, {'id': 99, 'name': 'Documentary'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 80, 'name': 'Crime'}, {'id': 9648, 'name': 'Mystery'}]
[{'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 18, 'name': 'Drama'}]
[]
[{'id': 878, 'name': 'Science Fiction'}, {'id': 53, 'name': 'Thriller'}, {'id': 18, 'name': 'Drama'}, {'id': 28, 'name': 'Action'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 99, 'name': 'Documentary'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 10749, 'name': 'Romance'}, {'id': 18, 'name': 'Drama'}]
[{'id': 28, 'name': 'Action'}, {'id': 878, 'name': 'Science Fiction'}, {'id'

0        None
1        None
2        None
3        None
4        None
5        None
6        None
7        None
8        None
9        None
10       None
11       None
12       None
13       None
14       None
15       None
16       None
17       None
18       None
19       None
20       None
21       None
22       None
23       None
24       None
25       None
26       None
27       None
28       None
29       None
         ... 
45436    None
45437    None
45438    None
45439    None
45440    None
45441    None
45442    None
45443    None
45444    None
45445    None
45446    None
45447    None
45448    None
45449    None
45450    None
45451    None
45452    None
45453    None
45454    None
45455    None
45456    None
45457    None
45458    None
45459    None
45460    None
45461    None
45462    None
45463    None
45464    None
45465    None
Name: genres, Length: 45466, dtype: object

In [5]:
md['genres'] = md['genres'].fillna('[]').apply(literal_eval)
md['genres'][0]

[{'id': 16, 'name': 'Animation'},
 {'id': 35, 'name': 'Comedy'},
 {'id': 10751, 'name': 'Family'}]

In [6]:
md['genres'] = md['genres'].apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])
md['genres'][:5]

0     [Animation, Comedy, Family]
1    [Adventure, Fantasy, Family]
2               [Romance, Comedy]
3        [Comedy, Drama, Romance]
4                        [Comedy]
Name: genres, dtype: object

**I use the TMDB Ratings to come up with our Top Movies Chart. I will use IMDB's weighted rating formula to construct my chart. Mathematically, it is represented as follows: Weighted Rating, (WR) = (vv+m.R)+(mv+m.C)**

where,

-  'v' is the number of votes for the movie
-  'm' is the minimum votes required to be listed in the chart
-  'R' is the average rating of the movie
-  'C' is the mean vote across the whole report

**The next step is to determine an appropriate value for 'm'**, the minimum votes required to be listed in the chart. We will use 95th percentile as our cutoff. In other words, for a movie to feature in the charts, it must have more votes than at least 95% of the movies in the list.

I will build our overall Top 250 Chart and will define a function to build charts for a particular genre. Let's begin!

In [7]:
vote_counts = md[md['vote_count'].notnull()]['vote_count'].astype('int')
vote_averages = md[md['vote_average'].notnull()]['vote_average'].astype('int')
C = vote_averages.mean()
C

5.244896612406511

In [8]:
#For a movie to feature in the charts, it must have more votes than at least 95% of the movies in the list,
m = vote_counts.quantile(0.95)
m

434.0

Therefore, to qualify to be considered for the chart, a movie has to have at least 434 votes on TMDB.

In [9]:
md['year'] = pd.to_datetime(md['release_date'], errors='coerce').apply(lambda x: str(x).split('-')[0] if x != np.nan else np.nan)

In [10]:
qualified = md[(md['vote_count'] >= m) & (md['vote_count'].notnull()) & (md['vote_average'].notnull())][['title', 'year', 'vote_count', 'vote_average', 'popularity', 'genres']]
qualified['vote_count'] = qualified['vote_count'].astype('int')
qualified['vote_average'] = qualified['vote_average'].astype('int')
qualified.shape

(2274, 6)

Therefore, to qualify to be considered for the chart, a movie has to have at least 434 votes on TMDB. We also see that the average rating for a movie on TMDB is 5.244 on a scale of 10. 
>2274 Movies qualify to be on our chart.

In [11]:
def weighted_rating(x):
    v = x['vote_count']
    R = x['vote_average']
    return (v/(v+m) * R) + (m/(m+v) * C)

In [12]:
qualified['wr'] = qualified.apply(weighted_rating, axis=1)

In [13]:
qualified = qualified.sort_values('wr', ascending=False).head(250)

### Top Movies

In [14]:
qualified.head(5)

Unnamed: 0,title,year,vote_count,vote_average,popularity,genres,wr
15480,Inception,2010,14075,8,29.1081,"[Action, Thriller, Science Fiction, Mystery, A...",7.917588
12481,The Dark Knight,2008,12269,8,123.167,"[Drama, Action, Crime, Thriller]",7.905871
22879,Interstellar,2014,11187,8,32.2135,"[Adventure, Drama, Science Fiction]",7.897107
2843,Fight Club,1999,9678,8,63.8696,[Drama],7.881753
4863,The Lord of the Rings: The Fellowship of the Ring,2001,8892,8,32.0707,"[Adventure, Fantasy, Action]",7.871787


We see that three Christopher Nolan Films, Inception, The Dark Knight and Interstellar occur at the very top of our chart. The chart also indicates a strong bias of TMDB Users towards particular genres and directors.

Let us now construct our function that builds charts for particular genres. For this, we will use relax our default conditions to the 85th percentile instead of 95.

In [15]:
s = md.apply(lambda x: pd.Series(x['genres']),axis=1).stack().reset_index(level=1, drop=True)
s.name = 'genre'
gen_md = md.drop('genres', axis=1).join(s)

In [16]:
def build_chart(genre, percentile=0.85):
    df = gen_md[gen_md['genre'] == genre]
    vote_counts = df[df['vote_count'].notnull()]['vote_count'].astype('int')
    vote_averages = df[df['vote_average'].notnull()]['vote_average'].astype('int')
    C = vote_averages.mean()
    m = vote_counts.quantile(percentile)
    
    qualified = df[(df['vote_count'] >= m) & (df['vote_count'].notnull()) & (df['vote_average'].notnull())][['title', 'year', 'vote_count', 'vote_average', 'popularity']]
    qualified['vote_count'] = qualified['vote_count'].astype('int')
    qualified['vote_average'] = qualified['vote_average'].astype('int')
    
    qualified['wr'] = qualified.apply(lambda x: (x['vote_count']/(x['vote_count']+m) * x['vote_average']) + (m/(m+x['vote_count']) * C), axis=1)
    qualified = qualified.sort_values('wr', ascending=False).head(250)
    
    return qualified

Let us see our method in action by displaying the Top 15 Romance Movies (Romance almost didn't feature at all in our Generic Top Chart despite being one of the most popular movie genres).

### Top Romance Movies

In [17]:
build_chart('Romance').head(5)

Unnamed: 0,title,year,vote_count,vote_average,popularity,wr
10309,Dilwale Dulhania Le Jayenge,1995,661,9,34.457,8.565285
351,Forrest Gump,1994,8147,8,48.3072,7.971357
876,Vertigo,1958,1162,8,18.2082,7.811667
40251,Your Name.,2016,1030,8,34.461252,7.789489
883,Some Like It Hot,1959,835,8,11.8451,7.745154


The top romance movie according to our metrics is Bollywood's Dilwale Dulhania Le Jayenge. This Shahrukh Khan starrer also happens to be one of my personal favorites.

## 2) Content Based Recommender
The recommender we built in the previous section suffers some severe limitations. For one, it gives the same recommendation to everyone, regardless of the user's personal taste. If a person who loves romantic movies (and hates action) were to look at our Top 15 Chart, s/he wouldn't probably like most of the movies. If s/he were to go one step further and look at our charts by genre, s/he wouldn't still be getting the best recommendations.

For instance, consider a person who loves Dilwale Dulhania Le Jayenge, My Name is Khan and Kabhi Khushi Kabhi Gham. One inference we can obtain is that the person loves the actor Shahrukh Khan and the director Karan Johar. Even if s/he were to access the romance chart, s/he wouldn't find these as the top recommendations.

To personalise our recommendations more, I am going to build an engine that computes similarity between movies based on certain metrics and suggests movies that are most similar to a particular movie that a user liked. Since we will be using movie metadata (or content) to build this engine, this also known as Content Based Filtering.

I will build two Content Based Recommenders based on:

Movie Overviews and Taglines
Movie Cast, Crew, Keywords and Genre
Also, as mentioned in the introduction, I will be using a subset of all the movies available to us due to limiting computing power available to me.



In [18]:
links_small = pd.read_csv('the-movies-dataset/links_small.csv')
links_small = links_small[links_small['tmdbId'].notnull()]['tmdbId'].astype('int')
print(links_small.head())

0      862
1     8844
2    15602
3    31357
4    11862
Name: tmdbId, dtype: int64


In [19]:
md = md.drop([19730, 29503, 35587])

In [20]:
#Check EDA Notebook for how and why I got these indices.
md['id'] = md['id'].astype('int')

In [21]:
smd = md[md['id'].isin(links_small)]
smd.shape

(9099, 25)

We have 9099 movies avaiable in our small movies metadata dataset which is 5 times smaller than our original dataset of 45000 movies.

### Movie Description Based Recommender

Let us first try to build a recommender using movie descriptions and taglines. We do not have a quantitative metric to judge our machine's performance so this will have to be done qualitatively.

In [22]:
smd['tagline'] = smd['tagline'].fillna('')
smd['description'] = smd['overview'] + smd['tagline']
smd['description'] = smd['description'].fillna('')

In [23]:
tf = TfidfVectorizer(analyzer='word',ngram_range=(1, 2),min_df=0, stop_words='english')
tfidf_matrix = tf.fit_transform(smd['description'])

In [24]:
tfidf_matrix.shape

(9099, 268124)

Cosine Similarity
I will be using the Cosine Similarity to calculate a numeric quantity that denotes the similarity between two movies. Mathematically, it is defined as follows:

cosine(x,y)=x.y⊺||x||.||y|| 
Since we have used the TF-IDF Vectorizer, calculating the Dot Product will directly give us the Cosine Similarity Score. Therefore, we will use sklearn's linear_kernel instead of cosine_similarities since it is much faster.

In [25]:
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
cosine_sim[0]

array([1.        , 0.00680476, 0.        , ..., 0.        , 0.00344913,
       0.        ])

We now have a pairwise cosine similarity matrix for all the movies in our dataset. The next step is to write a function that returns the 30 most similar movies based on the cosine similarity score.

In [26]:
smd = smd.reset_index()
titles = smd['title']
indices = pd.Series(smd.index, index=smd['title'])

In [27]:
def get_recommendations(title):
    idx = indices[title]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:31]
    movie_indices = [i[0] for i in sim_scores]
    return titles.iloc[movie_indices]

We're all set. Let us now try and get the top recommendations for a few movies and see how good the recommendations are.

In [28]:
get_recommendations('The Godfather').head(10)

973      The Godfather: Part II
8387                 The Family
3509                       Made
4196         Johnny Dangerously
29               Shanghai Triad
5667                       Fury
2412             American Movie
1582    The Godfather: Part III
4221                    8 Women
2159              Summer of Sam
Name: title, dtype: object

In [29]:
get_recommendations('The Dark Knight').head(10)

7931                      The Dark Knight Rises
132                              Batman Forever
1113                             Batman Returns
8227    Batman: The Dark Knight Returns, Part 2
7565                 Batman: Under the Red Hood
524                                      Batman
7901                           Batman: Year One
2579               Batman: Mask of the Phantasm
2696                                        JFK
8165    Batman: The Dark Knight Returns, Part 1
Name: title, dtype: object

We see that for The Dark Knight, our system is able to identify it as a Batman film and subsequently recommend other Batman films as its top recommendations. But unfortunately, that is all this system can do at the moment. This is not of much use to most people as it doesn't take into considerations very important features such as cast, crew, director and genre, which determine the rating and the popularity of a movie. Someone who liked The Dark Knight probably likes it more because of Nolan and would hate Batman Forever and every other substandard movie in the Batman Franchise.

Therefore, we are going to use much more suggestive metadata than Overview and Tagline. In the next subsection, we will build a more sophisticated recommender that takes genre, keywords, cast and crew into consideration.

Metadata Based Recommender
To build our standard metadata based content recommender, we will need to merge our current dataset with the crew and the keyword datasets. Let us prepare this data as our first step.

In [30]:
credits = pd.read_csv('the-movies-dataset/credits.csv')
keywords = pd.read_csv('the-movies-dataset/keywords.csv')

In [31]:
keywords['id'] = keywords['id'].astype('int')
credits['id'] = credits['id'].astype('int')
md['id'] = md['id'].astype('int')
md.shape

(45463, 25)

In [32]:
md = md.merge(credits, on='id')
md = md.merge(keywords, on='id')

In [33]:
smd = md[md['id'].isin(links_small)]
smd.shape

(9219, 28)

We now have our cast, crew, genres and credits, all in one dataframe. Let us wrangle this a little more using the following intuitions:

Crew: From the crew, we will only pick the director as our feature since the others don't contribute that much to the feel of the movie.
Cast: Choosing Cast is a little more tricky. Lesser known actors and minor roles do not really affect people's opinion of a movie. Therefore, we must only select the major characters and their respective actors. Arbitrarily we will choose the top 3 actors that appear in the credits list.

In [34]:
smd['cast'] = smd['cast'].apply(literal_eval)
smd['crew'] = smd['crew'].apply(literal_eval)
smd['keywords'] = smd['keywords'].apply(literal_eval)
smd['cast_size'] = smd['cast'].apply(lambda x: len(x))
smd['crew_size'] = smd['crew'].apply(lambda x: len(x))

In [35]:
def get_director(x):
    for i in x:
        if i['job'] == 'Director':
            return i['name']
    return np.nan

In [36]:
smd['director'] = smd['crew'].apply(get_director)

In [37]:
smd['cast'] = smd['cast'].apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])
smd['cast'] = smd['cast'].apply(lambda x: x[:3] if len(x) >=3 else x)

In [38]:
smd['keywords'] = smd['keywords'].apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])

My approach to building the recommender is going to be extremely hacky. What I plan on doing is creating a metadata dump for every movie which consists of genres, director, main actors and keywords. I then use a Count Vectorizer to create our count matrix as we did in the Description Recommender. The remaining steps are similar to what we did earlier: we calculate the cosine similarities and return movies that are most similar.

These are steps I follow in the preparation of my genres and credits data:

Strip Spaces and Convert to Lowercase from all our features. This way, our engine will not confuse between Johnny Depp and Johnny Galecki.
Mention Director 3 times to give it more weight relative to the entire cast.

In [39]:
smd['cast'] = smd['cast'].apply(lambda x: [str.lower(i.replace(" ", "")) for i in x])

In [40]:
smd['director'] = smd['director'].astype('str').apply(lambda x: str.lower(x.replace(" ", "")))
smd['director'] = smd['director'].apply(lambda x: [x,x, x])

Keywords
We will do a small amount of pre-processing of our keywords before putting them to any use. As a first step, we calculate the frequenct counts of every keyword that appears in the dataset.

In [41]:
s = smd.apply(lambda x: pd.Series(x['keywords']),axis=1).stack().reset_index(level=1, drop=True)
s.name = 'keyword'

In [42]:
s = s.value_counts()
s[:5]

independent film        610
woman director          550
murder                  399
duringcreditsstinger    327
based on novel          318
Name: keyword, dtype: int64

Keywords occur in frequencies ranging from 1 to 610. We do not have any use for keywords that occur only once. Therefore, these can be safely removed. Finally, we will convert every word to its stem so that words such as Dogs and Dog are considered the same.

In [43]:
s = s[s > 1]

In [44]:
stemmer = SnowballStemmer('english')
stemmer.stem('dogs')

'dog'

In [45]:
def filter_keywords(x):
    words = []
    for i in x:
        if i in s:
            words.append(i)
    return words

In [46]:
smd['keywords'] = smd['keywords'].apply(filter_keywords)
smd['keywords'] = smd['keywords'].apply(lambda x: [stemmer.stem(i) for i in x])
smd['keywords'] = smd['keywords'].apply(lambda x: [str.lower(i.replace(" ", "")) for i in x])

smd['soup'] = smd['keywords'] + smd['cast'] + smd['director'] + smd['genres']
smd['soup'] = smd['soup'].apply(lambda x: ' '.join(x))


In [47]:
count = CountVectorizer(analyzer='word',ngram_range=(1, 2),min_df=0, stop_words='english')
count_matrix = count.fit_transform(smd['soup'])

In [48]:
cosine_sim = cosine_similarity(count_matrix, count_matrix)

In [49]:
smd = smd.reset_index()
titles = smd['title']
indices = pd.Series(smd.index, index=smd['title'])

We will reuse the get_recommendations function that we had written earlier. Since our cosine similarity scores have changed, we expect it to give us different (and probably better) results. Let us check for The Dark Knight again and see what recommendations I get this time around.

In [50]:
get_recommendations('The Dark Knight').head(10)

8031         The Dark Knight Rises
6218                 Batman Begins
6623                  The Prestige
2085                     Following
7648                     Inception
4145                      Insomnia
3381                       Memento
8613                  Interstellar
7659    Batman: Under the Red Hood
1134                Batman Returns
Name: title, dtype: object

I am much more satisfied with the results I get this time around. The recommendations seem to have recognized other Christopher Nolan movies (due to the high weightage given to director) and put them as top recommendations. I enjoyed watching The Dark Knight as well as some of the other ones in the list including Batman Begins, The Prestige and The Dark Knight Rises.

We can of course experiment on this engine by trying out different weights for our features (directors, actors, genres), limiting the number of keywords that can be used in the soup, weighing genres based on their frequency, only showing movies with the same languages, etc.

Let me also get recommendations for another movie, Mean Girls which happens to be my girlfriend's favorite movie.

In [51]:
get_recommendations('Mean Girls').head(10)

3319               Head Over Heels
4763                 Freaky Friday
1329              The House of Yes
6277              Just Like Heaven
7905         Mr. Popper's Penguins
7332    Ghosts of Girlfriends Past
6959     The Spiderwick Chronicles
8883                      The DUFF
6698         It's a Boy Girl Thing
7377       I Love You, Beth Cooper
Name: title, dtype: object

Popularity and Ratings
One thing that we notice about our recommendation system is that it recommends movies regardless of ratings and popularity. It is true that Batman and Robin has a lot of similar characters as compared to The Dark Knight but it was a terrible movie that shouldn't be recommended to anyone.

Therefore, we will add a mechanism to remove bad movies and return movies which are popular and have had a good critical response.

I will take the top 25 movies based on similarity scores and calculate the vote of the 60th percentile movie. Then, using this as the value of  m , we will calculate the weighted rating of each movie using IMDB's formula like we did in the Simple Recommender section.

In [52]:
def improved_recommendations(title):
    idx = indices[title]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:26]
    movie_indices = [i[0] for i in sim_scores]
    
    movies = smd.iloc[movie_indices][['title', 'vote_count', 'vote_average', 'year']]
    vote_counts = movies[movies['vote_count'].notnull()]['vote_count'].astype('int')
    vote_averages = movies[movies['vote_average'].notnull()]['vote_average'].astype('int')
    C = vote_averages.mean()
    m = vote_counts.quantile(0.60)
    qualified = movies[(movies['vote_count'] >= m) & (movies['vote_count'].notnull()) & (movies['vote_average'].notnull())]
    qualified['vote_count'] = qualified['vote_count'].astype('int')
    qualified['vote_average'] = qualified['vote_average'].astype('int')
    qualified['wr'] = qualified.apply(weighted_rating, axis=1)
    qualified = qualified.sort_values('wr', ascending=False).head(10)
    return qualified

In [53]:
improved_recommendations('The Dark Knight')

Unnamed: 0,title,vote_count,vote_average,year,wr
7648,Inception,14075,8,2010,7.917588
8613,Interstellar,11187,8,2014,7.897107
6623,The Prestige,4510,8,2006,7.758148
3381,Memento,4168,8,2000,7.740175
8031,The Dark Knight Rises,9263,7,2012,6.921448
6218,Batman Begins,7511,7,2005,6.904127
1134,Batman Returns,1706,6,1992,5.846862
132,Batman Forever,1529,5,1995,5.054144
9024,Batman v Superman: Dawn of Justice,7189,5,2016,5.013943
1260,Batman & Robin,1447,4,1997,4.287233


Let me also get the recommendations for Mean Girls, my girlfriend's favorite movie.

In [54]:
improved_recommendations('Mean Girls')

Unnamed: 0,title,vote_count,vote_average,year,wr
1547,The Breakfast Club,2189,7,1985,6.709602
390,Dazed and Confused,588,7,1993,6.254682
8883,The DUFF,1372,6,2015,5.818541
3712,The Princess Diaries,1063,6,2001,5.781086
4763,Freaky Friday,919,6,2003,5.757786
6277,Just Like Heaven,595,6,2005,5.681521
6959,The Spiderwick Chronicles,593,6,2008,5.680901
7494,American Pie Presents: The Book of Love,454,5,2009,5.11969
7332,Ghosts of Girlfriends Past,716,5,2009,5.092422
7905,Mr. Popper's Penguins,775,5,2011,5.087912


Unfortunately, Batman and Robin does not disappear from our recommendation list. This is probably due to the fact that it is rated a 4, which is only slightly below average on TMDB. It certainly doesn't deserve a 4 when amazing movies like The Dark Knight Rises has only a 7. However, there is nothing much we can do about this. Therefore, we will conclude our Content Based Recommender section here and come back to it when we build a hybrid engine.

## 3) Collaborative Filtering
Our content based engine suffers from some severe limitations. It is only capable of suggesting movies which are close to a certain movie. That is, it is not capable of capturing tastes and providing recommendations across genres.

Also, the engine that we built is not really personal in that it doesn't capture the personal tastes and biases of a user. Anyone querying our engine for recommendations based on a movie will receive the same recommendations for that movie, regardless of who s/he is.

Therefore, in this section, we will use a technique called Collaborative Filtering to make recommendations to Movie Watchers. Collaborative Filtering is based on the idea that users similar to a me can be used to predict how much I will like a particular product or service those users have used/experienced but I have not.

I will not be implementing Collaborative Filtering from scratch. Instead, I will use the Surprise library that used extremely powerful algorithms like Singular Value Decomposition (SVD) to minimise RMSE (Root Mean Square Error) and give great recommendations.

In [55]:
reader = Reader()

In [56]:
ratings = pd.read_csv('the-movies-dataset/ratings_small.csv')
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205


In [57]:
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)
data.split(n_folds=5)

In [58]:
svd = SVD()
evaluate(svd, data, measures=['RMSE', 'MAE'])

Evaluating RMSE, MAE of algorithm SVD.

------------
Fold 1
RMSE: 0.8924
MAE:  0.6858
------------
Fold 2
RMSE: 0.8956
MAE:  0.6901
------------
Fold 3
RMSE: 0.8912
MAE:  0.6872
------------
Fold 4
RMSE: 0.9079
MAE:  0.6992
------------
Fold 5
RMSE: 0.8917
MAE:  0.6855
------------
------------
Mean RMSE: 0.8957
Mean MAE : 0.6895
------------
------------


CaseInsensitiveDefaultDict(list,
                           {'rmse': [0.8923543850271318,
                             0.8956203392055104,
                             0.8911822427785552,
                             0.9078690976304943,
                             0.891701688150575],
                            'mae': [0.6857732384323342,
                             0.6900901645998497,
                             0.6871850172832591,
                             0.6991928013058964,
                             0.6854974276649642]})

We get a mean Root Mean Sqaure Error of 0.8963 which is more than good enough for our case. Let us now train on our dataset and arrive at predictions.

In [59]:
trainset = data.build_full_trainset()
svd.train(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x1a4724dd30>

Let us pick user 5000 and check the ratings s/he has given.

In [60]:
ratings[ratings['userId'] == 1]

Unnamed: 0,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205
5,1,1263,2.0,1260759151
6,1,1287,2.0,1260759187
7,1,1293,2.0,1260759148
8,1,1339,3.5,1260759125
9,1,1343,2.0,1260759131


In [61]:
svd.predict(1, 302, 3)

Prediction(uid=1, iid=302, r_ui=3, est=2.562467631764202, details={'was_impossible': False})

For movie with ID 302, we get an estimated prediction of 2.686. One startling feature of this recommender system is that it doesn't care what the movie is (or what it contains). It works purely on the basis of an assigned movie ID and tries to predict ratings based on how the other users have predicted the movie.



## 4) Hybrid Recommender

In this section, I will try to build a simple hybrid recommender that brings together techniques we have implemented in the content based and collaborative filter based engines. This is how it will work:

Input: User ID and the Title of a Movie
Output: Similar movies sorted on the basis of expected ratings by that particular user.

In [62]:
def convert_int(x):
    try:
        return int(x)
    except:
        return np.nan

In [63]:
id_map = pd.read_csv('the-movies-dataset/links_small.csv')[['movieId', 'tmdbId']]
id_map['tmdbId'] = id_map['tmdbId'].apply(convert_int)
id_map.columns = ['movieId', 'id']
id_map = id_map.merge(smd[['title', 'id']], on='id').set_index('title')
#id_map = id_map.set_index('tmdbId')

In [64]:
indices_map = id_map.set_index('id')

In [65]:
def hybrid(userId, title):
    idx = indices[title]
    tmdbId = id_map.loc[title]['id']
    #print(idx)
    movie_id = id_map.loc[title]['movieId']
    
    sim_scores = list(enumerate(cosine_sim[int(idx)]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:26]
    movie_indices = [i[0] for i in sim_scores]
    
    movies = smd.iloc[movie_indices][['title', 'vote_count', 'vote_average', 'year', 'id']]
    movies['est'] = movies['id'].apply(lambda x: svd.predict(userId, indices_map.loc[x]['movieId']).est)
    movies = movies.sort_values('est', ascending=False)
    return movies.head(10)

In [66]:
hybrid(1, 'Avatar')

Unnamed: 0,title,vote_count,vote_average,year,id,est
522,Terminator 2: Judgment Day,4274.0,7.7,1991,280,3.391458
1011,The Terminator,4208.0,7.4,1984,218,2.933516
974,Aliens,3282.0,7.7,1986,679,2.921238
8401,Star Trek Into Darkness,4479.0,7.4,2013,54138,2.891334
2014,Fantastic Planet,140.0,7.6,1973,16306,2.874881
1376,Titanic,7770.0,7.5,1997,597,2.848355
8658,X-Men: Days of Future Past,6155.0,7.5,2014,127585,2.826041
1621,Darby O'Gill and the Little People,35.0,6.7,1959,18887,2.74971
4017,Hawk the Slayer,13.0,4.5,1980,25628,2.723773
344,True Lies,1138.0,6.8,1994,36955,2.703014


In [67]:
hybrid(500, 'Avatar')

Unnamed: 0,title,vote_count,vote_average,year,id,est
974,Aliens,3282.0,7.7,1986,679,3.512356
8658,X-Men: Days of Future Past,6155.0,7.5,2014,127585,3.364598
522,Terminator 2: Judgment Day,4274.0,7.7,1991,280,3.191157
1011,The Terminator,4208.0,7.4,1984,218,3.175024
2014,Fantastic Planet,140.0,7.6,1973,16306,3.14592
8401,Star Trek Into Darkness,4479.0,7.4,2013,54138,3.11118
1668,Return from Witch Mountain,38.0,5.6,1978,14822,3.048983
3060,Sinbad and the Eye of the Tiger,39.0,6.3,1977,11940,2.989265
831,Escape to Witch Mountain,60.0,6.5,1975,14821,2.9657
1621,Darby O'Gill and the Little People,35.0,6.7,1959,18887,2.924761


We see that for our hybrid recommender, we get different recommendations for different users although the movie is the same. Hence, our recommendations are more personalized and tailored towards particular users.

## Conclusion
In this notebook, I have built 4 different recommendation engines based on different ideas and algorithms. They are as follows:

**Simple Recommender:** This system used overall TMDB Vote Count and Vote Averages to build Top Movies Charts, in general and for a specific genre. The IMDB Weighted Rating System was used to calculate ratings on which the sorting was finally performed.

**Content Based Recommender:** We built two content based engines; one that took movie overview and taglines as input and the other which took metadata such as cast, crew, genre and keywords to come up with predictions. We also deviced a simple filter to give greater preference to movies with more votes and higher ratings.

**Collaborative Filtering:** We used the powerful Surprise Library to build a collaborative filter based on single value decomposition. The RMSE obtained was less than 1 and the engine gave estimated ratings for a given user and movie.

**Hybrid Engine:** We brought together ideas from content and collaborative filterting to build an engine that gave movie suggestions to a particular user based on the estimated ratings that it had internally calculated for that user.