# Movie Recommender System

In this project we design a recommender system for movies.

We follow a content-based approach: each movie's details (such as its genres, keywords, overview, cast and crew) are used to find movies similar to it.
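As a rough illustration of the idea (a minimal sketch, not this notebook's final implementation): if each movie is described by a bag of words built from its metadata, two movies are similar when their word-count vectors point in a similar direction, which cosine similarity measures. The tag strings below are hypothetical and hand-written for illustration.

```python
import math
from collections import Counter

# Hypothetical, hand-written tag strings standing in for each movie's metadata
movie_tags = {
    "Avatar": "action adventure fantasy sciencefiction space future samworthington",
    "John Carter": "action adventure sciencefiction mars space taylorkitsch",
    "The Dark Knight Rises": "action crime drama thriller dccomics christianbale",
}

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(title: str, k: int = 2) -> list:
    """Return the k titles whose tag vectors are most similar to `title`."""
    vectors = {t: Counter(tags.split()) for t, tags in movie_tags.items()}
    scores = [
        (cosine_similarity(vectors[title], vec), other)
        for other, vec in vectors.items()
        if other != title
    ]
    scores.sort(reverse=True)  # highest similarity first
    return [other for _, other in scores[:k]]

print(recommend("Avatar", k=1))  # ['John Carter']
```

Avatar and John Carter share four tags (action, adventure, sciencefiction, space), while Avatar and The Dark Knight Rises share only one, so John Carter ranks first.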

The dataset is taken from Kaggle:
#### https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata

In [1]:
# importing required libraries
from sklearnex import patch_sklearn  # Sklearn Extension to inject optimized algorithms for better performance
patch_sklearn()  

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import re

import warnings
warnings.filterwarnings('ignore')

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)


In [2]:
# show all columns; min_rows=None makes truncated row views show display.max_rows rows instead of the default 10
pd.set_option("display.max_columns",None)
pd.set_option("display.min_rows",None)

In [3]:
# fetching data set
movies = pd.read_csv(r'C:\Users\nitee\OneDrive\Desktop\niteen\Movies Recommender system\tmdb_5000_movies.csv/tmdb_5000_movies.csv')
credits = pd.read_csv(r'C:\Users\nitee\OneDrive\Desktop\niteen\Movies Recommender system\tmdb_5000_credits.csv/tmdb_5000_credits.csv') 

In [4]:
movies.head(2)

,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500


In [5]:
movies.shape

(4803, 20)

In [6]:
movies.columns

Index(['budget', 'genres', 'homepage', 'id', 'keywords', 'original_language',
       'original_title', 'overview', 'popularity', 'production_companies',
       'production_countries', 'release_date', 'revenue', 'runtime',
       'spoken_languages', 'status', 'tagline', 'title', 'vote_average',
       'vote_count'],
      dtype='object')

In [7]:
credits.head()

,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [8]:
credits.shape

(4803, 4)

The credits dataset contains the cast and crew of each movie.

Merging the two datasets on the movie ID ('id' in movies, 'movie_id' in credits) therefore gives a single dataset with complete information about each movie.
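A toy sketch of what merge does when the key columns have different names (two hand-made rows per table with values copied from the first rows shown above, not the real files): both key columns are kept, and the non-key column present in both frames, 'title', gets the suffixes _x and _y. This is why the next cells compare title_x with title_y and then drop 'id' and 'title_x'. The toy_ variable names are used so the notebook's own movies and credits are not overwritten.

```python
import pandas as pd

# Toy stand-ins for the two Kaggle files
toy_movies = pd.DataFrame({
    "id": [19995, 285],
    "title": ["Avatar", "Pirates of the Caribbean: At World's End"],
    "budget": [237000000, 300000000],
})
toy_credits = pd.DataFrame({
    "movie_id": [285, 19995],  # deliberately in a different row order
    "title": ["Pirates of the Caribbean: At World's End", "Avatar"],
    "cast": ["[...]", "[...]"],
})

# Inner join on the numeric ID: rows are matched by value, not by position
merged = toy_movies.merge(toy_credits, left_on="id", right_on="movie_id")
print(merged.columns.tolist())
print((merged["title_x"] == merged["title_y"]).all())  # True
```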

In [9]:
# merging both dataframes
movies = movies.merge(credits,left_on='id',right_on='movie_id')
movies.shape

(4803, 24)

In [10]:
# check whether the titles from the two files ever disagree (an empty result means they all match)
movies[movies.title_x!=movies.title_y]

,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title_x,vote_average,vote_count,movie_id,title_y,cast,crew


In [11]:
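# drop the duplicate key column and one of the two identical title columns, then restore the name 'title'
movies.drop(['id','title_x'],axis=1,inplace=True)
movies.rename(columns={'title_y':'title'},inplace=True)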
movies.shape

(4803, 22)

The number of rows and columns looks right. Next, let's check for missing values.

In [12]:
movies.isnull().sum()

budget                     0
genres                     0
homepage                3091
keywords                   0
original_language          0
original_title             0
overview                   3
popularity                 0
production_companies       0
production_countries       0
release_date               1
revenue                    0
runtime                    2
spoken_languages           0
status                     0
tagline                  844
vote_average               0
vote_count                 0
movie_id                   0
title                      0
cast                       0
crew                       0
dtype: int64
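The raw counts are easier to judge as a share of all 4803 rows: 'homepage' is missing for about 64% of the movies (3091/4803) and 'tagline' for about 18% (844/4803); neither column is kept for the recommender later on. A minimal sketch of computing these shares with isnull().mean(), on a hypothetical toy frame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with some missing values
toy = pd.DataFrame({
    "homepage": ["http://a.example", np.nan, np.nan, np.nan],
    "tagline": ["Tagline", "Tagline", np.nan, "Tagline"],
    "overview": ["text", "text", "text", "text"],
})

# isnull() gives booleans; the mean of each column is the fraction of missing rows
missing_pct = (toy.isnull().mean() * 100).round(1)
print(missing_pct.sort_values(ascending=False))
```

On the notebook's data the same expression would be applied to movies instead of toy.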

## EDA

In [13]:
# check for duplicate movies: the number of unique IDs should equal the number of rows
movies.movie_id.unique().shape

(4803,)
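Since the number of unique IDs (4803) equals the number of rows, there are no duplicate movies. pandas also offers more direct checks; a minimal sketch on a hypothetical toy frame in which one ID appears twice:

```python
import pandas as pd

# Hypothetical toy frame: movie_id 285 appears twice
toy = pd.DataFrame({"movie_id": [19995, 285, 285], "title": ["Avatar", "Pirates", "Pirates"]})

print(toy["movie_id"].is_unique)                # False
print(toy["movie_id"].nunique(), len(toy))      # 2 3
print(toy.duplicated(subset="movie_id").sum())  # 1 (only the second 285 is flagged)
```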

In [14]:
# check how the movies are distributed across original languages
movies.original_language.value_counts()

en    4505
fr      70
es      32
zh      27
de      27
hi      19
ja      16
it      14
cn      12
ru      11
ko      11
pt       9
da       7
sv       5
nl       4
fa       4
th       3
he       3
cs       2
id       2
ro       2
ar       2
ta       2
ps       1
el       1
xx       1
af       1
is       1
ky       1
vi       1
pl       1
tr       1
no       1
nb       1
te       1
sl       1
hu       1
Name: original_language, dtype: int64

Most of the movies (4505 of 4803) are in English, so the original language would have little effect on our recommender system.
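To put "most" into numbers, using the counts from the outputs above:

```python
# Counts taken from the value_counts() output and movies.shape above
english_movies = 4505
total_movies = 4803

share = english_movies / total_movies * 100
print(f"{share:.1f}% of the movies are in English")  # 93.8% of the movies are in English
```

On the DataFrame itself, movies.original_language.value_counts(normalize=True) returns these fractions directly.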

In [15]:
movies[['runtime','vote_average','vote_count','revenue']].describe()

,runtime,vote_average,vote_count,revenue
count,4801.0,4803.0,4803.0,4803.0
mean,106.875859,6.092172,690.217989,82260640.0
std,22.611935,1.194612,1234.585891,162857100.0
min,0.0,0.0,0.0,0.0
25%,94.0,5.6,54.0,0.0
50%,103.0,6.2,235.0,19170000.0
75%,118.0,6.8,737.0,92917190.0
max,338.0,10.0,13752.0,2787965000.0


Some movies have a runtime of 0; let's inspect them.

In [16]:
movies[movies.runtime==0]

,budget,genres,homepage,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,vote_average,vote_count,movie_id,title,cast,crew
1011,0,"[{""id"": 27, ""name"": ""Horror""}]",,"[{""id"": 10292, ""name"": ""gore""}, {""id"": 12339, ...",de,The Tooth Fairy,A woman and her daughter (Nicole Muñoz) encoun...,0.716764,[],[],2006-08-08,0,0.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,,4.3,13,53953,The Tooth Fairy,"[{""cast_id"": 2, ""character"": ""Peter Campbell"",...","[{""credit_id"": ""52fe4885c3a36847f816b927"", ""de..."
3112,0,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 80, ""name...",,[],en,Blood Done Sign My Name,A drama based on the true story in which a bla...,0.397341,[],[],2010-02-01,0,0.0,[],Released,No one changes the world alone.,6.0,5,41894,Blood Done Sign My Name,"[{""cast_id"": 0, ""character"": ""Boo Tyson"", ""cre...","[{""credit_id"": ""58ba3af09251416073014bc1"", ""de..."
3669,0,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...",http://www.romeothemovie.com/,[],en,Should've Been Romeo,"A self-centered, middle-aged pitchman for a po...",0.40703,"[{""name"": ""Phillybrook Films"", ""id"": 65147}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-04-28,0,0.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,Even Shakespeare didn't see this one coming.,0.0,0,113406,Should've Been Romeo,"[{""cast_id"": 4, ""character"": ""Joey"", ""credit_i...","[{""credit_id"": ""5617d84d92514166e2001e21"", ""de..."
3809,4000000,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10749, ""...",,[],en,How to Fall in Love,"An accountant, who never quite grew out of his...",1.923514,"[{""name"": ""Annuit Coeptis Entertainment Inc."",...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-07-21,0,0.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,,5.2,20,158150,How to Fall in Love,"[{""cast_id"": 1, ""character"": ""Annie Hayes"", ""c...","[{""credit_id"": ""52fe4bdd9251416c910e82a3"", ""de..."
3953,0,"[{""id"": 10752, ""name"": ""War""}, {""id"": 18, ""nam...",,"[{""id"": 187056, ""name"": ""woman director""}]",en,Fort McCoy,Unable to serve in World War II because of a h...,0.384496,[],[],2014-01-01,0,0.0,[],Released,,6.3,2,281230,Fort McCoy,"[{""cast_id"": 0, ""character"": ""Frank Stirn"", ""c...","[{""credit_id"": ""54e269aec3a368454b007976"", ""de..."
3992,0,[],,[],en,Sardaarji,A ghost hunter uses bottles to capture trouble...,0.296981,[],"[{""iso_3166_1"": ""IN"", ""name"": ""India""}]",2015-06-26,0,0.0,[],Released,,9.5,2,346081,Sardaarji,[],"[{""credit_id"": ""558ab3f4925141076f0001d7"", ""de..."
4068,0,[],,[],en,Sharkskin,The Post War II story of Manhattan born Mike E...,0.027801,[],[],2015-01-01,0,0.0,[],Released,,0.0,0,371085,Sharkskin,[],[]
4118,0,[],,[],en,Hum To Mohabbat Karega,"Raju, a waiter, is in love with the famous TV ...",0.001186,[],[],2000-05-26,0,0.0,[],Released,,0.0,0,325140,Hum To Mohabbat Karega,[],[]
4205,0,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 80, ""name...",http://www.imdb.com/title/tt1289419/,[],en,N-Secure,N-Secure is a no holds-barred thrilling drama ...,0.13456,[],[],2010-10-15,2592808,0.0,[],Released,,4.3,4,66468,N-Secure,"[{""cast_id"": 3, ""character"": ""David Alan Washi...","[{""credit_id"": ""52fe473ec3a368484e0bca79"", ""de..."
4210,0,"[{""id"": 10749, ""name"": ""Romance""}]",,[],hi,दिल जो भी कहे,"During the British rule in India, several Indi...",0.122704,"[{""name"": ""Entertainment One Pvt. Ltd."", ""id"":...","[{""iso_3166_1"": ""IN"", ""name"": ""India""}]",2006-12-07,0,0.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,,0.0,0,74084,Dil Jo Bhi Kahey...,"[{""cast_id"": 2, ""character"": ""Shekhar Sinha"", ...","[{""credit_id"": ""575d52eac3a3683168003910"", ""de..."


There are 10 movies with a runtime of 0. A feature film cannot run for 0 minutes, so this value most likely marks a missing runtime.
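'runtime' is not among the features kept later, so these rows can stay. If runtime were used, one option (an assumption, not something this notebook does) is to treat 0 as missing so it does not distort the statistics. A minimal sketch with hypothetical runtimes:

```python
import numpy as np
import pandas as pd

# Hypothetical runtimes in minutes; 0 stands for "unknown"
runtime = pd.Series([162.0, 169.0, 0.0, 0.0, 148.0])

print(round(runtime.mean(), 1))   # 95.8 (pulled down by the zeros)

# Replace the placeholder 0 with NaN; mean() skips NaN by default
cleaned = runtime.replace(0, np.nan)
print(round(cleaned.mean(), 1))   # 159.7
print(cleaned.isna().sum())       # 2
```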

In [17]:
# check the vote counts of movies with a perfect vote_average of 10
movies[movies.vote_average==10]

,budget,genres,homepage,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,vote_average,vote_count,movie_id,title,cast,crew
3519,0,"[{""id"": 35, ""name"": ""Comedy""}]",,"[{""id"": 131, ""name"": ""italy""}, {""id"": 8250, ""n...",en,Stiff Upper Lips,Stiff Upper Lips is a broad parody of British ...,0.356495,[],"[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",1998-06-12,0,99.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,,10.0,1,89861,Stiff Upper Lips,"[{""cast_id"": 11, ""character"": ""Emily"", ""credit...","[{""credit_id"": ""52fe4a2f9251416c910c5edb"", ""de..."
4045,0,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...",,"[{""id"": 1415, ""name"": ""small town""}, {""id"": 15...",en,"Dancer, Texas Pop. 81","Four guys, best friends, have grown up togethe...",0.376662,"[{""name"": ""HSX Films"", ""id"": 4714}, {""name"": ""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",1998-05-01,565592,97.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,in the middle of nowhere they had everything,10.0,1,78373,"Dancer, Texas Pop. 81","[{""cast_id"": 1, ""character"": ""Keller Coleman"",...","[{""credit_id"": ""52fe499bc3a368484e13445b"", ""de..."
4247,1,"[{""id"": 10749, ""name"": ""Romance""}, {""id"": 35, ...",,[],en,Me You and Five Bucks,"A womanizing yet lovable loser, Charlie, a wai...",0.094105,[],[],2015-07-07,0,90.0,[],Released,"A story about second, second chances",10.0,2,361505,Me You and Five Bucks,[],[]
4662,0,"[{""id"": 35, ""name"": ""Comedy""}]",,"[{""id"": 10183, ""name"": ""independent film""}]",en,Little Big Top,An aging out of work clown returns to his smal...,0.0921,"[{""name"": ""Fly High Films"", ""id"": 24248}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2006-01-01,0,0.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Rumored,,10.0,1,40963,Little Big Top,"[{""cast_id"": 0, ""character"": ""Seymour"", ""credi...",[]


Each of these movies has only one or two votes, so its perfect average of 10 is not reliable.
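An average built from one or two votes is not comparable with one built from thousands. A common remedy, not used in this notebook, is a weighted rating that shrinks a movie's average R toward the overall mean C, depending on its vote count v and a minimum-votes parameter m: WR = v/(v+m)·R + m/(v+m)·C. In the sketch below, C is the mean vote_average and m = 235 the median vote_count from the describe() output above; both choices are illustrative assumptions.

```python
def weighted_rating(R: float, v: int, C: float, m: float) -> float:
    """Shrink a movie's average rating R toward the global mean C.

    v: number of votes the movie received
    m: minimum-votes parameter (a larger m means stronger shrinkage)
    """
    return (v / (v + m)) * R + (m / (v + m)) * C

C = 6.092172  # mean vote_average from describe()
m = 235       # median vote_count from describe(), used here as an assumed threshold

stiff_upper_lips = weighted_rating(R=10.0, v=1, C=C, m=m)  # 10.0 from a single vote
avatar = weighted_rating(R=7.2, v=11800, C=C, m=m)         # 7.2 from 11800 votes
print(round(stiff_upper_lips, 2), round(avatar, 2))        # 6.11 7.18
```

After weighting, the single-vote 10.0 falls close to the global mean, while Avatar's well-supported 7.2 barely moves.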

In [18]:
# check whether title and original_title differ for any movie
movies[movies.original_title!=movies.title]

,budget,genres,homepage,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,vote_average,vote_count,movie_id,title,cast,crew
97,15000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",,"[{""id"": 1299, ""name"": ""monster""}, {""id"": 7671,...",ja,シン・ゴジラ,From the mind behind Evangelion comes a hit la...,9.476999,"[{""name"": ""Cine Bazar"", ""id"": 5896}, {""name"": ...","[{""iso_3166_1"": ""JP"", ""name"": ""Japan""}]",2016-07-29,77000000,120.0,"[{""iso_639_1"": ""it"", ""name"": ""Italiano""}, {""is...",Released,A god incarnate. A city doomed.,6.5,143,315011,Shin Godzilla,"[{""cast_id"": 4, ""character"": ""Rando Yaguchi : ...","[{""credit_id"": ""5921d321c3a368799b05933f"", ""de..."
215,130000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",,"[{""id"": 657, ""name"": ""fire""}, {""id"": 720, ""nam...",en,4: Rise of the Silver Surfer,The Fantastic Four return to the big screen as...,60.810723,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""DE"", ""name"": ""Germany""}, {""is...",2007-06-13,289047763,92.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Discover the secret of the Surfer.,5.4,2589,1979,Fantastic 4: Rise of the Silver Surfer,"[{""cast_id"": 7, ""character"": ""Reed Richards / ...","[{""credit_id"": ""52fe4328c3a36847f803eac7"", ""de..."
235,97250400,"[{""id"": 14, ""name"": ""Fantasy""}, {""id"": 12, ""na...",http://www.asterixauxjeuxolympiques.com/index.php,"[{""id"": 271, ""name"": ""competition""}, {""id"": 12...",fr,Astérix aux Jeux Olympiques,Astérix and Obélix have to win the Olympic Gam...,20.344364,"[{""name"": ""Constantin Film"", ""id"": 47}, {""name...","[{""iso_3166_1"": ""BE"", ""name"": ""Belgium""}, {""is...",2008-01-13,132900000,116.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,,5.0,471,2395,Asterix at the Olympic Games,"[{""cast_id"": 15, ""character"": ""Asterix"", ""cred...","[{""credit_id"": ""52fe4354c3a36847f804c0b1"", ""de..."
317,94000000,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 36, ""name...",http://www.theflowersofwarmovie.com/,"[{""id"": 173251, ""name"": ""forced prostitution""}...",zh,金陵十三釵,A Westerner finds refuge with a group of women...,12.516546,"[{""name"": ""Beijing New Picture Film Co. Ltd."",...","[{""iso_3166_1"": ""CN"", ""name"": ""China""}, {""iso_...",2011-12-15,95311434,145.0,"[{""iso_639_1"": ""zh"", ""name"": ""\u666e\u901a\u8b...",Released,,7.1,187,76758,The Flowers of War,"[{""cast_id"": 2, ""character"": ""John Miller"", ""c...","[{""credit_id"": ""52fe494bc3a368484e1244b5"", ""de..."
474,0,"[{""id"": 9648, ""name"": ""Mystery""}, {""id"": 18, ""...",,"[{""id"": 428, ""name"": ""nurse""}, {""id"": 658, ""na...",fr,Évolution,11-year-old Nicolas lives with his mother in a...,3.300061,"[{""name"": ""Ex Nihilo"", ""id"": 3307}, {""name"": ""...","[{""iso_3166_1"": ""BE"", ""name"": ""Belgium""}, {""is...",2015-09-14,0,81.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""}]",Released,,6.4,47,330770,Evolution,"[{""cast_id"": 4, ""character"": ""Nicolas"", ""credi...","[{""credit_id"": ""550540cd9251412c05002436"", ""de..."
488,86000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",,"[{""id"": 1158, ""name"": ""grandfather grandson re...",en,Arthur et les Minimoys,Arthur is a spirited ten-year old whose parent...,27.097932,"[{""name"": ""Canal Plus"", ""id"": 104}, {""name"": ""...","[{""iso_3166_1"": ""FR"", ""name"": ""France""}]",2006-12-13,107944236,94.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,Adventure awaits in your own backyard.,6.0,639,9992,Arthur and the Invisibles,"[{""cast_id"": 20, ""character"": ""Arthur Montgome...","[{""credit_id"": ""52fe4559c3a36847f80c9381"", ""de..."
492,8000000,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 16, ""nam...",,"[{""id"": 209714, ""name"": ""3d""}]",es,Don Gato: El inicio de la pandilla,Top Cat has arrived to charm his way into your...,0.719996,"[{""name"": ""Anima Estudios"", ""id"": 9965}, {""nam...","[{""iso_3166_1"": ""IN"", ""name"": ""India""}, {""iso_...",2015-10-30,0,89.0,[],Released,,5.3,9,293644,Top Cat Begins,"[{""cast_id"": 4, ""character"": ""Top Cat / Choo C...","[{""credit_id"": ""592741ec9251413b54029cf9"", ""de..."
561,74500000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 18, ""...",,"[{""id"": 380, ""name"": ""brother brother relation...",en,Deux frères,Two tigers are separated as cubs and taken int...,8.884318,"[{""name"": ""Path\u00e9 Renn Productions"", ""id"":...","[{""iso_3166_1"": ""FR"", ""name"": ""France""}, {""iso...",2004-04-07,62172050,109.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,"Two infant tiger cubs, separated from their pa...",6.9,180,1997,Two Brothers,"[{""cast_id"": 11, ""character"": ""Na\u00ef-Rea"", ...","[{""credit_id"": ""52fe432ac3a36847f803f4c5"", ""de..."
678,65000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 18, ""nam...",,[],zh,天將雄師,"Huo An, the commander of the Protection Squad ...",9.568884,"[{""name"": ""Shanghai Film Group"", ""id"": 3407}, ...","[{""iso_3166_1"": ""HK"", ""name"": ""Hong Kong""}, {""...",2015-02-19,121545703,127.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,When the Eagle meets the Dragon,5.9,145,300168,Dragon Blade,"[{""cast_id"": 2, ""character"": ""Huo Han"", ""credi...","[{""credit_id"": ""54536744c3a368148d00103d"", ""de..."
719,60000000,"[{""id"": 10402, ""name"": ""Music""}, {""id"": 99, ""n...",http://www.thisisit-movie.com,"[{""id"": 3490, ""name"": ""pop star""}, {""id"": 6027...",en,Michael Jackson's This Is It,"A compilation of interviews, rehearsals and ba...",15.798622,"[{""name"": ""Columbia Pictures"", ""id"": 5}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-10-28,0,111.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,Like You've Never Seen Him Before,6.7,247,13576,This Is It,"[{""cast_id"": 2, ""character"": ""Himself"", ""credi...","[{""credit_id"": ""52fe457d9251416c750583b7"", ""de..."


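The titles differ mainly because 'original_title' holds the title in the film's original language (e.g. シン・ゴジラ vs Shin Godzilla), while 'title' holds the English title. A few differences are just alternative titles, such as "Michael Jackson's This Is It" vs "This Is It".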

In [19]:
# check the possible values of the release status
movies.status.unique()

array(['Released', 'Post Production', 'Rumored'], dtype=object)

In [20]:
# inspect the movies whose status is 'Rumored'
movies[movies.status=='Rumored']

,budget,genres,homepage,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,vote_average,vote_count,movie_id,title,cast,crew
4401,0,"[{""id"": 28, ""name"": ""Action""}, {""id"": 35, ""nam...",,[],en,The Helix... Loaded,,0.0206,[],"[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2005-01-01,0,97.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Rumored,,4.8,2,43630,The Helix... Loaded,[],[]
4453,0,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...",,"[{""id"": 1930, ""name"": ""kidnapping""}, {""id"": 97...",en,Crying with Laughter,Powerfully redemptive and darkly comedic reven...,0.108135,"[{""name"": ""Scottish Screen"", ""id"": 698}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""}]",2009-06-01,0,93.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Rumored,A Bad Trip Down Memory Lane,7.0,1,57294,Crying with Laughter,"[{""cast_id"": 3, ""character"": ""Joey Frisk"", ""cr...","[{""credit_id"": ""52fe492dc3a36847f818d031"", ""de..."
4508,56000,"[{""id"": 99, ""name"": ""Documentary""}]",http://www.facebook.com/theharvestfilm,"[{""id"": 1729, ""name"": ""migration""}, {""id"": 190...",en,The Harvest (La Cosecha),The story of the children who work 12-14 hour ...,0.010909,[],[],2011-07-29,0,80.0,[],Rumored,,0.0,0,70875,The Harvest (La Cosecha),[],"[{""credit_id"": ""52fe4816c3a368484e0e8bbd"", ""de..."
4662,0,"[{""id"": 35, ""name"": ""Comedy""}]",,"[{""id"": 10183, ""name"": ""independent film""}]",en,Little Big Top,An aging out of work clown returns to his smal...,0.0921,"[{""name"": ""Fly High Films"", ""id"": 24248}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2006-01-01,0,0.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Rumored,,10.0,1,40963,Little Big Top,"[{""cast_id"": 0, ""character"": ""Seymour"", ""credi...",[]
4754,0,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 35, ""name...",,[],en,The Naked Ape,The Naked Ape is a coming-of-age film followin...,0.077577,[],[],2006-09-16,0,110.0,[],Rumored,,5.0,1,84659,The Naked Ape,"[{""cast_id"": 1, ""character"": ""Alex"", ""credit_i...","[{""credit_id"": ""52fe49049251416c910a00b3"", ""de..."


## Feature Extraction

In [21]:
movies.columns

Index(['budget', 'genres', 'homepage', 'keywords', 'original_language',
       'original_title', 'overview', 'popularity', 'production_companies',
       'production_countries', 'release_date', 'revenue', 'runtime',
       'spoken_languages', 'status', 'tagline', 'vote_average', 'vote_count',
       'movie_id', 'title', 'cast', 'crew'],
      dtype='object')

Many of these columns are not needed for a content-based recommender, so we keep only the relevant ones and drop the rest.

In [22]:
# columns required for our task
col=['movie_id','title','genres','keywords','overview','production_companies','cast','crew']
movies=movies[col].copy()  # explicit copy, so later in-place changes act on an independent DataFrame rather than a slice

In [23]:
movies.head()

,movie_id,title,genres,keywords,overview,production_companies,cast,crew
0,19995,Avatar,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","In the 22nd century, a paraplegic Marine is di...","[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","Captain Barbossa, long believed to be dead, ha...","[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",A cryptic message from Bond’s past sends him o...,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...","[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",Following the death of District Attorney Harve...,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","John Carter is a war-weary, former military ca...","[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [24]:
movies['production_companies'][0]

'[{"name": "Ingenious Film Partners", "id": 289}, {"name": "Twentieth Century Fox Film Corporation", "id": 306}, {"name": "Dune Entertainment", "id": 444}, {"name": "Lightstorm Entertainment", "id": 574}]'

Columns such as genres, keywords, production_companies, cast and crew contain a lot of information, but we only need the names.

Each entry is a string holding a list of dictionaries, so we define a function that parses the string and extracts the required values.

In [25]:
import ast

# this function parses the string into a list of dictionaries and collects the 'name' value of each one

def convert(text):
    L = []
    for i in ast.literal_eval(text):
        L.append(i['name']) 
    return L 

In [26]:
# drop rows with missing values (only the 3 movies without an overview are affected)
movies.dropna(inplace=True)
movies.shape

(4800, 8)
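Before applying convert to the columns, here is a self-contained check of what it returns for the production_companies string of Avatar shown earlier, written as an equivalent list comprehension. ast.literal_eval parses the string as a Python literal (a list of dicts); since these strings appear to be valid JSON, json.loads should also work.

```python
import ast

def convert(text):
    # parse the string into a list of dicts and keep each dict's 'name' value
    return [item["name"] for item in ast.literal_eval(text)]

# production_companies string for Avatar, copied from the output above
sample = ('[{"name": "Ingenious Film Partners", "id": 289}, '
          '{"name": "Twentieth Century Fox Film Corporation", "id": 306}, '
          '{"name": "Dune Entertainment", "id": 444}, '
          '{"name": "Lightstorm Entertainment", "id": 574}]')

print(convert(sample))
# ['Ingenious Film Partners', 'Twentieth Century Fox Film Corporation', 'Dune Entertainment', 'Lightstorm Entertainment']
print(convert("[]"))  # [] (some movies have empty lists, e.g. no genres)
```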

In [27]:
movies['genres'] = movies['genres'].apply(convert)
movies.head()

,movie_id,title,genres,keywords,overview,production_companies,cast,crew
0,19995,Avatar,"[Action, Adventure, Fantasy, Science Fiction]","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","In the 22nd century, a paraplegic Marine is di...","[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[Adventure, Fantasy, Action]","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","Captain Barbossa, long believed to be dead, ha...","[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,"[Action, Adventure, Crime]","[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",A cryptic message from Bond’s past sends him o...,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,"[Action, Crime, Drama, Thriller]","[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",Following the death of District Attorney Harve...,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"[Action, Adventure, Science Fiction]","[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","John Carter is a war-weary, former military ca...","[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [28]:
movies['keywords'] = movies['keywords'].apply(convert)
movies.head()

,movie_id,title,genres,keywords,overview,production_companies,cast,crew
0,19995,Avatar,"[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","In the 22nd century, a paraplegic Marine is di...","[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","Captain Barbossa, long believed to be dead, ha...","[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...",A cryptic message from Bond’s past sends him o...,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...",Following the death of District Attorney Harve...,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","John Carter is a war-weary, former military ca...","[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [29]:
movies['production_companies'] = movies['production_companies'].apply(convert)
movies.head()

,movie_id,title,genres,keywords,overview,production_companies,cast,crew
0,19995,Avatar,"[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","In the 22nd century, a paraplegic Marine is di...","[Ingenious Film Partners, Twentieth Century Fo...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","Captain Barbossa, long believed to be dead, ha...","[Walt Disney Pictures, Jerry Bruckheimer Films...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...",A cryptic message from Bond’s past sends him o...,"[Columbia Pictures, Danjaq, B24]","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...",Following the death of District Attorney Harve...,"[Legendary Pictures, Warner Bros., DC Entertai...","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","John Carter is a war-weary, former military ca...",[Walt Disney Pictures],"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [30]:
movies.cast[1]

'[{"cast_id": 4, "character": "Captain Jack Sparrow", "credit_id": "52fe4232c3a36847f800b50d", "gender": 2, "id": 85, "name": "Johnny Depp", "order": 0}, {"cast_id": 5, "character": "Will Turner", "credit_id": "52fe4232c3a36847f800b511", "gender": 2, "id": 114, "name": "Orlando Bloom", "order": 1}, {"cast_id": 6, "character": "Elizabeth Swann", "credit_id": "52fe4232c3a36847f800b515", "gender": 1, "id": 116, "name": "Keira Knightley", "order": 2}, {"cast_id": 12, "character": "William \\"Bootstrap Bill\\" Turner", "credit_id": "52fe4232c3a36847f800b52d", "gender": 2, "id": 1640, "name": "Stellan Skarsg\\u00e5rd", "order": 3}, {"cast_id": 10, "character": "Captain Sao Feng", "credit_id": "52fe4232c3a36847f800b525", "gender": 2, "id": 1619, "name": "Chow Yun-fat", "order": 4}, {"cast_id": 9, "character": "Captain Davy Jones", "credit_id": "52fe4232c3a36847f800b521", "gender": 2, "id": 2440, "name": "Bill Nighy", "order": 5}, {"cast_id": 7, "character": "Captain Hector Barbossa", "credit_

A movie can have a long cast list, but we only need the most prominent actors, so we keep the first four names (the list is ordered by billing, as its 'order' field shows).
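The slice x[0:4] relies on the cast list already being sorted by billing. Each cast entry carries an 'order' field (0 for the top-billed actor, as in the Pirates of the Caribbean cast printed above), so a variant that sorts explicitly could look like the sketch below. top_cast is a hypothetical helper, not part of the notebook, and the shuffled sample is hand-made from the entries above.

```python
import json

def top_cast(text, n=4):
    # parse the JSON string, sort by billing order, keep the first n names
    members = json.loads(text)
    members.sort(key=lambda member: member["order"])
    return [member["name"] for member in members[:n]]

# Hand-made, deliberately shuffled sample based on the Pirates of the Caribbean cast above
sample = json.dumps([
    {"name": "Bill Nighy", "order": 5},
    {"name": "Orlando Bloom", "order": 1},
    {"name": "Chow Yun-fat", "order": 4},
    {"name": "Johnny Depp", "order": 0},
    {"name": "Keira Knightley", "order": 2},
])

print(top_cast(sample))  # ['Johnny Depp', 'Orlando Bloom', 'Keira Knightley', 'Chow Yun-fat']
```

In the notebook this would be applied directly to the raw strings, e.g. movies['cast'].apply(top_cast), in place of convert followed by the slice.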

In [31]:
movies['cast'] = movies['cast'].apply(convert)
movies['cast'] = movies['cast'].apply(lambda x:x[0:4])  # keep only the top 4 billed actors
movies.head()

,movie_id,title,genres,keywords,overview,production_companies,cast,crew
0,19995,Avatar,"[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","In the 22nd century, a paraplegic Marine is di...","[Ingenious Film Partners, Twentieth Century Fo...","[Sam Worthington, Zoe Saldana, Sigourney Weave...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","Captain Barbossa, long believed to be dead, ha...","[Walt Disney Pictures, Jerry Bruckheimer Films...","[Johnny Depp, Orlando Bloom, Keira Knightley, ...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...",A cryptic message from Bond’s past sends him o...,"[Columbia Pictures, Danjaq, B24]","[Daniel Craig, Christoph Waltz, Léa Seydoux, R...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...",Following the death of District Attorney Harve...,"[Legendary Pictures, Warner Bros., DC Entertai...","[Christian Bale, Michael Caine, Gary Oldman, A...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","John Carter is a war-weary, former military ca...",[Walt Disney Pictures],"[Taylor Kitsch, Lynn Collins, Samantha Morton,...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."
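The notebook relies on the cast list already being in billing order and simply slices it. A minimal, self-contained sketch that makes that ordering explicit by sorting on the `order` field before taking the first k names (the helper `top_cast` is hypothetical, not part of the notebook):

```python
import ast


def top_cast(cast_json, k=4):
    """Return the names of the first k billed actors from a TMDB cast string.

    Sorting by 'order' is an assumption made explicit here; the notebook
    itself slices the list in the order it is stored.
    """
    entries = ast.literal_eval(cast_json)
    entries = sorted(entries, key=lambda e: e.get("order", 0))
    return [e["name"] for e in entries[:k]]


sample = '[{"name": "B", "order": 1}, {"name": "A", "order": 0}, {"name": "C", "order": 2}]'
print(top_cast(sample, k=2))  # ['A', 'B']
```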


In [32]:
movies.crew[0]

'[{"credit_id": "52fe48009251416c750aca23", "department": "Editing", "gender": 0, "id": 1721, "job": "Editor", "name": "Stephen E. Rivkin"}, {"credit_id": "539c47ecc3a36810e3001f87", "department": "Art", "gender": 2, "id": 496, "job": "Production Design", "name": "Rick Carter"}, {"credit_id": "54491c89c3a3680fb4001cf7", "department": "Sound", "gender": 0, "id": 900, "job": "Sound Designer", "name": "Christopher Boyes"}, {"credit_id": "54491cb70e0a267480001bd0", "department": "Sound", "gender": 0, "id": 900, "job": "Supervising Sound Editor", "name": "Christopher Boyes"}, {"credit_id": "539c4a4cc3a36810c9002101", "department": "Production", "gender": 1, "id": 1262, "job": "Casting", "name": "Mali Finn"}, {"credit_id": "5544ee3b925141499f0008fc", "department": "Sound", "gender": 2, "id": 1729, "job": "Original Music Composer", "name": "James Horner"}, {"credit_id": "52fe48009251416c750ac9c3", "department": "Directing", "gender": 2, "id": 2710, "job": "Director", "name": "James Cameron"},

From the crew column we only need the director's name, which we extract by keeping the entries whose `job` is `'Director'`.

In [33]:
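import ast

# return the name(s) of every crew member whose job is 'Director'
# (a list, because some films have more than one director)
def fetch_director(text):
    L = []
    for i in ast.literal_eval(text):
        if i['job'] == 'Director':
            L.append(i['name'])
    return L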

In [34]:
movies['crew'] = movies['crew'].apply(fetch_director)
movies.head(5)

Unnamed: 0,movie_id,title,genres,keywords,overview,production_companies,cast,crew
0,19995,Avatar,"[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","In the 22nd century, a paraplegic Marine is di...","[Ingenious Film Partners, Twentieth Century Fo...","[Sam Worthington, Zoe Saldana, Sigourney Weave...",[James Cameron]
1,285,Pirates of the Caribbean: At World's End,"[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","Captain Barbossa, long believed to be dead, ha...","[Walt Disney Pictures, Jerry Bruckheimer Films...","[Johnny Depp, Orlando Bloom, Keira Knightley, ...",[Gore Verbinski]
2,206647,Spectre,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...",A cryptic message from Bond’s past sends him o...,"[Columbia Pictures, Danjaq, B24]","[Daniel Craig, Christoph Waltz, Léa Seydoux, R...",[Sam Mendes]
3,49026,The Dark Knight Rises,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...",Following the death of District Attorney Harve...,"[Legendary Pictures, Warner Bros., DC Entertai...","[Christian Bale, Michael Caine, Gary Oldman, A...",[Christopher Nolan]
4,49529,John Carter,"[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","John Carter is a war-weary, former military ca...",[Walt Disney Pictures],"[Taylor Kitsch, Lynn Collins, Samantha Morton,...",[Andrew Stanton]
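The director filter is easy to check on a small hand-written crew string. The sketch below is a hypothetical variant, `fetch_job`, that parses the column with `json.loads` (the TMDB columns hold JSON) and takes the job title as a parameter. The names in the sample string are placeholders, not rows from the dataset; the sample shows why a list is returned, since a co-directed film keeps both names.

```python
import json


def fetch_job(text, job="Director"):
    """Return the names of all crew members whose 'job' equals `job`.

    Hypothetical generalisation of fetch_director that uses json.loads.
    """
    return [member["name"] for member in json.loads(text) if member.get("job") == job]


# hand-written sample in the same shape as the TMDB crew column
crew = ('[{"job": "Editor", "name": "Editor A"},'
        ' {"job": "Director", "name": "Director A"},'
        ' {"job": "Director", "name": "Director B"}]')
print(fetch_job(crew))            # ['Director A', 'Director B']
print(fetch_job(crew, "Editor"))  # ['Editor A']
```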


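Names in the cast and crew columns contain spaces, so the vectorizer would split "Sam Worthington" into the separate tokens "Sam" and "Worthington", and unrelated people such as "Sam Worthington" and "Sam Mendes" would share a token. To keep each entity as a single token we remove the spaces; the same is done for genres, keywords and production companies.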

In [35]:
def collapse(L):
    # remove spaces so that multi-word names become single tokens
    return [i.replace(" ", "") for i in L]

In [36]:
movies['cast'] = movies['cast'].apply(collapse)
movies['crew'] = movies['crew'].apply(collapse)
movies['genres'] = movies['genres'].apply(collapse)
movies['keywords'] = movies['keywords'].apply(collapse)
movies['production_companies'] = movies['production_companies'].apply(collapse)
movies.head()

Unnamed: 0,movie_id,title,genres,keywords,overview,production_companies,cast,crew
0,19995,Avatar,"[Action, Adventure, Fantasy, ScienceFiction]","[cultureclash, future, spacewar, spacecolony, ...","In the 22nd century, a paraplegic Marine is di...","[IngeniousFilmPartners, TwentiethCenturyFoxFil...","[SamWorthington, ZoeSaldana, SigourneyWeaver, ...",[JamesCameron]
1,285,Pirates of the Caribbean: At World's End,"[Adventure, Fantasy, Action]","[ocean, drugabuse, exoticisland, eastindiatrad...","Captain Barbossa, long believed to be dead, ha...","[WaltDisneyPictures, JerryBruckheimerFilms, Se...","[JohnnyDepp, OrlandoBloom, KeiraKnightley, Ste...",[GoreVerbinski]
2,206647,Spectre,"[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...",A cryptic message from Bond’s past sends him o...,"[ColumbiaPictures, Danjaq, B24]","[DanielCraig, ChristophWaltz, LéaSeydoux, Ralp...",[SamMendes]
3,49026,The Dark Knight Rises,"[Action, Crime, Drama, Thriller]","[dccomics, crimefighter, terrorist, secretiden...",Following the death of District Attorney Harve...,"[LegendaryPictures, WarnerBros., DCEntertainme...","[ChristianBale, MichaelCaine, GaryOldman, Anne...",[ChristopherNolan]
4,49529,John Carter,"[Action, Adventure, ScienceFiction]","[basedonnovel, mars, medallion, spacetravel, p...","John Carter is a war-weary, former military ca...",[WaltDisneyPictures],"[TaylorKitsch, LynnCollins, SamanthaMorton, Wi...",[AndrewStanton]
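To see why the spaces matter, the sketch below tokenizes names with the regular expression scikit-learn uses as its default `token_pattern` (`(?u)\b\w\w+\b`) after lowercasing, which is what `TfidfVectorizer` does by default. Before collapsing, "Sam Worthington" and "Sam Mendes" share the token `sam`; afterwards each name is one distinct token. `tokenize` is a hypothetical helper and `collapse` mirrors the notebook's function.

```python
import re

# scikit-learn's default token_pattern for CountVectorizer / TfidfVectorizer
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    # lowercase first, as TfidfVectorizer does by default (lowercase=True)
    return TOKEN_PATTERN.findall(text.lower())


def collapse(names):
    # same logic as the notebook's collapse(): drop the spaces inside each name
    return [n.replace(" ", "") for n in names]


raw = ["Sam Worthington", "Sam Mendes"]
print([tokenize(n) for n in raw])            # [['sam', 'worthington'], ['sam', 'mendes']]
print([tokenize(n) for n in collapse(raw)])  # [['samworthington'], ['sammendes']]
```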


Next, split each overview into a list of words and concatenate it with the genres, keywords, production_companies, cast and crew lists to form a single `tags` text per movie.

In [37]:
# splitting overview section into words
movies['overview'] = movies['overview'].apply(lambda x:x.split())

In [38]:
# combining all into one corpus
movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['production_companies'] + movies['cast'] + movies['crew']
movies['tags'] = movies['tags'].apply(lambda x: " ".join(x))
data = movies.drop(columns=['overview','genres','keywords','production_companies','cast','crew'])
data.head()

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...
4,49529,John Carter,"John Carter is a war-weary, former military ca..."
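Row by row, `+` on Series of lists is plain list concatenation, and the `" ".join` turns the result into one string. A minimal per-movie equivalent in pure Python (the helper `build_tags` and the sample values are hypothetical), handy for checking what a single `tags` string will look like:

```python
def build_tags(overview, *token_lists):
    # split the free-text overview into words, append the already-collapsed
    # list columns in order, and join everything into one space-separated string
    words = overview.split()
    for tokens in token_lists:
        words += tokens
    return " ".join(words)


print(build_tags("A spy returns.", ["Action"], ["spy", "secretagent"], ["SamMendes"]))
# A spy returns. Action spy secretagent SamMendes
```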


## Model Building

In [48]:
from sklearn.feature_extraction.text import TfidfVectorizer
tf = TfidfVectorizer(max_features=10000,stop_words='english')
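For intuition about what `TfidfVectorizer` produces, here is a minimal pure-Python sketch of its default weighting, assuming scikit-learn's defaults: raw term counts, smoothed idf `ln((1 + n) / (1 + df)) + 1` (`smooth_idf=True`), and L2-normalised rows (`norm='l2'`). It deliberately omits `stop_words` and `max_features` (which keeps the most frequent terms across the corpus), so treat it as an illustration, not a drop-in replacement. The function name `tfidf` is hypothetical.

```python
import math
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tfidf(docs):
    """Return (vocabulary, rows) with smoothed-idf, L2-normalised TF-IDF weights."""
    tokenized = [TOKEN_PATTERN.findall(d.lower()) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    # document frequency: in how many documents each term appears
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


vocab, rows = tfidf(["batman gotham", "batman joker"])
print(vocab)  # ['batman', 'gotham', 'joker']
```

A term shared by every document ("batman") gets the lowest idf, so it ends up with a smaller weight than the rarer term in the same row.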

In [49]:
vector = tf.fit_transform(data['tags']).toarray()
vector.shape

(4800, 10000)
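`.toarray()` converts the sparse TF-IDF matrix into a dense one. A quick back-of-the-envelope check of what that costs, assuming the vectorizer's default float64 dtype (8 bytes per entry); the helper `dense_bytes` is hypothetical:

```python
def dense_bytes(n_rows, n_cols, itemsize=8):
    # every cell of a dense float64 matrix takes 8 bytes, zero or not
    return n_rows * n_cols * itemsize


print(dense_bytes(4800, 10000) / 1e6, "MB")  # 384.0 MB
```

Most of those entries are zero. scikit-learn's brute-force `NearestNeighbors` also accepts scipy sparse input, so keeping the matrix sparse may be worth trying if memory becomes a concern.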

In [50]:
# model training
from sklearn.neighbors import NearestNeighbors

model_knn=NearestNeighbors(metric='cosine',algorithm='brute')
model_knn.fit(vector)

NearestNeighbors(algorithm='brute', metric='cosine')
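With `metric='cosine'`, neighbours are ranked by cosine distance, 1 minus the cosine similarity of two vectors, and `algorithm='brute'` computes that distance against every row. A minimal NumPy sketch of the same idea (the name `cosine_knn` is hypothetical, and all rows are assumed to be non-zero so the norms are defined):

```python
import numpy as np


def cosine_knn(X, query, k):
    """Brute-force k nearest neighbours under cosine distance (1 - cosine similarity)."""
    X = np.asarray(X, dtype=float)
    q = np.asarray(query, dtype=float).ravel()
    # cosine similarity of the query with every row
    sims = X @ q / (np.linalg.norm(X, axis=1) * np.linalg.norm(q))
    dist = 1.0 - sims
    # indices of the k smallest distances, ties kept in row order
    order = np.argsort(dist, kind="stable")[:k]
    return dist[order], order


distances, indices = cosine_knn([[1, 0], [0, 1], [1, 1]], [1, 0], k=2)
print(indices.tolist())  # [0, 2]
```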

In [51]:
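# print the 10 movies most similar to the given title
def recommend(movie):
    # positional row of the movie in `vector` (safe even if data's index has gaps)
    matches = np.flatnonzero(data['title'].values == movie)
    if len(matches) == 0:
        print("{0} not found in the dataset".format(movie))
        return
    index = matches[0]
    # ask for 11 neighbours: the closest one is the movie itself (distance 0)
    distances, indices = model_knn.kneighbors(vector[index, :].reshape(1, -1), n_neighbors=11)
    print("Movie Recommendations for {0} is :\n".format(movie))
    for i in indices[0][1:]:
        print(data.iloc[i].title)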

In [54]:
recommend('The Dark Knight Rises')

Movie Recommendations for The Dark Knight Rises is :

The Dark Knight
Batman Begins
Batman Returns
Batman Forever
Batman
Batman: The Dark Knight Returns, Part 2
Batman v Superman: Dawn of Justice
Batman & Robin
Slow Burn
Defendor
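Printing works in a notebook, but a function that returns the titles is easier to reuse and test. The sketch below is a hypothetical alternative, `recommend_titles`, working from a dense feature matrix with NumPy. Instead of assuming the query movie always comes back first (ties or duplicate rows could break that), it drops the query's own row explicitly. Rows are assumed to be non-zero.

```python
import numpy as np


def recommend_titles(titles, X, movie, k=10):
    """Return up to k titles most similar to `movie` under cosine similarity."""
    titles = list(titles)
    if movie not in titles:
        raise KeyError(f"{movie!r} not in catalogue")
    i = titles.index(movie)
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1)
    sims = X @ X[i] / (norms * norms[i])
    # rank by descending similarity and skip the query row itself
    ranked = [j for j in np.argsort(-sims, kind="stable") if j != i]
    return [titles[j] for j in ranked[:k]]


print(recommend_titles(["A", "B", "C"], [[1, 0], [0.9, 0.1], [0, 1]], "A", k=1))  # ['B']
```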


In [44]:
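import pickle
from sklearn.metrics.pairwise import cosine_similarity

In [46]:
# `similarity` is not defined elsewhere in this notebook, so build it from the TF-IDF vectors
similarity = cosine_similarity(vector)
pickle.dump(data, open('movie_list.pkl', 'wb'))
pickle.dump(similarity, open('similarity.pkl', 'wb'))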
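The pickled files are meant to be reloaded elsewhere, for example by a front-end app. A round-trip sketch, assuming `similarity.pkl` holds a square movie-by-movie similarity matrix; to stay self-contained it stores a plain list of titles instead of the full `data` DataFrame, and the helper names `save_artifacts` and `load_and_recommend` are hypothetical. Because unpickling can execute arbitrary code, only load pickle files you created yourself.

```python
import os
import pickle
import tempfile

import numpy as np


def save_artifacts(folder, titles, similarity):
    with open(os.path.join(folder, "movie_list.pkl"), "wb") as f:
        pickle.dump(list(titles), f)
    with open(os.path.join(folder, "similarity.pkl"), "wb") as f:
        pickle.dump(np.asarray(similarity, dtype=float), f)


def load_and_recommend(folder, movie, k=5):
    with open(os.path.join(folder, "movie_list.pkl"), "rb") as f:
        titles = pickle.load(f)
    with open(os.path.join(folder, "similarity.pkl"), "rb") as f:
        similarity = pickle.load(f)
    i = titles.index(movie)
    # highest similarity first, skipping the movie itself
    ranked = [j for j in np.argsort(-similarity[i], kind="stable") if j != i]
    return [titles[j] for j in ranked[:k]]


with tempfile.TemporaryDirectory() as folder:
    save_artifacts(folder, ["A", "B", "C"], [[1, 0.9, 0.1], [0.9, 1, 0.2], [0.1, 0.2, 1]])
    print(load_and_recommend(folder, "A", k=2))  # ['B', 'C']
```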