# **Movie Recommender system end to end deployment**

##  Project Summary 

In this project, we use the TMDB 5000 Movie Dataset to build a movie recommender system. We load the data, preprocess it (lowercasing and stemming), and vectorize the combined text features into numerical representations. The recommender is content-based: for a given movie, it suggests the movies whose tags (overview, genres, keywords, cast, and director) are most similar, as measured by cosine similarity.






## GitHub Link

https://github.com/nitish6121999/Movie-Recommender-system-end-to-end-deployment-project

## Import libraries

In [156]:
import numpy as np
import pandas as pd

import warnings
warnings.filterwarnings('ignore')

## Dataset loading

In [2]:
movies=pd.read_csv("tmdb_5000_movies.csv")
credits=pd.read_csv('tmdb_5000_credits.csv')

In [3]:
movies.head(1)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800


In [4]:
credits.head(1)

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


## Merging the datasets

In [5]:
movies=movies.merge(credits,on='title')
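
Merging on `title` assumes that titles are unique. As a minimal sketch with made-up rows (the frames `left` and `right` are hypothetical, not the TMDB files), the example below shows how a repeated title multiplies rows, which could account for extra rows after the merge. Joining on the id columns is one possible alternative.

```python
import pandas as pd

# Hypothetical toy data: two different movies share the title "Batman".
left = pd.DataFrame({"id": [1, 2, 3], "title": ["Batman", "Batman", "Avatar"]})
right = pd.DataFrame({"movie_id": [1, 2, 3], "title": ["Batman", "Batman", "Avatar"]})

# Joining on a non-unique key pairs every "Batman" on the left
# with every "Batman" on the right.
by_title = left.merge(right, on="title")

# Joining on the unique id columns keeps one row per movie.
by_id = left.merge(right, left_on="id", right_on="movie_id")

print(len(by_title), len(by_id))
```

If the ids are expected to be unique, passing `validate="one_to_one"` to `merge` makes pandas raise an error instead of silently duplicating rows.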

## Know your data

In [10]:
movies.shape

(4809, 23)

In [11]:
movies.describe()

Unnamed: 0,budget,id,popularity,revenue,runtime,vote_average,vote_count,movie_id
count,4809.0,4809.0,4809.0,4809.0,4807.0,4809.0,4809.0,4809.0
mean,29027800.0,57120.571429,21.491664,82275110.0,106.882255,6.092514,690.33167,57120.571429
std,40704730.0,88653.369849,31.803366,162837900.0,22.602535,1.193989,1234.187111,88653.369849
min,0.0,5.0,0.0,0.0,0.0,0.0,0.0,5.0
25%,780000.0,9012.0,4.66723,0.0,94.0,5.6,54.0,9012.0
50%,15000000.0,14624.0,12.921594,19170000.0,103.0,6.2,235.0,14624.0
75%,40000000.0,58595.0,28.350529,92913170.0,118.0,6.8,737.0,58595.0
max,380000000.0,459488.0,875.581305,2787965000.0,338.0,10.0,13752.0,459488.0


In [12]:
movies.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4809 entries, 0 to 4808
Data columns (total 23 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4809 non-null   int64  
 1   genres                4809 non-null   object 
 2   homepage              1713 non-null   object 
 3   id                    4809 non-null   int64  
 4   keywords              4809 non-null   object 
 5   original_language     4809 non-null   object 
 6   original_title        4809 non-null   object 
 7   overview              4806 non-null   object 
 8   popularity            4809 non-null   float64
 9   production_companies  4809 non-null   object 
 10  production_countries  4809 non-null   object 
 11  release_date          4808 non-null   object 
 12  revenue               4809 non-null   int64  
 13  runtime               4807 non-null   float64
 14  spoken_languages      4809 non-null   object 
 15  status               

In [14]:
movies.isnull().sum()

budget                     0
genres                     0
homepage                3096
id                         0
keywords                   0
original_language          0
original_title             0
overview                   3
popularity                 0
production_companies       0
production_countries       0
release_date               1
revenue                    0
runtime                    2
spoken_languages           0
status                     0
tagline                  844
title                      0
vote_average               0
vote_count                 0
movie_id                   0
cast                       0
crew                       0
dtype: int64

## Feature Engineering & Data Pre-processing

### We will consider only those columns which will be useful for us.

In [18]:
movies=movies[['movie_id','title','overview','genres','keywords','cast','crew']]

In [19]:
movies.head(2)
           

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."


In [20]:
movies.isnull().sum()

movie_id    0
title       0
overview    3
genres      0
keywords    0
cast        0
crew        0
dtype: int64

In [21]:
movies.dropna(inplace=True)

In [22]:
movies.duplicated().sum()

0

In [66]:
# ast.literal_eval parses the string representation of a list into an actual Python list
import ast 

In [24]:
movies.iloc[0].genres


'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'

In [39]:
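def convert(obj):
    # Parse the stringified list of dicts and keep only each entry's 'name'
    # (the parameter is named obj to avoid shadowing the built-in `object`)
    L= []
    for i in ast.literal_eval(obj):
        L.append(i['name'])
    return L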
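
To see what `convert` does on a single value, the sketch below parses a shortened copy of the genres string printed above. These strings also happen to be valid JSON, so `json.loads` (used here only as a comparison, not something the notebook relies on) gives the same structure.

```python
import ast
import json

# Shortened copy of the genres string printed for the first movie.
raw = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

# literal_eval turns the string into a real Python list of dicts.
parsed = ast.literal_eval(raw)
names = [entry["name"] for entry in parsed]

# The same string is also valid JSON, so json.loads yields an identical structure.
assert parsed == json.loads(raw)
print(names)
```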

In [40]:
movies['genres']=movies['genres'].apply(convert)

In [41]:
movies.head(1)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


In [42]:
movies['keywords']=movies['keywords'].apply(convert)

In [44]:
def count3(obj):
    # Keep only the names of the first three cast members
    L= []
    for i in ast.literal_eval(obj)[:3]:
        L.append(i['name'])
    return L

In [47]:
movies['cast']=movies['cast'].apply(count3)

In [51]:
# now for the crew we must fetch the directors

In [52]:
movies['crew'][0]

'[{"credit_id": "52fe48009251416c750aca23", "department": "Editing", "gender": 0, "id": 1721, "job": "Editor", "name": "Stephen E. Rivkin"}, {"credit_id": "539c47ecc3a36810e3001f87", "department": "Art", "gender": 2, "id": 496, "job": "Production Design", "name": "Rick Carter"}, {"credit_id": "54491c89c3a3680fb4001cf7", "department": "Sound", "gender": 0, "id": 900, "job": "Sound Designer", "name": "Christopher Boyes"}, {"credit_id": "54491cb70e0a267480001bd0", "department": "Sound", "gender": 0, "id": 900, "job": "Supervising Sound Editor", "name": "Christopher Boyes"}, {"credit_id": "539c4a4cc3a36810c9002101", "department": "Production", "gender": 1, "id": 1262, "job": "Casting", "name": "Mali Finn"}, {"credit_id": "5544ee3b925141499f0008fc", "department": "Sound", "gender": 2, "id": 1729, "job": "Original Music Composer", "name": "James Horner"}, {"credit_id": "52fe48009251416c750ac9c3", "department": "Directing", "gender": 2, "id": 2710, "job": "Director", "name": "James Cameron"},

In [54]:
def fetch_dire(obj):
    l=[]
    for i in ast.literal_eval(obj):
        if i['job']=='Director':
            l.append(i["name"])
            break
    return l
            

In [56]:
movies['crew']=movies['crew'].apply(fetch_dire)

In [60]:
movies.head(1)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Sam Worthington, Zoe Saldana, Sigourney Weaver]",[James Cameron]


In [62]:
movies['overview']=movies['overview'].apply(lambda x:x.split())

In [63]:
movies.head(1)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Sam Worthington, Zoe Saldana, Sigourney Weaver]",[James Cameron]


In [69]:
movies['genres']=movies['genres'].apply(lambda x:[i.replace(' ','')for i in x])
movies['keywords']=movies['keywords'].apply(lambda x:[i.replace(' ','')for i in x])
movies['cast']=movies['cast'].apply(lambda x:[i.replace(' ','')for i in x])
movies['crew']=movies['crew'].apply(lambda x:[i.replace(' ','')for i in x])

In [70]:
movies.tail(1)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
4808,25975,My Date with Drew,"[Ever, since, the, second, grade, when, he, fi...",[Documentary],"[obsession, camcorder, crush, dreamgirl]","[DrewBarrymore, BrianHerzlinger, CoreyFeldman]",[BrianHerzlinger]


In [71]:
movies['tags']=movies['overview']+movies['genres']+movies['keywords']+movies['cast']+movies['crew']

In [72]:
df=movies[['movie_id','title','tags']].copy()   # copy so later column assignments do not act on a view of movies

In [73]:
df.head(1)

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin..."


In [75]:
df['tags']=df['tags'].apply(lambda x:' '.join(x))

In [76]:
df['tags'][0]

'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. Action Adventure Fantasy ScienceFiction cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d SamWorthington ZoeSaldana SigourneyWeaver JamesCameron'

In [77]:
df.head(2)

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha..."


## Textual preprocessing steps

1. Lowercasing
2. Removing punctuation
3. Removing URLs and extra whitespace
4. Removing stop words
5. Tokenization
6. Text normalization (stemming, lemmatization)
7. Text vectorization (TF-IDF, bag-of-words, word2vec)
8. Dimensionality reduction

The cells below apply lowercasing, stemming, and (through `CountVectorizer`) tokenization, stop-word removal, and bag-of-words vectorization.
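
Steps 1 to 3 of the list above can be sketched with the standard library alone. The helper `clean_text` and the sample string are hypothetical illustrations, not part of the notebook's pipeline.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_text(text):
    """Lowercase, drop URLs and punctuation, collapse whitespace (hypothetical helper)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs before punctuation is stripped
    text = text.translate(_PUNCT_TABLE)          # remove punctuation
    return " ".join(text.split())                # collapse runs of whitespace

sample = "In the 22nd century, a paraplegic Marine...  See http://www.avatarmovie.com/"
cleaned = clean_text(sample)
print(cleaned)
```

Running something like this before the stemming step means that tokens such as `century,` reach the stemmer without trailing punctuation.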



In [78]:
# Lowercase the tags column

df['tags']=df['tags'].apply(lambda x:x.lower())

In [104]:
import nltk

In [105]:
from nltk.stem.porter import PorterStemmer
ps=PorterStemmer()

In [106]:
def stem(text):
    y=[]
    
    for i in text.split():
        y.append(ps.stem(i))
        
    return ' '.join(y)

In [107]:
df['tags']=df['tags'].apply(stem)

In [108]:
df['tags'][0]

'in the 22nd century, a parapleg marin is dispatch to the moon pandora on a uniqu mission, but becom torn between follow order and protect an alien civilization. action adventur fantasi sciencefict cultureclash futur spacewar spacecoloni societi spacetravel futurist romanc space alien tribe alienplanet cgi marin soldier battl loveaffair antiwar powerrel mindandsoul 3d samworthington zoesaldana sigourneyweav jamescameron'

In [109]:
# Remove English stop words with CountVectorizer and keep the 5000 most frequent terms

from sklearn.feature_extraction.text import CountVectorizer
cv= CountVectorizer(max_features=5000,stop_words='english')



In [110]:
vectors=cv.fit_transform(df['tags']).toarray()

In [111]:
# Call the method (with parentheses) to list the 5000 vocabulary terms
cv.get_feature_names_out()

In [113]:
from sklearn.metrics.pairwise import cosine_similarity

In [121]:
similarity=cosine_similarity(vectors)
similarity

array([[1.        , 0.08458258, 0.08718573, ..., 0.04559608, 0.        ,
        0.        ],
       [0.08458258, 1.        , 0.06063391, ..., 0.02378257, 0.        ,
        0.02615329],
       [0.08718573, 0.06063391, 1.        , ..., 0.02451452, 0.        ,
        0.        ],
       ...,
       [0.04559608, 0.02378257, 0.02451452, ..., 1.        , 0.03962144,
        0.04229549],
       [0.        , 0.        , 0.        , ..., 0.03962144, 1.        ,
        0.08714204],
       [0.        , 0.02615329, 0.        , ..., 0.04229549, 0.08714204,
        1.        ]])
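
Each entry of `similarity` is the cosine of the angle between two rows of `vectors`: their dot product divided by the product of their lengths. A small pure-Python sketch on two made-up tag strings (not rows from the dataset) shows the computation for bag-of-words counts; the function name `cosine` is an illustrative choice.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # in this sketch an empty document is dissimilar to everything
    return dot / (norm_a * norm_b)

# Made-up tag strings, for illustration only.
doc1 = Counter("action space alien marin".split())
doc2 = Counter("action space romanc".split())

score = cosine(doc1, doc2)
print(round(score, 4))
```

The two documents share two of their words, so the score falls between 0 (no shared words) and 1 (identical word counts), which is why the diagonal of `similarity` is all ones.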

In [132]:
similarity[1]

array([0.08458258, 1.        , 0.06063391, ..., 0.02378257, 0.        ,
       0.02615329])

In [133]:
similarity[2]

array([0.08718573, 0.06063391, 1.        , ..., 0.02451452, 0.        ,
       0.        ])

In [134]:
similarity.shape   # reuse the matrix computed above instead of recomputing it

(4806, 4806)

In [135]:
vectors.shape

(4806, 5000)

In [138]:
df.head(2)

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"in the 22nd century, a parapleg marin is dispa..."
1,285,Pirates of the Caribbean: At World's End,"captain barbossa, long believ to be dead, ha c..."


In [142]:
df[df['title']== 'Avatar'].index[0]

0

## Recommender System

In [152]:
def recommend(movie):
    # Use the row position rather than the index label: rows of `similarity` are positional,
    # and the earlier dropna() left gaps in the index labels
    movie_index=np.flatnonzero(df['title'].values==movie)[0]
    distances=similarity[movie_index]             # similarity scores between this movie and every other movie
    # enumerate keeps each score paired with its row position; sort in descending order,
    # skip the first hit (normally the movie itself) and keep the next 5
    movies_list=sorted(enumerate(distances),reverse=True,key=lambda x:x[1])[1:6]
    
    for i in movies_list:
        print(i[0])    # row position of the similar movie
        print(df.iloc[i[0]].title)

In [154]:
recommend('Batman Begins')

65
The Dark Knight
1363
Batman
1362
Batman
3
The Dark Knight Rises
3297
10th & Wolf
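
`recommend` raises an `IndexError` when the title is not in `df`, and it prints its results rather than returning them. One possible variant, sketched here on a made-up 4x4 similarity matrix (the names `titles`, `sim`, and `top_similar` are hypothetical), returns the top matches and handles unknown titles.

```python
# Made-up titles and similarity scores, for illustration only.
titles = ["A", "B", "C", "D"]
sim = [
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
]

def top_similar(title, k=2):
    """Return the k titles most similar to `title`, or [] if it is unknown."""
    if title not in titles:
        return []
    pos = titles.index(title)  # row position in the matrix
    # Pair each score with its position and drop the query itself.
    scored = [(j, s) for j, s in enumerate(sim[pos]) if j != pos]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [titles[j] for j, _ in scored[:k]]

print(top_similar("A"))
print(top_similar("Unknown"))
```

Returning a list makes the function easier to reuse from a web front end than printing, and excluding the query by position avoids relying on it always sorting first.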


In [155]:
df.iloc[1216].title

'Aliens vs Predator: Requiem'

## Thank you