## Movie Recommendation Model

We are going to build a movie recommendation model that suggests movie titles similar to a movie we have searched for.

We divide the project into the following parts (project flow):

1. Data
2. Data Preprocessing
3. Model Building
4. Convert the model into a website/product
5. Deploy the product

In [1]:
import pandas as pd
import numpy as np

We selected a dataset from Kaggle named **'TMDB 5000 movie dataset'**.

It contains two datasets:
1. tmdb_5000_movies
2. tmdb_5000_credits

* Load the data into dataframes.

In [2]:
movies=pd.read_csv("tmdb_5000_movies.csv")
movies.head(2)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500


In [3]:
credits=pd.read_csv("tmdb_5000_credits.csv")
credits.head(2)

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."


* Merge both datasets into a single dataframe for preprocessing, using the common column **'title'**.
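A minimal sketch of how a merge on 'title' behaves, using toy dataframes. The names movies_toy and credits_toy and all their values are hypothetical. When two different movies share a title, a merge on the title pairs every matching row, so the merged dataframe can end up with more rows than either input. Merging on the id columns is one alternative that avoids this.

```python
import pandas as pd

# Toy stand-ins for the two files; the values are invented for illustration.
movies_toy = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Alpha", "Beta", "Beta"],   # two different movies share a title
})
credits_toy = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Alpha", "Beta", "Beta"],
})

# Merge on the shared title column, as in the notebook
by_title = movies_toy.merge(credits_toy, on="title")
# Alternative: merge on the id columns, which are unique per movie here
by_id = movies_toy.merge(credits_toy, left_on="id", right_on="movie_id")

print(len(by_title))  # 5: each "Beta" row pairs with both "Beta" credits rows
print(len(by_id))     # 3: one row per movie
```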

In [4]:
movies=movies.merge(credits,on='title')
movies.info() #check the info of the dataframe to know the no. of rows & columns.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4809 entries, 0 to 4808
Data columns (total 23 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4809 non-null   int64  
 1   genres                4809 non-null   object 
 2   homepage              1713 non-null   object 
 3   id                    4809 non-null   int64  
 4   keywords              4809 non-null   object 
 5   original_language     4809 non-null   object 
 6   original_title        4809 non-null   object 
 7   overview              4806 non-null   object 
 8   popularity            4809 non-null   float64
 9   production_companies  4809 non-null   object 
 10  production_countries  4809 non-null   object 
 11  release_date          4808 non-null   object 
 12  revenue               4809 non-null   int64  
 13  runtime               4807 non-null   float64
 14  spoken_languages      4809 non-null   object 
 15  status               

* Remove unwanted columns.

We are going to remove the columns that will not help in recommending a movie. We keep the following variables:
1. genres
2. keywords
3. overview
4. title
5. movie_id
6. cast
7. crew

In [5]:
movies=movies.drop(['budget','homepage','id','original_language','original_title','popularity','production_companies','production_countries','release_date','revenue','runtime','spoken_languages','status','tagline','vote_average','vote_count'],axis=1)

In [6]:
movies.head(2)

Unnamed: 0,genres,keywords,overview,title,movie_id,cast,crew
0,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","In the 22nd century, a paraplegic Marine is di...",Avatar,19995,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","Captain Barbossa, long believed to be dead, ha...",Pirates of the Caribbean: At World's End,285,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."


In [7]:
movies.isnull().sum() #to check whether our data contains any NaN values

genres      0
keywords    0
overview    3
title       0
movie_id    0
cast        0
crew        0
dtype: int64

In [8]:
movies.dropna(inplace=True) #drop the rows with NaN values, since they are few compared to the size of the data

In [9]:
movies.isnull().sum()

genres      0
keywords    0
overview    0
title       0
movie_id    0
cast        0
crew        0
dtype: int64

In [10]:
movies.duplicated().sum() #to check whether data has any duplicate values

0

* First, consider the variable 'genres'. We have to extract the genre names from the given list of dictionaries (i.e. [Action, Adventure, Fantasy, Science Fiction]).

In [11]:
movies['genres'][0]

'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'

* Used a helper function to get the list of genre names. <br>
"import ast <br>
 ast.literal_eval(obj)" <br>
This function parses a string containing a Python literal and returns the corresponding Python object. <br>
In our dataset the list is stored as a string, so we have to convert it into an actual list object before we can read the names.
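As a small illustration of the conversion described above, the sketch below parses a shortened genres string and pulls out the names. The variable names raw, parsed and names are only for this example.

```python
import ast

# A shortened version of the string stored in the 'genres' column
raw = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

print(type(raw).__name__)        # str
parsed = ast.literal_eval(raw)   # now a real list of dictionaries
print(type(parsed).__name__)     # list

names = [d["name"] for d in parsed]
print(names)                     # ['Action', 'Adventure']
```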

In [12]:
import ast
def convert(obj):
    l=[]
    for i in ast.literal_eval(obj):
        l.append(i['name'])
    return l

In [13]:
movies['genres']=movies['genres'].apply(convert) #apply the helper function to the whole genres column to get a list of names in every row
movies['genres'][0]

['Action', 'Adventure', 'Fantasy', 'Science Fiction']

In [14]:
movies.head(2)

Unnamed: 0,genres,keywords,overview,title,movie_id,cast,crew
0,"[Action, Adventure, Fantasy, Science Fiction]","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","In the 22nd century, a paraplegic Marine is di...",Avatar,19995,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,"[Adventure, Fantasy, Action]","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","Captain Barbossa, long believed to be dead, ha...",Pirates of the Caribbean: At World's End,285,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."


In [15]:
movies['keywords'][0]

'[{"id": 1463, "name": "culture clash"}, {"id": 2964, "name": "future"}, {"id": 3386, "name": "space war"}, {"id": 3388, "name": "space colony"}, {"id": 3679, "name": "society"}, {"id": 3801, "name": "space travel"}, {"id": 9685, "name": "futuristic"}, {"id": 9840, "name": "romance"}, {"id": 9882, "name": "space"}, {"id": 9951, "name": "alien"}, {"id": 10148, "name": "tribe"}, {"id": 10158, "name": "alien planet"}, {"id": 10987, "name": "cgi"}, {"id": 11399, "name": "marine"}, {"id": 13065, "name": "soldier"}, {"id": 14643, "name": "battle"}, {"id": 14720, "name": "love affair"}, {"id": 165431, "name": "anti war"}, {"id": 193554, "name": "power relations"}, {"id": 206690, "name": "mind and soul"}, {"id": 209714, "name": "3d"}]'

In [16]:
movies['keywords']=movies['keywords'].apply(convert) #the same helper function is applied to the keywords column to get the list of keyword names.

In [17]:
movies.head(2)

Unnamed: 0,genres,keywords,overview,title,movie_id,cast,crew
0,"[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","In the 22nd century, a paraplegic Marine is di...",Avatar,19995,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,"[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","Captain Barbossa, long believed to be dead, ha...",Pirates of the Caribbean: At World's End,285,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."


In [18]:
movies['cast'][0]

'[{"cast_id": 242, "character": "Jake Sully", "credit_id": "5602a8a7c3a3685532001c9a", "gender": 2, "id": 65731, "name": "Sam Worthington", "order": 0}, {"cast_id": 3, "character": "Neytiri", "credit_id": "52fe48009251416c750ac9cb", "gender": 1, "id": 8691, "name": "Zoe Saldana", "order": 1}, {"cast_id": 25, "character": "Dr. Grace Augustine", "credit_id": "52fe48009251416c750aca39", "gender": 1, "id": 10205, "name": "Sigourney Weaver", "order": 2}, {"cast_id": 4, "character": "Col. Quaritch", "credit_id": "52fe48009251416c750ac9cf", "gender": 2, "id": 32747, "name": "Stephen Lang", "order": 3}, {"cast_id": 5, "character": "Trudy Chacon", "credit_id": "52fe48009251416c750ac9d3", "gender": 1, "id": 17647, "name": "Michelle Rodriguez", "order": 4}, {"cast_id": 8, "character": "Selfridge", "credit_id": "52fe48009251416c750ac9e1", "gender": 2, "id": 1771, "name": "Giovanni Ribisi", "order": 5}, {"cast_id": 7, "character": "Norm Spellman", "credit_id": "52fe48009251416c750ac9dd", "gender": 

* The variable 'cast' contains the whole cast. <br>
We only need the first 3 actors, because we assume that the first 3 actors listed are the main actors and will be the most helpful for recommending a movie.<br>
We used another helper function to get the list of the first three names.

In [19]:
def convert2(obj):
    l=[]
    counter=0
    for i in ast.literal_eval(obj):
        if counter != 3:
            l.append(i['name'])
            counter+=1
    return l
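The counter in convert2 keeps only the first three entries. A possible equivalent, sketched here with list slicing, is shown below. The function name first_three_names and the toy cast string are hypothetical.

```python
import ast

def first_three_names(obj):
    """Return the 'name' of at most the first three entries of a cast string."""
    return [entry["name"] for entry in ast.literal_eval(obj)[:3]]

# Toy cast string with four entries; only the first three should be kept
cast_raw = ('[{"name": "A", "order": 0}, {"name": "B", "order": 1}, '
            '{"name": "C", "order": 2}, {"name": "D", "order": 3}]')

print(first_three_names(cast_raw))            # ['A', 'B', 'C']
print(first_three_names('[{"name": "A"}]'))   # ['A'] (shorter lists are fine)
```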

In [20]:
movies['cast']=movies['cast'].apply(convert2) #applied the function on whole variable.

In [21]:
movies.head(2) #the cast column now holds a list of the first three actor names.

Unnamed: 0,genres,keywords,overview,title,movie_id,cast,crew
0,"[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","In the 22nd century, a paraplegic Marine is di...",Avatar,19995,"[Sam Worthington, Zoe Saldana, Sigourney Weaver]","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,"[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","Captain Barbossa, long believed to be dead, ha...",Pirates of the Caribbean: At World's End,285,"[Johnny Depp, Orlando Bloom, Keira Knightley]","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."


* The variable 'crew' contains data about all the crew members who worked on the movie, most of which is not useful for us.<br>
The only useful data in crew is the name of the director.<br>
We used another helper function to fetch the director's name from the list of dictionaries.

In [22]:
def convert3(obj):
    l=[]
    for i in ast.literal_eval(obj):
        if i['job']=='Director':
            l.append(i['name'])
    return l
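To see what this filter returns, here is a small sketch on toy crew strings (all names are invented). Note that the result is a list, so it is empty when no entry has the job 'Director'.

```python
import ast

crew_raw = ('[{"job": "Editor", "name": "E. Example"}, '
            '{"job": "Director", "name": "D. Example"}, '
            '{"job": "Producer", "name": "P. Example"}]')
no_director_raw = '[{"job": "Editor", "name": "E. Example"}]'

def director_names(obj):
    """Keep only the names of crew entries whose job is 'Director'."""
    return [m["name"] for m in ast.literal_eval(obj) if m["job"] == "Director"]

print(director_names(crew_raw))         # ['D. Example']
print(director_names(no_director_raw))  # []
```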
    

In [23]:
movies['crew']=movies['crew'].apply(convert3) #applied the function on whole crew variable.

In [24]:
movies.head(2) #we extracted the data from the raw data as we wanted.

Unnamed: 0,genres,keywords,overview,title,movie_id,cast,crew
0,"[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","In the 22nd century, a paraplegic Marine is di...",Avatar,19995,"[Sam Worthington, Zoe Saldana, Sigourney Weaver]",[James Cameron]
1,"[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","Captain Barbossa, long believed to be dead, ha...",Pirates of the Caribbean: At World's End,285,"[Johnny Depp, Orlando Bloom, Keira Knightley]",[Gore Verbinski]


We are going to create only three main variables for our data.<br>
1. title
2. movie_id
3. tags - the concatenation of the 'genres', 'keywords', 'overview', 'cast' and 'crew' columns

* The variable 'overview' is not in list format. To concatenate the columns, we need all of them in the same format. <br>
Converted the overview column into a list of words by using the split method.
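A short sketch of why the split is needed, on made-up values for a single movie: Python can add a list to a list, but not a list to a string, so the overview has to become a list of words first.

```python
genres = ["Action", "Adventure"]
overview = "A paraplegic Marine is dispatched to the moon Pandora"
cast = ["Sam Worthington"]

# genres + overview would raise a TypeError (list + str), so split first
overview_words = overview.split()

# With every piece a list, + simply concatenates them
tags = genres + overview_words + cast
print(tags)
```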

In [25]:
movies['overview']=movies['overview'].apply(lambda x:x.split())

In [26]:
#created a new column named tags and combined the 5 columns as below -
movies['tags']=movies['genres']+movies['keywords']+movies['overview']+movies['cast']+movies['crew'] 

In [27]:
movies.head(2)

Unnamed: 0,genres,keywords,overview,title,movie_id,cast,crew,tags
0,"[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[In, the, 22nd, century,, a, paraplegic, Marin...",Avatar,19995,"[Sam Worthington, Zoe Saldana, Sigourney Weaver]",[James Cameron],"[Action, Adventure, Fantasy, Science Fiction, ..."
1,"[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[Captain, Barbossa,, long, believed, to, be, d...",Pirates of the Caribbean: At World's End,285,"[Johnny Depp, Orlando Bloom, Keira Knightley]",[Gore Verbinski],"[Adventure, Fantasy, Action, ocean, drug abuse..."


* Created a new dataframe with only the required columns.

In [28]:
new_df=movies[['title','movie_id','tags']]

In [29]:
new_df.head()

Unnamed: 0,title,movie_id,tags
0,Avatar,19995,"[Action, Adventure, Fantasy, Science Fiction, ..."
1,Pirates of the Caribbean: At World's End,285,"[Adventure, Fantasy, Action, ocean, drug abuse..."
2,Spectre,206647,"[Action, Adventure, Crime, spy, based on novel..."
3,The Dark Knight Rises,49026,"[Action, Crime, Drama, Thriller, dc comics, cr..."
4,John Carter,49529,"[Action, Adventure, Science Fiction, based on ..."


We have to remove the spaces inside multi-word names.<br>
For example, take the actor name Sam Worthington. It would be treated as two separate words: first Sam and second Worthington.<br>
If another actor is called Sam XYZ, then the word Sam becomes common to both, and the model would treat movies of the two actors as similar.<br>
So we remove the space, and the name is then treated as a single word (SamWorthington).
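The effect can be sketched in plain Python. The helper names tokens and squash are hypothetical, and Sam XYZ is the made-up second actor from the example above.

```python
def tokens(tags):
    """Split a list of tags into the lowercase words a bag-of-words model would see."""
    return set(" ".join(tags).lower().split())

def squash(tags):
    """Remove the spaces inside each tag so that a full name becomes one word."""
    return [t.replace(" ", "") for t in tags]

movie_a = ["Sam Worthington"]
movie_b = ["Sam XYZ"]

# With spaces kept, the two movies share the word 'sam'
print(tokens(movie_a) & tokens(movie_b))                  # {'sam'}

# With spaces removed, each name is a single word and nothing overlaps
print(tokens(squash(movie_a)) & tokens(squash(movie_b)))  # set()
```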

In [30]:
new_df['tags']=new_df['tags'].apply(lambda x:[i.replace(" ","") for i in x]) #applied lambda function on tags column to remove the space.

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df['tags']=new_df['tags'].apply(lambda x:[i.replace(" ","") for i in x]) #applied lambda function on tags column to remove the space.
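The warning above appears because new_df was created by selecting columns from movies, so pandas cannot tell whether the assignment is meant for a copy or for the original. One common way to avoid it, sketched here on a toy dataframe (movies_toy and new_df_toy are hypothetical names), is to take an explicit copy when creating the new dataframe.

```python
import pandas as pd

movies_toy = pd.DataFrame({
    "title": ["A", "B"],
    "tags": [["x", "y"], ["z"]],
    "extra": [1, 2],
})

# .copy() makes new_df_toy an independent dataframe rather than a slice of movies_toy
new_df_toy = movies_toy[["title", "tags"]].copy()

# Assigning to a column of the copy leaves movies_toy untouched
new_df_toy["tags"] = new_df_toy["tags"].apply(lambda x: " ".join(x))

print(new_df_toy["tags"].tolist())   # ['x y', 'z']
print(movies_toy["tags"].tolist())   # [['x', 'y'], ['z']]
```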


In [31]:
new_df.head()

Unnamed: 0,title,movie_id,tags
0,Avatar,19995,"[Action, Adventure, Fantasy, ScienceFiction, c..."
1,Pirates of the Caribbean: At World's End,285,"[Adventure, Fantasy, Action, ocean, drugabuse,..."
2,Spectre,206647,"[Action, Adventure, Crime, spy, basedonnovel, ..."
3,The Dark Knight Rises,49026,"[Action, Crime, Drama, Thriller, dccomics, cri..."
4,John Carter,49529,"[Action, Adventure, ScienceFiction, basedonnov..."


In [32]:
new_df['tags']=new_df['tags'].apply(lambda x:" ".join(x)) #the tags column holds lists; join each list into a single string.


In [33]:
new_df['tags'][0]

'Action Adventure Fantasy ScienceFiction cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. SamWorthington ZoeSaldana SigourneyWeaver JamesCameron'

In [34]:
new_df.head()

Unnamed: 0,title,movie_id,tags
0,Avatar,19995,Action Adventure Fantasy ScienceFiction cultur...
1,Pirates of the Caribbean: At World's End,285,Adventure Fantasy Action ocean drugabuse exoti...
2,Spectre,206647,Action Adventure Crime spy basedonnovel secret...
3,The Dark Knight Rises,49026,Action Crime Drama Thriller dccomics crimefigh...
4,John Carter,49529,Action Adventure ScienceFiction basedonnovel m...


In [35]:
new_df['tags']=new_df['tags'].apply(lambda x:x.lower()) #converted the tags column to lowercase.


In [36]:
new_df.head(10)

Unnamed: 0,title,movie_id,tags
0,Avatar,19995,action adventure fantasy sciencefiction cultur...
1,Pirates of the Caribbean: At World's End,285,adventure fantasy action ocean drugabuse exoti...
2,Spectre,206647,action adventure crime spy basedonnovel secret...
3,The Dark Knight Rises,49026,action crime drama thriller dccomics crimefigh...
4,John Carter,49529,action adventure sciencefiction basedonnovel m...
5,Spider-Man 3,559,fantasy action adventure dualidentity amnesia ...
6,Tangled,38757,animation family hostage magic horse fairytale...
7,Avengers: Age of Ultron,99861,action adventure sciencefiction marvelcomic se...
8,Harry Potter and the Half-Blood Prince,767,adventure fantasy family witch magic broom sch...
9,Batman v Superman: Dawn of Justice,209112,action adventure fantasy dccomics vigilante su...


In [37]:
from nltk.stem.porter import PorterStemmer
ps=PorterStemmer()

In [38]:
def stem(text):
    l=[]
    for i in text.split():
        l.append(ps.stem(i))
        
    return " ".join(l)
        

In [39]:
new_df['tags']=new_df['tags'].apply(stem)


In [40]:
new_df.head(2)

Unnamed: 0,title,movie_id,tags
0,Avatar,19995,action adventur fantasi sciencefict culturecla...
1,Pirates of the Caribbean: At World's End,285,adventur fantasi action ocean drugabus exotici...


In [41]:
from sklearn.feature_extraction.text import CountVectorizer
cv=CountVectorizer(max_features=5000,stop_words='english')

In [42]:
vectors=cv.fit_transform(new_df['tags']).toarray()

In [43]:
vectors

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int64)

In [44]:
vectors.shape

(4806, 5000)

In [45]:
cv.get_feature_names() #removed in scikit-learn 1.2; on newer versions use cv.get_feature_names_out()



['000',
 '007',
 '10',
 '100',
 '11',
 '12',
 '13',
 '14',
 '15',
 '16',
 '17',
 '17th',
 '18',
 '18th',
 '18thcenturi',
 '19',
 '1910',
 '1920',
 '1930',
 '1940',
 '1944',
 '1950',
 '1950s',
 '1960',
 '1960s',
 '1970',
 '1970s',
 '1971',
 '1974',
 '1976',
 '1980',
 '1985',
 '1990',
 '1999',
 '19th',
 '19thcenturi',
 '20',
 '200',
 '2003',
 '2009',
 '20th',
 '21st',
 '23',
 '24',
 '25',
 '30',
 '300',
 '3d',
 '40',
 '50',
 '500',
 '60',
 '70',
 '80',
 'aaron',
 'aaroneckhart',
 'abandon',
 'abduct',
 'abigailbreslin',
 'abil',
 'abl',
 'aboard',
 'abov',
 'abus',
 'academ',
 'academi',
 'accept',
 'access',
 'accid',
 'accident',
 'acclaim',
 'accompani',
 'accomplish',
 'account',
 'accus',
 'ace',
 'achiev',
 'acquaint',
 'act',
 'action',
 'actionhero',
 'activ',
 'activist',
 'activities',
 'actor',
 'actress',
 'actual',
 'ad',
 'adam',
 'adamsandl',
 'adamshankman',
 'adapt',
 'add',
 'addict',
 'adjust',
 'admir',
 'admit',
 'adolesc',
 'adopt',
 'ador',
 'adrienbrodi',
 'adult'

In [46]:
from sklearn.metrics.pairwise import cosine_similarity
similarity=cosine_similarity(vectors)

In [47]:
similarity.shape

(4806, 4806)

In [48]:
similarity[0]

array([1.        , 0.08346223, 0.0860309 , ..., 0.04499213, 0.        ,
       0.        ])

In [49]:
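def recommend(movie):
    #the parameter is named movie so that it does not shadow the movies dataframe
    #use the row position, not the index label: rows were dropped earlier, so labels can differ from the rows of similarity
    movie_index=new_df.index.get_loc(new_df[new_df['title']==movie].index[0])
    distance=similarity[movie_index]
    #sort by similarity (highest first) and skip the first entry, which is the movie itself
    movie_list=sorted(list(enumerate(distance)),reverse=True,key=lambda x:x[1])[1:6]
    
    for i in movie_list:
        print(i[0])
        print(new_df.iloc[i[0]].title) 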

In [57]:
recommend('John Carter')

1322
Riddick
3093
Krrish
3377
The Other Side of Heaven
610
The Legend of Hercules
1257
Get Carter


## Recommend Function in Detail

In [51]:
movie_index=new_df[new_df['title']=='Bang'].index[0]
movie_index

4801

In [52]:
distance=similarity[movie_index]
distance

array([0.01925214, 0.04070628, 0.04195907, ..., 0.        , 0.15258621,
       0.12668775])

In [53]:
list(enumerate(similarity[0]))

[(0, 1.0000000000000002),
 (1, 0.08346223261119858),
 (2, 0.08603090020146065),
 (3, 0.0734718358370645),
 (4, 0.1892994097121204),
 (5, 0.10838874619051501),
 (6, 0.04024218182927669),
 (7, 0.14673479641335554),
 (8, 0.05923488777590923),
 (9, 0.0967301666813349),
 (10, 0.10259783520851541),
 (11, 0.09464970485606021),
 (12, 0.09037128496931669),
 (13, 0.04499212706658476),
 (14, 0.12824729401064427),
 (15, 0.06282808624375433),
 (16, 0.07894736842105264),
 (17, 0.13977653617040256),
 (18, 0.09493290614465533),
 (19, 0.0830812984794528),
 (20, 0.058038100008800934),
 (21, 0.10968169942141635),
 (22, 0.0662266178532522),
 (23, 0.08740748201220976),
 (24, 0.0533380747062665),
 (25, 0.05101627678885769),
 (26, 0.15389675281277312),
 (27, 0.18693292157876878),
 (28, 0.116543309349613),
 (29, 0.065033247714309),
 (30, 0.06684847767323797),
 (31, 0.15907119074394446),
 (32, 0.08520286456846099),
 (33, 0.09733285267845754),
 (34, 0.0),
 (35, 0.09933992677987831),
 (36, 0.17119059581558146),


In [54]:
movie_list=sorted(list(enumerate(distance)),reverse=True,key=lambda x:x[1])[1:6]
movie_list

[(4529, 0.4132616485751057),
 (3859, 0.39835341847333205),
 (1992, 0.3976480878143553),
 (4478, 0.38566468738110243),
 (3676, 0.3800632629601931)]

In [55]:
new_df.head()

Unnamed: 0,title,movie_id,tags
0,Avatar,19995,action adventur fantasi sciencefict culturecla...
1,Pirates of the Caribbean: At World's End,285,adventur fantasi action ocean drugabus exotici...
2,Spectre,206647,action adventur crime spi basedonnovel secreta...
3,The Dark Knight Rises,49026,action crime drama thriller dccomic crimefight...
4,John Carter,49529,action adventur sciencefict basedonnovel mar m...


In [56]:
for i in movie_list:
    print(i[0])
    print(new_df.iloc[i[0]].title)

4529
The R.M.
3859
2:13
1992
Faster
4478
Interview with the Assassin
3676
Should've Been Romeo
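The steps above can also be sketched end to end on a toy corpus in plain Python, without scikit-learn. This is only an illustration of the idea (word counts, cosine similarity, then sorting by similarity), not the notebook's implementation. The titles, tags and the names toy_tags, cosine, similarity_toy and recommend_toy are all invented for this example.

```python
import math
from collections import Counter

# Toy tag strings; titles and tags are invented for illustration.
toy_tags = {
    "Space One": "action space alien",
    "Space Two": "action space war",
    "Love Story": "romance drama",
}
titles = list(toy_tags)
counts = [Counter(toy_tags[t].split()) for t in titles]   # bag-of-words counts

def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Square matrix: similarity_toy[i][j] compares movie i with movie j
similarity_toy = [[cosine(a, b) for b in counts] for a in counts]

def recommend_toy(title, k=2):
    pos = titles.index(title)   # row position of the query movie
    ranked = sorted(enumerate(similarity_toy[pos]), key=lambda x: x[1], reverse=True)
    # drop the query itself by position instead of assuming it sorts first
    return [titles[i] for i, _ in ranked if i != pos][:k]

print(recommend_toy("Space One"))  # ['Space Two', 'Love Story']
```

Here 'Space One' and 'Space Two' share two of their three words, so their similarity is 2/3, while 'Love Story' shares none and scores 0.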
