# Movie Recommender

<img src="./input_data/3_1.gif" 
align="middle" alt="Figure 3_1" data-canonical-src="" style="width:30%;height:30%">

```
1. Import our dependencies
2. Load dataset
3. Analyze dataset
4. Build recommendation system
   6.1 CF (Collaborative Filtering)
        6.3.1 Implement model
        6.3.2 Evaluate Result
   6.2 Hybrid recommendation system
        6.4.1 Implement model
        6.4.2 Evaluate Result
```   

## 1. Import Libraries

In [1]:
%matplotlib inline

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

from ast import literal_eval
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from nltk.stem.snowball import SnowballStemmer

from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate

#to depricate warnings
import warnings; warnings.simplefilter('ignore')

## 2. Load Dataset
We have MovieLens datasets.

Comprises of 100,000 ratings and 1,300 tag applications applied to 9,000 movies by 700 users.

In [2]:
# cast, crew, movieid
credits = pd.read_csv('./input_data/movie_dataset/credits.csv')

# movieid, keywords_related_to_movie
keywords = pd.read_csv('./input_data/movie_dataset/keywords.csv')

# movieid, tmdbid, imdbid
links_small = pd.read_csv('./input_data/movie_dataset/links_small.csv')

# All data about movie
md = pd.read_csv('./input_data/movie_dataset/movies_metadata.csv')

# userid, movieid, rating
ratings = pd.read_csv('./input_data/movie_dataset/ratings_small.csv')

## 3. Analyze Dataset

In [3]:
credits.head()

Unnamed: 0,cast,crew,id
0,"[{'cast_id': 14, 'character': 'Woody (voice)',...","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...",862
1,"[{'cast_id': 1, 'character': 'Alan Parrish', '...","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...",8844
2,"[{'cast_id': 2, 'character': 'Max Goldman', 'c...","[{'credit_id': '52fe466a9251416c75077a89', 'de...",15602
3,"[{'cast_id': 1, 'character': ""Savannah 'Vannah...","[{'credit_id': '52fe44779251416c91011acb', 'de...",31357
4,"[{'cast_id': 1, 'character': 'George Banks', '...","[{'credit_id': '52fe44959251416c75039ed7', 'de...",11862


In [4]:
credits.shape

(45476, 3)

In [5]:
credits.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45476 entries, 0 to 45475
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   cast    45476 non-null  object
 1   crew    45476 non-null  object
 2   id      45476 non-null  int64 
dtypes: int64(1), object(2)
memory usage: 1.0+ MB


In [6]:
keywords.head()

Unnamed: 0,id,keywords
0,862,"[{'id': 931, 'name': 'jealousy'}, {'id': 4290,..."
1,8844,"[{'id': 10090, 'name': 'board game'}, {'id': 1..."
2,15602,"[{'id': 1495, 'name': 'fishing'}, {'id': 12392..."
3,31357,"[{'id': 818, 'name': 'based on novel'}, {'id':..."
4,11862,"[{'id': 1009, 'name': 'baby'}, {'id': 1599, 'n..."


In [7]:
keywords.shape

(46419, 2)

In [8]:
keywords.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 46419 entries, 0 to 46418
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   id        46419 non-null  int64 
 1   keywords  46419 non-null  object
dtypes: int64(1), object(1)
memory usage: 725.4+ KB


In [9]:
links_small.head()

Unnamed: 0,movieId,imdbId,tmdbId
0,1,114709,862.0
1,2,113497,8844.0
2,3,113228,15602.0
3,4,114885,31357.0
4,5,113041,11862.0


In [10]:
links_small.shape

(9125, 3)

In [11]:
links_small.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9125 entries, 0 to 9124
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   movieId  9125 non-null   int64  
 1   imdbId   9125 non-null   int64  
 2   tmdbId   9112 non-null   float64
dtypes: float64(1), int64(2)
memory usage: 214.0 KB


In [12]:
md.iloc[0:3].transpose()

Unnamed: 0,0,1,2
adult,False,False,False
belongs_to_collection,"{'id': 10194, 'name': 'Toy Story Collection', ...",,"{'id': 119050, 'name': 'Grumpy Old Men Collect..."
budget,30000000,65000000,0
genres,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...","[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...","[{'id': 10749, 'name': 'Romance'}, {'id': 35, ..."
homepage,http://toystory.disney.com/toy-story,,
id,862,8844,15602
imdb_id,tt0114709,tt0113497,tt0113228
original_language,en,en,en
original_title,Toy Story,Jumanji,Grumpier Old Men
overview,"Led by Woody, Andy's toys live happily in his ...",When siblings Judy and Peter discover an encha...,A family wedding reignites the ancient feud be...


In [13]:
md.columns

Index(['adult', 'belongs_to_collection', 'budget', 'genres', 'homepage', 'id',
       'imdb_id', 'original_language', 'original_title', 'overview',
       'popularity', 'poster_path', 'production_companies',
       'production_countries', 'release_date', 'revenue', 'runtime',
       'spoken_languages', 'status', 'tagline', 'title', 'video',
       'vote_average', 'vote_count'],
      dtype='object')

In [14]:
md.shape

(45466, 24)

In [15]:
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205


In [16]:
ratings.shape

(100004, 4)

In [17]:
ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100004 entries, 0 to 100003
Data columns (total 4 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   userId     100004 non-null  int64  
 1   movieId    100004 non-null  int64  
 2   rating     100004 non-null  float64
 3   timestamp  100004 non-null  int64  
dtypes: float64(1), int64(3)
memory usage: 3.1 MB


## 4. Build Recommendation System

### 4.1 CF (Collaborative Filtering)

* Collaborative Filtering is based on the idea that users similar to a user can be used to predict how much he/she will like a particular product or service those users have used/experienced but he/she have not.
* I will use the __Surprise library__ that used extremely powerful algorithms like __Singular Value Decomposition (SVD) to minimise RMSE (Root Mean Square Error) and give great recommendations__.

In [18]:
"""USE OF SURPRISE LIBRARY"""

# surprise reader API to read the dataset
# The Reader class is used to parse a file containing ratings
reader = Reader()

# Load a dataset from a pandas dataframe
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

In [19]:
# Singular Value Decomposition
svd = SVD()

# Run a cross validation procedure for a given algorithm, reporting accuracy measures and computation times
# cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
cross_validate(svd, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
# RMSE: Root mean square error, MAE: Mean Absolute Error

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8916  0.9104  0.8925  0.9014  0.8902  0.8972  0.0077  
MAE (testset)     0.6853  0.6989  0.6889  0.6932  0.6849  0.6903  0.0053  
Fit time          10.25   9.95    10.12   10.20   9.81    10.06   0.16    
Test time         0.30    0.39    0.31    0.29    0.37    0.33    0.04    


{'test_rmse': array([0.89156894, 0.9103568 , 0.89253292, 0.90144185, 0.89020575]),
 'test_mae': array([0.6853356 , 0.69894778, 0.68890423, 0.6932446 , 0.68487407]),
 'fit_time': (10.2476065158844,
  9.951979160308838,
  10.117501735687256,
  10.195326566696167,
  9.807865381240845),
 'test_time': (0.3021423816680908,
  0.38536930084228516,
  0.3070971965789795,
  0.2913956642150879,
  0.36789512634277344)}

In [20]:
trainset = data.build_full_trainset()

# Train the algorithm on the trainset
svd.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x1f77dd58e08>

In [21]:
ratings[ratings['userId'] == 1]

Unnamed: 0,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205
5,1,1263,2.0,1260759151
6,1,1287,2.0,1260759187
7,1,1293,2.0,1260759148
8,1,1339,3.5,1260759125
9,1,1343,2.0,1260759131


In [22]:
'''
prediction based on ratings given by other users and
ratings of current user
'''
svd.predict(1, 302)

Prediction(uid=1, iid=302, r_ui=None, est=2.9885808360751986, details={'was_impossible': False})

we got expected rating "2.9685687504852".
* One startling feature of this is that it doesn't care what the movie is (or what it contains). It works purely on the basis of an assigned movie ID and tries to predict ratings based on how the other users have perceive the movie.

### 4.2 Hybrid recommendation system

* In this method we are going to use **CF** (from previous section) in combination with **Cosine Similarity** to predict the best choices for user.
* **Input:** User ID and the Title of a Movie
* **Output:** Similar movies sorted on the basis of expected ratings by that particular user.


In [23]:
def convert_int(x):
    try:
        return int(x)
    except:
        return np.nan

In [24]:
# To convert : [{'id': 16, 'name': 'Animation'}, {'id': 35, '...}]
# to : [Animation, Comedy, Family]
md['genres'] = md['genres'].fillna('[]').apply(literal_eval).apply(lambda x: [i[
    'name'] for i in x] if isinstance(x, list) else [])

# Pre-processing step for getting year from date by splliting it using '-'
md['year'] = pd.to_datetime(md['release_date'], errors='coerce').apply(
    lambda x: str(x).split('-')[0] if x != np.nan else np.nan)

md['id'] = md['id'].apply(convert_int)
md[md['id'].isnull()]

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count,year
19730,- Written by Ørnås,0.065736,/ff9qCepilowshEtG2GYWwzt2bs4.jpg,"[Carousel Productions, Vision View Entertainme...","[{'iso_3166_1': 'CA', 'name': 'Canada'}, {'iso...",,0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,...,,,,,,,,,,NaT
29503,Rune Balot goes to a casino connected to the ...,1.931659,/zV8bHuSL6WXoD6FWogP9j4x80bL.jpg,"[Aniplex, GoHands, BROSTA TV, Mardock Scramble...","[{'iso_3166_1': 'US', 'name': 'United States o...",,0,68.0,"[{'iso_639_1': 'ja', 'name': '日本語'}]",Released,...,,,,,,,,,,NaT
35587,Avalanche Sharks tells the story of a bikini ...,2.185485,/zaSf5OG7V8X8gqFvly88zDdRm46.jpg,"[Odyssey Media, Pulser Productions, Rogue Stat...","[{'iso_3166_1': 'CA', 'name': 'Canada'}]",,0,82.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,...,,,,,,,,,,NaT


In [25]:
md = md.drop([19730, 29503, 35587])
md['id'] = md['id'].astype('int')

In [26]:
keywords['id'] = keywords['id'].astype('int')
credits['id'] = credits['id'].astype('int')
md['id'] = md['id'].astype('int')
md.shape

(45463, 25)

In [27]:
md = md.merge(credits, on='id')
md = md.merge(keywords, on='id')

In [28]:
# data common in: links_small and md
# because links_small have less data than md
smd = md[md['id'].isin(links_small['tmdbId'])]
smd.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,status,tagline,title,video,vote_average,vote_count,year,cast,crew,keywords
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[Animation, Comedy, Family]",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,Released,,Toy Story,False,7.7,5415.0,1995,"[{'cast_id': 14, 'character': 'Woody (voice)',...","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...","[{'id': 931, 'name': 'jealousy'}, {'id': 4290,..."
1,False,,65000000,"[Adventure, Fantasy, Family]",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0,1995,"[{'cast_id': 1, 'character': 'Alan Parrish', '...","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...","[{'id': 10090, 'name': 'board game'}, {'id': 1..."
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[Romance, Comedy]",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0,1995,"[{'cast_id': 2, 'character': 'Max Goldman', 'c...","[{'credit_id': '52fe466a9251416c75077a89', 'de...","[{'id': 1495, 'name': 'fishing'}, {'id': 12392..."
3,False,,16000000,"[Comedy, Drama, Romance]",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0,1995,"[{'cast_id': 1, 'character': ""Savannah 'Vannah...","[{'credit_id': '52fe44779251416c91011acb', 'de...","[{'id': 818, 'name': 'based on novel'}, {'id':..."
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,[Comedy],,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0,1995,"[{'cast_id': 1, 'character': 'George Banks', '...","[{'credit_id': '52fe44959251416c75039ed7', 'de...","[{'id': 1009, 'name': 'baby'}, {'id': 1599, 'n..."


In [29]:
smd['cast'] = smd['cast'].apply(literal_eval)
smd['crew'] = smd['crew'].apply(literal_eval)
smd['keywords'] = smd['keywords'].apply(literal_eval)
smd['cast_size'] = smd['cast'].apply(lambda x: len(x))
smd['crew_size'] = smd['crew'].apply(lambda x: len(x))
smd.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,title,video,vote_average,vote_count,year,cast,crew,keywords,cast_size,crew_size
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[Animation, Comedy, Family]",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,Toy Story,False,7.7,5415.0,1995,"[{'cast_id': 14, 'character': 'Woody (voice)',...","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...","[{'id': 931, 'name': 'jealousy'}, {'id': 4290,...",13,106
1,False,,65000000,"[Adventure, Fantasy, Family]",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,Jumanji,False,6.9,2413.0,1995,"[{'cast_id': 1, 'character': 'Alan Parrish', '...","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...","[{'id': 10090, 'name': 'board game'}, {'id': 1...",26,16
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[Romance, Comedy]",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,Grumpier Old Men,False,6.5,92.0,1995,"[{'cast_id': 2, 'character': 'Max Goldman', 'c...","[{'credit_id': '52fe466a9251416c75077a89', 'de...","[{'id': 1495, 'name': 'fishing'}, {'id': 12392...",7,4
3,False,,16000000,"[Comedy, Drama, Romance]",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,Waiting to Exhale,False,6.1,34.0,1995,"[{'cast_id': 1, 'character': 'Savannah 'Vannah...","[{'credit_id': '52fe44779251416c91011acb', 'de...","[{'id': 818, 'name': 'based on novel'}, {'id':...",10,10
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,[Comedy],,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,Father of the Bride Part II,False,5.7,173.0,1995,"[{'cast_id': 1, 'character': 'George Banks', '...","[{'credit_id': '52fe44959251416c75039ed7', 'de...","[{'id': 1009, 'name': 'baby'}, {'id': 1599, 'n...",12,7


In [30]:
def get_director(x):
    for i in x:
        if i['job'] == 'Director':
            return i['name']
    return np.nan

In [31]:
smd['director'] = smd['crew'].apply(get_director)
smd['director_not_soup'] = smd['director']
smd['director'] = smd['director'].astype('str').apply(lambda x: str.lower(x.replace(" ", "")))
smd['director'] = smd['director'].apply(lambda x: [x,x, x])

smd['cast'] = smd['cast'].apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])
smd['cast'] = smd['cast'].apply(lambda x: x[:3] if len(x) >=3 else x)
smd['cast_not_soup'] = smd['cast']
smd['cast'] = smd['cast'].apply(lambda x: [str.lower(i.replace(" ", "")) for i in x])

smd['keywords'] = smd['keywords'].apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])

In [32]:
# All keywords
s = smd.apply(lambda x: pd.Series(x['keywords']),axis=1).stack().reset_index(level=1, drop=True)
s.name = 'keyword'
s = s.value_counts()
s[:5]

independent film        610
woman director          550
murder                  399
duringcreditsstinger    327
based on novel          318
Name: keyword, dtype: int64

In [33]:
# keywords with more than one occurrences
s = s[s > 1]
s.head()

independent film        610
woman director          550
murder                  399
duringcreditsstinger    327
based on novel          318
Name: keyword, dtype: int64

In [34]:
# Just an example
stemmer = SnowballStemmer('english')
stemmer.stem('dogs')

'dog'

In [35]:
def filter_keywords(x):
    words = []
    for i in x:
        if i in s:
            words.append(i)
    return words

In [36]:
# Normalizing/ formatting keywords
# eg. jealousy --> jealousi, friends --> friend
smd['keywords'] = smd['keywords'].apply(filter_keywords)
smd['keywords'] = smd['keywords'].apply(lambda x: [stemmer.stem(i) for i in x])
smd['keywords'] = smd['keywords'].apply(lambda x: [str.lower(i.replace(" ", "")) for i in x])

In [37]:
# Combining data of multiple columns in a string for later operations
smd['soup'] = smd['keywords'] + smd['cast'] + smd['director'] + smd['genres']
smd['soup'] = smd['soup'].apply(lambda x: ' '.join(x))

In [38]:
# Convert a collection of text documents to a matrix of token counts
count = CountVectorizer(analyzer='word',ngram_range=(1, 2),min_df=0, stop_words='english')

# gives matrix of count
count_matrix = count.fit_transform(smd['soup'])

In [39]:
print(count_matrix)

  (0, 48201)	1
  (0, 99578)	1
  (0, 11703)	1
  (0, 35554)	1
  (0, 35478)	1
  (0, 82467)	1
  (0, 11769)	1
  (0, 70474)	1
  (0, 99598)	1
  (0, 98820)	1
  (0, 98078)	1
  (0, 26723)	1
  (0, 51067)	3
  (0, 4568)	1
  (0, 19542)	1
  (0, 32156)	1
  (0, 48271)	1
  (0, 99581)	1
  (0, 11714)	1
  (0, 35590)	1
  (0, 35536)	1
  (0, 82470)	1
  (0, 11771)	1
  (0, 70475)	1
  (0, 99608)	1
  :	:
  (9217, 106919)	1
  (9217, 86548)	1
  (9217, 38388)	1
  (9217, 37878)	1
  (9217, 25096)	1
  (9217, 53526)	1
  (9217, 41960)	1
  (9217, 106920)	1
  (9217, 86549)	1
  (9217, 41617)	1
  (9218, 68585)	2
  (9218, 26187)	1
  (9218, 84440)	3
  (9218, 84450)	2
  (9218, 26197)	1
  (9218, 26178)	1
  (9218, 51095)	1
  (9218, 74590)	1
  (9218, 82382)	1
  (9218, 68642)	1
  (9218, 26185)	1
  (9218, 74598)	1
  (9218, 82384)	1
  (9218, 51098)	1
  (9218, 84444)	1


In [40]:
cosine_sim = cosine_similarity(count_matrix, count_matrix)
print(cosine_sim)
cosine_sim.shape

[[1.         0.02441931 0.02738955 ... 0.         0.         0.        ]
 [0.02441931 1.         0.         ... 0.02973505 0.02500782 0.        ]
 [0.02738955 0.         1.         ... 0.03335187 0.         0.        ]
 ...
 [0.         0.02973505 0.03335187 ... 1.         0.08700222 0.        ]
 [0.         0.02500782 0.         ... 0.08700222 1.         0.        ]
 [0.         0.         0.         ... 0.         0.         1.        ]]


(9219, 9219)

In [41]:
x=cosine_sim.astype("float32")
x.nbytes

339959844

In [42]:
smd = smd.reset_index()

indices = pd.Series(smd.index, index=smd['title'])
indices.head()

title
Toy Story                      0
Jumanji                        1
Grumpier Old Men               2
Waiting to Exhale              3
Father of the Bride Part II    4
dtype: int64

In [43]:
id_map = pd.read_csv('./input_data/movie_dataset/links_small.csv')[['movieId', 'tmdbId']]
id_map['tmdbId'] = id_map['tmdbId'].apply(convert_int)
id_map.columns = ['movieId', 'id']
id_map = id_map.merge(smd[['title', 'id']], on='id').set_index('title')
id_map.head()

Unnamed: 0_level_0,movieId,id
title,Unnamed: 1_level_1,Unnamed: 2_level_1
Toy Story,1,862.0
Jumanji,2,8844.0
Grumpier Old Men,3,15602.0
Waiting to Exhale,4,31357.0
Father of the Bride Part II,5,11862.0


In [44]:
indices_map = id_map.set_index('id')
indices_map.head()

Unnamed: 0_level_0,movieId
id,Unnamed: 1_level_1
862.0,1
8844.0,2
15602.0,3
31357.0,4
11862.0,5


In [45]:
def hybrid(userId,arg,type):
    arg = arg.title()
    if(type == "title"):
        idx = indices[arg]
        tmdbId = id_map.loc[arg]['id']
        movie_id = id_map.loc[arg]['movieId']

        sim_arg = list(enumerate(cosine_sim[int(idx)]))
        sim_arg = sorted(sim_arg, key=lambda x: x[1], reverse=True)
        sim_arg = sim_arg[1:26]
        sim_arg = [i[0] for i in sim_arg]
        
    elif(type == "genre"):
        sim_arg = smd[smd["genres"].apply(lambda x: arg in x)]
        sim_arg = sim_arg.sort_values(by='vote_average', ascending=False)
        sim_arg = sim_arg.index[:25]
    elif(type == "cast"):
        sim_arg = smd[smd["cast_not_soup"].apply(lambda x: arg in x)].index[:]
        
    elif(type == "director"):
        sim_arg = smd[smd["director_not_soup"] == arg].index[:]    

######################################################################
    movie_indices = [i for i in sim_arg]    
    movies = smd.iloc[movie_indices][['title', 'id','genres','cast_not_soup','director_not_soup']]
#     movies = smd.iloc[movie_indices][['title', 'id','genres','cast_not_soup','director_not_soup','vote_average']]
    
    # use of svd to predict estimated rating(from previous section)
    movies['est'] = movies['id'].apply(lambda x: svd.predict(userId, indices_map.loc[x]['movieId']).est)
    
    # sorting using estimated rating
    movies = movies.sort_values('est', ascending=False)
    movies.rename(columns={"cast_not_soup": "cast","director_not_soup": "director"}, inplace = True)
#     movies.rename(columns={"cast_not_soup": "cast","director_not_soup": "director","vote_average":"Vote Average"}, inplace = True)
    return movies.head(10)

In [46]:
# hybrid(456,"Star Wars", "title")

In [47]:
hybrid(456,"christopher nolan", "director")

Unnamed: 0,title,id,genres,cast,director,est
3381,Memento,77,"[Mystery, Thriller]","[Guy Pearce, Carrie-Anne Moss, Joe Pantoliano]",Christopher Nolan,4.196559
6623,The Prestige,1124,"[Drama, Mystery, Thriller]","[Hugh Jackman, Christian Bale, Michael Caine]",Christopher Nolan,4.143332
6981,The Dark Knight,155,"[Drama, Action, Crime, Thriller]","[Christian Bale, Michael Caine, Heath Ledger]",Christopher Nolan,4.128027
8613,Interstellar,157336,"[Adventure, Drama, Science Fiction]","[Matthew McConaughey, Jessica Chastain, Anne H...",Christopher Nolan,3.999972
8031,The Dark Knight Rises,49026,"[Action, Crime, Drama, Thriller]","[Christian Bale, Michael Caine, Gary Oldman]",Christopher Nolan,3.820075
6218,Batman Begins,272,"[Action, Crime, Drama]","[Christian Bale, Michael Caine, Liam Neeson]",Christopher Nolan,3.773217
4145,Insomnia,320,"[Crime, Mystery, Thriller]","[Al Pacino, Robin Williams, Hilary Swank]",Christopher Nolan,3.632812
7648,Inception,27205,"[Action, Thriller, Science Fiction, Mystery, A...","[Leonardo DiCaprio, Joseph Gordon-Levitt, Elle...",Christopher Nolan,3.548042
2085,Following,11660,"[Crime, Drama, Thriller]","[Jeremy Theobald, Alex Haw, Lucy Russell]",Christopher Nolan,3.494494


In [48]:
def addd(x,listt):
    for i in x:
        if i not in listt:
            listt.append(i)

In [49]:
genre_li = []
smd["genres"].apply(lambda x: addd(x,genre_li))
print("GENRES: ",genre_li)

GENRES:  ['Animation', 'Comedy', 'Family', 'Adventure', 'Fantasy', 'Romance', 'Drama', 'Action', 'Crime', 'Thriller', 'Horror', 'History', 'Science Fiction', 'Mystery', 'War', 'Foreign', 'Music', 'Documentary', 'Western', 'TV Movie']


In [50]:
cast_li = []
smd["cast_not_soup"].apply(lambda x: addd(x,cast_li))
print("CAST (",len(cast_li),") :",cast_li[:25])

CAST ( 10369 ) : ['Tom Hanks', 'Tim Allen', 'Don Rickles', 'Robin Williams', 'Jonathan Hyde', 'Kirsten Dunst', 'Walter Matthau', 'Jack Lemmon', 'Ann-Margret', 'Whitney Houston', 'Angela Bassett', 'Loretta Devine', 'Steve Martin', 'Diane Keaton', 'Martin Short', 'Al Pacino', 'Robert De Niro', 'Val Kilmer', 'Harrison Ford', 'Julia Ormond', 'Greg Kinnear', 'Jonathan Taylor Thomas', 'Brad Renfro', 'Rachael Leigh Cook', 'Jean-Claude Van Damme']


In [51]:
director_li = list(set(smd["director_not_soup"]))[1:]
print("Director (",len(director_li),") :",director_li[:25])

Director ( 3601 ) : ['Ben Stiller', 'Gerard Johnson', 'James Toback', 'Patrick Archibald', 'Manny Rodriguez', 'Eugene Jarecki', 'Jacques Demy', 'Yasuzô Masumura', 'John Curran', 'Heidi Ewing', 'Jeremy Newberger', 'Danny Pang', 'Christian Molina', 'Gerald Potterton', 'Amy Berg', 'Martin Rosen', 'Jon Hurwitz', 'William Wiard', 'G.W. Pabst', 'Christopher Smith', 'Mark Herman', 'Luke Matheny', 'Bruno Barreto', 'Jason Lei Howden', 'Anthony Stacchi']


In [52]:
print(genre_li)

['Animation', 'Comedy', 'Family', 'Adventure', 'Fantasy', 'Romance', 'Drama', 'Action', 'Crime', 'Thriller', 'Horror', 'History', 'Science Fiction', 'Mystery', 'War', 'Foreign', 'Music', 'Documentary', 'Western', 'TV Movie']


In [53]:
# code=""
# for i in genre_li:
#     code+='<option value="'+i+'">'+i+ '</option>'
    
#     print('<option value="',i,'">',i,'</option>',end="\n",sep="")
    

In [54]:
# code