# Recommendation types and techniques

### Types of Recommender Systems


    1. Collaborative Recommender System:
        This is the most widely implemented and most mature technique available. Collaborative recommender systems aggregate ratings or recommendations of objects, recognize commonalities between users on the basis of their ratings, and generate new recommendations based on inter-user comparisons. Their greatest strength is that they are completely independent of any machine-readable representation of the objects being recommended, so they work well for complex objects where variations in taste are responsible for much of the variation in preferences.
        
    2. Content based Recommender System:
        This approach is an outgrowth and continuation of information filtering research. In this system, the objects are defined mainly by their associated features, and recommendations are made by comparing those features.
        
    3. Demographic based Recommender System: 
        This system categorizes users by their attributes and makes recommendations based on demographic classes. Many industries have adopted this approach because it is simple and easy to implement. A demographic-based recommender system first needs proper market research in the target region, accompanied by a short survey, to gather the data used for categorization.
        
    4. Utility based Recommender System: 
        A utility-based recommender system makes suggestions based on a computation of the utility of each object for the user.
        
    5. Knowledge based Recommender System:
        This type of recommender system attempts to suggest objects based on inferences about a user’s needs and preferences. Knowledge-based recommenders work on functional knowledge: they know how a particular item meets a particular user need, and can therefore reason about the relationship between a need and a possible recommendation.
        
    6. Hybrid Recommender System: 
        Combining two or more of these systems in a manner that suits a particular industry gives a hybrid recommender system. It is the approach many companies seek out, because it combines the strengths of its component systems and can offset the weaknesses that appear when only one recommender system is used. There are several ways in which the systems can be combined, such as:
        
        i) Weighted Hybrid Recommender:
            In this system the score of a recommended item is computed from the results of all of the recommendation techniques present in the system. For example, the P-Tango system combines collaborative and content-based recommendation, giving them equal weight at the start but gradually adjusting the weighting as predictions about the user ratings are confirmed or disconfirmed.
            
        ii) Switching Hybrid Recommender:
            A switching hybrid recommender switches between recommendation techniques based on particular criteria. For example, if we combine content-based and collaborative recommender systems, the switching hybrid can first deploy the content-based recommender and, if that does not work, fall back to the collaborative recommender.
            
        iii) Mixed Hybrid Recommender:
            Where it is possible to make a large number of recommendations simultaneously, a mixed recommender system is a good fit. Here recommendations from more than one technique are presented together, so the user can choose from a wide range of recommendations.
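
The weighted and switching strategies described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the fixed weight, the fallback rule and the example scores are all hypothetical and do not come from P-Tango or any particular library.

```python
def weighted_hybrid(scores_a, scores_b, weight_a=0.5):
    """Blend two {item: score} dictionaries with a fixed weight.

    Assumption: an item missing from one recommender contributes a score of 0.
    """
    items = set(scores_a) | set(scores_b)
    return {
        item: weight_a * scores_a.get(item, 0.0)
        + (1.0 - weight_a) * scores_b.get(item, 0.0)
        for item in items
    }


def switching_hybrid(primary, fallback, min_items=1):
    """Use the primary recommender's scores unless it returns too few items."""
    return primary if len(primary) >= min_items else fallback


# Hypothetical scores from a collaborative and a content-based recommender.
collaborative = {"A": 4.0, "B": 2.0}
content_based = {"A": 3.0, "C": 5.0}

blended = weighted_hybrid(collaborative, content_based, weight_a=0.5)
ranking = sorted(blended, key=blended.get, reverse=True)

# With an empty primary result, the switching hybrid falls back.
chosen = switching_hybrid({}, content_based)
```

A system in the spirit of the weighted example would additionally adjust `weight_a` over time as predictions are confirmed or disconfirmed; that update rule is left out here.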
        
        

# Necessary Libraries

In [131]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity
from ast import literal_eval
from nltk.stem.snowball import SnowballStemmer

import warnings; warnings.simplefilter('ignore')


# Collaborative Recommendation

I am not going to implement collaborative filtering from scratch. Instead, I will use the Surprise library, which provides powerful algorithms such as Singular Value Decomposition (SVD) that minimise RMSE (root mean square error) and tend to give good recommendations.

In [132]:
rating = pd.read_csv("/home/hasan/Downloads/3405_6663_bundle_archive/ratings_small.csv")


In [133]:
rating.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205


In [134]:
reader = Reader()


In [135]:
data = Dataset.load_from_df(rating[['userId', 'movieId', 'rating']], reader)

In [136]:
# Use the famous SVD(singular value decomposition) algorithm
algo = SVD()

# Run 5-fold cross-validation and then print results
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8927  0.9022  0.9055  0.8947  0.8897  0.8970  0.0060  
MAE (testset)     0.6862  0.6927  0.6996  0.6887  0.6837  0.6902  0.0056  
Fit time          4.68    4.71    4.68    4.68    4.69    4.69    0.01    
Test time         0.16    0.16    0.16    0.16    0.16    0.16    0.00    


{'test_rmse': array([0.89271973, 0.90222447, 0.90554098, 0.8947029 , 0.88969534]),
 'test_mae': array([0.68617295, 0.69274102, 0.69962804, 0.68866417, 0.68371471]),
 'fit_time': (4.675418376922607,
  4.709033727645874,
  4.681403875350952,
  4.679007530212402,
  4.686733245849609),
 'test_time': (0.16245031356811523,
  0.16494464874267578,
  0.1635913848876953,
  0.1644134521484375,
  0.16445302963256836)}

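The mean RMSE across the five folds is 0.8970 (see the table above), so predictions are typically off by a little under one rating point, which is a reasonably good result.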
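
For reference, the two metrics reported by `cross_validate` can be computed by hand. The sketch below uses made-up ratings (not values from `ratings_small.csv`), and the helper names `rmse` and `mae` are chosen here only for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Made-up true ratings and predictions.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]

example_rmse = rmse(actual, predicted)  # sqrt(2.25 / 4) = 0.75
example_mae = mae(actual, predicted)    # 2.5 / 4 = 0.625
```

Because RMSE squares the errors before averaging, it penalises large misses more heavily than MAE, which is why the RMSE row in the table is higher than the MAE row.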

##### Train on the full dataset

In [137]:
train_data = data.build_full_trainset()
algo.fit(train_data)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7f1745abee50>

##### Let's Check some data

In [138]:
rating[rating['userId']==2].head()

Unnamed: 0,userId,movieId,rating,timestamp
20,2,10,4.0,835355493
21,2,17,5.0,835355681
22,2,39,5.0,835355604
23,2,47,4.0,835355552
24,2,50,4.0,835355586


##### Predict with new data

In [139]:
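# arguments: user id, movie id, and optionally the true rating r_ui
# (r_ui is only stored in the result for reference; it does not change the estimate)
algo.predict(1, 32, 6)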

Prediction(uid=1, iid=32, r_ui=6, est=2.7473874545414763, details={'was_impossible': False})

For user 1 and movie ID 32, the estimated rating is about 2.75 (the `est` field in the output above).

Note:
    One notable feature of this recommender system is that it doesn't care what the movie is (or what it contains). It works purely on the basis of an assigned movie ID and tries to predict ratings based on how other users have rated the movie.
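
To make the point about IDs concrete: the Surprise documentation describes the SVD estimate as the global mean rating plus a user bias, an item bias and the dot product of learned user and item factor vectors. The sketch below reproduces that rule with invented numbers. `predict_rating` and every value in it are hypothetical, and the clipping step mirrors what `predict` does by default rather than anything learned from this dataset.

```python
def predict_rating(global_mean, user_bias, item_bias,
                   user_factors, item_factors, rating_scale=(1.0, 5.0)):
    """Estimate a rating as mean + biases + dot(user factors, item factors)."""
    interaction = sum(p * q for p, q in zip(user_factors, item_factors))
    raw = global_mean + user_bias + item_bias + interaction
    low, high = rating_scale
    # Keep the estimate inside the rating scale.
    return min(high, max(low, raw))


# Invented parameters for one (user, movie) pair with two latent factors.
example = predict_rating(
    global_mean=3.5,
    user_bias=-0.3,               # this user rates a little below average
    item_bias=0.2,                # this movie is rated a little above average
    user_factors=[0.1, -0.2],
    item_factors=[0.5, 0.4],
)

# An estimate above the scale is clipped to the upper bound.
clipped = predict_rating(4.8, 0.5, 0.3, [0.0], [0.0])
```

Nothing in this rule refers to titles, genres or descriptions: every quantity is indexed by a user ID or a movie ID and is learned from the ratings alone.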

# Content Based Recommender

I will build the content-based recommender in two ways:
    
    1. Recommender based on movie description.
         Here I will combine the tagline and overview columns into a single description.
    2. Recommender based on metadata.
         Here I will combine the cast, crew, keywords and genres columns.

### Recommender based on movie description.

In [140]:
# reading links_small dataset
link = pd.read_csv("/home/hasan/Downloads/3405_6663_bundle_archive/links_small.csv")

In [141]:
# head part of the dataset
link.head()

Unnamed: 0,movieId,imdbId,tmdbId
0,1,114709,862.0
1,2,113497,8844.0
2,3,113228,15602.0
3,4,114885,31357.0
4,5,113041,11862.0


In [142]:
# keep only the tmdbId column, drop its null values and convert it to int
link = link[link['tmdbId'].notnull()]['tmdbId'].astype('int')
link.head()

0      862
1     8844
2    15602
3    31357
4    11862
Name: tmdbId, dtype: int64

##### Reading meta_data dataset

In [143]:
# reading metadata dataset
meta_data = pd.read_csv("/home/hasan/Downloads/3405_6663_bundle_archive/movies_metadata.csv")

In [144]:
# head part of metadata
meta_data.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,1995-12-22,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0
3,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,1995-12-22,81452156.0,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,1995-02-10,76578911.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0


##### Making smaller dataset

In [145]:
# dropping three rows because of messy data in 'id' column
meta_data = meta_data.drop([19730, 29503, 35587])

In [146]:
# convert meta_data's 'id' column to integer
meta_data['id'] = meta_data['id'].astype('int')

# keep only the rows whose 'id' appears among the tmdbId values in link
small_meta_data = meta_data[meta_data['id'].isin(link)]

In [147]:
small_meta_data.head(3)

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,1995-12-22,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0


In [148]:
# shape of the small_meta_data
small_meta_data.shape

(9099, 24)

##### Recommender based on movie Description

Let us first try to build a recommender using movie descriptions and taglines. We do not have a quantitative metric to judge our machine's performance so this will have to be done qualitatively.

In [149]:
# replace missing taglines with an empty string
small_meta_data['tagline'] = small_meta_data['tagline'].fillna('')
# concatenate overview and tagline into a single description column
small_meta_data['description'] = small_meta_data['overview'] + small_meta_data['tagline']
# a missing overview makes the concatenation NaN; replace those with an empty string
small_meta_data['description'] = small_meta_data['description'].fillna('')


In [150]:
small_meta_data.head(3)

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count,description
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0,"Led by Woody, Andy's toys live happily in his ..."
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0,When siblings Judy and Peter discover an encha...
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0,A family wedding reignites the ancient feud be...


##### Declaring the vectorizer

In [151]:
# min_df=1 keeps every term; recent scikit-learn releases reject the integer min_df=0
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1, stop_words='english')
tfidf_matrix = tf.fit_transform(small_meta_data['description'])


In [152]:
tfidf_matrix

<9099x268124 sparse matrix of type '<class 'numpy.float64'>'
	with 540591 stored elements in Compressed Sparse Row format>

In [153]:
tfidf_matrix.shape

(9099, 268124)

##### cosine similarity

Because TfidfVectorizer L2-normalises each row by default, the dot product of two rows is already their cosine similarity. We can therefore use sklearn's linear_kernel instead of cosine_similarity, which skips the redundant normalisation step and is faster.
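
The claim can be checked on a toy example: once two vectors are scaled to unit length, their plain dot product equals the cosine similarity of the original vectors. The vectors and helper names below are made up for illustration and stand in for two TF-IDF rows.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Two made-up term-weight vectors.
a = [1.0, 2.0, 0.0]
b = [2.0, 1.0, 1.0]

cosine_ab = cosine(a, b)
dot_of_unit_vectors = dot(l2_normalize(a), l2_normalize(b))
```

The two values agree up to floating-point rounding, which is why a dot-product kernel is enough once the rows are already normalised.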

In [154]:
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

In [155]:
cosine_sim[0]

array([1.        , 0.00680476, 0.        , ..., 0.        , 0.00344913,
       0.        ])

We now have a pairwise cosine similarity matrix for all the movies in our dataset. The next step is to write a function that returns the 30 most similar movies based on the cosine similarity score.

##### Predicting 30 similar movies

In [156]:
small_meta_data = small_meta_data.reset_index()
title = small_meta_data['title']
titles = pd.DataFrame(title)
indices = pd.Series(small_meta_data.index, index=small_meta_data['title'])


In [157]:
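def get_recommendations(title):
    # row position of the requested movie (assumes the title is unique)
    idx = indices[title]
    # pair every movie's position with its similarity to the requested movie
    sim_scores = list(enumerate(cosine_sim[idx]))
    # most similar first
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # skip the first entry (expected to be the movie itself) and keep the next 30
    sim_scores = sim_scores[1:31]
    movie_indices = [i[0] for i in sim_scores]
    return titles.iloc[movie_indices]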
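
As a possible variation on the function above, the top-N selection can exclude the query movie by position instead of assuming it sorts first, and can avoid sorting the whole row. This is a sketch on a made-up similarity row; `top_n_similar` is a hypothetical helper, not part of the notebook.

```python
import heapq


def top_n_similar(sim_row, query_idx, n=3):
    """Return positions of the n most similar items, excluding the query itself."""
    candidates = (
        (score, idx) for idx, score in enumerate(sim_row) if idx != query_idx
    )
    # nlargest keeps only the n best (score, position) pairs.
    return [idx for score, idx in heapq.nlargest(n, candidates)]


# Made-up similarity scores of movie 0 against five movies (including itself).
row = [1.0, 0.2, 0.9, 0.0, 0.5]
top_two = top_n_similar(row, query_idx=0, n=2)  # positions 2 and 4
```

Excluding by position matters when another movie has a description identical to the query's: both then score 1.0, and slicing off the first sorted entry could drop the wrong one.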


In [158]:
get_recommendations('The Godfather')

Unnamed: 0,title
973,The Godfather: Part II
8387,The Family
3509,Made
4196,Johnny Dangerously
29,Shanghai Triad
5667,Fury
2412,American Movie
1582,The Godfather: Part III
4221,8 Women
2159,Summer of Sam


In [159]:
get_recommendations('Road to Perdition')

Unnamed: 0,title
2539,Topsy-Turvy
3799,"Monsters, Inc."
2800,Frequency
3007,F/X2
6064,Thriller: A Cruel Picture
3918,A Walk to Remember
3967,40 Days and 40 Nights
3157,Dr. T and the Women
7502,Cyrus
2935,Prizzi's Honor


### Recommender based on metadata.

##### Reading dataset

In [160]:
credits = pd.read_csv("/home/hasan/Downloads/3405_6663_bundle_archive/credits.csv")
keywords = pd.read_csv("/home/hasan/Downloads/3405_6663_bundle_archive/keywords.csv")
meta_data = pd.read_csv("/home/hasan/Downloads/3405_6663_bundle_archive/movies_metadata.csv")


##### Changing data type of 'id' column

In [161]:
keywords['id'] = keywords['id'].astype('int')
credits['id'] = credits['id'].astype('int')

# drop the three rows whose 'id' is not an integer
meta_data = meta_data.drop([19730, 29503, 35587])
meta_data['id'] = meta_data['id'].astype('int')
meta_data.head(3)

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,1995-12-22,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0


In [162]:
meta_data.shape

(45463, 24)

##### Merging of credits and keywords with meta_data

In [163]:
meta_data = meta_data.merge(credits, on='id')
meta_data = meta_data.merge(keywords, on='id')

##### Making final dataset

In [164]:
links_small = pd.read_csv('/home/hasan/Downloads/3405_6663_bundle_archive/links_small.csv')
links_small = links_small[links_small['tmdbId'].notnull()]['tmdbId'].astype('int')


In [165]:
small_meta_data = meta_data[meta_data['id'].isin(links_small)]
small_meta_data.head(3)

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,spoken_languages,status,tagline,title,video,vote_average,vote_count,cast,crew,keywords
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0,"[{'cast_id': 14, 'character': 'Woody (voice)',...","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...","[{'id': 931, 'name': 'jealousy'}, {'id': 4290,..."
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0,"[{'cast_id': 1, 'character': 'Alan Parrish', '...","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...","[{'id': 10090, 'name': 'board game'}, {'id': 1..."
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0,"[{'cast_id': 2, 'character': 'Max Goldman', 'c...","[{'credit_id': '52fe466a9251416c75077a89', 'de...","[{'id': 1495, 'name': 'fishing'}, {'id': 12392..."


In [166]:
small_meta_data.shape

(9219, 27)

##### applying literal_eval to cast, crew and keyword columns

1. Crew: From the crew, we will only pick the director as our feature since the others don't contribute that much to the feel of the movie.

2. Cast: Choosing Cast is a little more tricky. Lesser known actors and minor roles do not really affect people's opinion of a movie. Therefore, we must only select the major characters and their respective actors. Arbitrarily we will choose the top 3 actors that appear in the credits list.


In [167]:
small_meta_data['cast'] = small_meta_data['cast'].apply(literal_eval)
small_meta_data['crew'] = small_meta_data['crew'].apply(literal_eval)
small_meta_data['keywords'] = small_meta_data['keywords'].apply(literal_eval)
small_meta_data['cast_size'] = small_meta_data['cast'].apply(lambda x: len(x))
small_meta_data['crew_size'] = small_meta_data['crew'].apply(lambda x: len(x))

In [168]:
small_meta_data.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,tagline,title,video,vote_average,vote_count,cast,crew,keywords,cast_size,crew_size
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,,Toy Story,False,7.7,5415.0,"[{'cast_id': 14, 'character': 'Woody (voice)',...","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...","[{'id': 931, 'name': 'jealousy'}, {'id': 4290,...",13,106
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0,"[{'cast_id': 1, 'character': 'Alan Parrish', '...","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...","[{'id': 10090, 'name': 'board game'}, {'id': 1...",26,16
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0,"[{'cast_id': 2, 'character': 'Max Goldman', 'c...","[{'credit_id': '52fe466a9251416c75077a89', 'de...","[{'id': 1495, 'name': 'fishing'}, {'id': 12392...",7,4
3,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0,"[{'cast_id': 1, 'character': 'Savannah 'Vannah...","[{'credit_id': '52fe44779251416c91011acb', 'de...","[{'id': 818, 'name': 'based on novel'}, {'id':...",10,10
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0,"[{'cast_id': 1, 'character': 'George Banks', '...","[{'credit_id': '52fe44959251416c75039ed7', 'de...","[{'id': 1009, 'name': 'baby'}, {'id': 1599, 'n...",12,7


In [169]:
# extract the director's name from the crew list (NaN if there is none)
def get_director(x):
    for i in x:
        if i['job'] == 'Director':
            return i['name']
    return np.nan


In [170]:
small_meta_data['director'] = small_meta_data['crew'].apply(get_director)


In [171]:
# keep only the names of the cast members
small_meta_data['cast'] = small_meta_data['cast'].apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])

# keep at most the first three actors (slicing is safe for shorter lists)
small_meta_data['cast'] = small_meta_data['cast'].apply(lambda x: x[:3])

# keep only the names of the keywords
small_meta_data['keywords'] = small_meta_data['keywords'].apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])
small_meta_data.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,title,video,vote_average,vote_count,cast,crew,keywords,cast_size,crew_size,director
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,Toy Story,False,7.7,5415.0,"[Tom Hanks, Tim Allen, Don Rickles]","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...","[jealousy, toy, boy, friendship, friends, riva...",13,106,John Lasseter
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,Jumanji,False,6.9,2413.0,"[Robin Williams, Jonathan Hyde, Kirsten Dunst]","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...","[board game, disappearance, based on children'...",26,16,Joe Johnston
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,Grumpier Old Men,False,6.5,92.0,"[Walter Matthau, Jack Lemmon, Ann-Margret]","[{'credit_id': '52fe466a9251416c75077a89', 'de...","[fishing, best friend, duringcreditsstinger, o...",7,4,Howard Deutch
3,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,Waiting to Exhale,False,6.1,34.0,"[Whitney Houston, Angela Bassett, Loretta Devine]","[{'credit_id': '52fe44779251416c91011acb', 'de...","[based on novel, interracial relationship, sin...",10,10,Forest Whitaker
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,Father of the Bride Part II,False,5.7,173.0,"[Steve Martin, Diane Keaton, Martin Short]","[{'credit_id': '52fe44959251416c75039ed7', 'de...","[baby, midlife crisis, confidence, aging, daug...",12,7,Charles Shyer


In [172]:
# lower-case the names and remove spaces so that each full name becomes a single token
small_meta_data['cast'] = small_meta_data['cast'].apply(lambda x: [str.lower(i.replace(" ", "")) for i in x])
small_meta_data['director'] = small_meta_data['director'].astype('str').apply(lambda x: str.lower(x.replace(" ", "")))
# repeat the director three times, presumably to give it more weight than a single cast member
small_meta_data['director'] = small_meta_data['director'].apply(lambda x: [x,x, x])
small_meta_data.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,title,video,vote_average,vote_count,cast,crew,keywords,cast_size,crew_size,director
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,Toy Story,False,7.7,5415.0,"[tomhanks, timallen, donrickles]","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...","[jealousy, toy, boy, friendship, friends, riva...",13,106,"[johnlasseter, johnlasseter, johnlasseter]"
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,Jumanji,False,6.9,2413.0,"[robinwilliams, jonathanhyde, kirstendunst]","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...","[board game, disappearance, based on children'...",26,16,"[joejohnston, joejohnston, joejohnston]"
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,Grumpier Old Men,False,6.5,92.0,"[waltermatthau, jacklemmon, ann-margret]","[{'credit_id': '52fe466a9251416c75077a89', 'de...","[fishing, best friend, duringcreditsstinger, o...",7,4,"[howarddeutch, howarddeutch, howarddeutch]"
3,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,Waiting to Exhale,False,6.1,34.0,"[whitneyhouston, angelabassett, lorettadevine]","[{'credit_id': '52fe44779251416c91011acb', 'de...","[based on novel, interracial relationship, sin...",10,10,"[forestwhitaker, forestwhitaker, forestwhitaker]"
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,Father of the Bride Part II,False,5.7,173.0,"[stevemartin, dianekeaton, martinshort]","[{'credit_id': '52fe44959251416c75039ed7', 'de...","[baby, midlife crisis, confidence, aging, daug...",12,7,"[charlesshyer, charlesshyer, charlesshyer]"
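
The notebook does not say why the director appears three times. One plausible reading, assuming the combined tokens are later turned into term counts and compared by cosine similarity (an assumption, since that step is not shown in this section), is that repetition makes a shared director count for more. The sketch below illustrates the effect; the second movie and its actor tokens are hypothetical.

```python
import math
from collections import Counter


def count_cosine(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw term counts."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    shared = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return shared / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two movies that share only their director (second movie is made up).
movie_a = ["tomhanks", "timallen", "johnlasseter"]
movie_b = ["actorx", "actory", "johnlasseter"]

once = count_cosine(movie_a, movie_b)            # director listed once: 1/3
thrice = count_cosine(
    movie_a + ["johnlasseter"] * 2,              # director listed three times
    movie_b + ["johnlasseter"] * 2,
)                                                # 9/11
```

Removing the spaces serves a related purpose: `tomhanks` is one token, so two different actors who share a first name are not treated as a partial match.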


##### Keywords

We will do a small amount of pre-processing on our keywords before putting them to any use. As a first step, we calculate the frequency count of every keyword that appears in the dataset.

In [173]:
# flatten the keyword lists into one long Series (one row per movie-keyword pair)
s = small_meta_data.apply(lambda x: pd.Series(x['keywords']),axis=1).stack().reset_index(level=1, drop=True)
s.name = 'keyword'
s


0           jealousy
0                toy
0                boy
0         friendship
0            friends
            ...     
41391    destruction
41391          kaiju
41391          toyko
41669          music
41669    documentary
Name: keyword, Length: 64407, dtype: object

In [174]:
# count how often each keyword occurs
s = s.value_counts()
s

independent film        610
woman director          550
murder                  399
duringcreditsstinger    327
based on novel          318
                       ... 
dirt bike                 1
soul selling              1
orc                       1
autocide                  1
catcher in the rye        1
Name: keyword, Length: 12940, dtype: int64

In [175]:
# keep only the keywords that appear more than once
s = s[s>1]
s

independent film                610
woman director                  550
murder                          399
duringcreditsstinger            327
based on novel                  318
                               ... 
mechanical engineering            2
dialogue driven                   2
eccentric family                  2
dulles international airport      2
area 51                           2
Name: keyword, Length: 6709, dtype: int64
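
The counting and thresholding steps above can also be sketched without pandas, which may make the logic easier to follow. The keyword lists here are made up, and the variable names are chosen only for this illustration.

```python
from collections import Counter

# Made-up keyword lists, one per movie.
keyword_lists = [
    ["toy", "friendship", "rivalry"],
    ["board game", "friendship"],
    ["toy", "murder"],
]

# Count every keyword across all movies.
counts = Counter(kw for kws in keyword_lists for kw in kws)

# Keep only keywords that occur more than once.
frequent = {kw for kw, n in counts.items() if n > 1}

# Drop the rare keywords from each movie's list, preserving order.
filtered = [[kw for kw in kws if kw in frequent] for kws in keyword_lists]
```

A keyword that occurs in only one movie cannot link that movie to any other, so removing it shrinks the vocabulary without losing any similarity signal.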

##### Stemming

In [176]:
stemmer = SnowballStemmer('english')

In [177]:
def filter_words(word_list):
    # keep only the keywords that survived the frequency filter (the index of s)
    return [word for word in word_list if word in s]
    

In [178]:
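# keep only the keywords that appear more than once in the dataset
small_meta_data['keywords'] = small_meta_data['keywords'].apply(filter_words)
# reduce each keyword to its stem (e.g. 'friends' becomes 'friend')
small_meta_data['keywords'] = small_meta_data['keywords'].apply(lambda x: [stemmer.stem(i) for i in x])
# lower-case and remove spaces so that multi-word keywords become single tokens
small_meta_data['keywords'] = small_meta_data['keywords'].apply(lambda x: [str.lower(i.replace(" ", "")) for i in x])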
small_meta_data.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,title,video,vote_average,vote_count,cast,crew,keywords,cast_size,crew_size,director
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,Toy Story,False,7.7,5415.0,"[tomhanks, timallen, donrickles]","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...","[jealousi, toy, boy, friendship, friend, rival...",13,106,"[johnlasseter, johnlasseter, johnlasseter]"
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,Jumanji,False,6.9,2413.0,"[robinwilliams, jonathanhyde, kirstendunst]","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...","[boardgam, disappear, basedonchildren'sbook, n...",26,16,"[joejohnston, joejohnston, joejohnston]"
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,Grumpier Old Men,False,6.5,92.0,"[waltermatthau, jacklemmon, ann-margret]","[{'credit_id': '52fe466a9251416c75077a89', 'de...","[fish, bestfriend, duringcreditssting]",7,4,"[howarddeutch, howarddeutch, howarddeutch]"
3,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,Waiting to Exhale,False,6.1,34.0,"[whitneyhouston, angelabassett, lorettadevine]","[{'credit_id': '52fe44779251416c91011acb', 'de...","[basedonnovel, interracialrelationship, single...",10,10,"[forestwhitaker, forestwhitaker, forestwhitaker]"
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,Father of the Bride Part II,False,5.7,173.0,"[stevemartin, dianekeaton, martinshort]","[{'credit_id': '52fe44959251416c75039ed7', 'de...","[babi, midlifecrisi, confid, age, daughter, mo...",12,7,"[charlesshyer, charlesshyer, charlesshyer]"


##### Building a combined column from keywords, cast, director and genres

In [179]:
small_meta_data['genres'] = small_meta_data['genres'].fillna('[]').apply(literal_eval).apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])
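
The `genres` column stores each movie's genres as the string form of a list of dictionaries, so the cell above parses it before pulling out the names. A minimal standalone sketch of that step, assuming a made-up input string and a hypothetical helper called `genre_names` (neither comes from the notebook):

```python
from ast import literal_eval


def genre_names(raw):
    """Turn a stringified list of {'id', 'name'} dicts into a list of names."""
    # hypothetical helper: missing values (None, NaN) are treated as "no genres"
    if not isinstance(raw, str):
        return []
    parsed = literal_eval(raw)
    return [g["name"] for g in parsed] if isinstance(parsed, list) else []


# made-up example in the same shape as the raw 'genres' column
raw = "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"
names = genre_names(raw)
print(names)
```

`literal_eval` only evaluates Python literals, which makes it a safer choice than `eval` for this kind of column.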


In [180]:
small_meta_data['mix_col'] = small_meta_data['keywords'] + small_meta_data['cast'] + small_meta_data['director'] + small_meta_data['genres']
small_meta_data['mix_col'] = small_meta_data['mix_col'].apply(lambda x: ' '.join(x))
small_meta_data.head()


Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,video,vote_average,vote_count,cast,crew,keywords,cast_size,crew_size,director,mix_col
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[Animation, Comedy, Family]",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,False,7.7,5415.0,"[tomhanks, timallen, donrickles]","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...","[jealousi, toy, boy, friendship, friend, rival...",13,106,"[johnlasseter, johnlasseter, johnlasseter]",jealousi toy boy friendship friend rivalri boy...
1,False,,65000000,"[Adventure, Fantasy, Family]",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,False,6.9,2413.0,"[robinwilliams, jonathanhyde, kirstendunst]","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...","[boardgam, disappear, basedonchildren'sbook, n...",26,16,"[joejohnston, joejohnston, joejohnston]",boardgam disappear basedonchildren'sbook newho...
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[Romance, Comedy]",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,False,6.5,92.0,"[waltermatthau, jacklemmon, ann-margret]","[{'credit_id': '52fe466a9251416c75077a89', 'de...","[fish, bestfriend, duringcreditssting]",7,4,"[howarddeutch, howarddeutch, howarddeutch]",fish bestfriend duringcreditssting waltermatth...
3,False,,16000000,"[Comedy, Drama, Romance]",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,False,6.1,34.0,"[whitneyhouston, angelabassett, lorettadevine]","[{'credit_id': '52fe44779251416c91011acb', 'de...","[basedonnovel, interracialrelationship, single...",10,10,"[forestwhitaker, forestwhitaker, forestwhitaker]",basedonnovel interracialrelationship singlemot...
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,[Comedy],,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,False,5.7,173.0,"[stevemartin, dianekeaton, martinshort]","[{'credit_id': '52fe44959251416c75039ed7', 'de...","[babi, midlifecrisi, confid, age, daughter, mo...",12,7,"[charlesshyer, charlesshyer, charlesshyer]",babi midlifecrisi confid age daughter motherda...


##### Count Vectorizer

In [181]:
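# build unigram and bigram count vectors from the combined text
# min_df=1 keeps every term (same effect as the original min_df=0)
count = CountVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1, stop_words='english')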
count_matrix = count.fit_transform(small_meta_data['mix_col'])
count_matrix.shape

(9219, 107377)

In [182]:
# pairwise cosine similarity between all movies
cosine_sim = cosine_similarity(count_matrix, count_matrix)
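
To make the similarity step concrete, here is a small from-scratch sketch of what counting words and taking the cosine between count vectors amounts to. It assumes unigrams only, uses three made-up token strings, and introduces hypothetical helpers (`cosine`, `vectors`); it is not the scikit-learn implementation used above.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# made-up "mix_col" style strings: keywords, cast, director and genres joined by spaces
docs = {
    "movie_a": "batman joker christophernolan crime",
    "movie_b": "batman christophernolan crime origin",
    "movie_c": "toy friendship tomhanks animation",
}
vectors = {name: Counter(text.split()) for name, text in docs.items()}

sim_ab = cosine(vectors["movie_a"], vectors["movie_b"])  # three of four tokens shared
sim_ac = cosine(vectors["movie_a"], vectors["movie_c"])  # no tokens shared
print(sim_ab, sim_ac)
```

Movies that share more tokens (the same director, cast or keywords) get a score closer to 1, and movies with nothing in common get 0.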

##### Printing 30 recommended movies

In [183]:
small_meta_data = small_meta_data.reset_index()
titles = small_meta_data['title']
indices = pd.Series(small_meta_data.index, index=small_meta_data['title'])
len(indices)

9219

In [184]:
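def get_recommendations(title):
    # row position of the requested movie in the similarity matrix
    idx = indices[title]
    print(idx)
    # pair every movie position with its similarity to the requested movie
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # skip the top entry (normally the movie itself) and keep the next 30
    sim_scores = sim_scores[1:31]
    movie_indices = [i[0] for i in sim_scores]
    return titles.iloc[movie_indices]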


In [185]:
get_recommendations('The Dark Knight')

6981


8031                 The Dark Knight Rises
6218                         Batman Begins
6623                          The Prestige
2085                             Following
7648                             Inception
4145                              Insomnia
3381                               Memento
8613                          Interstellar
7659            Batman: Under the Red Hood
1134                        Batman Returns
8927               Kidnapping Mr. Heineken
5943                              Thursday
1260                        Batman & Robin
9024    Batman v Superman: Dawn of Justice
4021                  The Long Good Friday
5809                           Point Blank
7362       Gangster's Paradise: Jerusalema
7561                           Harry Brown
7582                              Defendor
8001                      Batman: Year One
2754                          Death Wish 3
132                         Batman Forever
2131                              Superman
2448       

### Hybrid Recommender

I am going to build a hybrid recommender that combines the techniques I implemented in the content-based and collaborative-filtering engines.

Where:
 1. Input: a user ID and the title of a movie
 2. Output: similar movies, sorted by the rating that particular user is expected to give them.
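
The idea can be sketched in a few lines: take the candidates returned by the content-based step and re-order them by a per-user predicted rating. The sketch below assumes a stand-in `predict_rating` function with made-up numbers in place of a trained collaborative model, and `hybrid_rerank` is a hypothetical name used only for illustration.

```python
def hybrid_rerank(user_id, candidates, predict_rating, top_n=10):
    """Re-rank content-based candidates by the rating predicted for one user."""
    scored = [(title, predict_rating(user_id, title)) for title in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# stand-in for a trained collaborative-filtering model (made-up numbers)
fake_predictions = {
    (1, "Movie A"): 2.9,
    (1, "Movie B"): 3.2,
    (1, "Movie C"): 2.7,
}


def predict_rating(user_id, title):
    # unknown (user, title) pairs fall back to 0.0 in this toy example
    return fake_predictions.get((user_id, title), 0.0)


# candidates as they might come out of the content-based similarity step
candidates = ["Movie A", "Movie B", "Movie C"]
ranked = hybrid_rerank(1, candidates, predict_rating)
print(ranked)
```

The content-based step decides which movies are considered at all, while the collaborative step decides the order in which a given user sees them.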

##### Reading the links_small dataset

In [186]:
link = pd.read_csv("/home/hasan/Downloads/3405_6663_bundle_archive/links_small.csv")[['movieId', 'tmdbId']]
link.head(10)


Unnamed: 0,movieId,tmdbId
0,1,862.0
1,2,8844.0
2,3,15602.0
3,4,31357.0
4,5,11862.0
5,6,949.0
6,7,11860.0
7,8,45325.0
8,9,9091.0
9,10,710.0


In [187]:
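# convert tmdbId values to int; values that cannot be converted become NaN
def conver_int(x):
    try:
        return int(x)
    except (TypeError, ValueError, OverflowError):
        return np.nan

link['tmdbId'] = link['tmdbId'].apply(conver_int)

# renaming the columns so that tmdbId matches the 'id' column of the metadata
link.columns = ['movieId', 'id']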
link.head()

Unnamed: 0,movieId,id
0,1,862.0
1,2,8844.0
2,3,15602.0
3,4,31357.0
4,5,11862.0


In [188]:
# merging link dataset with small_meta_data dataset
merge_dataset = link.merge(small_meta_data[['title', 'id']], on='id').set_index('title')
merge_dataset.head()


Unnamed: 0_level_0,movieId,id
title,Unnamed: 1_level_1,Unnamed: 2_level_1
Toy Story,1,862.0
Jumanji,2,8844.0
Grumpier Old Men,3,15602.0
Waiting to Exhale,4,31357.0
Father of the Bride Part II,5,11862.0


In [189]:
# use the TMDB id as the index
id_as_index = merge_dataset.set_index('id')
id_as_index.head()

Unnamed: 0_level_0,movieId
id,Unnamed: 1_level_1
862.0,1
8844.0,2
15602.0,3
31357.0,4
11862.0,5


In [190]:
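def hybrid_recommender(user_id, title):
    # position of the movie in the cosine similarity matrix
    idx = indices[title]
    # ids of the requested movie (looked up here but not used below)
    tmdbId = merge_dataset.loc[title]['id']
    movie_id = merge_dataset.loc[title]['movieId']

    # 30 most similar movies by content, skipping the top entry (normally the movie itself)
    sim_scores = list(enumerate(cosine_sim[int(idx)]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:31]
    movie_indices = [i[0] for i in sim_scores]

    # copy so that adding the 'est' column does not write into a slice of small_meta_data
    movies = small_meta_data.iloc[movie_indices][['title', 'vote_count', 'vote_average', 'id']].copy()
    # rating that `algo` predicts this user would give each candidate
    movies['est'] = movies['id'].apply(lambda x: algo.predict(user_id, id_as_index.loc[x]['movieId']).est)
    movies = movies.sort_values('est', ascending=False)
    return movies.head(30)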
    
    

In [191]:
# recommended movies for user 1 based on the movie Avatar
hybrid_recommender(1, 'Avatar')

Unnamed: 0,title,vote_count,vote_average,id,est
1011,The Terminator,4208.0,7.4,218,3.192304
522,Terminator 2: Judgment Day,4274.0,7.7,280,3.087389
2014,Fantastic Planet,140.0,7.6,16306,3.03171
974,Aliens,3282.0,7.7,679,2.976445
8658,X-Men: Days of Future Past,6155.0,7.5,127585,2.971079
2834,Predator,2129.0,7.3,106,2.91084
922,The Abyss,822.0,7.1,2756,2.773384
8401,Star Trek Into Darkness,4479.0,7.4,54138,2.729972
1621,Darby O'Gill and the Little People,35.0,6.7,18887,2.718587
344,True Lies,1138.0,6.8,36955,2.700831


In [192]:
# recommended movies for user 3 based on the movie Predator
hybrid_recommender(3, 'Predator')

Unnamed: 0,title,vote_count,vote_average,id,est
6252,Serenity,1287.0,7.4,16320,3.893331
8401,Star Trek Into Darkness,4479.0,7.4,54138,3.63797
8869,Ant-Man,6029.0,7.0,102899,3.564726
1298,The Hunt for Red October,971.0,7.2,1669,3.558154
1117,Star Trek: First Contact,671.0,7.0,199,3.548713
7208,Replicant,93.0,5.0,10596,3.457838
2230,The Thomas Crown Affair,349.0,6.7,913,3.451317
858,Die Hard,4005.0,7.5,562,3.429968
4505,Firefox,141.0,5.5,10724,3.381819
4114,Nomads,20.0,5.3,26725,3.32176


### Simple Recommender Using IMDB's weighted rating formula

I will use IMDB's weighted rating formula to construct my chart. Mathematically, it is represented as follows:

Weighted Rating (WR) = ((v/(v+m))*R)+((m/(v+m))*C)

where,

    v is the number of votes for the movie
    m is the minimum votes required to be listed in the chart
    R is the average rating of the movie
    C is the mean vote across the whole report
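
As a quick check of the formula, the sketch below evaluates it for one hypothetical movie. The numbers are made up for easy arithmetic and are not taken from the dataset, and `wr_formula` is a helper name introduced only for this example.

```python
def wr_formula(v, R, m, C):
    """Weighted rating: blend the movie's own average R with the overall mean C."""
    return (v / (v + m)) * R + (m / (v + m)) * C


# hypothetical movie: 1000 votes, average 8.0, cutoff m = 500, overall mean C = 5.0
wr = wr_formula(v=1000, R=8.0, m=500, C=5.0)
print(round(wr, 2))  # (1000/1500)*8.0 + (500/1500)*5.0 = 7.0

# with no votes the result is C; with far more votes than m it approaches R
wr_no_votes = wr_formula(v=0, R=8.0, m=500, C=5.0)
wr_many_votes = wr_formula(v=1_000_000, R=8.0, m=500, C=5.0)
print(wr_no_votes, round(wr_many_votes, 2))
```

The weights `v/(v+m)` and `m/(v+m)` sum to 1, so a movie with few votes is pulled towards the overall mean, while a heavily voted movie keeps a score close to its own average.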


##### Reading meta_data 

In [193]:
meta_data = pd.read_csv("/home/hasan/Downloads/3405_6663_bundle_archive/movies_metadata.csv")
meta_data.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,1995-12-22,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0
3,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,1995-12-22,81452156.0,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,1995-02-10,76578911.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0


In [194]:
meta_data.shape

(45466, 24)

In [195]:
# keeping only the names from the genres column
meta_data['genres'] = meta_data['genres'].fillna('[]').apply(literal_eval).apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])
meta_data.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[Animation, Comedy, Family]",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[Adventure, Fantasy, Family]",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[Romance, Comedy]",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,1995-12-22,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0
3,False,,16000000,"[Comedy, Drama, Romance]",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,1995-12-22,81452156.0,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,[Comedy],,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,1995-02-10,76578911.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0


##### Calculating total vote and average vote

In [196]:
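# vote counts and vote averages with missing values dropped
vote_counts = meta_data[meta_data['vote_count'].notnull()]['vote_count'].astype('int')
# note: astype('int') truncates each average (7.7 becomes 7) before the mean is taken
vote_averages = meta_data[meta_data['vote_average'].notnull()]['vote_average'].astype('int')
# C: mean vote across all movies
C = vote_averages.mean()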

We will use the 95th percentile of the vote counts as our cutoff. In other words, for a movie to feature in the chart, it must have at least as many votes as 95% of the movies in the list.
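
A small standalone sketch of how such a percentile cutoff behaves, assuming made-up vote counts and a hypothetical `quantile` helper that uses linear interpolation between neighbouring sorted values:

```python
def quantile(values, q):
    """Quantile with linear interpolation between neighbouring sorted values."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    low = int(pos)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (pos - low)


# made-up vote counts for 21 movies: 0, 10, 20, ..., 200
toy_vote_counts = list(range(0, 210, 10))

cutoff = quantile(toy_vote_counts, 0.95)
qualified = [v for v in toy_vote_counts if v >= cutoff]
print(cutoff, qualified)
```

Only the two most-voted movies in this toy list clear the cutoff, which is the point of the filter: the chart is restricted to movies with enough votes for their average to be meaningful.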

In [197]:
# m: minimum vote count, set at the 95th percentile of all vote counts
m = vote_counts.quantile(0.95)
m

434.0

In [198]:
# creating year column (NaN when the release date is missing or cannot be parsed)
meta_data['year'] = pd.to_datetime(meta_data['release_date'], errors='coerce').apply(lambda x: str(x).split('-')[0] if pd.notnull(x) else np.nan)
meta_data.head(3)

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count,year
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[Animation, Comedy, Family]",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0,1995
1,False,,65000000,"[Adventure, Fantasy, Family]",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0,1995
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[Romance, Comedy]",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0,1995


##### Separating necessary columns

In [199]:
Necess_columns = meta_data[(meta_data['vote_count']>=m) & (meta_data['vote_count'].notnull()) & (meta_data['vote_average'].notnull())][['title', 'year', 'vote_count', 'vote_average', 'popularity', 'genres']]
Necess_columns.head()

Unnamed: 0,title,year,vote_count,vote_average,popularity,genres
0,Toy Story,1995,5415.0,7.7,21.9469,"[Animation, Comedy, Family]"
1,Jumanji,1995,2413.0,6.9,17.0155,"[Adventure, Fantasy, Family]"
5,Heat,1995,1886.0,7.7,17.9249,"[Action, Crime, Drama, Thriller]"
9,GoldenEye,1995,1194.0,6.6,14.686,"[Adventure, Action, Thriller]"
15,Casino,1995,1343.0,7.8,10.1374,"[Drama, Crime]"


In [200]:
Necess_columns.shape

(2274, 6)

##### Weighted Rating

In [201]:
def weighted_rating(x):
    v = x['vote_count']
    R = x['vote_average']
    return (v/(v+m)*R) + (m/(m+v)*C)


In [202]:
Necess_columns['weighted_rating'] = Necess_columns.apply(weighted_rating, axis=1)
Necess_columns.sort_values('weighted_rating', ascending=False).head()


Unnamed: 0,title,year,vote_count,vote_average,popularity,genres,weighted_rating
314,The Shawshank Redemption,1994,8358.0,8.5,51.6454,"[Drama, Crime]",8.339318
834,The Godfather,1972,6024.0,8.5,41.1093,"[Drama, Crime]",8.281246
12481,The Dark Knight,2008,12269.0,8.3,123.167,"[Drama, Action, Crime, Thriller]",8.195622
2843,Fight Club,1999,9678.0,8.3,63.8696,[Drama],8.168877
292,Pulp Fiction,1994,8670.0,8.3,140.95,"[Thriller, Crime]",8.154359


In [203]:
# top 20 movies based on weighted_rating
top_20_movie = Necess_columns.sort_values('weighted_rating', ascending=False)
top_20_movie['title'].head(20)

314                           The Shawshank Redemption
834                                      The Godfather
12481                                  The Dark Knight
2843                                        Fight Club
292                                       Pulp Fiction
351                                       Forrest Gump
522                                   Schindler's List
23673                                         Whiplash
15480                                        Inception
1154                           The Empire Strikes Back
5481                                     Spirited Away
22879                                     Interstellar
18465                                 The Intouchables
2211                                 Life Is Beautiful
7000     The Lord of the Rings: The Return of the King
1178                            The Godfather: Part II
289                             Leon: The Professional
256                                          Star Wars
3030      

##### Finding top movies based on Genres

In [204]:
# expand each genre list so that every (movie, genre) pair gets its own row
s = meta_data.apply(lambda x: pd.Series(x['genres']),axis=1).stack().reset_index(level=1, drop=True)
s.name = 'genre'
gen_meta_data = meta_data.drop('genres', axis=1).join(s)
gen_meta_data.head(10)

Unnamed: 0,adult,belongs_to_collection,budget,homepage,id,imdb_id,original_language,original_title,overview,popularity,...,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count,year,genre
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",21.9469,...,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0,1995,Animation
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",21.9469,...,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0,1995,Comedy
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",21.9469,...,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0,1995,Family
1,False,,65000000,,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,17.0155,...,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0,1995,Adventure
1,False,,65000000,,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,17.0155,...,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0,1995,Fantasy
1,False,,65000000,,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,17.0155,...,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0,1995,Family
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,11.7129,...,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0,1995,Romance
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,11.7129,...,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0,1995,Comedy
3,False,,16000000,,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",3.85949,...,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0,1995,Comedy
3,False,,16000000,,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",3.85949,...,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0,1995,Drama


In [205]:
def based_genres(genre, percentile=.90):
    # movies that belong to the requested genre
    genre_df = gen_meta_data[gen_meta_data['genre'] == genre]
    vote_count = genre_df[genre_df['vote_count'].notnull()]['vote_count'].astype('int')
    vote_avg = genre_df[genre_df['vote_average'].notnull()]['vote_average'].astype('int')
    # C: mean vote within the genre; m: vote-count cutoff at the given percentile
    C = vote_avg.mean()
    m = vote_count.quantile(percentile)

    # keep only movies with at least m votes
    selected_col = genre_df[(genre_df['vote_count']>=m) & (genre_df['vote_count'].notnull()) & (genre_df['vote_average'].notnull())][['title', 'year', 'vote_count', 'vote_average', 'popularity']]
    selected_col['vote_count'] = selected_col['vote_count'].astype('int')
    selected_col['vote_average'] = selected_col['vote_average'].astype('int')

    # weighted rating computed with the genre-level m and C
    selected_col['weighted_ratio'] = selected_col.apply(lambda x: (x['vote_count']/(x['vote_count'] + m) * x['vote_average']) + (m/(m+x['vote_count']) * C), axis=1 )
    selected_col = selected_col.sort_values('weighted_ratio', ascending=False).head(20)
    return selected_col
    

In [206]:
# 20 best romance movies
based_genres('Romance')

Unnamed: 0,title,year,vote_count,vote_average,popularity,weighted_ratio
10309,Dilwale Dulhania Le Jayenge,1995,661,9,34.457,8.271289
351,Forrest Gump,1994,8147,8,48.3072,7.947552
876,Vertigo,1958,1162,8,18.2082,7.672104
40251,Your Name.,2016,1030,8,34.461252,7.635975
883,Some Like It Hot,1959,835,8,11.8451,7.565203
1132,Cinema Paradiso,1988,834,8,14.177,7.564769
19901,Paperman,2012,734,8,7.19863,7.516517
37863,Sing Street,2016,669,8,10.672862,7.478971
882,The Apartment,1960,498,8,11.9943,7.345193
38718,The Handmaiden,2016,453,8,16.727405,7.297743


In [207]:
# 20 best family movies
based_genres('Family')

Unnamed: 0,title,year,vote_count,vote_average,popularity,weighted_ratio
1225,Back to the Future,1985,6239,8,25.7785,7.799689
359,The Lion King,1994,5520,8,21.6058,7.775802
5481,Spirited Away,2001,3968,8,41.0489,7.698086
5833,My Neighbor Totoro,1988,1730,8,13.5073,7.396347
926,It's a Wonderful Life,1946,1103,8,15.0316,7.161596
19901,Paperman,2012,734,8,7.19863,6.912765
4766,Harry Potter and the Philosopher's Stone,2001,7188,7,38.1872,6.890551
13724,Up,2009,7048,7,19.3309,6.888524
30315,Inside Out,2015,6737,7,23.9856,6.883739
15472,Despicable Me,2010,6595,7,22.2745,6.881416
