# Movies Recommendation System

Using the movies in the *Entire Dataset*, I will construct a Simple Recommender.

In [1]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from ast import literal_eval
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity
from nltk.stem.snowball import SnowballStemmer
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.corpus import wordnet
from surprise import SVD, Reader
from surprise import Dataset
from surprise.model_selection import cross_validate
import warnings; warnings.simplefilter('ignore')
from surprise.model_selection import train_test_split

## Simple Recommender

The Simple Recommender offers generalized recommnendations to every user based on movie popularity and (sometimes) genre. The basic idea behind this recommender is that movies that are more popular and more critically acclaimed will have a higher probability of being liked by the average audience. This model does not give personalized recommendations based on the user. 

The implementation of this model is extremely trivial. All we have to do is sort our movies based on ratings and popularity and display the top movies of our list. As an added step, we can pass in a genre argument to get the top movies of a particular genre.

In [2]:
mm = pd. read_csv('/kaggle/input/the-movies-dataset/movies_metadata.csv')
mm.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,1995-12-22,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0
3,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,1995-12-22,81452156.0,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,1995-02-10,76578911.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0


In [3]:
mm['genres'] = mm['genres'].fillna('[]').apply(literal_eval).apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])

An IMDB weighted rating method will be used to construct the list.

Rating Weighted (WR) = (vv+m.R) + (mv+m.C)

Finding a suitable figure for m—the minimum number of votes needed to be included in the chart—is the next step. Our cutoff will be the 95th percentile. Put otherwise, a film needs to receive more votes than at least 95% of the other films on the list in order to be included in the charts.

In [4]:
vote_counts = mm[mm['vote_count'].notnull()]['vote_count'].astype('int')
vote_averages = mm[mm['vote_average'].notnull()]['vote_average'].astype('int')
C = vote_averages.mean()

m = vote_counts.quantile(0.95)
print(C,m)

5.244896612406511 434.0


Average rating for a movie on TMDB is 5.244

In [5]:
mm['year'] = pd.to_datetime(mm['release_date'], errors='coerce').apply(lambda x: str(x).split('-')[0] if x != np.nan else np.nan)

In [6]:
top = mm[(mm['vote_count'] >= m) & (mm['vote_count'].notnull()) & (mm['vote_average'].notnull())][['title', 'year', 'vote_count', 'vote_average', 'popularity', 'genres']]
top['vote_count'] = top['vote_count'].astype('int')
top['vote_average'] = top['vote_average'].astype('int')
top.shape

def weighted_rating(x):
    v = x['vote_count']
    R = x['vote_average']
    return (v/(v+m) * R) + (m/(m+v) * C)

top['wr'] = top.apply(weighted_rating, axis=1)

top = top.sort_values('wr', ascending=False).head(250)

In [7]:
top.head()

Unnamed: 0,title,year,vote_count,vote_average,popularity,genres,wr
15480,Inception,2010,14075,8,29.108149,"[Action, Thriller, Science Fiction, Mystery, A...",7.917588
12481,The Dark Knight,2008,12269,8,123.167259,"[Drama, Action, Crime, Thriller]",7.905871
22879,Interstellar,2014,11187,8,32.213481,"[Adventure, Drama, Science Fiction]",7.897107
2843,Fight Club,1999,9678,8,63.869599,[Drama],7.881753
4863,The Lord of the Rings: The Fellowship of the Ring,2001,8892,8,32.070725,"[Adventure, Fantasy, Action]",7.871787


Now let's design the function that creates charts for specific genres.

In [8]:
s = mm.apply(lambda x: pd.Series(x['genres']),axis=1).stack().reset_index(level=1, drop=True)
s.name = 'genre'
gen_mm = mm.drop('genres', axis=1).join(s)

In [9]:
def topmovies(genre, percentile=0.85):
    df = gen_mm[gen_mm['genre'] == genre]
    vote_counts = df[df['vote_count'].notnull()]['vote_count'].astype('int')
    vote_averages = df[df['vote_average'].notnull()]['vote_average'].astype('int')
    C = vote_averages.mean()
    m = vote_counts.quantile(percentile)
    
    top = df[(df['vote_count'] >= m) & (df['vote_count'].notnull()) & (df['vote_average'].notnull())][['title', 'year', 'vote_count', 'vote_average', 'popularity']]
    top['vote_count'] = top['vote_count'].astype('int')
    top['vote_average'] = top['vote_average'].astype('int')
    
    top['wr'] = top.apply(lambda x: (x['vote_count']/(x['vote_count']+m) * x['vote_average']) + (m/(m+x['vote_count']) * C), axis=1)
    top = top.sort_values('wr', ascending=False).head(250)
    
    return top

### Top Comedy Movies

In [10]:
topmovies('Comedy').head(15)

Unnamed: 0,title,year,vote_count,vote_average,popularity,wr
10309,Dilwale Dulhania Le Jayenge,1995,661,9,34.457024,8.463024
351,Forrest Gump,1994,8147,8,48.307194,7.963363
1225,Back to the Future,1985,6239,8,25.778509,7.952358
18465,The Intouchables,2011,5410,8,16.086919,7.945207
22841,The Grand Budapest Hotel,2014,4644,8,14.442048,7.936384
2211,Life Is Beautiful,1997,3643,8,39.39497,7.91943
732,Dr. Strangelove or: How I Learned to Stop Worr...,1964,1472,8,9.80398,7.809073
3342,Modern Times,1936,881,8,8.159556,7.695554
883,Some Like It Hot,1959,835,8,11.845107,7.680781
1236,The Great Dictator,1940,756,8,9.241748,7.651762


## Content Based Recommender

In this section, we'll build two content-based recommenders: one based on movie descriptions and taglines, and the other based on movie metadata such as cast, crew, genre, and keywords. These engines will use the textual and categorical features of movies to recommend similar ones to users.

I will build two Content Based Recommenders based on:
* Movie Overviews and Taglines
* Movie Cast, Crew, Keywords and Genre

Also I will be using the small dataset because of limited resources

In [11]:
links = pd.read_csv('/kaggle/input/the-movies-dataset/links_small.csv')
links = links[links['tmdbId'].notnull()]['tmdbId'].astype('int')

In [12]:
mm = mm.drop([19730, 29503, 35587])

In [13]:
mm['id'] = mm['id'].astype('int')

In [14]:
nmm = mm[mm['id'].isin(links)]
nmm.shape

(9099, 25)

### Movie Description Based Recommender

Let us first try to build a recommender using movie descriptions and taglines. We do not have a quantitative metric to judge our machine's performance so this will have to be done qualitatively.

In [15]:
nmm['tagline'] = nmm['tagline'].fillna('')
nmm['description'] = nmm['overview'] + nmm['tagline']
nmm['description'] = nmm['description'].fillna('')

In [16]:
tf = TfidfVectorizer(analyzer='word',ngram_range=(1, 2),min_df=0, stop_words='english')
tfidf_matrix = tf.fit_transform(nmm['description'])

In [17]:
tfidf_matrix.shape

(9099, 268124)

### Cosine Similarity

To get a numerical value that represents how similar two films are, I'll be utilising the Cosine Similarity. The mathematical definition is as follows:

$cosine(x,y) = \frac{x.y^\intercal}{||x||.||y||} $

The Dot Product calculation will yield the Cosine Similarity Score since we utilised the TF-IDF Vectorizer. Hence, we will utilise sklearn's **linear_kernel** instead of cosine_similarities because it is considerably faster.

In [18]:
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

In [19]:
cosine_sim[0]

array([1.        , 0.00680476, 0.        , ..., 0.        , 0.00344913,
       0.        ])

In [20]:
nmm = nmm.reset_index()
titles = nmm['title']
indices = pd.Series(nmm.index, index=nmm['title'])

In [21]:
def get_recommendations(title):
    idx = indices[title]
    dmm_scores = list(enumerate(cosine_sim[idx]))
    dmm_scores = [(i, score) for i, score in dmm_scores]  # Ensure each element is a tuple
    dmm_scores = np.array(dmm_scores)  # Convert to NumPy array
    dmm_scores = sorted(dmm_scores, key=lambda x: x[1], reverse=True)
    dmm_scores = dmm_scores[1:31]
    movie_indices = [i[0] for i in dmm_scores]
    return mm['title'].iloc[movie_indices]


In [22]:
get_recommendations('Pulp Fiction').head(5)

7977                For Queen & Country
6974    The Killing of a Chinese Bookie
2935                              Dogma
6734                             Sylvia
240                    Gumby: The Movie
Name: title, dtype: object

In [23]:
get_recommendations('The Notebook').head(5)

1577                                 Phantoms
6971                              Naked Lunch
7949                   Fist of the North Star
5440           Betty Fisher and Other Stories
3804    Abbott and Costello Meet Frankenstein
Name: title, dtype: object

### Metadata Based Recommender

To build our standard metadata based content recommender, we will need to merge our current dataset with the crew and the keyword datasets. Let us prepare this data as our first step.

In [24]:
credits = pd.read_csv('/kaggle/input/the-movies-dataset/credits.csv')
keywords = pd.read_csv('/kaggle/input/the-movies-dataset/keywords.csv')

In [25]:
keywords['id'] = keywords['id'].astype('int')
credits['id'] = credits['id'].astype('int')
mm['id'] = mm['id'].astype('int')

In [26]:
mm.shape

(45463, 25)

In [27]:
mm = mm.merge(credits, on='id')
mm = mm.merge(keywords, on='id')

In [28]:
nmm = mm[mm['id'].isin(links)]
nmm.shape

(9219, 28)

In [29]:
nmm['cast'] = nmm['cast'].apply(literal_eval)
nmm['crew'] = nmm['crew'].apply(literal_eval)
nmm['keywords'] = nmm['keywords'].apply(literal_eval)
nmm['cast_size'] = nmm['cast'].apply(lambda x: len(x))
nmm['crew_size'] = nmm['crew'].apply(lambda x: len(x))

In [30]:
def get_director(x):
    for i in x:
        if i['job'] == 'Director':
            return i['name']
    return np.nan

In [31]:
nmm['director'] = nmm['crew'].apply(get_director)

In [32]:
nmm['cast'] = nmm['cast'].apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])
nmm['cast'] = nmm['cast'].apply(lambda x: x[:3] if len(x) >=3 else x)

In [33]:
nmm['keywords'] = nmm['keywords'].apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])

In [34]:
nmm['cast'] = nmm['cast'].apply(lambda x: [str.lower(i.replace(" ", "")) for i in x])

In [35]:
nmm['director'] = nmm['director'].astype('str').apply(lambda x: str.lower(x.replace(" ", "")))
nmm['director'] = nmm['director'].apply(lambda x: [x,x, x])

In [36]:
s = nmm.apply(lambda x: pd.Series(x['keywords']),axis=1).stack().reset_index(level=1, drop=True)
s.name = 'keyword'

In [37]:
s = s.value_counts()
s[:5]

keyword
independent film        610
woman director          550
murder                  399
duringcreditsstinger    327
based on novel          318
Name: count, dtype: int64

Keywords occur in frequencies ranging from 1 to 610. We do not have any use for keywords that occur only once. Therefore, these can be safely removed. Finally, we will convert every word to its stem so that words such as *Dogs* and *Dog* are considered the same.

In [38]:
s = s[s > 1]

In [39]:
stemmer = SnowballStemmer('english')
stemmer.stem('dogs')

'dog'

In [40]:
def filter_keywords(x):
    words = []
    for i in x:
        if i in s:
            words.append(i)
    return words

In [41]:
nmm['keywords'] = nmm['keywords'].apply(filter_keywords)
nmm['keywords'] = nmm['keywords'].apply(lambda x: [stemmer.stem(i) for i in x])
nmm['keywords'] = nmm['keywords'].apply(lambda x: [str.lower(i.replace(" ", "")) for i in x])

In [42]:
nmm['soup'] = nmm['keywords'] + nmm['cast'] + nmm['director'] + nmm['genres']
nmm['soup'] = nmm['soup'].apply(lambda x: ' '.join(x))

In [43]:
count = CountVectorizer(analyzer='word',ngram_range=(1, 2),min_df=0, stop_words='english')
count_matrix = count.fit_transform(nmm['soup'])

In [44]:
cosine_sim = cosine_similarity(count_matrix, count_matrix)

In [45]:
nmm = nmm.reset_index()
titles = nmm['title']
indices = pd.Series(nmm.index, index=nmm['title'])

In [46]:
get_recommendations('Pulp Fiction').head(10)

1373                                  Thieves
8877                              Yellowbeard
5172                              High Crimes
886                          The Gay Divorcee
4875    Morgan: A Suitable Case for Treatment
7240                         Dawn of the Dead
6752                               Wonderland
8266                    Lightning in a Bottle
4567                          Three Fugitives
4736                             Corky Romano
Name: title, dtype: object

In [47]:
get_recommendations('The Notebook').head(10)

1281        Kids of Survival
7297             Jersey Girl
6639            Bubba Ho-tep
8609               The Prize
3964         A Monkey's Tale
3998    Flowers in the Attic
7191      Goodbye, Mr. Chips
3247     A Raisin in the Sun
4330            The Big Boss
2931              Spaceballs
Name: title, dtype: object

We noticed that our recommendation algorithm recommends films independent of rating or popularity.

As a result, we will include a system for removing terrible films and returning popular films that have received positive critical reviews.

I'll pick the top 25 movies based on similarity scores and compute the vote for the 60th percentile movie. Then, using this as the value of $m$, we will compute the weighted rating of each movie using ImmB's algorithm, as we did in the Simple Recommender section.

In [48]:
def improved_recommendations(title):
    idx = indices[title]
    dmm_scores = list(enumerate(cosine_sim[idx]))
    dmm_scores = sorted(dmm_scores, key=lambda x: x[1], reverse=True)
    dmm_scores = dmm_scores[1:26]
    movie_indices = [i[0] for i in dmm_scores]
    
    movies = nmm.iloc[movie_indices][['title', 'vote_count', 'vote_average', 'year']]
    vote_counts = movies[movies['vote_count'].notnull()]['vote_count'].astype('int')
    vote_averages = movies[movies['vote_average'].notnull()]['vote_average'].astype('int')
    C = vote_averages.mean()
    m = vote_counts.quantile(0.60)
    top = movies[(movies['vote_count'] >= m) & (movies['vote_count'].notnull()) & (movies['vote_average'].notnull())]
    top['vote_count'] = top['vote_count'].astype('int')
    top['vote_average'] = top['vote_average'].astype('int')
    top['wr'] = top.apply(weighted_rating, axis=1)
    top = top.sort_values('wr', ascending=False).head(10)
    return top

In [49]:
improved_recommendations('Pulp Fiction')

Unnamed: 0,title,vote_count,vote_average,year,wr
886,Reservoir Dogs,3821,8,1992,7.718986
8266,Django Unchained,10297,7,2012,6.929017
7240,Inglourious Basterds,6598,7,2009,6.891679
4875,Kill Bill: Vol. 1,5091,7,2003,6.862133
8877,The Hateful Eight,4405,7,2015,6.842588
5172,Kill Bill: Vol. 2,4061,7,2004,6.830542
1373,Jackie Brown,1580,7,1997,6.62179
65,From Dusk Till Dawn,1644,6,1996,5.842293
6752,Death Proof,1359,6,2007,5.817225
4736,S.W.A.T.,780,5,2003,5.08755


In [50]:
improved_recommendations('The Notebook')

Unnamed: 0,title,vote_count,vote_average,year,wr
726,Breakfast at Tiffany's,1082,7,1961,6.49755
5507,Before Sunset,734,7,2004,6.347847
7297,My Sister's Keeper,614,7,2009,6.273173
3964,John Q,604,7,2002,6.266171
4330,Frida,397,7,2002,6.083376
97,The Bridges of Madison County,397,7,1995,6.083376
8609,The Other Woman,1467,6,2014,5.827609
1256,My Best Friend's Wedding,606,6,1997,5.68489
6639,Alpha Dog,463,6,2006,5.634655
3882,Kate & Leopold,430,6,2001,5.6207


## Collaborative Filtering

There are some serious issues with our content-based engine. It can only propose movies that are *close* to a specific movie. That is, it is incapable of recording preferences and making recommendations across genres.

Furthermore, the engine that we created is not truly personalised in the sense that it does not capture a user's specific preferences and biases. Anyone who requests our system for recommendations based on a movie will receive the same recommendations, regardless of their identity.

In this part, we will employ a technique known as **Collaborative Filtering** to provide suggestions to moviegoers. Collaborative Filtering is based on the premise that users who are similar to me may estimate how much I will appreciate a product or service that they have used/experienced but I have not.

I'll utilise the **Surprise** library, instead starting from scratch with Collaborative Filtering as Surprise employs strong techniques such as **Singular Value Decomposition (SVD)** to reduce RMSE (Root Mean Square Error) and provide excellent recommendations.

In [51]:
reader = Reader()

In [52]:
ratings = pd.read_csv('/kaggle/input/the-movies-dataset/ratings_small.csv')
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205


In [53]:
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)
trainset, testset = train_test_split(data, test_size=0.2)
svd = SVD()
results = cross_validate(svd, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
print(results)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9041  0.8977  0.9017  0.8914  0.8943  0.8978  0.0046  
MAE (testset)     0.6967  0.6878  0.6944  0.6871  0.6881  0.6908  0.0039  
Fit time          1.24    1.22    1.23    1.24    1.26    1.24    0.01    
Test time         0.34    0.11    0.12    0.11    0.12    0.16    0.09    
{'test_rmse': array([0.90409788, 0.89767953, 0.90170259, 0.89143452, 0.89430605]), 'test_mae': array([0.696673  , 0.6877754 , 0.69439198, 0.68712758, 0.68807062]), 'fit_time': (1.238638162612915, 1.2233705520629883, 1.2347919940948486, 1.2425897121429443, 1.2580173015594482), 'test_time': (0.34325170516967773, 0.11249566078186035, 0.1221764087677002, 0.11369490623474121, 0.11906933784484863)}


Now to train on our dataset

In [54]:
trainset, testset = train_test_split(data, test_size=0.2)

svd = SVD()

svd.fit(trainset)

predictions = svd.test(testset)

from surprise import accuracy
accuracy.rmse(predictions)
accuracy.mae(predictions)

RMSE: 0.8943
MAE:  0.6901


0.6901104173337365

Let's select a user and look at their ratings.

In [55]:
ratings[ratings['userId'] == 1]

Unnamed: 0,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205
5,1,1263,2.0,1260759151
6,1,1287,2.0,1260759187
7,1,1293,2.0,1260759148
8,1,1339,3.5,1260759125
9,1,1343,2.0,1260759131


In [56]:
svd.predict(1, 302, 3)

Prediction(uid=1, iid=302, r_ui=3, est=2.83489595952931, details={'was_impossible': False})

This recommender system's most surprising characteristic is that it doesn't care what the movie is (or contains). It operates only on the basis of an allocated movie ID, attempting to anticipate ratings based on how other users have forecasted the film.

I plan to create a basic hybrid recommender that combines techniques from our content-based and collaborative filter-based engines. Here's how it will function:

* **Input:** User ID and movie title
* **Output:** Similar movies arranged by expected user ratings.

In [57]:
def convert_int(x):
    try:
        return int(x)
    except:
        return np.nan

In [58]:
id_map = pd.read_csv('/kaggle/input/the-movies-dataset/links_small.csv')[['movieId', 'tmdbId']]
id_map['tmdbId'] = id_map['tmdbId'].apply(convert_int)
id_map.columns = ['movieId', 'id']
id_map = id_map.merge(nmm[['title', 'id']], on='id').set_index('title')
#id_map = id_map.set_index('immbId')

In [59]:
indices_map = id_map.set_index('id')

In [60]:
def hybrid(userId, title):
    idx = indices[title]
    immbId = id_map.loc[title]['id']
    #print(idx)
    movie_id = id_map.loc[title]['movieId']
    
    dmm_scores = list(enumerate(cosine_sim[int(idx)]))
    dmm_scores = sorted(dmm_scores, key=lambda x: x[1], reverse=True)
    dmm_scores = dmm_scores[1:26]
    movie_indices = [i[0] for i in dmm_scores]
    
    movies = nmm.iloc[movie_indices][['title', 'vote_count', 'vote_average', 'year', 'id']]
    movies['est'] = movies['id'].apply(lambda x: svd.predict(userId, indices_map.loc[x]['movieId']).est)
    movies = movies.sort_values('est', ascending=False)
    return movies.head(10)

In [61]:
hybrid(1, 'Nausicaä of the Valley of the Wind')

Unnamed: 0,title,vote_count,vote_average,year,id,est
6071,Howl's Moving Castle,2049.0,8.2,2004,4935,3.504338
4245,Spirited Away,3968.0,8.3,2001,129,3.301378
4428,My Neighbor Totoro,1730.0,8.0,1988,8392,3.195249
2417,Princess Mononoke,2041.0,8.2,1997,128,3.182531
7167,Ponyo,953.0,7.5,2008,12429,2.922096
5879,Porco Rosso,563.0,7.6,1992,11621,2.89462
4618,Castle in the Sky,877.0,7.8,1986,10515,2.886654
8427,The Wind Rises,720.0,7.7,2013,149870,2.781065
8494,One Piece Film Strong World,68.0,7.4,2009,41498,2.77866
3983,Vampire Hunter D: Bloodlust,92.0,7.0,2000,15999,2.742469


In [62]:
hybrid(500, 'Nausicaä of the Valley of the Wind')

Unnamed: 0,title,vote_count,vote_average,year,id,est
4245,Spirited Away,3968.0,8.3,2001,129,3.826579
6071,Howl's Moving Castle,2049.0,8.2,2004,4935,3.60646
4428,My Neighbor Totoro,1730.0,8.0,1988,8392,3.590125
4618,Castle in the Sky,877.0,7.8,1986,10515,3.414915
5857,Kiki's Delivery Service,768.0,7.6,1989,16859,3.410199
5879,Porco Rosso,563.0,7.6,1992,11621,3.393016
7268,Fullmetal Alchemist the Movie: Conqueror of Sh...,74.0,7.0,2005,14003,3.307907
7167,Ponyo,953.0,7.5,2008,12429,3.282658
6271,Final Fantasy VII: Advent Children,290.0,6.7,2005,647,3.265512
8427,The Wind Rises,720.0,7.7,2013,149870,3.26476


We can observe that our hybrid recommender generates various recommendations for different users, even though the movie is the same. As a result, our recommendations are more personalised and targeted towards specific consumers.