# Recommender Systems

### Learning Objectives:
- [Introduction: Simple Recommender Systems](#Introduction:-Simple-Recommeder-Systems)
- [Offline & Online Evaluation](#Offline-&-Online-Evaluation)
- [Content-based Recommenders](#Content\-based-Recommenders)
- [Collaborative-filtering](#Collaborative\-filtering)
- [Hybrid Systems](#Hybrid-Systems)


# Introduction: Simple Recommender Systems

__Recommender systems__, also referred to as __recommendation systems__, are filtering systems used by many different companies world-wide to be able to recommend products (e.g. movies, clothes, etc) based on user preferences. We obviously cannot recommend _exactly_ what a user wants as we cannot access or process all the information in their brain at the same time. Instead, we can take advantage or users'
past ratings and preferences to __predict__ the ratings of products users had not purchased/rated before and use these to estimate user preferences.

How do these systems do what they do? This is question that has become a large topic of research and the current answer is that there are mutliple ways to create recommender systems: each working under different assumptions and algorithms. There are two main broad classifications that we will cover shortly: __content-based recommendation__ (item-centred) and __collaborative filtering__ (user-centred). 

Before we dive into these, we will cover what is informally referred to as a simple recommender system: a system that uses the weighted average rating from all users to make recommendations on the "best" options. Throughout this notebook, we will use the "Movies Dataset" from [Kaggle](https://www.kaggle.com/rounakbanik/the-movies-dataset), where the full version contains information on over 45,000 movies with 26 million ratings from  270,000 users. We will be using the small version, as shown below:

In [1]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go

In [2]:
# Importing ratings
ratings = pd.read_csv("../DATA/ratings_small.csv")
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205


In [3]:
# Importing movie metadata
metadata = pd.read_csv("../DATA/movies_metadata.csv")
metadata.head()

  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,


Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,1995-12-22,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0
3,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,1995-12-22,81452156.0,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,1995-02-10,76578911.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0


For our simple recommender system, we will use the IMDB's known __weighted average formula__ used for their Top Movies Chart, given as follows:

$$ R_{W} = (\frac{v}{v + m})R + (\frac{m}{v + m})C  $$

Where:
- $R_{W}$ is the weighted average movie rating
- $v$ is the number of votes for that movie title
- $m$ is the minimum number of votes required to be in the top Chart
- $R$ is the average rating of that movie title
- $C$ is the mean vote rating across all movies

We can now begin our calculations to construct our simple recommender:

In [4]:
# Computing mean vote count across all movies
vote_counts = metadata[metadata['vote_count'].notnull()]['vote_count'].astype('int')
vote_averages = metadata[metadata['vote_average'].notnull()]['vote_average'].astype('int')
C = vote_averages.mean()
C

5.244896612406511

We must now choose a value for the minimum number. In this case, we will choose a value $m$ that gives us movies that have received more votes than 95% of the other remaining movies.

In [5]:
# Computing minimum number of votes required
m = vote_counts.quantile(0.95)
print(m)

434.0


We can now extract the movies that are considered to be canditates for the top charts in our recommender system given our computed 'm'.

In [6]:
# Extracting all movies that have a votecount that is greater than our m value
qualified = metadata[(metadata['vote_count'] >= m) & (metadata['vote_count'].notnull()) & (metadata['vote_average'].notnull())] \
                 [['title', 'release_date', 'vote_count', 'vote_average', 'popularity', 'genres']]
qualified['vote_count'] = qualified['vote_count'].astype('int')
qualified['vote_average'] = qualified['vote_average'].astype('int')
qualified.shape

(2274, 6)

In [7]:
# Computing weighted average and determining top 250 chart
def weighted_rating(x):
    v = x['vote_count']
    R = x['vote_average']
    return (v/(v+m) * R) + (m/(m+v) * C)

qualified['weighted_average'] = qualified.apply(weighted_rating, axis=1)
qualified = qualified.sort_values('weighted_average', ascending=False).head(250)
qualified

Unnamed: 0,title,release_date,vote_count,vote_average,popularity,genres,weighted_average
15480,Inception,2010-07-14,14075,8,29.1081,"[{'id': 28, 'name': 'Action'}, {'id': 53, 'nam...",7.917588
12481,The Dark Knight,2008-07-16,12269,8,123.167,"[{'id': 18, 'name': 'Drama'}, {'id': 28, 'name...",7.905871
22879,Interstellar,2014-11-05,11187,8,32.2135,"[{'id': 12, 'name': 'Adventure'}, {'id': 18, '...",7.897107
2843,Fight Club,1999-10-15,9678,8,63.8696,"[{'id': 18, 'name': 'Drama'}]",7.881753
4863,The Lord of the Rings: The Fellowship of the Ring,2001-12-18,8892,8,32.0707,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",7.871787
...,...,...,...,...,...,...,...
2006,Indiana Jones and the Temple of Doom,1984-05-23,2841,7,15.8023,"[{'id': 12, 'name': 'Adventure'}, {'id': 28, '...",6.767415
16129,The King's Speech,2010-09-06,2817,7,11.2604,"[{'id': 18, 'name': 'Drama'}, {'id': 36, 'name...",6.765698
895,Sunset Boulevard,1950-08-10,533,8,11.7098,"[{'id': 18, 'name': 'Drama'}]",6.763480
9888,Sin City,2005-04-01,2755,7,15.0105,"[{'id': 28, 'name': 'Action'}, {'id': 53, 'nam...",6.761143


Great, so now we have the top charts. This chart can be carried further to become a simple recommender system by recommending the top movies in the charts to all users. Is this a good recommender system? Not particularly. By taking a global weighted average we are able to determine which ones are considered the best on average, but we are unable to account for the individual preferences of the users. For instance, if I was a fan of exclusively Romcoms, I would be recommended only movies I dislike from the list above. The same would happen to people who really don't like action and thrillers. Therefore, we must use our data to instead be able to account for individual preferences! Therefore, simple recommender systems like this that generate global recommendations are generally only used for users who the system has collected little data from.

By accounting for individual user preferences, we would likely achieve a higher score. But how can we determine which system is better? This leads us to the two methods of recommender system evaluation: __offline evaluation__ and __online evaluation.__

# Offline & Online Evaluation

How can we tell that our recommender system is doing what it is supposed to? There are two different approaches to evaluating our system:

- __Offline evaluation:__ Use data we already have and evaluation metrics to compute numeric efectiveness measures that can be tuned for and/or compared. These are the same evaluation metrics which we have encountered and used to assess the performance of our models
- __Online evaluation:__  involves using a live system, and tracking user-related behaviors such as dwell-times, click-through rates, and purchase conversions

When carrying offline evaluation, we can split our data into a training and a test dataset just as we have seen before to ensure that we are tuning our systems appropriately. On the other hand, online evaluation enables us to capture aspects of the performance of our system that offline methods cannot. Whether offline evaluation, online evaluation or a combination of both is the best method to evaluate our system's performance still remains a topic of research. For the purposes of this notebook, we will only be covering simplistic forms of offline evaluation.

# Content-based Recommenders
We can now begin to understand the first sub-class of recommendation systems: __content-based recommenders.__ Let us look at the recommendation problem in the context of our movies dataset. It is intuitive to say that we would like to recommend romance movies to someone that has rated other romantic movies highly as opposed to action, or to recommend older films to users to who are fans of old classics, or even Batman movies to a Batman fan. In this context, we are looking at the characteristics (content) of each movie, and recommending movies that are similar to the previously highly rated movies by the same user. 

There are multiple approaches for the machinery of content-based recommenders. Most will either use the features of movies to predict whether you like or dislike a movie (classification) or to predict the rating the user would give to a movie they have not yet seen. Some might even use the features of a movie you have just watched and recommend the most similar movies to that given movie given their respective features. We will be creating our own algorithm to predict the ratings of unseen movies and recommend those that are rated the highest.

In modelling terms, we can frame the problem of recommendation as using the __features__ of movies watched by a user and the ratings given to each movie to __predict__ the rating the user would give to a movie not yet watched based on the movie's features. In other words, if we have enough data, we train a model for each user based on the previously watched movies and their features. The features we have chosen to use in our model are given genres, vote average, release date and runtime. While release date, runtime and vote average are available, we have to extract and process the genres for each movie. Given that we have a list of genres for each movie, we will have to dummy encode it as follows:
- Determine how many genres there are and make each genre a feature of the model
- Assign to each genre a zero if it is not present in the movie's list of genres, or 1 for each genre given each movie 


In [8]:
# Importing our data
links_small = pd.read_csv("../DATA/links_small.csv")
ratings_small = pd.read_csv("../DATA/ratings_small.csv")
md = pd.read_csv("../DATA/movies_metadata.csv")

  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,


In [9]:
# Displaying our data
md
# ratings_small.head()
#links_small

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,1995-12-22,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0
3,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,1995-12-22,81452156.0,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,1995-02-10,76578911.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45461,False,,0,"[{'id': 18, 'name': 'Drama'}, {'id': 10751, 'n...",http://www.imdb.com/title/tt6209470/,439050,tt6209470,fa,رگ خواب,Rising and falling between a man and woman.,...,,0.0,90.0,"[{'iso_639_1': 'fa', 'name': 'فارسی'}]",Released,Rising and falling between a man and woman,Subdue,False,4.0,1.0
45462,False,,0,"[{'id': 18, 'name': 'Drama'}]",,111109,tt2028550,tl,Siglo ng Pagluluwal,An artist struggles to finish his work while a...,...,2011-11-17,0.0,360.0,"[{'iso_639_1': 'tl', 'name': ''}]",Released,,Century of Birthing,False,9.0,3.0
45463,False,,0,"[{'id': 28, 'name': 'Action'}, {'id': 18, 'nam...",,67758,tt0303758,en,Betrayal,"When one of her hits goes wrong, a professiona...",...,2003-08-01,0.0,90.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,A deadly game of wits.,Betrayal,False,3.8,6.0
45464,False,,0,[],,227506,tt0008536,en,Satana likuyushchiy,"In a small town live two brothers, one a minis...",...,1917-10-21,0.0,87.0,[],Released,,Satan Triumphant,False,0.0,0.0


In [10]:
# Making sure imdb_id matches between links and metadata
md["imdb_id"] = md["imdb_id"].str.strip("tt")

# Removing all movies without a genre, release_date or runtime
md["genres"].replace('[]', np.nan, inplace=True)
md.dropna(subset=["genres", "release_date", "imdb_id", "vote_average", "popularity"], inplace=True)

# Converting release_date to a POSIX timestamp float
md["release_date"] = pd.to_datetime(md["release_date"], infer_datetime_format=True)
md["release_date"] = md["release_date"].apply(lambda x:x.timestamp())

# Converting imdb_id to int
md["imdb_id"] = md["imdb_id"].astype('int64')

In [11]:
# Converting "genres" column from a dictionary from a list of strings, containing the respective genres
def extract_genres(x):
    genre_string = ''
    x = eval(x) # executes expression inside of string
    for dictionary in x:
        genre_string += dictionary["name"] + '|'
    return genre_string # include all but last one
md["genres"] = md["genres"].apply(extract_genres)

Now that we have preprocessed the data we are going to use for this model, we can extract the three columns: genre, release_data and budget. We will now also determine the unique features present in the dataset and use each as a feature. Note that this model assumes that all possible genres are included in the dataset.

In [12]:
# Initialising our features matrix
FEATURES = md[["imdb_id", "original_title", "release_date", "vote_average", "popularity"]]

# Finding unique genre names
GENRES = md["genres"]
unique_genres = list(set(GENRES.sum().split('|')[:-1])) # Don't include last element
print(unique_genres)

# Removing '|' from the end of each string
GENRES = GENRES.apply(lambda x:x[:-1])

['Thriller', 'Action', 'Western', 'Animation', 'Foreign', 'Adventure', 'Horror', 'Mystery', 'Documentary', 'TV Movie', 'Crime', 'Comedy', 'Fantasy', 'History', 'Music', 'Romance', 'Science Fiction', 'Family', 'War', 'Drama']


In [13]:
# Adding each genre as a feature
extended_features = GENRES.str.get_dummies()

# Horizontally stack our extended features and the original features
FEATURES = FEATURES.merge(extended_features, left_index=True, right_index=True)

In [14]:
from sklearn.preprocessing import MinMaxScaler

# Normalizing non-categorical features
scaler = MinMaxScaler()
FEATURES[['release_date', 'vote_average', 'popularity']] = scaler.fit_transform(FEATURES[['release_date', 'vote_average', 'popularity']])

Now that our features for each movie are set, we need to create a list of users, where each user contains the features and ratings of each of the movies they rated. To link the users and their ratings to the movies, we will need to use the intermediary "links" table. Be careful! We have dropped a few of the movies in the original dataset when the required feature was not available. First we will merge the appropriate dataframes.

In [15]:
# First Join: links JOIN FEATURES ON imdbId
FEATURES = FEATURES.rename(columns={'imdb_id':'imdbId'}) # making column names match
first_join = links_small.merge(FEATURES, on="imdbId")

# Second Join: ratings JOIN first_join on movieId
data_matrix = ratings_small.merge(first_join, on="movieId")

# Delete unnecessary columns
data_matrix.drop(['movieId', 'timestamp', 'imdbId', 'tmdbId'],axis='columns', inplace=True)

# Converting ratings from float to ints for Logistic Regression
# data_matrix["rating"] = data_matrix["rating"].apply(lambda x: 2*x).astype("int64")

After merging our dataframes, we will create a list of users, each with all the ratings provided by each user and features of the corresponding rated movie.

In [16]:
# Creating unique user list
unique_users = list(set(data_matrix["userId"]))
user_data = {}

# Adding movie ratings and features to the list of the corresponding user
data_copy = data_matrix.copy()
for user_id in unique_users:
    user_data[user_id] = data_copy[data_copy["userId"] == user_id]
    data_copy = data_copy[data_copy["userId"] != user_id]

In [17]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as MSE, mean_absolute_error as MAE

# Train regression model on each user!
user_models = {}
rmse = []
mae = []
r2 = []
for user_id in user_data.keys():
    # Get user data
    data = user_data[user_id]
    Y = data["rating"]
    X = data.drop(["userId", "rating", "original_title"], axis=1)
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
    # Train model and predict on test data
    reg = RandomForestRegressor(n_estimators=80, max_depth=3, random_state=0).fit(X_train, Y_train)
    predictions = reg.predict(X_test)
    # Store RMSE and model
    rmse.append(MSE(Y_test, predictions, squared=False))
    mae.append(MAE(Y_test, predictions))
    r2.append(reg.score(X_test, Y_test))
    user_models[user_id] = reg # storing estimator for this user!

In [18]:
# Pick user with the largest number of samples to look at individually
max_user_id = 1
for user_id in user_data.keys():
    if (user_data[max_user_id].shape[0] < user_data[user_id].shape[0]):
        max_user_id = user_id

# Check the predictions of the recommender
model = user_models[max_user_id]
movies = FEATURES.iloc[:, 1:] #pd.concat([FEATURES.iloc[:, 1:], user_data[max_user_id].iloc[:, 2:]]).drop_duplicates(keep=False)

# Rating all movies with the model trained on the one user
predictions = pd.DataFrame({"rating": reg.predict(movies.iloc[:, 1:])})
rated_movies = predictions.merge(movies, left_index=True, right_index=True)

# Diplay top 10 rated and top 10 recommendations!
top_rated = user_data[max_user_id].nlargest(columns="rating", n=10)
top_rated

Unnamed: 0,userId,rating,original_title,release_date,vote_average,popularity,Action,Adventure,Animation,Comedy,...,History,Horror,Music,Mystery,Romance,Science Fiction,TV Movie,Thriller,War,Western
72,547,5.0,Dumbo,0.457934,0.68,0.026769,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
207,547,5.0,Nuovo Cinema Paradiso,0.780305,0.82,0.025895,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
250,547,5.0,The Deer Hunter,0.712204,0.78,0.014118,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
343,547,5.0,Gandhi,0.739448,0.74,0.022619,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
571,547,5.0,The French Connection,0.663135,0.74,0.012228,1,0,0,0,...,0,0,0,0,0,0,0,1,0,0
1101,547,5.0,Sense and Sensibility,0.828718,0.72,0.019495,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
3909,547,5.0,Pulp Fiction,0.820111,0.83,0.257449,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
7086,547,5.0,Schindler's List,0.814768,0.83,0.076212,0,0,0,0,...,1,0,0,0,0,0,0,0,1,0
8807,547,5.0,The Silence of the Lambs,0.795417,0.81,0.007867,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
9307,547,5.0,The Shawshank Redemption,0.820355,0.85,0.094332,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [19]:
# Top 10 recommendations
rated_movies.nlargest(columns="rating", n=10)

Unnamed: 0,rating,original_title,release_date,vote_average,popularity,Action,Adventure,Animation,Comedy,Crime,...,History,Horror,Music,Mystery,Romance,Science Fiction,TV Movie,Thriller,War,Western
539,4.714777,Striking Distance,0.813399,0.54,0.024459,1,0,0,0,1,...,0,0,0,1,0,0,0,1,0,0
0,4.632683,Toy Story,0.827893,0.77,0.040087,0,0,1,1,0,...,0,0,0,0,0,0,0,0,0,0
252,4.626761,Junior,0.82148,0.47,0.012256,0,0,0,1,0,...,0,0,0,0,1,0,0,0,0,0
574,4.560571,The Celluloid Closet,0.829618,0.7,0.012392,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
1132,4.537389,Nuovo Cinema Paradiso,0.780305,0.82,0.025895,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
2395,4.514863,The Breaks,0.850674,0.52,0.000305,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
1115,4.509719,Raw Deal,0.503084,0.72,0.018432,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
580,4.502915,Ghost,0.791592,0.69,0.021819,0,0,0,0,0,...,0,0,0,1,1,0,0,1,0,0
1186,4.502467,Annie Hall,0.700992,0.78,0.033144,0,0,0,1,0,...,0,0,0,0,1,0,0,0,0,0
355,4.501639,I Love Trouble,0.818743,0.53,0.006522,1,0,0,1,0,...,0,0,0,0,1,0,0,0,0,0


In [20]:
print("Mean RMSE over all users:", np.mean(rmse))
print("Mean MAE over all users:", np.mean(mae))
print("Mean R2 over all users:", np.mean(r2))
print()

Mean RMSE over all users: 0.8475740365109855
Mean MAE over all users: 0.6799642147659223
Mean R2 over all users: -0.11123908354644395



From the evaluation metrics, we can see that the prediction made by our model deviates on average by 0.68 from the true rating, which is not a large deviation, and is generally able to distinguish between 'okay', 'great' and 'terrible' movies. However, if we would like to recommend 10 movies out of roughly 45,000, our rating predictions need to be even better. Our low R-squared also implies that the model may be underfit. Therefore, this model may serve as a strong baseline model that can be improved by accounting for other features such as cast, directors, plot, amongst others.

# Collaborative-filtering

# Hybrid Systems