# Business Problem

A client - an independent movie company that prefers to remain anonymous - is interested in entering the streaming space.  The client recognizes that this space is competitive due to present offering.  However, the client still believes there is an opportunity based on its marketing analysis and backlog of independent films.

Before building the streaming service, the client has requested KBO Analytics to create a recommendation system.  KBO Analytics will address the first phase of this project by building a proof-of-concept based on the MovieLens dataset.

# Data Understanding

The data for examing the aforementioned problem comes from the following source: [MovieLens](https://grouplens.org/datasets/movielens/latest/)

Before beginning to create a recommendation system, I want to examine and become familiar with the dataset. I will conduct exploratory data analysis (EDA) in order to understand the dataset attributes, which includes, but not limited to the following:

1. Number of Columns
2. Number of Rows
3. Column Names
4. Format of the data in each column

In [1]:
# Importing Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

from surprise import Reader, Dataset
from surprise.model_selection import train_test_split
from surprise.model_selection import cross_validate
from surprise.prediction_algorithms import SVD
from surprise.prediction_algorithms import KNNWithMeans, KNNBasic, KNNBaseline
from surprise.model_selection import GridSearchCV

There are a total of four csv files associated with MovieLens.  They are the following:

- *links.csv*
- *movies.csv*
- *ratings.csv*
- *tags.csv*

I will investigate each of the aforementioned files in order to further understand how I will build the recommendation system.

## Links.csv ##

In [2]:
# Reading the 'links.csv' tile into a dataframe

df_links = pd.read_csv('data/links.csv')

In [3]:
# Examining the first five rows of the dataframe

df_links.head()

Unnamed: 0,movieId,imdbId,tmdbId
0,1,114709,862.0
1,2,113497,8844.0
2,3,113228,15602.0
3,4,114885,31357.0
4,5,113041,11862.0


In [4]:
# Examining the last five rows of the dataframe

df_links.tail()

Unnamed: 0,movieId,imdbId,tmdbId
9737,193581,5476944,432131.0
9738,193583,5914996,445030.0
9739,193585,6397426,479308.0
9740,193587,8391976,483455.0
9741,193609,101726,37891.0


In [5]:
# Examining the dataframe

df_links.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9742 entries, 0 to 9741
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   movieId  9742 non-null   int64  
 1   imdbId   9742 non-null   int64  
 2   tmdbId   9734 non-null   float64
dtypes: float64(1), int64(2)
memory usage: 228.5 KB


In [6]:
# Examining missing values in each column

df_links.isna().sum()

movieId    0
imdbId     0
tmdbId     8
dtype: int64

In [7]:
# Examining dataframe for duplicate data

df_links.duplicated().sum()

0

## Movies.csv ##

In [8]:
# Reading the 'movies.csv' tile into a dataframe

df_movies = pd.read_csv('data/movies.csv')

In [9]:
# Examining the first 5 rows of the dataframe

df_movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [10]:
# Examining the last 5 rows of the dataframe

df_movies.tail()

Unnamed: 0,movieId,title,genres
9737,193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy
9738,193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy
9739,193585,Flint (2017),Drama
9740,193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation
9741,193609,Andrew Dice Clay: Dice Rules (1991),Comedy


In [11]:
# Examining missing values in each column

df_movies.isna().sum()

movieId    0
title      0
genres     0
dtype: int64

In [12]:
# Examining dataframe for duplicate data

df_movies.duplicated().sum()

0

## Ratings.csv ##

In [13]:
# Reading the 'ratings.csv' tile into a dataframe

df_ratings = pd.read_csv('data/ratings.csv')

In [14]:
df_ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


In [15]:
df_ratings.tail()

Unnamed: 0,userId,movieId,rating,timestamp
100831,610,166534,4.0,1493848402
100832,610,168248,5.0,1493850091
100833,610,168250,5.0,1494273047
100834,610,168252,5.0,1493846352
100835,610,170875,3.0,1493846415


In [16]:
df_ratings.isna().sum()

userId       0
movieId      0
rating       0
timestamp    0
dtype: int64

In [17]:
df_ratings.duplicated().sum()

0

## Tags.csv ##

In [18]:
df_tags = pd.read_csv('data/tags.csv')

In [19]:
df_tags.head()

Unnamed: 0,userId,movieId,tag,timestamp
0,2,60756,funny,1445714994
1,2,60756,Highly quotable,1445714996
2,2,60756,will ferrell,1445714992
3,2,89774,Boxing story,1445715207
4,2,89774,MMA,1445715200


In [20]:
df_tags.tail()

Unnamed: 0,userId,movieId,tag,timestamp
3678,606,7382,for katie,1171234019
3679,606,7936,austere,1173392334
3680,610,3265,gun fu,1493843984
3681,610,3265,heroic bloodshed,1493843978
3682,610,168248,Heroic Bloodshed,1493844270


In [21]:
df_tags.isna().sum()

userId       0
movieId      0
tag          0
timestamp    0
dtype: int64

In [22]:
df_tags.duplicated().sum()

0

## Data Understanding | Conclusion

I have created dataframes for all of the csv files - *links.csv*, *movies.csv*, *ratings.csv*, and *tags.csv*.  Since the *ratings.csv* file has the movie ratings, I will use it to build the recommendation system.  In addition, the *movies.csv* file has the name of the movies associated with the movie id's.  I will concantenate the dataframes for the respective *movies.csv* file and *ratings.csv* file.

# Data Preparation

I will transition to the data preparation stage.  I will clean the *ratings.csv* file by dropping the timestamp column.  I will also concantenate the *movies.csv* file with the *ratings.csv* file.

## Cleaning Ratings.csv file

In [23]:
# Removing the 'timestamp' column from the 'ratings.csv' file

df_ratings.drop('timestamp', axis=1, inplace=True)

In [24]:
# Checking to see whether or not the 'timestamp' column has been removed from the 'ratings.csv' file

df_ratings.head()

Unnamed: 0,userId,movieId,rating
0,1,1,4.0
1,1,3,4.0
2,1,6,4.0
3,1,47,5.0
4,1,50,5.0


## Combining Ratings.csv file and Movies.csv file

In [25]:
# Checking the shape of the 'df_ratings' dataframe

df_ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100836 entries, 0 to 100835
Data columns (total 3 columns):
 #   Column   Non-Null Count   Dtype  
---  ------   --------------   -----  
 0   userId   100836 non-null  int64  
 1   movieId  100836 non-null  int64  
 2   rating   100836 non-null  float64
dtypes: float64(1), int64(2)
memory usage: 2.3 MB


In [26]:
# Concatenating the 'ratings.csv' and the 'movies.csv' files

df_ratings_movies = pd.merge(df_ratings, df_movies, on='movieId', how='left')

In [27]:
# Examining the initial rows of the new dataframe, 'df_ratings_movies'

df_ratings_movies.head()

Unnamed: 0,userId,movieId,rating,title,genres
0,1,1,4.0,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,1,3,4.0,Grumpier Old Men (1995),Comedy|Romance
2,1,6,4.0,Heat (1995),Action|Crime|Thriller
3,1,47,5.0,Seven (a.k.a. Se7en) (1995),Mystery|Thriller
4,1,50,5.0,"Usual Suspects, The (1995)",Crime|Mystery|Thriller


In [28]:
# Examining the shape of the new dataframe, 'df_ratings_movies'

df_ratings_movies.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 100836 entries, 0 to 100835
Data columns (total 5 columns):
 #   Column   Non-Null Count   Dtype  
---  ------   --------------   -----  
 0   userId   100836 non-null  int64  
 1   movieId  100836 non-null  int64  
 2   rating   100836 non-null  float64
 3   title    100836 non-null  object 
 4   genres   100836 non-null  object 
dtypes: float64(1), int64(2), object(2)
memory usage: 4.6+ MB


In [29]:
# Dropping the 'genres' column from the new dataframe, 'df_ratings_movies'

df_ratings_movies.drop('genres', axis=1, inplace=True)

In [30]:
# Examining the initial rows of the new dataframe, 'df_ratings_movies', after the 'genres' column has been dropped

df_ratings_movies.head()

Unnamed: 0,userId,movieId,rating,title
0,1,1,4.0,Toy Story (1995)
1,1,3,4.0,Grumpier Old Men (1995)
2,1,6,4.0,Heat (1995)
3,1,47,5.0,Seven (a.k.a. Se7en) (1995)
4,1,50,5.0,"Usual Suspects, The (1995)"


In [31]:
# Examining the shape of the new dataframe, 'df_ratings_movies', after the 'genres' column has been dropped

df_ratings_movies.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 100836 entries, 0 to 100835
Data columns (total 4 columns):
 #   Column   Non-Null Count   Dtype  
---  ------   --------------   -----  
 0   userId   100836 non-null  int64  
 1   movieId  100836 non-null  int64  
 2   rating   100836 non-null  float64
 3   title    100836 non-null  object 
dtypes: float64(1), int64(2), object(1)
memory usage: 3.8+ MB


In [32]:
# Examining whether or no the new dataframe, 'df_ratings_movies', has any missing values

df_ratings_movies.isna().sum()

userId     0
movieId    0
rating     0
title      0
dtype: int64

In [33]:
# Editing the 'df_ratings_movies' by dropping the 'title column'

df_ratings_movies_edited = df_ratings_movies.drop('title', axis=1)

In [34]:
df_ratings_movies_edited.head()

Unnamed: 0,userId,movieId,rating
0,1,1,4.0
1,1,3,4.0
2,1,6,4.0
3,1,47,5.0
4,1,50,5.0


# Modeling

## Model Creation

### Single Value Decomposition (SVD) via Grid Search

In [35]:
# Read in the 'df_ratings_movies' values as a surprise dataset

reader = Reader()

data = Dataset.load_from_df(df_ratings_movies_edited,reader)

In [36]:
# Define your parameter grid
params = {'n_factors': [20, 35, 50, 75, 100],
          'reg_all': [0.02, 0.035, 0.05, 0.075, 0.1]}

# Initialize GridSearchCV
g_s_svd = GridSearchCV(SVD, param_grid=params, measures=['rmse'], cv=5, n_jobs=-1)

# Fit the GridSearchCV to the data
g_s_svd.fit(data)

# Best RMSE score
print(f"Best RMSE score: {g_s_svd.best_score['rmse']}")

# Best parameters
print(f"Best parameters: {g_s_svd.best_params['rmse']}")

Best RMSE score: 0.8690853351157871
Best parameters: {'n_factors': 75, 'reg_all': 0.05}


### K-Nearest Neighbors (KNN) Basic via Cross-Validation

In [37]:
# cross validating with KNNBasic

knn_basic = KNNBasic(sim_options={'name':'pearson', 'user_based':True})
cv_knn_basic = cross_validate(knn_basic, data, n_jobs=-1)

In [38]:
# Average RMSE score for the test set

for i in cv_knn_basic.items():
    print(i)
print('-----------------------')
print(np.mean(cv_knn_basic['test_rmse']))

('test_rmse', array([0.97787072, 0.9770339 , 0.97590855, 0.97487132, 0.96349881]))
('test_mae', array([0.75440722, 0.75216621, 0.75311799, 0.75362078, 0.74607994]))
('fit_time', (1.4682347774505615, 1.5607001781463623, 1.582892894744873, 1.51214599609375, 0.9395761489868164))
('test_time', (2.657402276992798, 2.5891613960266113, 2.5571200847625732, 2.560227632522583, 1.6982412338256836))
-----------------------
0.9738366611730971


### KNN Baseline via Cross-Validation

In [39]:
# Cross validating with KNNBaseline

knn_baseline = KNNBaseline(sim_options={'name':'pearson', 'user_based':True})
cv_knn_baseline = cross_validate(knn_baseline,data)

Estimating biases using als...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson similarity matrix...
Done computing similarity matrix.


In [40]:
# Average score for the test set

for i in cv_knn_baseline.items():
    print(i)

np.mean(cv_knn_baseline['test_rmse'])

('test_rmse', array([0.88069632, 0.87981391, 0.86513915, 0.88673694, 0.87015412]))
('test_mae', array([0.66935645, 0.67322506, 0.66385051, 0.67364931, 0.66888665]))
('fit_time', (1.801255464553833, 1.7290058135986328, 1.0594933032989502, 1.2280616760253906, 0.9391648769378662))
('test_time', (3.9171085357666016, 2.8899221420288086, 2.4426634311676025, 2.047191619873047, 3.9419684410095215))


0.8765080885358015

### KNN with Means via Cross-Validation

In [41]:
# Cross validating with KNNBaseline

knn_withmeans = KNNWithMeans(sim_options={'name':'pearson', 'user_based':True})
cv_knn_withmeans = cross_validate(knn_withmeans,data)

Computing the pearson similarity matrix...
Done computing similarity matrix.
Computing the pearson similarity matrix...
Done computing similarity matrix.
Computing the pearson similarity matrix...
Done computing similarity matrix.
Computing the pearson similarity matrix...
Done computing similarity matrix.
Computing the pearson similarity matrix...
Done computing similarity matrix.


In [42]:
# Average score for the test set

for i in cv_knn_withmeans.items():
    print(i)

np.mean(cv_knn_withmeans['test_rmse'])

('test_rmse', array([0.89056916, 0.89199515, 0.89936038, 0.90032861, 0.89587502]))
('test_mae', array([0.6780599 , 0.67822686, 0.6861357 , 0.68335886, 0.68360728]))
('fit_time', (3.420161008834839, 0.9071965217590332, 0.8409550189971924, 0.8876984119415283, 1.1982500553131104))
('test_time', (3.8175079822540283, 1.8948354721069336, 1.5553834438323975, 1.5941004753112793, 3.030090093612671))


0.8956256646889171

### Model Creation | Conclusion

I created a total of four collaborative filtering models.  The four models, with their associated Root-Mean Square Error (RMSE) metrics, are the following:

- SVD - RMSE: 0.8692
- KNN Basic - RMSE: 0.9727
- KNN Baseline - RMSE: 0.8779
- KNN with Means - RMSE: 0.8976

Since the Single Value Decomposition Model has the lowest RMSE score, or 0.8692, I will proceed to use the SVD model to make predictions.

## Making Predictions

I am going to use the SVD model, the model with the lowest RMSE out of the previous four models, to make predictions.  I will proceed with the following steps:

1. Create user ratings by rating some of the movies within the present dataset, or *df_ratings_movies_edited* dataframe
2. Add my user ratings to the *df_ratings_movies_edited* dataframe
3. Generate predictions via the SVD model

In [43]:
# Exploring the first 60 movies of the df_ratings_movies dataframe

df_ratings_movies.head(60)

Unnamed: 0,userId,movieId,rating,title
0,1,1,4.0,Toy Story (1995)
1,1,3,4.0,Grumpier Old Men (1995)
2,1,6,4.0,Heat (1995)
3,1,47,5.0,Seven (a.k.a. Se7en) (1995)
4,1,50,5.0,"Usual Suspects, The (1995)"
5,1,70,3.0,From Dusk Till Dawn (1996)
6,1,101,5.0,Bottle Rocket (1996)
7,1,110,4.0,Braveheart (1995)
8,1,151,5.0,Rob Roy (1995)
9,1,157,5.0,Canadian Bacon (1995)


In [44]:
# Exploring the last 60 movies of the df_ratings_movies dataframe

df_ratings_movies.tail(60)

Unnamed: 0,userId,movieId,rating,title
100776,610,138036,3.5,The Man from U.N.C.L.E. (2015)
100777,610,138210,3.5,13 Hours (2016)
100778,610,138610,1.5,The Gallows (2015)
100779,610,138632,5.0,Tokyo Tribe (2014)
100780,610,139385,4.5,The Revenant (2015)
100781,610,139511,3.0,Exte: Hair Extensions (2007)
100782,610,139644,4.5,Sicario (2015)
100783,610,139655,3.0,Goodnight Mommy (Ich seh ich seh) (2014)
100784,610,140174,4.0,Room (2015)
100785,610,140247,3.5,The Gift (2015)


My list of movies with associated movie id's and my ratings are below:
    
- Toy Story - Movie ID: 1, Rating: 3.0
- Grumpier Old Men - Movie ID: 3, Rating: 4.0
- Heat - Movie ID: 6, Rating: 5.0
- Braveheart - Movie ID: 110, Rating: 5.0
- Billy Madison - Movie ID: 216, Rating: 3.0
- Forrest Gump - Movie ID: 356, Rating: 4.0
- The Jungle Book - Movie ID: 362, Rating: 2.5
- The Mask - Movie ID: 367, Rating: 3.5
- The Three Musketeers - Movie ID: 552, Rating: 2.5
- Batman - Movie ID: 592, Rating: 5.0
- Mission: Impossible - Movie ID: 648, Rating: 3.5 
- Space Jam - Movie ID: 673, Rating: 2.0
- Independence Day - Movie ID: 780, Rating: 4.0
- The Sword In The Stone - Movie ID: 1025, Rating: 5.0  
- Dumbo - Movie ID: 1029, Rating: 2.5 
- Alice in Wonderland - Movie ID: 1032, Rating: 2.5
- The Revenant - Movie ID: 139385, Rating: 5.0 
- Sicario - Movie ID: 139644, Rating: 3.5
- The Big Short - Movie ID: 148626, Rating: 4.5
- Blair Witch - Movie ID: 163937, Rating: 3.5
- Arrival - Movie ID: 164179, Ratings: 4.0
- Rogue One: A Star Wars Story - Movie ID: 166528, Rating: 3.5
- John Wick: Chapter Two - Movie ID: 168248, Rating: 4.5
- Get Out - Movie ID: 168250, Rating: 5.0
- Logan - Movie ID: 168252, Rating: 4.0
- The Fate of the Furious - Movie ID: 170875, Rating: 3.5

The highest user ID number in the df_ratings_movies dataframe is number 610.  As a result, I will assign myself the user ID number of 611.

In [45]:
# Creating a list for the movies I have rated

my_user_ratings = [{'userId': 611, 'movieId': 1, 'rating': 3.0},
                   {'userId': 611, 'movieId': 3, 'rating': 4.0},
                   {'userId': 611, 'movieId': 6, 'rating': 5.0},
                   {'userId': 611, 'movieId': 110, 'rating': 5.0},
                   {'userId': 611, 'movieId': 216, 'rating': 3.0},
                   {'userId': 611, 'movieId': 356, 'rating': 4.0},
                   {'userId': 611, 'movieId': 362, 'rating': 2.5},
                   {'userId': 611, 'movieId': 367, 'rating': 3.5},
                   {'userId': 611, 'movieId': 552, 'rating': 2.5},
                   {'userId': 611, 'movieId': 592, 'rating': 5.0},
                   {'userId': 611, 'movieId': 648, 'rating': 3.5},
                   {'userId': 611, 'movieId': 673, 'rating': 2.0},
                   {'userId': 611, 'movieId': 780, 'rating': 4.0},
                   {'userId': 611, 'movieId': 1025, 'rating': 5.0},
                   {'userId': 611, 'movieId': 1029, 'rating': 2.5},
                   {'userId': 611, 'movieId': 1032, 'rating': 2.5},
                   {'userId': 611, 'movieId': 139385, 'rating': 5.0},
                   {'userId': 611, 'movieId': 139644, 'rating': 3.5},
                   {'userId': 611, 'movieId': 148626, 'rating': 4.5},
                   {'userId': 611, 'movieId': 163937, 'rating': 3.5},
                   {'userId': 611, 'movieId': 164179, 'rating': 4.0},
                   {'userId': 611, 'movieId': 166528, 'rating': 3.5},
                   {'userId': 611, 'movieId': 168248, 'rating': 4.5},
                   {'userId': 611, 'movieId': 168250, 'rating': 5.0},
                   {'userId': 611, 'movieId': 168252, 'rating': 4.0},
                   {'userId': 611, 'movieId': 170875, 'rating': 3.5}]

In [46]:
# Viewing whether or not the list I created for the movies I rated has been captured

my_user_ratings

[{'userId': 611, 'movieId': 1, 'rating': 3.0},
 {'userId': 611, 'movieId': 3, 'rating': 4.0},
 {'userId': 611, 'movieId': 6, 'rating': 5.0},
 {'userId': 611, 'movieId': 110, 'rating': 5.0},
 {'userId': 611, 'movieId': 216, 'rating': 3.0},
 {'userId': 611, 'movieId': 356, 'rating': 4.0},
 {'userId': 611, 'movieId': 362, 'rating': 2.5},
 {'userId': 611, 'movieId': 367, 'rating': 3.5},
 {'userId': 611, 'movieId': 552, 'rating': 2.5},
 {'userId': 611, 'movieId': 592, 'rating': 5.0},
 {'userId': 611, 'movieId': 648, 'rating': 3.5},
 {'userId': 611, 'movieId': 673, 'rating': 2.0},
 {'userId': 611, 'movieId': 780, 'rating': 4.0},
 {'userId': 611, 'movieId': 1025, 'rating': 5.0},
 {'userId': 611, 'movieId': 1029, 'rating': 2.5},
 {'userId': 611, 'movieId': 1032, 'rating': 2.5},
 {'userId': 611, 'movieId': 139385, 'rating': 5.0},
 {'userId': 611, 'movieId': 139644, 'rating': 3.5},
 {'userId': 611, 'movieId': 148626, 'rating': 4.5},
 {'userId': 611, 'movieId': 163937, 'rating': 3.5},
 {'userId':

In [52]:
# Adding my movie ratings to the 'df_ratings_movies_edited' dataframe

my_user_ratings_df = pd.DataFrame(my_user_ratings)
new_ratings_df = pd.concat([df_ratings_movies_edited, my_user_ratings_df], axis=0)
new_dataset = Dataset.load_from_df(new_ratings_df,reader)

In [54]:
# Train a model using the new combined DataFrame
## Note: Based on the grid search optimization I performed for SVD, the optimal parameters are the following: 
## {'n_factors': 75, 'reg_all': 0.05}

optimal_svd = SVD(n_factors= 75, reg_all=0.05)
optimal_svd.fit(new_dataset.build_full_trainset())

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x2ac589802b0>

In [57]:
# Making predictions 

list_of_movies = []
for m_id in df_ratings_movies_edited['movieId'].unique():
    list_of_movies.append( (m_id,optimal_svd.predict(1000,m_id)[3]))

In [58]:
# Ordering the predictions from highest to lowest rated

ranked_movies = sorted(list_of_movies, key=lambda x:x[1], reverse=True)

In [59]:
# Creating a function to return the top 'n' recommendations

def recommended_movies(user_ratings,movie_title_df,n):
        for idx, rec in enumerate(user_ratings):
            title = movie_title_df.loc[movie_title_df['movieId'] == int(rec[0])]['title']
            print('Recommendation # ', idx+1, ': ', title, '\n')
            n-= 1
            if n == 0:
                break

In [61]:
# Utilizing the "recommended_movies" function to return the top 10 movie recommendations

recommended_movies(ranked_movies,df_movies,10)

Recommendation #  1 :  277    Shawshank Redemption, The (1994)
Name: title, dtype: object 

Recommendation #  2 :  602    Dr. Strangelove or: How I Learned to Stop Worr...
Name: title, dtype: object 

Recommendation #  3 :  906    Lawrence of Arabia (1962)
Name: title, dtype: object 

Recommendation #  4 :  841    Streetcar Named Desire, A (1951)
Name: title, dtype: object 

Recommendation #  5 :  2226    Fight Club (1999)
Name: title, dtype: object 

Recommendation #  6 :  686    Rear Window (1954)
Name: title, dtype: object 

Recommendation #  7 :  659    Godfather, The (1972)
Name: title, dtype: object 

Recommendation #  8 :  680    Philadelphia Story, The (1940)
Name: title, dtype: object 

Recommendation #  9 :  2462    Boondock Saints, The (2000)
Name: title, dtype: object 

Recommendation #  10 :  46    Usual Suspects, The (1995)
Name: title, dtype: object 



## Cold-Start Problem

# Overall Conclusion and Recommendations

## Overall Conclusion

## Recommendations