# Movie Recommendation using Netflix Movie Reviews




This project aims to build a movie recommendation system using Netflix Movie Ratings. There are 17337458 Ratings given by 143458 users to 1350 movies. Ratings are in the form of Integer i.e. 1 - 5


**Table of Content**



#### 1.  Load Rating Data
#### 2.  Load Movie Data
#### 3.  Analyze Data
#### 4.  Recommendation Model
#### 4.1 Collaborative Filtering - SVD

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
! pip install scikit-surprise

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [3]:
from surprise import Reader, Dataset, SVD
from surprise import accuracy
from surprise.model_selection import train_test_split

# 1. Load Rating Data

In [4]:
df = pd.read_csv('Netflix_Dataset_Rating.csv')
df

Unnamed: 0,User_ID,Rating,Movie_ID
0,712664,5,3
1,1331154,4,3
2,2632461,3,3
3,44937,5,3
4,656399,4,3
...,...,...,...
4296405,208022,5,1174
4296406,593939,3,1174
4296407,81478,5,1174
4296408,2162151,4,1174


In [5]:
df.dtypes

User_ID     int64
Rating      int64
Movie_ID    int64
dtype: object

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4296410 entries, 0 to 4296409
Data columns (total 3 columns):
 #   Column    Dtype
---  ------    -----
 0   User_ID   int64
 1   Rating    int64
 2   Movie_ID  int64
dtypes: int64(3)
memory usage: 98.3 MB


In [7]:
df['Rating'].describe().astype('int')

count    4296410
mean           3
std            1
min            1
25%            3
50%            4
75%            4
max            5
Name: Rating, dtype: int64

In [8]:
print("Unique Values :\n",df.nunique())

Unique Values :
 User_ID     143457
Rating           5
Movie_ID       372
dtype: int64


# 2. Load Movie Data

In [9]:
df_title = pd.read_csv('Netflix_Dataset_Movie.csv')
df_title

Unnamed: 0,Movie_ID,Year,Name
0,1,2003,Dinosaur Planet
1,2,2004,Isle of Man TT 2004 Review
2,3,1997,Character
3,4,1994,Paula Abdul's Get Up & Dance
4,5,2004,The Rise and Fall of ECW
...,...,...,...
17765,17766,2002,Where the Wild Things Are and Other Maurice Se...
17766,17767,2004,Fidel Castro: American Experience
17767,17768,2000,Epoch
17768,17769,2003,The Company


In [10]:
df_title.dtypes

Movie_ID     int64
Year         int64
Name        object
dtype: object

In [11]:
df_title.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17770 entries, 0 to 17769
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Movie_ID  17770 non-null  int64 
 1   Year      17770 non-null  int64 
 2   Name      17770 non-null  object
dtypes: int64(2), object(1)
memory usage: 416.6+ KB


In [12]:
df_title['Year'].describe().astype('int')

count    17770
mean      1990
std         16
min       1915
25%       1985
50%       1997
75%       2002
max       2005
Name: Year, dtype: int64

In [13]:
print("Unique Values :\n",df_title.nunique())

Unique Values :
 Movie_ID    17770
Year           91
Name        17297
dtype: int64


# 3. Analyze Data

In [14]:
no_of_rated_products_per_users = df.groupby(by='User_ID')['Rating'].count().sort_values(ascending=False)
no_of_rated_products_per_users.head()

User_ID
305344     370
2439493    366
387418     366
2118461    358
1664010    347
Name: Rating, dtype: int64

In [15]:
no_of_rated_products_per_users.describe()


count    143457.000000
mean         29.949114
std          20.564429
min           1.000000
25%          16.000000
50%          24.000000
75%          37.000000
max         370.000000
Name: Rating, dtype: float64

In [16]:
no_of_rated_products_per_movies = df.groupby(by='Movie_ID')['Rating'].count().sort_values(ascending=False)
no_of_rated_products_per_movies.head()

Movie_ID
571     101450
607      79476
30       77502
457      77312
1145     74461
Name: Rating, dtype: int64

In [17]:
no_of_rated_products_per_movies.describe()

count       372.000000
mean      11549.489247
std       16329.734105
min        1212.000000
25%        2524.250000
50%        4794.500000
75%       12600.000000
max      101450.000000
Name: Rating, dtype: float64

In [18]:
f = ['count','mean']
df_movie_summary = df.groupby('Movie_ID')['Rating'].agg(f)
df_movie_summary.index = df_movie_summary.index.map(int)
movie_benchmark = round(df_movie_summary['count'].quantile(0.7),0)
drop_movie_list = df_movie_summary[df_movie_summary['count'] < movie_benchmark].index

df__title = df_title.set_index('Movie_ID')

# 4. Recommendation Model


## 4.1 Collaborative Filtering - SVD

In [19]:
model = SVD(n_epochs=10,verbose = True)

data = Dataset.load_from_df(df[['User_ID', 'Movie_ID', 'Rating']], Reader())

trainset, testset = train_test_split(data, test_size=0.3,random_state=10)

trainset = data.build_full_trainset()

model.fit(trainset)

Processing epoch 0
Processing epoch 1
Processing epoch 2
Processing epoch 3
Processing epoch 4
Processing epoch 5
Processing epoch 6
Processing epoch 7
Processing epoch 8
Processing epoch 9


<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7fde00736a30>

In [20]:
predictions = model.test(testset)

accuracy.rmse(predictions, verbose=True)

RMSE: 0.8716


0.8716174713950433

In [21]:
def Recommendation(given_user_id,n_movies):
    given_user = df_title.copy()
    given_user = given_user.reset_index()
    given_user = given_user[~given_user['Movie_ID'].isin(drop_movie_list)]


    given_user['Estimated_Rating'] = given_user['Movie_ID'].apply(lambda x: model.predict(given_user_id, x).est)

    given_user = given_user.drop('Movie_ID', axis = 1)

    given_user = given_user.sort_values('Estimated_Rating', ascending=False)
    given_user.drop(['index'], axis = 1,inplace=True)
    given_user.reset_index(inplace=True,drop=True)
    return given_user.head(n_movies)

### Movie Recommendation for User - 712664

In [22]:
Recommendation(712664,10)

Unnamed: 0,Year,Name,Estimated_Rating
0,1954,Seven Samurai,4.820601
1,1989,The Simpsons: Season 1,4.595253
2,1992,Reservoir Dogs,4.534182
3,1959,North by Northwest,4.531993
4,2003,Chappelle's Show: Season 1,4.497407
5,1997,Princess Mononoke,4.449832
6,1981,Sense and Sensibility,4.436528
7,1951,A Streetcar Named Desire,4.409705
8,2001,Sex and the City: Season 4,4.356639
9,1975,Three Days of the Condor,4.343841


### Movie Recommendation for User - 2643029

In [23]:
Recommendation(2643029,10)

Unnamed: 0,Year,Name,Estimated_Rating
0,2001,Sex and the City: Season 4,4.494995
1,1992,Reservoir Dogs,4.363032
2,1981,Sense and Sensibility,4.306532
3,1954,Seven Samurai,4.296197
4,1989,The Simpsons: Season 1,4.271719
5,2004,Ray,4.223994
6,1959,North by Northwest,4.215641
7,1991,Fried Green Tomatoes,4.188042
8,2002,Rabbit-Proof Fence,4.174016
9,2003,Chappelle's Show: Season 1,4.147469
