# Movie Recommendation System

## Introduction

Watching movies is one of the most common forms of entertainment. Tens of thousands of movies have been produced so far. People usually pick a movie based on the genres they like, watch on different platforms, and sometimes rate what they have watched. Based on that information, we would like to build a recommendation system for movie watchers.

## Goals

We would like to build a movie recommendation system based on similarity between items and on collaborative filtering. For collaborative filtering, we try both user-based and item-based filtering.

## Dataset

The datasets used for this project were taken from Kaggle, from the data attached to [Getting Started with a Movie Recommendation System](https://www.kaggle.com/code/ibtesama/getting-started-with-a-movie-recommendation-system/data).

In [1]:
# Loading the library
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from ast import literal_eval
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
# Loading the dataset
df_movies = pd.read_csv('movies_metadata.csv')
df_ratings = pd.read_csv('ratings_small.csv')


In [3]:
df_movies.head(5)

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,1995-12-22,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0
3,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,1995-12-22,81452156.0,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,1995-02-10,76578911.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0


In [4]:
df_movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45466 entries, 0 to 45465
Data columns (total 24 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   adult                  45466 non-null  object 
 1   belongs_to_collection  4494 non-null   object 
 2   budget                 45466 non-null  object 
 3   genres                 45466 non-null  object 
 4   homepage               7782 non-null   object 
 5   id                     45466 non-null  object 
 6   imdb_id                45449 non-null  object 
 7   original_language      45455 non-null  object 
 8   original_title         45466 non-null  object 
 9   overview               44512 non-null  object 
 10  popularity             45461 non-null  object 
 11  poster_path            45080 non-null  object 
 12  production_companies   45463 non-null  object 
 13  production_countries   45463 non-null  object 
 14  release_date           45379 non-null  object 
 15  re

In [5]:
# Checking the movie's genre format
df_movies['genres'][0]

"[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]"

In [6]:
df_ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205


In [7]:
# Add movie titles to df_ratings
df_ratings['movieId'] = df_ratings['movieId'].apply(str)
df_ratings_with_titles = pd.merge(
    left=df_ratings,
    right=df_movies[['id', 'title']],
    how='inner',
    left_on = 'movieId',
    right_on='id')

In [8]:
df_ratings_with_titles.head(5)

Unnamed: 0,userId,movieId,rating,timestamp,id,title
0,1,1371,2.5,1260759135,1371,Rocky III
1,4,1371,4.0,949810302,1371,Rocky III
2,7,1371,3.0,851869160,1371,Rocky III
3,19,1371,4.0,855193404,1371,Rocky III
4,21,1371,3.0,853852263,1371,Rocky III


In [9]:
df_ratings_with_titles['rating'].describe()

count    44994.000000
mean         3.560986
std          1.053169
min          0.500000
25%          3.000000
50%          4.000000
75%          4.000000
max          5.000000
Name: rating, dtype: float64

The lowest rating in the data is 0.5, so we assume that a rating of 0 can safely stand for "the user has not watched (or rated) this movie".
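
To illustrate this assumption, here is a minimal sketch on toy data (the names `toy_ratings` and `toy_matrix` are hypothetical and not part of the notebook). It shows that `pivot_table` with `fill_value=0` writes a 0 only where a user-movie pair has no rating, so zeros stay distinguishable from real ratings as long as the lowest real rating is 0.5.

```python
import pandas as pd

# Toy ratings: user 1 rated movies A and B, user 2 rated only movie B
toy_ratings = pd.DataFrame({
    'userId': [1, 1, 2],
    'title': ['A', 'B', 'B'],
    'rating': [4.0, 0.5, 3.0],
})

# Same call pattern as the user-movie matrix: missing pairs become 0
toy_matrix = toy_ratings.pivot_table(
    index='userId', columns='title', values='rating', fill_value=0)

# True only where no rating exists (here: user 2, movie A)
unrated_mask = toy_matrix == 0

print(toy_matrix)
print(unrated_mask)
```

Even the lowest possible rating (0.5 for user 1 on movie B) is kept apart from the filled-in zero.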

In [10]:
# Generate user-movie matrix
df_user_movie_matrix = df_ratings_with_titles.pivot_table(
    index='userId', columns='title', values='rating', fill_value=0)
df_user_movie_id_matrix = df_ratings_with_titles.pivot_table(
    index='userId', columns='movieId', values='rating', fill_value=0)

In [11]:
df_user_movie_id_matrix.head()

userId,100,100017,100032,100272,100450,101,101362,1018,101904,102,...,987,988,99,990,991,99106,992,994,996,99846
1,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,...,0,0.0,0,0.0,0,0.0,0,0.0,0.0,0
2,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,...,0,0.0,0,0.0,0,0.0,0,0.0,0.0,0
3,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,...,0,0.0,0,0.0,0,0.0,0,0.0,0.0,0
4,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,...,0,0.0,0,0.0,0,0.0,0,0.0,0.0,0
5,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,...,0,0.0,0,0.0,0,0.0,0,0.0,0.0,0


In [12]:
# Extract movie genre from movies metadata
# Iterate through each genre in each movie's genre list to get genre name

def get_genre_list(genres):
    if isinstance(genres, list):  # only process values that are actually lists
        genre_names = [item['name'] for item in genres]
        # keep at most 5 genres per movie
        if len(genre_names) > 5:
            genre_names = genre_names[:5]
        return genre_names
    return []

df_movie_genres = df_movies[
    df_movies['id'].isin(df_ratings['movieId'].unique())
][['id', 'genres']].copy()
    
df_movie_genres['genres'] = df_movie_genres['genres'].apply(literal_eval).apply(get_genre_list)
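
The cell above relies on `literal_eval` to turn each `genres` string into a list of dictionaries before the names are extracted. A minimal sketch of that step on a single value, copied from the format shown in cell 5 (the variable names `raw_genres`, `parsed`, and `genre_names` are illustrative only):

```python
from ast import literal_eval

# One raw value in the same format as df_movies['genres'][0]
raw_genres = ("[{'id': 16, 'name': 'Animation'}, "
              "{'id': 35, 'name': 'Comedy'}, "
              "{'id': 10751, 'name': 'Family'}]")

# literal_eval parses the string into Python literals (a list of dicts)
# without executing arbitrary code
parsed = literal_eval(raw_genres)

# Keep only the genre names, as get_genre_list does
genre_names = [item['name'] for item in parsed]
print(genre_names)
```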

In [13]:
df_movie_genres

Unnamed: 0,id,genres
5,949,"[Action, Crime, Drama, Thriller]"
9,710,"[Adventure, Action, Thriller]"
14,1408,"[Action, Adventure]"
15,524,"[Drama, Crime]"
16,4584,"[Drama, Romance]"
...,...,...
45318,80831,[Drama]
45353,3104,"[Horror, Science Fiction]"
45403,64197,"[Romance, Drama]"
45406,98604,"[Comedy, Romance]"


In [14]:
# generate movie-feature matrix, with genre as the feature.
df_movie_genres_id = df_movie_genres.set_index('id')
df_movie_genres_stacked = df_movie_genres_id['genres'].apply(pd.Series).stack()
df_movie_feature_matrix = pd.get_dummies(df_movie_genres_stacked).groupby(level=0).sum()

In [15]:
df_movie_feature_matrix

id,Action,Adventure,Animation,Comedy,Crime,Documentary,Drama,Family,Fantasy,Foreign,History,Horror,Music,Mystery,Romance,Science Fiction,TV Movie,Thriller,War,Western
100,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
100017,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
100032,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
100272,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0
101,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99106,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
992,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0
994,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0
996,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0


## Similarity between items with Cosine Similarity

In [16]:
df_ratings['userId'].value_counts()

547    2391
564    1868
624    1735
15     1700
73     1610
       ... 
444      20
438      20
583      20
249      20
399      20
Name: userId, Length: 671, dtype: int64

In [17]:
# Select one user to receive recommendations.
curr_user_id = 15

# List the movies that the selected user has watched.
movie_watched_by_curr_user = df_ratings_with_titles[
    df_ratings_with_titles['userId'] == curr_user_id
]['movieId'].unique()

In [18]:
# Generate similarity matrix
df_cosine_matrix = pd.DataFrame(
    data = cosine_similarity(X=df_movie_feature_matrix),
    columns = df_movie_feature_matrix.index.tolist(),
    index = df_movie_feature_matrix.index.tolist()
)

In [19]:
# Pick one movie watched by the current user
movie_watched_by_curr_user_title = 'Star Wars'
movie_watched_by_curr_user = df_movies[
    df_movies['title'] == movie_watched_by_curr_user_title
]['id'].values[0]

In [20]:
movie_watched_by_curr_user

'11'

In [21]:
df_sim_with_curr_movie = df_cosine_matrix[movie_watched_by_curr_user].reset_index().rename(
    columns={'index': 'id', movie_watched_by_curr_user: 'cosine_sim'}
)

df_sim_with_curr_movie = pd.merge(
    left=df_sim_with_curr_movie,
    right =df_movies[['id', 'title']],
    how='left',
    on='id'
)

n_recommendation=5

# Exclude the selected movie itself from the recommendations
df_sim_with_curr_movie[
    ~df_sim_with_curr_movie['id'].isin([movie_watched_by_curr_user])
].sort_values(by='cosine_sim',ascending=False).iloc[:n_recommendation]

Unnamed: 0,id,cosine_sim,title
2506,8373,1.0,Transformers: Revenge of the Fallen
948,26947,1.0,Gamera vs. Guiron
2612,8854,1.0,Steel
2483,830,1.0,Forbidden Planet
368,1771,1.0,Captain America: The First Avenger
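
Every recommendation above has a `cosine_sim` of 1.0. Cosine similarity reaches 1 whenever two vectors point in the same direction, which for binary genre flags means the two movies have exactly the same genre set. A minimal sketch with made-up genre vectors (the movies and the helper `cosine` are hypothetical; the helper applies the standard cosine formula rather than calling scikit-learn):

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical genre flags over [Action, Adventure, Science Fiction, Drama]
movie_a = np.array([1, 1, 1, 0])
movie_b = np.array([1, 1, 1, 0])  # same genre set as movie_a
movie_c = np.array([1, 0, 0, 1])  # shares only Action with movie_a

sim_ab = cosine(movie_a, movie_b)  # identical genre sets
sim_ac = cosine(movie_a, movie_c)  # one shared genre out of 3 and 2

print(round(sim_ab, 3), round(sim_ac, 3))
```

Here `sim_ab` is 1 and `sim_ac` is 1/√6, roughly 0.408. Since genre flags are the only feature, movies tied at 1.0 are not ranked by any further criterion.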


## Content-based Filtering

In [22]:
# Generate user-movie rating matrix of a user
df_current_user_ratings=df_ratings_with_titles[
    df_ratings_with_titles['userId'] == curr_user_id
][['movieId', 'rating']]

df_movie_feature_matrix_curr_user = pd.merge(
    right = df_current_user_ratings,
    left=df_movie_feature_matrix.copy().reset_index(),
    how='left',
    left_on='id',
    right_on='movieId'
)
df_movie_feature_matrix_curr_user

Unnamed: 0,id,Action,Adventure,Animation,Comedy,Crime,Documentary,Drama,Family,Fantasy,...,Music,Mystery,Romance,Science Fiction,TV Movie,Thriller,War,Western,movieId,rating
0,100,0,0,0,1,1,0,0,0,0,...,0,0,0,0,0,0,0,0,,
1,100017,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,,
2,100032,1,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,100032,2.0
3,100272,0,0,0,1,0,0,1,0,0,...,0,0,0,0,0,0,0,0,,
4,101,0,0,0,0,1,0,1,0,0,...,0,0,0,0,0,1,0,0,101,4.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2795,99106,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,,
2796,992,0,0,0,1,0,0,1,0,1,...,0,1,0,0,0,0,0,0,,
2797,994,0,0,0,0,1,0,1,0,0,...,0,1,0,0,0,1,0,0,994,4.0
2798,996,0,0,0,0,0,0,1,0,0,...,0,1,0,0,0,1,0,0,,


In [23]:
# Split the movies into those the user has watched and those they have not
df_movie_feature_matrix_curr_user_pref = df_movie_feature_matrix_curr_user.copy()
df_movie_feature_matrix_curr_user_pref['have_watched'] = df_movie_feature_matrix_curr_user_pref['rating'] > 0
df_movie_feature_matrix_curr_user_not_watched = df_movie_feature_matrix_curr_user_pref[
    df_movie_feature_matrix_curr_user_pref['have_watched'] == False
].fillna(0)
df_movie_feature_matrix_curr_user_pref = df_movie_feature_matrix_curr_user_pref[
    df_movie_feature_matrix_curr_user_pref['have_watched'] == True
]

In [24]:
# Show the movies that the selected user has not watched.
df_movie_feature_matrix_curr_user_not_watched

Unnamed: 0,id,Action,Adventure,Animation,Comedy,Crime,Documentary,Drama,Family,Fantasy,...,Mystery,Romance,Science Fiction,TV Movie,Thriller,War,Western,movieId,rating,have_watched
0,100,0,0,0,1,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0.0,False
1,100017,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0.0,False
3,100272,0,0,0,1,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0.0,False
6,1018,0,0,0,0,0,0,1,0,0,...,1,0,0,0,1,0,0,0,0.0,False
7,101904,0,0,0,0,0,0,1,0,1,...,0,0,0,0,0,0,0,0,0.0,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2794,991,0,0,0,0,0,0,1,0,0,...,0,0,1,0,0,0,0,0,0.0,False
2795,99106,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0.0,False
2796,992,0,0,0,1,0,0,1,0,1,...,1,0,0,0,0,0,0,0,0.0,False
2798,996,0,0,0,0,0,0,1,0,0,...,1,0,0,0,1,0,0,0,0.0,False


In [25]:
# Multiply each feature with user rating
genres = df_movie_feature_matrix.columns.tolist()

for genre in genres:
    df_movie_feature_matrix_curr_user_pref[genre] = df_movie_feature_matrix_curr_user_pref[genre] * df_movie_feature_matrix_curr_user_pref['rating']

In [26]:
curr_user_feature_vector = df_movie_feature_matrix_curr_user_pref[genres].sum() / df_movie_feature_matrix_curr_user_pref[genres].sum().sum()

In [27]:
curr_user_feature_vector

Action             0.073289
Adventure          0.054574
Animation          0.011909
Comedy             0.120796
Crime              0.071587
Documentary        0.010732
Drama              0.223662
Family             0.023034
Fantasy            0.028138
Foreign            0.008245
History            0.024473
Horror             0.037168
Music              0.018846
Mystery            0.034289
Romance            0.086638
Science Fiction    0.042403
TV Movie           0.004188
Thriller           0.104437
War                0.012040
Western            0.009554
dtype: float64
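
The rating-weighted genre profile computed above can be checked by hand on a small example. A minimal sketch with made-up movies and ratings (the `watched` frame is hypothetical; only the weighting and normalisation mirror the notebook):

```python
import pandas as pd

# Hypothetical watched movies: one-hot genre flags plus the user's rating
watched = pd.DataFrame({
    'Action': [1, 0, 1],
    'Drama':  [0, 1, 1],
    'rating': [4.0, 2.0, 3.0],
})
genres = ['Action', 'Drama']

# Weight each genre flag by the rating given to that movie
weighted = watched[genres].mul(watched['rating'], axis=0)

# Sum per genre, then normalise so the weights add up to 1
genre_totals = weighted.sum()
user_vector = genre_totals / genre_totals.sum()

print(user_vector)
```

Action collects 4 + 3 = 7 rating points and Drama 2 + 3 = 5, so the profile is 7/12 for Action and 5/12 for Drama.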

In [28]:
# Estimate current user's preference for unwatched movies
for genre in genres:
    df_movie_feature_matrix_curr_user_not_watched[genre] = df_movie_feature_matrix_curr_user_not_watched[genre] * curr_user_feature_vector[genre]

In [29]:
df_movie_feature_matrix_curr_user_not_watched['est_pref_score'] = df_movie_feature_matrix_curr_user_not_watched[genres].sum(axis=1)

In [30]:
df_movie_feature_matrix_curr_user_not_watched

Unnamed: 0,id,Action,Adventure,Animation,Comedy,Crime,Documentary,Drama,Family,Fantasy,...,Romance,Science Fiction,TV Movie,Thriller,War,Western,movieId,rating,have_watched,est_pref_score
0,100,0.000000,0.0,0.0,0.120796,0.071587,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.0,0.000000,0.0,0.0,0,0.0,False,0.192383
1,100017,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.223662,0.0,0.000000,...,0.0,0.000000,0.0,0.000000,0.0,0.0,0,0.0,False,0.223662
3,100272,0.000000,0.0,0.0,0.120796,0.000000,0.0,0.223662,0.0,0.000000,...,0.0,0.000000,0.0,0.000000,0.0,0.0,0,0.0,False,0.381625
6,1018,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.223662,0.0,0.000000,...,0.0,0.000000,0.0,0.104437,0.0,0.0,0,0.0,False,0.362387
7,101904,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.223662,0.0,0.028138,...,0.0,0.000000,0.0,0.000000,0.0,0.0,0,0.0,False,0.251800
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2794,991,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.223662,0.0,0.000000,...,0.0,0.042403,0.0,0.000000,0.0,0.0,0,0.0,False,0.266065
2795,99106,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.0,0.000000,0.0,0.0,0,0.0,False,0.037168
2796,992,0.000000,0.0,0.0,0.120796,0.000000,0.0,0.223662,0.0,0.028138,...,0.0,0.000000,0.0,0.000000,0.0,0.0,0,0.0,False,0.406884
2798,996,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.223662,0.0,0.000000,...,0.0,0.000000,0.0,0.104437,0.0,0.0,0,0.0,False,0.362387


In [31]:
# Get top n recommendation based on estimated preference score
df_curr_user_recommendation = pd.merge(
    left = df_movie_feature_matrix_curr_user_not_watched[['id', 'est_pref_score']],
    right = df_movies[['id', 'title']],
    how='left',
    on='id'
).drop_duplicates()

df_curr_user_recommendation.sort_values(by='est_pref_score', ascending=False)[:n_recommendation]

Unnamed: 0,id,est_pref_score,title
1358,4912,1.214239,Confessions of a Dangerous Mind
1642,5965,0.576757,Scorcher
926,31921,0.569821,They All Laughed
1882,757,0.55477,Murder Most Foul
2150,9005,0.543908,The Ice Harvest
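
Because `curr_user_feature_vector` sums to 1, a movie with strictly binary genre flags cannot score above 1, so the 1.214239 in the first row suggests that some genre counts in `df_movie_feature_matrix` are larger than 1. One possible cause, stated here as an assumption and not verified against the data, is an `id` that appears more than once in the metadata, so that `groupby(level=0).sum()` adds its flags together. A sketch on toy data (the ids `m1` and `m2` are hypothetical) of how that can happen and how clipping restores binary flags:

```python
import pandas as pd

# Toy stacked genre Series in which id 'm1' appears twice
# (a hypothetical duplicated metadata row)
stacked = pd.Series(
    ['Comedy', 'Drama', 'Comedy', 'Drama', 'Action'],
    index=['m1', 'm1', 'm1', 'm1', 'm2'],
)

# Same aggregation as the notebook: flags of duplicated ids are summed
counts = pd.get_dummies(stacked).groupby(level=0).sum()

# Clipping at 1 turns the counts back into binary genre flags
binary = counts.clip(upper=1)

print(counts)
print(binary)
```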


## Collaborative Filtering

### Memory-based Filtering

#### User-Based

In [32]:
# User who will receive the recommendations.
curr_user_id

# Create user similarity matrix.
df_user_similarity_score = pd.DataFrame(
    data=cosine_similarity(df_user_movie_matrix),
    index=df_user_movie_matrix.index.tolist(),
    columns=df_user_movie_matrix.index.tolist()
)

# Get users whose preferences are similar to the current user's
df_users_similar_w_curr = df_user_similarity_score[curr_user_id].sort_values(ascending=False).reset_index().rename(columns={'index':'userId', curr_user_id : 'sim_score'})

top_n = 5
# Take top_n + 1 rows because the current user (similarity 1.0 with themselves) is included
df_top_n_similar_users = df_users_similar_w_curr[:top_n + 1]
df_top_n_similar_users

# Get movie list from most similar users
df_top_n_user_matrix = df_user_movie_matrix[df_user_movie_matrix.index.isin(df_top_n_similar_users['userId'])].T

# Select movies watched by all top-n similar users but not yet watched by the current user.
top_n_user_id_cols = df_top_n_user_matrix.columns.tolist()
top_n_user_id_cols.remove(curr_user_id)

df_curr_user_unwatched_movies = df_top_n_user_matrix[
    (df_top_n_user_matrix[curr_user_id] == 0) # not yet watched by the current user
    & (df_top_n_user_matrix[top_n_user_id_cols].product(axis=1) > 0) # watched by all top-n users
].copy()  # copy so that adding a column does not trigger SettingWithCopyWarning

# Estimate the rating as the mean over the top-n users, then sort
df_curr_user_unwatched_movies['est_rating_by_curr_user'] = (
    df_curr_user_unwatched_movies[top_n_user_id_cols].sum(axis=1) / top_n)
df_curr_user_unwatched_movies.sort_values(by='est_rating_by_curr_user', ascending=False)


title,15,73,388,461,509,580,est_rating_by_curr_user
The Thirteenth Floor,0.0,5.0,5.0,3.0,4.0,4.0,4.2
Dogville,0.0,4.0,2.0,4.0,4.0,4.0,3.6
What's Eating Gilbert Grape,0.0,4.0,4.0,2.5,2.0,3.5,3.2
"20,000 Leagues Under the Sea",0.0,3.0,5.0,1.5,3.0,2.5,3.0
Belle Époque,0.0,3.0,3.0,2.0,3.0,3.0,2.8


#### Item-Based

In [33]:
# Select the movie to be recommended to users.
current_movie_title = df_ratings_with_titles.iloc[0].title

# Create movie similarity matrix.
df_movie_similarity_score = pd.DataFrame(
    data=cosine_similarity(df_user_movie_matrix.T),
    index=df_user_movie_matrix.columns.tolist(),
    columns=df_user_movie_matrix.columns.tolist()
)

# Get movies whose rating patterns are similar to the current movie's.
df_movie_similar_w_curr = df_movie_similarity_score[current_movie_title].sort_values(
    ascending=False).reset_index().rename(
    columns={'index':'title', current_movie_title : 'sim_score'})

top_n = 5
df_top_n_similar_movie = df_movie_similar_w_curr[:top_n + 1]

# Get user list from most similar movies
df_top_n_users_matrix = df_user_movie_matrix[df_top_n_similar_movie['title']]

# Select users who have watched all top-n similar movies but not the selected movie.
top_n_movie_title_cols = df_top_n_users_matrix.columns.tolist()
top_n_movie_title_cols.remove(current_movie_title)

df_curr_movie_unwatched_users = df_top_n_users_matrix[
    (df_top_n_users_matrix[current_movie_title] == 0) # Has not watched the current movie.
    & (df_top_n_users_matrix[top_n_movie_title_cols].product(axis=1) > 0) # Has watched all top-n similar movies.
].copy()  # copy so that adding a column does not trigger SettingWithCopyWarning

# Estimate each user's rating as the mean over the top-n similar movies, then sort.
df_curr_movie_unwatched_users['est_rating_for_curr_movie'] = (
    df_curr_movie_unwatched_users[top_n_movie_title_cols].sum(axis=1) / top_n)
df_curr_movie_unwatched_users.sort_values(by='est_rating_for_curr_movie', ascending=False)['est_rating_for_curr_movie']


userId
177    4.4
Name: est_rating_for_curr_movie, dtype: float64

## Result

We successfully built a movie recommendation system based on users' ratings. The process is still manual, so in future projects we could automate it. We could also ask for input, such as a user or movie identity, so that the system gives recommendations based on that input.
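
As one possible starting point for that automation, the item-similarity step could be wrapped in a function that takes a movie title as input. This is only a sketch under assumptions: the function `recommend_similar` and the title-indexed `feature_matrix` layout are hypothetical and not part of the notebook, and titles are assumed to be unique.

```python
import numpy as np
import pandas as pd

def recommend_similar(title, feature_matrix, n=5):
    """Return the n titles whose feature vectors are most cosine-similar to `title`.

    `feature_matrix` is assumed to be a DataFrame indexed by unique movie
    titles with one numeric column per genre (hypothetical layout).
    """
    if title not in feature_matrix.index:
        raise KeyError(f"Unknown title: {title!r}")
    values = feature_matrix.to_numpy(dtype=float)
    norms = np.linalg.norm(values, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for movies without genres
    unit = values / norms[:, None]
    target = unit[feature_matrix.index.get_loc(title)]
    scores = pd.Series(unit @ target, index=feature_matrix.index)
    # Drop the query movie itself and keep the n highest scores
    return scores.drop(title).nlargest(n)

# Usage on made-up data
toy = pd.DataFrame(
    {'Action': [1, 1, 0], 'Drama': [0, 1, 1]},
    index=['Movie A', 'Movie B', 'Movie C'],
)
top = recommend_similar('Movie A', toy, n=2)
print(top)
```

The same pattern could be applied to the user-based and item-based steps by passing a user ID or title as the argument.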