<div style="text-align:center">
    <b style="font-size:22px"> RECOMMENDATION SYSTEM </b>
</div>

<h4> Task: create a simple recommendation system using a dataset of film or book ratings available online. The system will suggest new items to users based on their previous ratings.
</h4>
    
<div style="position: absolute; top: 100px; right: 50px;">
    Sara Paola Saviori
    <br>
    Matricola: 5114771
</div>

# 


The recommendation system aims to provide personalized recommendations to users, helping them discover new movies based on their previous ratings. One of the most widely used techniques to achieve this goal is Collaborative Filtering, a methodology based on the analysis of users' past behaviors and preferences. In Collaborative Filtering, there are two main approaches: User-Based Filtering and Item-Based Filtering.
**User-Based Filtering** focuses on similarities between users. It uses a user-item matrix to calculate similarities among users based on their previous ratings. Similar users are identified and recommended items liked by similar users but not yet seen by the target user.
**Item-Based Filtering**, on the other hand, is based on the similarity between items. It uses an item-user matrix to calculate item similarities based on the history of users' interactions with items. Similar items are identified and recommended to users based on their previous ratings.
This project focuses on the implementation of a Recommendation System that exploits these two main collaborative filtering techniques

The structure of the project includes the following steps:
1. **Creation of the Dataframe for the Recommendation System**: importing and structuring the relevant data into a dataframe, setting the stage for further analysis.
2. **User-Item Matrix**: building a user-item matrix representing the interactions between users and items.
3. **User-User Similarity Matrix**: computing a similarity matrix between users based on their past interactions. This allows us to identify similar users.
4. **Item-Item Similarity Matrix**: calculating a similarity matrix between items based on user interactions. This helps us to identify related items.
5. **User-Based Recommendation System**: implementing a user-based recommendation system using the user-user similarity matrix.
6. **Item-Based Recommendation System**: implementation of an item-based recommendation system using the item similarity matrix.
7. **Item-Based and User-Based Recommendation System**: by integrating both approaches, a system of recommendations is developed, this system provides hybrid recommendations by combining both Item-Based Filtering and User-Based Filtering.

The goal of this project is to provide personalized recommendations to users through the analysis of user preferences and item similarity, enabling them to discover pertinent movies efficiently.


<img src="KNN.webp" alt="Collaborative Filtering">


In [1]:
#Import the libraries 

import pandas as pd #needed to store the data in DataFrames

import numpy as np #needed to perform array operations and numerical computations 

from sklearn.metrics.pairwise import cosine_similarity #needed to calculate the cosine similarity between items 

from sklearn.neighbors import NearestNeighbors #needed to find the nearest neighbors 

import matplotlib.pyplot as plt  #needed to plot 

from scipy.spatial.distance import pdist, squareform #needed to compute distances or similarities among points or objects in multi-dimensional spaces

from scipy.spatial.distance import cdist #needed to compute distances between points 

In [2]:
#load the CSV files named 'ratings.csv' and 'movies_metadata.csv' into a Pandas DataFrame. Specifying that the first row of 
#the CSV file contains the column names and removing rows with missing values from the df, in order to ensure 
#that only complete rows are taken into account

ratings = pd.read_csv('ratings.csv',header=0).dropna()  

movies = pd.read_csv('movies_metadata.csv', header=0, delimiter=';', on_bad_lines='skip').dropna()

In [3]:
#print the columns of the files 

print("ratings columns are:",list(ratings.columns))

print("movies columns are:",list(movies.columns))

ratings columns are: ['userId', 'movieId', 'rating', 'timestamp']
movies columns are: ['adult', 'budget', 'movieId', 'imdb_id', 'original_language', 'original_title', 'overview', 'popularity', 'release_date', 'revenue', 'runtime', 'status', 'tagline', 'title', 'video', 'vote_average', 'vote_count', 'genres', 'spoken_languages', 'production_countries', 'production_companies']


## Creation of the dataframe for the Recommendation System 

Merge the movies and the ratings dataframe based on a common column which is the movieId. 
movie_merged will contain the columns from both ratings and movies for rows where 'movieId' matches.
This dataframe will be the starting point for the creation of the recommendation system 

In [4]:
#merge the files on movieId to create the dataframe on which I will develope the recommendation system

movie_merged= pd.merge(ratings, movies, on='movieId')
movie_merged.head()

Unnamed: 0,userId,movieId,rating,timestamp,adult,budget,imdb_id,original_language,original_title,overview,...,status,tagline,title,video,vote_average,vote_count,genres,spoken_languages,production_countries,production_companies
0,1,147,4.5,1425942435,False,0,tt0053198,fr,Les Quatre Cents Coups,"For young Parisian boy Antoine Doinel, life is...",...,Released,Angel faces hell-bent for violence.,The 400 Blows,False,8.0,363.0,Drama,en fr,FR,Les Films du Carrosse Sédif Productions T...
1,24,147,4.0,979870988,False,0,tt0053198,fr,Les Quatre Cents Coups,"For young Parisian boy Antoine Doinel, life is...",...,Released,Angel faces hell-bent for violence.,The 400 Blows,False,8.0,363.0,Drama,en fr,FR,Les Films du Carrosse Sédif Productions T...
2,70,147,4.5,1414243855,False,0,tt0053198,fr,Les Quatre Cents Coups,"For young Parisian boy Antoine Doinel, life is...",...,Released,Angel faces hell-bent for violence.,The 400 Blows,False,8.0,363.0,Drama,en fr,FR,Les Films du Carrosse Sédif Productions T...
3,142,147,4.0,866391048,False,0,tt0053198,fr,Les Quatre Cents Coups,"For young Parisian boy Antoine Doinel, life is...",...,Released,Angel faces hell-bent for violence.,The 400 Blows,False,8.0,363.0,Drama,en fr,FR,Les Films du Carrosse Sédif Productions T...
4,150,147,2.0,945065750,False,0,tt0053198,fr,Les Quatre Cents Coups,"For young Parisian boy Antoine Doinel, life is...",...,Released,Angel faces hell-bent for violence.,The 400 Blows,False,8.0,363.0,Drama,en fr,FR,Les Films du Carrosse Sédif Productions T...


## User-Item Matrix

In a recommendation system, the User-Item Matrix is a fundamental component that serves as the foundation for making personalized recommendations because is a way to represent the historical interactions or preferences of users with items. Each row in the matrix corresponds to a user, each column corresponds to an item, and the entries represent user-item interactions (i.e.ratings).

I need to delete duplicates to create the user-item matrix, meaning that that if there are rows with  the same combination of 'userId' and 'movieId', they will be considered duplicates.
Dropping duplicates is important in this context for Data Consistency: removing duplicates ensures that each user-item interaction is counted only once.

In [5]:
movie_merged = movie_merged.drop_duplicates(subset=['userId', 'movieId'])

In [6]:
#Keep only the colums that are useful for the recommendation system

filtered_movie_merged= movie_merged.sort_values(by='userId').reset_index()
filtered_movie_merged = filtered_movie_merged[['userId', 'movieId', 'rating', 'title']]
filtered_movie_merged

Unnamed: 0,userId,movieId,rating,title
0,1,147,4.5,The 400 Blows
1,1,1968,4.0,Fools Rush In
2,1,2762,4.5,Young and Innocent
3,1,2959,4.0,License to Wed
4,1,1246,5.0,Rocky Balboa
...,...,...,...,...
7674424,270896,261,3.5,Cat on a Hot Tin Roof
7674425,270896,539,4.0,Psycho
7674426,270896,1485,3.0,Get Carter
7674427,270896,1923,3.0,Twin Peaks: Fire Walk with Me


In [7]:
user_item_matrix = filtered_movie_merged.pivot(index='userId', columns='movieId', values='rating')
user_item_matrix = user_item_matrix.fillna(0)
user_item_matrix

movieId,5,11,12,13,14,15,16,17,18,19,...,174371,174611,174675,174759,174865,175245,175291,175555,176069,176143
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
270892,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
270893,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
270894,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
270895,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## User-User Similarity Matrix

This matrix is used in recommendation systems to measure the similarity between users based on their preferences, interactions, or behaviors. 
It's an important component in collaborative filtering algorithms, particularly user-based collaborative filtering, where recommendations are made by finding users with similar preferences to the target user.

Since the dimension of the matrix is quite big to be computed on my computer, I chose to  keep only users with at least 300 ratings in order to reduce the dimension and have more consistent result.

In [8]:
# Calculate the count of ratings for each user.
user_ratings_count = filtered_movie_merged['userId'].value_counts().reset_index()
user_ratings_count.columns = ['userId', 'ratings_count']

ratings = 300
filtered_movie_merged1 = movie_merged[movie_merged['userId'].isin(user_ratings_count[user_ratings_count['ratings_count'] >= ratings]['userId'])]

In [9]:
filtered_movie_merged1[['userId', 'movieId', 'rating', 'title']]

Unnamed: 0,userId,movieId,rating,title
34,1932,147,1.5,The 400 Blows
59,3437,147,3.0,The 400 Blows
77,4095,147,4.0,The 400 Blows
83,4387,147,3.0,The 400 Blows
87,4666,147,3.5,The 400 Blows
...,...,...,...,...
7675378,251197,107096,2.0,Nemesis 3: Time Lapse
7675384,267597,136850,3.5,Breaking the Girls
7675385,270123,125249,3.5,The Bat Man
7675386,270123,159109,1.5,The Rambler


In [11]:
user_item_matrix = filtered_movie_merged1.pivot(index='userId', columns='movieId', values='rating')
user_item_matrix = user_item_matrix.fillna(0)
user_item_matrix

movieId,5,11,12,13,14,15,16,17,18,19,...,173541,173805,173893,173897,174195,174371,174675,174759,174865,175291
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
229,1.0,0.0,1.0,0.0,0.0,0.0,3.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
533,0.0,3.0,0.0,0.0,0.0,3.0,0.0,5.0,2.0,3.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
611,0.0,4.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
741,3.5,0.0,3.0,0.0,0.0,3.0,0.0,0.0,3.5,3.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1846,0.0,0.0,0.0,3.0,0.0,0.0,3.5,0.0,0.0,2.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
270123,2.5,3.5,2.0,0.0,4.0,0.0,3.5,1.0,3.5,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
270213,3.0,5.0,4.0,4.0,0.0,2.0,0.0,4.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
270237,2.0,3.0,0.0,0.0,0.0,0.0,3.0,4.0,0.0,2.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
270654,0.0,4.0,0.0,0.0,0.0,0.0,4.5,4.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


A User-User Similarity Matrix is a matrix that quantifies the similarity between users in a recommendation system. Each row and column of this matrix corresponds to a user, and the entries contain similarity scores that indicate how similar or dissimilar one user is to another. The similarity can be computed using various methods, with cosine similarity being one of them.

Cosine similarity is a metric used to measure the similarity of two vectors. Specifically, it measures the similarity in the direction or orientation of the vectors ignoring differences in their magnitude or scale. 
The similarity of two vectors is measured by the cosine of the angle between them.

In the specific case of the recommendation system cosine similarity measures how similar two users are based on their preferences or interactions with items. 

It's a mathematical concept that provides a similarity score between two users, typically ranging from -1 (completely dissimilar) to 1 (perfectly similar), so a score close to 1 suggests that the users have very similar preferences, making them good candidates for recommending items to each other.

The cosine similarity between two vectors is computed as followed: 

$$
\text{Cosine Similarity}(\mathbf{A}, \mathbf{B}) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \|\mathbf{B}\|}
$$


In the context of recommendation systems, the similarity score is used to find users with similar tastes, and items liked by those similar users can be recommended to the target user. It's a fundamental concept in collaborative filtering methods, which leverage user-user similarity to provide personalized recommendations.

In [13]:
# Compute the User-User Similarity Matrix using Cosine Similarity.
user_similarity_matrix = cosine_similarity(user_item_matrix)

In [15]:
# Create the DataFrame from the similarity matrix
user_similarity_df = pd.DataFrame(user_similarity_matrix, index=user_item_matrix.index, columns=user_item_matrix.index)

# Set column and index names
user_similarity_df.index.name = 'userId'
user_similarity_df.columns.name = 'userId'

user_similarity_df

userId,229,533,611,741,1846,1932,2520,2531,2547,2627,...,269132,269279,269750,269843,270071,270123,270213,270237,270654,270887
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
229,1.000000,0.383114,0.456611,0.482924,0.450446,0.412922,0.375800,0.449689,0.363450,0.477422,...,0.550915,0.431872,0.437139,0.445303,0.372130,0.441626,0.470058,0.482858,0.471165,0.249351
533,0.383114,1.000000,0.375981,0.434189,0.390878,0.476163,0.174702,0.435449,0.408078,0.367233,...,0.454054,0.383871,0.467282,0.448247,0.390875,0.402357,0.491978,0.416655,0.288972,0.355849
611,0.456611,0.375981,1.000000,0.382504,0.369324,0.313107,0.317024,0.431760,0.316951,0.416781,...,0.522047,0.341346,0.357408,0.373835,0.318299,0.417515,0.406703,0.494799,0.458886,0.249284
741,0.482924,0.434189,0.382504,1.000000,0.560810,0.592995,0.304589,0.493513,0.505947,0.529495,...,0.513039,0.500493,0.520030,0.505572,0.518647,0.487581,0.515673,0.424389,0.410894,0.369742
1846,0.450446,0.390878,0.369324,0.560810,1.000000,0.572962,0.349851,0.544046,0.513583,0.501777,...,0.517805,0.537000,0.539315,0.408665,0.522184,0.439340,0.409475,0.410428,0.427073,0.366070
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
270123,0.441626,0.402357,0.417515,0.487581,0.439340,0.444925,0.367986,0.511649,0.485016,0.472703,...,0.505576,0.457458,0.446325,0.412669,0.471569,1.000000,0.480759,0.454359,0.513479,0.430185
270213,0.470058,0.491978,0.406703,0.515673,0.409475,0.474458,0.295151,0.410247,0.351270,0.423685,...,0.525354,0.356914,0.494792,0.517499,0.389116,0.480759,1.000000,0.476229,0.349082,0.378395
270237,0.482858,0.416655,0.494799,0.424389,0.410428,0.382073,0.359417,0.477911,0.377779,0.521816,...,0.555816,0.403710,0.432867,0.435624,0.369716,0.454359,0.476229,1.000000,0.450450,0.286859
270654,0.471165,0.288972,0.458886,0.410894,0.427073,0.338689,0.451504,0.472864,0.398241,0.443234,...,0.490665,0.402153,0.360652,0.315080,0.360127,0.513479,0.349082,0.450450,1.000000,0.241862


## Item-Item Similarity Matrix

Calculate the similarity between items and replacing the nan with zeros assuming that a missing interaction is equivalent to an interaction with a score of 0.
By transposing the matrix it converts the user-item matrix into an item-item matrix, each cell of this matrix contains the similarity score between two items, which will be calculated using cosine similarity. 

This matrix is useful for the Item-Based recommendation system 

In [16]:
item_similarity = cosine_similarity(user_item_matrix.T)

In [17]:
#Create a dataframe  where the rows and columns are labeled with the names of movies to better visualize 
#the similarity scores between movies

item_similarity_df = pd.DataFrame(item_similarity, index=user_item_matrix.columns, columns=user_item_matrix.columns)
item_similarity_df

movieId,5,11,12,13,14,15,16,17,18,19,...,173541,173805,173893,173897,174195,174371,174675,174759,174865,175291
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
5,1.000000,0.616707,0.422323,0.302361,0.336262,0.404858,0.531936,0.458056,0.417272,0.595538,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.027117,0.045787,0.000000,0.000000,0.000000
11,0.616707,1.000000,0.364603,0.276214,0.510410,0.416222,0.670585,0.661005,0.446927,0.546840,...,0.031227,0.000000,0.038375,0.000000,0.038375,0.022791,0.033847,0.038375,0.000000,0.038375
12,0.422323,0.364603,1.000000,0.284320,0.255249,0.361661,0.388305,0.275633,0.376321,0.500371,...,0.035046,0.000000,0.000000,0.060296,0.000000,0.066250,0.001246,0.000000,0.010049,0.000000
13,0.302361,0.276214,0.284320,1.000000,0.161700,0.254891,0.257581,0.214876,0.160504,0.279449,...,0.000000,0.072421,0.000000,0.000000,0.000000,0.000000,0.001497,0.000000,0.000000,0.000000
14,0.336262,0.510410,0.255249,0.161700,1.000000,0.270329,0.562031,0.446598,0.352373,0.348535,...,0.060123,0.000000,0.037844,0.000000,0.037844,0.000000,0.003912,0.037844,0.012615,0.037844
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
174371,0.027117,0.022791,0.066250,0.000000,0.000000,0.000000,0.049747,0.007019,0.037334,0.087173,...,0.000000,0.000000,0.000000,0.356348,0.000000,1.000000,0.000000,0.000000,0.000000,0.000000
174675,0.045787,0.033847,0.001246,0.001497,0.003912,0.047414,0.026675,0.030947,0.044221,0.024274,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,1.000000,0.000000,0.000000,0.000000
174759,0.000000,0.038375,0.000000,0.000000,0.037844,0.055941,0.030184,0.039395,0.037529,0.017791,...,0.813733,0.000000,1.000000,0.000000,1.000000,0.000000,0.000000,1.000000,0.000000,1.000000
174865,0.000000,0.000000,0.010049,0.000000,0.012615,0.000000,0.022638,0.000000,0.000000,0.011861,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,1.000000,0.000000


## User-Based Recommendation System

A user-based recommendation system is a type of recommendation system that provides personalized recommendations to users based on the behavior and preferences of similar users. It relies on the idea that users who have similar tastes or preferences in the past are likely to have similar tastes in the future.

I have already computed the similarity between users, using the cosine similarity, now I will select the 10 users that are more similar to the target user (user_id), using the KNN algorithm. 
It works by measuring the distance or similarity between items or users based on their features or behavior. 
The "K" in KNN represents the number of nearest neighbors to consider.

In [18]:
# Create a NearestNeighbors object with the desired number of neighbors (k)
k = 10
nn = NearestNeighbors(n_neighbors=k, metric='cosine', algorithm='brute')
nn.fit(user_similarity_matrix)  # user_similarity_matrix è la matrice di similarità tra utenti

# Find the most similar neighbors for a target user (user_id)
user_id = 229
distances, indices = nn.kneighbors(user_similarity_matrix[user_id].reshape(1, -1), k+1)

# get similar users and their ratings for items
similar_users = indices[0][1:]  # I primi k vicini

similar_users = user_item_matrix.index[similar_users]

# Finds movies that are already rated by the target user
movies_rated_userid = user_item_matrix.loc[user_id][user_item_matrix.loc[user_id] > 0].index

# Find movies recommended by neighbors that the current user has not yet rated    
# Initialize the list to collect recommended movies
recommended_movies = []

for similar_user in similar_users:
    movies_rated_similar = user_item_matrix.loc[similar_user][user_item_matrix.loc[similar_user] > 0].index
    recommended_movies.extend(set(movies_rated_similar) - set(movies_rated_userid))
    
# Calculate an expected rating for the recommended movies
expected_rating = {}

for recommended_movie in recommended_movies:
    rating = 0  # Initialize the rating for the current recommended movie
    for similar_user in similar_users:
        if user_item_matrix.loc[similar_user][recommended_movie] > 0:
            rating += user_similarity_df[user_id][similar_user] * user_item_matrix.loc[user_id][recommended_movie]
    
    expected_rating[recommended_movie] = rating  # Store the rating for the recommended movie

# Sort the items according to the predicted ratings
recommended_movie_sorted = sorted(expected_rating, key=expected_rating.get, reverse=True)

# Recommend the first N items based on the predicted ratings
N = 100
recommendations_user = recommended_movie_sorted[:N]


In [19]:
# Create a dictionary that maps movieId to title
movie_id_to_title = dict(zip(filtered_movie_merged1['movieId'], filtered_movie_merged1['title']))

# Add title to recommended results
title_recommendations_user = [movie_id_to_title[movie_id] for movie_id in recommendations_user]

# Create a Dataframe from title_recommendations
user_based_rec = pd.DataFrame({'User-Based Recommendations': title_recommendations_user})


user_based_rec.style.set_table_styles([
    {'selector': 'td', 'props': [('font-size', '12px'), ('text-align', 'center'), ('border', '2px solid black')]}
])

Unnamed: 0,User-Based Recommendations
0,Ronin
1,Son of Frankenstein
2,The Goddess
3,Modern Times
4,Star Wars
5,Cowboy
6,Casino
7,American Beauty
8,Lost Horizon
9,The Dark


## Item-Based Recommendation System 

An item-based recommendation system is a type of recommendation system that gives users item suggestions by evaluating the similarity between items. Unlike user-based recommendation systems, which rely on similarities between users, item-based systems focus on discovering items that are similar to those that the user has previously rated.

Item-based recommendation systems are often considered easier to implement and more computationally efficient than user-based recommendation systems, because user-based systems can suffer from data sparsity problems because users typically only interact with a limited number of items. This can make it challenging to find similar users with sufficient overlap in preferences. On the contrary, item-based systems tend to have greater data density, since items are often rated by multiple users.

Item-based recommendations are often more interpretable, as the system suggests items based on similarity to items the user has previously interacted with. It is easier for users to understand why a recommendation is made.

In [20]:
# Create a NearestNeighbors object with the desired number of neighbors (k)
k = 100
nn = NearestNeighbors(n_neighbors=k, metric='cosine', algorithm='brute')
nn.fit(item_similarity)  

# Find the most similar neighbors for a target user (user_id)
distances, indices = nn.kneighbors(item_similarity_df.iloc[:, user_id].values.reshape(1, -1), k+1)

# Exclude the target user himself
similar_item_indices = indices[0][1:]

# Recommend similar items (movies) to the user
recommended_items = [user_item_matrix.columns[i] for i in similar_item_indices]

# Recommend the first N items (movies) based on the similarity scores
N = 100
recommendations_item = recommended_items[:N]

# Add title to recommended items
title_recommendations_item = [movie_id_to_title[movie_id] for movie_id in recommendations_item]

# Create a Dataframe from title_recommendations
item_based_rec = pd.DataFrame({'Item-Based Recommendations': title_recommendations_item})

item_based_rec.style.set_table_styles([
    {'selector': 'td', 'props': [('font-size', '12px'), ('text-align', 'center'), ('border', '2px solid black')]}
])


Unnamed: 0,Item-Based Recommendations
0,Transamerica
1,The Sum of All Fears
2,The Bear
3,Blow
4,"20,000 Leagues Under the Sea"
5,American Psycho
6,Nowhere in Africa
7,Harry Potter and the Prisoner of Azkaban
8,The Men
9,Breaking the Waves


## Item-Based and User-Based Recommendation System 

Creating an overlap between item-based and user-based recommendations is important for having more robust and reliable recommendations.

In [21]:
# Find common elements between user-based and item-based recommendations
common_recommendations = list(set(recommendations_user).intersection(recommendations_item))

# Add title to common recommendations
title_recommendations_common = [movie_id_to_title[movie_id] for movie_id in common_recommendations]

# Creation of a DataFrame with final recommendations.
rec_final = pd.DataFrame({'Recommendations': title_recommendations_common})

rec_final.style.set_table_styles([
    {'selector': 'td', 'props': [('font-size', '12px'), ('text-align', 'center'), ('border', '2px solid black')]}
])

Unnamed: 0,Recommendations
0,Harry Potter and the Prisoner of Azkaban
1,Transamerica
2,8 Mile
3,Mr. Jones
4,Ocean's Twelve
5,Mad Max 2: The Road Warrior
6,The Vanishing
7,Nosferatu
8,Breaking the Waves
9,Chill Factor


The previous table shows the recommended movies for user 229 by overlapping the result obtained from the item-based recommendation system and the user-based one.

## Conclusions

Mixing item-based and user-based recommendation approaches can lead to more robust and reliable recommendations. By overlapping recommendations, the limitations of one approach can be complemented by the strengths of the other. This is particularly useful in data scarcity situations, where data on user interactions may be insufficient. Using both approaches, item-based recommendations can fill in the gaps and provide recommendations based on item similarities. 

Combining these approaches improves personalization by considering both item similarity and user behavior, resulting in a more diversified and personalized set of recommendations.
Moreover, using multiple techniques can help reduce the bias that may exist in a single approach, leading to a more balanced and unbiased set of recommendations by considering different aspects of user behavior and item characteristics.