In [1]:
# Import the required libraries
import pandas as pd
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

### Load the required data into pandas DataFrames

In [None]:
# Load the movie metadata and the user ratings into dataframes
movies_df = pd.read_csv('movies.csv')
ratings_df = pd.read_csv('ratings.csv')
movies_df.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


## Organizing and preprocessing the data
### First step: separate the year from the title and store it in a new column called `year`

In [3]:
# expand=False returns a Series (instead of a one-column DataFrame) when the pattern has a single capture group.
movies_df['year']=movies_df['title'].str.extract(r'(\(\d{4}\))',expand=False)
movies_df['year']=movies_df['year'].str.extract(r'(\d{4})',expand=False)
# Remove the "(YYYY)" part from the title, then strip the leftover whitespace
movies_df['title']=movies_df['title'].str.replace(r'(\(\d{4}\))','',regex=True).str.strip()
movies_df.head()

Unnamed: 0,movieId,title,genres,year
0,1,Toy Story,Adventure|Animation|Children|Comedy|Fantasy,1995
1,2,Jumanji,Adventure|Children|Fantasy,1995
2,3,Grumpier Old Men,Comedy|Romance,1995
3,4,Waiting to Exhale,Comedy|Drama|Romance,1995
4,5,Father of the Bride Part II,Comedy,1995
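The two `str.extract` calls can also be collapsed into one pattern. The following is a minimal sketch on a toy data frame (the `toy` frame and its titles are illustrative, not part of the dataset); it assumes titles follow the same "Title (YYYY)" shape as `movies.csv`.

```python
import pandas as pd

# Toy titles in the same "Title (YYYY)" shape as movies.csv (illustrative data)
toy = pd.DataFrame({'title': ['Toy Story (1995)', 'Heat (1995)', 'No Year Here']})

# The capture group sits inside the literal parentheses, so only the digits are returned
toy['year'] = toy['title'].str.extract(r'\((\d{4})\)', expand=False)

# Remove the "(YYYY)" part and any surrounding whitespace
toy['title'] = toy['title'].str.replace(r'\(\d{4}\)', '', regex=True).str.strip()

print(toy)
```

Titles without a year simply get a missing value in `year`, which is worth checking before using the column.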


# Content-Based method
Content-based description typically refers to a method of describing or categorizing items, such as products, articles, movies, or any other type of content, based on the intrinsic features and characteristics of the items themselves. This approach contrasts with collaborative filtering, which relies on user interactions or preferences.

In content-based description, the system analyzes the properties or attributes of the items and uses that information to make recommendations or provide relevant information. Here are some common features of content-based description:

#### 1_Item Attributes:

* Products: Features like color, size, brand, and price.
* Movies: Genre, director, actors, and release year.
* Articles: Topic, keywords, and writing style.

#### 2_Content Analysis:

* For text-based content, natural language processing techniques may be used to extract key information from the text.
* For images or videos, features like color distribution, shapes, or visual elements can be analyzed.

#### 3_User Profile:

* The system often creates a user profile based on their preferences, history, or explicit feedback.

#### 4_Matching Algorithm:

* A matching algorithm is then employed to recommend items that match the user's profile or preferences.

#### 5_Personalization:

* The recommendations are personalized for each user based on their individual characteristics and past interactions.

#### 6_Relevance Scores:

* Each item is assigned a relevance score based on how well it matches the user's preferences or requirements.

Content-based recommendation systems are commonly used in domains such as e-commerce, content streaming services, and news platforms. While they are effective at providing personalized recommendations, they face challenges such as the "filter bubble" problem, where users are only exposed to content similar to their past preferences, which limits the diversity of recommendations. To address this, hybrid recommendation systems that combine content-based and collaborative filtering approaches are often employed. (I will look into ways to combine them and, if I succeed, implement one in this project.)
#### Content-based methods need the content of the movies, so we extract every genre from the genres column and create a separate 0/1 indicator column for each genre.
#### For this we first need a list of genres for each movie, which `split` can produce.

In [4]:
# As you can see in the data frame above, in the genre column, all genres are separated by the sign "|"
# so we simply have to call the split function on '|'
movies_df['genres']=movies_df['genres'].str.split('|')
movies_df.head()

Unnamed: 0,movieId,title,genres,year
0,1,Toy Story,"[Adventure, Animation, Children, Comedy, Fantasy]",1995
1,2,Jumanji,"[Adventure, Children, Fantasy]",1995
2,3,Grumpier Old Men,"[Comedy, Romance]",1995
3,4,Waiting to Exhale,"[Comedy, Drama, Romance]",1995
4,5,Father of the Bride Part II,[Comedy],1995


##### As mentioned, we need to extract all the genres, give each one its own column, and flag every movie against them. In a genre column, 1 means the movie belongs to that genre and 0 means it does not.
##### So far we have only converted the genres column into lists, which is not convenient for computation. We use one-hot encoding to create the genre columns from those lists.

In [5]:
# Work on a copy so that the genre columns are not added to the original data frame.
movie_with_genres=movies_df.copy()
# For every row in the dataframe, set the column of each of its genres to 1
# iterrows() loops over the rows of a DataFrame as (index, row) pairs
for index, row in movies_df.iterrows():
    for genre in row['genres']:
        movie_with_genres.at[index,genre]=1
# Cells that are still empty (NaN) belong to genres the movie does not have, so they are filled with 0
movie_with_genres = movie_with_genres.fillna(0)
movie_with_genres.head(60)

Unnamed: 0,movieId,title,genres,year,Adventure,Animation,Children,Comedy,Fantasy,Romance,...,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
0,1,Toy Story,"[Adventure, Animation, Children, Comedy, Fantasy]",1995,1.0,1.0,1.0,1.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2,Jumanji,"[Adventure, Children, Fantasy]",1995,1.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,3,Grumpier Old Men,"[Comedy, Romance]",1995,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,4,Waiting to Exhale,"[Comedy, Drama, Romance]",1995,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,5,Father of the Bride Part II,[Comedy],1995,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,6,Heat,"[Action, Crime, Thriller]",1995,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,7,Sabrina,"[Comedy, Romance]",1995,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,8,Tom and Huck,"[Adventure, Children]",1995,1.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,9,Sudden Death,[Action],1995,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,10,GoldenEye,"[Action, Adventure, Thriller]",1995,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Note
##### The same encoding can be done with another method, shown below. We do not use it in the rest of the notebook; it is applied to a copy of the data frame only to demonstrate the method.

In [6]:
moive_onehotlabel_genres=movies_df.copy()
# Assuming movies_df is your DataFrame with the 'genres' column containing lists of genres
# Example:
# movies_df = pd.DataFrame({'genres': [['Action', 'Adventure'], ['Drama'], ['Comedy', 'Romance']]})

# Applying one-hot encoding
one_hot_encoded = pd.get_dummies(moive_onehotlabel_genres['genres'].apply(pd.Series).stack(), prefix='genre').groupby(level=0).sum()

# Concatenating the one-hot encoded columns to the original DataFrame
moive_onehotlabel_genres = pd.concat([moive_onehotlabel_genres, one_hot_encoded], axis=1)

# Displaying the resulting DataFrame
moive_onehotlabel_genres.head()

Unnamed: 0,movieId,title,genres,year,genre_(no genres listed),genre_Action,genre_Adventure,genre_Animation,genre_Children,genre_Comedy,...,genre_Film-Noir,genre_Horror,genre_IMAX,genre_Musical,genre_Mystery,genre_Romance,genre_Sci-Fi,genre_Thriller,genre_War,genre_Western
0,1,Toy Story,"[Adventure, Animation, Children, Comedy, Fantasy]",1995,0,0,1,1,1,1,...,0,0,0,0,0,0,0,0,0,0
1,2,Jumanji,"[Adventure, Children, Fantasy]",1995,0,0,1,0,1,0,...,0,0,0,0,0,0,0,0,0,0
2,3,Grumpier Old Men,"[Comedy, Romance]",1995,0,0,0,0,0,1,...,0,0,0,0,0,1,0,0,0,0
3,4,Waiting to Exhale,"[Comedy, Drama, Romance]",1995,0,0,0,0,0,1,...,0,0,0,0,0,1,0,0,0,0
4,5,Father of the Bride Part II,[Comedy],1995,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
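A third option, sketched here on toy data, is `Series.str.get_dummies`, which builds the indicator columns directly from a delimiter-separated string. This assumes the genres are still in their raw `'A|B|C'` form (that is, before the `split` step); the `toy` frame is illustrative only.

```python
import pandas as pd

# Toy frame with genres in the raw pipe-separated form (illustrative data)
toy = pd.DataFrame({
    'title': ['Toy Story', 'Heat', 'Sabrina'],
    'genres': ['Adventure|Animation|Children', 'Action|Crime|Thriller', 'Comedy|Romance'],
})

# One 0/1 column per distinct genre, split on the "|" separator
genre_flags = toy['genres'].str.get_dummies(sep='|')

# Attach the indicator columns to the original frame
toy_encoded = pd.concat([toy, genre_flags], axis=1)
print(toy_encoded)
```

If the column already holds lists, joining them back with `.str.join('|')` before calling `get_dummies` should give the same result.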


#### Now let's look at our ratings dataframe

In [7]:
ratings_df.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


#### I will drop the timestamp column: it was not used in the lesson and I have not found a use for it yet, so removing it saves some memory.

In [8]:
ratings_df = ratings_df.drop('timestamp', axis=1)
# axis=1 tells drop() to remove a column rather than a row
ratings_df.head()

Unnamed: 0,userId,movieId,rating
0,1,1,4.0
1,1,3,4.0
2,1,6,4.0
3,1,47,5.0
4,1,50,5.0


## Content-Based recommendation system
#### A content-based movie recommendation system suggests movies similar to those a user has previously liked. It uses movie attributes such as genre, director, description, and actors to analyze the content of the movies and find other movies with similar content. It then ranks the candidates by their similarity scores and recommends the most relevant ones to the user.
##### For this purpose, we add a new imaginary user with a set of movie ratings to the system

In [9]:
user_input=[
    {'title':'Fight Club','rating':5},
    {'title':'Indiana Jones and the Kingdom of the Crystal Skull','rating':1},
    {'title':'Out of Africa','rating':3},
    {'title':'Seven ','rating':3.5},
    {'title':'Cinema Paradiso (Nuovo cinema Paradiso)','rating':5},
    {'title':'Star Wars: Episode V - The Empire Strikes Back','rating':4},
    {'title':'Eye for an Eye','rating':3.5},
    {'title':'Fargo','rating':2.5},   
]
# Convert the user input into a dataframe
input_movie=pd.DataFrame(user_input)
input_movie

Unnamed: 0,title,rating
0,Fight Club,5.0
1,Indiana Jones and the Kingdom of the Crystal S...,1.0
2,Out of Africa,3.0
3,Seven,3.5
4,Cinema Paradiso (Nuovo cinema Paradiso),5.0
5,Star Wars: Episode V - The Empire Strikes Back,4.0
6,Eye for an Eye,3.5
7,Fargo,2.5


##### In this step we look up the movieId (from our movies dataframe) of each movie the user has seen and merge it into input_movie, so that we can identify the genres of those movies and then calculate the weighted value of each genre for the user.
##### We can also drop the columns we no longer need.

In [10]:
# Series.isin() checks whether each value is contained in the given list and returns a boolean Series, which we use here as a row filter.
inter_id=movies_df[movies_df['title'].isin(input_movie['title'].tolist())]
# tolist() converts the column into an ordinary Python list with the same values
input_movie=pd.merge(inter_id,input_movie)
input_movie=input_movie.drop('year',axis=1).drop('genres', axis=1)
# Note: if a movie from user_input does not appear in input_movie, its title may be spelled differently in the data, or the movie may not be in the original movies dataframe at all
input_movie

Unnamed: 0,movieId,title,rating
0,61,Eye for an Eye,3.5
1,608,Fargo,2.5
2,1172,Cinema Paradiso (Nuovo cinema Paradiso),5.0
3,1196,Star Wars: Episode V - The Empire Strikes Back,4.0
4,1959,Out of Africa,3.0
5,2959,Fight Club,5.0
6,59615,Indiana Jones and the Kingdom of the Crystal S...,1.0
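The input listed eight titles but only seven rows survive the merge ('Seven ' is missing). One way to spot such titles is a left merge with `indicator=True`, sketched below on toy data; the `catalog` and `wanted` frames are illustrative stand-ins for `movies_df` and the user input.

```python
import pandas as pd

# Stand-ins for movies_df and the user input (illustrative data)
catalog = pd.DataFrame({'movieId': [61, 608], 'title': ['Eye for an Eye', 'Fargo']})
wanted = pd.DataFrame({'title': ['Fargo', 'Seven ', 'Eye for an Eye'],
                       'rating': [2.5, 3.5, 3.5]})

# how='left' keeps every input title; the _merge column records where each row was found
checked = wanted.merge(catalog, on='title', how='left', indicator=True)

# Rows marked 'left_only' had no exact title match in the catalog
unmatched = checked.loc[checked['_merge'] == 'left_only', 'title'].tolist()
print(unmatched)
```

An unmatched title can then be corrected by hand (for example a trailing space or an alternative title in the data) before the profile is built.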


#### In this step we filter movie_with_genres down to the user's movies by movieId, in order to calculate the user's weighted genre values

In [11]:
#Filtering out the movies from the input
user_Movies = movie_with_genres[movie_with_genres['movieId'].isin(input_movie['movieId'].tolist())]
user_Movies

Unnamed: 0,movieId,title,genres,year,Adventure,Animation,Children,Comedy,Fantasy,Romance,...,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
54,61,Eye for an Eye,"[Drama, Thriller]",1996,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
520,608,Fargo,"[Comedy, Crime, Drama, Thriller]",1996,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
878,1172,Cinema Paradiso (Nuovo cinema Paradiso),[Drama],1989,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
898,1196,Star Wars: Episode V - The Empire Strikes Back,"[Action, Adventure, Sci-Fi]",1980,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1436,1959,Out of Africa,"[Drama, Romance]",1985,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2226,2959,Fight Club,"[Action, Crime, Drama, Thriller]",1999,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6754,59615,Indiana Jones and the Kingdom of the Crystal S...,"[Action, Adventure, Comedy, Sci-Fi]",2008,1.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Now that we have the genres of the movies the user has seen, let's drop the columns we do not need

In [12]:
#Resetting the index to avoid future issues
user_Movies=user_Movies.reset_index(drop=True)
#Dropping the non-genre columns to save memory and avoid issues
user_GenreTable = user_Movies.drop('movieId',axis=1).drop('title',axis=1).drop('genres',axis=1).drop('year',axis=1)
user_GenreTable

Unnamed: 0,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Now that we have the genres of the user's movies, we multiply the ratings the user gave by the genre matrix (a dot product) to determine the weighted value of each genre for the user

In [13]:
input_movie['rating']

0    3.5
1    2.5
2    5.0
3    4.0
4    3.0
5    5.0
6    1.0
Name: rating, dtype: float64

In [14]:
# DataFrame.dot() computes the matrix product; here it multiplies the genre matrix by the ratings to get the weights
# transpose() swaps the rows and columns of a dataframe
user_Profile = user_GenreTable.transpose().dot(input_movie['rating'])
#The user profile
user_Profile

Adventure              5.0
Animation              0.0
Children               0.0
Comedy                 3.5
Fantasy                0.0
Romance                3.0
Drama                 19.0
Action                10.0
Crime                  7.5
Thriller              11.0
Horror                 0.0
Mystery                0.0
Sci-Fi                 5.0
War                    0.0
Musical                0.0
Documentary            0.0
IMAX                   0.0
Western                0.0
Film-Noir              0.0
(no genres listed)     0.0
dtype: float64
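To make the dot product concrete, here is a small worked sketch with three hypothetical movies and three genres (the numbers are illustrative, not taken from the dataset). Each genre weight is the sum of the ratings of the movies that carry that genre.

```python
import pandas as pd

# Rows are movies, columns are genre indicators (illustrative data)
genres = pd.DataFrame(
    {'Drama': [1.0, 1.0, 0.0], 'Action': [0.0, 0.0, 1.0], 'Thriller': [1.0, 0.0, 0.0]}
)
# One rating per movie, in the same row order as the genre table
ratings = pd.Series([3.5, 5.0, 4.0])

# Transposing gives a genres x movies matrix; dot() then sums rating * indicator per genre
profile = genres.transpose().dot(ratings)
print(profile)
```

Drama appears in the first two movies, so its weight is 3.5 + 5.0 = 8.5. In the same way, the Drama weight of 19.0 in the profile above is the sum of the ratings of the five Drama movies (3.5 + 2.5 + 5.0 + 3.0 + 5.0).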

### Now we have a weighted value for each genre of the movies the user has watched. Next we score every movie by multiplying these genre weights by the movie's genre indicators.
### The highest-scoring movies (based on the genres the user liked) are the ones to recommend to the user
### For this, we first build a new data frame of all the genre columns, indexed by movieId

In [15]:
genreTable=movie_with_genres.set_index(movie_with_genres['movieId'])
# Remove the non-genre columns to avoid issues
genreTable = genreTable.drop('movieId',axis=1).drop('title',axis=1).drop('genres',axis=1).drop('year',axis=1)
genreTable

movieId,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
1,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
193581,0.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
193583,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
193585,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
193587,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


##### So far we have the user's weighted genre values and the genre indicators of every available movie. The next step is to multiply the two as a matrix (dot) product, which gives each movie a score for recommending it to the user based on the genres the user likes.

In [16]:
# Multiply the genres by the weights and then take the weighted average
# Each genre column is multiplied by the user's weight for that genre, the products are summed per movie, and the sum is divided by the total of the user's genre weights.
recommendationTable_df_Content_Based = ((genreTable*user_Profile).sum(axis=1))/(user_Profile.sum())
recommendationTable_df_Content_Based.head(15)

movieId
1     0.132812
2     0.078125
3     0.101562
4     0.398438
5     0.054688
6     0.445312
7     0.101562
8     0.078125
9     0.156250
10    0.406250
11    0.398438
12    0.054688
13    0.078125
14    0.296875
15    0.281250
dtype: float64
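The scores above can be checked by hand. The sketch below rebuilds the non-zero part of the user profile shown earlier and scores two movies from the tables above: movieId 5 (Comedy only) and movieId 6 (Heat: Action, Crime, Thriller). The small `genre_table` is an illustrative stand-in for `genreTable`.

```python
import pandas as pd

# Non-zero genre weights copied from the user profile output above
profile = pd.Series({'Adventure': 5.0, 'Comedy': 3.5, 'Romance': 3.0, 'Drama': 19.0,
                     'Action': 10.0, 'Crime': 7.5, 'Thriller': 11.0, 'Sci-Fi': 5.0})

# Two-row stand-in for genreTable, indexed by movieId
genre_table = pd.DataFrame(0.0, index=pd.Index([5, 6], name='movieId'),
                           columns=profile.index)
genre_table.loc[5, ['Comedy']] = 1.0
genre_table.loc[6, ['Action', 'Crime', 'Thriller']] = 1.0

# Weighted average: sum of matching genre weights divided by the total weight (64.0)
scores = genre_table.dot(profile) / profile.sum()
print(scores.round(6))
```

For Heat this is (10.0 + 7.5 + 11.0) / 64.0 = 0.445312, and for movieId 5 it is 3.5 / 64.0 = 0.054688, matching the values in the output above.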

### Every movie now has a score. All that remains is to sort the movies by score and offer the user the 20 highest-scoring movies.

In [17]:
#Sort our recommendations in descending order
recommendationTable_df_Content_Based = recommendationTable_df_Content_Based.sort_values(ascending=False)
# ascending=False sorts from the highest value to the lowest
#Just a peek at the values
recommendationTable_df_Content_Based.head(20)

movieId
81132     0.875000
459       0.867188
4719      0.843750
71999     0.828125
519       0.820312
49530     0.820312
2985      0.820312
79132     0.820312
198       0.820312
26701     0.820312
6016      0.820312
4956      0.804688
7007      0.796875
5628      0.796875
145       0.796875
7235      0.796875
1432      0.796875
5027      0.796875
20        0.796875
117529    0.781250
dtype: float64

#### We have the IDs of the recommended movies; now we just need to look up their titles

In [18]:
movies_df.loc[movies_df['movieId'].isin(recommendationTable_df_Content_Based.head(20).keys())]

Unnamed: 0,movieId,title,genres,year
19,20,Money Train,"[Action, Comedy, Crime, Drama, Thriller]",1995
118,145,Bad Boys,"[Action, Comedy, Crime, Drama, Thriller]",1995
167,198,Strange Days,"[Action, Crime, Drama, Mystery, Sci-Fi, Thriller]",1995
400,459,"Getaway, The","[Action, Adventure, Crime, Drama, Romance, Thr...",1994
454,519,RoboCop 3,"[Action, Crime, Drama, Sci-Fi, Thriller]",1993
1103,1432,Metro,"[Action, Comedy, Crime, Drama, Thriller]",1997
2248,2985,RoboCop,"[Action, Crime, Drama, Sci-Fi, Thriller]",1987
3460,4719,Osmosis Jones,"[Action, Animation, Comedy, Crime, Drama, Roma...",2001
3608,4956,"Stunt Man, The","[Action, Adventure, Comedy, Drama, Romance, Th...",1980
3657,5027,Another 48 Hrs.,"[Action, Comedy, Crime, Drama, Thriller]",1990


### Result
Content-based recommenders in the context of movies analyze the attributes of items (movies, in this case) and recommend items based on their similarity to the user's preferences. Here are some advantages and disadvantages of content-based recommenders for movies:

### Advantages:

1- Personalization:

* Example: If a user likes action movies with a specific actor, a content-based recommender can identify other movies with the same actor and recommend them.
* Advantage: Provides personalized recommendations based on the user's preferences.

2- Reduced Cold Start Problem:

* Example: If a new movie is added to the catalog, a content-based recommender can recommend it right away based on its features, even before any user has rated it.
* Advantage: Handles new items by relying on item features. A new user still has to rate a few movies before a profile can be built.

3- Transparency:

* Example: The system can explain recommendations by highlighting specific features of the movies that match the user's preferences.
* Advantage: Users can understand why certain recommendations are made, leading to increased trust in the system.

4- Less Dependent on Data:

* Example: Content-based recommenders can work well with sparse user-item interaction data, as they primarily rely on the features of items.
* Advantage: Robust in scenarios where user-item interactions are limited.

### Disadvantages:

1- Limited Serendipity:

* Example: If a user has a preference for action movies, a content-based recommender might not recommend a great romantic movie that the user might enjoy.
* Disadvantage: Tends to recommend items similar to what the user has already liked, limiting exposure to diverse content.

2- Dependency on Feature Quality:

* Example: If the features used for recommendation (e.g., genre, actors) are not accurately defined or incomplete, the recommendations may not be effective.
* Disadvantage: Quality of recommendations heavily relies on the quality and relevance of the features used.

3- Over-Specialization:

* Example: If a user watches a few movies from a specific genre, the system might overly focus on that genre, potentially missing out on other preferences.
* Disadvantage: Tends to recommend items very similar to the user's past preferences, limiting diversity.

4- Scalability Issues:

* Example: As the number of items increases, calculating similarity between items becomes computationally expensive.
* Disadvantage: Scalability can be a challenge, especially with large datasets.


In [19]:
movies_df.head()

Unnamed: 0,movieId,title,genres,year
0,1,Toy Story,"[Adventure, Animation, Children, Comedy, Fantasy]",1995
1,2,Jumanji,"[Adventure, Children, Fantasy]",1995
2,3,Grumpier Old Men,"[Comedy, Romance]",1995
3,4,Waiting to Exhale,"[Comedy, Drama, Romance]",1995
4,5,Father of the Bride Part II,[Comedy],1995


# Collaborative Filtering method

Collaborative Filtering (CF) is a technique used in recommender systems to make automatic predictions about a user's interests or preferences by collecting preferences from many users (collaborating). The underlying assumption is that if a user A has similar preferences to a user B on certain items, A is likely to have similar preferences to B on other items as well.

There are two main types of collaborative filtering:

### 1_ User-Based Collaborative Filtering:

* In user-based collaborative filtering, the system recommends items based on the preferences and behaviors of users who are similar to the target user.
* The similarity between users is calculated based on their historical interactions or ratings for items.
* For example, if User A and User B have both liked or rated similar movies, and User A has rated a movie that User B has not seen, the system might recommend that movie to User B based on the similarity of their preferences.

### 2_Item-Based Collaborative Filtering:

* In item-based collaborative filtering, the system recommends items that are similar to those the user has liked or interacted with in the past.
* The similarity between items is calculated based on user interactions or ratings, and items that are similar to those the user has shown interest in are recommended.
* For example, if User A has liked Movie X, and Movie Y is similar to Movie X based on the preferences of other users, the system may recommend Movie Y to User A.

Collaborative filtering is often combined with other recommendation techniques, such as content-based filtering, to overcome some of its limitations and create more accurate and robust recommendation systems. (I will look into ways to combine them and, if I succeed, implement one in this project.)
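One simple way to combine the two approaches is a weighted blend of their scores. The sketch below is only an illustration under stated assumptions, not the method this project settles on: `blend_scores`, `alpha`, and the toy score Series are hypothetical names and values, and both score sets are min-max scaled so they are comparable before mixing.

```python
import pandas as pd

def blend_scores(content, collaborative, alpha=0.5):
    """Hypothetical weighted hybrid: alpha * content + (1 - alpha) * collaborative."""
    def scale(s):
        # Min-max scale to [0, 1]; a constant Series maps to all zeros
        span = s.max() - s.min()
        return (s - s.min()) / span if span else s * 0.0

    # Align both score sets on movieId; a movie missing from one side scores 0 there
    combined = pd.concat([scale(content), scale(collaborative)],
                         axis=1, keys=['content', 'cf']).fillna(0.0)
    blended = alpha * combined['content'] + (1 - alpha) * combined['cf']
    return blended.sort_values(ascending=False)

# Toy scores indexed by movieId (illustrative values)
content = pd.Series({1: 0.2, 2: 0.8, 3: 0.5})
collab = pd.Series({2: 3.0, 3: 5.0, 4: 4.0})
hybrid = blend_scores(content, collab, alpha=0.5)
print(hybrid)
```

Here `alpha` controls how much weight the content-based score gets; treating a missing score as 0 is a design choice that penalizes movies only one method can score.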

### Advantages of Collaborative Filtering:

#### 1_Serendipity:
Collaborative filtering can provide recommendations for items that a user might not have discovered on their own but are liked by users with similar tastes.

#### 2_ Feature Independence: 
It doesn't rely on explicit item features, making it applicable to a wide range of domains and items.

#### 3_Adaptability:
Collaborative filtering can adapt to changing user preferences over time as it continuously updates recommendations based on user interactions.

### Challenges and Considerations (disadvantages):

#### 1_ Cold Start Problem: 
Collaborative filtering can struggle with new users or items that have limited interaction history.

#### 2_ Sparsity: 
In systems with a large number of users and items, the user-item matrix can be sparse, meaning that many users have not interacted with most items. This can affect the quality of recommendations.

#### 3_Scalability: 
As the number of users and items increases, the computational complexity of calculating similarities can become challenging.

#### 4_Privacy Concerns:
Collaborative filtering relies on collecting and analyzing user preferences, which can raise privacy concerns. Techniques such as anonymization and aggregation are often used to address these issues.

### The preprocessing was already done for the previous method. Here we only need to drop the genres column, because this method relies on the ratings of other users rather than on movie content.

In [20]:
movies_df=movies_df.drop(['genres'],axis=1)
movies_df.head()

Unnamed: 0,movieId,title,year
0,1,Toy Story,1995
1,2,Jumanji,1995
2,3,Grumpier Old Men,1995
3,4,Waiting to Exhale,1995
4,5,Father of the Bride Part II,1995


### To compare the results of the two methods, we reuse the previous user input

In [21]:
input_movie

Unnamed: 0,movieId,title,rating
0,61,Eye for an Eye,3.5
1,608,Fargo,2.5
2,1172,Cinema Paradiso (Nuovo cinema Paradiso),5.0
3,1196,Star Wars: Episode V - The Empire Strikes Back,4.0
4,1959,Out of Africa,3.0
5,2959,Fight Club,5.0
6,59615,Indiana Jones and the Kingdom of the Crystal S...,1.0


#### Here we select, by movieId, the ratings that other users gave to the movies our user has rated. The resulting data frame tells us how satisfied those users were with the same movies.

In [22]:
#Keep only the ratings of movies that the input user has also rated, and store them
usersubset=ratings_df[ratings_df['movieId'].isin(input_movie['movieId'].tolist())]
usersubset.head(10)

Unnamed: 0,userId,movieId,rating
36,1,608,5.0
68,1,1196,5.0
192,1,2959,5.0
342,4,608,5.0
376,4,1196,5.0
458,4,2959,2.0
559,5,608,3.0
592,6,61,4.0
815,6,608,3.0
897,7,1196,4.0


#### We now group up the rows by user ID.

In [23]:
# groupby is a powerful method in the pandas library that allows you to group data based on one or more columns and apply a function to each group.
usersubset_group=usersubset.groupby(usersubset['userId'])
usersubset_group.head()

Unnamed: 0,userId,movieId,rating
36,1,608,5.0
68,1,1196,5.0
192,1,2959,5.0
342,4,608,5.0
376,4,1196,5.0
...,...,...,...
99107,608,2959,5.0
99576,610,608,4.5
99607,610,1196,5.0
99699,610,2959,5.0


#### To better understand the grouping, we can look at the group of one user (here user 6) who has rated some of our user's movies.

In [24]:
usersubset_group.get_group(6)

Unnamed: 0,userId,movieId,rating
592,6,61,4.0
815,6,608,3.0


#### Since we need to find the users most similar to our input user, we sort (prioritize) the users by the number of movies they share with the input user.

In [25]:
# Sort the users so that those sharing the most movies with the input user come first
usersubset_group = sorted(usersubset_group,  key=lambda x: len(x[1]), reverse=True)
# reverse=True sorts in descending order
usersubset_group[0:3]

[(599,
         userId  movieId  rating
  92656     599       61     2.5
  92879     599      608     3.5
  93020     599     1196     5.0
  93264     599     1959     3.0
  93545     599     2959     5.0
  94667     599    59615     2.0),
 (202,
         userId  movieId  rating
  29444     202      608     4.0
  29487     202     1172     4.0
  29495     202     1196     5.0
  29594     202     1959     4.0
  29691     202     2959     5.0),
 (387,
         userId  movieId  rating
  59313     387      608     4.0
  59393     387     1172     3.5
  59397     387     1196     4.5
  59682     387     2959     4.5
  60259     387    59615     3.0)]
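The sorting step can be tried on a few rows. This sketch uses a toy ratings frame (illustrative values); each element produced by iterating over a `groupby` object is a `(userId, group)` pair, so `len(pair[1])` is the number of shared movies for that user.

```python
import pandas as pd

# Toy subset of ratings for movies the input user has also rated (illustrative data)
ratings = pd.DataFrame({
    'userId':  [1, 1, 2, 3, 3, 3],
    'movieId': [608, 1196, 608, 61, 608, 1196],
    'rating':  [5.0, 5.0, 3.0, 4.0, 3.0, 4.5],
})

# Each item is a (userId, rows of that user) pair; sort by the number of rows, largest first
groups = sorted(ratings.groupby('userId'), key=lambda pair: len(pair[1]), reverse=True)

# User ids in priority order
top_ids = [user_id for user_id, _ in groups]
print(top_ids)
```

After sorting, `groups` is a plain Python list, so slicing it (as done in the next cell) keeps the users with the most shared movies.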

### The further down the list we go, the fewer movies a user shares with the input user, so we keep only the first 100 users to make processing faster and save memory.

In [26]:
usersubset_group = usersubset_group[0:100]

### The Pearson correlation coefficient
The Pearson correlation coefficient, often denoted as "r," is a statistical measure that quantifies the strength and direction of a linear relationship between two continuous variables. In other words, it assesses how well the relationship between two variables can be described by a straight line.

The formula for calculating the Pearson correlation coefficient between two variables, X and Y, with n data points, is as follows:

\[ r = \frac{\sum{(X_i - \bar{X})(Y_i - \bar{Y})}}{\sqrt{\sum{(X_i - \bar{X})^2}\sum{(Y_i - \bar{Y})^2}}} \]

Where:
* \( X_i \) and \( Y_i \) are individual data points.
* \( \bar{X} \) and \( \bar{Y} \) are the means of the X and Y variables, respectively.

The Pearson correlation coefficient can take values between -1 and 1, where:

* \( r = 1 \): Perfect positive correlation. As one variable increases, the other variable increases proportionally.
  
* \( r = -1 \): Perfect negative correlation. As one variable increases, the other variable decreases proportionally.

* \( r = 0 \): No linear correlation. There is no systematic linear relationship between the two variables.

Key points about the Pearson correlation coefficient:

1. **Strength of Correlation:**
   - The closer the absolute value of \( r \) is to 1, the stronger the linear correlation. 
   - Values close to 0 indicate a weak or no linear correlation.

2. **Direction of Correlation:**
   - If \( r \) is positive, it indicates a positive linear correlation (both variables increase together).
   - If \( r \) is negative, it indicates a negative linear correlation (one variable increases as the other decreases).

3. **Assumption of Linearity:**
   - Pearson correlation assesses linear relationships. It may not capture non-linear associations between variables.

4. **Sensitive to Outliers:**
   - Outliers can strongly influence the Pearson correlation coefficient, especially in small datasets.

5. **Does Not Imply Causation:**
   - A high correlation does not imply causation; it only indicates a statistical association between the variables.

The Pearson correlation coefficient is widely used in various fields, including statistics, economics, psychology, and biology, to examine relationships between quantitative variables. However, it's important to consider other factors and conduct further analysis to draw meaningful conclusions about causation or the nature of the relationship.
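
As a minimal sketch of the formula above, the coefficient can be computed directly from its definition and checked against NumPy. The helper name `pearson_r` and the two rating lists are made up for illustration; returning 0.0 for a constant variable is a convention chosen here, not part of the definition.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation computed directly from the definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    denominator = np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    # If one variable is constant the coefficient is undefined; use 0.0 here
    return float((dx * dy).sum() / denominator) if denominator != 0 else 0.0

# Made-up ratings of two users on the same five movies
user_a = [5.0, 3.0, 4.0, 4.0, 2.0]
user_b = [4.0, 2.5, 3.5, 4.5, 1.0]

r_manual = pearson_r(user_a, user_b)
r_numpy = np.corrcoef(user_a, user_b)[0, 1]
print(r_manual, r_numpy)
```

Both values should agree up to floating-point rounding, which is a quick way to sanity-check a hand-written implementation.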

In [27]:
# Store the Pearson Correlation in a dictionary, where the key is the user Id and the value is the coefficient
pearsonCorrelationDict = {}

# For every user group in our subset
for name, group in usersubset_group:
    # Let's start by sorting the input and current user group by movieId to ensure consistent order
    group = group.sort_values(by='movieId')
    input_movie = input_movie.sort_values(by='movieId')

    # Get the number of ratings (N) for the formula
    nRatings = len(group)

    # Get the review scores for the movies that both users have in common
    common_movies = input_movie[input_movie['movieId'].isin(group['movieId'].tolist())]

    # Extract the rating lists for the common movies
    tempRatingList = common_movies['rating'].tolist()
    tempGroupList = group['rating'].tolist()

    # Calculate the sum of squares for each user's ratings and their cross-product
    Sxx = sum(i**2 for i in tempRatingList) - pow(sum(tempRatingList), 2) / float(nRatings)
    Syy = sum(i**2 for i in tempGroupList) - pow(sum(tempGroupList), 2) / float(nRatings)
    Sxy = sum(i * j for i, j in zip(tempRatingList, tempGroupList)) - sum(tempRatingList) * sum(tempGroupList) / float(nRatings)

    # If the denominator is different than zero, then calculate the Pearson correlation coefficient, else, set correlation to 0
    if Sxx != 0 and Syy != 0:
        pearsonCorrelationDict[name] = Sxy / sqrt(Sxx * Syy)
    else:
        pearsonCorrelationDict[name] = 0
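
The sums `Sxx`, `Syy` and `Sxy` in the cell above are the centered sums from the Pearson formula, rewritten so that only running totals are needed. Writing \( n \) for the number of co-rated movies (`nRatings`) and using \( \bar{X} = \frac{1}{n}\sum X_i \):

\[ S_{xx} = \sum (X_i - \bar{X})^2 = \sum X_i^2 - 2\bar{X}\sum X_i + n\bar{X}^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{n} \]

\[ S_{xy} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n} \]

\( S_{yy} \) follows in the same way as \( S_{xx} \), so \( r = S_{xy} / \sqrt{S_{xx} S_{yy}} \) matches the definition. The identity holds only when both rating lists have the same length \( n \) and list the movies in the same order, which the code relies on.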


In [28]:
pearsonCorrelationDict

{599: 0.8100925873009824,
 202: 0.48038446141526187,
 387: 0.6088602338088149,
 414: 0.625867453574992,
 474: -0.3100868364730211,
 483: 0.9494431566409385,
 603: -0.5765295860893552,
 18: 0.7672500755123294,
 63: 0.9698612260388879,
 68: 0.6173803722306589,
 103: 0.9699067711902921,
 182: 0.07053456158585983,
 198: 0.9169493006161777,
 215: 0.9249105602074973,
 219: 0.7777777777777778,
 220: -0.51163946190968,
 221: 0.06337242505244779,
 226: 0.6613000712661082,
 249: 0.918184516596722,
 274: 0.8346248495316817,
 298: 0.3206528811116609,
 307: 0.32328707534629597,
 380: 0.3333333333333333,
 391: -0.49374193110101877,
 400: 0.8095238095238095,
 448: 0.7757975234238114,
 489: 0.8306573520200808,
 514: 0.032213401200121235,
 555: -0.03925343359894298,
 561: 0.8053350300030309,
 590: 0.46006717724716567,
 606: 0.674199862463242,
 610: 0.918184516596722,
 1: 0,
 4: -0.8029550685469664,
 16: -0.7370434740955017,
 21: -0.5447047794019223,
 28: 0.59603956067927,
 50: 0.3711537444790451,
 64: 

# Euclidean distance-based similarity (I show this second method only to compare the two approaches and how they are applied; I do not use it for the final recommendations.)
Let's discuss the Euclidean distance-based approach along with its advantages and disadvantages, and then compare it to the previous Pearson correlation coefficient method.

### Euclidean Distance-based Approach:

#### Advantages:
1. **Simplicity:** The Euclidean distance is straightforward to understand and implement. It measures the straight-line distance between two points in the rating space.
  
2. **Sensitivity to Magnitude:** Euclidean distance considers the magnitude of differences between ratings. Larger differences contribute more to the distance.

#### Disadvantages:
1. **Scale Sensitivity:** Euclidean distance is sensitive to the scale of the data. If users rate on different scales or have different rating patterns, the distance measure may not accurately reflect their similarity.

2. **Sparse Data:** It doesn't handle sparse data well. In collaborative filtering scenarios with many missing values, the Euclidean distance might not provide accurate results.

### Comparison with Pearson Correlation Coefficient:

#### Pearson Correlation Coefficient:

#### Advantages:
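1. **Scale Invariance:** Pearson correlation does not change when a user's ratings are shifted by a constant or multiplied by a positive factor, so it is largely unaffected by users who rate on different scales. It measures whether two users' ratings move together rather than how far apart they are.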

2. **Robustness to Sparse Data:** Pearson correlation can handle sparse data better than Euclidean distance because it considers the centered nature of data.

#### Disadvantages:
1. **Sensitive to Outliers:** Pearson correlation can be sensitive to outliers, especially when users have a small number of common items.

2. **Complexity:** The formula involves multiple steps, including normalization, making it slightly more complex than Euclidean distance.

### Conclusion:

- Use Euclidean distance if simplicity and sensitivity to magnitude are critical in your scenario, and the data is not too sparse or prone to scale variations.

- Use Pearson correlation coefficient if you want a more scale-invariant measure that is robust to sparse data, even though it involves a slightly more complex calculation.

In practice, the choice between these methods often depends on the characteristics of your data and the specific requirements of your recommendation system. It's common to experiment with different similarity metrics to determine which one performs best for a particular use case.
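
A small sketch of the scale-sensitivity point, using made-up ratings: one user rates every movie exactly 2 points lower than the other. The distance is converted to a similarity with `1 / (1 + d)`, the same conversion this notebook uses in its Euclidean cell.

```python
import numpy as np

# Made-up ratings: user_b rates every movie exactly 2 points lower than user_a
user_a = np.array([5.0, 4.0, 3.0, 4.5])
user_b = user_a - 2.0

# Pearson correlation ignores the constant offset between the two users
pearson = np.corrcoef(user_a, user_b)[0, 1]

# Euclidean distance, converted to a similarity with 1 / (1 + d)
distance = np.linalg.norm(user_a - user_b)
euclidean_similarity = 1.0 / (1.0 + distance)

print(round(pearson, 3), round(euclidean_similarity, 3))
```

Pearson treats the two users as having identical taste, while the Euclidean similarity is low because every rating differs by 2 points.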

In [29]:
from scipy.spatial.distance import euclidean

# Store the similarity scores in a dictionary, where the key is the user Id and the value is the similarity score
euclideanSimilarityDict = {}

# For every user group in our subset
for name, group in usersubset_group:
    # Let's start by sorting the input and current user group by movieId to ensure consistent order
    group = group.sort_values(by='movieId')
    input_movie = input_movie.sort_values(by='movieId')

    # Get the number of ratings (N) for the formula
    nRatings = len(group)

    # Get the review scores for the movies that both users have in common
    common_movies = input_movie[input_movie['movieId'].isin(group['movieId'].tolist())]

    # Extract the rating lists for the common movies
    tempRatingList = common_movies['rating'].tolist()
    tempGroupList = group['rating'].tolist()

    # Calculate the Euclidean distance between the rating vectors of two users
    euclideanDistance = euclidean(tempRatingList, tempGroupList)

    # Calculate similarity score by taking the inverse of Euclidean distance (closer values are more similar)
    similarityScore = 1 / (1 + euclideanDistance)

    # Store the similarity score in the dictionary
    euclideanSimilarityDict[name] = similarityScore


In [30]:
euclideanSimilarityDict

{599: 0.3333333333333333,
 202: 0.30383243470068705,
 387: 0.25,
 414: 0.23582845781094,
 474: 0.2474401533514073,
 483: 0.25,
 603: 0.16879264089884097,
 18: 0.28172904669025317,
 63: 0.21089672205953397,
 68: 0.2708131845707603,
 103: 0.31451985913875646,
 182: 0.2708131845707603,
 198: 0.4721359549995794,
 215: 0.4494897427831781,
 219: 0.27429188517743175,
 220: 0.15776505912784203,
 221: 0.2674788903885893,
 226: 0.25824569976124334,
 249: 0.25,
 274: 0.27792629762666365,
 298: 0.2204812092115424,
 307: 0.2553967929896867,
 380: 0.21551468935838852,
 391: 0.25824569976124334,
 400: 0.2449655295864104,
 448: 0.30383243470068705,
 489: 0.32037724101704074,
 514: 0.21393876913398135,
 555: 0.25824569976124334,
 561: 0.32037724101704074,
 590: 0.2449655295864104,
 606: 0.4494897427831781,
 610: 0.25,
 1: 0.2708131845707603,
 4: 0.1987625491245426,
 16: 0.2708131845707603,
 21: 0.1896812679802183,
 28: 0.38742588672279304,
 50: 0.2708131845707603,
 64: 0.3761785115301142,
 66: 0.4,
 76

#### To use the users' similarity scores, we convert the pearsonCorrelationDict dictionary into a data frame and add a userId column to it.

In [31]:
pearsonDF=pd.DataFrame.from_dict(pearsonCorrelationDict,orient='index')
pearsonDF.columns=['similarityIndex']
pearsonDF['userId']=pearsonDF.index
pearsonDF.index=range(len(pearsonDF['userId']))
pearsonDF

Unnamed: 0,similarityIndex,userId
0,0.810093,599
1,0.480384,202
2,0.608860,387
3,0.625867,414
4,-0.310087,474
...,...,...
95,-1.000000,17
96,1.000000,19
97,1.000000,24
98,-1.000000,32
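
For comparison, here is a sketch of a more compact way to build the same kind of frame from a dictionary. The `toy_correlations` values are made up; the column names follow the ones used in this notebook.

```python
import pandas as pd

# Made-up user-to-similarity mapping
toy_correlations = {599: 0.81, 202: 0.48, 474: -0.31}

# Name the values, name the index, then turn the index into a regular column
toy_df = (
    pd.Series(toy_correlations, name="similarityIndex")
    .rename_axis("userId")
    .reset_index()[["similarityIndex", "userId"]]
)
print(toy_df)
```

The result has a fresh 0-based index and the two columns `similarityIndex` and `userId`, in the dictionary's insertion order.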


#### Because we need the users with the highest similarity, we sort pearsonDF in descending order of similarityIndex. Similarity decreases toward the bottom of the list, so we keep only the top 50 rows to find the most similar user IDs.

In [32]:
topUsers=pearsonDF.sort_values(by='similarityIndex',ascending=False)[0:50]
topUsers.head()

Unnamed: 0,similarityIndex,userId
93,1.0,6
97,1.0,24
96,1.0,19
81,1.0,477
46,1.0,132


#### As you can see, we now have the IDs of the most similar users together with their Pearson similarity. To give the input user the best suggestions, we next need the rating that each of these users gave to the movies he/she has seen.

In [33]:
# We merge top Users with ratings df through user Id
topUsersRating=topUsers.merge(ratings_df, left_on='userId', right_on='userId', how='inner')
topUsersRating.head()

Unnamed: 0,similarityIndex,userId,movieId,rating
0,1.0,6,2,4.0
1,1.0,6,3,5.0
2,1.0,6,4,3.0
3,1.0,6,5,5.0
4,1.0,6,6,4.0


##### (The explanation in the pamphlet was already complete, so I have kept it as is.) Now all we need to do is multiply each movie rating by its weight (the similarity index), sum the weighted ratings, and divide by the sum of the weights. We can do this by multiplying two columns, grouping the dataframe by movieId, and then dividing two columns:

##### The result reflects the opinion of all similar users about the candidate movies for the input user:
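
Written as a formula (the symbols are notation introduced here, not taken from the notebook): let \( U_m \) be the set of top users who rated movie \( m \), \( s_u \) the similarity index of user \( u \), and \( r_{u,m} \) that user's rating of \( m \). Then

\[ \text{score}(m) = \frac{\sum_{u \in U_m} s_u \, r_{u,m}}{\sum_{u \in U_m} s_u} \]

For example, if two users with similarities 1.0 and 0.5 rate a movie 4.0 and 2.0, the score is \( (1.0 \cdot 4.0 + 0.5 \cdot 2.0) / (1.0 + 0.5) = 5.0 / 1.5 \approx 3.33 \), which is closer to the rating of the more similar user.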

In [34]:
#Multiplies the similarity by the user's ratings
topUsersRating['weightedRating'] = topUsersRating['similarityIndex']*topUsersRating['rating']
topUsersRating.head()

Unnamed: 0,similarityIndex,userId,movieId,rating,weightedRating
0,1.0,6,2,4.0,4.0
1,1.0,6,3,5.0,5.0
2,1.0,6,4,3.0,3.0
3,1.0,6,5,5.0,5.0
4,1.0,6,6,4.0,4.0


#### We create a new data frame, this time grouped by movie, and sum the similarity index and the weighted rating for each movie.

In [35]:
#Sums similarityIndex and weightedRating for each movie after grouping by movieId
tempTopUsersRating = topUsersRating.groupby('movieId')[['similarityIndex','weightedRating']].sum()
tempTopUsersRating.columns = ['sum_similarityIndex','sum_weightedRating']
tempTopUsersRating.head()

Unnamed: 0_level_0,sum_similarityIndex,sum_weightedRating
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1
1,31.046707,114.023066
2,20.836904,68.033972
3,8.99754,26.825199
4,1.802955,4.204433
5,9.151023,29.246552


#### Here it is enough to divide each movie's total weighted rating by its total similarity to get the weighted average recommendation score.

In [36]:
#Creates an empty dataframe
recommendation_df_Collaborative_Filtering = pd.DataFrame()
#Now we take the weighted average
recommendation_df_Collaborative_Filtering['weighted average recommendation score'] = tempTopUsersRating['sum_weightedRating']/tempTopUsersRating['sum_similarityIndex']
recommendation_df_Collaborative_Filtering['movieId'] = tempTopUsersRating.index
recommendation_df_Collaborative_Filtering.head()

Unnamed: 0_level_0,weighted average recommendation score,movieId
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1
1,3.67263,1
2,3.265071,2
3,2.981393,3
4,2.331967,4
5,3.195987,5
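
The multiply, group, and divide steps can be sketched end to end on toy data. The similarities and ratings below are made up; the column names follow the ones used in this notebook.

```python
import pandas as pd

# Made-up similarities and ratings for two similar users and two movies
toy = pd.DataFrame({
    "userId":          [6, 6, 24, 24],
    "similarityIndex": [1.0, 1.0, 0.5, 0.5],
    "movieId":         [1, 2, 1, 2],
    "rating":          [4.0, 5.0, 2.0, 5.0],
})

# Weight each rating by the similarity of the user who gave it
toy["weightedRating"] = toy["similarityIndex"] * toy["rating"]

# Sum the weights and the weighted ratings per movie, then divide
sums = toy.groupby("movieId").agg(
    sum_similarityIndex=("similarityIndex", "sum"),
    sum_weightedRating=("weightedRating", "sum"),
)
toy_scores = sums["sum_weightedRating"] / sums["sum_similarityIndex"]
print(toy_scores)
```

Movie 2, which both users rated 5.0, keeps a score of 5.0, while movie 1 lands between the two ratings but closer to that of the more similar user.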


##### Now we sort by the weighted average score. Each movie's ratings were weighted by how similar the rating users are to the input user, so the movies at the top are the ones that the most similar users rated highest.

In [37]:
recommendation_df_Collaborative_Filtering = recommendation_df_Collaborative_Filtering.sort_values(by='weighted average recommendation score', ascending=False)
recommendation_df_Collaborative_Filtering.head(10)

Unnamed: 0_level_0,weighted average recommendation score,movieId
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1
918,5.0,918
65642,5.0,65642
94810,5.0,94810
88448,5.0,88448
3787,5.0,3787
1046,5.0,1046
3494,5.0,3494
5416,5.0,5416
60333,5.0,60333
156371,5.0,156371


#### We look up the movie titles by their movieId

In [38]:
movies_df.loc[movies_df['movieId'].isin(recommendation_df_Collaborative_Filtering.head(10)['movieId'].tolist())]

Unnamed: 0,movieId,title,year
700,918,Meet Me in St. Louis,1944
799,1046,Beautiful Thing,1996
2610,3494,True Grit,1969
2830,3787,Shower (Xizao),1999
3852,5416,Cherish,2002
6783,60333,Encounters at the End of the World,2008
6954,65642,"Timecrimes (Cronocrímenes, Los)",2007
7656,88448,Paper Birds (Pájaros de papel),2010
7886,94810,Eva,2011
9256,156371,Everybody Wants Some,2016


#### As you can see, the results of the two methods are very different. I briefly explained the advantages and disadvantages of each method above.
#### As I said, I tried hard to combine these two methods, even with help from artificial intelligence and searching, but I was not able to find a correct solution. This is my code; if you can help me combine the two methods, please do.

In [None]:
# Content-based scores as a two-column DataFrame.
# Assumption: recommendationTable_df_Content_Based is a Series (or a one-column
# DataFrame) of scores indexed by movieId; adjust the column names if it is not.
content_based_df = pd.DataFrame(recommendationTable_df_Content_Based).reset_index()
content_based_df.columns = ['movieId', 'content_based_score']

# Collaborative-filtering scores. This frame already has a 'movieId' column and an
# index with the same name, so drop the index instead of inserting a duplicate column.
collaborative_df = recommendation_df_Collaborative_Filtering.reset_index(drop=True)

# Merge the recommendations from both methods on movieId
merged_recommendations = pd.merge(content_based_df, collaborative_df, on='movieId', how='outer')

# Fill NaN values with 0 (for movies that are not recommended by a particular method)
merged_recommendations = merged_recommendations.fillna(0)

# Define weights for content-based and collaborative filtering recommendations
content_based_weight = 0.7
collaborative_filtering_weight = 0.3

# Calculate the combined recommendation score
merged_recommendations['combined_score'] = (content_based_weight * merged_recommendations['content_based_score']) + \
                                           (collaborative_filtering_weight * merged_recommendations['weighted average recommendation score'])

# Sort the recommendations by the combined score in descending order
final_recommendations = merged_recommendations.sort_values(by='combined_score', ascending=False).head(10)

# Display the final recommendations
final_movie_details = movies_df.loc[movies_df['movieId'].isin(final_recommendations['movieId'].tolist())]
final_recommendations = pd.merge(final_recommendations, final_movie_details[['movieId', 'title']], on='movieId', how='left')
print(final_recommendations[['movieId', 'title', 'combined_score']])
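
One possible way to combine the two methods is sketched below on made-up data. It is not a definitive solution: it assumes the content-based scores lie roughly in 0 to 1 while the collaborative scores are on the rating scale, so each is min-max rescaled before weighting. The helper `min_max` and the `*_norm` columns are hypothetical names introduced here.

```python
import pandas as pd

# Made-up scores; the column names mirror the ones used in the notebook
content_scores = pd.DataFrame({
    "movieId": [1, 2, 3],
    "content_based_score": [0.9, 0.4, 0.1],
})
collab_scores = pd.DataFrame({
    "movieId": [2, 3, 4],
    "weighted average recommendation score": [3.0, 5.0, 4.0],
})

def min_max(column):
    """Rescale a column to the 0-1 range (all zeros if the column is constant)."""
    span = column.max() - column.min()
    return (column - column.min()) / span if span != 0 else column * 0.0

content_scores["content_norm"] = min_max(content_scores["content_based_score"])
collab_scores["collab_norm"] = min_max(
    collab_scores["weighted average recommendation score"]
)

# Outer merge keeps movies scored by only one method; a missing score becomes 0
merged = content_scores.merge(collab_scores, on="movieId", how="outer").fillna(0)

content_based_weight = 0.7
collaborative_filtering_weight = 0.3
merged["combined_score"] = (
    content_based_weight * merged["content_norm"]
    + collaborative_filtering_weight * merged["collab_norm"]
)

hybrid_ranking = merged.sort_values(
    "combined_score", ascending=False
).reset_index(drop=True)
print(hybrid_ranking[["movieId", "combined_score"]])
```

Without the rescaling step, the score with the larger numeric range would dominate the sum regardless of the chosen weights. Filling missing scores with 0 also penalises movies that only one method scored, which may or may not be what you want.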
