# PROJECT
# Hybrid Recommender System

We will make predictions with the User-Based and Item-Based Recommender methods for a user with a given ID, taking 5 recommendations from each model. Finally, we will combine the recommendations from both models to obtain a total of 10 recommendations.

* movie.csv
1. movieId: unique movie number (UniqueID)
1. title: movie title
1. genres: genres of the movie, separated by `|`
* rating.csv
1. userId: unique user number (UniqueID)
1. movieId: unique movie number (UniqueID)
1. rating: rating given by the user for the movie
1. timestamp: date and time at which the rating was given

## 1. Data Preparation
Step 1: We read the Movie and Rating datasets.
The Movie dataset contains movieId, movie title, and genre information.
The Rating dataset contains userId, movieId, the rating given by the user, and timestamp.

Step 2: We merge the Movie and Rating datasets.
In the Rating dataset, we currently have only the movie IDs for the movies that users rated.
We will use the Movie dataset to add the movie names and genres corresponding to the movie IDs.

Step 3: We calculate the total number of votes (ratings) received for each movie.
For each movie, we will calculate the total number of users who rated it.
We will remove movies from the dataset that have received 1000 votes or fewer.
The names of these rarely rated movies will be stored in the 'rare_movie' variable and removed from the dataset.

Step 4: We will create a pivot table for the DataFrame where the user IDs are in the index, movie names are in the columns, and the ratings are the values.

Step 5: We will encapsulate all the above operations into functions to make the data preparation process more modular and reusable.
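
The filtering and pivoting steps above can be tried on a small in-memory table before running them on the full dataset. The sketch below is an illustration only: the toy ratings are invented, and `max_rare_votes` is a hypothetical name set to 1 here (the notebook uses 1000).

```python
import pandas as pd

# Toy stand-in for the merged movie/rating table (all values are invented).
toy_df = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3, 3, 3],
    "title":  ["A", "B", "A", "B", "A", "B", "C"],
    "rating": [4.0, 3.0, 5.0, 2.0, 4.5, 3.5, 1.0],
})

max_rare_votes = 1  # hypothetical threshold for the toy data

# Count ratings per title and collect the titles at or below the threshold.
vote_counts = toy_df["title"].value_counts()
rare_titles = vote_counts[vote_counts <= max_rare_votes].index

# Drop the rare titles, then build the user x title rating matrix.
common = toy_df[~toy_df["title"].isin(rare_titles)]
toy_user_movie = common.pivot_table(index="userId", columns="title", values="rating")
print(toy_user_movie)
```

Title "C" has a single rating, so it is dropped and the resulting matrix has one row per user and one column each for "A" and "B".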

In [22]:
import pandas as pd
import numpy as np

In [23]:
#Step 1: We read the Movie and Rating datasets.
#Step 2: We merge the movie and rating datasets.
movie= pd.read_csv("/kaggle/input/movielense20m/movie.csv")
rating = pd.read_csv("/kaggle/input/movielense20m/rating.csv")
df= movie.merge(rating, how="left", on ="movieId")
df.head()

,movieId,title,genres,userId,rating,timestamp
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,3.0,4.0,1999-12-11 13:36:47
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,6.0,5.0,1997-03-13 17:50:52
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,8.0,4.0,1996-06-05 13:37:51
3,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,10.0,4.0,1999-11-25 02:44:47
4,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,11.0,4.5,2009-01-02 01:13:41


In [24]:
#Step 3: We calculate the total number of votes (ratings) received for each movie.
df["title"].value_counts()

Pulp Fiction (1994)                          67310
Forrest Gump (1994)                          66172
Shawshank Redemption, The (1994)             63366
Silence of the Lambs, The (1991)             63299
Jurassic Park (1993)                         59715
                                             ...  
Rapture (Arrebato) (1980)                        1
Education of Mohammad Hussein, The (2013)        1
Satanas (2007)                                   1
Psychosis (2010)                                 1
Innocence (2014)                                 1
Name: title, Length: 27262, dtype: int64

In [25]:
# We keep the titles of movies with 1000 or fewer votes in rare_movie and remove them from the dataset.
comments_counts= df["title"].value_counts()  # number of ratings per title (a Series, so no column name is needed)
rare_movie= comments_counts[comments_counts <= 1000].index
common_movie= df[~df["title"].isin(rare_movie)]
common_movie["title"].value_counts()

Pulp Fiction (1994)                  67310
Forrest Gump (1994)                  66172
Shawshank Redemption, The (1994)     63366
Silence of the Lambs, The (1991)     63299
Jurassic Park (1993)                 59715
                                     ...  
Scanners (1981)                       1003
Pet Sematary II (1992)                1003
Return to Paradise (1998)             1003
Lincoln Lawyer, The (2011)            1001
Wristcutters: A Love Story (2006)     1001
Name: title, Length: 3159, dtype: int64

In [26]:
#Step 4: We will create a pivot table for the DataFrame where the user IDs are in the index, movie names are in the columns, and the ratings are the values.
user_movie_df = common_movie.pivot_table(index=["userId"], columns=["title"], values="rating")
user_movie_df

userId,"'burbs, The (1989)",(500) Days of Summer (2009),*batteries not included (1987),...And Justice for All (1979),10 Things I Hate About You (1999),"10,000 BC (2008)",101 Dalmatians (1996),101 Dalmatians (One Hundred and One Dalmatians) (1961),102 Dalmatians (2000),12 Angry Men (1957),...,Zero Dark Thirty (2012),Zero Effect (1998),Zodiac (2007),Zombieland (2009),Zoolander (2001),Zulu (1964),[REC] (2007),eXistenZ (1999),xXx (2002),¡Three Amigos! (1986)
1.0,,,,,,,,,,,...,,,,,,,,,,
2.0,,,,,,,,,,,...,,,,,,,,,,
3.0,,,,,,,,,,,...,,,,,,,,,,
4.0,,,,,,,,,,,...,,,,,,,,,,
5.0,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
138489.0,,,,,,,,,,4.5,...,,,,,,,,,,
138490.0,,,,,,,,,,,...,,,,,,,,,,
138491.0,,,,,,,,2.5,,,...,,,,,,,,,,
138492.0,,,,,,,,,,,...,,,,,,,,,,


In [2]:
#Step 5: We encapsulate all the above operations into a function to make the data preparation process more modular and reusable.

def create_user_movie_df():
    import pandas as pd  # load library
    movie= pd.read_csv("/kaggle/input/movielense20m/movie.csv") # load the movie dataset
    rating = pd.read_csv("/kaggle/input/movielense20m/rating.csv") # load the rating dataset
    df= movie.merge(rating, how="left", on ="movieId") # merge datasets
    comments_counts= df["title"].value_counts() # number of ratings per title
    rare_movie= comments_counts[comments_counts <= 1000].index # titles with 1000 or fewer ratings
    common_movie= df[~df["title"].isin(rare_movie)] # drop the rarely rated titles
    user_movie_df = common_movie.pivot_table(index="userId", columns="title", values="rating") # user x title rating matrix
    return user_movie_df

In [3]:
# Test the function.
user_movie_df= create_user_movie_df()  
user_movie_df

userId,"'burbs, The (1989)",(500) Days of Summer (2009),*batteries not included (1987),...And Justice for All (1979),10 Things I Hate About You (1999),"10,000 BC (2008)",101 Dalmatians (1996),101 Dalmatians (One Hundred and One Dalmatians) (1961),102 Dalmatians (2000),12 Angry Men (1957),...,Zero Dark Thirty (2012),Zero Effect (1998),Zodiac (2007),Zombieland (2009),Zoolander (2001),Zulu (1964),[REC] (2007),eXistenZ (1999),xXx (2002),¡Three Amigos! (1986)
1.0,,,,,,,,,,,...,,,,,,,,,,
2.0,,,,,,,,,,,...,,,,,,,,,,
3.0,,,,,,,,,,,...,,,,,,,,,,
4.0,,,,,,,,,,,...,,,,,,,,,,
5.0,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
138489.0,,,,,,,,,,4.5,...,,,,,,,,,,
138490.0,,,,,,,,,,,...,,,,,,,,,,
138491.0,,,,,,,,2.5,,,...,,,,,,,,,,
138492.0,,,,,,,,,,,...,,,,,,,,,,



## 2. Determining the Movies Watched by the User for Recommendation

Step 1: Choose a random user ID.

Step 2: Create a new DataFrame named random_user_df consisting of observations belonging to the selected user.

Step 3: Assign the movies rated by the selected user to a list named movies_watched.
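
The three steps above can be sketched on a toy rating matrix. This is a minimal illustration with invented values; it passes a `random_state` to make the draw repeatable (the seed value is arbitrary, and the notebook itself does not fix one).

```python
import numpy as np
import pandas as pd

# Toy user x title rating matrix (values are invented).
toy_user_movie = pd.DataFrame(
    {"A": [4.0, np.nan, 5.0], "B": [np.nan, 3.0, 2.0], "C": [1.0, np.nan, np.nan]},
    index=pd.Index([10, 20, 30], name="userId"),
)

# Step 1: draw one user; fixing random_state makes the choice reproducible.
chosen_user = int(toy_user_movie.sample(1, random_state=0).index[0])

# Step 2: the ratings row of that user.
user_row = toy_user_movie.loc[chosen_user]

# Step 3: titles the user has rated (non-missing entries).
watched = user_row[user_row.notna()].index.tolist()
print(chosen_user, watched)
```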

In [52]:
#Step 1: Choose a random user ID.
random_user= int(pd.Series(user_movie_df.index).sample(1).values[0])  # take the single sampled element before converting to int
random_user

39694

In [53]:
#Step 2: Create a new DataFrame named random_user_df consisting of observations belonging to the selected user.
random_user_df= user_movie_df[user_movie_df.index == random_user]
print(random_user_df)

title    'burbs, The (1989)  (500) Days of Summer (2009)  \
userId                                                     
39694.0                 NaN                          NaN   

title    *batteries not included (1987)  ...And Justice for All (1979)  \
userId                                                                   
39694.0                             NaN                            NaN   

title    10 Things I Hate About You (1999)  10,000 BC (2008)  \
userId                                                         
39694.0                                NaN               NaN   

title    101 Dalmatians (1996)  \
userId                           
39694.0                    NaN   

title    101 Dalmatians (One Hundred and One Dalmatians) (1961)  \
userId                                                            
39694.0                                                NaN        

title    102 Dalmatians (2000)  12 Angry Men (1957)  ...  \
userId                                

In [54]:
# Step 3: Assign the movies rated by the selected user to a list named movies_watched.
movies_watched= random_user_df.columns[random_user_df.notna().any()].tolist()
print(movies_watched)

['Alien (1979)', 'Aliens (1986)', 'Bad Boys (1995)', 'Blade (1998)', 'Blues Brothers, The (1980)', 'Bone Collector, The (1999)', 'Braveheart (1995)', 'Corruptor, The (1999)', 'Die Hard (1988)', 'Dirty Dozen, The (1967)', 'Dr. No (1962)', 'End of Days (1999)', 'Evil Dead II (Dead by Dawn) (1987)', 'Excalibur (1981)', 'For Your Eyes Only (1981)', 'From Dusk Till Dawn (1996)', 'Fugitive, The (1993)', 'Full Metal Jacket (1987)', 'Glory (1989)', 'Godfather, The (1972)', 'Godfather: Part II, The (1974)', 'Good, the Bad and the Ugly, The (Buono, il brutto, il cattivo, Il) (1966)', 'Heat (1995)', 'House on Haunted Hill (1999)', 'Hunt for Red October, The (1990)', 'Indiana Jones and the Last Crusade (1989)', 'Jaws (1975)', 'Jurassic Park (1993)', 'Mask of Zorro, The (1998)', 'Matrix, The (1999)', 'Men in Black (a.k.a. MIB) (1997)', 'Midnight Run (1988)', 'Planet of the Apes (1968)', 'Police Academy (1984)', 'Princess Bride, The (1987)', 'Raiders of the Lost Ark (Indiana Jones and the Raiders of

## 3. Accessing Data and IDs of Other Users Who Watched the Same Movies

Step 1: Select the columns corresponding to the movies watched by the selected user from the user_movie_df DataFrame and create a new DataFrame named movies_watched_df.

Step 2: Create a new DataFrame named user_movie_count, which contains information about how many of the selected user's movies each user has watched.

Step 3: Consider users who have watched more than 60% of the movies rated by the selected user as similar users. Create a list named users_same_movies containing the IDs of these similar users.
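
As a small worked example of the overlap filter, the sketch below counts, for each user, how many of the target user's movies they have also rated, and keeps those above 60%. The data and the names `target_user`, `overlap` and `similar_users` are invented for illustration; it uses a row-wise `sum(axis=1)` instead of transposing, which gives the same counts.

```python
import numpy as np
import pandas as pd

# Toy user x title rating matrix (values are invented).
toy_user_movie = pd.DataFrame(
    {
        "A": [4.0, 5.0, np.nan, 3.0],
        "B": [3.0, np.nan, np.nan, 4.0],
        "C": [5.0, 2.0, 1.0, np.nan],
        "D": [np.nan, 4.0, 2.0, 5.0],
    },
    index=pd.Index([1, 2, 3, 4], name="userId"),
)

target_user = 1
# Titles rated by the target user: A, B and C.
watched = toy_user_movie.columns[toy_user_movie.loc[target_user].notna()].tolist()

# For every user, count how many of those titles they have rated as well.
overlap = toy_user_movie[watched].notna().sum(axis=1)

# Keep users who rated more than 60% of the target user's titles (3 * 0.6 = 1.8).
threshold = len(watched) * 0.6
similar_users = overlap[overlap > threshold].index.tolist()
print(similar_users)
```

The target user always passes their own filter (they have rated 100% of their own movies), so their ID is part of the resulting list.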

In [55]:
#Step 1: Select the columns corresponding to the movies watched by the selected user from the user_movie_df DataFrame and create a new DataFrame named movies_watched_df.
movies_watched_df= user_movie_df[movies_watched]
user_movie_df

userId,"'burbs, The (1989)",(500) Days of Summer (2009),*batteries not included (1987),...And Justice for All (1979),10 Things I Hate About You (1999),"10,000 BC (2008)",101 Dalmatians (1996),101 Dalmatians (One Hundred and One Dalmatians) (1961),102 Dalmatians (2000),12 Angry Men (1957),...,Zero Dark Thirty (2012),Zero Effect (1998),Zodiac (2007),Zombieland (2009),Zoolander (2001),Zulu (1964),[REC] (2007),eXistenZ (1999),xXx (2002),¡Three Amigos! (1986)
1.0,,,,,,,,,,,...,,,,,,,,,,
2.0,,,,,,,,,,,...,,,,,,,,,,
3.0,,,,,,,,,,,...,,,,,,,,,,
4.0,,,,,,,,,,,...,,,,,,,,,,
5.0,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
138489.0,,,,,,,,,,4.5,...,,,,,,,,,,
138490.0,,,,,,,,,,,...,,,,,,,,,,
138491.0,,,,,,,,2.5,,,...,,,,,,,,,,
138492.0,,,,,,,,,,,...,,,,,,,,,,


In [56]:
#Step 2: Create a new DataFrame named user_movie_count, which contains information about how many of the selected user's movies each user has watched.
user_movie_count= movies_watched_df.T.notnull().sum()
user_movie_count = user_movie_count.reset_index()
print(user_movie_count)

          userId   0
0            1.0  17
1            2.0   8
2            3.0  23
3            4.0   4
4            5.0  10
...          ...  ..
138488  138489.0   3
138489  138490.0   2
138490  138491.0   0
138491  138492.0   7
138492  138493.0  16

[138493 rows x 2 columns]


In [57]:
user_movie_count.columns= ["userId", "movie_count"]
user_movie_count

,userId,movie_count
0,1.0,17
1,2.0,8
2,3.0,23
3,4.0,4
4,5.0,10
...,...,...
138488,138489.0,3
138489,138490.0,2
138490,138491.0,0
138491,138492.0,7


In [58]:
#Step 3: Consider users who have watched more than 60% of the movies rated by the selected user as similar users.
#Create a list named users_same_movies containing the IDs of these similar users.
qty_wacthed_film= len(movies_watched)  # number of movies watched by random_user
print(qty_wacthed_film)

51


In [59]:
users_same_movies=user_movie_count[user_movie_count["movie_count"]>qty_wacthed_film*60/100]["userId"]
print(users_same_movies)

53            54.0
57            58.0
90            91.0
103          104.0
115          116.0
            ...   
138306    138307.0
138324    138325.0
138381    138382.0
138396    138397.0
138410    138411.0
Name: userId, Length: 4242, dtype: float64


## 4. Determining the Most Similar Users to the User for Recommendation

Step 1: Filter the movies_watched_df DataFrame to include only the IDs of users who are similar to the selected user based on the users_same_movies list.

Step 2: Create a new DataFrame named corr_df to calculate the correlations between users, and keep only the rows that belong to the selected user: corr_df[corr_df["user_id_1"] == random_user]

Step 3: Create a new DataFrame named top_users by filtering the users with high correlation (above 0.50) with the selected user.

Step 4: Merge the top_users DataFrame with the rating dataset.
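
The correlation and filtering steps above can be illustrated on a tiny matrix. This sketch is not the notebook's exact pipeline: it reads the target user's column straight from the user-by-user correlation matrix instead of unstacking all pairs, and all data and names (`ratings`, `target_user`, `toy_top_users`) are invented.

```python
import pandas as pd

# Toy ratings of three users for the four titles the target user has rated.
ratings = pd.DataFrame(
    [[5.0, 4.0, 1.0, 2.0],
     [4.0, 5.0, 2.0, 1.0],
     [1.0, 2.0, 5.0, 4.0]],
    index=pd.Index([1, 2, 3], name="userId"),
    columns=["A", "B", "C", "D"],
)
target_user = 1

# Transpose so users become columns, compute Pearson correlations between users,
# take the target user's column and drop the self-correlation.
user_corr = ratings.T.corr()[target_user].drop(target_user)

# Keep users whose correlation with the target user is above 0.5.
top = user_corr[user_corr > 0.5].sort_values(ascending=False)
toy_top_users = top.rename("corr").rename_axis("userId").reset_index()
print(toy_top_users)
```

Here user 2 has tastes close to the target user (correlation 0.8) and is kept, while user 3 rates in the opposite direction (correlation -1) and is filtered out.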


In [60]:
#Step 1: Filter the movies_watched_df DataFrame to include only the IDs of users who are similar to the selected user based on the users_same_movies list.
movies_watched_df = pd.concat([movies_watched_df[movies_watched_df.index.isin(users_same_movies)], # rows of the similar users
                      random_user_df[movies_watched]]) # row of the selected user
print(movies_watched_df)

title     Alien (1979)  Aliens (1986)  Bad Boys (1995)  Blade (1998)  \
userId                                                                 
54.0               5.0            5.0              NaN           NaN   
58.0               5.0            4.0              NaN           NaN   
91.0               4.0            4.0              2.5           3.0   
104.0              3.0            2.0              NaN           NaN   
116.0              NaN            NaN              3.0           4.0   
...                ...            ...              ...           ...   
138325.0           5.0            4.5              NaN           NaN   
138382.0           4.0            4.0              NaN           5.0   
138397.0           5.0            5.0              3.5           NaN   
138411.0           3.5            5.0              3.0           4.0   
39694.0            3.0            5.0              4.0           4.0   

title     Blues Brothers, The (1980)  Bone Collector, The (1999

In [66]:
#Step 2: Create a new DataFrame named corr_df to calculate the correlations between users.
#Only the rows where user_id_1 is the selected user are kept.

corr_df = movies_watched_df.T.corr().unstack().sort_values().drop_duplicates()
corr_df= pd.DataFrame(corr_df, columns= ["corr"])
corr_df.index.names = ["user_id_1", "user_id_2"]
corr_df = corr_df.reset_index()
corr_df= corr_df[corr_df["user_id_1"] == random_user]
print(corr_df)

         user_id_1  user_id_2      corr
3102       39694.0    65130.0 -0.531504
4841       39694.0   125795.0 -0.506942
13670      39694.0    76540.0 -0.445205
16503      39694.0    44996.0 -0.432685
16504      39694.0    44996.0 -0.432685
...            ...        ...       ...
8047820    39694.0   130826.0  0.568369
8208751    39694.0    52810.0  0.602954
8208752    39694.0    52810.0  0.602954
8281917    39694.0    75173.0  0.623201
8281918    39694.0    75173.0  0.623201

[3309 rows x 3 columns]
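
Several pairs appear twice in the output above. A likely cause is that the selected user's row is present twice in `movies_watched_df`: once because the selected user passes their own overlap filter, and once more through the `pd.concat` in Step 1. The sketch below, on invented data, shows how such a repeated index label arises and one way to drop it with `Index.duplicated` before computing correlations; whether this accounts for every repeated row in the notebook is an assumption.

```python
import pandas as pd

# Toy frame of "similar users"; user 9 plays the role of the selected user.
filtered = pd.DataFrame(
    {"A": [5.0, 3.0], "B": [4.0, 5.0]},
    index=pd.Index([7, 9], name="userId"),
)
target_row = filtered.loc[[9]]

# Appending the selected user's row again yields a repeated index label.
stacked = pd.concat([filtered, target_row])

# Keep only the first occurrence of each userId.
deduped = stacked[~stacked.index.duplicated(keep="first")]
print(stacked.index.tolist(), deduped.index.tolist())
```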


In [70]:
#Step 3: Create a new DataFrame named top_users by filtering the users with high correlation (above 0.5) with the selected user.

top_users= corr_df[corr_df["corr"]> 0.5].sort_values("corr", ascending=False)
top_users

,user_id_1,user_id_2,corr
8281918,39694.0,75173.0,0.623201
8281917,39694.0,75173.0,0.623201
8208752,39694.0,52810.0,0.602954
8208751,39694.0,52810.0,0.602954
8047820,39694.0,130826.0,0.568369
8032148,39694.0,130715.0,0.565497
8016955,39694.0,67676.0,0.562772
8005730,39694.0,106268.0,0.560777
8000907,39694.0,117033.0,0.559897
7992520,39694.0,54236.0,0.558413


In [71]:
top_users= top_users[["user_id_2", "corr"]].reset_index(drop=True)
top_users

,user_id_2,corr
0,75173.0,0.623201
1,75173.0,0.623201
2,52810.0,0.602954
3,52810.0,0.602954
4,130826.0,0.568369
5,130715.0,0.565497
6,67676.0,0.562772
7,106268.0,0.560777
8,117033.0,0.559897
9,54236.0,0.558413


In [72]:
top_users.rename(columns= {"user_id_2": "userId"}, inplace=True)
top_users

,userId,corr
0,75173.0,0.623201
1,75173.0,0.623201
2,52810.0,0.602954
3,52810.0,0.602954
4,130826.0,0.568369
5,130715.0,0.565497
6,67676.0,0.562772
7,106268.0,0.560777
8,117033.0,0.559897
9,54236.0,0.558413


In [73]:
#Step 4: Merge the top_users DataFrame with the rating dataset.
rating= pd.read_csv("/kaggle/input/movielense20m/rating.csv")
top_users= top_users.merge(rating[["userId", "movieId", "rating"]], how="inner")
top_users= top_users[top_users["userId"] != random_user]  # remove random_user from top_users
top_users

,userId,corr,movieId,rating
0,75173.0,0.623201,1,4.0
1,75173.0,0.623201,3,5.0
2,75173.0,0.623201,5,3.0
3,75173.0,0.623201,6,4.0
4,75173.0,0.623201,10,5.0
...,...,...,...,...
18453,72596.0,0.507097,6333,3.5
18454,72596.0,0.507097,6377,3.0
18455,72596.0,0.507097,6539,4.0
18456,72596.0,0.507097,6874,3.0


## 5. Calculation of Weighted Average Recommendation Score and Selecting the Top 5 Films

Step 1: Create a new variable named weighted_rating, which consists of the product of each user's corr and rating values.

Step 2: Create a new DataFrame named recommendation_df, which contains the movie IDs and the average of weighted ratings for each movie, considering all users.

Step 3: Select movies from recommendation_df where the weighted rating is greater than 1.5 and sort them based on the weighted rating in descending order. Save the top 5 observations as movies_to_be_recommend.

Step 4: Retrieve the names of the 5 recommended movies.
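
A small worked example of the scoring steps: each rating is multiplied by the rater's correlation with the selected user, and the products are averaged per movie. The data below is invented, and `min_score` is a hypothetical cut-off chosen for the toy values (the notebook uses 1.5).

```python
import pandas as pd

# Invented example: two similar users, their correlation with the selected user,
# and the ratings they gave.
toy_top_users = pd.DataFrame({
    "userId":  [2, 2, 4, 4],
    "corr":    [0.8, 0.8, 0.6, 0.6],
    "movieId": [10, 20, 10, 30],
    "rating":  [5.0, 3.0, 4.0, 5.0],
})

# Step 1: weight every rating by the rater's correlation.
toy_top_users["weighted_rating"] = toy_top_users["corr"] * toy_top_users["rating"]

# Step 2: average the weighted ratings per movie.
toy_recommendation = toy_top_users.groupby("movieId").agg({"weighted_rating": "mean"})

# Step 3: keep movies above the cut-off and take the five best.
min_score = 2.5  # hypothetical threshold for this toy data
toy_top = (
    toy_recommendation[toy_recommendation["weighted_rating"] > min_score]
    .sort_values("weighted_rating", ascending=False)
    .head(5)
)
print(toy_top)
```

Movie 10 scores (0.8 × 5 + 0.6 × 4) / 2 = 3.2, movie 30 scores 0.6 × 5 = 3.0, and movie 20 (2.4) falls below the cut-off. Because the score is a plain mean, a movie rated by a single highly correlated user can rank as high as one rated by many users.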

In [74]:
#Step 1: Create a new variable named weighted_rating, which consists of the product of each user's corr and rating values.
top_users["weighted_rating"] = top_users["corr"] * top_users["rating"]
top_users

,userId,corr,movieId,rating,weighted_rating
0,75173.0,0.623201,1,4.0,2.492805
1,75173.0,0.623201,3,5.0,3.116006
2,75173.0,0.623201,5,3.0,1.869603
3,75173.0,0.623201,6,4.0,2.492805
4,75173.0,0.623201,10,5.0,3.116006
...,...,...,...,...,...
18453,72596.0,0.507097,6333,3.5,1.774840
18454,72596.0,0.507097,6377,3.0,1.521292
18455,72596.0,0.507097,6539,4.0,2.028389
18456,72596.0,0.507097,6874,3.0,1.521292


In [75]:
#Step 2: Create a new DataFrame named recommendation_df, which contains the movie IDs and the average of weighted ratings for each movie, considering all users.
recommendation_df= top_users.groupby("movieId").agg({"weighted_rating": "mean"})
recommendation_df

movieId,weighted_rating
1,2.076778
2,1.671941
3,1.946567
4,1.679692
5,1.792431
...,...
115210,2.557661
115617,2.841845
116823,2.609931
118696,2.515203


In [78]:
#Step 3: Select movies from recommendation_df where the weighted rating is greater than 1.5 and sort them based on the weighted rating in descending order. 
#Save the top 5 observations as movies_to_be_recommend.

movies_to_be_recommend=recommendation_df[recommendation_df["weighted_rating"]> 1.5].sort_values("weighted_rating", ascending= False)
movies_to_be_recommend= movies_to_be_recommend.head()
movies_to_be_recommend

movieId,weighted_rating
2620,3.116006
6702,3.116006
63479,3.116006
40010,3.116006
3933,3.116006


In [79]:
#Step 4: Retrieve the names of the 5 recommended movies.
movie= pd.read_csv("/kaggle/input/movielense20m/movie.csv")
movies_to_be_recommend_with_name=movies_to_be_recommend.merge(movie[["movieId", "title"]], on="movieId")
movies_to_be_recommend_with_name= movies_to_be_recommend_with_name["title"]
movies_to_be_recommend_with_name

0                    This Is My Father (1998)
1    Dickie Roberts: Former Child Star (2003)
2                            Sex Drive (2008)
3     Duck Season (Temporada de patos) (2004)
4                   Killer Shrews, The (1959)
Name: title, dtype: object


## 6. Item-Based Recommendation

Make an item-based recommendation based on the movie that the user rated highest and watched most recently.
user = 108170

Step 1: Read the movie and rating datasets.

Step 2: Obtain the most recently rated movie among those to which the user gave the highest rating (5.0).

Step 3: Filter the user_movie_df DataFrame based on the title of the selected movie.

Step 4: Calculate and sort the correlation between the selected movie and other movies.

Step 5: Recommend the top 5 movies based on correlations (excluding the selected movie itself).
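
The item-based steps can be sketched on a toy rating matrix. All values and the name `seed_title` are invented for illustration. The sketch removes the seed movie by label with `drop` rather than by position, so exactly one entry is excluded.

```python
import pandas as pd

# Toy user x title rating matrix (values are invented).
toy_user_movie = pd.DataFrame(
    {
        "X": [5.0, 4.0, 1.0, 2.0],
        "Y": [4.0, 5.0, 2.0, 1.0],
        "Z": [1.0, 2.0, 5.0, 4.0],
        "W": [2.0, 1.0, 2.0, 1.0],
    },
    index=pd.Index([1, 2, 3, 4], name="userId"),
)
seed_title = "X"  # the movie the recommendations are based on

# Correlate every title's rating column with the seed title's column.
item_corr = toy_user_movie.corrwith(toy_user_movie[seed_title])

# Exclude the seed title itself, then take the most correlated titles.
item_recs = item_corr.drop(seed_title).sort_values(ascending=False).head(2)
print(item_recs)
```

"Y" is rated similarly to "X" (correlation 0.8) and comes first, "W" is uncorrelated, and "Z" is rated in the opposite direction and is left out of the top two.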

In [80]:
#Step 1: Read the movie and rating datasets.
movie= pd.read_csv("/kaggle/input/movielense20m/movie.csv")
rating = pd.read_csv("/kaggle/input/movielense20m/rating.csv")
df= movie.merge(rating, how="left", on ="movieId")
df.head()

,movieId,title,genres,userId,rating,timestamp
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,3.0,4.0,1999-12-11 13:36:47
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,6.0,5.0,1997-03-13 17:50:52
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,8.0,4.0,1996-06-05 13:37:51
3,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,10.0,4.0,1999-11-25 02:44:47
4,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,11.0,4.5,2009-01-02 01:13:41


In [245]:
df[(df["userId"]== 108170) & (df["rating"]== 5.0)].sort_values("timestamp", ascending=False).iloc[0:1]

,movieId,title,genres,userId,rating,timestamp
16849191,7044,Wild at Heart (1990),Crime|Drama|Mystery|Romance|Thriller,108170.0,5.0,2005-06-11 04:59:10


In [246]:
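#Step 2: Obtain the most recently rated movie among those to which the user gave the highest rating (5.0).
#Note: this cell filters on userId 101102, whereas the previous cell filters on userId 108170.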
movie_info= df[(df["userId"]== 101102) & (df["rating"]== 5.0)].sort_values("timestamp", ascending=False).iloc[0:1]
movie_info

,movieId,title,genres,userId,rating,timestamp
81249,3,Grumpier Old Men (1995),Comedy|Romance,101102.0,5.0,1996-10-18 05:17:38


In [247]:
movie_name=movie_info["title"]
movie_name

81249    Grumpier Old Men (1995)
Name: title, dtype: object

In [6]:
#Step 3: Filter the user_movie_df DataFrame based on the title of the selected movie.
filtered_df = user_movie_df["Grumpier Old Men (1995)"]
filtered_df

userId
1.0         NaN
2.0         4.0
3.0         NaN
4.0         NaN
5.0         NaN
           ... 
138489.0    NaN
138490.0    NaN
138491.0    NaN
138492.0    NaN
138493.0    NaN
Name: Grumpier Old Men (1995), Length: 138493, dtype: float64

In [7]:
#Step 4: Calculate and sort the correlation between the selected movie and other movies.
corr= user_movie_df.corrwith(filtered_df).sort_values(ascending=False)
corr

title
Grumpier Old Men (1995)                                   1.000000
Grumpy Old Men (1993)                                     0.774175
Nutty Professor II: The Klumps (2000)                     0.523128
City Slickers II: The Legend of Curly's Gold (1994)       0.516291
Herbie Goes Bananas (1980)                                0.511147
                                                            ...   
Thin Blue Line, The (1988)                               -0.124326
Grand Illusion (La grande illusion) (1937)               -0.125610
Battle of Algiers, The (La battaglia di Algeri) (1966)   -0.168030
Repulsion (1965)                                         -0.191074
Children of Paradise (Les enfants du paradis) (1945)     -0.198268
Length: 3159, dtype: float64

In [8]:
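#Step 5: Recommend the top 5 movies based on correlations.
#iloc[2:7] skips the selected movie itself (position 0) and also the next most correlated title, Grumpy Old Men (1993); use iloc[1:6] to exclude only the selected movie.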
recommendations= corr.sort_values(ascending=False).iloc[2:7]
recommendations

title
Nutty Professor II: The Klumps (2000)                  0.523128
City Slickers II: The Legend of Curly's Gold (1994)    0.516291
Herbie Goes Bananas (1980)                             0.511147
Dennis the Menace (1993)                               0.504967
Crocodile Dundee in Los Angeles (2001)                 0.494295
dtype: float64
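
The introduction describes combining the two lists into 10 hybrid recommendations, a step the cells above stop short of. A minimal sketch of one way to do it follows, assuming both results are available as lists of titles; the helper name `combine_recommendations` is hypothetical. The titles are the ones printed above, but note that the two lists in this notebook were produced for different users, so the example only illustrates the mechanics.

```python
from itertools import chain


def combine_recommendations(user_based, item_based, n_total=10):
    """Concatenate two title lists, drop repeats (keeping first occurrence), cap at n_total."""
    combined = list(dict.fromkeys(chain(user_based, item_based)))
    return combined[:n_total]


# Titles from the user-based output above.
user_based_titles = [
    "This Is My Father (1998)",
    "Dickie Roberts: Former Child Star (2003)",
    "Sex Drive (2008)",
    "Duck Season (Temporada de patos) (2004)",
    "Killer Shrews, The (1959)",
]
# Titles from the item-based output above.
item_based_titles = [
    "Nutty Professor II: The Klumps (2000)",
    "City Slickers II: The Legend of Curly's Gold (1994)",
    "Herbie Goes Bananas (1980)",
    "Dennis the Menace (1993)",
    "Crocodile Dundee in Los Angeles (2001)",
]

hybrid = combine_recommendations(user_based_titles, item_based_titles)
for title in hybrid:
    print(title)
```

If the same title shows up in both lists, it is kept once, so the combined list can be shorter than 10; requesting more than 5 titles from each model beforehand is one way to leave room for that.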