### PROJECT: HYBRID RECOMMENDER SYSTEM

- Generate recommendations for a given user ID using both the user-based and the item-based recommender methods.
- Take 5 suggestions from the user-based model and 5 suggestions from the item-based model, and combine them into a final list of 10 recommendations.

#### TASK 1: DATA PREPARATION

In [1]:
import pandas as pd

#### Step 1: Read the movie and rating CSV files

In [2]:
movie = pd.read_csv("movie.csv")
movie.head()

|  | movieId | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |


In [3]:
rating = pd.read_csv("rating.csv")
rating.head()

|  | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 2 | 3.5 | 2005-04-02 23:53:47 |
| 1 | 1 | 29 | 3.5 | 2005-04-02 23:31:16 |
| 2 | 1 | 32 | 3.5 | 2005-04-02 23:33:39 |
| 3 | 1 | 47 | 3.5 | 2005-04-02 23:32:07 |
| 4 | 1 | 50 | 3.5 | 2005-04-02 23:29:40 |


#### Step 2: Add the movie titles and genres to the rating data by merging it with the movie dataset.

In [4]:
df = movie.merge(rating, how="left", on="movieId")
df.head()

|  | movieId | title | genres | userId | rating | timestamp |
|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 3.0 | 4.0 | 1999-12-11 13:36:47 |
| 1 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 6.0 | 5.0 | 1997-03-13 17:50:52 |
| 2 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 8.0 | 4.0 | 1996-06-05 13:37:51 |
| 3 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 10.0 | 4.0 | 1999-11-25 02:44:47 |
| 4 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 11.0 | 4.5 | 2009-01-02 01:13:41 |

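Because the merge uses `how="left"`, every movie is kept, including movies that nobody rated; those rows get a missing userId and rating, which is why userId is shown as a float column above. The sketch below illustrates the same left-join semantics in plain Python, using hypothetical toy records instead of the real CSV files and `None` in place of pandas' NaN:

```python
# Hypothetical toy records standing in for movie.csv and rating.csv.
movies = [
    {"movieId": 1, "title": "Toy Story (1995)"},
    {"movieId": 2, "title": "Jumanji (1995)"},
    {"movieId": 3, "title": "Grumpier Old Men (1995)"},  # nobody rated this movie
]
ratings = [
    {"userId": 10, "movieId": 1, "rating": 4.0},
    {"userId": 11, "movieId": 1, "rating": 5.0},
    {"userId": 10, "movieId": 2, "rating": 3.5},
]


def left_join(left, right, key):
    """Join two lists of dicts on `key`, keeping every left row.

    Left rows with no match get None for each right-only field,
    mirroring the NaN values produced by pandas' merge(how="left").
    """
    right_only = sorted({field for row in right for field in row if field != key})
    matches = {}
    for row in right:
        matches.setdefault(row[key], []).append(row)

    joined = []
    for lrow in left:
        for rrow in matches.get(lrow[key], [None]):
            if rrow is None:
                joined.append({**lrow, **{field: None for field in right_only}})
            else:
                joined.append({**lrow, **rrow})
    return joined


rows = left_join(movies, ratings, "movieId")
for row in rows:
    print(row)
```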

#### Step 3: Count the number of ratings for each movie and remove movies with 1,000 or fewer ratings from the dataset.

In [5]:
rating_counts = df["movieId"].value_counts().to_frame(name="count")
rating_counts

| movieId | count |
|---|---|
| 296 | 67310 |
| 356 | 66172 |
| 318 | 63366 |
| 593 | 63299 |
| 480 | 59715 |
| ... | ... |
| 109526 | 1 |
| 109524 | 1 |
| 109522 | 1 |
| 109520 | 1 |


In [6]:
movie_index = rating_counts[rating_counts["count"] <= 1000].index
movie_index

Int64Index([  4267,   3412,  95441,   1465,   1382,   2978,   7121,  45732,
              5553,  54004,
            ...
            109571, 109481,  34517, 109567, 109565, 109526, 109524, 109522,
            109520, 131262],
           dtype='int64', length=24119)

In [7]:
movie_df = df.query("movieId not in @movie_index")
movie_df

|  | movieId | title | genres | userId | rating | timestamp |
|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 3.0 | 4.0 | 1999-12-11 13:36:47 |
| 1 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 6.0 | 5.0 | 1997-03-13 17:50:52 |
| 2 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 8.0 | 4.0 | 1996-06-05 13:37:51 |
| 3 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 10.0 | 4.0 | 1999-11-25 02:44:47 |
| 4 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 11.0 | 4.5 | 2009-01-02 01:13:41 |
| ... | ... | ... | ... | ... | ... | ... |
| 19981647 | 112852 | Guardians of the Galaxy (2014) | Action\|Adventure\|Sci-Fi | 138166.0 | 4.0 | 2014-12-30 20:20:07 |
| 19981648 | 112852 | Guardians of the Galaxy (2014) | Action\|Adventure\|Sci-Fi | 138177.0 | 4.0 | 2014-12-05 15:11:14 |
| 19981649 | 112852 | Guardians of the Galaxy (2014) | Action\|Adventure\|Sci-Fi | 138280.0 | 3.0 | 2014-10-15 16:40:37 |
| 19981650 | 112852 | Guardians of the Galaxy (2014) | Action\|Adventure\|Sci-Fi | 138380.0 | 5.0 | 2015-02-28 19:55:02 |

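The filter boils down to counting rows per movieId and keeping only the ids whose count exceeds the threshold. A stdlib sketch with hypothetical (userId, movieId) pairs and a toy threshold of 1 instead of 1,000. Note that after the left merge, a movie with no ratings still contributes one row (with a missing userId), so it is counted as 1; at a threshold of 1,000 this makes no difference.

```python
from collections import Counter

# Hypothetical (userId, movieId) rows; in the notebook these are rows of the merged df.
rows = [(1, 296), (2, 296), (3, 296), (1, 356), (2, 356), (4, 999)]


def popular_movie_ids(pairs, threshold):
    """Return the movieIds that appear in more than `threshold` rows.

    This mirrors dropping every movie whose count is <= threshold.
    """
    counts = Counter(movie_id for _, movie_id in pairs)
    return {movie_id for movie_id, n in counts.items() if n > threshold}


kept = popular_movie_ids(rows, threshold=1)
filtered_rows = [(user, movie) for user, movie in rows if movie in kept]
print(sorted(kept))  # [296, 356]
```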

#### Step 4: Create a pivot table with user IDs as the index, movie titles as the columns, and ratings as the values.

In [8]:
user_movie_df = movie_df.pivot_table(index=["userId"], columns=["title"], values="rating")
user_movie_df

| userId | 'burbs, The (1989) | (500) Days of Summer (2009) | \*batteries not included (1987) | ...And Justice for All (1979) | 10 Things I Hate About You (1999) | 10,000 BC (2008) | 101 Dalmatians (1996) | 101 Dalmatians (One Hundred and One Dalmatians) (1961) | 102 Dalmatians (2000) | 12 Angry Men (1957) | ... | Zero Dark Thirty (2012) | Zero Effect (1998) | Zodiac (2007) | Zombieland (2009) | Zoolander (2001) | Zulu (1964) | [REC] (2007) | eXistenZ (1999) | xXx (2002) | ¡Three Amigos! (1986) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.0 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 2.0 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 3.0 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 4.0 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 5.0 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 138489.0 |  |  |  |  |  |  |  |  |  | 4.5 | ... |  |  |  |  |  |  |  |  |  |  |
| 138490.0 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 138491.0 |  |  |  |  |  |  |  | 2.5 |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 138492.0 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |

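pivot_table reshapes the long (userId, title, rating) rows into a user × title matrix. By default it aggregates duplicate (userId, title) pairs with the mean (duplicates can occur, for example, if two movieIds share a title, since the columns are titles), and cells with no rating stay NaN. A plain-Python sketch of that reshaping with a nested dict and hypothetical toy rows:

```python
from collections import defaultdict

# Hypothetical long-format rows: (userId, title, rating).
rows = [
    (1, "Toy Story (1995)", 4.0),
    (1, "Jumanji (1995)", 3.0),
    (2, "Toy Story (1995)", 5.0),
    (2, "Toy Story (1995)", 3.0),  # duplicate (user, title) pair, averaged
]


def pivot_mean(long_rows):
    """Return {userId: {title: mean rating}}.

    Duplicate (userId, title) pairs are averaged, like pivot_table's default aggfunc="mean";
    a (user, title) cell with no rating is simply absent (NaN in the pandas matrix).
    """
    cells = defaultdict(list)
    for user, title, rating in long_rows:
        cells[(user, title)].append(rating)
    matrix = defaultdict(dict)
    for (user, title), values in cells.items():
        matrix[user][title] = sum(values) / len(values)
    return dict(matrix)


matrix = pivot_mean(rows)
print(matrix)
```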

#### Step 5: Wrap the whole process in a function

In [9]:
def create_userMovie_df():
    movie = pd.read_csv("movie.csv")
    rating = pd.read_csv("rating.csv")
    df = movie.merge(rating, how="left", on="movieId")
    rating_counts = df["movieId"].value_counts().to_frame(name="count")
    movie_index = rating_counts[rating_counts["count"] <= 1000].index
    movie_df = df.query("movieId not in @movie_index")
    user_movie = movie_df.pivot_table(index=["userId"], columns=["title"], values="rating")
    return user_movie

In [10]:
user_movie_df = create_userMovie_df()

#### TASK 2: Determining the movies watched by the user who will receive recommendations

#### Step 1: Choose a random user ID

In [11]:
random_user = int(pd.Series(user_movie_df.index).sample(1, random_state=42).iloc[0])
random_user

9761

#### Step 2: Create a new dataframe named random_user_df containing only the selected user's row.

In [12]:
random_user_df = user_movie_df[user_movie_df.index == random_user]
random_user_df

| userId | 'burbs, The (1989) | (500) Days of Summer (2009) | \*batteries not included (1987) | ...And Justice for All (1979) | 10 Things I Hate About You (1999) | 10,000 BC (2008) | 101 Dalmatians (1996) | 101 Dalmatians (One Hundred and One Dalmatians) (1961) | 102 Dalmatians (2000) | 12 Angry Men (1957) | ... | Zero Dark Thirty (2012) | Zero Effect (1998) | Zodiac (2007) | Zombieland (2009) | Zoolander (2001) | Zulu (1964) | [REC] (2007) | eXistenZ (1999) | xXx (2002) | ¡Three Amigos! (1986) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9761.0 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |


#### Step 3: Assign the movies rated by the selected user to a list called movies_watched.

In [13]:
movies_watched = random_user_df.columns[random_user_df.notna().any()].tolist()
movies_watched[:7]

['Ace Ventura: Pet Detective (1994)',
 'Ace Ventura: When Nature Calls (1995)',
 'Addams Family Values (1993)',
 'Aladdin (1992)',
 'American President, The (1995)',
 'Apollo 13 (1995)',
 'Babe (1995)']

#### TASK 3: Accessing the data and IDs of other users who watched the same movies

#### Step 1: Select the columns of the movies watched by the selected user from user_movie_df and create a new dataframe named movies_watched_df.

In [14]:
movies_watched_df = user_movie_df[movies_watched]
movies_watched_df.head()

| userId | Ace Ventura: Pet Detective (1994) | Ace Ventura: When Nature Calls (1995) | Addams Family Values (1993) | Aladdin (1992) | American President, The (1995) | Apollo 13 (1995) | Babe (1995) | Basketball Diaries, The (1995) | Batman (1989) | Batman Forever (1995) | ... | True Lies (1994) | True Romance (1993) | Twelve Monkeys (a.k.a. 12 Monkeys) (1995) | Under Siege 2: Dark Territory (1995) | Usual Suspects, The (1995) | War, The (1994) | Waterworld (1995) | What's Eating Gilbert Grape (1993) | When a Man Loves a Woman (1994) | While You Were Sleeping (1995) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.0 |  |  |  |  |  |  |  |  |  |  | ... |  |  | 3.5 |  | 3.5 |  |  | 3.5 |  |  |
| 2.0 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 3.0 |  |  |  |  |  |  |  |  |  |  | ... |  |  | 4.0 |  | 5.0 |  |  | 3.0 |  |  |
| 4.0 |  | 3.0 |  |  |  |  |  |  |  |  | ... | 3.0 |  | 1.0 |  |  |  |  |  |  |  |
| 5.0 |  |  |  | 5.0 | 5.0 | 5.0 |  |  |  |  | ... | 5.0 |  |  |  |  |  |  |  |  |  |


#### Step 2: Create a new dataframe named user_movie_count that contains, for each user, how many of the movies in movies_watched they have rated.

In [15]:
user_movie_count = pd.DataFrame(movies_watched_df.T.notnull().sum())
user_movie_count.head()

| userId | 0 |
|---|---|
| 1.0 | 11 |
| 2.0 | 3 |
| 3.0 | 12 |
| 4.0 | 4 |
| 5.0 | 13 |


In [16]:
user_movie_count.reset_index(inplace=True)

In [17]:
user_movie_count.columns=["userId","movie_count"]

In [18]:
user_movie_count.head()

|  | userId | movie_count |
|---|---|---|
| 0 | 1.0 | 11 |
| 1 | 2.0 | 3 |
| 2 | 3.0 | 12 |
| 3 | 4.0 | 4 |
| 4 | 5.0 | 13 |


#### Step 3: We consider users who have rated more than 60 percent of the movies rated by the selected user as similar users. Create a list named users_same_movies from the IDs of these users.

In [19]:
percent = len(movies_watched) * 60 / 100
users_same_movies = user_movie_count[user_movie_count["movie_count"] > percent]["userId"]
users_same_movies.head()

28      29.0
115    116.0
155    156.0
157    158.0
183    184.0
Name: userId, dtype: float64

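Steps 2 and 3 count, for every user, how many of the selected user's movies they rated, and keep the users above 60 percent of len(movies_watched). A stdlib sketch over a hypothetical {userId: {title: rating}} matrix (toy ids and titles); with the strict comparison, the selected user always passes the test against their own list:

```python
# Hypothetical sparse ratings matrix: {userId: {title: rating}}; user 1 is the target.
ratings = {
    1: {"A": 4.0, "B": 3.0, "C": 5.0, "D": 2.0, "E": 4.0},
    2: {"A": 3.0, "B": 3.0, "C": 4.0, "D": 4.0, "Z": 1.0},
    3: {"A": 2.0, "B": 5.0},
    4: {"A": 5.0, "B": 4.0, "C": 4.0},
}


def users_with_overlap(matrix, target, percent=60):
    """Return ids of users who rated more than `percent` percent of the target's movies.

    Uses a strict '>' like the notebook, so the target (100 percent overlap) is included.
    """
    watched = set(matrix[target])
    threshold = len(watched) * percent / 100
    return [user for user, rated in matrix.items()
            if len(watched.intersection(rated)) > threshold]


similar = users_with_overlap(ratings, target=1)
print(similar)  # [1, 2]
```

Here the threshold is 3.0, so user 4, who rated exactly 3 of the 5 movies, is excluded by the strict comparison.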
#### TASK 4: Determining the users who are most similar to the user who will receive recommendations

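#### Step 1: Filter movies_watched_df to keep only the users in the users_same_movies list, and append the selected user's own ratings to create filtered_df.

Note that the selected user has rated every movie in movies_watched, so they already pass the 60 percent test and appear in users_same_movies; appending random_user_df therefore adds a second copy of the same row. This is why, in the outputs below, each correlation between user 9761 and another user is listed twice and the self-correlation of 1.0 appears four times. The duplicate does not change the final recommendations: each similar user's ratings end up counted twice, which leaves the per-movie means unchanged, and the selected user is removed in Task 4, Step 4.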

In [20]:
filtered_df = pd.concat([movies_watched_df[movies_watched_df.index.isin(users_same_movies)],
               random_user_df[movies_watched]])
filtered_df.head()

| userId | Ace Ventura: Pet Detective (1994) | Ace Ventura: When Nature Calls (1995) | Addams Family Values (1993) | Aladdin (1992) | American President, The (1995) | Apollo 13 (1995) | Babe (1995) | Basketball Diaries, The (1995) | Batman (1989) | Batman Forever (1995) | ... | True Lies (1994) | True Romance (1993) | Twelve Monkeys (a.k.a. 12 Monkeys) (1995) | Under Siege 2: Dark Territory (1995) | Usual Suspects, The (1995) | War, The (1994) | Waterworld (1995) | What's Eating Gilbert Grape (1993) | When a Man Loves a Woman (1994) | While You Were Sleeping (1995) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29.0 | 3.0 | 3.0 | 3.0 | 4.0 | 4.0 | 4.0 | 4.0 | 3.0 | 4.0 | 3.0 | ... | 3.0 | 5.0 | 3.0 | 4.0 | 4.0 | 3.0 | 3.0 |  |  | 3.0 |
| 116.0 | 3.5 | 2.5 | 2.0 | 3.0 | 2.0 | 3.0 | 2.0 |  | 4.5 | 0.5 | ... | 3.0 |  | 4.0 | 1.5 | 4.5 |  | 2.0 |  |  |  |
| 156.0 | 3.0 |  | 3.0 |  | 5.0 | 5.0 | 3.0 |  | 4.0 | 3.0 | ... | 5.0 | 4.0 | 5.0 | 3.0 | 5.0 |  | 3.0 | 4.0 |  |  |
| 158.0 | 2.0 | 1.0 |  | 4.0 | 4.0 | 3.0 | 5.0 |  | 5.0 | 3.0 | ... | 4.0 | 4.0 | 4.0 | 3.0 | 3.0 |  | 3.0 |  |  | 3.0 |
| 184.0 | 2.0 | 3.0 | 4.0 | 3.0 | 4.0 | 4.0 |  |  | 4.0 | 4.0 | ... | 4.0 |  | 3.0 |  | 5.0 | 4.0 | 3.0 |  | 4.0 | 4.0 |


#### Step 2: Compute the pairwise correlations between the users and store them in a new dataframe named corr_df.

In [21]:
filtered_df.T.corr()

| userId | 29.0 | 116.0 | 156.0 | 158.0 | 184.0 | 298.0 | 330.0 | 401.0 | 435.0 | 579.0 | ... | 137670.0 | 137686.0 | 137856.0 | 137878.0 | 137885.0 | 137925.0 | 138208.0 | 138483.0 | 138484.0 | 9761.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29.0 | 1.000000 | 0.459213 | 0.158993 | 0.305576 | 0.355357 | 0.387116 | 0.443708 | 0.027652 | 0.260904 | 0.295339 | ... | 0.319774 | 0.408648 | 0.058571 | 0.025504 | 0.333503 | 0.378747 | 0.425081 | 0.363880 | 0.460443 | 0.297668 |
| 116.0 | 0.459213 | 1.000000 | 0.241802 | 0.481358 | 0.248310 | 0.361012 | 0.444806 | -0.105120 | 0.130854 | 0.598689 | ... | 0.199535 | 0.548969 | 0.480451 | 0.015816 | 0.449926 | 0.432109 | 0.467195 | 0.369046 | 0.337566 | 0.339229 |
| 156.0 | 0.158993 | 0.241802 | 1.000000 | 0.069682 | 0.528312 | 0.278095 | 0.341646 | 0.129350 | 0.385910 | 0.064832 | ... | 0.471628 | 0.236345 | 0.098134 | 0.156872 | 0.148275 | 0.501402 | 0.241056 | 0.224766 | 0.364209 | 0.181530 |
| 158.0 | 0.305576 | 0.481358 | 0.069682 | 1.000000 | 0.306179 | 0.161813 | 0.508892 | -0.342091 | 0.451058 | 0.600453 | ... | 0.146885 | 0.584178 | 0.574150 | 0.218859 | 0.477479 | 0.040616 | 0.610925 | 0.558426 | 0.528719 | 0.391993 |
| 184.0 | 0.355357 | 0.248310 | 0.528312 | 0.306179 | 1.000000 | 0.569289 | 0.391826 | 0.218726 | 0.463217 | 0.131655 | ... | 0.297074 | 0.169566 | -0.005192 | 0.295383 | 0.272277 | 0.449172 | 0.338769 | 0.230013 | 0.430823 | 0.057537 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 137925.0 | 0.378747 | 0.432109 | 0.501402 | 0.040616 | 0.449172 | 0.498012 | 0.397139 | 0.270919 | 0.414281 | 0.203534 | ... | 0.315635 | 0.261719 | 0.238895 | 0.146465 | 0.422692 | 1.000000 | 0.326615 | 0.107058 | 0.196300 | 0.315315 |
| 138208.0 | 0.425081 | 0.467195 | 0.241056 | 0.610925 | 0.338769 | 0.381862 | 0.527479 | 0.077033 | 0.403005 | 0.483730 | ... | 0.406978 | 0.613951 | 0.541996 | 0.181084 | 0.533451 | 0.326615 | 1.000000 | 0.573016 | 0.299629 | 0.421139 |
| 138483.0 | 0.363880 | 0.369046 | 0.224766 | 0.558426 | 0.230013 | 0.323149 | 0.472593 | -0.202877 | 0.248310 | 0.539090 | ... | 0.187951 | 0.538239 | 0.499798 | -0.080995 | 0.180226 | 0.107058 | 0.573016 | 1.000000 | 0.318965 | 0.393371 |
| 138484.0 | 0.460443 | 0.337566 | 0.364209 | 0.528719 | 0.430823 | 0.449411 | 0.498779 | -0.126082 | 0.265299 | 0.433731 | ... | 0.308614 | 0.368959 | 0.221954 | 0.051967 | 0.201622 | 0.196300 | 0.299629 | 0.318965 | 1.000000 | 0.343597 |

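`DataFrame.corr` computes the Pearson correlation between every pair of columns of `filtered_df.T` (one column per user), and by default each pair only uses the movies that both users rated. A stdlib sketch of that pairwise computation for two hypothetical users; it returns `None` where pandas would give NaN:

```python
import math


def pearson_on_common(a, b):
    """Pearson correlation of two {item: rating} dicts over the items both rated.

    Returns None when fewer than 2 common items exist or either side has zero
    variance (pandas would return NaN in those cases).
    """
    common = sorted(a.keys() & b.keys())
    if len(common) < 2:
        return None
    xs = [a[item] for item in common]
    ys = [b[item] for item in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


# Hypothetical users; "E" and "Z" are ignored because only one user rated each.
target = {"A": 4.0, "B": 3.0, "C": 5.0, "E": 1.0}
other = {"A": 3.0, "B": 2.0, "C": 4.0, "Z": 5.0}
r = pearson_on_common(target, other)
print(r)
```

On the shared movies A, B and C the second user's ratings are exactly one point lower, so the correlation is a perfect 1.0 even though the two users disagree on absolute level; Pearson correlation measures agreement in rating pattern, not in rating scale.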

In [22]:
corr_df = filtered_df.T.corr().unstack().sort_values()
corr_df.head()

userId    userId  
98926.0   101861.0   -0.676278
101861.0  98926.0    -0.676278
95874.0   43093.0    -0.642818
43093.0   95874.0    -0.642818
60159.0   81525.0    -0.626700
dtype: float64

In [23]:
corr_df = pd.DataFrame(corr_df, columns=["corr"])

corr_df.index.names = ['user_id_1', 'user_id_2']

corr_df = corr_df.reset_index()

corr_df.head()

|  | user_id_1 | user_id_2 | corr |
|---|---|---|---|
| 0 | 98926.0 | 101861.0 | -0.676278 |
| 1 | 101861.0 | 98926.0 | -0.676278 |
| 2 | 95874.0 | 43093.0 | -0.642818 |
| 3 | 43093.0 | 95874.0 | -0.642818 |
| 4 | 60159.0 | 81525.0 | -0.6267 |


In [24]:
corr_df[(corr_df["user_id_1"] == random_user) & (corr_df["corr"]>=0.60)].head(7)

|  | user_id_1 | user_id_2 | corr |
|---|---|---|---|
| 2578304 | 9761.0 | 27041.0 | 0.60154 |
| 2578307 | 9761.0 | 27041.0 | 0.60154 |
| 2580404 | 9761.0 | 99169.0 | 0.602928 |
| 2580406 | 9761.0 | 99169.0 | 0.602928 |
| 2583730 | 9761.0 | 39510.0 | 0.60519 |
| 2583731 | 9761.0 | 39510.0 | 0.60519 |
| 2595251 | 9761.0 | 24893.0 | 0.613301 |


#### Step 3: Create a new dataframe named top_users by keeping only the users whose correlation with the selected user is 0.60 or higher.

In [25]:
top_users = corr_df[(corr_df["user_id_1"] == random_user) & (corr_df["corr"]>=0.60)][["user_id_2","corr"]]
top_users.head()

|  | user_id_2 | corr |
|---|---|---|
| 2578304 | 27041.0 | 0.60154 |
| 2578307 | 27041.0 | 0.60154 |
| 2580404 | 99169.0 | 0.602928 |
| 2580406 | 99169.0 | 0.602928 |
| 2583730 | 39510.0 | 0.60519 |


In [26]:
top_users.rename(columns={"user_id_2": "userId"}, inplace=True)

In [27]:
top_users = top_users.sort_values(by='corr', ascending=False)
top_users.head()

|  | userId | corr |
|---|---|---|
| 2679764 | 9761.0 | 1.0 |
| 2678712 | 9761.0 | 1.0 |
| 2678639 | 9761.0 | 1.0 |
| 2678126 | 9761.0 | 1.0 |
| 2630650 | 110670.0 | 0.644594 |

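Steps 3 and 4 keep every user whose correlation with the selected user is at least 0.60, sort them by correlation, and later drop the selected user's own 1.0 self-correlation. The stdlib sketch below performs the same selection in one pass over a hypothetical {userId: corr} mapping, excluding the target up front rather than after the merge:

```python
# Hypothetical correlations of the target user (id 1) with other users, self included as in corr_df.
correlations = {1: 1.0, 2: 0.64, 3: 0.61, 4: 0.12, 5: -0.30}


def top_neighbours(corrs, target, min_corr=0.60):
    """Return (userId, corr) pairs with corr >= min_corr, excluding the target, highest first."""
    pairs = [(user, c) for user, c in corrs.items() if user != target and c >= min_corr]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


print(top_neighbours(correlations, target=1))  # [(2, 0.64), (3, 0.61)]
```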

#### Step 4: Merge the top_users dataframe with the rating dataset

In [28]:
top_users_ratings = top_users.merge(rating[["userId", "movieId", "rating"]], how='inner')

In [29]:
top_users_ratings = top_users_ratings[top_users_ratings["userId"] != random_user]
top_users_ratings.head()

|  | userId | corr | movieId | rating |
|---|---|---|---|---|
| 360 | 110670.0 | 0.644594 | 1 | 4.0 |
| 361 | 110670.0 | 0.644594 | 5 | 2.5 |
| 362 | 110670.0 | 0.644594 | 6 | 4.0 |
| 363 | 110670.0 | 0.644594 | 10 | 2.5 |
| 364 | 110670.0 | 0.644594 | 11 | 3.0 |


#### TASK 5: Calculate the weighted average recommendation score and select the top 5 movies

#### Step 1: Create a new variable named weighted_rating, which is the product of each user's corr and rating.

In [30]:
top_users_ratings["weighted_rating"] = top_users_ratings["corr"] * top_users_ratings["rating"]
top_users_ratings.head()

|  | userId | corr | movieId | rating | weighted_rating |
|---|---|---|---|---|---|
| 360 | 110670.0 | 0.644594 | 1 | 4.0 | 2.578377 |
| 361 | 110670.0 | 0.644594 | 5 | 2.5 | 1.611485 |
| 362 | 110670.0 | 0.644594 | 6 | 4.0 | 2.578377 |
| 363 | 110670.0 | 0.644594 | 10 | 2.5 | 1.611485 |
| 364 | 110670.0 | 0.644594 | 11 | 3.0 | 1.933783 |


#### Step 2: Create a new dataframe named recommendation_df containing each movie id and the mean weighted rating of that movie across the top users who rated it.

In [31]:
recommendation_df = top_users_ratings.groupby("movieId").agg({"weighted_rating":"mean"})
recommendation_df.reset_index(inplace=True)
recommendation_df.head()

|  | movieId | weighted_rating |
|---|---|---|
| 0 | 1 | 2.377815 |
| 1 | 2 | 1.731954 |
| 2 | 3 | 1.849852 |
| 3 | 4 | 1.860318 |
| 4 | 5 | 1.439553 |

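Task 5 scores each movie by the plain mean of corr × rating over the top users who rated it, then keeps movies scoring at least 3. Because this score is a rating multiplied by a correlation, and the neighbours' correlations above are at most about 0.64, no movie can score much above 3.2, so the >= 3 filter is strict. The stdlib sketch below reproduces that scoring on hypothetical neighbour ratings and, for comparison only, a common alternative that is not what the notebook does: dividing by the sum of correlations, which yields a weighted average on the original rating scale.

```python
from collections import defaultdict

# Hypothetical neighbour ratings: (userId, corr, movieId, rating).
neighbour_ratings = [
    (2, 0.8, 10, 5.0),
    (3, 0.6, 10, 4.0),
    (2, 0.8, 20, 2.0),
    (2, 0.8, 30, 4.5),
]


def mean_weighted_scores(rows):
    """Notebook-style score: the plain mean of corr * rating for each movie."""
    totals, counts = defaultdict(float), defaultdict(int)
    for _, corr, movie, rating in rows:
        totals[movie] += corr * rating
        counts[movie] += 1
    return {movie: totals[movie] / counts[movie] for movie in totals}


def normalised_weighted_scores(rows):
    """Alternative score: sum(corr * rating) / sum(corr), a weighted average on the rating scale."""
    numerators, weights = defaultdict(float), defaultdict(float)
    for _, corr, movie, rating in rows:
        numerators[movie] += corr * rating
        weights[movie] += corr
    return {movie: numerators[movie] / weights[movie] for movie in numerators}


def top_movies(scores, min_score=3.0, k=5):
    """Keep movies scoring at least min_score and return up to k movie ids, best first."""
    eligible = [(movie, score) for movie, score in scores.items() if score >= min_score]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [movie for movie, _ in eligible[:k]]


notebook_scores = mean_weighted_scores(neighbour_ratings)
alternative_scores = normalised_weighted_scores(neighbour_ratings)
print(top_movies(notebook_scores))     # [30, 10]
print(top_movies(alternative_scores))  # [10, 30]
```

On this toy data the two scores rank movies 10 and 30 differently, so the choice of formula matters for which 5 movies are recommended.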

#### Step 3: Select the movies in recommendation_df with a weighted rating of 3 or higher and sort them by weighted rating in descending order. Save the first 5 rows as movies_to_be_recommend.

In [32]:
movies_to_be_recommend = recommendation_df[recommendation_df["weighted_rating"] >= 3].sort_values(by="weighted_rating",ascending=False).head()
movies_to_be_recommend

|  | movieId | weighted_rating |
|---|---|---|
| 1139 | 2533 | 3.149799 |
| 1182 | 2660 | 3.149799 |
| 1151 | 2571 | 3.072544 |
| 1983 | 4993 | 3.068361 |
| 2161 | 5952 | 3.068361 |


#### Step 4: Get the titles of the 5 recommended movies

In [33]:
movies_to_be_recommend.merge(movie[["movieId","title"]])

|  | movieId | weighted_rating | title |
|---|---|---|---|
| 0 | 2533 | 3.149799 | Escape from the Planet of the Apes (1971) |
| 1 | 2660 | 3.149799 | Thing from Another World, The (1951) |
| 2 | 2571 | 3.072544 | Matrix, The (1999) |
| 3 | 4993 | 3.068361 | Lord of the Rings: The Fellowship of the Ring,... |
| 4 | 5952 | 3.068361 | Lord of the Rings: The Two Towers, The (2002) |


#### TASK 6: ITEM-BASED RECOMMENDATION
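- Make item-based suggestions based on the movie that the user most recently rated with 5 points, the highest possible rating.
- This section uses user 108170, whereas the user-based section used the randomly selected user 9761; to build a single hybrid list for one user, both sections should use the same user ID.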

In [34]:
user = 108170

#### Step 1: Read movie and rating csv files

In [35]:
movie = pd.read_csv("movie.csv")
rating = pd.read_csv("rating.csv")

#### Step 2: Among the movies the user rated 5 points, get the id of the most recently rated one.

In [36]:
user_recommend = rating[(rating["userId"]==user) & (rating["rating"]==5)].sort_values(by="timestamp",ascending=False).head(1)
user_recommend

|  | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 15643060 | 108170 | 7044 | 5.0 | 2005-06-11 04:59:10 |


In [37]:
movie_id = int(user_recommend["movieId"].iloc[0])
movie_id

7044

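Step 2 keeps the user's 5-point ratings and takes the one with the latest timestamp. Because the timestamps are fixed-width strings in YYYY-MM-DD HH:MM:SS format, comparing them as text gives chronological order, which is also why sort_values works on the timestamp column without converting it to datetime. A stdlib sketch with hypothetical rows (only the 7044 row mirrors the output above):

```python
# Hypothetical rows for one user: (movieId, rating, timestamp).
user_rows = [
    (7044, 5.0, "2005-06-11 04:59:10"),
    (1200, 5.0, "2004-01-02 10:00:00"),
    (3000, 4.5, "2006-03-01 12:00:00"),  # newer, but not a 5-point rating
]


def latest_top_rated(rows, top=5.0):
    """Return the movieId of the most recent rating equal to `top`, or None if there is none."""
    candidates = [row for row in rows if row[1] == top]
    if not candidates:
        return None
    # Fixed-width ISO-style timestamps compare chronologically as plain strings.
    return max(candidates, key=lambda row: row[2])[0]


print(latest_top_rated(user_rows))  # 7044
```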
#### Step 3: Select the column of user_movie_df (created in the user-based recommendation section) that corresponds to the selected movie id.

In [38]:
movie_title = movie.loc[movie["movieId"] == movie_id, "title"].values[0]
user_movie_df_filtered = user_movie_df[movie_title]
user_movie_df_filtered

userId
1.0        NaN
2.0        NaN
3.0        NaN
4.0        NaN
5.0        NaN
            ..
138489.0   NaN
138490.0   NaN
138491.0   NaN
138492.0   NaN
138493.0   NaN
Name: Wild at Heart (1990), Length: 138493, dtype: float64

In [39]:
user_movie_df.shape

(138493, 3159)

In [40]:
user_movie_df_filtered.shape

(138493,)

In [41]:
user_movie_df_filtered.notna().sum()

1537

#### Step 4: Using the filtered dataframe, find the correlation of the selected movie with the other movies and rank them.

In [42]:
corr = user_movie_df.corrwith(user_movie_df_filtered).sort_values(ascending=False).head(10)
corr

title
Wild at Heart (1990)                     1.000000
My Science Project (1985)                0.570187
Mediterraneo (1991)                      0.538868
Old Man and the Sea, The (1958)          0.536192
National Lampoon's Senior Trip (1995)    0.533029
Clockwatchers (1997)                     0.483337
Repo Man (1984)                          0.478674
Lost Highway (1997)                      0.476251
Blue Velvet (1986)                       0.471225
Jeffrey (1995)                           0.457849
dtype: float64

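`corrwith` correlates the selected movie's rating column with every other movie's column; each pair only uses the users who rated both movies. A stdlib sketch of that ranking with hypothetical movies and ratings. The `min_common` parameter is an addition of this sketch, not a `corrwith` option: correlations built on a handful of shared raters are noisy, and requiring a minimum overlap is one common way to reduce that.

```python
import math

# Hypothetical ratings per movie: {title: {userId: rating}}.
ratings_by_movie = {
    "Target movie": {1: 5.0, 2: 3.0, 3: 4.0, 4: 2.0},
    "Movie B": {1: 4.5, 2: 3.0, 3: 4.0, 4: 2.5},
    "Movie C": {1: 2.0, 2: 4.0, 3: 3.0},
    "Movie D": {1: 4.0, 5: 3.0},  # shares only one rater with the target
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists, or None if either has zero variance."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def similar_items(ratings, target, min_common=2):
    """Rank other movies by Pearson correlation with `target` over users who rated both.

    Pairs with fewer than `min_common` shared raters (and never fewer than 2) are skipped.
    """
    target_ratings = ratings[target]
    ranked = []
    for title, other in ratings.items():
        if title == target:
            continue
        common = sorted(target_ratings.keys() & other.keys())
        if len(common) < max(2, min_common):
            continue
        r = pearson([target_ratings[u] for u in common], [other[u] for u in common])
        if r is not None:
            ranked.append((title, r))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


for title, r in similar_items(ratings_by_movie, "Target movie"):
    print(f"{title}: {r:.3f}")
```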
#### Step 5: Return the top 5 movies as suggestions, excluding the selected movie itself.

In [43]:
movie_list = corr.drop(user_movie_df_filtered.name).head(5).index

In [44]:
for title in movie_list:
    print(title)

My Science Project (1985)
Mediterraneo (1991)
Old Man and the Sea, The (1958)
National Lampoon's Senior Trip (1995)
Clockwatchers (1997)

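The project goal is a final list of 10 suggestions drawn from both models, but the section stops after each model's 5 titles. A minimal sketch of that last step, assuming the two ranked lists are interleaved and de-duplicated (the combining rule is an assumption of this sketch, not part of the original task); placeholder titles stand in for the real lists:

```python
from itertools import zip_longest


def hybrid_recommendations(user_based, item_based, k=10):
    """Interleave two ranked title lists, skip titles already taken, and return at most k titles."""
    combined, seen = [], set()
    for pair in zip_longest(user_based, item_based):
        for title in pair:
            if title is not None and title not in seen:
                seen.add(title)
                combined.append(title)
    return combined[:k]


# Hypothetical placeholder titles standing in for the two models' top-5 lists.
user_based = ["U1", "U2", "U3", "U4", "U5"]
item_based = ["I1", "I2", "I3", "I4", "I5"]
print(hybrid_recommendations(user_based, item_based))
```

If the two lists share titles, the result has fewer than 10 entries; taking a few more candidates from each model (for example, the top 6 or 7) before combining is one way to backfill.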