![Project](https://github.com/oguzerdo/recommender-systems/blob/main/images/project.png?raw=true)
# Business Problem

For a given user ID, the goal is to make 10 movie recommendations using user-based and item-based recommender methods.

In [2]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 500)
pd.set_option('display.expand_frame_repr', False)

# Section I - Data Preparation

### Step #1
Read Data

In [3]:
movie = pd.read_csv('datasets/movie_lens_dataset/movie.csv')
rating = pd.read_csv('datasets/movie_lens_dataset/rating.csv')

### Step #2
Merge data

In [4]:
df = movie.merge(rating, how="left", on="movieId")

### Step #3
Drop rare movies

In [41]:
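# Count ratings per title; titles with 10,000 or fewer ratings are treated as rare.
# Filtering the Series directly avoids depending on the column name that value_counts() produces.
rating_counts = df["title"].value_counts()
rare_movies = rating_counts[rating_counts <= 10000].index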
common_movies = df[~df["title"].isin(rare_movies)]
user_movie_df = common_movies.pivot_table(index=["userId"], columns=["title"], values="rating")
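
To make the shape of `user_movie_df` concrete, here is a minimal sketch of the same filter-then-pivot step on a made-up five-row frame. The IDs, titles, ratings, and the threshold of 1 are illustrative assumptions, not values from the MovieLens data.

```python
import pandas as pd

# Toy ratings, invented for illustration only.
toy = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "title": ["A", "B", "A", "C", "B"],
    "rating": [4.0, 3.0, 5.0, 2.0, 1.0],
})

# Keep only titles with more than one rating (the notebook uses 10000 instead of 1).
counts = toy["title"].value_counts()
common = toy[toy["title"].isin(counts[counts > 1].index)]

# One row per user, one column per title; unrated pairs become NaN.
toy_matrix = common.pivot_table(index="userId", columns="title", values="rating")
print(toy_matrix)
```

Title "C" has a single rating, so it is dropped before the pivot and never becomes a column.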

# Section II - Determining the movies watched by the user who will be recommended

In [43]:
# Select random user
random_user = 108170
random_user_df = user_movie_df[user_movie_df.index == random_user]
movies_watched = random_user_df.columns[random_user_df.notna().any()].tolist()

In [48]:
# 139 movies watched
len(movies_watched)

139

# Section III - Access data and Ids of other users watching the same movies

In [49]:
movies_watched_df = user_movie_df[movies_watched]
user_movie_count = movies_watched_df.T.notnull().sum()
user_movie_count = user_movie_count.reset_index()

In [50]:
user_movie_count.columns = ["userId", "movie_count"]
user_movie_count.head()

|  | userId | movie_count |
|---|---|---|
| 0 | 1.0 | 47 |
| 1 | 2.0 | 10 |
| 2 | 3.0 | 43 |
| 3 | 4.0 | 4 |
| 4 | 5.0 | 15 |


In [51]:
# 60% of the 139 movies watched by the user (i.e. 83.4)
perc = len(movies_watched) * 60 / 100

In [52]:
users_same_movies = user_movie_count[user_movie_count["movie_count"] > perc]["userId"]
users_same_movies.head()

57      58.0
90      91.0
115    116.0
146    147.0
155    156.0
Name: userId, dtype: float64
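
The overlap filter can be checked on a small example. The sketch below uses an invented three-user matrix and counts along rows with `notna().sum(axis=1)`, which should give the same counts as transposing first; all names and values here are assumptions for illustration.

```python
import pandas as pd
import numpy as np

# Toy matrix restricted to the target user's 5 watched movies (invented values).
watched = pd.DataFrame(
    [[5, 4, np.nan, 3, 2],
     [np.nan, np.nan, 1, np.nan, np.nan],
     [4, 4, 4, 4, np.nan]],
    index=pd.Index([10, 20, 30], name="userId"),
    columns=list("ABCDE"),
)

# Number of the target user's movies that each user has rated.
overlap = watched.notna().sum(axis=1)

# Keep users who rated more than 60% of those movies (here: more than 3.0).
threshold = watched.shape[1] * 60 / 100
similar_candidates = overlap[overlap > threshold].index.tolist()
print(similar_candidates)
```

Users 10 and 30 each rated 4 of the 5 movies and pass the threshold; user 20 rated only 1 and is dropped.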

In [53]:
# 4902 users watched more than 60% of the movies the selected user watched.
# Check that every user ID appears only once
users_same_movies.value_counts()

58.0        1
91994.0     1
92362.0     1
92269.0     1
92260.0     1
           ..
45919.0     1
45891.0     1
45874.0     1
45815.0     1
138411.0    1
Name: userId, Length: 4902, dtype: int64

# Section IV - Identifying the most similar users to the user who will be recommended

In [54]:
final_df = movies_watched_df[movies_watched_df.index.isin(users_same_movies)]

In [55]:
corr_df = final_df.T.corr().unstack().sort_values()
corr_df = pd.DataFrame(corr_df, columns=["corr"])
corr_df.index.names = ['user_id_1', 'user_id_2']
corr_df = corr_df.reset_index()
corr_df.head()

|  | user_id_1 | user_id_2 | corr |
|---|---|---|---|
| 0 | 21853.0 | 79828.0 | -0.59208 |
| 1 | 79828.0 | 21853.0 | -0.59208 |
| 2 | 72838.0 | 110130.0 | -0.580457 |
| 3 | 110130.0 | 72838.0 | -0.580457 |
| 4 | 56879.0 | 62340.0 | -0.576827 |
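
As a sanity check on the transpose-then-correlate step, here is a small sketch with three invented users whose correlations are easy to verify by hand. The data and the `pairs` name are assumptions for illustration.

```python
import pandas as pd

# Toy ratings: rows are users, columns are movies (invented values).
ratings = pd.DataFrame(
    {"A": [1.0, 2.0, 5.0], "B": [2.0, 4.0, 3.0], "C": [3.0, 6.0, 1.0]},
    index=pd.Index([10, 20, 30], name="userId"),
)

# corr() works column-wise, so transpose to make each user a column.
user_corr = ratings.T.corr()

# Flatten the square matrix into (user_id_1, user_id_2, corr) rows.
pairs = (
    user_corr.unstack()
    .rename_axis(["user_id_1", "user_id_2"])
    .rename("corr")
    .reset_index()
)
print(pairs.sort_values("corr"))
```

User 20's ratings are exactly twice user 10's, so their correlation is 1; user 30 rates in the opposite order, giving -1. Each pair appears twice because the matrix is symmetric, which matches the duplicated rows in the output above.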


In [56]:
top_users = corr_df[(corr_df["user_id_1"] == random_user) & 
                    (corr_df["corr"] >= 0.65) & 
                    (corr_df["user_id_2"] != random_user)].reset_index(drop=True)


In [57]:
top_users = top_users.sort_values(by='corr', ascending=False)
top_users.shape
top_users["user_id_2"].value_counts()
top_users.rename(columns={"user_id_2": "userId"}, inplace=True)

# 24 users have a correlation of at least 0.65 with the selected user. Their IDs and correlations:
top_users

|  | user_id_1 | userId | corr |
|---|---|---|---|
| 23 | 108170.0 | 89195.0 | 0.737658 |
| 22 | 108170.0 | 11517.0 | 0.719326 |
| 21 | 108170.0 | 5155.0 | 0.710507 |
| 20 | 108170.0 | 121747.0 | 0.704917 |
| 19 | 108170.0 | 82860.0 | 0.702602 |
| 18 | 108170.0 | 44435.0 | 0.694021 |
| 17 | 108170.0 | 23753.0 | 0.685583 |
| 16 | 108170.0 | 89202.0 | 0.678307 |
| 15 | 108170.0 | 42497.0 | 0.677204 |
| 14 | 108170.0 | 79270.0 | 0.674767 |


In [62]:
rating = pd.read_csv('datasets/movie_lens_dataset/rating.csv')
top_users_ratings = top_users.merge(rating[["userId", "movieId", "rating"]], how='inner')
top_users_ratings.head()
top_users_ratings.shape

(15419, 5)

# Section V - Calculating the Weighted Average Recommendation Score

In [63]:
top_users_ratings['weighted_rating'] = top_users_ratings['corr'] * top_users_ratings['rating']

top_users_ratings.groupby('movieId').agg({"weighted_rating": "mean"})

| movieId | weighted_rating |
|---|---|
| 1 | 1.980258 |
| 2 | 1.275607 |
| 3 | 1.879999 |
| 5 | 1.729232 |
| 6 | 2.844900 |
| ... | ... |
| 116797 | 2.630109 |
| 116823 | 1.798314 |
| 116887 | 2.157977 |
| 117133 | 2.877303 |
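
The score above is the per-movie mean of `corr * rating`, so it sits below the original 0.5 to 5 rating scale whenever correlations are below 1. The sketch below reproduces that score on invented data and, for comparison only, also computes a normalised weighted average (sum of `corr * rating` divided by sum of `corr`). The normalised variant is an assumption added here, not something this notebook uses.

```python
import pandas as pd

# Toy neighbour ratings (invented): two similar users rate two movies.
toy = pd.DataFrame({
    "userId": [1, 2, 1, 2],
    "movieId": [100, 100, 200, 200],
    "corr": [0.8, 0.6, 0.8, 0.6],
    "rating": [5.0, 3.0, 2.0, 4.0],
})
toy["weighted_rating"] = toy["corr"] * toy["rating"]

# The notebook's score: mean of corr * rating per movie.
mean_score = toy.groupby("movieId")["weighted_rating"].mean()

# Alternative (assumption, not the notebook's method): divide by the sum of correlations.
sums = toy.groupby("movieId")[["weighted_rating", "corr"]].sum()
normalised_score = sums["weighted_rating"] / sums["corr"]

print(pd.DataFrame({"mean_score": mean_score, "normalised_score": normalised_score}))
```

Both scores rank movie 100 above movie 200 here; they differ mainly in scale, which matters when a fixed cutoff such as 3 is applied.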


In [64]:
recommendation_df = top_users_ratings.groupby('movieId').agg({"weighted_rating": "mean"})
recommendation_df = recommendation_df.reset_index()

recommendation_df.head()

|  | movieId | weighted_rating |
|---|---|---|
| 0 | 1 | 1.980258 |
| 1 | 2 | 1.275607 |
| 2 | 3 | 1.879999 |
| 3 | 5 | 1.729232 |
| 4 | 6 | 2.8449 |


In [66]:
# Let's get the ones with weighted_rating greater than 3:
recommendation_df[recommendation_df["weighted_rating"] > 3]
movies_to_be_recommend = recommendation_df[recommendation_df["weighted_rating"] > 3].sort_values("weighted_rating", ascending=False)[0:5]

In [67]:
movies_to_be_recommend.merge(movie[["movieId", "title"]]).index

Int64Index([0, 1, 2, 3, 4], dtype='int64')
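
The `.index` call returns row positions rather than movie names. If the titles are what is wanted, one option is to select the `title` column after the merge. A minimal sketch with made-up stand-ins for `movies_to_be_recommend` and `movie` (the names `top_scores` and `catalog` and all values are assumptions):

```python
import pandas as pd

# Toy stand-ins for the score table and the movie catalogue (invented values).
top_scores = pd.DataFrame({"movieId": [3, 1], "weighted_rating": [3.9, 3.4]})
catalog = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie C"],
})

# A left merge keeps the score order of the left frame; then pick the titles.
user_based_titles = (
    top_scores.merge(catalog[["movieId", "title"]], on="movieId", how="left")["title"]
    .tolist()
)
print(user_based_titles)
```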

# Section VI - Item-Based Recommendation

- Make an item-based recommendation based on the movie the user rated highest most recently.
- Make 10 suggestions in total:
    - 5 user-based suggestions
    - 5 item-based suggestions
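
One way to assemble the final list of 10 is sketched below. `combine_recommendations` is a hypothetical helper, not part of the notebook; it assumes both inputs are lists of titles already sorted from best to worst, and it skips item-based titles that the user-based list has already picked.

```python
def combine_recommendations(user_based, item_based, n_each=5):
    """Hypothetical helper: n_each user-based titles, then n_each new item-based titles."""
    picked = list(user_based[:n_each])
    # Skip titles already chosen so the final list has no duplicates.
    extra = [title for title in item_based if title not in picked][:n_each]
    return picked + extra


# Placeholder titles, invented for illustration.
user_based = ["A", "B", "C", "D", "E", "F"]
item_based = ["B", "G", "H", "I", "J", "K", "L"]
final_list = combine_recommendations(user_based, item_based)
print(final_list)
```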

In [84]:
user = 10845

# Among the movies the target user rated 5, get the ID of the one rated most recently:
movie_id = rating[(rating["userId"] == user) & (rating["rating"] == 5.0)].sort_values(by="timestamp", ascending=False)["movieId"][0:1].values[0]

In [85]:
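def item_based_recommender(movie_name, user_movie_df):
    # Ratings column of the reference movie (one value per user).
    movie_ratings = user_movie_df[movie_name]
    # Correlate every movie's ratings with it and keep the 10 strongest matches.
    return user_movie_df.corrwith(movie_ratings).sort_values(ascending=False).head(10)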


movies_from_item_based = item_based_recommender(movie[movie["movieId"] == movie_id]["title"].values[0], user_movie_df)
# Position 0 is the movie itself, so it is skipped. Because the function returns 10 rows, 9 recommendations remain.
movies_from_item_based[1:11]

title
Ghost (1990)                   0.466227
Pretty Woman (1990)            0.434943
Sleepless in Seattle (1993)    0.395029
Twister (1996)                 0.391597
Pocahontas (1995)              0.376469
Dirty Dancing (1987)           0.373146
Little Mermaid, The (1989)     0.372924
Mrs. Doubtfire (1993)          0.370549
Speed (1994)                   0.369893
dtype: float64
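
To see how `corrwith` ranks titles, here is a small sketch on an invented four-user matrix. It also drops the reference movie by label instead of by position, which is a design choice assumed here rather than taken from the notebook; `similar_titles` is a hypothetical name.

```python
import pandas as pd

# Toy user-movie matrix (invented values): rows are users, columns are titles.
toy_matrix = pd.DataFrame(
    {"A": [5.0, 4.0, 1.0, 2.0],
     "B": [4.0, 5.0, 2.0, 1.0],
     "C": [1.0, 2.0, 5.0, 4.0]},
    index=pd.Index([1, 2, 3, 4], name="userId"),
)


def similar_titles(title, matrix, n=2):
    """Hypothetical variant: correlate all titles with `title`, excluding `title` itself."""
    scores = matrix.corrwith(matrix[title])
    return scores.drop(labels=title).sort_values(ascending=False).head(n)


result = similar_titles("A", toy_matrix)
print(result)
```

"B" follows the same taste pattern as "A" (correlation 0.8), while "C" is its mirror image (correlation -1). Dropping by label avoids relying on the reference movie always landing in position 0 after sorting.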