<a href="https://colab.research.google.com/github/slyofzero/Movie-Recommender/blob/main/Movie_Recommender.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Aim-

To create a movie recommendation algorithm based on the movies a user has liked.

Data - https://raw.githubusercontent.com/slyofzero/Movie-Recommender/main/ratings.csv

---

##Cleaning the data.

Let's first load in the data and clean it up.

In [1]:
# Loading the dataset.
import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings("ignore")

ratings_df = pd.read_csv("https://raw.githubusercontent.com/slyofzero/Movie-Recommender/main/ratings.csv")
ratings_df.head()

Unnamed: 0,movie_id,title,user_id,rating
0,1,Toy Story (1995),308,4
1,1,Toy Story (1995),287,5
2,1,Toy Story (1995),148,4
3,1,Toy Story (1995),280,4
4,1,Toy Story (1995),66,3


Because we want the algorithm to recommend movies based on the ones you liked in the past, we need a table with one row per user and one column per movie, where each cell holds the rating that user gave that movie (or is empty if the user did not rate it).

In [2]:
# Creating a table with user ratings by each user for each movie.
movie_ratings_matrix = ratings_df.pivot_table(index="user_id",columns="title",values="rating")
movie_ratings_matrix

user_id,'Til There Was You (1997),1-900 (1994),101 Dalmatians (1996),12 Angry Men (1957),187 (1997),2 Days in the Valley (1996),"20,000 Leagues Under the Sea (1954)",2001: A Space Odyssey (1968),3 Ninjas: High Noon At Mega Mountain (1998),"39 Steps, The (1935)",...,Yankee Zulu (1994),Year of the Horse (1997),You So Crazy (1994),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),"Young Poisoner's Handbook, The (1995)",Zeus and Roxanne (1997),unknown,Á köldum klaka (Cold Fever) (1994)
0,,,,,,,,,,,...,,,,,,,,,,
1,,,2.0,5.0,,,3.0,4.0,,,...,,,,5.0,3.0,,,,4.0,
2,,,,,,,,,1.0,,...,,,,,,,,,,
3,,,,,2.0,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
939,,,,,,,,,,,...,,,,,,,,,,
940,,,,,,,,,,,...,,,,,,,,,,
941,,,,,,,,,,,...,,,,,,,,,,
942,,,,,,,,3.0,,3.0,...,,,,,,,,,,
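
To make the reshaping step concrete, here is a minimal sketch of the same `pivot_table` call on a tiny, made-up dataset. The `toy_ratings` frame and its values are hypothetical and only mimic the long format of `ratings_df`.

```python
import pandas as pd

# Hypothetical toy data in the same long format as ratings_df.
toy_ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "title": ["Movie A", "Movie B", "Movie A", "Movie C", "Movie B"],
    "rating": [4, 5, 3, 2, 1],
})

# One row per user, one column per movie; movies a user did not rate become NaN.
toy_matrix = toy_ratings.pivot_table(index="user_id", columns="title", values="rating")
print(toy_matrix)
```

Note that `pivot_table` averages the values by default if the same user rated the same title more than once.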


Now that we have the table, let's find the correlation of ratings of one movie with the ratings of other movies.

In [3]:
# Viewing the correlation of one movie with all the other movies.
movie_corr = movie_ratings_matrix.corr()["'Til There Was You (1997)"]
movie_corr = movie_corr.sort_values(ascending = False)
movie_corr = movie_corr.dropna()
movie_corr

title
Amistad (1997)                  1.0
'Til There Was You (1997)       1.0
From Dusk Till Dawn (1996)      1.0
Home for the Holidays (1995)    1.0
Home Alone (1990)               1.0
                               ... 
Taxi Driver (1976)             -1.0
Close Shave, A (1995)          -1.0
Chinatown (1974)               -1.0
Celluloid Closet, The (1995)   -1.0
Wrong Trousers, The (1993)     -1.0
Name: 'Til There Was You (1997), Length: 341, dtype: float64

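That's strange: a lot of movies have a perfect correlation (1.0 or -1.0) with `'Til There Was You (1997)`.

This is probably because very few users rated both `'Til There Was You (1997)` and movies like `Amistad (1997)` or `From Dusk Till Dawn (1996)`. When a correlation is computed from only two shared ratings, it always comes out as exactly 1 or -1 (or is undefined), so these values tell us little.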
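
To see why sparse ratings produce perfect correlations, here is a small sketch with made-up numbers. The `toy_matrix` frame is hypothetical. The `min_periods` argument of `DataFrame.corr` is shown only as one possible safeguard; the notebook itself filters movies by rating count instead.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: only users 1 and 2 rated both movies.
toy_matrix = pd.DataFrame({
    "Movie A": [5.0, 3.0, np.nan, 4.0],
    "Movie B": [4.0, 2.0, 5.0, np.nan],
}, index=[1, 2, 3, 4])

# With just two shared raters, the Pearson correlation is a perfect 1.0.
plain_corr = toy_matrix.corr()
print(plain_corr)

# Requiring at least 3 shared raters turns that unreliable value into NaN.
guarded_corr = toy_matrix.corr(min_periods=3)
print(guarded_corr)
```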

So let's keep only the movies with a high enough rating count!

In [4]:
# Grouping each movie and finding the number of ratings it got and the average rating.
movie_ratings_info_df = ratings_df.groupby("title").agg({"rating":[np.size, np.mean]})
movie_ratings_info_df.columns = ["no. of ratings", "avg score"]
movie_ratings_info_df

title,no. of ratings,avg score
'Til There Was You (1997),9,2.333333
1-900 (1994),5,2.600000
101 Dalmatians (1996),109,2.908257
12 Angry Men (1957),125,4.344000
187 (1997),41,3.024390
...,...,...
Young Guns II (1990),44,2.772727
"Young Poisoner's Handbook, The (1995)",41,3.341463
Zeus and Roxanne (1997),6,2.166667
unknown,9,3.444444


In [5]:
# Keeping only the movies whose number of ratings is at or above the 75th percentile of the "no. of ratings" column.
min_ratings = movie_ratings_info_df["no. of ratings"].quantile(q = 0.75)
valid_movies = movie_ratings_info_df[movie_ratings_info_df["no. of ratings"] >= min_ratings].index
movie_ratings_matrix = movie_ratings_matrix[valid_movies]
movie_ratings_matrix

user_id,101 Dalmatians (1996),12 Angry Men (1957),2 Days in the Valley (1996),2001: A Space Odyssey (1968),Absolute Power (1997),"Abyss, The (1989)",Ace Ventura: Pet Detective (1994),Addams Family Values (1993),"Adventures of Priscilla, Queen of the Desert, The (1994)","African Queen, The (1951)",...,What's Eating Gilbert Grape (1993),When Harry Met Sally... (1989),While You Were Sleeping (1995),White Squall (1996),William Shakespeare's Romeo and Juliet (1996),Willy Wonka and the Chocolate Factory (1971),"Wizard of Oz, The (1939)","Wrong Trousers, The (1993)",Young Frankenstein (1974),Young Guns (1988)
0,,,,,,,,,,,...,,,,,,,,,,
1,2.0,5.0,,4.0,,3.0,3.0,,,,...,4.0,5.0,4.0,,,4.0,4.0,5.0,5.0,3.0
2,,,,,3.0,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
939,,,,,,,,,,,...,,,,,,,,,,
940,,,,,,2.0,,,3.0,,...,,4.0,4.0,,,3.0,,,,
941,,,,,,,,,,,...,,,,,,,,,,
942,,,,3.0,,,,,,5.0,...,,4.0,,,,,,,,
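
The percentile filter above can be traced on a handful of made-up counts. This is only a sketch: the `counts` series and its movie names are hypothetical.

```python
import pandas as pd

# Hypothetical number of ratings per movie.
counts = pd.Series(
    {"Movie A": 2, "Movie B": 10, "Movie C": 50, "Movie D": 120, "Movie E": 300},
    name="no. of ratings",
)

# 75th percentile of the counts (pandas interpolates linearly by default).
threshold = counts.quantile(q=0.75)

# Keep only the titles at or above the threshold.
popular = counts[counts >= threshold].index.tolist()
print(threshold, popular)
```

With these five counts the threshold is 120.0, so only `Movie D` and `Movie E` survive the filter.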


In [6]:
# Checking the correlations again, now using only the movies with enough ratings.
movie_ratings_matrix.corr()["101 Dalmatians (1996)"]

title
101 Dalmatians (1996)                           1.000000
12 Angry Men (1957)                            -0.049890
2 Days in the Valley (1996)                     0.048973
2001: A Space Odyssey (1968)                   -0.043407
Absolute Power (1997)                           0.398783
                                                  ...   
Willy Wonka and the Chocolate Factory (1971)    0.105261
Wizard of Oz, The (1939)                        0.366592
Wrong Trousers, The (1993)                     -0.002382
Young Frankenstein (1974)                       0.158840
Young Guns (1988)                               0.119234
Name: 101 Dalmatians (1996), Length: 416, dtype: float64

Yes, that looks a lot better.

---

##Creating the recommendation algorithm.

Now that we have the list of valid movies and their ratings, we can look for movies that correlate highly with the ones the user liked and recommend them!

In [7]:
# Picking out one user (user_id 0) and the movies that user rated.
user_input = movie_ratings_matrix.loc[0, :].dropna()
user_input

title
Empire Strikes Back, The (1980)    5.0
Gone with the Wind (1939)          1.0
Star Wars (1977)                   5.0
Name: 0, dtype: float64

In [8]:
# Creating a function to predict the movies the user might like.
def movie_recommder(user_input):
  user_input = pd.Series(user_input)
  # Computing the movie-to-movie correlation matrix once instead of once per movie.
  corr_matrix = movie_ratings_matrix.corr()
  scored_movies = []

  for movie in user_input.index:
    # Weighting each correlation by the rating the user gave this movie.
    highly_corr_movies = corr_matrix[movie].dropna() * user_input[movie]
    scored_movies.append(highly_corr_movies)

  # Series.append was removed in pandas 2.0, so the scores are combined with pd.concat.
  best_movies = pd.concat(scored_movies).sort_values(ascending = False)
  # Dropping the movies the user already rated and keeping the top 5 unique titles.
  best_movies = best_movies.drop(index = user_input.index).index.unique()[:5]

  return best_movies

In [12]:
# Running the function.
print("List of movies you might like- \n")
for movie in movie_recommder(user_input):
  print(f"\t{movie}")

List of movies you might like- 

	Return of the Jedi (1983)
	Raiders of the Lost Ark (1981)
	Philadelphia Story, The (1940)
	Frighteners, The (1996)
	Con Air (1997)


As you can see, the recommendations here are pretty good. Because the user gave top ratings to two movies from the Star Wars franchise (`Empire Strikes Back, The (1980)` and `Star Wars (1977)`), the algorithm recommended a third one, `Return of the Jedi (1983)`.
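
To show how such a ranking comes about, here is a minimal sketch that mirrors the weighting and ranking steps of `movie_recommder` on made-up numbers. The titles, correlations, and ratings are all hypothetical.

```python
import pandas as pd

# Hypothetical correlations of two rated movies with other titles.
corr_with_liked = {
    "Liked 1": pd.Series({"Liked 1": 1.0, "Candidate X": 0.8, "Candidate Y": 0.1}),
    "Liked 2": pd.Series({"Liked 2": 1.0, "Candidate X": 0.2, "Candidate Y": 0.6}),
}
user_ratings = pd.Series({"Liked 1": 5.0, "Liked 2": 2.0})

# Weight each correlation by the user's rating, then pool all the scores.
scores = pd.concat([corr_with_liked[m] * user_ratings[m] for m in user_ratings.index])

# Sort, drop the movies already rated, and keep each remaining title once.
ranked = scores.sort_values(ascending=False).drop(index=user_ratings.index).index.unique()
print(list(ranked))
```

`Candidate X` comes first because its best score (0.8 × 5.0 = 4.0) beats the best score of `Candidate Y` (0.6 × 2.0 = 1.2). Each title is ranked by its single highest score, so a strong match with one highly rated movie is enough to put it near the top.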

But wow! This was way easier than I thought it would be. Next time I'll try to incorporate a few machine learning algorithms to get better recommendations.

---

#END OF THE PROJECT