# Simple Recommender

### A simple movie recommender system based on weighted movie ratings (from a DataCamp tutorial)

To start, we download the data sets from the Full MovieLens Dataset and load them. The small dataset (`ml-latest-small`) is used here. It consists of 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users.

```python
# Import libraries
import pandas as pd
import numpy as np
```


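```python
# Load the MovieLens small dataset (paths assume the unzipped ml-latest-small folder)
ratings = pd.read_csv('ml-latest-small/ratings.csv')
movies = pd.read_csv('ml-latest-small/movies.csv')
ratings.head()
# A notebook cell displays only its last expression, so the output below is movies.head()
movies.head()
```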

|  | movieId | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |
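Only `movies.head()` is rendered above, because a notebook cell displays just its last expression. The sketch below shows the shape of `ratings`, assuming the standard `ml-latest-small` column layout (`userId`, `movieId`, `rating`, `timestamp`). The rows are made up for illustration and are read from an in-memory string rather than the real file.

```python
import io

import pandas as pd

# Made-up rows following the assumed column layout of ml-latest-small/ratings.csv.
# Values are illustrative only, not taken from the real file.
ratings_csv = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,1,4.0,964982703\n"
    "1,3,4.0,964981247\n"
    "2,1,3.5,1445714835\n"
)
toy_ratings = pd.read_csv(ratings_csv)

# One row per (user, movie) rating; the notebook later uses only movieId and rating.
print(toy_ratings.columns.tolist())
print(toy_ratings.dtypes)
```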


Next, we calculate the average rating of each movie and the number of ratings each movie received.

```python
# Average rating (vote_avg) and number of ratings (vote_count) per movie
ratings_group = ratings[['movieId','rating']]
ratings_group = ratings_group.groupby(['movieId']).agg(vote_avg=("rating","mean"), vote_count=("rating","count"))
ratings_group
```

| movieId | vote_avg | vote_count |
|---|---|---|
| 1 | 3.920930 | 215 |
| 2 | 3.431818 | 110 |
| 3 | 3.259615 | 52 |
| 4 | 2.357143 | 7 |
| 5 | 3.071429 | 49 |
| ... | ... | ... |
| 193581 | 4.000000 | 1 |
| 193583 | 3.500000 | 1 |
| 193585 | 3.500000 | 1 |
| 193587 | 3.500000 | 1 |
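The `agg(vote_avg=("rating","mean"), vote_count=("rating","count"))` call uses pandas named aggregation: each keyword becomes an output column built from a (source column, function) pair. A small sketch on hypothetical ratings makes the result easy to check by hand.

```python
import pandas as pd

# Hypothetical ratings: movie 1 receives three ratings, movie 2 receives one.
toy = pd.DataFrame({
    "movieId": [1, 1, 1, 2],
    "rating": [4.0, 3.0, 5.0, 2.5],
})

# Named aggregation: output column name = (input column, aggregation function).
stats = toy.groupby("movieId").agg(
    vote_avg=("rating", "mean"),
    vote_count=("rating", "count"),
)
print(stats)
```

Movie 1 averages (4.0 + 3.0 + 5.0) / 3 = 4.0 over 3 votes, and movie 2 has a single 2.5 rating.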


Merge the movies and ratings DataFrames into a single DataFrame on `movieId`.

```python
movies_ratings = pd.merge(movies,ratings_group,on=['movieId'])
movies_ratings
```

|  | movieId | title | genres | vote_avg | vote_count |
|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 3.920930 | 215 |
| 1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy | 3.431818 | 110 |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance | 3.259615 | 52 |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance | 2.357143 | 7 |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy | 3.071429 | 49 |
| ... | ... | ... | ... | ... | ... |
| 9719 | 193581 | Black Butler: Book of the Atlantic (2017) | Action\|Animation\|Comedy\|Fantasy | 4.000000 | 1 |
| 9720 | 193583 | No Game No Life: Zero (2017) | Animation\|Comedy\|Fantasy | 3.500000 | 1 |
| 9721 | 193585 | Flint (2017) | Drama | 3.500000 | 1 |
| 9722 | 193587 | Bungo Stray Dogs: Dead Apple (2018) | Action\|Animation | 3.500000 | 1 |
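`pd.merge` defaults to an inner join (`how="inner"`), so only movies that appear in both tables survive, and a movie with no ratings is silently dropped. The sketch below illustrates this on hypothetical tables and contrasts it with `how="left"`, which keeps every movie and fills the missing statistics with `NaN`.

```python
import pandas as pd

# Hypothetical movies table: movie 3 has no ratings at all.
toy_movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["A", "B", "C"],
})
# Hypothetical per-movie statistics, as produced by the groupby step.
toy_stats = pd.DataFrame({
    "movieId": [1, 2],
    "vote_avg": [4.0, 2.5],
    "vote_count": [3, 1],
})

# Default inner join: only movieIds present in both tables are kept.
inner = pd.merge(toy_movies, toy_stats, on="movieId")
# Left join: every movie is kept; movie 3 gets NaN for vote_avg and vote_count.
left = pd.merge(toy_movies, toy_stats, on="movieId", how="left")

print(inner)
print(left)
```

For this recommender the inner join is a reasonable choice, because an unrated movie has no R or v to score.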


Now we generate a weighted rating for each movie based on the IMDB formula:

$$\text{Weighted Rating (WR)} = \frac{v}{v+m} \cdot R + \frac{m}{v+m} \cdot C$$

where:

- *v* is the number of votes for the movie;
- *m* is the minimum number of votes required to be listed in the chart;
- *R* is the average rating of the movie;
- *C* is the mean vote across all movies.

We already have R (`vote_avg`) and v (`vote_count`); we still need to determine C and m. In this project, m is set to the 90th percentile of the vote counts, so only movies in roughly the top 10% by number of votes garnered are considered.
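The formula is a weighted average of the movie's own mean R and the overall mean C, with weight v/(v+m) on R and m/(v+m) on C. A minimal sketch with a hypothetical scalar helper, `weighted_rating_scalar`, shows how a perfect 5.0 average is pulled toward C when a movie has few votes; C = 3.0 and m = 27 are illustrative values.

```python
def weighted_rating_scalar(R, v, C, m):
    """Hypothetical helper: IMDB-style weighted rating for a single movie."""
    # Weight on the movie's own mean grows with its vote count v.
    return (v / (v + m)) * R + (m / (v + m)) * C


# Illustrative movie with a perfect average R = 5.0, against C = 3.0 and m = 27.
for v in (1, 27, 1000):
    print(v, round(weighted_rating_scalar(5.0, v, 3.0, 27), 3))
```

With a single vote the score stays close to C. At v = m it lands exactly halfway between R and C (4.0 here). With many votes it approaches R.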

```python
# Determine value of C
C = movies_ratings['vote_avg'].mean()
C
```

3.262448274810963

The mean of the per-movie average ratings is about 3.26 on a 5-point scale.
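Because C is computed from `movies_ratings['vote_avg']`, it averages the per-movie averages, so each movie counts once regardless of how many ratings it received. The toy sketch below, using hypothetical ratings, shows how this can differ from averaging all individual ratings.

```python
import pandas as pd

# Hypothetical ratings: movie 1 has four 5.0 ratings, movie 2 has a single 1.0.
toy = pd.DataFrame({
    "movieId": [1, 1, 1, 1, 2],
    "rating": [5.0, 5.0, 5.0, 5.0, 1.0],
})

per_movie_avg = toy.groupby("movieId")["rating"].mean()
C_movie_level = per_movie_avg.mean()   # each movie counts once: (5 + 1) / 2 = 3.0
C_rating_level = toy["rating"].mean()  # each rating counts once: 21 / 5 = 4.2

print(C_movie_level, C_rating_level)
```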

Next, we calculate m, the number of votes at the 90th percentile of the vote counts.

```python
# Determine value of m
m = movies_ratings['vote_count'].quantile(0.90)
m
```

27.0

To be among the top 10% of movies by number of votes garnered, a movie must have received at least 27 votes. With this, we create a new DataFrame containing only the movies with at least 27 votes.

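```python
# Keep only movies with at least m votes; .copy() makes the result independent of movies_ratings
movies_ratings_filtered = movies_ratings.loc[movies_ratings['vote_count'] >= m].copy()
movies_ratings_filtered.shape
```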

(976, 5)

976 movies meet the minimum vote count, about 10% of all the movies in the list.
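The percentile cutoff and the `>= m` filter can be illustrated on ten hypothetical vote counts. By default, `Series.quantile` interpolates linearly, so the cutoff need not be one of the observed values.

```python
import pandas as pd

# Hypothetical vote counts for ten movies.
vote_count = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Default linear interpolation: position (10 - 1) * 0.9 = 8.1 in the sorted values,
# i.e. 10% of the way from 9 to 10, giving a cutoff of about 9.1.
m = vote_count.quantile(0.90)

# Same kind of boolean-mask filter as in the notebook.
kept = vote_count[vote_count >= m]
print(m, kept.tolist())
```

Only the movie with 10 votes clears the cutoff, which is the top 10% of this toy list.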

Now we can calculate the weighted rating for each movie.

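```python
# Define a function to calculate the weighted rating.
# x can be the whole DataFrame: the column arithmetic is vectorized and returns a Series.
def weighted_rating(x, C, m):
    R = x['vote_avg']    # average rating of each movie
    v = x['vote_count']  # number of votes for each movie

    return (v/(v+m) * R) + (m/(m+v) * C)
```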
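Because `weighted_rating` only indexes columns and does arithmetic, it can score the whole DataFrame at once instead of one row at a time. The sketch below checks this against a row-wise `DataFrame.apply(..., axis=1)` on a hypothetical two-movie table. The function is repeated so the block runs on its own, and C = 3.26 and m = 27 are illustrative values.

```python
import pandas as pd


def weighted_rating(x, C, m):
    R = x["vote_avg"]
    v = x["vote_count"]
    return (v / (v + m) * R) + (m / (m + v) * C)


# Hypothetical filtered table with two movies.
df = pd.DataFrame({"vote_avg": [4.0, 3.0], "vote_count": [30, 90]})

# Whole DataFrame in: column-wise arithmetic, one Series out.
vectorized = weighted_rating(df, C=3.26, m=27)

# Row-wise alternative: each row is a Series, so x["vote_avg"] is a scalar.
# Extra keyword arguments to apply() are forwarded to the function.
row_wise = df.apply(weighted_rating, axis=1, C=3.26, m=27)

print(vectorized.tolist())
print(row_wise.tolist())
```

The vectorized call is the one the notebook uses, and it is usually much faster than `apply` on large frames.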

```python
# Define a new column score for the weighted ratings calculated
movies_ratings_filtered['score'] = weighted_rating(x=movies_ratings_filtered,C=C,m=m)
movies_ratings_filtered
```

|  | movieId | title | genres | vote_avg | vote_count | score |
|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 3.920930 | 215 | 3.847463 |
| 1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy | 3.431818 | 110 | 3.398439 |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance | 3.259615 | 52 | 3.260584 |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy | 3.071429 | 49 | 3.139291 |
| 5 | 6 | Heat (1995) | Action\|Crime\|Thriller | 3.946078 | 102 | 3.802993 |
| ... | ... | ... | ... | ... | ... | ... |
| 8861 | 134130 | The Martian (2015) | Adventure\|Drama\|Sci-Fi | 4.000000 | 48 | 3.734481 |
| 8882 | 134853 | Inside Out (2015) | Adventure\|Animation\|Children\|Comedy\|Drama\|Fantasy | 3.813953 | 43 | 3.601230 |
| 8972 | 139385 | The Revenant (2015) | Adventure\|Drama | 3.903226 | 31 | 3.604933 |
| 9205 | 152081 | Zootopia (2016) | Action\|Adventure\|Animation\|Children\|Comedy | 3.890625 | 32 | 3.603154 |
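As a sanity check, Toy Story's score can be recomputed by hand from the printed values: C = 3.262448274810963, m = 27.0, and for Toy Story R = 3.920930 (rounded in the output) and v = 215.

```python
# Values copied from the outputs above; R is the rounded value shown in the table.
C = 3.262448274810963
m = 27.0
R = 3.920930
v = 215

# 215 / 242 of the weight goes to Toy Story's own mean, 27 / 242 to C.
score = (v / (v + m)) * R + (m / (v + m)) * C
print(round(score, 6))
```

This gives 3.847463, matching the `score` column. Because Toy Story has far more than m votes, most of the weight (about 0.89) stays on its own average.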


```python
# Get the top 20 movies
movies_ratings_filtered = movies_ratings_filtered.sort_values('score',ascending=False)
movies_ratings_filtered.head(20)
```

|  | movieId | title | genres | vote_avg | vote_count | score |
|---|---|---|---|---|---|---|
| 277 | 318 | Shawshank Redemption, The (1994) | Crime\|Drama | 4.429022 | 317 | 4.33746 |
| 659 | 858 | Godfather, The (1972) | Crime\|Drama | 4.289062 | 192 | 4.162494 |
| 2224 | 2959 | Fight Club (1999) | Action\|Crime\|Drama\|Thriller | 4.272936 | 218 | 4.161576 |
| 224 | 260 | Star Wars: Episode IV - A New Hope (1977) | Action\|Adventure\|Sci-Fi | 4.231076 | 251 | 4.137 |
| 46 | 50 | Usual Suspects, The (1995) | Crime\|Mystery\|Thriller | 4.237745 | 204 | 4.123749 |
| 257 | 296 | Pulp Fiction (1994) | Comedy\|Crime\|Drama\|Thriller | 4.197068 | 307 | 4.121515 |
| 461 | 527 | Schindler's List (1993) | Drama\|War | 4.225 | 220 | 4.119782 |
| 1938 | 2571 | Matrix, The (1999) | Action\|Sci-Fi\|Thriller | 4.192446 | 278 | 4.110118 |
| 897 | 1196 | Star Wars: Episode V - The Empire Strikes Back... | Action\|Adventure\|Sci-Fi | 4.21564 | 211 | 4.107505 |
| 314 | 356 | Forrest Gump (1994) | Comedy\|Drama\|Romance\|War | 4.164134 | 329 | 4.095747 |
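Sorting the whole table and then taking `head(n)` can also be written with `DataFrame.nlargest`, which returns the top n rows by a column directly. The sketch below uses a hypothetical scored table with distinct scores and checks that both approaches select the same rows in the same order.

```python
import pandas as pd

# Hypothetical scored table (distinct scores, so there are no ties to break).
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "score": [3.1, 4.3, 3.8, 4.0],
})

# Approach used in the notebook: full sort, then take the first rows.
top_sorted = df.sort_values("score", ascending=False).head(2)
# Equivalent shortcut: top-n rows by the score column.
top_n = df.nlargest(2, "score")

print(top_n)
```

When scores tie, the two approaches may order the tied rows differently, so an explicit tie-breaker column can be worth adding.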
