# Weighted Rating
When recommending the most popular items, we had to discard approximately 95% of our movies to obtain meaningful average ratings. However, to personalize recommendations we need a score for every movie in the dataset. To achieve this, we use the weighted rating method.

The formula:

$W = \dfrac{R\,v + C\,m}{v + m}$

* W: Weighted rating
* R: Average rating of the movie
* C: Mean of all ratings in the dataset
* v: Number of votes (ratings) for the movie
* m: Minimum vote count required to be listed in the top 250
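
To get a feel for the formula, the sketch below evaluates it for a fixed average rating and a growing vote count. The function name `weighted_rating` and all numbers here are illustrative assumptions, not values taken from the dataset.

```python
def weighted_rating(avg_rating, votes, global_mean, min_votes):
    """Return W = (R*v + C*m) / (v + m)."""
    return (avg_rating * votes + global_mean * min_votes) / (votes + min_votes)

# Hypothetical values chosen only for illustration.
global_mean = 3.5   # plays the role of C
min_votes = 50      # plays the role of m
avg_rating = 5.0    # plays the role of R: a movie with a perfect average

for votes in (1, 50, 5000):
    w = weighted_rating(avg_rating, votes, global_mean, min_votes)
    print(votes, round(w, 3))
```

With a single vote the score stays close to the global mean, with `votes == min_votes` it lands exactly halfway between the two averages, and with many votes it approaches the movie's own average. This is why a movie with very few ratings cannot reach the top of the list on a handful of high votes.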

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
df_ratings = pd.read_csv('datasets/user_ratings.csv')
df_ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp,title,genres
0,1,1,4.0,964982703,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,5,1,4.0,847434962,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,7,1,4.5,1106635946,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
3,15,1,2.5,1510577970,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
4,17,1,4.5,1305696483,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy


In [3]:
# R: Average rating for each movie
R = df_ratings.groupby('title')['rating'].mean().to_dict()
R['Toy Story (1995)']

3.9209302325581397

In [4]:
# C: Mean of all ratings in the dataset
C = df_ratings['rating'].mean()
C

3.501556983616962

In [5]:
# v: Vote count for each movie
v = df_ratings['title'].value_counts().to_dict()
v['Toy Story (1995)']

215

In [6]:
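# m: Minimum vote count required to be listed in top 250
# Note: the list is 0-indexed, so [250] picks the 251st-highest vote count;
# [249] would pick the 250th.
m = sorted(df_ratings['title'].value_counts().values,
           reverse=True)[250]
m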

71
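
As a quick sanity check, the four quantities computed so far can be plugged into the formula by hand for Toy Story (1995). The numbers below are copied from the cell outputs above; the variable names with the `_toy` suffix are introduced here only for this check.

```python
# Values copied from the cell outputs above.
R_toy = 3.9209302325581397   # average rating of Toy Story (1995)
v_toy = 215                  # number of ratings for Toy Story (1995)
C = 3.501556983616962        # mean of all ratings
m = 71                       # vote-count threshold

# W = (R*v + C*m) / (v + m)
W_toy = (R_toy * v_toy + C * m) / (v_toy + m)
print(round(W_toy, 3))
```

The result is roughly 3.82, which sits between the global mean and the movie's own average: the 71 "virtual" votes at the global mean pull the raw average of about 3.92 down slightly.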

In [7]:
df_movies_voted = df_ratings[['movieId', 'title', 'genres']].drop_duplicates(subset='title').reset_index(drop=True)
print(f"{df_movies_voted.shape[0]} movies have at least one rating.")
df_movies_voted.head()

9719 movies have at least one rating.


Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,3,Grumpier Old Men (1995),Comedy|Romance
2,6,Heat (1995),Action|Crime|Thriller
3,47,Seven (a.k.a. Se7en) (1995),Mystery|Thriller
4,50,"Usual Suspects, The (1995)",Crime|Mystery|Thriller


In [8]:
# Weighted rating per movie: W = (R*v + C*m) / (v + m)
df_movies_voted['weighted_rating'] = df_movies_voted['title'].apply(lambda x: (R[x]*v[x] + C*m) / (v[x] + m))
df_movies_sorted = df_movies_voted.sort_values(by='weighted_rating', ascending=False).reset_index(drop=True)
df_movies_sorted.head()

Unnamed: 0,movieId,title,genres,weighted_rating
0,318,"Shawshank Redemption, The (1994)",Crime|Drama,4.259306
1,2959,Fight Club (1999),Action|Crime|Drama|Thriller,4.083427
2,858,"Godfather, The (1972)",Crime|Drama,4.076466
3,260,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Sci-Fi,4.070219
4,296,Pulp Fiction (1994),Comedy|Crime|Drama|Thriller,4.06643


In [9]:
df_movies_sorted.tail()

Unnamed: 0,movieId,title,genres,weighted_rating
9714,1499,Anaconda (1997),Action|Adventure|Thriller,3.067455
9715,1562,Batman & Robin (1997),Action|Adventure|Fantasy|Thriller,3.023102
9716,1882,Godzilla (1998),Action|Sci-Fi|Thriller,3.010678
9717,435,Coneheads (1993),Comedy|Sci-Fi,2.993362
9718,2701,Wild Wild West (1999),Action|Comedy|Sci-Fi|Western,2.948472
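
The same scores can also be computed without dictionaries and `apply`, using a single `groupby` aggregation. The following is a minimal sketch on invented toy data, not the notebook's dataset; the names `toy`, `stats`, `C_toy` and `m_toy` and the threshold value are assumptions made for illustration.

```python
import pandas as pd

# Toy ratings, invented for illustration only.
toy = pd.DataFrame({
    "title":  ["A", "A", "A", "A", "B", "C", "C"],
    "rating": [4.0, 5.0, 4.0, 5.0, 5.0, 2.0, 2.0],
})

# Per-title average (R) and vote count (v) in one pass.
stats = toy.groupby("title")["rating"].agg(R="mean", v="count")

C_toy = toy["rating"].mean()   # mean of all ratings
m_toy = 2                      # hypothetical threshold for this tiny sample

# Vectorized W = (R*v + C*m) / (v + m)
stats["weighted_rating"] = (
    (stats["R"] * stats["v"] + C_toy * m_toy) / (stats["v"] + m_toy)
)
ranked = stats.sort_values("weighted_rating", ascending=False)
print(ranked)
```

In this toy sample, movie B has the highest raw average (a single 5.0) but ranks below movie A, whose slightly lower average is backed by four votes.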


### Conclusion

This notebook builds a recommendation score that combines each movie's vote count with its average rating. Unlike the **most_popular_items.ipynb** notebook, which filters unpopular movies out of the dataset when recommending the most popular ones, every movie keeps a score here. Unpopular movies can therefore be used for personalized recommendations in future work.