In [1]:
import pandas as pd

In [4]:
metadata = pd.read_csv('./data/movies_metadata.csv', low_memory=False)

In [6]:
metadata.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,1995-12-22,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0
3,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,1995-12-22,81452156.0,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,1995-02-10,76578911.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0


## 1. Simple Recommender ("Tell me what's trending")

The simplest approach is to rank movies by their global user ratings.

However, we have to consider the popularity of the movie as well. Otherwise, a movie with a rating of 9 from 10 voters will be considered 'better' than a movie with a rating of 8.9 from 10,000 voters.

This is much like choosing a restaurant: would you pick a 5-star restaurant rated by only 5 people, or a 4.5-star restaurant rated by 1,000 people?

Thus, we will use a weighted rating for our simple recommender here:


$$\text{Weighted Rating (WR)} = \frac{V}{V + V_{min}} \cdot R + \frac{V_{min}}{V + V_{min}} \cdot R_{avg}$$

Here 
* $V$ is the number of votes for this movie
* $V_{min}$ is the minimum number of votes for a movie to be considered
* $R$ is the average rating for this movie
* $R_{avg}$ is the average rating for all movies in our dataset
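
To make the formula concrete, here is a minimal sketch that applies it to the two movies from the example above (a 9.0 from 10 voters versus an 8.9 from 10,000 voters). The function name `weighted_rating_example` is only illustrative, and the constants are assumptions copied from the values this notebook computes for `V_min` and `R_avg` in the cells below.

```python
# Assumed constants, copied from the values computed in this notebook
V_MIN = 160.0      # minimum number of votes for a movie to be considered
R_AVG = 5.618207   # average rating across all movies in the dataset

def weighted_rating_example(votes, rating, v_min=V_MIN, r_avg=R_AVG):
    """Blend a movie's own rating with the global mean, weighted by vote count."""
    return votes / (votes + v_min) * rating + v_min / (votes + v_min) * r_avg

niche = weighted_rating_example(votes=10, rating=9.0)
popular = weighted_rating_example(votes=10_000, rating=8.9)

print(f"niche: {niche:.2f}, popular: {popular:.2f}")
```

With these assumed constants, the 10-vote movie is pulled down to about 5.8, close to the global mean, while the 10,000-vote movie stays near 8.85. The fewer votes a movie has, the more its score leans on $R_{avg}$.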

In [7]:
R_avg = metadata['vote_average'].mean()

R_avg

5.618207215134185

In [8]:
# Calculate the minimum number of votes for a movie to be considered

V_min = metadata['vote_count'].quantile(0.9)    # 90th percentile: only the top 10% most-voted movies reach it

V_min

160.0
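
As a quick illustration of what `quantile(0.9)` returns, the sketch below uses a made-up series of vote counts from 1 to 100 (hypothetical data, not the movie dataset). The cutoff is the value below which roughly 90% of the entries fall, so filtering with `>=` keeps roughly the top 10%.

```python
import pandas as pd

# Hypothetical vote counts 1..100, not taken from movies_metadata.csv
toy_votes = pd.Series(range(1, 101))

# 90th percentile, using pandas' default linear interpolation
cutoff = toy_votes.quantile(0.9)

# Fraction of entries at or above the cutoff
share_kept = (toy_votes >= cutoff).mean()

print(cutoff, share_kept)
```

On the real dataset the same call yields 160 votes, which is why only a few thousand movies remain after filtering.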

In [10]:
# Create dataset for those movies with >= 160 votes

movie_cp = metadata.loc[metadata['vote_count'] >= V_min].copy()

movie_cp.shape

(4555, 24)

In [14]:
def weighted_rating(df, V_min=V_min, R_avg=R_avg):
    V = df['vote_count']
    R = df['vote_average']

    # Weighted rating, based on the IMDb formula
    return (V/(V+V_min) * R) + (V_min/(V_min+V) * R_avg)

In [15]:
# Define a new feature 'weighted_score' and calculate its value with `weighted_rating()`
movie_cp['weighted_score'] = movie_cp.apply(weighted_rating, axis=1)
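
Row-wise `apply` works, but the same formula can also be evaluated on whole columns at once, which is usually faster on large frames. The sketch below is one possible alternative rather than the notebook's method: it uses a small made-up DataFrame (the titles, counts, and ratings are hypothetical) with assumed values for the cutoff and global mean, and checks that both approaches agree.

```python
import pandas as pd

# Hypothetical rows; the column names follow movies_metadata.csv
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "vote_count": [200.0, 1500.0, 8000.0],
    "vote_average": [7.0, 8.1, 8.5],
})
v_min, r_avg = 160.0, 5.618207  # assumed cutoff and global mean

def weighted_rating_row(row, v_min=v_min, r_avg=r_avg):
    """Weighted rating for a single row (the row-wise approach)."""
    v, r = row["vote_count"], row["vote_average"]
    return v / (v + v_min) * r + v_min / (v + v_min) * r_avg

row_wise = toy.apply(weighted_rating_row, axis=1)

# Same formula applied to entire columns in one step
v, r = toy["vote_count"], toy["vote_average"]
vectorized = v / (v + v_min) * r + v_min / (v + v_min) * r_avg

print((row_wise - vectorized).abs().max())
```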

In [17]:
#Sort movies based on score calculated above
movie_cp = movie_cp.sort_values('weighted_score', ascending=False)

#Show the top 20 movies
movie_cp[['title', 'vote_count', 'vote_average', 'weighted_score']].head(20)

Unnamed: 0,title,vote_count,vote_average,weighted_score
314,The Shawshank Redemption,8358.0,8.5,8.445869
834,The Godfather,6024.0,8.5,8.425439
10309,Dilwale Dulhania Le Jayenge,661.0,9.1,8.421453
12481,The Dark Knight,12269.0,8.3,8.265477
2843,Fight Club,9678.0,8.3,8.256385
292,Pulp Fiction,8670.0,8.3,8.251406
522,Schindler's List,4436.0,8.3,8.206639
23673,Whiplash,4376.0,8.3,8.205404
5481,Spirited Away,3968.0,8.3,8.196055
2211,Life Is Beautiful,3643.0,8.3,8.187171


## 2. Content Based Recommender

However, our simple recommender still has some problems. What if a fantastic movie was just released? It has not had time to collect votes, so it is unfair to exclude it from our list.
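
The new-release problem can be seen in a small sketch. The two movies below are hypothetical, and the cutoff of 160 votes is assumed from the value computed earlier in this notebook. The highly rated newcomer is dropped by the vote-count filter before its score is ever calculated.

```python
import pandas as pd

# Hypothetical movies, for illustration only
toy = pd.DataFrame({
    "title": ["Established hit", "Brand-new release"],
    "vote_count": [5000.0, 40.0],
    "vote_average": [8.2, 9.0],
})
v_min = 160.0  # assumed cutoff, matching V_min in this notebook

# Same filter as the simple recommender: keep movies with enough votes
qualified = toy.loc[toy["vote_count"] >= v_min]

print(qualified["title"].tolist())
```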

In addition, our users might not want to simply follow the crowd, so recommending purely on global ratings is limited.