# Simple Recommenders
Simple recommenders are basic systems that recommend the top items based on a certain metric or score. In this notebook, I build a simplified clone of the IMDB Top 250 Movies chart using metadata collected from IMDB. The data subset can be downloaded from [here](https://www.kaggle.com/rounakbanik/the-movies-dataset/data).

In [1]:
import pandas as pd

In [46]:
# Load the movies metadata into a DataFrame
metadata = pd.read_csv('movies_metadata.csv', low_memory=False)
# Show the first two rows
metadata.head(2)

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0


Let's use IMDB's weighted rating formula as the metric/score. Mathematically, it is represented as follows:
$$ \text{Weighted Rating (WR)} = \left( \frac{v}{v+m} \cdot R \right) + \left( \frac{m}{v+m} \cdot C \right) $$
* `v` is the number of votes for the movie.
* `m` is the minimum votes required to be listed in the chart.
* `R` is the average rating of the movie.
* `C` is the mean vote across the whole report.
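
To see what the formula does, here is a minimal sketch in plain Python. The function name `weighted_rating_value` and the numbers `m_example = 100` and `C_example = 6.0` are made up for illustration and are not taken from the dataset. It shows that a movie with few votes is pulled toward the global mean `C`, while a movie with many votes keeps a score close to its own rating `R`.

```python
def weighted_rating_value(v, R, m, C):
    """Blend a movie's own rating R with the global mean C, weighted by vote count v."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Hypothetical values, for illustration only
m_example = 100
C_example = 6.0

# Same average rating (9.0), very different vote counts
few_votes = weighted_rating_value(v=10, R=9.0, m=m_example, C=C_example)
many_votes = weighted_rating_value(v=10000, R=9.0, m=m_example, C=C_example)

print(round(few_votes, 3), round(many_votes, 3))
```

With only 10 votes the score stays near `C_example`; with 10,000 votes it stays near the movie's own rating of 9.0.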

We already have the values of `v` (`vote_count`) and `R` (`vote_average`) for each movie in the dataset.

As a first step, let's calculate the value of `C`, the mean rating across all movies.

In [27]:
C = metadata['vote_average'].mean()
C

5.618207215134185

Next, let's calculate `m`, the number of votes received by a movie at the 90th percentile. (For a movie to be featured in the chart, it must have at least as many votes as 90% of the movies in the dataset.) You can think of `m` as a preliminary filter that simply removes movies with fewer votes than the threshold `m`.

In [33]:
# Minimum number of votes required to be in the chart (90th percentile)
m = metadata['vote_count'].quantile(0.9)
print(m)

160.0
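
As a small sketch of how the percentile threshold behaves, the example below applies `Series.quantile(0.9)` to a toy series. The vote counts are made up and do not come from the movies dataset. By default pandas interpolates linearly between neighbouring values, so the threshold need not be an actual vote count from the data.

```python
import pandas as pd

# Toy vote counts (made up), not taken from the movies dataset
votes = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# 90th percentile, using pandas' default linear interpolation
threshold = votes.quantile(0.9)

# Fraction of entries at or above the threshold
share_qualified = (votes >= threshold).mean()

print(threshold, share_qualified)
```

Roughly one entry in ten clears the bar, which is the behaviour the chart relies on.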


Now that we have `m`, we can use a greater-than-or-equal-to condition to keep only the movies with at least `160` votes.

In [35]:
q_movies = metadata.copy()
mask = (q_movies['vote_count'] >= m)
q_movies = q_movies.loc[mask]
print('Movie count after filtering:', q_movies.shape[0])

Movie count after filtering: 4555


Next, calculate the weighted rating for each qualified movie.
1. Define a function `weighted_rating()`.
2. We already have `m` and `C`, so simply pass them as arguments to the function.
3. Select the `v` (`vote_count`) and `R` (`vote_average`) columns from the `q_movies` DataFrame.
4. Finally, compute the weighted average and return the result.

In [40]:
def weighted_rating(x, m, C):
    # x is a single row of the DataFrame
    v = x['vote_count']
    R = x['vote_average']
    # Blend the movie's own rating R with the global mean C
    return (v/(v+m) * R) + (m/(v+m) * C)

In [42]:
q_movies['score'] = q_movies.apply(weighted_rating, args=(m, C), axis=1)
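
`DataFrame.apply` with `axis=1` calls the function once per row, which can be slow on large frames. As a sketch of an alternative, the same formula can be written as column arithmetic. The three rows below are copied from the results table at the end of this notebook, and `m` and `C` are the values printed earlier. The frame name `sample` is made up for illustration.

```python
import pandas as pd

# Values printed earlier in this notebook
m = 160.0
C = 5.618207215134185

# Three rows copied from the notebook's final results table
sample = pd.DataFrame({
    'title': ['The Shawshank Redemption', 'The Godfather', 'Dilwale Dulhania Le Jayenge'],
    'vote_count': [8358.0, 6024.0, 661.0],
    'vote_average': [8.5, 8.5, 9.1],
})

def weighted_rating(x, m, C):
    v = x['vote_count']
    R = x['vote_average']
    return (v / (v + m) * R) + (m / (v + m) * C)

# Row-wise version, as used in the notebook
row_wise = sample.apply(weighted_rating, args=(m, C), axis=1)

# Vectorized version: the same arithmetic on whole columns at once
v = sample['vote_count']
R = sample['vote_average']
vectorized = (v / (v + m)) * R + (m / (v + m)) * C

print(pd.DataFrame({'row_wise': row_wise, 'vectorized': vectorized}))
```

Both columns should agree, so the choice between them is about speed and readability rather than results.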

Finally, let's sort the DataFrame in descending order based on the score feature column and output the title, vote count, vote average, and weighted rating (score) of the top 20 movies.

In [45]:
#Sort movies based on score calculated above
q_movies = q_movies.sort_values('score', ascending=False)

#Print the top 20 movies
q_movies[['title', 'vote_count', 'vote_average', 'score']].head(20)

Unnamed: 0,title,vote_count,vote_average,score
314,The Shawshank Redemption,8358.0,8.5,8.445869
834,The Godfather,6024.0,8.5,8.425439
10309,Dilwale Dulhania Le Jayenge,661.0,9.1,8.421453
12481,The Dark Knight,12269.0,8.3,8.265477
2843,Fight Club,9678.0,8.3,8.256385
292,Pulp Fiction,8670.0,8.3,8.251406
522,Schindler's List,4436.0,8.3,8.206639
23673,Whiplash,4376.0,8.3,8.205404
5481,Spirited Away,3968.0,8.3,8.196055
2211,Life Is Beautiful,3643.0,8.3,8.187171
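
Putting the steps together, the whole chart can be sketched as one function. The name `build_chart`, its parameters `percentile` and `top_n`, and the toy data are made up for illustration and are not part of the original notebook. The function assumes the input has `vote_count` and `vote_average` columns, as `movies_metadata.csv` does.

```python
import pandas as pd

def build_chart(df, percentile=0.9, top_n=20):
    """Hypothetical helper: rank movies by the weighted rating used in this notebook."""
    C = df['vote_average'].mean()                # mean rating across all movies
    m = df['vote_count'].quantile(percentile)    # minimum votes required
    qualified = df.loc[df['vote_count'] >= m].copy()
    v = qualified['vote_count']
    R = qualified['vote_average']
    qualified['score'] = (v / (v + m)) * R + (m / (v + m)) * C
    return qualified.sort_values('score', ascending=False).head(top_n)

# Toy data (made up), only to show the call
toy = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D', 'E'],
    'vote_count': [10, 50, 200, 1000, 5000],
    'vote_average': [9.5, 6.0, 7.0, 8.0, 7.5],
})

chart = build_chart(toy, percentile=0.5, top_n=3)
print(chart[['title', 'vote_count', 'vote_average', 'score']])
```

On the real data, the equivalent call would be `build_chart(metadata)`, using the same 90th-percentile cutoff and top 20 as above.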
