Broadly, recommender systems can be classified into 3 types:

- Simple recommenders: offer generalized recommendations to every user, based on movie popularity and/or genre. The basic idea behind this system is that movies that are more popular and critically acclaimed will have a higher probability of being liked by the average audience. IMDB Top 250 is an example of this system.


In [2]:
import pandas as pd
# Load Movies Metadata
data = pd.read_csv('./the-movies-dataset/movies_metadata.csv', low_memory=False)
mdata=data[['title','vote_average','vote_count','overview']]
# Print the first five rows
mdata.head(5)

,title,vote_average,vote_count,overview
0,Toy Story,7.7,5415.0,"Led by Woody, Andy's toys live happily in his ..."
1,Jumanji,6.9,2413.0,When siblings Judy and Peter discover an encha...
2,Grumpier Old Men,6.5,92.0,A family wedding reignites the ancient feud be...
3,Waiting to Exhale,6.1,34.0,"Cheated on, mistreated and stepped on, the wom..."
4,Father of the Bride Part II,5.7,173.0,Just when George Banks has recovered from his ...


In [3]:
mdata.describe()

,vote_average,vote_count
count,45460.0,45460.0
mean,5.618207,109.897338
std,1.924216,491.310374
min,0.0,0.0
25%,5.0,3.0
50%,6.0,10.0
75%,6.8,34.0
max,10.0,14075.0


In [4]:
mdata.columns

Index(['title', 'vote_average', 'vote_count', 'overview'], dtype='object')

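The q-quantile is the value below which a fraction q of the sorted data falls. When that position lies between two adjacent data points, pandas interpolates linearly by default:

i + (j - i) $\times$ fraction

where i and j are the two neighbouring data points (i just below the position, j just above) and fraction is the fractional part of the position between them.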
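As a small illustration, pandas computes a q-quantile by sorting the values and, when the requested position falls between two adjacent data points, interpolating linearly between them. The sketch below reproduces that arithmetic in plain Python. The toy values and the helper name `linear_quantile` are made up for this example, and it assumes the default `interpolation='linear'` behaviour.

```python
import math

def linear_quantile(values, q):
    """Hypothetical helper: q-quantile using linear interpolation."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)   # fractional index into the sorted data
    lower = math.floor(position)        # index of the data point just below
    upper = math.ceil(position)         # index of the data point just above
    fraction = position - lower         # how far we are between the two
    i, j = ordered[lower], ordered[upper]
    return i + (j - i) * fraction

votes = [10, 20, 30, 40]                # toy vote counts, not from the dataset
by_hand = linear_quantile(votes, 0.9)   # position 2.7 -> between 30 and 40
print(by_hand)                          # approximately 37
```

With pandas installed, `pd.Series(votes).quantile(0.9)` should return the same value under its default settings.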



In [5]:
# We only want movies with a substantial number of votes, so we use the 90th percentile of
# vote_count as the cutoff: a qualifying movie has at least as many votes as 90% of all movies.
# A 50% cutoff (the median, only 10 votes here) would be too low to be meaningful.
min_votes_required=mdata['vote_count'].quantile(0.9)
min_votes_required

160.0

### Next, you can filter the movies that qualify for the chart, based on their vote counts:

In [9]:
# Keep only the movies that meet the vote threshold, as a new DataFrame
q_movies = mdata.loc[mdata['vote_count'] >= min_votes_required].copy()
# Mean rating across all movies (C in the weighted rating formula)
mean_vote = mdata['vote_average'].mean()
q_movies.shape

(4555, 4)

In [10]:
q_movies.head(4)

,title,vote_average,vote_count,overview
0,Toy Story,7.7,5415.0,"Led by Woody, Andy's toys live happily in his ..."
1,Jumanji,6.9,2413.0,When siblings Judy and Peter discover an encha...
4,Father of the Bride Part II,5.7,173.0,Just when George Banks has recovered from his ...
5,Heat,7.7,1886.0,"Obsessive master thief, Neil McCauley leads a ..."


In [11]:
# Function that computes the weighted rating of a single movie (one DataFrame row)
def weighted_rating(row, m=min_votes_required, C=mean_vote):
    number_of_votes = row['vote_count']
    average_rating = row['vote_average']
    # IMDB weighted rating: blend the movie's own average with the global mean C,
    # giving the movie's own average more weight as its vote count grows relative to m
    return (number_of_votes / (number_of_votes + m) * average_rating) + (m / (m + number_of_votes) * C)
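To make the formula concrete, here is a minimal worked example in plain Python. Writing v for a movie's vote count, R for its average rating, m for the vote cutoff and C for the mean rating over all movies, the score is v/(v+m) $\times$ R + m/(v+m) $\times$ C. The sketch plugs in numbers that appear in this notebook's outputs: m = 160.0, C = 5.618207 (the `vote_average` mean reported by `mdata.describe()`), and the vote count and average listed for The Shawshank Redemption. The function name `weighted_rating_scalar` is made up for this sketch.

```python
def weighted_rating_scalar(v, R, m, C):
    """Weighted rating: shrink a movie's average R toward the global mean C.

    v: number of votes for the movie
    R: average rating of the movie
    m: minimum number of votes required to qualify
    C: mean rating across all movies
    """
    return (v / (v + m)) * R + (m / (v + m)) * C

# Values copied from the notebook outputs (C is rounded to six decimals there)
m = 160.0
C = 5.618207
score = weighted_rating_scalar(v=8358.0, R=8.5, m=m, C=C)
print(round(score, 4))
```

With thousands of votes the weight v/(v+m) is close to 1, so the score stays near the movie's own average of 8.5; a movie with only a few hundred votes would be pulled noticeably toward C.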

In [12]:
# Define a new feature 'score' and calculate its value with `weighted_rating()`
q_movies['score'] = q_movies.apply(weighted_rating, axis=1)
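Because the formula only uses element-wise arithmetic, the same score can also be computed on whole columns at once instead of row by row with `apply`. The sketch below uses a tiny made-up DataFrame and assumed values for `m` and `C`, and checks that both approaches agree. It is an alternative for illustration, not what this notebook runs.

```python
import pandas as pd

# Toy data standing in for q_movies (all values are made up)
toy = pd.DataFrame({
    'title': ['Movie A', 'Movie B', 'Movie C'],
    'vote_count': [200.0, 1000.0, 5000.0],
    'vote_average': [9.0, 7.5, 8.0],
})
m = 160.0   # assumed vote cutoff
C = 6.0     # assumed global mean rating

def weighted_rating(row, m=m, C=C):
    v = row['vote_count']
    R = row['vote_average']
    return (v / (v + m)) * R + (m / (v + m)) * C

# Row by row, in the style used by the notebook
row_scores = toy.apply(weighted_rating, axis=1)

# Column-wise: the same arithmetic applied to whole Series
v = toy['vote_count']
R = toy['vote_average']
vector_scores = (v / (v + m)) * R + (m / (v + m)) * C

# Largest absolute difference between the two results
print((row_scores - vector_scores).abs().max())
```

On a DataFrame of a few thousand rows either approach is fast enough; the column-wise form mainly avoids the per-row Python function call.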

In [13]:
#Sort movies based on score calculated above
q_movies = q_movies.sort_values('score', ascending=False)

# Print the top 5 movies
q_movies[['title', 'vote_count', 'vote_average', 'score']].head(5)

,title,vote_count,vote_average,score
314,The Shawshank Redemption,8358.0,8.5,8.445869
834,The Godfather,6024.0,8.5,8.425439
10309,Dilwale Dulhania Le Jayenge,661.0,9.1,8.421453
12481,The Dark Knight,12269.0,8.3,8.265477
2843,Fight Club,9678.0,8.3,8.256385


In [14]:
#Print plot overviews of the first 5 movies.
data['overview'].head()

0    Led by Woody, Andy's toys live happily in his ...
1    When siblings Judy and Peter discover an encha...
2    A family wedding reignites the ancient feud be...
3    Cheated on, mistreated and stepped on, the wom...
4    Just when George Banks has recovered from his ...
Name: overview, dtype: object