# Simple Recommender
<p>Simple recommenders are basic systems that recommend the top items based on a certain metric or score. We will build a simplified clone of the IMDB Top 250 Movies chart using <a href="https://www.kaggle.com/rounakbanik/the-movies-dataset/version/7">metadata</a> collected from IMDB.</p>
<p>We will:
<ul>
    <li>Decide on the metric or score to rate movies on.</li>
    <li>Calculate the score for every movie.</li>
    <li>Sort the movies based on the score and output the top results.</li>
</ul>
</p>

<h5>Loading the data.</h5>

In [1]:
import pandas as pd



In [5]:
# load the data.
metadata = pd.read_csv('movies_metadata.csv', low_memory=False)

In [6]:
# peek at the data

metadata.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45466 entries, 0 to 45465
Data columns (total 24 columns):
adult                    45466 non-null object
belongs_to_collection    4494 non-null object
budget                   45466 non-null object
genres                   45466 non-null object
homepage                 7782 non-null object
id                       45466 non-null object
imdb_id                  45449 non-null object
original_language        45455 non-null object
original_title           45466 non-null object
overview                 44512 non-null object
popularity               45461 non-null object
poster_path              45080 non-null object
production_companies     45463 non-null object
production_countries     45463 non-null object
release_date             45379 non-null object
revenue                  45460 non-null float64
runtime                  45203 non-null float64
spoken_languages         45460 non-null object
status                   45379 non-null object

In [7]:
metadata.head(2)

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0


<p>The metric for this system is a weighted rating, which puts movies with very different vote counts on an equal footing. <br/>
    Weighted Rating (WR) = (v / (v + m)) × R + (m / (v + m)) × C
<br/>
    v is the number of votes for the movie;<br/>
    m is the minimum number of votes required to be listed in the chart;<br/>
    R is the average rating of the movie; and<br/>
    C is the mean vote across the whole dataset.
</p>
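<p>To see what the formula does, here is a minimal sketch in plain Python. The values of m and C and the two movies are illustrative assumptions, not figures from the dataset. A movie with few votes is pulled strongly toward C, while a movie with many votes keeps a score close to its own average.</p>

```python
def weighted_rating(v, R, m, C):
    """Blend a movie's own average R with the global mean C, weighted by votes."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Illustrative values (assumptions, not computed from the dataset)
m, C = 100, 6.0

few_votes = weighted_rating(v=10, R=9.0, m=m, C=C)     # 10 votes averaging 9.0
many_votes = weighted_rating(v=5000, R=8.0, m=m, C=C)  # 5000 votes averaging 8.0

print(round(few_votes, 3), round(many_votes, 3))  # 6.273 7.961
```

<p>Although the first movie has the higher raw average, its weighted rating ends up lower because only 10 votes back it.</p>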
<p>From our data, we use vote_count as v and vote_average as R, and we compute C as the mean of vote_average over all movies. There is no single right value for m; think of it as a preliminary filter that drops movies with fewer than a certain number of votes. Here we set m to the 90th percentile of vote_count, so to feature in the chart a movie must have at least as many votes as 90% of the movies in the list.</p>
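<p>Since the choice of m is a judgment call, it can help to see how the cutoff moves with the quantile. The sketch below uses a small, hypothetical Series of vote counts (not the real data) and reports, for a few quantiles, the resulting m and how many movies would qualify.</p>

```python
import pandas as pd

# Hypothetical vote counts for ten movies (not taken from the dataset)
vote_count = pd.Series([1, 2, 3, 5, 8, 13, 40, 90, 400, 2500])

results = {}
for q in (0.50, 0.75, 0.90):
    m = vote_count.quantile(q)                 # cutoff at this quantile
    qualified = int((vote_count >= m).sum())   # movies meeting the cutoff
    results[q] = (m, qualified)
    print(f"quantile={q:.2f}  m={m:.1f}  qualified={qualified}")
```

<p>A higher quantile gives a stricter chart with fewer, more widely rated movies; a lower one admits more titles but lets sparsely rated movies in.</p>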

In [9]:
C = metadata['vote_average'].mean()
m = metadata['vote_count'].quantile(0.90)

print("C=",C)
print("m=",m)

C= 5.618207215134185
m= 160.0


<p>Not all entries meet this criterion, so we create a new DataFrame containing only the movies with at least 160 votes (vote_count >= m).</p>

In [16]:
qualified_movies = metadata.loc[metadata['vote_count'] >= m].copy()
qualified_movies.shape


(4555, 24)
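<p>The filtering step can be sketched on a toy frame. The data below is a hypothetical stand-in for <code>metadata</code>, and the placeholder <code>score</code> column is only there to show that the copied subset can be modified without affecting the original.</p>

```python
import pandas as pd

# Toy stand-in for `metadata` (hypothetical values)
metadata = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "vote_count": [12, 160, 950, 3],
})
m = 160

# Boolean mask: True for rows that meet the vote threshold
mask = metadata["vote_count"] >= m

# .copy() makes the subset independent of `metadata`, so adding a column
# later modifies only the subset and avoids pandas' SettingWithCopyWarning
qualified_movies = metadata.loc[mask].copy()
qualified_movies["score"] = 0.0  # placeholder column for illustration

print(qualified_movies.shape)        # (2, 3)
print("score" in metadata.columns)   # False
```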

<h5>Computing the weighted rating</h5>

In [18]:
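def weighted_rating(x, m=m, C=C):
    """Return the weighted rating (WR) for one movie row x."""
    v = x['vote_count']      # number of votes for the movie
    R = x['vote_average']    # average rating of the movie

    # Blend the movie's own average with the global mean C
    return (v/(v+m)) * R + (m/(v+m)) * C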

In [20]:
qualified_movies['score'] = qualified_movies.apply(weighted_rating, axis=1)
qualified_movies.head()


Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count,score
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0,7.640253
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0,6.820293
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,76578911.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0,5.6607
5,False,,60000000,"[{'id': 28, 'name': 'Action'}, {'id': 80, 'nam...",,949,tt0113277,en,Heat,"Obsessive master thief, Neil McCauley leads a ...",...,187436818.0,170.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,A Los Angeles Crime Saga,Heat,False,7.7,1886.0,7.537201
8,False,,35000000,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",,9091,tt0114576,en,Sudden Death,International action superstar Jean Claude Van...,...,64350171.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Terror goes into overtime.,Sudden Death,False,5.5,174.0,5.556626
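<p>As a side note, <code>apply(..., axis=1)</code> calls the function once per row. Because the formula only uses element-wise arithmetic, the same function can also be called on whole columns at once. The sketch below checks this on a toy frame built from three rows of the output above, using the m and C values printed earlier; it is an alternative shown for comparison, not the approach used in this notebook.</p>

```python
import pandas as pd

# Toy frame with rows copied from the output above
df = pd.DataFrame({
    "title": ["Toy Story", "Jumanji", "Heat"],
    "vote_count": [5415.0, 2413.0, 1886.0],
    "vote_average": [7.7, 6.9, 7.7],
})
m, C = 160.0, 5.618207215134185  # values printed earlier in the notebook

def weighted_rating(x, m=m, C=C):
    v, R = x["vote_count"], x["vote_average"]
    return (v / (v + m)) * R + (m / (v + m)) * C

# Row by row, as in the notebook
by_apply = df.apply(weighted_rating, axis=1)

# Vectorized: x["vote_count"] and x["vote_average"] are whole columns here,
# so the arithmetic runs element-wise over all rows in one call
vectorized = weighted_rating(df)

print(pd.DataFrame({"title": df["title"], "apply": by_apply, "vectorized": vectorized}))
```

<p>Both columns agree with the <code>score</code> values shown above; on a frame with thousands of rows the vectorized call is typically much faster.</p>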


<p>With the score computed, we can now sort the movies and list the top 20.</p>

In [23]:
qualified_movies = qualified_movies.sort_values('score', ascending=False)

qualified_movies[['title', 'score', 'vote_count', 'vote_average']].head(20)

Unnamed: 0,title,score,vote_count,vote_average
314,The Shawshank Redemption,8.445869,8358.0,8.5
834,The Godfather,8.425439,6024.0,8.5
10309,Dilwale Dulhania Le Jayenge,8.421453,661.0,9.1
12481,The Dark Knight,8.265477,12269.0,8.3
2843,Fight Club,8.256385,9678.0,8.3
292,Pulp Fiction,8.251406,8670.0,8.3
522,Schindler's List,8.206639,4436.0,8.3
23673,Whiplash,8.205404,4376.0,8.3
5481,Spirited Away,8.196055,3968.0,8.3
2211,Life Is Beautiful,8.187171,3643.0,8.3


<h5>Conclusion.</h5>
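<p>We have listed the top 20 movies in the dataset by weighted rating. For a simple recommender, the important step is choosing a metric to rate the items. The metric should not be biased: here, weighting by vote count keeps a movie with only a handful of very high votes from outranking movies that many people have rated.</p>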