# IMDB-like movie recommender

In [1]:
import pandas as pd
import numpy as np

In [2]:
df=pd.read_csv('./movies_metadata.csv')


## Steps to Build a Recommender System
- #### Choose a metric to rate the movies on
- #### Decide on the prerequisites for the movie to be featured on the chart
- #### Calculate the score for every movie that satisfies the condition
- #### Output the list of movies in decreasing order of their scores

## Metric
The metric is a numerical measure on the basis of which we rank movies: a movie is considered better than another if it has a higher metric score. It is very important to build our chart on a robust and reliable metric to ensure good-quality recommendations.

One of the simplest metrics we could use is the movie's rating. However, it has a significant drawback:
- The movie rating doesn't take the popularity of the movie into consideration.
> Therefore, a movie rated 9 by 1,000,000 users will be ranked below a movie rated 9.5 by 100 users. This is undesirable, because a movie watched and rated by only 100 people most likely caters to a very specific niche.
> It is also well known that as the number of voters increases, a movie's rating stabilizes at a value that reflects the movie's quality and its popularity with the general public.

- Therefore, we need a metric that takes into account, to some extent, both the movie's rating and the number of votes it has garnered.

- We shall use IMDB's weighted rating formula as our metric. 

$$ Value= \frac{v * R}{v+m} + \frac{m * C}{v+m} $$

Where 
- v is the number of votes garnered by the movie
- m is the minimum number of votes required for the movie to be in the chart (popularity in this case)
- R is the mean rating of the movie
- C is the mean rating of all the movies in the dataset
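To make the formula concrete, here is a small sketch that plugs the two hypothetical movies from the example above into it. The function name `imdb_weighted_rating` and the values m = 1000 and C = 5.6 are assumptions chosen for illustration, not values computed from the dataset.

```python
def imdb_weighted_rating(v: float, R: float, m: float, C: float) -> float:
    """Weighted rating: blends the movie's own mean rating R with the global mean C.

    The more votes v a movie has relative to m, the closer the result is to R.
    """
    return (v * R) / (v + m) + (m * C) / (v + m)


# Hypothetical parameters for illustration only (not computed from the dataset).
m = 1000  # assumed vote threshold
C = 5.6   # assumed global mean rating

popular = imdb_weighted_rating(v=1_000_000, R=9.0, m=m, C=C)  # many votes: stays close to 9.0
niche = imdb_weighted_rating(v=100, R=9.5, m=m, C=C)          # few votes: pulled toward C

print(round(popular, 3), round(niche, 3))  # → 8.997 5.955
```

With the weighting, the heavily voted 9.0 movie now ranks above the 9.5 movie rated by only 100 users, which is exactly the behaviour the plain average lacked.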

We have all the pieces of the puzzle now. v is included in the dataset (`vote_count`), as is R (`vote_average`). For m we can experiment with different values; here I am going to choose the 98th percentile, which means a movie must have garnered more votes than 98% of the movies in the dataset to reach the threshold.
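`Series.quantile` is what turns "the 98th percentile" into a concrete vote count. A small sketch on a hypothetical series of vote counts 1 to 100 (a stand-in for `df['vote_count']`) shows that only about 2% of the values lie above the threshold it returns; here m comes out at roughly 98.02, so only the values 99 and 100 exceed it.

```python
import pandas as pd

# Hypothetical vote counts 1, 2, ..., 100 (a stand-in for df['vote_count']).
votes = pd.Series(range(1, 101), dtype=float)

# 98th percentile; pandas interpolates linearly between neighbouring values by default.
m = votes.quantile(0.98)

# Fraction of "movies" with strictly more votes than m.
share_above = (votes > m).mean()

print(m, share_above)
```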

In [34]:
m=df['vote_count'].quantile(0.98)

In [35]:
m

1236.8199999999997

From the output above, only 2% of all movies have garnered more than about 1,237 votes, so our value of m is roughly 1237. Note that m is used here only as a weight in the formula: the filter below does not remove movies with fewer than m votes. The other prerequisite we want in place is runtime: we will only consider movies that are at least 45 and at most 300 minutes long.
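To additionally use m as a hard cutoff, as its definition suggests, the vote condition can be combined with the runtime condition in one boolean mask. This is a sketch on a hypothetical four-movie DataFrame with an assumed threshold of m = 1000; the column names mirror `movies_metadata.csv`.

```python
import pandas as pd

# Hypothetical mini-dataset; column names mirror movies_metadata.csv.
movies = pd.DataFrame({
    "title": ["Short Film", "Very Long Epic", "Niche Favourite", "Blockbuster"],
    "runtime": [20.0, 320.0, 95.0, 140.0],
    "vote_count": [5000.0, 4000.0, 80.0, 9000.0],
    "vote_average": [7.0, 7.5, 8.9, 7.8],
})

m = 1000  # assumed vote threshold for this toy example

# 45 <= runtime <= 300; Series.between includes both endpoints by default.
runtime_ok = movies["runtime"].between(45, 300)
# At least m votes (an extra condition; the notebook itself filters on runtime only).
votes_ok = movies["vote_count"] >= m

# .copy() gives an independent DataFrame, so adding a score column later is safe.
qualified = movies[runtime_ok & votes_ok].copy()
print(qualified["title"].tolist())  # → ['Blockbuster']
```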

In [36]:
q_movies=df[(df['runtime']>=45) & (df['runtime']<=300) ]

In [37]:
len(q_movies)

41930

Now, let's find the value of C, the mean rating of all the movies in the dataset.

In [38]:
C=df['vote_average'].mean()

In [39]:
C

5.618207215133889

We can see that the average rating across all movies in this dataset is about 5.6 (out of 10).

Now, let's define a function that calculates the score for us, given the values of v, R, m and C.

In [40]:
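def weighted_rating(x, m=m, C=C):
    # IMDB weighted rating for one row x of the DataFrame
    v = x['vote_count']    # number of votes garnered by the movie
    R = x['vote_average']  # mean rating of the movie
    return (v / (v + m) * R) + (m / (m + v) * C)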
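Because the formula is plain arithmetic on two columns, it can also be written as a vectorized pandas expression, which avoids calling a Python function once per row as `apply(..., axis=1)` does. A minimal sketch, assuming a hypothetical helper name `weighted_rating_vectorized`; the three sample rows and the values of m and C are copied from the outputs shown in this notebook, so the results should agree with the scores in the chart below.

```python
import pandas as pd


def weighted_rating_vectorized(frame: pd.DataFrame, m: float, C: float) -> pd.Series:
    """Weighted rating for every row at once, using column arithmetic."""
    v = frame["vote_count"]
    R = frame["vote_average"]
    return v / (v + m) * R + m / (v + m) * C


# Sample rows and constants copied from this notebook's outputs.
sample = pd.DataFrame({
    "title": ["The Shawshank Redemption", "The Dark Knight", "The Godfather"],
    "vote_count": [8358.0, 12269.0, 6024.0],
    "vote_average": [8.5, 8.3, 8.5],
})
m = 1236.8199999999997
C = 5.618207215133889

scores = weighted_rating_vectorized(sample, m=m, C=C)
print(scores.round(6).tolist())
```

In the notebook this would read `q_movies['score'] = weighted_rating_vectorized(q_movies, m, C)` on a copied frame; the result is the same as the row-wise version, only computed column-wise.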

Now we can apply this function to every qualifying movie.

In [41]:
# assign() returns a new DataFrame, so pandas does not raise SettingWithCopyWarning on the filtered slice
q_movies = q_movies.assign(score=q_movies.apply(weighted_rating, axis=1))

In [44]:
len(q_movies)

41930

In [47]:
q_movies.sort_values('score', ascending=False)

| Unnamed: 0 | adult | belongs_to_collection | budget | genres | homepage | id | imdb_id | original_language | original_title | overview | ... | revenue | runtime | spoken_languages | status | tagline | title | video | vote_average | vote_count | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 314 | False | | 25000000 | [{'id': 18, 'name': 'Drama'}, {'id': 80, 'name... | | 278 | tt0111161 | en | The Shawshank Redemption | Framed in the 1940s for the double murder of h... | ... | 2.834147e+07 | 142.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | Fear can hold you prisoner. Hope can set you f... | The Shawshank Redemption | False | 8.5 | 8358.0 | 8.128523 |
| 12481 | False | {'id': 263, 'name': 'The Dark Knight Collectio... | 185000000 | [{'id': 18, 'name': 'Drama'}, {'id': 28, 'name... | http://thedarkknight.warnerbros.com/dvdsite/ | 155 | tt0468569 | en | The Dark Knight | Batman raises the stakes in his war on crime. ... | ... | 1.004558e+09 | 152.0 | [{'iso_639_1': 'en', 'name': 'English'}, {'iso... | Released | Why So Serious? | The Dark Knight | False | 8.3 | 12269.0 | 8.054410 |
| 834 | False | {'id': 230, 'name': 'The Godfather Collection'... | 6000000 | [{'id': 18, 'name': 'Drama'}, {'id': 80, 'name... | http://www.thegodfather.com/ | 238 | tt0068646 | en | The Godfather | Spanning the years 1945 to 1955, a chronicle o... | ... | 2.450664e+08 | 175.0 | [{'iso_639_1': 'en', 'name': 'English'}, {'iso... | Released | An offer you can't refuse. | The Godfather | False | 8.5 | 6024.0 | 8.009111 |
| 2843 | False | | 63000000 | [{'id': 18, 'name': 'Drama'}] | http://www.foxmovies.com/movies/fight-club | 550 | tt0137523 | en | Fight Club | A ticking-time-bomb insomniac and a slippery s... | ... | 1.008538e+08 | 139.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | Mischief. Mayhem. Soap. | Fight Club | False | 8.3 | 9678.0 | 7.996111 |
| 292 | False | | 8000000 | [{'id': 53, 'name': 'Thriller'}, {'id': 80, 'n... | | 680 | tt0110912 | en | Pulp Fiction | A burger-loving hit man, his philosophical par... | ... | 2.139288e+08 | 154.0 | [{'iso_639_1': 'en', 'name': 'English'}, {'iso... | Released | Just because you are a character doesn't mean ... | Pulp Fiction | False | 8.3 | 8670.0 | 7.965191 |
| 15480 | False | | 160000000 | [{'id': 28, 'name': 'Action'}, {'id': 53, 'nam... | http://inceptionmovie.warnerbros.com/ | 27205 | tt1375666 | en | Inception | Cobb, a skilled thief who commits corporate es... | ... | 8.255328e+08 | 148.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | Your mind is the scene of the crime. | Inception | False | 8.1 | 14075.0 | 7.899532 |
| 351 | False | | 55000000 | [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam... | | 13 | tt0109830 | en | Forrest Gump | A man with a low IQ has accomplished great thi... | ... | 6.779454e+08 | 142.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | The world will never be the same, once you've ... | Forrest Gump | False | 8.2 | 8147.0 | 7.859711 |
| 22879 | False | | 165000000 | [{'id': 12, 'name': 'Adventure'}, {'id': 18, '... | http://www.interstellarmovie.net/ | 157336 | tt0816692 | en | Interstellar | Interstellar chronicles the adventures of a gr... | ... | 6.751200e+08 | 169.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | Mankind was born on Earth. It was never meant ... | Interstellar | False | 8.1 | 11187.0 | 7.852932 |
| 7000 | False | {'id': 119, 'name': 'The Lord of the Rings Col... | 94000000 | [{'id': 12, 'name': 'Adventure'}, {'id': 14, '... | http://www.lordoftherings.net | 122 | tt0167260 | en | The Lord of the Rings: The Return of the King | Aragorn is revealed as the heir to the ancient... | ... | 1.118889e+09 | 201.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | The eye of the enemy is moving. | The Lord of the Rings: The Return of the King | False | 8.1 | 8226.0 | 7.775622 |
| 1154 | False | {'id': 10, 'name': 'Star Wars Collection', 'po... | 18000000 | [{'id': 12, 'name': 'Adventure'}, {'id': 28, '... | http://www.starwars.com/films/star-wars-episod... | 1891 | tt0080684 | en | The Empire Strikes Back | The epic saga continues as Luke Skywalker, in ... | ... | 5.384000e+08 | 124.0 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | The Adventure Continues... | The Empire Strikes Back | False | 8.2 | 5998.0 | 7.758633 |
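The full frame above is wide, and most of its columns play no part in the ranking. A sketch of a small helper that keeps only the columns that explain each score: it uses `DataFrame.nlargest`, which returns the same rows as `sort_values(..., ascending=False).head(n)` without sorting the whole frame. The helper name `top_chart` and the column selection are assumptions; the sample rows are copied from the output above.

```python
import pandas as pd


def top_chart(scored: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n highest-scoring movies with only the columns behind the score."""
    columns = ["title", "vote_count", "vote_average", "score"]
    return scored.nlargest(n, "score")[columns].reset_index(drop=True)


# A few rows copied from the chart above, deliberately out of order.
scored = pd.DataFrame({
    "title": ["Fight Club", "The Godfather", "The Shawshank Redemption",
              "Pulp Fiction", "The Dark Knight"],
    "vote_count": [9678.0, 6024.0, 8358.0, 8670.0, 12269.0],
    "vote_average": [8.3, 8.5, 8.5, 8.3, 8.3],
    "score": [7.996111, 8.009111, 8.128523, 7.965191, 8.054410],
})

print(top_chart(scored, n=3))
```

In the notebook itself this would be called as `top_chart(q_movies)` to get the top 10.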
