# Recommend the most popular movies
This simple recommendation system gives every user the same generalised recommendations, based on movie popularity and/or genre. The idea behind it is that movies that are more popular and more highly rated are more likely to be liked by the average viewer.

In [70]:
# Load Python packages
import os
import pandas as pd
import datetime
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

## Load the data

In [109]:
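# os.path.join builds the path with the right separator on any operating system
ratings = pd.read_csv(os.path.join('data', 'ratings_featured.csv'))
users = pd.read_csv(os.path.join('data', 'users_featured.csv'))
movies = pd.read_csv(os.path.join('data', 'movies_featured.csv'))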

In [110]:
movies.shape

(3883, 21)

In [111]:
movies.head()

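|   | movie_id | title | genres | War | Comedy | Musical | Thriller | Sci-Fi | Fantasy | Drama | ... | Animation | Mystery | Western | Crime | Adventure | Documentary | Film-Noir | Horror | Romance | Children's |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Animation\|Children's\|Comedy | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 2 | Jumanji (1995) | Adventure\|Children's\|Fantasy | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama | 0 | 1 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |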


In [112]:
ratings.head()

|   | user_id | movie_id | rating | datetime |
|---|---|---|---|---|
| 0 | 1 | 1193 | 5 | 2000-12-31 22:12:40 |
| 1 | 1 | 661 | 3 | 2000-12-31 22:35:09 |
| 2 | 1 | 914 | 3 | 2000-12-31 22:32:48 |
| 3 | 1 | 3408 | 4 | 2000-12-31 22:04:35 |
| 4 | 1 | 2355 | 5 | 2001-01-06 23:38:11 |


## Making recommendations based on the weighted rating metric

The weighted rating metric is

$$WR = \frac{v}{v+m}\,R + \frac{m}{v+m}\,C$$

where

- v is the number of votes (ratings) for the movie
- m is the minimum number of votes a movie needs to be listed in the chart
- R is the average rating of the movie
- C is the mean rating across the whole dataset
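
As a worked example of the formula, the sketch below plugs in the global mean the notebook computes for this dataset (3.581564453029317) together with a hypothetical threshold m_example = 1000 and two made-up movies that share the same average rating R = 4.5 but differ in vote count. The helper `weighted_rating_scalar` is illustrative only and is not part of the notebook.

```python
# Worked example of WR = v/(v+m) * R + m/(v+m) * C.
# C_example comes from the notebook; m_example and the two movies are made up.

def weighted_rating_scalar(v: float, R: float, m: float, C: float) -> float:
    """Blend a movie's own average R with the global mean C.

    The weight on R is v / (v + m): the more votes a movie has relative
    to the threshold m, the more its own average counts.
    """
    return (v / (v + m)) * R + (m / (v + m)) * C


C_example = 3.581564453029317  # global mean rating computed by the notebook
m_example = 1000               # hypothetical vote threshold

# Many votes: the score stays close to the movie's own average of 4.5.
popular = weighted_rating_scalar(v=2000, R=4.5, m=m_example, C=C_example)
# Few votes: the same average is pulled strongly towards C_example.
niche = weighted_rating_scalar(v=50, R=4.5, m=m_example, C=C_example)

print(round(popular, 4))  # 4.1939
print(round(niche, 4))    # 3.6253
```

Because the two weights v/(v+m) and m/(v+m) sum to 1, WR always lies between R and C, so a movie with few votes is shrunk towards the global mean.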

In [113]:
C = ratings['rating'].mean()
C

3.581564453029317
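
The cell above computes C as the mean of every individual rating. That is not the same as averaging the per-movie means: movies with many ratings pull it towards their own average. The sketch below shows the difference on hypothetical toy data with two movies.

```python
import pandas as pd

# Hypothetical toy ratings: movie 1 has four 5-star ratings, movie 2 one 1-star rating.
toy_ratings = pd.DataFrame({
    "movie_id": [1, 1, 1, 1, 2],
    "rating":   [5, 5, 5, 5, 1],
})

# Mean over every individual rating (what the cell above computes):
# heavily rated movies contribute more.
rating_level_mean = toy_ratings["rating"].mean()  # (5 * 4 + 1) / 5

# Mean of the per-movie averages: every movie counts once.
movie_level_mean = toy_ratings.groupby("movie_id")["rating"].mean().mean()  # (5 + 1) / 2

print(rating_level_mean, movie_level_mean)  # 4.2 3.0
```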

In [114]:
# m is the 90th percentile of ratings per movie, so only the ~10% most-rated movies qualify
rating_count = ratings.groupby(by='movie_id').count()[['rating']].rename(columns={"rating": "count"})
m = rating_count['count'].quantile(0.90)
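
Here m is the 90th percentile of the per-movie rating counts, so a movie needs at least m ratings to enter the chart. The toy sketch below (ten hypothetical movies, where movie i has i ratings) shows how `Series.quantile` arrives at the threshold: by default it interpolates linearly between the two nearest order statistics, so m need not be a whole number.

```python
import pandas as pd

# Hypothetical vote counts for ten movies: movie i received i ratings.
toy_counts = pd.Series(
    range(1, 11),
    index=pd.Index(range(1, 11), name="movie_id"),
    name="count",
)

# Linear interpolation: position 0.9 * (10 - 1) = 8.1 in the sorted counts,
# i.e. 9 + 0.1 * (10 - 9) = 9.1
m_example = toy_counts.quantile(0.90)

# Only movies with at least m_example ratings qualify (here 1 of 10, i.e. 10%).
qualified = toy_counts[toy_counts >= m_example]

print(round(m_example, 2))        # 9.1
print(qualified.index.tolist())   # [10]
```

With real data the counts contain ties, so the share of movies that pass the `>= m` filter is only approximately 10%.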

In [115]:
# Add the count column to the movies DataFrame (inner join: movies with no ratings are dropped)
movies = pd.merge(movies, rating_count, on='movie_id')

In [116]:
# Calculate the avg ratings for each movie and name it avg_rating
avg_rating = ratings.groupby(by='movie_id')[['rating']].mean().rename(columns={"rating": "avg_rating"})

In [117]:
# Join the avg_rating into the movies dataframe
movies = pd.merge(movies, avg_rating, on='movie_id')

In [118]:
movies.head()

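|   | movie_id | title | genres | War | Comedy | Musical | Thriller | Sci-Fi | Fantasy | Drama | ... | Western | Crime | Adventure | Documentary | Film-Noir | Horror | Romance | Children's | count | avg_rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Animation\|Children's\|Comedy | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2077 | 4.146846 |
| 1 | 2 | Jumanji (1995) | Adventure\|Children's\|Fantasy | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 701 | 3.201141 |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 478 | 3.016736 |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama | 0 | 1 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 170 | 2.729412 |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 296 | 3.006757 |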
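
The count and avg_rating columns above come from two separate groupby passes and two merges. A minimal alternative sketch, assuming only the `movie_id` and `rating` columns, computes both statistics in one `groupby` with named aggregation and merges once; the toy frames are hypothetical. Like the `pd.merge` calls above, the default inner join drops movies that have no ratings.

```python
import pandas as pd

# Hypothetical toy data standing in for the movies and ratings tables.
toy_movies = pd.DataFrame({"movie_id": [1, 2, 3], "title": ["A", "B", "C"]})
toy_ratings = pd.DataFrame({"movie_id": [1, 1, 2], "rating": [4, 5, 3]})

# One groupby computes both statistics; named aggregation sets the column names.
stats = toy_ratings.groupby("movie_id")["rating"].agg(count="count", avg_rating="mean")

# `on` may name an index level (stats is indexed by movie_id).
# The default inner join drops movie 3, which has no ratings.
toy_merged = pd.merge(toy_movies, stats, on="movie_id")
print(toy_merged)
```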


In [119]:
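# Keep only the qualified movies (at least m ratings) in a separate DataFrame.
# .copy() after the filter gives an independent frame, so adding a column later does not warn.
q_movies = movies.loc[movies['count'] >= m].copy()

# Roughly the top 10% most-rated movies remain
q_movies.shape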

(371, 23)

In [120]:
# Function that computes the weighted rating for each entry.
# m and C are bound as default arguments when the function is defined.
def weighted_rating(df, m=m, C=C):
    v = df['count']
    R = df['avg_rating']
    score = (v / (v + m) * R) + (m / (m + v) * C)
    return score


In [121]:
# Applying the function to every row
q_movies['score'] = q_movies.apply(weighted_rating, axis=1)
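
`weighted_rating` uses only element-wise arithmetic on `count` and `avg_rating`, so it gives the same scores whether `apply(axis=1)` calls it once per row or it is called once on the whole DataFrame. The vectorised call avoids one Python function call per row and is usually much faster on large tables. The sketch below checks this on a small hypothetical frame (counts and averages copied from the `movies.head()` output, with a made-up m and C); `weighted_rating_sketch` restates the notebook's formula with m and C passed explicitly.

```python
import numpy as np
import pandas as pd


def weighted_rating_sketch(df, m, C):
    # Same formula as the notebook's weighted_rating; works on one row or a whole DataFrame.
    v = df["count"]
    R = df["avg_rating"]
    return (v / (v + m) * R) + (m / (m + v) * C)


# Hypothetical frame; m_example and C_example are made up.
toy_q = pd.DataFrame({
    "count": [2077, 701, 478],
    "avg_rating": [4.146846, 3.201141, 3.016736],
})
m_example, C_example = 500.0, 3.58

# Row by row: apply passes each row, then the extra positional args.
row_wise = toy_q.apply(weighted_rating_sketch, axis=1, args=(m_example, C_example))
# Vectorised: one call on whole columns.
vectorised = weighted_rating_sketch(toy_q, m_example, C_example)

print(np.allclose(row_wise, vectorised))  # True
```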

In [122]:
q_movies = q_movies.sort_values('score', ascending=False)

In [123]:
# Show the details of the top 10 movies
q_movies[['title', 'genres', 'avg_rating', 'score']].head(10)

|   | title | genres | avg_rating | score |
|---|---|---|---|---|
| 309 | Shawshank Redemption, The (1994) | Drama | 4.554558 | 4.314477 |
| 802 | Godfather, The (1972) | Action\|Crime\|Drama | 4.524966 | 4.291872 |
| 513 | Schindler's List (1993) | Drama\|War | 4.510417 | 4.287045 |
| 253 | Star Wars: Episode IV - A New Hope (1977) | Action\|Adventure\|Fantasy\|Sci-Fi | 4.453694 | 4.282691 |
| 1108 | Raiders of the Lost Ark (1981) | Action\|Adventure | 4.477725 | 4.276168 |
| 49 | Usual Suspects, The (1995) | Crime\|Thriller | 4.517106 | 4.245473 |
| 2557 | Sixth Sense, The (1999) | Thriller | 4.406263 | 4.217579 |
| 2651 | American Beauty (1999) | Comedy\|Drama | 4.317386 | 4.188275 |
| 579 | Silence of the Lambs, The (1991) | Drama\|Thriller | 4.351823 | 4.181935 |
| 1848 | Saving Private Ryan (1998) | Action\|Drama\|War | 4.337354 | 4.174354 |


The list above contains the most popular, highly rated movies on the website. These movies are relatively safe to recommend to new users, whose preferences we do not know because we have no data about them yet (the cold-start case).
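
The introduction also mentions recommendations by genre. One possible sketch, assuming a scored DataFrame shaped like `q_movies` (one 0/1 column per genre plus `title` and `score`), filters the chart on a single genre column. The helper `top_movies_in_genre` and the toy scores are hypothetical. This version reuses the global m and C; another option is to recompute both within the genre before scoring.

```python
import pandas as pd


def top_movies_in_genre(scored, genre, n=10):
    """Return the n highest-scoring movies whose one-hot `genre` column is 1.

    `scored` is assumed to look like q_movies: one 0/1 column per genre
    plus 'title' and 'score' columns.
    """
    in_genre = scored[scored[genre] == 1]
    return in_genre.sort_values("score", ascending=False).head(n)[["title", "score"]]


# Hypothetical toy chart with made-up scores.
toy_scored = pd.DataFrame({
    "title":  ["A", "B", "C", "D"],
    "Comedy": [1, 0, 1, 1],
    "Drama":  [0, 1, 1, 0],
    "score":  [3.9, 4.3, 4.1, 3.7],
})

print(top_movies_in_genre(toy_scored, "Comedy", n=2)["title"].tolist())  # ['C', 'A']
```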