# <span style="color:red"> <strong> Popularity-based Recommender System  </strong>

Popularity-based recommendation system offers generalized recommendations to every user, based on movie popularity and/or genre. The basic idea behind this system is that movies that are most popular and critically acclaimed will have a higher probability of being liked by the average audience. An example could be `IMDB Top 250`.

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import math
%matplotlib inline

### Load Dataset

In [9]:
movie_df  = pd.read_csv("../data/movie.csv")
rating_df = pd.read_csv("../data/rating.csv")
movie_rating = movie_df.merge(rating_df, how="inner", on="movieId")
movie_rating["timestamp"] = pd.DatetimeIndex(movie_rating["timestamp"])
movie_rating["year"] = movie_rating["timestamp"].dt.year

In [16]:
movie_rating

Unnamed: 0,movieId,title,genres,userId,rating,timestamp,year
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,3,4.0,1999-12-11 13:36:47,1999
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,6,5.0,1997-03-13 17:50:52,1997
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,8,4.0,1996-06-05 13:37:51,1996
3,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,10,4.0,1999-11-25 02:44:47,1999
4,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,11,4.5,2009-01-02 01:13:41,2009
...,...,...,...,...,...,...,...
20000258,131254,Kein Bund für's Leben (2007),Comedy,79570,4.0,2015-03-30 19:32:59,2015
20000259,131256,"Feuer, Eis & Dosenbier (2002)",Comedy,79570,4.0,2015-03-30 19:48:08,2015
20000260,131258,The Pirates (2014),Adventure,28906,2.5,2015-03-30 19:56:32,2015
20000261,131260,Rentun Ruusu (2001),(no genres listed),65409,3.0,2015-03-30 19:57:46,2015


### Popularity-based Recommender

In this part we are going to find the most popualr movies and recommend them to users. This can be useful for newcomers who know nothing about the movies.

Hypothesis:
* Historically popular movies should be recommended to new comers
  
Data splitting strategy:
* Data should be splitted according to `year` if we want to evaluate the effect of recommending highly rated movies to new comers

Evaluation strategy:
* Algorithm will be evaluated based on the overlapping rate between movies watched and movies recommended for each user

In [37]:
k = 50 #number of top movies that should be recommended
cutoff_year = 2015 #cut-off year for evaluation

most_voted = movie_rating[movie_rating["year"] < cutoff_year].groupby("movieId").mean()[["rating"]].sort_values(by="rating", ascending=False).reset_index().head(k)
most_voted = most_voted.merge(movie_df, how="inner", on="movieId")
most_voted = most_voted.rename(columns={"rating" : "Avg_rating"})
most_voted

  most_voted = movie_rating[movie_rating["year"] < cutoff_year].groupby("movieId").mean()[["rating"]].sort_values(by="rating", ascending=False).reset_index().head(k)


Unnamed: 0,movieId,Avg_rating,title,genres
0,103143,5.0,Donos de Portugal (2012),Documentary
1,116261,5.0,Wuthering Heights (2009),Drama
2,105187,5.0,Linotype: The Film (2012),Documentary
3,105135,5.0,"Pit, The (1981)",Horror
4,109931,5.0,Repentance (Monanieba) (1984),Drama
5,116387,5.0,Muddy River (1981),Drama
6,101188,5.0,Central Park (1991),Documentary
7,113860,5.0,"Codes of Gender, The (2010)",Documentary
8,93707,5.0,Prom Queen: The Marc Hall Story (2004),Comedy|Drama
9,100743,5.0,Eye In The Sky (Gun chung) (2007),Crime|Thriller


As is shown in the table, the top 50 rated movies will be recommended to each user in year 2015.

### Evaluation with Precision @k and Recall @k

Now let's evaluate the performance of popularity-based recommender!


`Recall` is the percentage of relevant items selected out of all the relevant items in the repository: 

- <span style="color:yellow"> Recall@k = Relevant Items Recommended in top k / Relevant Items

`Precision` is the percentage of relevant items out of those items selected by the query

- <span style="color:yellow"> Precision@k = Relevant Items Recommended in top k / k items recommended

In [57]:
userIds = list(movie_rating[movie_rating["year"] >= cutoff_year]["userId"].unique())

relevant_items = movie_rating[movie_rating["year"] >= cutoff_year][["userId", "movieId"]]#ground-truth movies
recommended_items = most_voted[["movieId"]]#predicted movies

In [69]:
def precision_at_k_score(relevant, topk):
    num = 0
    for item in relevant:
        if item in topk:
            num += 1
    return num / k

def recall_at_k_score(relevant, topk):
    num = 0
    for item in relevant:
        if item in topk:
            num += 1
    return num / len(relevant)

    
#calculate the average precision@k for all the users
sum_precision = 0
sum_recall = 0
for user in userIds:
    relevant = relevant_items[relevant_items["userId"]==user]["movieId"]
    recommendation = recommended_items["movieId"]
    sum_precision += precision_at_k_score(relevant, recommendation)
    sum_recall += recall_at_k_score(relevant, recommendation)

print("Average Precision@k :{}".format(sum_precision/len(userIds)))
print("Average Recall@k :{}".format(sum_recall/len(userIds)))

Average Precision@k :0.014219830899308497
Average Recall@k :0.0073768100982868085


### Conclusion

Our results show that the `Precision@k` of the popularity-based recommender is 1.4%, which means among top 50 recommended movies, only 2.8 movies were watched by users on average. The `Recall@k` is 0.7%, which means among all the movies each user watched, only 0.7 movies are recommended by the algorithm on average.