# Making Recommendations Based on Popularity

In [2]:
import numpy as np
import pandas as pd

In [3]:
# Load the MovieLens movies and ratings, then join them on the shared movieId column
movies = pd.read_csv('data/ml-latest-small/movies.csv')
ratings = pd.read_csv('data/ml-latest-small/ratings.csv')
movies_ratings = movies.merge(ratings, on='movieId')

In [4]:
movies_ratings.head()

Unnamed: 0,movieId,title,genres,userId,rating,timestamp
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,1,4.0,964982703
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,5,4.0,847434962
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,7,4.5,1106635946
3,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,15,2.5,1510577970
4,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,17,4.5,1305696483


In [9]:
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


## Popularity/Quality based recommender system

Let's group the ratings by movie and look at each movie's average rating. This is an **explicit** rating given by users.

In [9]:
rating=pd.DataFrame(movies_ratings.groupby('movieId')['rating'].mean())
rating.sort_values('rating',ascending=False).head()

movieId,rating
88448,5.0
100556,5.0
143031,5.0
143511,5.0
143559,5.0


The top-rated movies have a perfect score of 5/5. But how many ratings do these movies have?

In [10]:
movies_ratings.query("movieId==88448")

Unnamed: 0,movieId,title,genres,userId,rating,timestamp
93261,88448,Paper Birds (Pájaros de papel) (2010),Comedy|Drama,483,5.0,1315437602


Only one user rated this movie, so its perfect average says very little about its quality.

We can also look at how many times each movie has been rated. The rating count is an **implicit** rating.

In [14]:
rating['rating_count']=movies_ratings.groupby('movieId')['rating'].count()
rating.sort_values('rating_count',ascending=False).head()

movieId,rating,rating_count
356,4.164134,329
318,4.429022,317
296,4.197068,307
593,4.16129,279
2571,4.192446,278


Some movies have been rated around 300 times. They are more popular than the top-rated movies, but received lower explicit ratings.
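
One common way to reconcile the two signals is to rank by average rating only among movies that have enough ratings. The sketch below illustrates the idea on a small made-up table rather than the real `movies_ratings` data; the names `toy`, `stats`, and `MIN_RATINGS`, and the cutoff value itself, are assumptions chosen for illustration.

```python
import pandas as pd

# Toy stand-in for movies_ratings; the values are made up for illustration.
toy = pd.DataFrame({
    "movieId": [1, 1, 1, 2, 3, 3],
    "rating":  [4.0, 4.5, 3.5, 5.0, 3.0, 4.0],
})

# Mean rating and number of ratings per movie, computed in one pass.
stats = toy.groupby("movieId")["rating"].agg(rating="mean", rating_count="count")

# Hypothetical cutoff: ignore movies with fewer than MIN_RATINGS ratings.
MIN_RATINGS = 2
ranked = (
    stats[stats["rating_count"] >= MIN_RATINGS]
    .sort_values("rating", ascending=False)
)
print(ranked)
```

Here movie 2 has a perfect average but only one rating, so it drops out of the ranking. On the real data the cutoff would need to be tuned; a larger value favours well-known movies, a smaller one lets obscure titles back in.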

Let's locate the most popular movie, and get some info about it:

In [18]:
# movieId of the most popular movie (the one with the most ratings)
top_popular_movieId=rating.sort_values('rating_count',ascending=False).head(1).index[0]

# all rating rows for the most popular movie, including its title and genres
movies_ratings[movies_ratings['movieId']==top_popular_movieId]

Unnamed: 0,movieId,title,genres,userId,rating,timestamp
10019,356,Forrest Gump (1994),Comedy|Drama|Romance|War,1,4.0,964980962
10020,356,Forrest Gump (1994),Comedy|Drama|Romance|War,6,5.0,845553200
10021,356,Forrest Gump (1994),Comedy|Drama|Romance|War,7,5.0,1106635915
10022,356,Forrest Gump (1994),Comedy|Drama|Romance|War,8,3.0,839463527
10023,356,Forrest Gump (1994),Comedy|Drama|Romance|War,10,3.5,1455301685
...,...,...,...,...,...,...
10343,356,Forrest Gump (1994),Comedy|Drama|Romance|War,605,3.0,1277097509
10344,356,Forrest Gump (1994),Comedy|Drama|Romance|War,606,4.0,1171231370
10345,356,Forrest Gump (1994),Comedy|Drama|Romance|War,608,3.0,1117162603
10346,356,Forrest Gump (1994),Comedy|Drama|Romance|War,609,4.0,847220869


The most popular movie is "Forrest Gump (1994)", a Comedy|Drama|Romance|War movie that has received 329 ratings, with an average score of about 4.2.
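
Filtering the merged table returns one row per rating, which is more than is needed to read off a title. As a minimal sketch, the same lookup can be done against the movies table alone, using `idxmax` to find the most-rated movie. The frames `toy_movies` and `toy_rating` below are hypothetical stand-ins with made-up rating counts, not the real data.

```python
import pandas as pd

# Toy stand-in for the movies table; titles and genres are taken from the tables above.
toy_movies = pd.DataFrame({
    "movieId": [1, 2, 356],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Forrest Gump (1994)"],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Drama|Romance|War",
    ],
})

# Toy stand-in for the per-movie rating table; the counts are made up.
toy_rating = pd.DataFrame(
    {"rating_count": [3, 1, 5]},
    index=pd.Index([1, 2, 356], name="movieId"),
)

# idxmax returns the index label (here, the movieId) of the largest count.
top_id = toy_rating["rating_count"].idxmax()

# Select a single row from the movies table instead of every rating row.
top_movie = toy_movies.loc[toy_movies["movieId"] == top_id, ["title", "genres"]].iloc[0]
print(top_movie["title"], "|", top_movie["genres"])
```

Note that `idxmax` returns only the first label when several movies tie for the highest count.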