# Recommendations with MovieTweetings: Most Popular

This notebook demonstrates a simple rank-based recommendation application.

It is based on pre-cleaned MovieTweetings data. Data source: [MovieTweetings Data](https://github.com/sidooms/MovieTweetings/tree/master/recsyschallenge2014)

**Task:** Regardless of the user, we need to provide a list of recommendations based simply on the most popular items.

"Most popular" is based on the following criteria:
1. A movie with the highest average rating is considered best
2. If average ratings are tied, the movie with more ratings ranks higher
3. A movie must have a minimum of 5 ratings to be considered among the best movies
4. If movies are tied in both average rating and number of ratings, the movie with the most recent rating ranks higher
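
Taken together, these criteria amount to a minimum-count filter followed by a multi-key descending sort. The sketch below uses a hand-made table of per-movie statistics (titles and numbers are invented purely for illustration) to show how a tie on one key falls through to the next:

```python
import pandas as pd

# Hypothetical per-movie statistics, invented only to show the tie-breaking
stats = pd.DataFrame({
    'movie': ['A', 'B', 'C', 'D', 'E'],
    'mean_rating': [9.0, 9.0, 9.0, 8.5, 9.5],
    'rating_counts': [12, 30, 12, 40, 3],
    'last_rating': ['2018-01-05', '2017-06-01', '2018-03-10', '2018-02-01', '2018-04-01'],
})

# Criterion 3: drop movies with fewer than 5 ratings (E has only 3)
eligible = stats.loc[stats['rating_counts'] >= 5]

# Criteria 1, 2 and 4: sort by mean, then count, then most recent rating,
# all descending; ISO date strings compare in chronological order
ranked = eligible.sort_values(['mean_rating', 'rating_counts', 'last_rating'],
                              ascending=False)

print(list(ranked['movie']))  # ['B', 'C', 'A', 'D']
```

B wins the three-way tie at 9.0 on rating count, and C beats A only because its last rating is more recent. E would top the list on average rating alone but is excluded by the minimum of 5 ratings.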

In the second part of the notebook, we will add some **filters**.


```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tests as t

%matplotlib inline

# Read in the datasets
movies = pd.read_csv('data/movies_clean.csv')
reviews = pd.read_csv('data/reviews_clean.csv')

# Drop the unnamed index column left over from saving the CSV files
del movies['Unnamed: 0']
del reviews['Unnamed: 0']
```

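```python
movies.head(2)
```

|   | movie_id | movie | genre | date | 1800's | 1900's | 2000's | History | News | Horror | ... | Fantasy | Romance | Game-Show | Action | Documentary | Animation | Comedy | Short | Western | Thriller |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | Edison Kinetoscopic Record of a Sneeze (1894) | Documentary\|Short | 1894 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 1 | 10 | La sortie des usines Lumière (1895) | Documentary\|Short | 1895 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |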
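
The genre indicator columns mirror the pipe-separated `genre` string, and the century columns mirror `date`. The cleaning code is not part of this notebook, so the following is only a sketch, assuming pandas' `Series.str.get_dummies` and `pd.get_dummies`, of how such indicators can be produced from the two rows shown above:

```python
import pandas as pd

# Two rows copied from the movies table above
movies_demo = pd.DataFrame({
    'movie_id': [8, 10],
    'genre': ['Documentary|Short', 'Documentary|Short'],
    'date': [1894, 1895],
})

# One 0/1 indicator column per genre found in the pipe-separated strings
genre_dummies = movies_demo['genre'].str.get_dummies(sep='|')

# One 0/1 indicator column per century, e.g. 1894 -> "1800's"
century_labels = (movies_demo['date'] // 100 * 100).astype(str) + "'s"
century_dummies = pd.get_dummies(century_labels).astype(int)

movies_demo = pd.concat([movies_demo, genre_dummies, century_dummies], axis=1)
print(movies_demo)
```

Only the genres and centuries present in the input get a column here; the full dataset would produce the complete set.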


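```python
reviews.head(2)
```

|   | user_id | movie_id | rating | timestamp | date | month_1 | month_2 | month_3 | month_4 | month_5 | ... | month_9 | month_10 | month_11 | month_12 | year_2013 | year_2014 | year_2015 | year_2016 | year_2017 | year_2018 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 68646 | 10 | 1381620027 | 2013-10-12 23:20:27 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 113277 | 10 | 1379466669 | 2013-09-18 01:11:09 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |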
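
The `date` column matches the Unix `timestamp` read as seconds since the epoch in UTC (for example, 1381620027 is 2013-10-12 23:20:27). A minimal sketch, assuming that conversion, of how `date` and the `year_*` indicators can be derived:

```python
import pandas as pd

# The two timestamps from the reviews table above
reviews_demo = pd.DataFrame({'timestamp': [1381620027, 1379466669]})

# Interpret the timestamps as seconds since the Unix epoch (UTC)
dates = pd.to_datetime(reviews_demo['timestamp'], unit='s')
reviews_demo['date'] = dates.dt.strftime('%Y-%m-%d %H:%M:%S')

# One 0/1 indicator column per year, named like the year_* columns above
year_dummies = pd.get_dummies(dates.dt.year, prefix='year').astype(int)
reviews_demo = pd.concat([reviews_demo, year_dummies], axis=1)

print(reviews_demo)
```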


## Find The Most Popular Movies

Return the **n_top** recommendations, based on the criteria defined above, for any user. (The `user_id` is not actually used yet, but we include it for completeness.)

```python
def create_ranked_movies(reviews, movies):
    '''
    Helper function. Merges the reviews and movies dataframes and
    ranks the movies according to the following criteria:

    1. Highest average rating
    2. With ties, most ratings
    3. With ties, most recent rating
    4. Minimum of 5 ratings to be considered at all

    ARGUMENTS:
        reviews = DataFrame, contains review data
        movies = DataFrame, contains movie data

    RETURNS:
        ranked_movies = DataFrame, contains ranked movies
    '''

    # group the reviews by movie and compute the ranking statistics
    grouped = reviews.groupby('movie_id')
    ratings = pd.DataFrame({
        'mean_rating': grouped['rating'].mean(),
        'rating_counts': grouped.size(),
        'last_rating': grouped['date'].max(),  # ISO date strings sort chronologically
    })

    # keep only movies with at least 5 ratings, then sort by the criteria
    ratings = ratings.loc[ratings['rating_counts'] >= 5]
    ratings = ratings.sort_values(['mean_rating', 'rating_counts', 'last_rating'],
                                  ascending=False)

    # attach the movie details; how='right' keeps only the movies in ratings
    ranked_movies = movies.set_index('movie_id').join(ratings, how='right')

    return ranked_movies
```

```python
def make_popular_recommendations(user_id, n_top):
    '''
    Return n_top recommendations for a user according to the
    following criteria:

    1. Highest average rating
    2. With ties, most ratings
    3. With ties, most recent rating
    4. Minimum of 5 ratings to be considered at all

    ARGUMENTS:
        user_id: int, id of the user to make recommendations for
        n_top: int, number of recommendations to make

    RETURNS:
        top_movies: list, recommended movies in order best to worst
    '''
    # call helper function (uses the notebook-level reviews and movies)
    ranked_movies = create_ranked_movies(reviews, movies)

    # take the first n_top titles by position
    top_movies = list(ranked_movies['movie'].head(n_top))

    return top_movies
```
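
Because the ranking does not depend on `user_id`, every call rebuilds exactly the same ranking. One possible alternative, sketched here with a hypothetical `PopularRecommender` class and made-up data, is to rank once and then slice the cached list for each request:

```python
import pandas as pd


class PopularRecommender:
    '''Hypothetical helper: ranks movies once, then serves top-n slices.'''

    def __init__(self, reviews, movies, min_ratings=5):
        grouped = reviews.groupby('movie_id')
        ratings = pd.DataFrame({
            'mean_rating': grouped['rating'].mean(),
            'rating_counts': grouped.size(),
            'last_rating': grouped['date'].max(),
        })
        ratings = ratings.loc[ratings['rating_counts'] >= min_ratings]
        ratings = ratings.sort_values(['mean_rating', 'rating_counts', 'last_rating'],
                                      ascending=False)
        # look up the titles in ranked order and cache them as a plain list
        titles = movies.set_index('movie_id')['movie']
        self.ranked_titles = list(titles.reindex(ratings.index))

    def recommend(self, user_id, n_top):
        # user_id is accepted for interface compatibility but not used
        return self.ranked_titles[:n_top]


# Made-up data: two movies with five ratings each
movies_demo = pd.DataFrame({'movie_id': [1, 2], 'movie': ['Alpha (2015)', 'Beta (2016)']})
reviews_demo = pd.DataFrame({
    'movie_id': [1] * 5 + [2] * 5,
    'rating': [8] * 5 + [9] * 5,
    'date': ['2016-01-0%d' % day for day in range(1, 6)] * 2,
})

recommender = PopularRecommender(reviews_demo, movies_demo)
print(recommender.recommend(user_id=1, n_top=2))  # ['Beta (2016)', 'Alpha (2015)']
```

The trade-off is staleness: the cached list only reflects reviews available when the object was built.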

```python
make_popular_recommendations(1, 20)
```

```
['MSG 2 the Messenger (2015)',
 'Avengers: Age of Ultron Parody (2015)',
 'Sorry to Bother You (2018)',
 'Selam (2013)',
 "Quiet Riot: Well Now You're Here, There's No Way Back (2014)",
 'Crawl Bitch Crawl (2012)',
 'Make Like a Dog (2015)',
 'Pandorica (2016)',
 'Third Contact (2011)',
 'Romeo Juliet (2009)',
 'Be Somebody (2016)',
 'Birlesen Gonuller (2014)',
 'Agnelli (2017)',
 'Sátántangó (1994)',
 'Shijie (2004)',
 'Foster (2011)',
 'CM101MMXI Fundamentals (2013)',
 'Akahige (1965)',
 'Crystal Lake Memories: The Complete History of Friday the 13th (2013)',
 'Körkarlen (1921)']
```

**Notice:** This is not the only way to determine the "top rated" movies. For trending news or trending social events, for example, you would likely define a time window ending at the current time and rank only the items from that most recent window.
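
A time-windowed variant could rank only the reviews from the most recent period. A minimal sketch, assuming a hypothetical `trending_movie_ids` function with a `window_days` parameter, made-up reviews, and that the newest review date stands in for "now":

```python
import pandas as pd


def trending_movie_ids(reviews, window_days=30, min_ratings=1):
    '''Hypothetical sketch: rank movies using only reviews inside a recent window.'''
    dates = pd.to_datetime(reviews['date'])
    # treat the newest review as "now"; a live system would use the current time
    cutoff = dates.max() - pd.Timedelta(days=window_days)
    recent = reviews.loc[dates > cutoff]

    grouped = recent.groupby('movie_id')
    stats = pd.DataFrame({
        'mean_rating': grouped['rating'].mean(),
        'rating_counts': grouped.size(),
    })
    stats = stats.loc[stats['rating_counts'] >= min_ratings]
    stats = stats.sort_values(['mean_rating', 'rating_counts'], ascending=False)
    return stats.index.tolist()


reviews_demo = pd.DataFrame({
    'movie_id': [1, 1, 2, 3],
    'rating': [10, 10, 7, 9],
    'date': ['2018-01-01', '2018-01-02', '2018-06-20', '2018-06-25'],
})
print(trending_movie_ids(reviews_demo, window_days=30))  # [3, 2]
```

Movie 1 has the best ratings overall, but they fall outside the 30-day window, so it does not appear.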

## Part II: Add Filters

To make the recommendations a bit more useful, we will add arguments that act as filters on the movie **year** and **genre**.
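
Both filters boil down to boolean masks on the ranked table. The sketch below uses made-up rows; it converts the requested years to integers because, depending on the pandas version, `isin(['2015'])` may not match an integer `2015` in a numeric `date` column:

```python
import pandas as pd

# Made-up rows shaped like the ranked movies table (genre indicators are 0/1)
ranked_demo = pd.DataFrame({
    'movie': ['Film A (2014)', 'Film B (2015)', 'Film C (2016)', 'Film D (2016)'],
    'date': [2014, 2015, 2016, 2016],
    'History': [1, 1, 0, 0],
    'War': [0, 0, 1, 0],
})

years = ['2015', '2016']      # years given as strings
genres = ['History', 'War']   # a movie passes if it has any of these genres

# Year mask: compare numbers on both sides
year_mask = pd.to_numeric(ranked_demo['date'], errors='coerce').isin([int(y) for y in years])

# Genre mask: at least one requested genre indicator is set
genre_mask = ranked_demo[genres].sum(axis=1) > 0

print(list(ranked_demo.loc[year_mask & genre_mask, 'movie']))  # ['Film B (2015)', 'Film C (2016)']
```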

```python
def make_popular_recommendations_with_filters(user_id, n_top, years=None, genres=None):
    '''
    Return n_top recommendations for a user according to the
    following criteria:

    1. Highest average rating
    2. With ties, most ratings
    3. With ties, most recent rating
    4. Minimum of 5 ratings to be considered at all
    5. Filtered by years and genres if provided

    ARGUMENTS:
        user_id: int, id of the user to make recommendations for
        n_top: int, number of recommendations to make
        years: list of str or int, release years to keep (None = no filter)
        genres: list of str, genres to keep; a movie passes if it has
                any of them (None = no filter)

    RETURNS:
        top_movies: list, recommended movies in order best to worst
    '''
    # call helper function
    ranked_movies = create_ranked_movies(reviews, movies)

    # apply filters (None or an empty list means "no filter")
    if years:
        # compare numerically so that '2015' and 2015 both match
        release_years = pd.to_numeric(ranked_movies['date'], errors='coerce')
        ranked_movies = ranked_movies.loc[release_years.isin([int(year) for year in years])]
    if genres:
        # keep movies flagged with at least one of the requested genres
        ranked_movies = ranked_movies.loc[ranked_movies[genres].sum(axis=1) > 0]

    top_movies = list(ranked_movies['movie'].head(n_top))

    return top_movies
```

```python
make_popular_recommendations_with_filters(1, 20, years=['2015', '2016', '2017', '2018'], genres=['History'])
```

```
['Ayla: The Daughter of War (2017)',
 'I Believe in Miracles (2015)',
 'The Farthest (2017)',
 'Sado (2015)',
 'Hatred (2016)',
 'Kincsem (2017)',
 'Nise - O Coração da Loucura (2015)',
 'LA 92 (2017)',
 'Straight Outta Compton (2015)',
 'Manjhi: The Mountain Man (2015)',
 'Only the Dead (2015)',
 'Spotlight (2015)',
 'Under sandet (2015)',
 'Airlift (2016)',
 'Dunkirk (2017)',
 'Taeksi Woonjunsa (2017)',
 'The Battleship Island (2017)',
 'Darkest Hour (2017)',
 'Best of Enemies (2015)',
 'The Ghazi Attack (2017)']
```

---