### Recommendations with MovieTweetings: Most Popular Recommendation

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Read in the datasets
movies = pd.read_csv('data/movies_clean.csv')
reviews = pd.read_csv('data/reviews_clean.csv')

#### 1. How To Find The Most Popular Movies

For this notebook, we have a single task: no matter who the user is, we need to provide a list of recommendations based simply on the most popular items.

For this task, we will consider what is "most popular" based on the following criteria:

* The movie with the highest average rating is considered best
* When average ratings are tied, the movie with more ratings ranks higher
* A movie must have a minimum of 5 ratings to be considered among the best movies
* If movies are tied in both average rating and number of ratings, the movie with the most recent rating ranks higher

With these criteria, the goal for this notebook is to take a **user_id** and provide back the **n_top** recommendations.  Use the function below as the scaffolding that will be used for all the future recommendations as well.
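To make the ordering concrete, here is a minimal sketch of the criteria applied to a made-up reviews table. The column names (`movie_id`, `rating`, `date`) mirror the ones used in this notebook, but `toy_reviews` and its values are invented for illustration, and the named-aggregation style is just one possible way to compute the statistics.

```python
import pandas as pd

# Toy data (hypothetical): movies 1, 2 and 3 all average 9.0.
# Movie 3 has the most ratings; movie 2 has a more recent rating than movie 1.
# Movie 4 averages 10.0 but has only 2 ratings, so it should be excluded.
toy_reviews = pd.DataFrame({
    'movie_id': [1] * 5 + [2] * 5 + [3] * 6 + [4] * 2,
    'rating': [9] * 16 + [10] * 2,
    'date': pd.to_datetime(
        ['2020-01-01'] * 5
        + ['2020-01-01'] * 4 + ['2020-06-01']
        + ['2019-01-01'] * 6
        + ['2021-01-01'] * 2),
})

# One row per movie with the three ranking statistics
stats = toy_reviews.groupby('movie_id').agg(
    mean_rating=('rating', 'mean'),
    no_of_ratings=('rating', 'count'),
    last_review_date=('date', 'max'))

# Require at least 5 ratings, then sort by each criterion in turn
ranked = (stats[stats['no_of_ratings'] >= 5]
          .sort_values(['mean_rating', 'no_of_ratings', 'last_review_date'],
                       ascending=False)
          .reset_index())

toy_order = ranked['movie_id'].tolist()
print(toy_order)  # [3, 2, 1]
```

Movie 3 wins the tie on number of ratings, movie 2 beats movie 1 on recency, and movie 4 never appears despite its perfect average.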

In [22]:
def create_ranked_df(movies, reviews):
    '''
    INPUT
    movies - the movies dataframe
    reviews - the reviews dataframe

    OUTPUT
    ranked_movies - a dataframe with movies that are sorted by highest avg rating, more reviews, 
                    then time, and must have more than 4 ratings
    '''

    # Number of ratings for each movie
    no_of_ratings = reviews.groupby('movie_id')['rating'].count()
    no_of_ratings.rename("no_of_ratings", inplace=True)

    # Average rating for each movie
    mean_rating = reviews.groupby('movie_id')['rating'].mean()
    mean_rating.rename("mean_rating", inplace=True)

    # Last reviewed date for each movie
    last_review_date = reviews.groupby('movie_id')['date'].max()
    last_review_date.rename("last_review_date", inplace=True)

    # Combine the three Series into one dataframe
    ranked_movies = pd.concat(
        [no_of_ratings, mean_rating, last_review_date], axis=1)

    # Keep only movies with at least 5 ratings
    ranked_movies = ranked_movies[ranked_movies['no_of_ratings'] > 4]

    # Sort by average rating, then number of ratings, then most recent rating.
    # Reassigning (rather than sorting in place) avoids modifying a filtered slice.
    ranked_movies = ranked_movies.sort_values(
        ['mean_rating', 'no_of_ratings', 'last_review_date'], ascending=False)

    # Reset index
    ranked_movies.reset_index(inplace=True)

    # Merge with movies data
    ranked_movies = ranked_movies.merge(movies, how='left', on='movie_id')

    return ranked_movies

In [21]:
def popular_recommendations(user_id, n_top, ranked_movies):
    '''
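    INPUT:
    user_id - the user_id of the individual you are making recommendations for
              (unused here, since popularity-based results are the same for every user)
    n_top - an integer of the number of recommendations you want back
    ranked_movies - a pandas dataframe of the already ranked movies based on avg rating, count, and time

    OUTPUT:
    top_movies - a list of the n_top recommended movies by movie title in order best to worst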
    '''
    top_movies = list(ranked_movies['movie'][:n_top])

    return top_movies  # a list of the n_top movies as recommended

Using the criteria above, you should be able to put together the functions above.  If you feel confident in your solution, check the results of your function against our solution.  On the next page, you can see a walkthrough, and you can of course get the solution by looking at the solution notebook available in this workspace.

In [43]:
# Create ranked movies dataframe
ranked_movies = create_ranked_df(movies, reviews)

In [26]:
# Top 20 movies recommended for id 1
recs_20_for_1 = popular_recommendations('1', 20, ranked_movies)
recs_20_for_1

['MSG 2 the Messenger (2015)',
 'Avengers: Age of Ultron Parody (2015)',
 'Five Minutes (2017)',
 'Selam (2013)',
 'Let There Be Light (2017)',
 "Quiet Riot: Well Now You're Here, There's No Way Back (2014)",
 'Crawl Bitch Crawl (2012)',
 'Chasing Happiness (2019)',
 'Make Like a Dog (2015)',
 'Pandorica (2016)',
 'Third Contact (2011)',
 'Romeo Juliet (2009)',
 'Be Somebody (2016)',
 'Birlesen Gonuller (2014)',
 'Kitbull (2019)',
 'Agnelli (2017)',
 'Sátántangó (1994)',
 'Foster (2011)',
 'CM101MMXI Fundamentals (2013)',
 'Crystal Lake Memories: The Complete History of Friday the 13th (2013)']

**Notice:** This wasn't the only way we could have determined the "top rated" movies.  You can imagine that in keeping track of trending news or trending social events, you would likely want to create a time window from the current time, and then pull the articles in the most recent time frame.  There are always going to be some subjective decisions to be made.  
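As a rough sketch of the time-window idea, the function below ranks movies using only the reviews that fall inside a recent window. The function name `recent_popular`, its parameters, and the toy data are all hypothetical. It also assumes the `date` column already holds datetimes (a column read from CSV may first need `pd.to_datetime`).

```python
import pandas as pd

def recent_popular(reviews, now, window_days=30, min_ratings=2):
    """Rank movies using only reviews from the last `window_days` days before `now`."""
    cutoff = now - pd.Timedelta(days=window_days)
    recent = reviews[reviews['date'] >= cutoff]

    stats = recent.groupby('movie_id').agg(
        mean_rating=('rating', 'mean'),
        no_of_ratings=('rating', 'count'))

    # Same style of threshold as before, applied within the window only
    stats = stats[stats['no_of_ratings'] >= min_ratings]
    return stats.sort_values(['mean_rating', 'no_of_ratings'], ascending=False)

# Toy data (hypothetical): movie 1 is rated highest overall, but all of its reviews are old
toy = pd.DataFrame({
    'movie_id': [1, 1, 2, 2, 3, 3],
    'rating': [10, 10, 7, 8, 9, 9],
    'date': pd.to_datetime(['2019-01-01', '2019-01-02', '2020-05-20',
                            '2020-05-25', '2020-05-28', '2020-05-30']),
})

trending = recent_popular(toy, now=pd.Timestamp('2020-06-01'))
trending_ids = trending.index.tolist()
print(trending_ids)  # [3, 2]
```

The window length and the minimum number of ratings are exactly the kind of subjective choices mentioned above.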

If you find that no one is paying any attention to your most popular recommendations, then it might be time to find a new way to recommend, which is what the next parts of the lesson should prepare us to do!


### Part II: Adding Filters

Now that you have created a function to give back the **n_top** movies, let's make it a bit more robust.  Add arguments that will act as filters for the movie **year** and **genre**.  

Use the cells below to adjust your existing function to allow for **year** and **genre** arguments as **lists** of **strings**.  The final results should then be filtered to only movies whose year and genre appear in the provided lists (the values within each list act as `or` conditions).  If no list is provided, no filter should be applied.

You can adjust other inputs as necessary to retrieve the final results you are looking for!

Try writing a few tests that compare your function against our test function.  The call below returns the top 20 movies for user 1 based on the specified year and genre filters.  Does yours return the same?

```
t.popular_recs_filtered('1', 20, ranked_movies, years=['2015', '2016', '2017', '2018'], genres=['History'])
```
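If the test module is not at hand, a few assertions on a tiny hand-made frame can also check the filter logic. The sketch below is only an illustration: `filter_ranked` is a hypothetical stand-in for your own function, and `toy_ranked` assumes the layout used in this notebook, namely a numeric year in `date` and one 0/1 indicator column per genre.

```python
import pandas as pd

def filter_ranked(ranked, n_top, years=None, genres=None):
    """Hypothetical stand-in for the filtered recommender, used only for these tests."""
    if years:
        # Accept years as strings or ints
        ranked = ranked[ranked['date'].isin([int(y) for y in years])]
    if genres:
        # Keep rows flagged with at least one requested genre
        ranked = ranked[ranked[genres].sum(axis=1) > 0]
    return ranked['movie'].head(n_top).tolist()

# Toy frame, already in ranked order (values are made up)
toy_ranked = pd.DataFrame({
    'movie': ['A (2015)', 'B (2016)', 'C (2016)', 'D (2018)'],
    'date': [2015, 2016, 2016, 2018],
    'History': [1, 0, 1, 0],
    'Drama': [0, 1, 0, 0],
})

# No filters: the ranked order is returned unchanged
assert filter_ranked(toy_ranked, 10) == ['A (2015)', 'B (2016)', 'C (2016)', 'D (2018)']
# Year filter only
assert filter_ranked(toy_ranked, 10, years=['2016']) == ['B (2016)', 'C (2016)']
# Genres in the list act as `or` conditions
assert filter_ranked(toy_ranked, 10, genres=['History', 'Drama']) == ['A (2015)', 'B (2016)', 'C (2016)']
# Both filters together, truncated to n_top
assert filter_ranked(toy_ranked, 1, years=[2016, 2018], genres=['History']) == ['C (2016)']
```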

In [55]:
def filtered_recommendations(user_id, n_top, ranked_movies, years=None, genres=None):
    '''
    INPUT:
    user_id - the user_id (str) of the individual you are making recommendations for
    n_top - an integer of the number of recommendations you want back
    ranked_movies - a pandas dataframe of the already ranked movies based on avg rating, count, and time
    years - a list of movie years, given as strings or ints (e.g. '2015' or 2015)
    genres - a list of strings with genres of movies

    OUTPUT:
    top_movies - a list of the n_top recommended movies by movie title in order best to worst
    '''

    if years:
        # Accept years as strings or ints; assumes the 'date' column holds numeric years
        ranked_movies = ranked_movies[ranked_movies['date'].isin([int(y) for y in years])]

    if genres:
        # Keep movies flagged with at least one of the requested genres
        ranked_movies = ranked_movies[ranked_movies[genres].sum(axis=1) > 0]

    top_movies = list(ranked_movies['movie'][:n_top])

    return top_movies  # a list of the n_top movies as recommended

In [56]:
filtered_recommendations('1', 20, ranked_movies,
                         years=[2015, 2016, 2017, 2018], genres=['History'])

["Hillary's America: The Secret History of the Democratic Party (2016)",
 'I Believe in Miracles (2015)',
 'O.J.: Made in America (2016)',
 'Ayla: The Daughter of War (2017)',
 'Hacksaw Ridge (2016)',
 'They Shall Not Grow Old (2018)',
 'Namhansanseong (2017)',
 'The Farthest (2017)',
 'Kono sekai no katasumi ni (2016)',
 'Sado (2015)',
 'Silicon Cowboys (2016)',
 '13th (2016)',
 'Ethel & Ernest (2016)',
 'Paul, Apostle of Christ (2018)',
 'Kincsem (2017)',
 'LA 92 (2017)',
 'Straight Outta Compton (2015)',
 'Nise - O Coração da Loucura (2015)',
 'Under sandet (2015)',
 'Only the Dead (2015)']