# Top N recommendations

Before we can optimize our recommendations with our diversity method, we need a list of pre-selected recommendations. We generate 100 movie recommendations per user using the SVD algorithm.

In this notebook, the recommendations are created and then stored in a CSV file. Two layouts are shown: all recommendations stored as a list in one column, or every recommendation stored in a separate column.

## Import Packages

In [8]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

### Surprise Library

In [9]:
from surprise import Dataset
from surprise import Reader

from surprise import SVD
from surprise import accuracy
from surprise import KNNBaseline

from surprise.model_selection import cross_validate
from surprise.model_selection import train_test_split
from surprise.model_selection import GridSearchCV

RSEED = 42

#### Import Data

In [10]:
movies = pd.read_csv('../data/ml-latest-small/movies.csv')
ratings = pd.read_csv('../data/ml-latest-small/ratings.csv')

In [11]:
ratings['rating'].describe()
ratings.movieId.nunique()

9724

In [12]:
df = pd.read_csv('../data/df_features.csv')
movieIds = df.movieId.to_list()

len(movieIds)

9543

In [13]:
ratings = ratings[ratings['movieId'].isin(movieIds)]
ratings.movieId.nunique()

9525

#### Define the Reader and load the data frame into data (here: userId, movieId and rating columns)

In [16]:
reader = Reader(rating_scale=(0.5,5))
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

#### Function for Top N Recommendations

In [17]:
# ORIGINAL
# from the surprise documentation
# with the extension of a recommendation dictionary

from collections import defaultdict


def get_top_n(predictions, n=10):
    """Return the top-N recommendations for each user from a set of predictions.

    Args:
        predictions(list of Prediction objects): The list of predictions, as
            returned by the test method of an algorithm.
        n(int): The number of recommendations to output for each user. Default
            is 10.

    Returns:
    A dict where keys are user (raw) ids and values are lists of tuples:
        [(raw item id, rating estimation), ...] of size n.
    """

    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est)) # collect (movieId, estimated rating) pairs

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n
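The grouping and ranking step above can also be sketched without sorting every user's full candidate list. The following is a minimal variant, not the notebook's method: it swaps the full sort for `heapq.nlargest`. The `Prediction` namedtuple is a stand-in with the same five fields as Surprise's prediction tuples so that the sketch runs without the library, and `get_top_n_heap` and the toy values are hypothetical.

```python
import heapq
from collections import defaultdict, namedtuple

# Stand-in for a Surprise prediction (uid, iid, r_ui, est, details);
# defined here only so the sketch runs on its own.
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])


def get_top_n_heap(predictions, n=10):
    """Group predictions by user and keep the n highest estimates per user."""
    by_user = defaultdict(list)
    for uid, iid, _true_r, est, _details in predictions:
        by_user[uid].append((iid, est))
    # heapq.nlargest returns the n best pairs, highest estimate first.
    return {
        uid: heapq.nlargest(n, user_ratings, key=lambda pair: pair[1])
        for uid, user_ratings in by_user.items()
    }


# Toy data (hypothetical ids and estimates).
toy_predictions = [
    Prediction(1, 10, None, 4.2, {}),
    Prediction(1, 20, None, 4.9, {}),
    Prediction(1, 30, None, 3.1, {}),
    Prediction(2, 10, None, 2.5, {}),
    Prediction(2, 30, None, 4.0, {}),
]
toy_top = get_top_n_heap(toy_predictions, n=2)
print(toy_top)  # {1: [(20, 4.9), (10, 4.2)], 2: [(30, 4.0), (10, 2.5)]}
```

With roughly 100 recommendations kept out of several thousand candidates per user, this avoids a full sort, but the result has the same shape as the dictionary returned by `get_top_n`.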

## Recommendations with SVD (Singular Value Decomposition)

In [18]:
# First train an SVD algorithm on the movielens dataset.
trainset = data.build_full_trainset()

algo = SVD()

algo.fit(trainset)

# Then predict ratings for all pairs (u, i) that are NOT in the training set.
testset = trainset.build_anti_testset()
predictions = algo.test(testset)

# generate the top 100 recommendations
top_n = get_top_n(predictions, n=100)

`top_n` is a dictionary with one entry per user:
+ key: user ID
+ value: list of tuples (movieId, predicted rating), sorted by predicted rating in descending order

In [19]:
# for example
top_n[1][0] # user 1 recommendation 1 (movieId, prediction)

(318, 5)

In [20]:
# if you want to look at the titles of the recommended movies
def get_recomm_for_user(user):
    for i,j in enumerate(top_n[user]):
        movie = movies[movies['movieId']==j[0]].title
        #print('{}. Movie: {}'.format(i+1, movie.iloc[0])) # activate print statement to see output

get_recomm_for_user(1)
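The title lookup above filters the whole `movies` frame once per recommendation. A minimal sketch of an alternative is shown below, assuming a frame with `movieId` and `title` columns as in `movies.csv`: it builds a Series indexed by `movieId` once and returns the titles instead of printing them. The names `toy_movies`, `toy_top_n` and `titles_for_user`, and the placeholder titles, are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the notebook's `movies` frame and `top_n` dictionary.
toy_movies = pd.DataFrame({
    "movieId": [10, 20, 30],
    "title": ["Movie A (2000)", "Movie B (2001)", "Movie C (2002)"],
})
toy_top_n = {1: [(20, 4.9), (10, 4.2)]}

# Build the lookup once: a Series mapping movieId -> title.
title_by_id = toy_movies.set_index("movieId")["title"]


def titles_for_user(user, top_n, title_by_id):
    """Return (rank, title, estimate) tuples for one user's recommendations."""
    return [
        (rank, title_by_id[movie_id], est)
        for rank, (movie_id, est) in enumerate(top_n[user], start=1)
    ]


rows = titles_for_user(1, toy_top_n, title_by_id)
for rank, title, est in rows:
    print(f"{rank}. {title} (predicted rating {est:.2f})")
```

Returning the rows rather than printing inside the function makes it easier to reuse the result, for example to display only the first few titles.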

## Save recommendations as CSV files

Make two dictionaries, e.g.:
1. user : [1st recommendation, 2nd, 3rd, ...]
2. user : [[1st recommendation, 2nd, 3rd, ...]]
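The two layouts listed above can be sketched in one helper that iterates over the keys of the `top_n` dictionary rather than a fixed range of user ids. This is an illustrative variant under the assumption that `top_n` maps each user to a list of `(movieId, estimate)` tuples; `build_recommendation_dicts` and the toy data are hypothetical names.

```python
def build_recommendation_dicts(top_n, n=100):
    """Return (flat, nested) dictionaries of movie ids as strings."""
    flat, nested = {}, {}
    for user in sorted(top_n):
        # Slicing never raises, even if a user has fewer than n candidates.
        movie_ids = [str(movie_id) for movie_id, _est in top_n[user][:n]]
        flat[user] = movie_ids            # layout 1: one id per column
        nested[user] = [list(movie_ids)]  # layout 2: one column with the list
    return flat, nested


# Toy data (hypothetical ids and estimates).
toy_top_n = {2: [(30, 4.0), (10, 2.5)], 1: [(20, 4.9)]}
flat, nested = build_recommendation_dicts(toy_top_n, n=2)
print(flat)    # {1: ['20'], 2: ['30', '10']}
print(nested)  # {1: [['20']], 2: [['30', '10']]}
```

Iterating over the dictionary keys avoids a `KeyError`-free but empty entry for user ids that are missing from the data, and slicing avoids an `IndexError` when a user has fewer than `n` candidates.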

In [21]:
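recommendations1 = {}
for i in range(1, 611) : # user ids in this dataset run from 1 to 610
    l = []
    for j in range(100) : # the 100 recommendations per user
        reco = top_n[i][j][0] # movieId of the j-th recommendation
        l.append(str(reco))
    recommendations1[i] = l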

In [22]:
recommendations2 = {}
for i in range(1, 611) :
    l = []
    for j in range(100) :
        reco = top_n[i][j][0]
        l.append(str(reco))
    recommendations2[i] = [l] # wrap the list so it ends up in a single column

Make dataframes:

In [23]:
top_recomm1 = pd.DataFrame.from_dict(recommendations1, orient='index')

In [24]:
top_recomm2 = pd.DataFrame.from_dict(recommendations2, orient='index')

The first one has the userId as index and one column per movie recommendation:

In [25]:
top_recomm1.head(2)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,90,91,92,93,94,95,96,97,98,99
1,318,48516,58559,914,1272,3949,215,898,904,912,...,6787,3347,1235,44555,994,4973,44195,89904,1199,3681
2,1276,7153,912,356,5618,260,3836,3435,5952,898,...,6993,3147,1266,5013,1212,2160,1242,1641,48774,5902


The second one has a single column that holds the list of all recommendations:

In [26]:
top_recomm2.head(2)

Unnamed: 0,0
1,"[318, 48516, 58559, 914, 1272, 3949, 215, 898,..."
2,"[1276, 7153, 912, 356, 5618, 260, 3836, 3435, ..."


Export as csv files:

recommendations1:
+ index - userId - every recommendation in a separate column:

In [27]:
# export the data to a CSV file
top_recomm1.to_csv('../data/recommendations1.csv', index_label='userId')

recommendations2:
+ index - userId - one column with a list of all recommendations:

In [28]:
top_recomm2.to_csv('../data/recommendations2.csv',index=True, index_label='userId', header=['recommendations'])
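When the list layout is written to CSV, each list is stored as its string representation, so it has to be parsed again after loading. The sketch below shows one way to do that with `ast.literal_eval`, using an in-memory buffer instead of the real file. The toy frame only mirrors the column layout of `recommendations2.csv`; its values and the variable names are hypothetical.

```python
import ast
import io

import pandas as pd

# Toy frame with the same layout: userId index, one column of string-id lists.
toy = pd.DataFrame(
    {"recommendations": [["318", "914"], ["1276", "356"]]},
    index=pd.Index([1, 2], name="userId"),
)

# Round trip through CSV text, as to_csv / read_csv would do with a file.
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)
loaded = pd.read_csv(buffer, index_col="userId")

# After loading, each cell is a plain string such as "['318', '914']".
loaded["recommendations"] = loaded["recommendations"].apply(ast.literal_eval)

first = loaded.loc[1, "recommendations"]
print(first)                              # ['318', '914']
print([int(movie_id) for movie_id in first])  # [318, 914]
```

The layout with one recommendation per column needs no such parsing step, which may be a reason to prefer it when the file is read by other tools.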

## Extra
### A closer look at the data:

In [29]:
# get all ratings for a certain user:

user_1 = ratings.query('userId == 1')
user_1.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


In [30]:
# put all the recommended movies for that user in a list:

recomm_movies = []
for i in top_n[1] :
    recomm_movies.append(i[0])

#recomm_movies

In [31]:
# check if all recommended movies are new to the user:

user_1[user_1['movieId'].isin(recomm_movies)]

Unnamed: 0,userId,movieId,rating,timestamp


### If the output above shows no rows, every recommended movie is new to the user!
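An empty result is expected here, because the anti-testset only contains user-item pairs that are not in the training set. The same check can be sketched as a set intersection that returns the overlapping ids directly. `already_rated` and the toy id lists are hypothetical.

```python
def already_rated(recommended_ids, rated_ids):
    """Return the recommended movie ids the user has already rated, sorted."""
    return sorted(set(recommended_ids) & set(rated_ids))


# Toy data (hypothetical ids).
toy_rated = [1, 3, 6, 47, 50]
print(already_rated([318, 914, 1272], toy_rated))  # []
print(already_rated([318, 47, 3], toy_rated))      # [3, 47]
```

An empty list means that every recommended movie is new to the user.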

In [32]:
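# get the count of ratings for every recommended movie
# movie_rat_count is not defined earlier in this notebook; assumed here to be
# the number of ratings per movieId (columns: movieId, rating)
movie_rat_count = ratings.groupby('movieId')['rating'].count().reset_index()

for i in recomm_movies :
    num = movie_rat_count[movie_rat_count['movieId'] == i].reset_index()
    num = num.loc[0, 'rating']
    #print(i, num) # activate print statement to see output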

### You can see how many ratings each recommended movie received in total.

### Below, you can set a threshold (here: 10) to see how many of the recommended movies have at most that many ratings and how many have more:

In [33]:
# put all the recommended movies for user 414 in a list:

recomm_movies = []
for i in top_n[414] :
    recomm_movies.append(i[0])

#recomm_movies

In [34]:
below = []
above = []

for i in recomm_movies :
    num = movie_rat_count[movie_rat_count['movieId'] == i].reset_index()
    num = num.loc[0, 'rating']
    if num <= 10 :
        below.append((i, num))
        #print(num)
    else :
        above.append((i, num))

print('10 or less ratings: ', len(below))
print('More than 10 ratings: ', len(above))


10 or less ratings:  41
More than 10 ratings:  59
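The threshold split above can also be sketched with a `collections.Counter`, which counts the ratings per movie in one pass and returns 0 for movies without any rating instead of raising an error. This is a minimal sketch, not the notebook's code; `split_by_rating_count` and the toy data are hypothetical.

```python
from collections import Counter


def split_by_rating_count(recommended_ids, rated_movie_ids, threshold=10):
    """Split recommended movies by how often they appear in rated_movie_ids."""
    counts = Counter(rated_movie_ids)  # movieId -> number of ratings
    below, above = [], []
    for movie_id in recommended_ids:
        num = counts[movie_id]  # 0 if the movie has no ratings at all
        (below if num <= threshold else above).append((movie_id, num))
    return below, above


# Toy data: movie 10 has 3 ratings, movie 20 has 12, movie 30 has 10.
toy_rated_movie_ids = [10] * 3 + [20] * 12 + [30] * 10
below, above = split_by_rating_count([10, 20, 30, 40], toy_rated_movie_ids)
print(below)  # [(10, 3), (30, 10), (40, 0)]
print(above)  # [(20, 12)]
```

In the notebook, the second argument would presumably be the `movieId` column of `ratings`, and the first the `recomm_movies` list.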
