Student: Davide Sbetti - 14032

# Graphlab

First, we import the GraphLab library, together with pandas, which is used to export the final predictions.

In [35]:
import graphlab as gl
import pandas as pd

We now read the movie ratings dataset using GraphLab's own reader, which avoids later conversions between different types of objects.

In [36]:
movies = gl.SFrame.read_csv('data/train-PDA2019.csv')
print(movies.head())

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,int,int,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


+--------+--------+--------+-----------+
| userID | itemID | rating | timeStamp |
+--------+--------+--------+-----------+
|   5    |  648   |   5    | 978297876 |
|   5    |  1394  |   5    | 978298237 |
|   5    |  3534  |   5    | 978297149 |
|   5    |  104   |   4    | 978298558 |
|   5    |  2735  |   5    | 978297919 |
|   5    |  3868  |   3    | 978298561 |
|   5    |  1079  |   5    | 978298384 |
|   5    |  2997  |   3    | 978298214 |
|   5    |  1615  |   5    | 978297755 |
|   5    |  1291  |   4    | 978297692 |
+--------+--------+--------+-----------+
[10 rows x 4 columns]



We now read the users test file to get the list of users we are interested in and for whom we will generate predictions.

In [38]:
users_test = pd.read_csv("data/test-PDA2019.csv")
users_test.head()

   userID recommended_itemIDs
0       1
1       3
2      11
3      29
4      31


We now extract the column with the user IDs, in order to easily access them

In [39]:
users = users_test.loc[:,'userID']
users.head()

0     1
1     3
2    11
3    29
4    31
Name: userID, dtype: int64

We now build the recommender system using GraphLab's general-purpose create function, which automatically selects an appropriate model based on the training data and trains it.

In [37]:
model = gl.recommender.create(movies, 'userID', 'itemID', 'rating')

We can now generate recommendations from the movies each user has not yet rated. By default, GraphLab's recommend method returns the top 10 items for a user, ordered by decreasing predicted score. We write the resulting space-separated string into the pandas data frame, so that the final predictions can be exported easily at the end.

In [40]:
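for j, user in enumerate(users):
    # Top-10 recommendations for this user (k=10 by default)
    user_predictions = model.recommend([user])
    # Join the recommended item IDs into one space-separated string
    rec_string = " ".join(str(item) for item in user_predictions['itemID'])
    # Store the string (with its leading space) in the row of this user
    users_test.loc[j, 'recommended_itemIDs'] = " " + rec_string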
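
To make the logic of this step more concrete, here is a minimal sketch in plain Python of what "top k unseen items, ordered by decreasing score" amounts to. It is not GraphLab's internal implementation: the function names `top_k_unseen` and `format_recommendations` are hypothetical, and the scores in `toy_scores` are made-up values used only for illustration.

```python
def top_k_unseen(scores, seen, k=10):
    """Return the k highest-scoring item IDs that are not in `seen`."""
    candidates = [(item, score) for item, score in scores.items() if item not in seen]
    # Sort by score descending; break ties by item ID for a deterministic order
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in candidates[:k]]


def format_recommendations(item_ids):
    """Join item IDs into a single space-separated string."""
    return " ".join(str(item) for item in item_ids)


# Hypothetical toy data: predicted scores per item, and the items one user already rated
toy_scores = {318: 4.9, 260: 4.8, 1198: 4.7, 858: 4.6}
toy_seen = {318}

top_items = top_k_unseen(toy_scores, toy_seen, k=3)
rec_string = format_recommendations(top_items)
print(rec_string)  # 260 1198 858
```

Item 318 has the highest score but is skipped because the toy user has already rated it, which mirrors the default behaviour described above.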

With all predictions in place, we can now use pandas to export the complete data frame to a CSV file that can be submitted to the Kaggle platform for evaluation.

In [41]:
users_test.to_csv(path_or_buf='generated/graphlab_recommendations.csv', index=False, header=True, sep=',')
print(users_test.head())

   userID                             recommended_itemIDs
0       1   318 260 1198 858 2762 2858 2028 1196 593 1197
1       3    318 260 1198 858 2762 2858 2028 593 1197 608
2      11   318 260 1198 858 2762 2858 1196 2028 593 1197
3      29   260 1198 858 2762 2858 1196 2028 593 1197 608
4      31   318 260 1198 858 2762 2858 1196 2028 593 1197
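
Before submitting, it can be useful to sanity-check the exported file. The following is a minimal sketch using only the Python standard library, assuming the file has the two columns shown above and that each user should receive exactly 10 distinct item IDs. The helper `check_submission` is hypothetical, and the row for user 99 is an invented, deliberately short row added only to show a failing case.

```python
import csv
import io


def check_submission(csv_text, expected_k=10):
    """Return the user IDs whose row does not hold exactly `expected_k` distinct item IDs."""
    bad_users = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # split() also discards the leading space written during the prediction step
        items = row["recommended_itemIDs"].split()
        if len(items) != expected_k or len(set(items)) != expected_k:
            bad_users.append(row["userID"])
    return bad_users


# Two rows copied from the output above, plus one invented short row (user 99)
sample = (
    "userID,recommended_itemIDs\n"
    "1, 318 260 1198 858 2762 2858 2028 1196 593 1197\n"
    "3, 318 260 1198 858 2762 2858 2028 593 1197 608\n"
    "99, 318 260\n"
)
print(check_submission(sample))  # ['99']
```

To check the real file, the text could be read from 'generated/graphlab_recommendations.csv' and passed to the same function.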
