# Collaborative Filtering in Turi (formerly Dato, formerly GraphLab)

This tutorial explains methods of collaborative filtering for recommender systems using the GraphLab Create package (from the company Dato). Many of the examples are modified versions of the following basic tutorials:
- https://dato.com/learn/gallery/notebooks/basic_recommender_functionalities.html 
- https://dato.com/learn/gallery/notebooks/five_line_recommender.html

Furthermore, Dato has plenty of IPython notebook examples that go beyond recommendation systems, including classification, clustering, and graph analytics.
- https://dato.com/learn/gallery/index.html

## The five line recommendation system (user-item)
This example builds a recommendation system for movie ratings from the following dataset of users and movie ratings. It is explained in detail at https://dato.com/learn/gallery/notebooks/five_line_recommender.html. The example hides much of the available functionality and fine-tuning, but it is a good starting point.

The dataset in this example comes from ~330 users that have rated ~7700 movies (a total of ~82,000 ratings).

In [1]:
import time
t1 = time.clock()

In [2]:
# This is a well known graphlab example that builds a recommendation system in 5 lines of code

import graphlab as gl

data = gl.SFrame.read_csv("http://s3.amazonaws.com/dato-datasets/movie_ratings/training_data.csv", 
                          column_type_hints={"rating":int})
model = gl.recommender.create(data,
                              user_id="user",
                              item_id="movie",
                              target="rating")

results = model.recommend(users=None, k=10)
model.save("my_model")

results.head() # the recommendation output


[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1471544590.log



user,movie,score,rank
Jacob Smith,The Quiet American,4.63581560648,1
Jacob Smith,The King and I,4.55056975878,2
Jacob Smith,Dirty Dancing,4.50702403581,3
Jacob Smith,12 Angry Men,4.36494468248,4
Jacob Smith,The Right Stuff,4.27868865526,5
Jacob Smith,The Natural,4.17370484865,6
Jacob Smith,Roxanne,4.17145703829,7
Jacob Smith,Bridget Jones's Diary,4.12448237932,8
Jacob Smith,To Kill a Mockingbird,4.10783647096,9
Jacob Smith,One Flew Over the Cuckoo's Nest ...,4.1061456064,10


In the above model creation, we asked for the ten highest ranking items for each user (`k=10`). The head of the output shows one user, Jacob Smith, with his ten highest ranking items (that he has not rated).
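
As a rough illustration of what a call like `model.recommend(users=None, k=10)` returns for one user, the sketch below picks the `k` highest-scoring items that the user has not already rated. The function name `top_k_unrated`, the movie names, and the scores are all made up for this example; it is not GraphLab's implementation.

```python
import heapq

def top_k_unrated(scores, already_rated, k):
    """Return the k highest-scoring (item, score) pairs the user has not rated."""
    candidates = [(item, score) for item, score in scores.items()
                  if item not in already_rated]
    # nlargest returns the pairs sorted from highest to lowest score
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])

# Hypothetical predicted scores for a single user (illustrative numbers only)
predicted = {"Movie A": 4.6, "Movie B": 3.1, "Movie C": 4.9, "Movie D": 4.2}
seen = {"Movie C"}  # items this user has already rated

top2 = top_k_unrated(predicted, seen, k=2)
print(top2)
```

"Movie C" has the highest score but is excluded because the user already rated it, which mirrors the "that they have not rated" behavior described above.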
___

In [3]:
data.print_rows(num_rows=30)

+-------------+-------------------------------+--------+
|     user    |             movie             | rating |
+-------------+-------------------------------+--------+
| Jacob Smith |     Flirting with Disaster    |   4    |
| Jacob Smith |       Indecent Proposal       |   3    |
| Jacob Smith |         Runaway Bride         |   2    |
| Jacob Smith |     Swiss Family Robinson     |   1    |
| Jacob Smith |          The Mexican          |   2    |
| Jacob Smith |       Maid in Manhattan       |   4    |
| Jacob Smith | A Charlie Brown Thanksgivi... |   3    |
| Jacob Smith |             Brazil            |   1    |
| Jacob Smith |          Forrest Gump         |   3    |
| Jacob Smith |     It Happened One Night     |   4    |
| Jacob Smith |           Airplane!           |   3    |
| Jacob Smith |      The Wedding Planner      |   3    |
| Jacob Smith |     A League of Their Own     |   3    |
| Jacob Smith |           Swordfish           |   1    |
| Jacob Smith | Indiana Jones a

That's great! But we really do not know how good these results are yet, so let's keep moving. We will come back to this question later, using cross-validation.


## The item-item recommendation system
Now let's look at creating the item-item similarity matrix. That is, for each item, which items are closest to it based upon user ratings?

In [4]:
# from graphlab.recommender import item_similarity_recommender

item_item = gl.recommender.item_similarity_recommender.create(data, 
                                  user_id="user", 
                                  item_id="movie", 
                                  target="rating",
                                  only_top_k=3,
                                  similarity_type="cosine")

results = item_item.get_similar_items(k=3)
results.head()

movie,similar,score,rank
Flirting with Disaster,Martin Lawrence: You So Crazy ...,0.561863601208,1
Flirting with Disaster,Shadow Magic,0.535303354263,2
Flirting with Disaster,Seinfeld: Season 4,0.507150530815,3
Indecent Proposal,Cocktail,0.568772494793,1
Indecent Proposal,Beverly Hills Cop,0.516246914864,2
Indecent Proposal,Flatliners,0.513955056667,3
Runaway Bride,Notting Hill,0.613413572311,1
Runaway Bride,Sleepless in Seattle,0.60902172327,2
Runaway Bride,Maid in Manhattan,0.608688771725,3
Swiss Family Robinson,Armed and Dangerous,0.483493804932,1


The item-item matrix is typically a good baseline. However, we can do better with a more personalized system: one that takes into account the preferences of specific users, rather than only how all users rated specific items.
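
To make the `similarity_type="cosine"` setting used above more concrete, here is a minimal sketch of cosine similarity between two items, where each item is represented by the ratings users gave it. The helper name `cosine_similarity` and the ratings are hypothetical, and GraphLab's internal computation may differ in its details (for example, in how missing ratings are handled).

```python
import math

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two items, each given as a dict of user -> rating.

    A user missing from a dict is treated as a rating of 0 for that item.
    """
    shared_users = set(ratings_a) & set(ratings_b)
    dot = sum(ratings_a[u] * ratings_b[u] for u in shared_users)
    norm_a = math.sqrt(sum(r * r for r in ratings_a.values()))
    norm_b = math.sqrt(sum(r * r for r in ratings_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up ratings for two items
item_x = {"u1": 4, "u2": 3, "u3": 5}
item_y = {"u1": 4, "u2": 3}

sim = cosine_similarity(item_x, item_y)
print(round(sim, 4))
```

Items rated similarly by the same users score close to 1, while items with no users in common score 0.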
___
Moreover, we need to perform cross-validation on the dataset to see which model and model parameters actually generalize well with our dataset. That also means we need a set of evaluation criteria. The first and very common measure is the root mean squared error, RMSE. It takes into account the difference between the predicted rating and the actual rating of items. However, we can calculate it in a number of different aggregated ways (i.e., splits and aggregation). For instance, we could compute a single RMSE over every entry in the dataset. Or we could compute the RMSE for each user, or the RMSE for each item. It really depends on what we are most interested in (i.e., our business case). RMSE can be calculated in the following ways:

$$RMSE=\sqrt{\frac{1}{N}\sum_{i=1}^N (\hat{y}_i-y_i)^2}$$

Or we can calculate the RMSE for each user, U, in our data:

$$\underbrace{RMSE(U)}_{\text{user=U}}=\sqrt{\frac{1}{|U|}\sum_{u\in U} (\hat{y}_u-y_u)^2}$$

Or we can calculate the RMSE for each item, J, in our data:

$$\underbrace{RMSE(J)}_{\text{item=J}}=\sqrt{\frac{1}{|J|}\sum_{j\in J} (\hat{y}_j-y_j)^2}$$
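
The three variants above can be sketched in a few lines of plain Python. The rows below are invented `(user, item, predicted, actual)` tuples, and the helper names `rmse` and `rmse_by_key` are assumptions for this illustration rather than GraphLab functions.

```python
import math
from collections import defaultdict

def rmse(pairs):
    """RMSE over an iterable of (predicted, actual) pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / float(len(pairs)))

def rmse_by_key(rows, key_index):
    """Group rows by user (key_index=0) or item (key_index=1), then take RMSE per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append((row[2], row[3]))
    return {key: rmse(pairs) for key, pairs in groups.items()}

# Made-up rows: (user, item, predicted rating, actual rating)
rows = [
    ("ann", "m1", 4.0, 5.0),
    ("ann", "m2", 2.0, 2.0),
    ("bob", "m1", 3.0, 1.0),
]

overall = rmse((r[2], r[3]) for r in rows)   # one number for the whole dataset
per_user = rmse_by_key(rows, 0)              # one number per user
per_item = rmse_by_key(rows, 1)              # one number per item
print(overall, per_user, per_item)
```

Note that the per-user and per-item results are dictionaries with one entry per user or item, not a single number.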

It's important to understand that RMSE(U) and RMSE(J) are arrays of values, with one entry per unique user or per unique item, respectively. Therefore an approach that visualizes the distribution of values is a nice evaluation technique. It also means that statistical tests of the distributions can be used to evaluate the differences between models. That is, "Model A has statistically smaller (with 95% confidence) per-user RMSE than model B, therefore we conclude that model A has superior performance."
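
One possible way to run such a comparison is a paired t-test on the per-user RMSE values of two models evaluated on the same users. The sketch below computes only the t-statistic; the function name `paired_t_statistic` and the RMSE values are invented, and the test assumes the per-user differences are roughly normally distributed, which may not hold for real data.

```python
import math

def paired_t_statistic(errors_a, errors_b):
    """t-statistic for the mean of the paired differences (a - b)."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean = sum(diffs) / float(n)
    # sample variance of the differences (n - 1 in the denominator)
    variance = sum((d - mean) ** 2 for d in diffs) / float(n - 1)
    return mean / math.sqrt(variance / n)

# Made-up per-user RMSE values for two models, listed in the same user order
rmse_model_a = [1.0, 1.2, 0.9, 1.1]
rmse_model_b = [1.5, 1.6, 1.4, 1.7]

t_stat = paired_t_statistic(rmse_model_a, rmse_model_b)
print(round(t_stat, 3))
```

A strongly negative statistic suggests model A has lower per-user RMSE than model B. To turn it into a confidence statement, compare it against a t distribution with `n - 1` degrees of freedom.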


So let's now create a holdout set and see if we can judge the RMSE on a per-user and per-item basis:

In [5]:
train, test = gl.recommender.util.random_split_by_user(data,
                                                    user_id="user", item_id="movie",
                                                    max_num_users=100, item_test_proportion=0.2)
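
To give a feel for what a per-user holdout split does, here is a simplified sketch: it samples up to `max_num_users` users and moves roughly `item_test_proportion` of each sampled user's ratings into the test set. The function `split_by_user` and the toy data are assumptions for illustration only; GraphLab's `random_split_by_user` may sample differently.

```python
import random

def split_by_user(rows, max_num_users, item_test_proportion, seed=0):
    """Split (user, item, rating) rows into train and test lists.

    Only ratings from a random sample of users can land in the test set.
    """
    rng = random.Random(seed)
    users = sorted({user for user, _, _ in rows})
    test_users = set(rng.sample(users, min(max_num_users, len(users))))
    train, test = [], []
    for row in rows:
        if row[0] in test_users and rng.random() < item_test_proportion:
            test.append(row)
        else:
            train.append(row)
    return train, test

# Toy data: 5 users who each rated 10 movies
toy_rows = [("user%d" % u, "movie%d" % m, (u + m) % 5 + 1)
            for u in range(5) for m in range(10)]

toy_train, toy_test = split_by_user(toy_rows, max_num_users=2,
                                    item_test_proportion=0.2)
print(len(toy_train), len(toy_test))
```

Every user keeps most of their ratings in the training set, so the model can still learn something about the users it is later evaluated on.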

In [6]:
from IPython.display import display
from IPython.display import Image

gl.canvas.set_target('ipynb')


item_item = gl.recommender.item_similarity_recommender.create(train, 
                                  user_id="user", 
                                  item_id="movie", 
                                  target="rating",
                                  only_top_k=5,
                                  similarity_type="cosine")

rmse_results = item_item.evaluate(test)



Precision and recall summary statistics by cutoff
+--------+----------------+------------------+
| cutoff | mean_precision |   mean_recall    |
+--------+----------------+------------------+
|   1    |      0.18      | 0.00851461400757 |
|   2    |      0.17      | 0.0114813179455  |
|   3    |      0.19      | 0.0203175491454  |
|   4    |     0.1975     | 0.0275973558931  |
|   5    |     0.194      | 0.0327684686723  |
|   6    | 0.186666666667 | 0.0354018807673  |
|   7    | 0.194285714286 | 0.0450369208063  |
|   8    |    0.19625     | 0.0510507016015  |
|   9    | 0.195555555556 | 0.0550850767787  |
|   10   |     0.186      | 0.0609766860575  |
+--------+----------------+------------------+
[10 rows x 3 columns]

('\nOverall RMSE: ', 3.6337609774346276)

Per User RMSE (best)
+------------+-------+---------------+
|    user    | count |      rmse     |
+------------+-------+---------------+
| Zion Smith |  408  | 1.80170298286 |
+------------+-------+---------------+
[1 rows x 

In [7]:
print rmse_results.viewkeys()
print rmse_results['rmse_by_item']

dict_keys(['rmse_by_user', 'precision_recall_overall', 'rmse_by_item', 'precision_recall_by_user', 'rmse_overall'])
+---------------------------+-------+----------------+
|           movie           | count |      rmse      |
+---------------------------+-------+----------------+
|  Crimes and Misdemeanors  |   2   | 3.53526905472  |
|       Donnie Brasco       |   2   | 4.12310562562  |
|    Cast a Giant Shadow    |   1   |      2.0       |
|       Kiss the Girls      |   2   | 4.98431424915  |
|       Raising Helen       |   5   | 3.42696742672  |
|         The Hours         |   9   | 3.75414785635  |
| The Josephine Baker Story |   1   |      3.0       |
|      Step Into Liquid     |   1   |      3.0       |
| The Deep End of the Ocean |   2   |      3.0       |
|        See Spot Run       |   1   | 0.998634418919 |
+---------------------------+-------+----------------+
[2293 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_colum

In [8]:
rmse_results['rmse_by_user']

user,count,rmse
Jeremy Smith,14,3.79007923299
Beckett Smith,60,3.50345788455
Martin Smith,17,3.99576273485
Gunner Smith,18,4.04970247332
Jesse Smith,13,3.68648775734
Miguel Smith,18,3.63792937328
Quinn Smith,24,3.99839927175
Jaden Smith,117,3.94638843264
Landon Smith,25,4.03798296576
Leo Smith,23,3.75143678111


___
Another evaluation criterion is the per-user recall or the per-user precision. These values are typically small, and they are most meaningful for users with a large number of ratings. The idea behind them is: given a number of highly rated items for a user, how many of them did my model also recommend? This is inherently difficult to calculate because the user has not rated every item in the dataset. We may have found 10 items that the user would have chosen and rated highly, but if the user never rated them, we can't be sure how good we are at recommending them.

Even still, it's a good measure of how well you are ranking the items that are most important to the user (assuming the user rated items they had strong opinions about). It's not perfect, but it's the best we have to work with.

We define the per-user measures as follows: Let $p_k$ be a vector of the $k$ highest ranked recommendations for a particular user and let $a$ be the set of all positively rated items for that user in the test set.

The per-user-recall for k-items is given by:

$$R(k)=\frac{|a \cap p_k|}{|a|} $$

Which means, intuitively, "of all the items rated positively by the user, how many did your recommender find?"

The per-user-precision for k-items is given by:

$$P(k)=\frac{|a \cap p_k|}{k} $$

Which means, intuitively, "of the k items found by your recommender, how many were rated positively by the user?"
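
Both definitions can be checked with a tiny example. In the sketch below, `recommended` plays the role of $p_k$ (already sorted from best to worst) and `relevant` plays the role of $a$; the function name and the item identifiers are hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision, recall) for the top-k recommendations of one user."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))      # |a intersect p_k|
    precision = hits / float(k)                 # divide by k
    recall = hits / float(len(relevant)) if relevant else 0.0  # divide by |a|
    return precision, recall

# Made-up ranked recommendations and positively rated test items for one user
recommended = ["m1", "m2", "m3", "m4", "m5"]
relevant = {"m2", "m5", "m9", "m10"}

p5, r5 = precision_recall_at_k(recommended, relevant, k=5)
print(p5, r5)
```

Here two of the five recommended items are relevant (precision 0.4), and two of the four relevant items were found (recall 0.5).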

These, like per-user RMSE, are arrays with one entry per unique user in the dataset (for each cutoff $k$). Therefore statistical comparisons can be performed to find superior performing models.

In [9]:
rmse_results['precision_recall_by_user']

user,cutoff,precision,recall,count
Abel Smith,1,0.0,0.0,18
Abel Smith,2,0.0,0.0,18
Abel Smith,3,0.0,0.0,18
Abel Smith,4,0.0,0.0,18
Abel Smith,5,0.2,0.0555555555556,18
Abel Smith,6,0.166666666667,0.0555555555556,18
Abel Smith,7,0.285714285714,0.111111111111,18
Abel Smith,8,0.25,0.111111111111,18
Abel Smith,9,0.222222222222,0.111111111111,18
Abel Smith,10,0.2,0.111111111111,18


In [10]:
import graphlab.aggregate as agg

# we will be using these aggregations
agg_list = [agg.AVG('precision'),agg.STD('precision'),agg.AVG('recall'),agg.STD('recall')]

# apply these functions to each group (we group the results by 'cutoff', which is k)
# the cutoff is the number of top items to consider; see the following URL for the actual equation
# https://dato.com/products/create/docs/generated/graphlab.recommender.util.precision_recall_by_user.html#graphlab.recommender.util.precision_recall_by_user
rmse_results['precision_recall_by_user'].groupby('cutoff',agg_list)

# the groups are not sorted

cutoff,Avg of precision,Stdv of precision,Avg of recall,Stdv of recall
16,0.176875,0.123445015594,0.0838804019881,0.0939339501043
10,0.186,0.140014284985,0.0609766860575,0.0901642364686
36,0.138611111111,0.0877087457103,0.14469117791,0.133601225073
26,0.153076923077,0.0962276639725,0.114415246807,0.0968612648945
41,0.135365853659,0.0865338218274,0.159647424369,0.136063553312
3,0.19,0.24154594686,0.0203175491454,0.0684283822744
1,0.18,0.384187454246,0.00851461400757,0.0359141305364
6,0.186666666667,0.173653550368,0.0354018807673,0.0710089212206
11,0.187272727273,0.135950890501,0.0647554440593,0.0900496466592
2,0.17,0.284780617318,0.0114813179455,0.0365112180182


Wow... these results are not so great: the overall RMSE is about 3.6, and the mean recall at a cutoff of 10 is only about 6%. Let's try something a little different and see if the results get better. Let's start with a factorization model that learns latent factors for the user-item matrix.

___
## Cross Validated Collaborative Filtering

In [11]:
rec1 = gl.recommender.ranking_factorization_recommender.create(train, 
                                  user_id="user", 
                                  item_id="movie", 
                                  target="rating")

rmse_results = rec1.evaluate(test)


Precision and recall summary statistics by cutoff
+--------+----------------+------------------+
| cutoff | mean_precision |   mean_recall    |
+--------+----------------+------------------+
|   1    |      0.13      | 0.00261458161114 |
|   2    |      0.14      | 0.00569137637914 |
|   3    |      0.12      | 0.00717878646319 |
|   4    |      0.11      | 0.00915596239202 |
|   5    |     0.126      | 0.0145845778316  |
|   6    | 0.116666666667 | 0.0160943436254  |
|   7    |      0.11      | 0.0172493274775  |
|   8    |    0.11125     | 0.0195782486182  |
|   9    | 0.106666666667 | 0.0210495681217  |
|   10   |      0.11      |  0.024477149498  |
+--------+----------------+------------------+
[10 rows x 3 columns]

('\nOverall RMSE: ', 1.7572980510590277)

Per User RMSE (best)
+---------------+-------+----------------+
|      user     | count |      rmse      |
+---------------+-------+----------------+
| Jameson Smith |   62  | 0.811753055463 |
+---------------+-------+--------

In [12]:
rmse_results['precision_recall_by_user'].groupby('cutoff',[agg.AVG('precision'),agg.STD('precision'),agg.AVG('recall'),agg.STD('recall')])

cutoff,Avg of precision,Stdv of precision,Avg of recall,Stdv of recall
16,0.104375,0.109706640068,0.0375237486936,0.0391917446547
10,0.11,0.127671453348,0.024477149498,0.0306734737167
36,0.0986111111111,0.0836452495829,0.0784485231464,0.0621388694159
26,0.101923076923,0.0915635065932,0.0568924450449,0.0502400939593
41,0.0992682926829,0.0804696949323,0.0928185595535,0.0651805982905
3,0.12,0.208273324691,0.00717878646319,0.0142121887874
1,0.13,0.336303434416,0.00261458161114,0.00812788804905
6,0.116666666667,0.153659074288,0.0160943436254,0.0267915539182
11,0.106363636364,0.122680125624,0.0258224441816,0.0317012636664
2,0.14,0.255734237051,0.00569137637914,0.0124600471543


___
Okay, so the RMSE is getting better (about 1.76 versus 3.63 for the item-item model), although precision and recall are lower. We might need to tweak the recommender by regularizing...
Remember that we need to come up with a good estimate of the latent factors, and we need the resulting matrix to be a good estimate of the given ratings. We can control some of the parameters using regularization constants and by increasing or decreasing the number of latent factors.
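
To illustrate how the number of latent factors and the regularization constant enter the picture, here is a minimal matrix factorization sketch trained with stochastic gradient descent. It is not GraphLab's algorithm: it omits the bias (linear) terms and the ranking objective, and the names `train_mf`, `learning_rate`, and the toy ratings are all assumptions made for this example.

```python
import math
import random

def predict(P, Q, user, item):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[user], Q[item]))

def train_mf(ratings, num_factors=2, regularization=0.01,
             learning_rate=0.05, epochs=200, seed=0):
    """Fit latent factors to (user, item, rating) triples with plain SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # small random starting values for every latent factor
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(num_factors)] for u in users}
    Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(num_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(num_factors):
                pu, qi = P[u][f], Q[i][f]
                # gradient step on squared error plus an L2 penalty on the factors
                P[u][f] += learning_rate * (err * qi - regularization * pu)
                Q[i][f] += learning_rate * (err * pu - regularization * qi)
    return P, Q

def train_rmse(ratings, P, Q):
    squared = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(squared / float(len(ratings)))

# Made-up ratings for three users and three movies
toy_ratings = [("ann", "m1", 5), ("ann", "m2", 3), ("bob", "m1", 4),
               ("bob", "m3", 1), ("cat", "m2", 2), ("cat", "m3", 5)]

P0, Q0 = train_mf(toy_ratings, epochs=0)     # untrained factors
P1, Q1 = train_mf(toy_ratings, epochs=200)   # trained factors
rmse_before = train_rmse(toy_ratings, P0, Q0)
rmse_after = train_rmse(toy_ratings, P1, Q1)
print(rmse_before, rmse_after)
```

A larger `regularization` value pulls the factors toward zero, which trades training accuracy for (hopefully) better generalization, while `num_factors` controls how much structure the model can represent.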

In [13]:
rec1 = gl.recommender.ranking_factorization_recommender.create(train, 
                                  user_id="user", 
                                  item_id="movie", 
                                  target="rating",
                                  num_factors=16,                 # override the default value
                                  regularization=1e-02,           # override the default value
                                  linear_regularization = 1e-3)   # override the default value

rmse_results = rec1.evaluate(test)


Precision and recall summary statistics by cutoff
+--------+----------------+------------------+
| cutoff | mean_precision |   mean_recall    |
+--------+----------------+------------------+
|   1    |      0.2       | 0.00335050849505 |
|   2    |      0.15      | 0.00535421397352 |
|   3    |      0.14      | 0.00835695729776 |
|   4    |     0.1375     | 0.0105764583694  |
|   5    |      0.13      | 0.0127913553766  |
|   6    | 0.123333333333 | 0.0143110906788  |
|   7    | 0.122857142857 | 0.0182911532674  |
|   8    |    0.12125     | 0.0204477467519  |
|   9    | 0.123333333333 | 0.0232173307084  |
|   10   |     0.114      |  0.026891933883  |
+--------+----------------+------------------+
[10 rows x 3 columns]

('\nOverall RMSE: ', 1.03482141431452)

Per User RMSE (best)
+--------------+-------+----------------+
|     user     | count |      rmse      |
+--------------+-------+----------------+
| Andres Smith |   3   | 0.354793218297 |
+--------------+-------+---------------

## Is this better than the item-item matrix?

In [14]:
comparison = gl.recommender.util.compare_models(test, [item_item, rec1])

PROGRESS: Evaluate model M0

Precision and recall summary statistics by cutoff
+--------+----------------+------------------+
| cutoff | mean_precision |   mean_recall    |
+--------+----------------+------------------+
|   1    |      0.18      | 0.00851461400757 |
|   2    |      0.17      | 0.0114813179455  |
|   3    |      0.19      | 0.0203175491454  |
|   4    |     0.1975     | 0.0275973558931  |
|   5    |     0.194      | 0.0327684686723  |
|   6    | 0.186666666667 | 0.0354018807673  |
|   7    | 0.194285714286 | 0.0450369208063  |
|   8    |    0.19625     | 0.0510507016015  |
|   9    | 0.195555555556 | 0.0550850767787  |
|   10   |     0.186      | 0.0609766860575  |
+--------+----------------+------------------+
[10 rows x 3 columns]

('\nOverall RMSE: ', 3.6337609774346276)

Per User RMSE (best)
+------------+-------+---------------+
|    user    | count |      rmse     |
+------------+-------+---------------+
| Zion Smith |  408  | 1.80170298286 |
+------------+-------

In [15]:
comparisonstruct = gl.compare(test, [item_item, rec1])

PROGRESS: Evaluate model M0

Precision and recall summary statistics by cutoff
+--------+----------------+------------------+
| cutoff | mean_precision |   mean_recall    |
+--------+----------------+------------------+
|   1    |      0.18      | 0.00851461400757 |
|   2    |      0.17      | 0.0114813179455  |
|   3    |      0.19      | 0.0203175491454  |
|   4    |     0.1975     | 0.0275973558931  |
|   5    |     0.194      | 0.0327684686723  |
|   6    | 0.186666666667 | 0.0354018807673  |
|   7    | 0.194285714286 | 0.0450369208063  |
|   8    |    0.19625     | 0.0510507016015  |
|   9    | 0.195555555556 | 0.0550850767787  |
|   10   |     0.186      | 0.0609766860575  |
+--------+----------------+------------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M1

Precision and recall summary statistics by cutoff
+--------+----------------+------------------+
| cutoff | mean_precision |   mean_recall    |
+--------+----------------+------------------+
|   1    |      0.2 

In [16]:
gl.show_comparison(comparisonstruct, [item_item, rec1])

## Parameters, Parameters
There are so many parameters to search through here. It would be great if there was something we could do to change the parameters automatically and search for the best ones...
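
One way to picture an automatic search is as an enumeration of every combination of candidate values. The sketch below expands a dictionary of parameters into a list of concrete settings; the helper `expand_grid` and the second regularization value are hypothetical, and this does not reproduce whatever search strategy GraphLab uses internally.

```python
import itertools

def expand_grid(params):
    """Expand a dict whose values may be lists into one dict per combination."""
    keys = sorted(params)
    value_lists = [params[key] if isinstance(params[key], list) else [params[key]]
                   for key in keys]
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]

grid = expand_grid({
    "num_factors": [8, 12, 16, 24, 32],
    "regularization": [0.001, 0.01],   # 0.01 is an extra, made-up candidate
    "target": "rating",                # fixed values are passed through unchanged
})

print(len(grid))
print(grid[0])
```

Each dictionary in `grid` could then be used to train and evaluate one model, keeping the setting with the best validation score.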

In [17]:
params = {'user_id': 'user', 
          'item_id': 'movie', 
          'target': 'rating',
          'num_factors': [8, 12, 16, 24, 32], 
          'regularization':[0.001] ,
          'linear_regularization': [0.001]}

job = gl.model_parameter_search.create( (train,test),
        gl.recommender.ranking_factorization_recommender.create,
        params,
        max_models=5,
        environment=None)

# also note that this evaluator also supports sklearn
# https://dato.com/products/create/docs/generated/graphlab.toolkits.model_parameter_search.create.html?highlight=model_parameter_search

[INFO] graphlab.deploy.job: Validating job.
[INFO] graphlab.deploy.job: Creating a LocalAsync environment called 'async'.
[INFO] graphlab.deploy.map_job: Validation complete. Job: 'Model-Parameter-Search-Aug-18-2016-11-23-3500000' ready for execution
[INFO] graphlab.deploy.map_job: Job: 'Model-Parameter-Search-Aug-18-2016-11-23-3500000' scheduled.
[INFO] graphlab.deploy.job: Validating job.
[INFO] graphlab.deploy.map_job: A job with name 'Model-Parameter-Search-Aug-18-2016-11-23-3500000' already exists. Renaming the job to 'Model-Parameter-Search-Aug-18-2016-11-23-3500000-58ac0'.
[INFO] graphlab.deploy.map_job: Validation complete. Job: 'Model-Parameter-Search-Aug-18-2016-11-23-3500000-58ac0' ready for execution
[INFO] graphlab.deploy.map_job: Job: 'Model-Parameter-Search-Aug-18-2016-11-23-3500000-58ac0' scheduled.


In [18]:
job.get_status()

{'Canceled': 0, 'Completed': 0, 'Failed': 0, 'Pending': 5, 'Running': 0}

In [19]:
job_result = job.get_results()

job_result.head()

model_id,item_id,linear_regularization,max_iterations,num_factors,num_sampled_negative_examples,ranking_regularization
1,movie,0.001,50,12,4,0.5
0,movie,0.001,25,16,4,0.25
3,movie,0.001,25,8,4,0.5
2,movie,0.001,25,8,4,0.25
4,movie,0.001,25,12,4,0.5

regularization,target,user_id,training_precision@5,training_recall@5,training_rmse,validation_precision@5
0.001,rating,user,0.343113772455,0.00876171469214,1.13907027191,0.13
0.001,rating,user,0.343113772455,0.00876171469214,1.02105115609,0.13
0.001,rating,user,0.343113772455,0.00876171469214,1.13902979422,0.132
0.001,rating,user,0.343113772455,0.00876171469214,1.02118091173,0.132
0.001,rating,user,0.343113772455,0.00876171469214,1.13889296885,0.126

validation_recall@5,validation_rmse
0.0130459949096,1.14344825771
0.012831653314,1.03253336032
0.0128029974813,1.1433598247
0.0129745104568,1.03259481326
0.0121888593529,1.14308176356


In [20]:
bst_prms = job.get_best_params()
bst_prms

{'item_id': 'movie',
 'linear_regularization': 0.001,
 'max_iterations': 25,
 'num_factors': 16,
 'num_sampled_negative_examples': 4,
 'ranking_regularization': 0.25,
 'regularization': 0.001,
 'target': 'rating',
 'user_id': 'user'}

In [21]:
models = job.get_models()
models

[Class                            : RankingFactorizationRecommender
 
 Schema
 ------
 User ID                          : user
 Item ID                          : movie
 Target                           : rating
 Additional observation features  : 0
 User side features               : []
 Item side features               : []
 
 Statistics
 ----------
 Number of observations           : 77252
 Number of users                  : 334
 Number of items                  : 7474
 
 Training summary
 ----------------
 Training time                    : 6.4111
 
 Model Parameters
 ----------------
 Model class                      : RankingFactorizationRecommender
 num_factors                      : 16
 binary_target                    : 0
 side_data_factorization          : 1
 solver                           : auto
 nmf                              : 0
 max_iterations                   : 25
 
 Regularization Settings
 -----------------------
 regularization                   : 0.001
 regulari

In [22]:
comparisonstruct = gl.compare(test,models)
gl.show_comparison(comparisonstruct,models)

PROGRESS: Evaluate model M0

Precision and recall summary statistics by cutoff
+--------+----------------+------------------+
| cutoff | mean_precision |   mean_recall    |
+--------+----------------+------------------+
|   1    |      0.19      | 0.00315443006367 |
|   2    |     0.165      | 0.00613410527787 |
|   3    | 0.146666666667 | 0.00854842735221 |
|   4    |      0.14      | 0.0110378456412  |
|   5    |      0.13      |  0.012831653314  |
|   6    |      0.13      | 0.0157209181741  |
|   7    |      0.12      | 0.0178958427382  |
|   8    |    0.11625     | 0.0203375370751  |
|   9    | 0.113333333333 | 0.0214753336532  |
|   10   |     0.111      | 0.0231788934624  |
+--------+----------------+------------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M1

Precision and recall summary statistics by cutoff
+--------+----------------+------------------+
| cutoff | mean_precision |   mean_recall    |
+--------+----------------+------------------+
|   1    |      0.19

In [23]:
models[2]

Class                            : RankingFactorizationRecommender

Schema
------
User ID                          : user
Item ID                          : movie
Target                           : rating
Additional observation features  : 0
User side features               : []
Item side features               : []

Statistics
----------
Number of observations           : 77252
Number of users                  : 334
Number of items                  : 7474

Training summary
----------------
Training time                    : 5.0798

Model Parameters
----------------
Model class                      : RankingFactorizationRecommender
num_factors                      : 8
binary_target                    : 0
side_data_factorization          : 1
solver                           : auto
nmf                              : 0
max_iterations                   : 25

Regularization Settings
-----------------------
regularization                   : 0.001
regularization_type              : normal
li

In [24]:
t2 = time.clock()
print
print('notebook ran in: %.2f seconds' % (t2-t1))


notebook ran in: 101.42 seconds
