In [558]:
import graphlab as gl
import numpy as np

In [559]:
# make some fake ratings that exactly match a 1-factor model

u_vec = np.random.randn(8,1)
i_vec = np.random.randn(1,8)

ratings = u_vec.dot(i_vec)

In [561]:
# include about half of the full ratings matrix

sf = gl.SFrame({'user_id': [j for i in range(8) for j in range(i)],
                      'item_id': [i for i in range(8) for j in range(i)],
                      'rating': [ratings[i,j] for i in range(8) for j in range(i)]})

Regularization is negligible (setting it to 0 gets similar results).  2000 iterations gives us plenty of time, and the undocumented attribute "sgd_convergence_threshold" lets us not stop early.

In [524]:
rec = gl.recommender.factorization_recommender.create(sf, target='rating',
                                                      num_factors=1,
                                                     solver='sgd',
                                                     max_iterations=2000,
                                                     regularization=1e-12,
                                                     linear_regularization=1e-12,
                                                     sgd_convergence_threshold=1e-10)

Below, I verify the 'Final training RMSE' number printed above.

The final RMSE is non-negligible.

Why can't we reconstruct the matrix perfectly?

In [525]:
np.sqrt(np.mean(np.array(rec.predict(sf) - sf['rating'])**2))

0.012927500581845585

The problem seems to be the fast decay of the learning rate.

For instance, we can get a better RMSE if we set the initial learning rate  to 1 (a higher value) so that we get bigger steps overall.

In [526]:
rec = gl.recommender.factorization_recommender.create(sf, target='rating',
                                                      num_factors=1,
                                                     solver='sgd',
                                                     max_iterations=2000,
                                                     regularization=1e-12,
                                                     linear_regularization=1e-12,
                                                     sgd_convergence_threshold=1e-10,
                                                     sgd_step_size=1)

In [527]:
np.sqrt(np.mean(np.array(rec.predict(sf) - sf['rating'])**2))

0.0034754172065354591

There are also some undocumented attributes that appear to give us some control over the learning rate decay.  For instance:

In [528]:
rec.step_size_decrease_rate

0.75

This attribute can range from 0.5 to 1.  Let's set it to 1:

In [529]:
rec = gl.recommender.factorization_recommender.create(sf, target='rating',
                                                      num_factors=1,
                                                     solver='sgd',
                                                     max_iterations=2000,
                                                     regularization=1e-12,
                                                     linear_regularization=1e-12,
                                                     sgd_convergence_threshold=1e-10,
                                                     sgd_step_size=1,
                                                     step_size_decrease_rate=1)

Note in the "Step size" column that the step size is now very close to the (1/t) where t is the iteration number.  Further tests with "step_size_decrease_rate" show that it equals "n" in the equation

learning rate = init_rate / (lambda \* t)^n

with lambda apparently fixed at a value close to 1.  (If there's a way to control lambda, I can't find it.)

To make the learning rate decay as slow as possible, let's try n=0.5.  I've also pumped up the number of iterations to 5000 -- we can see that the RMSE is converging to a nonzero value.

In [536]:
rec = gl.recommender.factorization_recommender.create(sf, target='rating',
                                                      num_factors=1,
                                                     solver='sgd',
                                                     max_iterations=5000,
                                                     regularization=1e-12,
                                                     linear_regularization=1e-12,
                                                     sgd_convergence_threshold=1e-10,
                                                     sgd_step_size=1)

In [537]:
np.sqrt(np.mean(np.array(rec.predict(sf) - sf['rating'])**2))

0.0074831990302293358

Is this the best we can do?  Let's compare it to another recommender library, Nicolas Hug's scikit-surprise.

In [538]:
from surprise import SVD, Dataset, Reader, evaluate

In [539]:
sf['user_id','item_id','rating'].export_csv('sf.csv',header=True,delimiter='\t')

In [540]:
reader = Reader(line_format='user item rating', sep='\t', skip_lines=1)

In [541]:
surprise_sf = Dataset.load_from_file('sf.csv', reader=reader)
surprise_sf_train = surprise_sf.build_full_trainset()
surprise_sf_test = surprise_sf_train.build_testset()

The hyperparameters I set below are the same as the ones I used for graphlab's recommender, except I set the learning rate ("lr_all") to 0.1.  Surprise is just doing vanilla SGD with a fixed learning rate.

In [552]:
svd = SVD(n_factors=1, reg_all=1e-12, n_epochs=5000, lr_all=1e-1)
svd.train(surprise_sf_train)

In [553]:
preds = [svd.predict(s[0], s[1], r_ui=s[2], clip=False) for s in surprise_sf_test]
#preds
np.sqrt(np.mean(np.array([p.est - p.r_ui for p in preds])**2))

3.4100570591393541e-07

Surprise's RMSE is between 5 and 6 orders of magnitude better:

In [557]:
rmse_gl = np.sqrt(np.mean(np.array(rec.predict(sf) - sf['rating'])**2))
rmse_surprise = np.sqrt(np.mean(np.array([p.est - p.r_ui for p in preds])**2))

rmse_surprise / rmse_gl

4.5569509047721358e-05

Note that this is not just a case of bad performance in a constructed toy problem.  I originally looked into the above because Surprise outperformed graphlab's recommender a great deal on some real data I was using, and for the same apparent reason.