Implementation of Rendle et al. (2009), Bayesian Personalized Ranking for matrix factorization.
Pkg.clone("https://github.com/jsams/BPR.jl.git")
using BPR
# generate some data; the values are unimportant, only zero versus nonzero
# item x user matrix
using SparseArrays  # sprand lives here on Julia >= 0.7
X = sprand(3000, 4000, 0.05)
# by creating an iterator from the data, it can be re-used for other runs
biter = BPR.BPR_iter(X)
@time bpr = BPR.bpr(biter, 10, 0.01, 0.01, 0.01, 0.01; tol=0.01, max_iters=10)
# but could also run straight from the matrix
@time bpr = BPR.bpr(X, 10, 0.01, 0.01, 0.01, 0.01; tol=0.01, max_iters=10)
# did it converge
bpr.converged
# what tolerance was achieved
bpr.value
# what was the BPR-OPT criterion in the last run
bpr.bpr_opt
# how well do we predict in a hold out sample
bpr.auc_outsample
# matrix of predicted rankings
bpr.W' * bpr.H
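For example, the score matrix can be turned into top-N recommendations for a single user by ranking that user's column. `topn_for_user` below is a hypothetical helper, not part of BPR.jl; it assumes the score matrix is items x users, matching the training matrix `X`:

```julia
# Hypothetical helper: top-n unseen items for one user, given a
# predicted score matrix (items x users). Not part of BPR.jl.
function topn_for_user(scores::AbstractMatrix, X, user::Integer, n::Integer=10)
    s = collect(scores[:, user])            # predicted score for every item
    seen = findall(!iszero, X[:, user])     # items the user already has
    s[seen] .= -Inf                         # never recommend observed items
    return partialsortperm(s, 1:n, rev=true)  # indices of the n best items
end

# usage sketch: scores = bpr.W' * bpr.H; topn_for_user(scores, X, 7)
```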
To figure out hyperparameters (number of dimensions, regularization strengths, and learning rate), the function grid_search can help. It takes the data and a vector of values for each parameter to search over, constructs the grid of all those points, and runs the algorithm sample_count times for each grid point, drawing a new hold-out sample for each run. It returns a DataFrame whose columns are the convergence properties and run settings (minus the resulting parameterization). It is built to run in parallel, so starting Julia with -p N will run on N separate worker processes.
griddf = grid_search(X, sample_count=2; max_iters=100)
It is up to you to analyze griddf to decide whether you need to refine the grid search or can select the optimal hyperparameters.
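A minimal sketch of such an analysis, assuming griddf has columns named `converged` and `auc_outsample` (check `names(griddf)` for the actual columns grid_search returns in your version):

```julia
using DataFrames

# Assumed column names: keep converged runs and sort by out-of-sample
# AUC, best first.
best = sort(griddf[griddf.converged, :], :auc_outsample, rev=true)
first(best, 5)  # the five most promising hyperparameter settings
```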
- Uniform sampling may not be exactly uniform here; see the paper for exactly what the sampling should be uniform over.
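For context, Rendle et al. (2009) draw training triples (user, positive item, negative item) uniformly with replacement over all positive entries; sampling a user first and then one of their items instead overweights users with few positives. A rough sketch of the uniform scheme, illustrative only and independent of BPR.jl's internals:

```julia
using SparseArrays

# Sketch of uniform triple sampling as in Rendle et al. (2009):
# draw one positive (item, user) entry uniformly, then rejection-sample
# a negative item for that user. Not BPR.jl's actual sampler.
function sample_triple(X::SparseMatrixCSC)   # X is items x users
    items, users, _ = findnz(X)
    k = rand(1:length(items))                # uniform over positive entries
    u, i = users[k], items[k]
    j = rand(1:size(X, 1))
    while X[j, u] != 0                       # reject items the user already has
        j = rand(1:size(X, 1))
    end
    return (u, i, j)
end
```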