


An implementation of online mini-batch learning for prediction in Julia.


A Learner is fit by repeated calls to update!(l::Learner, x::DSMat{Float64}, y::Vector{Float64}) on mini-batches (x, y) of a dataset. Each update incrementally optimizes a loss function. Which loss function is optimized depends on the concrete subtype of Learner. The actual optimization routine is implemented by an AbstractSGD object.

Values of the outcome are predicted with predict(l::Learner, x::DSMat{Float64}). The predict!(obj::Learner, pr::Vector{Float64}, x::DSMat{Float64}) method calculates predictions in place.

Features (x) can be either a dense or a sparse matrix. (DSMat{T} is an alias for DenseMatrix{T} or SparseMatrixCSC{T, Ti <: Integer}.)
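The update/predict pattern described above can be sketched with a small self-contained stand-in. This is not the package's code: ToyLearner, toy_update!, toy_predict, toy_predict! and the fixed stepsize keyword are hypothetical names made up for illustration, and the loss is assumed to be mean squared error.

```julia
using LinearAlgebra, Random

# Hypothetical stand-in for a learner: it only stores a coefficient vector.
struct ToyLearner
    coef::Vector{Float64}
end
ToyLearner(p::Int) = ToyLearner(zeros(p))

# One gradient step on the mean squared-error loss of a single mini-batch.
function toy_update!(l::ToyLearner, x::AbstractMatrix{Float64}, y::Vector{Float64};
                     stepsize::Float64 = 0.1)
    resid = x * l.coef - y              # residuals on this mini-batch
    grad = x' * resid / length(y)       # gradient of the mean of resid^2 / 2
    l.coef .-= stepsize .* grad         # update the coefficients in place
    return l
end

# Allocating prediction.
toy_predict(l::ToyLearner, x::AbstractMatrix{Float64}) = x * l.coef

# In-place prediction: writes into the preallocated vector pr.
function toy_predict!(l::ToyLearner, pr::Vector{Float64}, x::AbstractMatrix{Float64})
    mul!(pr, x, l.coef)
    return pr
end

# Simulated, noise-free data so the fit can be checked against the truth.
rng = MersenneTwister(1)
x = randn(rng, 1000, 3)
beta = [1.0, -2.0, 0.5]
y = x * beta

# Fit by repeated updates on mini-batches of 100 rows, 50 passes over the data.
l = ToyLearner(3)
for epoch in 1:50, start in 1:100:1000
    idx = start:(start + 99)
    toy_update!(l, x[idx, :], y[idx])
end

pr = zeros(size(x, 1))
toy_predict!(l, pr, x)
```

The in-place variant avoids allocating a new prediction vector on every call, which matters when predictions are computed once per mini-batch.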

Available learners

  • GLMLearner(m::GLMModel, optimizer::AbstractSGD) - GLMs without regularization.
  • GLMNetLearner(m::GLMModel, optimizer::AbstractSGD, lambda1 = 0.0, lambda2 = 0.0) - GLMs with l_1 and l_2 regularization.
  • SVMLearner - support vector machine, not fully implemented

The type of GLM is specified by GLMModel. Choices are:

  • LinearModel() for least squares
  • LogisticModel() for logistic regression
  • QuantileModel(tau=0.5) for tau-quantile regression.
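As a rough guide to what each model choice implies for the optimizer, the per-observation derivative of a commonly used loss with respect to the linear predictor eta can be sketched as follows. The function names are hypothetical, and the exact loss scaling used inside the package is an assumption here, not something stated in this README.

```julia
# Hypothetical helpers; each returns d(loss)/d(eta) for one observation,
# where eta is the linear predictor x' * coef.

# Least squares, assuming loss (eta - y)^2 / 2.
linear_grad(eta, y) = eta - y

# Logistic regression, assuming the negative Bernoulli log-likelihood
# with y in {0, 1}; the prediction is the logistic function of eta.
logistic_grad(eta, y) = 1 / (1 + exp(-eta)) - y

# tau-quantile regression, assuming the pinball (check) loss; this is a
# subgradient because the loss has a kink at eta == y.
quantile_grad(eta, y; tau = 0.5) = (y < eta ? 1.0 : 0.0) - tau
```

With tau = 0.5 the quantile subgradient is +0.5 or -0.5 depending only on the sign of the residual, which corresponds to median regression.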


All of the learners require an optimizer of some sort. Currently, stochastic gradient descent type methods are provided by the AbstractSGD type.

An AbstractSGD implements an update!(obj::AbstractSGD{Float64}, weights::Vector{Float64}, gr::Vector{Float64}) method. This takes the current value of the weight (coefficient) vector and the gradient, and updates the weight vector in place. The AbstractSGD instance stores tuning parameters and step information, and may hold additional storage if necessary.

Available optimizers:

  • SimpleSGD(alpha1::Float64, alpha2::Float64) - Standard SGD where step size is alpha1/(1.0 + alpha1 * alpha2 * t).
  • AdaDelta(rho::Float64, eps::Float64) - Implementation of Algorithm 1 here.
  • AdaGrad(eta::Float64) - Step size for weight j is eta / [sqrt(sum of grad_j^2 up to t) + 1.0e-8]. Paper.
  • AveragedSGD(alpha1::Float64, alpha2::Float64, t0::Int) - Described in section 5.3 here with step size alpha1/(1.0 + alpha1 * alpha2 * t)^(3/4)
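The step-size rules listed above can be written out as a minimal, self-contained sketch. All names below (simple_step, averaged_step, ToyAdaGrad, ToyAdaDelta and the two update functions) are hypothetical and are not the package's types. The AdaDelta part assumes that the linked "Algorithm 1" is the one from Zeiler's ADADELTA paper, and the iterate averaging performed by AveragedSGD is not shown.

```julia
# Step sizes at iteration t, as given in the list above.
simple_step(alpha1, alpha2, t) = alpha1 / (1.0 + alpha1 * alpha2 * t)
averaged_step(alpha1, alpha2, t) = alpha1 / (1.0 + alpha1 * alpha2 * t)^(3 / 4)

# AdaGrad-style state: running sum of squared gradients per weight.
struct ToyAdaGrad
    eta::Float64
    sumsq::Vector{Float64}
end
ToyAdaGrad(eta::Float64, p::Int) = ToyAdaGrad(eta, zeros(p))

# Per-weight step size eta / (sqrt(sum of grad_j^2 up to t) + 1.0e-8).
function adagrad_update!(opt::ToyAdaGrad, weights::Vector{Float64}, gr::Vector{Float64})
    opt.sumsq .+= gr .^ 2
    weights .-= opt.eta .* gr ./ (sqrt.(opt.sumsq) .+ 1.0e-8)
    return weights
end

# AdaDelta-style state (assumed to follow Zeiler's Algorithm 1):
# decaying averages of squared gradients and of squared updates.
struct ToyAdaDelta
    rho::Float64
    eps::Float64
    avg_sqgrad::Vector{Float64}
    avg_sqdelta::Vector{Float64}
end
ToyAdaDelta(rho::Float64, eps::Float64, p::Int) = ToyAdaDelta(rho, eps, zeros(p), zeros(p))

function adadelta_update!(opt::ToyAdaDelta, weights::Vector{Float64}, gr::Vector{Float64})
    for j in eachindex(weights)
        # Accumulate the squared gradient.
        opt.avg_sqgrad[j] = opt.rho * opt.avg_sqgrad[j] + (1 - opt.rho) * gr[j]^2
        # Update = -(RMS of past updates / RMS of gradients) * gradient.
        delta = -sqrt(opt.avg_sqdelta[j] + opt.eps) / sqrt(opt.avg_sqgrad[j] + opt.eps) * gr[j]
        # Accumulate the squared update, then apply it in place.
        opt.avg_sqdelta[j] = opt.rho * opt.avg_sqdelta[j] + (1 - opt.rho) * delta^2
        weights[j] += delta
    end
    return weights
end

# Usage: one update of each kind on a small weight vector.
w_adagrad = [1.0, 1.0]
adagrad_update!(ToyAdaGrad(0.1, 2), w_adagrad, [2.0, -4.0])

w_adadelta = [1.0]
adadelta_update!(ToyAdaDelta(0.95, 1.0e-6, 1), w_adadelta, [2.0])
```

Note that the first AdaGrad step moves every weight by roughly eta regardless of the gradient's magnitude, since the accumulated sum then contains only that one squared gradient.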


This is a work in progress. Most testing has been in simulations and not with real data. GLMLearner and GLMNetLearner with l_2 regularization seem to work pretty well. GLMNetLearner with l_1 regularization has not been thoroughly tested. Statistical performance tends to be pretty sensitive to the choice of optimizer and tuning parameters.


  • Everything is implemented in terms of Float64. Should allow for Float32 as well.
  • Finish the SVM implementation, perhaps add Pegasos implementation
  • Automatic transformations of features
  • More useful interfaces/DataFrames interface
  • More checking of data
  • Automatic bounding for predictions
  • Remove GLMLearner in favor of GLMNetLearner
  • Better docs
  • Eliminate extra memory allocation