Consider the following ridge regression problem:

$$\text{minimize } \ \frac{1}{2} \sum_{i=1}^n (\theta^T x_i - y_i)^2 + \frac{\lambda}{2} \|\theta\|^2$$

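This problem is a special case of a more general family of problems called *regularized empirical risk minimization*, where the objective function usually consists of two parts: a set of *loss terms* and a *regularization term*. Here the loss terms are the squared errors $\frac{1}{2}(\theta^T x_i - y_i)^2$, and the regularization term is $\frac{\lambda}{2} \|\theta\|^2$.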
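To make the two-part structure concrete, here is a minimal sketch of the ridge objective and its gradient in plain Julia. The function names (`loss_terms`, `regularizer`, `ridge_objective`, `ridge_gradient`) are hypothetical and are not part of SGDOptim; the sketch assumes `X` is a d×n matrix with one sample per column.

```julia
# Hypothetical helpers, not SGDOptim API.
# Assumption: X is d×n with one sample per column, y has length n.

# Loss terms: ½ Σ (θ'x_i - y_i)²
loss_terms(θ, X, y) = 0.5 * sum(abs2, X' * θ .- y)

# Regularization term: (λ/2) ‖θ‖²
regularizer(θ, λ) = 0.5 * λ * sum(abs2, θ)

# Full objective: loss terms plus regularization term
ridge_objective(θ, X, y, λ) = loss_terms(θ, X, y) + regularizer(θ, λ)

# Gradient with respect to θ: Σ (θ'x_i - y_i) x_i + λ θ
ridge_gradient(θ, X, y, λ) = X * (X' * θ .- y) .+ λ .* θ
```

Keeping the loss and the regularizer as separate functions mirrors the decomposition described above: either part can be swapped for another loss or penalty without touching the other.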

Now, we show how to use the package *SGDOptim* to solve such a problem. First, we prepare some simulated data:

In [1]:
θ = [3.0, -4.0, 5.0];   # the underlying model

3-element Array{Float64,1}:
  3.0
 -4.0
  5.0

In [2]:
n = 10000; X = randn(3, n);  # generate 10000 sample features

3x10000 Array{Float64,2}:
  0.186357   2.06501  -1.30345    …  -0.213211  -1.28663   -0.13952 
 -1.11419   -1.76544   0.0552147     -0.16457   -2.1969     1.49355 
  1.04388   -2.16657  -1.17215        0.271101   0.781125  -0.242643

In [7]:
σ = 0.1; y = vec(θ'X) + σ * randn(n); # generate the responses, adding some noise

10000-element Array{Float64,1}:
  10.109   
   2.34337 
  -9.96466 
  -4.55535 
  -1.01875 
  14.1827  
 -14.0844  
   7.13437 
 -13.5125  
  14.5484  
  14.439   
   7.56407 
   0.982114
   ⋮       
   5.33694 
  -7.16893 
   8.59394 
  -1.94164 
   7.21264 
   9.72875 
   4.03635 
  -3.00291 
  16.838   
   1.37921 
   8.82616 
  -7.80379 

Now, let's try to estimate $\theta$ from the data. This can be done by the following statement:

In [5]:
using SGDOptim

In [8]:
θe = sgd(
    LinearPredictor(),   # use the linear predictor x -> θ'x
    SqrLoss(),     # use the squared loss (the typical choice for linear regression)
    zeros(3),      # the initial guess
    minibatch_seq(X, y, 10),    # supply the data in mini-batches, each with 10 samples
    reg = SqrL2Reg(1.0e-4),     # add squared L2 regularization with coefficient 1.0e-4
    lrate = t->1.0/(100.0 + t), # learning rate schedule: step size at iteration t
    cbinterval = 100,           # invoke the callback every 100 iterations
    callback = simple_trace)    # print the optimization trace in callback

Iter 100: avg.loss = 2.5217e-03
Iter 200: avg.loss = 5.6985e-03
Iter 300: avg.loss = 2.7370e-03
Iter 400: avg.loss = 9.2170e-03
Iter 500: avg.loss = 5.7844e-03
Iter 600: avg.loss = 4.9012e-03
Iter 700: avg.loss = 8.3151e-03
Iter 800: avg.loss = 3.8253e-03
Iter 900: avg.loss = 3.1001e-03
Iter 1000: avg.loss = 6.1005e-03


3-element Array{Float64,1}:
  2.99657
 -4.00036
  4.99567

Note that the 10000 samples are partitioned into 1000 minibatches of size 10. The run therefore took 1000 iterations, each using a single minibatch, which is why the trace ends at iteration 1000.
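The iteration structure can be sketched by hand in plain Julia. This is an illustration only and not SGDOptim's implementation: the function name `minibatch_sgd` is hypothetical, and the way the loss gradient and the regularization gradient are scaled per batch is an assumption, since the text does not say how the package scales them. The sketch is written for Julia 1.x and reuses the batch size, regularization coefficient, and learning rate rule from the call above.

```julia
using Random

# Hand-written minibatch SGD for the ridge objective (illustrative only;
# this is NOT SGDOptim's implementation).
# Assumption: each step uses the summed squared-loss gradient over the batch
# plus the gradient of (λ/2)‖θ‖².
function minibatch_sgd(X, y, θ0; batchsize = 10, λ = 1.0e-4,
                       lrate = t -> 1.0 / (100.0 + t))
    θ = copy(θ0)
    n = size(X, 2)
    t = 0                                   # iteration counter
    for lo in 1:batchsize:n
        hi = min(lo + batchsize - 1, n)
        t += 1
        Xb = view(X, :, lo:hi)              # features of the current minibatch
        yb = view(y, lo:hi)                 # responses of the current minibatch
        r = Xb' * θ .- yb                   # residuals θ'x_i - y_i
        g = Xb * r .+ λ .* θ                # stochastic gradient estimate
        θ .-= lrate(t) .* g                 # step with the decaying learning rate
    end
    return θ, t
end

# Usage on freshly simulated data (seeded only for repeatability)
rng = MersenneTwister(1)
θtrue = [3.0, -4.0, 5.0]
Xs = randn(rng, 3, 10000)
ys = Xs' * θtrue .+ 0.1 .* randn(rng, 10000)
θhat, niter = minibatch_sgd(Xs, ys, zeros(3))
```

With 10000 columns and a batch size of 10, the loop body runs once per minibatch, so `niter` equals the number of minibatches.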

Now let's compare the estimated solution with the ground truth, using the relative squared error:

In [9]:
sumabs2(θe - θ) / sumabs2(θ)

6.133004112320621e-7

The relative squared error is on the order of $10^{-7}$, so the estimate is very close to the true $\theta$. We are done!
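As an optional cross-check, the ridge objective at the top of this walkthrough has a closed-form minimizer, $(X X^T + \lambda I)^{-1} X y$ for a d×n matrix $X$, which an SGD estimate can be compared against. The sketch below is plain Julia 1.x and does not use SGDOptim; the names `ridge_closed_form` and `rel_sq_error` are hypothetical, and `sum(abs2, v)` is used in place of `sumabs2`, which is not available in Julia 1.x.

```julia
using LinearAlgebra

# Closed-form minimizer of ½ Σ (θ'x_i - y_i)² + (λ/2) ‖θ‖²,
# assuming X is d×n with one sample per column.
ridge_closed_form(X, y, λ) = (X * X' + λ * I) \ (X * y)

# Relative squared error between an estimate θe and a reference θ
rel_sq_error(θe, θ) = sum(abs2, θe - θ) / sum(abs2, θ)
```

For example, `rel_sq_error(ridge_closed_form(X, y, 1.0e-4), θ)` would measure how far the exact ridge solution is from the underlying model, under the assumption that the regularization coefficient enters the objective exactly as in the formula at the top.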