# SparseRegression.jl

### https://github.com/joshday/SparseRegression.jl

- Josh Day (`@joshday`)
- emailjoshday@gmail.com

# A Brief History of [JuliaML](https://github.com/JuliaML)

- Created at last year's JuliaCon
- A lot has happened since then
    - LearnBase, LossFunctions, PenaltyFunctions, LearningStrategies, Transformations, MLDataUtils, Reinforce,...

# SparseRegression
SparseRegression uses primitives defined in the [JuliaML](https://github.com/JuliaML) ecosystem to implement high-performance algorithms for linear models.
- [LossFunctions](https://github.com/JuliaML/LossFunctions.jl)
- [PenaltyFunctions](https://github.com/JuliaML/PenaltyFunctions.jl)
- [LearningStrategies](https://github.com/JuliaML/LearningStrategies.jl)

# SparseRegression Model
<img src="https://cloud.githubusercontent.com/assets/8075494/25072239/5d85db30-2297-11e7-817e-e7bebaf056cd.png" width=600>

# Loss + (Elementwise) Penalty
Many models have this form, for example:
- LASSO: linear regression with an L1 penalty
$$
\begin{aligned}
f_i(\beta) &= \frac{1}{2}(y_i - x_i^T\beta)^2 \\
J\left(\left|\beta_j\right|\right) &= \left|\beta_j\right|
\end{aligned}
$$
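
To make the "loss + elementwise penalty" form concrete, here is a minimal sketch in plain Julia (no JuliaML packages). The names `loss`, `penalty`, and `objective` are made up for illustration, and averaging the loss over observations is an assumption, not something this talk specifies.

```julia
# Per-observation loss f_i(β) = ½ (y_i - x_iᵀβ)² and elementwise penalty J(|β_j|) = |β_j|
loss(yi, xi, β) = 0.5 * (yi - sum(xi .* β))^2
penalty(βj) = abs(βj)

# ASSUMPTION: objective = average loss + Σ_j λ_j J(|β_j|)
function objective(x, y, β, λ)
    n = length(y)
    avgloss = sum(loss(y[i], x[i, :], β) for i in 1:n) / n
    return avgloss + sum(λ .* penalty.(β))
end

x = [1.0 0.0; 0.0 1.0]   # two observations, one per row
y = [1.0, 2.0]
β = [0.5, -1.0]
λ = [0.1, 0.1]           # one tuning parameter per coefficient
val = objective(x, y, β, λ)
```

Swapping `loss` or `penalty` for another function changes the model without touching `objective`, which is the point of the decomposition.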

# LossFunctions

Primary Author: Evizero (Christof Stocker)

### Distance-Based Losses $(y - \hat{y})$
<img src="https://camo.githubusercontent.com/80ae72e5878d98aacb0eed6863958c02fd73b1b3/68747470733a2f2f7261776769746875622e636f6d2f4a756c69614d4c2f46696c6553746f726167652f6d61737465722f4c6f737346756e6374696f6e732f64697374616e63652e737667">

### Margin-Based Losses $(y \cdot \hat{y})$
<img src = "https://camo.githubusercontent.com/80dc81d308845dd34ff70a953aafffb30b2c5c3a/68747470733a2f2f7261776769746875622e636f6d2f4a756c69614d4c2f46696c6553746f726167652f6d61737465722f4c6f737346756e6374696f6e732f6d617267696e2e737667">
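
The two families differ only in which quantity the loss looks at. Below is a rough sketch in plain Julia; the function names are illustrative stand-ins, not the LossFunctions API.

```julia
# Distance-based losses are functions of the residual r = y - ŷ
l2dist(r) = r^2
l1dist(r) = abs(r)

# Margin-based losses are functions of the agreement a = y * ŷ, with y ∈ {-1, 1}
hinge(a) = max(0, 1 - a)
logitmargin(a) = log1p(exp(-a))

y, ŷ = 1.0, 0.25
dist_value = l2dist(y - ŷ)    # penalizes distance from the target
margin_value = hinge(y * ŷ)   # penalizes a small or negative margin
```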

# Penalty Functions

Primary Author: joshday (me)

### Element Penalties

![](https://camo.githubusercontent.com/6e0b322991cdb606579e438d3868de0bdd06c7d0/68747470733a2f2f7261776769746875622e636f6d2f4a756c69614d4c2f46696c6553746f726167652f6d61737465722f50656e616c747946756e6374696f6e732f756e69766172696174652e737667)
![](https://camo.githubusercontent.com/930d5222183e795190a19462eac381311c8db680/68747470733a2f2f7261776769746875622e636f6d2f4a756c69614d4c2f46696c6553746f726167652f6d61737465722f50656e616c747946756e6374696f6e732f6269766172696174652e737667)
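
An element penalty is a scalar function applied to each coefficient separately, so the total penalty is a (weighted) sum. Here is a small sketch in plain Julia; the names and the ½ scaling on the squared penalty are my own conventions for illustration, not necessarily those of PenaltyFunctions.

```julia
# Scalar penalties on a single coefficient b
l1(b) = abs(b)
l2(b) = 0.5 * b^2
elasticnet(b, α) = α * l1(b) + (1 - α) * l2(b)   # α ∈ [0, 1] mixes the two

# Total penalty: apply elementwise, weight by λ_j, and sum
total(pen, β, λ) = sum(λ .* pen.(β))

β = [1.0, -2.0, 0.0]
λ = [0.5, 0.5, 0.5]
total_l1 = total(l1, β, λ)
total_en = total(b -> elasticnet(b, 0.5), β, λ)
```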

# Learning Strategies

Primary Author: tbreloff (Tom Breloff)

A unifying framework for the strategies involved in iterative algorithms, including:

- `MaxIter`, `TimeLimit`, `Converged`, ...

Now when I want to implement a new algorithm, **I don't need to include my usual boilerplate**.

```julia
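function learn!(model, strategies...)
    pre_hook.(strategies, model)                       # one-time setup
    i = 0
    while true
        i += 1
        learn!.(model, strategies)                     # each strategy gets a turn to update the model
        iter_hook.(strategies, model, i)               # per-iteration bookkeeping
        any(finished.(strategies, model, i)) && break  # stop as soon as any strategy says so
    end
    post_hook.(strategies, model)                      # one-time teardown
end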
```
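
The loop above is a conceptual sketch. For something you can paste into a REPL, here is a self-contained toy version of the same hook pattern. All types and functions (`ToyStrategy`, `ToyMaxIter`, `Halve`, `update!`, `toy_learn!`) are made up for illustration and are not the LearningStrategies source.

```julia
abstract type ToyStrategy end

# Default hooks do nothing, so each strategy only overrides what it needs
pre_hook(s::ToyStrategy, model) = nothing
update!(model, s::ToyStrategy) = nothing
iter_hook(s::ToyStrategy, model, i) = nothing
finished(s::ToyStrategy, model, i) = false
post_hook(s::ToyStrategy, model) = nothing

# A stopping rule: finished after n iterations
struct ToyMaxIter <: ToyStrategy
    n::Int
end
finished(s::ToyMaxIter, model, i) = i >= s.n

# An "algorithm": halve the first parameter on every iteration
struct Halve <: ToyStrategy end
update!(model, ::Halve) = (model[1] /= 2; nothing)

function toy_learn!(model, strategies...)
    foreach(s -> pre_hook(s, model), strategies)
    i = 0
    while true
        i += 1
        foreach(s -> update!(model, s), strategies)
        foreach(s -> iter_hook(s, model, i), strategies)
        any(s -> finished(s, model, i), strategies) && break
    end
    foreach(s -> post_hook(s, model), strategies)
    return model
end

model = [8.0]
toy_learn!(model, Halve(), ToyMaxIter(3))
```

The algorithm (`Halve`) and the stopping rule (`ToyMaxIter`) never reference each other; the loop is the only place they meet.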

# Quick Example

### The LASSO
- `LinearRegression()`, which is an alias for `scaled(L2DistLoss(), .5)`
- `L1Penalty()`

```julia
using SparseRegression, DataGenerator
using Plots; gr()

# Fake Data
n, p = 10_000, 50
x, y, β = linregdata(n, p);
```

### Create the model

- Observations (one per row) are wrapped in the `Obs` type
- `Obs` is the only required argument
- Order of arguments doesn't matter (and it's type stable!)
- Other possible arguments are
    - `Loss`
    - `Penalty`
    - penalty factors (`Vector{Float64}`)

```julia
o = SparseReg(Obs(x, y), LinearRegression(), L1Penalty(), fill(.2, p))
```

```
■ SparseReg
  >           β:  [0.0 0.0 … 0.0 0.0]
  >    λ factor:  [0.2 0.2 … 0.2 0.2]
  >        Loss:  0.5 * (L2DistLoss)
  >     Penalty:  L1Penalty
```


```julia
plot(o, ylim=(-1,1))
png("plot1.png")
```

### "Learn" the Model
- Use the proximal gradient method
- At most 50 iterations
- Convergence criterion: `norm(coef(o) - old_coef) < 1e-6`
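
For intuition, the three ingredients above (proximal gradient step, iteration cap, coefficient-change stopping rule) can be sketched in plain Julia for the LASSO. This is a textbook version, not SparseRegression's implementation; `soft`, `proxgrad_lasso`, and the `1/n` scaling of the loss are assumptions made for illustration.

```julia
using LinearAlgebra

# Soft-thresholding: the proximal operator of t * |b|
soft(b, t) = sign(b) * max(abs(b) - t, 0)

function proxgrad_lasso(x, y, λ; step = 1.0, maxit = 50, tol = 1e-6)
    n, p = size(x)
    β = zeros(p)
    for i in 1:maxit
        old = copy(β)
        grad = x' * (x * β - y) / n                 # gradient of (1/2n)‖y - xβ‖²
        β = soft.(β .- step .* grad, step .* λ)     # gradient step, then prox of the L1 penalty
        norm(β - old) < tol && break                # stop when the coefficients stop changing
    end
    return β
end

x = [1.0 0.0; 0.0 1.0; 1.0 0.0; 0.0 1.0]
y = [1.0, 0.1, 1.0, 0.1]
β̂ = proxgrad_lasso(x, y, fill(0.1, 2))
```

The step size must be small enough for the gradient step to be stable; `step = 1.0` works for this tiny design but is not a safe default in general.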

```julia
learn!(o, ProxGrad(), MaxIter(50), Converged(coef))
```

```
INFO: Converged after 8 iterations: [-0.795787, -0.751458, -0.710455, -0.643737, -0.664963, -0.586922, -0.533593, -0.515631, -0.432768, -0.460394, -0.416148, -0.357223, -0.312021, -0.284171, -0.2445, -0.20269, -0.151904, -0.0896865, -0.0823056, -0.043611, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.00591399, 0.067068, 0.0823051, 0.136827, 0.190247, 0.279803, 0.233562, 0.31656, 0.350783, 0.389854, 0.446148, 0.453686, 0.508485, 0.586099, 0.606811, 0.62542, 0.651108, 0.733226, 0.778829, 0.804444]

■ SparseReg
  >           β:  [-0.795787 -0.751458 … 0.778829 0.804444]
  >    λ factor:  [0.2 0.2 … 0.2 0.2]
  >        Loss:  0.5 * (L2DistLoss)
  >     Penalty:  L1Penalty
```


```julia
plot(o)
png("plot2.png")
```

```julia
scatter!(β, label = "Truth")
png("plot3.png")
```

# Why *Sparse* Regression?

As the tuning parameter $\lambda_j$ increases, $\beta_j$ shrinks to 0.
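
A quick way to see this without any packages: in the special case of an orthonormal design, the LASSO solution is the soft-thresholded least-squares estimate. The sketch below uses a made-up vector `z` standing in for that estimate and counts exact zeros as λ grows.

```julia
# Soft-thresholding shrinks toward zero and clips at exactly zero
soft(z, λ) = sign(z) * max(abs(z) - λ, 0)

z = [1.0, -0.6, 0.3, -0.05]              # hypothetical least-squares coefficients
nzeros(λ) = count(iszero, soft.(z, λ))   # how many coefficients are exactly zero

for λ in (0.0, 0.1, 0.5, 1.0)
    println("λ = ", λ, ": ", nzeros(λ), " of ", length(z), " coefficients are zero")
end
```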

```julia
β = zeros(500)
β[1] = β[100] = β[200] = β[300] = β[400] = β[500] = 1.0
x, y, _ = linregdata(1000, 500; β = β)
@gif for λ in 0:.001:0.1
    o = SparseReg(Obs(x, y), L1Penalty(), fill(λ, 500))
    learn!(o, ProxGrad(.5), MaxIter(50))
    plot(o, ylim = (-.5, 1.3), g = coef(o) .== 0, xlab = "hi",
        label = ["nonzero" "zero"], marker = (5, stroke(0)),
        legend = :bottomright)
end
```

```
INFO: Saved animation to /Users/joshday/github/Talks/JuliaCon2017_SparseRegression/tmp.gif
```

### Algorithms
- `ProxGrad` (Proximal Gradient)
- `Fista` (Accelerated Proximal Gradient)
- `Sweep` (LinearRegression with NoPenalty or L1Penalty)
- `GradientDescent`
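
Of these, `Sweep` is the least self-explanatory. Assuming the name refers to the classical sweep operator from statistical computing, the idea can be sketched in plain Julia as follows. `sweep!` and `sweep_lstsq` are illustrative names, not SparseRegression functions.

```julia
# Sweep the symmetric matrix A on pivot k, in place
function sweep!(A::Matrix{Float64}, k::Int)
    d = A[k, k]
    A[k, :] ./= d
    for i in axes(A, 1)
        i == k && continue
        b = A[i, k]
        A[i, :] .-= b .* A[k, :]
        A[i, k] = -b / d
    end
    A[k, k] = 1 / d
    return A
end

# Least squares: sweep [x y]'[x y] on the first p pivots;
# the last column then holds the coefficient estimates
function sweep_lstsq(x, y)
    p = size(x, 2)
    xy = hcat(x, y)
    A = Matrix{Float64}(xy' * xy)
    for k in 1:p
        sweep!(A, k)
    end
    return A[1:p, p + 1]
end

x = [1.0 0.0; 0.0 1.0; 1.0 1.0]
y = [1.0, 2.0, 3.5]
β̂ = sweep_lstsq(x, y)
```

Because the sweep works on the small `(p+1) × (p+1)` cross-product matrix, the cost after forming `[x y]'[x y]` does not depend on the number of observations.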

```julia
using MultivariateStats, BenchmarkTools
```

```julia
x, y, β = linregdata(1_000, 10)
@btime learn!(SparseReg(Obs(x, y), NoPenalty()), Sweep())
@btime llsq(x, y; bias=false);
```

```
  31.532 μs (26 allocations: 3.41 KiB)
  29.021 μs (10 allocations: 1.42 KiB)
```


```julia
x, y, β = linregdata(10_000, 100)
@btime learn!(SparseReg(Obs(x, y), NoPenalty()), Sweep())
@btime llsq(x, y; bias=false);
```

```
  4.336 ms (28 allocations: 163.80 KiB)
  3.836 ms (11 allocations: 80.19 KiB)
```


Essentially, all SparseRegression does is add

- a model type (`SparseReg`), and
- new learning strategies (algorithms)

to use with LearningStrategies.

# Solution Paths

Slightly different parameterization of tuning parameter:
$$
... + \alpha \sum_j \lambda_j J\left(\left|\beta_j\right|\right)
$$

```julia
x, y, β = linregdata(1000, 5)

# provide the λ_j's
o = SparseReg(Obs(x, y), LinearRegression(), L1Penalty(), ones(5))

# provide the α's for the path
path = SparseRegPath(o, 0:.05:1)
```

```
■ SparseRegPath
  >    λ factor:  [1.0, 1.0, 1.0, 1.0, 1.0]
  >        Loss:  0.5 * (L2DistLoss)
  >     Penalty:  L1Penalty
  > β(0.00) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.05) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.10) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.15) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.20) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.25) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.30) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.35) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.40) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.45) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.50) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.55) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.60) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.65) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.70) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.75) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.80) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.85) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.90) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(0.95) : [0.0, 0.0, 0.0, 0.0, 0.0]
  > β(1.00) : [0.0, 0.0, 0.0, 0.0, 0.0]
```


```julia
learn!(path, ProxGrad(.5), MaxIter(30))
```

```
■ SparseRegPath
  >    λ factor:  [1.0, 1.0, 1.0, 1.0, 1.0]
  >        Loss:  0.5 * (L2DistLoss)
  >     Penalty:  L1Penalty
  > β(0.00) : [-1.06909, -0.449434, 0.0300856, 0.537436, 1.03042]
  > β(0.05) : [-1.02056, -0.398305, 0.0, 0.495178, 0.974834]
  > β(0.10) : [-0.971135, -0.346901, 0.0, 0.451892, 0.918945]
  > β(0.15) : [-0.921713, -0.295498, 0.0, 0.408605, 0.863057]
  > β(0.20) : [-0.872291, -0.244094, 0.0, 0.365319, 0.807169]
  > β(0.25) : [-0.822868, -0.192691, 0.0, 0.322032, 0.751281]
  > β(0.30) : [-0.773446, -0.141287, 0.0, 0.278745, 0.695392]
  > β(0.35) : [-0.724023, -0.0898839, 0.0, 0.235459, 0.639504]
  > β(0.40) : [-0.674601, -0.0384805, 0.0, 0.192172, 0.583616]
  > β(0.45) : [-0.625167, -0.0, 0.0, 0.148976, 0.527941]
  > β(0.50) : [-0.575697, -0.0, 0.0, 0.106051, 0.472904]
  > β(0.55) : [-0.526227, -0.0, 0.0, 0.0631258, 0.417867]
  > β(0.60) : [-0.476756, -0.0, 0.0, 0.0202007, 0.362829]
  > β(0.65) : [-0.425514, -0.0, 0.0, 0.0, 0.307566]
  > β(0.70) : [-0.372697, -0.0
```

```julia
plot(path, w=3)
```