# Stochastic Approximation Simulations (Linear Regression)

Stochastic approximation algorithms are difficult to compare, because their relative performance depends on the problem and on the learning rate. The following interactive example shows the behavior of several algorithms under a variety of conditions for a linear regression model. You'll notice there is no universal "winner", although some methods are more robust to the learning rate than others.

## Singleton Objective Function

$$f_t(\beta) = \frac{1}{2}(y_t - x_t^T\beta)^2$$

$$g_t = \nabla f_t(\beta) = -(y_t - x_t^T\beta)x_t$$
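
As a quick sanity check on the gradient formula, the following minimal sketch codes $f_t$ and $g_t$ directly and compares the analytic gradient with a central finite-difference approximation. The helper names (`singleton_loss`, `singleton_grad`, `fd_grad`) and the example values are illustrative assumptions, not part of OnlineStats.

```julia
using LinearAlgebra  # for dot

# Loss on a single observation (x_t, y_t): f_t(β) = ½ (y_t − x_tᵀβ)²
singleton_loss(β, x, y) = 0.5 * (y - dot(x, β))^2

# Its gradient with respect to β: g_t = −(y_t − x_tᵀβ) x_t
singleton_grad(β, x, y) = -(y - dot(x, β)) .* x

# Central finite-difference approximation of the gradient (for checking only)
function fd_grad(β, x, y; h = 1e-6)
    map(eachindex(β)) do j
        e = zeros(length(β))
        e[j] = h
        (singleton_loss(β .+ e, x, y) - singleton_loss(β .- e, x, y)) / (2h)
    end
end

# Arbitrary example values (hypothetical)
x, y, β = [1.0, 2.0, -1.0], 3.0, [0.5, -0.5, 2.0]
g = singleton_grad(β, x, y)
g_fd = fd_grad(β, x, y)
```

Because $f_t$ is quadratic in $\beta$, the two gradients should agree up to floating-point rounding.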

## Conditions

- Each algorithm uses a learning rate parameterized as

$$\gamma_t = \frac{1}{t^r},\quad r\in\{0.5, 0.52, \ldots, 1\}.$$

- The data is generated from a linear model:

    $$y_t = x_t^T\beta + \epsilon_t,$$

    where:
    - $\epsilon_t\sim N(0, \sigma^2), \quad \sigma \in \{1, 2, 5, 10, 50\}$
    - $\beta \in \mathbb{R}^{10}$ is one of the following:
        - $(-1,\ldots,1)$
        - $(1,\ldots,1)$
        - $(10,\ldots,10)$
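
To make these conditions concrete, here is a minimal sketch that simulates data from the linear model and runs one pass of plain SGD with the rate $\gamma_t = t^{-r}$. It uses only the Julia standard library rather than OnlineStats; the function names (`learning_rate`, `simulate`, `sgd`) and the default values are assumptions chosen for illustration.

```julia
using LinearAlgebra, Random

# Learning rate γ_t = 1 / t^r
learning_rate(t, r) = 1 / t^r

# Simulate n observations from y_t = x_tᵀβ + ε_t, where ε_t has standard deviation σ
function simulate(β, σ; n = 1000, seed = 1)
    rng = MersenneTwister(seed)
    x = randn(rng, n, length(β))
    y = x * β .+ σ .* randn(rng, n)
    return x, y
end

# One pass of plain SGD: β ← β − γ_t g_t, with g_t = −(y_t − x_tᵀβ) x_t
function sgd(x, y; r = 0.7)
    β = zeros(size(x, 2))
    for t in axes(x, 1)
        xt = @view x[t, :]
        g = -(y[t] - dot(xt, β)) .* xt
        β .-= learning_rate(t, r) .* g
    end
    return β
end

β_true = collect(range(-1, stop = 1, length = 10))
x, y = simulate(β_true, 1.0)
β_hat = sgd(x, y)
```

Smaller values of `r` keep the step size larger for longer, which tends to make the estimates noisier; values close to 1 shrink the step quickly.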
        
## The Visualization

The plot below shows the value of the **population objective**

$$f(\beta) = \frac{1}{n}\sum_t f_t(\beta),$$

relative to the OLS solution, evaluated at the current estimate $\hat{\beta}^{(t)}_{SGD}$, $\hat{\beta}^{(t)}_{ADAGRAD}$, etc. For example, the height of the SGD line at a point $t$ on the x-axis is

$$f(\hat\beta^{(t)}_{SGD}) - f(\hat\beta_{OLS}).$$

Because the values are relative to the *best* solution, a line approaching zero means the corresponding estimate is approaching the optimum of the population objective function.
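
The relative objective can be computed by hand as a rough sketch, assuming the averaged squared-error objective defined above. The name `avg_objective` and the stand-in estimate `β_est` are hypothetical; an actual run would plug in the current estimate from one of the algorithms.

```julia
using LinearAlgebra, Random, Statistics

# Average objective f(β) = (1/n) Σ_t ½ (y_t − x_tᵀβ)²
avg_objective(β, x, y) = mean(abs2, y .- x * β) / 2

# Simulated data (values chosen for illustration)
rng = MersenneTwister(1)
n, p = 1000, 10
x = randn(rng, n, p)
β_true = ones(p)
y = x * β_true .+ 5 .* randn(rng, n)

β_ols = x \ y       # least-squares solution, the minimizer of avg_objective
β_est = zeros(p)    # stand-in for an algorithm's current estimate (hypothetical)

# Objective relative to the OLS solution; non-negative for any β_est
gap = avg_objective(β_est, x, y) - avg_objective(β_ols, x, y)
```

Since $\hat\beta_{OLS}$ minimizes $f$ over the simulated data, the gap is never negative, and even the true $\beta$ has an objective at least as large as the OLS solution.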

In [None]:
using OnlineStats, Interact, Plots, Random
gr(fmt=:png)

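# Candidate losses (not used below; the simulation uses `Loss`, defined at the end of this cell)
Losses = [L1DistLoss(), .5L2DistLoss(), LogitMarginLoss()]

# The two algorithms to compare are each chosen from the same list
Alg1 = [SGD(), ADAGRAD(), RMSPROP(), ADAM(), ADAMAX(), OMAS(), OMAP(), MSPI()]
Alg2 = [SGD(), ADAGRAD(), RMSPROP(), ADAM(), ADAMAX(), OMAS(), OMAP(), MSPI()]

# Noise standard deviations and true coefficient vectors (p = 10)
σs = [1, 2, 5, 10, 50]
βs = [collect(range(-1,stop=1,length=10)), ones(10), fill(10.0, 10)]

# Squared-error loss matching the singleton objective: ½ (y - xᵀβ)²
Loss = .5 * L2DistLoss()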

In [None]:
@manipulate for A1 in Alg1, A2 in Alg2, σ in σs, β in βs, seed in 1:100, r in .5:.02:1
    # Simulate data from the linear model y = xβ + ε
    Random.seed!(seed)
    n, p = 1000, 10
    x = randn(n, p)
    y = x * β + σ * randn(n)

    # One StatLearn model per algorithm, both using LearningRate(r)
    out = zeros(n, 2)
    o1 = StatLearn(p, Loss, A1, rate=LearningRate(r))
    o2 = StatLearn(p, Loss, A2, rate=LearningRate(r))
    s = Series(o1, o2)

    # Feed one observation at a time, recording each model's objective on the full data set
    for i in 1:n
        xi = @view x[i, :]
        yi = y[i]
        fit!(s, (xi, yi))
        out[i, 1] = OnlineStats.objective(o1, x, y)
        out[i, 2] = OnlineStats.objective(o2, x, y)
    end

    # Scale the y-axis from both final values so neither line ends off the plot
    ymax = 2maximum(out[end, :])
    plot(out, ylim = (0, ymax), label = [string(A1) string(A2)], w = 2, title = "Learn Rate = $r",
        linestyle = :auto)
end
