Regression and Stochastic Gradient Descent
===================================



Regression consists of finding a model $f$, depending on parameters $\theta$, such that $f(\theta, x_i) \approx y_i$ for a given set of pairs $(x_i, y_i)_{i=1..n}$.

Regression can be recast as an optimization problem, for example as minimizing the squared loss over $\theta$:

\begin{equation}
L = \sum_i \| y_i - f(\theta, x_i) \|_2^2
\end{equation}
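The rest of this notebook minimizes a fixed function using its full gradient. As a hedged sketch of what the "stochastic" in the title refers to, here is minibatch SGD on the squared loss above for a hypothetical linear model $f(\theta, x) = \theta_0 x + \theta_1$. The synthetic data, learning rate, batch size and number of epochs are illustrative assumptions. Each step uses the gradient of the loss on a random minibatch rather than on all $n$ pairs.

```python
import torch

torch.manual_seed(0)

# Synthetic data (illustrative only): y = 2 x - 1 plus small Gaussian noise
n = 200
x_data = torch.linspace(-1.0, 1.0, n)
y_data = 2.0 * x_data - 1.0 + 0.05 * torch.randn(n)

def model(theta, x):
    # f(theta, x) = theta[0] * x + theta[1]: a line with slope theta[0] and intercept theta[1]
    return theta[0] * x + theta[1]

theta = torch.zeros(2, requires_grad=True)
learning_rate = 0.05
batch_size = 20

for epoch in range(50):
    perm = torch.randperm(n)  # visit the samples in a new random order each epoch
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]
        # squared loss averaged over the minibatch (the sum in L restricted to idx, divided by batch_size)
        loss = ((y_data[idx] - model(theta, x_data[idx])) ** 2).mean()
        theta.grad = None  # discard the previous minibatch's gradient
        loss.backward()    # d(loss)/d(theta), stored in theta.grad
        with torch.no_grad():
            theta -= learning_rate * theta.grad  # in-place step keeps theta a leaf tensor

print(theta.detach())  # expected to be close to the true parameters (2, -1)
```

Using the mean instead of the sum only rescales the loss, which amounts to rescaling the learning rate.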


In [None]:
from __future__ import print_function
import numpy as np
from matplotlib import pyplot
import torch

# Gradient descent on a function of 2 variables

A helper to visualize a function $\mathbb{R}^2 \to \mathbb{R}$ as contour lines:

In [None]:
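def plot_with_contours(f):
    """Plot contour lines of f: R^2 -> R over the grid [-4, 4) x [-3, 3)."""
    delta = 0.1
    xsteps = np.arange(-4.0, 4.0, delta)
    ysteps = np.arange(-3.0, 3.0, delta)
    # evaluate f at every grid point; .item() turns each 0-d tensor into a Python float
    Z = np.array([[f(torch.tensor([x, y])).item() for x in xsteps] for y in ysteps])
    X, Y = np.meshgrid(xsteps, ysteps)
    # one contour level every 5% of the range of f on the grid
    pyplot.contour(X, Y, Z, levels=np.arange(Z.min(), Z.max(), 0.05 * (Z.max() - Z.min())))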
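The double loop in `plot_with_contours` calls `f` once per grid point. Because the objective used in this notebook only applies elementwise operations to `x[0]` and `x[1]`, it should also accept a stacked grid of shape `(2, H, W)` and evaluate every point in one call. This is a sketch under that assumption (it does not hold for an arbitrary `f`), using a copy of the objective and checked against the loop.

```python
import numpy as np
import torch

def f(x):
    # copy of the notebook's objective; only elementwise ops on x[0] and x[1]
    return torch.sin(x[0] + 1.1) + torch.sin(x[1]) + 0.4 * (torch.abs(x[0] - 2.5) + torch.abs(x[1] + 0.5))

delta = 0.1
xsteps = np.arange(-4.0, 4.0, delta)
ysteps = np.arange(-3.0, 3.0, delta)
X, Y = np.meshgrid(xsteps, ysteps)  # both of shape (len(ysteps), len(xsteps))

# Stack the grids into shape (2, H, W): x[0] is the X grid, x[1] the Y grid
Z_vec = f(torch.from_numpy(np.stack([X, Y]))).numpy()

# Reference: the per-point double loop, in float64 to match the vectorized version
Z_loop = np.array([[f(torch.tensor([x, y], dtype=torch.float64)).item() for x in xsteps]
                   for y in ysteps])
print(np.abs(Z_vec - Z_loop).max())
```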

A non-convex function to minimize (by convention, we always minimize rather than maximize).

Note that the function is not differentiable everywhere: the absolute-value terms have kinks along the lines $x_0 = 2.5$ and $x_1 = -0.5$.

In [None]:
def f(x):
    # two sinusoidal terms plus an L1 pull towards (2.5, -0.5); the abs terms make f non-smooth
    return torch.sin(x[0] + 1.1) + torch.sin(x[1]) + 0.4 * (torch.abs(x[0] - 2.5) + torch.abs(x[1] + 0.5))

plot_with_contours(f)
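At a kink, autograd still returns a number. As far as we know, PyTorch differentiates `torch.abs` as $\mathrm{sign}(\cdot)$ with $\mathrm{sign}(0) = 0$, which happens to be a valid subgradient there (any value in $[-1, 1]$ is). A minimal check on one of the non-smooth terms of `f`:

```python
import torch

# |x - 2.5| is one of the non-smooth terms of f; its kink is at x = 2.5
x = torch.tensor(2.5, requires_grad=True)
torch.abs(x - 2.5).backward()
print(x.grad)  # tensor(0.)

# Away from the kink, the derivative is the usual sign(x - 2.5)
x2 = torch.tensor(3.0, requires_grad=True)
torch.abs(x2 - 2.5).backward()
print(x2.grad)  # tensor(1.)
```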

In [None]:
# starting point
x = torch.tensor([0.0, 0.0])

# set the learning rate (step size)
learning_rate = 0.5

objectives = []
points = []
for it in range(20):
    points.append(x.detach().numpy().copy())  # logging (copy, so it cannot alias x)

    # we will need the gradient of f with respect to x
    x.requires_grad_(True)

    # call the function; autograd records the operations needed for the gradient
    y = f(x)

    print(it, y.item())
    objectives.append(y.item())  # logging

    # compute the gradient dy/dx, stored in x.grad
    y.backward()

    # gradient step; no_grad() keeps the update itself out of the autograd graph
    with torch.no_grad():
        x = x - learning_rate * x.grad
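The update $x \leftarrow x - \eta \nabla f(x)$ in the loop above is what `torch.optim.SGD` performs when momentum, dampening and weight decay keep their default value of 0. The sketch below uses a self-contained copy of `f`, the same starting point and the same learning rate, runs both versions and checks that they end at the same point.

```python
import torch

def f(x):
    # copy of the notebook's objective
    return torch.sin(x[0] + 1.1) + torch.sin(x[1]) + 0.4 * (torch.abs(x[0] - 2.5) + torch.abs(x[1] + 0.5))

learning_rate = 0.5
n_iter = 20

# Manual update, as in the loop above
x_manual = torch.tensor([0.0, 0.0])
for it in range(n_iter):
    x_manual.requires_grad_(True)
    f(x_manual).backward()
    with torch.no_grad():
        x_manual = x_manual - learning_rate * x_manual.grad

# Same update through torch.optim.SGD (no momentum, no weight decay)
x_opt = torch.tensor([0.0, 0.0], requires_grad=True)
optimizer = torch.optim.SGD([x_opt], lr=learning_rate)
for it in range(n_iter):
    optimizer.zero_grad()  # gradients accumulate by default, so clear them first
    f(x_opt).backward()
    optimizer.step()       # x_opt <- x_opt - lr * x_opt.grad, in place

print(x_manual, x_opt.detach())
```

The optimizer version updates the parameter in place, which scales better to models with many parameter tensors.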


In [None]:
pyplot.plot(objectives)
pyplot.xlabel('iteration')
pyplot.ylabel('objective f(x)')

In [None]:
points = np.array(points)
plot_with_contours(f)
pyplot.plot(points[:, 0], points[:, 1], 'ro-')

Observations: is the final point a global minimum, or only a local one?

Try other starting points and see whether you can reach a lower objective value.

Try changing the learning rate to 2 and to 0.02. What happens?
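One way to run these experiments side by side is a small helper that reports final and best objective values without plotting. `run_gd` is a hypothetical name, and the starting points and learning rates below are an arbitrary choice; the objective is a copy of `f`.

```python
import torch

def f(x):
    # copy of the notebook's objective
    return torch.sin(x[0] + 1.1) + torch.sin(x[1]) + 0.4 * (torch.abs(x[0] - 2.5) + torch.abs(x[1] + 0.5))

def run_gd(x0, learning_rate, n_iter=20):
    """Hypothetical helper: plain gradient descent from x0, returning the objective at each iteration."""
    x = torch.tensor(x0, dtype=torch.float32)
    objectives = []
    for it in range(n_iter):
        x.requires_grad_(True)
        y = f(x)
        objectives.append(y.item())
        y.backward()
        with torch.no_grad():
            x = x - learning_rate * x.grad
    return objectives

for lr in [0.02, 0.5, 2.0]:
    for x0 in [[0.0, 0.0], [-3.0, 2.0], [3.0, -2.0]]:
        obj = run_gd(x0, lr)
        print(f"lr={lr:<5} start={x0}  final={obj[-1]:.3f}  best={min(obj):.3f}")
```

Comparing the final and best values hints at whether a run is still descending, oscillating, or has settled.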
