In [1]:
import numpy as np


import matplotlib.pyplot as plt
import matplotlib.text as text

## Linear regression

$$ y_i = x_{ij} w_j + b$$

$$ y_i = x_{ij} w_j, \quad x_{i,-1}=1,\quad b=w_{-1} $$

In [2]:
def linear(x,w):
    return x @ w
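As a quick check of the bias convention above, the following sketch shows that appending a column of ones and moving the bias into the last weight gives the same output as the explicit `x @ w + b` form. The names `x_demo`, `w_demo`, `b_demo`, `x_aug` and `w_aug` and their values are made up for illustration and are not part of the exercise.

```python
import numpy as np

def linear(x, w):
    return x @ w

rng = np.random.default_rng(0)          # seeded generator, for reproducibility only
x_demo = rng.normal(size=(5, 3))        # 5 samples, 3 features (illustrative values)
w_demo = np.array([0.2, 0.5, -0.25])    # feature weights
b_demo = 1.0                            # explicit bias

# Form 1: explicit bias term
explicit = x_demo @ w_demo + b_demo

# Form 2: bias absorbed as the last weight, with a constant feature equal to 1
x_aug = np.concatenate((x_demo, np.ones((x_demo.shape[0], 1))), axis=1)
w_aug = np.append(w_demo, b_demo)
absorbed = linear(x_aug, w_aug)

print(np.allclose(explicit, absorbed))
```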

Generate a random feature matrix $\mathbf{x}$ with 1000 samples and three features, 
such that the first feature is drawn from N(0,1), the second from U(0,1), and the third from N(1,2).

In [3]:
x = np.stack((np.random.normal(0, 1, (1000)), 
              np.random.uniform(0, 1, (1000)), 
              np.random.normal(1,2, (1000))), axis=1)
x.shape

(1000, 3)

N(mu,sigma) denotes the normal distribution with mean mu and standard deviation sigma, and U(a,b) denotes the uniform distribution on [a,b). You can use the ``numpy.random.normal`` and ``numpy.random.uniform`` functions.

Using $\mathbf{x}$ and weights w = [0.2, 0.5, -0.25, 1.0], generate the output $\mathbf{y}$, assuming $N(0,0.1)$ noise $\mathbf{\epsilon}$. 

In [4]:
w = np.array((0.2, 0.5, -0.25, 1.))
ones = np.ones((x.shape[0], 1))
x = np.concatenate((x,ones), axis = 1)
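# Note: without a size argument this draws a single scalar, so every sample
# receives the same offset; pass size=x.shape[0] for an independent epsilon_i per sample.
noise = np.random.normal(0, 0.1)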
y = linear(x,w)
y = y + noise

$$ y_i = x_{ij} w_j+\epsilon_i, \quad x_{i,-1}=1,\quad b=w_{-1} $$
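The index $i$ on $\epsilon_i$ means that each sample gets its own noise draw. A minimal sketch of that, assuming a seeded `numpy.random.default_rng` generator and separate illustrative names (`x_demo`, `w_true`, `eps`, `y_demo`) so that the notebook's own variables are not overwritten:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator (an assumption, for reproducibility)
n_samples = 1000

# Three features plus a constant column of ones for the bias
x_demo = np.stack((rng.normal(0, 1, n_samples),
                   rng.uniform(0, 1, n_samples),
                   rng.normal(1, 2, n_samples),
                   np.ones(n_samples)), axis=1)
w_true = np.array((0.2, 0.5, -0.25, 1.0))

# One independent N(0, 0.1) draw per sample
eps = rng.normal(0, 0.1, size=n_samples)
y_demo = x_demo @ w_true + eps

# The residual at the true weights is exactly the noise
residual = y_demo - x_demo @ w_true
print(eps.shape, residual.std())
```

With independent noise of standard deviation 0.1, the loss at the true weights is about $0.1^2/2 = 0.005$ rather than zero, so a loss-based stopping tolerance would have to be chosen above that level.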

#### Loss

$$ \frac{1}{2}\frac{1}{N}\sum_{i=0}^{N-1} (y_i -  x_{ij} w_j  )^2$$

In [5]:
def getLoss(y, x, w):
    loss = np.square(y - linear(x,w))
    loss = np.sum(loss) / (2*y.shape[0])
    return loss


## Gradient descent 

### Problem 1 

Find the gradient of the loss function with respect to weights.

Write gradient function ``grad(y,x,w)``.
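One way to obtain the gradient is to differentiate the loss term by term with the chain rule. Writing $L$ for the loss above (a symbol introduced here for convenience) and keeping the same index convention:

$$ \frac{\partial L}{\partial w_k} = \frac{1}{2N}\sum_{i=0}^{N-1} 2\,(y_i - x_{ij} w_j)\,(-x_{ik}) = \frac{1}{N}\sum_{i=0}^{N-1} x_{ik}\,(x_{ij} w_j - y_i) $$

In matrix form this reads $\frac{1}{N}\,\mathbf{x}^T(\mathbf{x}\mathbf{w} - \mathbf{y})$.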

In [6]:
def grad(y, x, w):
    # Gradient of the loss with respect to w: x^T (x w - y) / N
    diff = (x @ w - y)
    return np.dot(x.T, diff) / x.shape[0]

gradient = grad(y, x, w)
print(gradient)

[-0.00111374  0.01733117  0.03336937  0.03351661]
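An analytic gradient can be sanity-checked against central finite differences of the loss. The sketch below redefines the loss and gradient locally so that it runs on its own. The helper `numerical_grad`, the step `h`, and the random test data (`x_chk`, `y_chk`, `w_chk`) are assumptions introduced for this check, not part of the exercise.

```python
import numpy as np

def get_loss(y, x, w):
    # Same loss as above: half the mean squared residual
    return np.sum(np.square(y - x @ w)) / (2 * y.shape[0])

def grad(y, x, w):
    # Analytic gradient: x^T (x w - y) / N
    return x.T @ (x @ w - y) / x.shape[0]

def numerical_grad(y, x, w, h=1e-6):
    # Central finite differences, one weight at a time (hypothetical helper)
    g = np.zeros_like(w)
    for k in range(w.size):
        step = np.zeros_like(w)
        step[k] = h
        g[k] = (get_loss(y, x, w + step) - get_loss(y, x, w - step)) / (2 * h)
    return g

rng = np.random.default_rng(0)
x_chk = rng.normal(size=(50, 4))
y_chk = rng.normal(size=50)
w_chk = rng.normal(size=4)

max_diff = np.max(np.abs(grad(y_chk, x_chk, w_chk) - numerical_grad(y_chk, x_chk, w_chk)))
print(max_diff)
```

Because the loss is quadratic in the weights, the central difference has no truncation error, so the two gradients should agree up to floating-point rounding.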


### Problem 2

Implement gradient descent for linear regression.

In [7]:
alpha = 0.1

def gradientDescent(y, x, w, maxIterations = 500, tolerance=0.0000001):
    for i in range(maxIterations):
        loss = getLoss(y, x, w)
        if loss < tolerance:
            maxIterations = i
            break
        gradient = grad(y, x, w)
        w = w - alpha*gradient

    print("finished gradient descent on iteration " + str(maxIterations))
    print("with loss equal " + str(loss))
    return w
        
gradientDescent(y, x, w)

finished gradient descent on iteration 349
with loss equal 9.899652083778778e-08


array([ 0.20000397,  0.49846432, -0.25001459,  0.96734723])

### Problem 3

Implement stochastic gradient descent (SGD).

In [15]:
def getBatches(x, y, batchSize):
    # Sample batchSize row indices uniformly at random (with replacement)
    randomIndices = np.random.randint(x.shape[0], size=batchSize)
    return (x[randomIndices], y[randomIndices])

def sgd(y, x, w, maxIterations = 500, tolerance=0.0000001, batchSize = 10):
    for i in range(maxIterations):
        loss = getLoss(y, x, w)
        if loss < tolerance:
            maxIterations = i
            break
        # Draw a random mini-batch (indices sampled with replacement)
        randomIndices = np.random.randint(x.shape[0], size=batchSize)
        selectedX = x[randomIndices]
        selectedY = y[randomIndices]
        gradient = grad(selectedY, selectedX, w)
        w = w - alpha*gradient

    print("finished gradient descent on iteration " + str(maxIterations))
    print("with loss equal " + str(loss))
    return w
sgd(y, x, w)

finished gradient descent on iteration 345
with loss equal 9.889836325315991e-08


array([ 0.20000107,  0.49845625, -0.24999052,  0.96730586])

In [16]:
print("SGD takes: ")
%time tSGD = sgd(y, x, w)
print("gradient descent takes: ")
%time tGD = gradientDescent(y, x, w)

SGD takes: 
finished gradient descent on iteration 341
with loss equal 9.818739529034289e-08
CPU times: user 26.9 ms, sys: 250 µs, total: 27.2 ms
Wall time: 26.4 ms
gradient descent takes: 
finished gradient descent on iteration 349
with loss equal 9.899652083778778e-08
CPU times: user 11.4 ms, sys: 0 ns, total: 11.4 ms
Wall time: 11.3 ms


### Problem 4

Implement SGD using PyTorch. Start by rewriting Problem 3 to use torch tensors instead of NumPy arrays. 

To convert from NumPy arrays to torch tensors, you can use the ``torch.from_numpy()`` function. 
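One possible shape of such a rewrite is sketched below. It is a simplified illustration, not a reference solution: it regenerates noise-free synthetic data so that it runs on its own, starts from zero weights, and runs a fixed number of steps instead of checking a loss tolerance. The names `t_x`, `t_y`, `t_grad` and `t_sgd` are hypothetical.

```python
import numpy as np
import torch

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Synthetic data in NumPy (noise-free here, purely for illustration)
n = 1000
x_np = np.stack((rng.normal(0, 1, n),
                 rng.uniform(0, 1, n),
                 rng.normal(1, 2, n),
                 np.ones(n)), axis=1)
w_true = np.array((0.2, 0.5, -0.25, 1.0))
y_np = x_np @ w_true

# torch.from_numpy keeps the dtype (float64) and shares memory with the array
t_x = torch.from_numpy(x_np)
t_y = torch.from_numpy(y_np)

def t_grad(y, x, w):
    # Same analytic gradient as the NumPy version: x^T (x w - y) / N
    return x.T @ (x @ w - y) / x.shape[0]

def t_sgd(y, x, w, alpha=0.1, max_iterations=2000, batch_size=10):
    for _ in range(max_iterations):
        # Random mini-batch, indices sampled with replacement
        idx = torch.randint(0, x.shape[0], (batch_size,))
        w = w - alpha * t_grad(y[idx], x[idx], w)
    return w

t_w0 = torch.zeros(4, dtype=torch.float64)   # dtype must match t_x and t_y
t_w = t_sgd(t_y, t_x, t_w0)
print(t_w)
```

Since `torch.from_numpy` preserves the float64 dtype, the weight tensor is created as float64 too. Mixing float32 and float64 tensors in `@` would raise an error.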

### Problem 5 

Implement GD using PyTorch automatic differentiation.

To this end, the variable with respect to which the gradient will be calculated, ``t_w`` in this case, must have the attribute
``requires_grad`` set to ``True`` (``t_w.requires_grad = True``).

PyTorch will automatically track any expression containing ``t_w`` and store its computational graph. Calling ``backward()`` on the final expression backpropagates the gradient, e.g. ``loss.backward()``. The gradient is then accessible as ``t_w.grad``. Note that gradients accumulate in ``t_w.grad`` across calls to ``backward()``, so it should be reset (e.g. with ``t_w.grad.zero_()``) before the next iteration.
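A minimal sketch of full-batch gradient descent driven by autograd is shown below. It makes the same simplifying assumptions as an illustration rather than a reference solution: noise-free synthetic data generated on the spot, zero initial weights, and a fixed number of steps. All names other than ``t_w`` are hypothetical.

```python
import numpy as np
import torch

rng = np.random.default_rng(0)

# Synthetic data (noise-free, purely for illustration)
n = 1000
x_np = np.stack((rng.normal(0, 1, n),
                 rng.uniform(0, 1, n),
                 rng.normal(1, 2, n),
                 np.ones(n)), axis=1)
w_true = np.array((0.2, 0.5, -0.25, 1.0))
y_np = x_np @ w_true

t_x = torch.from_numpy(x_np)
t_y = torch.from_numpy(y_np)

# Leaf tensor whose gradient autograd will track
t_w = torch.zeros(4, dtype=torch.float64, requires_grad=True)

alpha = 0.1
for _ in range(2000):
    # Same loss as in the NumPy version: half the mean squared residual
    loss = torch.mean((t_y - t_x @ t_w) ** 2) / 2
    loss.backward()                 # fills t_w.grad with d(loss)/d(t_w)
    with torch.no_grad():           # the update itself must not be tracked
        t_w -= alpha * t_w.grad
    t_w.grad.zero_()                # gradients accumulate, so reset each step

print(t_w.detach().numpy(), loss.item())
```

The update is wrapped in ``torch.no_grad()`` because an in-place change to a leaf tensor that requires gradients is otherwise rejected by autograd.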