In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook

# Linear regression

## Gradient descent approach

Linear regression is the task of modelling data with a function of the form
$$
\begin{align*}
t &= w_1 \phi_1(\textbf{x}) + w_2 \phi_2(\textbf{x}) + \dots + w_M \phi_M(\textbf{x}) \\
&= \sum_{i = 1}^M w_i \phi_i(\textbf{x}) \\
&= \textbf{w}^\top \boldsymbol{\phi}(\textbf{x}),
\end{align*}
$$
where $\textbf{x}$ is an input, $\boldsymbol{\phi}(\cdot)$ is the feature vector consisting of $M$ basis functions (think of them as feature transformations), and $\textbf{w}$ is a vector of weights, i.e. parameters determining the behaviour of our model. Our goal is to learn the optimal weights, $\textbf{w}^*$, such that our model captures the structure of the data.
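
As a concrete illustration, the sketch below evaluates $\textbf{w}^\top \boldsymbol{\phi}(\textbf{x})$ for a single scalar input. The polynomial basis $\phi_i(x) = x^{i-1}$, the weights, and the name `poly_features` are assumptions made for this example only; they are not part of the exercise.

```python
import numpy as np

def poly_features(x, M=3):
    # Hypothetical basis for illustration: phi_i(x) = x**(i-1) for i = 1..M
    return np.array([x**i for i in range(M)])

w_example = np.array([1.0, 2.0, 0.5])   # illustrative weights, not learned from data
x_example = 2.0

phi_example = poly_features(x_example)  # feature vector [1, x, x**2]
t_example = w_example @ phi_example     # model output w^T phi(x)
print(phi_example, t_example)
```

The model stays linear in the weights even though the basis functions are non-linear in the input, which is why this is still called linear regression.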

[Gradient descent](https://en.wikipedia.org/wiki/Gradient_descent) is a general framework for optimising differentiable functions. It works by iteratively adjusting the weights to approach a local minimum of some loss function $L$:
$$
\textbf{w}^{i+1} = \textbf{w}^{i} - \eta \nabla L(\textbf{w}^{i}),
$$
where $\textbf{w}^{i}$ denotes the weights after $i$ updates and $\eta$ is the learning rate or step size.
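
To see the update rule in action, here is a minimal sketch on a one-dimensional toy loss, $L(w) = (w - 3)^2$, whose minimum is known to lie at $w = 3$. The loss, the starting point, and the `_demo` names are assumptions chosen purely for illustration.

```python
def loss(w):
    # Toy loss with its minimum at w = 3
    return (w - 3.0) ** 2

def grad_loss(w):
    # Derivative of the toy loss with respect to w
    return 2.0 * (w - 3.0)

eta_demo = 0.1   # step size
w_demo = 0.0     # starting point

for _ in range(100):
    # Gradient descent update: step against the gradient
    w_demo = w_demo - eta_demo * grad_loss(w_demo)

print(w_demo, loss(w_demo))
```

Each step shrinks the distance to the minimum by a constant factor here ($1 - 2\eta = 0.8$), so the iterate approaches $w = 3$ geometrically. A step size that is too large would make the iterates overshoot and diverge instead.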



## Exercise

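Implement the solution to the linear regression problem using gradient descent. Use the mean squared error (MSE) loss function
$$
\text{MSE} = \frac{1}{2N} \sum_{i=1}^N (t_i - y_i)^2,
$$
where $t_i$ is the model's prediction for the $i$-th data point and $y_i$ is the corresponding observed target.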
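
As a hint, the gradient needed for the update rule can be sketched as follows, with $t_i = \textbf{w}^\top \boldsymbol{\phi}(\textbf{x}_i)$ the prediction and $y_i$ the target for the $i$-th data point:
$$
\begin{align*}
\nabla_{\textbf{w}} \text{MSE} &= \frac{1}{2N} \sum_{i=1}^N \nabla_{\textbf{w}} (t_i - y_i)^2 \\
&= \frac{1}{N} \sum_{i=1}^N (t_i - y_i) \nabla_{\textbf{w}} t_i \\
&= \frac{1}{N} \sum_{i=1}^N (t_i - y_i) \boldsymbol{\phi}(\textbf{x}_i).
\end{align*}
$$
The factor $\frac{1}{2}$ in the loss cancels the 2 that appears when differentiating the square.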

Test it on the data in `ex1.dat`, `ex2.dat`, and `ex3.dat`.


# Example solution

## Data

Load data from exercise 1.

In [None]:
data = np.loadtxt("../data/ex1.dat")
X = data[:, 0]
y = data[:, 1]

In [None]:
fig, ax = plt.subplots()
ax.scatter(X, y)
plt.show()

## Solving the linear regression problem

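Define the feature vector $\boldsymbol{\phi}$. We use a constant feature and the raw input, $\boldsymbol{\phi}(x) = (1, x)^\top$, so the model is a straight line with intercept $w_1$ and slope $w_2$: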

In [None]:
def basis_functions(x):
    # The transpose is a convenience when passing all data points in one go
    return np.array([np.ones_like(x), x]).T
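
For a quick look at what this returns when all data points are passed at once, consider the small sketch below. The inputs and weights are made up for illustration; `basis_functions` is repeated only so the snippet runs on its own.

```python
import numpy as np

def basis_functions(x):
    # Same definition as above: a constant feature and the raw input
    return np.array([np.ones_like(x), x]).T

x_demo = np.array([0.0, 1.0, 2.0])   # three illustrative inputs
phi_demo = basis_functions(x_demo)   # one row per data point, one column per basis function
print(phi_demo.shape)

w_line = np.array([1.0, 2.0])        # illustrative weights: intercept 1, slope 2
t_line = phi_demo @ w_line           # predictions 1 + 2*x for every input
print(t_line)
```

Thanks to the transpose, the rows index data points, so the predictions for the whole data set are a single matrix-vector product.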

Define the MSE loss function and its derivative with respect to the predictions $t$:

In [None]:
def mse(t, y):
    N = len(y)
    return 1/(2*N) * np.sum((t - y)**2)

def grad_mse(t, y):
    N = len(y)
    return 1/N * (t - y)
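
Note that `grad_mse` is the derivative of the loss with respect to the predictions; the gradient with respect to $\textbf{w}$ follows from the chain rule as $\boldsymbol{\Phi}^\top (\textbf{t} - \textbf{y}) / N$, where $\boldsymbol{\Phi}$ stacks the feature vectors row by row. The sketch below checks this expression against central finite differences. The random data and the `_chk` names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
phi_chk = np.column_stack([np.ones(8), rng.uniform(0, 10, 8)])  # made-up feature matrix
y_chk = rng.normal(size=8)                                      # made-up targets
w_chk = rng.normal(size=2)                                      # point at which to check

def loss_of_w(w):
    # MSE as a function of the weights
    t = phi_chk @ w
    return np.sum((t - y_chk) ** 2) / (2 * len(y_chk))

# Analytic gradient with respect to w via the chain rule
analytic = phi_chk.T @ (phi_chk @ w_chk - y_chk) / len(y_chk)

# Numerical gradient: perturb one weight at a time
eps = 1e-6
numeric = np.array([
    (loss_of_w(w_chk + eps * e) - loss_of_w(w_chk - eps * e)) / (2 * eps)
    for e in np.eye(2)
])

print(analytic, numeric)
```

The two vectors should agree closely. A check like this is a cheap way to catch a gradient that has the wrong shape or is missing the feature factor.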

Take $10^5$ steps towards the minimum error:

In [None]:
phi = basis_functions(X)                       # Compute feature vectors
w = 0.001 * np.random.randn(phi.shape[1])      # Initialise w to a small random value
eta = 0.01                                     # Set step size (learning rate)

for step in range(10**5):
    idx = np.random.randint(0, len(X), 5)      # Select a random mini-batch of 5 points
    phi_r = phi[idx]
    
    t = phi_r @ w                              # For each step, compute the predictions...
    grad_w = phi_r.T @ grad_mse(t, y[idx])     # ... apply the chain rule to get the gradient w.r.t. w...
    w = w - eta * grad_w                       # ... and update w using gradient descent. grad_mse divides
                                               # by the batch size, so grad_w is the mean over the batch.
    
print("Optimised w = {}".format(w))
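
One way to sanity-check a gradient descent result is to compare it with a direct least-squares solve. The sketch below does this on synthetic data, since the contents of `ex1.dat` are not shown here. The data-generating line, the full-batch updates, and the `_syn`, `w_gd` and `w_ls` names are all assumptions of this example rather than part of the solution above.

```python
import numpy as np

rng = np.random.default_rng(0)
x_syn = rng.uniform(0, 10, 200)
y_syn = 1.5 + 0.8 * x_syn + rng.normal(scale=0.1, size=200)  # synthetic line plus noise
phi_syn = np.column_stack([np.ones_like(x_syn), x_syn])       # features (1, x)

# Full-batch gradient descent on the MSE loss
w_gd = np.zeros(2)
for _ in range(20000):
    grad = phi_syn.T @ (phi_syn @ w_gd - y_syn) / len(y_syn)
    w_gd = w_gd - 0.01 * grad

# Reference solution from NumPy's least-squares solver
w_ls, *_ = np.linalg.lstsq(phi_syn, y_syn, rcond=None)

print("Gradient descent:", w_gd)
print("Least squares:   ", w_ls)
```

With full-batch updates the two results should match to several decimal places. With random mini-batches, as in the solution above, the weights keep fluctuating slightly around the least-squares values because each step uses a noisy gradient estimate.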

## Visualising the result

In [None]:
Xf = np.linspace(0, 10, 500)                   # Dense grid of inputs on [0, 10]
tf = np.sum(w * basis_functions(Xf), axis=1)   # Model predictions on the grid

fig, ax = plt.subplots()
ax.scatter(X, y)
ax.plot(Xf, tf, color="C1")
plt.show()