In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook

# Linear regression

## Design matrix approach

Gradient descent is a general framework for optimising functions. However, a linear regression problem can be solved in closed form, much faster and more elegantly, using a design matrix.

Define the *design matrix* as:
$$
\Phi = \begin{bmatrix}
\phi_1(x_1) & \phi_2(x_1) & \dots & \phi_M(x_1)\\
\phi_1(x_2) & \phi_2(x_2) & \dots & \phi_M(x_2)\\
\vdots & \vdots & \ddots & \vdots \\
\phi_1(x_N) & \phi_2(x_N) & \dots & \phi_M(x_N)\\
\end{bmatrix}
$$
where $\phi_i(x_j)$ is the output of the $i$th feature at the $j$th input, so $\Phi$ has one row per input ($N$ rows) and one column per feature ($M$ columns).
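
To make the layout concrete, here is a minimal sketch that builds a design matrix for a toy case. The inputs `x_toy` and the two features ($\phi_1(x) = 1$, $\phi_2(x) = x$) are assumptions chosen only for illustration, not part of the exercise data.

```python
import numpy as np

# Hypothetical toy inputs (N = 3), chosen only for illustration.
x_toy = np.array([0.0, 1.0, 2.0])

# Hypothetical features (M = 2): phi_1(t) = 1 and phi_2(t) = t.
features = [lambda t: np.ones_like(t), lambda t: t]

# Column i holds feature i evaluated at every input, so row j corresponds to x_j.
Phi_toy = np.column_stack([f(x_toy) for f in features])

print(Phi_toy.shape)  # (3, 2)
print(Phi_toy)
```

Each row of `Phi_toy` is the feature vector of one input, matching the definition of $\Phi$ above.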

The optimal weights can now be found as
$$
\textbf{w}^* = (\Phi^\top\Phi)^{-1} \Phi^\top \textbf{y},
$$
where $\textbf{y}$ is the vector of all $N$ outputs. The formula requires $\Phi^\top\Phi$ to be invertible.
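
As a brief sketch of where this formula comes from, assume the weights are chosen to minimise the sum-of-squares error (the symbol $E$ is notation introduced here). Setting its gradient to zero gives the normal equations:
$$
E(\textbf{w}) = \frac{1}{2}\lVert \Phi\textbf{w} - \textbf{y} \rVert^2, \qquad
\nabla_{\textbf{w}} E = \Phi^\top(\Phi\textbf{w} - \textbf{y}) = 0
\quad\Rightarrow\quad
\Phi^\top\Phi\,\textbf{w}^* = \Phi^\top\textbf{y}.
$$
Left-multiplying both sides by $(\Phi^\top\Phi)^{-1}$ yields the expression for $\textbf{w}^*$.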

## Exercise

Implement the solution to the linear regression problem using the above formula. Test it on the data in `ex1.dat`, `ex2.dat`, and `ex3.dat`.


# Example solution

## Data

Load data from exercise 3.

In [None]:
data = np.loadtxt("../data/ex3.dat")
X = data[:, 0]
y = data[:, 1]

In [None]:
fig, ax = plt.subplots()
ax.scatter(X, y)
plt.show()

## Solving the linear regression problem

Define the feature vector $\phi$:

In [None]:
def phi(x):
    return np.array([np.ones_like(x), x, x**2, np.sin(x)])

Define the design matrix $\Phi$:

In [None]:
# phi(X) has shape (M, N); transpose so rows index inputs and columns index features.
Phi = phi(X).T

Compute the maximum likelihood solution for the weights as $\textbf{w}^* = (\Phi^\top\Phi)^{-1} \Phi^\top \textbf{y}$:

In [None]:
# Direct translation of the closed-form formula: w = (Phi^T Phi)^{-1} Phi^T y
w = np.linalg.inv(Phi.T @ Phi) @ Phi.T @ y
print(w)
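
Forming the inverse explicitly follows the formula literally, but it is often less numerically robust than letting a solver handle the system. The sketch below compares three ways of computing the same weights on synthetic data; the data, the features $[1, x]$, and the names ending in `_demo` are assumptions made only for this comparison.

```python
import numpy as np

# Synthetic data, assumed only for this sketch: y = 1 + 2x plus a little noise.
rng = np.random.default_rng(0)
x_demo = np.linspace(0.0, 5.0, 50)
y_demo = 1.0 + 2.0 * x_demo + 0.1 * rng.standard_normal(x_demo.shape)

# Design matrix with the features [1, x].
Phi_demo = np.column_stack([np.ones_like(x_demo), x_demo])

# 1) Explicit inverse, as in the formula.
w_inv = np.linalg.inv(Phi_demo.T @ Phi_demo) @ Phi_demo.T @ y_demo

# 2) Solve the normal equations (Phi^T Phi) w = Phi^T y without forming the inverse.
w_solve = np.linalg.solve(Phi_demo.T @ Phi_demo, Phi_demo.T @ y_demo)

# 3) Least-squares solver applied to Phi directly.
w_lstsq, *_ = np.linalg.lstsq(Phi_demo, y_demo, rcond=None)

print(w_inv, w_solve, w_lstsq)
```

On a well-conditioned problem like this one, the three results should agree to within floating-point precision. The solver-based variants tend to matter more when the features are nearly collinear.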

## Visualising the result

In [None]:
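# Dense grid of inputs on which to evaluate the fitted model.
Xf = np.linspace(0, 25, 500)
# Model prediction: design matrix of the grid times the weights.
yf = phi(Xf).T @ w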

fig, ax = plt.subplots()
ax.scatter(X, y)
ax.plot(Xf, yf, color="C1")
plt.show()