## Linear Regression

Given $n$ samples with inputs $x_1, x_2, \ldots, x_n$ and corresponding outputs $y_1, y_2, \ldots, y_n$, a simple linear model can be written as:

$$ y_i = a x_i + b\tag{1}$$

(1) can be rewritten in matrix form:

$$ \begin{bmatrix}
y_1 \\
y_2 \\
... \\
y_n
\end{bmatrix} = \begin{bmatrix}
x_1 & 1 \\
x_2 & 1 \\
... \\
x_n & 1
\end{bmatrix} \begin{bmatrix}
a \\
b
\end{bmatrix} \tag{2}$$

Introduce variables for simplification:

$$ y = \begin{bmatrix}
y_1 \\
y_2 \\
... \\
y_n
\end{bmatrix} \tag{3}$$

$$ A = \begin{bmatrix}
x_1 & 1 \\
x_2 & 1 \\
... \\
x_n & 1
\end{bmatrix} \tag{4}$$

Substitute the variables defined in (3) and (4) to produce a simplified form of (2):

$$ y = A \begin{bmatrix}
a \\
b
\end{bmatrix} \tag{5}$$
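As a minimal sketch of equations (3) to (5), the matrix $A$ can be assembled from the inputs and multiplied by a parameter vector. The helper name `design_matrix` and the sample values are hypothetical, chosen only for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def design_matrix(x):
    """Build the n x 2 matrix A from (4): one row [x_i, 1] per sample."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x, np.ones_like(x)])

# Hypothetical inputs, not taken from any real data set.
x = np.array([0.0, 1.0, 2.0, 3.0])
A = design_matrix(x)

# Equation (5) with assumed parameters a = 2 and b = 1.
params = np.array([2.0, 1.0])
y = A @ params
```

Each entry of `y` equals `2 * x_i + 1`, which is equation (1) applied to every sample at once.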

If $A$ is a square matrix, i.e. $n = 2$, and $x_1 \neq x_2$ so that $A$ is invertible, $a$ and $b$ can be determined exactly:

$$ \begin{bmatrix}
a \\
b
\end{bmatrix} = A^{-1} y \tag{6}$$
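The two-sample case in (6) can be sketched as follows. The function name `fit_two_points` and the sample points are hypothetical. The sketch uses `np.linalg.solve` rather than forming $A^{-1}$ explicitly, which gives the same result for an invertible $A$.

```python
import numpy as np

def fit_two_points(x1, y1, x2, y2):
    """Solve the 2 x 2 system y = A [a, b]^T exactly (requires x1 != x2)."""
    A = np.array([[x1, 1.0],
                  [x2, 1.0]])
    y = np.array([y1, y2])
    # Equivalent to A^{-1} y in (6), without computing the inverse.
    a, b = np.linalg.solve(A, y)
    return a, b

# Hypothetical points (1, 3) and (3, 7), which lie on the line y = 2x + 1.
a, b = fit_two_points(1.0, 3.0, 3.0, 7.0)
```

If `x1 == x2`, the matrix is singular and `np.linalg.solve` raises `numpy.linalg.LinAlgError`.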

In the case where $n > 2$, $A$ is not square and cannot be inverted, and in general no $a$ and $b$ satisfy (5) exactly. One solution is to use a pseudo-inverse: multiply both sides of (5) by the transpose $A^T$, which yields the square matrix $A^T A$ on the right-hand side:

$$ A^T y = A^T A \begin{bmatrix}
a \\
b
\end{bmatrix} \tag{7}$$

The matrix $A^T A$ on the right can be eliminated by multiplying both sides by its inverse $(A^T A)^{-1}$, which exists as long as the $x_i$ are not all equal:

$$ (A^T A)^{-1} A^T y = (A^T A)^{-1} (A^T A) \begin{bmatrix}
a \\
b
\end{bmatrix} \tag{8} $$

Simplifying yields the result for $a$ and $b$:

$$ \begin{bmatrix}
a \\
b
\end{bmatrix} = (A^T A)^{-1} A^T y\tag{9}$$
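Equation (9) can be sketched in NumPy as shown here. The function name `fit_line` and the sample data are hypothetical. Rather than inverting $A^T A$, the sketch solves the linear system in (7) directly, which yields the same $a$ and $b$.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = a x + b using the system in (7)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([x, np.ones_like(x)])
    # Solve (A^T A) [a, b]^T = A^T y instead of forming (A^T A)^{-1}.
    a, b = np.linalg.solve(A.T @ A, A.T @ y)
    return a, b

# Hypothetical noisy samples scattered around a straight line.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]
a, b = fit_line(x, y)
```

For larger or poorly conditioned problems, `np.linalg.lstsq(A, y, rcond=None)` is a commonly used alternative that avoids forming $A^T A$.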

## Model Accuracy

The accuracy of the model above can be measured by computing the mean squared error between the predicted values and the observed values over the $m$ samples being evaluated (for the training set, $m = n$):

$$ \epsilon = \frac{1}{m} \sum_{i=1}^{m} (a x_i + b - y_i)^2 $$

The prediction error can be computed in the same way, but using the test samples instead of the training samples.

Test error can differ from training error, for example:

- If training error is high, this is a case of __high bias__. This generally means the model is not powerful enough to fit the training data.
- If the training error is low, but the test error is high, this is a case of __high variance__. This usually means the model is overfitting and does not generalize well to data it has not seen.
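The comparison between training and test error can be sketched as follows. The function name `mean_squared_error`, the parameter values, and both data sets are hypothetical and chosen so the arithmetic is easy to check by hand.

```python
import numpy as np

def mean_squared_error(a, b, x, y):
    """Mean of (a x_i + b - y_i)^2 over the samples passed in."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((a * x + b - y) ** 2)

# Assumed fitted parameters for illustration.
a, b = 2.0, 1.0

# Hypothetical training samples that the line fits exactly.
x_train, y_train = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0]
# Hypothetical test samples that deviate from the line.
x_test, y_test = [3.0, 4.0], [8.0, 8.0]

train_error = mean_squared_error(a, b, x_train, y_train)
test_error = mean_squared_error(a, b, x_test, y_test)
```

Here the training error is zero while the test error is 1, the pattern described in the second bullet above.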

## Extending the Model

In the case where a straight line will not accurately fit the available data, a more complex model can be used. For example, a quadratic model might be more suitable:

$$ y_i = a {x_i}^2 + b x_i + c \tag{10}$$

In matrix form:

$$ \begin{bmatrix}
y_1 \\
y_2 \\
... \\
y_n
\end{bmatrix} = \begin{bmatrix}
{x_1}^2 & x_1 & 1 \\
{x_2}^2 & x_2 & 1 \\
... \\
{x_n}^2 & x_n & 1
\end{bmatrix} \begin{bmatrix}
a \\
b \\
c
\end{bmatrix} \tag{11}$$

In this case the matrix $A$ is:

$$ A = \begin{bmatrix}
{x_1}^2 & x_1 & 1 \\
{x_2}^2 & x_2 & 1 \\
... \\
{x_n}^2 & x_n & 1
\end{bmatrix}\tag{12}$$

The same procedure can be used to find $a$, $b$, and $c$:

$$ \begin{bmatrix}
a \\
b \\
c
\end{bmatrix} = (A^T A)^{-1} A^T y \tag{13}$$
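A minimal sketch of the quadratic fit in (13) differs from the linear case only in the columns of $A$. The function name `fit_quadratic` and the sample data are hypothetical. As before, the sketch solves the linear system rather than inverting $A^T A$.

```python
import numpy as np

def fit_quadratic(x, y):
    """Least-squares fit of y = a x^2 + b x + c using A from (12)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # One row [x_i^2, x_i, 1] per sample.
    A = np.column_stack([x ** 2, x, np.ones_like(x)])
    a, b, c = np.linalg.solve(A.T @ A, A.T @ y)
    return a, b, c

# Hypothetical noise-free samples generated from y = 3x^2 - 2x + 1.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 3.0 * x ** 2 - 2.0 * x + 1.0
a, b, c = fit_quadratic(x, y)
```

Because the data are generated without noise, the fit recovers the coefficients used to produce them up to floating-point rounding.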