#  Least Squares Solutions

---

## The problem

Let $\mathbf{A}$ denote an $m \times n$ matrix. The equation

$$
\mathbf{A} \cdot \mathbf{x} = \mathbf{b}
$$

is to be solved. In general an exact solution need not exist, and if one exists it need not be unique. Since $\mathbf{A} \cdot \mathbf{x}$ is a linear combination of the column vectors of $\mathbf{A}$, an exact solution exists only if $\mathbf{b}$ lies in the column space of $\mathbf{A}$. Otherwise only an approximation

$$
\mathbf{A} \cdot \mathbf{x} = \mathbf{b} + \mathbf{r}
$$

can be found, where $\mathbf{r}$ is the residual vector:

$$
\mathbf{r} = \mathbf{A} \cdot \mathbf{x} - \mathbf{b}
$$

The residual is shortest when $\mathbf{A} \cdot \mathbf{x}$ is the orthogonal projection of $\mathbf{b}$ onto the column space of $\mathbf{A}$. In that case $\mathbf{r}$ is orthogonal to every column of $\mathbf{A}$, which is equivalent to this equation:

$$
\mathbf{A}^T \cdot \mathbf{r} = \mathbf{0}
$$

or 

$$\begin{gather}
\mathbf{A}^T \cdot \left( \mathbf{A} \cdot \mathbf{x} - \mathbf{b} \right) = \mathbf{0} \\
\mathbf{A}^T \cdot \mathbf{A} \cdot \mathbf{x} = \mathbf{A}^T \cdot \mathbf{b} \\
\underbrace{\left( \mathbf{A}^T \cdot \mathbf{A} \right)^{-1} \cdot \mathbf{A}^T}_{left \ inverse} \cdot \mathbf{A} \cdot \mathbf{x} = \left( \mathbf{A}^T \cdot \mathbf{A} \right)^{-1} \cdot \mathbf{A}^T \cdot \mathbf{b} \\
\mathbf{x} = \underbrace{\left( \mathbf{A}^T \cdot \mathbf{A} \right)^{-1} \cdot \mathbf{A}^T}_{left \ inverse} \cdot \mathbf{b} 
\end{gather}
$$

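So if the left inverse exists, that is, if the columns of $\mathbf{A}$ are linearly independent and $\mathbf{A}^T \cdot \mathbf{A}$ is therefore invertible, the equation $\mathbf{A} \cdot \mathbf{x} = \mathbf{b} + \mathbf{r}$ can be solved for $\mathbf{x}$.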
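The derivation above can be tried out on a small example. The following is a minimal sketch, assuming a made-up $4 \times 2$ matrix `A` and vector `b` (both hypothetical, not taken from the text). It forms the left inverse explicitly, computes $\mathbf{x}$ and checks that the residual is orthogonal to the columns of `A`.

```python
import numpy as np

# Hypothetical overdetermined system: 4 equations, 2 unknowns
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Left inverse (A^T A)^{-1} A^T; needs linearly independent columns of A
left_inverse = np.linalg.inv(A.T @ A) @ A.T

# Least squares solution x = left_inverse * b
x = left_inverse @ b

# Residual r = A x - b and the orthogonality check A^T r = 0
r = A @ x - b
print("x     =", x)
print("A^T r =", A.T @ r)
```

Forming the inverse explicitly mirrors the formula, but it is shown here only for illustration; for larger or ill-conditioned problems a dedicated solver is usually the better choice.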

---

A different approach to the solution of equation

$$
\mathbf{A} \cdot \mathbf{x} = \mathbf{b} + \mathbf{r}
$$

is to minimise the L2 norm $\|\mathbf{r}\|$ of the residual vector. This is equivalent to minimising $\|\mathbf{r}\|^2 = \mathbf{r}^T \cdot \mathbf{r}$:

$$\begin{gather}
\mathbf{r}^T \cdot \mathbf{r} = \left( \mathbf{A} \cdot \mathbf{x} - \mathbf{b} \right)^T \cdot \left( \mathbf{A} \cdot \mathbf{x} - \mathbf{b} \right) \\
\mathbf{r}^T \cdot \mathbf{r} = \left(\mathbf{x}^T \cdot \mathbf{A}^T - \mathbf{b}^T \right) \cdot \left(\mathbf{A} \cdot \mathbf{x} -\mathbf{b} \right) \\
f(\mathbf{x}) = \mathbf{r}^T \cdot \mathbf{r} = \mathbf{x}^T \cdot \mathbf{A}^T \cdot \mathbf{A} \cdot \mathbf{x} - 2 \cdot\mathbf{b}^T \cdot \mathbf{A} \cdot \mathbf{x} +  \mathbf{b}^T \cdot \mathbf{b}
\end{gather}
$$

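Thus the goal is to minimise a scalar function $f(\mathbf{x})$. With $\mathbf{U} = \mathbf{A}^T \cdot \mathbf{A}$ it can be written as

$$
f(\mathbf{x}) = \mathbf{x}^T \cdot \mathbf{U} \cdot \mathbf{x} - 2 \cdot \mathbf{b}^T \cdot \mathbf{A} \cdot \mathbf{x} + \mathbf{b}^T \cdot \mathbf{b}
$$

Regardless of the shape of matrix $\mathbf{A}$ the matrix $\mathbf{U}$ is square ($n \times n$), symmetric and positive semidefinite. It is positive definite exactly when the columns of $\mathbf{A}$ are linearly independent.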

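The gradient of $f(\mathbf{x})$ is computed component by component. The constant term $\mathbf{b}^T \cdot \mathbf{b}$ does not depend on $\mathbf{x}$ and drops out: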

$$
f'(\mathbf{x}) = \left[
\begin{array}{c}
\frac{\partial}{\partial x_1} f(\mathbf{x}) \\
\frac{\partial}{\partial x_2} f(\mathbf{x}) \\
\vdots \\
\frac{\partial}{\partial x_n} f(\mathbf{x})
\end{array}
\right] =
\left[
\begin{array}{c}
\frac{\partial}{\partial x_1} \left(\mathbf{x}^T \cdot \mathbf{A}^T \cdot \mathbf{A} \cdot \mathbf{x}\right) \\
\frac{\partial}{\partial x_2} \left(\mathbf{x}^T \cdot \mathbf{A}^T \cdot \mathbf{A} \cdot \mathbf{x}\right) \\
\vdots \\
\frac{\partial}{\partial x_n} \left(\mathbf{x}^T \cdot \mathbf{A}^T \cdot \mathbf{A} \cdot \mathbf{x}\right)
\end{array}
\right] -
2 \cdot \left[ \begin{array}{c}
\frac{\partial}{\partial x_1} \left(\mathbf{b}^T \cdot \mathbf{A} \cdot \mathbf{x}\right) \\
\frac{\partial}{\partial x_2} \left(\mathbf{b}^T \cdot \mathbf{A} \cdot \mathbf{x}\right) \\
\vdots \\
\frac{\partial}{\partial x_n} \left(\mathbf{b}^T \cdot \mathbf{A} \cdot \mathbf{x}\right)
\end{array}
\right]
$$

$$\begin{gather}
f'(\mathbf{x}) = \mathbf{A}^T \cdot \mathbf{A} \cdot \mathbf{x} + \left( \mathbf{A}^T \cdot \mathbf{A} \right)^T \cdot \mathbf{x} - 2 \cdot \mathbf{A}^T \cdot \mathbf{b} \\
f'(\mathbf{x}) = 2 \cdot \left( \mathbf{A}^T \cdot \mathbf{A} \cdot \mathbf{x} - \mathbf{A}^T \cdot \mathbf{b}\right) 
\end{gather}
$$
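The closed-form gradient $2 \cdot \left( \mathbf{A}^T \cdot \mathbf{A} \cdot \mathbf{x} - \mathbf{A}^T \cdot \mathbf{b} \right)$ can be checked numerically. The sketch below assumes randomly generated test data (the shapes and the seed are arbitrary choices, and the helper names `f` and `grad_f` are introduced here only for illustration) and compares the formula with central finite differences.

```python
import numpy as np

# Arbitrary test data: A is 5 x 3, so x has 3 components
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
x = rng.standard_normal(3)

def f(x):
    """Squared L2 norm of the residual r = A x - b."""
    r = A @ x - b
    return r @ r

def grad_f(x):
    """Closed-form gradient 2 (A^T A x - A^T b)."""
    return 2.0 * (A.T @ A @ x - A.T @ b)

# Central differences along each coordinate direction e_k
h = 1e-6
numeric = np.array([(f(x + h * e) - f(x - h * e)) / (2.0 * h)
                    for e in np.eye(3)])

max_deviation = np.max(np.abs(numeric - grad_f(x)))
print("largest deviation:", max_deviation)
```

Because $f(\mathbf{x})$ is quadratic, central differences agree with the exact gradient up to rounding error, so the printed deviation should be very small.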

Setting the gradient to $\mathbf{0}$ results in the *normal equation* for the unknown vector $\mathbf{x}$:

$$
\mathbf{A}^T \cdot \mathbf{A} \cdot \mathbf{x} = \mathbf{A}^T \cdot \mathbf{b}
$$

**Note**

Inserting $\mathbf{A} \cdot \mathbf{x} = \mathbf{b} + \mathbf{r}$ into the normal equation gives

$$\begin{gather}
\mathbf{A}^T \cdot \left(\mathbf{b} + \mathbf{r} \right) = \mathbf{A}^T \cdot \mathbf{b} \\
\mathbf{A}^T \cdot \mathbf{b} + \mathbf{A}^T \cdot \mathbf{r} = \mathbf{A}^T \cdot \mathbf{b} \\
\to \\
\mathbf{A}^T \cdot \mathbf{r} = \mathbf{0}
\end{gather}
$$

This equation shows again that the residual vector $\mathbf{r}$ is orthogonal to all columns of matrix $\mathbf{A}$.

If the inverse $\left(\mathbf{A}^T \cdot \mathbf{A}\right)^{-1}$ exists, vector $\mathbf{x}$ is computed by:

$$
\mathbf{x} = \underbrace{\left(\mathbf{A}^T \cdot \mathbf{A}\right)^{-1} \cdot \mathbf{A}^T}_{left \ inverse} \cdot \mathbf{b}
$$


---

The solution can be found using the NumPy function `numpy.linalg.lstsq`:

https://numpy.org/doc/stable/reference/generated/numpy.linalg.lstsq.html
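A minimal usage sketch, assuming a made-up $4 \times 2$ matrix `A` and vector `b` (hypothetical data chosen for illustration). It calls `numpy.linalg.lstsq` and compares the result with a direct solution of the normal equation.

```python
import numpy as np

# Hypothetical overdetermined system: 4 equations, 2 unknowns
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# lstsq returns the solution, the sum of squared residuals,
# the rank of A and the singular values of A
x, residuals, rank, singular_values = np.linalg.lstsq(A, b, rcond=None)

# For comparison: solve the normal equation A^T A x = A^T b directly
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

print("lstsq solution:           ", x)
print("normal equation solution: ", x_normal)
print("sum of squared residuals: ", residuals)
```

Both approaches agree on this well-conditioned example. `numpy.linalg.lstsq` does not form $\mathbf{A}^T \cdot \mathbf{A}$ explicitly, which tends to be numerically more robust when the columns of $\mathbf{A}$ are nearly linearly dependent.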

