In [2]:
#math and linear algebra stuff
import numpy as np

#plots
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (15.0, 15.0)
#mpl.rc('text', usetex = True)
import matplotlib.pyplot as plt
%matplotlib inline

# The recursive least squares algorithm

## Some notation

By default, we work in the framework of one-dimensional time series, but the approach can later be extended to multidimensional data.

### Input

Let's begin with the set of input samples:
$$
    \{ u(0), u(1), \dots u(N) \}
$$
This time series is the input of the RLS algorithm. In the real world, time series are often corrupted by noise, and we are interested in obtaining, for each time point $n$, an estimator $y(n)$ of the original, noise-free signal $d(n)$.

For convenience, we assume that the input is defined at all earlier times by setting it to zero before the first sample, i.e.:
$$
    u(k) = 0 \quad \forall k < 0
$$

We also call
$$
    \{ d(0), d(1), \dots d(N) \}
$$
the desired response, which we wish to recover from $u$.

### Output

Let's call
$$
    y(n) = \sum_{k=0}^{M-1} w_k u(n-k)
$$
an estimator of the noise-free time series, obtained by applying a linear system, or filter, to the $M$ most recent elements of the input, $u(n), u(n-1), \dots, u(n-M+1)$.

The $w_k$ are the coefficients of the filter, and we would like to find the ones best suited to our estimator.
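
As a minimal sketch of the filter output defined above, the sum $\sum_k w_k u(n-k)$ can be computed with a truncated convolution. The function name `fir_output` and the sample values are illustrative assumptions, not part of the original notebook; the zero-padding matches the convention $u(k) = 0$ for $k < 0$.

```python
import numpy as np

def fir_output(w, u):
    """Return y(n) = sum_k w[k] * u(n - k) for n = 0..len(u)-1.

    Hypothetical helper: samples before time 0 are treated as zero.
    """
    w = np.asarray(w, dtype=float)
    u = np.asarray(u, dtype=float)
    # The full convolution has len(u) + len(w) - 1 points;
    # the first len(u) of them are y(0), ..., y(len(u) - 1).
    return np.convolve(u, w)[:len(u)]

# Illustrative filter with M = 3 coefficients and a short input
w = np.array([0.5, 0.3, 0.2])
u = np.array([1.0, 2.0, 3.0, 4.0])
y = fir_output(w, u)
print(y)  # y(2) = 0.5*3 + 0.3*2 + 0.2*1 = 2.3
```

The first outputs only use the samples that exist so far, since the missing ones count as zero.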


## The (recursive) least squares estimator

The least squares estimator is widely used in science and engineering. Part of its success comes from the fact that it can be derived by simple Bayesian reasoning under a Gaussian noise model. In our case it would read:

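$$
\begin{aligned}
    \tilde{y}(n) &= \underset{y(n)}{\operatorname{argmin}} \, \| y(n) - d(n) \|_{2}^{2} \\
    \tilde{y}(n) &= \langle \vec{w}, \vec{u_n} \rangle
    \quad \text{where} \quad
    \vec{w} = \underset{\vec{w}}{\operatorname{argmin}} \, \| \langle \vec{w}, \vec{u_n} \rangle - d(n) \|_{2}^{2}
\end{aligned}
$$

Here $\vec{w} = (w_0, \dots, w_{M-1})$ and $\vec{u_n} = (u(n), u(n-1), \dots, u(n-M+1))$, so that $\langle \vec{w}, \vec{u_n} \rangle$ is exactly the filter output $y(n)$.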

This approach only takes $y(n)$ and $d(n)$ at a single time point into account to derive the filter $\vec{w}$, which is probably not the best option when a sample deviates strongly from the signal.
Instead, we can look for a $\vec{w}$ that is optimal over the last $N$ samples, including the current one. We can also use a forgetting factor $\beta$ so that the quadratic fidelity term gives more weight to the most recent samples:

$$
    0 < \beta(n,i) \leq 1, i=n-N+1,n-N+2,\dots,n
$$

An exponentially decaying $\beta$, with $0 < \lambda \leq 1$, may for instance be a good choice:

$$
   \beta(n,i) = \lambda^{n-i}
$$
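
To make the exponential weighting concrete, the short sketch below computes $\beta(n,i) = \lambda^{n-i}$ over a window of $N$ samples. The helper name `forgetting_weights` and the values $N = 4$, $\lambda = 0.5$ are assumptions chosen only for illustration.

```python
import numpy as np

def forgetting_weights(N, lam):
    """Weights lam**(n - i) for i = n-N+1, ..., n (oldest sample first).

    Hypothetical helper: the result does not depend on n itself,
    only on the age n - i of each sample in the window.
    """
    ages = np.arange(N - 1, -1, -1)  # n - i runs from N-1 down to 0
    return lam ** ages

beta = forgetting_weights(4, 0.5)
print(beta)  # [0.125 0.25  0.5   1.   ]
```

The current sample always gets weight 1, and a smaller $\lambda$ makes older samples fade faster; with $\lambda = 1$ all samples in the window count equally.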

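We can then define our **_recursive_** least squares estimator:

$$
    \vec{w}(n) = \underset{\vec{w}}{\operatorname{argmin}} \sum_{i=n-N+1}^{n} \beta(n,i) \, \left| d(i) - \langle \vec{w}, \vec{u_i} \rangle \right|^{2}
$$
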
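With the $N \times M$ data matrix, whose row for time $i$ (for $i = n-N+1, \dots, n$) holds the samples $u(i-(M-1)), \dots, u(i)$:

$$
    \begin{pmatrix}
        u(n-(N-1)-(M-1)) & u(n-(N-1)-(M-2)) & \dots  & u(n-(N-1)) \\
        u(n-(N-2)-(M-1)) & u(n-(N-2)-(M-2)) & \dots  & u(n-(N-2)) \\
        \vdots           & \vdots           & \ddots & \vdots     \\
        u(n-(M-1))       & u(n-(M-2))       & \dots  & u(n)
    \end{pmatrix}
$$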
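
The sketch below builds such a data matrix and solves the weighted least squares problem over the last $N$ samples in one batch. It is not the recursive update itself, only a direct solve that a recursive scheme would have to reproduce. It assumes that the row for time $i$ contains $u(i-(M-1)), \dots, u(i)$, that $\beta(n,i) = \lambda^{n-i}$, and that $n \geq N-1$; the names `data_matrix` and `windowed_weighted_ls` are hypothetical.

```python
import numpy as np

def data_matrix(u, n, N, M):
    """N x M matrix whose row for time i (i = n-N+1..n) is
    [u(i-M+1), ..., u(i)], with u(k) = 0 for k < 0."""
    u = np.asarray(u, dtype=float)
    padded = np.concatenate([np.zeros(M - 1), u])  # u(k) sits at index k + M - 1
    return np.array([padded[i:i + M] for i in range(n - N + 1, n + 1)])

def windowed_weighted_ls(u, d, n, N, M, lam):
    """Coefficients w_0..w_{M-1} minimising
    sum_i lam**(n-i) * (d(i) - y(i))**2 over the last N samples."""
    assert n - N + 1 >= 0, "window must start at a non-negative time"
    A = data_matrix(u, n, N, M)
    target = np.asarray(d, dtype=float)[n - N + 1:n + 1]
    # Scaling rows by sqrt(beta) turns the weighted problem into a plain one
    sqrt_beta = np.sqrt(lam ** np.arange(N - 1, -1, -1))
    v, *_ = np.linalg.lstsq(sqrt_beta[:, None] * A, sqrt_beta * target, rcond=None)
    # Columns are ordered oldest sample first, so v holds w_{M-1}, ..., w_0
    return v[::-1]

# Illustrative check on synthetic, noise-free data
rng = np.random.default_rng(0)
w_true = np.array([0.5, -0.3, 0.2])
u = rng.standard_normal(50)
d = np.convolve(u, w_true)[:len(u)]  # desired response produced by w_true
w_hat = windowed_weighted_ls(u, d, n=49, N=20, M=3, lam=0.95)
```

On noise-free data the recovered `w_hat` should match `w_true` up to numerical precision; with noisy data, $\lambda$ and $N$ trade off noise averaging against how quickly the estimate follows changes in the signal.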

