# Statistical Linear Regression

**source**

Appendix `F` of `Kalman Filter from Ground Up`; author Alex Becker; https://www.kalmanfilter.net


**motivation**

Familiarity with the concept of statistical linear regression is necessary for a better understanding of the mathematics behind the *unscented* Kalman filter (`UKF`).

---

## Definition of statistical properties

A vector-valued nonlinear function $\mathbf{y} = \mathbf{f}(\mathbf{x})$ is evaluated for $K$ inputs $\mathbf{x}_k,\ 1 \le k \le K$. The corresponding outputs are denoted $\mathbf{y}_k,\ 1 \le k \le K$:

$$
\mathbf{y}_k = \mathbf{f}(\mathbf{x}_k)
$$


| equations | descriptions |
|-----------|--------------|
|  $\mathbf{\mu}_x = \frac{1}{K} \sum_{k=1}^K \mathbf{x}_k$  | mean of $\mathbf{x}_k$ |
|  $\mathbf{\mu}_y = \frac{1}{K} \sum_{k=1}^K \mathbf{y}_k$  | mean of $\mathbf{y}_k$ |
| $\mathbf{P}_{x,x} = \frac{1}{K} \sum_{k=1}^K \left( \mathbf{x}_k - \mathbf{\mu}_x \right) \cdot \left( \mathbf{x}_k - \mathbf{\mu}_x \right)^T $ | covariance of $\mathbf{x}_k$ |
| $\mathbf{P}_{y,y} = \frac{1}{K} \sum_{k=1}^K \left( \mathbf{y}_k - \mathbf{\mu}_y \right) \cdot \left( \mathbf{y}_k - \mathbf{\mu}_y \right)^T $ | covariance of $\mathbf{y}_k$ |
| $\mathbf{P}_{x,y} = \frac{1}{K} \sum_{k=1}^K \left( \mathbf{x}_k - \mathbf{\mu}_x \right) \cdot \left( \mathbf{y}_k - \mathbf{\mu}_y \right)^T $ | cross covariance of $\mathbf{x}_k$ and $\mathbf{y}_k$ |
| $\mathbf{P}_{y,x} = \frac{1}{K} \sum_{k=1}^K \left( \mathbf{y}_k - \mathbf{\mu}_y \right) \cdot \left( \mathbf{x}_k - \mathbf{\mu}_x \right)^T $ | cross covariance of $\mathbf{y}_k$ and $\mathbf{x}_k$ |

$$
\mathbf{P}_{x,y} = \mathbf{P}_{y,x}^T
$$
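The definitions in the table can be sketched in NumPy as follows. The function `f`, the sample count `K = 200` and the random inputs are illustrative assumptions and do not come from the source. Note the `1/K` normalisation, which corresponds to `np.cov(..., bias=True)` rather than NumPy's default `1/(K-1)`.

```python
import numpy as np

def f(x):
    # Hypothetical nonlinear function R^2 -> R^3, chosen only for illustration.
    return np.array([np.sin(x[0]), x[0] * x[1], x[1] ** 2])

rng = np.random.default_rng(0)
K = 200
X = rng.normal(size=(K, 2))          # row k holds x_k^T (m = 2)
Y = np.array([f(x) for x in X])      # row k holds y_k^T (n = 3)

mu_x = X.mean(axis=0)                # mean of the inputs
mu_y = Y.mean(axis=0)                # mean of the outputs
dX = X - mu_x                        # centred inputs
dY = Y - mu_y                        # centred outputs

P_xx = dX.T @ dX / K                 # m x m covariance of x
P_yy = dY.T @ dY / K                 # n x n covariance of y
P_xy = dX.T @ dY / K                 # m x n cross-covariance
P_yx = dY.T @ dX / K                 # n x m cross-covariance
```

Storing the samples as rows lets each sum of outer products be written as a single matrix product.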

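Two identities will be needed later. They follow by expanding the products and using $\sum_{k=1}^K \mathbf{x}_k = K \cdot \mathbf{\mu}_x$ and $\sum_{k=1}^K \mathbf{y}_k = K \cdot \mathbf{\mu}_y$: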

$$\begin{align}
K \cdot \mathbf{P}_{x,x} &= \sum_{k=1}^K \left( \mathbf{x}_k - \mathbf{\mu}_x \right) \cdot \left( \mathbf{x}_k - \mathbf{\mu}_x \right)^T \\
&= \sum_{k=1}^K  \mathbf{x}_k \cdot \mathbf{x}_k^T  - K \cdot \mathbf{\mu}_x \cdot \mathbf{\mu}_x^T
\end{align}
$$

$$\begin{align}
K \cdot \mathbf{P}_{y,x} &= \sum_{k=1}^K \left( \mathbf{y}_k - \mathbf{\mu}_y \right) \cdot \left( \mathbf{x}_k - \mathbf{\mu}_x \right)^T \\
&= \sum_{k=1}^K \mathbf{y}_k \cdot \mathbf{x}_k^T - \left(\sum_{k=1}^K \mathbf{y}_k \right) \cdot \mathbf{\mu}_x^T - \mathbf{\mu}_y \cdot \left(\sum_{k=1}^K \mathbf{x}_k^T \right) + K \cdot \mathbf{\mu}_y \cdot \mathbf{\mu}_x^T \\
&= \sum_{k=1}^K \mathbf{y}_k \cdot \mathbf{x}_k^T - K \cdot \mathbf{\mu}_y \cdot \mathbf{\mu}_x^T - \mathbf{\mu}_y \cdot \left(\sum_{k=1}^K \mathbf{x}_k^T \right) + K \cdot \mathbf{\mu}_y \cdot \mathbf{\mu}_x^T \\
&= \sum_{k=1}^K \mathbf{y}_k \cdot \mathbf{x}_k^T - K \cdot \mathbf{\mu}_y \cdot \mathbf{\mu}_x^T 
\end{align}
$$
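Both identities can be checked numerically. The following is a small sketch with randomly generated data; the dimensions and the `tanh` stand-in for $\mathbf{f}$ are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
K, m, n = 50, 3, 2
X = rng.normal(size=(K, m))          # rows are x_k^T
Y = np.tanh(X[:, :n])                # arbitrary stand-in for y_k = f(x_k)

mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
P_xx = (X - mu_x).T @ (X - mu_x) / K
P_yx = (Y - mu_y).T @ (X - mu_x) / K

# Right-hand sides: sum of outer products minus K times the outer product of the means.
rhs_xx = X.T @ X - K * np.outer(mu_x, mu_x)
rhs_yx = Y.T @ X - K * np.outer(mu_y, mu_x)

ok_xx = np.allclose(K * P_xx, rhs_xx)
ok_yx = np.allclose(K * P_yx, rhs_yx)
print(ok_xx, ok_yx)
```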

**task**

We want to approximate $\mathbf{y}$ by a linear function $\mathbf{M} \cdot \mathbf{x} + \mathbf{b}$. The items in this equation have these dimensions:

| items | dimensions |
|-------|------------|
| $\mathbf{M}$ | $n \times m$ matrix |
| $\mathbf{x}$ | $m \times 1$ vector |
| $\mathbf{b}$ | $n \times 1$ vector |
| $\mathbf{y}$ | $n \times 1$ vector |



For an input $\mathbf{x}_k$ the approximation error $\mathbf{e}_k$ is defined by:

$$
\mathbf{e}_k = \mathbf{y}_k  - \left(\mathbf{M} \cdot \mathbf{x}_k + \mathbf{b} \right)
$$

The squared error $E$ is defined as the sum of squared errors $\mathbf{e}_k^T \cdot \mathbf{e}_k$.

$$\begin{align}
E &= \sum_{k=1}^K \mathbf{e}_k^T \cdot \mathbf{e}_k \\
&= \sum_{k=1}^K \left(\mathbf{y}_k  - \left(\mathbf{M} \cdot \mathbf{x}_k + \mathbf{b} \right) \right)^T \cdot \left(\mathbf{y}_k  - \left(\mathbf{M} \cdot \mathbf{x}_k + \mathbf{b} \right) \right) \\
&= \sum_{k=1}^K \left(\mathbf{y}_k^T  - \mathbf{x}_k^T \cdot \mathbf{M}^T  - \mathbf{b}^T \right) \cdot \left(\mathbf{y}_k  - \mathbf{M} \cdot \mathbf{x}_k - \mathbf{b} \right) \\
&= \sum_{k=1}^K \left(\mathbf{y}_k^T \cdot \mathbf{y}_k  - \mathbf{y}_k^T \cdot \mathbf{M} \cdot \mathbf{x}_k - \mathbf{y}_k^T \cdot \mathbf{b} \right) \\
&+ \sum_{k=1}^K \left(- \mathbf{x}_k^T \cdot \mathbf{M}^T \cdot \mathbf{y}_k  + \mathbf{x}_k^T \cdot \mathbf{M}^T \cdot  \mathbf{M} \cdot \mathbf{x}_k + \mathbf{x}_k^T \cdot \mathbf{M}^T \cdot \mathbf{b} \right) \\
&+ \sum_{k=1}^K \left(- \mathbf{b}^T \cdot \mathbf{y}_k  + \mathbf{b}^T \cdot \mathbf{M} \cdot \mathbf{x}_k + \mathbf{b}^T \cdot \mathbf{b} \right)
\end{align}
$$

$$
E = \sum_{k=1}^K \left( \mathbf{y}_k^T \cdot \mathbf{y}_k  - 2 \cdot \mathbf{y}_k^T \cdot \mathbf{M} \cdot \mathbf{x}_k - 2 \cdot \mathbf{y}_k^T \cdot \mathbf{b} + \mathbf{x}_k^T \cdot \mathbf{M}^T \cdot  \mathbf{M} \cdot \mathbf{x}_k + 2 \cdot \left( \mathbf{M} \cdot \mathbf{x}_k \right)^T \cdot \mathbf{b} + \mathbf{b}^T \cdot \mathbf{b} \right)
$$
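To guard against sign errors in the expansion, the direct definition of $E$ and the expanded form can be compared numerically. This is only a sketch: the data, the trial matrix `M` and the trial vector `b` are random assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
K, m, n = 40, 3, 2
X = rng.normal(size=(K, m))          # rows are x_k^T
Y = rng.normal(size=(K, n))          # rows are y_k^T
M = rng.normal(size=(n, m))          # arbitrary trial matrix
b = rng.normal(size=n)               # arbitrary trial offset

# Direct definition: E = sum_k e_k^T e_k
E_direct = 0.0
for x_k, y_k in zip(X, Y):
    e_k = y_k - (M @ x_k + b)
    E_direct += e_k @ e_k

# Expanded form with the six collected terms
E_expanded = 0.0
for x_k, y_k in zip(X, Y):
    Mx = M @ x_k
    E_expanded += (y_k @ y_k - 2 * (y_k @ Mx) - 2 * (y_k @ b)
                   + Mx @ Mx + 2 * (Mx @ b) + b @ b)

# Vectorised equivalent of the direct definition
E_vectorised = np.sum((Y - (X @ M.T + b)) ** 2)
```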

Appropriately choosing matrix $\mathbf{M}$ and vector $\mathbf{b}$ will minimise the squared error $E$.

**differentiation with respect to elements of vector $\mathbf{b}$**

$$
\frac{\partial E}{\partial \mathbf{b}} = 2 \cdot \sum_{k=1}^K \left(- \mathbf{y}_k + \mathbf{M} \cdot \mathbf{x}_k  + \mathbf{b}  \right) = \mathbf{0}
$$

Dividing by $2K$ and using the definitions of $\mathbf{\mu}_x$ and $\mathbf{\mu}_y$ gives:

$$
\mathbf{\mu}_y = \mathbf{M} \cdot \mathbf{\mu}_x + \mathbf{b} 
$$

and solving for $\mathbf{b}$ :

$$
\mathbf{b} = \mathbf{\mu}_y - \mathbf{M} \cdot \mathbf{\mu}_x
$$
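The result for $\mathbf{b}$ holds for any fixed $\mathbf{M}$. As a minimal sketch with random data (all values are assumptions), the gradient with respect to $\mathbf{b}$ vanishes at $\mathbf{b} = \mathbf{\mu}_y - \mathbf{M} \cdot \mathbf{\mu}_x$, and shifting $\mathbf{b}$ away from it increases $E$:

```python
import numpy as np

rng = np.random.default_rng(3)
K, m, n = 60, 2, 2
X = rng.normal(size=(K, m))          # rows are x_k^T
Y = rng.normal(size=(K, n))          # rows are y_k^T
M = rng.normal(size=(n, m))          # any fixed matrix M

def squared_error(M, b):
    """E = sum_k ||y_k - (M x_k + b)||^2."""
    R = Y - (X @ M.T + b)
    return np.sum(R ** 2)

b_opt = Y.mean(axis=0) - M @ X.mean(axis=0)

# Gradient dE/db = 2 * sum_k (-y_k + M x_k + b), evaluated at b_opt
grad_b = 2 * np.sum(-Y + X @ M.T + b_opt, axis=0)

E_opt = squared_error(M, b_opt)
E_shifted = squared_error(M, b_opt + 0.1)
```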

**differentiation with respect to elements of matrix $\mathbf{M}$**


Only the terms of $E$ that depend on $\mathbf{M}$ contribute to the derivative:

$$
\frac{\partial E}{\partial \mathbf{M}} = \frac{\partial}{\partial \mathbf{M}} \sum_{k=1}^K \left( - 2 \cdot \mathbf{y}_k^T \cdot \mathbf{M} \cdot \mathbf{x}_k + \mathbf{x}_k^T \cdot \mathbf{M}^T \cdot  \mathbf{M} \cdot \mathbf{x}_k + 2 \cdot \mathbf{x}_k^T \cdot \mathbf{M}^T \cdot \mathbf{b}  \right)
$$

We need to calculate these derivatives:

$$\begin{align}
& \frac{\partial \ \mathbf{y}_k^T \cdot \mathbf{M} \cdot \mathbf{x}_k}{\partial \mathbf{M}} \\
& \frac{\partial \ \mathbf{x}_k^T \cdot \mathbf{M}^T \cdot \mathbf{b}} {\partial \mathbf{M}} \\
& \frac{\partial \ \mathbf{x}_k^T \cdot \mathbf{M}^T \cdot  \mathbf{M} \cdot \mathbf{x}_k}{\partial \mathbf{M}}  
\end{align}
$$


We will now derive the equations for these matrix derivatives:

---

**case#1**

Compute the matrix derivative 

$$
\frac{\partial \ \mathbf{y}_k^T \cdot \mathbf{M} \cdot \mathbf{x}_k}{\partial \mathbf{M}}
$$

The scalar function $\mathbf{y}_k^T \cdot \mathbf{M} \cdot \mathbf{x}_k$ is written element-wise, with $y_i$ and $x_j$ denoting the components of $\mathbf{y}_k$ and $\mathbf{x}_k$ (the sample index $k$ is dropped for readability):

$$
f_1(\mathbf{M}) = \mathbf{y}_k^T \cdot \mathbf{M} \cdot \mathbf{x}_k = \sum_{i=1}^n \sum_{j=1}^m y_i \cdot M_{i,\ j} \cdot x_j
$$

Taking the derivative with respect to the matrix element $M_{l,\ p}$ yields:

$$
\frac{\partial}{\partial M_{l,\ p}}f_1(\mathbf{M}) = y_l \cdot x_p
$$

Arranging these derivatives as an $n \times m$ matrix results in the outer product of vectors $\mathbf{y}_k$ and $\mathbf{x}_k$:

$$
\frac{\partial \ \mathbf{y}_k^T \cdot \mathbf{M} \cdot \mathbf{x}_k}{\partial \mathbf{M}} = \mathbf{y}_k \cdot \mathbf{x}_k^T
$$
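The outer-product result can be verified with central finite differences. The helper `numerical_gradient` below is a hypothetical utility written for this check, and the vectors and matrix are random assumptions.

```python
import numpy as np

def numerical_gradient(func, M, h=1e-6):
    """Central-difference approximation of d func / d M, element by element."""
    G = np.zeros_like(M)
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            D = np.zeros_like(M)
            D[i, j] = h              # perturb only element (i, j)
            G[i, j] = (func(M + D) - func(M - D)) / (2 * h)
    return G

rng = np.random.default_rng(4)
n, m = 3, 2
y = rng.normal(size=n)               # plays the role of y_k
x = rng.normal(size=m)               # plays the role of x_k
M = rng.normal(size=(n, m))

def f1(M):
    return y @ M @ x                 # scalar y^T M x

G_num = numerical_gradient(f1, M)
G_ana = np.outer(y, x)               # analytic result y x^T
```

Because $f_1$ is linear in $\mathbf{M}$, the central difference agrees with the analytic gradient up to rounding error.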

**case#2**

Compute the matrix derivative 

$$
\frac{\partial \ \mathbf{x}_k^T \cdot \mathbf{M}^T \cdot \mathbf{b}} {\partial \mathbf{M}} 
$$

The scalar function $\mathbf{x}_k^T \cdot \mathbf{M}^T \cdot \mathbf{b}$ is written element-wise, with $x_i$ denoting the components of $\mathbf{x}_k$:

$$
f_2(\mathbf{M}) = \mathbf{x}_k^T \cdot \mathbf{M}^T \cdot \mathbf{b} = \sum_{i=1}^m \sum_{j=1}^n x_i \cdot M_{j,\ i} \cdot b_j
$$

Taking the derivative with respect to the matrix element $M_{l,\ p}$ yields:

$$
\frac{\partial}{\partial M_{l,\ p}}f_2(\mathbf{M}) =  b_l \cdot x_p 
$$

Arranging these derivatives as an $n \times m$ matrix results in the outer product of vectors $\mathbf{b}$ and $\mathbf{x}_k$:

$$
\frac{\partial \ \mathbf{x}_k^T \cdot \mathbf{M}^T \cdot \mathbf{b}} {\partial \mathbf{M}} = \mathbf{b} \cdot \mathbf{x}_k^T
$$
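As a cross-check, case#2 can also be reduced to case#1. A scalar equals its own transpose, so:

$$
\mathbf{x}_k^T \cdot \mathbf{M}^T \cdot \mathbf{b} = \left( \mathbf{x}_k^T \cdot \mathbf{M}^T \cdot \mathbf{b} \right)^T = \mathbf{b}^T \cdot \mathbf{M} \cdot \mathbf{x}_k
$$

This has the form treated in case#1 with $\mathbf{y}_k$ replaced by $\mathbf{b}$, which leads to the same outer product of $\mathbf{b}$ and $\mathbf{x}_k$.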


**case#3**

Compute the matrix derivative 

$$
\frac{\partial \ \mathbf{x}_k^T \cdot \mathbf{M}^T \cdot  \mathbf{M} \cdot \mathbf{x}_k}{\partial \mathbf{M}}  
$$

The scalar function $\mathbf{x}_k^T \cdot \mathbf{M}^T \cdot  \mathbf{M} \cdot \mathbf{x}_k$ is defined by:

$$
f_3(\mathbf{M}) = \mathbf{x}_k^T \cdot \mathbf{M}^T \cdot  \mathbf{M} \cdot \mathbf{x}_k = \left(\mathbf{M} \cdot \mathbf{x}_k\right)^T \cdot \left(\mathbf{M} \cdot \mathbf{x}_k\right)
$$

With vector $\mathbf{c}$ defined by:

$$
\mathbf{c} = \mathbf{M} \cdot \mathbf{x}_k
$$

the scalar function is re-expressed as:

$$
f_3(\mathbf{M}) = \mathbf{c}^T \cdot \mathbf{c}
$$

To avoid a clash with the sample index $k$, the rows are indexed by $r$. The $r$-th element $c_r$ of vector $\mathbf{c}$ is (with $x_j$ denoting the $j$-th component of $\mathbf{x}_k$):

$$
c_r = \sum_{j=1}^m M_{r,\ j} \cdot x_j
$$

Putting these equations into the expression of the scalar function yields:

$$
f_3(\mathbf{M}) = \sum_{r=1}^n c_r^2 = \sum_{r=1}^n \left(\sum_{j=1}^m M_{r,\ j} \cdot x_j \right) \cdot  \left(\sum_{i=1}^m M_{r,\ i} \cdot x_i \right)
$$

We take the partial derivative of $f_3(\mathbf{M})$ with respect to matrix element $M_{l,\ p}$. Only the summand belonging to row $l$ depends on $M_{l,\ p}$; the product rule then gives:




$$\begin{align}
 \frac{\partial f_3(\mathbf{M})}{\partial \ M_{l,\ p}} &= \sum_{j=1}^m M_{l,\ j} \cdot x_j \cdot x_p +  \sum_{i=1}^m M_{l,\ i} \cdot x_i \cdot x_p \\
&= 2 \cdot \sum_{j=1}^m M_{l,\ j} \cdot x_j \cdot x_p \\
&= 2 \cdot x_p \sum_{j=1}^m M_{l,\ j} \cdot x_j
\end{align}
$$ 



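The sum $\sum_{j=1}^m M_{l,\ j} \cdot x_j$ is the $l$-th element of $\mathbf{M} \cdot \mathbf{x}_k$, so collecting all derivatives in an $n \times m$ matrix gives an expression for the matrix derivative: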

$$
\frac{\partial \ \mathbf{x}_k^T \cdot \mathbf{M}^T \cdot  \mathbf{M} \cdot \mathbf{x}_k}{\partial \mathbf{M}} = 2 \cdot \mathbf{M} \cdot \mathbf{x}_k \cdot  \mathbf{x}_k^T
$$
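A directional-derivative test offers a compact check of this result: for a perturbation direction $\mathbf{D}$, the slope of $f_3$ along $\mathbf{D}$ should equal the Frobenius inner product of the gradient with $\mathbf{D}$. The sketch below uses random values; the names `D`, `t` and `f3` are chosen here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 3, 2
x = rng.normal(size=m)               # plays the role of x_k
M = rng.normal(size=(n, m))
D = rng.normal(size=(n, m))          # random perturbation direction

def f3(M):
    c = M @ x
    return c @ c                     # x^T M^T M x

G = 2 * np.outer(M @ x, x)           # analytic gradient 2 M x x^T

t = 1e-6
slope_num = (f3(M + t * D) - f3(M - t * D)) / (2 * t)   # central difference along D
slope_ana = np.sum(G * D)                               # Frobenius inner product <G, D>
```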

Having found the expressions of the matrix derivatives we can now write the equation for $\frac{\partial E}{\partial \mathbf{M}}$.


---


$$\begin{align}
\frac{\partial E}{\partial \mathbf{M}} &= \sum_{k=1}^K \left( - 2 \cdot \frac{\partial \ \mathbf{y}_k^T \cdot \mathbf{M} \cdot \mathbf{x}_k}{\partial \mathbf{M}} + \frac{\partial \ \mathbf{x}_k^T \cdot \mathbf{M}^T \cdot  \mathbf{M} \cdot \mathbf{x}_k}{\partial \mathbf{M}} + 2 \cdot \frac{\partial \ \mathbf{x}_k^T \cdot \mathbf{M}^T \cdot \mathbf{b}} {\partial \mathbf{M}}  \right) \\
&= \sum_{k=1}^K \left( - 2 \cdot \mathbf{y}_k \cdot \mathbf{x}_k^T + 2 \cdot \mathbf{M} \cdot \mathbf{x}_k \cdot  \mathbf{x}_k^T + 2 \cdot \mathbf{b} \cdot \mathbf{x}_k^T  \right)
\end{align}
$$

Now we insert the equation for $\mathbf{b}$:

$$\begin{align}
\frac{\partial E}{\partial \mathbf{M}} &= 2 \cdot \sum_{k=1}^K \left( - \mathbf{y}_k \cdot \mathbf{x}_k^T + \mathbf{M} \cdot \mathbf{x}_k \cdot  \mathbf{x}_k^T + \left(\mathbf{\mu}_y - \mathbf{M} \cdot \mathbf{\mu}_x \right) \cdot \mathbf{x}_k^T  \right) \\
&= 2 \cdot \sum_{k=1}^K \left( - \mathbf{y}_k \cdot \mathbf{x}_k^T  + \mathbf{\mu}_y \cdot \mathbf{x}_k^T + \mathbf{M} \cdot \mathbf{x}_k \cdot  \mathbf{x}_k^T - \mathbf{M} \cdot \mathbf{\mu}_x  \cdot \mathbf{x}_k^T  \right) \\
&= -2 \cdot \sum_{k=1}^K \left( \mathbf{y}_k   - \mathbf{\mu}_y  \right) \cdot \mathbf{x}_k^T + 2 \cdot \mathbf{M} \cdot \sum_{k=1}^K \left( \mathbf{x}_k  - \mathbf{\mu}_x  \right) \cdot \mathbf{x}_k^T
\end{align}
$$

We require $\frac{\partial E}{\partial \mathbf{M}}= \mathbf{0}$ to minimise the squared error $E$:

$$
\sum_{k=1}^K \left( \mathbf{y}_k   - \mathbf{\mu}_y  \right) \cdot \mathbf{x}_k^T = \mathbf{M} \cdot \sum_{k=1}^K \left( \mathbf{x}_k  - \mathbf{\mu}_x  \right) \cdot \mathbf{x}_k^T 
$$

Both sides of this equation can be expressed through the covariance matrices defined at the beginning:

**right hand side**

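Using $\sum_{k=1}^K \mathbf{x}_k^T = K \cdot \mathbf{\mu}_x^T$ and the identity for $K \cdot \mathbf{P}_{x,x}$, the right-hand side is rewritten as: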


$$\begin{align}
\mathbf{M} \cdot \sum_{k=1}^K \left( \mathbf{x}_k  - \mathbf{\mu}_x  \right) \cdot \mathbf{x}_k^T &= \mathbf{M} \cdot \sum_{k=1}^K \left( \mathbf{x}_k \cdot \mathbf{x}_k^T  - \mathbf{\mu}_x \cdot \mathbf{x}_k^T  \right)  \\
&= \mathbf{M} \cdot \left( \sum_{k=1}^K  \mathbf{x}_k \cdot \mathbf{x}_k^T  - \sum_{k=1}^K \mathbf{\mu}_x \cdot \mathbf{x}_k^T  \right) \\
&= \mathbf{M} \cdot \left( \sum_{k=1}^K  \mathbf{x}_k \cdot \mathbf{x}_k^T  - K \cdot \mathbf{\mu}_x \cdot \mathbf{\mu}_x^T  \right) \\
&= K \cdot \mathbf{M} \cdot \mathbf{P}_{x,x}
\end{align}
$$

**left hand side**

In the same way, the identity for $K \cdot \mathbf{P}_{y,x}$ gives for the left-hand side:

$$\begin{align}
\sum_{k=1}^K \left( \mathbf{y}_k   - \mathbf{\mu}_y  \right) \cdot \mathbf{x}_k^T &= \sum_{k=1}^K \mathbf{y}_k \cdot \mathbf{x}_k^T   - \sum_{k=1}^K  \mathbf{\mu}_y  \cdot \mathbf{x}_k^T \\
&= \sum_{k=1}^K \mathbf{y}_k \cdot \mathbf{x}_k^T   - K \cdot \mathbf{\mu}_y  \cdot \mathbf{\mu}_x^T \\
&= K \cdot \mathbf{P}_{y,x}
\end{align}
$$

Finally, equating both sides and assuming that $\mathbf{P}_{x,x}$ is invertible, we find an expression for matrix $\mathbf{M}$:

$$\begin{align}
K \cdot \mathbf{P}_{y,x} &= K \cdot \mathbf{M} \cdot \mathbf{P}_{x,x} \\
\mathbf{P}_{y,x} &= \mathbf{M} \cdot \mathbf{P}_{x,x}
\end{align}
$$

$$
\mathbf{M} = \mathbf{P}_{y,x}  \cdot \mathbf{P}_{x,x}^{-1}
$$
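Putting the two results together, a minimal NumPy sketch of the fit could look as follows. The function name `statistical_linear_regression` and the test function producing `Y` are assumptions made for illustration. The inverse of $\mathbf{P}_{x,x}$ is not formed explicitly; a linear system is solved instead. The cross-check uses ordinary least squares on inputs augmented with a constant 1, which minimises the same squared error $E$.

```python
import numpy as np

def statistical_linear_regression(X, Y):
    """Fit y ~ M x + b from samples; rows of X are x_k^T, rows of Y are y_k^T."""
    K = X.shape[0]
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    dX, dY = X - mu_x, Y - mu_y
    P_xx = dX.T @ dX / K
    P_yx = dY.T @ dX / K
    # M = P_yx P_xx^{-1}: solve P_xx^T M^T = P_yx^T instead of inverting P_xx.
    M = np.linalg.solve(P_xx.T, P_yx.T).T
    b = mu_y - M @ mu_x
    return M, b

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 2))
# Hypothetical nonlinear function R^2 -> R^3, evaluated row by row.
Y = np.column_stack([np.sin(X[:, 0]) + X[:, 1],
                     X[:, 0] * X[:, 1],
                     np.exp(0.3 * X[:, 1])])

M, b = statistical_linear_regression(X, Y)

# Cross-check: least squares on the augmented inputs [x_k^T, 1].
A = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
M_ls, b_ls = coef[:-1].T, coef[-1]
```

If $\mathbf{P}_{x,x}$ is singular or nearly singular, for example when there are fewer distinct samples than input dimensions, `np.linalg.solve` fails or becomes inaccurate, which mirrors the invertibility requirement in the formula.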




---
