# Learning about Kalman filter / Covariance

**Resources**

`Kalman Filter from Ground Up`; author Alex Becker; https://www.kalmanfilter.net

**Overview**

The properties of the covariance matrix.

---

# Covariance / 2D case

We have two random variables $X$ and $Y$ each having $N$ elements/realisations $X : \{x_1, x_2,\ldots, x_i, \ldots, x_N\} $ and $Y : \{y_1, y_2,\ldots, y_i, \ldots, y_N\}$ .

The covariance is defined as the expected value of the product of the deviations from the means:

$$\begin{align}
Cov(X,Y) &= E\left( \left(X-E(X)\right) \cdot \left(Y - E(Y)  \right) \right) \\
&= E(X \cdot Y) - 2 \cdot E(X) \cdot E(Y) + E(X) \cdot E(Y) \\
&= E(X \cdot Y)  - E(X) \cdot E(Y) 
\end{align}
$$

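For the discrete case with $N$ equally likely realisations and the means $\mu_x = \frac{1}{N} \sum_{i=1}^{N} x_i$ and $\mu_y = \frac{1}{N} \sum_{i=1}^{N} y_i$ we get: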

$$\begin{align}
Cov(X,Y) &= E(X \cdot Y)  - E(X) \cdot E(Y) \\
&= \frac{1}{N} \sum_{i=1}^{N} x_i \cdot y_i - \frac{1}{N^2} \sum_{i=1}^{N} x_i \cdot \sum_{i=1}^{N} y_i \\
&= \frac{1}{N} \sum_{i=1}^{N} x_i \cdot y_i - \mu_x \cdot \mu_y
\end{align}
$$
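As a quick numerical check, here is a minimal NumPy sketch (the sample values are made up for illustration) that evaluates both the definition $E\left(\left(X-E(X)\right) \cdot \left(Y-E(Y)\right)\right)$ and the shortcut $E(X \cdot Y) - \mu_x \cdot \mu_y$ with the $\frac{1}{N}$ scaling factor:

```python
import numpy as np

def cov_definition(x, y):
    # E((X - E(X)) * (Y - E(Y))) with the 1/N scaling factor
    return np.mean((x - np.mean(x)) * (y - np.mean(y)))

def cov_shortcut(x, y):
    # E(X * Y) - mu_x * mu_y with the 1/N scaling factor
    return np.mean(x * y) - np.mean(x) * np.mean(y)

# made-up realisations of X and Y
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])
print(cov_definition(x, y), cov_shortcut(x, y))
```

Both functions return 0.75 for this data. The shortcut form is convenient on paper, but in floating-point arithmetic it can lose precision when the means are large compared with the spread of the data, so the first form is usually preferred in code.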

If the random variables $X$ and $Y$ have zero mean, the covariance simplifies to:

$$\begin{align}
Cov(X,Y) &= E(X \cdot Y) \\
&= \frac{1}{N} \sum_{i=1}^{N} x_i \cdot y_i 
\end{align}
$$

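In the literature the scaling factor $\frac{1}{N}$ is often replaced by $\frac{1}{N-1}$ (Bessel's correction). The modified factor makes the estimate unbiased when the means $\mu_x$ and $\mu_y$ are themselves estimated from the same $N$ samples; for large $N$ the difference is negligible. In this notebook I will stick to the scaling factor $\frac{1}{N}$.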
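NumPy's `np.cov` uses the $\frac{1}{N-1}$ factor by default; passing `bias=True` switches to the $\frac{1}{N}$ factor used here. A small sketch with made-up data:

```python
import numpy as np

# made-up realisations of X and Y
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])
n = len(x)

# scaling 1/N (convention of this notebook): bias=True
c_n = np.cov(x, y, bias=True)[0, 1]
# scaling 1/(N-1) (NumPy's default, Bessel's correction)
c_n1 = np.cov(x, y)[0, 1]

# the two estimates differ only by the factor N/(N-1)
print(c_n, c_n1, c_n * n / (n - 1))
```

For this data `c_n` is 0.75 and `c_n1` is 1.0, i.e. the two estimates differ exactly by the factor $\frac{N}{N-1} = \frac{4}{3}$.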



## Covariance Matrix / 2D case

Define a random vector $\mathbf{x}$ with two random elements denoted $x_1$ and $x_2$.

$$
\mathbf{x} = \left[\begin{array}{c}
x_1 \\
x_2
\end{array}\right]
$$

The expectation of $\mathbf{x}$ is again a vector (here $x_1[i]$ denotes the $i$-th of $N$ realisations of $x_1$):

$$
E(\mathbf{x}) = \left[\begin{array}{c}
E(x_1) \\
E(x_2)
\end{array}\right] = \left[\begin{array}{c}
\mu_{x_1} \\
\mu_{x_2}
\end{array}\right]  = \frac{1}{N} \left[\begin{array}{c}
\sum_{i=1}^N x_1[i] \\
\sum_{i=1}^N x_2[i] 
\end{array}\right]
$$

Collecting the variances of the individual components of $\mathbf{x}$ in a vector gives:

$$
Var(\mathbf{x}) = \left[\begin{array}{c}
E\left((x_1 - \mu_{x_1})^2 \right)\\
E\left((x_2 - \mu_{x_2})^2 \right)
\end{array}\right] = \left[\begin{array}{c}
Var\left(x_1\right)\\
Var\left(x_2\right)
\end{array}\right] = \left[\begin{array}{c}
\sigma_{x_1}^2\\
\sigma_{x_2}^2
\end{array}\right]
$$

The covariance matrix for the multidimensional case is defined by this equation:

$$\begin{align}
Cov(\mathbf{x}) &= E\left(  \left(\mathbf{x} - \mathbf{\mu_x}\right) \cdot \left(\mathbf{x} - \mathbf{\mu_x}\right)^T  \right) \\
&= E\left(\mathbf{x} \cdot \mathbf{x}^T \right) - E\left(\mathbf{x} \cdot \mathbf{\mu_x}^T \right) - E\left(\mathbf{\mu_x} \cdot \mathbf{x}^T \right) + E\left(\mathbf{\mu_x} \cdot \mathbf{\mu_x}^T  \right) \\
&= E\left(\mathbf{x} \cdot \mathbf{x}^T \right) - E\left(\mathbf{x} \right) \cdot \mathbf{\mu_x}^T - \mathbf{\mu_x} \cdot E\left(\mathbf{x} \right)^T + \mathbf{\mu_x} \cdot \mathbf{\mu_x}^T \\
&= E\left(\mathbf{x} \cdot \mathbf{x}^T \right) - 2 \cdot \mathbf{\mu_x} \cdot \mathbf{\mu_x}^T + \mathbf{\mu_x} \cdot \mathbf{\mu_x}^T \\
&= E\left(\mathbf{x} \cdot \mathbf{x}^T \right) - \mathbf{\mu_x} \cdot \mathbf{\mu_x}^T 
\end{align}
$$

And now explicitly for the 2D example:

$$\begin{align}
\mathbf{x} \cdot \mathbf{x}^T &= \left[\begin{array}{c}
x_1 \\
x_2
\end{array}\right] \cdot \left[\begin{array}{cc}
x_1 & x_2
\end{array}\right] = \left[\begin{array}{cc}
x_1 \cdot x_1 & x_1 \cdot x_2 \\
x_2 \cdot x_1 & x_2 \cdot x_2
\end{array}\right] \\
\mathbf{\mu_x} \cdot \mathbf{\mu_x}^T &= \left[\begin{array}{c}
\mu_{x_1} \\
\mu_{x_2}
\end{array}\right] \cdot \left[\begin{array}{cc}
\mu_{x_1} & \mu_{x_2}
\end{array}\right] = \left[\begin{array}{cc}
\mu_{x_1}^2 & \mu_{x_1} \cdot \mu_{x_2} \\
\mu_{x_1} \cdot \mu_{x_2} & \mu_{x_2}^2
\end{array}\right] 
\end{align}
$$

$$\begin{align}
Cov(\mathbf{x}) &= E\left(\mathbf{x} \cdot \mathbf{x}^T \right) - \mathbf{\mu_x} \cdot \mathbf{\mu_x}^T \\
\ \\
&= \left[\begin{array}{cc}
E\left(x_1^2\right) - \mu_{x_1}^2 & E\left(x_1 \cdot x_2\right) - \mu_{x_1} \cdot \mu_{x_2}\\
\ \\
E\left(x_2 \cdot x_1\right) - \mu_{x_1} \cdot \mu_{x_2} & E\left(x_2^2\right) - \mu_{x_2}^2
\end{array}\right] \\
\ \\
&= \left[\begin{array}{cc}
Var\left(x_1\right) & Cov\left(x_1, x_2\right) \\
Cov\left(x_1, x_2\right) & Var\left(x_2\right)
\end{array}\right]
\end{align}
$$
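The identity $Cov(\mathbf{x}) = E\left(\mathbf{x} \cdot \mathbf{x}^T \right) - \mathbf{\mu_x} \cdot \mathbf{\mu_x}^T$ can be checked on sampled data. A minimal NumPy sketch, assuming made-up Gaussian samples stored column-wise (row 0 holds the realisations of $x_1$, row 1 those of $x_2$) and the $\frac{1}{N}$ factor:

```python
import numpy as np

rng = np.random.default_rng(0)
# made-up data: N realisations of a 2-element random vector, one column per realisation
N = 1000
X = rng.normal(size=(2, N))
X[1] += 0.5 * X[0]                  # introduce some correlation between x_1 and x_2

mu = X.mean(axis=1, keepdims=True)  # mean vector, shape (2, 1)
E_xxT = (X @ X.T) / N               # E(x x^T) with 1/N scaling
Sigma = E_xxT - mu @ mu.T           # E(x x^T) - mu mu^T

# definition E((x - mu)(x - mu)^T), and NumPy's estimate with the 1/N factor
Sigma_def = ((X - mu) @ (X - mu).T) / N
print(np.allclose(Sigma, Sigma_def), np.allclose(Sigma, np.cov(X, bias=True)))
```

Both comparisons print `True`; the off-diagonal entries of `Sigma` are the two equal values $Cov\left(x_1, x_2\right)$.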

A commonly used notation for the covariance matrix $Cov(\mathbf{x})$ is:

$$
Cov(\mathbf{x}) = \mathbf{\Sigma}
$$

**properties of $\mathbf{\Sigma}$**

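1) $\mathbf{\Sigma}$ is symmetric: $\mathbf{\Sigma}^T = \mathbf{\Sigma}$

2) $trace\left(\mathbf{\Sigma}\right) \ge 0$, since the trace is the sum of the variances on the diagonal

3) $\mathbf{\Sigma}$ is positive semidefinite: $\mathbf{v}^T \cdot \mathbf{\Sigma} \cdot \mathbf{v} \ge 0$ for every vector $\mathbf{v}$, because $\mathbf{v}^T \cdot \mathbf{\Sigma} \cdot \mathbf{v} = E\left(\left(\mathbf{v}^T \cdot \left(\mathbf{x} - \mathbf{\mu_x}\right)\right)^2\right)$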
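The three properties can be checked numerically. A minimal sketch, assuming NumPy and made-up Gaussian samples; positive semidefiniteness is tested via the eigenvalues, which for a symmetric matrix is equivalent to $\mathbf{v}^T \cdot \mathbf{\Sigma} \cdot \mathbf{v} \ge 0$ for all $\mathbf{v}$. The function name `covariance_properties` and the tolerance `tol` (to absorb rounding errors) are arbitrary choices:

```python
import numpy as np

def covariance_properties(Sigma, tol=1e-12):
    """Return (symmetric, trace >= 0, positive semidefinite) for a square matrix Sigma."""
    symmetric = np.allclose(Sigma, Sigma.T)
    trace_nonneg = np.trace(Sigma) >= 0
    # eigvalsh assumes a symmetric matrix and returns its real eigenvalues
    psd = np.all(np.linalg.eigvalsh(Sigma) >= -tol)
    return bool(symmetric), bool(trace_nonneg), bool(psd)

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 500))   # made-up samples: 3 variables, 500 realisations
Sigma = np.cov(X, bias=True)    # 1/N scaling

print(covariance_properties(Sigma))
# the trace equals the sum of the variances (np.var also uses 1/N by default)
print(np.isclose(np.trace(Sigma), X.var(axis=1).sum()))
```

A symmetric matrix such as $\left[\begin{array}{cc} 1 & 2 \\ 2 & 1 \end{array}\right]$ (eigenvalues $3$ and $-1$) fails the third check, so it cannot be a covariance matrix.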

---

## Back to the multivariate normal distribution

$$
p(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n |\mathbf{\Sigma}|}} \cdot \exp\left[-\frac{1}{2} \cdot \left(\mathbf{x} - \mathbf{\mu} \right)^T \cdot \mathbf{\Sigma^{-1}} \cdot \left(\mathbf{x} - \mathbf{\mu} \right)  \right]
$$

| symbol  | description |
|-----------|-------------|
| $\mathbf{\Sigma}$ | covariance matrix $\ \in \mathbb{R}^{n \times n}$ |
| $\lvert\mathbf{\Sigma}\rvert$ | determinant of the covariance matrix |
| $\mathbf{\Sigma^{-1}}$ | inverse of the covariance matrix |
| $\mathbf{x}$ | vector $\ \in \mathbb{R}^{n \times 1}$ |
| $\mathbf{\mu}$ | vector $\ \in \mathbb{R}^{n \times 1}$ of the mean values of the components of $\mathbf{x}$ |
| $n$ | dimension of $\mathbf{x}$ |
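The density can be transcribed almost literally. A minimal NumPy sketch (the function name `mvn_pdf` and the test values are hypothetical); it solves $\mathbf{\Sigma} \cdot \mathbf{z} = \mathbf{x} - \mathbf{\mu}$ instead of forming $\mathbf{\Sigma^{-1}}$ explicitly, which is the numerically preferred way to evaluate the quadratic form:

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density p(x) for x, mu of shape (n,) and Sigma of shape (n, n)."""
    n = mu.shape[0]
    d = x - mu
    # z = Sigma^{-1} d, computed by solving a linear system
    z = np.linalg.solve(Sigma, d)
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(Sigma))
    return np.exp(-0.5 * d @ z) / norm

mu = np.array([1.0, -2.0])
Sigma = np.diag([4.0, 0.25])
x = np.array([2.0, -1.5])
p = mvn_pdf(x, mu, Sigma)

# for a diagonal Sigma the density factorises into two 1D normal densities
p1 = np.exp(-0.5 * (2.0 - 1.0) ** 2 / 4.0) / np.sqrt(2 * np.pi * 4.0)
p2 = np.exp(-0.5 * (-1.5 + 2.0) ** 2 / 0.25) / np.sqrt(2 * np.pi * 0.25)
print(np.isclose(p, p1 * p2))
```

The comparison prints `True`: with uncorrelated components the exponent splits into a sum and $\lvert\mathbf{\Sigma}\rvert$ into a product of the variances.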

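What do we know about the inverse of a symmetric positive definite matrix? (For $\mathbf{\Sigma^{-1}}$ to exist, $\mathbf{\Sigma}$ must be invertible, i.e. positive definite rather than only positive semidefinite.)

Let $\mathbf{A}$ be an invertible symmetric (square) matrix. We will show that the inverse matrix $\mathbf{A}^{-1}$ is symmetric as well.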



**proof: the inverse matrix of a symmetric square matrix is symmetric**

We use the property that left- or right-hand multiplication by the inverse matrix yields the identity matrix (this holds only for invertible square matrices):

$$\begin{align}
\mathbf{A} \cdot \mathbf{A}^{-1} &= \mathbf{I} \\
\mathbf{A}^{-1} \cdot \mathbf{A} &= \mathbf{I}
\end{align}
$$

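Transposing the first equation and using $\left(\mathbf{A} \cdot \mathbf{B}\right)^T = \mathbf{B}^T \cdot \mathbf{A}^T$, $\mathbf{I}^T = \mathbf{I}$ and $\mathbf{A}^T = \mathbf{A}$ gives: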

$$\begin{align}
\left(\mathbf{A} \cdot \mathbf{A}^{-1}\right)^T &= \mathbf{I} \\
\left(\mathbf{A}^{-1}\right)^T \cdot \mathbf{A}^T &= \left(\mathbf{A}^{-1}\right)^T \cdot \mathbf{A} = \mathbf{I} 
\end{align}
$$

Thus the matrix $\left(\mathbf{A}^{-1}\right)^T$ is also an inverse of $\mathbf{A}$. Since the inverse of a square matrix is **unique**, we conclude:

$$
\left(\mathbf{A}^{-1}\right)^T = \mathbf{A}^{-1}
$$
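A quick numerical illustration with NumPy; the matrix is an arbitrary symmetric example with determinant 12 (it is not positive definite, which the proof does not need):

```python
import numpy as np

# arbitrary symmetric, invertible matrix (det = 12)
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 0.0, 4.0],
              [3.0, 4.0, 5.0]])
A_inv = np.linalg.inv(A)

print(np.allclose(A_inv, A_inv.T))        # inverse is symmetric (up to rounding)
print(np.allclose(A @ A_inv, np.eye(3)))  # and it really is the inverse
```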

---

**proof: the transpose of a positive definite matrix is positive definite too**

**Definition / positive definite**

$$
\mathbf{x}^T \cdot \mathbf{A} \cdot \mathbf{x} \gt 0 \quad \text{for all} \ \mathbf{x} \neq \mathbf{0}
$$

Since a scalar equals its own transpose, transposing yields:

$$
\mathbf{x}^T \cdot \mathbf{A} \cdot \mathbf{x} = \left(\mathbf{x}^T \cdot \mathbf{A} \cdot \mathbf{x}\right)^T = \mathbf{x}^T \cdot \mathbf{A}^T \cdot \mathbf{x} \gt 0 \quad \text{for all} \ \mathbf{x} \neq \mathbf{0}
$$

So the transpose $\mathbf{A}^T$ is positive definite too. Note that this argument does not require $\mathbf{A}$ to be symmetric (although it will be in many cases, e.g. for covariance matrices).
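The argument can be illustrated numerically with NumPy. The matrix below is a hypothetical example that is positive definite but not symmetric: $\mathbf{x}^T \cdot \mathbf{A} \cdot \mathbf{x} = 2 x_1^2 + x_1 x_2 - x_2 x_1 + 2 x_2^2 = 2 \cdot (x_1^2 + x_2^2)$. The quadratic forms of $\mathbf{A}$ and $\mathbf{A}^T$ agree for every test vector:

```python
import numpy as np

# hypothetical non-symmetric matrix with x^T A x = 2 * (x1^2 + x2^2)
A = np.array([[2.0, 1.0],
              [-1.0, 2.0]])

rng = np.random.default_rng(3)
xs = rng.normal(size=(100, 2))                # random test vectors, one per row
q_A = np.einsum("ki,ij,kj->k", xs, A, xs)     # x^T A x for each row x
q_AT = np.einsum("ki,ij,kj->k", xs, A.T, xs)  # x^T A^T x for each row x
print(np.all(q_A > 0), np.allclose(q_A, q_AT))
```

In general only the symmetric part $\frac{1}{2}\left(\mathbf{A} + \mathbf{A}^T\right)$ enters the quadratic form, which is why the antisymmetric off-diagonal entries of this example cancel.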

---

**proof: the inverse matrix of a positive definite matrix is positive definite**

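A positive definite matrix $\mathbf{A}$ is invertible: if $\mathbf{A} \cdot \mathbf{x} = \mathbf{0}$ for some $\mathbf{x} \neq \mathbf{0}$, then $\mathbf{x}^T \cdot \mathbf{A} \cdot \mathbf{x} = 0$, which contradicts positive definiteness. Hence every vector $\mathbf{y} \neq \mathbf{0}$ can be written as

$$
\mathbf{y} = \mathbf{A} \cdot \mathbf{x}
$$

with $\mathbf{x} = \mathbf{A}^{-1} \cdot \mathbf{y} \neq \mathbf{0}$. Then: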

$$\begin{align}
\mathbf{y}^T \cdot \mathbf{A}^{-1} \cdot \mathbf{y} &= \mathbf{y}^T \cdot \mathbf{A}^{-1} \cdot \mathbf{A} \cdot \mathbf{x} \\
&= \mathbf{y}^T \cdot \mathbf{x} \\
&= \left(\mathbf{A} \cdot \mathbf{x}\right)^T \cdot \mathbf{x} \\
&= \mathbf{x}^T \cdot \mathbf{A}^T \cdot \mathbf{x} \gt 0 ; \ if \ \mathbf{x} \neq \mathbf{0}
\end{align}
$$

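In the last step we used that the transpose of a positive definite matrix is positive definite too. Hence $\mathbf{y}^T \cdot \mathbf{A}^{-1} \cdot \mathbf{y} \gt 0$ for all $\mathbf{y} \neq \mathbf{0}$, i.e. $\mathbf{A}^{-1}$ is positive definite.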
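A small NumPy check with the hypothetical non-symmetric positive definite matrix $\mathbf{A} = \left[\begin{array}{cc} 2 & 1 \\ -1 & 2 \end{array}\right]$, for which $\mathbf{x}^T \cdot \mathbf{A} \cdot \mathbf{x} = 2 \cdot (x_1^2 + x_2^2)$. Its inverse is $\frac{1}{5} \left[\begin{array}{cc} 2 & -1 \\ 1 & 2 \end{array}\right]$, so $\mathbf{y}^T \cdot \mathbf{A}^{-1} \cdot \mathbf{y} = 0.4 \cdot (y_1^2 + y_2^2) \gt 0$:

```python
import numpy as np

# hypothetical positive definite, non-symmetric matrix
A = np.array([[2.0, 1.0],
              [-1.0, 2.0]])
A_inv = np.linalg.inv(A)                     # [[0.4, -0.2], [0.2, 0.4]]

rng = np.random.default_rng(4)
ys = rng.normal(size=(100, 2))               # random test vectors, one per row
q = np.einsum("ki,ij,kj->k", ys, A_inv, ys)  # y^T A^{-1} y for each row y
print(np.all(q > 0))
```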


---

The exponent of the multivariate normal distribution is:

$$
-\frac{1}{2} \cdot \left(\mathbf{x} - \mathbf{\mu} \right)^T \cdot \mathbf{\Sigma^{-1}} \cdot \left(\mathbf{x} - \mathbf{\mu} \right)
$$

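Because $\mathbf{\Sigma^{-1}}$ is positive definite, the quadratic form $\left(\mathbf{x} - \mathbf{\mu} \right)^T \cdot \mathbf{\Sigma^{-1}} \cdot \left(\mathbf{x} - \mathbf{\mu} \right)$ is zero for $\mathbf{x} = \mathbf{\mu}$ and strictly positive for every other vector. Hence the exponent reaches its global maximum $0$ at $\mathbf{x} = \mathbf{\mu}$, and so does $p(\mathbf{x})$.

For any other vector $\mathbf{x} \neq \mathbf{\mu}$ the exponent is *negative*.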

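We are interested in the iso-contours, i.e. the sets of vectors $\mathbf{x}$ defined by the equation:

$$
c = \left(\mathbf{x} - \mathbf{\mu} \right)^T \cdot \mathbf{\Sigma^{-1}} \cdot \left(\mathbf{x} - \mathbf{\mu} \right) \quad \text{with} \ c \gt 0
$$

Each value of $c$ corresponds to the density level $p(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n |\mathbf{\Sigma}|}} \cdot \exp\left(-\frac{c}{2}\right)$.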

The iso-contours of the multivariate normal distribution are ellipses in the 2D case and ellipsoids in higher dimensions.
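One way to see why the contours are ellipses (a sketch towards the first ToDo item, not the book's derivation): write $\mathbf{\Sigma} = \mathbf{L} \cdot \mathbf{L}^T$ (Cholesky factorisation) and set $\mathbf{x} = \mathbf{\mu} + \sqrt{c} \cdot \mathbf{L} \cdot \mathbf{u}$ with a unit vector $\mathbf{u}$. Then $\left(\mathbf{x} - \mathbf{\mu} \right)^T \cdot \mathbf{\Sigma^{-1}} \cdot \left(\mathbf{x} - \mathbf{\mu} \right) = c \cdot \mathbf{u}^T \cdot \mathbf{u} = c$, so the contour is the image of the unit circle under an invertible linear map, shifted by $\mathbf{\mu}$, i.e. an ellipse. The NumPy sketch below checks this for hypothetical values of $\mathbf{\mu}$, $\mathbf{\Sigma}$ and $c$:

```python
import numpy as np

mu = np.array([1.0, 2.0])             # hypothetical mean
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])        # hypothetical covariance (positive definite)
c = 1.0                               # level of the quadratic form

L = np.linalg.cholesky(Sigma)         # Sigma = L @ L.T, L lower triangular
theta = np.linspace(0.0, 2.0 * np.pi, 200)
U = np.vstack([np.cos(theta), np.sin(theta)])  # points on the unit circle, shape (2, 200)
Xc = mu[:, None] + np.sqrt(c) * (L @ U)        # mapped points on the contour, shape (2, 200)

D = Xc - mu[:, None]
# (x - mu)^T Sigma^{-1} (x - mu) for every column of D
q = np.sum(D * np.linalg.solve(Sigma, D), axis=0)
print(np.allclose(q, c))
```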

**ToDo**

1) show that in 2D the iso-contours are ellipses

2) derive the parameters of such an ellipse from the parameters $\mathbf{\mu}$ and $\mathbf{\Sigma}$

3) While it is obvious that the center of the ellipse must be at $(\mu_x, \mu_y)$ in the 2D $(x, y)$ case, other properties such as the lengths of the axes and the orientation angle seem to be more complicated to derive. The book only presents these properties and does not explain how they have been derived.

4) keywords to search: error ellipse, covariance ellipse

---

## Linear Time Invariant Systems

**Definition / Linearity**

$$
y(t) = F(a \cdot g(t) + b \cdot h(t)) = a \cdot F(g(t)) + b\cdot F(h(t))
$$

$a$ and $b$ are real numbers; $g(t)$ and $h(t)$ are arbitrary functions of the time variable $t$.
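A discrete-time illustration in NumPy: the two systems `F_linear` (scaling plus a one-sample delay) and `F_square` are hypothetical examples, and the check compares $F(a \cdot g + b \cdot h)$ with $a \cdot F(g) + b \cdot F(h)$ on sampled signals:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 101)
g = np.sin(2 * np.pi * t)   # sampled g(t)
h = t ** 2                  # sampled h(t)
a, b = 2.0, -3.0

def F_linear(s):
    # hypothetical linear system: scale by 5 and add a copy delayed by one sample
    return 5.0 * s + np.concatenate(([0.0], s[:-1]))

def F_square(s):
    # hypothetical system that is NOT linear
    return s ** 2

for F in (F_linear, F_square):
    lhs = F(a * g + b * h)
    rhs = a * F(g) + b * F(h)
    print(F.__name__, np.allclose(lhs, rhs))
```

`F_linear` passes the check and `F_square` fails it.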

### Propagation Rules for Uncertainty

Let $\mathbf{x}$ represent a $k$-element random vector. The matrix $\mathbf{M} \in \mathbb{R}^{k \times k}$ transforms $\mathbf{x}$ into $\mathbf{y}$ via this equation:

$$
\mathbf{y} = \mathbf{M} \cdot \mathbf{x}
$$

Given the covariance matrix $\mathbf{\Sigma_x}$ we need to compute the covariance matrix $\mathbf{\Sigma_y}$ of $\mathbf{y}$.

From the definition of covariance we write the covariance matrix $\mathbf{\Sigma_y}$ as:

$$
\mathbf{\Sigma_y} = E\left(  \left(\mathbf{y} - E(\mathbf{y}) \right) \cdot \left(\mathbf{y} - E(\mathbf{y}) \right)^T \right)
$$

Using the linearity of the expectation,

$$
E\left(\mathbf{y}\right) = E\left(\mathbf{M} \cdot \mathbf{x}\right) = \mathbf{M} \cdot E\left(\mathbf{x}\right)
$$

and therefore:

$$\begin{align}
\mathbf{\Sigma_y} &= E\left(  \left(\mathbf{M} \cdot \mathbf{x} - \mathbf{M} \cdot E\left(\mathbf{x}\right)\right) \cdot \left(\mathbf{M} \cdot \mathbf{x} - \mathbf{M} \cdot E\left(\mathbf{x}\right)\right)^T \right) \\
&= E\left(  \mathbf{M} \cdot \left( \mathbf{x} - E\left(\mathbf{x}\right)\right) \cdot \left(\mathbf{x} - E\left(\mathbf{x}\right)\right)^T \cdot \mathbf{M}^T \right) \\
&= \mathbf{M} \cdot \underbrace{E\left( \left( \mathbf{x} - E\left(\mathbf{x}\right)\right) \cdot \left(\mathbf{x} - E\left(\mathbf{x}\right)\right)^T  \right)}_{\mathbf{\Sigma_x}} \cdot \mathbf{M}^T \\
\mathbf{\Sigma_y} &= \mathbf{M} \cdot \mathbf{\Sigma_x} \cdot \mathbf{M}^T
\end{align}
$$
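The result $\mathbf{\Sigma_y} = \mathbf{M} \cdot \mathbf{\Sigma_x} \cdot \mathbf{M}^T$ can be checked by a Monte Carlo simulation. In this NumPy sketch, $\mathbf{\Sigma_x}$ and $\mathbf{M}$ are hypothetical values ($\mathbf{M}$ resembles one step of a constant-velocity model with time step 0.1), and the samples are drawn from a normal distribution only for convenience; the derivation itself does not assume normality. The tolerance `atol` reflects the sampling error of the estimate:

```python
import numpy as np

rng = np.random.default_rng(5)
Sigma_x = np.array([[1.0, 0.3],
                    [0.3, 0.5]])   # hypothetical covariance of x
M = np.array([[1.0, 0.1],
              [0.0, 1.0]])         # hypothetical transformation

# analytic propagation
Sigma_y = M @ Sigma_x @ M.T

# Monte Carlo: sample x, transform every sample, estimate the covariance of y
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma_x, size=200_000).T  # shape (2, N)
Y = M @ X
print(Sigma_y)
print(np.allclose(np.cov(Y, bias=True), Sigma_y, atol=2e-2))
```

Here the analytic result is $\mathbf{\Sigma_y} = \left[\begin{array}{cc} 1.065 & 0.35 \\ 0.35 & 0.5 \end{array}\right]$.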

Now we look at some examples:

**1D**

Matrix $\mathbf{M}$ *degenerates* into a scalar $m$ and $y = m \cdot x$, so:

$$
\sigma_y^2 = m^2 \cdot \sigma_x^2
$$
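A one-line check with made-up numbers (NumPy's `np.var` uses the $\frac{1}{N}$ factor by default); note that the sign of $m$ drops out:

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0, 3.5])   # made-up realisations of x
m = -3.0
y = m * x
print(np.isclose(np.var(y), m**2 * np.var(x)))
```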

**2D**

$$
\mathbf{x} = \left[\begin{array}{c}
x_1 \\
x_2
\end{array}\right]
$$


$$
\mathbf{\Sigma_x} = \left[\begin{array}{cc}
\sigma_{1}^2 & \sigma_{1,2}\\
\sigma_{1,2} &  \sigma_{2}^2
\end{array}\right]
$$

With matrix 

$$
\mathbf{M} = \left[\begin{array}{cc}
m_{1,1} & m_{1,2} \\
m_{2,1} & m_{2,2}
\end{array}\right]
$$

$$\begin{align}
\mathbf{\Sigma_y} &= \mathbf{M} \cdot \mathbf{\Sigma_x} \cdot \mathbf{M}^T \\
&= \left[\begin{array}{cc}
m_{1,1} & m_{1,2} \\
m_{2,1} & m_{2,2}
\end{array}\right] \cdot \left[\begin{array}{cc}
\sigma_{1}^2 & \sigma_{1,2} \\
\sigma_{1,2} &  \sigma_{2}^2
\end{array}\right] \cdot \left[\begin{array}{cc}
m_{1,1} & m_{2,1} \\
m_{1,2} & m_{2,2}
\end{array}\right]
\end{align}
$$

For the simpler case of uncorrelated components

$$
\mathbf{\Sigma_x} = \left[\begin{array}{cc}
\sigma_{1}^2 & 0\\
0 &  \sigma_{2}^2
\end{array}\right]
$$

we get 

$$\begin{align}
\mathbf{\Sigma_y} &= \mathbf{M} \cdot \mathbf{\Sigma_x} \cdot \mathbf{M}^T \\
&= \left[\begin{array}{cc}
m_{1,1} & m_{1,2} \\
m_{2,1} & m_{2,2}
\end{array}\right] \cdot \left[\begin{array}{cc}
\sigma_{1}^2 & 0 \\
0 &  \sigma_{2}^2
\end{array}\right] \cdot \left[\begin{array}{cc}
m_{1,1} & m_{2,1} \\
m_{1,2} & m_{2,2}
\end{array}\right]
\end{align}
$$

$$\begin{align}
\mathbf{\Sigma_y} &= \left[\begin{array}{cc}
m_{1,1} & m_{1,2} \\
m_{2,1} & m_{2,2}
\end{array}\right] \cdot \left[\begin{array}{cc}
\sigma_{1}^2 \cdot m_{1,1} & \sigma_{1}^2 \cdot m_{2,1} \\
\sigma_{2}^2 \cdot m_{1,2} & \sigma_{2}^2 \cdot m_{2,2}
\end{array}\right]
\end{align}
$$
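Carrying out the remaining multiplication gives

$$
\mathbf{\Sigma_y} = \left[\begin{array}{cc}
\sigma_{1}^2 \cdot m_{1,1}^2 + \sigma_{2}^2 \cdot m_{1,2}^2 & \sigma_{1}^2 \cdot m_{1,1} \cdot m_{2,1} + \sigma_{2}^2 \cdot m_{1,2} \cdot m_{2,2} \\
\sigma_{1}^2 \cdot m_{1,1} \cdot m_{2,1} + \sigma_{2}^2 \cdot m_{1,2} \cdot m_{2,2} & \sigma_{1}^2 \cdot m_{2,1}^2 + \sigma_{2}^2 \cdot m_{2,2}^2
\end{array}\right]
$$

which is symmetric, as expected. A small NumPy sketch compares this closed form with the matrix product for hypothetical values of $\mathbf{M}$, $\sigma_1^2$ and $\sigma_2^2$ (the helper name `sigma_y_diag` is made up):

```python
import numpy as np

def sigma_y_diag(M, s1_sq, s2_sq):
    """Closed-form Sigma_y = M diag(s1_sq, s2_sq) M^T for a 2x2 matrix M."""
    (m11, m12), (m21, m22) = M
    off = s1_sq * m11 * m21 + s2_sq * m12 * m22
    return np.array([[s1_sq * m11**2 + s2_sq * m12**2, off],
                     [off, s1_sq * m21**2 + s2_sq * m22**2]])

M = np.array([[1.0, 2.0],
              [-0.5, 3.0]])    # hypothetical transformation
s1_sq, s2_sq = 4.0, 0.25       # hypothetical variances sigma_1^2, sigma_2^2

closed = sigma_y_diag(M, s1_sq, s2_sq)
product = M @ np.diag([s1_sq, s2_sq]) @ M.T
print(closed)
print(np.allclose(closed, product))
```

For these values both give $\left[\begin{array}{cc} 5 & -0.5 \\ -0.5 & 3.25 \end{array}\right]$.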
