# Learning about Kalman filter / Derivation of Equations

Resources

`Kalman Filter from Ground Up`; author Alex Becker; https://www.kalmanfilter.net

**Overview**

The equations of the `Kalman Filter` and their derivation.

---


## State Extrapolation Equation

Also known under different names such as:

1) prediction equation

2) transition equation

3) state space model / dynamic model

$$
\mathbf{\hat{x}_{n+1,n}} = \mathbf{F} \cdot \mathbf{\hat{x}_{n,n}} + \mathbf{G} \cdot \mathbf{u_n} + \mathbf{w_n}
$$

| property | description | dimension |
|----------|-------------|-----------|
| $\mathbf{\hat{x}_{n+1,n}}$ |  predicted state at time $n+1$ | $n_x \times 1$ |
| $\mathbf{\hat{x}_{n,n}}$ | estimated state at time $n$  | $n_x \times 1$ |
| $\mathbf{u_n}$ | driving input, deterministic | $n_u \times 1$ |
| $\mathbf{w_n}$ | process noise; e.g. models the uncertainty of the system model | $n_x \times 1$ |
| $\mathbf{F}$ | state transition matrix  | $n_x \times n_x$ |
| $\mathbf{G}$  | control / input matrix   | $n_x \times n_u$ |

The purpose of the `Kalman filter` is to estimate the state vector $\mathbf{\hat{x}_{n,n}}$ in a way that minimises the uncertainty of the estimate. Uncertainty is measured by the variance of the state vector.

The book provides an example to clarify the notation used so far:

**example / airplane without control input**

No control input means $\mathbf{u_n} = \mathbf{0}$.

If we assume an airplane moving in 3 dimensions, the state vector provides information about the position, velocity and acceleration. Assuming that Cartesian coordinates are used, the state vector $\mathbf{\hat{x}_{n,n}}$ at time $n$ can be expressed like this:

$$
\mathbf{\hat{x}_{n,n}} = \left[\hat{x}_{n,n},\ \hat{y}_{n,n},\ \hat{z}_{n,n},\ \hat{\dot{x}}_{n,n},\ \hat{\dot{y}}_{n,n},\ \hat{\dot{z}}_{n,n}, \hat{\ddot{x}}_{n,n},\ \hat{\ddot{y}}_{n,n},\ \hat{\ddot{z}}_{n,n}  \right]^T
$$

The state extrapolation equation can now be expressed by (no noise, no inputs):

$$
\mathbf{\hat{x}_{n+1,n}} = \mathbf{F} \cdot \mathbf{\hat{x}_{n,n}} = \left[\begin{array}{c}
\hat{x}_{n+1,n} \\
\hat{y}_{n+1,n} \\
\hat{z}_{n+1,n} \\
\hat{\dot{x}}_{n+1,n} \\
\hat{\dot{y}}_{n+1,n} \\
\hat{\dot{z}}_{n+1,n} \\
\hat{\ddot{x}}_{n+1,n} \\
\hat{\ddot{y}}_{n+1,n} \\
\hat{\ddot{z}}_{n+1,n}
\end{array}\right] = \left[\begin{array}{c}
\hat{x}_{n,n} + \Delta t \cdot \hat{\dot{x}}_{n,n} + 0.5 \cdot  \Delta t^2 \cdot \hat{\ddot{x}}_{n,n} \\
\hat{y}_{n,n} + \Delta t \cdot \hat{\dot{y}}_{n,n} + 0.5 \cdot  \Delta t^2 \cdot \hat{\ddot{y}}_{n,n} \\
\hat{z}_{n,n} + \Delta t \cdot \hat{\dot{z}}_{n,n} + 0.5 \cdot  \Delta t^2 \cdot \hat{\ddot{z}}_{n,n} \\
\hat{\dot{x}}_{n,n} + \Delta t \cdot \hat{\ddot{x}}_{n,n} \\
\hat{\dot{y}}_{n,n} + \Delta t \cdot \hat{\ddot{y}}_{n,n} \\
\hat{\dot{z}}_{n,n} + \Delta t \cdot \hat{\ddot{z}}_{n,n} \\
\hat{\ddot{x}}_{n,n} \\
\hat{\ddot{y}}_{n,n} \\
\hat{\ddot{z}}_{n,n}
\end{array}\right]
$$
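The update rules above can be sketched in a few lines of Python. This is a minimal sketch, not code from the book: the function name `predict_constant_acceleration` and the numeric values are illustrative assumptions, and the state ordering follows the state vector shown above.

```python
def predict_constant_acceleration(state, dt):
    """Extrapolate [x, y, z, vx, vy, vz, ax, ay, az] by one time step dt.

    Hypothetical helper: applies the constant-acceleration rules row by row
    instead of building the 9 x 9 matrix F explicitly.
    """
    pos, vel, acc = state[0:3], state[3:6], state[6:9]
    new_pos = [p + dt * v + 0.5 * dt**2 * a for p, v, a in zip(pos, vel, acc)]
    new_vel = [v + dt * a for v, a in zip(vel, acc)]
    return new_pos + new_vel + list(acc)  # acceleration is carried over unchanged


# Illustrative numbers only (metres, metres/second, metres/second^2).
state = [0.0, 0.0, 1000.0, 100.0, 0.0, 0.0, 2.0, 0.0, -1.0]
predicted = predict_constant_acceleration(state, dt=1.0)
print(predicted)
```

Each axis is handled independently, which mirrors the block structure of $\mathbf{F}$ for this model.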

**example / free falling object**

The gravitational force acts upon the falling object. Accordingly we have constant acceleration.

The state vector has two elements:

1) the height or altitude

2) the velocity of the object

$$
\mathbf{\hat{x}_{n,n}} = \left[\begin{array}{c}
\hat{h}_{n,n} \\
\hat{\dot{h}}_{n,n}
\end{array}\right]
$$

The dynamics are expressed by the state transition matrix $\mathbf{F}$:

$$
\mathbf{F} = \left[\begin{array}{cc}
1 & \Delta t \\
0 & 1
\end{array}\right]
$$

For the control matrix $\mathbf{G}$:

$$
\mathbf{G} = \left[\begin{array}{c}
0.5 \cdot \Delta t^2 \\
\Delta t
\end{array}\right]
$$

And the input variable :

$$
\mathbf{u_n} = g
$$

Accordingly the state extrapolation equation becomes:

$$
\mathbf{\hat{x}_{n+1,n}} = \left[\begin{array}{c}
\hat{h}_{n+1,n} \\
\hat{\dot{h}}_{n+1,n}
\end{array}\right] = \left[\begin{array}{cc}
1 & \Delta t \\
0 & 1
\end{array}\right] \cdot \left[\begin{array}{c}
\hat{h}_{n,n} \\
\hat{\dot{h}}_{n,n}
\end{array}\right] + g \cdot \left[\begin{array}{c}
0.5 \cdot \Delta t^2 \\
\Delta t
\end{array}\right]
$$
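As a quick numerical check, the free-fall extrapolation can be sketched in Python. The function name `predict_free_fall` is a hypothetical helper. The sign of $g$ is also an assumption made here: the altitude is taken as positive upwards, so $g$ is negative.

```python
def predict_free_fall(h, h_dot, dt, g=-9.81):
    """Return [h, h_dot] extrapolated by dt using x_next = F x + G u with u = g.

    Assumption of this sketch: altitude is positive upwards, so g is negative.
    """
    F = [[1.0, dt],
         [0.0, 1.0]]
    G = [0.5 * dt**2, dt]
    x = [h, h_dot]
    return [sum(F[i][j] * x[j] for j in range(2)) + G[i] * g for i in range(2)]


# Object released at rest from 100 m; g rounded to -10 m/s^2 for easy checking.
h_next, v_next = predict_free_fall(h=100.0, h_dot=0.0, dt=1.0, g=-10.0)
print(h_next, v_next)
```

After one second the sketch gives an altitude of 95 m and a velocity of -10 m/s, which agrees with evaluating the matrix equation by hand.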

---


## Covariance Extrapolation Equation

From the state extrapolation equation

$$
\mathbf{\hat{x}_{n+1,n}} = \mathbf{F} \cdot \mathbf{\hat{x}_{n,n}} + \mathbf{G} \cdot \mathbf{u_n} + \mathbf{w_n}
$$

the covariance $Cov(\mathbf{\hat{x}_{n+1,n}})$ shall be computed.

From the definition of the covariance we have:

$$
Cov(\mathbf{\hat{x}_{n+1,n}}) = E\left( \left(\mathbf{\hat{x}_{n+1,n}} - E(\mathbf{\hat{x}_{n+1,n}})  \right) \cdot \left(\mathbf{\hat{x}_{n+1,n}}  - E(\mathbf{\hat{x}_{n+1,n}}) \right)^T \right)
$$

Computing expectations first:

$$\begin{align}
E(\mathbf{\hat{x}_{n+1,n}}) &= E\left( \mathbf{F} \cdot \mathbf{\hat{x}_{n,n}} \right) + E\left( \mathbf{G} \cdot \mathbf{u_n} \right)  + E\left( \mathbf{w_n} \right) \\
&= \mathbf{F} \cdot E\left(  \mathbf{\hat{x}_{n,n}} \right) + \mathbf{G} \cdot \mathbf{u_n}  + E\left( \mathbf{w_n} \right)
\end{align}
$$

$$\begin{align}
\left(\mathbf{\hat{x}_{n+1,n}} - E(\mathbf{\hat{x}_{n+1,n}})  \right) &= \mathbf{F} \cdot \mathbf{\hat{x}_{n,n}} + \mathbf{G} \cdot \mathbf{u_n} + \mathbf{w_n} - \left( \mathbf{F} \cdot E\left(  \mathbf{\hat{x}_{n,n}} \right) + \mathbf{G} \cdot \mathbf{u_n} + E\left( \mathbf{w_n} \right) \right) \\
&= \mathbf{F} \cdot \mathbf{\hat{x}_{n,n}} + \mathbf{G} \cdot \mathbf{u_n} + \mathbf{w_n} - \mathbf{F} \cdot E\left(  \mathbf{\hat{x}_{n,n}} \right) -  \mathbf{G} \cdot \mathbf{u_n} -  E\left( \mathbf{w_n} \right) \\
&= \mathbf{F} \cdot \left(\mathbf{\hat{x}_{n,n}}  - E\left(  \mathbf{\hat{x}_{n,n}} \right) \right) + \left(\mathbf{w_n} -  E\left( \mathbf{w_n} \right) \right)\\
\end{align}
$$

In a preliminary step we now compute $\left(\mathbf{\hat{x}_{n+1,n}} - E(\mathbf{\hat{x}_{n+1,n}})  \right) \cdot \left(\mathbf{\hat{x}_{n+1,n}}  - E(\mathbf{\hat{x}_{n+1,n}}) \right)^T $

$$\begin{align}
\left(\mathbf{\hat{x}_{n+1,n}} - E(\mathbf{\hat{x}_{n+1,n}})  \right) \cdot \left(\mathbf{\hat{x}_{n+1,n}}  - E(\mathbf{\hat{x}_{n+1,n}}) \right)^T &=
\left(\mathbf{F} \cdot \left(\mathbf{\hat{x}_{n,n}}  - E\left(  \mathbf{\hat{x}_{n,n}} \right) \right) + \left(\mathbf{w_n} -  E\left( \mathbf{w_n} \right) \right)  \right) \cdot \left(\mathbf{F} \cdot \left(\mathbf{\hat{x}_{n,n}}  - E\left(  \mathbf{\hat{x}_{n,n}} \right) \right) + \left(\mathbf{w_n} -  E\left( \mathbf{w_n} \right) \right)  \right)^T \\
&= \left(\mathbf{F} \cdot \left(\mathbf{\hat{x}_{n,n}}  - E\left(  \mathbf{\hat{x}_{n,n}} \right) \right) + \left(\mathbf{w_n} -  E\left( \mathbf{w_n} \right) \right)  \right) \cdot \left( \left(\mathbf{\hat{x}_{n,n}}  - E\left(  \mathbf{\hat{x}_{n,n}} \right) \right)^T \cdot \mathbf{F}^T + \left(\mathbf{w_n} -  E\left( \mathbf{w_n} \right) \right)^T  \right) \\
&= \mathbf{F} \cdot \left(\mathbf{\hat{x}_{n,n}}  - E\left(  \mathbf{\hat{x}_{n,n}} \right) \right) \cdot \left(\mathbf{\hat{x}_{n,n}}  - E\left(  \mathbf{\hat{x}_{n,n}} \right) \right)^T \cdot \mathbf{F}^T + \left(\mathbf{w_n} -  E\left( \mathbf{w_n} \right) \right) \cdot \left(\mathbf{w_n} -  E\left( \mathbf{w_n} \right) \right)^T \\
&+ \mathbf{F} \cdot \left(\mathbf{\hat{x}_{n,n}}  - E\left(  \mathbf{\hat{x}_{n,n}} \right) \right) \cdot \left(\mathbf{w_n} -  E\left( \mathbf{w_n} \right) \right)^T + \left(\mathbf{w_n} -  E\left( \mathbf{w_n} \right) \right) \cdot \left(\mathbf{\hat{x}_{n,n}}  - E\left(  \mathbf{\hat{x}_{n,n}} \right) \right)^T \cdot \mathbf{F}^T
\end{align}
$$

Taking expectations on all terms yields the covariance:

$$\begin{align}
Cov(\mathbf{\hat{x}_{n+1,n}}) &= \mathbf{F} \cdot E\left(\left(\mathbf{\hat{x}_{n,n}}  - E\left(  \mathbf{\hat{x}_{n,n}} \right) \right) \cdot \left(\mathbf{\hat{x}_{n,n}}  - E\left(  \mathbf{\hat{x}_{n,n}} \right) \right)^T \right) \cdot \mathbf{F}^T + E\left(\left(\mathbf{w_n} -  E\left( \mathbf{w_n} \right) \right) \cdot \left(\mathbf{w_n} -  E\left( \mathbf{w_n} \right) \right)^T \right) \\
&+ \mathbf{F} \cdot E\left(\left(\mathbf{\hat{x}_{n,n}}  - E\left(  \mathbf{\hat{x}_{n,n}} \right) \right) \cdot \left(\mathbf{w_n} -  E\left( \mathbf{w_n} \right) \right)^T \right) + E\left(\left(\mathbf{w_n} -  E\left( \mathbf{w_n} \right) \right) \cdot \left(\mathbf{\hat{x}_{n,n}}  - E\left(  \mathbf{\hat{x}_{n,n}} \right) \right)^T \cdot \mathbf{F}^T \right)
\end{align}
$$

Some simplifications are now possible:

$$\begin{align}
Cov(\mathbf{\hat{x}_{n+1,n}}) &= \mathbf{F} \cdot Cov\left(\mathbf{\hat{x}_{n,n}} \right) \cdot \mathbf{F}^T + Cov\left(\mathbf{w_n} \right) \\
&+ \mathbf{F} \cdot E\left(\left(\mathbf{\hat{x}_{n,n}}  - E\left(  \mathbf{\hat{x}_{n,n}} \right) \right) \cdot \left(\mathbf{w_n} -  E\left( \mathbf{w_n} \right) \right)^T \right) + E\left(\left(\mathbf{w_n} -  E\left( \mathbf{w_n} \right) \right) \cdot \left(\mathbf{\hat{x}_{n,n}}  - E\left(  \mathbf{\hat{x}_{n,n}} \right) \right)^T \cdot \mathbf{F}^T \right)
\end{align}
$$

We still need to work on the remaining cross terms, collected in the expression

$$
\mathbf{D} =  \mathbf{D_1} + \mathbf{D_1}^T = \mathbf{F} \cdot E\left(\left(\mathbf{\hat{x}_{n,n}}  - E\left(  \mathbf{\hat{x}_{n,n}} \right) \right) \cdot \left(\mathbf{w_n} -  E\left( \mathbf{w_n} \right) \right)^T \right) + E\left(\left(\mathbf{w_n} -  E\left( \mathbf{w_n} \right) \right) \cdot \left(\mathbf{\hat{x}_{n,n}}  - E\left(  \mathbf{\hat{x}_{n,n}} \right) \right)^T \cdot \mathbf{F}^T \right)
$$

and begin with the first term:

$$\begin{align}
\mathbf{D_1} = \mathbf{F} \cdot E\left(\left(\mathbf{\hat{x}_{n,n}}  - E\left(  \mathbf{\hat{x}_{n,n}} \right) \right) \cdot \left(\mathbf{w_n} -  E\left( \mathbf{w_n} \right) \right)^T \right)
\end{align}
$$

$$\begin{align}
\mathbf{D_1} 
&= \mathbf{F} \cdot E\left(\mathbf{\hat{x}_{n,n}} \cdot \mathbf{w_n}^T \right) - \mathbf{F} \cdot E\left(\mathbf{\hat{x}_{n,n}} \right) \cdot E\left( \mathbf{w_n}^T \right) - \mathbf{F} \cdot E\left(  \mathbf{\hat{x}_{n,n}} \right) \cdot E\left(\mathbf{w_n}^T\right) + \mathbf{F} \cdot E\left(  \mathbf{\hat{x}_{n,n}} \right) \cdot E\left(\mathbf{w_n} \right)^T \\
&= \mathbf{F} \cdot \left( E\left(\mathbf{\hat{x}_{n,n}} \cdot \mathbf{w_n}^T \right) - E\left(\mathbf{\hat{x}_{n,n}} \right) \cdot E\left( \mathbf{w_n}^T \right) \right)
\end{align}
$$

**Assumption**

$\mathbf{\hat{x}_{n,n}}$ and $\mathbf{w_n}$ are statistically independent. Then

$E\left(\mathbf{\hat{x}_{n,n}} \cdot \mathbf{w_n}^T \right) = E\left(\mathbf{\hat{x}_{n,n}} \right) \cdot E\left( \mathbf{w_n}^T \right)$ and accordingly $\mathbf{D_1} = \mathbf{0}$, hence also $\mathbf{D} = \mathbf{D_1} + \mathbf{D_1}^T = \mathbf{0}$. The covariance simplifies to:

$$
Cov(\mathbf{\hat{x}_{n+1,n}}) = \mathbf{F} \cdot Cov\left(\mathbf{\hat{x}_{n,n}} \right) \cdot \mathbf{F}^T  + Cov\left(\mathbf{w_n} \right) 
$$

**summary & notation**

| formula | description | changed notation |
|---------|-------------|------------------|
| $Cov(\mathbf{\hat{x}_{n+1,n}})$ | covariance matrix of extrapolated state | $\mathbf{P_{n+1,n}}$ |
| $Cov\left(\mathbf{\hat{x}_{n,n}} \right)$ | covariance matrix of estimated state | $\mathbf{P_{n,n}}$ |
| $Cov\left(\mathbf{w_n} \right) $ | covariance matrix of process noise | $\mathbf{Q_n}$ |


With these changes in the notation the covariance extrapolation equation becomes:

$$
\mathbf{P_{n+1,n}} = \mathbf{F} \cdot \mathbf{P_{n,n}} \cdot \mathbf{F}^T  + \mathbf{Q_n}
$$
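The covariance extrapolation can be tried out numerically with a small Python sketch. The helper names (`matmul`, `transpose`, `extrapolate_covariance`) and all numbers are assumptions chosen for illustration; matrices are plain lists of rows so that no external library is needed.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def extrapolate_covariance(F, P, Q):
    """Return P_next = F P F^T + Q."""
    FPFt = matmul(matmul(F, P), transpose(F))
    return [[FPFt[i][j] + Q[i][j] for j in range(len(Q))] for i in range(len(Q))]


dt = 1.0
F = [[1.0, dt], [0.0, 1.0]]    # position / velocity model
P = [[4.0, 0.0], [0.0, 1.0]]   # illustrative covariance of the current estimate
Q = [[0.1, 0.0], [0.0, 0.1]]   # illustrative process noise covariance
P_next = extrapolate_covariance(F, P, Q)
print(P_next)
```

In this example the position variance grows from 4 to 5.1: the velocity uncertainty leaks into the position through $\mathbf{F}$, and $\mathbf{Q_n}$ adds on top.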

While the mathematics are not overly complicated there still remain some issues:

1) how to derive the state transition matrix for a specific problem

2) how should an initial value of covariance matrix $\mathbf{P_{0,0}}$ be chosen ?

3) what are the criteria that determine the appropriate choice of the covariance matrix $\mathbf{Q_n}$ of process noise ? 

---

### Process Noise

The concept of process noise is a bit vague. In the state extrapolation equation process noise is modelled as a random vector $\mathbf{w}_n$ that affects the extrapolated / predicted state. The book provides this definition:

$\mathbf{w}_n$ is an *unmeasurable* input that affects the state.

Other literature just mentions that $\mathbf{w}_n$ is noise and uncorrelated with $\mathbf{\hat{x}_{n,n}}$.

Let us assume that process noise only affects the acceleration. For a state vector consisting of position, velocity and acceleration the process noise vector can be expressed by:

$$
\mathbf{w}_n = \left[\begin{array}{c}
0 \\ 0 \\ w
\end{array}\right]
$$

Element $w$ is a zero mean random variable with variance $\sigma_w^2$.

For the covariance matrix of $\mathbf{w}_n$ we get:

$$
Cov\left(\mathbf{w}_n \right) = E\left(\mathbf{w}_n \cdot \mathbf{w}_n^T \right) = \mathbf{Q}_w = \left[\begin{array}{ccc}
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & \sigma_w^2
\end{array}\right]
$$

For the covariance of $\mathbf{F} \cdot \mathbf{w}_n$ we obtain:

$$
\mathbf{Q} = \mathbf{F} \cdot \mathbf{Q}_w  \cdot \mathbf{F}^T
$$

For a one dimensional constant acceleration model (state: position, velocity, acceleration) the state transition matrix $\mathbf{F}$ becomes:

$$
\mathbf{F} = \left[\begin{array}{ccc}
1 & \Delta t & 0.5 \cdot \Delta t^2 \\
0 & 1 & \Delta t \\
0 & 0 & 1
\end{array}\right]
$$

$$\begin{align}
\mathbf{Q} &= \mathbf{F} \cdot \mathbf{Q}_w  \cdot \mathbf{F}^T \\
&= \left[\begin{array}{ccc}
1 & \Delta t & 0.5 \cdot \Delta t^2 \\
0 & 1 & \Delta t \\
0 & 0 & 1
\end{array}\right] \cdot \left[\begin{array}{ccc}
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & \sigma_w^2
\end{array}\right] \cdot \left[\begin{array}{ccc}
1 & 0  & 0 \\
\Delta t & 1 & 0\\
 0.5 \cdot \Delta t^2  & \Delta t & 1
\end{array}\right] \\
&=  \left[\begin{array}{ccc}
1 & \Delta t & 0.5 \cdot \Delta t^2 \\
0 & 1 & \Delta t \\
0 & 0 & 1
\end{array}\right] \cdot \left[\begin{array}{ccc}
0 & 0 & 0 \\
0 &  0 & 0 \\
0.5 \cdot \Delta t^2 & \Delta t & 1
\end{array}\right] \cdot \sigma_w^2 \\
&=  \left[\begin{array}{ccc}
0.25 \cdot \Delta t^4 & 0.5 \cdot \Delta t^3 & 0.5 \cdot \Delta t^2 \\
0.5 \cdot \Delta t^3 &  \Delta t^2 & \Delta t \\
0.5 \cdot \Delta t^2 & \Delta t & 1
\end{array}\right] \cdot \sigma_w^2
\end{align} 
$$
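The closed-form result can be cross-checked by multiplying the matrices numerically. The following Python sketch does this for arbitrary test values of $\Delta t$ and $\sigma_w^2$; all function names are hypothetical helpers introduced only for this check.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def q_by_multiplication(dt, sigma_w2):
    """Q = F Q_w F^T evaluated numerically."""
    F = [[1.0, dt, 0.5 * dt**2],
         [0.0, 1.0, dt],
         [0.0, 0.0, 1.0]]
    Q_w = [[0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0],
           [0.0, 0.0, sigma_w2]]
    return matmul(matmul(F, Q_w), transpose(F))


def q_closed_form(dt, sigma_w2):
    """The matrix derived above, written out element by element."""
    return [[0.25 * dt**4 * sigma_w2, 0.5 * dt**3 * sigma_w2, 0.5 * dt**2 * sigma_w2],
            [0.5 * dt**3 * sigma_w2, dt**2 * sigma_w2, dt * sigma_w2],
            [0.5 * dt**2 * sigma_w2, dt * sigma_w2, sigma_w2]]


numeric = q_by_multiplication(dt=2.0, sigma_w2=3.0)
closed = q_closed_form(dt=2.0, sigma_w2=3.0)
print(numeric)
print(closed)
```

Both functions return the same matrix for these test values, which supports the hand calculation.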

**Two dimensional case**

$$
\mathbf{w}_n = \left[\begin{array}{c}
0 \\ 0 \\ w_x \\ 0 \\ 0 \\ w_y
\end{array}\right]
$$

$$
Cov\left(\mathbf{w}_n \right) = E\left(\mathbf{w}_n \cdot \mathbf{w}_n^T \right) = \mathbf{Q}_w = \left[\begin{array}{cccccc}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & Var(w_x) & 0 & 0 & Cov(w_x, w_y) \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & Cov(w_x, w_y) & 0 & 0 & Var(w_y)
\end{array}\right]
$$

Setting $Var(w_x) = Var(w_y) = \sigma_w^2$ and $Cov(w_x, w_y) = 0$ :

$$
Cov\left(\mathbf{w}_n \right) = E\left(\mathbf{w}_n \cdot \mathbf{w}_n^T \right) = \mathbf{Q}_w = \left[\begin{array}{cccccc}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 
\end{array}\right] \cdot \sigma_w^2
$$


For a two dimensional constant acceleration model the state transition matrix $\mathbf{F}$ becomes:

$$
\mathbf{F} = \left[\begin{array}{cccccc}
1 & \Delta t & 0.5 \cdot \Delta t^2 & 0 & 0 & 0 \\
0 & 1 & \Delta t & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & \Delta t & 0.5 \cdot \Delta t^2 \\
0 & 0 & 0 & 0 & 1 & \Delta t \\
0 & 0 & 0 & 0 & 0 & 1 
\end{array}\right]
$$

$$
\mathbf{F} \cdot \mathbf{Q}_w = \left[\begin{array}{cccccc}
1 & \Delta t & 0.5 \cdot \Delta t^2 & 0 & 0 & 0 \\
0 & 1 & \Delta t & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & \Delta t & 0.5 \cdot \Delta t^2 \\
0 & 0 & 0 & 0 & 1 & \Delta t \\
0 & 0 & 0 & 0 & 0 & 1 
\end{array}\right] \cdot \left[\begin{array}{cccccc}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 
\end{array}\right] \cdot \sigma_w^2 = \left[\begin{array}{cccccc}
0 & 0 & 0.5 \cdot \Delta t^2 & 0 & 0 & 0 \\
0 & 0 & \Delta t & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.5 \cdot \Delta t^2 \\
0 & 0 & 0 & 0 & 0 & \Delta t \\
0 & 0 & 0 & 0 & 0 & 1 \\
\end{array}\right] \cdot \sigma_w^2 
$$


$$
\mathbf{F} \cdot \mathbf{Q}_w \cdot \mathbf{F}^T = \left[\begin{array}{cccccc}
0 & 0 & 0.5 \cdot \Delta t^2 & 0 & 0 & 0 \\
0 & 0 & \Delta t & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.5 \cdot \Delta t^2 \\
0 & 0 & 0 & 0 & 0 & \Delta t \\
0 & 0 & 0 & 0 & 0 & 1 \\
\end{array}\right]  \cdot \left[\begin{array}{cccccc}
1 & 0 & 0 & 0 & 0 & 0 \\
\Delta t & 1 & 0 & 0 & 0 & 0 \\
0.5 \cdot \Delta t^2 & \Delta t & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & \Delta t & 1 & 0\\
0 & 0 & 0 & 0.5 \cdot \Delta t^2 & \Delta t & 1 
\end{array}\right] \cdot \sigma_w^2 
$$

$$
\mathbf{F} \cdot \mathbf{Q}_w \cdot \mathbf{F}^T = \left[\begin{array}{cccccc}
0.25 \cdot \Delta t^4 & 0.5 \cdot \Delta t^3 & 0.5 \cdot \Delta t^2 & 0 & 0 & 0 \\
0.5 \cdot \Delta t^3 & \Delta t^2 & \Delta t & 0 & 0 & 0 \\
0.5 \cdot \Delta t^2 & \Delta t & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.25 \cdot \Delta t^4 &  0.5 \cdot \Delta t^3 &  0.5 \cdot \Delta t^2 \\
0 & 0 & 0 & 0.5 \cdot \Delta t^3 & \Delta t^2 & \Delta t \\
0 & 0 & 0 & 0.5 \cdot \Delta t^2 & \Delta t & 1 
\end{array}\right] \cdot \sigma_w^2 
$$

---

## Measurement 

At time step $n$ a new measurement $\mathbf{z_n}$ is available. The measurement equation in vector notation is:

$$
\mathbf{z_n} = \mathbf{H} \cdot \mathbf{x_n} + \mathbf{v_n}
$$

| property | description | dimension |
|----------|-------------|-----------|
| $\mathbf{z_n}$ | measurement | $n_z \times 1$ |
| $\mathbf{H}$ | observation matrix | $n_z \times n_x$ |
| $\mathbf{x_n}$ | true system state (different from estimated state) | $n_x \times 1$ |
| $\mathbf{v_n}$ | measurement noise | $n_z \times 1$ |

The observation matrix $\mathbf{H}$ accounts for the fact that most measurements do not reflect the state directly. Thus the term `hidden state` is frequently found in the literature.

The book provides an example of measuring a hidden state.

A distance $x_n$ is measured using an echo-meter. Therefore the measurement is about a delay $t_n$. The speed of a propagating sound wave is $c$. So what is actually measured is $t_n = 2 \cdot \frac{x_n}{c}$.

On the other hand the number of elements $n_z$ in measurement $\mathbf{z_n}$ may be different from the number $n_x$ of states $\mathbf{x_n}$. Thus only a **selection** of state variables may contribute to $\mathbf{z_n}$.


And there may be a situation where a **linear combination** of state variables contributes to an element of the measurement vector.
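The three roles of $\mathbf{H}$ mentioned above (scaling, selection, linear combination) can be illustrated with a short Python sketch. The helper `measure`, the assumed speed of sound and the example state vectors are illustrative assumptions and do not come from the book. Measurement noise is left out.

```python
def measure(H, x):
    """Noise-free measurement z = H x, with H as a list of rows and x as a list."""
    return [sum(h_ij * x_j for h_ij, x_j in zip(row, x)) for row in H]


# Scaling (echo-meter): the state is a distance, the measurement a round-trip delay.
c = 343.0                       # assumed speed of sound in air, m/s
H_echo = [[2.0 / c]]
delay = measure(H_echo, [171.5])            # distance of 171.5 m

# Selection: state = [position, velocity, acceleration], only position is measured.
H_select = [[1.0, 0.0, 0.0]]
position_only = measure(H_select, [5.0, 2.0, 1.0])

# Linear combination: a hypothetical sensor that reports the mean of two states.
H_mean = [[0.5, 0.5]]
mean_value = measure(H_mean, [2.0, 4.0])

print(delay, position_only, mean_value)
```

In all three cases the number of rows of $\mathbf{H}$ equals $n_z$ and the number of columns equals $n_x$.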


---

## Summary / What we have learned so far

Basically this is a re-write of chapter 8.4 (interim summary) of

`Kalman Filter from Ground Up`; author Alex Becker; https://www.kalmanfilter.net

### prediction equations

**state extrapolation equation**

$$
\mathbf{\hat{x}_{n+1,n}} = \mathbf{F} \cdot \mathbf{\hat{x}_{n,n}} + \mathbf{G} \cdot \mathbf{u_n} + \mathbf{w_n}
$$

and based on the state extrapolation equation the covariance extrapolation equation has been derived:

**covariance extrapolation equation**

$$
\mathbf{P_{n+1,n}} = \mathbf{F} \cdot \mathbf{P_{n,n}} \cdot \mathbf{F}^T  + \mathbf{Q_n}
$$



### auxiliary equations

The measurement equation belongs to this category.

**measurement equation**

$$
\mathbf{z_n} = \mathbf{H} \cdot \mathbf{x_n} + \mathbf{v_n}
$$

### Covariance equations

**covariance of measurement uncertainty**

$$
\mathbf{R_n} = E\left(\mathbf{v_n} \cdot \mathbf{v_n}^T \right)
$$

**covariance of process noise**

$$
\mathbf{Q_n} = E\left(\mathbf{w_n} \cdot \mathbf{w_n}^T \right)
$$

**estimation uncertainty**

$$
\mathbf{P_{n,n}} = E\left(  \left(\mathbf{x_n} - \mathbf{\hat{x}_{n,n}}  \right) \cdot \left(\mathbf{x_n} - \mathbf{\hat{x}_{n,n}}   \right)^T\right)
$$

---

## The State Update Equation 

The book just states the `state update equation` without deriving it beforehand. The derivation is done in later chapters of the book. According to my personal taste I would have preferred to see a derivation in the first place. So perhaps I will find other resources that might fill this gap.

$$
\mathbf{\hat{x}_{n,n} } = \mathbf{\hat{x}_{n,n-1}} + \mathbf{K_n} \cdot \left(\mathbf{z_n} - \mathbf{H} \cdot \mathbf{\hat{x}_{n,n-1}}   \right)
$$

Basically this equation combines the predicted state $\mathbf{\hat{x}_{n,n-1}}$ with the measurement residual $\left(\mathbf{z_n} - \mathbf{H} \cdot \mathbf{\hat{x}_{n,n-1}}   \right)$, weighted by the matrix $\mathbf{K_n}$. This weighting matrix is called the `Kalman Gain Matrix` or simply `Kalman Gain`.

The goal is to choose $\mathbf{K_n}$ in a way that minimises the trace of the covariance matrix $\mathbf{P_{n,n}}$ of the estimate.

Starting point is the state update equation:

$$
\mathbf{\hat{x}_{n,n} } = \mathbf{\hat{x}_{n,n-1}} + \mathbf{K_n} \cdot \left(\mathbf{z_n} - \mathbf{H} \cdot \mathbf{\hat{x}_{n,n-1}}   \right)
$$

Inserting the expression for the measurement vector $\mathbf{z_n}$

$$
\mathbf{z_n} = \mathbf{H} \cdot \mathbf{x_n} + \mathbf{v_n}
$$

$$
\mathbf{\hat{x}_{n,n} } = \mathbf{\hat{x}_{n,n-1}} + \mathbf{K_n} \cdot \left(\mathbf{H} \cdot \mathbf{x_n} + \mathbf{v_n} - \mathbf{H} \cdot \mathbf{\hat{x}_{n,n-1}}   \right)
$$

Compute the estimation error $\mathbf{e_n}$:

$$\begin{align}
\mathbf{e_n} &= \mathbf{x_n} - \mathbf{\hat{x}_{n,n} } \\
&= \mathbf{x_n} -  \mathbf{\hat{x}_{n,n-1}} - \mathbf{K_n} \cdot \left(\mathbf{H} \cdot \mathbf{x_n} + \mathbf{v_n} - \mathbf{H} \cdot \mathbf{\hat{x}_{n,n-1}}   \right) \\
&= \left(\mathbf{x_n} -  \mathbf{\hat{x}_{n,n-1}} \right) - \mathbf{K_n} \cdot \mathbf{H} \cdot \left(\mathbf{x_n} -  \mathbf{\hat{x}_{n,n-1}} \right) - \mathbf{K_n} \cdot \mathbf{v_n} \\
&= \left(\mathbf{I} - \mathbf{K_n} \cdot \mathbf{H}\right) \cdot \left(\mathbf{x_n} -  \mathbf{\hat{x}_{n,n-1}} \right) - \mathbf{K_n} \cdot \mathbf{v_n}
\end{align}
$$

Now that we have found an expression for the estimation error $\mathbf{e_n}$ we are going to compute the covariance matrix $\mathbf{P_{n,n}}$:

$$\begin{align}
\mathbf{P_{n,n}} &= E\left(\mathbf{e_n} \cdot \mathbf{e_n}^T \right) \\
&= E\left(\left(\left(\mathbf{I} - \mathbf{K_n} \cdot \mathbf{H}\right) \cdot \left(\mathbf{x_n} -  \mathbf{\hat{x}_{n,n-1}} \right) - \mathbf{K_n} \cdot \mathbf{v_n} \right) \cdot \left(\left(\mathbf{I} - \mathbf{K_n} \cdot \mathbf{H}\right) \cdot \left(\mathbf{x_n} -  \mathbf{\hat{x}_{n,n-1}} \right) - \mathbf{K_n} \cdot \mathbf{v_n} \right)^T \right) \\
&= \left(\mathbf{I} - \mathbf{K_n} \cdot \mathbf{H}\right) \cdot E\left(\left(\mathbf{x_n} -  \mathbf{\hat{x}_{n,n-1}} \right) \cdot \left(\mathbf{x_n} -  \mathbf{\hat{x}_{n,n-1}} \right)^T\right) \cdot \left(\mathbf{I} - \mathbf{K_n} \cdot \mathbf{H}\right)^T + \mathbf{K_n} \cdot E\left(\mathbf{v_n} \cdot \mathbf{v_n}^T\right) \cdot \mathbf{K_n}^T \\
&- \left(\mathbf{I} - \mathbf{K_n} \cdot \mathbf{H}\right) \cdot E\left(\left(\mathbf{x_n} -  \mathbf{\hat{x}_{n,n-1}} \right) \cdot \mathbf{v_n}^T\right) \cdot \mathbf{K_n}^T - \mathbf{K_n} \cdot E\left(\mathbf{v_n} \cdot \left(\mathbf{x_n} -  \mathbf{\hat{x}_{n,n-1}} \right)^T \right) \cdot \left(\mathbf{I} - \mathbf{K_n} \cdot \mathbf{H}\right)^T
\end{align}
$$

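The two cross terms vanish because the quantity

$$
E\left(\left(\mathbf{x_n} -  \mathbf{\hat{x}_{n,n-1}} \right) \cdot \mathbf{v_n}^T\right) = \mathbf{0}
$$

is zero: the zero mean measurement noise $\mathbf{v_n}$ at time $n$ is uncorrelated with the prediction error $\mathbf{x_n} - \mathbf{\hat{x}_{n,n-1}}$, since the extrapolated state $\mathbf{\hat{x}_{n,n-1}}$ only depends on measurements up to time $n-1$.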


$$\begin{align}
\mathbf{P_{n,n}} &= E\left(\mathbf{e_n} \cdot \mathbf{e_n}^T \right) \\
&= \left(\mathbf{I} - \mathbf{K_n} \cdot \mathbf{H}\right) \cdot E\left(\left(\mathbf{x_n} -  \mathbf{\hat{x}_{n,n-1}} \right) \cdot \left(\mathbf{x_n} -  \mathbf{\hat{x}_{n,n-1}} \right)^T\right) \cdot \left(\mathbf{I} - \mathbf{K_n} \cdot \mathbf{H}\right)^T + \mathbf{K_n} \cdot E\left(\mathbf{v_n} \cdot \mathbf{v_n}^T\right) \cdot \mathbf{K_n}^T \\
\mathbf{P_{n,n}} &= \left(\mathbf{I} - \mathbf{K_n} \cdot \mathbf{H}\right) \cdot \mathbf{P_{n,n-1}} \cdot \left(\mathbf{I} - \mathbf{K_n} \cdot \mathbf{H}\right)^T + \mathbf{K_n} \cdot \mathbf{R_n} \cdot \mathbf{K_n}^T 
\end{align}
$$

which is the covariance update equation expressed via `Kalman gain` $\mathbf{K_n}$. 
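The covariance update equation holds for any gain, not only for the optimal one. A small Python sketch makes this tangible. All helper names and numbers are assumptions for illustration, and the gain `K` is deliberately an arbitrary choice.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def update_covariance(P_prior, K, H, R):
    """Return (I - K H) P (I - K H)^T + K R K^T."""
    n = len(P_prior)
    KH = matmul(K, H)
    I_KH = [[(1.0 if i == j else 0.0) - KH[i][j] for j in range(n)] for i in range(n)]
    first = matmul(matmul(I_KH, P_prior), transpose(I_KH))
    second = matmul(matmul(K, R), transpose(K))
    return [[first[i][j] + second[i][j] for j in range(n)] for i in range(n)]


P_prior = [[4.0, 1.0], [1.0, 2.0]]   # illustrative P_{n,n-1}
H = [[1.0, 0.0]]                     # only the first state is measured
R = [[1.0]]                          # illustrative measurement variance
K = [[0.5], [0.1]]                   # arbitrary gain, NOT the optimal Kalman gain
P_post = update_covariance(P_prior, K, H, R)
print(P_post)
```

The result stays symmetric by construction, because both summands have the form $\mathbf{M} \cdot \mathbf{S} \cdot \mathbf{M}^T$ with a symmetric $\mathbf{S}$.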

What is left is to derive the equation for the `Kalman gain`.

---


## Kalman Gain / Derivation

We start with the covariance update equation and rearrange it:

$$\begin{align}
\mathbf{P_{n,n}} &= \left(\mathbf{I} - \mathbf{K_n} \cdot \mathbf{H}\right) \cdot \mathbf{P_{n,n-1}} \cdot \left(\mathbf{I} - \mathbf{K_n} \cdot \mathbf{H}\right)^T + \mathbf{K_n} \cdot \mathbf{R_n} \cdot \mathbf{K_n}^T \\
&= \left(\mathbf{I} - \mathbf{K_n} \cdot \mathbf{H}\right) \cdot \mathbf{P_{n,n-1}} \cdot \left(\mathbf{I} -  \mathbf{H}^T \cdot \mathbf{K_n}^T\right)+ \mathbf{K_n} \cdot \mathbf{R_n} \cdot \mathbf{K_n}^T \\
&= \left(\mathbf{P_{n,n-1}} - \mathbf{K_n} \cdot \mathbf{H} \cdot \mathbf{P_{n,n-1}}\right) \cdot \left(\mathbf{I} -  \mathbf{H}^T \cdot \mathbf{K_n}^T\right)+ \mathbf{K_n} \cdot \mathbf{R_n} \cdot \mathbf{K_n}^T \\
&= \mathbf{P_{n,n-1}} -  \mathbf{P_{n,n-1}} \cdot \mathbf{H}^T \cdot \mathbf{K_n}^T - \mathbf{K_n} \cdot \mathbf{H} \cdot \mathbf{P_{n,n-1}} +\mathbf{K_n} \cdot \mathbf{H} \cdot \mathbf{P_{n,n-1}} \cdot  \mathbf{H}^T \cdot \mathbf{K_n}^T + \mathbf{K_n} \cdot \mathbf{R_n} \cdot \mathbf{K_n}^T \\
&= \mathbf{P_{n,n-1}} -  \mathbf{P_{n,n-1}} \cdot \mathbf{H}^T \cdot \mathbf{K_n}^T - \mathbf{K_n} \cdot \mathbf{H} \cdot \mathbf{P_{n,n-1}} + \mathbf{K_n} \cdot \left( \mathbf{H} \cdot \mathbf{P_{n,n-1}} \cdot  \mathbf{H}^T + \mathbf{R_n} \right) \cdot \mathbf{K_n}^T 
\end{align}
$$

The last equation is also in the book as table 8.6.

We need to minimise the total variance contained in $\mathbf{P_{n,n}}$ with respect to the matrix elements of the `Kalman gain` $\mathbf{K_n}$. The sum of the variances (the diagonal elements of $\mathbf{P_{n,n}}$) is just the trace $tr\left(\mathbf{P_{n,n}}\right)$ of this matrix.

For $tr\left(\mathbf{P_{n,n}}\right)$ we get:

$$\begin{align}
tr\left(\mathbf{P_{n,n}}\right) &= tr\left(\mathbf{P_{n,n-1}}\right) -  tr\left(\mathbf{P_{n,n-1}} \cdot \mathbf{H}^T \cdot \mathbf{K_n}^T\right) - tr\left(\mathbf{K_n} \cdot \mathbf{H} \cdot \mathbf{P_{n,n-1}}\right) + tr\left(\mathbf{K_n} \cdot \left( \mathbf{H} \cdot \mathbf{P_{n,n-1}} \cdot  \mathbf{H}^T + \mathbf{R_n} \right) \cdot \mathbf{K_n}^T \right) \\
&= tr\left(\mathbf{P_{n,n-1}}\right) -  2 \cdot tr\left(\mathbf{P_{n,n-1}} \cdot \mathbf{H}^T \cdot \mathbf{K_n}^T\right)  + tr\left(\mathbf{K_n} \cdot \left( \mathbf{H} \cdot \mathbf{P_{n,n-1}} \cdot  \mathbf{H}^T + \mathbf{R_n} \right) \cdot \mathbf{K_n}^T \right) \\
&= tr\left(\mathbf{P_{n,n-1}}\right) -  2 \cdot tr\left(\mathbf{K_n} \cdot \mathbf{H} \cdot \mathbf{P_{n,n-1}}\right)  + tr\left(\mathbf{K_n} \cdot \left( \mathbf{H} \cdot \mathbf{P_{n,n-1}} \cdot  \mathbf{H}^T + \mathbf{R_n} \right) \cdot \mathbf{K_n}^T \right)
\end{align}
$$

In this derivation we used the property $tr\left(\mathbf{A}\right) = tr\left(\mathbf{A^T}\right) $ and the fact that covariance matrices are symmetric.

Before we differentiate these traces with respect to $\mathbf{K_n}$ two useful differentiation rules shall be derived:

---

**differentiation rule #1**

$$
\frac{d}{d \mathbf{A}} \left(tr\left( \mathbf{A} \cdot \mathbf{B} \right) \right)
$$

Dimensions of matrices:

$\mathbf{A} ; a_{m,n}  \in \mathbb{R}^{m \times n} $

$\mathbf{B} ; b_{n, m}  \in \mathbb{R}^{n \times m} $


$$
tr\left( \mathbf{A} \cdot \mathbf{B} \right) = \sum_{i=1}^m \sum_{j=1}^n a_{i,j} \cdot b_{j,i}
$$

Differentiation with respect to some matrix element $a_{k, l}$ yields:

$$
\frac{d}{d \ a_{k, l}}  \left( \sum_{i=1}^m \sum_{j=1}^n a_{i,j} \cdot b_{j,i} \right) = b_{l,k}
$$

$$
\frac{d}{d \mathbf{A}} \left(tr\left( \mathbf{A} \cdot \mathbf{B} \right) \right) = \left[\begin{array}{ccc}
b_{1,1} & \cdots & b_{n,1} \\
\vdots & \ddots & \vdots \\
b_{1,m} & \cdots & b_{n,m}
\end{array}\right] = \mathbf{B}^T
$$
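Rule #1 can be checked numerically with central finite differences. The sketch below is only such a check under assumed test matrices; the helper names `trace_ab` and `numerical_gradient` are hypothetical.

```python
def trace_ab(a, b):
    """tr(A B) for A (m x n) and B (n x m), both given as lists of rows."""
    return sum(a[i][j] * b[j][i] for i in range(len(a)) for j in range(len(b)))


def numerical_gradient(f, a, eps=1e-6):
    """Central-difference estimate of d f(A) / d a_{k,l} for every element of A."""
    grad = [[0.0] * len(a[0]) for _ in a]
    for k in range(len(a)):
        for l in range(len(a[0])):
            plus = [row[:] for row in a]
            minus = [row[:] for row in a]
            plus[k][l] += eps
            minus[k][l] -= eps
            grad[k][l] = (f(plus) - f(minus)) / (2 * eps)
    return grad


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]       # 2 x 3, arbitrary test values
B = [[1.0, -1.0], [0.5, 2.0], [3.0, 0.0]]    # 3 x 2, arbitrary test values
grad = numerical_gradient(lambda a: trace_ab(a, B), A)
B_transposed = [list(col) for col in zip(*B)]
print(grad)
print(B_transposed)
```

Up to rounding errors the numerical gradient agrees with $\mathbf{B}^T$.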

**differentiation rule #2**

$$
\frac{d}{d \mathbf{A}} \left(tr\left( \mathbf{A} \cdot \mathbf{B} \cdot \mathbf{A}^T \right) \right)
$$

Dimensions of matrices:

$\mathbf{A} ; a_{m,n}  \in \mathbb{R}^{m \times n} $

$\mathbf{B} ; b_{n, n}  \in \mathbb{R}^{n \times n}; \ symmetric$

We define matrices $\mathbf{D}; \ \in \mathbb{R}^{m \times m}$ and $\mathbf{C}; \ \in \mathbb{R}^{m \times n}$ like this:

$$\begin{align}
\mathbf{D} &= \underbrace{\mathbf{A} \cdot \mathbf{B}}_{\mathbf{C}} \cdot \mathbf{A}^T \\
&= \left[\begin{array}{ccc}
a_{1,1} & \cdots & a_{1,n} \\
\vdots & \ddots & \vdots \\
a_{m,1} & \cdots & a_{m,n}
\end{array}\right] \cdot \left[\begin{array}{ccc}
b_{1,1} & \cdots & b_{1,n} \\
\vdots & \ddots & \vdots \\
b_{n,1} & \cdots & b_{n,n}
\end{array}\right] \cdot \left[\begin{array}{ccc}
a_{1,1} & \cdots & a_{m,1} \\
\vdots & \ddots & \vdots \\
a_{1,n} & \cdots & a_{m,n}
\end{array}\right]
\end{align}
$$

Let $c_{k, l}$ denote a matrix element of $\mathbf{C}$ with :

$$
c_{k, l} = \sum_{j=1}^n a_{k, j} \cdot b_{j, l}
$$

Similarly let $d_{h,g}$ denote a matrix element of $\mathbf{D}$ with :

$$\begin{align}
d_{h,g} &= \sum_{i=1}^n c_{h, i} \cdot a_{g, i} \\
&= \sum_{i=1}^n \sum_{j=1}^n a_{h, j} \cdot b_{j, i} \cdot a_{g, i}
\end{align}
$$

For the trace $tr\left(\mathbf{D}\right)$ we get:

$$
tr\left(\mathbf{D}\right) = \sum_{h=1}^m \sum_{i=1}^n \sum_{j=1}^n a_{h, j} \cdot b_{j, i} \cdot a_{h, i}
$$

Differentiation with respect to some matrix element $a_{r, s}$ yields:

$$\begin{align}
\frac{d}{d \ a_{r, s}}  \left( \sum_{h=1}^m \sum_{i=1}^n \sum_{j=1}^n a_{h, j} \cdot b_{j, i} \cdot a_{h, i} \right) &= \sum_{i=1}^n b_{s, i} \cdot a_{r, i}+ \sum_{j=1}^n a_{r, j} \cdot b_{j, s} \\
&= \sum_{i=1}^n b_{s, i} \cdot a_{r, i}+ \sum_{j=1}^n a_{r, j} \cdot b_{s, j} \\
&= 2 \cdot \sum_{j=1}^n a_{r, j} \cdot b_{j, s}
\end{align}
$$

$$
\frac{d}{d \mathbf{A}} \left(tr\left( \mathbf{A} \cdot \mathbf{B} \cdot \mathbf{A}^T \right) \right) = 2 \cdot \mathbf{A} \cdot \mathbf{B}  
$$
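Rule #2 can be checked in the same numerical way. The sketch below uses central finite differences on assumed test matrices, with a symmetric $\mathbf{B}$ as required; the helper names are hypothetical.

```python
def trace_aba(a, b):
    """tr(A B A^T) for A (m x n) and B (n x n), both given as lists of rows."""
    m, n = len(a), len(b)
    return sum(a[h][j] * b[j][i] * a[h][i]
               for h in range(m) for i in range(n) for j in range(n))


def numerical_gradient(f, a, eps=1e-6):
    """Central-difference estimate of d f(A) / d a_{r,s} for every element of A."""
    grad = [[0.0] * len(a[0]) for _ in a]
    for r in range(len(a)):
        for s in range(len(a[0])):
            plus = [row[:] for row in a]
            minus = [row[:] for row in a]
            plus[r][s] += eps
            minus[r][s] -= eps
            grad[r][s] = (f(plus) - f(minus)) / (2 * eps)
    return grad


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]                      # 2 x 3, arbitrary
B = [[2.0, 1.0, 0.0], [1.0, 3.0, 0.5], [0.0, 0.5, 1.0]]     # 3 x 3, symmetric
grad = numerical_gradient(lambda a: trace_aba(a, B), A)
two_AB = [[2.0 * sum(A[r][j] * B[j][s] for j in range(3)) for s in range(3)]
          for r in range(2)]
print(grad)
print(two_AB)
```

The agreement relies on the symmetry of $\mathbf{B}$; for a non-symmetric $\mathbf{B}$ the gradient would be $\mathbf{A} \cdot \left(\mathbf{B} + \mathbf{B}^T\right)$ instead.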

These two rules for differentiation are now applied.

---


We need to calculate:

$$\begin{align}
\frac{d}{d \mathbf{K_n}} \left(tr\left(\mathbf{P_{n,n}}\right) \right) &= \frac{d}{d \mathbf{K_n}} \left(tr\left(\mathbf{P_{n,n-1}}\right) \right) -  2 \cdot \frac{d}{d \mathbf{K_n}} \left(tr\left(\mathbf{K_n} \cdot \mathbf{H} \cdot \mathbf{P_{n,n-1}}\right) \right) + \frac{d}{d \mathbf{K_n}} \left(tr\left(\mathbf{K_n} \cdot \left( \mathbf{H} \cdot \mathbf{P_{n,n-1}} \cdot  \mathbf{H}^T + \mathbf{R_n} \right) \cdot \mathbf{K_n}^T \right) \right)
\end{align}
$$

Since the covariance matrix $\mathbf{P_{n,n-1}}$ does not depend on the `Kalman gain` the derivative is 0.

$$
\frac{d}{d \mathbf{K_n}} \left(tr\left(\mathbf{P_{n,n-1}}\right) \right) = \mathbf{0}
$$

For the next equation we apply rule#1:

$$\begin{align}
\frac{d}{d \mathbf{K_n}} \left(tr\left(\mathbf{K_n} \cdot \mathbf{H} \cdot \mathbf{P_{n,n-1}}\right) \right) &= \left(\mathbf{H} \cdot \mathbf{P_{n,n-1}}\right)^T \\
&= \mathbf{P_{n,n-1}} \cdot  \mathbf{H}^T
\end{align}
$$

and here we apply rule#2:

$$
\frac{d}{d \mathbf{K_n}} \left(tr\left(\mathbf{K_n} \cdot \left( \mathbf{H} \cdot \mathbf{P_{n,n-1}} \cdot  \mathbf{H}^T + \mathbf{R_n} \right) \cdot \mathbf{K_n}^T \right) \right) = 2 \cdot \mathbf{K_n} \cdot \left( \mathbf{H} \cdot \mathbf{P_{n,n-1}} \cdot  \mathbf{H}^T + \mathbf{R_n} \right)
$$


Applying these equations we get:

$$\begin{align}
\frac{d}{d \mathbf{K_n}} \left(tr\left(\mathbf{P_{n,n}}\right) \right) &= -2 \cdot \mathbf{P_{n,n-1}} \cdot  \mathbf{H}^T + 2 \cdot \mathbf{K_n} \cdot \left( \mathbf{H} \cdot \mathbf{P_{n,n-1}} \cdot  \mathbf{H}^T + \mathbf{R_n} \right)
\end{align}
$$

The `Kalman gain` is found by setting all derivatives to zero:

$$
\frac{d}{d \mathbf{K_n}} \left(tr\left(\mathbf{P_{n,n}}\right) \right)  = \mathbf{0}
$$

Which gives us an equation

$$
\mathbf{P_{n,n-1}} \cdot  \mathbf{H}^T = \mathbf{K_n} \cdot \left( \mathbf{H} \cdot \mathbf{P_{n,n-1}} \cdot  \mathbf{H}^T + \mathbf{R_n} \right)
$$

which is solved for the `Kalman Gain` $\mathbf{K_n}$:

$$
\mathbf{K_n} = \mathbf{P_{n,n-1}} \cdot  \mathbf{H}^T \cdot \left( \mathbf{H} \cdot \mathbf{P_{n,n-1}} \cdot  \mathbf{H}^T + \mathbf{R_n} \right)^{-1}  
$$

What remains to be done is to find an **efficient** way to evaluate the inverse matrix

$$
\left( \mathbf{H} \cdot \mathbf{P_{n,n-1}} \cdot  \mathbf{H}^T + \mathbf{R_n} \right)^{-1} 
$$
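For a single scalar measurement the matrix to be inverted is $1 \times 1$, so the inverse is just a reciprocal. The Python sketch below uses this special case to compute the gain and to check that moving away from it increases $tr\left(\mathbf{P_{n,n}}\right)$. All names and numbers are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def kalman_gain_single_measurement(P_prior, H, r):
    """K = P H^T (H P H^T + r)^-1 for a 1 x n matrix H and a scalar variance r."""
    PHt = matmul(P_prior, transpose(H))     # n x 1
    s = matmul(H, PHt)[0][0] + r            # scalar H P H^T + r
    return [[row[0] / s] for row in PHt]


def posterior_trace(P_prior, K, H, r):
    """Trace of (I - K H) P (I - K H)^T + K r K^T."""
    n = len(P_prior)
    KH = matmul(K, H)
    I_KH = [[(1.0 if i == j else 0.0) - KH[i][j] for j in range(n)] for i in range(n)]
    first = matmul(matmul(I_KH, P_prior), transpose(I_KH))
    return sum(first[i][i] + K[i][0] * r * K[i][0] for i in range(n))


P_prior = [[4.0, 1.0], [1.0, 2.0]]   # illustrative P_{n,n-1}
H = [[1.0, 0.0]]                     # only the first state is measured
r = 1.0                              # illustrative measurement variance

K = kalman_gain_single_measurement(P_prior, H, r)
best = posterior_trace(P_prior, K, H, r)
perturbed = [[K[0][0] + 0.1], [K[1][0]]]          # deliberately non-optimal gain
worse = posterior_trace(P_prior, perturbed, H, r)
print(K, best, worse)
```

For these numbers the gain is $[0.8,\ 0.2]^T$ with a trace of 2.6, while the perturbed gain yields 2.65.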

---

## Final Summary 

The important equations around the `Kalman filter` are again summarised here. As before the presentation here is a re-write of chapter 8.9 of `Kalman Filter from Ground Up`; author Alex Becker; https://www.kalmanfilter.net.

In essence the filtering operation involves a prediction step and a correction step.

The prediction step involves 2 equations + 2 initial conditions:

| equation  | description |
|-----------|-------------|
| $\hat{\mathbf{x}}_{n+1,n} = \mathbf{F} \cdot \hat{\mathbf{x}}_{n,n} + \mathbf{G} \cdot \mathbf{u}_n$ | state extrapolation / prediction  |
| $\mathbf{P}_{n+1,n} = \mathbf{F} \cdot \mathbf{P}_{n,n}  \cdot \mathbf{F}^T + \mathbf{Q}_n$ | extrapolation of uncertainty |
| $\hat{\mathbf{x}}_{0,0}$ | assumptions about the initial state |
| $\mathbf{P}_{0,0}$ | assumption of covariance of initial state |

The correction step involves 3 equations:

| equation   |  description |
|------------|--------------|
| $\mathbf{K_n} = \mathbf{P_{n,n-1}} \cdot  \mathbf{H}^T \cdot \left( \mathbf{H} \cdot \mathbf{P_{n,n-1}} \cdot  \mathbf{H}^T + \mathbf{R_n} \right)^{-1}$ | `Kalman Gain` |
| $\mathbf{\hat{x}_{n,n} } = \mathbf{\hat{x}_{n,n-1}} + \mathbf{K_n} \cdot \left(\mathbf{z_n} - \mathbf{H} \cdot \mathbf{\hat{x}_{n,n-1}}   \right)$ | state update equation |
| $\mathbf{P_{n,n}} = \left(\mathbf{I} - \mathbf{K_n} \cdot \mathbf{H}\right) \cdot \mathbf{P_{n,n-1}} \cdot \left(\mathbf{I} - \mathbf{K_n} \cdot \mathbf{H}\right)^T + \mathbf{K_n} \cdot \mathbf{R_n} \cdot \mathbf{K_n}^T $ | update uncertainty of estimate |

And here is an overview of the `workflow` (i.e. the sequence in which these equations are applied in each iteration):

**initialisation**

Based on some prior knowledge we choose the initial state estimate $\hat{\mathbf{x}}_{0,0}$ and the initial covariance $\mathbf{P}_{0,0}$ of this estimate.

Using these initial properties the extrapolated state and the extrapolated uncertainty are evaluated:

**state extrapolation**

$\hat{\mathbf{x}}_{1,0} = \mathbf{F} \cdot \hat{\mathbf{x}}_{0,0} + \mathbf{G} \cdot \mathbf{u}_0$

**uncertainty extrapolation**

$\mathbf{P}_{1,0} = \mathbf{F} \cdot \mathbf{P}_{0,0}  \cdot \mathbf{F}^T + \mathbf{Q}_0$

**computation of Kalman gain**

$\mathbf{K_1} = \mathbf{P_{1,0}} \cdot  \mathbf{H}^T \cdot \left( \mathbf{H} \cdot \mathbf{P_{1,0}} \cdot  \mathbf{H}^T + \mathbf{R_1} \right)^{-1}$

**state estimation / state update**

$\mathbf{\hat{x}_{1,1} } = \mathbf{\hat{x}_{1,0}} + \mathbf{K_1} \cdot \left(\mathbf{z_1} - \mathbf{H} \cdot \mathbf{\hat{x}_{1,0}}   \right)$ 

**update uncertainty of estimated state $\mathbf{\hat{x}_{1,1} }$**

$\mathbf{P_{1,1}} = \left(\mathbf{I} - \mathbf{K_1} \cdot \mathbf{H}\right) \cdot \mathbf{P_{1,0}} \cdot \left(\mathbf{I} - \mathbf{K_1} \cdot \mathbf{H}\right)^T + \mathbf{K_1} \cdot \mathbf{R_1} \cdot \mathbf{K_1}^T $ 

Having found $\mathbf{\hat{x}_{1,1} }$ and $\mathbf{P_{1,1}}$ the process is repeated. 

We calculate $\hat{\mathbf{x}}_{2,1}$ from $\mathbf{\hat{x}_{1,1} }$ and possibly a new input $\mathbf{u}_1$, and $\mathbf{P_{2,1}}$ from $\mathbf{P_{1,1}}$.

Then we are able to use these values to compute:

1) the Kalman gain $\mathbf{K_2}$

2) the state estimate $\mathbf{\hat{x}_{2,2} }$ 

3) and the uncertainty $\mathbf{P_{2,2}}$. 

and so on ...
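The workflow above can be put together in a compact Python sketch. It is a minimal illustration under several assumptions that are not taken from the book: a position / velocity model without control input, a single scalar position measurement (so the matrix inverse reduces to a division), noise-free test measurements, and hypothetical helper names.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def kalman_step(x, P, z, F, H, Q, r):
    """One prediction + correction cycle for a scalar measurement z."""
    # Prediction (no control input in this sketch, so the G u term is dropped).
    x_pred = matmul(F, x)
    P_pred = add(matmul(matmul(F, P), transpose(F)), Q)
    # Kalman gain; H P H^T + r is a scalar here.
    PHt = matmul(P_pred, transpose(H))
    s = matmul(H, PHt)[0][0] + r
    K = [[row[0] / s] for row in PHt]
    # State update with the measurement residual.
    residual = z - matmul(H, x_pred)[0][0]
    x_new = [[xp[0] + k[0] * residual] for xp, k in zip(x_pred, K)]
    # Covariance update.
    n = len(x)
    KH = matmul(K, H)
    I_KH = [[(1.0 if i == j else 0.0) - KH[i][j] for j in range(n)] for i in range(n)]
    KRKt = [[ki[0] * r * kj[0] for kj in K] for ki in K]
    P_new = add(matmul(matmul(I_KH, P_pred), transpose(I_KH)), KRKt)
    return x_new, P_new


dt = 1.0
F = [[1.0, dt], [0.0, 1.0]]
H = [[1.0, 0.0]]
Q = [[0.01, 0.0], [0.0, 0.01]]       # illustrative process noise covariance
r = 1.0                              # illustrative measurement variance

x = [[0.0], [0.0]]                   # initial guess x_{0,0}
P = [[100.0, 0.0], [0.0, 100.0]]     # large initial uncertainty P_{0,0}

for n in range(1, 21):
    z = float(n)                     # noise-free position of an object moving at 1 m/s
    x, P = kalman_step(x, P, z, F, H, Q, r)

print(x, P)
```

Although the initial velocity guess is wrong, the estimate settles close to a position of 20 and a velocity of 1 after the 20 measurements, and the position variance drops well below its initial value.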

---

### Conclusion

Most of the derivations of the formulas that define the `Kalman filter` are described in the book in great detail, without leaving out intermediate steps. This helps to really understand the derivation. Many other books are very concise. You must then try to reconstruct the intermediate steps. Not only is this time consuming, you may well fail to do so. This is a common source of frustration with many textbooks which try to be overly concise in how specific topics are presented.

My only criticism is the fact that the state estimation formula has not been derived. The `Kalman gain` used in this formula seems to come from **nowhere**.