# Learning about Kalman filter / Unscented Kalman Filter Part 1

**Resources**

Chapter 14 of `Kalman Filter from the Ground Up`; author Alex Becker; https://www.kalmanfilter.net



**Overview**

This notebook deals with the unscented Kalman filter (UKF).


The book does not attempt to derive the mathematical background; the UKF is introduced as a series of recipes.

More background can be found in the publications by the inventors of the UKF:

`A new extension of the Kalman filter to nonlinear systems`; authors: Simon J. Julier, Jeffrey K. Uhlmann

and a much more complete article

`A general Method for approximating nonlinear transformations of probability distributions`; authors: Simon J. Julier, Jeffrey K. Uhlmann

**Objective**

The objective is to make sense of chapter 14 and to annotate the parts which I currently do not understand.

To gain a real understanding, however, I will have to read the article

`A general Method for approximating nonlinear transformations of probability distributions`; authors: Simon J. Julier, Jeffrey K. Uhlmann

It is much more complete and mathematically demanding than the content of chapter 14.

---

## Sigma points

For an $N$-dimensional state vector, the number of sigma points is $2 \cdot N + 1$.

The first sigma point is the mean of the input state. For an $N$-dimensional input state vector, the sigma points are vectors with $N$ elements.

$$
\mathbf{\chi}_{n,n}^{(0)} = \mathbf{x}_{n,n};\ \in \mathbb{R}^{N \times 1}
$$

The $i$-th sigma point is denoted by $\mathbf{\chi}_{n,n}^{(i)}$.

The calculation of the sigma points is based on the input covariance matrix $\mathbf{P}_{n,n}$:

$$\begin{align}
\mathbf{\chi}_{n,n}^{(i)} &= \mathbf{x}_{n,n} + \left(\sqrt{\left(N+\kappa\right) \cdot \mathbf{P}_{n,n}}\right)_{i} \ ;  i = 1,\ \ldots,\ N \\
\mathbf{\chi}_{n,n}^{(i)} &= \mathbf{x}_{n,n} - \left(\sqrt{\left(N+\kappa\right) \cdot \mathbf{P}_{n,n}}\right)_{i-N} \ ; i = N+1,\ \ldots,\ 2N 
\end{align}
$$

The quantity $\sqrt{\left(N+\kappa\right) \cdot \mathbf{P}_{n,n}}$ is a matrix.

$\left(\sqrt{\left(N+\kappa\right) \cdot \mathbf{P}_{n,n}}\right)_{i}$ is the $i$-th column vector of this matrix.

**Note**

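The principal square root of a symmetric positive definite matrix is again symmetric; for this square root the $i$-th row vector of $\sqrt{\left(N+\kappa\right) \cdot \mathbf{P}_{n,n}}$ equals its $i$-th column vector. This does not hold for the triangular `Cholesky` factor, which is commonly used instead.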

**preliminary summary**

| property | description |
|----------|-------------|
| $N$      | dimension (e.g. number of elements of the state vector) |
| $\kappa$ | a tuning parameter, chosen such that $N+\kappa = 3$ for normally distributed variables |
| $\left(\sqrt{\left(N+\kappa\right) \cdot \mathbf{P}_{n,n}}\right)_{i}$ | $i$-th column vector of the matrix square root |
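The recipe summarised above can be sketched in NumPy. This is a sketch under assumptions: the matrix square root is taken as the lower Cholesky factor of $\left(N+\kappa\right) \cdot \mathbf{P}_{n,n}$ (the choice made in chapter 14), and the function name `sigma_points` is hypothetical.

```python
import numpy as np


def sigma_points(x, P, kappa):
    """Return the 2N+1 sigma points as columns of an N x (2N+1) array.

    Assumption: the matrix square root of (N + kappa) * P is taken as the
    lower Cholesky factor L with L @ L.T = (N + kappa) * P.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    P = np.atleast_2d(np.asarray(P, dtype=float))
    N = x.size
    L = np.linalg.cholesky((N + kappa) * P)

    chi = np.empty((N, 2 * N + 1))
    chi[:, 0] = x                          # chi^(0): the mean
    for i in range(1, N + 1):
        chi[:, i] = x + L[:, i - 1]        # chi^(i)   = x + i-th column, i = 1..N
        chi[:, N + i] = x - L[:, i - 1]    # chi^(N+i) = x - i-th column
    return chi


# two-dimensional example from further below: N = 2, N + kappa = 3
chi = sigma_points([1.0, np.pi / 2], np.diag([0.05**2, 0.5**2]), kappa=1)
print(chi)
```

The columns should agree with the hand calculation for the two-dimensional example; for instance, the third column is $\left[1,\ \pi/2 + 0.8660254\right]^T$.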


---



## Example / one dimensional random variable

We assume $\hat{x}_{n,n} = 0$ and variance $p_{n,n} = 2^2$.

The number of dimensions / elements of state vector is $N=1$.

The number of sigma points is $2 \cdot N + 1 = 3$.

For $N + \kappa = 1 + \kappa = 3$ we have $\kappa = 2$.

The first sigma point is the mean:

$$
\mathbf{\chi}_{n,n}^{(0)} = \mathbf{x}_{n,n} = 0
$$

For the second and third sigma points we get:

$$\begin{align}
\mathbf{\chi}_{n,n}^{(1)} &= \mathbf{x}_{n,n} + \left(\sqrt{\left(N+\kappa\right) \cdot \mathbf{P}_{n,n}}\right)_{1} = 2 \cdot \sqrt{3} \\
\mathbf{\chi}_{n,n}^{(2)} &= \mathbf{x}_{n,n} - \left(\sqrt{\left(N+\kappa\right) \cdot \mathbf{P}_{n,n}}\right)_{2-1} = -2 \cdot \sqrt{3}
\end{align}
$$
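The same numbers in a few lines of plain Python, a sketch with variable names chosen here for illustration:

```python
import math

x_mean = 0.0       # mean of the one-dimensional state
p_var = 2.0 ** 2   # variance p_{n,n}
N = 1
kappa = 3 - N      # chosen such that N + kappa = 3

# for N = 1 the matrix square root is the ordinary square root
spread = math.sqrt((N + kappa) * p_var)
chi = [x_mean, x_mean + spread, x_mean - spread]
print(chi)
```

`spread` evaluates to $\sqrt{12} = 2 \cdot \sqrt{3} \approx 3.4641$.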

**comments**

Chapter 14 of the book just states the formulas for the computation of the sigma points. No justification or hint is given as to how the equations were derived in the first place.

I do not find this `cookbook` approach very appealing.

---


## Example / two dimensional random variable

The state (mean value) is now a two-dimensional vector:

$$
\mathbf{\hat{x}}_{n,n} = \left[\begin{array}{c}
1 \\ \pi/2
\end{array}\right]
$$

For the covariance matrix we assume:

$$
\mathbf{P}_{n,n} = \left[\begin{array}{cc}
\sigma_r^2 & 0 \\
0 & \sigma_\theta^2
\end{array}\right] =  \left[\begin{array}{cc}
0.05^2 & 0 \\
0 & 0.5^2
\end{array}\right] = \left[\begin{array}{cc}
0.0025 & 0 \\
0 & 0.25
\end{array}\right]
$$

1) Number of dimensions is $N=2$

2) Number of sigma points is $2 \cdot N +1 = 5$

3) $N+\kappa = 3$ is chosen.

The first sigma point is the mean value (state vector).

$$
\mathbf{\chi}_{n,n}^{(0)} = \mathbf{x}_{n,n} =  \left[\begin{array}{c}
1 \\ \pi/2
\end{array}\right]
$$


For all other sigma points we need to calculate the square root matrix:

$$
\sqrt{\left(N+\kappa\right) \cdot \mathbf{P}_{n,n}} = \sqrt{N+\kappa} \cdot \sqrt{\mathbf{P}_{n,n}}
$$

According to chapter 14 of the book the square root matrix is computed via a `Cholesky`-decomposition of matrix $\mathbf{P}_{n,n}$.

$$
\mathbf{P}_{n,n} = \mathbf{L} \cdot \mathbf{L}^T = \sqrt{\mathbf{P}_{n,n}} \cdot \sqrt{\mathbf{P}_{n,n}}^T
$$

According to this the square root $\sqrt{\mathbf{P}_{n,n}}$ is the lower triangular matrix $\mathbf{L}$ from the `Cholesky`-decomposition:

$$
\sqrt{\mathbf{P}_{n,n}} = \mathbf{L}
$$

It is not clear, however, why it is called a square root.

Since the covariance matrix $\mathbf{P}_{n,n}$ is symmetric and positive definite it can be factorized like this:

$$
\mathbf{P}_{n,n} = \mathbf{U} \cdot \mathbf{D} \cdot \mathbf{U}^T
$$

In this equation, $\mathbf{U}$ is a matrix with the orthonormal eigenvectors as column vectors, and $\mathbf{D}$ is the diagonal matrix of the positive eigenvalues. Therefore the *exact* square root is obtained from this equation:


$$
\mathbf{P}_{n,n}^{1/2} = \mathbf{U} \cdot \mathbf{D}^{1/2} \cdot \mathbf{U}^T 
$$

Matrix $\mathbf{D}^{1/2}$ is a diagonal matrix with the positive square roots of the eigenvalues on the main diagonal.
Moreover, the square root matrix $\mathbf{P}_{n,n}^{1/2}$ is symmetric, since $\left(\mathbf{U} \cdot \mathbf{D}^{1/2} \cdot \mathbf{U}^T\right)^T = \mathbf{U} \cdot \mathbf{D}^{1/2} \cdot \mathbf{U}^T$. The matrix $\mathbf{L}$ from the `Cholesky`-decomposition, on the other hand, is lower triangular, so it cannot be symmetric unless all its off-diagonal elements are $0$.

There may be cases where we have

$$
\mathbf{P}_{n,n}^{1/2} = \mathbf{U} \cdot \mathbf{D}^{1/2} \cdot \mathbf{U}^T = \mathbf{L}
$$

This is demonstrated by the numerical examples below.

1) the *exact* square root is computed from the eigen-decomposition

2) the square root is computed as the lower triangular matrix of the `Cholesky`-decomposition

3) `SciPy` can compute the square root of a matrix directly using `scipy.linalg.sqrtm`


In this special case (diagonal $\mathbf{P}_{n,n}$) all three methods yield the same result (neglecting numerical effects).
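For a covariance matrix with non-zero off-diagonal elements the three methods no longer agree. A small sketch with an assumed example matrix (`P_corr` is made up for illustration, not taken from the book): the principal square root is symmetric and equals the `sqrtm` result, while the Cholesky factor is lower triangular and differs from it.

```python
import numpy as np
import scipy.linalg

# assumed example covariance with correlated components (not taken from the book)
P_corr = np.array([[4.0, 1.2],
                   [1.2, 1.0]])

# 1) "exact" (principal) square root from the eigen-decomposition
e_vals, e_vec = np.linalg.eigh(P_corr)
S_eig = e_vec @ np.diag(np.sqrt(e_vals)) @ e_vec.T

# 2) lower triangular Cholesky factor
L_chol = np.linalg.cholesky(P_corr)

# 3) SciPy's matrix square root
S_scipy = scipy.linalg.sqrtm(P_corr)

print("S_eig symmetric:        ", np.allclose(S_eig, S_eig.T))
print("S_eig equals sqrtm:     ", np.allclose(S_eig, S_scipy))
print("L_chol equals S_eig:    ", np.allclose(L_chol, S_eig))
print("L_chol @ L_chol.T == P: ", np.allclose(L_chol @ L_chol.T, P_corr))
print("S_eig @ S_eig.T == P:   ", np.allclose(S_eig @ S_eig.T, P_corr))
```

Both factors satisfy $\mathbf{A} \cdot \mathbf{A}^T = \mathbf{P}$, which is the sense in which the Cholesky factor is called a square root. They produce different sigma points, but the covariance relation from Julier and Uhlmann quoted further below only depends on $\mathbf{A} \cdot \mathbf{A}^T$, so either choice reproduces the input covariance.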

---

```python
import math
import numpy as np
import scipy as scp
import scipy.linalg  # makes scp.linalg available also in older SciPy versions


# computation from eigen decomposition
P_mat = np.array([[0.0025, 0], [0, 0.25]])

# matrix decomposition
e_vals, e_vec = np.linalg.eigh(P_mat)

# A_mat should be identical to P_mat
A_mat = e_vec @ np.diag(e_vals) @ e_vec.T  

# compute square root
P_sroot_mat = e_vec @ np.diag(np.sqrt(e_vals)) @ e_vec.T  

# squaring the square root should give P_mat again
B_mat = P_sroot_mat @ P_sroot_mat

print(f"P_mat :\n{P_mat}\n")
print(f"e_vals:\n{e_vals}\n")
print(f"e_vec :\n{e_vec}\n")
print(f"A_mat = e_vec @ np.diag(e_vals) @ e_vec.T :\n{A_mat}\n")
print(f"P_sroot_mat = e_vec @ np.diag(np.sqrt(e_vals)) @ e_vec.T :\n{P_sroot_mat}\n")
print(f"B_mat = P_sroot_mat @ P_sroot_mat :\n{B_mat}\n")
```

```
P_mat :
[[0.0025 0.    ]
 [0.     0.25  ]]

e_vals:
[0.0025 0.25  ]

e_vec :
[[1. 0.]
 [0. 1.]]

A_mat = e_vec @ np.diag(e_vals) @ e_vec.T :
[[0.0025 0.    ]
 [0.     0.25  ]]

P_sroot_mat = e_vec @ np.diag(np.sqrt(e_vals)) @ e_vec.T :
[[0.05 0.  ]
 [0.   0.5 ]]

B_mat = P_sroot_mat @ P_sroot_mat :
[[0.0025 0.    ]
 [0.     0.25  ]]
```



```python
# computation from cholesky factorization
L_mat = np.linalg.cholesky(P_mat)

print(f"A_mat :\n{A_mat}\n")
print(f"L_mat :\n{L_mat}\n")
print(f"L_mat @ L_mat.T :\n{L_mat @ L_mat.T}\n")
```

```
A_mat :
[[0.0025 0.    ]
 [0.     0.25  ]]

L_mat :
[[0.05 0.  ]
 [0.   0.5 ]]

L_mat @ L_mat.T :
[[0.0025 0.    ]
 [0.     0.25  ]]
```



```python
# computation with special function (Scipy)
P_sroot2_mat = scp.linalg.sqrtm(P_mat)

print(f"P_sroot2_mat :\n{P_sroot2_mat}\n")
```

```
P_sroot2_mat :
[[0.05 0.  ]
 [0.   0.5 ]]
```



---

## Continuation: Example / two dimensional random variable

Now we do a `Cholesky`-decomposition of the scaled covariance matrix

$$
\left(N+\kappa\right) \cdot \mathbf{P}_{n,n} = 3 \cdot \mathbf{P}_{n,n} = \mathbf{L} \cdot \mathbf{L}^T
$$

and use $\sqrt{3 \cdot \mathbf{P}_{n,n}} = \mathbf{L}$.

Denoting the two column vectors of $\mathbf{L}$ by $\mathbf{l}_i,\ i \in \{1, 2\}$, the remaining 4 sigma points are computed as follows:

$$\begin{align}
\mathbf{\chi}_{n,n}^{(1)} &= \mathbf{x}_{n,n} + \mathbf{l}_1 \\
\mathbf{\chi}_{n,n}^{(2)} &= \mathbf{x}_{n,n} + \mathbf{l}_2 \\
\mathbf{\chi}_{n,n}^{(3)} &= \mathbf{x}_{n,n} - \mathbf{l}_1 \\
\mathbf{\chi}_{n,n}^{(4)} &= \mathbf{x}_{n,n} - \mathbf{l}_2 \\
\end{align}
$$

$$
\mathbf{L} = \left[\begin{array}{cc}
0.08660254 & 0 \\
0 & 0.8660254
\end{array}\right]
$$

$$\begin{align}
\mathbf{l}_1 &= \left[\begin{array}{c}
0.08660254 \\ 0
\end{array}\right] \\
\mathbf{l}_2 &= \left[\begin{array}{c}
0 \\ 0.8660254
\end{array}\right]
\end{align}
$$

Inserting these vectors gives:

$$\begin{align}
\mathbf{\chi}_{n,n}^{(0)} &= \left[\begin{array}{c}
1 \\ \pi/2
\end{array}\right] = \left[\begin{array}{c}
1 \\ 1.57079633
\end{array}\right]\\
\mathbf{\chi}_{n,n}^{(1)} &= \left[\begin{array}{c}
1 \\ \pi/2
\end{array}\right] +\left[\begin{array}{c}
0.08660254 \\ 0
\end{array}\right]  = \left[\begin{array}{c}
1.08660254 \\ 1.57079633
\end{array}\right]\\
\mathbf{\chi}_{n,n}^{(2)} &= \left[\begin{array}{c}
1 \\ \pi/2
\end{array}\right] + \left[\begin{array}{c}
0 \\ 0.8660254
\end{array}\right] = \left[\begin{array}{c}
1 \\  2.43682173
\end{array}\right] \\
\mathbf{\chi}_{n,n}^{(3)} &= \left[\begin{array}{c}
1 \\ \pi/2
\end{array}\right] - \left[\begin{array}{c}
0.08660254 \\ 0
\end{array}\right] = \left[\begin{array}{c}
0.91339746 \\ 1.57079633
\end{array}\right] \\
\mathbf{\chi}_{n,n}^{(4)} &= \left[\begin{array}{c}
1 \\ \pi/2
\end{array}\right] - \left[\begin{array}{c}
0 \\ 0.8660254
\end{array}\right] = \left[\begin{array}{c}
1 \\ 0.70477092
\end{array}\right] \\
\end{align}
$$

The computed sigma points can be inspected below:

---

```python
# cholesky decomposition 
L_mat = np.linalg.cholesky(3 * P_mat)

# calculation of sigma points
sigma_point_0 = np.array([1, np.pi/2])
sigma_point_1 = sigma_point_0 + L_mat[:,0]
sigma_point_2 = sigma_point_0 + L_mat[:,1]
sigma_point_3 = sigma_point_0 - L_mat[:,0]
sigma_point_4 = sigma_point_0 - L_mat[:,1]

print(f"L_mat:\n{L_mat}\n")
print(f"sigma_point_0:\n{sigma_point_0}\n")
print(f"sigma_point_1:\n{sigma_point_1}\n")
print(f"sigma_point_2:\n{sigma_point_2}\n")
print(f"sigma_point_3:\n{sigma_point_3}\n")
print(f"sigma_point_4:\n{sigma_point_4}\n")
```

```
L_mat:
[[0.08660254 0.        ]
 [0.         0.8660254 ]]

sigma_point_0:
[1.         1.57079633]

sigma_point_1:
[1.08660254 1.57079633]

sigma_point_2:
[1.         2.43682173]

sigma_point_3:
[0.91339746 1.57079633]

sigma_point_4:
[1.         0.70477092]
```



### How Sigma Points are related to the input covariance matrix

In the article

`A general Method for approximating nonlinear transformations of probability distributions`; authors: Simon J. Julier, Jeffrey K. Uhlmann

it is shown that the input covariance matrix $\mathbf{P}_{n,n}$ is related to the sigma points by this equation:

$$
\mathbf{P}_{n,n} = \frac{1}{2 \cdot \left(N + \kappa  \right)} \cdot \sum_{i=1}^{2N} \left(\mathbf{\chi}_{n,n}^{(i)} - \mathbf{\chi}_{n,n}^{(0)}  \right) \cdot \left(\mathbf{\chi}_{n,n}^{(i)} - \mathbf{\chi}_{n,n}^{(0)}  \right)^T
$$
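Why this relation holds can be seen in one line. Assume the square root is any matrix $\mathbf{A}$ with $\mathbf{A} \cdot \mathbf{A}^T = \left(N+\kappa\right) \cdot \mathbf{P}_{n,n}$ (for instance the Cholesky factor $\mathbf{L}$) and denote its $i$-th column by $\mathbf{a}_i$. Then the deviations $\mathbf{\chi}_{n,n}^{(i)} - \mathbf{\chi}_{n,n}^{(0)}$ are $+\mathbf{a}_i$ for $i = 1, \ldots, N$ and $-\mathbf{a}_{i-N}$ for $i = N+1, \ldots, 2N$, so every outer product appears twice:

$$
\sum_{i=1}^{2N} \left(\mathbf{\chi}_{n,n}^{(i)} - \mathbf{\chi}_{n,n}^{(0)}  \right) \cdot \left(\mathbf{\chi}_{n,n}^{(i)} - \mathbf{\chi}_{n,n}^{(0)}  \right)^T = \sum_{i=1}^{N} \left(\mathbf{a}_i \cdot \mathbf{a}_i^T + \left(-\mathbf{a}_i\right) \cdot \left(-\mathbf{a}_i\right)^T\right) = 2 \cdot \mathbf{A} \cdot \mathbf{A}^T = 2 \cdot \left(N+\kappa\right) \cdot \mathbf{P}_{n,n}
$$

Dividing by $2 \cdot \left(N+\kappa\right)$ gives $\mathbf{P}_{n,n}$. The step $\sum_{i} \mathbf{a}_i \cdot \mathbf{a}_i^T = \mathbf{A} \cdot \mathbf{A}^T$ is just the column-by-column form of the matrix product.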

Let us demonstrate this numerically and compare the result with

$$
\mathbf{P}_{n,n} = \left[\begin{array}{cc}
\sigma_r^2 & 0 \\
0 & \sigma_\theta^2
\end{array}\right] = \left[\begin{array}{cc}
0.0025 & 0 \\
0 & 0.25
\end{array}\right]
$$


```python
T_mat = (1/6)*(np.outer((sigma_point_1-sigma_point_0), (sigma_point_1-sigma_point_0)) + 
               np.outer((sigma_point_2-sigma_point_0), (sigma_point_2-sigma_point_0)) +
               np.outer((sigma_point_3-sigma_point_0), (sigma_point_3-sigma_point_0)) +
               np.outer((sigma_point_4-sigma_point_0), (sigma_point_4-sigma_point_0)) )

# should be identical to P_mat (computed above)
print(f"T_mat :\n{T_mat}\n")
print(f"P_mat :\n{P_mat}\n")
```

```
T_mat :
[[0.0025 0.    ]
 [0.     0.25  ]]

P_mat :
[[0.0025 0.    ]
 [0.     0.25  ]]
```



### Propagation of Sigma Points through a nonlinear function

The nonlinear 2D function is:

$$
\left[\begin{array}{c}
x \\ y
\end{array}\right] = \left[\begin{array}{c}
r \cdot \cos(\theta) \\ r \cdot \sin(\theta)
\end{array}\right] = \mathbf{f}(r,\ \theta)
$$

The transformed sigma points are computed from this function:

$$
\mathbf{\chi}_{n+1,n}^{(i)} = \mathbf{f}(\mathbf{\chi}_{n,n}^{(i)} )
$$

$$\begin{align}
\mathbf{\chi}_{n+1,n}^{(0)} &= \mathbf{f}(\mathbf{\chi}_{n,n}^{(0)}) = \left[\begin{array}{c}
1.0 \cdot \cos(1.57079633) \\ 1.0 \cdot \sin(1.57079633)
\end{array}\right] = \left[\begin{array}{c} 
0 \\ 1
\end{array}\right] \\
\mathbf{\chi}_{n+1,n}^{(1)} &= \mathbf{f}(\mathbf{\chi}_{n,n}^{(1)}) = \left[\begin{array}{c}
1.08660254 \cdot \cos(1.57079633) \\ 1.08660254 \cdot \sin(1.57079633)
\end{array}\right] = \left[\begin{array}{c} 
0 \\ 1.08660254
\end{array}\right]  \\
\mathbf{\chi}_{n+1,n}^{(2)} &= \mathbf{f}(\mathbf{\chi}_{n,n}^{(2)}) = \left[\begin{array}{c}
1 \cdot \cos(2.43682173) \\ 1 \cdot \sin(2.43682173)
\end{array}\right] = \left[\begin{array}{c} 
-0.76175998 \\ 0.64785934
\end{array}\right] \\
\mathbf{\chi}_{n+1,n}^{(3)} &= \mathbf{f}(\mathbf{\chi}_{n,n}^{(3)}) = \left[\begin{array}{c}
0.91339746 \cdot \cos(1.57079633) \\ 0.91339746 \cdot \sin(1.57079633)
\end{array}\right] = \left[\begin{array}{c} 
0 \\ 0.91339746
\end{array}\right]  \\
\mathbf{\chi}_{n+1,n}^{(4)} &= \mathbf{f}(\mathbf{\chi}_{n,n}^{(4)}) = \left[\begin{array}{c}
1 \cdot \cos(0.70477092) \\ 1 \cdot \sin(0.70477092)
\end{array}\right] = \left[\begin{array}{c} 
0.76175998  \\ 0.64785934
\end{array}\right] \\
\end{align}
$$

The computational steps are presented below

```python
# propagate the sigma points through the nonlinear function f(r, theta)
# (basically a transformation from polar to cartesian coordinates)
def polar_to_cartesian(point):
    r, theta = point
    return np.array([r * math.cos(theta), r * math.sin(theta)])

sigma_point_n_0 = polar_to_cartesian(sigma_point_0)
sigma_point_n_1 = polar_to_cartesian(sigma_point_1)
sigma_point_n_2 = polar_to_cartesian(sigma_point_2)
sigma_point_n_3 = polar_to_cartesian(sigma_point_3)
sigma_point_n_4 = polar_to_cartesian(sigma_point_4)

print(f"sigma_point_n_0:\n{sigma_point_n_0}\n")
print(f"sigma_point_n_1:\n{sigma_point_n_1}\n")
print(f"sigma_point_n_2:\n{sigma_point_n_2}\n")
print(f"sigma_point_n_3:\n{sigma_point_n_3}\n")
print(f"sigma_point_n_4:\n{sigma_point_n_4}\n")
```

```
sigma_point_n_0:
[6.123234e-17 1.000000e+00]

sigma_point_n_1:
[6.65352162e-17 1.08660254e+00]

sigma_point_n_2:
[-0.76175998  0.64785934]

sigma_point_n_3:
[5.59294638e-17 9.13397460e-01]

sigma_point_n_4:
[0.76175998 0.64785934]
```



### Weights of Sigma points

Each propagated sigma point $\mathbf{\chi}_{n+1,n}^{(i)}$ is assigned a weighting factor $w_i$:

$w_0 = \kappa / \left(N+\kappa \right)$

For all other weighting factors we have:

$w_i = \frac{1}{2 \cdot \left(N+\kappa \right)};\ i = 1,\ \ldots,\ 2N$
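A short check of these weights, a sketch for the two-dimensional example ($N=2$, $\kappa=1$). The weights sum to one, since $w_0 + 2N \cdot \frac{1}{2(N+\kappa)} = \frac{\kappa + N}{N+\kappa} = 1$, and applied to the *input* sigma points $\mathbf{\chi}_{n,n}^{(i)}$ they recover the input mean and covariance. The weight $w_0$ does not contribute to the covariance because $\mathbf{\chi}_{n,n}^{(0)}$ coincides with the mean.

```python
import numpy as np

N, kappa = 2, 1                          # two-dimensional example, N + kappa = 3
x = np.array([1.0, np.pi / 2])           # input mean
P = np.diag([0.05**2, 0.5**2])           # input covariance

# input sigma points as columns: x, x + columns of L, x - columns of L
L = np.linalg.cholesky((N + kappa) * P)
chi = np.column_stack([x, x[:, None] + L, x[:, None] - L])

# weights: w_0 = kappa / (N + kappa), w_i = 1 / (2 (N + kappa)) for i = 1..2N
w = np.full(2 * N + 1, 1.0 / (2 * (N + kappa)))
w[0] = kappa / (N + kappa)

mean = chi @ w                  # weighted mean of the sigma points
dev = chi - mean[:, None]       # deviations from the weighted mean, one per column
cov = (dev * w) @ dev.T         # sum over i of w_i * dev_i dev_i^T

print("sum of weights:       ", w.sum())
print("input mean recovered: ", np.allclose(mean, x))
print("input cov recovered:  ", np.allclose(cov, P))
```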

### Approximate the mean and the covariance of the output

**mean**

$$
\mathbf{\hat{x}}_{n+1,n} = \sum_{i=0}^{2 N} w_i \cdot \mathbf{\chi}_{n+1,n}^{(i)}
$$

The $2 \cdot N + 1$ sigma points $\mathbf{\chi}_{n+1,n}^{(i)}$ can be arranged as the columns of a matrix $\mathbf{\chi}_{n+1,n}$ like this:

$$
\mathbf{\chi}_{n+1,n} = \left[\begin{array}{ccccc}
\vert & \cdots & \vert & \cdots & \vert \\
\mathbf{\chi}_{n+1,n}^{(0)} & \vdots & \mathbf{\chi}_{n+1,n}^{(i)} & \vdots & \mathbf{\chi}_{n+1,n}^{(2N)} \\
\vert & \cdots & \vert & \cdots & \vert 
\end{array}\right]
$$

Similarly the $2N+1$ weights are arranged into a column vector $\mathbf{w}$:

$$
\mathbf{w}= \left[\begin{array}{c}
w_0 \\ \vdots \\ w_i \\ \vdots \\ w_{2N}
\end{array}\right]
$$

With these notations the mean vector $\mathbf{\hat{x}}_{n+1,n}$ is expressed by a matrix-vector product:

$$
\mathbf{\hat{x}}_{n+1,n} = \mathbf{\chi}_{n+1,n} \cdot \mathbf{w}
$$

**covariance**

$$
\mathbf{P}_{n+1,n} = \sum_{i=0}^{2 N} w_i \cdot \left(\mathbf{\chi}_{n+1,n}^{(i)} -  \mathbf{\hat{x}}_{n+1,n}\right) \cdot \left(\mathbf{\chi}_{n+1,n}^{(i)} -  \mathbf{\hat{x}}_{n+1,n}\right)^T
$$

Defining the vectors $\mathbf{U}_i = \left(\mathbf{\chi}_{n+1,n}^{(i)} -  \mathbf{\hat{x}}_{n+1,n}\right)$, we arrange them as the column vectors of a matrix $\mathbf{U}$.
Furthermore, we put the weights $w_i$ into a diagonal matrix $\mathbf{W}$ (off-diagonal elements are $0$). The covariance matrix can then be expressed as:


$$
\mathbf{P}_{n+1,n} = \mathbf{U} \cdot \mathbf{W} \cdot \mathbf{U}^T
$$

The numerical benefit in terms of computation speed may not be very significant.

A numerical demonstration of how to compute the covariance is provided below. We will proceed as follows:

1) computation of the mean $\mathbf{\hat{x}}_{n+1,n} = \mathbf{\chi}_{n+1,n} \cdot \mathbf{w}$. This requires some preliminary steps:

    a) generate matrix $\mathbf{\chi}_{n+1,n}$ from sigma points

    b) compute the weights and arrange them into the weight vector $\mathbf{w}$

2) computation of the covariance $\mathbf{P}_{n+1,n} = \mathbf{U} \cdot \mathbf{W} \cdot \mathbf{U}^T$. This requires some preliminary steps:

    a) generate $\mathbf{U}$ efficiently using broadcasting

    b) calculate $\mathbf{U} \cdot \mathbf{W}$ efficiently without creating the full diagonal matrix $\mathbf{W}$



```python
# weights
N = 2
kappa = 3 - N
w_0 = kappa/(N+kappa)
w_i = 1.0/(2*(N+kappa))

# the weight vector
w_vec = np.array([w_0, w_i, w_i, w_i, w_i])

# matrix of sigma points (the transpose arranges the sigma point vectors as column vectors)
SigmaPoint_mat = np.array([sigma_point_n_0, sigma_point_n_1, sigma_point_n_2, sigma_point_n_3, sigma_point_n_4]).T

# compute mean 
x_mean_vec = SigmaPoint_mat @ w_vec 

# generate matrix U using broadcasting (subtract the mean from every column)
U_mat = (SigmaPoint_mat.T - x_mean_vec).T

# apply weights to U_mat: column i is multiplied by w_i, i.e. U @ W without building W
U_w_mat = U_mat * w_vec

# compute covariance matrix (note: this overwrites the input covariance P_mat from above)
P_mat = U_w_mat @ U_mat.T

print(f"w_vec :\n{w_vec}\n")
print(f"SigmaPoint_mat :\n{SigmaPoint_mat}\n")
print(f"x_mean_vec :\n{x_mean_vec}\n")
print(f"U_mat :\n{U_mat}\n")
print(f"U_w_mat :\n{U_w_mat}\n")
print(f"P_mat :\n{P_mat}\n")
```

```
w_vec :
[0.33333333 0.16666667 0.16666667 0.16666667 0.16666667]

SigmaPoint_mat :
[[ 6.12323400e-17  6.65352162e-17 -7.61759981e-01  5.59294638e-17
   7.61759981e-01]
 [ 1.00000000e+00  1.08660254e+00  6.47859345e-01  9.13397460e-01
   6.47859345e-01]]

x_mean_vec :
[8.32667268e-17 8.82619782e-01]

U_mat :
[[-2.20343869e-17 -1.67315107e-17 -7.61759981e-01 -2.73372631e-17
   7.61759981e-01]
 [ 1.17380218e-01  2.03982759e-01 -2.34760437e-01  3.07776780e-02
  -2.34760437e-01]]

U_w_mat :
[[-7.34479563e-18 -2.78858512e-18 -1.26959997e-01 -4.55621051e-18
   1.26959997e-01]
 [ 3.91267395e-02  3.39971265e-02 -3.91267395e-02  5.12961300e-03
  -3.91267395e-02]]

P_mat :
[[ 1.93426090e-01 -3.00688149e-17]
 [-3.22407853e-17  3.00562313e-02]]
```



**comment**

The computed values of $\mathbf{\hat{x}}_{n+1,n}$ (`x_mean_vec`) and $\mathbf{P}_{n+1,n}$ (`P_mat`) match the results in chapter 14.2 of the book (disregarding rounding errors and numerical inaccuracies).

---

### Preliminary Summary

**selection of sigma points**

The calculation of the sigma points is based on the input covariance matrix $\mathbf{P}_{n,n}$:

$$\begin{align}
\mathbf{\chi}_{n,n}^{(i)} &= \mathbf{x}_{n,n} + \left(\sqrt{\left(N+\kappa\right) \cdot \mathbf{P}_{n,n}}\right)_{i} \ ;  i = 1,\ \ldots,\ N \\
\mathbf{\chi}_{n,n}^{(i)} &= \mathbf{x}_{n,n} - \left(\sqrt{\left(N+\kappa\right) \cdot \mathbf{P}_{n,n}}\right)_{i-N} \ ; i = N+1,\ \ldots,\ 2N 
\end{align}
$$

Usually $\sqrt{\left(N+\kappa\right) \cdot \mathbf{P}_{n,n}}$  is computed from a `Cholesky`-decomposition of matrix $\left(N+\kappa\right) \cdot \mathbf{P}_{n,n}$.

$N$ is the number of dimensions and $\kappa$ is a tuning factor.

**sigma points propagation**

The set of sigma points $\mathbf{\chi}_{n,n}^{(i)}$ is transformed by the nonlinear system function $\mathbf{f(\cdot)}$ into a new set of sigma points $\mathbf{\chi}_{n+1,n}^{(i)}$:

$$
\mathbf{\chi}_{n+1,n}^{(i)} = \mathbf{f}(\mathbf{\chi}_{n,n}^{(i)} )
$$

**weights of sigma points**

$w_0 = \kappa / \left(N+\kappa \right)$

For all other weighting factors we have:

$w_i = \frac{1}{2 \cdot \left(N+\kappa \right)};\ i = 1,\ \ldots,\ 2N$

A mathematical justification of how these weights are derived is not provided in the book.

**mean at system output**

$$
\mathbf{\hat{x}}_{n+1,n} = \sum_{i=0}^{2 N} w_i \cdot \mathbf{\chi}_{n+1,n}^{(i)}
$$

**covariance at system output**

$$
\mathbf{P}_{n+1,n} = \sum_{i=0}^{2 N} w_i \cdot \left(\mathbf{\chi}_{n+1,n}^{(i)} -  \mathbf{\hat{x}}_{n+1,n}\right) \cdot \left(\mathbf{\chi}_{n+1,n}^{(i)} -  \mathbf{\hat{x}}_{n+1,n}\right)^T
$$



## The Prediction Stage of the UKF

The prediction stage is about finding

1) the predicted state $\mathbf{\hat{x}}_{n+1,n}$ from $\mathbf{\hat{x}}_{n,n}$

2) the extrapolated covariance matrix $\mathbf{P}_{n+1,n}$ from $\mathbf{P}_{n,n}$

What is known about the prediction stage is summarised here:

| equation | notes / description |
|----------|---------------------|
| $\mathbf{\chi}_{n,n}^{(0)} = \mathbf{x}_{n,n} $ | the first sigma point; the mean vector of the state variable |
| $\mathbf{\chi}_{n,n}^{(i)} = \mathbf{x}_{n,n} + \left(\sqrt{\left(N+\kappa\right) \cdot \mathbf{P}_{n,n}}\right)_{i}$ | sigma points for $1 \le i \le N$ |
| $\mathbf{\chi}_{n,n}^{(i)} = \mathbf{x}_{n,n} - \left(\sqrt{\left(N+\kappa\right) \cdot \mathbf{P}_{n,n}}\right)_{i-N}$ | sigma points for $N+1 \le i \le 2 \cdot N$ |
| $\sqrt{\left(N+\kappa\right) \cdot \mathbf{P}_{n,n}} $ | in practice the lower triangular matrix $\mathbf{L}$ from the Cholesky-decomposition $\left(N+\kappa\right) \cdot \mathbf{P}_{n,n} = \mathbf{L} \cdot \mathbf{L}^T$; its columns are used to build the sigma points |
| $\mathbf{\chi}_{n+1,n}^{(i)} = \mathbf{f}(\mathbf{\chi}_{n,n}^{(i)} )$ | propagation of sigma points by a nonlinear function |
| $\mathbf{\hat{x}}_{n+1,n} = \sum_{i=0}^{2 N} w_i \cdot \mathbf{\chi}_{n+1,n}^{(i)}$ | the computation of the predicted mean value from sigma points $\mathbf{\chi}_{n+1,n}^{(i)}$ that have been obtained from the nonlinear system model |
| $\mathbf{P}_{n+1,n} = \sum_{i=0}^{2 N} w_i \cdot \left(\mathbf{\chi}_{n+1,n}^{(i)} -  \mathbf{\hat{x}}_{n+1,n}\right) \cdot \left(\mathbf{\chi}_{n+1,n}^{(i)} -  \mathbf{\hat{x}}_{n+1,n}\right)^T$ | estimated covariance matrix; possibly the process noise covariance matrix $\mathbf{Q}$ should be added. |
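The table translates into one compact function. This is a sketch under assumptions: the name `ukf_predict` is hypothetical, the square root is the Cholesky factor of $\left(N+\kappa\right) \cdot \mathbf{P}_{n,n}$, and an optional process noise matrix `Q` is simply added to the propagated covariance, following the note in the last table row.

```python
import numpy as np


def ukf_predict(x, P, f, kappa, Q=None):
    """Sketch of the UKF prediction stage (hypothetical helper, not from the book).

    x: mean (N,), P: covariance (N, N), f: nonlinear state function mapping an
    N-vector to an N-vector, kappa: tuning parameter, Q: optional process noise.
    """
    x = np.asarray(x, dtype=float)
    N = x.size

    # sigma points chi_{n,n}^(i) as columns; square root via Cholesky factor
    L = np.linalg.cholesky((N + kappa) * np.asarray(P, dtype=float))
    chi = np.column_stack([x, x[:, None] + L, x[:, None] - L])

    # weights w_0 and w_i (i = 1..2N)
    w = np.full(2 * N + 1, 1.0 / (2 * (N + kappa)))
    w[0] = kappa / (N + kappa)

    # propagation: chi_{n+1,n}^(i) = f(chi_{n,n}^(i))
    chi_pred = np.column_stack([f(chi[:, i]) for i in range(2 * N + 1)])

    # predicted mean and covariance as weighted sums over the sigma points
    x_pred = chi_pred @ w
    dev = chi_pred - x_pred[:, None]
    P_pred = (dev * w) @ dev.T
    if Q is not None:
        P_pred = P_pred + Q        # add process noise if provided
    return x_pred, P_pred


def polar_to_cartesian(s):
    r, theta = s
    return np.array([r * np.cos(theta), r * np.sin(theta)])


x_pred, P_pred = ukf_predict([1.0, np.pi / 2], np.diag([0.05**2, 0.5**2]),
                             polar_to_cartesian, kappa=1)
print(x_pred)
print(P_pred)
```

Applied to the two-dimensional example it should reproduce `x_mean_vec` and `P_mat` from the numerical demonstration above, up to rounding.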






For the update stage of the `UKF`, some results about `statistical linear regression` will be needed. The book deals with this kind of regression in annex F. To understand the method, I have set up another notebook:


http://localhost:8888/lab/tree/Statistics/KalmanFilter/statistical_linear_regression.ipynb

The most important facts are summarised here:

A nonlinear system $\mathbf{g(\cdot)}$ with input $\mathbf{x}$ produces the output $\mathbf{y}$.

We want to linearise this nonlinear mapping by the matrix equation:

$$
\mathbf{y} \approx \mathbf{M} \cdot \mathbf{x} + \mathbf{b}
$$

The elements of matrix $\mathbf{M}$ and vector $\mathbf{b}$ are computed by minimising the squared error.

In the case of the `UKF` we process *input* sigma points $\mathbf{\chi}^{(i)}$ into *output* sigma points $\mathbf{y}^{(i)}$.

$$
\mathbf{y}^{(i)} = \mathbf{g}(\mathbf{\chi}^{(i)})
$$

and 

$$
\mathbf{y}^{(i)} \approx \mathbf{M} \cdot \mathbf{\chi}^{(i)} + \mathbf{b}
$$

For each sigma point we define the error vector $\mathbf{e}_i = \mathbf{y}^{(i)} - \left(\mathbf{M} \cdot \mathbf{\chi}^{(i)} + \mathbf{b}\right)$.

The set of $2N+1$ input sigma points produces a set of $2N+1$ output sigma points. The aggregated squared error $E$ is thus:

$$
E = \sum_{i=0}^{2N} \mathbf{e}_i^T \cdot \mathbf{e}_i
$$

Matrix $\mathbf{M}$ and vector $\mathbf{b}$ are computed as:

$$\begin{align}
\mathbf{M} &= \mathbf{P}_{x,y}^T \cdot  \mathbf{P}_{x,x}^{-1} = \mathbf{P}_{y,x} \cdot  \mathbf{P}_{x,x}^{-1}  \\
\mathbf{b} &= \mathbf{\mu}_y - \mathbf{M} \cdot \mathbf{\mu}_x
\end{align}
$$

The terms in these two equations are summarised in the table below:

| equations | descriptions |
|-----------|--------------|
|  $\mathbf{\mu}_x = \frac{1}{K} \sum_{k=1}^K \mathbf{x}_k$  | mean of $\mathbf{x}_k$ |
|  $\mathbf{\mu}_y = \frac{1}{K} \sum_{k=1}^K \mathbf{y}_k$  | mean of $\mathbf{y}_k$ |
| $\mathbf{P}_{x,x} = \frac{1}{K} \sum_{k=1}^K \left( \mathbf{x}_k - \mathbf{\mu}_x \right) \cdot \left( \mathbf{x}_k - \mathbf{\mu}_x \right)^T $ | covariance of $\mathbf{x}_k$ |
| $\mathbf{P}_{y,y} = \frac{1}{K} \sum_{k=1}^K \left( \mathbf{y}_k - \mathbf{\mu}_y \right) \cdot \left( \mathbf{y}_k - \mathbf{\mu}_y \right)^T $ | covariance of $\mathbf{y}_k$ |
| $\mathbf{P}_{x,y} = \frac{1}{K} \sum_{k=1}^K \left( \mathbf{x}_k - \mathbf{\mu}_x \right) \cdot \left( \mathbf{y}_k - \mathbf{\mu}_y \right)^T $ | cross covariance of $\mathbf{x}_k$ and $\mathbf{y}_k$ |
| $\mathbf{P}_{y,x} = \frac{1}{K} \sum_{k=1}^K \left( \mathbf{y}_k - \mathbf{\mu}_y \right) \cdot \left( \mathbf{x}_k - \mathbf{\mu}_x \right)^T $ | cross covariance of $\mathbf{y}_k$ and $\mathbf{x}_k$ |
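The two formulas can be checked numerically against an ordinary least-squares fit of $\mathbf{y} \approx \mathbf{M} \cdot \mathbf{x} + \mathbf{b}$. A sketch with made-up sample data: the mapping `g` (polar to cartesian, as in the prediction example), the sample size `K = 500` and the random seed are assumptions. With the $1/K$ normalisation from the table, $\mathbf{M}$ and $\mathbf{b}$ should coincide with the solution returned by `np.linalg.lstsq` for the augmented inputs $[\mathbf{x}_k^T,\ 1]$.

```python
import numpy as np

rng = np.random.default_rng(0)


def g(x):
    # hypothetical nonlinear mapping R^2 -> R^2 (polar -> cartesian)
    r, theta = x
    return np.array([r * np.cos(theta), r * np.sin(theta)])


K = 500
X = rng.normal([1.0, np.pi / 2], [0.05, 0.5], size=(K, 2))  # rows are x_k
Y = np.array([g(x) for x in X])                               # rows are y_k

mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
dX, dY = X - mu_x, Y - mu_y
P_xx = dX.T @ dX / K   # covariance of x_k
P_yx = dY.T @ dX / K   # cross covariance of y_k and x_k

M = P_yx @ np.linalg.inv(P_xx)
b = mu_y - M @ mu_x

# same fit via ordinary least squares: Y ~ [X, 1] @ [M^T; b^T]
A = np.column_stack([X, np.ones(K)])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
M_ls, b_ls = coef[:2].T, coef[2]

print(np.allclose(M, M_ls), np.allclose(b, b_ls))
```

The agreement is expected: setting the derivatives of the squared error with respect to $\mathbf{b}$ and $\mathbf{M}$ to zero gives exactly $\mathbf{b} = \mathbf{\mu}_y - \mathbf{M} \cdot \mathbf{\mu}_x$ and $\mathbf{M} \cdot \mathbf{P}_{x,x} = \mathbf{P}_{y,x}$.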

## UKF / Update Equations

As for the `EKF`, we assume that the true measurement $\mathbf{z}_n$ depends non-linearly on the state vector and the measurement noise.
The sigma points $\mathbf{\chi}_{n,n-1}^{(i)}$ and the computed measurements $\textit{Z}_n^{(i)}$ are related via the equation:

$$
\textit{Z}_n^{(i)} = \mathbf{h}\left( \mathbf{\chi}_{n,n-1}^{(i)}\right)
$$

Processing all $2N+1$ sigma points allows us to compute the *mean* value $\mathbf{\bar{\textit{Z}}}_n$:

$$
\mathbf{\bar{\textit{Z}}}_n = \sum_{i=0}^{2N} w_i \cdot \textit{Z}_n^{(i)}
$$

The innovation $\left( \mathbf{z}_n - \mathbf{\bar{\textit{Z}}}_n \right)$ is the difference between the true measurement and the measurement computed from the predicted state.
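A sketch of these two steps. All concrete values are assumptions for illustration: the measurement function `h` (range of a cartesian position from the origin), the predicted sigma points and the measured value `z_meas` are made up; the weights are the ones defined for the prediction stage with $N=2$, $\kappa=1$.

```python
import numpy as np


def h(state):
    # hypothetical measurement function: range from the origin
    return np.array([np.hypot(state[0], state[1])])


N, kappa = 2, 1
w = np.full(2 * N + 1, 1.0 / (2 * (N + kappa)))
w[0] = kappa / (N + kappa)

# assumed predicted sigma points chi_{n,n-1}^(i) as columns (illustrative values)
chi_pred = np.array([[0.0, 0.1, 0.0, -0.1, 0.0],
                     [1.0, 1.0, 1.2,  1.0, 0.8]])

# computed measurements Z_n^(i) = h(chi_{n,n-1}^(i)), one column per sigma point
Z = np.column_stack([h(chi_pred[:, i]) for i in range(2 * N + 1)])

Z_mean = Z @ w              # weighted mean of the computed measurements
z_meas = np.array([1.05])   # assumed true measurement z_n
innovation = z_meas - Z_mean
print(Z_mean, innovation)
```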


