# Learning about Kalman filter / Linearisation Methods and Uncertainty Propagation 

**Resources**

`Kalman Filter from Ground Up`; author Alex Becker; https://www.kalmanfilter.net

**Overview**

To understand the extended Kalman filter and the unscented Kalman filter, some background on the linearisation of a nonlinear function in one, two or many dimensions is necessary.

The book provides an overview of such methods; however, there are other resources that explain the techniques in greater detail.

`The Tangent Approximation` from `1802SupplementaryNotes_full.pdf` which can be downloaded from 

https://ocw.mit.edu/courses/18-02-multivariable-calculus-fall-2007/pages/readings/supp_notes/


explains the approximation of a 2D function by a tangent plane and then generalises the approach to the multi-dimensional case.

https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/82620/eth-8432-01.pdf

provides details on how to propagate uncertainty through a nonlinear system. It makes use of the linearisation methods.

---

## Linear Approximation in 1 D

A function $f(x)$ shall be approximated at $x_0$. The Taylor series expansion of $f(x)$ is expressed here:

$$
f(x) = f(x_0) + \sum_{n=1}^{\infty} \frac{f^{(n)}(x_0)}{n!} \cdot (x - x_0)^n
$$

The term $f^{(n)}(x_0)$ is the $n$-th derivative of $f(x)$ evaluated at $x_0$. The equation above assumes that derivatives of every order exist. If only a finite number of derivatives exist, the series is truncated to a finite sum (plus a remainder term).

The linear approximation is the Taylor series truncated after the first-order term ($n=1$). The linear approximation is thus:

$$
y = f(x) \approx f(x_0) + \frac{d f(x_0)}{dx} \cdot (x - x_0)
$$
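The following minimal sketch illustrates the 1D linear approximation numerically. The function $f(x) = e^x$ and the expansion point $x_0 = 1$ are assumptions chosen only for illustration; they do not come from the referenced book.

```python
import numpy as np

# Assumed example function and its derivative (illustration only)
def f(x):
    return np.exp(x)

def df(x):
    return np.exp(x)

x0 = 1.0  # expansion point

def f_lin(x):
    """Linear approximation of f around x0."""
    return f(x0) + df(x0) * (x - x0)

# Compare the true function and the linear approximation near x0
for dx in (0.1, 0.01, 0.001):
    err = abs(f(x0 + dx) - f_lin(x0 + dx))
    print(f"dx = {dx:5.3f}   |f - f_lin| = {err:.3e}")
```

Reducing the step `dx` by a factor of 10 reduces the error by roughly a factor of 100, as expected when the first neglected Taylor term is quadratic in $(x - x_0)$.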


---



## Linear Approximation in 2D

The function is now $w = f(x,y)$. The task is to find a linear approximation in the vicinity of point $(x_0,y_0)$. 

$$
w_0 = f(x_0, y_0)
$$

To obtain a linear approximation at point $(x,y)$ two approximation steps are performed. With 

$$\begin{align}
\Delta x &= x - x_0 \\
\Delta y &= y - y_0 \\
w &= f(x_0 + \Delta x , y_0 + \Delta y)
\end{align}
$$

The first step is going in x-direction from $x_0$ to $x_0 + \Delta x$ while keeping $y$ constant at $y=y_0$. 

$$\begin{align}
w_x &= f(x_0 + \Delta x , y_0 ) \\
&\approx f(x_0, y_0) + f_x(x_0, y_0) \cdot \Delta x
\end{align}
$$

$f_x(x_0, y_0)$ denotes the partial derivative with respect to $x$, evaluated at point $(x_0, y_0)$.

In a second step $y$ is changed from $y_0$ to $y_0 + \Delta y$ while keeping $x$ constant at $x=x_0 + \Delta x$.

$$\begin{align}
w &\approx w_x + f_y(x_0 + \Delta x, y_0) \cdot \Delta y \\
&\approx  f(x_0, y_0) + f_x(x_0, y_0) \cdot \Delta x + f_y(x_0 + \Delta x, y_0) \cdot \Delta y
\end{align}
$$

For a smooth function $f(x,y)$ the partial derivative $f_y(x_0 + \Delta x, y_0)$ is not much different from $f_y(x_0, y_0)$. We may thus write:

$$
f_y(x_0 + \Delta x, y_0) = f_y(x_0, y_0) + \epsilon
$$

Here the additive term $\epsilon$ is small compared to the value of $f_y(x_0, y_0)$ and tends to $0$ as $\Delta x \to 0$.

$$\begin{align}
w &\approx  f(x_0, y_0) + f_x(x_0, y_0) \cdot \Delta x + f_y(x_0, y_0) \cdot \Delta y + \epsilon \cdot \Delta y
\end{align}
$$

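In the vicinity of $(x_0,y_0)$ the term $\epsilon \cdot \Delta y$ can be ignored: for a smooth function $\epsilon \to 0$ as $\Delta x \to 0$, so the product $\epsilon \cdot \Delta y$ vanishes faster than the terms that are linear in $\Delta x$ and $\Delta y$.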

The equation for the linear approximation around point $x_0,y_0$ is now:

$$\begin{align}
w = f(x,y) &\approx  f(x_0, y_0) + f_x(x_0, y_0) \cdot \Delta x + f_y(x_0, y_0) \cdot \Delta y \\
&\approx f(x_0, y_0) + f_x(x_0, y_0) \cdot (x - x_0) + f_y(x_0, y_0) \cdot (y - y_0)
\end{align}
$$
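As a small numerical check of the tangent-plane approximation, the sketch below uses the assumed example function $f(x,y) = \sqrt{x^2 + y^2}$ around the assumed point $(x_0, y_0) = (3, 4)$. It also evaluates the term $\epsilon$ from the two-step derivation.

```python
import numpy as np

# Assumed example function and its partial derivatives (illustration only)
def f(x, y):
    return np.sqrt(x**2 + y**2)

def f_x(x, y):
    return x / np.sqrt(x**2 + y**2)

def f_y(x, y):
    return y / np.sqrt(x**2 + y**2)

x0, y0 = 3.0, 4.0    # point of linearisation
dx, dy = 0.1, -0.1   # small displacements

# linear (tangent plane) approximation and true value
w_lin = f(x0, y0) + f_x(x0, y0) * dx + f_y(x0, y0) * dy
w_true = f(x0 + dx, y0 + dy)

# epsilon: change of the partial derivative f_y after the step in x-direction
eps = f_y(x0 + dx, y0) - f_y(x0, y0)

print(f"w_lin  = {w_lin:.6f}")
print(f"w_true = {w_true:.6f}")
print(f"eps    = {eps:.6f},  eps * dy = {eps * dy:.6f}")
```

The neglected product `eps * dy` is much smaller than the first-order terms `f_x * dx` and `f_y * dy`, which is why it is dropped.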

---


## Linear Approximation / Multivariate 

The multivariate linear approximation follows from the same procedure as for the 2D case. To illustrate this consider a function $f(x_1, x_2, \ldots, x_n)$ which depends on $n$ variables. To obtain a linear approximation of

$f(x_1 +\Delta x_1, x_2 +\Delta x_2, \ldots, x_n +\Delta x_n)$

we just have to compute:

$$\begin{align}
w &= f(x_1 +\Delta x_1, x_2 +\Delta x_2, \ldots, x_n +\Delta x_n) \\
&\approx f(x_1, x_2, \ldots, x_n) + f_{x_1}(x_1, x_2, \ldots, x_n) \cdot \Delta x_1 + f_{x_2}(x_1, x_2, \ldots, x_n) \cdot \Delta x_2 + \ldots + f_{x_n}(x_1, x_2, \ldots, x_n) \cdot \Delta x_n
\end{align}
$$

In this equation $f_{x_1}(\ ), f_{x_2}(\ ), \ldots, f_{x_n}(\ )$ denote the partial derivatives with respect to $x_1, x_2, \dots, x_n$.
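When analytic partial derivatives are inconvenient, they can be estimated by finite differences. The sketch below is one possible way to do this; the helper names `numerical_gradient` and `linearize` as well as the test function are hypothetical and serve only as an illustration.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of all partial derivatives of f at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad

def linearize(f, x0):
    """Return a function that evaluates the linear approximation of f around x0."""
    x0 = np.asarray(x0, dtype=float)
    f0 = f(x0)
    grad = numerical_gradient(f, x0)
    return lambda x: f0 + grad @ (np.asarray(x, dtype=float) - x0)

# Assumed example function of three variables (illustration only)
def f(x):
    return x[0] * np.sin(x[1]) + x[2] ** 2

x0 = [1.0, 0.5, 2.0]
f_lin = linearize(f, x0)

x = [1.01, 0.49, 2.02]
print("true value          :", f(x))
print("linear approximation:", f_lin(x))
```

The central difference is used because its truncation error is of second order in the step size `h`; a forward difference would also work but is less accurate for the same `h`.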

---



## Numerical Examples

$f(x,y,z) = x^2 \cdot y^3 \cdot z^4$

**partial derivatives**

$$\begin{align}
f_x(x,y,z) &= 2 x \cdot y^3 \cdot z^4 \\
f_y(x,y,z) &= 3 x^2 \cdot y^2 \cdot z^4 \\
f_z(x,y,z) &= 4 x^2 \cdot y^3 \cdot z^3 \\
\end{align}
$$

$(x_0, y_0, z_0) = (1, 2, 3)$

$f(x_0, y_0, z_0) =  8 \cdot 81 = 648$

```python
import numpy as np

# point around which to linearize
x0 = 1
y0 = 2
z0 = 3

# function value and partial derivatives at (x0, y0, z0)
f0 = (x0**2) * (y0**3) * (z0**4)
fpx = 2 * x0 * (y0**3) * (z0**4)
fpy = 3 * (x0**2) * (y0**2) * (z0**4)
fpz = 4 * (x0**2) * (y0**3) * (z0**3)

# 3 x 3 x 3 grid of evaluation points around (x0, y0, z0)
xv = np.linspace(0.95, 1.05, 3)
yv = np.linspace(1.95, 2.05, 3)
zv = np.linspace(2.95, 3.05, 3)

X1, X2, X3 = np.meshgrid(xv, yv, zv, indexing='ij')

# true function values
fu = (X1**2) * (X2**3) * (X3**4)

# linearized function
fl = f0 + fpx*(X1 - x0) + fpy * (X2 - y0) + fpz * (X3 - z0)
```

```python
# error of linearized function vs. true function, in percent of f0
error_percentage = 100*(fl-fu)/f0
print(f"max. absolute error: {np.max(np.abs(error_percentage)):.3f} %")
```

## Propagation of Uncertainty

Mostly copied or adapted from https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/82620/eth-8432-01.pdf



### Uncertainty propagation in 1 dimension

For a generally nonlinear function $f(x)$ and normally distributed input $x$ with mean value $\mu_x$ and standard deviation $\sigma_x$ we want to infer the statistics of the output $y$.

The output $y$ is in general no longer normally distributed. An approximation of $f(x)$ around $x=\mu_x$ by a linear function will be used.

In a reasonably small neighborhood of $\mu_x$ the linear approximation is:

$$
y = f(\mu_x) + \frac{\partial f}{\partial x} \bigg|_{\mu_x} \cdot  (x - \mu_x)
$$

From the mathematical rules for computing expectations it follows:

$$\begin{align}
\mu_y &= f(\mu_x) \\
\sigma_y^2 &= E\left( \left(y - \mu_y \right)^2 \right) \\
&= \left(\frac{\partial f}{\partial x} \bigg|_{\mu_x}\right)^2 \cdot  E\left((x - \mu_x)^2 \right) \\
\sigma_y^2 &= \left(\frac{\partial f}{\partial x} \bigg|_{\mu_x} \right)^2 \cdot \sigma_x^2
\end{align}
$$
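The linearised result can be compared with a Monte Carlo simulation. The sketch below assumes $f(x) = x^2$, $\mu_x = 2$ and $\sigma_x = 0.1$; these values and the sample size are arbitrary choices for illustration.

```python
import numpy as np

# Assumed nonlinear function and its derivative (illustration only)
def f(x):
    return x**2

def df(x):
    return 2.0 * x

mu_x, sigma_x = 2.0, 0.1

# linearised propagation
mu_y_lin = f(mu_x)
sigma_y_lin = abs(df(mu_x)) * sigma_x

# Monte Carlo reference: push samples of x through the nonlinear function
rng = np.random.default_rng(0)
x = rng.normal(mu_x, sigma_x, size=200_000)
y = f(x)
mu_y_mc = y.mean()
sigma_y_mc = y.std()

print(f"linearised : mu_y = {mu_y_lin:.4f}, sigma_y = {sigma_y_lin:.4f}")
print(f"Monte Carlo: mu_y = {mu_y_mc:.4f}, sigma_y = {sigma_y_mc:.4f}")
```

For this example the exact mean is $\mu_x^2 + \sigma_x^2 = 4.01$, so the linearised mean $f(\mu_x) = 4$ carries a small bias that grows with $\sigma_x$.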

In all practical situations we must consider the fact that at best an estimated value of $\mu_x$ will be available. Similarly, only a good guess of the standard deviation $\sigma_x$ may be known. So there are at least 3 sources of error:

1) the linearisation

2) the uncertainty about $\mu_x$

3) the uncertainty about $\sigma_x$

A section in https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/82620/eth-8432-01.pdf addresses these issues in some detail. For a small standard deviation $\sigma_x$ the linear approximation may still be a good approximation if the slope $\frac{\partial f}{\partial x} \bigg|_{\mu_x}$ does not change significantly in the range $\mu_x - \sigma_x \le x \le \mu_x + \sigma_x$. But even then we are still left with a possibly poor knowledge of $\mu_x$.

---

### Uncertainty propagation in multiple dimensions

#### n inputs / 1 output

In this scenario a nonlinear system with multiple inputs $x_1, x_2, \ldots,\ x_n$ produces a **single** output $y_1$. 

$$
y_1 = f_1(x_1, x_2, \ldots,\ x_n)
$$


$f_1(\ )$ is a nonlinear function dependent on random variables $x_1, x_2, \ldots,\ x_n$ which are normally distributed. Mean values and standard deviations of these inputs are denoted $\mu_{x_1}, \mu_{x_2}, \ldots,\ \mu_{x_n}$ and $\sigma_{x_1}, \sigma_{x_2}, \ldots,\ \sigma_{x_n}$.

The linear approximation is given by:

$$
y_1 = f_1\left(\mu_{x_1}, \mu_{x_2}, \ldots,\ \mu_{x_n} \right) + \sum_{i=1}^n \frac{\partial f_1 \left(\mu_{x_1}, \mu_{x_2}, \ldots,\ \mu_{x_n} \right)}{\partial x_i} \cdot \left(x_i - \mu_{x_i} \right)
$$

To shorten the notation we introduce $a_0$ and $a_1, a_2, \ldots,\ a_n$ as:

$$\begin{align}
a_0 &= f_1\left(\mu_{x_1}, \mu_{x_2}, \ldots,\ \mu_{x_n} \right) \\
a_i &= \frac{\partial f_1 \left(\mu_{x_1}, \mu_{x_2}, \ldots,\ \mu_{x_n} \right)}{\partial x_i}
\end{align}
$$

and get a more compact equation for the linear approximation:

$$
y_1 = a_0 + \sum_{i=1}^n a_i \cdot \left(x_i - \mu_{x_i} \right)
$$

The notation is a bit *sloppy* here. Actually the equation symbol $\ =$ should be replaced by $\approx$.

For the mean value $\mu_{y_1}$ we obtain:

$$\begin{align}
\mu_{y_1} &= E(y_1) \\
&= E(a_0) + \sum_{i=1}^n a_i \cdot E\left(x_i - \mu_{x_i} \right) \\
&= a_0 + \sum_{i=1}^n a_i \cdot \left(E(x_i) - \mu_{x_i} \right) \\
&= a_0 + \sum_{i=1}^n a_i \cdot \left(\mu_{x_i} - \mu_{x_i} \right) \\
&= a_0 \\
\mu_{y_1} &= f_1\left(\mu_{x_1}, \mu_{x_2}, \ldots,\ \mu_{x_n} \right)
\end{align}
$$

And for the variance $\sigma_{y_1}^2$ :

$$\begin{align}
\sigma_{y_1}^2 &= E\left( \left(y_1 - \mu_{y_1} \right)^2 \right) \\
&= E\left( \left( \sum_{i=1}^n a_i \cdot \left(x_i - \mu_{x_i} \right) \right)^2 \right) \\
&= E\left( \left( \sum_{i=1}^n a_i \cdot \left(x_i - \mu_{x_i} \right) \right) \cdot \left( \sum_{j=1}^n a_j \cdot \left(x_j - \mu_{x_j} \right) \right) \right) \\
&= E\left( \sum_{i=1}^n \sum_{j=1}^n  a_i \cdot a_j \cdot \left(x_i - \mu_{x_i} \right) \cdot \left(x_j - \mu_{x_j} \right) \right) \\
&= \sum_{i=1}^n \sum_{j=1}^n  a_i \cdot a_j \cdot \underbrace{E\left( \left(x_i - \mu_{x_i} \right) \cdot \left(x_j - \mu_{x_j} \right) \right) }_{\sigma_{ij}} \\
&= \sum_{i=1}^n \sum_{j=1}^n  a_i \cdot a_j \cdot \sigma_{ij} \\
&= \sum_{i=1}^n a_i^2 \cdot \sigma_{x_i}^2 + \sum_{i=1;\ i \neq j}^n \sum_{j=1}^n  a_i \cdot a_j \cdot \sigma_{ij} \\
\sigma_{y_1}^2 &= \sum_{i=1}^n \left(\frac{\partial f_1 \left(\mu_{x_1}, \mu_{x_2}, \ldots,\ \mu_{x_n} \right)}{\partial x_i}\right)^2 \cdot \sigma_{x_i}^2 + \sum_{i=1;\ i \neq j}^n \sum_{j=1}^n  \left(\frac{\partial f_1 \left(\mu_{x_1}, \mu_{x_2}, \ldots,\ \mu_{x_n} \right)}{\partial x_i} \right) \cdot \left(\frac{\partial f_1 \left(\mu_{x_1}, \mu_{x_2}, \ldots,\ \mu_{x_n} \right)}{\partial x_j} \right) \cdot \sigma_{ij}
\end{align}
$$

If the random variables $x_1, x_2, \ldots,\ x_n$ are pairwise uncorrelated (which holds in particular if they are independent), the covariance $\sigma_{ij} = E\left( \left(x_i - \mu_{x_i} \right) \cdot \left(x_j - \mu_{x_j} \right) \right) $ is $0$ for $i \neq j$.

We then get:

$$
\sigma_{y_1}^2 = \sum_{i=1}^n \left(\frac{\partial f_1 \left(\mu_{x_1}, \mu_{x_2}, \ldots,\ \mu_{x_n} \right)}{\partial x_i}\right)^2 \cdot \sigma_{x_i}^2
$$
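The variance formulas can be evaluated directly. The sketch below assumes the function $f_1(x_1, x_2) = x_1 \cdot x_2$ and made-up values for the means and the input covariance matrix. It computes the variance once with the double sum, once in vector form, and once with the covariance terms ignored.

```python
import numpy as np

# Assumed mean values and input covariance matrix (illustration only)
mu = np.array([2.0, 3.0])
C_x = np.array([[0.01, 0.01],
                [0.01, 0.04]])

# partial derivatives a_i of f_1(x1, x2) = x1 * x2, evaluated at the mean
a = np.array([mu[1], mu[0]])
n = a.size

# double sum over all i, j (includes variances and covariances)
var_double_sum = sum(a[i] * a[j] * C_x[i, j] for i in range(n) for j in range(n))

# the same quantity written as a quadratic form a^T C_x a
var_matrix = a @ C_x @ a

# uncorrelated inputs: only the diagonal (variance) terms remain
var_uncorrelated = np.sum(a**2 * np.diag(C_x))

print("double sum      :", var_double_sum)
print("quadratic form  :", var_matrix)
print("diagonal only   :", var_uncorrelated)
```

In this example ignoring the covariance term would underestimate the output variance (0.25 instead of 0.37).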

---

#### n inputs / m outputs

As before there are $n$ inputs. But now there are $m$ outputs $y_1, y_2,\ \ldots,\ y_m$. And there are $m$ possibly nonlinear functions $f_1(),\ f_2(),\ \ldots ,\ f_m()$.

Using the inputs the k'th output $y_k$ is computed from nonlinear function $f_k()$. 

As before we use linear approximations of functions $f_1(),\ f_2(),\ \ldots ,\ f_m()$.

For the variance $\sigma_{y_k}^2$ we obtain :

$$
\sigma_{y_k}^2 = \sum_{i=1}^n \left(\frac{\partial f_k \left(\mu_{x_1}, \mu_{x_2}, \ldots,\ \mu_{x_n} \right)}{\partial x_i}\right)^2 \cdot \sigma_{x_i}^2 + \sum_{i=1;\ i \neq j}^n \sum_{j=1}^n  \left(\frac{\partial f_k \left(\mu_{x_1}, \mu_{x_2}, \ldots,\ \mu_{x_n} \right)}{\partial x_i} \right) \cdot \left(\frac{\partial f_k \left(\mu_{x_1}, \mu_{x_2}, \ldots,\ \mu_{x_n} \right)}{\partial x_j} \right) \cdot \sigma_{ij}
$$

And making the implicit assumption that partial derivatives are evaluated for $\left(\mu_{x_1}, \mu_{x_2}, \ldots,\ \mu_{x_n} \right)$ we get a more readable equation:

$$
\sigma_{y_k}^2 = \sum_{i=1}^n \left(\frac{\partial f_k }{\partial x_i}\right)^2 \cdot \sigma_{x_i}^2 + \sum_{i=1;\ i \neq j}^n \sum_{j=1}^n  \left(\frac{\partial f_k }{\partial x_i} \right) \cdot \left(\frac{\partial f_k }{\partial x_j} \right) \cdot \sigma_{ij}
$$

The computation of $\sigma_{y_k}^2$ only involves evaluations of the partial derivatives of function $f_k()$.

The situation changes if we consider the covariances of the outputs. Here we will compute the covariance of outputs $y_l$ and $y_k$ and denote the covariance by $\sigma_{y_l, y_k}$.

$$\begin{align}
\sigma_{y_l, y_k} &= Cov(y_l, y_k) = E\left( \left(y_l - \mu_{y_l} \right) \cdot \left(y_k - \mu_{y_k} \right) \right) \\
&= E\left(y_l \cdot y_k \right) - E\left(y_l\right)  \cdot \mu_{y_k} - E\left(y_k\right)  \cdot \mu_{y_l} + E\left(\mu_{y_l} \cdot \mu_{y_k} \right) \\
&= E\left(y_l \cdot y_k \right) - E\left(y_l\right) \cdot E\left(y_k\right) \\
&= E\left(y_l \cdot y_k \right) - \mu_{y_l} \cdot \mu_{y_k} 
\end{align}
$$

Writing outputs $y_l$ and $y_k$ as linear approximations

$$\begin{align}
y_l &= \mu_{y_l} + \sum_{i=1}^n \frac{\partial f_l }{\partial x_i} \cdot \left(x_i - \mu_{x_i} \right) \\
y_k &= \mu_{y_k} + \sum_{i=1}^n \frac{\partial f_k }{\partial x_i} \cdot \left(x_i - \mu_{x_i} \right) 
\end{align}
$$

we get these equations for $\sigma_{y_l, y_k}$:

$$\begin{align}
\sigma_{y_l, y_k} &= E\left( \left(\mu_{y_l} + \sum_{i=1}^n \frac{\partial f_l }{\partial x_i} \cdot \left(x_i - \mu_{x_i} \right) \right) \cdot \left(\mu_{y_k} + \sum_{j=1}^n \frac{\partial f_k }{\partial x_j} \cdot \left(x_j - \mu_{x_j} \right) \right) \right) - \mu_{y_l} \cdot \mu_{y_k} \\
&= E\left(\mu_{y_l} \cdot \mu_{y_k}\right) + \mu_{y_l} \cdot E\left(\sum_{j=1}^n \frac{\partial f_k }{\partial x_j} \cdot \left(x_j - \mu_{x_j} \right) \right) + \mu_{y_k} \cdot E\left(\sum_{i=1}^n \frac{\partial f_l }{\partial x_i} \cdot \left(x_i - \mu_{x_i} \right) \right) + E\left(\sum_{i=1}^n \frac{\partial f_l }{\partial x_i} \cdot \left(x_i - \mu_{x_i} \right) \cdot \sum_{j=1}^n \frac{\partial f_k }{\partial x_j} \cdot \left(x_j - \mu_{x_j} \right)  \right)  - \mu_{y_l} \cdot \mu_{y_k} \\
&= E\left(\sum_{i=1}^n \frac{\partial f_l }{\partial x_i} \cdot \left(x_i - \mu_{x_i} \right) \cdot \sum_{j=1}^n \frac{\partial f_k }{\partial x_j} \cdot \left(x_j - \mu_{x_j} \right)  \right) \\
&= E\left( \sum_{i=1}^n \sum_{j=1}^n  \frac{\partial f_l }{\partial x_i} \frac{\partial f_k }{\partial x_j} \cdot \left(x_i - \mu_{x_i} \right) \cdot \left(x_j - \mu_{x_j} \right)  \right) \\
&= \sum_{i=1}^n \sum_{j=1}^n  \frac{\partial f_l }{\partial x_i} \frac{\partial f_k }{\partial x_j} \cdot E\left(\left(x_i - \mu_{x_i} \right) \cdot \left(x_j - \mu_{x_j} \right)  \right) \\
\sigma_{y_l, y_k} &= \sum_{i=1}^n \sum_{j=1}^n  \frac{\partial f_l }{\partial x_i} \frac{\partial f_k }{\partial x_j} \cdot \sigma_{ij}
\end{align}
$$

$\sigma_{y_l, y_k}$ is the $(l,k)$ element of the output covariance matrix $\mathbf{C_y}$. It can be written as the dot product of a row vector (the l'th row of one matrix) with a column vector (the k'th column of another matrix). To see this we rewrite the last equation:

$$
\sigma_{y_l, y_k} = \sum_{i=1}^n  \frac{\partial f_l }{\partial x_i} \sum_{j=1}^n   \frac{\partial f_k }{\partial x_j} \cdot \sigma_{ij}
$$

So the $n$ elements of the l'th row vector are identified as:

$$\left[\begin{array}{ccccc}
\frac{\partial f_l }{\partial x_1} & \frac{\partial f_l }{\partial x_2} & \cdots & \frac{\partial f_l }{\partial x_{n-1}} & \frac{\partial f_l }{\partial x_n} 
\end{array}\right]
$$

Similarly the i'th element of the k'th column vector is the expression:

$$
\sum_{j=1}^n   \frac{\partial f_k }{\partial x_j} \cdot \sigma_{ij}
$$

which is the $(i,k)$ element of the product of the input covariance matrix $\mathbf{C_x}$ right-multiplied by the transpose of the matrix $\mathbf{F}$ defined below.

Defining the matrix $\mathbf{F} \in \mathbb{R}^{m \times n}$ by:

$$
\mathbf{F} = \left[\begin{array}{ccccc}
\frac{\partial f_1 }{\partial x_1} & \frac{\partial f_1 }{\partial x_2} & \cdots & \frac{\partial f_1 }{\partial x_{n-1}} & \frac{\partial f_1 }{\partial x_n} \\ 
\vdots & \vdots & \vdots & \vdots & \vdots \\
\frac{\partial f_p }{\partial x_1} & \frac{\partial f_p }{\partial x_2} & \cdots & \frac{\partial f_p }{\partial x_{n-1}} & \frac{\partial f_p }{\partial x_n} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
\frac{\partial f_m }{\partial x_1} & \frac{\partial f_m }{\partial x_2} & \cdots & \frac{\partial f_m }{\partial x_{n-1}} & \frac{\partial f_m }{\partial x_n} \\
\end{array}\right]
$$

we can see that the output covariance matrix $\mathbf{C_y}$ is computed from the product of three matrices:

$$
\mathbf{C_y} = \mathbf{F} \cdot \mathbf{C_x} \cdot \mathbf{F}^T
$$

The matrix $\mathbf{F}$ is commonly referred to as the `Jacobian` matrix.
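A short sketch of the matrix form $\mathbf{C_y} = \mathbf{F} \cdot \mathbf{C_x} \cdot \mathbf{F}^T$ follows. As an assumed example (not taken from the book) it uses the conversion from polar coordinates $(r, \varphi)$ to Cartesian coordinates, with made-up mean values and standard deviations, and compares the result with a Monte Carlo estimate.

```python
import numpy as np

# Assumed nonlinear mapping with n = 2 inputs and m = 2 outputs (illustration only)
def f(x):
    r, phi = x
    return np.array([r * np.cos(phi), r * np.sin(phi)])

def jacobian(x):
    """Matrix F of partial derivatives of f, evaluated at x."""
    r, phi = x
    return np.array([[np.cos(phi), -r * np.sin(phi)],
                     [np.sin(phi),  r * np.cos(phi)]])

mu_x = np.array([10.0, np.pi / 4])
C_x = np.diag([0.1**2, 0.02**2])   # uncorrelated inputs

# linearised covariance propagation
F = jacobian(mu_x)
C_y = F @ C_x @ F.T

# Monte Carlo reference
rng = np.random.default_rng(1)
samples = rng.multivariate_normal(mu_x, C_x, size=200_000)
y = np.column_stack([samples[:, 0] * np.cos(samples[:, 1]),
                     samples[:, 0] * np.sin(samples[:, 1])])
C_y_mc = np.cov(y, rowvar=False)

print("C_y (linearised):\n", C_y)
print("C_y (Monte Carlo):\n", C_y_mc)
```

Although the inputs are uncorrelated, the outputs are correlated: the off-diagonal elements of `C_y` are nonzero because both outputs depend on both inputs.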

**preliminary summary**

The covariances $\sigma_{y_l, y_k}$ are the elements of the output covariance matrix $\mathbf{C_y} \in \mathbb{R}^{m \times m}$.

The covariances $\sigma_{ij}$ are the elements of the input covariance matrix $\mathbf{C_x} \in \mathbb{R}^{n \times n}$.

Covariance matrices are symmetric and positive semidefinite. If $\mathbf{C_x}$ is positive definite, a `Cholesky` decomposition with a lower triangular matrix $\mathbf{L}$ exists. This permits writing $\mathbf{C_x}$ as:

$$
\mathbf{C_x} = \mathbf{L} \cdot \mathbf{L}^T 
$$

$$\begin{align}
\mathbf{C_y} &= \mathbf{F} \cdot \mathbf{C_x} \cdot \mathbf{F}^T \\
&= \left(\mathbf{F} \cdot \mathbf{L}\right) \cdot \mathbf{L}^T  \cdot \mathbf{F}^T \\
&= \underbrace{\left(\mathbf{F} \cdot \mathbf{L}\right)}_{\mathbf{V}} \cdot \left(\mathbf{F} \cdot \mathbf{L}\right)^T \\
&= \mathbf{V} \cdot \mathbf{V}^T
\end{align}
$$
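The factorisation can be checked numerically with `np.linalg.cholesky`, which returns the lower triangular factor of a positive definite matrix. The matrices `C_x` and `F` below are made-up values for illustration.

```python
import numpy as np

# Assumed input covariance (positive definite) and Jacobian (illustration only)
C_x = np.array([[4.0, 1.2],
                [1.2, 1.0]])
F = np.array([[ 1.0, 0.5],
              [ 0.0, 2.0],
              [-1.0, 1.0]])   # m = 3 outputs, n = 2 inputs

L = np.linalg.cholesky(C_x)   # lower triangular, C_x = L @ L.T
V = F @ L                     # propagated factor

C_y = F @ C_x @ F.T

print("L @ L.T reproduces C_x:", np.allclose(L @ L.T, C_x))
print("V @ V.T reproduces C_y:", np.allclose(V @ V.T, C_y))
```

Propagating the factor $\mathbf{V}$ instead of the full covariance matrix guarantees that $\mathbf{V} \cdot \mathbf{V}^T$ is symmetric and positive semidefinite by construction.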

---

## Examples 

from `Kalman Filter from Ground Up` and  https://www.kalmanfilter.net



## Linearisation and uncertainty projection in a single dimension 

see chapter 13.3.1 from the book.

The altitude of a balloon is measured. At time instant $n$ the measurement is denoted $z_n$. It depends on the state $x_n$ (the altitude) via the equation

$$
z_n = h(x_n)
$$

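$z_n$ is an angle $\theta$. From the geometry $\tan(\theta) = x_n / d$, where $d$ is the horizontal distance between the observer and the balloon. So we have: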

$$
z_n = \theta = tan^{-1}\left(\frac{x_n}{d} \right)
$$

Accordingly the function $h()$ is:

$$
h(x_n) = tan^{-1}\left(\frac{x_n}{d} \right)
$$

To propagate the uncertainty of the altitude $x_n$ to the uncertainty of the measured angle $z_n= \theta$ we must linearise $h(x_n)$. We must therefore compute the derivative of $h(x_n)$ with respect to $x_n$.

$$
\frac{d\ h(x_n)}{d\ x_n} = \frac{1}{d} \cdot \frac{1}{1+\left(\frac{x_n}{d} \right)^2 } = \frac{d}{d^2 + x_n^2}
$$

Assuming that $x_n$ is a normally distributed random variable with mean $\mu_{x_n}$ and variance $\sigma_{x_n}^2$, we get the mean $\mu_{z_n}$ and the variance $\sigma_{z_n}^2$ of the measurement:


$$\begin{align}
\mu_{z_n} &= h(\mu_{x_n}) = tan^{-1}\left(\frac{\mu_{x_n}}{d} \right) \\
\sigma_{z_n}^2 &= \left(\frac{d\ h(x_n)}{d\ x_n} \bigg|_{\mu_{x_n}} \right)^2 \cdot \sigma_{x_n}^2 = \left(\frac{d}{d^2 + \mu_{x_n}^2}\right)^2 \cdot \sigma_{x_n}^2
\end{align}
$$
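A numerical sketch of this example follows. It applies the 1D propagation rule $\sigma_y^2 = \left(\frac{\partial f}{\partial x}\right)^2 \cdot \sigma_x^2$ to $h(x_n)$ and compares the result with a Monte Carlo simulation. The values for $d$, $\mu_{x_n}$ and $\sigma_{x_n}$ are assumptions made for illustration and are not taken from the book.

```python
import numpy as np

# Assumed values (illustration only)
d = 100.0        # horizontal distance
mu_x = 50.0      # mean altitude
sigma_x = 2.0    # standard deviation of the altitude

def h(x):
    return np.arctan(x / d)

def dh(x):
    return d / (d**2 + x**2)

# linearised propagation
mu_z = h(mu_x)
sigma_z = abs(dh(mu_x)) * sigma_x

# Monte Carlo reference
rng = np.random.default_rng(2)
z = h(rng.normal(mu_x, sigma_x, size=200_000))

print(f"linearised : mu_z = {mu_z:.5f} rad, sigma_z = {sigma_z:.5f} rad")
print(f"Monte Carlo: mu_z = {z.mean():.5f} rad, sigma_z = {z.std():.5f} rad")
```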

---


## Linearisation and uncertainty projection in two dimensions 

see chapter 13.4 from the book.

For the nonlinear pendulum problem (chapter 12.3) the state vector has two components: the angle $\theta_n$ and the angular velocity $\dot{\theta}_n$. The state vector $\mathbf{x_n}$ is:

$$
\mathbf{x_n} = \left[\begin{array}{c}
\theta_n \\ \dot{\theta}_n
\end{array}\right]
$$

**measurement**

The measurement $z_n$ depends only on the angle $\theta_n$:

$$
z_n = h(\theta_n) =  L \cdot sin \left(\theta_n \right)
$$

The covariance matrix of the state vector $\mathbf{x_n}$ is denoted $\mathbf{P}_{n,n}$ . For the covariance matrix $\mathbf{P}_{z_n}$ of the measurement uncertainty we compute the partial derivatives 

$$
\left[\begin{array}{cc}
\frac{\partial h}{\partial \theta_n} & \frac{\partial h}{\partial \dot{\theta}_n}
\end{array}\right] = \left[\begin{array}{cc}
L \cdot cos \left(\theta_n \right) & 0
\end{array}\right]
$$

$$
\mathbf{P}_{z_n} = \left[\begin{array}{cc}
L \cdot cos \left(\theta_n \right) & 0
\end{array}\right] \cdot \mathbf{P}_{n,n} \cdot \left[\begin{array}{c}
L \cdot cos \left(\theta_n \right) \\ 0
\end{array}\right]
$$
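The product above reduces to a scalar, because the second entry of the row vector is zero. A minimal sketch with assumed values for $L$, $\theta_n$ and $\mathbf{P}_{n,n}$ (chosen only for illustration) shows this:

```python
import numpy as np

# Assumed values (illustration only)
L = 0.5        # pendulum length
theta = 0.2    # angle at which the measurement function is linearised
P = np.array([[0.01, 0.002],
              [0.002, 0.04]])   # state covariance P_{n,n}

# row vector of partial derivatives of h with respect to (theta, theta_dot)
H = np.array([[L * np.cos(theta), 0.0]])

# propagated measurement covariance (1 x 1 matrix)
P_z = H @ P @ H.T

print("P_z =", P_z[0, 0])
print("(L cos(theta))^2 * P[0,0] =", (L * np.cos(theta))**2 * P[0, 0])
```

Only the variance of the angle, `P[0, 0]`, contributes; the uncertainty of the angular velocity does not enter because $h$ does not depend on $\dot{\theta}_n$.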

**dynamic model**

The state extrapolation equation is non-linear.

$$\begin{align}
\hat{\mathbf{x}}_{n+1,n} &= \mathbf{f}\left(\hat{\mathbf{x}}_{n,n}  \right) \\
&= \left[\begin{array}{c}
\hat{\theta}_{n+1,n} \\
\hat{\dot{\theta}}_{n+1,n} 
\end{array}\right] = \left[\begin{array}{c}
\hat{\theta}_{n,n} + \hat{\dot{\theta}}_{n,n} \cdot \Delta t \\
\hat{\dot{\theta}}_{n,n} - \frac{g}{L} \cdot sin(\hat{\theta}_{n,n}) \cdot \Delta t
\end{array}\right] = \left[\begin{array}{c}
f_1(\hat{\theta}_{n,n},\ \hat{\dot{\theta}}_{n,n}) \\
f_2(\hat{\theta}_{n,n},\ \hat{\dot{\theta}}_{n,n})
\end{array}\right]
\end{align}
$$

For the partial derivatives we obtain in matrix form (`Jacobian` matrix):

$$
\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \left[\begin{array}{cc}
\frac{\partial {f_1}}{\partial \hat{\theta}_{n,n}} & \frac{\partial {f_1}}{\partial \hat{\dot{\theta}}_{n,n}}  \\
\frac{\partial {f_2}}{\partial \hat{\theta}_{n,n}} & \frac{\partial {f_2}}{\partial \hat{\dot{\theta}}_{n,n}} 
\end{array}\right] = \left[\begin{array}{cc}
1 & \Delta t \\
- \frac{g}{L} \cdot cos(\hat{\theta}_{n,n}) \cdot \Delta t & 1 
\end{array}\right]
$$
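The analytic Jacobian can be verified against a finite-difference approximation and then used to propagate the state covariance with $\mathbf{F} \cdot \mathbf{P}_{n,n} \cdot \mathbf{F}^T$. In the sketch below the numerical values and the helper `numerical_jacobian` are assumptions for illustration; a process noise term, which a complete filter would typically add, is left out.

```python
import numpy as np

# Assumed parameters (illustration only)
g = 9.81    # gravitational acceleration
L = 0.5     # pendulum length
dt = 0.05   # time step

def f(x):
    """State extrapolation for the state x = (theta, theta_dot)."""
    theta, theta_dot = x
    return np.array([theta + theta_dot * dt,
                     theta_dot - (g / L) * np.sin(theta) * dt])

def jacobian(x):
    """Analytic Jacobian of f, evaluated at x."""
    theta, _ = x
    return np.array([[1.0, dt],
                     [-(g / L) * np.cos(theta) * dt, 1.0]])

def numerical_jacobian(func, x, h=1e-6):
    """Central-difference Jacobian, used only as a cross-check."""
    x = np.asarray(x, dtype=float)
    J = np.zeros((func(x).size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        J[:, j] = (func(x + step) - func(x - step)) / (2.0 * h)
    return J

x_hat = np.array([0.3, -0.1])      # current state estimate
P = np.array([[0.01, 0.002],
              [0.002, 0.04]])      # current state covariance

F = jacobian(x_hat)
P_pred = F @ P @ F.T               # propagated covariance (no process noise)

print("Jacobians agree:", np.allclose(F, numerical_jacobian(f, x_hat), atol=1e-6))
print("P_pred =\n", P_pred)
```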

