# Kalman Filtering / Part 2

This notebook is based on the article

`An Elementary Introduction to Kalman Filtering`, authors: Yan Pei, Donald S. Fussell, Swarnendu Biswas, Keshav Pingali

I tried to understand most parts of the article.

## Vector Estimates


Two random vectors $\mathbf{x}_1$ and $\mathbf{x}_2$ shall be combined into another random vector $\mathbf{y}$.

$$
\mathbf{y} = \mathbf{A}_1 \cdot \mathbf{x}_1 + \mathbf{A}_2 \cdot \mathbf{x}_2
$$

Vectors $\mathbf{x}_1$ and $\mathbf{x}_2$ each have `M` elements/components. Each component represents a random variable. The means of vectors $\mathbf{x}_1$ and $\mathbf{x}_2$ are again vectors. They are denoted $\mathbf{\mu_{x_1}}$ and $\mathbf{\mu_{x_2}}$.

Matrices $\mathbf{A}_1$ and $\mathbf{A}_2$ act as weighting factors/matrices of these vectors. Moreover, these matrices shall be square matrices ($M \times M$). Thus vector $\mathbf{y}$ has `M` elements/components. 


The mean of vector $\mathbf{y}$ is a vector denoted $\mathbf{\mu_{y}}$.

**mean value of $\mathbf{y}$**

$$
E(\mathbf{y}) = \mathbf{\mu_{y}} = \mathbf{A}_1 \cdot E(\mathbf{x}_1) + \mathbf{A}_2 \cdot E(\mathbf{x}_2) = \mathbf{A}_1 \cdot \mathbf{\mu_{x_1}} + \mathbf{A}_2 \cdot \mathbf{\mu_{x_2}}
$$


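**covariance matrix of $\mathbf{y}$**

In the following derivation $\mathbf{x}_{1(c)} = \mathbf{x}_1 - \mathbf{\mu_{x_1}}$ and $\mathbf{x}_{2(c)} = \mathbf{x}_2 - \mathbf{\mu_{x_2}}$ denote the centred vectors, with covariance matrices $\mathbf{\Sigma}_1 = E\left( \mathbf{x}_{1(c)} \cdot \mathbf{x}_{1(c)}^T \right)$ and $\mathbf{\Sigma}_2 = E\left( \mathbf{x}_{2(c)} \cdot \mathbf{x}_{2(c)}^T \right)$. Vectors $\mathbf{x}_1$ and $\mathbf{x}_2$ are assumed to be uncorrelated, so the cross terms $E\left( \mathbf{x}_{1(c)} \cdot \mathbf{x}_{2(c)}^T \right)$ and $E\left( \mathbf{x}_{2(c)} \cdot \mathbf{x}_{1(c)}^T \right)$ vanish.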

$$\begin{align}
E\left( \left(\mathbf{y} - \mathbf{\mu_{y}} \right) \cdot \left(\mathbf{y} - \mathbf{\mu_{y}} \right)^T \right) &= E\left( \left(\mathbf{A}_1 \cdot (\mathbf{x}_1 - \mathbf{\mu_{x_1}}) + \mathbf{A}_2 \cdot (\mathbf{x}_2 - \mathbf{\mu_{x_2}})\right) \cdot \left(\mathbf{A}_1 \cdot (\mathbf{x}_1 - \mathbf{\mu_{x_1}}) + \mathbf{A}_2 \cdot (\mathbf{x}_2 - \mathbf{\mu_{x_2}})\right)^T \right) \\
\ &= E\left( \left(\mathbf{A}_1 \cdot \mathbf{x}_{1(c)} + \mathbf{A}_2 \cdot \mathbf{x}_{2(c)}\right) \cdot \left(\mathbf{A}_1 \cdot \mathbf{x}_{1(c)} + \mathbf{A}_2 \cdot \mathbf{x}_{2(c)}\right)^T  \right) = \mathbf{A_1} \cdot E\left( \mathbf{x}_{1(c)} \cdot \mathbf{x}_{1(c)}^T \right) \cdot \mathbf{A_1}^T + \mathbf{A_2} \cdot E\left( \mathbf{x}_{2(c)} \cdot \mathbf{x}_{2(c)}^T \right) \cdot \mathbf{A_2}^T \\
\ \\
\mathbf{\Sigma}_y &= E\left( \left(\mathbf{y} - \mathbf{\mu_{y}} \right) \cdot \left(\mathbf{y} - \mathbf{\mu_{y}} \right)^T \right) = \mathbf{A}_1 \cdot \mathbf{\Sigma}_1 \cdot \mathbf{A_1}^T + \mathbf{A}_2 \cdot \mathbf{\Sigma}_2 \cdot \mathbf{A_2}^T
\end{align}
$$
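The formulas for $\mathbf{\mu_y}$ and $\mathbf{\Sigma}_y$ can be checked numerically. The following is only a minimal Monte Carlo sketch: the means, covariance matrices and weighting matrices are made-up illustration values, and Gaussian samples are an assumption of the demo (the formulas themselves only require uncorrelated vectors with the given means and covariances).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000      # number of Monte Carlo samples (illustrative choice)

# Hypothetical means and covariance matrices, chosen only for this demo
mu_1 = np.array([1.0, -2.0])
mu_2 = np.array([0.5, 3.0])
Sigma_1 = np.array([[2.0, 0.3], [0.3, 1.0]])
Sigma_2 = np.array([[1.0, -0.2], [-0.2, 0.5]])

# Arbitrary square weighting matrices (not yet the optimum ones)
A_1 = np.array([[0.6, 0.1], [0.0, 0.3]])
A_2 = np.array([[0.4, -0.1], [0.0, 0.7]])

# Independent draws, so x_1 and x_2 are uncorrelated
x_1 = rng.multivariate_normal(mu_1, Sigma_1, size=N)   # shape (N, 2)
x_2 = rng.multivariate_normal(mu_2, Sigma_2, size=N)

# y = A_1 x_1 + A_2 x_2, applied row-wise to all samples
y = x_1 @ A_1.T + x_2 @ A_2.T

# Values predicted by the formulas above
mu_y_formula = A_1 @ mu_1 + A_2 @ mu_2
Sigma_y_formula = A_1 @ Sigma_1 @ A_1.T + A_2 @ Sigma_2 @ A_2.T

# Values estimated from the samples
mu_y_sample = y.mean(axis=0)
Sigma_y_sample = np.cov(y, rowvar=False)

# Largest absolute deviations between sample estimates and formulas
mean_error = np.abs(mu_y_sample - mu_y_formula).max()
cov_error = np.abs(Sigma_y_sample - Sigma_y_formula).max()
print(mean_error, cov_error)
```

Both deviations should be small and shrink further as `N` grows.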

For the rest of this notebook we assume that weighting matrices sum up to the identity matrix:

$$
\mathbf{A}_1 + \mathbf{A}_2 = \mathbf{I}
$$


---


## Optimum weighting matrices

With the constraint

$$
\mathbf{A}_1 + \mathbf{A}_2 = \mathbf{I}
$$

the estimate $\mathbf{y}$ can be expressed by the equation:

$$
\mathbf{y} = \mathbf{A}_1 \cdot \mathbf{x}_1 + \left(\mathbf{I} - \mathbf{A}_1 \right) \cdot \mathbf{x}_2
$$

$$
\mathbf{y} - \mathbf{\mu_y}= \mathbf{A}_1 \cdot \mathbf{x}_1 + \left(\mathbf{I} - \mathbf{A}_1 \right) \cdot \mathbf{x}_2 - \mathbf{\mu_y}
$$

Expressing mean value $\mathbf{\mu_y}$ by the weighted addition of mean values $\mathbf{\mu_{x_1}}$ and $\mathbf{\mu_{x_2}}$:

$$
\mathbf{\mu_y} = \mathbf{A}_1 \cdot \mathbf{\mu_{x_1}} + \mathbf{A}_2 \cdot \mathbf{\mu_{x_2}} = \mathbf{A}_1 \cdot \mathbf{\mu_{x_1}} + \left(\mathbf{I} - \mathbf{A}_1 \right) \cdot \mathbf{\mu_{x_2}}
$$

and therefore

$$
\mathbf{y} - \mathbf{\mu_y}= \mathbf{A}_1 \cdot \underbrace{\left(\mathbf{x}_1 - \mathbf{\mu_{x_1}}\right)}_{\mathbf{x}_{1(c)}}+ \left(\mathbf{I} - \mathbf{A}_1 \right) \cdot \underbrace{\left(\mathbf{x}_2 - \mathbf{\mu_{x_2}}\right)}_{\mathbf{x}_{2(c)}} = \mathbf{A}_1 \cdot \mathbf{x}_{1(c)} + \left(\mathbf{I} - \mathbf{A}_1 \right) \cdot \mathbf{x}_{2(c)} 
$$

Weighting matrix $\mathbf{A}_1$ shall be chosen so as to minimise the expectation of the squared norm $||\mathbf{y} - \mathbf{\mu_y}||^2$:

$$
E\left(\left(\mathbf{y} - \mathbf{\mu_y}\right)^T \cdot \left(\mathbf{y} - \mathbf{\mu_y}\right) \right)
$$

Before computing the optimum weighting matrix, the expression
$$
\left(\mathbf{y} - \mathbf{\mu_y}\right)^T \cdot \left(\mathbf{y} - \mathbf{\mu_y}\right)
$$
is transformed into a form that is more suitable for computing the optimum weighting matrix. This transformation involves the steps outlined below:

**step#1**


$$\begin{align}
\left(\mathbf{y} - \mathbf{\mu_y}\right)^T \cdot \left(\mathbf{y} - \mathbf{\mu_y}\right) &= \left(\mathbf{x}_{1(c)}^T \cdot \mathbf{A}_1^T  + \mathbf{x}_{2(c)}^T \cdot \left(\mathbf{I} - \mathbf{A}_1^T \right) \right) \cdot \left( \mathbf{A}_1 \cdot \mathbf{x}_{1(c)} + \left(\mathbf{I} - \mathbf{A}_1 \right) \cdot \mathbf{x}_{2(c)} \right) \\
\ &= \mathbf{x}_{1(c)}^T \cdot \mathbf{A}_1^T \cdot \mathbf{A}_1 \cdot \mathbf{x}_{1(c)} + \mathbf{x}_{2(c)}^T \cdot \left(\mathbf{I} - \mathbf{A}_1^T \right) \cdot \mathbf{A}_1 \cdot \mathbf{x}_{1(c)} + 
\mathbf{x}_{1(c)}^T \cdot \mathbf{A}_1^T \cdot \left(\mathbf{I} - \mathbf{A}_1 \right) \cdot \mathbf{x}_{2(c)}  + \mathbf{x}_{2(c)}^T \cdot \left(\mathbf{I} - \mathbf{A}_1^T \right) \cdot \left(\mathbf{I} - \mathbf{A}_1 \right) \cdot \mathbf{x}_{2(c)} 
\end{align}
$$

**step#2**

$$\begin{align}
\left(\mathbf{y} - \mathbf{\mu_y}\right)^T \cdot \left(\mathbf{y} - \mathbf{\mu_y}\right) &= \left(\mathbf{x}_{1(c)}^T \cdot \mathbf{A}_1^T  + \mathbf{x}_{2(c)}^T \cdot \left(\mathbf{I} - \mathbf{A}_1^T \right) \right) \cdot \left( \mathbf{A}_1 \cdot \mathbf{x}_{1(c)} + \left(\mathbf{I} - \mathbf{A}_1 \right) \cdot \mathbf{x}_{2(c)} \right) \\
\ &= \left(\mathbf{x}_{1(c)}^T \cdot \mathbf{A}_1^T  + \mathbf{x}_{2(c)}^T  - \mathbf{x}_{2(c)}^T \cdot \mathbf{A}_1^T \right) \cdot
\left(\mathbf{A}_1 \cdot \mathbf{x}_{1(c)} + \mathbf{x}_{2(c)} - \mathbf{A}_1 \cdot \mathbf{x}_{2(c)} \right)
\end{align}
$$

**step#3**

$$\begin{align}
\left(\mathbf{y} - \mathbf{\mu_y}\right)^T \cdot \left(\mathbf{y} - \mathbf{\mu_y}\right) &= \mathbf{x}_{1(c)}^T \cdot \mathbf{A}_1^T \cdot \mathbf{A}_1 \cdot \mathbf{x}_{1(c)}  + \mathbf{x}_{2(c)}^T \cdot \mathbf{A}_1 \cdot \mathbf{x}_{1(c)}  - \mathbf{x}_{2(c)}^T \cdot \mathbf{A}_1^T \cdot \mathbf{A}_1 \cdot \mathbf{x}_{1(c)} \\ 
 &+   \mathbf{x}_{1(c)}^T \cdot \mathbf{A}_1^T \cdot \mathbf{x}_{2(c)}  + \mathbf{x}_{2(c)}^T \cdot \mathbf{x}_{2(c)}  - \mathbf{x}_{2(c)}^T \cdot \mathbf{A}_1^T \cdot \mathbf{x}_{2(c)} \\ 
 &-  \mathbf{x}_{1(c)}^T \cdot \mathbf{A}_1^T \cdot \mathbf{A}_1 \cdot \mathbf{x}_{2(c)}  - \mathbf{x}_{2(c)}^T \cdot \mathbf{A}_1 \cdot \mathbf{x}_{2(c)}  + \mathbf{x}_{2(c)}^T \cdot \mathbf{A}_1^T \cdot \mathbf{A}_1 \cdot \mathbf{x}_{2(c)}
\end{align}
$$

**step#4**

$$\begin{align}
\left(\mathbf{y} - \mathbf{\mu_y}\right)^T \cdot \left(\mathbf{y} - \mathbf{\mu_y}\right) &= \mathbf{x}_{1(c)}^T \cdot \mathbf{A}_1^T \cdot \mathbf{A}_1 \cdot \mathbf{x}_{1(c)} - \mathbf{x}_{2(c)}^T \cdot \mathbf{A}_1^T \cdot \mathbf{A}_1 \cdot \mathbf{x}_{1(c)} - \mathbf{x}_{1(c)}^T \cdot \mathbf{A}_1^T \cdot \mathbf{A}_1 \cdot \mathbf{x}_{2(c)} + \mathbf{x}_{2(c)}^T \cdot \mathbf{A}_1^T \cdot \mathbf{A}_1 \cdot \mathbf{x}_{2(c)} \\
 &+ \mathbf{x}_{2(c)}^T \cdot \mathbf{A}_1 \cdot \mathbf{x}_{1(c)} + \mathbf{x}_{1(c)}^T \cdot \mathbf{A}_1^T \cdot \mathbf{x}_{2(c)}    - \mathbf{x}_{2(c)}^T \cdot \mathbf{A}_1^T \cdot \mathbf{x}_{2(c)}  - \mathbf{x}_{2(c)}^T \cdot \mathbf{A}_1 \cdot \mathbf{x}_{2(c)} + \mathbf{x}_{2(c)}^T \cdot \mathbf{x}_{2(c)}
\end{align}
$$

and finally

**step#5**

$$
\left(\mathbf{y} - \mathbf{\mu_y}\right)^T \cdot \left(\mathbf{y} - \mathbf{\mu_y}\right) = \left(\mathbf{x}_{1(c)}^T - \mathbf{x}_{2(c)}^T\right) \cdot \mathbf{A}_1^T \cdot \mathbf{A}_1 \cdot \left(\mathbf{x}_{1(c)} - \mathbf{x}_{2(c)}\right) + 2 \cdot \mathbf{x}_{2(c)}^T \cdot \mathbf{A}_1 \cdot \mathbf{x}_{1(c)} - 2 \cdot \mathbf{x}_{2(c)}^T \cdot \mathbf{A}_1 \cdot \mathbf{x}_{2(c)} + \mathbf{x}_{2(c)}^T \cdot \mathbf{x}_{2(c)}
$$
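Since the algebra in steps 1 to 5 is easy to get wrong, a quick numerical spot check may be reassuring. The sketch below draws an arbitrary matrix and two arbitrary vectors (random stand-ins for $\mathbf{A}_1$, $\mathbf{x}_{1(c)}$ and $\mathbf{x}_{2(c)}$, not data from the article) and compares the squared norm with the right-hand side of step#5.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 3
A_1 = rng.normal(size=(M, M))   # arbitrary weighting matrix
x1c = rng.normal(size=M)        # stands in for one realisation of x_{1(c)}
x2c = rng.normal(size=M)        # stands in for one realisation of x_{2(c)}
I = np.eye(M)

# Left-hand side: squared norm of y - mu_y
e = A_1 @ x1c + (I - A_1) @ x2c
lhs = e @ e

# Right-hand side of step#5
d = x1c - x2c
rhs = (d @ A_1.T @ A_1 @ d
       + 2 * x2c @ A_1 @ x1c
       - 2 * x2c @ A_1 @ x2c
       + x2c @ x2c)

print(lhs, rhs)   # both numbers should agree up to rounding
```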

Now the expression has been transformed into a form which makes it far easier to find the optimum matrix $\mathbf{A}_1$ that minimises the expectation $E\left(\left(\mathbf{y} - \mathbf{\mu_y}\right)^T \cdot \left(\mathbf{y} - \mathbf{\mu_y}\right) \right)$. The next step requires taking derivatives with respect to all elements of matrix $\mathbf{A}_1$. A short review of **matrix derivatives** is helpful in this context.

---


## Matrix derivatives

Let $f(\mathbf{X})$ be a scalar function of a matrix $\mathbf{X}$. Arranging all partial derivatives $\frac{\partial}{\partial x_{k,\ m}}f(\mathbf{X})$ as a matrix yields the matrix derivative. For a matrix $\mathbf{X} \in \mathbb{R}^{K \times M}$ the matrix derivative is again a $K \times M$ matrix.

For the application in this notebook we need two forms of matrix derivatives:

**case#1a**

Compute the matrix derivative

$$
\frac{\partial \mathbf{a}^T \mathbf{X} \mathbf{b}}{\partial \mathbf{X}}
$$

The scalar function is:

$$
f(\mathbf{X}) = \mathbf{a}^T \mathbf{X} \mathbf{b} = \sum_{i=1}^K \sum_{j=1}^M a_i \cdot x_{i,\ j} \cdot b_j
$$

Taking the derivative with respect to matrix element $x_{k,\ m}$ yields:

$$
\frac{\partial}{\partial x_{k,\ m}}f(\mathbf{X}) = a_k \cdot b_m
$$

Arranging these derivatives as a $K \times M$ matrix results in the outer product of vectors $\mathbf{a}$ and $\mathbf{b}$:

$$
\frac{\partial \mathbf{a}^T \mathbf{X} \mathbf{b}}{\partial \mathbf{X}} = \mathbf{a} \cdot \mathbf{b}^T
$$

**case#1b**

Compute the matrix derivative

$$
\frac{\partial \mathbf{b}^T \mathbf{X}^T \mathbf{a}}{\partial \mathbf{X}}
$$

The scalar function is:

$$
f(\mathbf{X}) = \mathbf{b}^T \mathbf{X}^T \mathbf{a} = \sum_{j=1}^M \sum_{i=1}^K b_j  \cdot x_{i,j} \cdot a_i
$$

Taking the derivative with respect to matrix element $x_{k,\ m}$ yields:

$$
\frac{\partial}{\partial x_{k,\ m}}f(\mathbf{X}) = a_k \cdot b_m
$$

So we get:

$$
\frac{\partial \mathbf{b}^T \mathbf{X}^T \mathbf{a}}{\partial \mathbf{X}} = \mathbf{a} \cdot \mathbf{b}^T
$$
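Both results of case#1 can be cross-checked with finite differences. The sketch below uses a small helper `numerical_gradient` (a hypothetical name introduced here, not part of NumPy) that perturbs one matrix element at a time; the vectors and the matrix are random illustration values.

```python
import numpy as np

def numerical_gradient(f, X, h=1e-6):
    """Central-difference approximation of d f(X) / d X, element by element."""
    G = np.zeros_like(X)
    for k in range(X.shape[0]):
        for m in range(X.shape[1]):
            E = np.zeros_like(X)
            E[k, m] = h                       # perturb only element (k, m)
            G[k, m] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(2)
K, M = 3, 4
a = rng.normal(size=K)
b = rng.normal(size=M)
X = rng.normal(size=(K, M))

grad_1a = numerical_gradient(lambda Z: a @ Z @ b, X)     # case#1a: a^T X b
grad_1b = numerical_gradient(lambda Z: b @ Z.T @ a, X)   # case#1b: b^T X^T a
outer = np.outer(a, b)                                    # a b^T

print(np.abs(grad_1a - outer).max(), np.abs(grad_1b - outer).max())
```

Both printed deviations should be close to zero, which is expected because the two scalar functions are identical.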


**case#2**

Compute the matrix derivative

$$
\frac{\partial \mathbf{a}^T \mathbf{X}^T \mathbf{X} \mathbf{b}}{\partial \mathbf{X}}
$$

The scalar function is:

$$
f(\mathbf{X}) = \mathbf{a}^T \mathbf{X}^T \mathbf{X} \mathbf{b} = \left(\mathbf{X} \cdot \mathbf{a} \right)^T \cdot \left( \mathbf{X} \cdot \mathbf{b} \right)
$$

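In the following, $\mathbf{X}$ is taken to be a square $N \times N$ matrix (as is the case for the weighting matrix $\mathbf{A}_1$). With vectors $\mathbf{c},\ \mathbf{d}$ defined by: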

$$\begin{gather}
\mathbf{c} = \mathbf{X} \cdot \mathbf{a} \\
\mathbf{d} = \mathbf{X} \cdot \mathbf{b}
\end{gather}
$$

the scalar function is re-expressed as:

$$
f(\mathbf{X}) = \mathbf{c}^T \cdot \mathbf{d}
$$

The `m-th` element $c_m$ of vector $\mathbf{c}$ is:

$$
c_m = \sum_{j=1}^N x_{m,\ j} \cdot a_j
$$

The `n-th` element $d_n$ of vector $\mathbf{d}$ is:

$$
d_n = \sum_{i=1}^N x_{n,\ i} \cdot b_i
$$

Putting these equations into the expression of the scalar function yields:

$$
f(\mathbf{X}) = \sum_{m=1}^N c_m \cdot d_m = \sum_{m=1}^N \left(\sum_{j=1}^N x_{m,\ j} \cdot a_j \right) \cdot \left(\sum_{i=1}^N x_{m,\ i} \cdot b_i \right)
$$

Taking the partial derivative of $f(\mathbf{X})$ with respect to matrix element $x_{k,\ n}$ yields:

$$
\frac{\partial f(\mathbf{X})}{\partial x_{k,\ n}} = a_n \cdot \left(\sum_{i=1}^N x_{k,\ i} \cdot b_i \right) + b_n \cdot \left(\sum_{i=1}^N x_{k,\ i} \cdot a_i \right) 
$$

Putting all derivatives into a matrix gives an expression for the matrix derivative:

$$
\frac{\partial \mathbf{a}^T \mathbf{X}^T \mathbf{X} \mathbf{b}}{\partial \mathbf{X}} = \mathbf{X} \cdot \left(\mathbf{a} \cdot \mathbf{b}^T + \mathbf{b} \cdot \mathbf{a}^T \right)
$$
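The result of case#2 can be checked in the same way. This is again only a sketch with random test data; `numerical_gradient` is a hypothetical helper defined in the block itself.

```python
import numpy as np

def numerical_gradient(f, X, h=1e-6):
    """Central-difference approximation of d f(X) / d X, element by element."""
    G = np.zeros_like(X)
    for k in range(X.shape[0]):
        for n in range(X.shape[1]):
            E = np.zeros_like(X)
            E[k, n] = h                       # perturb only element (k, n)
            G[k, n] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(3)
N = 3
a = rng.normal(size=N)
b = rng.normal(size=N)
X = rng.normal(size=(N, N))

# Scalar function of case#2: a^T X^T X b
f = lambda Z: a @ Z.T @ Z @ b

grad_numeric = numerical_gradient(f, X)
grad_formula = X @ (np.outer(a, b) + np.outer(b, a))   # X (a b^T + b a^T)

print(np.abs(grad_numeric - grad_formula).max())
```

For $\mathbf{a} = \mathbf{b}$ the formula reduces to $2 \cdot \mathbf{X} \cdot \mathbf{a} \cdot \mathbf{a}^T$, which is the form needed for the quadratic term of step#5.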

---


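Actually we need to compute the matrix derivative of the expectation

$$
\frac{\partial E\left( \left(\mathbf{y} - \mathbf{\mu_y}\right)^T \cdot \left(\mathbf{y} - \mathbf{\mu_y}\right)\right)}{\partial \mathbf{A}_1}
$$

However, in this notebook the matrix derivative is computed first. In a second step the expectation of the matrix derivative is computed. No formal proof is provided that the order of expectation and differentiation can be interchanged. Note, however, that the expression is a quadratic polynomial in the elements of $\mathbf{A}_1$, so for random vectors with finite second moments the interchange follows from the linearity of expectation.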

$$\begin{gather}
\frac{\partial \left(\mathbf{y} - \mathbf{\mu_y}\right)^T \cdot \left(\mathbf{y} - \mathbf{\mu_y}\right)}{\partial \mathbf{A}_1} =
2 \cdot  \mathbf{A}_1 \cdot \left(\mathbf{x}_{1(c)} - \mathbf{x}_{2(c)}\right) \cdot \left(\mathbf{x}_{1(c)} - \mathbf{x}_{2(c)}\right)^T + 2 \cdot \mathbf{x}_{2(c)} \cdot \mathbf{x}_{1(c)}^T - 2 \cdot \mathbf{x}_{2(c)} \cdot \mathbf{x}_{2(c)}^T \\
\ = 2 \cdot  \mathbf{A}_1 \cdot \mathbf{x}_{1(c)} \cdot \mathbf{x}_{1(c)}^T - 2 \cdot  \mathbf{A}_1 \cdot \mathbf{x}_{2(c)} \cdot \mathbf{x}_{1(c)}^T  
- 2 \cdot  \mathbf{A}_1 \cdot \mathbf{x}_{1(c)} \cdot \mathbf{x}_{2(c)}^T + 2 \cdot  \mathbf{A}_1 \cdot \mathbf{x}_{2(c)} \cdot \mathbf{x}_{2(c)}^T  + 2 \cdot \mathbf{x}_{2(c)} \cdot \mathbf{x}_{1(c)}^T - 2 \cdot \mathbf{x}_{2(c)} \cdot \mathbf{x}_{2(c)}^T
\end{gather}
$$

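Taking expectations and using the assumption that vectors $\mathbf{x}_{1(c)}$ and $\mathbf{x}_{2(c)}$ are uncorrelated, i.e. $E\left( \mathbf{x}_{1(c)} \cdot \mathbf{x}_{2(c)}^T \right) = E\left( \mathbf{x}_{2(c)} \cdot \mathbf{x}_{1(c)}^T \right) = \mathbf{0}$: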

$$\begin{gather}
E\left(\frac{\partial \left(\mathbf{y} - \mathbf{\mu_y}\right)^T \cdot \left(\mathbf{y} - \mathbf{\mu_y}\right)}{\partial \mathbf{A}_1}\right)  = 2 \cdot  \mathbf{A}_1 \cdot E\left( \mathbf{x}_{1(c)} \cdot \mathbf{x}_{1(c)}^T \right)   
 + 2 \cdot  \mathbf{A}_1 \cdot E\left(\mathbf{x}_{2(c)} \cdot \mathbf{x}_{2(c)}^T\right)  - 2 \cdot E\left( \mathbf{x}_{2(c)} \cdot \mathbf{x}_{2(c)}^T \right)
\end{gather}
$$

and simplifying notation using covariance matrices $\mathbf{\Sigma}_1 = E\left( \mathbf{x}_{1(c)} \cdot \mathbf{x}_{1(c)}^T \right)$ and $\mathbf{\Sigma}_2 = E\left( \mathbf{x}_{2(c)} \cdot \mathbf{x}_{2(c)}^T \right)$:


$$
E\left(\frac{\partial \left(\mathbf{y} - \mathbf{\mu_y}\right)^T \cdot \left(\mathbf{y} - \mathbf{\mu_y}\right)}{\partial \mathbf{A}_1}\right)  = 2 \cdot  \mathbf{A}_1 \cdot \left( \mathbf{\Sigma}_1 + \mathbf{\Sigma}_2 \right) - 2 \cdot \mathbf{\Sigma}_2
$$

Setting all derivatives to 0 yields:

$$\begin{align}
\mathbf{A}_1 \cdot \left( \mathbf{\Sigma}_1 + \mathbf{\Sigma}_2 \right) &= \mathbf{\Sigma}_2 \\
\mathbf{A}_1 &= \mathbf{\Sigma}_2 \cdot \left( \mathbf{\Sigma}_1 + \mathbf{\Sigma}_2 \right)^{-1}
\end{align}
$$

Matrix $\mathbf{A}_2$ is computed from

$$\begin{align}
\mathbf{A}_2 &= \mathbf{I} - \mathbf{A}_1 \\
&= \left( \mathbf{\Sigma}_1 + \mathbf{\Sigma}_2 \right) \cdot \left( \mathbf{\Sigma}_1 + \mathbf{\Sigma}_2 \right)^{-1} - \mathbf{\Sigma}_2 \cdot \left( \mathbf{\Sigma}_1 + \mathbf{\Sigma}_2 \right)^{-1} \\
&= \left(\mathbf{\Sigma}_1 + \mathbf{\Sigma}_2 - \mathbf{\Sigma}_2 \right) \cdot \left( \mathbf{\Sigma}_1 + \mathbf{\Sigma}_2 \right)^{-1} \\
&= \mathbf{\Sigma}_1 \cdot \left( \mathbf{\Sigma}_1 + \mathbf{\Sigma}_2 \right)^{-1}
\end{align}
$$
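The optimum weights can be tried out on a small example. The sketch below uses two made-up covariance matrices, evaluates the cost via the identity $E\left(\mathbf{e}^T \cdot \mathbf{e}\right) = \mathrm{tr}\left(E\left(\mathbf{e} \cdot \mathbf{e}^T\right)\right)$ with $\mathbf{e} = \mathbf{y} - \mathbf{\mu_y}$, and checks that random perturbations of $\mathbf{A}_1$ do not lower the cost. The function name `cost` and the perturbation size are illustrative choices, not taken from the article.

```python
import numpy as np

def cost(A_1, Sigma_1, Sigma_2):
    """E(||y - mu_y||^2) = trace(Sigma_y) for uncorrelated x_1 and x_2."""
    I = np.eye(A_1.shape[0])
    Sigma_y = A_1 @ Sigma_1 @ A_1.T + (I - A_1) @ Sigma_2 @ (I - A_1).T
    return np.trace(Sigma_y)

# Hypothetical covariance matrices (symmetric, positive definite)
Sigma_1 = np.array([[2.0, 0.3], [0.3, 1.0]])
Sigma_2 = np.array([[1.0, -0.2], [-0.2, 0.5]])
S = Sigma_1 + Sigma_2

# A_1 = Sigma_2 S^{-1} and A_2 = Sigma_1 S^{-1}; instead of inverting S,
# solve the transposed systems S^T A^T = Sigma^T
A_1 = np.linalg.solve(S.T, Sigma_2.T).T
A_2 = np.linalg.solve(S.T, Sigma_1.T).T

# Expected matrix derivative at the optimum (should vanish)
gradient = 2 * A_1 @ S - 2 * Sigma_2

# Cost at the optimum versus cost at randomly perturbed weighting matrices
rng = np.random.default_rng(4)
best = cost(A_1, Sigma_1, Sigma_2)
perturbed = [cost(A_1 + 0.1 * rng.normal(size=A_1.shape), Sigma_1, Sigma_2)
             for _ in range(100)]

print(np.abs(A_1 + A_2 - np.eye(2)).max())   # constraint A_1 + A_2 = I
print(np.abs(gradient).max())                # stationarity
print(best, min(perturbed))                  # perturbed costs are larger
```

Solving a linear system rather than forming the inverse explicitly is a common numerical habit; for matrices this small either approach would do.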

---
---

## Summary / Sum of 2 Vector Estimates

Here is an overview of the basic facts about summing two vector estimates with optimum weighting matrices:


$$
\mathbf{y} = \mathbf{A}_1 \cdot \mathbf{x}_1 + \left(\mathbf{I} - \mathbf{A}_1 \right) \cdot \mathbf{x}_2
$$

In this equation we used the constraint $\mathbf{A}_1 + \mathbf{A}_2 = \mathbf{I}$.

In the literature the expression $\mathbf{I} - \mathbf{A}_1$ is often referred to as the `Kalman` gain $\mathbf{K}$.

$$
\mathbf{K} = \mathbf{I} - \mathbf{A}_1 = \mathbf{\Sigma}_1 \cdot \left( \mathbf{\Sigma}_1 + \mathbf{\Sigma}_2 \right)^{-1}
$$

For the optimum weighting matrix the following relationship to the covariance matrices $\mathbf{\Sigma}_1$ and $\mathbf{\Sigma}_2$ has been derived:

$$
\mathbf{A}_1 = \mathbf{\Sigma}_2 \cdot \left( \mathbf{\Sigma}_1 + \mathbf{\Sigma}_2 \right)^{-1}
$$

**mean value of $\mathbf{y}$**

$$\begin{align*}
\mathbf{\mu_{y}} &= \left(\mathbf{I} - \mathbf{K}\right) \cdot \mathbf{\mu_{x_1}} + \mathbf{K} \cdot \mathbf{\mu_{x_2}} \\
\mathbf{\mu_{y}} &= \mathbf{\mu_{x_1}} + \mathbf{K} \cdot \left(\mathbf{\mu_{x_2}} - \mathbf{\mu_{x_1}}\right)
\end{align*}
$$


**Covariance Matrix**

The covariance matrix $\mathbf{\Sigma}_y$ is computed from:

$$
\mathbf{\Sigma_y} = E( \left(\mathbf{y} - \mathbf{\mu_y}\right) \cdot \left(\mathbf{y} - \mathbf{\mu_y}\right)^T ) 
$$

It is related to weighting matrix $\mathbf{A}_1$ and covariance matrices $\mathbf{\Sigma}_1$ and $\mathbf{\Sigma}_2$ by this equation:

$$
\mathbf{\Sigma_y} = \mathbf{A}_1 \cdot \mathbf{\Sigma_1} \cdot \mathbf{A}_1^T + \left(\mathbf{I} - \mathbf{A}_1 \right) \cdot \mathbf{\Sigma_2} \cdot \left(\mathbf{I} - \mathbf{A}_1^T \right)
$$

Inserting the expression for the optimum weighting matrix yields:

$$\begin{align}
\mathbf{\Sigma_y} &= \left(\mathbf{I} - \mathbf{K}  \right) \cdot \mathbf{\Sigma_1} \cdot \left(\mathbf{I} - \mathbf{K}  \right)^T + \mathbf{K} \cdot \mathbf{\Sigma_2} \cdot \mathbf{K}^T \\
  &= \mathbf{\Sigma_1} - 2 \cdot \mathbf{\Sigma_1} \cdot \mathbf{K}^T + \mathbf{K} \cdot \left(\mathbf{\Sigma_1} + \mathbf{\Sigma_2}\right) \cdot \mathbf{K}^T = \mathbf{\Sigma_1} - 2 \cdot \mathbf{\Sigma_1} \cdot \mathbf{K}^T + \mathbf{\Sigma}_1 \cdot \left( \mathbf{\Sigma}_1 + \mathbf{\Sigma}_2 \right)^{-1} \cdot \left(\mathbf{\Sigma_1} + \mathbf{\Sigma_2}\right) \cdot \mathbf{K}^T \\
\mathbf{\Sigma_y} &= \mathbf{\Sigma_1} - \mathbf{K} \cdot \mathbf{\Sigma_1} 
\end{align}
$$

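In the second and in the last equation we exploited the symmetry property of covariance matrices ($\mathbf{\Sigma} = \mathbf{\Sigma}^T$), which implies $\mathbf{K} \cdot \mathbf{\Sigma}_1 = \mathbf{\Sigma}_1 \cdot \mathbf{K}^T$.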
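The summary can be condensed into a few lines of code. The function `fuse` below is a hypothetical helper written for this notebook (its name and signature are not from the article); it assumes that the two estimates are uncorrelated and that $\mathbf{\Sigma}_1 + \mathbf{\Sigma}_2$ is invertible. The numbers are made up so that the result can be verified by hand.

```python
import numpy as np

def fuse(mu_1, Sigma_1, mu_2, Sigma_2):
    """Fuse two uncorrelated vector estimates with the optimum weights."""
    S = Sigma_1 + Sigma_2
    # K = Sigma_1 (Sigma_1 + Sigma_2)^{-1}, computed from S^T K^T = Sigma_1^T
    K = np.linalg.solve(S.T, Sigma_1.T).T
    mu_y = mu_1 + K @ (mu_2 - mu_1)
    Sigma_y = Sigma_1 - K @ Sigma_1
    return mu_y, Sigma_y, K

# Illustrative estimates: the first one is very uncertain in component 0
mu_1 = np.array([1.0, 2.0])
Sigma_1 = np.diag([4.0, 1.0])
mu_2 = np.array([3.0, 2.0])
Sigma_2 = np.diag([1.0, 1.0])

mu_y, Sigma_y, K = fuse(mu_1, Sigma_1, mu_2, Sigma_2)

# Cross-check against the long form of the covariance matrix
I = np.eye(2)
Sigma_y_long = (I - K) @ Sigma_1 @ (I - K).T + K @ Sigma_2 @ K.T

print(mu_y)
print(Sigma_y)
```

With these diagonal covariance matrices each component behaves like the scalar case: the gain is $\mathrm{diag}(0.8,\ 0.5)$, the fused mean is $(2.6,\ 2.0)$ and the fused covariance is $\mathrm{diag}(0.8,\ 0.5)$, so each fused variance is smaller than both input variances.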
