In [None]:
import resources.workspace as ws

$
% START OF MACRO DEF
% DO NOT EDIT IN INDIVIDUAL NOTEBOOKS, BUT IN macros.py
%
\newcommand{\Reals}{\mathbb{R}}
\newcommand{\Expect}[0]{\mathbb{E}}
\newcommand{\NormDist}{\mathcal{N}}
%
\newcommand{\DynMod}[0]{\mathscr{M}}
\newcommand{\ObsMod}[0]{\mathscr{H}}
%
\newcommand{\mat}[1]{{\mathbf{{#1}}}}
%\newcommand{\mat}[1]{{\pmb{\mathsf{#1}}}}
\newcommand{\bvec}[1]{{\mathbf{#1}}}
%
\newcommand{\trsign}{{\mathsf{T}}}
\newcommand{\tr}{^{\trsign}}
\newcommand{\tn}[1]{#1}
\newcommand{\ceq}[0]{\mathrel{≔}}
%
\newcommand{\I}[0]{\mat{I}}
\newcommand{\K}[0]{\mat{K}}
\newcommand{\bP}[0]{\mat{P}}
\newcommand{\bH}[0]{\mat{H}}
\newcommand{\bF}[0]{\mat{F}}
\newcommand{\R}[0]{\mat{R}}
\newcommand{\Q}[0]{\mat{Q}}
\newcommand{\B}[0]{\mat{B}}
\newcommand{\C}[0]{\mat{C}}
\newcommand{\Ri}[0]{\R^{-1}}
\newcommand{\Bi}[0]{\B^{-1}}
\newcommand{\X}[0]{\mat{X}}
\newcommand{\A}[0]{\mat{A}}
\newcommand{\Y}[0]{\mat{Y}}
\newcommand{\E}[0]{\mat{E}}
\newcommand{\U}[0]{\mat{U}}
\newcommand{\V}[0]{\mat{V}}
%
\newcommand{\x}[0]{\bvec{x}}
\newcommand{\y}[0]{\bvec{y}}
\newcommand{\z}[0]{\bvec{z}}
\newcommand{\q}[0]{\bvec{q}}
\newcommand{\br}[0]{\bvec{r}}
\newcommand{\bb}[0]{\bvec{b}}
%
\newcommand{\bx}[0]{\bvec{\bar{x}}}
\newcommand{\by}[0]{\bvec{\bar{y}}}
\newcommand{\barB}[0]{\mat{\bar{B}}}
\newcommand{\barP}[0]{\mat{\bar{P}}}
\newcommand{\barC}[0]{\mat{\bar{C}}}
\newcommand{\barK}[0]{\mat{\bar{K}}}
%
\newcommand{\D}[0]{\mat{D}}
\newcommand{\Dobs}[0]{\mat{D}_{\text{obs}}}
\newcommand{\Dmod}[0]{\mat{D}_{\text{mod}}}
%
\newcommand{\ones}[0]{\bvec{1}}
\newcommand{\AN}[0]{\big( \I_N - \ones \ones\tr / N \big)}
%
% END OF MACRO DEF
$
In this tutorial we shall derive:

# the Kalman filter for multivariate systems.  

The [forecast step](T3%20-%20Univariate%20Kalman%20filtering.ipynb#The-forecast-step)
remains essentially unchanged.
The only differences are that $\DynMod$ is now a matrix, and that the covariance equation uses the transpose ${}^T$:
$\begin{align}
\bb_k
&= \DynMod_{k-1} \hat{\x}_{k-1} \, , \tag{1a} \\\
\B_k
&= \DynMod_{k-1} \bP_{k-1} \DynMod_{k-1}^T + \Q_{k-1} \, . \tag{1b}
\end{align}$
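
As a minimal sketch of eqns (1a) and (1b), the forecast step could be written with NumPy as follows. The function name `forecast` and all numerical values are hypothetical and chosen only for illustration; `M` stands for $\DynMod_{k-1}$.

```python
import numpy as np

def forecast(x_hat, P, M, Q):
    """Forecast step: propagate the mean and covariance, eqns (1a) and (1b)."""
    b = M @ x_hat          # (1a): forecast mean
    B = M @ P @ M.T + Q    # (1b): forecast covariance
    return b, B

# Hypothetical 2-dimensional linear model (made-up values).
M = np.array([[1.0, 0.1],
              [0.0, 1.0]])
Q = 0.01 * np.eye(2)       # model noise covariance
x_hat = np.array([1.0, 2.0])
P = np.eye(2)

b, B = forecast(x_hat, P, M, Q)
print(b)
print(B)
```

Note that `B` stays symmetric as long as `P` and `Q` are.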

However, the analysis step [[Exc 2.18](T2%20-%20Bayesian%20inference%20%26%20Gaussians.ipynb#Exc--2.18-'Gaussian-Bayes':)] gets a little more complicated...

#### Exc 2 (The likelihood):
<mark><font size="-1">
The analysis step is only concerned with a single time (index). We therefore drop the $k$ subscript in the following.
</font></mark>

Suppose the observation, $\y$, is related to the true state, $\x$, via a (possibly rectangular) matrix, $\bH$:
\begin{align*}
\y &= \bH \x + \br \, , \;\; \qquad (2)
\end{align*}
where the noise follows the law $\br \sim \NormDist(\bvec{0}, \R)$ for some $\R>0$ (i.e. $\R$ is symmetric-positive-definite).


Derive the expression for the likelihood, $p(\y|\x)$.
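
To make the observation model (2) concrete before deriving the likelihood, here is a small sketch that simulates it. The sizes (3 state variables, 2 observations) and all values are assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical rectangular observation matrix: 2 observations of 3 state variables.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.5, 0.5]])
# Hypothetical SPD observation-noise covariance.
R = np.array([[0.5, 0.1],
              [0.1, 0.2]])
x = np.array([1.0, 2.0, 3.0])   # the "true" state

# Draw r ~ N(0, R) and form y = H x + r, eqn (2).
r = rng.multivariate_normal(np.zeros(2), R)
y = H @ x + r

# With many draws, the sample covariance of the noise should approach R.
samples = rng.multivariate_normal(np.zeros(2), R, size=100_000)
print(np.cov(samples, rowvar=False))
```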

In [None]:
# ws.show_answer('Likelihood derivation')

The following exercise derives the analysis step.

#### Exc 4 (The 'precision' form of the KF):
Similarly to [Exc 2.18](T2%20-%20Bayesian%20inference%20%26%20Gaussians.ipynb#Exc--2.18-'Gaussian-Bayes':),
it may be shown that the prior $p(\x) = \NormDist(\x \mid \bb,\B)$
and likelihood $p(\y|\x) = \NormDist(\y \mid \bH \x,\R)$,
yield the posterior:
\begin{align}
p(\x|\y)
&= \NormDist(\x \mid \hat{\x}, \bP) \tag{4}
\, ,
\end{align}
where the posterior/analysis mean (vector) and covariance (matrix) are given by:
\begin{align}
\bP &= (\bH\tr \Ri \bH + \Bi)^{-1} \, , \tag{5} \\
\hat{\x} &= \bP\left[\bH\tr \Ri \y + \Bi \bb\right] \, . \tag{6}
\end{align}
Prove eqns (4-6).  
Hint: as in [Exc 2.18](T2%20-%20Bayesian%20inference%20%26%20Gaussians.ipynb#Exc--2.18-'Gaussian-Bayes':), the main part lies in "completing the square" in $\x$.
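
Once eqns (5) and (6) are established, they translate almost literally into code. The following is a sketch only: the function name `analysis_precision` and the test values are hypothetical, and explicit inverses are used for readability rather than efficiency.

```python
import numpy as np

def analysis_precision(b, B, H, R, y):
    """'Precision' form of the KF analysis step, eqns (5) and (6)."""
    Ri = np.linalg.inv(R)
    Bi = np.linalg.inv(B)
    P = np.linalg.inv(H.T @ Ri @ H + Bi)    # (5): posterior covariance
    x_hat = P @ (H.T @ Ri @ y + Bi @ b)     # (6): posterior mean
    return x_hat, P

# Hypothetical example: direct observation of both state components,
# with unit prior and observation covariances.
b = np.zeros(2)
B = np.eye(2)
H = np.eye(2)
R = np.eye(2)
y = np.array([2.0, 4.0])

x_hat, P = analysis_precision(b, B, H, R, y)
print(x_hat)   # halfway between prior mean and observation: [1. 2.]
print(P)       # half the prior covariance
```

With equal prior and observation uncertainty, the posterior mean lands midway between $\bb$ and $\y$, as in the univariate case.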

In [None]:
# ws.show_answer('KF precision')

<mark><font size="-1">
We have now derived (one form of) the Kalman filter. In the multivariate case,
we know how to:
<ul>
  <li>Propagate our estimate of $\x$ to the next time step using eqns (1a) and (1b). </li>
  <li>Update our estimate of $\x$ by assimilating the latest observation $\y$, using eqns (5) and (6).</li>
</ul>
</font></mark>

However, the computations can be pretty expensive...

#### Exc 5: Suppose $\x$ is $M$-dimensional and has a covariance matrix $\B$.
 * (a). What's the size of $\B$?
 * (b). How many "flops" (approximately, i.e. to leading order) are required  
 to compute the "precision form" of the KF update equation, eqn (5) ?
 * (c). How much memory (bytes) is required to hold its covariance matrix $\B$ ?
 * (d). How many megabytes is this if $M$ is a million?
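
One way to check your answers to (a) and (c) empirically is to build a covariance matrix for a small, hypothetical $M$ and ask NumPy for its shape and size. The sketch assumes double precision (`float64`) storage.

```python
import numpy as np

M = 1000                 # hypothetical (small) state dimension
B = np.eye(M)            # stand-in covariance matrix, dtype float64

print(B.shape)           # dimensions of B
print(B.itemsize)        # bytes per matrix element
print(B.nbytes)          # total bytes used by B
```

Try a few values of `M` and see how `B.nbytes` scales before extrapolating to a million.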

In [None]:
# ws.show_answer('Cov memory')

This is one of the principal reasons why the basic (extended) KF is infeasible for large-scale data assimilation (DA).  
The following derives another, often more practical, form of the KF analysis update.

#### Exc 6 (The "Woodbury" matrix inversion identity):
The following is known as the Sherman-Morrison-Woodbury lemma/identity,
$$\begin{align}
    \bP = \left( \B^{-1} + \V\tr \R^{-1} \U \right)^{-1}
    =
    \B - \B \V\tr \left( \R + \U \B \V\tr \right)^{-1} \U \B \, ,
    \tag{W}
\end{align}$$
which holds for any (suitably shaped) matrices
$\B$, $\R$, $\V$, $\U$ *such that the above exists*.

Prove the identity. Hint: don't derive it, just prove it!
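
A numerical check is no substitute for the proof, but it is a useful sanity test of (W). The sketch below uses randomly generated matrices; the sizes `m` and `p`, the seed, and the helper `random_spd` are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
m, p = 5, 3   # hypothetical sizes: B is m-by-m, R is p-by-p

def random_spd(n):
    """Return a random symmetric positive-definite n-by-n matrix."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

B = random_spd(m)
R = random_spd(p)
U = rng.standard_normal((p, m))
V = rng.standard_normal((p, m))

inv = np.linalg.inv
lhs = inv(inv(B) + V.T @ inv(R) @ U)              # left-hand side of (W)
rhs = B - B @ V.T @ inv(R + U @ B @ V.T) @ U @ B  # right-hand side of (W)

print(np.allclose(lhs, rhs))
```

Note that the left-hand side inverts an `m`-by-`m` matrix, while the right-hand side inverts a `p`-by-`p` one (given $\B$ itself).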

In [None]:
# ws.show_answer('Woodbury')

#### Exc 7:
- Show that $\B$ and $\R$ must be square.
- Show that $\U$ and $\V$ are not necessarily square, but must have the same dimensions.
- Show that $\B$ and $\R$ are not necessarily of equal size.


Exc 7 makes it clear that the Woodbury identity may be used to compute $\bP$ by inverting matrices of the size of $\R$ rather than the size of $\B$.
Of course, if $\R$ is bigger than $\B$, then the identity is useful the other way around.

#### Exc 8 (Corollary 1):
Prove that, for any symmetric, positive-definite (SPD) matrices $\R$ and $\B$, and any matrix $\bH$,
$$\begin{align}
 	\left(\bH\tr \R^{-1} \bH + \B^{-1}\right)^{-1}
    &=
    \B - \B \bH\tr \left( \R + \bH \B \bH\tr \right)^{-1} \bH \B \tag{C1}
    \, .
\end{align}$$
Hint: consider the properties of [SPD](https://en.wikipedia.org/wiki/Definiteness_of_a_matrix#Properties) matrices.

In [None]:
# ws.show_answer('Woodbury C1')

#### Exc 10 (Corollary 2):
Prove that, for the same matrices as for Corollary C1,
$$\begin{align}
	\left(\bH\tr \R^{-1} \bH + \B^{-1}\right)^{-1}\bH\tr \R^{-1}
    &= \B \bH\tr \left( \R + \bH \B \bH\tr \right)^{-1}
    \tag{C2}
    \, .
\end{align}$$
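
Both corollaries can be sanity-checked numerically before (or after) proving them. This sketch assumes randomly generated SPD matrices and a random rectangular $\bH$; the dimensions and seed are made up.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
M, p = 6, 2   # hypothetical state and observation dimensions

# Random SPD matrices B (M-by-M) and R (p-by-p), and a rectangular H (p-by-M).
A = rng.standard_normal((M, M))
B = A @ A.T + M * np.eye(M)
C = rng.standard_normal((p, p))
R = C @ C.T + p * np.eye(p)
H = rng.standard_normal((p, M))

inv = np.linalg.inv
P = inv(H.T @ inv(R) @ H + inv(B))   # common left-hand factor of (C1) and (C2)
S = R + H @ B @ H.T                  # the p-by-p matrix inverted on the right

c1_ok = np.allclose(P, B - B @ H.T @ inv(S) @ H @ B)    # (C1)
c2_ok = np.allclose(P @ H.T @ inv(R), B @ H.T @ inv(S)) # (C2)
print(c1_ok, c2_ok)
```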

In [None]:
# ws.show_answer('Woodbury C2')

#### Exc 12 (The "gain" form of the KF):
Now, let's go back to the KF, eqns (5) and (6). Since $\B$ and $\R$ are covariance matrices, they are symmetric and positive semi-definite. In addition, we will assume that they are full-rank, making them SPD and invertible.  

Define the Kalman gain by:
 $$\begin{align}
    \K &= \B \bH\tr \big(\bH \B \bH\tr + \R\big)^{-1} \, . \tag{K1}
\end{align}$$
 * (a) Apply (C1) to eqn (5) to obtain the Kalman gain form of the analysis/posterior covariance matrix:
$$\begin{align}
    \bP &= [\I_M - \K \bH]\B \, . \tag{8}
\end{align}$$

* (b) Apply (C2) to eqn (5) to obtain the identity
$$\begin{align}
    \K &= \bP \bH\tr \R^{-1}  \, . \tag{K2}
\end{align}$$

* (c) Show that $\bP \Bi = [\I_M - \K \bH]$.
* (d) Use (b) and (c) in eqn (6) to obtain the Kalman gain form of the analysis/posterior mean:
$$\begin{align}
     \hat{\x} &= \bb + \K\left[\y - \bH \bb\right] \, . \tag{9}
\end{align}$$

Together, eqns (8) and (9) define the Kalman gain form of the KF update.
The inversion involved (in eqn K1) is of the size of $\R$, while in eqn (5) it is of the size of $\B$.
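
A minimal sketch of the gain form, eqns (K1), (8) and (9), might look as follows. The function name `analysis_gain` and the example values are hypothetical. The sketch assumes $\B$ and $\R$ are symmetric, which lets it use a linear solve in place of an explicit inverse.

```python
import numpy as np

def analysis_gain(b, B, H, R, y):
    """'Kalman gain' form of the KF analysis step, eqns (K1), (8), (9)."""
    S = H @ B @ H.T + R                   # matrix of the size of R
    # K = B H^T S^{-1}, eqn (K1). Since B and S are symmetric,
    # this equals the transpose of S^{-1} (H B).
    K = np.linalg.solve(S, H @ B).T
    P = (np.eye(len(b)) - K @ H) @ B      # (8): posterior covariance
    x_hat = b + K @ (y - H @ b)           # (9): posterior mean
    return x_hat, P, K

# Hypothetical example: 2 state variables, only the first one observed.
b = np.zeros(2)
B = np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
y = np.array([2.0])

x_hat, P, K = analysis_gain(b, B, H, R, y)

# Compare with the precision form, eqns (5) and (6).
inv = np.linalg.inv
P_prec = inv(H.T @ inv(R) @ H + inv(B))
x_prec = P_prec @ (H.T @ inv(R) @ y + inv(B) @ b)
print(np.allclose(P, P_prec), np.allclose(x_hat, x_prec))
```

Here the only matrix being solved against is 1-by-1, whereas the precision form inverts 2-by-2 matrices. The unobserved second component keeps its prior mean and variance because $\B$ is diagonal in this example.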

## In summary:
We have derived two forms of the multivariate KF analysis update step: the "precision matrix" form, and the "Kalman gain" form. The latter is especially practical when the number of observations is smaller than the length of the state vector.

### Next: [Time series analysis](T5%20-%20Time%20series%20analysis.ipynb)