# The Information Filter

Otherwise known as the *Kalman Dual*, the *Information Filter* (IF) is based upon the **canonical representation** of the Gaussian function --

*One-dimensional*:
$$
\omega = \sigma^{-2} \\
\xi = \sigma^{-2} \mu \\
p(x) = \eta e^{-\frac{1}{2}x^2 \omega+ \xi x}
$$

*Multi-dimensional*:
$$
\Omega = \Sigma^{-1} \\
\xi = \Sigma^{-1} \mu \\
p(x) = \eta e^{-\frac{1}{2}x^\top \Omega x + x^\top \xi}
$$

Notice what is different about this form: the exponent is a quadratic in $x$ whose coefficients are $\Omega$ and $\xi$. This gives us alternative representations when numerical accuracy is a concern. For example, when multiplying the densities of two R.V.s, we can work with logarithms, so that the product becomes a sum:

$$
p(x) \cdot p(y) \rightarrow \ln(p(x) \cdot p(y)) = \ln \eta_{p(x)} + \ln \eta_{p(y)} - \frac{1}{2}x^\top \Omega_{p(x)} x + x^\top \xi_{p(x)} - \frac{1}{2}y^\top \Omega_{p(y)} y + y^\top \xi_{p(y)}
$$
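
As a rough numerical check of the logarithmic approach, the sketch below evaluates the log-density of a Gaussian in canonical form and adds two of them to get the log of their product. The helper name `log_density` and all parameter values are made up for illustration; the normalizer uses $\ln \eta = -\frac{1}{2}\left(n \ln 2\pi - \ln\det\Omega + \xi^\top \Omega^{-1} \xi\right)$, which follows from expanding the usual Gaussian density.

```python
import numpy as np

def log_density(x, xi, Omega):
    """ln p(x) for a Gaussian given in canonical form (xi, Omega)."""
    n = x.shape[0]
    # slogdet returns (sign, ln|det|); Omega is positive definite, so sign is +1
    _, logdet = np.linalg.slogdet(Omega)
    # ln(eta) = -1/2 * (n ln(2 pi) - ln det(Omega) + xi^T Omega^{-1} xi)
    log_eta = -0.5 * (n * np.log(2 * np.pi) - logdet
                      + xi @ np.linalg.solve(Omega, xi))
    return log_eta - 0.5 * x @ Omega @ x + x @ xi

# Hypothetical parameters for p(x) (2-D) and p(y) (1-D)
Omega_x = np.array([[2.0, 0.3], [0.3, 1.0]])
xi_x = np.array([1.0, -0.5])
Omega_y = np.array([[4.0]])
xi_y = np.array([2.0])

x = np.array([0.4, -0.2])
y = np.array([0.7])

# The log of the product is the sum of the individual log-densities
log_product = log_density(x, xi_x, Omega_x) + log_density(y, xi_y, Omega_y)
print(log_product)
```

Summing logs avoids multiplying very small density values together, which is where the numerical benefit comes from.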

To return to the standard representation in terms of $\mu$ and $\Sigma$, we just invert $\Omega$:
$$
\Sigma = \Omega^{-1} \\
\mu = \Omega^{-1} \xi
$$
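
The conversion between the two parameterizations can be sketched in a few lines of NumPy. The function names `to_canonical` and `to_moments` and the example numbers are illustrative assumptions, not part of any library.

```python
import numpy as np

def to_canonical(mu, Sigma):
    """Moments (mu, Sigma) -> canonical parameters (xi, Omega)."""
    Omega = np.linalg.inv(Sigma)   # Omega = Sigma^{-1}
    xi = Omega @ mu                # xi = Sigma^{-1} mu
    return xi, Omega

def to_moments(xi, Omega):
    """Canonical parameters (xi, Omega) -> moments (mu, Sigma)."""
    Sigma = np.linalg.inv(Omega)       # Sigma = Omega^{-1}
    mu = np.linalg.solve(Omega, xi)    # mu = Omega^{-1} xi, without reusing the inverse
    return mu, Sigma

# Hypothetical 2-D Gaussian
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

xi, Omega = to_canonical(mu, Sigma)
mu_back, Sigma_back = to_moments(xi, Omega)
print(np.allclose(mu_back, mu), np.allclose(Sigma_back, Sigma))
```

Both directions cost a matrix inversion (or a linear solve), which is worth keeping in mind for the algorithms that follow.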

## The IF Algorithm
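The algorithm still assumes that both the state transition and the sensor measurement are linear functions with additive Gaussian noise. That is:
$$
x_t = A_t x_{t-1} + B_t u_t + \epsilon_t \\
z_t = C_t x_t + \delta_t
$$
where $\epsilon_t$ and $\delta_t$ are zero-mean Gaussian noise terms with covariances $R_t$ and $Q_t$, respectively.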

**InformationFilter**($\xi_{t-1}, \Omega_{t-1}, u_t, z_t$):
* Predict $\overline{\xi}_t , \overline{\Omega}_t$
  * $ \overline{\Omega}_t = (A_t\Omega_{t-1}^{-1} A_t^\top + R_t)^{-1} $
  * $ \overline{\xi}_t = \overline{\Omega}_t(A_t \Omega_{t-1}^{-1} \xi_{t-1} + B_t u_t) $
* Correct estimate of $\xi_t , \Omega_t$
  * $ \Omega_t = C_t^\top Q_t^{-1} C_t + \overline{\Omega}_t $
  * $ \xi_t = C_t^\top Q_t^{-1} z_t + \overline{\xi}_t $
* Return $\xi_t, \Omega_t$
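
The steps above can be sketched in NumPy as follows. This is a minimal illustration, not a tuned implementation: the function name, the argument order, and the toy position/velocity model are all assumptions made for the example, with $R_t$ and $Q_t$ taken to be the motion and measurement noise covariances.

```python
import numpy as np

def information_filter(xi_prev, Omega_prev, u, z, A, B, C, R, Q):
    """One predict/correct cycle of the linear information filter."""
    # Predict: requires inverting the previous precision matrix
    Sigma_prev = np.linalg.inv(Omega_prev)
    Omega_bar = np.linalg.inv(A @ Sigma_prev @ A.T + R)
    xi_bar = Omega_bar @ (A @ Sigma_prev @ xi_prev + B @ u)
    # Correct: the measurement only adds information
    Q_inv = np.linalg.inv(Q)
    Omega = C.T @ Q_inv @ C + Omega_bar
    xi = C.T @ Q_inv @ z + xi_bar
    return xi, Omega

# Hypothetical model: state = [position, velocity], position is measured
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
C = np.array([[1.0, 0.0]])
R = 0.1 * np.eye(2)
Q = np.array([[0.5]])

mu0 = np.array([0.0, 1.0])
Sigma0 = np.eye(2)
Omega0 = np.linalg.inv(Sigma0)
xi0 = Omega0 @ mu0

u = np.array([0.2])
z = np.array([1.3])

xi, Omega = information_filter(xi0, Omega0, u, z, A, B, C, R, Q)
mu = np.linalg.solve(Omega, xi)   # recover the mean only when it is needed
print(mu)
```

Note where the cost sits: the prediction step needs two inversions of state-sized matrices, while the correction step is a pair of additions. This is the mirror image of the Kalman filter, where prediction is cheap and correction involves the inversion.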

# The Extended Information Filter
Just like EKF, there is an EIF following the same linearization strategy as before. *Unfortunately*, we cannot carry out this extension using solely the *information vector* $\xi_x$ and *precision matrix* $\Omega_x$ -- we need to transform back to $\mu_x$ to apply the non-linear transition function $g$ and the measurement function $h$. Here $G_t$ and $H_t$ are the Jacobians of $g$ and $h$:

**ExtendedInformationFilter**($\xi_{t-1}, \Omega_{t-1}, u_t, z_t$):
* Predict $\overline{\xi}_t , \overline{\Omega}_t$
  * $ \mu_{t-1} = \Omega_{t-1}^{-1} \xi_{t-1} $
  * $ \overline{\mu}_t = g(u_t, \mu_{t-1}) $
  * $ \overline{\Omega}_t = (G_t\Omega_{t-1}^{-1} G_t^\top + R_t)^{-1} $
  * $ \overline{\xi}_t = \overline{\Omega}_t \overline{\mu}_t $
* Correct estimate of $\xi_t , \Omega_t$
  * $ \Omega_t = \overline{\Omega}_t + H_t^\top Q_t^{-1} H_t $
  * $ \xi_t = \overline{\xi}_t + H_t^\top Q_t^{-1} \left[ z_t - h(\overline{\mu}_t) + H_t \overline{\mu}_t \right] $
* Return $\xi_t, \Omega_t$
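
A minimal NumPy sketch of these steps is shown below. The models are passed in as callables; the names `g`, `G_jac`, `h`, `H_jac` and the toy setup (a 2-D position moved by an additive control, observed through a range-to-origin sensor) are assumptions chosen only to keep the example short.

```python
import numpy as np

def extended_information_filter(xi_prev, Omega_prev, u, z,
                                g, G_jac, h, H_jac, R, Q):
    """One predict/correct cycle of the extended information filter."""
    # Recover the mean: the non-linear models need it
    Sigma_prev = np.linalg.inv(Omega_prev)
    mu_prev = Sigma_prev @ xi_prev
    # Predict
    mu_bar = g(u, mu_prev)
    G = G_jac(u, mu_prev)
    Omega_bar = np.linalg.inv(G @ Sigma_prev @ G.T + R)
    xi_bar = Omega_bar @ mu_bar
    # Correct, linearizing h around the predicted mean
    H = H_jac(mu_bar)
    Q_inv = np.linalg.inv(Q)
    Omega = Omega_bar + H.T @ Q_inv @ H
    xi = xi_bar + H.T @ Q_inv @ (z - h(mu_bar) + H @ mu_bar)
    return xi, Omega

# Hypothetical models
def g(u, mu):            # motion: add the commanded displacement
    return mu + u

def G_jac(u, mu):        # Jacobian of g with respect to the state
    return np.eye(2)

def h(mu):               # measurement: range to the origin
    return np.array([np.hypot(mu[0], mu[1])])

def H_jac(mu):           # Jacobian of h with respect to the state
    r = np.hypot(mu[0], mu[1])
    return np.array([[mu[0] / r, mu[1] / r]])

R = 0.1 * np.eye(2)
Q = np.array([[0.2]])
mu0 = np.array([3.0, 4.0])
Sigma0 = 0.5 * np.eye(2)
Omega0 = np.linalg.inv(Sigma0)
xi0 = Omega0 @ mu0
u = np.array([1.0, 0.0])
z = np.array([5.5])

xi, Omega = extended_information_filter(xi0, Omega0, u, z,
                                        g, G_jac, h, H_jac, R, Q)
mu = np.linalg.solve(Omega, xi)
print(mu)
```

The first two lines of the function body are the price of the extension: every step starts by converting back to $\mu$, which costs an inversion of the full precision matrix.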

# All Things Considered
As in all things engineering, there are advantages and disadvantages to this approach...

* It's really easy to represent complete uncertainty: just set $\Omega = 0$. But then the prediction step and the conversion back to $\mu$ break down (i.e., we can't invert the zero matrix)!
* IF tends to be more numerically stable than KF. With a little bit of tweaking, this is immensely helpful when dealing with exceptionally large state vectors (e.g., 100s of variables)
* When fusing multiple sources of decentralized information, the logarithmic form is very stable: each source's contribution is simply added to $\xi$ and $\Omega$.
* The inversion of the precision matrix for EIF *sucks*, especially when dealing with high-dimensional state spaces. Because of this, EKF is favored.
* The precision matrix can be thought of as a graph of a *Markov random field*; sparseness within that matrix connotes sparseness within the graph.
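
To make the first and third points concrete, here is a small sketch that starts from $\Omega = 0$ and fuses two measurements by adding their information contributions, using the correction step of the linear IF. The sensor models and numbers are invented for illustration; each hypothetical sensor observes one coordinate of a 2-D state.

```python
import numpy as np

# Complete uncertainty: zero information vector and zero precision matrix
xi = np.zeros(2)
Omega = np.zeros((2, 2))

# Hypothetical sensors as (C, Q, z) triples
sensors = [
    (np.array([[1.0, 0.0]]), np.array([[0.5]]),  np.array([2.0])),   # sees x[0]
    (np.array([[0.0, 1.0]]), np.array([[0.25]]), np.array([-1.0])),  # sees x[1]
]

# Fusion is additive, so the order of the sensors does not matter
for C, Q, z in sensors:
    Q_inv = np.linalg.inv(Q)
    Omega = Omega + C.T @ Q_inv @ C
    xi = xi + C.T @ Q_inv @ z

# Omega only becomes invertible once every state dimension has been observed
mu = np.linalg.solve(Omega, xi)
print(mu)      # [ 2. -1.]
print(Omega)   # diagonal here: the two coordinates were never measured jointly
```

After only the first sensor, `Omega` would still be singular and the mean could not be recovered, which is the breakdown described in the first bullet. The zero off-diagonal entries also illustrate the last bullet: no measurement links the two coordinates, so there is no edge between them in the corresponding graph.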