# Derivation and Abuse of the EKF Method for Real-World Slammage

### Preface
This guide page assumes a relative degree of comfort with probability and linear algebra. None of the math here is super nasty, but there are a lot of steps, so being comfortable with matrix operations (e.g. multiplication, what a transpose is and why it is used, what a Jacobian is and why it is used) and clearly understanding what the expectation of a distribution is will go a long way.

### Part 1 - The Derivation

#### 1.0 - Preliminaries

For now we will just focus on the robot's state, $x_k$. We will define the state vector distribution as some function of the previous state distribution, $f(x_{k-1})$, with zero-mean Gaussian noise, $w_{k-1}$, added to it to account for jump uncertainty (for example, uncertainty in the path-planning algo). We will define our observation vector distribution, $z_k$, as some function of the current state, $h(x_k)$, with zero-mean Gaussian noise, $v_k$, added to it to account for observation uncertainty (sensor noise). Our cookage manifests the following two equations:


>$$
x_k = f(x_{k-1}) + w_{k-1} 
\\[5pt]
z_k = h(x_k) + v_k 
\\[10pt]
$$

We define the initial state, $x_0$, as a random vector with known mean $\mu_0 = \mathbb{E}[x_0]$ and covariance $P_0 = \mathbb{E}\left[(x_0 - \mu_0)(x_0 - \mu_0)^T\right]$ 
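
To make the two model equations concrete, here is a minimal NumPy sketch of a single step. Everything specific in it is an assumption for illustration and not part of the derivation: the state layout `[px, py, theta]`, the toy motion model `f`, the range-bearing observation model `h` (named to match $h(\cdot)$ in the summary further down), the landmark sitting at the origin, and the values chosen for `Q` and `R`.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical process model: move 0.1 units along the current heading.
    px, py, theta = x
    return np.array([px + 0.1 * np.cos(theta), py + 0.1 * np.sin(theta), theta])

def h(x):
    # Hypothetical observation model: range and bearing to a landmark at the origin.
    px, py, theta = x
    return np.array([np.hypot(px, py), np.arctan2(-py, -px) - theta])

# Assumed noise covariances (n = 3 for the state, m = 2 for the observation).
Q = np.diag([1e-3, 1e-3, 1e-4])
R = np.diag([1e-2, 1e-3])

def step(x_prev):
    # x_k = f(x_{k-1}) + w_{k-1}
    w = rng.multivariate_normal(np.zeros(3), Q)
    x = f(x_prev) + w
    # z_k = h(x_k) + v_k
    v = rng.multivariate_normal(np.zeros(2), R)
    z = h(x) + v
    return x, z

x0 = np.array([1.0, 0.0, 0.0])  # one draw of the initial state, fixed here for simplicity
x1, z1 = step(x0)
```

Calling `step` repeatedly on its own output produces a noisy trajectory together with the noisy observations a filter would actually get to see.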

From here, we are going to make a few assertions:

1. Noise vector distributions have a zero-valued mean (I explained the intuition behind this in the previous part): $\quad\mathbb{E}[w_k] = 0, \quad\mathbb{E}[v_k] = 0$  

2. The two types of noise vectors never correlate with each other: $\quad\mathbb{E}[w_k v_j^T] = 0 \quad \forall k, j$  

3. Neither noise vector has any correlation with the start point: $\quad\mathbb{E}[w_k x_0^T] = 0, \quad \mathbb{E}[v_k x_0^T] = 0 \quad \forall k$  

4. Neither noise vector has any correlation whatsoever with any of its predecessors or successors: $\quad\mathbb{E}[w_k w_j^T] = 0, \quad \mathbb{E}[v_k v_j^T] = 0 \quad  \forall k \neq j$

5. Vector variance (some sources use the term covariance but I think this is silly, because covariance with oneself is just variance) is represented with the following two matrices: $\quad\mathbb{E}[w_k w_k^T] = Q_k, \quad \mathbb{E}[v_k v_k^T] = R_k$
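
The assertions above can be poked at empirically. The sketch below draws a long run of independent Gaussian noise vectors and estimates each expectation with a sample average. The dimensions, the particular `Q` and `R`, and the sample count are all made-up values, and Gaussian sampling is just one convenient way to get noise that satisfies the list.

```python
import numpy as np

rng = np.random.default_rng(42)

n, m, N = 3, 2, 200_000            # assumed state dim, observation dim, sample count
Q = np.diag([0.5, 0.2, 0.1])       # assumed process noise covariance
R = np.diag([0.3, 0.05])           # assumed measurement noise covariance

# Row k holds w_k (or v_k); rows are drawn independently of each other.
w = rng.multivariate_normal(np.zeros(n), Q, size=N)
v = rng.multivariate_normal(np.zeros(m), R, size=N)

mean_w = w.mean(axis=0)                 # estimates E[w_k], should be near 0
cov_w = w.T @ w / N                     # estimates E[w_k w_k^T], should be near Q
cov_v = v.T @ v / N                     # estimates E[v_k v_k^T], should be near R
cross_wv = w.T @ v / N                  # estimates E[w_k v_k^T], should be near 0
lag_w = w[1:].T @ w[:-1] / (N - 1)      # estimates E[w_k w_{k-1}^T], should be near 0
```

With finite samples none of these come out exactly zero (or exactly `Q` and `R`), but they shrink toward the asserted values as `N` grows.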


We also make the assumption that functions $f(\cdot)$ and $h(\cdot)$ as well as their first-order derivatives are continuous on the given domain.
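
This smoothness assumption is what makes Jacobians of $f(\cdot)$ and $h(\cdot)$ well defined, which is what the linearization in an EKF relies on. As a sanity check, a Jacobian can be approximated numerically with central differences. The helper name `numerical_jacobian`, the step size `eps`, and the two-dimensional range-bearing `h` below are hypothetical, chosen only so the result can be compared against a hand-derived answer.

```python
import numpy as np

def numerical_jacobian(func, x, eps=1e-6):
    # Central-difference approximation of d func / d x, one input dimension at a time.
    x = np.asarray(x, dtype=float)
    out_dim = np.asarray(func(x)).size
    J = np.zeros((out_dim, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (np.asarray(func(x + dx)) - np.asarray(func(x - dx))) / (2 * eps)
    return J

def h(x):
    # Hypothetical observation model: range and bearing of the point (px, py).
    px, py = x
    return np.array([np.hypot(px, py), np.arctan2(py, px)])

H = numerical_jacobian(h, [3.0, 4.0])

# Hand-derived Jacobian at (3, 4), where the range is r = 5:
# [[px/r, py/r], [-py/r^2, px/r^2]]
H_exact = np.array([[0.6, 0.8], [-0.16, 0.12]])
```

Comparing `H` against `H_exact` is a cheap way to catch sign or indexing mistakes in a hand-written Jacobian before it goes anywhere near a filter.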

To summarize, we have made a laundry list of assertions to ensure that no Brimless Yankee activities will break what we do next. Dimensionality may be implicitly known by now, but nonetheless it is summarized below:


$$
x_k, \quad n \times 1 \quad\text{-- State vector at time step} \space k \newline
w_k, \quad n \times 1 \quad\text{-- Process noise vector} \newline
z_k, \quad m \times 1 \quad\text{-- Observation vector at time step} \space k \newline
v_k, \quad m \times 1 \quad\text{-- Measurement noise vector} \newline
f(\cdot), \quad n \times 1 \quad\text{-- Process nonlinear vector function} \newline
h(\cdot), \quad m \times 1 \quad\text{-- Observation nonlinear vector function} \newline
Q_k, \quad n \times n \quad\text{-- Process noise covariance matrix} \newline
R_k, \quad m \times m \quad\text{-- Measurement noise covariance matrix} \newline
$$


> [!NOTE]
>
>In most (all?) real-world use-cases as far as SLAM is concerned, n should always occupy a value between 1 and 3 inclusive. However, in order not to anger our high-dimensional overlords (the same mf's responsible for various lighthearted glitch-in-the-matrix activities such as ensuring that one runs into some very specific individual at very specific and unflattering times in a way and at a frequency that probabilistically just doesn't make sense), we are keeping n abstractly defined. As you may imagine, m is abstractly defined because it depends on the specifics of how the sensor suite collects data.

#### 1.1 - Model Forecast Step

This is analogous to the "Time Update" part of our generalized SLAM algorithm. In the beginning, the only information we have is the mean, $\mu_0$, and the covariance, $P_0$, of the initial state. We use these to obtain our initial optimal estimate, $x_0^a$, and its variance, $P_0$, in the following manner:

$$
x_0^a = \mu_0 = \mathbb{E}[x_0]
\\[5pt]
P_0 = \mathbb{E}[(x_0 - x_0^a)(x_0 - x_0^a)^T]
$$

This is intuitive for our initial optimal estimate: before any motion or measurement, the best guess we have for the state is the expected starting point, which for a Gaussian is also the most likely one.
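
If the initial distribution is only available as samples (say, repeated readings taken while the robot sits at its start pose), the same two quantities can be estimated with sample averages. This is a rough sketch: the "true" mean and covariance below are invented purely to generate data, and in practice $\mu_0$ and the initial covariance are often just chosen by hand.

```python
import numpy as np

rng = np.random.default_rng(7)

# Invented ground truth, used only to generate example draws of x_0.
mu0_true = np.array([1.0, -2.0, 0.5])
P0_true = np.diag([0.04, 0.04, 0.01])
samples = rng.multivariate_normal(mu0_true, P0_true, size=100_000)

# x_0^a = E[x_0], estimated by the sample mean.
x0_a = samples.mean(axis=0)

# Initial covariance E[(x_0 - x_0^a)(x_0 - x_0^a)^T], estimated by averaging outer products.
dev = samples - x0_a
P0 = dev.T @ dev / len(samples)
```

The resulting `P0` is symmetric by construction, which is a property worth preserving through every later update.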


### Part 2 - The Abuse


