# Course 3: Localization
## Part 1: Markov Localization in Theory
#### By Jonathan L. Moran (jonathan.moran107@gmail.com)
From the Self-Driving Car Engineer Nanodegree programme offered at Udacity.

## Objectives

* Apply the [Bayes' theorem](https://en.wikipedia.org/wiki/Bayes%27_theorem) to vehicle localisation;
* Practise computing posterior probabilities for several observations;
* Use the [Markov assumption](https://en.wikipedia.org/wiki/Markov_chain) and [law of total probability](https://en.wikipedia.org/wiki/Law_of_total_probability) to initialise a [Bayes' filter](https://en.wikipedia.org/wiki/Recursive_Bayesian_estimation) with meaningful estimates.

## 1. Introduction

In [1]:
### Importing required modules

In [2]:
from decimal import Decimal
import numpy as np
import pandas as pd
import os

In [3]:
!python --version

In [4]:
### Setting environment variables and parameters

In [5]:
ENV_COLAB = False               # True if running in Google Colab instance

In [6]:
# Root directory
DIR_BASE = '' if not ENV_COLAB else '/content/3-Localization'
DIR_BASE = os.path.abspath(DIR_BASE)
DIR_BASE

'/Users/jonathanmoran/Development/ND0013-Self-Driving-Car-Engineer/3-Localization/3-1-Markov-Localization'

In this part of the Markov Localization course we set up the foundations necessary to implement the [Bayes' filter](https://en.wikipedia.org/wiki/Recursive_Bayesian_estimation) for robot localisation. In this notebook we will not be writing much code, as we leave our C++ implementation tasks to the second notebook, [`2022-11-25-Course-3-Localization-Exercises-Part-2.ipynb`](). Instead, we will be practising working out the Bayes' theorem calculations by hand using probability values computed for simulated data.

### 1.1. Bayes' Filter for Localisation

#### Introduction

In order to apply the [Bayes' theorem](https://en.wikipedia.org/wiki/Bayes%27_theorem) to vehicle localisation, we must first define our state and observation variables:
* **Observation vector**: $z_{1:t}$, containing the sensor data, e.g., range measurements, bearing angles, images, etc., from time $t=1$ to present;
* **Control vector**: $u_{1:t}$, containing the vehicle control data, e.g., yaw / pitch / roll rates, velocity, etc., from time $t=1$ to present;
* **Map**: $m$, the map data, e.g., discretised grid space, feature maps, landmark data, etc.;
* **Vehicle pose**: $x_{t}$, the vehicle pose data, e.g., the 2D position $(x, y)$ and orientation angle $\theta$.

#### The belief state

We define the belief state $bel\left(x_{t}\right)$, i.e., where we think the vehicle is at the current time-step $t$, as the posterior probability of the pose $x_{t}$ given all observation and control history up to time $t$, together with the world state information provided by the map $m$. Putting this into an expression, we obtain:

$$
\begin{align}
bel\left(x_{t}\right) = p\left(x_{t} \vert z_{1:t}, u_{1:t}, m\right).
\end{align}
$$

Given this expression we know that in order to work out the belief state update, we need all prior observation and control history. When these vectors $z_{1:t}$ and $u_{1:t}$ cover a short duration of time, say, a duration of $t=1$ to $t=10$ seconds, the belief state update can be computed without much hesitation. However, when a vehicle has been collecting data over a longer period of time, say, the six-hour long road trip from Los Angeles to San Francisco, the belief state update quickly becomes computationally intractable. Let's demonstrate this in a quick example...

Suppose our test vehicle is sent on a six-hour test drive from Los Angeles to San Francisco. Assume we have a LiDAR sensor refreshing at $10$ Hz (i.e., 10 observations per second), capturing 100000 data points per observation. We also know that each of the 100000 data points contains five readings (point `id`, range, two inclination angles, and reflectivity info), each of which occupies four bytes. Computing the total data captured over this six-hour ride, we have:

$$
\begin{align}
size(z_{1:t}) &= 6 \textrm{ hours} \times \frac{3600 \textrm{ seconds}}{\textrm{hour}} \times \frac{10 \textrm{ cycles}}{\textrm{second}} \times \frac{100000 \textrm{ data points}}{\textrm{cycle}} \times \frac{5 \textrm{ readings}}{\textrm{data point}} \times \frac{4 \textrm{ bytes}}{\textrm{reading}} = 4.32 \times 10^{11} \textrm{ bytes}.
\end{align}
$$

In other words, during that six-hour drive we accumulate a measurement vector $z_{1:t}$ size of $432,000,000,000$ bytes, or $\approx 400$ GiB. That's a **lot** of data — so much so, that we'd need hundreds of GiBs of space just to store a _single_ update of the localisation posterior. This clearly won't scale beyond a few seconds of driving time.
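
The arithmetic above can be checked with a few lines of Python. This is only a sanity check of the figures quoted in the example; the constant names are illustrative and not part of the course material.

```python
# Back-of-the-envelope size of the observation history z_{1:t},
# using the figures quoted in the example above.
HOURS = 6
SECONDS_PER_HOUR = 3600
SCANS_PER_SECOND = 10          # 10 Hz LiDAR refresh rate
POINTS_PER_SCAN = 100_000      # data points per observation
READINGS_PER_POINT = 5         # id, range, two angles, reflectivity
BYTES_PER_READING = 4

total_bytes = (HOURS * SECONDS_PER_HOUR * SCANS_PER_SECOND
               * POINTS_PER_SCAN * READINGS_PER_POINT * BYTES_PER_READING)
total_gib = total_bytes / 2**30   # 1 GiB = 2^30 bytes

print(f"{total_bytes:,} bytes = {total_gib:.1f} GiB")
# → 432,000,000,000 bytes = 402.3 GiB
```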

#### Bayes' theorem

In the above example we learned that the observation vector $z_{1:t}$ can be extremely large in data size, and therefore we will not want to carry the entire observation history in order to estimate the state beliefs. In this section we show that by manipulating the localisation posterior $p\left(x_{t} \vert z_{1:t}, u_{1:t}, m\right)$ we obtain a recursive state estimator. In other words, we can express the current belief in terms of the belief state from only one step earlier, i.e., $bel\left(x_{t-1}\right)$. Then, we update the current belief $bel\left(x_{t}\right)$ with only the new observation data. With this, each update needs only the previous belief together with the current measurement and control data.

To achieve this recursive structure, we apply the [Bayes' theorem](https://en.wikipedia.org/wiki/Bayes%27_theorem) and [law of total probability](https://en.wikipedia.org/wiki/Law_of_total_probability). The first step in reducing our dependence on the entire observation history is to split up the vector $z_{1:t}$ into the recursive form, i.e., $z_{1:t} \rightarrow z_{t}, z_{1:t-1}$. Applying Bayes' theorem with multiple distributions (i.e., the likelihood, prior, and normalising constant), we are able to form the following expression of the Bayes' formula:

$$
\begin{align}
p\left(x_{t} \vert z_{t}, z_{1:t-1}, u_{1:t}, m\right) = \frac{p\left(z_{t} \vert x_{t}, z_{1:t-1}, u_{1:t}, m\right)p\left(x_{t} \vert z_{1:t-1}, u_{1:t}, m\right)}{p\left(z_{t} \vert z_{1:t-1}, u_{1:t}, m\right)}.
\end{align}
$$

Here the posterior is conditioned on the current observation $z_{t}$ separately from the previous observations $z_{1:t-1}$, and every term remains conditioned on the random variables $u_{1:t}$ and $m$. We call the likelihood term the observation model, which describes the probability distribution of the observation vector $z_{t}$ under the assumption that the current state $x_{t}$, all previous observations $z_{1:t-1}$ and all controls $u_{1:t}$, as well as the map $m$, are given. The prior probability here is the motion model, i.e., the probability distribution of the current state $x_{t}$ given all previous observations $z_{1:t-1}$, all controls $u_{1:t}$, and the map $m$. Note that the current observation $z_{t}$ is not included in the motion model. For the normalisation term, we define a factor $\eta$ such that,

$$
\begin{align}
p\left(x_{t} \vert z_{t}, z_{1:t-1}, u_{1:t}, m\right) &= \mathrm{\large \eta} \times p\left(z_{t} \vert x_{t}, z_{1:t-1}, u_{1:t}, m\right)p\left(x_{t} \vert z_{1:t-1}, u_{1:t}, m\right).
\end{align}
$$

Summing the products of the observation and motion models over all possible states $x_{t}^{(i)}$ we obtain,
$$
\begin{align}
\mathrm{\Large \eta} &= \frac{1}{p\left(z_{t} \vert z_{1:t-1}, u_{1:t}, m\right)}  = \frac{1}{\sum_{i} p\left(z_{t} \vert x_{t}^{(i)}, z_{1:t-1}, u_{1:t}, m\right) p\left(x_{t}^{(i)} \vert z_{1:t-1}, u_{1:t}, m\right)}.
\end{align}
$$
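
To make the role of $\eta$ concrete, here is a minimal numerical sketch on a hypothetical four-cell state space. The probability values are invented for illustration only; `predicted` stands in for the motion model term and `likelihood` for the observation model term.

```python
import numpy as np

# Hypothetical 4-cell state space; all numbers are illustrative only.
predicted = np.array([0.10, 0.40, 0.40, 0.10])    # p(x_t | z_{1:t-1}, u_{1:t}, m)
likelihood = np.array([0.05, 0.60, 0.30, 0.05])   # p(z_t | x_t, z_{1:t-1}, u_{1:t}, m)

# Numerator of the Bayes' formula for every candidate state x_t^{(i)}
unnormalised = likelihood * predicted

# eta is the reciprocal of the sum over all candidate states
eta = 1.0 / unnormalised.sum()
posterior = eta * unnormalised

print(posterior, posterior.sum())
```

After scaling by `eta` the posterior sums to one, which is all the normalisation term is responsible for.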

#### Law of total probability

In order to estimate the vehicle state at time $t$, we need the prior probabilities and likelihoods from the previous time-step $t-1$. Using the law of total probability, we obtain

$$
\begin{align}
p\left(x_{t} \vert z_{1:t-1}, u_{1:t}, m\right) = \int p\left(x_{t} \vert x_{t-1}, z_{1:t-1}, u_{1:t}, m\right)p\left(x_{t-1} \vert z_{1:t-1}, u_{1:t}, m\right) dx_{t-1}.
\end{align}
$$

The law of total probability introduces the previous state $x_{t-1}$ into the integral. Since the first term is conditioned on $x_{t-1}$, we (hypothetically) assume that we know the state of the system at time-step $t-1$. We can therefore eliminate the need for past observations $z_{1:t-1}$ and controls $u_{1:t-1}$ in this term, since they provide no additional information for estimating the state $x_{t}$: they were already used to estimate the previous state $x_{t-1}$. With this first assumption we simplify $p\left(x_{t} \vert x_{t-1}, z_{1:t-1}, u_{1:t}, m\right)$ to $p\left(x_{t} \vert x_{t-1}, u_{t}, m\right)$.

We also make a second assumption: since the control $u_{t}$ is applied after time-step $t-1$, it does not provide any additional information about the previous state $x_{t-1}$, and we can eliminate it from the second term. Therefore, we simplify $p\left(x_{t-1} \vert z_{1:t-1}, u_{1:t}, m\right)$ to $p\left(x_{t-1} \vert z_{1:t-1}, u_{1:t-1}, m\right)$, which restricts the control vector to information up to the time-step of $x_{t-1}$.

#### Markov assumption

We now have the basis needed to form a recursive state estimator. Using the first-order Markov assumption, we estimate the posterior $p\left(x_{t} \vert x_{1:t-1}\right)$ using a Markov chain. In other words, we form a prediction of the current state $x_{t}$ given only the state $x_{t-1}$ at the previous time-step. With the Markov assumption we take the previous time-step to be the best predictor for the next time-step, and with a complete state assumption we eliminate all states before and after the chain formed by $x_{t-1} \leftrightarrow x_{t}$.

Since we assume that $x_{t}$ only depends on the previous state $x_{t-1}$, we can re-write the posterior as follows:

$$
\begin{align}
p\left(x_{t}\vert x_{1:t-1}\right) = p\left(x_{t}\vert x_{t-1}\right),
\end{align}
$$

In order to apply this to the motion model, we need to split the control vector $u_{1:t}$ into the current control $u_{t}$ and all previous controls $u_{1:t-1}$. Following suit with the simplifications in the law of total probability steps, we re-write the first term in the integral expression of the probability distribution of $x_{t}$,

$$
\begin{align}
 p\left(x_{t} \vert x_{t-1}, z_{1:t-1}, u_{1:t}, m\right) \rightarrow p\left(x_{t} \vert x_{t-1}, u_{t}, m\right),
\end{align}
$$

such that it is no longer conditioned on all previous observations and all previous controls. Applying the Markov assumption the first time, we obtain:

$$
\begin{align}
 p\left(x_{t} \vert z_{1:t-1}, u_{1:t}, m\right) &= \int p\left(x_{t} \vert x_{t-1}, u_{t}, m\right) p\left(x_{t-1} \vert z_{1:t-1}, u_{1:t}, m\right) dx_{t-1}.
\end{align}
$$

The first term in the integral, $p\left(x_{t} \vert x_{t-1}, u_{t}, m\right)$, is commonly referred to as the system (or transition) model. The map $m$ could be dropped from this term when it does not influence the state transition, although we keep it in the notation here. We also simplify the second term, i.e., the posterior distribution of $x_{t-1}$, by applying the Markov assumption again. We assume that $u_{t}$ tells us nothing about $x_{t-1}$, since it refers to a "future" time-step. Therefore, we can ignore $u_{t}$ in the estimate of the previous state $x_{t-1}$ and re-write the motion model as follows:

$$
\begin{align}
 p\left(x_{t} \vert z_{1:t-1}, u_{1:t}, m\right) &= \int p\left(x_{t} \vert x_{t-1}, u_{t}, m\right) p\left(x_{t-1} \vert z_{1:t-1}, u_{1:t-1}, m\right) dx_{t-1}.
\end{align}
$$

Applying the Markov assumption twice leaves us with the posterior $p\left(x_{t-1} \vert z_{1:t-1}, u_{1:t-1}, m\right)$, which is nothing but the belief $bel\left(x_{t-1}\right)$ from the previous time-step. Therefore, we have successfully achieved a recursive structure,

$$
\begin{align}
 p\left(x_{t} \vert z_{1:t-1}, u_{1:t}, m\right) &= \int p\left(x_{t} \vert x_{t-1}, u_{t}, m\right) bel\left(x_{t-1}\right) dx_{t-1},
\end{align}
$$

which no longer depends explicitly on the entire control and observation history. Putting together the new recursive structure of the motion model and the previous derivation of the belief state $bel\left(x_{t}\right)$ given by the Bayes' formula, we have:

$$
\begin{align}
bel\left(x_{t}\right) = p\left(x_{t} \vert z_{t}, z_{1:t-1}, u_{1:t}, m\right) = \frac{p\left(z_{t} \vert x_{t}, z_{1:t-1}, u_{1:t}, m\right) \times p\left(x_{t} \vert z_{1:t-1}, u_{1:t}, m\right)}{p\left(z_{t} \vert z_{1:t-1}, u_{1:t}, m\right)}.
\end{align}
$$

### 1.2. Discretised Motion Model

We will be using the following equations to represent the motion and transition models in the 1-D case:
* **Discretised motion model**: $\sum_{i}p\left(x_{t} \vert x_{t-1}^{(i)}, u_{t}, m\right)bel\left(x_{t-1}^{(i)}\right)$;
* **Transition model**: $p\left(x_{t} \vert x_{t-1}^{(i)}, u_{t}, m\right)$;
* **Motion model probability for the ith step**: $p\left(x_{t} \vert x_{t-1}^{(i)}, u_{t}, m\right)*bel\left(x_{t-1}^{(i)}\right)$.

From the discretised motion model we compute the probability that a vehicle is now at a given location, $x_{t}$. In the expression we see that the prior vehicle location $x_{t-1}$ is considered for all possible priors $x_{t-1}^{(1)}, \ldots, x_{t-1}^{(n)}$. For each possible prior location in the list, the corresponding term is the probability that the vehicle really did start at the prior location $x_{t-1}^{(i)}$ and wound up at the current position $x_{t}$; the summation over all $i$ then yields the **total probability** of being at $x_{t}$. Therefore, we can write the probability of the vehicle starting at position $x_{t-1}^{(i)}$ and arriving at $x_{t}$ as:

$$
\begin{align}
p\left(x_{t} \vert x_{t-1}^{(i)}\right)*p\left(x_{t-1}^{(i)}\right).
\end{align}
$$

We modify the expression for likelihood when incorporating all knowledge of the world state (i.e., with the inclusion of the map $m$ and the control vector $u_{t}$) with the following:

$$
\begin{align}
p\left(x_{t} \vert x_{t-1}^{(i)}, u_{t}, m\right)*bel\left(x_{t-1}^{(i)}\right).
\end{align}
$$

In summary, each of the $n$ discretised motion model terms is nothing but the product of the transition probability and the belief state at the $i$th step. Taking the sum of all products for $i=1,\ldots,n$, we obtain the final position probability for the recursive structure of the motion model.
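
The summation can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the course implementation: the function name `motion_model` is hypothetical, the transition model is assumed to be a Gaussian centred on the commanded displacement `control` with standard deviation `sigma`, and the map $m$ is ignored.

```python
import numpy as np

def motion_model(belief_prev, control, sigma):
    """Predict the position probabilities at time t from bel(x_{t-1}) on a 1-D grid.

    Assumption: the transition model is Gaussian in the displacement
    (x_t - x_{t-1}), centred on `control`; the map is not used.
    """
    n = len(belief_prev)
    positions = np.arange(n)
    predicted = np.zeros(n)
    for x_t in positions:
        # Displacement needed to reach x_t from every prior position x_{t-1}^{(i)}
        delta = x_t - positions
        # Transition model p(x_t | x_{t-1}^{(i)}, u_t) for all i at once
        transition = (np.exp(-0.5 * ((delta - control) / sigma) ** 2)
                      / (sigma * np.sqrt(2.0 * np.pi)))
        # Sum over i of transition probability times previous belief
        predicted[x_t] = np.sum(transition * belief_prev)
    return predicted

# Illustrative usage: the vehicle is known to be in cell 5 and moves one cell forward
belief_prev = np.zeros(20)
belief_prev[5] = 1.0
predicted = motion_model(belief_prev, control=1.0, sigma=1.0)
print(np.argmax(predicted))
```

With all of the previous belief concentrated in one cell, the predicted distribution is simply the transition model shifted to that cell, so its peak sits one cell ahead of the starting position.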

### 1.3. Markov Assumption for the Observation Model

From the recursive form of the Bayes' filter derived in Part 1.1, we finalise its form with one additional step.

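Applying the Markov assumption, the current observation $z_{t}$ depends only on the current state $x_{t}$ and the map $m$, so the observation model $p\left(z_{t} \vert x_{t}, z_{1:t-1}, u_{1:t}, m\right)$ becomes $p\left(z_{t} \vert x_{t}, m\right)$. Re-writing the observation model for the given vector of observations $z_{t}$ at the current time-step, we have: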

$$
\begin{align}
p\left(z_{t} \vert x_{t}, m\right) = p\left(z_{t}^{1}, \ldots, z_{t}^{K} \vert x_{t}, m\right).
\end{align}
$$

Assuming that the noise of the individual range measurements $z_{t}^{1}, \ldots, z_{t}^{K}$ is independent, we can represent the observation model as a product of the individual probability distribution terms of each range measurement:

$$
p\left(z_{t}^{1}, \ldots, z_{t}^{K} \vert x_{t}, m\right) = \prod_{k=1}^{K}p\left(z_{t}^{k} \vert x_{t}, m\right).
$$

Note that each sensor (e.g., camera, LiDAR, radar, ultrasonics) will have a unique noise profile and performance. Furthermore, the observation model depends also on the map type, e.g., discretised, dense 2D or 3D grid maps, or sparse feature-based maps. In this notebook we assume that our map is a 1D range map representing the distances to the $n$ closest objects in the direction of the vehicle heading (motion). We assume in this case that our range measurement noise follows a Gaussian distribution with a standard deviation of $\sigma_{z_{t}} = 1.0 m$. We also assume that our on-board sensor can measure distances from $0$ to $100$ metres in front of the ego-vehicle.

To implement this observation model, we use the state $x_{t}$ and the given map $m$ to estimate the so-called pseudo-ranges $z_{t}^{*}$, which — when assuming the vehicle state $x_{t}$ is within the map $m$ at a specific position — represent the true range values. We define the likelihood of the pseudo-range measurements with respect to the ground-truth values as a normal distribution $p\left(z_{t}^{k} \vert x_{t}, m\right) \sim \mathcal{N}\left(z_{t}^{k}; z_{t}^{*k}, \sigma_{z_{t}}\right)$.
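
A minimal sketch of this observation model is given below. The helper names `norm_pdf` and `observation_likelihood` are hypothetical, and the sketch assumes that the $k$th measurement has already been paired with the $k$th pseudo-range, which the text above does not specify.

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    """Gaussian probability density N(x; mu, sigma), with sigma the standard deviation."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def observation_likelihood(observations, pseudo_ranges, sigma=1.0):
    """Product over k of p(z_t^k | x_t, m) for independent range measurements.

    Assumption: observations[k] corresponds to pseudo_ranges[k].
    """
    observations = np.asarray(observations, dtype=float)
    pseudo_ranges = np.asarray(pseudo_ranges, dtype=float)
    if observations.shape != pseudo_ranges.shape:
        raise ValueError("each observation needs exactly one pseudo-range")
    return float(np.prod(norm_pdf(observations, pseudo_ranges, sigma)))

# Illustrative usage: the same measurements scored against two candidate positions
observations = [5.5, 11.0]
likelihood_near = observation_likelihood(observations, pseudo_ranges=[5.0, 11.0])
likelihood_far = observation_likelihood(observations, pseudo_ranges=[3.0, 9.0])
print(likelihood_near, likelihood_far)
```

A candidate position whose pseudo-ranges agree with the measurements receives a much larger likelihood than one whose pseudo-ranges are a few standard deviations away.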

### 1.4. Markov Localisation

We have covered a lot of information up to this point. In summary, we simplified the observation model to eliminate the dependence on all prior information. For the motion model, we used the law of total probability and the Markov assumption in order to get the desired recursive structure. In this last part, we write the general form of the Bayes' filter for localisation and express the belief state $bel\left(x_{t}\right)$ as the product of the simplified observation model (the update step) and the simplified motion model (the prediction step), the latter of which we write in reduced form as $\hat{bel}\left(x_{t}\right)$. This gives us:

$$
\begin{align}
bel\left(x_{t}\right) \ &= \ p\left(x_{t}\vert z_{t}, z_{1:t-1}, u_{1:t}, m\right)
\ = \ \mathrm{\large\eta} \times p\left(z_{t} \vert x_{t}, m\right)\hat{bel}\left(x_{t}\right). 
\end{align}
$$

With the above expression we complete our derivation of the 1D Markov Localisation filter.
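
One predict/update cycle of this filter can be sketched on a discrete grid as follows. The function `bayes_filter_step`, the three-cell cyclic world, and all probability values are assumptions made for illustration; the transition model is supplied as a matrix whose entry `[j, i]` holds $p\left(x_{t}=j \vert x_{t-1}=i, u_{t}, m\right)$.

```python
import numpy as np

def bayes_filter_step(belief_prev, transition, likelihood):
    """One predict/update cycle of a discrete Bayes' filter.

    transition[j, i] = p(x_t = j | x_{t-1} = i, u_t, m)
    likelihood[j]    = p(z_t | x_t = j, m)
    """
    predicted = transition @ belief_prev       # prediction step: bel_hat(x_t)
    unnormalised = likelihood * predicted      # update step, before normalisation
    total = unnormalised.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the prediction")
    return unnormalised / total                # multiply by eta = 1 / total

# Hypothetical cyclic world with 3 cells: move one cell right with
# probability 0.8, stay in place with probability 0.2.
shift_right = np.roll(np.eye(3), 1, axis=0)
transition = 0.8 * shift_right + 0.2 * np.eye(3)

belief_prev = np.array([1.0, 0.0, 0.0])        # vehicle known to be in cell 0
likelihood = np.array([0.1, 0.7, 0.2])         # illustrative sensor model values

belief = bayes_filter_step(belief_prev, transition, likelihood)
print(belief)
```

Feeding the returned `belief` back in as `belief_prev` at the next time-step is what makes the estimator recursive.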

Together with the Kalman filter we derived in the [last course](), we have covered in detail the general framework for recursive state estimation. Congratulations!

## 2. Programming Task

In this assignment we will be calculating the missing probability values in each of the tables provided below. We will use the derivations above to compute probability values given incomplete information. Good luck! 

### 2.1. Calculate Localization Posterior

To continue developing our intuition for this filter and prepare for later coding exercises, let's walk through the calculations for determining posterior probabilities at several pseudo-positions $x$, for a single time-step. We will start with a time-step after the filter has already been initialised and run a few times. We will cover initialisation of the filter in an upcoming concept.

In [7]:
def value_to_decimal(value):
    if value in {'NULL', '?', 'None'}:
        return np.nan
    return '%.2E' % Decimal(value)

In [8]:
file_path = os.path.join(DIR_BASE, 'data/2022-11-25-Lesson-3-1-Markov-Localization-Calculate-Localization-Posterior.csv') 
df = pd.read_csv(file_path, index_col=0)
df = df.applymap(value_to_decimal)

In [9]:
df

| pseudo_position (x) | P(location) | P(observation \| location) | Raw P(location \| observation) | Normalized P(location \| observation) |
|---|---|---|---|---|
| 1 | 1.67E-02 | 0.00E+00 | 0.00E+00 | 0.00E+00 |
| 2 | 3.86E-02 | 6.99E-03 | NAN | 2.59E-02 |
| 3 | 4.90E-02 | 8.52E-02 | 4.18E-03 | 4.01E-01 |
| 4 | 3.86E-02 | NAN | 5.42E-03 | 5.21E-01 |
| 5 | 1.69E-02 | 3.13E-02 | 5.31E-04 | 5.10E-02 |
| 6 | 6.51E-03 | 9.46E-04 | 6.16E-06 | NAN |
| 7 | NAN | 3.87E-06 | 6.55E-08 | 6.29E-06 |
| 8 | 3.86E-02 | 0.00E+00 | 0.00E+00 | 0.00E+00 |


Recall the general form of the Bayes' theorem:

$$
\begin{align}
P\left(a\vert b\right) = \frac{P\left(b \vert a\right)P\left(a\right)}{P\left(b\right)}
\end{align}
$$

For the localisation problem, we have the following terms:
* $P\left(\textrm{location} \ \vert \ \textrm{observation}\right)$ — the posterior probability $P\left(a \vert b\right)$, i.e., the _normalised_ probability of a position given the observation;
* $P\left(\textrm{observation} \ \vert \ \textrm{location}\right)$ — the likelihood $P\left(b \vert a\right)$, i.e., the probability of an observation given a position;
* $P\left(\textrm{location}\right)$ — the prior probability $P\left(a\right)$, i.e., the probability of a position;
* $P\left(\textrm{observation}\right)$ — the prior probability $P\left(b\right)$, i.e., the probability of an observation.

Note that in the table above we have the **Normalized P(location | observation)** term, which is the **Raw P(location | observation)** term after dividing by the total probability of the observation, $P\left(\textrm{observation}\right)$, i.e., $P\left(b\right)$. In other words, it is the entire fraction given on the right-hand side of the Bayes' rule. Consequently, the **Raw P(location | observation)** term is the posterior probability before dividing by the total probability $P\left(\textrm{observation}\right)$, i.e., the numerator of the fraction on the right-hand side of the Bayes' rule.

#### The observation likelihood

To compute the observation likelihood term, $P\left(\textrm{observation} \ \vert \ \textrm{location} \right)$, for the pseudo-position $x=4$, we use the following relation:

$$
\begin{align}
P\left(b \vert a\right) = \frac{P\left(a \vert b\right)P\left(b\right)}{P\left(a\right)}
\end{align}
$$

which we obtain after re-arranging the general form of the Bayes' rule. The numerator $P\left(a \vert b\right)P\left(b\right)$ is the non-normalised posterior, so here this corresponds to dividing the posterior term **Raw P(location | observation)** by the location prior probability **P(location)**.

In [10]:
### The pseudo-position x=4
x_4 = df.iloc[3]
x_4

P(location)                             3.86E-02
P(observation | location)                    NAN
Raw P(location | observation)           5.42E-03
Normalized P(location | observation)    5.21E-01
Name: 4, dtype: object

In [11]:
### Calculating the probability value
x_4 = x_4.astype(np.float64)
p_4 = value_to_decimal(x_4['Raw P(location | observation)'] / x_4['P(location)'])
p_4

'1.40E-01'

In [12]:
### Setting the value in the DataFrame
df.loc[4, 'P(observation | location)'] = p_4    # Single .loc call avoids chained assignment

#### The posterior probability

To compute the raw posterior probability term, **Raw P(location | observation)**, for the pseudo-position $x = 2$, we use the following relation:

$$
\begin{align}
P\left(\textrm{posterior}\right) = P\left(b \vert a\right) * P\left(a\right)
\end{align}
$$

which is the non-normalised numerator on the right-hand side of the Bayes' rule. In other words, the product of the likelihood $P\left(b \vert a\right)$ and prior probability $P\left(a\right)$.

In [13]:
### The pseudo-position x=2
x_2 = df.iloc[1]
x_2

P(location)                             3.86E-02
P(observation | location)               6.99E-03
Raw P(location | observation)                NAN
Normalized P(location | observation)    2.59E-02
Name: 2, dtype: object

In [14]:
### Calculating the probability value
x_2 = x_2.astype(np.float64)
p_2 = value_to_decimal(x_2['P(observation | location)'] * x_2['P(location)'])
p_2

'2.70E-04'

In [15]:
### Setting the value in the DataFrame
df.loc[2, 'Raw P(location | observation)'] = p_2    # Single .loc call avoids chained assignment

#### The normalised posterior probability

To compute the normalised posterior probability for the pseudo-position $x = 6$, we first have to sum the **Raw P(location | observation)** terms to get the total probability of the observation. Using the expression for the normalising constant $P\left(b\right)$ we have:

$$
\begin{align}
P\left(b\right) = \sum_{i=1}^{n} P\left(b \vert a_{i}\right)P\left(a_{i}\right)
\end{align}
$$

where $a_{i}$ denotes the $i$th candidate location. This is the sum over all non-normalised posterior values as given by the [law of total probability](https://en.wikipedia.org/wiki/Bayesian_statistics#Bayes'_theorem). Since the pseudo-position variable $x$ is discrete, the integral of the continuous case reduces to a sum over the product of the likelihood and prior probability values.

Therefore, we add all values **Raw P(location | observation)** from $x=1$ to $x=8$,

In [16]:
P_posterior_raw = df['Raw P(location | observation)'].astype(np.float64)
P_posterior_raw

pseudo_position (x)
1    0.000000e+00
2    2.700000e-04
3    4.180000e-03
4    5.420000e-03
5    5.310000e-04
6    6.160000e-06
7    6.550000e-08
8    0.000000e+00
Name: Raw P(location | observation), dtype: float64

In [17]:
### Summing the non-normalised total posterior probability
p_sum = P_posterior_raw.sum()
p_sum

0.0104072255

Then, to find the normalised posterior probability, we divide the raw posterior probability value **Raw P(location | observation)** at the given pseudo-position $x = 6$ by the total probability normalisation term we computed above.

In [18]:
### Calculating the normalised posterior probability
p_6 = value_to_decimal(P_posterior_raw[6] / p_sum)
p_6

'5.92E-04'

In [19]:
### Setting the value in the DataFrame
df.loc[6, 'Normalized P(location | observation)'] = p_6    # Single .loc call avoids chained assignment

#### The prior position probability

To compute the prior position probability for the pseudo-position $x = 7$, we can divide the non-normalised posterior probability $P\left(\textrm{posterior}\right)$ by the observation likelihood $P\left(b \vert a\right)$. Recalling the formula for $P\left(\textrm{posterior}\right)$,

$$
\begin{align}
P\left(\textrm{posterior}\right) = P\left(b \vert a\right) * P\left(a\right),
\end{align}
$$

and knowing that **Normalized P(location | observation)** is

$$
\begin{align}
P\left(a \vert b\right) = \frac{P\left(b \vert a\right) * P\left(a\right)}{P\left(b\right)},
\end{align}
$$

we obtain the prior position probability by dividing the posterior **Raw P(location | observation)** by the observation likelihood **P(observation | location)**.

In [20]:
### The pseudo-position x=7
x_7 = df.iloc[6]
x_7

P(location)                                  NAN
P(observation | location)               3.87E-06
Raw P(location | observation)           6.55E-08
Normalized P(location | observation)    6.29E-06
Name: 7, dtype: object

In [21]:
### Calculating the prior position probability
x_7 = x_7.astype(np.float64)
p_7 = value_to_decimal(
    x_7['Raw P(location | observation)'] / x_7['P(observation | location)']
)
p_7

'1.69E-02'

In [22]:
### Setting the value in the DataFrame
df.loc[7, 'P(location)'] = p_7    # Set only the row x=7, not the whole column

#### The final DataFrame

With the above calculations, we obtain a complete probability distribution with values:

In [23]:
df

| pseudo_position (x) | P(location) | P(observation \| location) | Raw P(location \| observation) | Normalized P(location \| observation) |
|---|---|---|---|---|
| 1 | 0.0167 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0386 | 0.00699 | 0.00027 | 0.0259 |
| 3 | 0.049 | 0.0852 | 0.00418 | 0.401 |
| 4 | 0.0386 | 0.14 | 0.00542 | 0.521 |
| 5 | 0.0169 | 0.0313 | 0.000531 | 0.051 |
| 6 | 0.00651 | 0.000946 | 6.16e-06 | 0.000592 |
| 7 | 0.0169 | 3.87e-06 | 6.55e-08 | 6.29e-06 |
| 8 | 0.0386 | 0.0 | 0.0 | 0.0 |


From the [law of total probability](https://en.wikipedia.org/wiki/Law_of_total_probability) we know that our posterior probability values for the discrete 1-D case should add up to $1.0$. 

To verify this, we take the sum of the resulting normalised posterior values:

In [24]:
### Summing the normalised posterior values
df['Normalized P(location | observation)'].astype(np.float64).sum()

0.9994982900000001

such that we obtain a resulting total probability very close to $1.0$. The small shortfall comes from the table values being rounded to three significant figures.
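
As a cross-check, the whole posterior column can be recomputed in one vectorised step. The sketch below hard-codes the priors and likelihoods from the table at the start of this section, with the two missing entries filled in by the values computed above ($1.69\mathrm{E}{-02}$ for the prior at $x=7$ and $1.40\mathrm{E}{-01}$ for the likelihood at $x=4$). Because these inputs are rounded, the results agree with the table only to about three significant figures.

```python
import numpy as np

# Rounded values taken from the table, pseudo-positions x = 1, ..., 8
prior = np.array([1.67e-2, 3.86e-2, 4.90e-2, 3.86e-2,
                  1.69e-2, 6.51e-3, 1.69e-2, 3.86e-2])
likelihood = np.array([0.0, 6.99e-3, 8.52e-2, 1.40e-1,
                       3.13e-2, 9.46e-4, 3.87e-6, 0.0])

# Numerator of the Bayes' rule for every pseudo-position
raw_posterior = likelihood * prior

# Dividing by the total probability of the observation normalises the posterior
normalised_posterior = raw_posterior / raw_posterior.sum()

for x, p in enumerate(normalised_posterior, start=1):
    print(f"x={x}: {p:.2E}")
```

Since the division here uses unrounded intermediate values, the normalised column sums to one up to floating-point precision.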

Hooray! This was a great start to Bayesian statistics, which we will use together with the [Markov Assumption](https://en.wikipedia.org/wiki/Markov_chain) to perform inference over the map range space using a [Bayes' filter](https://en.wikipedia.org/wiki/Recursive_Bayesian_estimation). This will allow us to estimate vehicle location using nothing but a single pair of consecutive measurements and a 1-D range map, i.e., a set of landmark positions defined relative to the ego-vehicle heading. Let's go! 

### 2.2. Initialize Belief State

To help develop an intuition for this filter and prepare for later coding exercises, let's walk through the process of initialising our prior belief state. That is, what values should our initial belief state take on for each possible position?

#### Warmup example

Suppose we have a 1D map extending from $0$ to $25$ metres. We have landmarks at $x = 5.0$, $x=10.0$ and $x=20.0$ metres, with a position standard deviation of $1.0$ metre.

In [25]:
len_map = 25
landmarks = [4, 9, 19]    # Indexing starting at 0

In [26]:
### Initialising the position vector
positions = np.zeros((len_map))

Assuming that we know our initial vehicle position is at one of these three landmarks, how should we define our initial belief state?

Given that we know our vehicle is parked next to a landmark, we can set our probability of being next to a landmark to $1.0$. Accounting for a position standard deviation of $\pm 1.0$ metres, this results in three groups of non-zero initial position estimates, covering the ranges $\left[4, 6\right]$, $\left[9, 11\right]$ and $\left[19, 21\right]$. All other positions not in these ranges, i.e., positions not within $\pm 1.0$ metre from a landmark, are initialised to $0.0$.

In [27]:
### Setting the landmarks
positions[landmarks] = 1.0
positions

array([0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 1., 0., 0., 0., 0., 0.])

In [28]:
### Setting the landmarks' neighbouring positions +/- 1.0m away
positions[np.array(landmarks) - 1] = 1.0    # Left of the landmarks
positions[np.array(landmarks) + 1] = 1.0    # Right of the landmarks
positions

array([0., 0., 0., 1., 1., 1., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0.,
       0., 1., 1., 1., 0., 0., 0., 0.])

We then divide each position probability value by the total number of possible non-zero position probabilities such that the normalised total probability sums to $1.0$. In this case, we have $9$ non-zero position probabilities from $3$ landmarks, resulting in an individual position probability value of $1.0 / 9 = 1.11\mathrm{E}{-01}$. 

In [29]:
### Setting the prior position probability
prior = 1.0 / np.count_nonzero(positions)
positions *= prior

Therefore, we have the following belief state vector:

In [30]:
positions

array([0.        , 0.        , 0.        , 0.11111111, 0.11111111,
       0.11111111, 0.        , 0.        , 0.11111111, 0.11111111,
       0.11111111, 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.11111111, 0.11111111,
       0.11111111, 0.        , 0.        , 0.        , 0.        ])

#### Quiz question

To reinforce this concept, let's practice with a quiz.

Here we define the problem statement:
* **Map size**: $100$ metres;
* **Landmark positions**: $\{8, 15, 30, 70, 80\}$;
* **Position standard deviation**: $2.0$ metres.

We also assume that the vehicle starts out parked at one of the five possible landmarks. 

In [31]:
len_map = 100                      # Range [0, 99] metres
landmarks = [7, 14, 29, 69, 79]    # Indexing starting at 0
stdev = 2                          # +/- 2.0m position precision 

In [32]:
### Initialising the position vector
positions = np.zeros((len_map))

In [33]:
positions

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [34]:
### Setting the landmarks and their neighbouring positions

In [35]:
landmarks = np.array(landmarks)
for i in range(0, stdev + 1):
    positions[landmarks - i] = 1.0
    positions[landmarks + i] = 1.0

In [36]:
positions

array([0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 1., 1., 1., 1., 1.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
       1., 1., 1., 1., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [37]:
### Setting the prior position probability
prior = 1.0 / np.count_nonzero(positions)
positions *= prior

In [38]:
positions

array([0.  , 0.  , 0.  , 0.  , 0.  , 0.04, 0.04, 0.04, 0.04, 0.04, 0.  ,
       0.  , 0.04, 0.04, 0.04, 0.04, 0.04, 0.  , 0.  , 0.  , 0.  , 0.  ,
       0.  , 0.  , 0.  , 0.  , 0.  , 0.04, 0.04, 0.04, 0.04, 0.04, 0.  ,
       0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ,
       0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ,
       0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ,
       0.  , 0.04, 0.04, 0.04, 0.04, 0.04, 0.  , 0.  , 0.  , 0.  , 0.  ,
       0.04, 0.04, 0.04, 0.04, 0.04, 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ,
       0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ,
       0.  ])

##### The initial position probability

To compute the prior probability for the position $x = 11$, we can simply fetch the corresponding probability value at the given position in the belief state vector. 

In [39]:
### Fetching the prior position probability for x = 11
p_11 = positions[10]    # Indexing starting at 0
value_to_decimal(p_11)

'0.00E+00'

To compute the prior probability for the position $x = 71$, we can simply fetch the corresponding probability value at the given position in the belief state vector.

In [40]:
### Fetching the prior position probability for x = 71
p_71 = positions[70]    # Indexing starting at 0
value_to_decimal(p_71)

'4.00E-02'

In summary, we can obtain the initial position probability by dividing the total probability $1.0$ by the number of non-zero position probabilities. Here, that is $1.0$ divided by the total number of positions within $2.0$ metres of a landmark. Since we have $5$ landmarks with a standard deviation of $\pm 2.0$ metres, that yields $5$ potentially occupied positions at each landmark position (i.e., the landmark plus two positions on each side). Therefore, we have $25$ total possible non-zero positions, resulting in a prior probability value of $1.0 / 25 = 4.00\mathrm{E}{-02}$.  
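The same initialisation can be wrapped in a small helper. The following is a minimal sketch: the function name `initialise_priors` and its arguments are illustrative assumptions rather than part of the course material, and it assumes that every landmark neighbourhood lies fully inside the map.

```python
import numpy as np


def initialise_priors(len_map, landmarks, stdev):
    """Return a uniform prior over all cells within `stdev` of a landmark.

    Hypothetical helper: `landmarks` holds 0-based cell indices and every
    neighbourhood is assumed to lie inside the map.
    """
    priors = np.zeros(len_map)
    for landmark in landmarks:
        # Mark the landmark cell and `stdev` cells on either side
        priors[landmark - stdev:landmark + stdev + 1] = 1.0
    # Spread the total probability evenly over the marked cells
    return priors / np.count_nonzero(priors)


priors = initialise_priors(100, [7, 14, 29, 69, 79], 2)
print(np.count_nonzero(priors))    # 25 candidate positions
print(priors[70], priors[10])      # 0.04 0.0
print(priors.sum())                # approximately 1.0
```

Because the result is normalised by the number of marked cells, the vector sums to one (up to floating-point rounding) regardless of how many landmarks are supplied.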

### 2.3. Motion Model Probability II

Applying the discretised motion model we derived at the beginning of this notebook, we will see how the belief from the previous time-step $bel\left(x_{t-1}\right)$ is used to estimate the state transition probability for each pair of pre-pseudo and pseudo positions. This probability depends on the **delta position** between the two. Let's fetch the data from our problem statement.

In [41]:
file_path = os.path.join(
    DIR_BASE,
    'data/2022-11-25-Lesson-3-1-Markov-Localization-Motion-Model-Probability-II.csv'
)
df = pd.read_csv(file_path, index_col=0)
df.applymap(value_to_decimal)
df

| pseudo_position (x) | pre-pseudo_position | delta position | P(transition) | bel(xt−1​) | P(position) |
|---|---|---|---|---|---|
| 7 | 1 | 6.0 | 1e-06 | 0.0556 | 8.27e-08 |
| 7 | 2 | 5.0 | 0.000134 | 0.0556 | 7.44e-06 |
| 7 | 3 | 4.0 | 0.00443 | 0.0556 | 0.000246 |
| 7 | 4 | NaN | 0.054 | 0.0 | 0.0 |
| 7 | 5 | 2.0 | NaN | 0.0 | 0.0 |
| 7 | 6 | 1.0 | 0.399 | 0.0 | 0.0 |
| 7 | 7 | 0.0 | 0.242 | NaN | 0.00166 |
| 7 | 8 | -1.0 | 0.054 | 0.00179 | NaN |


#### The position deltas

In order to compute the delta position, we subtract the **pre-pseudo_position** value from the **pseudo_position (x)** value to obtain the values in the **delta position** column.

For example, we have the following delta position for a pseudo-position $x$ of $7$ and a pre-pseudo position of $4$. 

In [42]:
x_pseudo = 7
x_pre_pseudo = 4
x_7_4 = df.loc[x_pseudo].iloc[x_pre_pseudo - 1]    # Indexing starting at zero
x_7_4

pre-pseudo_position    4.000
delta position           NaN
P(transition)          0.054
bel(xt−1​)             0.000
P(position)            0.000
Name: 7, dtype: float64

In [43]:
### Calculating the delta position
d_4 = x_pseudo - x_pre_pseudo
d_4

3

In [44]:
### Setting the delta position in the DataFrame
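# Single positional assignment (chained indexing may write to a copy)
row = np.flatnonzero(df.index == x_pseudo)[x_pre_pseudo - 1]
df.iloc[row, df.columns.get_loc('delta position')] = d_4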

In [45]:
df

| pseudo_position (x) | pre-pseudo_position | delta position | P(transition) | bel(xt−1​) | P(position) |
|---|---|---|---|---|---|
| 7 | 1 | 6.0 | 1e-06 | 0.0556 | 8.27e-08 |
| 7 | 2 | 5.0 | 0.000134 | 0.0556 | 7.44e-06 |
| 7 | 3 | 4.0 | 0.00443 | 0.0556 | 0.000246 |
| 7 | 4 | 3.0 | 0.054 | 0.0 | 0.0 |
| 7 | 5 | 2.0 | NaN | 0.0 | 0.0 |
| 7 | 6 | 1.0 | 0.399 | 0.0 | 0.0 |
| 7 | 7 | 0.0 | 0.242 | NaN | 0.00166 |
| 7 | 8 | -1.0 | 0.054 | 0.00179 | NaN |
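
The remaining entries of the **delta position** column follow the same rule. As a minimal sketch (the array names below are illustrative and not taken from the notebook), the deltas for all eight pre-pseudo positions can be computed in one step:

```python
import numpy as np

x_pseudo = 7
# Pre-pseudo positions 1..8, as listed in the table
x_pre_pseudo_all = np.arange(1, 9)
# Delta position = pseudo-position minus pre-pseudo position
deltas = x_pseudo - x_pre_pseudo_all
print(deltas)    # [ 6  5  4  3  2  1  0 -1]
```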


#### The transition probability

Suppose we have a pseudo-position $x = 7$ and a pre-pseudo position of $x = 5$. We can use the probability density function (PDF) of a continuous normal distribution to determine the corresponding transition probability value.

In [46]:
x_pseudo = 7
x_pre_pseudo = 5

To determine the transition probability for a pseudo-position $x = 7$ and a pre-pseudo position of $x = 5$, we evaluate the probability density function (PDF) with a control parameter of $1.0$ and a position standard deviation of $1.0$. Note that we are evaluating the PDF of the normal distribution at the _delta_ position $x = 7 - 5 = 2$.

In [47]:
x_delta = x_pseudo - x_pre_pseudo
control_parameter = 1.0
stdev_position = 1.0

In [48]:
from scipy.stats import norm

In [49]:
### Obtaining the transition probability for the delta position
p_t_delta = norm.pdf(x_delta, loc=control_parameter, scale=stdev_position)
value_to_decimal(p_t_delta)

'2.42E-01'

Note that we can also use the `normpdf` function we wrote previously in C++ from the [`2022-11-25-Course-3-Localization-Exercises-Part-2.ipynb`]() to evaluate the PDF and obtain the transition probability.
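
For readers without SciPy, the same density can be evaluated directly from the Gaussian formula. The function below is a minimal Python sketch; the name `normpdf` and its argument order are assumptions chosen to echo the C++ helper, not a copy of it.

```python
import math


def normpdf(x, mu, sigma):
    """Evaluate the normal probability density N(x; mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Transition probability for a delta position of 2,
# with control parameter 1.0 and position standard deviation 1.0
p = normpdf(2.0, 1.0, 1.0)
print(f"{p:.2E}")    # 2.42E-01
```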

In [50]:
### Setting the transition probability in the DataFrame
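# Single positional assignment (chained indexing may write to a copy)
row = np.flatnonzero(df.index == x_pseudo)[x_pre_pseudo - 1]
df.iloc[row, df.columns.get_loc('P(transition)')] = p_t_delta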

#### The belief state

To calculate the belief state of a given $x_{t-1}$, we can use the following relation:

$$
\begin{align}
\textbf{P(position)} = \textbf{P(transition)} * \textbf{bel(xt-1)}
\end{align}
$$

Rearranging the above in terms of the **bel(xt-1)**, we obtain an expression for the belief state equal to the position probability **P(position)** divided by the transition probability **P(transition)**. 
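
Written out with the column names from the table, the rearranged relation is:

$$
\begin{align}
\textbf{bel(xt-1)} = \frac{\textbf{P(position)}}{\textbf{P(transition)}}
\end{align}
$$

The division is only valid when **P(transition)** is non-zero, which is the case for every completed row in the table.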

Calculating the belief state for the second-to-last row in our table, we have:

In [51]:
x_pseudo = 7
x_pre_pseudo = 7
x_7_7 = df.loc[x_pseudo].iloc[x_pre_pseudo - 1]    # Indexing starting at zero
x_7_7

pre-pseudo_position    7.00000
delta position         0.00000
P(transition)          0.24200
bel(xt−1​)                 NaN
P(position)            0.00166
Name: 7, dtype: float64

In [52]:
### Computing the belief state
x_7_7 = x_7_7.astype(np.float64)
p_bel = x_7_7['P(position)'] / x_7_7['P(transition)']
value_to_decimal(p_bel)

'6.86E-03'

In [53]:
### Setting the belief state in the DataFrame
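# Single positional assignment (chained indexing may write to a copy)
row = np.flatnonzero(df.index == x_pseudo)[x_pre_pseudo - 1]
df.iloc[row, df.columns.get_loc('bel(xt−1​)')] = p_bel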

#### The position probability

To determine the discretised position probability for a pseudo-position $x = 7$ and a pre-pseudo position of $x = 8$, we can calculate the position probability with the following relation:

$$
\begin{align}
\textbf{P(position)} = \textbf{P(transition)} * \textbf{bel(xt-1)}
\end{align}
$$

Therefore we have,

In [54]:
x_pseudo = 7
x_pre_pseudo = 8
x_7_8 = df.loc[x_pseudo].iloc[x_pre_pseudo - 1]    # Indexing starting at zero
x_7_8

pre-pseudo_position    8.00000
delta position        -1.00000
P(transition)          0.05400
bel(xt−1​)             0.00179
P(position)                NaN
Name: 7, dtype: float64

In [55]:
x_7_8.index

Index(['pre-pseudo_position', 'delta position', 'P(transition)', 'bel(xt−1​)',
       'P(position)'],
      dtype='object')

In [56]:
### Computing the position probability
x_7_8 = x_7_8.astype(np.float64)
p_pos = x_7_8['P(transition)'] * x_7_8['bel(xt−1​)']
p_pos

9.666e-05

In [57]:
### Setting the position probability in the DataFrame
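# Single positional assignment (chained indexing may write to a copy)
row = np.flatnonzero(df.index == x_pseudo)[x_pre_pseudo - 1]
df.iloc[row, df.columns.get_loc('P(position)')] = p_pos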

#### The aggregated discretised position probability

Given our complete table of probability values, we can compute the total probability returned by the motion model as the sum of the discrete probability values from the table:

In [58]:
df

| pseudo_position (x) | pre-pseudo_position | delta position | P(transition) | bel(xt−1​) | P(position) |
|---|---|---|---|---|---|
| 7 | 1 | 6.0 | 1e-06 | 0.0556 | 8.27e-08 |
| 7 | 2 | 5.0 | 0.000134 | 0.0556 | 7.44e-06 |
| 7 | 3 | 4.0 | 0.00443 | 0.0556 | 0.000246 |
| 7 | 4 | 3.0 | 0.054 | 0.0 | 0.0 |
| 7 | 5 | 2.0 | 0.241971 | 0.0 | 0.0 |
| 7 | 6 | 1.0 | 0.399 | 0.0 | 0.0 |
| 7 | 7 | 0.0 | 0.242 | 0.00686 | 0.00166 |
| 7 | 8 | -1.0 | 0.054 | 0.00179 | 9.666e-05 |


In [59]:
### Computing the total discretised position probability
p_total = df['P(position)'].sum()
value_to_decimal(p_total)

'2.01E-03'

The total position probability we obtained, $2.01\mathrm{E}{-03}$, approximates the probability value we would extract from a continuous normal distribution.
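
This summation is the law of total probability applied to the motion model. The sketch below recomputes it from the belief values in the completed table; the helper names `normpdf` and `motion_model` are illustrative assumptions, and the control parameter and standard deviation are the values of $1.0$ used earlier in this section.

```python
import math


def normpdf(x, mu, sigma):
    """Evaluate the normal probability density N(x; mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def motion_model(x_pseudo, beliefs, control, stdev):
    """Sum P(transition) * bel(x_{t-1}) over all pre-pseudo positions.

    `beliefs` maps each pre-pseudo position to its prior belief.
    """
    total = 0.0
    for x_pre_pseudo, belief in beliefs.items():
        delta = x_pseudo - x_pre_pseudo
        total += normpdf(delta, control, stdev) * belief
    return total


# Prior beliefs bel(x_{t-1}) for pre-pseudo positions 1..8, from the table
beliefs = {1: 0.0556, 2: 0.0556, 3: 0.0556, 4: 0.0,
           5: 0.0, 6: 0.0, 7: 0.00686, 8: 0.00179}
p_total = motion_model(7, beliefs, control=1.0, stdev=1.0)
print(f"{p_total:.2E}")    # 2.01E-03
```

Positions with a prior belief of zero contribute nothing to the sum, so in practice only the cells near the previously believed positions matter.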

### 2.4. Observation Model Probability

We will complete the Bayes' filter exercises by implementing the observation model. The observation model uses pseudo-range measurement estimates $z_{t}^{*}$ and observation measurements $z_{t}$ as inputs.

In order to implement the observation model, we must perform the following at each time-step:
1. Collect measurements from the vehicle in the forward direction of motion;
2. Estimate the pseudo-range of each landmark by subtracting the pseudo-position from the true landmark position;
3. Associate each pseudo-range estimate to its nearest observation measurement;
4. Calculate the probability of each pseudo-range / observation measurement pair;
5. Return the product of all individual probabilities.

The final probability must factor in all pseudo-range / observation pairs. Therefore, by taking the [product](https://bio.libretexts.org/Bookshelves/Introductory_and_General_Biology/Book%3A_General_Biology_(Boundless)/12%3A_Mendel's_Experiments_and_Heredity/12.01%3A_Mendels_Experiments_and_the_Laws_of_Probability/12.1E%3A_Rules_of_Probability_for_Mendelian_Inheritance#:~:text=The%20product%20rule%20of%20probability,of%20each%20event%20occurring%20alone.) of the individual probability values in Step 5, we compute the intersection of the individual events and obtain an estimate that reflects the overall belief state.
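
Written as a formula, and assuming the individual range measurements are conditionally independent given the position and the map, Step 5 reads:

$$
\begin{align}
p\left(z_{t} \vert x_{t}, m\right) = \prod_{k=1}^{K} p\left(z_{t}^{k} \vert x_{t}, m\right)
\end{align}
$$

Here $K$ is a symbol introduced for this note; it denotes the number of observation measurements collected at time-step $t$.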


Let's practise this with an example. Assuming the following:
* **Pseudo-position**: $x_{t} = 10m$ — position of vehicle relative to the map range;
* **Landmark positions vector**: $X_{m} = \left[6, 15, 21, 40\right]$, the positions (metres) of the landmarks on the map $m$, of which only those ahead of the vehicle are observable;
* **Observation measurements vector**: $z_{t_{k}} = \left[5.5, 11.0\right]$ — relative distances (metres) to the objects (landmarks);
* **Observation standard deviation**: $\sigma_{t_{k}} = 1.0 m$ — the observation measurement error.

We will compute the individual quantities needed to form the final observation probability. Defining the problem statement, we have:

In [60]:
# The pseudo-position
x_t = 10
# The landmark positions
X_m = [6, 15, 21, 40]
# The observation measurements
z_t = [5.5, 11.0]
# The observation standard deviation
sigma_t = 1.0

#### The pseudo-range estimates

Given the vector of landmark positions $X_{m}$ located on the map $m$, we calculate the pseudo-range estimates $z_{t}^{*k}$. Each estimate serves as the mean of the observation probability distribution $p\left(z_{t}^{k} \vert x_{t}, m\right) \sim \mathcal{N}\left(z_{t}^{k}; z_{t}^{*k}, \sigma_{z_{t}}\right)$.

In order to estimate the pseudo-ranges, we first limit the landmark positions vector to only the positions in front of the vehicle at its current position.  

In [61]:
### Filtering landmarks to those in front of vehicle 
X_m = np.array(X_m)
X_t = X_m[np.where(X_m >= x_t)]
X_t

array([15, 21, 40])

In [62]:
### Getting the landmark positions relative to the vehicle
landmarks_rel = X_t - x_t
landmarks_rel

array([ 5, 11, 30])

#### Association

Given a set of observation measurements $z_{t_{k}}$ and estimated pseudo-ranges $z_{t}^{*}$, we perform nearest-neighbour association. In other words, we assign each measurement to the closest landmark using a single closest neighbour assignment (i.e., no re-assignments).

In [63]:
### Computing the nearest-neighbour associations
l_rel = list(landmarks_rel)
# Pair each measurement with the closest unassigned pseudo-range
pairs = [
    (z, l_rel.pop(int(np.argmin(np.abs(np.array(l_rel) - z)))))
    for z in z_t
]
pairs

[(5.5, 5), (11.0, 11)]

#### The association probability

Using the probability density function (PDF) of the Gaussian (normal) distribution, we can compute the association likelihood values.

In [64]:
### Computing the association probabilities
probs = [
    norm.pdf(observation_measurement, pseudo_range_estimate, sigma_t)
    for observation_measurement, pseudo_range_estimate in pairs
]
[value_to_decimal(p) for p in probs]

['3.52E-01', '3.99E-01']
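
As a check by hand, the first pair evaluates the Gaussian density at $z = 5.5$ with mean $5$ and standard deviation $1.0$:

$$
\begin{align}
\mathcal{N}\left(5.5; 5, 1.0\right) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\left(5.5 - 5\right)^{2}}{2}\right) \approx 0.3989 \times 0.8825 \approx 0.352
\end{align}
$$

The second pair has zero error, so its density is the peak value $1 / \sqrt{2\pi} \approx 0.399$.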

#### The observation model probability

Recall that the overall observation probability is given as the product of the individual association probabilities. Therefore, we obtain the estimate of the overall belief state as the intersection of the individual observation measurement / pseudo-range estimate pairs.

In [65]:
p = np.prod(probs)
value_to_decimal(p)

'1.40E-01'
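
The individual steps can also be collected into a single function. The following is a minimal sketch rather than the course's reference implementation: the names `normpdf` and `observation_model` are illustrative, and the choice to ignore measurements left without a landmark is an assumption made here for simplicity.

```python
import math


def normpdf(x, mu, sigma):
    """Evaluate the normal probability density N(x; mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def observation_model(x_t, landmarks, observations, sigma):
    """Return the product of the per-measurement Gaussian likelihoods."""
    # Pseudo-ranges: distances to the landmarks ahead of the vehicle
    pseudo_ranges = [lm - x_t for lm in landmarks if lm >= x_t]
    probability = 1.0
    for z in observations:
        if not pseudo_ranges:
            break    # Assumption: unmatched measurements are ignored
        # Nearest-neighbour association without re-assignment
        nearest = min(pseudo_ranges, key=lambda r: abs(r - z))
        pseudo_ranges.remove(nearest)
        probability *= normpdf(z, nearest, sigma)
    return probability


p = observation_model(10, [6, 15, 21, 40], [5.5, 11.0], 1.0)
print(f"{p:.2E}")    # 1.40E-01
```

Removing each matched pseudo-range from the candidate list is what enforces the "no re-assignments" rule described in the association step.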

## Credits

This assignment was prepared by Aaron Brown, Tiffany Huang and Maximilian Muffert of Mercedes-Benz Research & Development of North America (MBRDNA), 2021 (link [here]()).