The following notes are meant to be brief as this material is covered extensively by other notes.

# Gaussian Distribution(s)
Otherwise known as the **normal distribution** (or, historically, the **Error PDF**), it is described by the following density functions:

Single dimension of $x$
$$
p(x) = \frac{1}{\sigma\sqrt{2\pi}} \cdot \exp\bigg(\frac{-(x-\mu)^2}{2\sigma^2} \bigg)
$$

Therefore...
$$
P(a < x < b) = \int_a^b{p(x)dx}
$$
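As a quick numerical check of the two formulas above, here is a minimal sketch using only the Python standard library. The function names are illustrative and not taken from any particular library. The closed form relies on the standard identity that the Gaussian CDF can be written with the error function, and the midpoint-rule integrator is included only to show that integrating $p(x)$ directly gives the same answer.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def gaussian_cdf(x, mu, sigma):
    """P(X <= x), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def interval_probability(a, b, mu, sigma):
    """P(a < X < b) as the difference of two CDF values."""
    return gaussian_cdf(b, mu, sigma) - gaussian_cdf(a, mu, sigma)

def interval_probability_numeric(a, b, mu, sigma, steps=10_000):
    """Midpoint-rule approximation of the integral of p(x) over [a, b]."""
    dx = (b - a) / steps
    total = sum(gaussian_pdf(a + (i + 0.5) * dx, mu, sigma) for i in range(steps))
    return total * dx

# Probability of landing within one standard deviation of the mean.
p_one_sigma = interval_probability(-1.0, 1.0, mu=0.0, sigma=1.0)
p_one_sigma_numeric = interval_probability_numeric(-1.0, 1.0, mu=0.0, sigma=1.0)
```

Both values come out close to 0.68, the familiar "one sigma" figure.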

Multi-dimensional vector $X=\langle x_1, \dots, x_n \rangle \in \mathbb{R}^n$, with mean vector $\mu$ and covariance matrix $\Sigma$:
$$
p(X) = \frac{1}{\sqrt{(2\pi)^n\det(\Sigma)}} \cdot \exp\bigg( -\frac{1}{2} (X-\mu)^\intercal \Sigma^{-1} (X-\mu) \bigg)
$$

Therefore...
$$
P(a_1 < x_1 < b_1, ..., a_n < x_n < b_n) = \int_{a_1}^{b_1}...\int_{a_n}^{b_n} p(X) dx_1...dx_n
$$
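To make the multi-dimensional density concrete, the sketch below evaluates it for the special case $n = 2$ in pure Python, using the normalizing constant $1/\sqrt{(2\pi)^n \det(\Sigma)}$. The function name and the nested-list representation of $\Sigma$ are assumptions made for illustration; a general $n$-dimensional version would normally lean on a linear algebra library instead of the hand-written 2x2 inverse used here.

```python
import math

def gaussian_pdf_2d(x, mu, cov):
    """Density of a 2-D Gaussian; x and mu are pairs, cov is a 2x2 nested list."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must have a positive determinant")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu).
    quad = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    n = 2
    norm = 1.0 / math.sqrt((2.0 * math.pi) ** n * det)
    return norm * math.exp(-0.5 * quad)

# With identity covariance, the density at the mean is 1 / (2 * pi).
density_at_mean = gaussian_pdf_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

When $\Sigma$ is diagonal, the result factors into a product of two single-dimension densities, which is a handy sanity check.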

# Conditionally Joint Distributions
A conditional distribution answers questions such as *what is the probability of X=x given that Y=y?*

By counting occurrences in a sample, we can compute such a probability (provided $p(y) > 0$) as:
$$
p(x|y) = \frac{p(x,y)}{p(y)}
$$

Sometimes it is impossible (or impractical) to count samples; in such circumstances we can employ *Bayes' Rule*:
$$
p(x|y) = \frac{p(y|x) \cdot p(x)}{p(y)}
$$

It's important to bear in mind the implication, as this is the mathematical basis of inference: the denominator does not depend upon *x* at all; it is merely a *normalization* of $p(y|x) \cdot p(x)$, which ensures that $p(x|y)$ sums (or integrates) to 1 over *x*. Because of this, the rule is commonly written as:
$$
p(x|y) = \eta p(y|x) \cdot p(x) \quad \text{where} \quad \eta = \frac{1}{p(y)}
$$

If *x* is the parameter we wish to infer, then $p(x)$ is called the **prior probability distribution** while *y* would be treated as **data**. Thus the calculated $p(x|y)$ is called the **posterior probability distribution**.
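The normalizer form of Bayes' Rule can be sketched for a discrete variable as follows. The door states and the numbers for the prior and the likelihood are made up for illustration, and `posterior` is a hypothetical helper name. Here $p(y)$ is obtained by summing $p(y|x) \cdot p(x)$ over all *x*, which is exactly why it acts as a normalizer.

```python
def posterior(prior, likelihood):
    """Return p(x | y) for each x, given p(x) and p(y | x) for one observed y."""
    unnormalized = {x: likelihood[x] * prior[x] for x in prior}
    evidence = sum(unnormalized.values())  # p(y), by the law of total probability
    eta = 1.0 / evidence                   # the normalizer from the equation above
    return {x: eta * value for x, value in unnormalized.items()}

# Hypothetical example: is a door open, given that a sensor reported "open"?
prior = {"open": 0.5, "closed": 0.5}          # p(x)
likelihood = {"open": 0.6, "closed": 0.3}     # p(y = "sensed open" | x)
belief = posterior(prior, likelihood)         # p(x | y)
```

With these numbers the belief that the door is open rises from 1/2 to 2/3, and the two posterior values sum to 1.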

# Expectation & Moments
**Expected value** is the probability-weighted average of the values that the random variable *X* can take. It is sometimes confused with the sample *mean* or *median*; though related, those are statistics computed from data, typically used when the probability distribution is unknown.

When the distribution is known, expected value is calculated as:
$$
E[X] = \sum_x x \cdot p(x) \quad \text{(discrete)} \\
E[X] = \int_x x \cdot p(x) dx \quad \text{(continuous)}
$$

Treated as an operator, *expected value* is linear. That is, for constants *a* and *b*:
$$
E[aX + b] = aE[X] + b
$$
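A small sketch of the discrete formula and the linearity property, using a fair six-sided die as an assumed example distribution. The helper `expected_value` and its dictionary representation of $p(x)$ are illustrative choices.

```python
def expected_value(pmf, f=lambda x: x):
    """E[f(X)] for a discrete distribution given as {value: probability}."""
    return sum(f(x) * p for x, p in pmf.items())

# Fair six-sided die: each face has probability 1/6.
die = {face: 1.0 / 6.0 for face in range(1, 7)}

mean = expected_value(die)                       # E[X]
a, b = 2.0, 3.0
lhs = expected_value(die, lambda x: a * x + b)   # E[aX + b]
rhs = a * mean + b                               # aE[X] + b
```

The two sides agree up to floating-point rounding, as linearity predicts.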

Probability distributions are summarized by a family of statistics called **moments**; these are also computed with the *expected value* operator. In general, we define the *first moment* as $E[X]$, also called $E_1[X]$. The *second moment* (taken about the mean, and so more precisely the second *central* moment) is defined as follows:
$$
E_2[X] = E[(X - E[X])^2] = E[X^2] - E[X]^2
$$

The first moment is the *mean*, and the second central moment is the *variance*. Subsequent central moments are computed as follows:
$$
E_n[X] = E[(X - E[X])^n]
$$
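The sketch below checks that the two expressions for the variance agree, again on a fair die. Exact rational arithmetic from the standard `fractions` module is used so that the comparison is not blurred by rounding; `central_moment` is a hypothetical helper name.

```python
from fractions import Fraction

def central_moment(pmf, n):
    """E[(X - E[X])^n] for a discrete distribution {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** n * p for x, p in pmf.items())

die = {face: Fraction(1, 6) for face in range(1, 7)}

mean = sum(x * p for x, p in die.items())              # E[X]
mean_of_square = sum(x * x * p for x, p in die.items())  # E[X^2]

variance_direct = central_moment(die, 2)               # E[(X - E[X])^2]
variance_shortcut = mean_of_square - mean ** 2         # E[X^2] - E[X]^2
third_central = central_moment(die, 3)                 # vanishes for a symmetric distribution
```

Both variance computations give 35/12, and the third central moment is zero because the die's distribution is symmetric about its mean.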

# Entropy
Probability is inherently tied to *Information Theory*, the study of how to quantify and effectively use information on a mathematical level. One key measure is **entropy**: the expected amount of information (in bits, when using $\log_2$) gained by observing an outcome of *x*. For simplicity, we'll compute it as:
$$
H_p(x) = E[-\log_2 p(x)]
$$

Which resolves to:
$$
H_p(x) = -\sum_x p(x) \log_2 p(x) \quad \text{(discrete)} \\
H_p(x) = -\int_x p(x) \log_2 p(x) dx \quad \text{(continuous)}
$$
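For the discrete case, entropy is a short sum. In the sketch below, the coin distributions are assumed examples and `entropy_bits` is an illustrative name. Outcomes with zero probability are skipped, following the usual convention that $0 \log 0 = 0$.

```python
import math

def entropy_bits(pmf):
    """Entropy in bits of a discrete distribution {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0.0)

fair_coin = entropy_bits({"H": 0.5, "T": 0.5})      # maximally uncertain
biased_coin = entropy_bits({"H": 0.9, "T": 0.1})    # more predictable
certain_coin = entropy_bits({"H": 1.0, "T": 0.0})   # no uncertainty at all
```

The fair coin carries exactly one bit per toss, the biased coin carries less than half a bit, and the certain outcome carries none.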

# Probabilistic Generative Laws & Belief
In general, we model the **state** of the environment and its transitions as a *stochastic process* (that is, a sequence of random variables), particularly as a *Markov process*. What do we mean by that? In the most general case, the state of the environment at any given time, $x_t$, *depends* upon all previous states and the actions taken by the robot. The actions taken by the robot depend on the sensor observations $z_{1:t-1}$, which in turn depend upon the aforementioned state. By convention, we assume that the action $u_t$ occurs just before the sensor observation $z_t$.

We define the state to be **complete** when it contains sufficient information to predict the future, so that earlier states, observations, and actions add nothing further. This Markovian condition allows for constraining the control problem:

$$
p(x_t | x_{0:t-1}, z_{1:t-1}, u_{1:t}) = p(x_t | x_{t-1}, u_t) \\
p(z_t | x_{0:t}, z_{1:t-1}, u_{1:t}) = p(z_t | x_t)
$$

Graphically, this model can be depicted as follows:

![Dynamic Bayes Net of Generative Model](images/DBN_GenModel.png)
###### *Figure 1: Dynamic Bayes net of a Generative Belief Model of State*

Note that in the image there is a dotted arrow pointing from each sensor measurement to the subsequent action. This represents the causal model of our robot reacting to sensory data (we don't want a truly *random* action selection!).
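To see the generative model in action, the sketch below samples a trajectory from the two conditional distributions above, $p(x_t | x_{t-1}, u_t)$ and $p(z_t | x_t)$. Everything specific here is an assumption made for illustration: the two-state door world, the action and observation names, and all probabilities. For simplicity the actions are fixed in advance rather than chosen in reaction to the observations.

```python
import random

# Hypothetical two-state world: a door that is "open" or "closed".
# TRANSITION[u][x_prev] is p(x_t | x_{t-1} = x_prev, u_t = u).
TRANSITION = {
    "push": {"open": {"open": 1.0, "closed": 0.0},
             "closed": {"open": 0.8, "closed": 0.2}},
    "wait": {"open": {"open": 1.0, "closed": 0.0},
             "closed": {"open": 0.0, "closed": 1.0}},
}
# MEASUREMENT[x] is p(z_t | x_t = x).
MEASUREMENT = {
    "open": {"sense_open": 0.6, "sense_closed": 0.4},
    "closed": {"sense_open": 0.2, "sense_closed": 0.8},
}

def sample(dist, rng):
    """Draw one outcome from a discrete distribution {outcome: probability}."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]

def simulate(x0, actions, rng):
    """Sample (u_t, x_t, z_t) triples; each step uses only x_{t-1} and u_t."""
    x = x0
    trajectory = []
    for u in actions:
        x = sample(TRANSITION[u][x], rng)   # state transition
        z = sample(MEASUREMENT[x], rng)     # observation depends on x_t alone
        trajectory.append((u, x, z))
    return trajectory

rng = random.Random(0)  # seeded so that runs are repeatable
trajectory = simulate("closed", ["wait", "push", "push", "wait"], rng)
```

Note that `simulate` never consults earlier states or observations when drawing the next state, which is the Markov assumption expressed in code.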