# Gaussian Distribution

- The Gaussian distribution, also known as the normal distribution, is a continuous probability distribution for a real-valued random variable. The general form of its probability density function is:

f(x) = (1 / sqrt(2πσ²)) * exp( - (x - μ)² / (2σ²) )

where:
  - `μ` is the mean or expectation of the distribution (and also its median and mode),
  - `σ` is the standard deviation, and
  - `σ²` is the variance.

- **Intuition**: The Gaussian distribution is symmetric and its mean, median and mode are all equal. It is defined by two parameters: the mean (μ) which determines the center of the distribution, and the standard deviation (σ) which determines the spread or width of the distribution. The shape of the Gaussian distribution is a bell curve, with the majority of the observations falling close to the mean, and fewer observations in the tails.

- **Properties**: The Gaussian distribution has some important properties:
  - It is fully described by its mean and variance.
  - It has a skewness of 0 and a kurtosis of 3 (equivalently, an excess kurtosis of 0).
  - About 68% of values drawn from a Gaussian distribution are within one standard deviation σ away from the mean; about 95% are within two standard deviations and about 99.7% lie within three standard deviations. This is known as the 68-95-99.7 rule or the empirical rule.
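
The density formula and the 68-95-99.7 rule can be checked numerically. The sketch below uses only the Python standard library; the function names `normal_pdf` and `prob_within` are illustrative and not part of these notes. It relies on the identity P(|X - μ| <= kσ) = erf(k / sqrt(2)), which holds for any Gaussian.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


def prob_within(k: float) -> float:
    """Probability that a Gaussian value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.4f}")
```

The printed probabilities round to 0.6827, 0.9545 and 0.9973, which is where the rule's name comes from.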

## Kalman Filter

- The Kalman filter is an algorithm that uses a series of measurements observed over time, containing statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more accurate than those based on a single measurement alone.

- **Intuition**: The Kalman filter works in two steps: prediction and update. In the prediction step, it produces estimates of the current state variables, along with their uncertainties. Once the outcome of the next measurement (necessarily corrupted with some amount of error, including random noise) is observed, these estimates are updated using a weighted average, with more weight being given to estimates with higher certainty.

- **Mathematically**: The Kalman filter maintains a Gaussian belief over the state variables at each time step, represented by a mean vector and a covariance matrix. The update step combines the predicted Gaussian with the Gaussian measurement likelihood, which yields a new Gaussian. In the scalar case, the mean of the result is a weighted average of the predicted mean and the measurement, with weights proportional to their precisions (reciprocals of the variances). The resulting variance is smaller than either of the two original variances, reflecting the fact that combining two independent sources of information reduces uncertainty.
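
The predict/update cycle can be sketched for a single scalar state. This is a minimal illustration, not a general Kalman filter: it assumes a hypothetical model in which the state shifts by a known amount `motion` at each step with added variance `motion_var`, and is measured directly with noise variance `z_var`. All names are illustrative.

```python
def predict(mean: float, var: float, motion: float, motion_var: float) -> tuple[float, float]:
    """Prediction step: shift the belief and grow its uncertainty."""
    return mean + motion, var + motion_var


def update(mean: float, var: float, z: float, z_var: float) -> tuple[float, float]:
    """Update step: blend the prediction with measurement z."""
    gain = var / (var + z_var)           # Kalman gain, between 0 and 1
    new_mean = mean + gain * (z - mean)  # precision-weighted average of mean and z
    new_var = (1.0 - gain) * var         # never larger than var or z_var
    return new_mean, new_var


mean, var = 0.0, 1000.0  # deliberately vague initial belief
for z in [5.0, 6.0, 7.0]:
    mean, var = update(mean, var, z, z_var=4.0)
    print(f"after update:  mean={mean:.3f}, var={var:.3f}")
    mean, var = predict(mean, var, motion=1.0, motion_var=2.0)
    print(f"after predict: mean={mean:.3f}, var={var:.3f}")
```

The gain form used in `update` is algebraically the same as weighting the two means by their precisions; it is written this way because it extends directly to the matrix case.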

## Univariate and Multivariate Gaussian Distributions

- **Univariate Gaussian Distribution**: This is the Gaussian distribution for a single random variable. It is defined by two parameters: the mean (μ) and the variance (σ²). The probability density function of a univariate Gaussian distribution is given by:

f(x) = (1 / sqrt(2πσ²)) * exp( - (x - μ)² / (2σ²) )

- **Multivariate Gaussian Distribution**: This is the generalization of the Gaussian distribution to multiple dimensions. It is defined by a mean vector (μ) and a covariance matrix (Σ). The mean vector contains the means of each variable and the covariance matrix contains the variances along the diagonals and the covariances off the diagonals. The probability density function of a multivariate Gaussian distribution is given by:

f(x) = (1 / (sqrt((2π)^k * det(Σ)))) * exp( -0.5 * (x - μ)' * Σ^-1 * (x - μ) )

where:
  - `x` is a k-dimensional real vector,
  - `μ` is the mean vector,
  - `Σ` is the covariance matrix,
  - `k` is the number of dimensions (variables),
  - `det(Σ)` is the determinant of the covariance matrix,
  - `'` denotes the transpose of a vector or a matrix.
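
To make the multivariate formula concrete, here is a sketch restricted to k = 2, where the determinant and inverse of Σ have simple closed forms and no linear algebra library is needed. The function name `mvn_pdf_2d` is illustrative; a general implementation would normally use a numerical library instead.

```python
import math


def mvn_pdf_2d(x, mu, cov) -> float:
    """Density of a 2-D Gaussian with mean mu and 2x2 covariance cov at point x."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx0, dx1 = x[0] - mu[0], x[1] - mu[1]
    # (x - mu)' * inverse(cov) * (x - mu), using inverse = (1/det) * [[d, -b], [-c, a]]
    quad = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** 2 * det)


# With a diagonal covariance the density factorizes into two univariate densities.
print(mvn_pdf_2d((0.0, 0.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))))
```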

### Product of Gaussian Distributions

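- **Product of Gaussian Distributions**: When you multiply two Gaussian probability density functions together, the result is proportional to another Gaussian density (it must be renormalized to integrate to 1). This closure under multiplication keeps Bayesian updates with Gaussian priors and likelihoods in closed form, and it is one of the reasons Gaussians are widely used in statistics and machine learning.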

- **Mathematically**: If we have two Gaussian distributions N(μ1, σ1²) and N(μ2, σ2²), their product is proportional to a third Gaussian distribution N(μ, σ²), where:

μ = (σ2² * μ1 + σ1² * μ2) / (σ1² + σ2²)

σ² = (σ1² * σ2²) / (σ1² + σ2²)

- **Intuition**: The mean of the resulting Gaussian distribution is a weighted average of the means of the two original distributions, where the weights are the precisions (reciprocals of the variances) of the original distributions. The variance of the resulting distribution is less than the variances of the two original distributions, reflecting the fact that combining information from multiple sources generally reduces uncertainty.
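
The same result can be written in terms of precisions, which makes the "weighted average" reading explicit. A minimal sketch, with an illustrative function name; it returns only the parameters of the normalized Gaussian and ignores the scaling constant of the product.

```python
def product_of_gaussians(mu1: float, var1: float, mu2: float, var2: float) -> tuple[float, float]:
    """Mean and variance of the Gaussian proportional to N(mu1, var1) * N(mu2, var2)."""
    precision = 1.0 / var1 + 1.0 / var2   # precisions add
    var = 1.0 / precision
    mu = var * (mu1 / var1 + mu2 / var2)  # precision-weighted average of the means
    return mu, var


mu, var = product_of_gaussians(0.0, 1.0, 10.0, 4.0)
print(mu, var)  # the mean is pulled toward the narrower (more certain) Gaussian
```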

### Sums and Linear Transformations of Gaussian Random Variables

- **Sum of Gaussian Random Variables**: If X and Y are independent Gaussian random variables, then their sum Z = X + Y is also a Gaussian random variable. The mean of Z is the sum of the means of X and Y, and the variance of Z is the sum of the variances of X and Y.

- **Mathematically**: If X ~ N(μ1, σ1²) and Y ~ N(μ2, σ2²) are independent, then Z = X + Y ~ N(μ1 + μ2, σ1² + σ2²).

- **Linear Transformation of a Gaussian Random Variable**: If X is a Gaussian random variable and Y = aX + b is a linear transformation of X, then Y is also a Gaussian random variable. The mean of Y is a times the mean of X plus b, and the variance of Y is a² times the variance of X.

- **Mathematically**: If X ~ N(μ, σ²), then Y = aX + b ~ N(aμ + b, a²σ²).
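
Both rules can be checked by simulation. The sketch below computes the predicted parameters and compares them with sample statistics from a seeded generator; the helper names and the specific parameter values are chosen only for illustration. Note that `random.gauss` takes a standard deviation, not a variance.

```python
import random
import statistics


def sum_params(mu1: float, var1: float, mu2: float, var2: float) -> tuple[float, float]:
    """Parameters of X + Y for independent X ~ N(mu1, var1), Y ~ N(mu2, var2)."""
    return mu1 + mu2, var1 + var2


def affine_params(mu: float, var: float, a: float, b: float) -> tuple[float, float]:
    """Parameters of aX + b for X ~ N(mu, var)."""
    return a * mu + b, a * a * var


rng = random.Random(0)
n = 20_000
xs = [rng.gauss(1.0, 2.0) for _ in range(n)]   # X ~ N(1, 4)
ys = [rng.gauss(-3.0, 3.0) for _ in range(n)]  # Y ~ N(-3, 9), independent of X
zs = [x + y for x, y in zip(xs, ys)]           # Z = X + Y
ws = [2.0 * x + 5.0 for x in xs]               # W = 2X + 5

print("sum:    predicted", sum_params(1.0, 4.0, -3.0, 9.0),
      "sampled", (statistics.fmean(zs), statistics.pvariance(zs)))
print("affine: predicted", affine_params(1.0, 4.0, 2.0, 5.0),
      "sampled", (statistics.fmean(ws), statistics.pvariance(ws)))
```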

## Gaussian Noise

- **Gaussian Noise**: Gaussian noise is statistical noise whose probability density function is that of the normal (Gaussian) distribution. In other words, the values that the noise can take on are Gaussian-distributed. It is often assumed to be white as well, in which case it is called Gaussian white noise, but the two properties are distinct: "Gaussian" describes the distribution of amplitudes, while "white" describes how the noise power is spread across frequencies.

- **Intuition**: Many systems are subject to random variations, or 'noise'. When this noise is assumed to be Gaussian-distributed, it is referred to as Gaussian noise. Noise is called white when it has uniform power across the frequency band of the system, which means that samples taken at different times are uncorrelated. Additive white Gaussian noise is a common modeling assumption in communications, control systems, and signal processing.

- **Properties**: Gaussian noise is usually modeled with zero mean and some variance σ². Because a Gaussian is fully determined by its mean and variance, these two parameters completely describe the distribution of each noise sample.
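
As an illustration, the sketch below adds independent zero-mean Gaussian noise to a clean sine wave and then checks that the noise has roughly the intended mean and variance. The signal, the value of `sigma`, and the function name are assumptions made for the example.

```python
import math
import random
import statistics


def add_gaussian_noise(signal: list[float], sigma: float, rng: random.Random) -> list[float]:
    """Return signal plus independent N(0, sigma^2) noise on every sample."""
    return [s + rng.gauss(0.0, sigma) for s in signal]


rng = random.Random(42)
clean = [math.sin(2 * math.pi * t / 50) for t in range(10_000)]
noisy = add_gaussian_noise(clean, sigma=0.5, rng=rng)

# Recover the noise and compare its statistics with the model (mean 0, variance 0.25).
noise = [n - c for n, c in zip(noisy, clean)]
print("noise mean:    ", statistics.fmean(noise))
print("noise variance:", statistics.pvariance(noise))
```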

## Sampling from Multivariate Gaussian Distributions

- **Multivariate Gaussian Distribution**: A multivariate Gaussian distribution is a generalization of the one-dimensional (univariate) Gaussian distribution to higher dimensions. A multivariate Gaussian distribution is specified by a mean vector and a covariance matrix.

- **Sampling**: Sampling from a multivariate Gaussian distribution is the process of generating random vectors where each vector is a point in the space defined by the dimensions of the multivariate Gaussian distribution. Each point is drawn according to the probability density function of the multivariate Gaussian distribution.

- **Method**: One common method for sampling from a multivariate Gaussian distribution involves the following steps:
  1. Sample a vector from the standard Gaussian distribution. This can be done by sampling each component of the vector from the univariate standard Gaussian distribution.
  2. Transform the sampled vector using the mean and covariance matrix of the multivariate Gaussian distribution. This can be done by multiplying the sampled vector by a matrix square root of the covariance matrix (for example the lower-triangular Cholesky factor `L`, which satisfies Σ = L L') and then adding the mean vector.

## Sampling from a Multivariate Gaussian Distribution using Cholesky Decomposition

Cholesky decomposition is used in sampling from a multivariate Gaussian distribution to transform samples from a standard normal distribution. Here's the step-by-step process:

1. **Generate a Vector of Standard Normal Random Variables**: First, generate a vector of random variables from a standard normal distribution. This can be done by generating each component of the vector independently from a standard normal distribution.

2. **Compute the Cholesky Decomposition**: Compute the Cholesky decomposition of the covariance matrix of the multivariate Gaussian distribution. The Cholesky decomposition is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose.

3. **Transform the Vector**: Left-multiply the vector of standard normal random variables `z` by the lower triangular matrix `L` obtained from the Cholesky decomposition (Σ = L L'), giving `Lz`. Because the components of `z` are independent with unit variance, `Lz` has covariance L I L' = Σ, so this step turns independent variables into correlated ones with the desired covariance.

4. **Shift the Mean**: Add the mean vector of the multivariate Gaussian distribution to the transformed vector. This shifts the mean of the distribution from zero to the desired mean.

The resulting vector is a sample from the multivariate Gaussian distribution with the desired mean and covariance.
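
The four steps can be sketched in plain Python. This is an illustrative implementation with hypothetical function names; it includes a small textbook Cholesky routine so that the example has no dependencies, whereas real code would typically call a numerical library for the factorization.

```python
import math
import random


def cholesky(a: list[list[float]]) -> list[list[float]]:
    """Lower-triangular L with L * L' = a, for a symmetric positive-definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(d)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_mvn(mu: list[float], cov: list[list[float]], rng: random.Random) -> list[float]:
    """Draw one sample from N(mu, cov) as mu + L z."""
    low = cholesky(cov)                    # step 2: factor the covariance
    z = [rng.gauss(0.0, 1.0) for _ in mu]  # step 1: independent standard normals
    # steps 3 and 4: correlate with L, then shift by the mean
    return [m + sum(low[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mu)]


rng = random.Random(1)
print(sample_mvn([1.0, -1.0], [[4.0, 2.0], [2.0, 3.0]], rng))
```

When drawing many samples from the same distribution, it is worth computing the factor once and reusing it; the sketch recomputes it on every call only to keep the function self-contained.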

# Conjugacy and Exponential Family

- **Conjugacy**: In Bayesian statistics, a prior is said to be conjugate to a likelihood function if the resulting posterior distribution is in the same family as the prior. Conjugate priors are useful in Bayesian inference because the posterior can be computed in closed form by updating the prior's parameters. For example, a Gaussian prior on the mean of a Gaussian likelihood with known variance yields a Gaussian posterior, and the Beta distribution is conjugate to the Bernoulli and binomial likelihoods.

- **Exponential Families**: An exponential family is a set of probability distributions whose density or mass function can be written in the form p(x | θ) = h(x) * exp( η(θ) · T(x) - A(θ) ), where `T(x)` is the sufficient statistic, `η(θ)` is the natural parameter, `h(x)` is the base measure, and `A(θ)` is the log-normalizer. Many common distributions are in the exponential family, including the Gaussian, exponential, chi-squared, gamma, and beta distributions. The concept of an exponential family is important in statistical theory and in the construction of generalized linear models.

- **Conjugacy in Exponential Families**: Conjugacy is particularly simple in exponential families. If a likelihood function is in an exponential family, there often exists a conjugate prior also within the exponential family. This makes the posterior distribution easier to compute and interpret.
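
A standard example is the Beta prior with a Bernoulli likelihood: a Beta(α, β) prior combined with observed 0/1 outcomes gives a Beta(α + successes, β + failures) posterior, so the update is just counting. A minimal sketch with an illustrative function name:

```python
def beta_bernoulli_update(alpha: float, beta: float, observations: list[int]) -> tuple[float, float]:
    """Posterior Beta parameters after observing a list of 0/1 outcomes."""
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures


# Start from a uniform prior, Beta(1, 1), and observe three successes and one failure.
alpha, beta = beta_bernoulli_update(1.0, 1.0, [1, 1, 0, 1])
print("posterior:", (alpha, beta))
print("posterior mean of p:", alpha / (alpha + beta))
```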

## Bernoulli Distribution

- **Bernoulli Distribution**: The Bernoulli distribution is a discrete probability distribution for a random variable that takes one of two possible outcomes, usually labeled 0 and 1. It is the special case of the binomial distribution with a single trial (n = 1).

- **Parameters**: The Bernoulli distribution is parameterized by a single parameter `p` which is the probability of the outcome 1. The probability of the outcome 0 is therefore `1 - p`.

- **Probability Mass Function (PMF)**: The PMF of a Bernoulli distribution is given by:

P(X = k) = p^k * (1 - p)^(1 - k) for k in {0, 1}

- **Mean and Variance**: The mean of a Bernoulli distribution is `p` and the variance is `p * (1 - p)`.
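
The PMF, mean, and variance can be verified directly by summing over the two outcomes. The function name and the value `p = 0.3` are illustrative.

```python
def bernoulli_pmf(k: int, p: float) -> float:
    """P(X = k) for X ~ Bernoulli(p), with k in {0, 1}."""
    if k not in (0, 1):
        raise ValueError("k must be 0 or 1")
    return p**k * (1 - p) ** (1 - k)


p = 0.3
mean = sum(k * bernoulli_pmf(k, p) for k in (0, 1))
variance = sum((k - mean) ** 2 * bernoulli_pmf(k, p) for k in (0, 1))
print(mean, variance)  # compare with p and p * (1 - p)
```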

## Binomial Distribution

- **Binomial Distribution**: The Binomial distribution is a discrete probability distribution of the number of successes in a sequence of n independent experiments. Each experiment results in a success with probability p or a failure with probability 1 - p.

- **Parameters**: The Binomial distribution is parameterized by two parameters: `n` which is the number of trials, and `p` which is the probability of success on each trial.

- **Probability Mass Function (PMF)**: The PMF of a Binomial distribution is given by:

P(X = k) = C(n, k) * p^k * (1 - p)^(n - k) for k in {0, 1, ..., n}

where `C(n, k)` is the binomial coefficient, which is the number of ways to choose `k` successes from `n` trials.

- **Mean and Variance**: The mean of a Binomial distribution is `np` and the variance is `np * (1 - p)`.
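
A short sketch of the PMF using `math.comb` for the binomial coefficient, followed by a check that the probabilities sum to 1 and reproduce the stated mean and variance. The function name and the values `n = 10`, `p = 0.3` are illustrative.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(probs))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
print(sum(probs), mean, variance)  # compare with 1, n * p, and n * p * (1 - p)
```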

## Beta Distribution

- **Beta Distribution**: The Beta distribution is a family of continuous probability distributions defined on the interval [0, 1]. It is parameterized by two positive shape parameters, denoted by α and β, that appear as exponents of the random variable and control the shape of the distribution.

- **Parameters**: `α > 0` and `β > 0`. Larger `α` shifts probability mass toward 1, larger `β` shifts it toward 0, and `α = β = 1` gives the uniform distribution on [0, 1].

- **Probability Density Function (PDF)**: The PDF of a Beta distribution is given by:

f(x; α, β) = x^(α - 1) * (1 - x)^(β - 1) / B(α, β) for 0 <= x <= 1

where `B(α, β)` is the Beta function, which is a normalization constant to ensure that the total probability is 1.

- **Mean and Variance**: The mean of a Beta distribution is `α / (α + β)` and the variance is `α * β / [(α + β)^2 * (α + β + 1)]`.
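
The density can be evaluated with the standard library by writing the Beta function in terms of the gamma function, B(α, β) = Γ(α) Γ(β) / Γ(α + β). The sketch below uses illustrative names and is restricted to the open interval 0 < x < 1, since the density can diverge at the endpoints when α < 1 or β < 1.

```python
import math


def beta_function(alpha: float, beta: float) -> float:
    """B(alpha, beta) expressed through the gamma function."""
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)


def beta_pdf(x: float, alpha: float, beta: float) -> float:
    """Density of Beta(alpha, beta) at x, for 0 < x < 1."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / beta_function(alpha, beta)


def beta_mean_var(alpha: float, beta: float) -> tuple[float, float]:
    """Closed-form mean and variance of Beta(alpha, beta)."""
    total = alpha + beta
    return alpha / total, alpha * beta / (total**2 * (total + 1))


print(beta_pdf(0.5, 2.0, 2.0))
print(beta_mean_var(2.0, 3.0))
```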

