# Mixture of Gaussians

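Mixture of Gaussians is a clustering method whose parameters are usually fitted with the Expectation Maximisation (EM) algorithm. The idea of this model is simple: for a given dataset, each point is assumed to be generated by one of several multivariate Gaussians, so the overall density of the data is a weighted sum (a linear combination) of those Gaussians.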

## What are Gaussians?

![The characteristic symmetric bell curve shape of a Gaussian.](https://upload.wikimedia.org/wikipedia/commons/7/74/Normal_Distribution_PDF.svg)

Source: Wikimedia

A Gaussian is a function of the form:

\begin{equation*}
f(x)=a e^{-\frac{(x-b)^2}{2c^2}}
\end{equation*}

where
- $a \in \mathbb{R}$ is the height of the curve's peak,
- $b \in \mathbb{R}$ is the position of the center of the peak,
- $c \in \mathbb{R}$, $c \neq 0$, is the [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation "The standard deviation σ is a measure that is used to quantify the amount of variation or dispersion of a set of data values"), which controls the width of the bell.

The function is mathematically convenient and is often used to describe a dataset that follows the normal [distribution](https://en.wikipedia.org/wiki/Frequency_distribution "A distribution is a listing of outcomes of an experiment and the probability associated with each outcome."). The shape of its plot is called a bell curve.

When a Gaussian represents a probability density function, it is described by two parameters:
- the mean of all the data points, which defines the center of the curve
- the standard deviation, which describes the spread of the data
\begin{equation*}
f(x)=\frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{(x-\mu)^2}{2 \sigma^2}}
\end{equation*}
where
- $\mu$ is the mean
- $\sigma$ is the standard deviation  
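
To make the formula above concrete, here is a minimal Python sketch that evaluates it directly. The function name `gaussian_pdf` and the chosen values ($\mu = 0$, $\sigma = 1$) are illustrative assumptions, not part of the original text. Note that the coefficient $\frac{1}{\sigma \sqrt{2 \pi}}$ plays the role of the peak height $a$ in the general form.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma (> 0)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))   # peak height, the 'a' above
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# Illustrative values: a standard normal (mu = 0, sigma = 1).
peak = gaussian_pdf(0.0, mu=0.0, sigma=1.0)        # density at the mean
one_sigma = gaussian_pdf(1.0, mu=0.0, sigma=1.0)   # one standard deviation away
print(peak, one_sigma)
```

The density is highest at the mean and falls off symmetrically on both sides, which is what gives the curve its bell shape.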

The probability density function of a continuous random variable is a function whose integral across an interval gives the probability that the value of the variable lies within that interval.
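
As a rough illustration of this definition, the sketch below integrates a Gaussian density numerically over the interval $[\mu - \sigma, \mu + \sigma]$ and compares the result with the difference of the cumulative distribution function at the two endpoints. It uses `statistics.NormalDist` from the Python standard library; the helper `interval_probability`, the midpoint rule, and the step count are assumptions made for this example.

```python
from statistics import NormalDist

def interval_probability(dist, lo, hi, steps=10_000):
    """Approximate P(lo <= X <= hi) with a midpoint-rule integral of the density."""
    width = (hi - lo) / steps
    return sum(dist.pdf(lo + (i + 0.5) * width) for i in range(steps)) * width

dist = NormalDist(mu=0.0, sigma=1.0)             # illustrative standard normal
approx = interval_probability(dist, -1.0, 1.0)   # numerical integral of the pdf
exact = dist.cdf(1.0) - dist.cdf(-1.0)           # same probability via the cdf
print(approx, exact)
```

Both numbers should agree closely, at roughly 0.68: about 68% of the probability mass of a Gaussian lies within one standard deviation of its mean.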

A dataset may have more than one peak. A Mixture of Gaussians model captures multiple peaks in the dataset.
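
The following sketch shows how a weighted sum of two one-dimensional Gaussians produces a density with two peaks. All names (`mixture_pdf`, `weights`, `mus`, `sigmas`) and all numeric values are hypothetical and chosen only for illustration; the weights are assumed to be non-negative and to sum to 1 so that the result is still a valid density.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, mus, sigmas):
    """Weighted sum of Gaussian densities (weights assumed to sum to 1)."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

# Hypothetical two-component mixture.
weights = [0.4, 0.6]
mus = [-2.0, 3.0]
sigmas = [0.8, 1.0]

at_first_peak = mixture_pdf(-2.0, weights, mus, sigmas)
at_second_peak = mixture_pdf(3.0, weights, mus, sigmas)
in_valley = mixture_pdf(0.5, weights, mus, sigmas)
print(at_first_peak, in_valley, at_second_peak)
```

The density near each component mean is much higher than in the region between them, which is the multi-peak behaviour a single Gaussian cannot represent.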

## Variance-Covariance Matrix

Covariance is a measure of how changes in one variable are associated with changes in a second variable; it tells us how two variables behave as a pair. In other words, covariance is a measure of the linear relationship between two variables. We are only interested in the sign of a covariance value:
- A positive value indicates a direct, or increasing, linear relationship
- A negative value indicates an inverse, or decreasing, linear relationship
- Zero (or around zero) indicates that there is probably not a linear relationship between the two variables

We are not interested in the number itself, since its magnitude depends on the units of the variables and therefore does not tell us anything about the strength of the relationship. To find the strength of the relationship, we need to find the correlation.
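
A small sketch may help show why the magnitude of a covariance is hard to interpret while the correlation is not. The data below are invented toy values, and the helper names (`covariance`, `correlation`) are assumptions for this example; the sample covariance here divides by $n - 1$.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance of two equally long sequences (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Invented toy data: the same heights expressed in metres and in centimetres.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weights_kg = [52.0, 58.0, 69.0, 75.0, 86.0]
heights_cm = [h * 100 for h in heights_m]

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)
corr_m = correlation(heights_m, weights_kg)
corr_cm = correlation(heights_cm, weights_kg)
print(cov_m, cov_cm, corr_m, corr_cm)
```

Changing the unit of one variable multiplies the covariance by 100 but leaves its sign and the correlation unchanged, which is why only the sign of the covariance is read directly.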


Variance and covariance are often displayed together in a **variance-covariance** matrix, also known as a covariance matrix. The diagonal of the covariance matrix provides the variance of each individual variable, whereas the off-diagonal entries provide the covariance between each pair of variables.
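
The structure of such a matrix can be sketched in a few lines of Python. The function `covariance_matrix` and the three toy variables are hypothetical, and the sample (divide by $n - 1$) convention is an assumption; in practice a numerical library would normally be used instead.

```python
def covariance_matrix(columns):
    """Sample covariance matrix for a list of variables (one list per variable)."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]

    def cov(i, j):
        # Covariance between variable i and variable j (variance when i == j).
        return sum(
            (a - means[i]) * (b - means[j])
            for a, b in zip(columns[i], columns[j])
        ) / (n - 1)

    d = len(columns)
    return [[cov(i, j) for j in range(d)] for i in range(d)]

# Invented toy data: three variables observed four times each.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
z = [9.0, 7.0, 4.0, 1.0]

matrix = covariance_matrix([x, y, z])
for row in matrix:
    print(row)
```

The entry at row $i$, column $i$ is the variance of variable $i$, and the matrix is symmetric because the covariance of $x$ with $y$ equals the covariance of $y$ with $x$.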

## Gaussian Mixture Model (GMM)

A GMM is a probability distribution that consists of multiple Gaussian probability distributions, combined as a weighted sum.

The Gaussian distribution of a vector with $d$ elements $\vec{x} = (x_1, x_2, \cdots, x_d)^T$ is defined by:

![Gaussian of a vector](images/gaussian-of-vector.png)

where
- $\mu$ is the mean vector
- $\Sigma$ is the covariance matrix of the Gaussian
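
As a sketch only, and assuming the definition above is the standard multivariate form $\mathcal{N}(\vec{x} \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (\vec{x}-\mu)^T \Sigma^{-1} (\vec{x}-\mu)\right)$, the density can be evaluated for the two-dimensional case ($d = 2$) as follows. The function name `gaussian_pdf_2d` is hypothetical, and the $2 \times 2$ inverse is written out by hand to keep the example free of dependencies.

```python
import math

def gaussian_pdf_2d(x, mu, cov):
    """Density of a 2-D Gaussian at point x, for mean mu and 2x2 covariance cov."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")

    # Closed-form inverse of a 2x2 matrix.
    inv = [[s22 / det, -s12 / det],
           [-s21 / det, s11 / det]]

    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu).
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))

    norm = 1.0 / (2.0 * math.pi * math.sqrt(det))   # (2*pi)^(d/2) with d = 2
    return norm * math.exp(-0.5 * quad)

# Illustrative check: identity covariance, evaluated at the mean.
at_mean = gaussian_pdf_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
off_mean = gaussian_pdf_2d([1.0, 1.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(at_mean, off_mean)
```

With an identity covariance the density at the mean is $\frac{1}{2\pi}$, and it decreases as the point moves away from $\mu$.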

**Our problem is as follows:**

Given a dataset $X=\{x_1, x_2, \cdots, x_N\}$ drawn from an unknown distribution (assumed to be a mixture of Gaussians), estimate the parameters $\theta$ of the GMM that best fits the data.

To find the parameters $\theta$, we maximise the likelihood $p(X \mid \theta)$ of the data with respect to the model parameters.
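
To illustrate what is being maximised, the sketch below computes the log-likelihood of a small one-dimensional dataset under two candidate parameter settings. It assumes the data points are independent and that $\theta$ consists of mixing weights $\pi_k$, means $\mu_k$, and standard deviations $\sigma_k$, so that $\log p(X \mid \theta) = \sum_{n=1}^{N} \log \sum_{k} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \sigma_k)$. The data and both parameter settings are invented for the example, and the code only evaluates the likelihood; it does not perform the maximisation itself.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(data, weights, mus, sigmas):
    """Log-likelihood of independent points under a 1-D Gaussian mixture."""
    total = 0.0
    for x in data:
        # Mixture density at x: weighted sum over the components.
        density = sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))
        total += math.log(density)
    return total

# Invented toy data with two visible clusters, around -2 and around 3.
data = [-2.1, -1.9, -2.3, 3.2, 2.8, 3.1, 2.9]

# Two hypothetical parameter settings: one near the clusters, one far from them.
good = log_likelihood(data, weights=[0.4, 0.6], mus=[-2.0, 3.0], sigmas=[0.5, 0.5])
poor = log_likelihood(data, weights=[0.5, 0.5], mus=[0.0, 1.0], sigmas=[0.5, 0.5])
print(good, poor)
```

Parameters whose components sit on the clusters give a higher log-likelihood than parameters that miss them, so a fitting procedure searches for the $\theta$ that makes this value as large as possible.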