# Density Estimation with Gaussian Mixture Models

When we apply machine learning to data, we often aim to represent the data in some way. A straightforward way is to take the data points themselves as the representation, e.g., in a scatter plot. In density estimation, we represent the data compactly using a density from a parametric family, e.g., a Gaussian or Beta distribution. For example, we may look for the mean and variance of a dataset in order to represent the data compactly using a Gaussian distribution. We can then use the mean and variance of this Gaussian to represent the distribution underlying the data, i.e., we think of the dataset as a typical realization of this distribution if we were to sample from it.
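As a minimal sketch of this idea, the snippet below summarizes a small dataset by a mean and a variance and uses them as the parameters of a Gaussian density. The data values are made up for illustration, and using the population variance (dividing by $N$, the maximum likelihood estimate) is an assumption rather than something prescribed above.

```python
import math
import statistics

# Hypothetical one-dimensional dataset (illustrative values only)
data = [2.1, 1.9, 2.4, 2.0, 1.6]

# Compact representation: two numbers instead of the full dataset.
# pvariance divides by N (assumed here: the maximum likelihood estimate).
mean = statistics.fmean(data)
variance = statistics.pvariance(data, mu=mean)


def fitted_density(x):
    """Gaussian density N(x | mean, variance) with the estimated parameters."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(
        2.0 * math.pi * variance
    )


print(f"mean={mean:.3f}, variance={variance:.3f}")
print(f"density at the mean: {fitted_density(mean):.3f}")
```

The fitted density peaks at the estimated mean and assigns lower density to values far from it, which is all a single Gaussian can express.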

In practice, the Gaussian (like all other distributions we have encountered so far) has limited modeling capabilities. In the following, we will look at a more expressive family of distributions that we can use for density estimation: mixture models.

Mixture models can be used to describe a distribution $p(x)$ by a convex combination of $K$ simple (base) distributions:

\begin{equation}
p(x) = \sum_{k=1}^K \pi_k p_k(x)
\end{equation}
\begin{equation}
0 \leqslant \pi_k  \leqslant 1, \sum_{k=1}^K \pi_k = 1
\end{equation}

where the components $p_k$ are members of a family of basic distributions, e.g., Gaussians, Bernoullis, or Gammas, and the $\pi_k$ are mixture weights. Mixture models are more expressive than the corresponding base distributions because they allow for multimodal data representations, i.e., they can describe datasets with multiple "clusters". We will focus on Gaussian mixture models (GMMs), where the basic distributions are Gaussians. For a given dataset, we aim to maximize the likelihood of the model parameters to train the GMM.
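The convex combination above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `gaussian_pdf` and `mixture_pdf` and all parameter values are hypothetical, and the components are univariate Gaussians chosen for simplicity.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, components):
    """Evaluate p(x) = sum_k pi_k * p_k(x) for a list of component densities."""
    # Enforce the constraints on the mixture weights
    if any(w < 0.0 or w > 1.0 for w in weights):
        raise ValueError("mixture weights must lie in [0, 1]")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("mixture weights must sum to 1")
    return sum(w * p(x) for w, p in zip(weights, components))


# Illustrative (made-up) mixture with K = 3 components
weights = [0.5, 0.2, 0.3]
components = [
    lambda x: gaussian_pdf(x, -2.0, 0.5),
    lambda x: gaussian_pdf(x, 1.0, 1.0),
    lambda x: gaussian_pdf(x, 4.0, 0.6),
]

# Riemann sum on a wide grid from -10 to 15: the total mass should be close to 1
step = 0.01
grid = [-10.0 + i * step for i in range(2501)]
total_mass = sum(mixture_pdf(x, weights, components) for x in grid) * step
print(f"approximate integral of p(x): {total_mass:.4f}")
```

Because the weights are non-negative and sum to one, the mixture is again a valid density, and with well-separated components it has several modes, which a single Gaussian cannot represent.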

## Gaussian Mixture Model

A Gaussian mixture model is a density model where we combine a finite number $K$ of Gaussian distributions $\mathcal{N}(x|\mu_k, \Sigma_k)$ so that

\begin{equation}
p(x|\theta) = \sum_{k=1}^K \pi_k \mathcal{N}(x|\mu_k, \Sigma_k)
\end{equation}

\begin{equation}
0 \leqslant \pi_k \leqslant 1, \sum_{k=1}^K \pi_k = 1
\end{equation}

where $\theta := \{ \mu_k, \Sigma_k, \pi_k : k=1, \ldots, K \}$ is the collection of all parameters of the model. This convex combination of Gaussian distributions gives us significantly more flexibility for modeling complex densities than a simple Gaussian distribution. The figure below displays the weighted components and the mixture density, which is given as:

<img src="attachment:a3d59a6d-2982-4be5-81db-8bd929500e13.png" style="width: 400px">

<img src="attachment:1210f606-2f6e-47f9-ab9b-74d441a9251a.png" style="width: 400px">

_Gaussian mixture model. The Gaussian mixture distribution (black) is composed of a convex combination of Gaussian distributions and is more expressive than any individual component. Dashed lines represent the weighted Gaussian components._
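The GMM density $p(x|\theta)$ defined above can be evaluated numerically for multivariate inputs. The following is a rough sketch, assuming NumPy is available; the function names `gaussian_density` and `gmm_density` and the two-component parameters are hypothetical and not taken from the figure.

```python
import numpy as np


def gaussian_density(x, mean, cov):
    """Density of a multivariate Gaussian N(x | mean, cov) at a single point x."""
    d = mean.shape[0]
    diff = x - mean
    # Solve cov @ v = diff rather than forming the explicit inverse
    maha = diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    log_norm = -0.5 * (d * np.log(2.0 * np.pi) + logdet)
    return float(np.exp(log_norm - 0.5 * maha))


def gmm_density(x, weights, means, covs):
    """Return p(x | theta) and the weighted components pi_k * N(x | mu_k, Sigma_k)."""
    weighted = [
        pi_k * gaussian_density(x, mu_k, cov_k)
        for pi_k, mu_k, cov_k in zip(weights, means, covs)
    ]
    return sum(weighted), weighted


# Illustrative (made-up) parameters theta for K = 2 components in two dimensions
weights = [0.6, 0.4]
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]

density, weighted_components = gmm_density(np.array([0.0, 0.0]), weights, means, covs)
print(density, weighted_components)
```

Returning the weighted components alongside their sum mirrors the figure: each dashed curve corresponds to one term $\pi_k \mathcal{N}(x|\mu_k, \Sigma_k)$, and the mixture density is their sum.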