<br><br>
<font size = '6'><b>Gaussian Mixture Model</b></font>

- <a href="./reference_files/13.mixture-models.pdf" target="_blank">Slides</a> by David Rosenberg

<table style="border-style: hidden; border-collapse: collapse;" width = "90%"> 
    <tr style="border-style: hidden; border-collapse: collapse;">
        <td width = 60% style="border-style: hidden; border-collapse: collapse;">
             
        </td>
        <td width = 30%>
        Collected by Prof. Seungchul Lee<br>
        iSystems<br>http://isystems.unist.ac.kr/<br>
        UNIST
        </td>
    </tr>
</table>

Table of Contents
<div id="toc"></div>



# Probabilistic Model for Clustering

Let's consider a generative model for the data

Suppose
- there are $k$ clusters
- we have a probability density for each cluster

Generate a point as follows

$\;\;$1) Choose a random cluster $z \in \{1,2,\cdots,k\}$

$$
\begin{align*}
&Z \sim \left( \pi_1, \cdots, \pi_k\right) \\
&p(Z = i) = \pi_i \;, \quad
\sum_{i=1}^{k} \pi_i =1
\end{align*}
$$


$\;\;$2) Choose a point from the distribution for cluster $Z$

$$\left(X \mid Z = z \right) \sim p(x \mid z)$$

# Gaussian Mixture Model

For example, $k = 3$

$\;\;$1) Select $Z \in \{1,2,3\}$ with $Z \sim \left(\frac{1}{3},\frac{1}{3},\frac{1}{3} \right)$

$\;\;$2) Sample from $\left(X \mid Z = z \right) \sim N(x \mid \mu_z, \Sigma_z)$


<img src = "./image_files/GMM.png" width=500>

## Example: Generating Data from Two Gaussians

$$
\begin{align*}
\mu_1 &= \begin{bmatrix} 3\\3 \end{bmatrix}, & \Sigma_1 &= \begin{bmatrix}1 & 0 \\ 0 & 2 \end{bmatrix} \\
\mu_2 &= \begin{bmatrix} 1\\-3 \end{bmatrix}, & \Sigma_2 &= \begin{bmatrix}2 & 0 \\ 0 & 1 \end{bmatrix}
\end{align*}
$$

$$\begin{bmatrix} \pi_1 &\pi_2 \end{bmatrix} = \begin{bmatrix} 0.7 & 0.3\end{bmatrix}$$

<img src = "./image_files/twoGMM.jpg" width=500>
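The two-step generative process can be sketched in NumPy using the parameters of this example. This is a minimal illustration, not the code used to produce the figure: the function name `sample_gmm`, the sample size, and the seed are assumptions, and cluster indices are 0-based in the code while the text counts from 1.

```python
import numpy as np

def sample_gmm(m, pis, mus, Sigmas, seed=0):
    """Draw m points from a Gaussian mixture; returns (X, z)."""
    rng = np.random.default_rng(seed)
    k = len(pis)
    d = len(mus[0])
    # Step 1: choose a cluster for every point, Z ~ (pi_1, ..., pi_k)
    z = rng.choice(k, size=m, p=pis)
    # Step 2: draw each point from the Gaussian of its chosen cluster
    X = np.empty((m, d))
    for j in range(k):
        idx = np.flatnonzero(z == j)
        X[idx] = rng.multivariate_normal(mus[j], Sigmas[j], size=len(idx))
    return X, z

# Parameters of the two-Gaussian example above
pis = np.array([0.7, 0.3])
mus = np.array([[3.0, 3.0], [1.0, -3.0]])
Sigmas = np.array([[[1.0, 0.0], [0.0, 2.0]],
                   [[2.0, 0.0], [0.0, 1.0]]])

X, z = sample_gmm(1000, pis, mus, Sigmas)
print(X.shape)                                # (1000, 2)
print(np.bincount(z, minlength=2) / len(z))   # empirical cluster frequencies
```

The empirical cluster frequencies should land close to $(0.7, 0.3)$, and a scatter plot of `X` should show two blobs like those in the figure.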

## Latent Variable Model

- Back in reality, we observe $X$, not $(X,Z)$

- Cluster assignment $Z$ is called a hidden variable

- A latent variable model is a probability model for which certain variables are never observed.

## Model-based Clustering

- We observe $X = x$

- The conditional distribution is a soft assignment to clusters

$$p(z \mid X = x) = \frac{p(x,z)}{p(x)}$$

- A hard assignment picks the most probable cluster:

$$z^* = \arg \max_{z \in \{1,\cdots,k\}} \mathbb{P}(Z = z \mid X = x)$$

## Estimating/Learning the Gaussian Mixture Model (GMM)

- What does it mean to "have" or "know" the GMM? It means we know the following parameters:

$$
\begin{align*}
\text{Cluster probabilities} && \pi &= (\pi_1,\cdots,\pi_k) \\
\text{Cluster means} && \mu &= (\mu_1,\cdots,\mu_k)\\
\text{Cluster covariance matrices} && \Sigma &= (\Sigma_1,\cdots,\Sigma_k)
\end{align*}
$$

- We have a probability model: let's find the MLE
    - Suppose we have data $D = \{x_1,\cdots,x_m\}$
    - We need the model likelihood for $D$
    
- Since we only observe $X$, we need the marginal distribution:

$$p(x) = \sum_{z=1}^{k}p(x,z)=\sum_{z=1}^{k}\pi_z \,N(x \mid \mu_z, \Sigma_z)$$

- The model likelihood for $D = \{x_1,\cdots,x_m\}$ is 

$$L(\pi,\mu,\Sigma) = \prod_{i=1}^{m} p(x_i) = \prod_{i=1}^{m} \sum_{z=1}^{k}\pi_z \,N(x_i \mid \mu_z, \Sigma_z)$$

- As usual, we will take our objective function to be the log of this:

$$J(\pi,\mu,\Sigma) = \sum_{i=1}^{m} \log \left\{ \sum_{z=1}^{k}\pi_z \,N(x_i \mid \mu_z, \Sigma_z) \right\}$$

- The log-likelihood for a single Gaussian:

$$\sum_{i=1}^{m} \log  \,N(x_i \mid \mu, \Sigma) = -\frac{md}{2} \log(2\pi) - \frac{m}{2} \log \lvert \Sigma\rvert -\frac{1}{2} \sum_{i=1}^{m} (x_i-\mu)^{T}\Sigma^{-1}(x_i-\mu)$$

- For a single Gaussian, the log cancels the $\exp$ in the Gaussian density $\implies$ things simplify a lot.

- For the GMM, the sum inside the log prevents this cancellation $\implies$ there is no closed-form expression for the MLE
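Although the objective $J$ has no closed-form maximizer, it is straightforward to evaluate numerically. The sketch below is one way to do so, assuming NumPy; the helper names `log_gaussian` and `gmm_log_likelihood` are illustrative, and the log-sum-exp shift is a numerical-stability choice that is not part of the derivation above.

```python
import numpy as np

def log_gaussian(X, mu, Sigma):
    """log N(x_i | mu, Sigma) for every row x_i of X."""
    d = X.shape[1]
    diff = X - mu
    # Mahalanobis terms (x_i - mu)^T Sigma^{-1} (x_i - mu), via a linear solve
    maha = np.sum(diff * np.linalg.solve(Sigma, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def gmm_log_likelihood(X, pis, mus, Sigmas):
    """J(pi, mu, Sigma) = sum_i log sum_z pi_z N(x_i | mu_z, Sigma_z)."""
    k = len(pis)
    # log_terms[i, z] = log pi_z + log N(x_i | mu_z, Sigma_z)
    log_terms = np.stack(
        [np.log(pis[z]) + log_gaussian(X, mus[z], Sigmas[z]) for z in range(k)],
        axis=1,
    )
    # log-sum-exp over z, shifted by the row maximum to avoid underflow
    row_max = log_terms.max(axis=1, keepdims=True)
    log_px = row_max[:, 0] + np.log(np.exp(log_terms - row_max).sum(axis=1))
    return float(log_px.sum())

# A single point at the origin under one standard 2-D Gaussian
X0 = np.array([[0.0, 0.0]])
J_single = gmm_log_likelihood(X0, np.array([1.0]), np.zeros((1, 2)), np.eye(2)[None])
print(J_single, -np.log(2 * np.pi))   # the two values should agree
```

With $k = 1$ the inner sum has a single term, so the function reduces to the single-Gaussian log-likelihood, which is what the final line checks.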

## MLE for Gaussian Model

- Let's start by considering MLE for the _single_ Gaussian model

- For data $D = \{x_1,\cdots,x_m\}$, the log likelihood is given by

$$\sum_{i=1}^{m} \log  \,N(x_i \mid \mu, \Sigma) = -\frac{md}{2} \log(2\pi) - \frac{m}{2} \log \lvert \Sigma\rvert -\frac{1}{2} \sum_{i=1}^{m} (x_i-\mu)^{T}\Sigma^{-1}(x_i-\mu)$$

- With some calculus, we find that the MLE parameters are

$$
\begin{align*}
\mu_{\text{MLE}} &= \frac{1}{m} \sum_{i=1}^{m} x_i\\ 
\Sigma_{\text{MLE}} &= \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_{\text{MLE}})(x_i - \mu_{\text{MLE}})^T\\
\end{align*}
$$

- For the GMM, if we knew the cluster assignment $z_i$ for each $x_i$, we could compute the MLEs for each cluster separately
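The closed-form estimates are easy to compute directly. The following sketch assumes NumPy; `gaussian_mle` and `per_cluster_mle` are hypothetical helper names, and the labels `z` are assumed to be known and 0-based, which is exactly the information that is missing in practice.

```python
import numpy as np

def gaussian_mle(X):
    """MLE of mean and covariance for a single Gaussian."""
    m = X.shape[0]
    mu = X.mean(axis=0)
    diff = X - mu
    Sigma = diff.T @ diff / m   # divides by m (MLE), not by m - 1
    return mu, Sigma

def per_cluster_mle(X, z, k):
    """If the labels z were observed: fit each cluster on its own points."""
    params = []
    for j in range(k):
        Xj = X[z == j]
        mu_j, Sigma_j = gaussian_mle(Xj)
        pi_j = len(Xj) / len(X)   # fraction of points in cluster j
        params.append((pi_j, mu_j, Sigma_j))
    return params

# Four corner points of a square: mean (1, 1), identity covariance
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
mu_hat, Sigma_hat = gaussian_mle(X)
print(mu_hat)
print(Sigma_hat)
```

The covariance estimate agrees with `np.cov(X.T, bias=True)`, which also normalizes by $m$.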

## Cluster Responsibilities 

- Denote the probability that observed value $x_i$ comes from cluster $j$ by

$$\gamma_{i}^{j} = \mathbb{P}(Z = j \mid X = x_i)$$

- This is the responsibility that cluster $j$ takes for observation $x_i$

- Computationally

$$
\begin{align*}
\gamma_{i}^{j} &= \mathbb{P}(Z = j \mid X = x_i)\\ \\
& = \frac{p(Z=j, X=x_i)}{p(X=x_i)}\\ \\
& = \frac{\pi_j N(x_i \mid \mu_j, \Sigma_j)}{\sum_{c=1}^{k} \pi_c N(x_i \mid \mu_c, \Sigma_c)}
\end{align*}
$$

- The vector $\left(\gamma_{i}^{1},\cdots,\gamma_{i}^{k}\right)$ is exactly the soft assignment for $x_i$
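As an illustration, the responsibilities can be computed for a few points under the two-Gaussian example parameters. This is a sketch assuming NumPy; `gaussian_pdf` and `responsibilities` are illustrative names, the query points are chosen arbitrarily, and cluster indices are 0-based in the code.

```python
import numpy as np

def gaussian_pdf(X, mu, Sigma):
    """N(x_i | mu, Sigma) for every row x_i of X."""
    d = X.shape[1]
    diff = X - mu
    maha = np.sum(diff * np.linalg.solve(Sigma, diff.T).T, axis=1)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * maha) / norm

def responsibilities(X, pis, mus, Sigmas):
    """gamma[i, j] = P(Z = j | X = x_i)."""
    k = len(pis)
    # Numerator pi_j N(x_i | mu_j, Sigma_j) for every point and cluster
    weighted = np.stack(
        [pis[j] * gaussian_pdf(X, mus[j], Sigmas[j]) for j in range(k)], axis=1
    )
    # Denominator p(x_i) is the sum of the numerators over clusters
    return weighted / weighted.sum(axis=1, keepdims=True)

pis = np.array([0.7, 0.3])
mus = np.array([[3.0, 3.0], [1.0, -3.0]])
Sigmas = np.array([[[1.0, 0.0], [0.0, 2.0]],
                   [[2.0, 0.0], [0.0, 1.0]]])

points = np.array([[3.0, 3.0], [1.0, -3.0], [2.0, 0.0]])
gamma = responsibilities(points, pis, mus, Sigmas)
hard = gamma.argmax(axis=1)   # hard assignment: most probable cluster
print(gamma)
print(hard)
```

Each row of `gamma` sums to 1. Points sitting on a cluster mean are assigned almost entirely to that cluster, while a point between the clusters gets a genuinely soft split.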

## EM Algorithm for GMM

1) Initialize parameters $\mu, \Sigma, \pi$

2) "E step": Evaluate the responsibilities using current parameters

$$\gamma_{i}^{j} = \frac{\pi_j N(x_i \mid \mu_j, \Sigma_j)}{\sum_{c=1}^{k} \pi_c N(x_i \mid \mu_c, \Sigma_c)}$$

$\;\;\,$for $i=1,\cdots,m$ and $j=1,\cdots,k$

3) "M step": Re-estimate the parameters using the responsibilities:

$$
\begin{align*}
m_c &= \sum_{i=1}^{m} \gamma_{i}^{c}\\
\mu_{c}^{\text{new}} &= \frac{1}{m_c} \sum_{i=1}^{m} \gamma_{i}^{c}x_i\\ 
\Sigma_{c}^{\text{new}} &= \frac{1}{m_c} \sum_{i=1}^{m} \gamma_{i}^{c}\,(x_i - \mu_{c}^{\text{new}})(x_i - \mu_{c}^{\text{new}})^T\\
\pi_{c}^{\text{new}} &= \frac{m_c}{m} = \frac{\sum_{i=1}^{m}\gamma_{i}^{c}}{m}
\end{align*}
$$

$\;\;\,$for $c=1,\cdots,k$, where $m_c$ is the effective number of points assigned to cluster $c$

4) Repeat from Step 2 until the log-likelihood converges
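The four steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: the initialization scheme (random data points as means, the overall data covariance for every cluster, uniform weights), the small ridge added to each covariance, the tolerance, and the names `gaussian_pdf` and `em_gmm` are all choices made here, not part of the algorithm description above.

```python
import numpy as np

def gaussian_pdf(X, mu, Sigma):
    """N(x_i | mu, Sigma) for every row x_i of X."""
    d = X.shape[1]
    diff = X - mu
    maha = np.sum(diff * np.linalg.solve(Sigma, diff.T).T, axis=1)
    return np.exp(-0.5 * maha) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

def em_gmm(X, k, n_iter=200, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    m, d = X.shape
    # Step 1: initialize parameters (assumed scheme, see lead-in)
    mus = X[rng.choice(m, size=k, replace=False)].copy()
    Sigmas = np.array([np.cov(X.T, bias=True) for _ in range(k)])
    pis = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # Step 2 (E step): responsibilities under the current parameters
        weighted = np.stack(
            [pis[c] * gaussian_pdf(X, mus[c], Sigmas[c]) for c in range(k)], axis=1
        )
        px = weighted.sum(axis=1)          # marginal p(x_i)
        gamma = weighted / px[:, None]
        # Step 4: stop once the log-likelihood no longer improves
        ll = float(np.sum(np.log(px)))
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # Step 3 (M step): responsibility-weighted MLE updates
        m_c = gamma.sum(axis=0)
        mus = (gamma.T @ X) / m_c[:, None]
        for c in range(k):
            diff = X - mus[c]
            Sigmas[c] = (gamma[:, c, None] * diff).T @ diff / m_c[c]
            Sigmas[c] += 1e-6 * np.eye(d)  # assumed ridge against singular covariances
        pis = m_c / m
    return pis, mus, Sigmas, ll

# Synthetic data from the two-Gaussian example (weights 0.7 and 0.3)
rng = np.random.default_rng(1)
m = 1000
from_first = rng.random(m) < 0.7
X = np.where(from_first[:, None],
             rng.multivariate_normal([3, 3], [[1, 0], [0, 2]], size=m),
             rng.multivariate_normal([1, -3], [[2, 0], [0, 1]], size=m))

pis_hat, mus_hat, Sigmas_hat, ll = em_gmm(X, k=2)
print(pis_hat)
print(mus_hat)
```

The cluster order in the output is arbitrary, so the estimates should be compared with the true parameters up to a permutation of the labels. EM only finds a local optimum, so in practice it is common to run it from several initializations and keep the run with the highest log-likelihood.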


In [1]:
%%html
<center><iframe src="https://www.youtube.com/embed/PejHsxneli8"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>

In [1]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')
