# Chapter 9: Mixture Models and EM

### 9.1 K-means clustering

We are interested in `clustering` data points in a multidimensional space. Consider a data set $\{x_1 \dots x_N\}$. We want to partition the data into $K$ clusters by minimizing the following objective function:

$$J=\sum_{n=1}^N\sum_{k=1}^K r_{nk}||x_n - \mu_k||^2$$

where $r_{nk} \in \{0, 1\}$ is an `indicator variable` that equals 1 if data point $x_n$ is assigned to cluster $k$ and 0 otherwise, and $\mu_k$ is the vector representing the center of cluster $k$. $J$ is therefore the `sum of squares` of the distances from each data point to its assigned vector $\mu_k$.
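To make the double sum concrete, here is a minimal sketch that evaluates $J$ directly. It assumes data points and means are plain Python lists of floats and that `r` is an $N \times K$ list of 0/1 indicators; the function name `objective_J` is hypothetical, chosen only for illustration.

```python
def objective_J(points, means, r):
    """Evaluate J = sum_n sum_k r_nk * ||x_n - mu_k||^2.

    points: list of N data points, each a list of floats
    means:  list of K cluster centers mu_k, each a list of floats
    r:      N x K list of 0/1 indicators (exactly one 1 per row)
    """
    total = 0.0
    for x, row in zip(points, r):
        for mu, r_nk in zip(means, row):
            # squared Euclidean distance ||x_n - mu_k||^2
            sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
            # only the assigned cluster (r_nk = 1) contributes
            total += r_nk * sq
    return total


points = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
means = [[0.5, 0.0], [10.0, 10.0]]
r = [[1, 0], [1, 0], [0, 1]]
print(objective_J(points, means, r))  # 0.5
```

Each point contributes only its distance to the one center it is assigned to, which is why a good assignment keeps $J$ small.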

We want to find the values of $r_{nk}$ and $\mu_k$ that minimize $J$. We do this through an iterative procedure:

First, assign some initial values to $\mu_k$; then repeat the following two steps until convergence:
<ol>
    <li>minimize $J$ with respect to $r_{nk}$, keeping $\mu_k$ fixed</li>
    <li>minimize $J$ with respect to $\mu_k$, keeping $r_{nk}$ fixed</li>
</ol>

These two steps correspond to the `expectation` (E) and `maximization` (M) steps of the EM algorithm.


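In step `1`, each term of $J$ depends on a single $x_n$, so we can choose $r_{nk}$ independently for each point: set $r_{nk} = 1$ for the cluster whose center $\mu_k$ is closest to $x_n$, and $r_{nk} = 0$ for all other clusters: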

$$r_{nk} = \begin{cases}
    1 & \text{if}~~k = \arg\min_j ||x_n - \mu_j||^2\\
    0 & \text{otherwise}
\end{cases}$$
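A minimal sketch of step `1`, again assuming points and means are lists of floats; `assign` is a hypothetical name. Ties in the $\arg\min$ are broken in favor of the lowest index $k$, because Python's built-in `min` returns the first minimal element.

```python
def assign(points, means):
    """Step 1: r_nk = 1 for the nearest mean mu_k, 0 for every other k."""
    K = len(means)
    r = []
    for x in points:
        # squared distance ||x_n - mu_j||^2 to every mean
        dists = [sum((xi - mi) ** 2 for xi, mi in zip(x, mu)) for mu in means]
        # argmin_j; min() keeps the first minimum, so ties go to the lowest j
        best = min(range(K), key=lambda j: dists[j])
        r.append([1 if k == best else 0 for k in range(K)])
    return r


points = [[0.0, 0.0], [1.0, 0.0], [9.0, 9.0], [10.0, 10.0]]
means = [[0.0, 0.0], [10.0, 10.0]]
print(assign(points, means))  # [[1, 0], [1, 0], [0, 1], [0, 1]]
```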

In step `2`, the objective function is a quadratic function of $\mu_k$, so it can be minimized by setting its derivative with respect to $\mu_k$ to zero:
$$2\sum_{n=1}^N r_{nk}(x_n - \mu_k) = 0$$
Rearranging gives $\sum_n r_{nk}x_n = \mu_k \sum_n r_{nk}$, and therefore

$$\mu_k = \frac{\sum_n r_{nk}x_n}{\sum_n r_{nk}}$$
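A minimal sketch of step `2` under the same list-of-floats assumption; `update_means` is a hypothetical name. The formula is undefined when $\sum_n r_{nk} = 0$ (an empty cluster); keeping the previous mean in that case is our own assumption, not something the derivation specifies.

```python
def update_means(points, r, old_means):
    """Step 2: mu_k = sum_n r_nk x_n / sum_n r_nk for every cluster k."""
    dim = len(points[0])
    new_means = []
    for k in range(len(old_means)):
        count = sum(row[k] for row in r)  # denominator: sum_n r_nk
        if count == 0:
            # assumption: an empty cluster keeps its previous mean
            new_means.append(list(old_means[k]))
            continue
        total = [0.0] * dim  # numerator: sum_n r_nk x_n
        for x, row in zip(points, r):
            if row[k]:
                for d in range(dim):
                    total[d] += x[d]
        new_means.append([t / count for t in total])
    return new_means


points = [[0.0, 0.0], [1.0, 0.0], [9.0, 9.0], [10.0, 10.0]]
r = [[1, 0], [1, 0], [0, 1], [0, 1]]
print(update_means(points, r, [[0.0, 0.0], [10.0, 10.0]]))  # [[0.5, 0.0], [9.5, 9.5]]
```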

The denominator counts the points assigned to cluster $k$, so $\mu_k$ is simply the `mean` of all data points $x_n$ assigned to cluster $k$.
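Putting the two steps together gives the full iterative procedure. The sketch below is self-contained and again assumes lists of floats; `kmeans` and `sq_dist` are hypothetical names, "convergence" is taken to mean that no assignment changes (with a `max_iter` cap as a safeguard), and empty clusters keep their old mean by assumption. It also records $J$ after each iteration so you can check that it never increases, since each step can only lower $J$ or leave it unchanged.

```python
def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, init_means, max_iter=100):
    """Alternate steps 1 and 2 until the assignments r stop changing.

    Returns the final means, the N x K indicator matrix r, and the
    value of J recorded after each full iteration.
    """
    means = [list(m) for m in init_means]
    K = len(means)
    r = None
    history = []
    for _ in range(max_iter):
        # Step 1 (E step): r_nk = 1 for the nearest mean, ties go to the lowest k
        new_r = []
        for x in points:
            best = min(range(K), key=lambda j: sq_dist(x, means[j]))
            new_r.append([1 if k == best else 0 for k in range(K)])
        if new_r == r:
            break  # no assignment changed, so step 2 would not move any mean
        r = new_r
        # Step 2 (M step): mu_k = sum_n r_nk x_n / sum_n r_nk
        for k in range(K):
            members = [x for x, row in zip(points, r) if row[k]]
            if members:  # assumption: an empty cluster keeps its old mean
                means[k] = [sum(coords) / len(members) for coords in zip(*members)]
        # record J after both steps of this iteration
        history.append(sum(r[n][k] * sq_dist(points[n], means[k])
                           for n in range(len(points)) for k in range(K)))
    return means, r, history


points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [9.0, 9.0], [9.0, 10.0], [10.0, 9.0]]
means, r, history = kmeans(points, init_means=[[0.0, 0.0], [1.0, 0.0]])
print(r)        # [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
print(history)  # J after each iteration; it never increases
```

With the deliberately poor initialization above, the first iteration puts $(1, 0)$ together with the far-away group; the second iteration reassigns it, and the third finds no changes and stops.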