# Week 5: K-nearest Neighbors and Classification 


## Expectation-maximization: 

Two-component Gaussian mixture density, with mixing proportion $\pi$:

$g(y) = (1 - \pi)\phi_{\theta1}(y) + \pi \phi_{\theta2}(y)$

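Likelihood, given latent indicators $\Delta_{i} \in \{0, 1\}$ that mark which component generated $y_{i}$ ($\Delta_{i} = 1$ means component 2):

$L(\theta) = \prod_{i~st~\Delta_{i}~=~0} (1-\pi)\phi_{\theta1}(y_{i}) \prod_{i~st~\Delta_{i}~=~1} \pi \phi_{\theta2}(y_{i})$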

Take log of likelihood: 

$l(\theta) = \log L(\theta) = \sum_{i~st~\Delta_{i}=0} [\log(1-\pi) + \log\phi_{\theta1}(y_{i})] + \sum_{i~st~\Delta_{i}=1}[\log\pi + \log\phi_{\theta2}(y_{i})]$

Writing both sums over all $i$, with $\Delta_{i}$ selecting the component:

$= \sum_{i=1}^{N} [(1-\Delta_{i})\log(1-\pi) + (1-\Delta_{i})\log\phi_{\theta1}(y_{i})] + \sum_{i=1}^{N} [\Delta_{i}\log\pi + \Delta_{i}\log\phi_{\theta2}(y_{i})]$

$l(\theta) = \sum_{i=1}^{N} [(1-\Delta_{i})\log\phi_{\theta1}(y_{i}) + \Delta_{i}\log\phi_{\theta2}(y_{i})] + \sum_{i=1}^{N} [(1-\Delta_{i})\log(1-\pi) + \Delta_{i}\log\pi]$
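The complete-data log-likelihood can be evaluated directly when the indicators are known. The following is a minimal sketch, assuming Gaussian components with $\theta = (\mu, \sigma)$; the function names and the toy data are hypothetical and chosen only for illustration.

```python
import math

def normal_logpdf(y, mu, sigma):
    """log phi_theta(y) for a Gaussian with theta = (mu, sigma)."""
    return -((y - mu) ** 2) / (2.0 * sigma ** 2) - math.log(math.sqrt(2.0 * math.pi)) - math.log(sigma)

def complete_log_likelihood(ys, deltas, pi, theta1, theta2):
    """l(theta) when every indicator Delta_i (0 or 1) is observed."""
    total = 0.0
    for y, d in zip(ys, deltas):
        # Component terms: Delta_i selects which density contributes.
        total += (1 - d) * normal_logpdf(y, *theta1) + d * normal_logpdf(y, *theta2)
        # Mixing-proportion terms.
        total += (1 - d) * math.log(1 - pi) + d * math.log(pi)
    return total

# Hypothetical toy data: two points from component 1, one from component 2.
ys = [0.1, -0.3, 4.2]
deltas = [0, 0, 1]
value = complete_log_likelihood(ys, deltas, pi=1 / 3, theta1=(0.0, 1.0), theta2=(4.0, 1.0))
print(value)
```

In practice the $\Delta_{i}$ are not observed, which is why this quantity cannot be maximized directly.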

## Expectation step: 

$\hat \gamma_{i} = \frac{\hat \pi \phi_{\hat \theta2}(y_{i})}{(1-\hat \pi)\phi_{\hat \theta1}(y_{i}) + \hat \pi \phi_{\hat \theta2}(y_{i})}$ 

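where $\hat \gamma_{i}$ is the responsibility of component 2 for observation $y_{i}$, i.e. the estimated probability that $\Delta_{i} = 1$ given $y_{i}$ and the current parameter estimates. Since the $\Delta_{i}$ are unobserved, we replace each $\Delta_{i}$ in $l(\theta)$ with $\hat \gamma_{i}$. With the $\hat \gamma_{i}$ held fixed, maximizing $l(\theta)$ is a weighted maximum-likelihood problem, which we know how to solve.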
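A minimal sketch of the responsibility calculation, assuming Gaussian components parameterized as (mean, standard deviation). The names `normal_pdf` and `e_step` and the example values are hypothetical.

```python
import math

def normal_pdf(y, mu, sigma):
    """Gaussian density phi_theta(y) with theta = (mu, sigma)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def e_step(ys, pi, theta1, theta2):
    """Return gamma_i, the responsibility of component 2 for each y_i."""
    gammas = []
    for y in ys:
        w1 = (1 - pi) * normal_pdf(y, *theta1)  # weight of component 1
        w2 = pi * normal_pdf(y, *theta2)        # weight of component 2
        gammas.append(w2 / (w1 + w2))
    return gammas

# Hypothetical example: equal-weight components centred at 0 and 4.
gammas = e_step([0.0, 2.0, 4.0], pi=0.5, theta1=(0.0, 1.0), theta2=(4.0, 1.0))
print(gammas)
```

The midpoint $y = 2$ is equally likely under both components here, so its responsibility is one half, while points near either mean are assigned almost entirely to that component.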

## Maximization step: 

The updates are weighted means and variances, with the responsibilities as weights:

$\hat \mu_{1} = \frac{\sum_{i=1}^{N} (1-\hat\gamma_{i})y_{i}}{\sum_{i=1}^{N} (1-\hat \gamma_{i})} $ 

$\hat \sigma_{1}^{2} = \frac{\sum_{i=1}^{N} (1-\hat\gamma_{i})(y_{i} - \hat \mu_{1})^{2}}{\sum_{i=1}^{N}...} $

$\hat \mu_{2} = \frac{\sum_{i=1}^{N} \hat \gamma_{i} y_{i}}{\sum_{i=1}^{N} \hat \gamma_{i}} $ 

$\hat \sigma_{2}^{2} = \frac{\sum_{i=1}^{N} \hat \gamma_{i} (y_{i} - \hat \mu_{2})^{2}}{\sum_{i=1}^{N} \hat \gamma_{i}}$

$\hat \pi = \frac{\sum_{i=1}^{N} \hat \gamma_{i}}{N} $ 
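These updates can be sketched as a single function. The code assumes each variance is normalized by the sum of its own weights (the standard weighted-variance form); `m_step` and the toy inputs are hypothetical names and values for illustration.

```python
def m_step(ys, gammas):
    """Weighted means, variances and mixing proportion from responsibilities."""
    n = len(ys)
    s2 = sum(gammas)   # total weight of component 2
    s1 = n - s2        # total weight of component 1, i.e. sum of (1 - gamma_i)
    mu1 = sum((1 - g) * y for g, y in zip(gammas, ys)) / s1
    mu2 = sum(g * y for g, y in zip(gammas, ys)) / s2
    # Assumption: both variances are normalized by their own total weight.
    var1 = sum((1 - g) * (y - mu1) ** 2 for g, y in zip(gammas, ys)) / s1
    var2 = sum(g * (y - mu2) ** 2 for g, y in zip(gammas, ys)) / s2
    pi = s2 / n
    return (mu1, var1), (mu2, var2), pi

# Hypothetical check with hard (0/1) responsibilities, where the updates
# reduce to ordinary per-group means and variances.
theta1, theta2, pi = m_step([1.0, 3.0, 10.0, 14.0], [0.0, 0.0, 1.0, 1.0])
print(theta1, theta2, pi)
```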


These updates follow from the Gaussian densities and their logs. Note that the $\pi$ inside $\sqrt{2\pi}$ is the constant 3.14159..., not the mixing proportion.

$\phi_{\theta1}(y) = \frac{1}{\sqrt{2\pi}\sigma_{1}}e^{-\frac{(y-\mu_{1})^{2}}{2\sigma_{1}^{2}}}$

$\phi_{\theta2}(y) = \frac{1}{\sqrt{2\pi}\sigma_{2}}e^{-\frac{(y-\mu_{2})^{2}}{2\sigma_{2}^{2}}}$

$\log\phi_{\theta1}(y_{i}) = \frac{-(y_{i} - \mu_{1})^{2}}{2\sigma_{1}^{2}} - \log\sqrt{2\pi} - \log\sigma_{1}$

$\log\phi_{\theta2}(y_{i}) = \frac{-(y_{i} - \mu_{2})^{2}}{2\sigma_{2}^{2}} - \log\sqrt{2\pi} - \log\sigma_{2}$

$\frac{\partial l}{\partial \mu_{1}} = 0$ --> $\mu_{1} = \frac{\sum_{i=1}^{N} (1-\gamma_{i})y_{i}}{\sum_{i=1}^{N}(1-\gamma_{i})}$ 

Similarly, $\frac{\partial l}{\partial \mu_{2}} = 0$ --> $\mu_{2} = \frac{\sum_{i=1}^{N} \gamma_{i}y_{i}}{\sum_{i=1}^{N}\gamma_{i}}$

$\frac{\partial l}{\partial \pi}  = 0$ --> $\pi = \frac{\sum_{i=1}^{N} \gamma_{i}}{N} $ 

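$\frac{\partial l}{\partial \sigma_{1}} = 0$  --> $\sigma_{1}^{2} = \frac{\sum_{i=1}^{N} (1-\gamma_{i})(y_{i} - \mu_{1})^{2}}{\sum_{i=1}^{N}(1-\gamma_{i})}$

$\frac{\partial l}{\partial \sigma_{2}} = 0$  --> $\sigma_{2}^{2} = \frac{\sum_{i=1}^{N} \gamma_{i}(y_{i} - \mu_{2})^{2}}{\sum_{i=1}^{N}\gamma_{i}}$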
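To make two of these conditions concrete, here is a sketch of the algebra for $\mu_{1}$ and $\pi$, assuming the $\Delta_{i}$ in $l(\theta)$ have been replaced by fixed responsibilities $\gamma_{i}$.

Only the $(1-\gamma_{i})\log\phi_{\theta1}(y_{i})$ terms depend on $\mu_{1}$:

$\frac{\partial l}{\partial \mu_{1}} = \sum_{i=1}^{N} (1-\gamma_{i}) \frac{(y_{i} - \mu_{1})}{\sigma_{1}^{2}} = 0$

$\sum_{i=1}^{N} (1-\gamma_{i}) y_{i} = \mu_{1} \sum_{i=1}^{N} (1-\gamma_{i})$

Only the mixing-proportion terms depend on $\pi$:

$\frac{\partial l}{\partial \pi} = \sum_{i=1}^{N} \left[ \frac{\gamma_{i}}{\pi} - \frac{1-\gamma_{i}}{1-\pi} \right] = 0$

$(1-\pi) \sum_{i=1}^{N} \gamma_{i} = \pi \sum_{i=1}^{N} (1-\gamma_{i})$, so $\sum_{i=1}^{N} \gamma_{i} = \pi N$

The $\sigma$ conditions follow the same pattern, with the extra $-\log\sigma$ term contributing $-1/\sigma$ to the derivative.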


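Start with an initial guess for $\theta_{1}$, $\theta_{2}$, and $\pi$. Compute the $\gamma_{i}$ from the current parameters (expectation step), update the parameters from the $\gamma_{i}$ (maximization step), and iterate until the fractional change in the parameters is small.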
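The full iteration for two components can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the initialization (smallest and largest observation as means, pooled variance, $\pi = 0.5$), the tolerance, the iteration cap, and the synthetic data are all choices made here for the example.

```python
import math
import random

def normal_pdf(y, mu, var):
    """Gaussian density with mean mu and variance var."""
    return math.exp(-(y - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_gaussians(ys, tol=1e-8, max_iter=500):
    n = len(ys)
    mean = sum(ys) / n
    pooled = sum((y - mean) ** 2 for y in ys) / n
    # Assumed initial guess: extreme points as means, pooled variance, pi = 0.5.
    params = [min(ys), pooled, max(ys), pooled, 0.5]
    for _ in range(max_iter):
        mu1, var1, mu2, var2, pi = params
        # Expectation step: responsibilities of component 2.
        gammas = []
        for y in ys:
            w1 = (1 - pi) * normal_pdf(y, mu1, var1)
            w2 = pi * normal_pdf(y, mu2, var2)
            gammas.append(w2 / (w1 + w2))
        # Maximization step: weighted means, variances, mixing proportion.
        s2 = sum(gammas)
        s1 = n - s2
        new_mu1 = sum((1 - g) * y for g, y in zip(gammas, ys)) / s1
        new_mu2 = sum(g * y for g, y in zip(gammas, ys)) / s2
        new_var1 = sum((1 - g) * (y - new_mu1) ** 2 for g, y in zip(gammas, ys)) / s1
        new_var2 = sum(g * (y - new_mu2) ** 2 for g, y in zip(gammas, ys)) / s2
        new_params = [new_mu1, new_var1, new_mu2, new_var2, s2 / n]
        # Largest fractional change across all parameters.
        change = max(abs(a - b) / max(abs(b), 1e-12) for a, b in zip(new_params, params))
        params = new_params
        if change < tol:
            break
    return params

# Synthetic data (an assumption for the demo): 200 draws each from N(2, 1) and N(7, 1).
random.seed(0)
ys = [random.gauss(2.0, 1.0) for _ in range(200)] + [random.gauss(7.0, 1.0) for _ in range(200)]
mu1, var1, mu2, var2, pi = em_two_gaussians(ys)
print(mu1, var1, mu2, var2, pi)
```

EM only finds a local maximum, so a different initial guess can lead to a different answer; with well-separated components such as these, the simple initialization above is usually enough.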

## Expectation-Maximization Algorithm for M-component Gaussian Mixtures


1) Take initial guess for $\hat \mu_{k}$, $\hat \sigma_{k}^{2}$, $\hat f_{k}$, where $\sum_{k=1}^{M} \hat f_{k} = 1$, and $g(y) = \sum_{k=1}^{M} f_{k} \phi_{\hat \theta_{k}}(y)$ 

2) Expectation step: calculate responsibilities $\hat \gamma_{i}^{(k)} = \frac{\hat f_{k} \phi_{\hat \theta_{k}}(y_{i})}{\sum_{j=1}^{M} \hat f_{j} \phi_{\hat \theta_{j}}(y_{i})}$

3) Maximization step: $\hat \mu_{k} = \frac{\sum_{i=1}^{N} \hat \gamma_{i}^{(k)} y_{i}}{\sum_{i=1}^{N}  \hat \gamma_{i}^{(k)}}$, $\hat \sigma_{k}^{2} = \frac{\sum_{i=1}^{N} \hat \gamma_{i}^{(k)} (y_{i} - \hat \mu_{k})^{2}}{\sum_{i=1}^{N} \hat \gamma_{i}^{(k)}}$, and $\hat f_{k} = \frac{\sum_{i=1}^{N} \hat \gamma_{i}^{(k)}}{N}$

4) Iterate steps 2 and 3 until the fractional change in $\theta$ is less than $\epsilon$

The observed-data likelihood and log-likelihood for the M-component mixture are:

$L(\theta) = \prod_{i=1}^{N} (\sum_{k=1}^{M} f_{k} \phi_{\theta_{k}}(y_{i}))$

$l(\theta) = \log L(\theta) = \sum_{i=1}^{N} \log(\sum_{k=1}^{M} f_{k} \phi_{\theta_{k}}(y_{i}))$
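The M-component algorithm and its log-likelihood can be sketched together. Several choices here are assumptions made for the example rather than part of the notes: the quantile-based initial means, the pooled initial variance, the synthetic three-component data, and stopping on the fractional change in the log-likelihood (used as a convenient proxy for the fractional change in $\theta$).

```python
import math
import random

def normal_pdf(y, mu, var):
    """Gaussian density with mean mu and variance var."""
    return math.exp(-(y - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(ys, fs, mus, variances):
    """Observed-data log-likelihood: sum_i log(sum_k f_k phi_k(y_i))."""
    return sum(
        math.log(sum(f * normal_pdf(y, m, v) for f, m, v in zip(fs, mus, variances)))
        for y in ys
    )

def em_step(ys, fs, mus, variances):
    n, M = len(ys), len(fs)
    # Expectation step: gamma[i][k] is the responsibility of component k for y_i.
    gamma = []
    for y in ys:
        w = [f * normal_pdf(y, m, v) for f, m, v in zip(fs, mus, variances)]
        total = sum(w)
        gamma.append([wk / total for wk in w])
    # Maximization step: all sums run over the N observations.
    new_fs, new_mus, new_vars = [], [], []
    for k in range(M):
        nk = sum(g[k] for g in gamma)
        mu = sum(g[k] * y for g, y in zip(gamma, ys)) / nk
        var = sum(g[k] * (y - mu) ** 2 for g, y in zip(gamma, ys)) / nk
        new_fs.append(nk / n)
        new_mus.append(mu)
        new_vars.append(var)
    return new_fs, new_mus, new_vars

def fit_mixture(ys, M, eps=1e-8, max_iter=500):
    n = len(ys)
    ordered = sorted(ys)
    mean = sum(ys) / n
    pooled = sum((y - mean) ** 2 for y in ys) / n
    # Assumed initial guess: equal fractions, evenly spaced quantiles, pooled variance.
    fs = [1.0 / M] * M
    mus = [ordered[(2 * k + 1) * n // (2 * M)] for k in range(M)]
    variances = [pooled] * M
    history = [log_likelihood(ys, fs, mus, variances)]
    for _ in range(max_iter):
        fs, mus, variances = em_step(ys, fs, mus, variances)
        history.append(log_likelihood(ys, fs, mus, variances))
        # Stop when the fractional change in the log-likelihood is below eps.
        if abs(history[-1] - history[-2]) <= eps * abs(history[-2]):
            break
    return fs, mus, variances, history

# Synthetic data (an assumption for the demo): 150 draws around each of three means.
random.seed(1)
ys = [random.gauss(m, 1.0) for m in (0.0, 5.0, 10.0) for _ in range(150)]
fs, mus, variances, history = fit_mixture(ys, M=3)
print(fs, mus, variances)
```

Tracking `history` is a useful sanity check, since each EM iteration should leave the log-likelihood unchanged or higher.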