**GMMs fit a number of normal distributions to our data set by estimating their parameters using what's called expectation maximization. This is a two-step iterative algorithm (in some ways similar to K-means):**

* **Expectation:** Starting from a number of distributions with reasonable initial parameters (mean and variance) based on the given data, "ask" every data point how likely it is to fall within each. As with K-means, you need to specify the number of clusters (or in this case, "components") a priori.

* **Maximization:** Update the distribution parameters to maximize the likelihood of the data given those membership probabilities, then repeat both steps until the parameters stop changing.
-------------------------------
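
The two EM steps described above can be sketched in a few lines of NumPy. This is only a toy, one-dimensional illustration and not how scikit-learn implements `GaussianMixture`; the function name `em_gmm_1d`, the quantile-based initialization, and the synthetic `x_demo` data are all assumptions made for the example.

```python
import numpy as np

def em_gmm_1d(x, n_components=2, n_iter=50):
    """Toy EM for a 1-D Gaussian mixture (illustrative sketch only)."""
    # Initialization (an assumption of this sketch): spread the means over
    # evenly spaced quantiles and start every component with the overall variance
    means = np.quantile(x, np.linspace(0, 1, n_components + 2)[1:-1])
    variances = np.full(n_components, x.var())
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # Expectation: how likely is each point to belong to each component?
        diff = x[:, None] - means[None, :]
        dens = np.exp(-0.5 * diff**2 / variances) / np.sqrt(2 * np.pi * variances)
        resp = weights * dens
        resp /= resp.sum(axis=1, keepdims=True)  # each row sums to 1

        # Maximization: re-estimate parameters from the soft assignments
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6

    return weights, means, variances, resp

# Hypothetical demo data: two well-separated 1-D Gaussians
rng = np.random.default_rng(42)
x_demo = np.concatenate([rng.normal(-5.0, 1.0, 300), rng.normal(5.0, 1.0, 300)])

weights, means, variances, resp = em_gmm_1d(x_demo)
print("weights:", weights)
print("means:", means)
print("variances:", variances)
```

On data like this, the estimated means should land near the two true centers. A fixed number of iterations is used here for brevity; a real implementation would typically stop once the log-likelihood stops improving.
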
* Pros: GMM allows data to vary anisotropically and provides probability estimates of cluster membership rather than "hard labeling" data points like K-means.

* Cons: GMM still assumes normal distributions across dimensions and requires that the number of components/clusters be specified a priori.
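
To make the "hard labeling" versus probability-estimate distinction concrete, here is a small sketch on made-up data. The array `x_toy` and the `midpoint` query are hypothetical and are not the notebook's `x_messy`; `predict` and `predict_proba` are the standard scikit-learn methods.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical toy data: two overlapping 2-D blobs
x_toy = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2)),
    rng.normal(loc=[3.0, 0.0], scale=1.0, size=(200, 2)),
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(x_toy)
gm = GaussianMixture(n_components=2, random_state=0).fit(x_toy)

hard_labels = km.predict(x_toy)        # one integer label per point
soft_labels = gm.predict_proba(x_toy)  # one probability per component per point

# A point midway between the blobs: K-means must pick a side,
# while the GMM reports how uncertain the membership is
midpoint = np.array([[1.5, 0.0]])
print("K-means label:", km.predict(midpoint))
print("GMM probabilities:", gm.predict_proba(midpoint).round(2))
```

Each row of `soft_labels` sums to 1, so a point near the boundary can be flagged as ambiguous instead of being forced into one cluster.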

**Both K-means and GMMs include every data point in a cluster no matter how far away it is from the nearest centroid.**

In a real-world dataset (such as one used for customer segmentation) there are bound to be **outliers**. Both K-Means and GMMs fail to detect them, so their clusters contain **noise**.
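
As a rough illustration of this point, the sketch below adds a single far-away point to two tight blobs and checks what label each model gives it. The data (`blobs`, `outlier`, `x_out`) are invented for the example and are not from the customer dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Hypothetical toy data: two tight blobs plus one point far from both
blobs = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])
outlier = np.array([[12.0, -6.0]])
x_out = np.vstack([blobs, outlier])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(x_out)
gm = GaussianMixture(n_components=2, random_state=0).fit(x_out)
km_labels = km.predict(x_out)
gm_labels = gm.predict(x_out)

# Neither model has a "noise" label: the last row (the outlier)
# still receives one of the two regular cluster labels
dist = np.linalg.norm(outlier[0] - km.cluster_centers_[km_labels[-1]])
print("K-means label for outlier:", km_labels[-1])
print("GMM label for outlier:", gm_labels[-1])
print("Distance from outlier to its K-means centroid:", round(float(dist), 2))
```

The outlier ends up inside a regular cluster even though it sits many standard deviations away from every other member of that cluster.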

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from collections import Counter
from itertools import combinations 
from sklearn.cluster import MeanShift
from sklearn.preprocessing import scale

In [None]:
from sklearn.mixture import GaussianMixture

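# Fit a 3-component GMM and predict a hard cluster label for each point
# (x_messy, km_messy, and cmap are assumed to be defined in earlier cells)
gm_messy = GaussianMixture(n_components=3).fit_predict(x_messy)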

plt.figure(figsize=(15,8))
plt.subplot(121, title='"Messy" K-Means')
plt.scatter(x_messy[:,0], x_messy[:,1], c=km_messy, cmap=cmap)
plt.subplot(122, title='"Messy" GMM')
plt.scatter(x_messy[:,0], x_messy[:,1], c=gm_messy, cmap=cmap)