## Anomaly Detection Using Gaussian Mixtures

Anomaly detection (also called outlier detection) is the task of detecting instances that deviate strongly from the norm. These instances are called anomalies, or outliers, while the normal instances are called inliers. Anomaly detection is useful in a wide variety of applications, such as fraud detection, detecting defective products in manufacturing, or removing outliers from a dataset before training another model (which can significantly improve the performance of the resulting model).

Using a Gaussian mixture model (GMM) for anomaly detection is quite simple: any instance located in a low-density region can be considered an anomaly. We just need to decide which density threshold to use.
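
As a quick sanity check of this idea, the sketch below fits a mixture to the same kind of two-moons data used later in this section. It then compares the model's score at a point lying on a moon with its score at a point far from all the data. The two probe points and the `random_state` values are illustrative choices and are not part of the original walkthrough.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.mixture import GaussianMixture

# Two-moons data and a 3-component GMM (random_state is an illustrative addition for reproducibility)
X, _ = make_moons(n_samples=1000, noise=0.05, random_state=42)
gm = GaussianMixture(n_components=3, n_init=10, random_state=42).fit(X)

# Hypothetical probe points chosen for illustration
candidates = np.array([
    [0.0, 1.0],  # top of the upper moon: a high-density region
    [5.0, 5.0],  # far from every training instance: a low-density region
])

# score_samples() returns the log of the estimated density at each point
log_densities = gm.score_samples(candidates)
print(log_densities)
```

The far-away point receives a much lower score. A density threshold relies on exactly this difference.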

For example, a manufacturing company that tries to detect defective products usually knows the ratio of defective products well. Say it is 4%. We then set the density threshold to the value that places 4% of the instances in regions whose density falls below that threshold. If we get too many false positives (perfectly good products that are flagged as defective), we can lower the threshold. Conversely, if we get too many false negatives (defective products that the system fails to flag), we can raise the threshold. This is the usual precision/recall trade-off.
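
To see the trade-off concretely, here is a minimal sketch. It assumes we have ground-truth labels, which a purely unsupervised setting usually lacks. It mixes 960 two-moons inliers with 40 uniformly scattered points labeled as "defects," then measures precision and recall for a few percentile thresholds. The data generation, the 2/4/8 percentiles, and names such as `results` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics import precision_score, recall_score
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)

# Hypothetical labeled data: 960 moon-shaped inliers plus 40 uniform "defects" (4%)
X_in, _ = make_moons(n_samples=960, noise=0.05, random_state=42)
X_out = rng.uniform(low=-1.5, high=2.5, size=(40, 2))
X = np.vstack([X_in, X_out])
y_true = np.r_[np.zeros(len(X_in), dtype=int), np.ones(len(X_out), dtype=int)]

gm = GaussianMixture(n_components=3, n_init=10, random_state=42).fit(X)
log_densities = gm.score_samples(X)  # log of the estimated density

results = {}
for pct in (2, 4, 8):
    # A higher percentile means a higher threshold, so more instances are flagged
    threshold = np.percentile(log_densities, pct)
    y_pred = (log_densities < threshold).astype(int)  # 1 = flagged as anomaly
    results[pct] = (precision_score(y_true, y_pred), recall_score(y_true, y_pred))
    print(f"{pct}% threshold: precision={results[pct][0]:.2f}, recall={results[pct][1]:.2f}")
```

Raising the percentile can only add flagged instances, so recall never decreases. Precision tends to drop as more good instances get flagged.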

Here is how we can identify the outliers using the fourth percentile of the densities as the threshold (i.e., approximately 4% of the instances will be flagged as anomalies):

```python
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=1000, noise=0.05)
```

```python
from sklearn.mixture import GaussianMixture

gm = GaussianMixture(n_components=3, n_init=10)
gm.fit(X)  # output: GaussianMixture(n_components=3, n_init=10)
```

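```python
import numpy as np

# score_samples() returns log densities; the log is monotonic,
# so thresholding it flags the same instances as thresholding the densities
densities = gm.score_samples(X)
# About 4% of the instances have a (log) density below the 4th percentile
density_threshold = np.percentile(densities, 4)
anomalies = X[densities < density_threshold]
anomalies
```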

array([[ 1.19704522, -0.50773101],
       [-0.96905764, -0.0578824 ],
       [ 1.97825049,  0.56418079],
       [-0.95043252,  0.05939401],
       [-0.2120919 ,  0.95138865],
       [ 1.9545389 ,  0.44903743],
       [ 1.23300595, -0.48006507],
       [ 2.03574676,  0.52792933],
       [ 1.19453439, -0.53022682],
       [-1.0597144 , -0.06641306],
       [ 1.30395244, -0.38341852],
       [ 1.19058025, -0.4704052 ],
       [-0.96738628,  0.00993892],
       [ 0.99580937,  0.55454215],
       [-1.02637829,  0.00731303],
       [ 1.20032629, -0.45974999],
       [ 1.37180184, -0.29016988],
       [ 1.23513236, -0.40915604],
       [ 1.26437352, -0.34235832],
       [ 1.99148518,  0.57083883],
       [ 1.99913375,  0.53707771],
       [-1.00727314, -0.00930005],
       [-0.13979   ,  1.06240234],
       [-0.94209061,  0.04002852],
       [-0.20671228,  0.90729231],
       [-0.10545596,  1.12512775],
       [-0.22264224,  0.94535617],
       [-0.22732885,  0.92276511],
       [ 1.28827615,