### Performance metrics for Clustering methods

#### 1. Adjusted Rand Index

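- a similarity measure between two clusterings of the same set of samples
- it considers all pairs of samples and counts the pairs that are assigned to the same or to different clusters in the predicted and in the true clustering
- it is adjusted for chance: random labelings score close to 0, and identical clusterings (up to a relabeling of the clusters) score exactly 1; values closer to 1 mean better agreement, and the score can be negative for especially discordant clusterings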

Mathematically:
$$\text{ARI}=\frac{\text{RI}-E[\text{RI}]}{\max(\text{RI})-E[\text{RI}]}$$

where RI is the (unadjusted) Rand index and E[RI] is its expected value under random labeling.

In [2]:
from sklearn.metrics.cluster import adjusted_rand_score
   
labels_true = [0, 0, 1, 1, 1, 1]
labels_pred = [0, 0, 2, 2, 3, 3]

adjusted_rand_score(labels_true, labels_pred)

0.4444444444444444
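To see where this number comes from, the formula can be evaluated by hand from pair counts. The sketch below is a minimal plain-Python version, assuming the standard pair-counting (Hubert and Arabie) form of the ARI; `adjusted_rand_index` is a hypothetical helper written for illustration, not part of scikit-learn.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """ARI from pair counts (hypothetical helper, not a scikit-learn function)."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))  # samples per (true, predicted) cell
    rows = Counter(labels_true)                     # sizes of the true clusters
    cols = Counter(labels_pred)                     # sizes of the predicted clusters

    # pairs placed together in both clusterings
    index = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)     # expected index under random labeling
    max_index = (sum_rows + sum_cols) / 2           # upper bound of the index
    if max_index == expected:
        # only happens for identical trivial labelings (one cluster, or all singletons)
        return 1.0
    return (index - expected) / (max_index - expected)


labels_true = [0, 0, 1, 1, 1, 1]
labels_pred = [0, 0, 2, 2, 3, 3]
print(adjusted_rand_index(labels_true, labels_pred))  # about 0.444, i.e. 4/9
```

For this example there are 3 pairs grouped together in both labelings, 7 in the true labels and 3 in the predicted ones, out of 15 pairs in total, which gives (3 - 1.4) / (5 - 1.4) = 4/9.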

#### 2. Mutual Information Based Score

These scores measure the agreement of two label assignments, ignoring permutations of the label values. The following versions are available:
    
Normalized Mutual Information (NMI), which scales the mutual information to the range 0 to 1:
    
    from sklearn.metrics.cluster import normalized_mutual_info_score as NMIS
    NMIS(labels_true, labels_pred)
    
Adjusted Mutual Information (AMI), which is additionally adjusted for chance so that random labelings score close to 0:
    
    from sklearn.metrics.cluster import adjusted_mutual_info_score as AMIS
    AMIS(labels_true, labels_pred)
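The mutual information itself can be computed from the same contingency counts. The sketch below is a minimal plain-Python version of NMI, assuming normalization by the arithmetic mean of the two entropies (to my understanding this corresponds to scikit-learn's default `average_method='arithmetic'`) and natural logarithms. `entropy`, `mutual_information` and `nmi_arithmetic` are hypothetical helpers; AMI is not reproduced here because its expected-MI correction term is considerably more involved.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a labeling (hypothetical helper)."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(labels_true, labels_pred):
    """Mutual information between two labelings (hypothetical helper)."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    mi = 0.0
    for (t, k), n_tk in cells.items():
        # p(t, k) * log(p(t, k) / (p(t) * p(k))), rewritten with raw counts
        mi += (n_tk / n) * log(n * n_tk / (rows[t] * cols[k]))
    return mi


def nmi_arithmetic(labels_true, labels_pred):
    """MI divided by the arithmetic mean of both entropies (hypothetical helper)."""
    h_true, h_pred = entropy(labels_true), entropy(labels_pred)
    if h_true == 0 and h_pred == 0:
        return 1.0  # both labelings put every sample into a single cluster
    return mutual_information(labels_true, labels_pred) / ((h_true + h_pred) / 2)


labels_true = [0, 0, 1, 1, 1, 1]
labels_pred = [0, 0, 2, 2, 3, 3]
print(nmi_arithmetic(labels_true, labels_pred))
```

Because the score depends only on the contingency counts, renaming the clusters (for example swapping labels 0 and 1) leaves it unchanged, which is what "ignoring permutations" means in practice.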

#### 3. Fowlkes-Mallows Score

- measures the similarity of two clusterings of the same set of points.
- it is the geometric mean of the pairwise precision and recall.

$$FMS=\frac{TP}{\sqrt{\left(TP+FP\right)\left(TP+FN\right)}}$$

- Here, TP (true positives) is the number of pairs of points that belong to the same cluster in both the true and the predicted labels.
- FP (false positives) is the number of pairs of points that belong to the same cluster in the true labels but not in the predicted labels.
- FN (false negatives) is the number of pairs of points that belong to the same cluster in the predicted labels but not in the true labels.
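These definitions translate directly into a brute-force loop over all pairs of points. The sketch below is a minimal illustration, assuming the TP/FP/FN convention listed above; `fowlkes_mallows` is a hypothetical helper, and since it enumerates O(n²) pairs it is only meant for small examples.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(labels_true, labels_pred):
    """Fowlkes-Mallows score by explicit pair counting (hypothetical helper)."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1   # together in both labelings
        elif same_true:
            fp += 1   # together in the true labels only
        elif same_pred:
            fn += 1   # together in the predicted labels only
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


labels_true = [0, 0, 1, 1, 1, 1]
labels_pred = [0, 0, 2, 2, 3, 3]
print(fowlkes_mallows(labels_true, labels_pred))
```

For this pair of labelings the loop finds TP = 3, FP = 4 and FN = 0, so the score is 3 / sqrt(21), which agrees with the scikit-learn output below.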

In [6]:
from sklearn.metrics.cluster import fowlkes_mallows_score
labels_true = [0, 0, 1, 1, 1, 1]
labels_pred = [0, 0, 2, 2, 3, 3]

fowlkes_mallows_score(labels_true, labels_pred)

0.6546536707079771

#### 4. Silhouette Coefficient

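It computes the mean Silhouette Coefficient of all samples, using the mean intra-cluster distance and the mean nearest-cluster distance of each sample. The coefficient ranges from -1 to 1: values near 1 indicate dense, well-separated clusters, and values near 0 indicate overlapping clusters.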

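Mathematically, for a single sample:
$$s=\frac{b-a}{\max(a,b)}$$

Here, a is the mean intra-cluster distance (the mean distance between the sample and all other points in its own cluster), and b is the mean nearest-cluster distance (the mean distance between the sample and all points of the nearest cluster it does not belong to).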
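The per-sample definition can be sketched directly with NumPy. This is a minimal version assuming Euclidean distances and the convention that a sample alone in its cluster gets s = 0 (I believe scikit-learn uses the same convention); `silhouette_mean` is a hypothetical helper, and because it builds the full n × n distance matrix it is meant for small data only.

```python
import numpy as np


def silhouette_mean(X, labels):
    """Mean silhouette coefficient, Euclidean distances (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least 2 clusters")

    # full n x n matrix of pairwise Euclidean distances (O(n^2) memory)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                  # exclude the sample itself from a
        if not same.any():
            continue                     # singleton cluster: s stays 0
        a = dist[i, same].mean()         # mean intra-cluster distance
        # mean distance to every other cluster, keep the nearest one
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        if max(a, b) > 0:
            scores[i] = (b - a) / max(a, b)
    return scores.mean()


# two well-separated 1-D clusters give a score close to 1
print(silhouette_mean([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1]))
```

With `X` and `labels` from the KMeans cell that follows, this helper should agree with `silhouette_score(X, labels, metric='euclidean')` up to floating-point error.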

In [8]:
# Compute the silhouette coefficient for a KMeans model

In [11]:
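import numpy as np
import pandas as pd

from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
# silhouette_score is available from both sklearn.metrics and sklearn.metrics.cluster
from sklearn.metrics import silhouette_score, pairwise_distances

# load the iris data: X holds the 4 features, y the true species labels
iris = load_iris()
X = iris.data
y = iris.target

# fit k-means with 3 clusters; n_init is set explicitly because
# recent scikit-learn releases changed its default value
model = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)
labels = model.labels_
# mean silhouette coefficient of the k-means assignment
silhouette_score(X, labels, metric='euclidean')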

0.5528190123564091

#### 5. Contingency Matrix

This matrix reports the intersection cardinality for every (true, predicted) pair of clusters: entry (i, j) counts the samples whose true class is i and whose predicted cluster is j. The confusion matrix used for classification problems is a square contingency matrix.

In [12]:
from sklearn.metrics.cluster import contingency_matrix
x = ["a", "a", "a", "b", "b", "b"]
y = [1, 1, 2, 0, 1, 2]
contingency_matrix(x, y)

array([[0, 2, 1],
       [1, 1, 1]], dtype=int64)
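The same table can be built by hand. The sketch below assumes, as in the output above, that rows follow the sorted true classes and columns the sorted predicted clusters; `contingency` is a hypothetical helper that returns a plain nested list rather than a NumPy array.

```python
def contingency(labels_true, labels_pred):
    """Rows: sorted true classes; columns: sorted predicted clusters (hypothetical helper)."""
    classes = sorted(set(labels_true))
    clusters = sorted(set(labels_pred))
    row_of = {c: r for r, c in enumerate(classes)}
    col_of = {k: j for j, k in enumerate(clusters)}
    matrix = [[0] * len(clusters) for _ in classes]
    for t, k in zip(labels_true, labels_pred):
        matrix[row_of[t]][col_of[k]] += 1  # one sample falls in cell (true t, predicted k)
    return matrix


x = ["a", "a", "a", "b", "b", "b"]
y = [1, 1, 2, 0, 1, 2]
print(contingency(x, y))  # [[0, 2, 1], [1, 1, 1]]
```

The pair-counting scores above (ARI, Fowlkes-Mallows) and the mutual-information scores can all be computed from this table alone, which is why it is a convenient intermediate step.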


In [14]:
# 50 evenly spaced points from -1 to 11 (np.linspace uses num=50 by default)
xfit = np.linspace(-1, 11)

In [16]:
# add a new axis to turn the 1-D array into a column of shape (50, 1)
Xfit = xfit[:, np.newaxis]
Xfit

array([[-1.        ],
       [-0.75510204],
       [-0.51020408],
       [-0.26530612],
       [-0.02040816],
       [ 0.2244898 ],
       [ 0.46938776],
       [ 0.71428571],
       [ 0.95918367],
       [ 1.20408163],
       [ 1.44897959],
       [ 1.69387755],
       [ 1.93877551],
       [ 2.18367347],
       [ 2.42857143],
       [ 2.67346939],
       [ 2.91836735],
       [ 3.16326531],
       [ 3.40816327],
       [ 3.65306122],
       [ 3.89795918],
       [ 4.14285714],
       [ 4.3877551 ],
       [ 4.63265306],
       [ 4.87755102],
       [ 5.12244898],
       [ 5.36734694],
       [ 5.6122449 ],
       [ 5.85714286],
       [ 6.10204082],
       [ 6.34693878],
       [ 6.59183673],
       [ 6.83673469],
       [ 7.08163265],
       [ 7.32653061],
       [ 7.57142857],
       [ 7.81632653],
       [ 8.06122449],
       [ 8.30612245],
       [ 8.55102041],
       [ 8.79591837],
       [ 9.04081633],
       [ 9.28571429],
       [ 9.53061224],
       [ 9.7755102 ],
       [10