#### Silhouette Score
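The silhouette score measures how well each point fits its own cluster compared with the nearest other cluster. It is commonly used to choose the number of clusters.  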

average distance between a point $i$ and all other points $j$ in the same cluster $C_I$:
$$a(i) = \frac{1}{|C_I| - 1} \sum_{j \in C_I,\, j \neq i} \|i - j\|$$
minimum, over every other cluster $C_J$, of the average distance between the point $i$ and all points $j$ in $C_J$:
$$b(i) = \min_{J \neq I} \frac{1}{|C_J|} \sum_{j \in C_J} \|i - j\|$$
silhouette score of a point $i$:
$$s(i) =
\begin{cases} \dfrac{b(i) - a(i)}{\max(a(i), b(i))}, & \text{if } |C_I| > 1 \\ 0, & \text{if } |C_I| = 1 \end{cases}$$
$$s(i) \in [-1, 1]$$
silhouette score of a cluster:
$$s(C) = \frac{1}{|C|} \sum_{i \in C} s(i)$$
$$s(C) \in [-1, 1]$$
silhouette score of the whole dataset:
$$s = \frac{1}{n} \sum_{i=1}^{n} s(i)$$
$$s \in [-1, 1]$$
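
To make the definitions concrete, the sketch below evaluates $a(i)$, $b(i)$ and $s(i)$ for a single point of a small 1-D data set. The data values and variable names are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

# Toy 1-D data set: three clusters of two points each (illustrative values).
X = np.array([[0.0], [1.0], [4.0], [5.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1, 2, 2])

i = 1  # the point at x = 1, which belongs to cluster 0
d = np.linalg.norm(X - X[i], axis=1)  # distances from point i to every point

same = labels == labels[i]
a_i = d[same].sum() / (same.sum() - 1)  # d[i] = 0, so divide by |C_I| - 1
# Average distance to each other cluster, then keep the smallest one.
b_i = min(d[labels == k].mean() for k in np.unique(labels) if k != labels[i])
s_i = (b_i - a_i) / max(a_i, b_i)

print(a_i, b_i, round(s_i, 3))  # 1.0 3.5 0.714
```

Here the nearest other cluster is $\{4, 5\}$ with average distance $3.5$, so $s(i) = (3.5 - 1) / 3.5 \approx 0.714$.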

The optimal number of clusters is taken to be the one that maximizes the silhouette score of the whole dataset.  
If the silhouette score is close to 1, the clusters are compact and well separated from each other.  
If the silhouette score is close to 0, the clusters overlap, and many points lie near the boundary between two clusters.  
If the silhouette score of an individual point is close to -1, the point is closer on average to a neighboring cluster than to its current cluster, so it is probably assigned to the wrong cluster.  
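
The selection rule can be sketched as a loop over candidate values of $k$. Everything below is an assumption made for illustration: the synthetic three-blob data, the helper names `mean_silhouette` and `kmeans_labels`, and the choice of a plain Lloyd's k-means with farthest-point initialisation as the clustering method. Any clustering algorithm that returns labels could take its place.

```python
import numpy as np

def mean_silhouette(X, labels):
    """Data-set silhouette score s, following the definitions above."""
    s = np.zeros(len(X))  # points in singleton clusters keep s(i) = 0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = d[same].sum() / (same.sum() - 1)
        b = min(d[labels == k].mean() for k in np.unique(labels) if k != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

def kmeans_labels(X, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(k - 1):
        # Next center: the point farthest from all centers chosen so far.
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        # Recompute each center; keep the old one if a cluster became empty.
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

# Synthetic data: three well-separated Gaussian blobs (illustrative only).
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in blob_centers])

# Score each candidate k and keep the one with the highest silhouette score.
scores = {k: mean_silhouette(X, kmeans_labels(X, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k)
```

With blobs this far apart, merging two blobs ($k = 2$) or splitting one ($k \geq 4$) lowers the average score, so the maximum should fall at $k = 3$. On real data the curve is often flatter, and the maximum is better treated as a suggestion than as a definitive answer.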

In [1]:
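import numpy as np

def silhouette_score(X, labels):
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    n = len(X)
    s = np.zeros(n)  # points in singleton clusters keep s(i) = 0
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)  # distances from point i to every point
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = d[same].sum() / (same.sum() - 1)  # d[i] = 0, so divide by |C_I| - 1
        b = min(d[labels == k].mean() for k in set(labels) - {labels[i]})
        s[i] = (b - a) / max(a, b)
    return s.mean()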
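
The function above returns only the dataset average. A per-point variant also gives the per-cluster scores $s(C)$ and exposes points that may be misassigned. The sketch below is one possible vectorized version; the name `silhouette_samples_sketch` and the toy data are hypothetical and not part of the original notebook.

```python
import numpy as np

def silhouette_samples_sketch(X, labels):
    """Per-point silhouette values s(i) (hypothetical helper, for illustration)."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # (n, n) distances
    clusters = np.unique(labels)
    member = (labels[None, :] == clusters[:, None]).astype(float)  # (k, n) mask
    sizes = member.sum(axis=1)               # |C_J| for every cluster
    sums = D @ member.T                      # (n, k) summed distance to each cluster
    own = np.searchsorted(clusters, labels)  # column index of each point's cluster
    idx = np.arange(len(X))
    own_size = sizes[own]
    a = sums[idx, own] / np.maximum(own_size - 1, 1)  # excludes the point itself
    means = sums / sizes
    means[idx, own] = np.inf                 # ignore the point's own cluster for b(i)
    b = means.min(axis=1)
    # s(i) = 0 for singleton clusters, as in the piecewise definition.
    return np.where(own_size > 1, (b - a) / np.maximum(a, b), 0.0)

# Two 1-D groups; the point at x = 4 is deliberately given the "wrong" label 0.
X = np.array([[0.0], [1.0], [4.0], [5.0], [6.0]])
labels = np.array([0, 0, 0, 1, 1])

s = silhouette_samples_sketch(X, labels)
per_cluster = {int(k): float(s[labels == k].mean()) for k in np.unique(labels)}
print(np.round(s, 3))  # s[2] is negative: x = 4 sits closer to cluster 1
print(per_cluster)
```

The mislabeled point drags down the score of cluster 0, while cluster 1 stays comparatively high. This is the kind of pattern that the single dataset average hides.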