# Clustering Evaluation Metrics

There are two scenarios for evaluating a clustering model:
1. The target labels are known, or the predicted clusters can be mapped to the true labels
2. The target labels are unknown

In [2]:
import numpy as np
import pandas as pd

<hr>

### A. Target labels are known

In [4]:
df = pd.DataFrame({
    'target': [0,0,0,0,0,1,1,1,1,1,2,2,2,2,2],
    'prediksi': [0,0,1,0,1,0,1,1,1,2,1,2,2,2,2]
})

<hr>

__1. Contingency Matrix__

In [6]:
from sklearn.metrics.cluster import contingency_matrix

In [7]:
contingency_matrix(df['target'], df['prediksi'])

array([[3, 2, 0],
       [1, 3, 1],
       [0, 1, 4]])

In [12]:
dfcm = pd.DataFrame(
    contingency_matrix(df['target'], df['prediksi']),
    columns = ['Pred 0', 'Pred 1', 'Pred 2'],
    index = ['Aktual 0', 'Aktual 1', 'Aktual 2']
)
dfcm

| | Pred 0 | Pred 1 | Pred 2 |
|---|---|---|---|
| Aktual 0 | 3 | 2 | 0 |
| Aktual 1 | 1 | 3 | 1 |
| Aktual 2 | 0 | 1 | 4 |


In [8]:
from sklearn.metrics import confusion_matrix

In [9]:
confusion_matrix(df['target'], df['prediksi'])

array([[3, 2, 0],
       [1, 3, 1],
       [0, 1, 4]], dtype=int64)
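
The first scenario mentions mapping predicted clusters to the true labels. One possible way to do that, sketched here under the assumption that a one-to-one matching is wanted, is to treat the contingency matrix as an assignment problem and solve it with `scipy.optimize.linear_sum_assignment`. The names `cluster_to_label`, `y_mapped`, and `matched_accuracy` are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics.cluster import contingency_matrix

# Same toy data as the notebook's df['target'] and df['prediksi'].
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 0, 1, 0, 1, 1, 1, 2, 1, 2, 2, 2, 2])

cm = contingency_matrix(y_true, y_pred)

# linear_sum_assignment minimizes total cost, so negate the counts
# to maximize the overlap between true classes and predicted clusters.
true_idx, pred_idx = linear_sum_assignment(-cm)

# Row/column positions equal the label values here because both
# label sets are exactly 0..2 (an assumption that holds for this data).
cluster_to_label = {int(p): int(t) for t, p in zip(true_idx, pred_idx)}

y_mapped = np.array([cluster_to_label[p] for p in y_pred])
matched_accuracy = float((y_mapped == y_true).mean())
print(cluster_to_label, matched_accuracy)
```

For this data the best matching is the identity, which is why the contingency matrix and the confusion matrix above are identical.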

<hr>

__2. Adjusted Rand Index__

An index that measures the similarity between the actual labels and the predicted clusters, adjusted for chance. A perfect score is __1__; random labelings score close to 0.

In [13]:
from sklearn.metrics import adjusted_rand_score

In [15]:
print(adjusted_rand_score(df['target'], df['prediksi']))
print(adjusted_rand_score(df['prediksi'], df['target']))

0.1914191419141914
0.1914191419141914
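
As a rough check of where this number comes from, the sketch below recomputes the index from pair counts in the contingency matrix, using the standard chance-corrected form (index minus expected index, divided by maximum index minus expected index). The helper `n_pairs` and the variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score
from sklearn.metrics.cluster import contingency_matrix

y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 0, 1, 0, 1, 1, 1, 2, 1, 2, 2, 2, 2]

def n_pairs(counts):
    """Number of unordered pairs, C(n, 2), for each count."""
    counts = np.asarray(counts, dtype=np.int64)
    return counts * (counts - 1) // 2

cm = contingency_matrix(y_true, y_pred)
sum_cells = n_pairs(cm).sum()              # pairs grouped together in both partitions
sum_rows = n_pairs(cm.sum(axis=1)).sum()   # pairs grouped together in the true labels
sum_cols = n_pairs(cm.sum(axis=0)).sum()   # pairs grouped together in the prediction
total = n_pairs(cm.sum())                  # all pairs of samples

expected = sum_rows * sum_cols / total     # expected index under random labeling
max_index = (sum_rows + sum_cols) / 2
ari_manual = (sum_cells - expected) / (max_index - expected)

print(ari_manual, adjusted_rand_score(y_true, y_pred))
```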


<hr>

__3. Adjusted Mutual Information (AMI)__

An index that measures the agreement between the actual labels and the predicted clusters, based on mutual information and adjusted for chance. A perfect score is __1__.

In [16]:
from sklearn.metrics import adjusted_mutual_info_score

In [17]:
print(adjusted_mutual_info_score(df['target'], df['prediksi']))
print(adjusted_mutual_info_score(df['prediksi'], df['target']))

0.22018114009437642
0.22018114009437664
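
Both adjusted scores compare partitions rather than label values, so renaming the predicted cluster ids should not change them. A small sketch of that property follows; the `rename` mapping is an arbitrary choice made for illustration.

```python
from sklearn.metrics import adjusted_mutual_info_score, adjusted_rand_score

y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 0, 1, 0, 1, 1, 1, 2, 1, 2, 2, 2, 2]

# Rename the predicted clusters: 0 -> 2, 1 -> 0, 2 -> 1 (arbitrary permutation).
rename = {0: 2, 1: 0, 2: 1}
y_renamed = [rename[c] for c in y_pred]

ami_original = adjusted_mutual_info_score(y_true, y_pred)
ami_renamed = adjusted_mutual_info_score(y_true, y_renamed)
ari_original = adjusted_rand_score(y_true, y_pred)
ari_renamed = adjusted_rand_score(y_true, y_renamed)

print(ami_original, ami_renamed)
print(ari_original, ari_renamed)
```

This is why these metrics can be used directly on raw cluster ids, without first matching them to the true labels.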


<hr>

__4. Homogeneity, Completeness & V-measure__

- __Homogeneity__: measures how homogeneous each cluster is. The score is highest when every predicted cluster contains only members of a single true class.
- __Completeness__: measures how completely the members of each true class are assigned to the same cluster.
- __V-measure__: the harmonic mean of homogeneity and completeness.

_Example_:

Suppose the predicted cluster C contains only elements of true class A, but other elements of class A end up in different clusters. The model then has good _homogeneity_ but poor _completeness_.
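
A minimal illustration of that situation, using made-up labels rather than the notebook's data: every sample is placed in its own cluster, so each cluster is pure but each class is split.

```python
from sklearn.metrics import (
    completeness_score, homogeneity_score, v_measure_score
)

labels_true = [0, 0, 1, 1]
labels_split = [0, 1, 2, 3]  # over-split: one cluster per sample

h = homogeneity_score(labels_true, labels_split)    # each cluster holds one class
c = completeness_score(labels_true, labels_split)   # each class is spread over 2 clusters
v = v_measure_score(labels_true, labels_split)

print(h, c, v)
```

Homogeneity comes out at 1.0 while completeness drops to 0.5, and the V-measure lands between the two.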

In [18]:
from sklearn.metrics import (
    homogeneity_score, completeness_score, v_measure_score,
    homogeneity_completeness_v_measure
)

In [20]:
print(homogeneity_score(df['target'], df['prediksi']))
print(completeness_score(df['target'], df['prediksi']))
print(v_measure_score(df['target'], df['prediksi']))

0.3434275588083024
0.347675723708813
0.34553858466811854


In [21]:
homogeneity_completeness_v_measure(df['target'], df['prediksi'])

(0.3434275588083024, 0.347675723708813, 0.34553858466811854)
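
To confirm that the V-measure is the harmonic mean of the other two scores, the following sketch recomputes it by hand for the same data (this assumes the default `beta=1.0`, which weights both scores equally).

```python
from sklearn.metrics import homogeneity_completeness_v_measure

y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 0, 1, 0, 1, 1, 1, 2, 1, 2, 2, 2, 2]

h, c, v = homogeneity_completeness_v_measure(y_true, y_pred)

# Harmonic mean of homogeneity and completeness.
v_manual = 2 * h * c / (h + c)

print(v, v_manual)
```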

<hr>

__5. Fowlkes-Mallows Index__

The geometric mean of pairwise _precision_ and _recall_. A perfect score is __1__.

$$\textrm{FMI} = \frac {\textrm{TP}} {\sqrt{(\textrm{TP} + \textrm{FP}) \cdot (\textrm{TP} + \textrm{FN})}}$$

In [22]:
from sklearn.metrics import fowlkes_mallows_score

In [23]:
fowlkes_mallows_score(df['target'], df['prediksi'])

0.4262867932595694
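
The score can be reproduced from pair counts, as sketched below. TP is taken as the number of sample pairs grouped together in both labelings, and TP + FP and TP + FN as the number of pairs grouped together in one labeling each; because the two totals are multiplied, it does not matter which one is called which. The helper `n_pairs` is illustrative.

```python
import numpy as np
from sklearn.metrics import fowlkes_mallows_score
from sklearn.metrics.cluster import contingency_matrix

y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 0, 1, 0, 1, 1, 1, 2, 1, 2, 2, 2, 2]

def n_pairs(counts):
    """Number of unordered pairs, C(n, 2), for each count."""
    counts = np.asarray(counts, dtype=np.int64)
    return counts * (counts - 1) // 2

cm = contingency_matrix(y_true, y_pred)
tp = n_pairs(cm).sum()                       # pairs together in both labelings
pairs_true = n_pairs(cm.sum(axis=1)).sum()   # pairs together in the true labels
pairs_pred = n_pairs(cm.sum(axis=0)).sum()   # pairs together in the prediction

fmi_manual = tp / np.sqrt(pairs_true * pairs_pred)
print(fmi_manual, fowlkes_mallows_score(y_true, y_pred))
```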

<hr>

### B. Target labels are unknown

In [24]:
from sklearn.datasets import load_iris
data = load_iris()

In [25]:
dfIris = pd.DataFrame(
    data['data'],
    columns = ['SL', 'SW', 'PL', 'PW']
)
dfIris.head(3)

| | SL | SW | PL | PW |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |


In [26]:
from sklearn.cluster import KMeans

In [37]:
model = KMeans(n_clusters=3)
model.fit(dfIris[['SL', 'SW', 'PL', 'PW']])

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=3, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=None, tol=0.0001, verbose=0)

In [38]:
model.predict(dfIris[['SL', 'SW', 'PL', 'PW']])

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2,
       2, 2, 2, 0, 0, 2, 2, 2, 2, 0, 2, 0, 2, 0, 2, 2, 0, 0, 2, 2, 2, 2,
       2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 0, 2, 2, 2, 0, 2, 2, 0], dtype=int32)

In [39]:
model.labels_

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2,
       2, 2, 2, 0, 0, 2, 2, 2, 2, 0, 2, 0, 2, 0, 2, 2, 0, 0, 2, 2, 2, 2,
       2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 0, 2, 2, 2, 0, 2, 2, 0])

<hr>

__1. Silhouette Coefficient__

Measures how well-defined the clusters are. The value ranges from __-1__ to __1__; higher is better. A value near __0__ means the cluster assignment is ambiguous or the clusters overlap, and negative values generally indicate samples assigned to the wrong cluster.

In [31]:
from sklearn.metrics import silhouette_score

In [40]:
silhouette_score(
    dfIris[['SL', 'SW', 'PL', 'PW']],
    model.predict(dfIris[['SL', 'SW', 'PL', 'PW']]), # model.labels_ can be used here as well
)

0.5528190123564091
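
Since a higher silhouette is better, one common use is comparing several values of `n_clusters`. The sketch below does that on the same iris features; `random_state=0`, `n_init=10`, and the range of k are assumptions added here for reproducibility and are not taken from the notebook. It also shows that the overall score is the mean of the per-sample coefficients returned by `silhouette_samples`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_samples, silhouette_score

X = load_iris().data

# Silhouette score for several candidate cluster counts.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(k, round(scores[k], 3))

# The overall score is the mean of the per-sample silhouette coefficients.
labels3 = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
per_sample = silhouette_samples(X, labels3)
print(per_sample.mean(), silhouette_score(X, labels3))
```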

<hr>

__2. Calinski-Harabasz Index__

Measures how well-defined the clusters are, as the ratio of between-cluster dispersion to within-cluster dispersion. Higher is better.

In [41]:
from sklearn.metrics import calinski_harabasz_score

In [42]:
calinski_harabasz_score(
    dfIris[['SL', 'SW', 'PL', 'PW']],
    model.predict(dfIris[['SL', 'SW', 'PL', 'PW']]), # model.labels_ can be used here as well
)

561.62775662962
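
To make the ratio concrete, here is a sketch that recomputes the index with NumPy: between-cluster dispersion divided by (k - 1), over within-cluster dispersion divided by (n - k). The fixed `random_state=0` and `n_init=10` are assumptions for reproducibility, so the printed value may differ slightly from the output above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import calinski_harabasz_score

X = load_iris().data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

n = X.shape[0]
clusters = np.unique(labels)
k = len(clusters)
overall_mean = X.mean(axis=0)

between, within = 0.0, 0.0
for c in clusters:
    members = X[labels == c]
    centroid = members.mean(axis=0)
    # Weighted squared distance of the cluster centroid to the global mean.
    between += len(members) * np.sum((centroid - overall_mean) ** 2)
    # Squared distances of the members to their own centroid.
    within += np.sum((members - centroid) ** 2)

ch_manual = (between / (k - 1)) / (within / (n - k))
print(ch_manual, calinski_harabasz_score(X, labels))
```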

<hr>

__3. Davies-Bouldin Index__

Indicates how well the model separates the clusters from one another. The best possible score is __0__; values closer to 0 mean a better partition between clusters.

In [43]:
from sklearn.metrics import davies_bouldin_score

In [44]:
davies_bouldin_score(
    dfIris[['SL', 'SW', 'PL', 'PW']],
    model.predict(dfIris[['SL', 'SW', 'PL', 'PW']]), # model.labels_ can be used here as well
)

0.6619715465007542
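
The index can also be rebuilt by hand, which shows what "separation" means here: for each cluster, take the worst ratio of combined within-cluster spread to centroid distance against any other cluster, then average those worst cases. This is a sketch assuming Euclidean distances and cluster means as centroids; `random_state=0` and `n_init=10` are added only for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import davies_bouldin_score

X = load_iris().data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

clusters = np.unique(labels)
centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])

# spread[i]: mean Euclidean distance from the members of cluster i to its centroid.
spread = np.array([
    np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
    for i, c in enumerate(clusters)
])

worst_ratios = []
for i in range(len(clusters)):
    # Similarity of cluster i to every other cluster j.
    ratios = [
        (spread[i] + spread[j]) / np.linalg.norm(centroids[i] - centroids[j])
        for j in range(len(clusters)) if j != i
    ]
    worst_ratios.append(max(ratios))

db_manual = float(np.mean(worst_ratios))
print(db_manual, davies_bouldin_score(X, labels))
```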