# <center style='color:dodgerblue'> Internal Cluster Validation: `Hartigan` index </center>

## 1. Import required libraries

In [1]:
import river
print('river module version:', river.__version__)

from river.cluster import KMeans
from river.stream import iter_array
from river.metrics.cluster import Hartigan

from sklearn import datasets

river module version: 0.7.0


## 2. Create dataset 

In [2]:
features, _ = datasets.make_classification(n_samples=250, n_features=4, random_state=0)

##### `Internal cluster validation` applies when ground-truth labels are unavailable. Hence, the labels returned by `make_classification` are discarded and only the features are kept.

In [3]:
features.shape # (samples, features)

(250, 4)

## 3. Set up K-Means models with 2 and 3 clusters

In [4]:
kmeans1 = KMeans(n_clusters=2, seed=0)
kmeans2 = KMeans(n_clusters=3, seed=0)

## 4. Calculate `Hartigan` index

In [5]:
metric1 = Hartigan()
metric2 = Hartigan()

In [6]:
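for x, _ in iter_array(features):
    # x is a dict of features for one sample; the model is updated in place
    kmeans1.learn_one(x)
    pred1 = kmeans1.predict_one(x)
    # update() returns the metric itself in river 0.7.0
    clustering_metric1 = metric1.update(x, pred1, kmeans1.centers)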

In [7]:
for x, _ in iter_array(features):
    # Same procedure for the 3-cluster model
    kmeans2.learn_one(x)
    pred2 = kmeans2.predict_one(x)
    clustering_metric2 = metric2.update(x, pred2, kmeans2.centers)

In [8]:
print('Hartigan index for 2-clusters:', round(clustering_metric1.get(), 5))
print('Hartigan index for 3-clusters:', round(clustering_metric2.get(), 5))

Hartigan index for 2-clusters: 0.94261
Hartigan index for 3-clusters: 2.06436


### A higher `Hartigan` index indicates better clustering, so here the 3-cluster model (2.06436) scores better than the 2-cluster model (0.94261).
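
To see why larger values are preferred, the sketch below computes a batch version of the index on a toy dataset. It assumes the index takes the form log(SSB / SSW), where SSW is the sum of squared distances from each point to its assigned center and SSB is the size-weighted sum of squared distances from each center to the global mean. This is an assumption for illustration: river's incremental implementation may weight or normalize these terms differently, and the helper names `sq_dist` and `hartigan_batch` and the toy points are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def hartigan_batch(points, labels, centers):
    """Hypothetical batch helper: log(SSB / SSW) for a fixed assignment."""
    n, dim = len(points), len(points[0])
    # Global mean of all points
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    # SSW: squared distance from each point to its assigned center
    ssw = sum(sq_dist(p, centers[k]) for p, k in zip(points, labels))
    # SSB: squared distance from each center to the global mean,
    # weighted by the number of points in that cluster
    sizes = {}
    for k in labels:
        sizes[k] = sizes.get(k, 0) + 1
    ssb = sum(size * sq_dist(centers[k], mean) for k, size in sizes.items())
    return math.log(ssb / ssw)


# Two tight groups of points, far apart along the first axis
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Grouping that matches the structure: compact, well-separated clusters
good = hartigan_batch(points, [0, 0, 1, 1], {0: (0.0, 0.5), 1: (10.0, 0.5)})
# Grouping that mixes the two groups: spread-out, overlapping clusters
poor = hartigan_batch(points, [0, 1, 0, 1], {0: (5.0, 0.0), 1: (5.0, 1.0)})

print('well-separated grouping:', round(good, 5))
print('mixed grouping:', round(poor, 5))
```

Under these assumptions the well-separated grouping scores log(100), about 4.60517, and the mixed grouping scores about -4.60517: compact clusters shrink SSW, distant centers grow SSB, and both push the index up.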