gulraizchoudhary/CentroidIndex

Centroid Index: Cluster Level Evaluation Algorithm

Centroid Index is a cluster-level measure that returns the number of mismatched clusters between two clustering solutions. It is simple to compute and is especially useful when ground truth is available.
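To make "number of mismatches" concrete, here is a minimal sketch of the measure as it is usually defined on centroids: each centroid in one solution is mapped to its nearest centroid in the other, centroids that receive no mapping are counted as orphans, and the index is the larger of the two directional orphan counts. This is an illustration under those assumptions, not this repository's implementation (which takes label arrays); the names `orphan_count` and `centroid_index` are hypothetical.

```python
import math

def orphan_count(source, target):
    """Count target centroids that no source centroid picks as its nearest neighbour."""
    chosen = set()
    for s in source:
        # index of the target centroid closest to s (Euclidean distance)
        nearest = min(range(len(target)), key=lambda j: math.dist(s, target[j]))
        chosen.add(nearest)
    return len(target) - len(chosen)

def centroid_index(c1, c2):
    """Symmetric index: the larger of the two directional orphan counts (hypothetical helper)."""
    return max(orphan_count(c1, c2), orphan_count(c2, c1))

# Two solutions with two centroids each; both centroids of b sit near the first centroid of a.
a = [(0.0, 0.0), (10.0, 10.0)]
b = [(0.0, 0.5), (1.0, 0.0)]
print(centroid_index(a, b))  # 1: the centroid at (10, 10) has no counterpart in b
```

A value of 0 means every cluster in one solution has a counterpart in the other; a value of 1 means one cluster is misplaced. With scikit-learn, centroid lists could be obtained from `cluster_centers_.tolist()`.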

Example 1: Cluster-level similarity between k-means and k-means++ solutions on a random dataset

import numpy as np
from evaluation import CentroidIndex as ci
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

#random 2D data set
X=np.random.rand(1000,2)

# number of centroids
k=50

for i in range(5):
    km = KMeans(n_clusters=k, init='random').fit(X)
    # k-means++ seeding (scikit-learn's default, stated explicitly here)
    kmp = KMeans(n_clusters=k, init='k-means++').fit(X)
    
    
    # relative SSE improvement of kmeans++ over kmeans
    imp = 1 - kmp.inertia_/km.inertia_
    print(f"SSE improvement over k-means: {imp:.2%}")
    
    # CI: number of mismatched clusters between the two solutions (k-means, k-means++)
    CI = ci.CentroidIndex(km.labels_, kmp.labels_)
    print(f"Mismatch between k-means and k-means++: {CI}")
    
    #plotting the k-means results
    for j in np.unique(km.labels_):
        plt.scatter(X[km.labels_ == j, 0], X[km.labels_ == j, 1], label=j)
    plt.scatter(km.cluster_centers_[:,0] , km.cluster_centers_[:,1] , s = 80, color = 'k')
    # displaying the title
    plt.title("k-means results of iteration: "+str(i))
    plt.show()
    
    #plotting the k-means++ results
    for j in np.unique(kmp.labels_):
        plt.scatter(X[kmp.labels_ == j, 0], X[kmp.labels_ == j, 1], label=j)
    plt.scatter(kmp.cluster_centers_[:,0] , kmp.cluster_centers_[:,1] , s = 80, color = 'k')
    # displaying the title
    plt.title("k-means++ results of iteration: "+str(i))
    plt.show()

Output

SSE improvement over k-means: 2.96%
Mismatch between k-means and k-means++: 8
SSE improvement over k-means: 4.65%
Mismatch between k-means and k-means++: 7
SSE improvement over k-means: 0.10%
Mismatch between k-means and k-means++: 4
SSE improvement over k-means: 3.61%
Mismatch between k-means and k-means++: 8
SSE improvement over k-means: 3.09%
Mismatch between k-means and k-means++: 8
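For readers who want to see what the SSE improvement figure measures, the following is a small sketch. It assumes the sum of squared errors is the sum of squared Euclidean distances from each point to its assigned centroid, which is what scikit-learn reports as `inertia_`; the helper names `sse` and `relative_improvement` are hypothetical and not part of this repository.

```python
def sse(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(point, centroids[label]))
        for point, label in zip(points, labels)
    )

def relative_improvement(sse_baseline, sse_candidate):
    """Fraction by which the candidate solution reduces SSE relative to the baseline."""
    return 1 - sse_candidate / sse_baseline

# Tiny hand-made example: three 2D points, two clusters.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
labels = [0, 0, 1]
centroids = [(1.0, 0.0), (10.0, 0.0)]
print(sse(points, labels, centroids))    # 2.0
print(relative_improvement(4.0, 3.0))    # 0.25
```

Because the dataset and the initializations are random, the exact percentages and mismatch counts will differ from run to run.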

Visuals

[Figures: k-means++ and k-means clustering results, one pair of plots per iteration (0 to 4).]

Acknowledgements

Credit goes to Pasi Fränti; for a deeper understanding, consider reading his paper "Centroid index: Cluster level similarity measure". Credit also goes to the scikit-learn team for their excellent sklearn.cluster.KMeans class.

License

MIT
