K-Means and Bisecting K-Means clustering algorithms implemented in Python 3.

K-Means Clustering


A k-means clustering implementation in Python.

API inspired by Scikit-learn.

Reference: Introduction to Data Mining (1st Edition) by Pang-Ning Tan, Section 8.2, page 496
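For readers without the book at hand, the basic k-means loop can be sketched as follows: assign every point to its nearest centroid, then recompute each centroid as the mean of its members, until assignments stop changing. This is a minimal pure-Python illustration, not this repository's implementation; the names `basic_k_means` and `squared_distance` are hypothetical, and picking initial centroids by random sampling is an assumption.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def basic_k_means(
    points: List[Point],
    k: int,
    seed: Optional[int] = None,
    max_iter: int = 100,
) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical helper: returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points drawn at random.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centroids = basic_k_means(points, k=2, seed=0)
```

On this toy input the two nearby pairs end up in separate clusters, with centroids at the midpoint of each pair.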


from typing import List

from dataviz import generate_clusters
from dataviz import plot_clusters
from kmeans import KMeans

def generate_data(num_clusters: int, seed=None) -> List[List]:
    num_points = 20
    spread = 7
    bounds = (1, 100)
    return generate_clusters(num_clusters, num_points, spread, bounds, bounds, seed)

num_clusters = 4
clusters = generate_data(num_clusters, seed=1)
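k_means = KMeans(num_clusters=num_clusters, seed=4235)
# Fit before reading labels_ and centroids_ (assumes a scikit-learn-style fit method)
k_means.fit(clusters)
plot_clusters(clusters, k_means.labels_, k_means.centroids_)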


print('Total Sum of Squared Error (SSE): {}'.format(k_means.inertia_))
Total Sum of Squared Error (SSE): 230.0880894560679
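
The SSE reported above can be reproduced by hand from the labels and centroids. The sketch below assumes `inertia_` follows the scikit-learn meaning (the sum of squared Euclidean distances from each point to its assigned centroid); the function name `sum_of_squared_error` is hypothetical and not part of this package.

```python
from typing import List, Sequence


def sum_of_squared_error(
    points: List[Sequence[float]],
    labels: List[int],
    centroids: List[Sequence[float]],
) -> float:
    """Sum of squared Euclidean distances from each point to its centroid."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(point, centroids[label]))
        for point, label in zip(points, labels)
    )


# Toy data: two points around centroid (2, 1), one point exactly on (8, 8).
toy_points = [[1.0, 1.0], [3.0, 1.0], [8.0, 8.0]]
toy_labels = [0, 0, 1]
toy_centroids = [[2.0, 1.0], [8.0, 8.0]]
sse = sum_of_squared_error(toy_points, toy_labels, toy_centroids)
print(sse)  # 1 + 1 + 0 = 2.0
```

A lower SSE means points sit closer to their centroids, which is why it is a common way to compare runs with different seeds.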