K-Means and Bisecting K-Means clustering algorithms implemented in Python 3.

K-Means Clustering

A k-means clustering implementation in Python.

The API is inspired by scikit-learn.

Reference: Introduction to Data Mining (1st Edition) by Pang-Ning Tan, Section 8.2, page 496.
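
For readers who want the idea behind the algorithm before the usage example, here is a minimal sketch of basic k-means: pick initial centroids, assign each point to its nearest centroid, recompute the centroids, and repeat until the assignments stop changing. It is not this repository's `KMeans` class. The names `basic_k_means` and `squared_distance` are hypothetical, and the choices of random initial points, squared Euclidean distance, and keeping the old centroid for an empty cluster are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def basic_k_means(
    points: List[Point], k: int, seed: int = 0, max_iterations: int = 100
) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical helper for illustration; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Step 1: pick k of the input points as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iterations):
        # Step 2: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments did not change, so the centroids are stable
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    sample = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
    sample_labels, sample_centroids = basic_k_means(sample, k=2, seed=1)
    print(sample_labels, sample_centroids)
```

The result depends on the initial centroids, which is why the sketch takes a `seed` so that runs are repeatable.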

Usage

from typing import List

from dataviz import generate_clusters
from dataviz import plot_clusters
from kmeans import KMeans

def generate_data(num_clusters: int, seed=None) -> List[List]:
    num_points = 20
    spread = 7
    bounds = (1, 100)
    return generate_clusters(num_clusters, num_points, spread, bounds, bounds, seed)

# Generate sample data, fit the model, then plot the labelled points and centroids.
num_clusters = 4
clusters = generate_data(num_clusters, seed=1)
k_means = KMeans(num_clusters=num_clusters, seed=4235)
k_means.fit(clusters)
plot_clusters(clusters, k_means.labels_, k_means.centroids_)

[Figure: scatter plot of the clustered points and their centroids]

print('Total Sum of Squared Error (SSE): {}'.format(k_means.inertia_))
Total Sum of Squared Error (SSE): 230.0880894560679
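
The SSE reported above is conventionally the sum of squared distances from each point to the centroid of its assigned cluster. The sketch below shows that calculation under the assumption of Euclidean distance. `sum_of_squared_error` is a hypothetical helper, not part of this package, and `inertia_` may be computed differently internally (for example, with another distance metric).

```python
from typing import List, Sequence


def sum_of_squared_error(
    points: Sequence[Sequence[float]],
    labels: Sequence[int],
    centroids: Sequence[Sequence[float]],
) -> float:
    """Hypothetical helper: total squared Euclidean distance to assigned centroids."""
    total = 0.0
    for point, label in zip(points, labels):
        centroid = centroids[label]
        # Add this point's squared distance to the centroid of its cluster.
        total += sum((x - c) ** 2 for x, c in zip(point, centroid))
    return total


if __name__ == "__main__":
    example_points: List[Sequence[float]] = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
    example_labels = [0, 0, 1]
    example_centroids = [(1.0, 0.0), (10.0, 10.0)]
    print(sum_of_squared_error(example_points, example_labels, example_centroids))
```

A lower SSE means the points sit closer to their centroids. It always decreases as the number of clusters grows, so it is most useful for comparing runs with the same `num_clusters`.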