The k-means-u* algorithm

non-local jumps and greedy retries improve k-means++ clustering

This repository contains example Python code for the k-means-u and k-means-u* algorithms as proposed in https://arxiv.org/abs/1706.09059.

Quick Start

  • clone the repository: git clone https://github.com/gittar/k-means-u-star
  • change to the main directory: cd k-means-u-star
  • install Miniconda or Anaconda: https://conda.io/docs/install/quick.html
  • create the kmus environment: conda env create -f envsimple.yml
  • activate the environment: source activate kmus (on Windows: activate kmus)
  • start one of the Jupyter notebooks, e.g.: jupyter notebook notebooks/algo-pure.ipynb
  • continue in the browser window that opens (Jupyter manual: http://jupyter-notebook.readthedocs.io/en/latest/)

Jupyter notebooks:

  • algo-pure.ipynb
    a bare-bones implementation meant for easy understanding of the algorithms
  • simu-detail.ipynb
    detailed simulations and graphics illustrating how the algorithms work; uses kmeansu.py
  • simu-bulk.ipynb
    systematic simulations with various data sets to compare k-means++, k-means-u and k-means-u*; uses kmeansu.py
  • dataset_class.ipynb
    examples of using the data generator

Python files:

  • kmeansu.py
    main implementation of k-means-u and k-means-u*. It relies heavily on http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html for efficient implementations of k-means and k-means++. It also gathers certain statistics during training to enable systematic evaluation, so the code is somewhat longer than the bare-bones version.
  • bfdataset.py
    contains a class "dataset" for generating test data sets, plus a custom implementation of k-means++ that gives access to the codebook after initialization but before k-means is run
  • bfutil.py
    various utility functions for plotting etc.
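
The idea of accessing the codebook after k-means++ initialization but before the k-means run can be illustrated with a small, dependency-free sketch. This is not the code from bfdataset.py or kmeansu.py; the function names (kmeanspp_seed, lloyd, sse) and the toy data are assumptions made only for this illustration, and it uses plain Python instead of scikit-learn so that it runs anywhere.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Return k initial centers chosen by k-means++ seeding (D^2 sampling)."""
    rng = rng or random.Random()
    # First center: drawn uniformly at random from the data.
    centers = [list(rng.choice(points))]
    # d2[i] is the squared distance from points[i] to its nearest center so far.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to uniform sampling.
            idx = rng.randrange(len(points))
        else:
            # Sample the next center with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(list(points[idx]))
        d2 = [min(old, squared_distance(p, centers[-1]))
              for old, p in zip(d2, points)]
    return centers


def lloyd(points, centers, max_iter=100):
    """Run plain k-means (Lloyd) iterations starting from a given codebook."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)),
                    key=lambda i: squared_distance(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its members;
        # a center without members stays where it is.
        new_centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else c
            for members, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def sse(points, centers):
    """Summed squared error of the points with respect to a codebook."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Toy data (hypothetical): three Gaussian blobs in the plane.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
          for cx, cy in blob_centers for _ in range(50)]

codebook_init = kmeanspp_seed(points, k=3, rng=rng)   # codebook before k-means
codebook_final = lloyd(points, codebook_init)          # codebook after k-means
print("SSE after seeding:", sse(points, codebook_init))
print("SSE after k-means:", sse(points, codebook_final))
```

Keeping the seeding and the k-means iterations in separate functions is what makes the intermediate codebook available for inspection or plotting. Since Lloyd iterations never increase the summed squared error, the second printed value is never larger than the first.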