BICO is a fast streaming algorithm and reduction technique for the k-means problem. It is a python implementation of the original code from http://ls2-www.cs.uni-dortmund.de/bico/ .
Switch branches/tags
Nothing to show
Clone or download
Marc Bury
Latest commit f471ca3 Jan 12, 2017

README.rst

BICO (BIrch using COresets)

This is a python implementation of the original BICO code.

BICO is a fast streaming algorithm for the k-means problem. BICO reduces the number of input points to a specified amount such that the k-means solution on the compressed input set is still a high quality solution to the original data set. BICO combines a tree-like data structure of SIGMOND Test of Time Award winning algorithm BIRCH with reduction techniques from clustering theory.


Applications

Massive Data Sets

Running k-means on a very large data set is even nowadays infeasible. BICO can reduce the amount of input points drastically such that running k-means on the resulting reduced data set is just a matter of seconds. BICO ensures that the solution is of similar quality as the solution on the original input set.

Compression

BICO reduces a set of input points to a specified number of points while preserving the k-means solution quality as good as possible. Thus, BICO is a perfect tool to shrink the size of the data in order to reduce space or bandwidth.

Getting Started

Have a look at this example for an introduction on how to use this library and some visualizations presenting the results of the computed data reduction.

Slides showing a more technical overview of BICO can be found here. If you are interested in the theoretical point of view of BICO, please feel free to check Section 5.4 of this thesis which contains a very detailed description of the algorithm including all proofs of theoretical guarantees.