This project implements the online k-clustering algorithm described in this paper (http://cseweb.ucsd.edu/~dasgupta/291/lec6.pdf). It produces a REALTIME k-clustering of an infinite stream of data. It is built on top of Twitter Storm and uses Cassandra as its database. It works with 2-dimensional matrices and clusters in Euclidean space.
Java
src/clustering
README

This project implements the online k-clustering algorithm described in this paper (http://cseweb.ucsd.edu/~dasgupta/291/lec6.pdf). It produces a REALTIME, DISTRIBUTED k-clustering of an infinite stream of data (yes, you heard it right, it's realtime :-)). It is built on top of Twitter Storm and uses Cassandra as its distributed database. It works with 2-dimensional matrices and clusters in Euclidean space.
Note: You can read more about Twitter Storm here (https://github.com/nathanmarz/storm/). This project runs the algorithm in Storm's local mode, not on an actual cluster, but the same implementation can be ported to an actual cluster with very few changes.
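
To give a feel for what clustering a stream one point at a time can look like, here is a minimal sketch in plain Java. It is not taken from this project's source. It assumes a sequential k-means style update (each arriving 2-dimensional point joins its nearest center in Euclidean distance, and that center moves toward the point), which may differ from the variant the project actually implements. The class name OnlineKMeans and its methods are hypothetical, and Storm and Cassandra are left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sequential (online) k-means for 2-D points. All names are hypothetical. */
class OnlineKMeans {
    private final int k;
    private final List<double[]> centers = new ArrayList<>();
    private final List<Integer> counts = new ArrayList<>();

    OnlineKMeans(int k) {
        if (k < 1) throw new IllegalArgumentException("k must be >= 1");
        this.k = k;
    }

    /** Consumes one point from the stream and returns the index of the cluster it joined. */
    int observe(double x, double y) {
        // Assumption: the first k points seed the k centers.
        if (centers.size() < k) {
            centers.add(new double[] {x, y});
            counts.add(1);
            return centers.size() - 1;
        }
        // Find the nearest center by squared Euclidean distance.
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < k; i++) {
            double dx = x - centers.get(i)[0];
            double dy = y - centers.get(i)[1];
            double d = dx * dx + dy * dy;
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        // Move the chosen center toward the point so it stays the running
        // mean of every point assigned to it so far.
        int n = counts.get(best) + 1;
        counts.set(best, n);
        double[] c = centers.get(best);
        c[0] += (x - c[0]) / n;
        c[1] += (y - c[1]) / n;
        return best;
    }

    /** Returns a copy of the current center of cluster i. */
    double[] center(int i) {
        return centers.get(i).clone();
    }

    public static void main(String[] args) {
        OnlineKMeans km = new OnlineKMeans(2);
        double[][] stream = {{0, 0}, {10, 10}, {2, 0}, {10, 12}};
        for (double[] p : stream) {
            int cluster = km.observe(p[0], p[1]);
            System.out.println("(" + p[0] + ", " + p[1] + ") -> cluster " + cluster);
        }
    }
}
```

Each call to observe touches only the k centers and their counts, so memory stays constant however long the stream runs. In a Storm topology, a spout would presumably emit the points and a bolt would hold state like this, but that wiring is not shown here.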