DStream clustering algorithm implementation in Clojure

ogeagla/clj-dstream

clj-dstream

What Is It

Density-based clustering of data streams of arbitrary dimension, written in Clojure.

Reference paper.
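
For readers new to density-based stream clustering, the following is a minimal sketch of the decayed grid density idea generally described for D-Stream-style algorithms: each point is mapped to a grid cell, and a cell's density fades by a factor between 0 and 1 per time step and grows by 1 for each new point. It is written in Python purely for illustration and is not this library's API. The class name DensityGrid and the parameters cell_size and decay are hypothetical, and the real implementation may differ in how it partitions space and classifies cells.

```python
import math


class DensityGrid:
    """Toy decayed-density grid for points of any dimension.

    Illustrative only: names and parameters are assumptions,
    not taken from clj-dstream.
    """

    def __init__(self, cell_size, decay):
        if not 0.0 < decay < 1.0:
            raise ValueError("decay must be in (0, 1)")
        self.cell_size = cell_size
        self.decay = decay
        # Maps a cell key to (density, time of last update).
        self.cells = {}

    def cell_of(self, point):
        # A cell key is the tuple of per-dimension bin indices.
        return tuple(math.floor(x / self.cell_size) for x in point)

    def add(self, point, t):
        # Decay the stored density up to time t, then count the new point.
        key = self.cell_of(point)
        density, last_t = self.cells.get(key, (0.0, t))
        self.cells[key] = (density * self.decay ** (t - last_t) + 1.0, t)
        return key

    def density(self, key, t):
        # Density of a cell as seen at time t, without modifying it.
        density, last_t = self.cells.get(key, (0.0, t))
        return density * self.decay ** (t - last_t)
```

Storing the last update time with each cell means a cell is only touched when a point lands in it, so idle cells cost nothing until they are queried.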

Sample clustering of time-series data with a moving hotspot: Moving Stream GIF

Running Tests

lein test

Note: If ImageMagick, which supplies the convert command, is installed for your platform, the tests also generate animated GIFs (like those in this README) in addition to heatmaps of clusters and grid density data.

In addition to the unit tests, some tests use generated data. Each of these creates an output directory containing a time series of heatmaps that show the clusters and the grid densities at each point in time.

Integration

You can run integration tests against an RPC server that hosts a DStream clustering instance:

./run_docker_compose_stack.sh

Crater Dataset

Some data takes the shape of a crater, like so: Crater Grids SVG

The clustering results, with each cluster shown in a distinct color: Crater Clusters GIF

Visualizing Clusters For High-Dimensional Data

When the dimensionality is greater than 2, we use t-SNE to reduce the clusters at a given time to 2 dimensions. We can then create plots like this one, which shows 3 different clusters in 5-dimensional space that stay static through time. The plot demonstrates the non-deterministic nature of t-SNE, but also how valuable visualizing its results can be: TSNE Clusters

License

Copyright © 2017 FIXME

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.
