DStream clustering algorithm implementation in Clojure

clj-dstream

What Is It

Density-based data stream clustering for arbitrary dimension data written in Clojure.

Reference paper.
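
To make the idea of grid-based density clustering on a stream concrete, here is a minimal sketch of a decayed grid density update, a technique commonly used in D-Stream-style algorithms. It is not taken from the clj-dstream source: the names decay, cell-width, cell-key and add-point, the parameter values, and the shape of the grid map are all assumptions made for illustration, and the library's actual update rule may differ.

```clojure
;; Hypothetical parameters, chosen for illustration only.
(def decay 0.998)      ; decay factor per time step, in (0, 1)
(def cell-width 1.0)   ; side length of each grid cell

(defn cell-key
  "Maps a point (a vector of numbers, any dimension) to the integer
  coordinates of the grid cell that contains it."
  [point]
  (mapv #(long (Math/floor (/ % cell-width))) point))

(defn add-point
  "Decays the density of the point's cell from its last update time to
  time t, then adds 1 for the new point. `grid` maps cell keys to
  {:density d :updated t}; a missing cell starts at density 0."
  [grid point t]
  (update grid (cell-key point)
          (fn [{:keys [density updated] :or {density 0.0 updated t}}]
            {:density (+ 1.0 (* density (Math/pow decay (- t updated))))
             :updated t})))

;; Usage: three 2-dimensional points arriving at times 0 and 1.
(def grid
  (-> {}
      (add-point [0.2 0.7] 0)
      (add-point [0.4 0.1] 1)
      (add-point [3.5 2.2] 1)))
```

Because cells are keyed by a vector of integer coordinates, the same code works for data of any dimension, and only cells that have received points take up memory.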

Sample clustering for time-series data with moving hotspot: Moving Stream GIF

Running Tests

lein test

Note: If ImageMagick, which supplies the convert command, is installed for your platform, the tests also generate animated GIFs (like the ones in this README) in addition to heatmaps of clusters and grid density data.
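
As a rough illustration of how a sequence of heatmap frames can be combined into an animated GIF with convert, here is a small sketch using clojure.java.shell. The function names gif-command, convert-available? and make-gif!, the file names, and the delay value are hypothetical; this is not necessarily how the test suite invokes ImageMagick. The -delay and -loop options are standard convert options.

```clojure
(require '[clojure.java.shell :refer [sh]])

(defn gif-command
  "Builds the argument vector for ImageMagick's convert: -delay is the
  pause between frames in hundredths of a second, -loop 0 repeats the
  animation forever. The frame files come first, the output file last."
  [frames out]
  (into ["convert" "-delay" "20" "-loop" "0"]
        (conj (vec frames) out)))

(defn convert-available?
  "Returns true if the convert command can be started and exits
  successfully, false if it is not installed or not on the PATH."
  []
  (try
    (zero? (:exit (sh "convert" "-version")))
    (catch java.io.IOException _ false)))

(defn make-gif!
  "Runs convert on the given frame files, or returns nil when convert
  is not available."
  [frames out]
  (when (convert-available?)
    (apply sh (gif-command frames out))))
```

Keeping the command construction in a pure function makes it easy to inspect or test without ImageMagick installed.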

In addition to the unit tests, some tests run on generated data. Each of these creates an output directory containing a time series of heatmaps that show the clusters and the grid densities at each point in time.

Integration

The integration tests exercise an RPC server running a DStream clustering instance. To run them:

./run_docker_compose_stack.sh

Crater Dataset

Some data takes a shape like a crater, which can look like this: Crater Grids SVG

Clustering results for this data, where each cluster has a distinct color: Crater Clusters GIF

Visualizing Clusters For High-Dimensional Data

When the dimensionality is greater than 2, we use t-SNE to reduce the clusters at a given time to 2 dimensions. We can then create plots like the one below, which shows 3 different clusters in 5-dimensional space staying static through time. It demonstrates the non-deterministic nature of t-SNE, but also how valuable visualizing its results can be: TSNE Clusters

License

Copyright © 2017 FIXME

Distributed under the Eclipse Public License, either version 1.0 or (at your option) any later version.