Core streaming heterogeneous graph clustering and anomaly detection code (KDD 2016)

StreamSpot Core

https://sbustreamspot.github.io

This repository contains the core streaming heterogeneous graph clustering and anomaly detection code.

Before attempting execution, ensure you have the following available as two separate files:

  • Edges: A file containing one edge per line for all input graphs in the dataset. A sample is provided as test_edges.txt. The edge file used for the experiments in the paper is available at sbustreamspot-data.

  • Bootstrap clusters: A file describing the bootstrap clusters. A sample is provided as test_bootstrap_clusters.txt. Bootstrap clusters used for the experiments in the paper are available here.

The output contains a summary of the execution parameters and the runtime, along with the cluster assignments and anomaly scores of the graphs, reported every 10,000 edges. You can analyze this output further along various dimensions with sbustreamspot-analyze.

Compilation and execution have been tested with GCC 5.2.1 on Ubuntu 15.10.

Quickstart

git clone https://github.com/sbustreamspot/sbustreamspot-core.git
cd sbustreamspot-core
make clean optimized
./streamspot --edges=test_edges.txt \
             --bootstrap=test_bootstrap_clusters.txt \
             --chunk-length=10 \
             --num-parallel-graphs=10 \
             --max-num-edges=10 \
             --dataset=all

To use a different dataset, pass its edge file to --edges and its bootstrap cluster file to --bootstrap in place of test_edges.txt and test_bootstrap_clusters.txt.

Parameters

Most parameters are set on the command line. Run ./streamspot --help for details on each one.

A few parameters are set at compile-time and can be found in param.h.

Contact