CSB

Getting Started

Dependencies

Apache Maven - Version 3.3.9
Apache Spark - Version 2.1.0

Building

mvn package

Running

The execution of the application is split in two different phases:

Analysis of the input dataset and generation of the seed graph and the probability distributions of the connections properties
Data generation with either Barabási–Albert or Kronecker algorithms

Running phase 1

$SPARK_HOME/bin/spark-submit csb.jar seed -a $DATA/dataset_01/alert -b $DATA/dataset_01/conn.log

The output will be:

An aug.log file which represents the join between the input log files
A set of .ser files which represent the probability distributions of the connections properties
seed_vertices and seed_edges folders which contain respectively the serialized representation of the vertices and edges of the seed graph

Running phase 2

The phase 2 is composed of three separate steps:

Synthetic graph generation, which is done using either one the algorithms listed below
Properties generation, which can be skipped with the -x or the --exclude-prop options
Graph saving using Spark serialization methods (Neo4j serialization is under development)
Veracity computation of degree, in-degree, out-degree and PageRank metrics

Barabási–Albert

The following will run the Barabási–Albert algorithm with 10 iterations:

$SPARK_HOME/bin/spark-submit csb.jar synth ba -m all 10 0.2

Kronecker

Note: the KronFit algorithm is currently under development, so a static version of the seed matrix (seed.mtx) of the provided dataset is included in the same folder.

The following will run the stochastic Kronecker algorithm with 15 iterations using the seed.mtx matrix:

$SPARK_HOME/bin/spark-submit csb.jar synth kro -m all 15 $DATA/dataset_01/seed.mtx

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

CSB

Getting Started

Dependencies

Building

Running

Running phase 1

Running phase 2

Barabási–Albert

Kronecker

Files

README.md

Latest commit

History

README.md

File metadata and controls

CSB

Getting Started

Dependencies

Building

Running

Running phase 1

Running phase 2

Barabási–Albert

Kronecker