Learn more at stingergraph.com.
.
├── CMakeLists.txt
├── README.md
├── SOURCEME.sh
└── src
    ├── bin
    │   ├── clients
    │   │   ├── algorithms
    │   │   ├── streams
    │   │   └── tools
    │   ├── server
    │   └── standalone
    │       ├── breadth_first_search
    │       ├── community_reagglomeration
    │       ├── connected_components
    │       ├── insert_remove_benchmark
    │       ├── protobuf_test
    │       ├── streaming_clustering_coefficients
    │       └── streaming_connected_components
    └── lib
        ├── fmemopen
        ├── int_hm_seq
        ├── int_ht_seq
        ├── intvec
        ├── kv_store
        ├── protobuf
        ├── pugixml
        ├── stinger_core
        ├── stinger_utils
        ├── string
        └── vtx_set
STINGER is built using CMake. From the root of STINGER, first create a build directory:
mkdir build && cd build
. ../SOURCEME.sh
Then call CMake from that build directory to automatically configure the build and to create a Makefile:
cmake ..
Finally, call make to build all libraries and executable targets (or call make followed by the name of an executable or library to build only that target):
make
All binary targets will be built and placed in build/bin. They are named according to the folder from which they were built (so src/bin/server produces build/bin/server, src/bin/clients/tools/json_rpc_server produces build/bin/json_rpc_server, etc.). If you ran SOURCEME.sh from the build directory as instructed above, the build/bin directory is appended to your path.
All library targets are built as both static and shared libraries by default and are placed in build/lib as .so and .a files (or .dylib on Mac). Headers for these libraries are copied into build/include/library_name. The build/include directory is in the include path of all targets.
As indicated by the directory structure, there are three primary types of targets (client, server, standalone) and subtypes in the case of clients.
Standalone executables are generally self-contained and use the STINGER libraries for the graph structure and supporting functions. Most of the existing standalone executables demonstrate a single streaming or static algorithm on a synthetic R-MAT graph and edge stream.
The STINGER server maintains a STINGER graph in memory and can maintain multiple connections with clients.
Client streams can send edges consisting of source, destination, weight, time, and type where some fields are optional and others can optionally be text strings.
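As a rough illustration of the edge fields listed above, the following Python sketch models one update as a record. It is not the STINGER wire format: the class name EdgeUpdate, the choice of which fields default to None, and the helper uses_names are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class EdgeUpdate:
    """One streamed edge (hypothetical model, not the STINGER wire format)."""

    # Source, destination, and type may each be a number or a text string.
    source: Union[int, str]
    destination: Union[int, str]
    type: Union[int, str, None] = None
    # Assumption: weight and time are the optional integer fields.
    weight: Optional[int] = None
    time: Optional[int] = None

    def uses_names(self) -> bool:
        """True if source, destination, or type is given as text."""
        return any(
            isinstance(v, str)
            for v in (self.source, self.destination, self.type)
        )


# A batch mixing a text-named edge with a purely numeric one.
batch = [
    EdgeUpdate("alice", "bob", type="mention"),
    EdgeUpdate(3, 7, weight=2, time=1000),
]
```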
Client algorithms receive these batches of updates in a somewhat synchronous manner, along with shared-memory, read-only access to the complete graph. The server also lets client algorithms request a shared memory space in which to store results and communicate with other algorithms.
Client algorithms declare dependencies when they connect and receive the mapped data in the returned structure.
The server guarantees that all of an algorithm's dependencies will finish processing before that algorithm is executed.
Client algorithms are required to provide a description string that indicates what data is stored and the type of the data.
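The dependency guarantee described above amounts to running algorithms in a topological order of their declared dependencies. The Python sketch below shows that idea using the standard library's graphlib module (Python 3.9 or later). It does not reflect how the server is actually implemented, and the algorithm name community_summary is hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return an order in which each algorithm follows all of its dependencies.

    `dependencies` maps an algorithm name to the names it depends on.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical registrations: one algorithm depends on two others.
deps = {
    "pagerank": [],
    "static_components": [],
    "community_summary": ["pagerank", "static_components"],
}
order = execution_order(deps)
print(order)
```

Any order returned here places community_summary after both of its dependencies; the relative order of independent algorithms is not constrained.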
Client tools are intended to be read-only, but are notified of all running algorithms and shared data. An example of this kind of client is the JSON RPC server (src/bin/clients/tools/json_rpc_server). This server provides access to shared algorithm data via JSON RPC calls over HTTP. Additionally, some primitive operations are provided to support selecting the top vertices as scored by a particular algorithm or obtaining the shortest paths between two vertices.
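For readers unfamiliar with JSON RPC over HTTP, the sketch below builds a JSON-RPC 2.0 request in Python without sending it. The endpoint path /jsonrpc and the method name get_algorithms are assumptions used for illustration; check the json_rpc_server source for the methods it actually exposes.

```python
import json
import urllib.request


def build_rpc_request(url, method, params=None, request_id=1):
    """Build (but do not send) an HTTP POST carrying a JSON-RPC 2.0 call."""
    payload = {"jsonrpc": "2.0", "method": method, "id": request_id}
    if params is not None:
        payload["params"] = params
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


# Assumed endpoint path and method name; neither is confirmed by this README.
req = build_rpc_request("http://localhost:8088/jsonrpc", "get_algorithms")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To send it against a running json_rpc_server:
#     with urllib.request.urlopen(req) as resp:
#         print(json.load(resp))
```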
To run an example using the server and five terminals:
term1:build$ server
term2:build$ json_rpc_server
term3:build$ static_components
term4:build$ pagerank
term5:build$ rmat_edge_generator -n 100000 -x 10000
This will start a stream of R-MAT edges over 100,000 vertices in batches of 10,000 edges. A connected component labeling and PageRank scoring will be maintained. The JSON RPC server will host interactive web pages at http://localhost:8088/full.html that are powered by the live streaming analysis.
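To give a feel for what a batched R-MAT edge stream looks like, here is a minimal Python sketch of the general R-MAT technique: each edge is drawn by recursively choosing one of four quadrants of the adjacency matrix. It is not the code of rmat_edge_generator. The quadrant probabilities (0.55, 0.10, 0.10, 0.25) and the handling of a vertex count that is not a power of two (out-of-range edges are redrawn) are assumptions.

```python
import random


def rmat_edge(scale, rng, a=0.55, b=0.10, c=0.10):
    """Draw one R-MAT edge over 2**scale vertices.

    At each of `scale` levels, pick a quadrant with probabilities
    a, b, c, and d = 1 - a - b - c, appending one bit to each endpoint.
    """
    src = dst = 0
    for _ in range(scale):
        src <<= 1
        dst <<= 1
        r = rng.random()
        if r < a:
            pass              # top-left: both bits stay 0
        elif r < a + b:
            dst |= 1          # top-right
        elif r < a + b + c:
            src |= 1          # bottom-left
        else:
            src |= 1          # bottom-right
            dst |= 1
    return src, dst


def rmat_batches(num_vertices, batch_size, num_batches, seed=0):
    """Yield `num_batches` lists of `batch_size` (src, dst) pairs."""
    rng = random.Random(seed)
    # Smallest scale whose ID range covers num_vertices.
    scale = (num_vertices - 1).bit_length()
    for _ in range(num_batches):
        batch = []
        while len(batch) < batch_size:
            src, dst = rmat_edge(scale, rng)
            # Assumption: redraw edges whose endpoints fall outside the range.
            if src < num_vertices and dst < num_vertices:
                batch.append((src, dst))
        yield batch


# Same shape as the example above: 100,000 vertices, batches of 10,000 edges.
for i, batch in enumerate(rmat_batches(100_000, 10_000, 2)):
    print(i, len(batch), batch[:3])
```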
Given a stream of Tweets in Twitter's default format (a stream of JSON objects, one per line), it is fairly easy to pipe the user mentions / retweets graph into STINGER using the json_stream. The json_stream is a templated JSON stream parser designed to consume one object per line like the Twitter stream and to produce edges from this stream based on a template.
The templates can use the following variables (where one of the two source and one of the two destination variables must be used):
$source_str - The source vertex name
$source - The source of the edge as a number (must be able to parse as an integer less than the maximum vertex ID in the STINGER server).
$source_type - A string representing the type of the source vertex
$source_weight - A number to be added to the weight of the source vertex (vertex weights start at zero).
$destination_str - The destination vertex name
$destination - The destination of the edge as a number (must be able to parse as an integer less than the maximum vertex ID in the STINGER server).
$destination_type - A string representing the type of the destination vertex
$destination_weight - A number to be added to the weight of the destination vertex (vertex weights start at zero).
$type_str - The edge type as a string
$weight - The weight of the edge (must be able to parse as an integer).
$time - The time of the edge (must be able to parse as an integer).
For example, the simplest template for Twitter mentions and retweets would be (we'll call this template.json):
{
  "user": {
    "screen_name": "$source_str1"
  },
  "entities": {
    "user_mentions": [
      {
        "screen_name": "$destination_str1"
      }
    ]
  },
  "this_doesnt_matter": "$source_type=user",
  "same_here": "$destination_type=user",
  "and_here": "$type=mention"
}
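To make the template idea concrete, the Python sketch below walks a template and a JSON object in parallel and collects the placeholder values. It only illustrates the concept and is not how json_stream is implemented. Three behaviors are assumptions: the trailing digit in names such as $source_str1 is parsed and ignored, a "$name=value" string is treated as a constant, and a one-element list in the template is applied to every element of the matching list in the object.

```python
import re

# "$name", optional trailing digits, optional "=constant".
PLACEHOLDER = re.compile(r"^\$([a-z_]+)(\d*)(?:=(.*))?$")


def extract_fields(template, obj, fields=None):
    """Collect {variable name: [values]} by walking template and obj together."""
    if fields is None:
        fields = {}
    if isinstance(template, dict):
        for key, sub in template.items():
            child = obj.get(key) if isinstance(obj, dict) else None
            extract_fields(sub, child, fields)
    elif isinstance(template, list):
        # Assumption: the first template element describes every list item.
        if template and isinstance(obj, list):
            for item in obj:
                extract_fields(template[0], item, fields)
    elif isinstance(template, str):
        match = PLACEHOLDER.match(template)
        if match:
            name, _digits, constant = match.groups()
            value = constant if constant is not None else obj
            if value is not None:
                fields.setdefault(name, []).append(value)
    return fields


def edges_from_object(template, obj):
    """Return one (source, destination, type) tuple per source/destination pair."""
    fields = extract_fields(template, obj)
    edge_type = (fields.get("type") or [None])[0]
    return [
        (src, dst, edge_type)
        for src in fields.get("source_str", [])
        for dst in fields.get("destination_str", [])
    ]


template = {
    "user": {"screen_name": "$source_str1"},
    "entities": {"user_mentions": [{"screen_name": "$destination_str1"}]},
    "this_doesnt_matter": "$source_type=user",
    "same_here": "$destination_type=user",
    "and_here": "$type=mention",
}
# A made-up tweet with only the fields the template looks at.
tweet = {
    "user": {"screen_name": "alice"},
    "entities": {
        "user_mentions": [{"screen_name": "bob"}, {"screen_name": "carol"}]
    },
}
edges = edges_from_object(template, tweet)
print(edges)
```

Under these assumptions, a tweet by alice mentioning bob and carol yields two edges of type mention, one per mentioned user.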
To parse a Twitter stream into STINGER using this template:
cat twitter_sample.json | ./bin/json_stream template.json
You can replace the 'cat twitter_sample.json' command with one of the curl commands from the Twitter developer API page to directly inject a live Twitter stream (obviously you should go to dev.twitter.com to get your own OAuth data):
curl --request 'POST' 'https://stream.twitter.com/1.1/statuses/sample.json' \
  --header 'Authorization: OAuth oauth_consumer_key="KEYKEYKEY", oauth_nonce="NONCENONCENONCE", oauth_signature="SIGSIGSIG", oauth_signature_method="HMAC-SHA1", oauth_timestamp="ts", oauth_token="TOKENTOKENTOKEN", oauth_version="1.0"' \
  --verbose | ./bin/json_stream template.json
To create a toy R-MAT graph (256K vertices and 2M undirected edges) and run the insert-remove benchmark:
term1:build$ rmat_graph_generator -s 18 -e 8
term1:build$ insert_remove_benchmark -n 1 -b 100000 g.18.8.bin a.18.8.100000.bin
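The stated graph size follows from the generator arguments if -s is read as the R-MAT scale (log2 of the vertex count) and -e as the edge factor (edges per vertex). That reading is an inference from the numbers above rather than something this README states. A short Python check of the arithmetic:

```python
def rmat_size(scale, edge_factor):
    """Return (vertices, edges) for an R-MAT graph with the given parameters."""
    vertices = 2 ** scale
    edges = edge_factor * vertices
    return vertices, edges


vertices, edges = rmat_size(18, 8)
print(vertices, edges)  # 262144 2097152, i.e. 256K vertices and 2M edges
```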