
LDBC SNB Data Converter

Scripts to convert raw graphs produced by the SNB Datagen into graph data sets with various layouts (e.g. storing edges as merged foreign keys).

This repository uses a mix of Bash, Python, and DuckDB SQL scripts. The get.sh script installs the Python dependencies and downloads a recent DuckDB binary if it does not exist in the repository directory (the script is automatically invoked by load.sh).
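
You can also invoke the setup step directly before the first load (invoking it without arguments is an assumption; check the script if in doubt):

./get.sh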

If you want to use a custom-built DuckDB binary:

  • set the DUCKDB_PATH environment variable to the location of the duckdb binary (default value: .), as in the example after this list
  • make sure the Python packages have been recompiled (see instructions)
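
A minimal sketch of using a custom build, assuming the binary lives in a hypothetical directory /path/to/custom-duckdb:

export DUCKDB_PATH=/path/to/custom-duckdb   # placeholder: directory containing the custom duckdb binary
./load.sh ${LDBC_DATA_DIRECTORY} --no-header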

Example data set

The example data set in this repository reflects the toy graphs used in the LDBC SNB.

The example graph is serialized using the raw serializer (composite-merged-fk layout), which contains the entire temporal graph without filtering or batching.

Generate data sets

Use the data generator in raw mode to generate the data sets. Set the $LDBC_DATA_DIRECTORY environment variable to point to the directory of Datagen's output (containing the static and dynamic directories). Currently, you also have to concatenate the CSVs using the following script:

DATAGEN_OUTPUT_DIR=TodoSetMe
LDBC_DATA_DIRECTORY=${DATAGEN_OUTPUT_DIR}/csv/raw/composite-merged-fk
./spark-concat.sh ${LDBC_DATA_DIRECTORY}
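
After concatenation, ${LDBC_DATA_DIRECTORY} should contain the static and dynamic directories mentioned above; a quick check:

ls ${LDBC_DATA_DIRECTORY}   # expect to see the static and dynamic subdirectories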

Processing data sets

To process the data sets, run the following scripts (the first one downloads DuckDB if it's not yet available):

./load.sh ${LDBC_DATA_DIRECTORY} --no-header
./transform.sh
./export.sh
# optional
./rename.sh

The duckdb directory contains Python and SQL scripts to convert data to other formats (e.g. CsvCompositeProjectedFK and CsvSingularMergedFK).
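
Once load.sh and transform.sh have finished, the loaded data can be inspected with the DuckDB CLI. The database file name and table name below are assumptions (check load.sh for the actual file name):

./duckdb ldbc.duckdb "SELECT count(*) FROM Person;"   # ldbc.duckdb and Person are placeholders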

Deployed data sets

Parameter generation

Run paramgen as follows:

./load.sh ${LDBC_DATA_DIRECTORY} --no-header
./transform.sh
./factor-tables.sh
./paramgen.sh

Workflows

The workflow-* directories test the benchmark workflow, i.e. loading the initial data set, then applying the batches sequentially. Each batch consists of deletes and inserts. Currently, the scripts first apply the deletes, then the inserts. Note, however, that the updates can be applied in any order, even interleaved.
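
A minimal sketch of this order, assuming one subdirectory per batch under batches/ and two hypothetical helper scripts (neither ships with this repository):

for batch in batches/*/; do
    ./apply-deletes.sh "${batch}"   # hypothetical helper: apply the batch's deletes first
    ./apply-inserts.sh "${batch}"   # hypothetical helper: then apply its inserts
done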

Generating batches

To generate and test batches, first load the data with load.sh (parameterized for your data set), then run the scripts below to produce the initial data set and the batches.

./load.sh
./transform.sh
./generate-batches.sh

  • The transform.sh script produces the initial snapshot of the data.
  • The generate-batches.sh script produces batches of a given timespan (e.g. one per year) in the batches/ directory.

On the example graph:

  • The data spans 4 years in the interval 2010-2013 (inclusive on both ends).
  • There is one batch per year.
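
For the example graph, generate-batches.sh therefore produces four batches. A hypothetical listing (the directory names are assumptions):

ls batches/
# 2010  2011  2012  2013   (hypothetical output: one directory per year)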
