Reservoir Sampling over Joins

This repository contains the experiment code and scripts for our paper, Reservoir Sampling over Joins.

Structure

RSJoin: Code and scripts of our algorithms RSJoin and RSJoin_opt.
SJoinMod: Modifications to the baseline SJoin in order to support more experiments.

Queries

The queries used in our experiments are listed in Queries.md.

Preparation

RSJoin

# build RSJoin/RSJoin_opt
cd RSJoin
mkdir release
cd release
cmake -DCMAKE_BUILD_TYPE=Release ..
cmake --build .

SJoin

# clone SJoin
git clone git@github.com:InitialDLab/SJoin.git
# apply the modifications
cp -rf SJoinMod/* ./SJoin
# build SJoin
cd SJoin/sjoin
mkdir release
cd release
CXXFLAGS=-O2 CPPFLAGS=-DNDEBUG ../configure
make
# build the TPC-DS data preprocessing tools in SJoin
cd ../../tpcds_data_proc
make

Data

Graph

Download the Epinions dataset, strip its four header lines, and convert the tab-separated edge list to comma-separated form

curl -O https://snap.stanford.edu/data/soc-Epinions1.txt.gz > /dev/null 2>&1
gzip -d soc-Epinions1.txt.gz
tail -n +5 soc-Epinions1.txt > epinions.txt
rm -f soc-Epinions1.txt
sed -i "s/\t/,/g" epinions.txt
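
Before running the preprocessing script, it may help to confirm that the conversion produced what is expected. The following is a minimal sketch, assuming every line of epinions.txt should be a pair of integer node IDs separated by a single comma (this README does not state the format explicitly). The function name check_edge_list is hypothetical and not part of this repository.

```shell
# Hypothetical helper, not part of the repository.
# Assumption: each line of the edge list should look like "<int>,<int>".
# Prints the first malformed line and returns non-zero if one is found.
check_edge_list() {
    awk -F',' '
        NF != 2 || $1 !~ /^[0-9]+$/ || $2 !~ /^[0-9]+$/ {
            printf "line %d is malformed: %s\n", NR, $0
            bad = 1
            exit            # jump to END, which sets the exit status
        }
        END { exit bad + 0 }
    ' "$1"
}
```

For example, check_edge_list epinions.txt prints nothing and returns 0 when every line matches. A leftover tab or a trailing carriage return would be reported with its line number.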

Modify the inputFile and outputDir in Utils/GraphDataPreprocess.scala. Then run

scala -J-Xmx200g -J-Xms200g ./Utils/GraphDataPreprocess.scala

TPC-DS

Download the TPC-DS tools from the TPC website and unzip them.
Run make under the $tpc-ds-tool/tools/ folder. ($tpc-ds-tool is the extracted folder.)
Run $tpc-ds-tool/tools/dsdgen [your options] to generate the TPC-DS data.
Run create_qx_data $tpc-ds-tool/tools/ qx_sf10.dat in SJoin/tpcds_data_proc/.
Run create_qy_data and create_qz_data in the same way.

LDBC-SNB

Clone LDBC SNB Datagen from git@github.com:ldbc/ldbc_snb_datagen_spark.git
Build Datagen following the instructions in its repository
Run the following

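# export the variables so that child processes such as tools/run.py can see them
export PLATFORM_VERSION=$(sbt -batch -error 'print platformVersion')
export DATAGEN_VERSION=$(sbt -batch -error 'print version')
export LDBC_SNB_DATAGEN_JAR=$(sbt -batch -error 'print assembly / assemblyOutputPath')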
./tools/run.py --parallelism 1 --memory 180g -- --format csv --scale-factor 1 --mode raw

Modify the inputDir and outputFile in Utils/SNBDataPreprocess.scala. Then run

scala -J-Xmx200g -J-Xms200g ./Utils/SNBDataPreprocess.scala

Experiments

RSJoin

Modify the $data_path in RSJoin/scripts/run_*.sh
Run RSJoin/scripts/run_rsjoin.sh and RSJoin/scripts/run_rsjoin_opt.sh
Run RSJoin/scripts/run_line3_input_size.sh
Run RSJoin/scripts/run_line3_sample_size.sh
Run RSJoin/scripts/run_line4_update_time.sh
Run RSJoin/scripts/run_qz_scale_factor.sh
Run RSJoin/scripts/run_qz_fk_scale_factor.sh
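
The scripts listed above can also be run one after another from a single command. The following is a minimal sketch, assuming the scripts are independent of each other and can be started with bash from the current working directory (neither is confirmed by this README). The function name run_in_order is hypothetical.

```shell
# Hypothetical helper, not part of the repository.
# Runs each script passed as an argument with bash, in the given order,
# and stops at the first script that exits with a non-zero status.
run_in_order() {
    for script in "$@"; do
        echo "running $script"
        bash "$script" || { echo "failed: $script" >&2; return 1; }
    done
}
```

A possible invocation is run_in_order RSJoin/scripts/run_rsjoin.sh RSJoin/scripts/run_rsjoin_opt.sh, followed by the remaining script paths from the list. Stopping at the first failure keeps a misconfigured $data_path from silently producing a series of empty result files.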

SJoin

Modify the $data_path in SJoin/scripts/run_*.sh
Run SJoin/scripts/run_sjoin.sh and SJoin/scripts/run_sjoin_opt.sh
Run SJoin/scripts/run_line3_input_size.sh
Run SJoin/scripts/run_line3_sample_size.sh
Run SJoin/scripts/run_line4_update_time.sh
Run SJoin/scripts/run_qz_fk_scale_factor.sh

RSWP

Modify the $data_path in RSJoin/scripts/run_predicate*.sh
Run RSJoin/scripts/run_predicate_density.sh and RSJoin/scripts/run_predicate_input_size.sh

Result

Collect the results from the run_*.out files in the same folder as the scripts.
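
One way to gather the output files into a single stream is sketched below. It assumes only that the results are plain-text files matching run_*.out in one directory. The function name collect_results and the header format are illustrative, not something the experiment scripts define.

```shell
# Hypothetical helper, not part of the repository.
# Prints every run_*.out file in the given directory (default: current
# directory), each preceded by a header line naming the file.
collect_results() {
    dir=${1:-.}
    for f in "$dir"/run_*.out; do
        [ -e "$f" ] || continue   # no match: the glob stays literal, skip it
        printf '==> %s <==\n' "$(basename "$f")"
        cat "$f"
    done
}
```

For example, collect_results RSJoin/scripts > all_results.txt would write the combined output to a file, assuming the .out files are located in RSJoin/scripts.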

Build your own queries

Line-k joins and Star-k joins

Line-k joins and Star-k joins are supported for any k > 1. See line_joins and star_joins.

General acyclic joins

You can implement other acyclic joins using JoinTreeTemplate. See Q10Algorithm for an example.
