The Swarm64 TPC Toolkit

Summary

This toolkit provides methods to execute the TPC-H, TPC-DS, and SSB benchmarks on Swarm64 DA and native PostgreSQL.

Prerequisites

  • Python 3.6 or later and pip3
  • Install additional packages with pip3 install -r requirements.txt
  • For loading the data, the database must be accessible as the user postgres without a password

Create a database and load data

  1. To load a database with a dataset, go to the correct benchmark directory:
    For TPC-H: cd schemas/tpch
    For TPC-DS: cd schemas/tpcds
    For SSB: cd schemas/ssb

  2. Run the loader.sh script with the following parameters:

    ./loader.sh \
        --schema=<schema> \
        --scale-factor=<scale-factor> \
        --dbname=<dbname>

Required Parameters

Parameter       Description
schema          The schema to deploy. Schemas are directories in the current working directory and start with either sdb_ or psql_. The schema name equals the directory name.
scale-factor    The scale factor to use, such as 10, 100 or 1000.
dbname          The name of the target database. If the database does not exist, it will be created. If it does exist, it will be deleted and recreated.

Optional Parameters

Parameter       Description
num-partitions  The number of partitions to use, if applicable. Default: 32
chunks          Chunk large tables into smaller pieces during ingestion. Default: 10
db-host         Alternative host for the database. Default: localhost
db-port         Alternative port for the database. Default: 5432

Depending on the scale factor you choose, the script might take several hours to finish. After the script creates the database, it loads the data and creates primary keys, foreign keys, and indices. Afterwards, it runs VACUUM and ANALYZE.
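
When loads are scripted, for example for several scale factors, the loader parameters described above can be assembled programmatically. The following is a minimal sketch, not part of the toolkit: the helper build_loader_command is hypothetical, and the schema name psql_example and database name tpch_sf100 are made-up placeholders. Only the parameter names and the sdb_/psql_ prefix rule come from this README.

```python
import shlex

# Optional loader parameters listed in this README.
OPTIONAL_FLAGS = ("num-partitions", "chunks", "db-host", "db-port")


def build_loader_command(schema, scale_factor, dbname, **optional):
    """Return the loader.sh argument list for the documented parameters."""
    # Schema directories start with either sdb_ or psql_.
    if not schema.startswith(("sdb_", "psql_")):
        raise ValueError("schema must start with sdb_ or psql_")
    command = [
        "./loader.sh",
        f"--schema={schema}",
        f"--scale-factor={scale_factor}",
        f"--dbname={dbname}",
    ]
    for name, value in optional.items():
        # Keyword arguments use underscores; the flags use hyphens.
        flag = name.replace("_", "-")
        if flag not in OPTIONAL_FLAGS:
            raise ValueError(f"unknown loader parameter: {flag}")
        command.append(f"--{flag}={value}")
    return command


# psql_example and tpch_sf100 are hypothetical names used for illustration.
command = build_loader_command("psql_example", 100, "tpch_sf100", db_port=5433)
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list could be passed to subprocess.run(command, check=True) from within the matching benchmark directory. Keep in mind that an existing database with the same name is deleted and recreated.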

Run a benchmark

Start a benchmark:

./swarm64_run_tpc_benchmark \
    --dsn postgresql://postgres@localhost/<target-db> \
    --benchmark <tpch|tpcds|ssb>

This runs the benchmark without any query runtime restriction, so individual queries might run for several hours or longer. To limit query runtime, use the --timeout parameter.

Required Parameters

Parameter   Description
dsn         The full DSN of the DB to connect to. DSN layout:
            postgresql://<user>@<host>:<target-port>/<target-db>
            The port is optional and the default is 5432.
            Example with port 5433: --dsn postgresql://postgres@localhost:5433/example-database
benchmark   The benchmark to use: tpch, tpcds or ssb
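
To check a DSN before passing it to the benchmark runner, its components can be split out with the Python standard library. This is an illustrative sketch only: the helper describe_dsn is hypothetical, and the toolkit itself may parse the DSN differently. It applies the default port 5432 when the DSN does not specify one.

```python
from urllib.parse import urlsplit

DEFAULT_PORT = 5432


def describe_dsn(dsn):
    """Split a postgresql:// DSN into user, host, port, and database name."""
    parts = urlsplit(dsn)
    if parts.scheme != "postgresql":
        raise ValueError("DSN must start with postgresql://")
    return {
        "user": parts.username,
        "host": parts.hostname,
        # The port is optional in the DSN layout.
        "port": parts.port or DEFAULT_PORT,
        "dbname": parts.path.lstrip("/"),
    }


with_port = describe_dsn("postgresql://postgres@localhost:5433/example-database")
without_port = describe_dsn("postgresql://postgres@localhost/example-database")
print(with_port)
print(without_port)
```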

Optional Parameters

Parameter   Description
config      Path to additional YAML configuration file.
timeout     The maximum time a query may run, such as 15min.

Test parameterization with additional YAML configuration

You can create an additional configuration file to control test execution more granularly. An example YAML file is as follows:

timeout: 30min
ignore:
  - 20
  - 21
  - 22

dbconfig:
  max_parallel_workers: 96
  max_parallel_workers_per_gather: 32

To use this file, pass the --config=<path-to-file> argument to the test executor. In this example, the query timeout is set to 30min. Queries 20, 21, and 22 will not execute. Additionally, the database parameter max_parallel_workers will change to 96 and max_parallel_workers_per_gather will change to 32. Any change to the database configuration is applied before the benchmark starts and is reverted after the benchmark completes.
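
The effect of the example configuration can be illustrated with a small simulation. This is a sketch under stated assumptions, not the toolkit's implementation: the configuration is written as the Python dict a YAML parser would return, a plain dict stands in for the database server's settings, and the helper temporary_settings is hypothetical. The query range assumes TPC-H, which defines 22 queries.

```python
from contextlib import contextmanager

# The example configuration, as the dict a YAML parser would produce.
config = {
    "timeout": "30min",
    "ignore": [20, 21, 22],
    "dbconfig": {
        "max_parallel_workers": 96,
        "max_parallel_workers_per_gather": 32,
    },
}


@contextmanager
def temporary_settings(settings, overrides):
    """Apply overrides to settings and restore the previous values on exit."""
    previous = {name: settings[name] for name in overrides}
    settings.update(overrides)
    try:
        yield settings
    finally:
        settings.update(previous)


# Stand-in for the server's settings; the values are PostgreSQL's defaults.
server_settings = {
    "max_parallel_workers": 8,
    "max_parallel_workers_per_gather": 2,
}

# TPC-H defines queries 1 to 22; the ignored ones are skipped.
queries_to_run = [q for q in range(1, 23) if q not in config["ignore"]]

with temporary_settings(server_settings, config["dbconfig"]):
    # Inside the block the overrides are active, as during a benchmark run.
    during_benchmark = dict(server_settings)

print(queries_to_run)
print(during_benchmark)
print(server_settings)
```

Restoring in a finally clause means the previous values return even if a run fails partway through, which mirrors the apply-then-revert behavior described above.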
