This toolkit provides methods to execute the TPC-H, TPC-DS, and SSB benchmarks on Swarm64 DA and native PostgreSQL.
- Python 3.6 or later and pip3
- Install additional packages with
pip3 install -r requirements.txt
- For loading the data, the database must be accessible with the user
Create a database and load data
To load a database with a dataset, go to the correct benchmark directory:
Run the loader.sh script with the following parameters:
||The schema to deploy. Schemas are directories in the current working directory and start with either
||The scale factor to use, such as
||The name of the target database. If the database does not exist, it will be created. If it does exist, it will be deleted and recreated.|
||The number of partitions to use, if applicable. Default: 32|
||Chunk large tables into smaller pieces during ingestion. Default: 10|
||Alternative host for the database. Default: localhost|
||Alternative port for the database. Default: 5432|
Depending on the scale factor you choose, the script might take several hours to finish. After the script creates the database, it loads the data and creates primary keys, foreign keys, and indices. Afterwards, it runs VACUUM and ANALYZE.
Run a benchmark
Start a benchmark:
./swarm64_run_tpc_benchmark \
    --dsn postgresql://postgres@localhost/<target-db> \
    --benchmark <tpch|tpcds|ssb>
This runs the benchmark without any query runtime restriction. Ideally, use the
--timeout parameter to limit query runtime. Queries might otherwise run for
several hours or longer.
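When benchmark runs are scripted, it can help to assemble the command line in one place so that the timeout is never forgotten. The following is a minimal sketch, not part of the toolkit: the function name `build_benchmark_command` and the database name `example_db` are hypothetical, and the timeout value `30min` is assumed to be accepted on the command line in the same form that the YAML configuration uses.

```python
import shlex


def build_benchmark_command(dsn, benchmark, timeout=None, config=None):
    """Return the argument list for one benchmark run (hypothetical helper)."""
    if benchmark not in ("tpch", "tpcds", "ssb"):
        raise ValueError(f"unknown benchmark: {benchmark!r}")
    argv = ["./swarm64_run_tpc_benchmark", "--dsn", dsn, "--benchmark", benchmark]
    if timeout is not None:
        # Assumption: the value uses the same format as the YAML "timeout" key.
        argv += ["--timeout", timeout]
    if config is not None:
        argv.append(f"--config={config}")
    return argv


command = build_benchmark_command(
    "postgresql://postgres@localhost/example_db", "tpch", timeout="30min"
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the toolkit directory.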
||The full DSN of the DB to connect to. DSN layout:
The port is optional and the default is 5432.
Example with port 5433:
||The benchmark to use:
||Path to additional YAML configuration file.|
||The maximum time a query may run, such as
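To check which host, port, and database a DSN points at before starting a long run, the DSN can be split with the Python standard library. This is a sketch under the assumption that the DSN follows the URL form used in the command above (`postgresql://user@host[:port]/database`); the helper name `describe_dsn` and the database name `example_db` are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_PORT = 5432  # Used when the DSN does not name a port.


def describe_dsn(dsn):
    """Split a URL-style DSN into its parts (hypothetical helper)."""
    parts = urlsplit(dsn)
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORT,
        "database": parts.path.lstrip("/"),
    }


without_port = describe_dsn("postgresql://postgres@localhost/example_db")
with_port = describe_dsn("postgresql://postgres@localhost:5433/example_db")
print(without_port["port"], with_port["port"])
```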
Test parameterization with additional YAML configuration
You can create an additional configuration file to control test execution more granularly. An example YAML file is as follows:
timeout: 30min
ignore:
  - 20
  - 21
  - 22
dbconfig:
  max_parallel_workers: 96
  max_parallel_workers_per_gather: 32
To use this file, pass the --config=<path-to-file> argument to the test executor. In this example, the query timeout is set to 30min. Queries 20, 21, and 22 will not execute. Additionally, the database parameters max_parallel_workers and max_parallel_workers_per_gather will change to 96 and 32, respectively. Any change to the database configuration is applied before the benchmark starts and is reverted after the benchmark completes.
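The apply-then-revert behavior and the query filter can be pictured with a small sketch. This is an illustration only, not the toolkit's implementation: the example configuration is written as a plain dict instead of being read from YAML, the "database" is an in-memory dict with made-up starting values, and the names `temporary_settings` and `queries_to_run` are hypothetical. The query list assumes a benchmark with 22 queries, such as TPC-H.

```python
from contextlib import contextmanager

# The example YAML file above, written as a plain dict.
CONFIG = {
    "timeout": "30min",
    "ignore": [20, 21, 22],
    "dbconfig": {
        "max_parallel_workers": 96,
        "max_parallel_workers_per_gather": 32,
    },
}


@contextmanager
def temporary_settings(settings, overrides):
    """Apply overrides, then restore the previous values on exit."""
    # Every overridden name must already exist in settings.
    previous = {name: settings[name] for name in overrides}
    settings.update(overrides)
    try:
        yield settings
    finally:
        settings.update(previous)


def queries_to_run(all_queries, ignored):
    """Return the queries that are not listed under "ignore"."""
    ignored = set(ignored)
    return [query for query in all_queries if query not in ignored]


# Made-up starting values standing in for the live database settings.
db_settings = {"max_parallel_workers": 8, "max_parallel_workers_per_gather": 2}

with temporary_settings(db_settings, CONFIG["dbconfig"]):
    during = dict(db_settings)  # Settings as seen while the benchmark runs.
    planned = queries_to_run(range(1, 23), CONFIG["ignore"])

print(during, db_settings, planned)
```

The `finally` clause restores the earlier values even if a query fails partway through, which matches the described behavior of reverting changes once the benchmark completes.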