feat: add a compaction test to the load generator #24925

hiltontj · 2024-04-17T20:46:35Z

Added a new compact sub-command to the load generator. This provides a means to test compaction performance for a variety of parameters:

--num-tags: number of tags in generated measurement data
--num-rows: number of rows per file generated
--cardinality: the cardinality, i.e., total unique combinations of tags
--num-input-files: the number of files that will be generated, and fed into the compaction routine
--series-id: whether to use a data model with the _series_id column
--num-threads: number of threads to give the iox_query::Executor that performs the compaction

It works by generating a set of parquet files with the given number of rows, each row having the given number of tags, and resulting in the given cardinality. Each generated file is sorted by the primary key of the data model under test, i.e., with or without _series_id column.
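As a rough illustration of the generation step, here is a minimal in-memory sketch. It is not the load generator's code: the `Row` type, the `generate_sorted_file` function, and the tag value format are all hypothetical, and the real tool writes parquet files rather than vectors. It assumes cardinality is reached by cycling rows through a fixed number of distinct tag combinations, and that the primary key is the tag values followed by the timestamp.

```rust
/// One generated row: tag values followed by a timestamp.
/// Hypothetical shape; the real generator writes parquet files.
/// Deriving `Ord` compares `tags` first, then `time`, which stands in
/// for a primary-key ordering.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Row {
    pub tags: Vec<String>,
    pub time: i64,
}

/// Build `num_rows` rows with `num_tags` tags each, cycling through
/// `cardinality` distinct tag combinations, then sort by primary key.
/// Assumption: every tag in a row is derived from the same series index,
/// so the number of unique combinations equals `cardinality`
/// (given `num_tags >= 1` and `num_rows >= cardinality`).
pub fn generate_sorted_file(
    num_tags: usize,
    num_rows: usize,
    cardinality: usize,
    start_time: i64,
) -> Vec<Row> {
    assert!(cardinality > 0, "cardinality must be at least 1");
    let mut rows: Vec<Row> = (0..num_rows)
        .map(|i| {
            let series = i % cardinality;
            let tags = (0..num_tags)
                .map(|t| format!("tag{t}-{series:08}"))
                .collect();
            Row { tags, time: start_time + i as i64 }
        })
        .collect();
    // Each file is sorted by the primary key of the data model under test.
    rows.sort();
    rows
}
```

Calling `generate_sorted_file(3, 100, 10, 0)` would yield 100 rows with 3 tags each, 10 unique tag combinations, ordered by tags and then time.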

It then streams the generated files through the iox_query::ReorgPlanner::compact routine to compact the generated data and save the output in a new set of files. The new set of files is a re-shuffle of the input: there are as many output files as input files, but each has reduced cardinality because of the sorting during compaction.
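To see why the output files end up with lower cardinality, the re-shuffle can be approximated as a k-way merge of the sorted inputs followed by a split into equally sized chunks. The sketch below is a stand-in technique and does not use or reproduce iox_query::ReorgPlanner::compact; `KeyedRow` and `compact_sorted_files` are hypothetical names, rows are reduced to a (series key, timestamp) pair, and no deduplication is performed.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// (series key, timestamp): a hypothetical stand-in for a row's primary key.
pub type KeyedRow = (String, i64);

/// Merge already-sorted inputs into one sorted stream, then split that
/// stream into chunks so that equally sized inputs produce the same number
/// of output files. Illustrative only; not the ReorgPlanner routine.
pub fn compact_sorted_files(inputs: Vec<Vec<KeyedRow>>) -> Vec<Vec<KeyedRow>> {
    let num_files = inputs.len();
    if num_files == 0 {
        return Vec::new();
    }
    let total: usize = inputs.iter().map(|f| f.len()).sum();
    // Ceiling division, with a floor of 1 so empty inputs do not divide by zero.
    let rows_per_file = ((total + num_files - 1) / num_files).max(1);

    let mut iters: Vec<_> = inputs.into_iter().map(|f| f.into_iter()).collect();

    // Min-heap holding the next unread row of each input file.
    let mut heap = BinaryHeap::new();
    for (idx, it) in iters.iter_mut().enumerate() {
        if let Some(row) = it.next() {
            heap.push(Reverse((row, idx)));
        }
    }

    let mut outputs: Vec<Vec<KeyedRow>> = vec![Vec::new()];
    while let Some(Reverse((row, idx))) = heap.pop() {
        // Start a new output file once the current one is full.
        if outputs.last().map_or(false, |f| f.len() == rows_per_file) {
            outputs.push(Vec::new());
        }
        outputs.last_mut().unwrap().push(row);
        // Refill the heap from the input file the row came from.
        if let Some(next) = iters[idx].next() {
            heap.push(Reverse((next, idx)));
        }
    }
    outputs
}
```

Because the merged stream is globally sorted, each output chunk covers a contiguous range of series keys. Two input files that each contain series a through d would come out as one file holding only a and b and another holding only c and d.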

hiltontj · 2024-05-07T15:20:30Z

Closing after we decided to move on from _series_id. See #24815

feat: add a compaction test to the load generator

3df0720

hiltontj added v3 epic/perf-prototyping labels Apr 17, 2024

hiltontj self-assigned this Apr 17, 2024

hiltontj added 4 commits April 18, 2024 11:53

refactor: have source files be sorted

9b87398

feat: save compact results to csv file

58c7684

feat: add command runner to execute multiple compact runs

d6d44d2

chore: indentation on justfile

f118309

hiltontj mentioned this pull request Apr 19, 2024

Benchmark sort/deduplicate on some data sets with 1-15 tags and with different cardinalities #24919

Closed

hiltontj added 2 commits April 19, 2024 14:17

feat: track memory usage in compaction test

6fbc02d

refactor: for better memory usage

5bbabbf

hiltontj closed this May 7, 2024
