feat: add a compaction test to the load generator #24925
Closed
Part of #24919
Added a new `compact` sub-command to the load generator. This provides a means to test compaction performance for a variety of parameters:

- `--num-tags`: number of tags in the generated measurement data
- `--num-rows`: number of rows per generated file
- `--cardinality`: the cardinality, i.e., the total number of unique tag combinations
- `--num-input-files`: the number of files that will be generated and fed into the compaction routine
- `--series-id`: whether to use a data model with the `_series_id` column
- `--num-threads`: number of threads to give the `iox_query::Executor` that performs the compaction

It works by generating a set of parquet files with the given number of rows, each row having the given number of tags, and resulting in the given cardinality. Each generated file is sorted by the primary key of the data model under test, i.e., with or without the `_series_id` column.

It then streams the generated files through the `iox_query::ReorgPlanner::compact` routine to compact the generated data and save the output in a new set of files. The new set of files is a re-shuffle of the input: there are as many output files as input files, but each output file has reduced cardinality, because compaction sorts the rows across all inputs.
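
To illustrate why the output files end up with lower per-file cardinality, here is a minimal in-memory sketch. It is not the code in this PR and does not use `iox_query`: the names `SeriesKey`, `generate_inputs`, `compact`, and `distinct` are hypothetical, a plain integer stands in for a row's tag combination, and a global sort followed by an even split stands in for the real streaming compaction plan.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a row's primary key (one unique tag combination).
type SeriesKey = u32;

/// Generate `num_files` inputs of `rows_per_file` rows each, cycling over
/// `cardinality` distinct keys. Each file is sorted by key, mirroring the
/// "sorted by primary key" property of the generated parquet files.
fn generate_inputs(num_files: usize, rows_per_file: usize, cardinality: u32) -> Vec<Vec<SeriesKey>> {
    (0..num_files)
        .map(|f| {
            let mut rows: Vec<SeriesKey> = (0..rows_per_file)
                .map(|r| ((f * rows_per_file + r) as u32) % cardinality)
                .collect();
            rows.sort_unstable();
            rows
        })
        .collect()
}

/// Toy compaction (an assumption, not the real routine): merge every input
/// into one globally sorted run, then split it into at most as many output
/// files as there were inputs.
fn compact(inputs: &[Vec<SeriesKey>]) -> Vec<Vec<SeriesKey>> {
    let mut all: Vec<SeriesKey> = inputs.iter().flatten().copied().collect();
    all.sort_unstable();
    let n = inputs.len().max(1);
    // Ceiling division so that no more than `n` chunks are produced.
    let chunk = ((all.len() + n - 1) / n).max(1);
    all.chunks(chunk).map(|c| c.to_vec()).collect()
}

/// Number of distinct keys in one file.
fn distinct(file: &[SeriesKey]) -> usize {
    file.iter().collect::<HashSet<_>>().len()
}
```

With 4 input files of 100 rows and a cardinality of 100, every input file contains all 100 keys. After the toy compaction there are still 4 files, but each holds a contiguous quarter of the sorted key range, so `distinct` reports 25 per file.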