Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Align compaction output file boundaries to the next level ones #10655

Closed
wants to merge 10 commits into from

Conversation

jay-zhuang
Copy link
Contributor

@jay-zhuang jay-zhuang commented Sep 9, 2022

Try to align the compaction output file boundaries to the next level ones
(grandparent level), to reduce the level compaction write-amplification.

In level compaction, there are "wasted" data at the beginning and end of the
output level files. Align the file boundary can avoid such "wasted" compaction.
With this PR, it tries to align the non-bottommost level file boundaries to its
next level ones. It may cut file when the file size is large enough (at least
50% of target_file_size) and not too large (2x target_file_size).

db_bench shows about 12.56% compaction reduction:

TEST_TMPDIR=/data/dbbench2 ./db_bench --benchmarks=fillrandom,readrandom -max_background_jobs=12 -num=400000000 -target_file_size_base=33554432

# baseline:
Flush(GB): cumulative 25.882, interval 7.216
Cumulative compaction: 285.90 GB write, 162.36 MB/s write, 269.68 GB read, 153.15 MB/s read, 2926.7 seconds

# with this change:
Flush(GB): cumulative 25.882, interval 7.753
Cumulative compaction: 249.97 GB write, 141.96 MB/s write, 233.74 GB read, 132.74 MB/s read, 2534.9 seconds

The compaction simulator shows a similar result (14% with 100G random data).
As a side effect, with this PR, the SST file size can exceed the
target_file_size, but is capped at 2x target_file_size. And there will be
smaller files. Here are file size statistics when loading 100GB with the target
file size 32MB:

          baseline      this_PR
count  1.656000e+03  1.705000e+03
mean   3.116062e+07  3.028076e+07
std    7.145242e+06  8.046139e+06

The feature is enabled by default, to revert to the old behavior disable it
with AdvancedColumnFamilyOptions.level_compaction_dynamic_file_size = false

Also includes #1963 to cut file before skippable grandparent file. Which is for
use case like user adding 2 or more non-overlapping data range at the same
time, it can reduce the overlapping of 2 datasets in the lower levels.

@jay-zhuang jay-zhuang changed the base branch from zjay_will_del_comp_precut to main September 15, 2022 18:45
@jay-zhuang jay-zhuang marked this pull request as ready for review September 15, 2022 18:45
@facebook-github-bot
Copy link
Contributor

@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Member

@cbi42 cbi42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, the grandparent logic is very readable. Added some comments/questions. Will look at tests next.

db/compaction/compaction.cc Outdated Show resolved Hide resolved
db/compaction/compaction_outputs.cc Outdated Show resolved Hide resolved
db/compaction/compaction_outputs.cc Show resolved Hide resolved
db/compaction/compaction_outputs.h Show resolved Hide resolved
db/compaction/compaction_outputs.cc Outdated Show resolved Hide resolved
db/compaction/compaction_outputs.cc Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

4 similar comments
@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

const Slice& internal_key = c_iter.key();
// no need to have complicated cut logic for non level compaction
if (compaction_->immutable_options()->compaction_style !=
kCompactionStyleLevel) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the ongoing project of incremental universal compaction project, we rely on original line 134 to 139 to align with next level to reduce recomputing in those incremental runs. Can we keep the logic for universal compaction too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'm removing it.

// etc.
const Slice& internal_key = c_iter.key();
const uint64_t previous_overlapped_bytes = grandparent_overlapped_bytes_;
size_t switched_num = UpdateGrandparentBoundaryInfo(internal_key);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps pick a better variable name. Perhaps more straight forward ones like num_parent_boundaries_in_gap or something like that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to num_grandparent_boundaries_crossed.

// without any overlap, it may likely happen.
// More details, check PR #1963
const size_t skippable_file_switched_num =
size_t{3} - being_grandparent_gap_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little bit confused how 3 - bool means here. Is there a way to write the code to be more intuitive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to

const size_t num_skippable_boundaries_crossed = being_grandparent_gap_ ? 2 : 3;

also consistent with the num_grandparent_boundaries_crossed.

current_output_file_size_ >
(50 +
std::min(grandparent_boundary_switched_num_ * 5, size_t{40})) *
compaction_->target_output_file_size() / 100) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a potential chance of overflow, considering user can set target file size to arbitrary level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, it would be better to do the / 100 first. Updated.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@siying siying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general approach looks good and I will defer to @cbi42 for a thorough review.

// More details, check PR #1963
const size_t num_skippable_boundaries_crossed =
being_grandparent_gap_ ? 2 : 3;
if (compaction_->immutable_options()->level_compaction_dynamic_file_size &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we probably would check leveled compaction too before dynamic level, as in universal compaction, users can still set level_compaction_dynamic_file_size=true although the expectation is that it is no-op. This check can probably be combined with line 236.

db/compaction/compaction_outputs.cc Show resolved Hide resolved
db/db_range_del_test.cc Show resolved Hide resolved
db/db_test2.cc Outdated Show resolved Hide resolved
db/compaction/compaction_job_test.cc Outdated Show resolved Hide resolved
db/compaction/compaction_job_test.cc Show resolved Hide resolved
db/compaction/compaction_job_test.cc Outdated Show resolved Hide resolved
db/compaction/compaction.cc Outdated Show resolved Hide resolved
@mdcallag
Copy link
Contributor

This change seems like it can have a big impact but the commit message content is too brief. Please include:

  • a few more sentences about what was changed
  • more detail on the perf test
  • a description of whether or not this behavior depends on options. From the files I see there is a new option, that option should be explained in the commit message

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

Copy link
Member

@cbi42 cbi42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LTGM once we add more detail to the commit message as Mark suggested above.

@jay-zhuang jay-zhuang changed the title Improves compaction output file cut to reduce WA Align compaction output file boundaries to the next level ones Sep 30, 2022
@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR, thanks!

Comment on lines +216 to +222
if (compaction_->immutable_options()->compaction_style ==
kCompactionStyleLevel &&
compaction_->immutable_options()->level_compaction_dynamic_file_size &&
num_grandparent_boundaries_crossed >=
num_skippable_boundaries_crossed &&
grandparent_overlapped_bytes_ - previous_overlapped_bytes >
compaction_->target_output_file_size() / 8) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a minimum on current_output_file_size_?

Comment on lines +238 to +246
if (compaction_->immutable_options()->compaction_style ==
kCompactionStyleLevel &&
compaction_->immutable_options()->level_compaction_dynamic_file_size &&
current_output_file_size_ >
((compaction_->target_output_file_size() + 99) / 100) *
(50 + std::min(grandparent_boundary_switched_num_ * 5,
size_t{40}))) {
return true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the threshold be higher when being_grandparent_gap_ == true? It feels less urgent in that case.

facebook-github-bot pushed a commit that referenced this pull request Feb 2, 2024
Summary:
The option is introduced in #10655 to allow reverting to old behavior. The option is enabled by default and there has not been a need to disable it. Remove it for 9.0 release. Also fixed and improved a few unit tests that depended on setting this option to false.

Pull Request resolved: #12325

Test Plan: existing tests.

Reviewed By: hx235

Differential Revision: D53369430

Pulled By: cbi42

fbshipit-source-id: 0ec2440ca8d88db7f7211c581542c7581bd4d3de
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants