-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Align compaction output file boundaries to the next level ones #10655
Conversation
5209a1b
to
cba67dc
Compare
@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, the grandparent logic is very readable. Added some comments/questions. Will look at tests next.
9fafbb2
to
96a6cba
Compare
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
4 similar comments
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
db/compaction/compaction_outputs.cc
Outdated
const Slice& internal_key = c_iter.key(); | ||
// no need to have complicated cut logic for non level compaction | ||
if (compaction_->immutable_options()->compaction_style != | ||
kCompactionStyleLevel) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the ongoing project of incremental universal compaction project, we rely on original line 134 to 139 to align with next level to reduce recomputing in those incremental runs. Can we keep the logic for universal compaction too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'm removing it.
db/compaction/compaction_outputs.cc
Outdated
// etc. | ||
const Slice& internal_key = c_iter.key(); | ||
const uint64_t previous_overlapped_bytes = grandparent_overlapped_bytes_; | ||
size_t switched_num = UpdateGrandparentBoundaryInfo(internal_key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps pick a better variable name. Perhaps more straight forward ones like num_parent_boundaries_in_gap
or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to num_grandparent_boundaries_crossed
.
db/compaction/compaction_outputs.cc
Outdated
// without any overlap, it may likely happen. | ||
// More details, check PR #1963 | ||
const size_t skippable_file_switched_num = | ||
size_t{3} - being_grandparent_gap_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little bit confused how 3 - bool
means here. Is there a way to write the code to be more intuitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to
const size_t num_skippable_boundaries_crossed = being_grandparent_gap_ ? 2 : 3;
also consistent with the num_grandparent_boundaries_crossed
.
db/compaction/compaction_outputs.cc
Outdated
current_output_file_size_ > | ||
(50 + | ||
std::min(grandparent_boundary_switched_num_ * 5, size_t{40})) * | ||
compaction_->target_output_file_size() / 100) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a potential chance of overflow, considering user can set target file size to arbitrary level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, it would be better to do the / 100
first. Updated.
ec293d4
to
ec10214
Compare
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
ec10214
to
8955eef
Compare
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general approach looks good and I will defer to @cbi42 for a thorough review.
db/compaction/compaction_outputs.cc
Outdated
// More details, check PR #1963 | ||
const size_t num_skippable_boundaries_crossed = | ||
being_grandparent_gap_ ? 2 : 3; | ||
if (compaction_->immutable_options()->level_compaction_dynamic_file_size && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we probably would check leveled compaction too before dynamic level, as in universal compaction, users can still set level_compaction_dynamic_file_size=true although the expectation is that it is no-op. This check can probably be combined with line 236.
This change seems like it can have a big impact but the commit message content is too brief. Please include:
|
8955eef
to
67773a8
Compare
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LTGM once we add more detail to the commit message as Mark suggested above.
67773a8
to
c5798a5
Compare
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice PR, thanks!
if (compaction_->immutable_options()->compaction_style == | ||
kCompactionStyleLevel && | ||
compaction_->immutable_options()->level_compaction_dynamic_file_size && | ||
num_grandparent_boundaries_crossed >= | ||
num_skippable_boundaries_crossed && | ||
grandparent_overlapped_bytes_ - previous_overlapped_bytes > | ||
compaction_->target_output_file_size() / 8) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a minimum on current_output_file_size_
?
if (compaction_->immutable_options()->compaction_style == | ||
kCompactionStyleLevel && | ||
compaction_->immutable_options()->level_compaction_dynamic_file_size && | ||
current_output_file_size_ > | ||
((compaction_->target_output_file_size() + 99) / 100) * | ||
(50 + std::min(grandparent_boundary_switched_num_ * 5, | ||
size_t{40}))) { | ||
return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the threshold be higher when being_grandparent_gap_ == true
? It feels less urgent in that case.
Summary: The option is introduced in #10655 to allow reverting to old behavior. The option is enabled by default and there has not been a need to disable it. Remove it for 9.0 release. Also fixed and improved a few unit tests that depended on setting this option to false. Pull Request resolved: #12325 Test Plan: existing tests. Reviewed By: hx235 Differential Revision: D53369430 Pulled By: cbi42 fbshipit-source-id: 0ec2440ca8d88db7f7211c581542c7581bd4d3de
Try to align the compaction output file boundaries to the next level ones
(grandparent level), to reduce the level compaction write-amplification.
In level compaction, there are "wasted" data at the beginning and end of the
output level files. Align the file boundary can avoid such "wasted" compaction.
With this PR, it tries to align the non-bottommost level file boundaries to its
next level ones. It may cut file when the file size is large enough (at least
50% of target_file_size) and not too large (2x target_file_size).
db_bench shows about 12.56% compaction reduction:
The compaction simulator shows a similar result (14% with 100G random data).
As a side effect, with this PR, the SST file size can exceed the
target_file_size, but is capped at 2x target_file_size. And there will be
smaller files. Here are file size statistics when loading 100GB with the target
file size 32MB:
The feature is enabled by default, to revert to the old behavior disable it
with
AdvancedColumnFamilyOptions.level_compaction_dynamic_file_size = false
Also includes #1963 to cut file before skippable grandparent file. Which is for
use case like user adding 2 or more non-overlapping data range at the same
time, it can reduce the overlapping of 2 datasets in the lower levels.