Zero seqnum of final key / drop final tombstone when compacting to bottommost level #4927
Conversation
Tested with:
`: line 1: 552422 Aborted (core dumped) ./env_timed_test &> t/log-env_timed_test`
All finished running ./db_bench successfully.
And I'm surprised that ZeroOutSequenceAtBottomLevel doesn't need to be updated to pass; any suggestion would be appreciated.
I guess it's because the compaction is faking an end key that is larger than any key in that test case: https://github.com/facebook/rocksdb/blob/master/db/compaction_iterator_test.cc#L171-L173.
Do you have an "Import to Phabricator" button on this page? If so, you can click it to sync this to an internal diff on Phabricator where internal CI (sandcastle) tests can run. Once they pass and this is accepted, you can click "Land to master". |
Note for myself: I verified that the end key in subcompaction boundaries is exclusive, so we do not need to worry about multiple key versions of the same user key being written by multiple subcompactions.
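A minimal sketch of that reasoning (the `SubcompactionRange` type and `InRange` helper below are hypothetical, not RocksDB code): with an exclusive end bound, two adjacent subcompaction ranges can never both see versions of the same user key, so the last-key optimization never needs to coordinate across subcompactions.

```cpp
#include <string>

// Hypothetical range type for illustration only.
struct SubcompactionRange {
  std::string start;  // inclusive
  std::string end;    // exclusive
};

// Ranges [a, b) and [b, c) are disjoint, so every version of a given user key
// is processed by exactly one subcompaction.
bool InRange(const SubcompactionRange& r, const std::string& user_key) {
  return user_key >= r.start && user_key < r.end;
}
```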
LGTM, thanks for contributing to RocksDB!
@mzhaom has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
db/version_set.cc
Outdated
@@ -2028,7 +2028,6 @@ void VersionStorageInfo::ComputeBottommostFilesMarkedForCompaction() {
   bottommost_files_mark_threshold_ = kMaxSequenceNumber;
   for (auto& level_and_file : bottommost_files_) {
     if (!level_and_file.second->being_compacted &&
         level_and_file.second->fd.largest_seqno != 0 &&
Ideally `level_and_file.second->fd.largest_seqno != 0` tells us that a file has at least one of these properties:
- contains overwritten keys
- contains deleted keys
- contains range deletions
Prior to your change it definitely didn't tell us that, because the compaction's final key seqnum wasn't zeroed out. After your change it might be enough, though we should test (I think I once saw a compaction's first key seqnum not being zeroed... we need to make sure that doesn't happen).
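As a rough illustration of that property (a simplified sketch under assumptions, not the actual `ComputeBottommostFilesMarkedForCompaction` code; `FileMeta` and `IsCleanupCandidate` are made up here): once a bottommost file's `largest_seqno` reaches 0, every key in it has been zeroed, so there is nothing left for a snapshot-release-triggered compaction to clean up and the file can be skipped.

```cpp
#include <cstdint>

// Hypothetical, trimmed-down per-file metadata for this sketch.
struct FileMeta {
  bool being_compacted;
  uint64_t largest_seqno;  // 0 means every key's seqnum was zeroed out
};

// A bottommost file is only worth re-compacting after a snapshot release if
// some key still carries a nonzero seqnum, i.e. the previous compaction could
// not drop or zero everything (before this PR, the final key never could be).
bool IsCleanupCandidate(const FileMeta& f) {
  return !f.being_compacted && f.largest_seqno != 0;
}
```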
The issues with using `level_and_file.second->num_deletions > 1` instead are:
- it doesn't trigger compaction when there's one deletion, which could be a range deletion covering a lot of keys in the file
- it doesn't trigger compaction when there are many overwritten keys. Some MyRocks use case actually doesn't use `Delete` but instead calls `Put` with an empty value to "delete" keys, so we'd like to be able to trigger compaction in this case too (see the sketch below).
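To make the MyRocks point concrete, here is a hedged usage sketch (illustrative only, not code from this PR; the database path is arbitrary): a `Put` with an empty value is an ordinary value write, so a file full of such "deletes" keeps `num_deletions == 0` and a deletion-count heuristic would never pick it up, while its nonzero seqnums still would.

```cpp
#include <cassert>
#include "rocksdb/db.h"

int main() {
  rocksdb::DB* db = nullptr;
  rocksdb::Options options;
  options.create_if_missing = true;
  rocksdb::Status s =
      rocksdb::DB::Open(options, "/tmp/num_deletions_demo", &db);  // path is arbitrary
  assert(s.ok());

  // A real tombstone: counted in the SST file's num_deletions table property.
  s = db->Delete(rocksdb::WriteOptions(), "row1");
  assert(s.ok());

  // MyRocks-style "delete": a Put with an empty value. It is an ordinary value
  // write, so it does not increment num_deletions, and a heuristic based only
  // on the deletion count would never schedule the cleanup compaction.
  s = db->Put(rocksdb::WriteOptions(), "row2", "");
  assert(s.ok());

  delete db;
  return 0;
}
```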
Looks good, except I'm not so sure about the db/version_set.cc change. Feel free to either revert that part and land the PR, or we can address the concerns in this PR.
@mzhaom has updated the pull request.
Thank you. I reverted the change to version_set.cc so this PR can move forward.
@mzhaom has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This is a duplicate of facebook#4948 by @mzhaom to fix tests after rebase. This change is a follow-up to facebook#4927, which made this possible by allowing tombstone dropping/seqnum zeroing optimizations on the last key in the compaction. Now the `largest_seqno != 0` condition suffices to prevent snapshot release triggered compaction from entering an infinite loop.

The issues caused by the extraneous condition `level_and_file.second->num_deletions > 1` are:
- it doesn't trigger compaction when there's one deletion, which could be a range deletion covering a lot of keys in the file
- it doesn't trigger compaction when there are many overwritten keys. Some MyRocks use case actually doesn't use `Delete` but instead calls `Put` with empty value to "delete" keys, so we'd like to be able to trigger compaction in this case too.

Test plan:
- make check
Summary: This is a duplicate of #4948 by mzhaom to fix tests after rebase. This change is a follow-up to #4927, which made this possible by allowing tombstone dropping/seqnum zeroing optimizations on the last key in the compaction. Now the `largest_seqno != 0` condition suffices to prevent snapshot release triggered compaction from entering an infinite loop.

The issues caused by the extraneous condition `level_and_file.second->num_deletions > 1` are:
- files could have `largest_seqno > 0` forever, making it impossible to tell they cannot contain any covering keys
- it doesn't trigger compaction when there are many overwritten keys. Some MyRocks use case actually doesn't use `Delete` but instead calls `Put` with empty value to "delete" keys, so we'd like to be able to trigger compaction in this case too.

Pull Request resolved: #8357

Test Plan:
- make check

Reviewed By: jay-zhuang

Differential Revision: D28855340

Pulled By: ajkr

fbshipit-source-id: a261b51eecafec492499e6d01e8e43112f801798
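For intuition about the infinite-loop hazard mentioned above, here is a toy model (hypothetical types and logic, not RocksDB code): if a cleanup compaction can never zero the final key's seqnum, the file's `largest_seqno` stays nonzero after every pass and the file keeps qualifying for another compaction; once the final key can be zeroed, the file settles at `largest_seqno == 0` and stops being re-marked.

```cpp
#include <cstdint>
#include <iostream>

// Toy model of a bottommost file, for illustration only.
struct BottommostFile {
  uint64_t largest_seqno;
  bool can_zero_final_key;  // whether the compaction may zero the last key's seqnum
};

// One cleanup compaction pass: zero every seqnum the compaction is allowed to.
// If the final key is excluded (pre-#4927 behavior), at least one nonzero
// seqnum always survives, so largest_seqno never reaches 0.
void RunCleanupCompaction(BottommostFile* f) {
  f->largest_seqno = f->can_zero_final_key ? 0 : 1;
}

int main() {
  BottommostFile without_fix{42, false};
  BottommostFile with_fix{42, true};
  for (int pass = 0; pass < 3; ++pass) {
    RunCleanupCompaction(&without_fix);
    RunCleanupCompaction(&with_fix);
  }
  std::cout << "without fix, largest_seqno stays at " << without_fix.largest_seqno
            << "; with fix, it settles at " << with_fix.largest_seqno << "\n";
  return 0;
}
```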