
Zero seqnum of final key / drop final tombstone when compacting to bottommost level #4927

Closed
mzhaom wants to merge 1 commit

Conversation

mzhaom commented Jan 28, 2019

No description provided.
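For context: at the bottommost level, a key version that no snapshot can see may have its sequence number rewritten to 0, and a tombstone there may be dropped outright; before this change, both optimizations skipped the compaction's final key. A minimal sketch of the rule, with hypothetical names rather than the actual compaction iterator code:

```cpp
#include <cstdint>
#include <iostream>
#include <string>

using SequenceNumber = uint64_t;

struct Entry {
  std::string user_key;
  SequenceNumber seqno;
  bool is_tombstone;
};

// Hypothetical sketch of the bottommost-level optimization this PR extends
// to the compaction's final key. Returns true if the entry can be dropped
// entirely; otherwise it may zero the entry's seqnum in place.
bool OptimizeAtBottommostLevel(Entry& e, SequenceNumber earliest_snapshot,
                               bool key_exists_beyond_output_level) {
  // Only safe when no live snapshot can still observe this version and no
  // older version of the same user key survives below the output level.
  if (e.seqno >= earliest_snapshot || key_exists_beyond_output_level) {
    return false;
  }
  if (e.is_tombstone) {
    return true;  // drop the tombstone: there is nothing left to shadow
  }
  e.seqno = 0;  // zero the seqnum; the value itself is kept
  return false;
}

int main() {
  Entry put{"a", 42, false};
  Entry del{"b", 43, true};
  OptimizeAtBottommostLevel(put, /*earliest_snapshot=*/100, false);
  bool dropped = OptimizeAtBottommostLevel(del, /*earliest_snapshot=*/100, false);
  std::cout << "put seqno: " << put.seqno                   // 0
            << ", tombstone dropped: " << dropped << "\n";  // 1
  return 0;
}
```

The PR removes the final-key exclusion so that the last key of a bottommost compaction gets the same treatment as every other key.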

mzhaom (Author) commented Jan 28, 2019

Tested with:
make all check -j32
I got two segfaults, but I don't think they are relevant to this change.

: line 1: 552422 Aborted (core dumped) ./env_timed_test &> t/log-env_timed_test
ETA: 375s Left: 1556 AVG: 0.25s local:24/114/100%/0.2s /bin/bash: line 1: 630729 Segmentation fault (core dumped) ./partitioned_filter_block_test &> t/log-partitioned_filter_block_test

Also finished running ./db_bench successfully.

mzhaom (Author) commented Jan 28, 2019

And I'm surprised that ZeroOutSequenceAtBottomLevel doesn't need to be updated to pass; any suggestion would be appreciated.

ajkr (Contributor) commented Jan 28, 2019

I guess it's because the compaction is faking an end key that is larger than any key in that test case: https://github.com/facebook/rocksdb/blob/master/db/compaction_iterator_test.cc#L171-L173.
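A trivial illustration of why that masks the old behavior (made-up keys, not the actual fixture): if the fake compaction's end boundary sorts after every key the test inserts, no test key is ever the compaction's final key, so the final-key special case never fires.

```cpp
#include <cassert>
#include <string>

int main() {
  // Hypothetical: a fake compaction end boundary vs. the keys a test inserts.
  const std::string fake_end_key = "zzzzzzzz";  // sorts after any test key
  for (const std::string key : {"a", "b", "c"}) {
    assert(key < fake_end_key);  // no test key is the compaction's last key
  }
  return 0;
}
```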

ajkr (Contributor) commented Jan 28, 2019

Do you have an "Import to Phabricator" button on this page? If so, you can click it to sync this to an internal diff on Phabricator where internal CI (sandcastle) tests can run. Once they pass and this is accepted, you can click "Land to master".

ajkr (Contributor) commented Jan 28, 2019

Note for myself: I verified that the end key in subcompaction boundaries is exclusive, so we do not need to worry about multiple key versions of the same user key being written by multiple subcompactions.
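A sketch of the invariant being verified (hypothetical structures, not the real subcompaction code): because the end boundary is exclusive and compares on the user key, every internal-key version of a given user key falls on the same side of the boundary, so one subcompaction sees all of them and can safely decide to zero or drop.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical internal key: user key plus sequence number.
struct InternalKey {
  std::string user_key;
  uint64_t seqno;
};

int main() {
  // Three versions of user key "b", plus an exclusive subcompaction end
  // boundary at user key "b": keys with user_key < "b" belong to this
  // subcompaction, everything else to the next one.
  std::vector<InternalKey> versions = {{"b", 9}, {"b", 5}, {"b", 1}};
  const std::string exclusive_end = "b";

  // Because the boundary compares on the user key and is exclusive, all
  // versions of "b" land together: none can be split across two
  // subcompactions, so seqnum-zeroing decisions see every version.
  for (const auto& ik : versions) {
    assert(!(ik.user_key < exclusive_end));
  }
  return 0;
}
```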

ajkr (Contributor) left a comment

LGTM, thanks for contributing to RocksDB!

facebook-github-bot (Contributor) left a comment

@mzhaom has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@@ -2028,7 +2028,6 @@ void VersionStorageInfo::ComputeBottommostFilesMarkedForCompaction() {
   bottommost_files_mark_threshold_ = kMaxSequenceNumber;
   for (auto& level_and_file : bottommost_files_) {
     if (!level_and_file.second->being_compacted &&
         level_and_file.second->fd.largest_seqno != 0 &&
ajkr commented on these lines:

Ideally level_and_file.second->fd.largest_seqno != 0 tells us that a file has at least one of these properties:

  • contains overwritten keys
  • contains deleted keys
  • contains range deletions

Prior to your change it definitely didn't tell us that, because the compaction's final key seqnum wasn't zeroed out. After your change it might be enough, though we should test (I think I once saw a compaction's first key seqnum not being zeroed... we need to make sure that doesn't happen).
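To make the concern concrete: before this change, the unzeroed final key kept an output file's largest_seqno above zero even when every other entry had been zeroed, so a marking pass keyed only on largest_seqno != 0 could keep re-selecting the same bottommost file. A hypothetical illustration:

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  // Seqnums of entries in a bottommost output file. Before this PR the
  // compaction's final key (seqno 57 here) was never zeroed, so the file's
  // largest_seqno stayed nonzero and the file still looked like it held
  // droppable data, making it eligible for compaction again and again.
  std::vector<uint64_t> seqnums = {0, 0, 0, 57};
  const uint64_t largest_seqno =
      *std::max_element(seqnums.begin(), seqnums.end());
  std::cout << "marked for compaction: " << (largest_seqno != 0) << "\n";  // 1
  return 0;
}
```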

ajkr commented on these lines:

The issues with using level_and_file.second->num_deletions > 1 instead are:

  • it doesn't trigger compaction when there's one deletion, which could be a range deletion covering a lot of keys in the file
  • it doesn't trigger compaction when there are many overwritten keys. Some MyRocks use cases actually don't use Delete but instead call Put with an empty value to "delete" keys, so we'd like to be able to trigger compaction in this case too (see the sketch below).
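To illustrate the second point, a hedged sketch against the public RocksDB API (the database path is made up): a Put with an empty value is a logical delete for such MyRocks use cases, but it leaves the num_deletions table property untouched, so a trigger based on num_deletions never fires for it.

```cpp
#include <cassert>
#include <iostream>

#include "rocksdb/db.h"
#include "rocksdb/table_properties.h"

int main() {
  rocksdb::DB* db = nullptr;
  rocksdb::Options options;
  options.create_if_missing = true;
  rocksdb::Status s =
      rocksdb::DB::Open(options, "/tmp/num_deletions_demo", &db);
  assert(s.ok());

  // "Delete" a key the MyRocks way: overwrite it with an empty value, then
  // flush so the write lands in an SST file.
  s = db->Put(rocksdb::WriteOptions(), "key1", "");
  assert(s.ok());
  s = db->Flush(rocksdb::FlushOptions());
  assert(s.ok());

  // num_deletions counts actual tombstones; the empty-value Put is invisible
  // to it, which is why a num_deletions-based trigger misses this pattern.
  rocksdb::TablePropertiesCollection props;
  s = db->GetPropertiesOfAllTables(&props);
  assert(s.ok());
  for (const auto& file_and_props : props) {
    std::cout << file_and_props.first << " num_deletions="
              << file_and_props.second->num_deletions << "\n";  // prints 0
  }

  delete db;
  return 0;
}
```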

ajkr (Contributor) left a comment

Looks good, except I'm not so sure about the db/version_set.cc change. Feel free to either revert that part and land the PR, or we can address the concerns in this PR.


facebook-github-bot (Contributor) commented

@mzhaom has updated the pull request.

mzhaom (Author) commented Jan 31, 2019

Thank you. I reverted the change to version_set.cc so this PR can move forward.

facebook-github-bot (Contributor) left a comment

@mzhaom has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ajkr added a commit to ajkr/rocksdb that referenced this pull request Jun 4, 2021
This is a duplicate of facebook#4948 by @mzhaom to fix tests after rebase.

This change is a follow-up to facebook#4927, which made this possible by allowing tombstone dropping/seqnum zeroing optimizations on the last key in the compaction. Now the `largest_seqno != 0` condition suffices to prevent snapshot release triggered compaction from entering an infinite loop.

The issues caused by the extraneous condition `level_and_file.second->num_deletions > 1` are:

- it doesn't trigger compaction when there's one deletion, which could be a range deletion covering a lot of keys in the file
- it doesn't trigger compaction when there are many overwritten keys. Some MyRocks use cases actually don't use Delete but instead call Put with an empty value to "delete" keys, so we'd like to be able to trigger compaction in this case too.

Test plan:
 - make check
facebook-github-bot pushed a commit that referenced this pull request Jun 4, 2021

Summary:
This is a duplicate of #4948 by mzhaom to fix tests after rebase.

This change is a follow-up to #4927, which made this possible by allowing tombstone dropping/seqnum zeroing optimizations on the last key in the compaction. Now the `largest_seqno != 0` condition suffices to prevent snapshot release triggered compaction from entering an infinite loop.

The issues caused by the extraneous condition `level_and_file.second->num_deletions > 1` are:

- files could have `largest_seqno > 0` forever making it impossible to tell they cannot contain any covering keys
- it doesn't trigger compaction when there are many overwritten keys. Some MyRocks use cases actually don't use Delete but instead call Put with an empty value to "delete" keys, so we'd like to be able to trigger compaction in this case too.

Pull Request resolved: #8357

Test Plan: - make check

Reviewed By: jay-zhuang

Differential Revision: D28855340

Pulled By: ajkr

fbshipit-source-id: a261b51eecafec492499e6d01e8e43112f801798