-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Relax VersionStorageInfo::GetOverlappingInputs check #4050
Relax VersionStorageInfo::GetOverlappingInputs check #4050
Conversation
I'm not seeing why that invariant (treating all sstables which overlap a range tombstone as a single unit) is important. The only test which fails with this PR is the one you added in #2769, but that test is simply checking for the invariant you describe, not some problematic outcome of violating the invariant. If we don't treat the sstables as a single unit, then a range tombstone can exist in multiple levels, though I'm not seeing why that is problematic as compaction already needs to deal with the same range tombstone appearing in multiple adjacent sstables. What am I missing? |
Cc @benesch who has been thinking about range tombstones a lot. |
Hm, looking at it more there might've been some circular reasoning behind that assumption :/. It was like:
|
In summary I think a prerequisite to this PR would be shortening the range tombstones in the meta-block. Besides that I can't think of any problem. |
I think nothing prevents it. We just didn't need to do it before because the compaction picking always picked all files that a range tombstone spanned. |
Roger. I'll work on a PR to trim range tombstones to sstable boundaries and then revisit this PR. Also, thanks for the explanation of why the current invariant that all of the sstables which overlap a range tombstone must be treated as a single unit. That was not at all obvious. |
When adding range tombstones to a RangeDelAggregator, truncate the tombstones to the sstable boundaries. This is done as an alternative to truncating the range tombstones within the sstables themselves as it properly handles existing data where range tombstones are not truncated in sstables. Truncating range tombstones to sstable boundaries avoids having two process all of the sstables with overlapping tombstones as a unit. See facebook#4050
6af5b58
to
a1ea122
Compare
Depends on #4081. |
When adding range tombstones to a RangeDelAggregator, truncate the tombstones to the sstable boundaries. This is done as an alternative to truncating the range tombstones within the sstables themselves as it properly handles existing data where range tombstones are not truncated in sstables. Truncating range tombstones to sstable boundaries avoids having two process all of the sstables with overlapping tombstones as a unit. See facebook#4050
When adding range tombstones to a RangeDelAggregator, truncate the tombstones to the sstable boundaries. This is done as an alternative to truncating the range tombstones within the sstables themselves as it properly handles existing data where range tombstones are not truncated in sstables. Truncating range tombstones to sstable boundaries avoids having two process all of the sstables with overlapping tombstones as a unit. See facebook#4050
a1ea122
to
ac857ae
Compare
// be careful to use SerializeEndKey(), allocates new memory | ||
InternalKey SerializeEndKey() const { | ||
return InternalKey(end_key_, seq_, kTypeRangeDeletion); | ||
return InternalKey(end_key_, kMaxSequenceNumber, kTypeRangeDeletion); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr This was a fun bug to track down. The only usage of SerializeEndKey
is from RangeDelAggregator::AddToBuilder
. The last sstable output by a compaction could have it's largest key set to RangeTombstone::SerializeEndKey()
, but the tombstone end-key is exclusive so that could cause the sstable to overlap with the next sstable. I need to think through whether this was a bug prior to this PR or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh got it... I suspect it wasn't a problem before this PR. In the compaction you described, the "next sstable" should've been pulled into a single compaction due to key-range overlap. The single compaction should be able to output files with non-overlapping key-ranges.
ac857ae
to
d1d5935
Compare
d1d5935
to
2ae3dec
Compare
When adding range tombstones to a RangeDelAggregator, truncate the tombstones to the sstable boundaries. This is done as an alternative to truncating the range tombstones within the sstables themselves as it properly handles existing data where range tombstones are not truncated in sstables. Truncating range tombstones to sstable boundaries avoids having two process all of the sstables with overlapping tombstones as a unit. See facebook#4050
When adding range tombstones to a RangeDelAggregator, truncate the tombstones to the sstable boundaries. This is done as an alternative to truncating the range tombstones within the sstables themselves as it properly handles existing data where range tombstones are not truncated in sstables. Truncating range tombstones to sstable boundaries avoids having two process all of the sstables with overlapping tombstones as a unit. See facebook#4050
When adding range tombstones to a RangeDelAggregator, truncate the tombstones to the sstable boundaries. This is done as an alternative to truncating the range tombstones within the sstables themselves as it properly handles existing data where range tombstones are not truncated in sstables. Truncating range tombstones to sstable boundaries avoids having two process all of the sstables with overlapping tombstones as a unit. See facebook#4050
2ae3dec
to
fd766ca
Compare
Do not consider the range tombstone sentinel key as causing 2 adjacent sstables in a level to overlap. When a range tombstone's end key is the largest key in an sstable, the sstable's end key is set to a "sentinel" value that is the smallest key in the next sstable with a sequence number of kMaxSequenceNumber. This "sentinel" is guaranteed to not overlap in internal-key space with the next sstable. Unfortunately, GetOverlappingFiles uses user-keys to determine overlap and was thus considering 2 adjacent sstables in a level to overlap if they were separated by this sentinel key. This in turn would cause compactions to be larger than necessary. This previous behavior of GetOverlappingInputs was necessary due to the following scenario: * Write a delete range [a, d). * After compaction, this deleted range may be added to multiple sst files: a.sst, b.sst, c.sst, but the boundaries of these sst files are [a, b), [b, c), [c, d). * When a.sst and b.sst reach the bottommost level, the delete range of the sst files will be dropped. * Write a new key in the range [a, c). * When the newly written key [a, c) reaches the bottommost level, its sequence number will be set to zero. * When the front c.sst compacts with the key in the range [a, c) the sequence number of that key is zero, and the key will be incorrectly dropped. That scenario no longer occurs because we are truncating range deletion tombstones to sstable boundaries when adding them to RangeDelAggregator.
fd766ca
to
7363c90
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
@@ -946,22 +949,111 @@ TEST_F(DBRangeDelTest, CompactionTreatsSplitInputLevelDeletionAtomically) { | |||
if (i == 0) { | |||
ASSERT_OK(db_->CompactFiles( | |||
CompactionOptions(), {meta.levels[1].files[0].name}, 2 /* level */)); | |||
ASSERT_EQ(0, NumTableFilesAtLevel(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why does CompactFiles
pull in all the adjacent files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to. In the test, we tell CompactFiles
to compact sstable 17. CompactionPicker::SanitizeCompactionInputFilesForAllLevels
pulls in sstables 20, 19, 18, and 7 as well. Looks like that method has logic to expand the input file set similar to VersionSet::GetOverlappingInputs
. Probably worthwhile to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we can follow up on that sometime, though I'll land this part now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Glad to see this get merged.
// be careful to use SerializeEndKey(), allocates new memory | ||
InternalKey SerializeEndKey() const { | ||
return InternalKey(end_key_, seq_, kTypeRangeDeletion); | ||
return InternalKey(end_key_, kMaxSequenceNumber, kTypeRangeDeletion); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh got it... I suspect it wasn't a problem before this PR. In the compaction you described, the "next sstable" should've been pulled into a single compaction due to key-range overlap. The single compaction should be able to output files with non-overlapping key-ranges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Do not consider the range tombstone sentinel key as causing 2 adjacent sstables in a level to overlap. When a range tombstone's end key is the largest key in an sstable, the sstable's end key is so to a "sentinel" value that is the smallest key in the next sstable with a sequence number of kMaxSequenceNumber. This "sentinel" is guaranteed to not overlap in internal-key space with the next sstable. Unfortunately, GetOverlappingFiles uses user-keys to determine overlap and was thus considering 2 adjacent sstables in a level to overlap if they were separated by this sentinel key. This in turn would cause compactions to be larger than necessary. Note that this conflicts with facebook#2769 and cases `DBRangeDelTest.CompactionTreatsSplitInputLevelDeletionAtomically` to fail. Pull Request resolved: facebook#4050 Differential Revision: D8844423 Pulled By: ajkr fbshipit-source-id: df3f9f1db8f4cff2bff77376b98b83c2ae1d155b
Do not consider the range tombstone sentinel key as causing 2 adjacent
sstables in a level to overlap. When a range tombstone's end key is the
largest key in an sstable, the sstable's end key is so to a "sentinel"
value that is the smallest key in the next sstable with a sequence
number of kMaxSequenceNumber. This "sentinel" is guaranteed to not
overlap in internal-key space with the next sstable. Unfortunately,
GetOverlappingFiles uses user-keys to determine overlap and was thus
considering 2 adjacent sstables in a level to overlap if they were
separated by this sentinel key. This in turn would cause compactions to
be larger than necessary.
Note that this conflicts with
#2769 and cases
DBRangeDelTest.CompactionTreatsSplitInputLevelDeletionAtomically
tofail.