Preserve tombstones for allow_ingest_behind#13807
Preserve tombstones for allow_ingest_behind#13807cbi42 wants to merge 2 commits intofacebook:mainfrom
allow_ingest_behind#13807Conversation
db/compaction/compaction_iterator.h
Outdated
| // compaction rules. This is an optimization for outputting a put after | ||
| // a single delete. |
There was a problem hiding this comment.
Just to clarify a bit more on why to output a put after a single delete instead of dropping them together, "....This is an optimization for outputting a put after outputting a single delete because there's an earlier snapshot preventing the put and single delete to be dropped together. ""
db/compaction/compaction_iterator.h
Outdated
| // Stores whether current_user_key_ is valid. If so, it stores the user key | ||
| // of the last key seen by the iterator. | ||
| // If false, treat the next key to read as a new user key. |
There was a problem hiding this comment.
nit: Describes whether current_user_key_ is valid. If true, the current_user_key_ stores the user key of the last key seen by the iterator....
db/compaction/compaction_iterator.cc
Outdated
| // example: from new to old: SingleDelete, PUT, SingleDelete, PUT | ||
| // (ingested behind). If the older SingleDelete is dropped due to being | ||
| // covered by PUT, the PUT can be then compacted away with the new | ||
| // SingleDelete. The older PUT then incorrectly becomes visible. |
There was a problem hiding this comment.
N00b to SD here has a few questions to ensure comments are accessible to future n00bs:
Assuming from new to old: SingleDelete2, PUT2, SingleDelete1, PUT1, (ingested behind)
(1) Under rule (A), "If the older SingleDelete is dropped due to being covered by PUT,"
- Do you mean being covered by the PUT2 or PUT1?
(2) "Then the PUT can be then compacted away with the new SingleDelete" - By "PUT", do you mean PUT2 or PUT1?
- Can you be more explicit in why, otherwise (i.e, the current behavior), the PUT can't be compacted away?
There was a problem hiding this comment.
Now I understand after checking in offline: there can be compaction trying to compact PUT2 and SingleDelete1 even though SingleDelete1 is for PUT1. This is what "the older SingleDelete is dropped due to being covered by PUT" meant.
And then another compaction can compact and drop both SingleDelete2, PUT2. This is what "the PUT can be then compacted away with the new SingleDelete." meant.
@cbi42 If you can specify a bit more into these two compaction events and distinguish the "old PUT" and the "new PUT" in your comment, that will be perfect!
There was a problem hiding this comment.
Added more clarification.
| // comparison, so the value of has_current_user_key does not matter. | ||
| has_current_user_key_ = false; | ||
| if (compaction_ != nullptr && | ||
| if (compaction_ != nullptr && !compaction_->allow_ingest_behind() && |
There was a problem hiding this comment.
Forikey_.type == kTypeValuePreferredSeqno, it also checks for compaction_->KeyNotExistsBeyondOutputLevel().
Do you think it will encounter the same problem as deletion if the ikey_ is swapped with a lower seqno like 0 and collide with the later ingested-behind data?
There was a problem hiding this comment.
Good catch, I think it's possible that we swap the sequence number to zero:
rocksdb/db/seqno_to_time_mapping.cc
Line 65 in 5435032
|
Looking good so far - will look at the test tomorrow. I need a little bit help in understanding the SingleDelete nuances as commented above. @cbi42 |
db/external_sst_file_test.cc
Outdated
| ASSERT_OK(Delete(Key(7))); | ||
| ASSERT_OK(SingleDelete(Key(8))); | ||
| ASSERT_OK(db_->CompactRange(CompactRangeOptions(), nullptr, nullptr)); | ||
| ASSERT_EQ(Get(Key(8)), "NOT_FOUND"); | ||
| ASSERT_EQ(Get(Key(7)), "NOT_FOUND"); |
There was a problem hiding this comment.
nit: might be helpful to add some comment to clarify the intention of Delete(Key(7)) and SingleDelete(Key(8)) like "// Test that SingleDelte overwritten by Put is not dropped" though the verification happens further down in the test.
I was slightly confused and mistakenly thought the Delete and SingleDelete were here to delete some existing keys and you verified this by ASSERT_EQ(Get(Key(8)), "NOT_FOUND"); and ASSERT_EQ(Get(Key(7)), "NOT_FOUND");
There was a problem hiding this comment.
Removed ASSERT_EQ(*, "NOT_FOUND") since they will be true regardless of the Delete operation. Added comment about how they are verified.
cbi42
left a comment
There was a problem hiding this comment.
Thanks for the review!
db/compaction/compaction_iterator.h
Outdated
| // compaction rules. This is an optimization for outputting a put after | ||
| // a single delete. |
db/external_sst_file_test.cc
Outdated
| ASSERT_OK(Delete(Key(7))); | ||
| ASSERT_OK(SingleDelete(Key(8))); | ||
| ASSERT_OK(db_->CompactRange(CompactRangeOptions(), nullptr, nullptr)); | ||
| ASSERT_EQ(Get(Key(8)), "NOT_FOUND"); | ||
| ASSERT_EQ(Get(Key(7)), "NOT_FOUND"); |
There was a problem hiding this comment.
Removed ASSERT_EQ(*, "NOT_FOUND") since they will be true regardless of the Delete operation. Added comment about how they are verified.
| // comparison, so the value of has_current_user_key does not matter. | ||
| has_current_user_key_ = false; | ||
| if (compaction_ != nullptr && | ||
| if (compaction_ != nullptr && !compaction_->allow_ingest_behind() && |
There was a problem hiding this comment.
Good catch, I think it's possible that we swap the sequence number to zero:
rocksdb/db/seqno_to_time_mapping.cc
Line 65 in 5435032
db/compaction/compaction_iterator.cc
Outdated
| // example: from new to old: SingleDelete, PUT, SingleDelete, PUT | ||
| // (ingested behind). If the older SingleDelete is dropped due to being | ||
| // covered by PUT, the PUT can be then compacted away with the new | ||
| // SingleDelete. The older PUT then incorrectly becomes visible. |
There was a problem hiding this comment.
Added more clarification.
Summary: Preserve tombstone when allow_ingest_behind` is enabled so that they can be applied to ingested files. This can be useful when users use ingest_behind to buffer updates where Deletion needs to be preserved. This fixes #13571.
Test plan: updated a unit test to verify that tombstones are not dropped during compaction.