Preserve tombstones for allow_ingest_behind #13807

Closed
cbi42 wants to merge 2 commits into facebook:main from cbi42:ingest-behind-cf

Conversation

@cbi42 cbi42 commented Jul 25, 2025

Summary: Preserve tombstones when allow_ingest_behind is enabled so that they can be applied to files ingested later. This is useful when users rely on ingest_behind to buffer updates and Deletions need to be preserved. This fixes #13571.

Test plan: updated a unit test to verify that tombstones are not dropped during compaction.
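
For readers following along, the behavior described in the summary can be sketched with a toy model. This is not RocksDB code: `Level`, `CompactBottommost`, and `Get` are hypothetical names invented for illustration, and the model assumes one entry per user key and that data ingested behind is older than everything already in the DB.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Toy model only: one entry per user key in the "bottommost level".
enum class EntryType { kPut, kDelete };

struct Entry {
  EntryType type;
  std::uint64_t seqno;
  std::string value;  // ignored for tombstones
};

using Level = std::map<std::string, Entry>;

// Hypothetical stand-in for a bottommost compaction. Without ingest-behind,
// a tombstone at the bottom has nothing older to hide and is dropped.
// With ingest-behind, older data may still arrive, so the tombstone stays.
Level CompactBottommost(const Level& input, bool allow_ingest_behind) {
  Level output;
  for (const auto& [key, entry] : input) {
    if (entry.type == EntryType::kDelete && !allow_ingest_behind) {
      continue;  // tombstone dropped
    }
    output.emplace(key, entry);
  }
  return output;
}

// Reads the existing data first, then falls through to the data that was
// ingested behind it.
std::optional<std::string> Get(const Level& existing,
                               const Level& ingested_behind,
                               const std::string& key) {
  if (auto it = existing.find(key); it != existing.end()) {
    if (it->second.type == EntryType::kDelete) return std::nullopt;
    return it->second.value;
  }
  if (auto it = ingested_behind.find(key); it != ingested_behind.end()) {
    if (it->second.type == EntryType::kDelete) return std::nullopt;
    return it->second.value;
  }
  return std::nullopt;
}
```

In this model, a Delete that survives compaction keeps hiding a Put for the same key that is ingested behind afterwards. If the Delete is dropped, the older Put becomes readable, which is the situation the linked issue describes.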

@facebook-github-bot

@cbi42 has imported this pull request. If you are a Meta employee, you can view this in D79016109.

@cbi42 cbi42 requested review from hx235 and pdillinger July 28, 2025 16:50
Comment on lines +439 to +440
// compaction rules. This is an optimization for outputting a put after
// a single delete.
Contributor

Just to clarify a bit more on why we output a put after a single delete instead of dropping them together: "... This is an optimization for outputting a put after outputting a single delete, because there's an earlier snapshot preventing the put and single delete from being dropped together."

Contributor Author

Added a reference to "Optimization 3"

Comment on lines +420 to +422
// Stores whether current_user_key_ is valid. If so, it stores the user key
// of the last key seen by the iterator.
// If false, treat the next key to read as a new user key.
Contributor

nit: Describes whether current_user_key_ is valid. If true, the current_user_key_ stores the user key of the last key seen by the iterator....

Comment on lines +892 to +895
// example: from new to old: SingleDelete, PUT, SingleDelete, PUT
// (ingested behind). If the older SingleDelete is dropped due to being
// covered by PUT, the PUT can be then compacted away with the new
// SingleDelete. The older PUT then incorrectly becomes visible.
Contributor

@hx235 hx235 Jul 28, 2025

N00b to SD here, with a few questions to make sure the comments are accessible to future n00bs.
Assuming, from new to old: SingleDelete2, PUT2, SingleDelete1, PUT1 (ingested behind)

(1) Under rule (A), "If the older SingleDelete is dropped due to being covered by PUT,"

  • Do you mean being covered by PUT2 or PUT1?

(2) "Then the PUT can be then compacted away with the new SingleDelete"

  • By "PUT", do you mean PUT2 or PUT1?
  • Can you be more explicit about why, otherwise (i.e., with the current behavior), the PUT can't be compacted away?

Contributor

@hx235 hx235 Jul 29, 2025

Now I understand after checking in offline: there can be compaction trying to compact PUT2 and SingleDelete1 even though SingleDelete1 is for PUT1. This is what "the older SingleDelete is dropped due to being covered by PUT" meant.

And then another compaction can compact and drop both SingleDelete2, PUT2. This is what "the PUT can be then compacted away with the new SingleDelete." meant.

@cbi42 If you can describe these two compaction events in a bit more detail and distinguish the "old PUT" from the "new PUT" in your comment, that will be perfect!
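
The two compaction events described above can be replayed in a small toy model. This is only a sketch, not the CompactionIterator logic: `History`, `CompactCoveredSingleDelete`, `CompactSingleDeletePutPairs`, and `OldPutVisibleAfterIngest` are hypothetical names, and the history holds the versions of a single user key ordered newest first.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Op { kPut, kSingleDelete };

struct Entry {
  Op op;
  int seqno;
  std::string value;
};

// Versions of ONE user key, ordered newest first.
using History = std::vector<Entry>;

// Event 1 (toy): a SingleDelete sitting directly beneath a newer Put is
// "covered" by that Put. Dropping it models the old behavior; keeping it
// models the behavior when tombstones are preserved.
History CompactCoveredSingleDelete(const History& in, bool preserve_tombstones) {
  History out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const bool covered = in[i].op == Op::kSingleDelete && i > 0 &&
                         in[i - 1].op == Op::kPut;
    if (covered && !preserve_tombstones) continue;
    out.push_back(in[i]);
  }
  return out;
}

// Event 2 (toy): a SingleDelete and the Put directly beneath it cancel out.
History CompactSingleDeletePutPairs(const History& in) {
  History out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i].op == Op::kSingleDelete && i + 1 < in.size() &&
        in[i + 1].op == Op::kPut) {
      ++i;  // skip the matching Put as well
      continue;
    }
    out.push_back(in[i]);
  }
  return out;
}

// Replays: SingleDelete2, PUT2, SingleDelete1 in the DB, then PUT1 is
// ingested behind. Returns true if a Put ends up visible to readers.
bool OldPutVisibleAfterIngest(bool preserve_tombstones) {
  History h = {{Op::kSingleDelete, 4, ""},   // SingleDelete2
               {Op::kPut, 3, "v2"},          // PUT2
               {Op::kSingleDelete, 2, ""}};  // SingleDelete1
  h = CompactCoveredSingleDelete(h, preserve_tombstones);
  h = CompactSingleDeletePutPairs(h);
  h.push_back({Op::kPut, 0, "v1"});  // PUT1, ingested behind (oldest)
  return !h.empty() && h.front().op == Op::kPut;
}
```

With `preserve_tombstones == false`, SingleDelete1 is dropped in the first event, SingleDelete2 and PUT2 cancel in the second, and PUT1 is then the only version left, so it becomes visible. With `preserve_tombstones == true`, SingleDelete1 survives both events and keeps PUT1 hidden.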

Contributor Author

Added more clarification.

  // comparison, so the value of has_current_user_key does not matter.
  has_current_user_key_ = false;
- if (compaction_ != nullptr &&
+ if (compaction_ != nullptr && !compaction_->allow_ingest_behind() &&
Contributor

For ikey_.type == kTypeValuePreferredSeqno, it also checks compaction_->KeyNotExistsBeyondOutputLevel().

Do you think it will encounter the same problem as deletion if the ikey_ is swapped with a lower seqno like 0 and collide with the later ingested-behind data?

Contributor Author

Good catch, I think it's possible that we swap the sequence number to zero (see the line "return kUnknownSeqnoBeforeAll;"). Added a comment here and in the option comment noting that TimedPut can fail ingestion behind.
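
A rough sketch of the collision being discussed, under two stated assumptions that are not spelled out in this thread: that data ingested behind is placed at sequence number zero, and that ingestion behind is rejected when an existing key already has sequence number zero. `SwapToPreferredSeqno` and `CanIngestBehind` are hypothetical names, not RocksDB functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Entry {
  std::uint64_t seqno;
  bool is_timed_put;
};

// Toy stand-in for the seqno swap: a TimedPut entry is rewritten as a plain
// value carrying its preferred sequence number, which may be zero.
Entry SwapToPreferredSeqno(Entry e, std::uint64_t preferred_seqno) {
  if (e.is_timed_put) {
    e.seqno = preferred_seqno;
    e.is_timed_put = false;
  }
  return e;
}

// Assumption: ingested-behind data needs seqno zero to itself, so any
// existing entry at seqno zero makes the ingestion fail.
bool CanIngestBehind(const std::vector<Entry>& db) {
  return std::none_of(db.begin(), db.end(),
                      [](const Entry& e) { return e.seqno == 0; });
}
```

In this model, a DB holding a TimedPut at seqno 5 accepts ingestion behind, but once that entry is swapped to a preferred seqno of zero, `CanIngestBehind` returns false. That matches the note that TimedPut can fail ingestion behind.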

hx235 commented Jul 28, 2025

Looking good so far. I will look at the test tomorrow. I need a little help understanding the SingleDelete nuances, as commented above. @cbi42

Comment on lines +2466 to +2470
ASSERT_OK(Delete(Key(7)));
ASSERT_OK(SingleDelete(Key(8)));
ASSERT_OK(db_->CompactRange(CompactRangeOptions(), nullptr, nullptr));
ASSERT_EQ(Get(Key(8)), "NOT_FOUND");
ASSERT_EQ(Get(Key(7)), "NOT_FOUND");
Contributor

nit: it might be helpful to add a comment clarifying the intention of Delete(Key(7)) and SingleDelete(Key(8)), like "// Test that SingleDelete overwritten by Put is not dropped", though the verification happens further down in the test.

I was slightly confused and mistakenly thought the Delete and SingleDelete were here to delete some existing keys and you verified this by ASSERT_EQ(Get(Key(8)), "NOT_FOUND"); and ASSERT_EQ(Get(Key(7)), "NOT_FOUND");

Contributor Author

Removed ASSERT_EQ(*, "NOT_FOUND") since they will be true regardless of the Delete operation. Added comment about how they are verified.

@hx235 hx235 left a comment

LGTM thanks!

@cbi42 cbi42 force-pushed the ingest-behind-cf branch from d81b342 to 4083790 Compare July 30, 2025 17:18
@cbi42 cbi42 left a comment

Thanks for the review!

@facebook-github-bot

@cbi42 merged this pull request in e7a4505.

Development

Successfully merging this pull request may close these issues.

allow_ingest_behind can drop tombstones

3 participants