[backport][v23.3.x] tx/compaction: don't deduplicate control batches #17100

Merged
merged 1 commit into from
Mar 15, 2024

@bharathv bharathv commented Mar 14, 2024

This is a select backport of commit from #16295

Fixes: #16679

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x

Release Notes

Bug Fixes

  • Retains control batches from transactions to preserve transaction boundaries. This prevents some (very unlikely) scenarios where aborted data is read.

Control batches (tx_commit, tx_abort) are special data batches that
indicate the completion of a transaction. Because these are technically
data batches, they participate in deduplication.

It was previously thought that deduplicating such records was safe as
long as we kept the last control batch in a segment. However, consider
the following sequence:

node 1:
1. begin tx 1, write data batch in tx 1
2. write many non-tx data batches, up to offset 100
3. abort tx 1
4. begin tx 2, write data batch in tx 2, abort tx 2 at offset 110
5. begin recovery, replicating data to node 2, sending records [0, 50]
6. roll segment

node 2:
7. receives data [0, 50], including [begin tx 1, data tx 1, data]

node 1:
8. self compact segment: since all abort records have the same key, only
   abort tx 2 is kept
9. continue recovery, replicating data to node 2, sending records in the
   range [50, 110], which no longer includes abort tx 1

node 2:
10. receives data [50, 110], and now has [begin tx 1, data tx 1, data,
    begin tx 2, data tx 2, abort tx 2]
11. roll segment
12. self compact segment: rm_stm on this node doesn't know that tx 1 is
    considered aborted
13. data tx 1 is left in the segment as committed

If instead, at step 8, we had kept [abort tx 1] as well, this wouldn't
have been an issue: node 2 would have replayed the abort and correctly
compacted away data tx 1.
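
The sequence above can be replayed with a small toy model. This is only a sketch under stated assumptions, not Redpanda's implementation: the `Batch` type, the shared `"ctrl"` key for abort markers, and the helpers `compact`, `replicate`, and `visible_tx_data` are all hypothetical names introduced for illustration, and the model ignores commit markers and the removal of aborted data on node 1.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Batch:
    offset: int
    kind: str                 # "data", "tx_data", or "abort" (hypothetical labels)
    key: str                  # compaction key; all abort markers share "ctrl" here
    tx: Optional[int] = None  # transaction id, if any


def compact(log: List[Batch], dedupe_control: bool) -> List[Batch]:
    """Keep the latest batch per key; optionally exempt control batches."""
    latest = {}
    for b in log:
        latest[b.key] = b.offset
    return [
        b for b in log
        if latest[b.key] == b.offset
        or (b.kind == "abort" and not dedupe_control)
    ]


def replicate(node1: List[Batch], dedupe_control: bool) -> List[Batch]:
    """Node 2 gets [0, 50] before node 1 compacts, and the rest afterwards."""
    first = [b for b in node1 if b.offset <= 50]
    second = [b for b in compact(node1, dedupe_control) if b.offset > 50]
    return first + second


def visible_tx_data(log: List[Batch]) -> List[Batch]:
    """Transactional data that the replica cannot tell was aborted."""
    aborted = {b.tx for b in log if b.kind == "abort"}
    return [b for b in log if b.kind == "tx_data" and b.tx not in aborted]


node1 = [
    Batch(0, "tx_data", "a", tx=1),
    Batch(50, "data", "b"),
    Batch(101, "abort", "ctrl", tx=1),
    Batch(109, "tx_data", "c", tx=2),
    Batch(110, "abort", "ctrl", tx=2),
]

# Old behaviour: abort tx 1 is deduplicated away, so tx 1 data survives on node 2.
leaked = visible_tx_data(replicate(node1, dedupe_control=True))
# New behaviour: both abort markers are retained, so nothing leaks.
clean = visible_tx_data(replicate(node1, dedupe_control=False))
```

In this model `leaked` contains the tx 1 data batch at offset 0, while `clean` is empty, which mirrors steps 8 through 13.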

This commit does this by disallowing the deduplication of control batches
entirely. These batches are only used in transactions, and are expected
to be uncommon. This happens in the `is_compactible()` method, which is
used at compaction time to determine whether the entire batch should be
kept without consulting further record-based maps.
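
As a rough illustration of that check, here is a minimal Python model. The real `is_compactible()` is part of Redpanda's C++ code and its signature is not shown in this PR, so the `BatchHeader` fields, the `COMPACTIBLE_TYPES` set, and the `should_keep` helper below are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchHeader:
    batch_type: str           # hypothetical type tag, e.g. "raft_data"
    is_control: bool = False  # assumed flag marking tx_commit / tx_abort batches


# Hypothetical set of batch types that take part in key-based deduplication.
COMPACTIBLE_TYPES = {"raft_data"}


def is_compactible(header: BatchHeader) -> bool:
    """Control batches are never deduplicated, whatever their batch type."""
    if header.is_control:
        return False
    return header.batch_type in COMPACTIBLE_TYPES


def should_keep(header: BatchHeader, offset: int, latest_offset_for_key: int) -> bool:
    """Non-compactible batches are kept whole, without consulting the key map."""
    if not is_compactible(header):
        return True
    return offset >= latest_offset_for_key


abort_marker = BatchHeader("raft_data", is_control=True)
plain_data = BatchHeader("raft_data")
```

With this shape, an older abort marker is kept even when a newer one with the same key exists, while ordinary data batches are still subject to deduplication.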

I'm explicitly doing it here rather than e.g. filtering control batches
when writing to the compaction index because we'll ultimately need to
filter here anyway: older versions of Redpanda will have these control
records in their compaction index.

(cherry picked from commit 4ce6552)
@bharathv bharathv changed the title from "tx/compaction: don't deduplicate control batches" to "[backport][v23.3.x] tx/compaction: don't deduplicate control batches" Mar 14, 2024
@bharathv bharathv requested a review from andrwng March 14, 2024 19:13
@bharathv bharathv merged commit 90d5066 into redpanda-data:v23.3.x Mar 15, 2024
17 of 19 checks passed
@bharathv bharathv deleted the tx_compact_233x_bp branch March 15, 2024 17:48
@BenPope BenPope added this to the v23.3.8 milestone Mar 15, 2024
@BenPope BenPope added the kind/backport PRs targeting a stable branch label Mar 15, 2024
Labels
area/redpanda kind/backport PRs targeting a stable branch