verifier/tx: explicit flush before aborting transaction #17773
Conversation
The Kafka client has an optimization where, if there is unflushed data while aborting a transaction, that data is silently dropped on the client side and only the abort request is sent. With verifiers we want the aborted data in the partition log so the test code invariants (like no aborted data is read etc) are actually doing meaningful work. This patch explicitly flushes the data before aborting the transaction, ensuring the data ends up in the log.
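The behavior described above can be illustrated with a small toy model. This is not Redpanda or Kafka client code (the class, method names, and "ABORT_MARKER" sentinel are all illustrative); it just sketches why aborting without a flush leaves the verifier's invariant with nothing to check:

```python
# Toy model (not Redpanda/Kafka code) of the client-side optimization the
# patch works around: records still sitting in the client's local queue
# are dropped on abort, so they never reach the partition log.

class TxProducer:
    def __init__(self, log):
        self.log = log      # stands in for the broker-side partition log
        self.queue = []     # unflushed records held client-side

    def produce(self, record):
        self.queue.append(record)

    def flush(self):
        # Deliver all locally queued records to the partition log.
        self.log.extend(self.queue)
        self.queue.clear()

    def abort(self):
        # Mimic the optimization: unflushed data is silently discarded
        # and only the abort marker reaches the log.
        self.queue.clear()
        self.log.append("ABORT_MARKER")


# Abort without flushing: the aborted record never reaches the log, so a
# verifier checking "no aborted data is read" has nothing to verify.
log_a = []
p = TxProducer(log_a)
p.produce("aborted-record")
p.abort()
assert log_a == ["ABORT_MARKER"]

# Explicit flush before abort (what the patch does): the record lands in
# the log followed by the abort marker, so the invariant does real work.
log_b = []
q = TxProducer(log_b)
q.produce("aborted-record")
q.flush()
q.abort()
assert log_b == ["aborted-record", "ABORT_MARKER"]
```

A read_committed consumer must skip the aborted record in the second log; that skip is exactly what the verifier exercises, and it only exercises it if the record is actually there.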
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/47639#018ecac0-5025-4658-8885-f1522a05f921
lgtm!
Previously, the TX_UNIQUE_KEYS workload would generate mostly non-compactable segments, so the metric check needed to be <=. This is not a bug but a known behaviour of Kafka + compaction: we need to preserve "tx abort" batches across compaction, and the workload generator would generate mostly only "tx abort" batches without any intermediate data. A recent PR, redpanda-data#17773, fixes this behaviour, so this test can be strict.
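A toy sketch of why those segments could not shrink. This is not Redpanda's compactor; it only assumes, as the comment above describes, that aborted data records may be compacted away while the "tx abort" control batches themselves must be preserved (all names here are illustrative):

```python
# Toy compaction sketch (not Redpanda's compactor): aborted data records
# can be discarded, but "tx abort" control batches must be preserved.

def compact(segment):
    """Drop compactable data entries, keep abort control batches."""
    return [entry for entry in segment if entry == "ABORT_MARKER"]

# Before the fix: segments held abort markers with no intermediate data,
# so compaction could not shrink them at all (before-size == after-size,
# hence the metric check had to be <=).
before = ["ABORT_MARKER", "ABORT_MARKER", "ABORT_MARKER"]
assert compact(before) == before

# After the fix: aborted data batches sit in the log between the markers,
# so compaction has real work to do and the segment shrinks.
after = [("k1", "v1"), "ABORT_MARKER", ("k2", "v2"), "ABORT_MARKER"]
assert len(compact(after)) < len(after)
```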
With verifiers we want the aborted data in the partition log so the test
code invariants (like no aborted data is read etc) are actually doing
meaningful work.
Makes sense to me. Was it the case that the verifiers weren't doing any work at all, or were they only sometimes doing work because flushing of the aborted data was a race (sometimes it happened, sometimes not)?
I think it's a race, but the probability of aborted data batches getting flushed seems quite low in ducktape (not really sure what is influencing it). We use a similar verifier in Chaos tests (which is where this is copied from) and I'm pretty sure we see aborted batches there, so it probably depends on how loaded the producer side of things is. We stumbled upon this because @andijcr was adding a metric check for compaction that ensures segments are actually getting compacted (based on the before vs after size, I believe), and he noticed that this particular test always fails the check. It turns out there were no aborted data batches in the segments (and the data is unique), so nothing was really getting compacted.
Backports Required
Release Notes