Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

verifier/tx: explicit flush before aborting transaction #17773

Merged
merged 1 commit into from
Apr 11, 2024

Conversation

bharathv
Copy link
Contributor

@bharathv bharathv commented Apr 10, 2024

Kafka client has this optimization where if there is unflushed data while aborting a transaction, it is silently dropped on the client side and only the abort request is sent.

With verifiers we want the aborted data in the partition log so the test code invariants (like no aborted data is read etc) are actually doing meaningful work.

This patch explicitly flushes the data before aborting the transaction to ensure data ends up in the log.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x

Release Notes

  • none

Kafka client has this optimization where if there is unflushed data
while aborting a transaction, it is silently dropped on the client side
and only the abort request is sent.

With verifiers we want the aborted data in the partition log so the test
code invariants (like no aborted data is read etc) are actually doing
meaningful work.

This patch explicitly flushes the data before aborting the transaction to
ensure data ends up in the log.
@vbotbuildovich
Copy link
Collaborator

@bharathv bharathv requested a review from andijcr April 11, 2024 03:05
Copy link
Contributor

@andijcr andijcr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

andijcr added a commit to andijcr/redpanda that referenced this pull request Apr 11, 2024
previously the TX_UNIQUE_KEYS workload would generate mostly non compactible
segments, so the metric check needed to be <=

this is not a bug rather a known behaviour of kafka + compaction: we
need to preserve "tx abort" batches across compaction, and the workload
generator would generate mostly only "tx abort" batches without any
intermediate data.

a recent pr redpanda-data#17773 fixes
this behavior so this test can be strict
Copy link
Member

@dotnwat dotnwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With verifiers we want the aborted data in the partition log so the test
code invariants (like no aborted data is read etc) are actually doing
meaningful work.

Makes sense to me. Was it the case that the verifiers weren't doing any work, or were they sometimes doing work because the aborted data flushing was a race--sometimes so, sometimes not?

@bharathv
Copy link
Contributor Author

Was it the case that the verifiers weren't doing any work, or were they sometimes doing work because the aborted data flushing was a race--sometimes so, sometimes not?

I think its a race but the probability of aborted data batches getting flushed seems quite low in ducktape (not really sure what is influencing it).. we use a similar verifier in Chaos tests (which is where this is copied from) and I'm pretty sure we see aborted batches there, so it really depends on how loaded the producer side of things are probably.

We stumbled up on this because @andijcr was adding a metric check for compaction that ensures that segments are actually getting compacted (based on the before vs after size I believe) and he noticed that this particular test always fails the check and turns out there are no aborted data batches in the segments (and the data is unique), so nothing is really getting compacted.

@bharathv bharathv merged commit 258ce03 into redpanda-data:dev Apr 11, 2024
18 checks passed
@bharathv bharathv deleted the fix_comp_verifier branch April 11, 2024 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants