Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transform/logging/test: deflake TransformLogManagerTest.LargeBuffer #16478

Merged
merged 1 commit into from
Feb 5, 2024

Conversation

rockwotj
Copy link
Contributor

@rockwotj rockwotj commented Feb 5, 2024

I saw a test flake in CI for another PR and dove in to debug this test
via: transform_logging_rpunit --gtest_repeat=1000 --gtest_break_on_failure

Running that I quickly saw failures, and after turning on logs and added
a couple logs of my own, I saw that in debug mode because the scheduler
can reorder tasks, that there is a concurrent flush running when we
advance our manual clock (from the LWM flush), so the clock advance
doesn't do anything. Fix that by always awaiting for pending tasks to
finish before triggering the next flush.

After performing this change, I can now run 1000 iterations of this
test on debug mode without failures.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

I saw a test flake in CI for another PR and dove in to debug this test
via: `transform_logging_rpunit --gtest_repeat=1000 --gtest_break_on_failure`

Running that I quickly saw failures, and after turning on logs and added
a couple logs of my own, I saw that in debug mode because the scheduler
can reorder tasks, that there is a concurrent flush running when we
advance our manual clock (from the LWM flush), so the clock advance
doesn't do anything. Fix that by always awaiting for pending tasks to
finish before triggering the next flush.

After performing this change, I can now run 1000 iterations of this
test on debug mode without failures.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Copy link
Member

@oleiman oleiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks!

@oleiman
Copy link
Member

oleiman commented Feb 5, 2024

@rockwotj - we'll need an independent backport for this (see #16420 (comment))

@rockwotj
Copy link
Contributor Author

rockwotj commented Feb 5, 2024

Ah right, will tick the box then (I see you did - thanks!)

@rockwotj
Copy link
Contributor Author

rockwotj commented Feb 5, 2024

CI failing due to flaky networking :/

@rockwotj
Copy link
Contributor Author

rockwotj commented Feb 5, 2024

This is a unit test fix only and unit tests pass (CI is borked due to network dependencies down) - merging.

@rockwotj rockwotj merged commit 81715f2 into redpanda-data:dev Feb 5, 2024
17 of 20 checks passed
@rockwotj rockwotj deleted the transform_logging_rpunit branch February 5, 2024 21:19
@vbotbuildovich
Copy link
Collaborator

/backport v23.3.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/redpanda area/wasm WASM Data Transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants