Assorted improvements to self-test #8695

Merged (10 commits) on Feb 8, 2023

Conversation

graphcareful

@graphcareful graphcareful commented Feb 7, 2023

This PR fixes some small bugs and adds some quality of life improvements to redpanda self-test such as:

  • Fixes a hang on shutdown caused by active self-test(s)
  • Fixes a bug that would start slightly more network test runs than necessary
  • Adds unit tests for the above fix
  • Fixes a bug where rpk sends an incorrect test duration to admin_server
  • Assorted rpk fixes, such as nicer terminal output and corrected startup defaults

Backports Required

  • none - not a bug fix
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v22.3.x
  • v22.2.x
  • v22.1.x

Release Notes

Bug Fixes

  • Fixes a hang on shutdown caused by active self-test(s)
  • Fixes a bug that would start slightly more network test runs than necessary
  • Fixes a bug where rpk sends an incorrect test duration to admin_server

- For when a node is up, but contains no cached results
- Reversing parallelism & duration members
- Fixes a bug where the netcheck benchmark would run in both directions,
i.e. given two nodes 0 & 1, the benchmark would run twice: once with one
machine as the client and the other as the server, then vice versa.

- The desired behavior of this method is to generate pairs of node ids
in which no pair occurs more than once, regardless of order.
- Test that the network_test_plan method for self-test produces unique
pairs of node identifiers.
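
The pairing rule described in the commits above can be sketched as follows. This is only an illustration, not the PR's `network_test_plan` implementation: `node_id` and `unique_pairs` are hypothetical names, and the input is assumed to contain no duplicate ids.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the cluster's node identifier type.
using node_id = std::int32_t;

// Returns every unordered pair of nodes exactly once. The first element of
// each pair could act as the client and the second as the server; the
// reversed pair (server, client) is never emitted.
// Assumption: `nodes` holds no duplicate ids.
std::vector<std::pair<node_id, node_id>>
unique_pairs(const std::vector<node_id>& nodes) {
    std::vector<std::pair<node_id, node_id>> plan;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        // Starting j at i + 1 skips self-pairs and already-seen pairs.
        for (std::size_t j = i + 1; j < nodes.size(); ++j) {
            plan.emplace_back(nodes[i], nodes[j]);
        }
    }
    return plan;
}
```

For n nodes this yields n(n-1)/2 runs rather than the n(n-1) produced when each pair is tested in both directions.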

@r-vasquez r-vasquez left a comment


rpk changes look good to me, however, I'll wait for the core review 😄 Thanks!


@dotnwat dotnwat left a comment


c++ bits lgtm

- The hang occurred because the code waited for all outstanding work to
leave the gate before stop() was called on any jobs.

- The solution is to call close() on the gate without waiting on it, then
return the future that close() returned.
- The benchmarks were catching soft errors such as
`benchmark_aborted` (only thrown on stop()) and logging them at error
level, making ducktape tests fail when things were working as expected.
These exceptions are now logged at debug level.

- There is already a catch-all exception handler in self_test_backend.cc
that logs at warn. This can catch the hard errors and log at error. There
is no need to have the tests intercept and rethrow.

- Also handle possible gate_closed_exceptions so they don't percolate as
hard errors.
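
The soft/hard error split described in the commits above could look roughly like the sketch below. It is written in plain standard C++ under assumptions: `benchmark_aborted` and `gate_closed` here are hypothetical stand-in types (not the real Redpanda or Seastar classes), and `log_level` and `level_for` are invented for illustration.

```cpp
#include <exception>
#include <stdexcept>

// Hypothetical stand-ins for the exception types named in the commit message.
struct benchmark_aborted : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct gate_closed : std::runtime_error {
    using std::runtime_error::runtime_error;
};

enum class log_level { debug, warn, error };

// Maps an in-flight exception to the level it should be logged at.
// Soft errors (expected during stop() or shutdown) go to debug; anything
// else is treated as a hard error and left for a catch-all handler.
log_level level_for(std::exception_ptr ep) {
    try {
        std::rethrow_exception(ep);
    } catch (const benchmark_aborted&) {
        return log_level::debug; // raised only when the test was stopped
    } catch (const gate_closed&) {
        return log_level::debug; // shutdown raced with new work
    } catch (...) {
        return log_level::error; // unexpected: surface loudly
    }
}
```

Keeping the classification in one place means individual benchmarks do not need to intercept and rethrow exceptions themselves.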
@dotnwat dotnwat merged commit 9074c16 into redpanda-data:dev Feb 8, 2023