Assorted improvements to self-test #8695

graphcareful · 2023-02-07T20:27:23Z

This PR fixes some small bugs and adds some quality-of-life improvements to redpanda self-test, such as:

- Fixes a hang on shutdown caused by active self-test(s)
- Fixes a bug that would start slightly more network test runs than necessary
- Adds unit tests for the above fix
- Fixes a bug where rpk sends an incorrect test duration to admin_server
- Assorted rpk fixes, such as nicer terminal output and corrected startup defaults

Backports Required

Release Notes

Bug Fixes

- Fixes a hang on shutdown caused by active self-test(s)
- Fixes a bug that would start slightly more network test runs than necessary
- Fixes a bug where rpk sends an incorrect test duration to admin_server

- Adds informational output for when a node is up but contains no cached results

- Fixes an incorrect print that had the parallelism and duration members reversed

- Fixes a bug where the netcheck benchmark would run in both directions: given two nodes 0 and 1, the benchmark would run twice, once with each machine as the client and the other as the server.
- The desired behavior of this method is to generate pairs of node ids where no pair occurs more than once.

- Test that the network_test_plan method for self-test produces unique pairs of node identifiers.
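A minimal sketch of the intended pairing, assuming the input node ids are distinct. `make_unique_pairs` and `node_id` are hypothetical names, not the real `network_test_plan`; any batching of pairs for parallelism that the real code may do is omitted. Pairing each node only with nodes later in the list yields every unordered pair exactly once, which is the property the unit test above checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical node identifier type, for illustration only.
using node_id = std::int32_t;

// Sketch: generate each unordered pair of nodes exactly once. Pairing
// nodes[i] only with nodes[j] for j > i yields (0, 1) but never (1, 0),
// so no benchmark runs in both directions between the same two nodes.
std::vector<std::pair<node_id, node_id>>
make_unique_pairs(const std::vector<node_id>& nodes) {
    std::vector<std::pair<node_id, node_id>> plan;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        for (std::size_t j = i + 1; j < nodes.size(); ++j) {
            plan.emplace_back(nodes[i], nodes[j]);  // (client, server)
        }
    }
    // n nodes produce n * (n - 1) / 2 pairs instead of n * (n - 1).
    assert(plan.size() == nodes.size() * (nodes.size() - 1) / 2);
    return plan;
}
```

With four nodes this yields six pairs rather than twelve.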

r-vasquez

rpk changes look good to me, however, I'll wait for the core review 😄 Thanks!

dotnwat

c++ bits lgtm

src/v/cluster/self_test_backend.cc

- The hang occurred because the code waited for all outstanding work to leave the gate before calling stop() on any of the jobs, and running jobs only leave the gate once they are stopped.
- The solution is to call close() without waiting on it, stop the jobs, and then return the future returned by close().
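The shutdown ordering above can be sketched with standard C++ threads. Redpanda itself uses Seastar's future-based gate; `simple_gate`, `job`, and `run_and_shutdown` below are hypothetical stand-ins, not the self-test backend's code. The point is the order: `close()` returns a future immediately, the jobs are stopped, and only then is that future waited on. Waiting on it before stopping the jobs would block forever, since each job leaves the gate only after it has been stopped.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal gate: counts in-flight work and lets a caller wait for
// that work to drain. Illustrative only; this is not Seastar's gate API.
class simple_gate {
public:
    // Registers one unit of work; fails once the gate is closed.
    bool enter() {
        std::lock_guard<std::mutex> lk(mu_);
        if (closed_) {
            return false;
        }
        ++count_;
        return true;
    }

    // Marks one unit of work as finished and wakes any waiter at zero.
    void leave() {
        std::lock_guard<std::mutex> lk(mu_);
        assert(count_ > 0);
        if (--count_ == 0) {
            cv_.notify_all();
        }
    }

    // Closes the gate and returns a future that becomes ready once all
    // in-flight work has left. The caller is not blocked here.
    std::future<void> close() {
        {
            std::lock_guard<std::mutex> lk(mu_);
            closed_ = true;
        }
        return std::async(std::launch::async, [this] {
            std::unique_lock<std::mutex> lk(mu_);
            cv_.wait(lk, [this] { return count_ == 0; });
        });
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::size_t count_ = 0;
    bool closed_ = false;
};

// Hypothetical long-running benchmark: it leaves the gate only after stop().
struct job {
    std::atomic<bool> aborted{false};

    void run(simple_gate& gate) {
        while (!aborted.load()) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        gate.leave();
    }

    void stop() { aborted.store(true); }
};

// Starts n_jobs jobs inside the gate, then shuts down in the fixed order:
// close() without waiting, stop() every job, then wait on close()'s future.
bool run_and_shutdown(std::size_t n_jobs) {
    simple_gate gate;
    std::vector<job> jobs(n_jobs);
    std::vector<std::thread> threads;
    for (auto& j : jobs) {
        gate.enter();  // cannot fail: the gate is still open here
        threads.emplace_back([&gate, &j] { j.run(gate); });
    }

    std::future<void> drained = gate.close();  // do not wait yet
    for (auto& j : jobs) {
        j.stop();
    }
    drained.wait();  // ready because every stopped job has left the gate

    for (auto& t : threads) {
        t.join();
    }
    return true;
}
```

Swapping the order, i.e. calling `gate.close().wait()` before the `stop()` loop, reproduces the hang: the waiter blocks on jobs that are themselves waiting to be stopped.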

- The benchmarks were catching soft errors such as `benchmark_aborted` (thrown only on stop()) and logging them at error level, which made ducktape tests fail when things were working as expected. These exceptions are now logged at debug level.
- There is already a catch-all exception handler in self_test_backend.cc that logs at warn. This handler can catch the hard errors, so there is no need for the tests to intercept and rethrow them.
- Possible gate_closed_exceptions are also handled so they don't percolate as hard errors.
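A sketch of the log-level policy described above. The exception names `benchmark_aborted` and `gate_closed_exception` come from the PR, but the definitions here are stand-ins, and `run_and_classify` and `log_level` are hypothetical. Expected shutdown errors map to debug, and everything else falls through to a single catch-all reported at warn, mirroring the existing handler in self_test_backend.cc.

```cpp
#include <functional>
#include <optional>
#include <stdexcept>

// Stand-in exception types; the real ones live in redpanda / Seastar.
struct benchmark_aborted : std::runtime_error {
    benchmark_aborted() : std::runtime_error("benchmark aborted") {}
};
struct gate_closed_exception : std::runtime_error {
    gate_closed_exception() : std::runtime_error("gate closed") {}
};

enum class log_level { debug, warn };

// Runs a benchmark body and reports the level at which its failure would be
// logged, or std::nullopt if it completed without throwing.
std::optional<log_level> run_and_classify(const std::function<void()>& body) {
    try {
        body();
        return std::nullopt;
    } catch (const benchmark_aborted&) {
        // Expected when stop() is called: not a real failure.
        return log_level::debug;
    } catch (const gate_closed_exception&) {
        // Work raced with shutdown: also expected.
        return log_level::debug;
    } catch (...) {
        // Hard errors are left to one catch-all handler.
        return log_level::warn;
    }
}
```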

graphcareful added 8 commits January 31, 2023 19:28

rptest: Remove only_conn flag from self-test start

9425cf4

rpk: Modify self-test default duration to 30s

c1b161d

rpk: Forwarding incorrect duration_ms to selftest

225c490

rpk: Add missing newline to self-test start msg

66a28c1

rpk: Add info for empty self-test status result

b62843c

cluster: Fix incorrect print of member vars

233fcd0

cluster/test: Unit tests for netcheck plan fn

2a4e8b8

graphcareful requested review from dotnwat, jcsp, andrwng and r-vasquez February 7, 2023 20:27

graphcareful requested review from twmb and 0x5d as code owners February 7, 2023 20:27

github-actions bot added the area/redpanda label Feb 7, 2023

graphcareful removed the request for review from twmb February 7, 2023 20:27

github-actions bot added the area/rpk label Feb 7, 2023

graphcareful removed the request for review from 0x5d February 7, 2023 20:27

r-vasquez reviewed Feb 7, 2023

dotnwat reviewed Feb 7, 2023

src/v/cluster/self_test_backend.cc (outdated, resolved)

graphcareful added 2 commits February 7, 2023 17:19

self_test: Fix bug where shutdown would hang

1bb3301

graphcareful force-pushed the self-test-nits branch from 31bf096 to 2831339 Compare February 7, 2023 22:19

dotnwat approved these changes Feb 7, 2023

r-vasquez approved these changes Feb 7, 2023

andrwng approved these changes Feb 7, 2023

dotnwat merged commit 9074c16 into redpanda-data:dev Feb 8, 2023