rptest: fix cluster healthy check #16337
example test run in description, ready for review
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44395#018d4972-da49-4b32-8bd2-259f1738a09a
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44858#018d8660-433f-45d5-af8f-765ec59e3e51
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44858#018d8671-0d59-4053-9180-fb87c8571dd8
# Under-replicated partitions (0): []

lines = ret.decode().splitlines()
self.logger.debug(f'rpk cluster health lines: {lines}')
feedback from travis: compare this health check with the one that is performed in the bare-metal case
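For reference, a minimal sketch of the kind of line-based check the snippet above feeds into. It assumes the `rpk cluster health` output contains a `Healthy:` line whose value is `true` or `false`; the sample text and the helper name `parse_cluster_healthy` are assumptions for illustration, not the code in this PR.

```python
def parse_cluster_healthy(raw: bytes) -> bool:
    """Return True if the (assumed) 'Healthy:' line reports true."""
    for line in raw.decode().splitlines():
        # Split "Key:   value" at the first colon only.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "healthy":
            return value.strip().lower() == "true"
    # No 'Healthy:' line found: treat as unhealthy rather than guess.
    return False


# Assumed sample output; the real format may differ between rpk versions.
SAMPLE = b"""CLUSTER HEALTH OVERVIEW
=======================
Healthy:                         true
Unhealthy reasons:               []
Leaderless partitions (0):       []
Under-replicated partitions (0): []
"""

print(parse_cluster_healthy(SAMPLE))
```

Returning False when the line is missing keeps the check conservative: an unexpected output format fails the assertion instead of passing silently.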
@@ -478,6 +478,8 @@ def get_broker_pod(self):

    @cluster(num_nodes=2)
    def test_throughput_simple(self):
        assert self.redpanda.cluster_healthy(
These need to go in one place which will be called automatically before and after the test. Maybe the "before" check in make_redpanda_service_cloud? and the "after" check in tearDown?
since make_redpanda_service_cloud() looks to be called in just the __init__() method, i think a more suitable place would be setup(), which is called before every test_* method, and failures in setup() will fail the test.
it looks like ducktape won't fail a test if there is a failure in the tearDown() method, however. it will just log a warning: https://github.com/redpanda-data/ducktape/blob/62e0285f6b3a2f22fd4a43b5fdbc13be8d4290d9/ducktape/tests/runner_client.py#L295
so, i'll leave it at the end of each individual test unless there is a more suitable place.
> it looks like ducktape won't fail a test if there is a failure in the tearDown() method, however. it will just log a warning:
Ah, that explains something that John had written in our variation of the @cluster decorator: they use it exactly to avoid this problem with tearDown.
So let's do this for now but we can switch to a decorator soon enough.
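A rough sketch of what such a decorator could look like, assuming only that the test object exposes `self.redpanda.cluster_healthy()` as in this PR. The decorator name `with_health_checks` and the `FakeRedpanda` and `ExampleTest` classes are hypothetical stand-ins for illustration; this is not the existing `@cluster` decorator.

```python
import functools


def with_health_checks(test_method):
    """Hypothetical decorator: check cluster health before and after a test."""
    @functools.wraps(test_method)
    def wrapper(self, *args, **kwargs):
        assert self.redpanda.cluster_healthy(), "cluster unhealthy before test"
        result = test_method(self, *args, **kwargs)
        # This runs as part of the test call itself, so a failure here fails
        # the test instead of being logged as a tearDown() warning.
        assert self.redpanda.cluster_healthy(), "cluster unhealthy after test"
        return result
    return wrapper


class FakeRedpanda:
    """Stand-in for the cloud service; returns scripted health results."""
    def __init__(self, results):
        self._results = list(results)

    def cluster_healthy(self):
        return self._results.pop(0)


class ExampleTest:
    def __init__(self, results):
        self.redpanda = FakeRedpanda(results)

    @with_health_checks
    def test_throughput_simple(self):
        return "ran"
```

Note that in this sketch the "after" check is skipped when the test body itself raises, which seems acceptable since the test has already failed at that point.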
tests/rptest/redpanda_cloud_tests/simple_redpanda_service_cloud_test.py
return True
break

return False
This seems good for now, but ideally we want the unhealthy reasons in the error. A variation of this which threw an exception could include that info.
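One possible shape for the exception-throwing variation, as a sketch only. It assumes the health output has `Healthy:` and `Unhealthy reasons:` lines; the names `ClusterUnhealthyError` and `assert_cluster_healthy` are hypothetical and not part of this PR.

```python
class ClusterUnhealthyError(Exception):
    """Hypothetical exception carrying the reported unhealthy reasons."""
    def __init__(self, reasons):
        super().__init__(f"cluster unhealthy: {reasons}")
        self.reasons = reasons


def assert_cluster_healthy(lines):
    """Raise ClusterUnhealthyError unless a 'Healthy: true' line is present."""
    fields = {}
    for line in lines:
        # Split "Key:   value" at the first colon only.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    if fields.get("healthy", "").lower() != "true":
        # Assumed field name; falls back to "unknown" if it is absent.
        raise ClusterUnhealthyError(fields.get("unhealthy reasons", "unknown"))
```

A boolean wrapper could still be kept for callers that only need `True` or `False`, by catching the exception and returning `False`.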
See inline comments.
1ba5ae3 to 620d9b9
@travisdowns PR updated; ready for review
i added commit acc2b01 because it looks like it can be used to satisfy https://github.com/redpanda-data/core-internal/issues/1013
new failures in https://buildkite.com/redpanda/redpanda/builds/44755#018d7f15-e115-4465-870c-3c3ec48a5043
LGTM
/ci-repeat
1f61f73 to 9b151dc
rebased with tip of |
lgtm
9b151dc to 367c7ea
rebased again with tip of |
lgtm
367c7ea to 7c2c448
i had to redo the rebase because it looks like the previous rebase to 367c7ea was missing 4 commits (not sure why). i've also re-run the tests and updated the output in the PR description to verify.
Fixes https://github.com/redpanda-data/core-internal/issues/1003

New method RedpandaServiceCloud.cluster_healthy() according to instructions on ensuring cluster is healthy.

Added assertions with cluster_healthy() to the start and end of the heavy tests that target cloud clusters.

Also added a simple self test for the new method:

output:
Backports Required
Release Notes