TimeoutError in NodesDecommissioningTest.test_multiple_decommissions #7816

Closed
VladLazar opened this issue Dec 16, 2022 · 2 comments · Fixed by #7789
Labels: area/raft, ci-failure, kind/bug (Something isn't working), sev/low (Bugs which are non-functional paper cuts, e.g. typos, issues in log messages)

Comments

@VladLazar
Contributor

Noticed the failure in https://buildkite.com/redpanda/redpanda/builds/19894 of #7815 (which only touched unit test code). The failing test seems to have been recently added (ba96642).

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/nodes_decommissioning_test.py", line 471, in test_multiple_decommissions
    wait_until(lambda: self._partitions_moving(),
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError

There have been a number of other instances of this failure in the last 24 hours:

FAIL test: NodesDecommissioningTest.test_multiple_decommissions (4/21 runs)
  failure at 2022-12-16T06:46:20.056Z: TimeoutError('')
      on (arm64, container) in job https://buildkite.com/redpanda/redpanda/builds/19869#01851972-b053-4bfb-adea-153b3555e97a
  failure at 2022-12-16T07:15:01.813Z: TimeoutError('')
      on (arm64, container) in job https://buildkite.com/redpanda/redpanda/builds/19872#0185198c-a04d-406d-ba7c-71f3d65f7e1b
  failure at 2022-12-16T13:35:27.534Z: TimeoutError('')
      on (amd64, container) in job https://buildkite.com/redpanda/redpanda/builds/19883#01851b00-016f-45cf-b01d-85dada2a8ff0
  failure at 2022-12-16T06:41:11.094Z: TimeoutError('')
      on (amd64, container) in job https://buildkite.com/redpanda/redpanda/builds/19872#01851989-6253-4e43-b511-8568f0840b9e
@mmaslankaprv
Member

working on it

mmaslankaprv added a commit to mmaslankaprv/redpanda that referenced this issue Dec 19, 2022
Nodes decommissioning test was waiting for partition to begin moving. In
some cases all the partitions might have been moved before the wait
triggered leading to test failure. Removed waiting for decommissioning
status and partitions moving in favor of waiting for the node to be
removed from the cluster which is the end result of decommissioning
operation.

Fixes: redpanda-data#7816

Signed-off-by: Michal Maslanka <michal@redpanda.com>
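
The race described in the commit message above can be sketched roughly as follows. This is only an illustration, not the actual test code: `FakeCluster`, `flaky_check`, and `robust_check` are hypothetical names, and the local `wait_until` is a simplified stand-in for ducktape's helper that raises Python's built-in `TimeoutError` rather than `ducktape.errors.TimeoutError`. The fake cluster is assumed to finish all partition moves before the first poll, which is the case the commit message says led to the failure.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.01, err_msg=""):
    """Simplified stand-in: poll `condition` until it is true or time runs out."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    raise TimeoutError(err_msg)


class FakeCluster:
    """Hypothetical cluster whose partition moves complete instantly."""

    def __init__(self, node_ids):
        self.node_ids = set(node_ids)
        self.moving_partitions = 0

    def decommission(self, node_id):
        # Assumption: moves start and finish before any caller can poll,
        # so the "moving" state is never observable from outside.
        self.moving_partitions = 0
        self.node_ids.discard(node_id)

    def partitions_moving(self):
        return self.moving_partitions > 0

    def node_removed(self, node_id):
        return node_id not in self.node_ids


def flaky_check(cluster, node_id):
    cluster.decommission(node_id)
    # Waits on a transient state that may already be over: can time out.
    wait_until(cluster.partitions_moving, timeout_sec=0.1,
               err_msg="partitions were never observed moving")


def robust_check(cluster, node_id):
    cluster.decommission(node_id)
    # Waits on the terminal state, which stays true once reached.
    wait_until(lambda: cluster.node_removed(node_id), timeout_sec=0.1,
               err_msg=f"node {node_id} was not removed from the cluster")
```

Under these assumptions `flaky_check` times out while `robust_check` returns promptly. Passing a non-empty `err_msg` also avoids the uninformative `TimeoutError('')` seen in the failure list.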
@jcsp added the sev/low and area/raft labels Dec 19, 2022
mmaslankaprv added a commit to mmaslankaprv/redpanda that referenced this issue Dec 19, 2022
Nodes decommissioning test was waiting for partition to begin moving. In
some cases all the partitions might have been moved before the wait
triggered leading to test failure. Removed waiting for decommissioning
status and partitions moving in favor of waiting for the node to be
removed from the cluster which is the end result of decommissioning
operation.

Fixes: redpanda-data#7816

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit b299a46)
andrwng pushed a commit to mmaslankaprv/redpanda that referenced this issue Dec 19, 2022
Nodes decommissioning test was waiting for partition to begin moving. In
some cases all the partitions might have been moved before the wait
triggered leading to test failure. Removed waiting for decommissioning
status and partitions moving in favor of waiting for the node to be
removed from the cluster which is the end result of decommissioning
operation.

Fixes: redpanda-data#7816

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit b299a46)
mmaslankaprv added a commit to mmaslankaprv/redpanda that referenced this issue Dec 23, 2022
Nodes decommissioning test was waiting for partition to begin moving. In
some cases all the partitions might have been moved before the wait
triggered leading to test failure. Removed waiting for decommissioning
status and partitions moving in favor of waiting for the node to be
removed from the cluster which is the end result of decommissioning
operation.

Fixes: redpanda-data#7816

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit b299a46)
4 participants