
CI Failure (Timeout - Failed to start) in MultiTopicAutomaticLeadershipBalancingTest.test_topic_aware_rebalance #11044

Closed
r-vasquez opened this issue May 25, 2023 · 16 comments · Fixed by #11472
Assignees
Labels
ci-failure kind/bug Something isn't working

Comments

@r-vasquez

https://buildkite.com/redpanda/redpanda/builds/29920#0188548a-60df-4a2e-97f8-5a8375161457/6-5906

Module: rptest.tests.leadership_transfer_test
Class:  MultiTopicAutomaticLeadershipBalancingTest
Method: test_topic_aware_rebalance
====================================================================================================
test_id:    rptest.tests.leadership_transfer_test.MultiTopicAutomaticLeadershipBalancingTest.test_topic_aware_rebalance
status:     FAIL
run time:   2 minutes 38.155 seconds


    TimeoutError('Redpanda service docker-rp-8 failed to start within 20 sec')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 49, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/leadership_transfer_test.py", line 189, in test_topic_aware_rebalance
    self.redpanda.start_node(node)
  File "/root/tests/rptest/services/redpanda.py", line 1796, in start_node
    self.start_service(node, start_rp)
  File "/root/tests/rptest/services/redpanda.py", line 1868, in start_service
    start()
  File "/root/tests/rptest/services/redpanda.py", line 1787, in start_rp
    wait_until(
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: Redpanda service docker-rp-8 failed to start within 20 sec
@r-vasquez added the kind/bug (Something isn't working) and ci-failure labels on May 25, 2023
@NyaliaLui

FAIL test: MultiTopicAutomaticLeadershipBalancingTest.test_topic_aware_rebalance (6/30 runs)
  failure at 2023-05-26T07:02:05.442Z: TimeoutError('Redpanda service docker-rp-4 failed to start within 20 sec')
      on (amd64, container) in job https://buildkite.com/redpanda/redpanda/builds/29952#018856a6-00da-403c-8eb9-66fb914026d1

@abhijat

abhijat commented May 30, 2023

https://buildkite.com/redpanda/redpanda/builds/30108#018868e9-8cfa-4239-a449-fa4c1bc3d5c4

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 49, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/leadership_transfer_test.py", line 189, in test_topic_aware_rebalance
    self.redpanda.start_node(node)
  File "/root/tests/rptest/services/redpanda.py", line 2144, in start_node
    self.start_service(node, start_rp)
  File "/root/tests/rptest/services/redpanda.py", line 2216, in start_service
    start()
  File "/root/tests/rptest/services/redpanda.py", line 2135, in start_rp
    wait_until(
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: Redpanda service docker-rp-11 failed to start within 20 sec

@michael-redpanda

@ballard26

ballard26 commented Jun 4, 2023

In https://buildkite.com/redpanda/redpanda/builds/30231#01886fa2-9b7b-4
Redpanda's process starts at 03:25:15, the Admin API service starts at 03:25:33, and the test makes its last attempt to ping the RP instance at 03:25:33. That is about 18 seconds from process start to Admin API start, right at the edge of the 20 sec start timeout, so it looks like Redpanda is starting slower than normal and racing with the test. I'm not seeing anything specific to this test failing so far.
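
To illustrate the race described above, here is a minimal sketch of a poll-with-deadline loop. The `wait_until` below is a simplified stand-in written for this example, not ducktape's implementation, and `FakeClock` and `started_within` are hypothetical helpers that simulate time so the result is deterministic. The 18 sec startup and 20 sec timeout come from the timestamps and error message in this issue; the 21 sec and 60 sec values are assumptions chosen only to show both outcomes.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg="timed out",
               clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` until it returns True or `timeout_sec` elapses.

    Simplified stand-in for a test-framework wait helper (an assumption,
    not ducktape's actual code).
    """
    deadline = clock() + timeout_sec
    while True:
        if condition():
            return
        if clock() >= deadline:
            raise TimeoutError(err_msg)
        sleep(backoff_sec)


class FakeClock:
    """Hypothetical simulated clock: sleeping just advances `now`."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def started_within(startup_sec, timeout_sec, backoff_sec=1.0):
    """Return True if a service that becomes ready after `startup_sec`
    is seen as started before the `timeout_sec` deadline."""
    clock = FakeClock()
    try:
        wait_until(lambda: clock.time() >= startup_sec,
                   timeout_sec,
                   backoff_sec,
                   err_msg=f"failed to start within {timeout_sec} sec",
                   clock=clock.time,
                   sleep=clock.sleep)
        return True
    except TimeoutError:
        return False


if __name__ == "__main__":
    print(started_within(startup_sec=18, timeout_sec=20))  # inside the deadline
    print(started_within(startup_sec=21, timeout_sec=20))  # just past the deadline
    print(started_within(startup_sec=21, timeout_sec=60))  # longer timeout absorbs the slow start
```

With only a couple of seconds of headroom, a small slowdown in startup is enough to flip the result, which would be consistent with the failure showing up on different nodes and not pointing at this test in particular.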

@piyushredpanda

piyushredpanda commented Jun 4, 2023

Yeah, possibly tied to the clang-16 upgrade, or something like that. A bunch of tests are in this state.

@andijcr

andijcr commented Jun 5, 2023

@andijcr

andijcr commented Jun 7, 2023

@rystsov

rystsov commented Jun 8, 2023

@michael-redpanda

@michael-redpanda

@abhijat

abhijat commented Jun 15, 2023

@twmb

twmb commented Jun 15, 2023

@michael-redpanda


9 participants