CI Failure (`assert len(orig_hwms) == 4` AssertionError) in `DisablingPartitionsTest.test_disable` #15949

ztlpn · 2024-01-04T18:21:20Z

https://buildkite.com/redpanda/redpanda/builds/43406#018cd484-b3d2-48d0-8ad3-11f831c77b24

Module: rptest.tests.recovery_mode_test
Class:  DisablingPartitionsTest
Method: test_disable

test_id:    rptest.tests.recovery_mode_test.DisablingPartitionsTest.test_disable
status:     FAIL
run time:   13.022 seconds


    AssertionError()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 82, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/recovery_mode_test.py", line 418, in test_disable
    assert len(orig_hwms) == 4
AssertionError

ztlpn · 2024-01-04T18:23:08Z

This was a PR build, but the error is unrelated to the PR. Here is what `rpk topic describe` returned:

[DEBUG - 2024-01-04 13:32:42,106 - rpk - _execute - lineno:1043]: Executing command: ['/var/lib/buildkite-agent/builds/buildkite-amd64-xfs-builders-i-05e95862a86342792-1/redpanda/redpanda/vbuild/redpanda_installs/ci/bin/rpk', 'topic', '-X', 'brokers=docker-rp-5:9092,docker-rp-16:9092,docker-rp-17:9092,docker-rp-8:9092', 'describe', 'mytopic2', '-p', '-v']
[DEBUG - 2024-01-04 13:32:42,141 - rpk - _execute - lineno:1063]: 
PARTITION  LEADER  EPOCH  REPLICAS  LOG-START-OFFSET  HIGH-WATERMARK
0          1       -1     [1 3 4]   0                 474
1          3       1      [1 2 3]   0                 526

Leader epoch for partition mytopic2/0 is -1 even though there weren't any disruptive events that could have caused that.
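
For anyone triaging similar runs, here is a minimal sketch of how the table above could be scanned for partitions that report a negative leader epoch. The regex and the helper names (`parse_partitions`, `partitions_without_epoch`) are assumptions written for the six columns shown in this output; they are not taken from the rptest `rpk` wrapper.

```python
import re

# Output copied from the `rpk topic describe mytopic2 -p -v` run above.
SAMPLE = """\
PARTITION  LEADER  EPOCH  REPLICAS  LOG-START-OFFSET  HIGH-WATERMARK
0          1       -1     [1 3 4]   0                 474
1          3       1      [1 2 3]   0                 526
"""

# Hypothetical row pattern for the six columns above. LEADER and EPOCH
# may be negative, so both accept an optional leading minus sign.
ROW = re.compile(
    r"^(?P<partition>\d+)\s+(?P<leader>-?\d+)\s+(?P<epoch>-?\d+)\s+"
    r"\[(?P<replicas>[\d ]*)\]\s+(?P<start>\d+)\s+(?P<hwm>\d+)\s*$"
)


def parse_partitions(text):
    """Return one dict per data row; the header and unknown lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue
        rows.append({
            "partition": int(m["partition"]),
            "leader": int(m["leader"]),
            "epoch": int(m["epoch"]),
            "replicas": [int(r) for r in m["replicas"].split()],
            "log_start_offset": int(m["start"]),
            "high_watermark": int(m["hwm"]),
        })
    return rows


def partitions_without_epoch(rows):
    """Return the partition ids whose reported leader epoch is negative."""
    return [r["partition"] for r in rows if r["epoch"] < 0]


rows = parse_partitions(SAMPLE)
print(partitions_without_epoch(rows))
```

Note that a row pattern accepting only non-negative epochs would silently drop partition 0 here, so the optional minus sign matters when the goal is to detect this state.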

rockwotj · 2024-01-06T04:43:55Z

Related? #15972

When a partition is first created in Redpanda, some cluster nodes that do not host replicas of that partition may not yet have leadership metadata. Redpanda still has to return partition metadata in this case. To avoid disturbing the client (returning -1 as the leader id may cause some clients to stop), Redpanda has to return a leader id. If the information is not present, we always return the first node from the replica set with a leader epoch of 0. This way the client will either communicate with the actual leader or issue a metadata request to another node that may have up-to-date information.

Fixes: redpanda-data#15949

Signed-off-by: Michal Maslanka <michal@redpanda.com>
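
The fallback described in this commit message can be sketched roughly as follows. This is an illustration in Python, not the actual Redpanda code: `LeaderInfo`, `resolve_leader`, and the `known_leaders` mapping are hypothetical names, and raising on an empty replica set is an assumption, since the commit message does not cover that case.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LeaderInfo:
    leader_id: int
    leader_epoch: int


def resolve_leader(
    partition: str,
    replicas: List[int],
    known_leaders: Dict[str, LeaderInfo],
) -> LeaderInfo:
    """Pick the leader to report in a metadata response.

    If this node already knows the leader, report it. Otherwise guess the
    first replica with leader epoch 0, so the client is never handed -1.
    """
    info = known_leaders.get(partition)
    if info is not None:
        return info
    if not replicas:
        # Assumption: an empty replica set is treated as an error here.
        raise ValueError(f"no replicas known for {partition}")
    return LeaderInfo(leader_id=replicas[0], leader_epoch=0)


# A node with no leadership metadata for mytopic2/0 guesses replica 1.
print(resolve_leader("mytopic2/0", [1, 3, 4], {}))
```

If the guess is wrong, the client is expected to recover as the commit message describes, by refreshing metadata from another node.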

ztlpn · 2024-01-08T12:52:34Z

> Related? #15972

yep

When a partition is first created in Redpanda, some cluster nodes that do not host replicas of that partition may not yet have leadership metadata. Redpanda still has to return partition metadata in this case. To avoid disturbing the client (returning -1 as the leader id may cause some clients to stop), Redpanda has to return a leader id. If the information is not present, we always return the first node from the replica set with a leader epoch of 0. This way the client will either communicate with the actual leader or issue a metadata request to another node that may have up-to-date information.

Fixes: redpanda-data#15949

Signed-off-by: Michal Maslanka <michal@redpanda.com>

(cherry picked from commit 7a60b75)

ztlpn added the kind/bug, area/controller, and ci-failure labels Jan 4, 2024

ztlpn mentioned this issue Jan 4, 2024

r/offset_translator: remove unsafe bootstrap code #15919

Merged

andijcr mentioned this issue Jan 5, 2024

tests/si_utils: error message for quiesce_uploads timeouts #15963

Merged

mmaslankaprv mentioned this issue Jan 6, 2024

k/metadata: guesstimate leader when information is not yet present #15981

Merged

rockwotj mentioned this issue Jan 8, 2024

wasm: add runtime and max binary configs #15967

Merged

ztlpn mentioned this issue Jan 8, 2024

CI Failure (assert set(hwms4.keys()) == set(orig_hwms.keys())) in DisablingPartitionsTest.test_disable #15972

Closed

mmaslankaprv closed this as completed in #15981 Jan 8, 2024

andijcr mentioned this issue Jan 9, 2024

tests/si_utils: assert message for verify_file_layout() #15990

Merged

michael-redpanda mentioned this issue Jan 10, 2024

CI Failure (key symptom) in DisablingPartitionsTest.test_disable #16013

Closed

andijcr mentioned this issue Jan 26, 2024

[v23.3.x] "enable by default spillover manifest" testing followups #16292

Merged

This was referenced Jan 29, 2024

[v23.3.x] rpk: fix mixed use of backcompat flags with -X #16351

Merged

[v23.3.x] rpk: improve --exit-when-healthy in cluster health #16363

Merged

vbotbuildovich mentioned this issue Feb 6, 2024

[v23.3.x] CI Failure (assert len(orig_hwms) == 4 AssertionError) in DisablingPartitionsTest.test_disable #16490

Closed
