CI Failure (assert len(orig_hwms) == 4 AssertionError) in DisablingPartitionsTest.test_disable #15949

Closed
ztlpn opened this issue Jan 4, 2024 · 3 comments · Fixed by #15981

ztlpn commented Jan 4, 2024

https://buildkite.com/redpanda/redpanda/builds/43406#018cd484-b3d2-48d0-8ad3-11f831c77b24

Module: rptest.tests.recovery_mode_test
Class:  DisablingPartitionsTest
Method: test_disable
test_id:    rptest.tests.recovery_mode_test.DisablingPartitionsTest.test_disable
status:     FAIL
run time:   13.022 seconds


    AssertionError()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 82, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/recovery_mode_test.py", line 418, in test_disable
    assert len(orig_hwms) == 4
AssertionError
ztlpn added the kind/bug, area/controller, and ci-failure labels Jan 4, 2024

ztlpn commented Jan 4, 2024

This failure occurred on a PR build, but the error is unrelated to the PR. Here is what rpk topic describe returned:

[DEBUG - 2024-01-04 13:32:42,106 - rpk - _execute - lineno:1043]: Executing command: ['/var/lib/buildkite-agent/builds/buildkite-amd64-xfs-builders-i-05e95862a86342792-1/redpanda/redpanda/vbuild/redpanda_installs/ci/bin/rpk', 'topic', '-X', 'brokers=docker-rp-5:9092,docker-rp-16:9092,docker-rp-17:9092,docker-rp-8:9092', 'describe', 'mytopic2', '-p', '-v']
[DEBUG - 2024-01-04 13:32:42,141 - rpk - _execute - lineno:1063]: 
PARTITION  LEADER  EPOCH  REPLICAS  LOG-START-OFFSET  HIGH-WATERMARK
0          1       -1     [1 3 4]   0                 474
1          3       1      [1 2 3]   0                 526

Leader epoch for partition mytopic2/0 is -1 even though there weren't any disruptive events that could have caused that.
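
To make the symptom easier to spot in similar logs, here is a minimal Python sketch that parses the table above and flags partitions whose leader epoch is negative. The names `PartitionInfo`, `ROW_RE`, and `parse_partitions` are hypothetical, and the parsing assumes the exact column layout shown in this output; it is not how rptest itself reads rpk output.

```python
import re
from dataclasses import dataclass


@dataclass
class PartitionInfo:
    partition: int
    leader: int
    epoch: int
    replicas: list[int]
    log_start_offset: int
    high_watermark: int


# Matches one data row such as "0  1  -1  [1 3 4]  0  474".
# Assumption: columns appear in the order printed in the log above.
ROW_RE = re.compile(
    r"^(\d+)\s+(-?\d+)\s+(-?\d+)\s+\[([\d\s]*)\]\s+(\d+)\s+(\d+)\s*$")


def parse_partitions(output: str) -> list[PartitionInfo]:
    """Parse the rows of an `rpk topic describe -p` style table."""
    rows = []
    for line in output.splitlines():
        m = ROW_RE.match(line.strip())
        if m is None:
            continue  # header, blank line, or anything unexpected
        rows.append(
            PartitionInfo(
                partition=int(m.group(1)),
                leader=int(m.group(2)),
                epoch=int(m.group(3)),
                replicas=[int(r) for r in m.group(4).split()],
                log_start_offset=int(m.group(5)),
                high_watermark=int(m.group(6)),
            ))
    return rows


SAMPLE = """\
PARTITION  LEADER  EPOCH  REPLICAS  LOG-START-OFFSET  HIGH-WATERMARK
0          1       -1     [1 3 4]   0                 474
1          3       1      [1 2 3]   0                 526
"""

partitions = parse_partitions(SAMPLE)
# Partitions reporting a negative leader epoch, as mytopic2/0 does here.
suspect = [p.partition for p in partitions if p.epoch < 0]
print(suspect)  # prints [0]
```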


rockwotj commented Jan 6, 2024

Related? #15972

mmaslankaprv added a commit to mmaslankaprv/redpanda that referenced this issue Jan 6, 2024
When a partition is first created in Redpanda, some cluster nodes
that do not host replicas of the partition may not yet have leadership
metadata. Redpanda still has to return partition metadata in this case.
To avoid disturbing the client (returning -1 as the leader id may
cause some clients to stop), Redpanda has to return a leader id. If the
information is not present, we always return the first node from the
replica set with a leader epoch of 0. This way the client will either
communicate with the actual leader or issue a metadata request to another
node that may have up-to-date information.

Fixes: redpanda-data#15949

Signed-off-by: Michal Maslanka <michal@redpanda.com>
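
The fallback rule described in this commit message can be sketched roughly as follows. This is an illustrative Python model only, not Redpanda's actual code: `LeaderInfo` and `leader_for_metadata` are hypothetical names, and raising on an empty replica set is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LeaderInfo:
    leader_id: int
    epoch: int


def leader_for_metadata(known: Optional[LeaderInfo],
                        replicas: list[int]) -> LeaderInfo:
    """Pick the leader to report in a metadata response.

    If this node already knows the leader, report it unchanged.
    Otherwise fall back to the first replica with epoch 0, so the
    client never sees -1 as the leader id or the leader epoch.
    """
    if known is not None:
        return known
    if not replicas:
        # Assumption for this sketch: a partition always has replicas.
        raise ValueError("replica set must not be empty")
    return LeaderInfo(leader_id=replicas[0], epoch=0)


# A node without leadership metadata for a partition on replicas [1, 3, 4]:
print(leader_for_metadata(None, [1, 3, 4]))
# A node that already knows the leader reports it as-is:
print(leader_for_metadata(LeaderInfo(leader_id=3, epoch=1), [1, 2, 3]))
```

With this rule the client either reaches the real leader on the first try or is told by the contacted node that it is not the leader, which prompts a metadata refresh against another node.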

ztlpn commented Jan 8, 2024

> Related? #15972

yep

mmaslankaprv added a commit to mmaslankaprv/redpanda that referenced this issue Jan 24, 2024
(cherry picked from commit 7a60b75)
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue Jan 26, 2024
(cherry picked from commit 7a60b75)
ballard26 pushed a commit to ballard26/redpanda that referenced this issue Jan 27, 2024
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue Feb 6, 2024
(cherry picked from commit 7a60b75)