c/topics_dispatcher: do not guesstimate leader ids #15678

Merged

mmaslankaprv
Member

@mmaslankaprv mmaslankaprv commented Dec 15, 2023

Updating leadership metadata before the topic is ready to serve traffic is undesirable. It prevents callers from waiting until the raft group reports a leader, which is when the partition is actually ready.

Fixes: #14673

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

  • none

@ztlpn
Contributor

ztlpn commented Dec 15, 2023

I wonder what the original motivation was. Can this change lead to undesirable effects such as health status flip-flopping?

@mmaslankaprv
Member Author

The original motivation dates from the time when we had no infrastructure to wait for topic creation, and we wanted to speed up metadata propagation. Now that the necessary API is there, we no longer need to estimate leaders.

ztlpn
ztlpn previously approved these changes Dec 15, 2023
@vbotbuildovich
Collaborator

vbotbuildovich commented Dec 18, 2023

new failures in https://buildkite.com/redpanda/redpanda/builds/42970#018c7dc6-d933-48d0-9eb7-f5b7f6495974:

"rptest.tests.recovery_mode_test.DisablingPartitionsTest.test_disable"

new failures in https://buildkite.com/redpanda/redpanda/builds/43118#018c86c7-e656-4f44-a679-9f2fcbcf12d4:

"rptest.tests.recovery_mode_test.DisablingPartitionsTest.test_disable"

new failures in https://buildkite.com/redpanda/redpanda/builds/43139#018c876b-b5ff-45c1-bc68-e294080f1ca2:

"rptest.tests.consumer_group_recovery_tool_test.ConsumerOffsetsRecoveryToolTest.test_consumer_offsets_partition_count_change"

new failures in https://buildkite.com/redpanda/redpanda/builds/43139#018c877a-c983-46ac-bb2f-53ef24dc80f1:

"rptest.tests.upgrade_test.UpgradeBackToBackTest.test_upgrade_with_all_workloads.single_upgrade=True"

new failures in https://buildkite.com/redpanda/redpanda/builds/43209#018c8c1a-5875-4980-b82e-db682607291b:

"rptest.tests.upgrade_test.UpgradeBackToBackTest.test_upgrade_with_all_workloads.single_upgrade=False"

new failures in https://buildkite.com/redpanda/redpanda/builds/43209#018c8c1a-5878-4570-82f5-19a0a60c0b95:

"rptest.tests.upgrade_test.UpgradeBackToBackTest.test_upgrade_with_all_workloads.single_upgrade=True"

new failures in https://buildkite.com/redpanda/redpanda/builds/43209#018c8c1f-4f7d-41d7-a61c-1166074bd617:

"rptest.tests.upgrade_test.UpgradeBackToBackTest.test_upgrade_with_all_workloads.single_upgrade=True"

new failures in https://buildkite.com/redpanda/redpanda/builds/43209#018c8c1a-587e-45dd-beeb-9f9d0bd3c33f:

"rptest.tests.recovery_mode_test.DisablingPartitionsTest.test_disable"

new failures in https://buildkite.com/redpanda/redpanda/builds/43209#018c8c1f-4f7a-40e6-bd72-a1dad44b7057:

"rptest.tests.upgrade_test.UpgradeBackToBackTest.test_upgrade_with_all_workloads.single_upgrade=False"

@mmaslankaprv mmaslankaprv force-pushed the do-not-guestimate-leaders branch 2 times, most recently from 86a4b2f to 9569e0b Compare December 18, 2023 18:58
ztlpn
ztlpn previously approved these changes Dec 19, 2023
src/v/cluster/tests/partition_balancer_planner_fixture.h (outdated, resolved)
ztlpn
ztlpn previously approved these changes Dec 20, 2023
ztlpn
ztlpn previously approved these changes Dec 21, 2023
@mmaslankaprv
Member Author

/ci-repeat 1

Updating leadership metadata before the topic is ready to serve traffic
is undesirable. It prevents callers from waiting until the raft group
reports a leader, which is when the partition is actually ready.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Instead of returning `-1` to indicate an ongoing leader election, we
return a randomly selected node as the partition leader. This is much
less disruptive for the client, as it simply forces a metadata refresh.
Some clients do not tolerate `-1` returned from the Metadata handler and
simply stop working.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
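
The fallback described in this commit message can be sketched in a few lines. This is only an illustration, not the code from the PR: the function name `leader_for_metadata`, the `NO_LEADER` constant, and the choice to pick the random node from the partition's replica set are all assumptions (the commit message only says "randomly selected node").

```python
import random
from typing import Optional, Sequence

# Assumption: -1 is the "no leader" marker mentioned in the commit message.
NO_LEADER = -1


def leader_for_metadata(
    known_leader: Optional[int],
    replicas: Sequence[int],
    rng: Optional[random.Random] = None,
) -> int:
    """Pick the leader id to report for one partition (hypothetical helper)."""
    if known_leader is not None:
        # A leader has been reported; return it unchanged.
        return known_leader
    if not replicas:
        # Nothing to choose from, so the marker is the only option left.
        return NO_LEADER
    # No leader yet: report some node instead of -1. A client that contacts
    # the wrong node is expected to refresh its metadata and retry.
    chooser = rng if rng is not None else random
    return chooser.choice(list(replicas))


print(leader_for_metadata(2, [1, 2, 3]))
print(leader_for_metadata(None, [1, 2, 3]) in (1, 2, 3))
```

The trade-off is that the reported leader may be wrong for a short time, which the commit message argues is cheaper for clients than handling `-1`.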
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
The metadata dissemination test must account for exponential backoff,
which may last longer than the 10 second timeout used previously. Now
that we do not update the leaders table after the topic create command
is applied, we need to wait for the request to be delivered to the
joining node.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
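
To see why a fixed 10 second timeout can be too short, it helps to add up the delays of a capped exponential backoff. The sketch below is generic arithmetic; the base delay, cap, and attempt count are hypothetical values chosen for illustration, not Redpanda's actual settings.

```python
def total_backoff_s(base_s: float, max_s: float, attempts: int, factor: float = 2.0) -> float:
    """Sum the delays of `attempts` retries with capped exponential growth."""
    total = 0.0
    delay = base_s
    for _ in range(attempts):
        total += min(delay, max_s)  # each delay is capped at max_s
        delay *= factor
    return total


# Hypothetical settings: 0.5 s base delay, 8 s cap, 6 retries.
# Delays are 0.5 + 1 + 2 + 4 + 8 + 8 seconds.
budget = total_backoff_s(base_s=0.5, max_s=8.0, attempts=6)
print(budget)  # 23.5
```

With these made-up numbers the retries alone take 23.5 seconds, so a test that waits only 10 seconds could give up before the request reaches the joining node.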
The consumer group recovery test does not validate whether the offset
listing was successful. Disabling the leader balancer makes the test
more stable until we introduce validation in the recovery tool (for now
the tool assumes that the user is going to review the offsets listing).

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Wait for at least some of the messages to be successfully written to
the topic before starting consumers, to prevent them from trying to
read an empty partition.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
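
The ordering described here (produce first, start consumers only after some messages are acknowledged) can be sketched with a small polling helper. Everything below is a standalone stand-in: `wait_until`, `FakeProducer`, and the threshold of 30 messages are hypothetical and are not taken from the test changed in this PR.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout_s: float, backoff_s: float = 0.1) -> None:
    """Poll `condition` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within timeout")
        time.sleep(backoff_s)


class FakeProducer:
    """Stand-in for a real producer; each poll 'acknowledges' 10 messages."""

    def __init__(self) -> None:
        self.acked = 0

    def poll(self) -> None:
        self.acked += 10


producer = FakeProducer()


def enough_written() -> bool:
    producer.poll()
    return producer.acked >= 30  # hypothetical threshold


wait_until(enough_written, timeout_s=5, backoff_s=0.01)
# Only at this point would the test start its consumers.
print(producer.acked)
```

Starting consumers after this check means they do not begin by reading an empty partition.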
@mmaslankaprv
Member Author

known unrelated ci-failure: #15679

@mmaslankaprv mmaslankaprv merged commit 2c4e3a0 into redpanda-data:dev Jan 2, 2024
18 of 20 checks passed
@mmaslankaprv mmaslankaprv deleted the do-not-guestimate-leaders branch January 2, 2024 19:12
@mmaslankaprv
Member Author

/backport v23.3.x

@mmaslankaprv
Member Author

/backport v23.2.x

@vbotbuildovich
Collaborator

Failed to create a backport PR to v23.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-15678-v23.2.x-762 remotes/upstream/v23.2.x
git cherry-pick -x 3e86234048aab4991c2356b4d62dc7d42b9b0b6d 8f81843ffcebd5a9765397bc923207e9f79ac864 86d583b486d033377be4c9f2b331be25ff6d9632 763c0ac0ebc5e232a9fd747a9f6fba56d2759ee2 5bf2a272311bd30d21d98ab30af1e0ed3d69aa08 fea4f6a23b48a8b4d1793a1d1b3afd92b7960ff8 3b2eac5f2f508d8939e2a4f97085fe5d10af9727 6823527bbfab550917bf5df1586049e6d00e0abd

Workflow run logs.

Development

Successfully merging this pull request may close these issues.

CI Failure (RpkException) in RpkToolTest.test_consume_from_partition
3 participants