Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Controller backend: reconcile each ntp in individual fibers #16055

Merged
merged 6 commits into from
Jan 15, 2024

Conversation

ztlpn
Copy link
Contributor

@ztlpn ztlpn commented Jan 10, 2024

In the future, when partitions will be assigned to shards by a node-level component (as opposed to topic table), controller backend will have to accept notifications from several sources. To support this it is more convenient to have per-ntp fibers doing reconciliation, as opposed to a single housekeeping fiber that periodically launches reconciliation for all pending ntps and waits for all reconciliation attempts to finish.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

@vbotbuildovich
Copy link
Collaborator

new failures in https://buildkite.com/redpanda/redpanda/builds/43647#018cf476-8100-43a4-9e7d-67a03af05ae4:

"rptest.tests.recovery_mode_test.RecoveryModeTest.test_recovery_mode"

@ztlpn
Copy link
Contributor Author

ztlpn commented Jan 10, 2024

Hmm, the above failure may actually be semi-related due to slightly changed timings of partition creation. In the test run ntp kafka/__consumer_offsets/1 never got a leader because all replicas reported

TRACE 2024-01-10 18:19:47,779 [shard 0:main] raft - [group_id:13, {kafka/__consumer_offsets/1}] consensus.cc:953 - current node 
priority 1 is lower than target 6538824 (next vote 4359216)

Looking at the code, I see that before the raft group manager is marked "ready", consensus instances are created with priority=1. When set_ready is called, priority is reset to a normal value for all raft instances existing at that point. But I think there is a small race possibility where set_ready is called exactly between the moment a consensus instance is created and the moment it is added to the group_manager map.

@mmaslankaprv WDYT?

@ztlpn
Copy link
Contributor Author

ztlpn commented Jan 11, 2024

ok, this is very easy to reproduce with a well-placed sleep

Comment on lines +569 to +679
auto [rs_it, inserted] = _states.try_emplace(d.ntp);
if (inserted) {
rs_it->second
= ss::make_lw_shared<ntp_reconciliation_state>();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neat :)

@ztlpn
Copy link
Contributor Author

ztlpn commented Jan 11, 2024

Fix for the test_recovery_mode failure: #16068

bharathv
bharathv previously approved these changes Jan 12, 2024
Copy link
Contributor

@bharathv bharathv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty good to me.

@vbotbuildovich
Copy link
Collaborator

@ztlpn ztlpn merged commit c9eb5cb into redpanda-data:dev Jan 15, 2024
19 checks passed
@ztlpn ztlpn deleted the controller-backend-fiber-per-ntp branch April 19, 2024 09:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants