Allocate and rebalance replicas for different partitions in random order #17962
Merged
/ci-repeat |
ztlpn force-pushed the fix-replica-clustering branch from eb62711 to d521dc8 (April 19, 2024 18:52)
/ci-repeat |
ztlpn force-pushed the fix-replica-clustering branch from d521dc8 to 29c0c44 (April 19, 2024 23:00)
/ci-repeat |
Previously, when doing counts rebalancing, we visited partitions randomly, but for each partition we tried to move each replica in the group sequentially. This can lead to undesirable clustering of replicas - for example, if we add 3 new nodes, entire replica groups will be moved there wholesale, and as a result, there will be no replication traffic between old and new nodes. To avoid this, iterate over individual replicas randomly.
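To illustrate the rebalancing behaviour described in this commit message, here is a toy model in Python. It is not the allocator code from this PR: the `rebalance` function, the greedy "move to the least-loaded node if that strictly improves balance" rule, and the 12-partition setup are all assumptions made for illustration.

```python
import random
from collections import Counter


def rebalance(partitions, nodes, per_replica, seed=0):
    """Toy counts rebalancer (hypothetical, for illustration only).

    partitions:  list of replica sets (iterables of node ids)
    per_replica: False -> shuffle partitions, then try all replicas of a
                          partition back to back (old behaviour)
                 True  -> shuffle individual replicas (new behaviour)
    """
    rng = random.Random(seed)
    parts = [set(p) for p in partitions]
    counts = Counter(n for p in parts for n in p)

    if per_replica:
        # One work item per replica, visited in fully random order.
        work = [(i, n) for i, p in enumerate(parts) for n in sorted(p)]
        rng.shuffle(work)
    else:
        # Random partition order, but replicas of a partition stay together.
        order = list(range(len(parts)))
        rng.shuffle(order)
        work = [(i, n) for i in order for n in sorted(parts[i])]

    for i, src in work:
        candidates = [n for n in nodes if n not in parts[i]]
        dst = min(candidates, key=lambda n: (counts[n], n))
        # Move only if it strictly improves the balance of replica counts.
        if counts[dst] + 1 < counts[src]:
            parts[i].remove(src)
            parts[i].add(dst)
            counts[src] -= 1
            counts[dst] += 1
    return [frozenset(p) for p in parts]


def spans_old_and_new(replica_set, old_nodes):
    """True if the replica set has replicas on both old and new nodes."""
    return bool(replica_set & old_nodes) and bool(replica_set - old_nodes)


old = frozenset({1, 2, 3})
start = [old] * 12           # 12 partitions, all on the 3 old nodes
nodes = range(1, 7)          # nodes 4, 5, 6 were just added

grouped = rebalance(start, nodes, per_replica=False)
mixed = rebalance(start, nodes, per_replica=True)
print(sum(spans_old_and_new(p, old) for p in grouped))
print(sum(spans_old_and_new(p, old) for p in mixed))
```

In this model the per-partition order moves whole replica groups, so every partition ends up either entirely on the old nodes or entirely on the new ones. Visiting replicas individually tends to leave partitions with replicas on both sides, which is the effect this commit aims for.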
Allocate replicas in random order (i.e. not all replicas for a partition at once) to prevent formation of replica clusters - an undesirable allocation pattern where many partitions have the exact same replica set. Example: suppose we have 6 nodes and a 1-partition topic with rf=3, with replicas on nodes 1, 2, 3. Then, if we allocate replicas sequentially, due to the interplay between the topic-aware counts and total counts objectives, newly allocated partitions for a new topic will have the following replica sets: {1,2,3}, {4,5,6}, {1,2,3}, etc., i.e. all partitions will have only one of 2 possible replica sets!
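The 6-node example above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the allocator from this PR: the `allocate` function is hypothetical, and it models the two objectives as a lexicographic key (topic-local replica count first, total replica count second) with random tie-breaking. The order in which the two replica sets appear may differ from the commit message, but the clustering effect is the same.

```python
import random
from collections import Counter


def allocate(num_partitions, rf, nodes, total_counts, interleave, seed=0):
    """Toy greedy allocator for one new topic (hypothetical sketch).

    total_counts: replicas already hosted per node (from other topics)
    interleave:   False -> place all replicas of a partition back to back
                  True  -> place replica slots in fully random order
    """
    rng = random.Random(seed)
    total = Counter(total_counts)
    topic = Counter()  # replicas of the new topic per node
    replicas = {p: [] for p in range(num_partitions)}

    # One slot per replica; each slot only remembers its partition.
    slots = [p for p in range(num_partitions) for _ in range(rf)]
    if interleave:
        rng.shuffle(slots)

    for p in slots:
        eligible = [n for n in nodes if n not in replicas[p]]
        # Assumed objective: fewest replicas of this topic, then fewest overall.
        best = min((topic[n], total[n]) for n in eligible)
        node = rng.choice([n for n in eligible if (topic[n], total[n]) == best])
        replicas[p].append(node)
        topic[node] += 1
        total[node] += 1
    return [frozenset(r) for r in replicas.values()]


existing = {1: 1, 2: 1, 3: 1}  # the 1-partition topic on nodes 1, 2, 3
sequential = allocate(12, 3, range(1, 7), existing, interleave=False)
interleaved = allocate(12, 3, range(1, 7), existing, interleave=True)
print(len(set(sequential)), "distinct replica sets when allocating sequentially")
print(len(set(interleaved)), "distinct replica sets when interleaving")
```

In this model, sequential allocation only ever produces the sets {1,2,3} and {4,5,6}, because after each partition is placed the topic-aware counts push the next partition onto the other half of the cluster. Interleaving the replica slots breaks that lockstep, so the resulting replica sets tend to be more varied.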
ztlpn force-pushed the fix-replica-clustering branch from 29c0c44 to ba439dc (April 21, 2024 19:04)
/ci-repeat |
/ci-repeat |
ztlpn force-pushed the fix-replica-clustering branch from 249d2ad to 2369265 (April 22, 2024 00:20)
/ci-repeat |
ztlpn force-pushed the fix-replica-clustering branch from 2369265 to 567ee89 (April 22, 2024 08:11)
…ition Now that we increase replication factor in allocate(), we don't need that functionality in reallocate_partition().
ztlpn force-pushed the fix-replica-clustering branch from 567ee89 to 4eb64d6 (April 22, 2024 08:16)
/ci-repeat |
mmaslankaprv approved these changes (Apr 22, 2024)
Previously, we allocated new replicas of a partition and rebalanced existing ones together - i.e. all replicas of a single partition one by one. This can lead to some replica sets being repeated over and over - an undesirable pattern, because the nodes in such a set will replicate data mostly among themselves.
To prevent this, allocate and rebalance replicas in true random order - a replica of partition P1, then a replica of partition P2, then maybe a replica of P1 again, etc.
Fixes #17925
Backports Required
Release Notes
Improvements