Allocate and rebalance replicas for different partitions in random order #17962

Merged
ztlpn merged 7 commits into redpanda-data:dev from fix-replica-clustering
Apr 22, 2024

Conversation

ztlpn
Contributor

@ztlpn ztlpn commented Apr 19, 2024

Previously, we allocated new replicas of a partition and rebalanced existing ones together, i.e. all replicas of a single partition one by one. This can lead to the same replica sets being repeated over and over - an undesirable pattern, because the nodes in such a set replicate data mostly among themselves.

To prevent this, allocate and rebalance replicas in true random order - a replica of partition P1, then a replica of partition P2, then maybe a replica of P1 again, etc.

Fixes #17925

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x

Release Notes

Improvements

  • Allocate and rebalance partition replicas in random order to prevent an undesirable pattern in which many partitions have the same replica set.

@ztlpn
Contributor Author

ztlpn commented Apr 19, 2024

/ci-repeat

Previously, when doing counts rebalancing, we visited partitions
randomly, but for each partition we tried to move each replica in the
group sequentially. This can lead to undesirable clustering of
replicas - for example, if we add 3 new nodes, entire replica groups
will be moved there, and as a result, there will be no replication
traffic between old and new nodes. To avoid this, iterate over
individual replicas randomly.
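
The clustering described above can be reproduced with a toy simulation. This is only a sketch, not Redpanda's rebalancing code: the `rebalance` function, the starting layout (6 partitions with rf=3, all on nodes 1, 2, 3, plus empty new nodes 4, 5, 6) and the move rule (move a replica to the least-loaded eligible node when the count gap is at least 2) are all assumptions made for illustration.

```python
import random
from collections import Counter


def rebalance(per_replica_random, seed=0):
    """Toy counts rebalancing; returns the final replica set of each partition."""
    rng = random.Random(seed)
    nodes = [1, 2, 3, 4, 5, 6]                   # 4, 5, 6 are the newly added nodes
    partitions = [[1, 2, 3] for _ in range(6)]   # hypothetical starting layout
    counts = Counter({1: 6, 2: 6, 3: 6})         # replicas per node

    if per_replica_random:
        # New behaviour: visit individual replicas in random order.
        order = [(p, i) for p in range(len(partitions)) for i in range(3)]
        rng.shuffle(order)
    else:
        # Old behaviour: random partitions, replicas within each one in sequence.
        parts = list(range(len(partitions)))
        rng.shuffle(parts)
        order = [(p, i) for p in parts for i in range(3)]

    for p, i in order:
        src = partitions[p][i]
        # A node already hosting this partition is not eligible.
        candidates = [n for n in nodes if n not in partitions[p]]
        low = min(counts[n] for n in candidates)
        dst = rng.choice([n for n in candidates if counts[n] == low])
        if counts[src] - counts[dst] >= 2:       # move only if balance improves
            partitions[p][i] = dst
            counts[src] -= 1
            counts[dst] += 1
    return [frozenset(r) for r in partitions]


if __name__ == "__main__":
    print("per-partition:", sorted(map(sorted, rebalance(False))))
    print("per-replica:  ", sorted(map(sorted, rebalance(True))))
```

Under these assumptions, the per-partition order always leaves every partition on either {1,2,3} or {4,5,6}, so old and new nodes never share a replica set, while the per-replica order is free to mix old and new nodes within one set.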
Allocate replicas in random order (i.e. not all replicas for a partition
at once) to prevent the formation of replica clusters - an undesirable
allocation pattern where many partitions have the exact same replica set.

Example: suppose we have 6 nodes and a 1-partition topic with rf=3, with
replicas on nodes 1, 2, 3. Then, if we allocate replicas sequentially,
due to interplay between topic-aware counts and total counts objectives,
newly allocated partitions for a new topic will have the following
replica sets: {1,2,3}, {4,5,6}, {1,2,3}, etc., i.e. all partitions will
have only one of 2 possible replica sets!
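
The example above can be checked with a small simulation. This is a simplified sketch, not the actual allocator: the `allocate` and `pick_node` names are hypothetical, and the interplay of objectives is modelled as a greedy choice that prefers the node with the fewest replicas of the new topic and breaks ties by the fewest replicas overall.

```python
import random
from collections import Counter

NODES = [1, 2, 3, 4, 5, 6]
RF = 3


def pick_node(replicas, topic_counts, total_counts, rng):
    """Greedy choice: fewest replicas of this topic first, then fewest overall."""
    candidates = [n for n in NODES if n not in replicas]
    best = min((topic_counts[n], total_counts[n]) for n in candidates)
    ties = [n for n in candidates if (topic_counts[n], total_counts[n]) == best]
    return rng.choice(ties)


def allocate(num_partitions, interleave, seed=0):
    """Allocate a new topic; returns the replica set of each partition."""
    rng = random.Random(seed)
    total_counts = Counter({1: 1, 2: 1, 3: 1})  # existing topic on nodes 1, 2, 3
    topic_counts = Counter()                    # replicas of the new topic
    # One slot per replica to place, labelled with its partition index.
    slots = [p for p in range(num_partitions) for _ in range(RF)]
    if interleave:
        rng.shuffle(slots)                      # replicas in random order
    partitions = [set() for _ in range(num_partitions)]
    for p in slots:
        node = pick_node(partitions[p], topic_counts, total_counts, rng)
        partitions[p].add(node)
        topic_counts[node] += 1
        total_counts[node] += 1
    return [frozenset(s) for s in partitions]


if __name__ == "__main__":
    for interleave in (False, True):
        sets = allocate(8, interleave)
        print("interleaved" if interleave else "sequential",
              "-> distinct replica sets:", len(set(sets)))
```

In this model, sequential allocation alternates between {1,2,3} and {4,5,6} for every partition of the new topic, matching the pattern described above. With `interleave=True` the replicas of different partitions are placed in shuffled order, so replica sets are no longer forced into those two groups.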
@ztlpn
Contributor Author

ztlpn commented Apr 21, 2024

/ci-repeat

@ztlpn
Contributor Author

ztlpn commented Apr 22, 2024

/ci-repeat

…ition

Now that we increase replication factor in allocate(), we don't need
that functionality in reallocate_partition().
@ztlpn
Contributor Author

ztlpn commented Apr 22, 2024

/ci-repeat

@ztlpn ztlpn marked this pull request as ready for review April 22, 2024 08:24
@ztlpn ztlpn added this to the 24.1.1-GA milestone Apr 22, 2024
@ztlpn ztlpn requested a review from mmaslankaprv April 22, 2024 10:13
@ztlpn ztlpn merged commit 017c955 into redpanda-data:dev Apr 22, 2024
19 checks passed
@ztlpn ztlpn deleted the fix-replica-clustering branch April 22, 2024 11:10
Development

Successfully merging this pull request may close these issues.

Partition allocator creates clustered node groups
2 participants