[Segment Replication] Allocation and rebalancing based on average primary shard count per index #6422

@dreamer-89 dreamer-89 commented Feb 21, 2023

Description

This change adds an average primary shard count per node constraint to both allocation and rebalancing operations, implementing approach 1 from #6210.

Changes

  • Adds a new dynamic setting, cluster.routing.allocation.balance.prefer_primary, which defaults to false. This setting controls whether primary shard balancing is applied.
  • Adds a new constraint, isPrimaryShardsPerIndexPerNodeBreached, which is breached when a node holds more than the average number of primary shards of an index. When it is breached, the node weight calculation adds 100k to the node's weight, lowering the chance that the node is selected as the target for unassigned shard allocation or for relocation during rebalancing. Please note that this is a soft limit: a node that breaches the constraint can still be selected for allocation, unlike the hard limits imposed by the allocation and rebalance deciders, which remove a node from consideration.
  • Restructures the constraint classes to consume changes to the dynamic setting. When the setting is enabled, the constraints are applied during allocation and rebalancing.
  • Copies code from [Segment Replication] Consider primary shard balancing first #6325 to move primaries first for balancing. This is essential to converge the balancing round quickly and minimize relocations.
  • Unit and integration tests
  • A few tests are currently flaky because of the limitation called out in [Segment Replication] Allocation changes to distribute primaries - Benchmark & document improvements using opensearch-benchmark. #6210 (comment), where optimal primary shard distribution can't be attained because of the SameShardAllocationDecider.
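
To make the soft-limit behaviour in the list above concrete, here is a minimal, self-contained sketch. It is not the code from this PR: the class PrimaryBalanceSketch, its methods, and the base weights are hypothetical, and the breach check assumes a strict comparison against the raw per-node average (the real constraint may round differently). Only the 100k penalty and the constraint name come from the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of a soft allocation constraint; not the OpenSearch implementation. */
final class PrimaryBalanceSketch {
    // Penalty added to a node's weight on breach ("100k" in the PR description).
    static final long CONSTRAINT_PENALTY = 100_000L;

    /** Assumed check: the node holds more primaries of the index than the per-node average. */
    static boolean isPrimaryShardsPerIndexPerNodeBreached(int primariesOnNode, int totalPrimaries, int nodeCount) {
        double average = (double) totalPrimaries / nodeCount;
        return primariesOnNode > average;
    }

    /** Base weight plus the penalty when the setting is enabled and the constraint is breached. */
    static double weight(double baseWeight, boolean preferPrimary, int primariesOnNode, int totalPrimaries, int nodeCount) {
        if (preferPrimary && isPrimaryShardsPerIndexPerNodeBreached(primariesOnNode, totalPrimaries, nodeCount)) {
            return baseWeight + CONSTRAINT_PENALTY;
        }
        return baseWeight;
    }

    /** Picks the node with the lowest weight; a breaching node is penalised, never excluded. */
    static String pickTarget(Map<String, Integer> primariesPerNode, Map<String, Double> baseWeights, boolean preferPrimary) {
        int totalPrimaries = 0;
        for (int count : primariesPerNode.values()) {
            totalPrimaries += count;
        }
        String best = null;
        double bestWeight = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Integer> entry : primariesPerNode.entrySet()) {
            double w = weight(baseWeights.get(entry.getKey()), preferPrimary, entry.getValue(), totalPrimaries, primariesPerNode.size());
            if (w < bestWeight) {
                bestWeight = w;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> primaries = new LinkedHashMap<>();
        primaries.put("A", 3);
        primaries.put("B", 1);
        primaries.put("C", 0);
        Map<String, Double> base = new LinkedHashMap<>();
        base.put("A", 0.5);
        base.put("B", 1.0);
        base.put("C", 2.0);
        System.out.println(pickTarget(primaries, base, false)); // setting disabled: lowest base weight wins
        System.out.println(pickTarget(primaries, base, true));  // setting enabled: node A is penalised
    }
}
```

Because the penalty is only added to the weight, a breaching node can still win when every alternative weighs more, which is the difference from a decider that removes the node outright.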

Issues Resolved

#6210

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dreamer-89 dreamer-89 force-pushed the primary_allocation_with_constraint branch from fd7d113 to df7c663 Compare February 23, 2023 06:22
@dreamer-89 dreamer-89 changed the title [Segment Replication] Average primary shard count per index constraint for allocation and rebalancing [Segment Replication] Allocation and rebalancing based on average primary shard count per index Feb 23, 2023
@dreamer-89 dreamer-89 marked this pull request as draft February 23, 2023 07:15
Signed-off-by: Suraj Singh <surajrider@gmail.com>
Signed-off-by: Suraj Singh <surajrider@gmail.com>
Signed-off-by: Suraj Singh <surajrider@gmail.com>
Signed-off-by: Suraj Singh <surajrider@gmail.com>
Signed-off-by: Suraj Singh <surajrider@gmail.com>
@dreamer-89 dreamer-89 force-pushed the primary_allocation_with_constraint branch from 5771038 to b28fe2d Compare March 1, 2023 23:55
Signed-off-by: Suraj Singh <surajrider@gmail.com>

@mch2 mch2 left a comment

LGTM, thanks @dreamer-89.

@github-actions

github-actions bot commented Mar 2, 2023

Gradle Check (Jenkins) Run Completed with:

@dreamer-89

dreamer-89 commented Mar 2, 2023

@shwetathareja @nknize @Bukhtawar @vigyasharma : Need help with the review. CC @anasalkouz @mch2

As this is a core change, it would be good to have one more review. @shwetathareja @nknize @Bukhtawar @vigyasharma ping for review. I will wait a couple of days for review.

@github-actions

github-actions bot commented Mar 2, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testSearchAggregationWithNetworkDisruption_FailOpenEnabled

Signed-off-by: Suraj Singh <surajrider@gmail.com>
@dreamer-89

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testSearchAggregationWithNetworkDisruption_FailOpenEnabled

Tracked in #5957

@github-actions

github-actions bot commented Mar 2, 2023

Gradle Check (Jenkins) Run Completed with:

@dreamer-89

Merging the changes now.
@shwetathareja @nknize @Bukhtawar @vigyasharma : Please feel free to comment if you have any questions/concerns.

@dreamer-89 dreamer-89 merged commit f61402a into opensearch-project:main Mar 3, 2023
@dreamer-89 dreamer-89 added the backport 2.x Backport to 2.x branch label Mar 3, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 3, 2023
…mary shard count per index (#6422)

* [Segment Replication] Move primary shard first during rebalancing

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Address review comments

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Add primary weight constraint

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Add average primary shard count constraint for allocation and rebalancing operation

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Spotless fix and javadocs

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Add failing tests

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Fix unit tests

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Add unit test for nodes breaching multiple constraints

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Add comments and update unit test

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Address review comments

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Remove auto-expand replicas integration test

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Address review comments

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Update comments for tests

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Add changelog entry

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Remove extra changelog entry

Signed-off-by: Suraj Singh <surajrider@gmail.com>

---------

Signed-off-by: Suraj Singh <surajrider@gmail.com>
(cherry picked from commit f61402a)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@nknize

nknize commented Mar 4, 2023

No concerns here. I haven't had a chance to dig deep on this, but I did give it a quick once-over. I like where this is going, and the use of constraints is a great suggestion.

I think merging is fine so we can let it bake. I'll be giving another pass when I get back to teasing out the common server components.

dreamer-89 pushed a commit that referenced this pull request Mar 4, 2023
…mary shard count per index (#6422) (#6541)

(cherry picked from commit f61402a)

Signed-off-by: Suraj Singh <surajrider@gmail.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
mingshl pushed a commit to mingshl/OpenSearch-Mingshl that referenced this pull request Mar 24, 2023
…mary shard count per index (opensearch-project#6422)

Signed-off-by: Suraj Singh <surajrider@gmail.com>
Signed-off-by: Mingshi Liu <mingshl@amazon.com>
Labels
backport 2.x Backport to 2.x branch v2.7.0
5 participants