[Segment Replication] Prioritize replica shard movement during shard relocation #8875

Merged
merged 6 commits into from Aug 5, 2023

@Poojita-Raj Poojita-Raj commented Jul 25, 2023

Description

When a node or set of nodes is excluded, its shards are moved away in no particular order. With segment replication enabled, this can leave the cluster in a mixed-version state in which replicas on a lower version cannot read segments sent from primaries on a higher version, and fail.

To avoid this situation, we can prioritize replica shard movement.

This PR adds a new shard movement strategy setting, SHARD_MOVEMENT_STRATEGY_SETTING, that specifies the order in which shards are moved: NO_PREFERENCE (default), PRIMARY_FIRST, or REPLICA_FIRST.

The PRIMARY_FIRST option behaves the same as the existing SHARD_MOVE_PRIMARY_FIRST_SETTING, which is now deprecated in favor of the shard movement strategy setting.
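
As a rough illustration of what the three values mean for ordering, here is a minimal standalone sketch. It is not the code in this PR: `ShardMoveOrderSketch`, `Shard`, `order`, and `ids` are hypothetical names made up for the example, and only the three strategy values come from the description above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ShardMoveOrderSketch {
    // The three values named in the PR description.
    enum Strategy { NO_PREFERENCE, PRIMARY_FIRST, REPLICA_FIRST }

    // Hypothetical stand-in for a shard that has to move off an excluded node.
    static final class Shard {
        final String id;
        final boolean primary;

        Shard(String id, boolean primary) {
            this.id = id;
            this.primary = primary;
        }
    }

    // Returns a copy of the input in the order the shards would be considered.
    // List.sort is stable, so shards of the same kind keep their relative order.
    static List<Shard> order(List<Shard> shards, Strategy strategy) {
        List<Shard> ordered = new ArrayList<>(shards);
        switch (strategy) {
            case PRIMARY_FIRST:
                ordered.sort(Comparator.comparingInt((Shard s) -> s.primary ? 0 : 1));
                break;
            case REPLICA_FIRST:
                ordered.sort(Comparator.comparingInt((Shard s) -> s.primary ? 1 : 0));
                break;
            default:
                // NO_PREFERENCE: leave the incoming order untouched.
                break;
        }
        return ordered;
    }

    static List<String> ids(List<Shard> shards) {
        List<String> out = new ArrayList<>();
        for (Shard s : shards) {
            out.add(s.id);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Shard> shards = List.of(
            new Shard("p0", true),
            new Shard("r0", false),
            new Shard("p1", true),
            new Shard("r1", false)
        );
        for (Strategy strategy : Strategy.values()) {
            System.out.println(strategy + " -> " + ids(order(shards, strategy)));
        }
    }
}
```

With REPLICA_FIRST, every replica in the list is considered before any primary, which is the ordering this PR relies on to keep replicas from lagging behind primaries on an older version.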

Expected behavior:

If SHARD_MOVEMENT_STRATEGY_SETTING is changed from its default to either PRIMARY_FIRST or REPLICA_FIRST, that strategy is applied whether or not SHARD_MOVE_PRIMARY_FIRST_SETTING is enabled.

If SHARD_MOVEMENT_STRATEGY_SETTING is left at its default of NO_PREFERENCE and SHARD_MOVE_PRIMARY_FIRST_SETTING is enabled, primary shards are moved first. This ensures that users still relying on the old setting see no change in behavior.
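
The precedence between the two settings can be sketched as a small pure function. This is an illustration under assumptions, not the PR's implementation: `EffectiveStrategySketch` and `resolve` are hypothetical names, and the boolean parameter stands in for the value of SHARD_MOVE_PRIMARY_FIRST_SETTING.

```java
final class EffectiveStrategySketch {
    // The three values named in the PR description.
    enum Strategy { NO_PREFERENCE, PRIMARY_FIRST, REPLICA_FIRST }

    // configured:       value of the new shard movement strategy setting
    // movePrimaryFirst: value of the deprecated primary-first boolean setting
    static Strategy resolve(Strategy configured, boolean movePrimaryFirst) {
        // An explicit (non-default) strategy always wins over the deprecated flag.
        if (configured != Strategy.NO_PREFERENCE) {
            return configured;
        }
        // Only when the new setting is at its default does the old flag apply.
        return movePrimaryFirst ? Strategy.PRIMARY_FIRST : Strategy.NO_PREFERENCE;
    }

    public static void main(String[] args) {
        for (Strategy configured : Strategy.values()) {
            for (boolean flag : new boolean[] { false, true }) {
                System.out.println(configured + ", primaryFirst=" + flag
                    + " -> " + resolve(configured, flag));
            }
        }
    }
}
```

The one case worth noting is REPLICA_FIRST combined with the deprecated flag enabled: the new setting takes precedence, so replicas still move first.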

Reference: #1445

Parent issue: #3881

Related Issues

Resolves #8265

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationIT.testPitCreatedOnReplica
      1 org.opensearch.indices.replication.SegmentReplicationIT.classMethod

codecov bot commented Jul 25, 2023

Codecov Report

Merging #8875 (342b6ed) into main (5bb7fa3) will decrease coverage by 0.08%.
Report is 1 commits behind head on main.
The diff coverage is 85.00%.

@@             Coverage Diff              @@
##               main    #8875      +/-   ##
============================================
- Coverage     71.02%   70.94%   -0.08%     
+ Complexity    57286    57212      -74     
============================================
  Files          4765     4766       +1     
  Lines        270398   270454      +56     
  Branches      39546    39555       +9     
============================================
- Hits         192045   191883     -162     
- Misses        62191    62391     +200     
- Partials      16162    16180      +18     
Files Changed Coverage Δ
...rg/opensearch/common/settings/ClusterSettings.java 93.18% <ø> (ø)
...nsearch/cluster/routing/ShardMovementStrategy.java 63.63% <63.63%> (ø)
...location/decider/NodeVersionAllocationDecider.java 91.83% <81.25%> (-5.23%) ⬇️
.../allocation/allocator/BalancedShardsAllocator.java 91.62% <85.71%> (+0.25%) ⬆️
...a/org/opensearch/cluster/routing/RoutingNodes.java 85.31% <87.50%> (+0.39%) ⬆️
...ain/java/org/opensearch/cluster/ClusterModule.java 100.00% <100.00%> (ø)
...ting/allocation/allocator/LocalShardsBalancer.java 85.52% <100.00%> (+0.22%) ⬆️

... and 475 files with indirect coverage changes

github-actions bot commented Aug 1, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockIsRemovedWhenAnyNodesNotExceedHighWatermarkWithAutoReleaseEnabled

@opensearch-trigger-bot

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/security-analytics.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]

BUILD SUCCESSFUL in 30m 16s

Signed-off-by: Poojita Raj <poojiraj@amazon.com>
@opensearch-trigger-bot

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/performance-analyzer.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]

BUILD SUCCESSFUL in 25m 34s

@opensearch-trigger-bot

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer.git]
Compatible components: [https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

BUILD SUCCESSFUL in 27m 1s

github-actions bot commented Aug 5, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=cluster.allocation_explain/10_basic/cluster shard allocation explanation test with empty request}

Poojita-Raj commented Aug 5, 2023

Gradle Check (Jenkins) Run Completed with:

* **RESULT:** FAILURE ❌

* **URL:** https://build.ci.opensearch.org/job/gradle-check/21925/

* **CommitID:** [342b6ed](https://github.com/opensearch-project/OpenSearch/commit/342b6edaccd8332796b35e6656d104dc5328b8f9)
  Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
  Is the failure [a flaky test](https://github.com/opensearch-project/OpenSearch/blob/main/DEVELOPER_GUIDE.md#flaky-tests) unrelated to your change?

#5176
#9092
#9130

github-actions bot commented Aug 5, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.index.ShardIndexingPressureIT.testShardIndexingPressureTrackingDuringBulkWrites

@mch2 mch2 merged commit c6e4bcd into opensearch-project:main Aug 5, 2023
10 checks passed
@mch2 mch2 added the backport 2.x Backport to 2.x branch label Aug 5, 2023
@opensearch-trigger-bot

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-8875-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 c6e4bcd097969f00956c0f4c152540cb610e4f93
# Push it to GitHub
git push --set-upstream origin backport/backport-8875-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-8875-to-2.x.

Poojita-Raj added a commit to Poojita-Raj/OpenSearch that referenced this pull request Aug 7, 2023
…relocation (opensearch-project#8875)

* add shard movement strategy setting

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

* add tests

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

* add changelog

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

* Add NodeVersionAllocationDecider check

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

* refactoring

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

* add annotation + refactor

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

---------

Signed-off-by: Poojita Raj <poojiraj@amazon.com>
(cherry picked from commit c6e4bcd)
tlfeng pushed a commit that referenced this pull request Aug 7, 2023
…relocation (#8875) (#9153)

Signed-off-by: Poojita Raj <poojiraj@amazon.com>
(cherry picked from commit c6e4bcd)
kaushalmahi12 pushed a commit to kaushalmahi12/OpenSearch that referenced this pull request Sep 12, 2023
…relocation (opensearch-project#8875)

Signed-off-by: Poojita Raj <poojiraj@amazon.com>
Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
brusic pushed a commit to brusic/OpenSearch that referenced this pull request Sep 25, 2023
…relocation (opensearch-project#8875)

Signed-off-by: Poojita Raj <poojiraj@amazon.com>
Signed-off-by: Ivan Brusic <ivan.brusic@flocksafety.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…relocation (opensearch-project#8875)

Signed-off-by: Poojita Raj <poojiraj@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>