Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cluster setting for shard movement strategy #4955

Merged
merged 4 commits into from
Sep 14, 2023

Conversation

Poojita-Raj
Copy link
Contributor

Description

Adds cluster setting for shard movement strategy that was missing from the docs.

Issues Resolved

Resolves #4827

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

@cwillum cwillum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for creating this PR. It's a big help. I made some suggestions and included a few questions. Other than that, it looks good. I'll have another look after questions have been addressed.

@@ -102,6 +102,7 @@ The following request field parameters are compatible with the cluster API.
| cluster.routing.allocation.include.<attribute> | Enum | Allocates shards to a node whose `attribute` has at least one of the included comma-separated values. |
| cluster.routing.allocation.require.<attribute> | Enum | Only allocates shards to a node whose `attribute` has all of the included comma-separated values. |
| cluster.routing.allocation.exclude.<attribute> | Enum | Does not allocate shards to a node whose `attribute` has any of the included comma-separated values. The cluster allocation settings support the following built-in attributes: <br /> <br /> `_name` – Match nodes by node name. <br /> <br /> `_host_ip` – Match nodes by host IP address. <br /> <br /> `_publish_ip` – Match nodes by publish IP address. <br /> <br /> `_ip` – Match either `_host_ip` or `_publish_ip`. <br /> <br /> `_host` – Match nodes by hostname. <br /> <br /> `_id` – Match nodes by node ID. <br /> <br /> `_tier` – Match nodes by data tier role. |
| cluster.routing.allocation.shard_movement_strategy | Enum | Decides order in which to move shards from node when shards can not stay on node anymore. Supports the following strategies: <br /> <br /> `NO_PREFERENCE` - default behavior in which order of shard movement doesn't matter. <br /> <br /> `PRIMARY_FIRST` - primary shards are moved first. <br /> <br /> `REPLICA_FIRST` - replica shards are moved first.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions that users might have from original content:

  1. Do we need to mention any performance issues in the description of this setting? Take, for example, jainankitk's comment: "The primary first behavior can impact the speed of relocation, especially where some nodes only have replicas and they won't contribute to relocation initially. Hence, the default is best effort node level primary first and can be enforced across cluster level using primary first setting." And shortly after, knize's comment: "I updated the commit message to be more descriptive of the problem this change addresses. Can we get a separate PR to add some more thorough documentation of this new cluster setting and behavior (including the performance as a function of scale?)" Do we need to explain why you would want to order primary shards before replica shards, or replica shards before primary shards, as a concern for performance?
  2. NO_PREFERENCE: When the order of shard movement doesn't matter, is there still some logic that determines how the shards are moved? Or is it random? If not random, can we still explain how the system will move the shards (what dictates the order in this scenario)?
    I made some minor suggestions for the content we have here. Thanks.
Suggested change
| cluster.routing.allocation.shard_movement_strategy | Enum | Decides order in which to move shards from node when shards can not stay on node anymore. Supports the following strategies: <br /> <br /> `NO_PREFERENCE` - default behavior in which order of shard movement doesn't matter. <br /> <br /> `PRIMARY_FIRST` - primary shards are moved first. <br /> <br /> `REPLICA_FIRST` - replica shards are moved first.
| cluster.routing.allocation.shard_movement_strategy | Enum | Determines the order in which shards are relocated from outgoing to incoming nodes. This setting supports the following strategies: <br /> <br /> `NO_PREFERENCE` - default behavior in which order of shard movement doesn't matter. <br /> <br /> `PRIMARY_FIRST` - primary shards are relocated first, before replica shards. <br /> <br /> `REPLICA_FIRST` - replica shards are moved first, before primary shards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. These settings were introduced to deal with specific scenarios.
  • PRIMARY_FIRST prioritizes the primary of all shards first so the cluster does not become red if any excluded nodes (based on a cluster setting) were to go down before relocating other shards.
  • REPLICA_FIRST prioritizes the replica of all shards first so the cluster does not become red in a mixed version cluster if primaries are relocated to higher version nodes and send segments on a higher OS version to replicas that they cannot read, subsequently leading to shard failure.

Note that with this setting enabled performance of this change is a direct function of number of indices, shards, replicas, and nodes. The larger the indices, replicas, and distribution scale, the slower the allocation becomes. This should be used with care. This is true for both the above settings.

  1. No logic that determines shard movement order. It is random.

We can add in some information from (1) regarding performance and use cases for the settings. I had omitted that since the table of request field parameters didn't seem the right place to add in all this information - but if it's preferred that we add it here I'm good with including some more.

Let me know and I'll make the changes including your suggestion :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Poojita-Raj Thanks for adding context here. I think including something about these two scenarios would be worthwhile. Have a look at these revisions to the setting descriptions and let me know if I misunderstood your explanations (I'm not familiar with cluster health and shard allocation. So this is highly possible):

PRIMARY_FIRST -- Primary shards are relocated first, before replica shards. This prioritization may help prevent a cluster's health from going red if nodes excluded from shard relocation happened to go down during the process.

  1. I'm not sure what the nodes are excluded from. Are they excluded from shard relocation? Which cluster setting are we talking about here?
  2. In the original, your mentioned "before relocating other shards". I'm not sure what these "other shards" are. Are these primary or replica shards? Are these the replica shards that would get second priority to the primaries in this configuration?

REPLICA_FIRST -- Replica shards are moved first, before primary shards. This prioritization may help prevent a cluster's health from going red when carrying out shard relocation in a cluster that includes nodes on different versions of OpenSearch. In this situation, primary shards relocated to later-version nodes could try to copy segment files to replica shards on an earlier version of OpenSearch, which would result in shard failure. Relocating replica shards first attempts to avoid this from happening in multi-version clusters.

  1. Is this the idea here, to relocate the replica shards first and then relocate primary shards (including those on newer versions of OpenSearch) and remove the possibility of segment file read errors?

NO_PREFERENCE - The default behavior in which the order of shard relocation has no importance.

Let's include these descriptions in this order. It seems the purpose of adding this new setting is to give users an important choice to relocate shards one way or another. Let's present them with the options and then, last, tell them the default doesn't consider order.

Copy link
Contributor Author

@Poojita-Raj Poojita-Raj Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cwillum Thanks for the suggestions! I'll change the order of the settings and include the descriptions for each setting.

PRIMARY_FIRST - (1) When you exclude a group of nodes from a cluster, the shards on those nodes are moved to other nodes in the cluster. cluster.routing.allocation.exclude. is the setting used for this. For the purpose of this setting, can be referred to as the relocating nodes.
(2) Yes, other shards refers to replica shards in this case that would get second priority.

REPLICA_FIRST - (1) Yes, that's exactly it.

@hdhalter hdhalter added v2.10.0 3 - Tech Review PR: Tech review in progress labels Sep 11, 2023
Signed-off-by: Poojita Raj <poojiraj@amazon.com>
Signed-off-by: Poojita Raj <poojiraj@amazon.com>
Copy link
Contributor

@cwillum cwillum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some last suggestions. After this, I think this update will be ready for a final editorial review.
And thank you!

@@ -102,6 +102,7 @@ The following request field parameters are compatible with the cluster API.
| cluster.routing.allocation.include.<attribute> | Enum | Allocates shards to a node whose `attribute` has at least one of the included comma-separated values. |
| cluster.routing.allocation.require.<attribute> | Enum | Only allocates shards to a node whose `attribute` has all of the included comma-separated values. |
| cluster.routing.allocation.exclude.<attribute> | Enum | Does not allocate shards to a node whose `attribute` has any of the included comma-separated values. The cluster allocation settings support the following built-in attributes: <br /> <br /> `_name` – Match nodes by node name. <br /> <br /> `_host_ip` – Match nodes by host IP address. <br /> <br /> `_publish_ip` – Match nodes by publish IP address. <br /> <br /> `_ip` – Match either `_host_ip` or `_publish_ip`. <br /> <br /> `_host` – Match nodes by hostname. <br /> <br /> `_id` – Match nodes by node ID. <br /> <br /> `_tier` – Match nodes by data tier role. |
| cluster.routing.allocation.shard_movement_strategy | Enum | Decides order in which to move shards from node when shards can not stay on node anymore. Supports the following strategies: <br /> <br /> `PRIMARY_FIRST` - Primary shards are relocated first, before replica shards. This prioritization may help prevent a cluster's health from going red if relocating nodes go down during the process. <br /> <br /> `REPLICA_FIRST` - Replica shards are relocated first, before primary shards. This prioritization may help prevent a cluster's health from going red when carrying out shard relocation in a mixed version, segment replication enabled OpenSearch cluster. In this situation, primary shards relocated to newer OpenSearch version nodes could try to copy segment files to replica shards on an older version of OpenSearch, which would result in shard failure. Relocating replica shards first attempts to avoid this from happening in multi-version clusters. <br /> <br /> `NO_PREFERENCE` - The default behavior in which order of shard relocation has no importance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| cluster.routing.allocation.shard_movement_strategy | Enum | Decides order in which to move shards from node when shards can not stay on node anymore. Supports the following strategies: <br /> <br /> `PRIMARY_FIRST` - Primary shards are relocated first, before replica shards. This prioritization may help prevent a cluster's health from going red if relocating nodes go down during the process. <br /> <br /> `REPLICA_FIRST` - Replica shards are relocated first, before primary shards. This prioritization may help prevent a cluster's health from going red when carrying out shard relocation in a mixed version, segment replication enabled OpenSearch cluster. In this situation, primary shards relocated to newer OpenSearch version nodes could try to copy segment files to replica shards on an older version of OpenSearch, which would result in shard failure. Relocating replica shards first attempts to avoid this from happening in multi-version clusters. <br /> <br /> `NO_PREFERENCE` - The default behavior in which order of shard relocation has no importance.
| cluster.routing.allocation.shard_movement_strategy | Enum | Determines the order in which shards are relocated from outgoing to incoming nodes. This setting supports the following strategies: <br /> <br /> `PRIMARY_FIRST` - Primary shards are relocated first, before replica shards. This prioritization may help prevent a cluster's health from going red if the relocating nodes go down during the process. <br /> <br /> `REPLICA_FIRST` - Replica shards are relocated first, before primary shards. This prioritization may help prevent a cluster's health from going red when carrying out shard relocation in a mixed-version, segment-replication-enabled OpenSearch cluster. In this situation, primary shards relocated to OpenSearch nodes of a newer version could try to copy segment files to replica shards on an older version of OpenSearch, which would result in shard failure. Relocating replica shards first attempts to avoid this from happening in multi-version clusters. <br /> <br /> `NO_PREFERENCE` - The default behavior in which order of shard relocation has no importance.

@cwillum cwillum added 5 - Final Editorial Review PR: Editorial Review in progress and removed 3 - Tech Review PR: Tech review in progress labels Sep 12, 2023
Signed-off-by: Poojita Raj <poojiraj@amazon.com>
Copy link
Collaborator

@natebower natebower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Poojita-Raj @cwillum Please see my comments and changes (the changes are just changing the hyphens to en dashes and an added "the") and let me know if you have any questions. Thanks!

@@ -102,6 +102,7 @@ The following request field parameters are compatible with the cluster API.
| cluster.routing.allocation.include.<attribute> | Enum | Allocates shards to a node whose `attribute` has at least one of the included comma-separated values. |
| cluster.routing.allocation.require.<attribute> | Enum | Only allocates shards to a node whose `attribute` has all of the included comma-separated values. |
| cluster.routing.allocation.exclude.<attribute> | Enum | Does not allocate shards to a node whose `attribute` has any of the included comma-separated values. The cluster allocation settings support the following built-in attributes: <br /> <br /> `_name` – Match nodes by node name. <br /> <br /> `_host_ip` – Match nodes by host IP address. <br /> <br /> `_publish_ip` – Match nodes by publish IP address. <br /> <br /> `_ip` – Match either `_host_ip` or `_publish_ip`. <br /> <br /> `_host` – Match nodes by hostname. <br /> <br /> `_id` – Match nodes by node ID. <br /> <br /> `_tier` – Match nodes by data tier role. |
| cluster.routing.allocation.shard_movement_strategy | Enum | Determines the order in which shards are relocated from outgoing to incoming nodes. This setting supports the following strategies: <br /> <br /> `PRIMARY_FIRST` - Primary shards are relocated first, before replica shards. This prioritization may help prevent a cluster's health from going red if the relocating nodes go down during the process. <br /> <br /> `REPLICA_FIRST` - Replica shards are relocated first, before primary shards. This prioritization may help prevent a cluster's health from going red when carrying out shard relocation in a mixed-version, segment-replication-enabled OpenSearch cluster. In this situation, primary shards relocated to OpenSearch nodes of a newer version could try to copy segment files to replica shards on an older version of OpenSearch, which would result in shard failure. Relocating replica shards first attempts to avoid this from happening in multi-version clusters. <br /> <br /> `NO_PREFERENCE` - The default behavior in which order of shard relocation has no importance.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| cluster.routing.allocation.shard_movement_strategy | Enum | Determines the order in which shards are relocated from outgoing to incoming nodes. This setting supports the following strategies: <br /> <br /> `PRIMARY_FIRST` - Primary shards are relocated first, before replica shards. This prioritization may help prevent a cluster's health from going red if the relocating nodes go down during the process. <br /> <br /> `REPLICA_FIRST` - Replica shards are relocated first, before primary shards. This prioritization may help prevent a cluster's health from going red when carrying out shard relocation in a mixed-version, segment-replication-enabled OpenSearch cluster. In this situation, primary shards relocated to OpenSearch nodes of a newer version could try to copy segment files to replica shards on an older version of OpenSearch, which would result in shard failure. Relocating replica shards first attempts to avoid this from happening in multi-version clusters. <br /> <br /> `NO_PREFERENCE` - The default behavior in which order of shard relocation has no importance.
| cluster.routing.allocation.shard_movement_strategy | Enum | Determines the order in which shards are relocated from outgoing to incoming nodes. This setting supports the following strategies: <br /> <br /> `PRIMARY_FIRST` -- Primary shards are relocated first, before replica shards. This prioritization may help prevent a cluster's health from going red if the relocating nodes go down during the process. <br /> <br /> `REPLICA_FIRST` -- Replica shards are relocated first, before primary shards. This prioritization may help prevent a cluster's health from going red when carrying out shard relocation in a mixed-version, segment-replication-enabled OpenSearch cluster. In this situation, primary shards relocated to OpenSearch nodes of a newer version could try to copy segment files to replica shards on an older version of OpenSearch, which would result in shard failure. Relocating replica shards first attempts to avoid this from happening in multi-version clusters. <br /> <br /> `NO_PREFERENCE` -- The default behavior in which the order of shard relocation has no importance.

@@ -102,6 +102,7 @@ The following request field parameters are compatible with the cluster API.
| cluster.routing.allocation.include.<attribute> | Enum | Allocates shards to a node whose `attribute` has at least one of the included comma-separated values. |
| cluster.routing.allocation.require.<attribute> | Enum | Only allocates shards to a node whose `attribute` has all of the included comma-separated values. |
| cluster.routing.allocation.exclude.<attribute> | Enum | Does not allocate shards to a node whose `attribute` has any of the included comma-separated values. The cluster allocation settings support the following built-in attributes: <br /> <br /> `_name` – Match nodes by node name. <br /> <br /> `_host_ip` – Match nodes by host IP address. <br /> <br /> `_publish_ip` – Match nodes by publish IP address. <br /> <br /> `_ip` – Match either `_host_ip` or `_publish_ip`. <br /> <br /> `_host` – Match nodes by hostname. <br /> <br /> `_id` – Match nodes by node ID. <br /> <br /> `_tier` – Match nodes by data tier role. |
| cluster.routing.allocation.shard_movement_strategy | Enum | Determines the order in which shards are relocated from outgoing to incoming nodes. This setting supports the following strategies: <br /> <br /> `PRIMARY_FIRST` - Primary shards are relocated first, before replica shards. This prioritization may help prevent a cluster's health from going red if the relocating nodes go down during the process. <br /> <br /> `REPLICA_FIRST` - Replica shards are relocated first, before primary shards. This prioritization may help prevent a cluster's health from going red when carrying out shard relocation in a mixed-version, segment-replication-enabled OpenSearch cluster. In this situation, primary shards relocated to OpenSearch nodes of a newer version could try to copy segment files to replica shards on an older version of OpenSearch, which would result in shard failure. Relocating replica shards first attempts to avoid this from happening in multi-version clusters. <br /> <br /> `NO_PREFERENCE` - The default behavior in which order of shard relocation has no importance.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under PRIMARY_FIRST: "This prioritization may help prevent a cluster's health status from going red if the relocating nodes fail during the process"?

Under REPLICA_FIRST: "a cluster's health status"? Also, the last sentence of this description is slightly awkward. Do we mean something like "Relocating replica shards first may help to avoid this in multi-version clusters"?

Signed-off-by: Poojita Raj <poojiraj@amazon.com>
@cwillum cwillum added release-notes PR: Include this PR in the automated release notes and removed 5 - Final Editorial Review PR: Editorial Review in progress labels Sep 14, 2023
@cwillum
Copy link
Contributor

cwillum commented Sep 14, 2023

@Poojita-Raj Thanks for taking care of the final suggested changes. This looks good. I've merged the PR. Your contribution and efforts on this issue are appreciated.

@cwillum cwillum merged commit fa35a82 into opensearch-project:main Sep 14, 2023
4 checks passed
@Poojita-Raj
Copy link
Contributor Author

@cwillum Thanks Chris, happy to help!

vagimeli pushed a commit that referenced this pull request Sep 19, 2023
* Add cluster setting for shard movement strategy

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

* changes to doc

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

* added suggestion

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

* changes to en-dash

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

---------

Signed-off-by: Poojita Raj <poojiraj@amazon.com>
vagimeli added a commit that referenced this pull request Sep 19, 2023
harshavamsi pushed a commit to harshavamsi/documentation-website that referenced this pull request Oct 31, 2023
)

* Add cluster setting for shard movement strategy

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

* changes to doc

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

* added suggestion

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

* changes to en-dash

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

---------

Signed-off-by: Poojita Raj <poojiraj@amazon.com>
vagimeli pushed a commit that referenced this pull request Dec 21, 2023
* Add cluster setting for shard movement strategy

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

* changes to doc

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

* added suggestion

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

* changes to en-dash

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

---------

Signed-off-by: Poojita Raj <poojiraj@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes PR: Include this PR in the automated release notes v2.10.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DOC] Add documentation for cluster routing allocation settings
4 participants