
Add cluster-wide shard limit #32856

Closed
gwbrown wants to merge 12 commits
Conversation

@gwbrown gwbrown (Contributor) commented Aug 14, 2018

Adds a safety limit on the number of shards in a cluster, based on
the number of nodes in the cluster. The limit is checked on operations
that add (or activate) shards, such as index creation, snapshot
restoration, and opening closed indices, and can be changed via the
cluster settings API.

Closes #20705
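
For context, the enforced rule is essentially "per-node shard limit times the number of data nodes". Below is a minimal sketch of that check; the class, method, and parameter names are hypothetical, not this PR's actual code, and a generic exception stands in for the ValidationException the later commits settle on.

    // Illustrative sketch only; names and structure are hypothetical.
    final class ShardLimitCheck {

        /**
         * Rejects an operation that would push the cluster past
         * maxShardsPerNode * (number of data nodes) open shards.
         */
        static void checkShardLimit(int shardsToAdd, int currentOpenShards,
                                    int dataNodeCount, int maxShardsPerNode) {
            int clusterShardLimit = maxShardsPerNode * dataNodeCount;
            if (currentOpenShards + shardsToAdd > clusterShardLimit) {
                // The PR's later commits report this through a ValidationException;
                // a generic exception stands in for it here.
                throw new IllegalArgumentException("this operation would add [" + shardsToAdd
                    + "] shards, but the cluster currently has [" + currentOpenShards
                    + "] of a maximum of [" + clusterShardLimit + "] open shards");
            }
        }
    }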

Based on review feedback. Either can be used to set the per-node shard limit, so let's verify both.

During cluster startup, a cluster may consist only of master-only, non-data nodes. In this case, we want to allow the user to configure the cluster until the data nodes come online.
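
A minimal sketch of that startup guard, continuing the hypothetical helper from the earlier sketch (the method name is illustrative):

    // Illustrative sketch: with zero data nodes the computed limit would be zero and
    // every operation would be rejected, so the check is skipped until data nodes join.
    static boolean shouldCheckShardLimit(int dataNodeCount) {
        return dataNodeCount > 0;
    }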
@@ -127,6 +127,9 @@

    }

    public static final Setting<Integer> SETTING_CLUSTER_MAX_SHARDS_PER_NODE =
        Setting.intSetting("cluster.shards.max_per_node", 1000, Property.Dynamic, Property.NodeScope);
Member
Can we set a minimum for this setting of 1 shard per node? (so that people don't set it to -171 and expect weird things)

Contributor

I wonder if a higher minimum is warranted -- e.g., to ensure that, if we are setting up a new cluster, we can create a .kibana index and so on?

@jasontedor jasontedor (Member) commented Aug 14, 2018

The default here is 1000, so we will be fine out of the box. The question is the minimum, and there is no good value, as there are many indices that might be created (.kibana, .security, Watcher, .monitoring, etc.). It is too hard to find the right minimum to ensure the basics of our stack function, and to keep such a value properly maintained. If someone really does want to set the value to one shard per node, I think we should permit that.

@gwbrown gwbrown (Contributor, Author)

Added a minimum of 1 - I agree with Jason, I think trying to figure out any other minimum would be very complicated.
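
The resulting declaration would presumably use the intSetting overload that accepts a minimum value, roughly like this (a sketch of the change, not the exact diff):

    public static final Setting<Integer> SETTING_CLUSTER_MAX_SHARDS_PER_NODE =
        // minimum of 1: values below it are rejected when the setting is applied
        Setting.intSetting("cluster.shards.max_per_node", 1000, 1, Property.Dynamic, Property.NodeScope);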

@jasontedor jasontedor (Member) left a comment

This looks like a good start. I think the implementation here misses a critical case which is updating index settings to increase the number of replicas. For example, I think the following would be permitted with the current implementation:

  • set the limit to 1 shard per node
  • start two nodes
  • create an index i and an index j with index.number_of_replicas set to zero, and the default number of shards
  • now, creating a third index will be blocked by the max shards per node limit 🎉
  • however, a settings update on i and j to increase the index.number_of_replicas to one would be permitted, yet this would put the cluster over the limit 😢
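
A sketch of how the settings-update path might cover this case, reusing the hypothetical checkShardLimit helper from the sketch near the top (names are illustrative; the actual change presumably hooks into the index-settings update service):

    // Illustrative sketch: when a settings update raises index.number_of_replicas,
    // compute how many shards the change would add and run the same cluster-wide check.
    static void checkReplicasIncrease(int primaryShardCount, int oldReplicas, int newReplicas,
                                      int currentOpenShards, int dataNodeCount, int maxShardsPerNode) {
        int replicaDelta = newReplicas - oldReplicas;
        if (replicaDelta <= 0) {
            return; // removing replicas can never push the cluster over the limit
        }
        // Each additional replica adds one copy of every primary shard in the index.
        int shardsToAdd = replicaDelta * primaryShardCount;
        ShardLimitCheck.checkShardLimit(shardsToAdd, currentOpenShards, dataNodeCount, maxShardsPerNode);
    }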

Per discussion on elastic#32856, the cluster-wide shard limit is now
enforced when changing the number of replicas used by an index.
gwbrown commented Aug 15, 2018

Jason makes an excellent point - I simply forgot about that case. I've added code to handle changing the replica settings, as well as several test cases.

Additionally, following the rule of three, I've factored some shared logic out into a shared method.

@colings86 colings86 added the >enhancement, :Data Management/Indices APIs (APIs to create and manage indices and templates), and v7.0.0 labels on Aug 16, 2018
It appears that ActionRequestValidationException tends to be used for
more client-related purposes, and ValidationException is more
appropriate here.
gwbrown commented Aug 16, 2018

@elasticmachine retest this please

@colings86 colings86 (Contributor)

This might already be planned, but I think we might want to add some kind of deprecation warning in 6.x to explain to the user that this breaking change is coming in 7.0 if they are over 1,000 * num_nodes shards.
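
That follow-up work is not part of this PR; as a rough illustration of the kind of 6.x warning being suggested (all names hypothetical, with plain logging standing in for Elasticsearch's deprecation-logging mechanism):

    import org.apache.logging.log4j.LogManager;
    import org.apache.logging.log4j.Logger;

    // Rough illustration: warn (rather than reject) when the cluster already exceeds
    // the default of 1,000 shards per data node that 7.0 would enforce.
    final class ShardLimitDeprecationCheck {

        private static final Logger logger = LogManager.getLogger(ShardLimitDeprecationCheck.class);
        private static final int DEFAULT_MAX_SHARDS_PER_NODE = 1000;

        static void warnIfOverDefaultLimit(int currentOpenShards, int dataNodeCount) {
            int defaultLimit = DEFAULT_MAX_SHARDS_PER_NODE * dataNodeCount;
            if (dataNodeCount > 0 && currentOpenShards > defaultLimit) {
                // A real implementation would presumably use the deprecation logger so that
                // clients also receive a deprecation warning response header.
                logger.warn("the cluster currently has [{}] shards, which exceeds the future default limit of [{}]; "
                    + "this limit will be enforced in 7.0", currentOpenShards, defaultLimit);
            }
        }
    }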

gwbrown commented Aug 28, 2018

Per discussion with @jasontedor, I'm closing this PR, pending a new PR which implements deprecation warnings for clusters with shard counts above the default limit. This will be followed shortly by opt-in enforcement (to be backported to 6.x) and enforcement by default in 7.0+.

Labels: >enhancement, :Data Management/Indices APIs, v7.0.0-beta1
Successfully merging this pull request may close: Hard limit on total number of shards in a cluster (#20705)
5 participants