
Add cluster-wide shard limit warnings #34021

Merged
merged 12 commits into elastic:master from shardlimit/warning-and-enforcement on Oct 23, 2018

Conversation

@gwbrown gwbrown (Contributor) commented Sep 25, 2018

In a future major version, we will be introducing a soft limit on the
number of shards in a cluster based on the number of nodes in the
cluster. This is intended to prevent operations which may
unintentionally destabilize the cluster.

This limit is configurable and is checked on operations that create or
open shards; a warning is issued if the operation would take the
cluster over the configured limit.

There is an option to enable strict enforcement of the limit, which
turns the warnings into errors. In a future release, the option will be
removed and strict enforcement will be the default (and only) behavior.

This PR will be followed by a 7.0-only PR which removes the enforcement
option and deprecation warnings and always enforces the limit.

Relates to #20705.

This is take 2 of #32856
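
For illustration, the shape of such a check might look like the following sketch. The class, method, parameter names, and message format here are hypothetical, not the exact code in this PR:

```java
import java.util.Optional;

class ShardLimitCheckSketch {
    // Sketch of the check described above. An empty result means the operation
    // stays under the limit; otherwise the caller either logs the message as a
    // deprecation warning or fails the request, depending on whether strict
    // enforcement is enabled.
    static Optional<String> checkShardLimit(int shardsToAdd, int openShards,
                                            int maxShardsPerNode, int dataNodes) {
        int maxShardsInCluster = maxShardsPerNode * dataNodes;
        if (openShards + shardsToAdd > maxShardsInCluster) {
            return Optional.of("this action would add [" + shardsToAdd
                    + "] shards, but the cluster is limited to [" + maxShardsInCluster
                    + "] and already has [" + openShards + "] open");
        }
        return Optional.empty();
    }
}
```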

@elasticmachine (Collaborator)

Pinging @elastic/es-core-infra

@colings86 colings86 (Contributor) left a comment

I left some documentation comments. The code seems good to me, but it would be worth having someone more familiar with this area of the code take a look.

NOTE: `cluster.shards.enforce_max_per_node` cannot be set to `false`, as this
setting will be removed in 7.0 and the limit will always be enforced. To return
to the default behavior for your Elasticsearch version, set this setting to
`"default"`.
Contributor (colings86)

Do we need to have a "default" value for the setting here? There are two reasons why I am concerned about having this:

  1. It means the setting accepts values of different types (boolean and String) which we have tried to avoid and remove instances of in other APIs
  2. Users who set the setting to "default" explicitly are going to need to make a subsequent change to their settings in 8.0 (I presume?) to remove the setting which will no longer be valid

Instead, could we enable the default behaviour whenever the setting is not set? Users who want to keep the default behaviour through the version changes would then never define this setting and wouldn't need to make any settings changes at all.
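
As an editorial illustration of the concern (all names here are hypothetical), a sentinel string forces mixed-type parsing, whereas treating an unset value as the default needs no sentinel at all:

```java
class EnforcementSettingSketch {
    // Point 1 above: a raw value of "true", "false", or the sentinel string
    // "default" mixes boolean and String semantics in one setting. Treating an
    // absent value as the default, as suggested here, avoids the sentinel.
    static boolean parseEnforce(String rawValue, boolean versionDefault) {
        if (rawValue == null || "default".equals(rawValue)) {
            return versionDefault; // unset (or sentinel): current version's behavior
        }
        return Boolean.parseBoolean(rawValue);
    }
}
```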

Contributor Author (gwbrown)

I did this based on a conversation with @jasontedor a while ago, where he made very similar comments, but I think I misunderstood what he was suggesting at the time. I'll reevaluate this setting and change it as appropriate.

Contributor Author (gwbrown)

Just talked with @jasontedor again - this setting is going to go away and become a system property, which can only be unset or true.


==== Cluster Shard Limit

In Elasticsearch 7.0 and later, there will be a soft cap on the number of
Contributor (colings86)

Should we call it a "soft limit" to be in line with the terminology on similar settings elsewhere?

Contributor Author (gwbrown)

Yes, I'll change all the instances of "cap" to "limit" - thanks!

If the cluster is already over the cap, due to changes in node membership or
setting changes, all operations that create or open indices will issue warnings
or fail until either the cap is increased as described below, or some indices
are closed or deleted to bring the number of shards below the cap.
Contributor (colings86)

As above I wonder if we should use "limit" instead of "cap"?

@jasontedor Please review this. 🙏

@jasontedor jasontedor (Member) left a comment

This is looking good, I left some comments, mostly about changing the keys.

In Elasticsearch 7.0 and later, there will be a soft limit on the number of
shards in a cluster, based on the number of nodes in the cluster. This is
intended to prevent operations which may unintentionally destabilize the
cluster. Until 7.0, actions which would result in the cluster going over the
Member (jasontedor)

Until -> Prior to

cluster. Until 7.0, actions which would result in the cluster going over the
limit will issue a deprecation warning.

NOTE: You can set the system property `es.enforce.shard_limit` to `true` to opt
Member (jasontedor)

I don't think we need to namespace this under `enforce`, so `es.enforce.shard_limit` -> `es.enforce_shard_limit`. And perhaps it should more closely reflect the name of the setting, so `es.enforce.shard_limit` -> `es.enforce_max_shards_per_node`.

If the cluster is already over the limit, due to changes in node membership or
setting changes, all operations that create or open indices will issue warnings
until either the limit is increased as described below, or some indices
are closed or deleted to bring the number of shards below the limit.
Member (jasontedor)

Would you link to the sections of the documentation relevant to closing, and separately deleting an index?

The limit defaults to 1,000 shards per node, and can be dynamically adjusted using
the following property:

`cluster.shards.max_per_node`::
Member (jasontedor)

I am doubting whether this needs to be in a shards namespace. How about `cluster.shards.max_per_node` -> `cluster.max_shards_per_node`.
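
For reference, a dynamic integer setting along these lines would typically be declared with the `Setting` API, roughly as in this sketch. It uses the key suggested above; the declaration itself is not shown in this excerpt, and the minimum value of 1 is an assumption:

```java
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Setting.Property;

class ShardLimitSettings {
    // Sketch only: a dynamic, node-scoped integer setting. The 1,000 default
    // and dynamic adjustability come from the docs text above.
    public static final Setting<Integer> CLUSTER_MAX_SHARDS_PER_NODE =
            Setting.intSetting("cluster.max_shards_per_node", 1000, 1,
                    Property.Dynamic, Property.NodeScope);
}
```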


==== Cluster-wide shard soft limit
Clusters now have soft limits on the total number of open shards in the cluster
based on the number of nodes and the `cluster.shards.max_per_node` cluster
Member (jasontedor)

How about `cluster.shards.max_per_node` -> `cluster.max_shards_per_node`.

@@ -156,6 +158,20 @@
public static final String INDICES_SHARDS_CLOSED_TIMEOUT = "indices.shards_closed_timeout";
public static final Setting<TimeValue> INDICES_CACHE_CLEAN_INTERVAL_SETTING =
Setting.positiveTimeSetting("indices.cache.cleanup_interval", TimeValue.timeValueMinutes(1), Property.NodeScope);
private static final boolean ENFORCE_SHARD_LIMIT;
static {
final String ENFORCE_SHARD_LIMIT_KEY = "es.enforce.shard_limit";
Member (jasontedor)

`es.enforce_shard_limit` -> `es.enforce_max_shards_per_node`.

private static final boolean ENFORCE_SHARD_LIMIT;
static {
final String ENFORCE_SHARD_LIMIT_KEY = "es.enforce.shard_limit";
final String enforceShardLimitSetting = System.getProperty(ENFORCE_SHARD_LIMIT_KEY);
Member (jasontedor)

`enforceShardLimitSetting` -> `enforceMaxShardsPerNode`
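
Putting the suggested renames together, the static initializer under discussion might read roughly as follows. This is a sketch only, since the actual hunk is truncated above; per the earlier discussion, the property may only be unset or `true`:

```java
class ShardLimitEnforcementSketch {
    // Sketch combining the renames suggested in this review.
    private static final boolean ENFORCE_MAX_SHARDS_PER_NODE;

    static {
        final String key = "es.enforce_max_shards_per_node";
        final String value = System.getProperty(key);
        if (value == null) {
            ENFORCE_MAX_SHARDS_PER_NODE = false; // unset: limit breaches only warn
        } else if ("true".equals(value)) {
            ENFORCE_MAX_SHARDS_PER_NODE = true;  // opted in: breaches become errors
        } else {
            throw new IllegalArgumentException(
                    "[" + key + "] may only be unset or [true] but was [" + value + "]");
        }
    }
}
```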

@@ -156,6 +158,20 @@
public static final String INDICES_SHARDS_CLOSED_TIMEOUT = "indices.shards_closed_timeout";
public static final Setting<TimeValue> INDICES_CACHE_CLEAN_INTERVAL_SETTING =
Setting.positiveTimeSetting("indices.cache.cleanup_interval", TimeValue.timeValueMinutes(1), Property.NodeScope);
private static final boolean ENFORCE_SHARD_LIMIT;
Member (jasontedor)

`ENFORCE_SHARD_LIMIT` -> `ENFORCE_MAX_SHARDS_PER_NODE`.

/**
* Checks to see if an operation can be performed without taking the cluster
* over the cluster-wide shard limit. Adds a deprecation warning or returns
* an error message as appropriate
Member (jasontedor)

You don't have to wrap these so narrowly, we can use the full 140-column line length here.

@gwbrown gwbrown (Contributor Author) commented Oct 1, 2018

Thanks! I've addressed your comments; can you re-review, @jasontedor?

@rjernst rjernst removed the review label Oct 10, 2018
@jasontedor jasontedor (Member) left a comment

LGTM.

@gwbrown gwbrown merged commit da20dfd into elastic:master Oct 23, 2018
@gwbrown gwbrown (Contributor Author) commented Oct 23, 2018

Thanks for the reviews!

gwbrown added a commit to gwbrown/elasticsearch that referenced this pull request Oct 23, 2018
gwbrown added a commit that referenced this pull request Oct 24, 2018
kcm pushed a commit that referenced this pull request Oct 30, 2018
@gwbrown gwbrown deleted the shardlimit/warning-and-enforcement branch December 7, 2018 04:58
henningandersen added a commit that referenced this pull request Apr 15, 2021
Frozen indices (partial searchable snapshots) require less heap per
shard and the limit can therefore be raised for those. We pick 3000
frozen shards per frozen data node, since we think 2000 is reasonable
to use in production.

Relates #71042 and #34021
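
Conceptually, the change referenced here gives the frozen tier its own, higher per-node budget. A rough sketch of the resulting check follows; the class, method, and parameter names are hypothetical, and the real change models the frozen limit as a separate setting rather than hard-coded constants:

```java
class FrozenShardLimitSketch {
    // Split budgets as described in the commit message above: frozen shards
    // get a higher per-node allowance (3,000) than normal shards (1,000).
    static boolean wouldExceedLimits(int normalShards, int frozenShards,
                                     int dataNodes, int frozenDataNodes) {
        int normalBudget = 1_000 * dataNodes;       // default limit per data node
        int frozenBudget = 3_000 * frozenDataNodes; // raised limit per frozen node
        return normalShards > normalBudget || frozenShards > frozenBudget;
    }
}
```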
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Apr 15, 2021
henningandersen added a commit that referenced this pull request Apr 17, 2021

Includes #71781 and #71777
Labels
:Data Management/Indices APIs, >enhancement, v6.5.0, v7.0.0-beta1
5 participants