
Add cluster-wide shard limit warnings #34021

Merged
merged 12 commits into elastic:master from shardlimit/warning-and-enforcement on Oct 23, 2018

Conversation

@gwbrown gwbrown (Contributor) commented Sep 25, 2018

In a future major version, we will be introducing a soft limit on the
number of shards in a cluster based on the number of nodes in the
cluster. This is intended to prevent operations which may
unintentionally destabilize the cluster.

This limit is configurable and is checked on operations that create or
open shards; a warning is issued if the operation would take the
cluster over the configured limit.

There is an option to enable strict enforcement of the limit, which
turns the warnings into errors. In a future release, the option will be
removed and strict enforcement will be the default (and only) behavior.

This PR will be followed by a 7.0-only PR which removes the enforcement
option and deprecation warnings and always enforces the limit.

Relates to #20705.

This is take 2 of #32856
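
For illustration, the shape of such a check might look like the following sketch. The class, method, parameter names, and message format here are hypothetical, not the exact code in this PR:

```java
import java.util.Optional;

class ShardLimitCheckSketch {
    // Sketch of the check described above. An empty result means the operation
    // stays under the limit; otherwise the caller either logs the message as a
    // deprecation warning or fails the request, depending on whether strict
    // enforcement is enabled.
    static Optional<String> checkShardLimit(int shardsToAdd, int openShards,
                                            int maxShardsPerNode, int dataNodes) {
        int maxShardsInCluster = maxShardsPerNode * dataNodes;
        if (openShards + shardsToAdd > maxShardsInCluster) {
            return Optional.of("this action would add [" + shardsToAdd
                    + "] shards, but the cluster is limited to [" + maxShardsInCluster
                    + "] and already has [" + openShards + "] open");
        }
        return Optional.empty();
    }
}
```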

@elasticmachine (Collaborator)

Pinging @elastic/es-core-infra

@colings86 colings86 (Contributor) left a comment

I left some documentation comments. The code seems good to me, but it would be worth having someone more familiar with this area of the code take a look.

NOTE: `cluster.shards.enforce_max_per_node` cannot be set to `false`, as this
setting will be removed in 7.0 and the limit will always be enforced. To return
to the default behavior for your Elasticsearch version, set this setting to
`"default"`.
Contributor (colings86)

Do we need to have a "default" value for the setting here? There are two reasons why I am concerned about having this:

  1. It means the setting accepts values of different types (boolean and String) which we have tried to avoid and remove instances of in other APIs
  2. Users who set the setting to "default" explicitly are going to need to make a subsequent change to their settings in 8.0 (I presume?) to remove the setting which will no longer be valid

Instead, could we enable the default behaviour whenever the setting is not set? Users who want to keep the default behaviour through the version changes would then never define this setting and wouldn't need to make any settings changes at all.
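
As an editorial illustration of the concern (all names here are hypothetical), a sentinel string forces mixed-type parsing, whereas treating an unset value as the default needs no sentinel at all:

```java
class EnforcementSettingSketch {
    // Point 1 above: a raw value of "true", "false", or the sentinel string
    // "default" mixes boolean and String semantics in one setting. Treating an
    // absent value as the default, as suggested here, avoids the sentinel.
    static boolean parseEnforce(String rawValue, boolean versionDefault) {
        if (rawValue == null || "default".equals(rawValue)) {
            return versionDefault; // unset (or sentinel): current version's behavior
        }
        return Boolean.parseBoolean(rawValue);
    }
}
```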

Contributor Author (gwbrown)

I did this based on a conversation with @jasontedor a while ago, where he made very similar comments, but I think I misunderstood what he was suggesting at the time. I'll reevaluate this setting and change it as appropriate.

Contributor Author (gwbrown)

Just talked with @jasontedor again - this setting is going to go away and become a system property, which can only be unset or true.


==== Cluster Shard Limit

In Elasticsearch 7.0 and later, there will be a soft cap on the number of
Contributor (colings86)

Should we call it a "soft limit" to be in line with the terminology on similar settings elsewhere?

Contributor Author (gwbrown)

Yes, I'll change all the instances of "cap" to "limit" - thanks!

If the cluster is already over the cap, due to changes in node membership or
setting changes, all operations that create or open indices will issue warnings
or fail until either the cap is increased as described below, or some indices
are closed or deleted to bring the number of shards below the cap.
Contributor (colings86)

As above I wonder if we should use "limit" instead of "cap"?

@jasontedor Please review this. 🙏

@jasontedor jasontedor (Member) left a comment

This is looking good, I left some comments, mostly about changing the keys.

In Elasticsearch 7.0 and later, there will be a soft limit on the number of
shards in a cluster, based on the number of nodes in the cluster. This is
intended to prevent operations which may unintentionally destabilize the
cluster. Until 7.0, actions which would result in the cluster going over the
Member (jasontedor)

Until -> Prior to

cluster. Until 7.0, actions which would result in the cluster going over the
limit will issue a deprecation warning.

NOTE: You can set the system property `es.enforce.shard_limit` to `true` to opt
Member (jasontedor)

I don't think we need to namespace this under `enforce`, so `es.enforce.shard_limit` -> `es.enforce_shard_limit`. And perhaps it should more closely reflect the name of the setting, so `es.enforce.shard_limit` -> `es.enforce_max_shards_per_node`.

If the cluster is already over the limit, due to changes in node membership or
setting changes, all operations that create or open indices will issue warnings
until either the limit is increased as described below, or some indices
are closed or deleted to bring the number of shards below the limit.
Member (jasontedor)

Would you link to the sections of the documentation relevant to closing, and separately deleting an index?

The limit defaults to 1,000 shards per node, and can be dynamically adjusted using
the following property:

`cluster.shards.max_per_node`::
Member (jasontedor)

I am doubting whether this needs to be in a shards namespace. How about `cluster.shards.max_per_node` -> `cluster.max_shards_per_node`.
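
For reference, a dynamic integer setting along these lines would typically be declared with the `Setting` API, roughly as in this sketch. It uses the key suggested above; the declaration itself is not shown in this excerpt, and the minimum value of 1 is an assumption:

```java
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Setting.Property;

class ShardLimitSettings {
    // Sketch only: a dynamic, node-scoped integer setting. The 1,000 default
    // and dynamic adjustability come from the docs text above.
    public static final Setting<Integer> CLUSTER_MAX_SHARDS_PER_NODE =
            Setting.intSetting("cluster.max_shards_per_node", 1000, 1,
                    Property.Dynamic, Property.NodeScope);
}
```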


==== Cluster-wide shard soft limit
Clusters now have soft limits on the total number of open shards in the cluster
based on the number of nodes and the `cluster.shards.max_per_node` cluster
Member (jasontedor)

How about `cluster.shards.max_per_node` -> `cluster.max_shards_per_node`.

@@ -156,6 +158,20 @@
public static final String INDICES_SHARDS_CLOSED_TIMEOUT = "indices.shards_closed_timeout";
public static final Setting<TimeValue> INDICES_CACHE_CLEAN_INTERVAL_SETTING =
Setting.positiveTimeSetting("indices.cache.cleanup_interval", TimeValue.timeValueMinutes(1), Property.NodeScope);
private static final boolean ENFORCE_SHARD_LIMIT;
static {
final String ENFORCE_SHARD_LIMIT_KEY = "es.enforce.shard_limit";
Member (jasontedor)

`es.enforce_shard_limit` -> `es.enforce_max_shards_per_node`.

private static final boolean ENFORCE_SHARD_LIMIT;
static {
final String ENFORCE_SHARD_LIMIT_KEY = "es.enforce.shard_limit";
final String enforceShardLimitSetting = System.getProperty(ENFORCE_SHARD_LIMIT_KEY);
Member (jasontedor)

`enforceShardLimitSetting` -> `enforceMaxShardsPerNode`
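
Putting the suggested renames together, the static initializer under discussion might read roughly as follows. This is a sketch only, since the actual hunk is truncated above; per the earlier discussion, the property may only be unset or `true`:

```java
class ShardLimitEnforcementSketch {
    // Sketch combining the renames suggested in this review.
    private static final boolean ENFORCE_MAX_SHARDS_PER_NODE;

    static {
        final String key = "es.enforce_max_shards_per_node";
        final String value = System.getProperty(key);
        if (value == null) {
            ENFORCE_MAX_SHARDS_PER_NODE = false; // unset: limit breaches only warn
        } else if ("true".equals(value)) {
            ENFORCE_MAX_SHARDS_PER_NODE = true;  // opted in: breaches become errors
        } else {
            throw new IllegalArgumentException(
                    "[" + key + "] may only be unset or [true] but was [" + value + "]");
        }
    }
}
```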

@@ -156,6 +158,20 @@
public static final String INDICES_SHARDS_CLOSED_TIMEOUT = "indices.shards_closed_timeout";
public static final Setting<TimeValue> INDICES_CACHE_CLEAN_INTERVAL_SETTING =
Setting.positiveTimeSetting("indices.cache.cleanup_interval", TimeValue.timeValueMinutes(1), Property.NodeScope);
private static final boolean ENFORCE_SHARD_LIMIT;
Member (jasontedor)

`ENFORCE_SHARD_LIMIT` -> `ENFORCE_MAX_SHARDS_PER_NODE`.

/**
* Checks to see if an operation can be performed without taking the cluster
* over the cluster-wide shard limit. Adds a deprecation warning or returns
* an error message as appropriate
Member (jasontedor)

You don't have to wrap these so narrowly, we can use the full 140-column line length here.

@gwbrown gwbrown (Contributor Author) commented Oct 1, 2018

Thanks! I've addressed your comments; can you re-review, @jasontedor?

@rjernst rjernst removed the review label Oct 10, 2018
@jasontedor jasontedor (Member) left a comment

LGTM.

@gwbrown gwbrown merged commit da20dfd into elastic:master Oct 23, 2018
@gwbrown gwbrown (Contributor Author) commented Oct 23, 2018

Thanks for the reviews!

gwbrown added a commit to gwbrown/elasticsearch that referenced this pull request Oct 23, 2018
gwbrown added a commit that referenced this pull request Oct 24, 2018
kcm pushed a commit that referenced this pull request Oct 30, 2018
@gwbrown gwbrown deleted the shardlimit/warning-and-enforcement branch December 7, 2018 04:58
henningandersen added a commit that referenced this pull request Apr 15, 2021
Frozen indices (partial searchable snapshots) require less heap per
shard and the limit can therefore be raised for those. We pick 3000
frozen shards per frozen data node, since we think 2000 is reasonable
to use in production.

Relates #71042 and #34021
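
Conceptually, the change referenced here gives the frozen tier its own, higher per-node budget. A rough sketch of the resulting check follows; the class, method, and parameter names are hypothetical, and the real change models the frozen limit as a separate setting rather than hard-coded constants:

```java
class FrozenShardLimitSketch {
    // Split budgets as described in the commit message above: frozen shards
    // get a higher per-node allowance (3,000) than normal shards (1,000).
    static boolean wouldExceedLimits(int normalShards, int frozenShards,
                                     int dataNodes, int frozenDataNodes) {
        int normalBudget = 1_000 * dataNodes;       // default limit per data node
        int frozenBudget = 3_000 * frozenDataNodes; // raised limit per frozen node
        return normalShards > normalBudget || frozenShards > frozenBudget;
    }
}
```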
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Apr 15, 2021
henningandersen added a commit that referenced this pull request Apr 17, 2021

Includes #71781 and #71777
Labels
:Data Management/Indices APIs, >enhancement, v6.5.0, v7.0.0-beta1
5 participants