Clarify the difference between cluster.routing.allocation.total_shards_per_node
and cluster.max_shards_per_node
#51839
Labels: :Core/Infra/Core, >docs, Team:Core/Infra, Team:Docs
I've seen some confusion about the differences between `cluster.routing.allocation.total_shards_per_node` and `cluster.max_shards_per_node`, which is understandable given the similarity of their names and the brevity of their documentation. I think the docs could be clarified fairly easily with just a few additions for each setting. Below are some notes I wrote up elsewhere about the difference between these two settings; they should probably be condensed before being included in the docs rather than pasted in directly.

`cluster.routing.allocation.total_shards_per_node` is checked at allocation time and controls how many shards may be allocated to any given node. To give an example: if this setting is 100 and there are 3 data nodes, A with 100 shards, B with 98 shards, and C with 1 shard, and node C dies, its shard can only be reallocated to node B, because node A already has 100 shards.

`cluster.max_shards_per_node` controls how many shards are allowed to exist in the cluster as a whole. It is checked at shard creation time and pays no attention to how many shards any individual node has. To give an example: if this setting is 100 and there are 3 data nodes, A with 100 shards, B with 98 shards, and C with 98 shards, there are 296 shards total in the cluster. The limit is `number_of_data_nodes * cluster.max_shards_per_node = 3 * 100 = 300`, so up to 4 new shards can be created in the cluster. If you try to create an index with 3 shards and 1 replica, that would create 6 new shards (3 primaries and 3 replicas), which would take the cluster over the limit of 300, and the request will be rejected.

To summarize:
`cluster.routing.allocation.total_shards_per_node` controls how many shards each individual node can have allocated to it at any given time, while `cluster.max_shards_per_node` controls how many shards the cluster can hold as a whole, scaled by the number of data nodes, regardless of how many shards any particular node has allocated to it.

Some notes:
- Hitting the `cluster.max_shards_per_node` limit causes requests that would create more shards to be rejected. This is to prevent the most extreme cases of oversharding.
- `cluster.max_shards_per_node` counts both primary and replica shards for open indices. Closed indices don't count, so if you see a cluster hitting this limit, closing some indices will help. Frozen indices count as open at the time of writing, although if we get feedback that counting them as closed would be useful, that's open to change.
- `cluster.max_shards_per_node` uses the number of data nodes to determine the limit. Non-data nodes (dedicated master, ingest-only, etc.) don't count. A cluster with no data nodes has no limit; apparently some users set up master nodes first, create indices and such, and then add data nodes later, and this enables that use case.
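To make the two checks concrete, here is a small Python sketch of the bookkeeping described above. The function names are mine, and this is illustrative logic, not Elasticsearch's actual implementation:

```python
def allocation_targets(shards_per_node, total_shards_per_node):
    """Allocation-time check (cluster.routing.allocation.total_shards_per_node):
    which nodes can still accept one more shard?"""
    return [node for node, count in shards_per_node.items()
            if count < total_shards_per_node]

def creation_allowed(shards_per_data_node, max_shards_per_node, new_shards):
    """Creation-time check (cluster.max_shards_per_node): would creating
    new_shards more shards exceed number_of_data_nodes * max_shards_per_node?
    Only shards on data nodes are counted."""
    if not shards_per_data_node:
        return True  # no data nodes: no limit is enforced
    limit = len(shards_per_data_node) * max_shards_per_node
    return sum(shards_per_data_node) + new_shards <= limit

# First example: node C died; its shard can only go to B, since A is at 100.
print(allocation_targets({"A": 100, "B": 98}, 100))   # ['B']

# Second example: 100 + 98 + 98 = 296 shards, limit 3 * 100 = 300.
print(creation_allowed([100, 98, 98], 100, 4))        # True: 4 new shards fit
print(creation_allowed([100, 98, 98], 100, 6))        # False: 3 primaries + 3 replicas
```

Note how the first check looks at each node individually while the second only looks at cluster-wide totals, which is exactly the distinction the two settings' docs should draw.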