[7.17.1] Adjust indices.recovery.max_bytes_per_sec according to external settings (#83413)

* Adjust indices.recovery.max_bytes_per_sec according to external settings

Today the setting indices.recovery.max_bytes_per_sec defaults to different
values depending on the node roles, the JVM version and the system total
memory that can be detected.

The current logic to set the default value can be summarized as:

    40 MB for non-data nodes
    40 MB for data nodes that run on a JVM version < 14
    40 MB for data nodes that have one of the data_hot, data_warm, data_content or data roles

Nodes with only data_cold and/or data_frozen roles as data roles have a
default value that depends on the available memory:

    with ≤ 4 GB of available memory, the default is 40 MB
    with more than 4 GB and less than or equal to 8 GB, the default is 60 MB
    with more than 8 GB and less than or equal to 16 GB, the default is 90 MB
    with more than 16 GB and less than or equal to 32 GB, the default is 125 MB
    and above 32 GB, the default is 250 MB
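
The rules listed above can be sketched as a single function. This is a minimal illustration built only from this description, not the code in `RecoverySettings`. The class name, the method signature, the role strings as plain `String`s and the reading of 1 GB as 2^30 bytes are all assumptions made for the example.

```java
import java.util.Set;

final class RecoveryRateDefaults {
    // Assumption: 1 GB is taken as 2^30 bytes for the purpose of this sketch.
    static final long GB = 1024L * 1024L * 1024L;

    // Data roles that keep the 40 MB default regardless of memory.
    private static final Set<String> OTHER_DATA_ROLES = Set.of("data", "data_hot", "data_warm", "data_content");

    /** Returns the default limit in MB per second, following the rules in the commit message. */
    static int defaultMbPerSec(Set<String> roles, int jvmMajorVersion, long totalMemoryBytes) {
        boolean coldOrFrozen = roles.contains("data_cold") || roles.contains("data_frozen");
        boolean otherDataRole = roles.stream().anyMatch(OTHER_DATA_ROLES::contains);
        // Non-data nodes, old JVMs and nodes with any other data role all get 40 MB.
        if (!coldOrFrozen || otherDataRole || jvmMajorVersion < 14) {
            return 40;
        }
        // Dedicated cold/frozen nodes: the default scales with memory.
        if (totalMemoryBytes <= 4 * GB) return 40;
        if (totalMemoryBytes <= 8 * GB) return 60;
        if (totalMemoryBytes <= 16 * GB) return 90;
        if (totalMemoryBytes <= 32 * GB) return 125;
        return 250;
    }

    public static void main(String[] args) {
        // A dedicated cold node on a recent JVM with 16 GB of memory.
        System.out.println(defaultMbPerSec(Set.of("data_cold"), 17, 16 * GB));
    }
}
```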

While those defaults have served us well, we want to evaluate whether we can
define more appropriate defaults when Elasticsearch knows more about the limits
(or properties) of the hardware it is running on - something that Elasticsearch
cannot detect by itself but can derive from settings that are provided at startup.

This pull request introduces the following new node settings:

    node.bandwidth.recovery.network
    node.bandwidth.recovery.disk.read
    node.bandwidth.recovery.disk.write

Those settings are not dynamic and must be set before the node starts.
When they are set, Elasticsearch detects the minimum available bandwidth
among the network, disk read and disk write bandwidths and computes
a maximum bytes per second limit that is a fraction of that minimum
bandwidth. By default 40% of the minimum bandwidth is used, but that can be
dynamically configured by an operator
(using the node.bandwidth.recovery.operator.factor setting) or by the user
directly (using a different setting, node.bandwidth.recovery.factor).

The limit computed from available bandwidths is then compared to pre-existing
limitations like the one set through the indices.recovery.max_bytes_per_sec setting
or the one that is computed by Elasticsearch from the node's physical memory
on dedicated cold/frozen nodes. Elasticsearch will try to use the highest possible
limit among those values, while not exceeding an overcommit ratio that is also
defined through a node setting
(see node.bandwidth.recovery.operator.factor.max_overcommit).

This overcommit ratio prevents the rate limit from being set to a value that is
greater than 100 times (by default) the minimum available bandwidth.
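
The computation described above can be sketched as follows. This follows the commit message only and is not the actual `RecoverySettings` logic, which may combine the values differently (for example by treating reads and writes separately). The class name, method name and parameters are hypothetical, and all values are in bytes per second.

```java
final class BandwidthRecoveryLimit {
    /**
     * Sketch of the limit described in the commit message:
     * take a fraction of the minimum bandwidth, prefer the highest candidate limit,
     * and never exceed maxOvercommit times the minimum bandwidth.
     */
    static long maxBytesPerSec(
        long network,
        long diskRead,
        long diskWrite,
        double factor,          // e.g. 0.4, the default fraction
        double maxOvercommit,   // e.g. 100, the default overcommit ratio
        long existingLimit      // e.g. indices.recovery.max_bytes_per_sec or the memory-based default
    ) {
        long minBandwidth = Math.min(network, Math.min(diskRead, diskWrite));
        long fromBandwidth = Math.round(minBandwidth * factor);
        // Use the highest of the candidate limits...
        long highest = Math.max(fromBandwidth, existingLimit);
        // ...but cap it at the overcommit ceiling.
        long ceiling = Math.round(minBandwidth * maxOvercommit);
        return Math.min(highest, ceiling);
    }
}
```

With bandwidths of 1000, 500 and 250 MB per second, a factor of 0.4 and an existing limit of 40 MB per second, this sketch yields 100 MB per second (40% of 250). An existing limit above 25000 MB per second would be capped at that ceiling by the default overcommit ratio of 100.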

Backport of #82819 for 7.17.1

* Add missing max overcommit factor to list of (dynamic) settings (#83350)

The setting node.bandwidth.recovery.operator.factor.max_overcommit
wasn't added to the list of cluster settings and to the list of settings to
consume for updates.

Relates #82819

* Operator factor settings should have the OperatorDynamic setting property (#83359)

Relates #82819

* Add docs for node bandwidth settings (#83361)

Relates #82819

* Adjust for 7.17.1

* remove draft

* remove docs/changelog/83350.yaml

Co-authored-by: David Turner <david.turner@elastic.co>
tlrx and DaveCTurner committed Feb 9, 2022
1 parent 034de18 commit 07b9951
Showing 6 changed files with 598 additions and 42 deletions.
6 changes: 6 additions & 0 deletions docs/changelog/82819.yaml
@@ -0,0 +1,6 @@
pr: 82819
summary: "Adjust `indices.recovery.max_bytes_per_sec` according to external\
\ settings"
area: Recovery
type: enhancement
issues: []
90 changes: 90 additions & 0 deletions docs/reference/modules/indices/recovery.asciidoc
@@ -109,3 +109,93 @@ executed in parallel in the target node for all recoveries. Defaults to `25`.
+
Do not increase this setting without carefully verifying that your cluster has
the resources available to handle the extra load that will result.

[discrete]
==== Recovery settings for managed services

NOTE: {cloud-only}

WARNING: This feature is available in {es} 7.17.1 and 8.0.1 onwards but is not
supported in {es} 8.0.0. As such the recovery settings for managed services should
be removed before upgrading to 8.0.0. It is possible to configure the settings in
7.17.1 and then upgrade to 8.0.1 directly.

When running {es} as a managed service, the following settings allow the
service to specify absolute maximum bandwidths for disk reads, disk writes, and
network traffic on each node, and permit you to control the maximum recovery
bandwidth on each node in terms of these absolute maximum values. They have two
effects:

1. They determine the bandwidth used for recovery if
`indices.recovery.max_bytes_per_sec` is not set, overriding the default
behaviour described above.

2. They impose a node-wide limit on recovery bandwidth which is independent of
the value of `indices.recovery.max_bytes_per_sec`.

If you do not set `indices.recovery.max_bytes_per_sec` then the maximum
recovery bandwidth is computed as a proportion of the absolute maximum
bandwidth. The computation is performed separately for read and write traffic.
The service defines the absolute maximum bandwidths for disk reads, disk
writes, and network transfers using `node.bandwidth.recovery.disk.read`,
`node.bandwidth.recovery.disk.write` and `node.bandwidth.recovery.network`
respectively, and you can set the proportion of the absolute maximum bandwidth
that may be used for recoveries by adjusting
`node.bandwidth.recovery.operator.factor.read` and
`node.bandwidth.recovery.operator.factor.write`.

If you set `indices.recovery.max_bytes_per_sec` then {es} will use its value
for the maximum recovery bandwidth, as long as this does not exceed the
node-wide limit. {es} computes the node-wide limit by multiplying the absolute
maximum bandwidths by the
`node.bandwidth.recovery.operator.factor.max_overcommit` factor. If you set
`indices.recovery.max_bytes_per_sec` in excess of the node-wide limit then the
node-wide limit takes precedence.

The service should determine values for the absolute maximum bandwidths
settings by experiment, using a recovery-like workload in which there are
several concurrent workers each processing files sequentially in chunks of
512kiB.
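
As a rough illustration of such a recovery-like read workload, the sketch below has several workers read files sequentially in 512 KiB chunks and reports the aggregate throughput. It is a simplified assumption of what a measurement could look like, not a tool shipped with {es}. The class and method names are hypothetical, the worker count of 4 is arbitrary, and the operating system page cache can inflate the result unless the files are larger than memory or the cache is dropped first.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class RecoveryLikeReadBenchmark {
    static final int CHUNK_SIZE = 512 * 1024; // 512 KiB chunks, as described in the text

    /** Reads each file from start to end in CHUNK_SIZE chunks and returns the bytes read. */
    static long readSequentially(List<Path> files) throws IOException {
        byte[] buffer = new byte[CHUNK_SIZE];
        long total = 0;
        for (Path file : files) {
            try (InputStream in = Files.newInputStream(file)) {
                int n;
                // readNBytes returns 0 once the end of the stream is reached.
                while ((n = in.readNBytes(buffer, 0, CHUNK_SIZE)) > 0) {
                    total += n;
                }
            }
        }
        return total;
    }

    /** Distributes the files round-robin over the workers and returns the total bytes read. */
    static long readConcurrently(List<Path> files, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                List<Path> share = new ArrayList<>();
                for (int i = w; i < files.size(); i += workers) {
                    share.add(files.get(i));
                }
                Callable<Long> task = () -> readSequentially(share);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Path.of(arg));
        }
        long start = System.nanoTime();
        long bytes = readConcurrently(files, 4);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("read %d bytes at %.1f MiB/s%n", bytes, bytes / seconds / (1024.0 * 1024.0));
    }
}
```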

`node.bandwidth.recovery.disk.read`::
(<<byte-units,byte value>> per second) The absolute maximum disk read speed for
a recovery-like workload on the node. If set,
`node.bandwidth.recovery.disk.write` and `node.bandwidth.recovery.network` must
also be set.

`node.bandwidth.recovery.disk.write`::
(<<byte-units,byte value>> per second) The absolute maximum disk write speed
for a recovery-like workload on the node. If set,
`node.bandwidth.recovery.disk.read` and `node.bandwidth.recovery.network` must
also be set.

`node.bandwidth.recovery.network`::
(<<byte-units,byte value>> per second) The absolute maximum network throughput
for a recovery-like workload on the node, which applies to both reads and
writes. If set, `node.bandwidth.recovery.disk.read` and
`node.bandwidth.recovery.disk.write` must also be set.

`node.bandwidth.recovery.operator.factor.read`::
(float) The proportion of the maximum read
bandwidth that may be used for recoveries if `indices.recovery.max_bytes_per_sec`
is not set. Must be greater than `0` and not greater than `1`. If not set, the
value of `node.bandwidth.recovery.operator.factor` is used. If no factor
settings are set then the value `0.4` is used.

`node.bandwidth.recovery.operator.factor.write`::
(float) The proportion of the maximum
write bandwidth that may be used for recoveries if `indices.recovery.max_bytes_per_sec`
is not set. Must be greater than `0` and not greater than `1`. If not set, the
value of `node.bandwidth.recovery.operator.factor` is used. If no factor
settings are set then the value `0.4` is used.

`node.bandwidth.recovery.operator.factor`::
(float) The proportion of the maximum
bandwidth that may be used for recoveries if neither
`indices.recovery.max_bytes_per_sec` nor any other factor settings are set.
Must be greater than `0` and not greater than `1`. Defaults to `0.4`.

`node.bandwidth.recovery.operator.factor.max_overcommit`::
(float) The proportion of the absolute
maximum bandwidth that may be used for recoveries regardless of any other
settings. Must be greater than `0`. Defaults to `100`.
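
The fallback between the factor settings, and the rule that the three bandwidth settings must be set together, can be sketched as follows. This is an illustration of the documented behaviour using a plain `Map` in place of the {es} `Settings` class. The class and method names are hypothetical, and the real validation may differ in detail.

```java
import java.util.Map;
import java.util.stream.Stream;

final class NodeBandwidthSettingsSketch {
    static final String PREFIX = "node.bandwidth.recovery.";

    /**
     * Resolves the factor for "read" or "write": the direction-specific setting wins,
     * then the generic operator factor, then the documented default of 0.4.
     */
    static double resolveFactor(Map<String, String> settings, String direction) {
        String specific = settings.get(PREFIX + "operator.factor." + direction);
        if (specific != null) {
            return Double.parseDouble(specific);
        }
        String generic = settings.get(PREFIX + "operator.factor");
        return generic != null ? Double.parseDouble(generic) : 0.4;
    }

    /** The disk read, disk write and network bandwidths must be set together or not at all. */
    static void checkBandwidthSettings(Map<String, String> settings) {
        long present = Stream.of("disk.read", "disk.write", "network")
            .filter(name -> settings.containsKey(PREFIX + name))
            .count();
        if (present != 0 && present != 3) {
            throw new IllegalArgumentException(
                "settings [" + PREFIX + "disk.read], [" + PREFIX + "disk.write] and [" + PREFIX + "network] must all be set"
            );
        }
    }
}
```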
@@ -237,6 +237,13 @@ public void apply(Settings value, Settings current, Settings previous) {
RecoverySettings.INDICES_RECOVERY_USE_SNAPSHOTS_SETTING,
RecoverySettings.INDICES_RECOVERY_MAX_CONCURRENT_SNAPSHOT_FILE_DOWNLOADS,
RecoverySettings.INDICES_RECOVERY_MAX_CONCURRENT_SNAPSHOT_FILE_DOWNLOADS_PER_NODE,
RecoverySettings.NODE_BANDWIDTH_RECOVERY_OPERATOR_FACTOR_SETTING,
RecoverySettings.NODE_BANDWIDTH_RECOVERY_OPERATOR_FACTOR_READ_SETTING,
RecoverySettings.NODE_BANDWIDTH_RECOVERY_OPERATOR_FACTOR_WRITE_SETTING,
RecoverySettings.NODE_BANDWIDTH_RECOVERY_OPERATOR_FACTOR_MAX_OVERCOMMIT_SETTING,
RecoverySettings.NODE_BANDWIDTH_RECOVERY_DISK_WRITE_SETTING,
RecoverySettings.NODE_BANDWIDTH_RECOVERY_DISK_READ_SETTING,
RecoverySettings.NODE_BANDWIDTH_RECOVERY_NETWORK_SETTING,
ThrottlingAllocationDecider.CLUSTER_ROUTING_ALLOCATION_NODE_INITIAL_PRIMARIES_RECOVERIES_SETTING,
ThrottlingAllocationDecider.CLUSTER_ROUTING_ALLOCATION_NODE_CONCURRENT_INCOMING_RECOVERIES_SETTING,
ThrottlingAllocationDecider.CLUSTER_ROUTING_ALLOCATION_NODE_CONCURRENT_OUTGOING_RECOVERIES_SETTING,
51 changes: 29 additions & 22 deletions server/src/main/java/org/elasticsearch/common/settings/Setting.java
@@ -1941,28 +1941,35 @@ public static Setting<Double> doubleSetting(String key, double defaultValue, dou
     }

     public static Setting<Double> doubleSetting(String key, double defaultValue, double minValue, double maxValue, Property... properties) {
-        return new Setting<>(key, (s) -> Double.toString(defaultValue), (s) -> {
-            final double d = Double.parseDouble(s);
-            if (d < minValue) {
-                String err = "Failed to parse value"
-                    + (isFiltered(properties) ? "" : " [" + s + "]")
-                    + " for setting ["
-                    + key
-                    + "] must be >= "
-                    + minValue;
-                throw new IllegalArgumentException(err);
-            }
-            if (d > maxValue) {
-                String err = "Failed to parse value"
-                    + (isFiltered(properties) ? "" : " [" + s + "]")
-                    + " for setting ["
-                    + key
-                    + "] must be <= "
-                    + maxValue;
-                throw new IllegalArgumentException(err);
-            }
-            return d;
-        }, properties);
+        return new Setting<>(
+            key,
+            (s) -> Double.toString(defaultValue),
+            (s) -> parseDouble(s, minValue, maxValue, key, properties),
+            properties
+        );
     }
+
+    public static Double parseDouble(String s, double minValue, double maxValue, String key, Property... properties) {
+        final double d = Double.parseDouble(s);
+        if (d < minValue) {
+            String err = "Failed to parse value"
+                + (isFiltered(properties) ? "" : " [" + s + "]")
+                + " for setting ["
+                + key
+                + "] must be >= "
+                + minValue;
+            throw new IllegalArgumentException(err);
+        }
+        if (d > maxValue) {
+            String err = "Failed to parse value"
+                + (isFiltered(properties) ? "" : " [" + s + "]")
+                + " for setting ["
+                + key
+                + "] must be <= "
+                + maxValue;
+            throw new IllegalArgumentException(err);
+        }
+        return d;
+    }

     @Override
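
Extracting `parseDouble` makes the inclusive range check reusable by parsers that need an extra rule on top of it. The factor settings, for instance, are documented as greater than `0` and not greater than `1`, which an inclusive `[0, 1]` check alone would not enforce. The sketch below shows one way such a parser could be layered on the range check. The caller is not shown on this page, so this is an assumption: `parseFactor` is hypothetical, and the `parseDouble` here is a simplified stand-in without the `Property` filtering.

```java
final class FactorParsingSketch {
    /** Simplified stand-in for Setting.parseDouble: inclusive range check, no Property filtering. */
    static double parseDouble(String s, double minValue, double maxValue, String key) {
        final double d = Double.parseDouble(s);
        if (d < minValue) {
            throw new IllegalArgumentException(
                "Failed to parse value [" + s + "] for setting [" + key + "] must be >= " + minValue
            );
        }
        if (d > maxValue) {
            throw new IllegalArgumentException(
                "Failed to parse value [" + s + "] for setting [" + key + "] must be <= " + maxValue
            );
        }
        return d;
    }

    /** Hypothetical parser for a factor in (0, 1]: reuses the range check, then rejects zero. */
    static double parseFactor(String s, String key) {
        final double d = parseDouble(s, 0d, 1d, key);
        if (d == 0d) {
            throw new IllegalArgumentException(
                "Failed to parse value [" + s + "] for setting [" + key + "] must be > 0"
            );
        }
        return d;
    }
}
```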
