[7.17.1] Adjust indices.recovery.max_bytes_per_sec according to external settings (#83413)

* Adjust indices.recovery.max_bytes_per_sec according to external settings

Today the setting indices.recovery.max_bytes_per_sec defaults to different
values depending on the node roles, the JVM version and the total system
memory that can be detected.

The current logic to set the default value can be summarized as:

- 40 MB for non-data nodes
- 40 MB for data nodes that run on a JVM version < 14
- 40 MB for data nodes that have one of the data_hot, data_warm,
  data_content or data roles

Nodes with only data_cold and/or data_frozen roles as data roles have a
default value that depends on the available memory:

- with ≤ 4 GB of available memory, the default is 40 MB
- with more than 4 GB and less than or equal to 8 GB, the default is 60 MB
- with more than 8 GB and less than or equal to 16 GB, the default is 90 MB
- with more than 16 GB and less than or equal to 32 GB, the default is 125 MB
- above 32 GB, the default is 250 MB

While those defaults served us well, we want to evaluate whether we can
define more appropriate defaults if Elasticsearch knew more about the limits
(or properties) of the hardware it is running on, something that
Elasticsearch cannot extract by itself but can derive from settings that are
provided at startup.

This pull request introduces the following new node settings:

- node.bandwidth.recovery.network
- node.bandwidth.recovery.disk.read
- node.bandwidth.recovery.disk.write

Those settings are not dynamic and must be set before the node starts. When
they are set, Elasticsearch detects the minimum available bandwidth among the
network, disk read and disk write bandwidths and computes a maximum
bytes-per-second limit that is a fraction of that minimum. By default 40% of
the minimum bandwidth is used, but that can be dynamically configured by an
operator (using the node.bandwidth.recovery.operator.factor setting) or by
the user directly (using a different setting, node.bandwidth.recovery.factor).
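The default tiers and the bandwidth-derived limit described above can be sketched roughly as follows. This is a minimal illustration based only on the commit message, not the code in RecoverySettings: the class name, method names and parameters are hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```java
import java.util.Set;

// Hypothetical sketch of the rules summarized in the commit message.
// None of these names come from the Elasticsearch code base.
class RecoveryDefaultsSketch {
    static final long MB = 1024L * 1024L; // assumption: binary megabytes
    static final long GB = 1024L * MB;

    /** Default rate limit in bytes per second, following the tiers listed in the commit message. */
    static long defaultMaxBytesPerSec(Set<String> roles, int jvmMajorVersion, long totalMemoryBytes) {
        boolean isDataNode = roles.stream().anyMatch(r -> r.equals("data") || r.startsWith("data_"));
        boolean hasHotWarmContentOrData = roles.contains("data")
            || roles.contains("data_hot")
            || roles.contains("data_warm")
            || roles.contains("data_content");
        if (!isDataNode || jvmMajorVersion < 14 || hasHotWarmContentOrData) {
            return 40 * MB;
        }
        // Only data_cold and/or data_frozen remain as data roles: scale with memory.
        if (totalMemoryBytes <= 4 * GB) return 40 * MB;
        if (totalMemoryBytes <= 8 * GB) return 60 * MB;
        if (totalMemoryBytes <= 16 * GB) return 90 * MB;
        if (totalMemoryBytes <= 32 * GB) return 125 * MB;
        return 250 * MB;
    }

    /** Fraction (0.4 by default) of the smallest of the three configured bandwidths. */
    static long bandwidthDerivedLimit(long networkBytes, long diskReadBytes, long diskWriteBytes, double factor) {
        long minBandwidth = Math.min(networkBytes, Math.min(diskReadBytes, diskWriteBytes));
        return Math.round(minBandwidth * factor);
    }

    public static void main(String[] args) {
        System.out.println(defaultMaxBytesPerSec(Set.of("data_cold"), 17, 8 * GB) / MB);
        System.out.println(bandwidthDerivedLimit(100 * MB, 200 * MB, 150 * MB, 0.4) / MB);
    }
}
```

With these assumptions, a dedicated cold node with 8 GB of memory gets a 60 MB default, and a node whose slowest resource is a 100 MB network link gets a bandwidth-derived limit of 40 MB.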
The limit computed from available bandwidths is then compared to pre-existing
limits such as the one set through the indices.recovery.max_bytes_per_sec
setting or the one that Elasticsearch computes from the node's physical
memory on dedicated cold/frozen nodes. Elasticsearch will try to use the
highest possible limit among those values, while not exceeding an overcommit
ratio that is also defined through a node setting (see
node.bandwidth.recovery.operator.factor.max_overcommit). This overcommit
ratio prevents the rate limit from being set to a value greater than 100
times (by default) the minimum available bandwidth.

Backport of #82819 for 7.17.1

* Add missing max overcommit factor to list of (dynamic) settings (#83350)

The setting node.bandwidth.recovery.operator.factor.max_overcommit wasn't
added to the list of cluster settings and to the list of settings to consume
for updates.

Relates #82819

* Operator factor settings should have the OperatorDynamic setting property (#83359)

Relates #82819

* Add docs for node bandwith settings (#83361)

Relates #82819

* Adjust for 7.17.1

* remove draft

* remove docs/changelog/83350.yaml

Co-authored-by: David Turner <david.turner@elastic.co>
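The selection rule stated in the commit message (take the highest candidate limit, but never exceed the minimum available bandwidth multiplied by the overcommit ratio) can be sketched as below. This follows the wording of the message only; the actual precedence logic in RecoverySettings may differ, and the class, method and parameter names are hypothetical.

```java
// Hypothetical sketch of the overcommit cap described in the commit message.
// Names and the exact precedence between candidates are assumptions.
class RecoveryLimitSketch {
    static final long MB = 1024L * 1024L; // assumption: binary megabytes

    /**
     * Picks the higher of the two candidate limits, then caps it at
     * minBandwidthBytes * maxOvercommit (the ratio is 100 by default).
     */
    static long effectiveLimit(long bandwidthDerivedLimit, long preExistingLimit, long minBandwidthBytes, double maxOvercommit) {
        long highest = Math.max(bandwidthDerivedLimit, preExistingLimit);
        long cap = Math.round(minBandwidthBytes * maxOvercommit); // saturates at Long.MAX_VALUE
        return Math.min(highest, cap);
    }

    public static void main(String[] args) {
        long minBandwidth = 100 * MB;
        long derived = Math.round(minBandwidth * 0.4);
        // A configured limit far above the hardware bandwidth is capped at 100x the minimum.
        System.out.println(effectiveLimit(derived, 20_000 * MB, minBandwidth, 100.0) / MB);
    }
}
```

Under these assumptions, a configured limit of 20000 MB on a node whose minimum bandwidth is 100 MB is reduced to 10000 MB, while a configured limit below that cap is used unchanged.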
elastic · Feb 9, 2022 · 07b9951 · 07b9951
1 parent 034de18
commit 07b9951
Showing 6 changed files with 598 additions and 42 deletions.
diff --git a/docs/changelog/82819.yaml b/docs/changelog/82819.yaml
@@ -0,0 +1,6 @@
+pr: 82819
+summary: "Adjust `indices.recovery.max_bytes_per_sec` according to external\
+  \ settings"
+area: Recovery
+type: enhancement
+issues: []
diff --git a/docs/reference/modules/indices/recovery.asciidoc b/docs/reference/modules/indices/recovery.asciidoc
@@ -109,3 +109,93 @@ executed in parallel in the target node for all recoveries. Defaults to `25`.
 +
 Do not increase this setting without carefully verifying that your cluster has
 the resources available to handle the extra load that will result.
+
+[discrete]
+==== Recovery settings for managed services
+
+NOTE: {cloud-only}
+
+WARNING: This feature is available in {es} 7.17.1 and 8.0.1 onwards but is not
+supported in {es} 8.0.0. As such the recovery settings for managed services should
+be removed before upgrading to 8.0.0. It is possible to configure the settings in
+7.17.1 and then upgrade to 8.0.1 directly.
+
+When running {es} as a managed service, the following settings allow the
+service to specify absolute maximum bandwidths for disk reads, disk writes, and
+network traffic on each node, and permit you to control the maximum recovery
+bandwidth on each node in terms of these absolute maximum values. They have two
+effects:
+
+1. They determine the bandwidth used for recovery if
+`indices.recovery.max_bytes_per_sec` is not set, overriding the default
+behaviour described above.
+
+2. They impose a node-wide limit on recovery bandwidth which is independent of
+the value of `indices.recovery.max_bytes_per_sec`.
+
+If you do not set `indices.recovery.max_bytes_per_sec` then the maximum
+recovery bandwidth is computed as a proportion of the absolute maximum
+bandwidth. The computation is performed separately for read and write traffic.
+The service defines the absolute maximum bandwidths for disk reads, disk
+writes, and network transfers using `node.bandwidth.recovery.disk.read`,
+`node.bandwidth.recovery.disk.write` and `node.bandwidth.recovery.network`
+respectively, and you can set the proportion of the absolute maximum bandwidth
+that may be used for recoveries by adjusting
+`node.bandwidth.recovery.operator.factor.read` and
+`node.bandwidth.recovery.operator.factor.write`.
+
+If you set `indices.recovery.max_bytes_per_sec` then {es} will use its value
+for the maximum recovery bandwidth, as long as this does not exceed the
+node-wide limit. {es} computes the node-wide limit by multiplying the absolute
+maximum bandwidths by the
+`node.bandwidth.recovery.operator.factor.max_overcommit` factor. If you set
+`indices.recovery.max_bytes_per_sec` in excess of the node-wide limit then the
+node-wide limit takes precedence.
+
+The service should determine values for the absolute maximum bandwidths
+settings by experiment, using a recovery-like workload in which there are
+several concurrent workers each processing files sequentially in chunks of
+512kiB.
+
+`node.bandwidth.recovery.disk.read`::
+(<<byte-units,byte value>> per second) The absolute maximum disk read speed for
+a recovery-like workload on the node. If set,
+`node.bandwidth.recovery.disk.write` and `node.bandwidth.recovery.network` must
+also be set.
+
+`node.bandwidth.recovery.disk.write`::
+(<<byte-units,byte value>> per second) The absolute maximum disk write speed
+for a recovery-like workload on the node. If set,
+`node.bandwidth.recovery.disk.read` and `node.bandwidth.recovery.network` must
+also be set.
+
+`node.bandwidth.recovery.network`::
+(<<byte-units,byte value>> per second) The absolute maximum network throughput
+for a recovery-like workload on the node, which applies to both reads and
+writes. If set, `node.bandwidth.recovery.disk.read` and
+`node.bandwidth.recovery.disk.write` must also be set.
+
+`node.bandwidth.recovery.operator.factor.read`::
+(float) The proportion of the maximum read
+bandwidth that may be used for recoveries if `indices.recovery.max_bytes_per_sec`
+is not set. Must be greater than `0` and not greater than `1`. If not set, the
+value of `node.bandwidth.recovery.operator.factor` is used. If no factor
+settings are set then the value `0.4` is used.
+
+`node.bandwidth.recovery.operator.factor.write`::
+(float) The proportion of the maximum
+write bandwidth that may be used for recoveries if `indices.recovery.max_bytes_per_sec`
+is not set. Must be greater than `0` and not greater than `1`. If not set, the
+value of `node.bandwidth.recovery.operator.factor` is used. If no factor
+settings are set then the value `0.4` is used.
+
+`node.bandwidth.recovery.operator.factor`::
+(float) The proportion of the maximum
+bandwidth that may be used for recoveries if neither
+`indices.recovery.max_bytes_per_sec` nor any other factor settings are set.
+Must be greater than `0` and not greater than `1`. Defaults to `0.4`.
+
+`node.bandwidth.recovery.operator.factor.max_overcommit`::
+(float) The proportion of the absolute
+maximum bandwidth that may be used for recoveries regardless of any other
+settings. Must be greater than `0`. Defaults to `100`.
diff --git a/server/src/main/java/org/elasticsearch/common/settings/ClusterSettings.java b/server/src/main/java/org/elasticsearch/common/settings/ClusterSettings.java
@@ -237,6 +237,13 @@ public void apply(Settings value, Settings current, Settings previous) {
                 RecoverySettings.INDICES_RECOVERY_USE_SNAPSHOTS_SETTING,
                 RecoverySettings.INDICES_RECOVERY_MAX_CONCURRENT_SNAPSHOT_FILE_DOWNLOADS,
                 RecoverySettings.INDICES_RECOVERY_MAX_CONCURRENT_SNAPSHOT_FILE_DOWNLOADS_PER_NODE,
+                RecoverySettings.NODE_BANDWIDTH_RECOVERY_OPERATOR_FACTOR_SETTING,
+                RecoverySettings.NODE_BANDWIDTH_RECOVERY_OPERATOR_FACTOR_READ_SETTING,
+                RecoverySettings.NODE_BANDWIDTH_RECOVERY_OPERATOR_FACTOR_WRITE_SETTING,
+                RecoverySettings.NODE_BANDWIDTH_RECOVERY_OPERATOR_FACTOR_MAX_OVERCOMMIT_SETTING,
+                RecoverySettings.NODE_BANDWIDTH_RECOVERY_DISK_WRITE_SETTING,
+                RecoverySettings.NODE_BANDWIDTH_RECOVERY_DISK_READ_SETTING,
+                RecoverySettings.NODE_BANDWIDTH_RECOVERY_NETWORK_SETTING,
                 ThrottlingAllocationDecider.CLUSTER_ROUTING_ALLOCATION_NODE_INITIAL_PRIMARIES_RECOVERIES_SETTING,
                 ThrottlingAllocationDecider.CLUSTER_ROUTING_ALLOCATION_NODE_CONCURRENT_INCOMING_RECOVERIES_SETTING,
                 ThrottlingAllocationDecider.CLUSTER_ROUTING_ALLOCATION_NODE_CONCURRENT_OUTGOING_RECOVERIES_SETTING,

diff --git a/server/src/main/java/org/elasticsearch/common/settings/Setting.java b/server/src/main/java/org/elasticsearch/common/settings/Setting.java
@@ -1941,28 +1941,35 @@ public static Setting<Double> doubleSetting(String key, double defaultValue, dou
     }
 
     public static Setting<Double> doubleSetting(String key, double defaultValue, double minValue, double maxValue, Property... properties) {
-        return new Setting<>(key, (s) -> Double.toString(defaultValue), (s) -> {
-            final double d = Double.parseDouble(s);
-            if (d < minValue) {
-                String err = "Failed to parse value"
-                    + (isFiltered(properties) ? "" : " [" + s + "]")
-                    + " for setting ["
-                    + key
-                    + "] must be >= "
-                    + minValue;
-                throw new IllegalArgumentException(err);
-            }
-            if (d > maxValue) {
-                String err = "Failed to parse value"
-                    + (isFiltered(properties) ? "" : " [" + s + "]")
-                    + " for setting ["
-                    + key
-                    + "] must be <= "
-                    + maxValue;
-                throw new IllegalArgumentException(err);
-            }
-            return d;
-        }, properties);
+        return new Setting<>(
+            key,
+            (s) -> Double.toString(defaultValue),
+            (s) -> parseDouble(s, minValue, maxValue, key, properties),
+            properties
+        );
+    }
+
+    public static Double parseDouble(String s, double minValue, double maxValue, String key, Property... properties) {
+        final double d = Double.parseDouble(s);
+        if (d < minValue) {
+            String err = "Failed to parse value"
+                + (isFiltered(properties) ? "" : " [" + s + "]")
+                + " for setting ["
+                + key
+                + "] must be >= "
+                + minValue;
+            throw new IllegalArgumentException(err);
+        }
+        if (d > maxValue) {
+            String err = "Failed to parse value"
+                + (isFiltered(properties) ? "" : " [" + s + "]")
+                + " for setting ["
+                + key
+                + "] must be <= "
+                + maxValue;
+            throw new IllegalArgumentException(err);
+        }
+        return d;
     }
 
     @Override