
Uses `ClusterSettings` instead of Node Settings in `HealthMetadataService` #96843

Merged
HiDAl merged 5 commits into elastic:main from HiDAl:fix-96219 on Jun 15, 2023

Conversation

@HiDAl (Contributor) commented Jun 14, 2023

While building the first version of the HealthMetadata object, we used the Settings object, which holds the Node's settings, so we always got the default values of the settings. As a result, every time a new master was elected, the initial HealthMetadata was built with the default values instead of the settings configured by the user.

closes #96219
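
In practice the fix amounts to resolving the initial values through ClusterSettings rather than the node-local Settings object. A minimal sketch of that idea, assuming Elasticsearch's internal `Setting`/`Settings`/`ClusterSettings` APIs; the class and method names below are illustrative, not the literal diff from this PR:

```java
import org.elasticsearch.common.settings.ClusterSettings;
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Settings;

class InitialHealthMetadataSketch {

    // Before: the node-local Settings object only contains values from the
    // node's own configuration, so a setting that was updated dynamically
    // resolves to its compiled-in default whenever a new master builds the
    // initial HealthMetadata.
    static <T> T initialValueBefore(Setting<T> setting, Settings nodeSettings) {
        return setting.get(nodeSettings);
    }

    // After: ClusterSettings tracks dynamic updates, so the first
    // HealthMetadata a freshly elected master publishes reflects the values
    // the user actually configured.
    static <T> T initialValueAfter(Setting<T> setting, ClusterSettings clusterSettings) {
        return clusterSettings.get(setting);
    }
}
```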

When building the first version of the `HealthMetadata` object, we were
using the `Settings` object, which holds the Node's settings and to which
dynamic updates do not seem to be propagated, hence we always used the
default values of the settings. This meant that every time a new master
was elected, the initial `HealthMetadata` was built with the default
values instead of the settings configured by the customer.
@HiDAl added labels >bug, Team:Data Management (Meta label for data/management team), auto-backport-and-merge (Automatically create backport pull requests and merge when ready), :Data Management/Health, v8.9.0, v8.8.2 on Jun 14, 2023
@HiDAl HiDAl requested a review from andreidan June 14, 2023 13:55
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine (Collaborator)

Hi @HiDAl, I've created a changelog YAML for you.

@andreidan (Contributor) left a comment

Thanks for fixing this Pablo 🚀

Sorry it took me a while to test this. Namely, I had in mind a potential race-condition concern between InsertHealthMetadata and UpsertHealthMetadataTask (in case a setting is updated while there's also a master failover).

However, I don't think we're subject to it, as we have one cluster state applier thread and the cluster settings are applied (and their listeners notified, see https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/cluster/service/ClusterApplierService.java#L490) before the cluster state listeners are notified (https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/cluster/service/ClusterApplierService.java#L502).

Equally, now that InsertHealthMetadata uses the ClusterSettings, it will see the changes to the cluster settings.

Browsing the code a bit, I noticed we use this pattern for cluster settings:

  1. read the default value using the Settings object
  2. subscribe for change notifications via the ClusterSettings object

However, step 1 can also be performed using the ClusterSettings, as you've done here for the indices.dlm.poll_interval setting (see the sketch below).
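
A minimal sketch of the combined pattern, assuming the internal `ClusterSettings` API (`get` to read the current value, `addSettingsUpdateConsumer` to subscribe); the setting definition and default below are illustrative, and the import paths assume a recent 8.x codebase:

```java
import org.elasticsearch.common.settings.ClusterSettings;
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.core.TimeValue;

class PollIntervalSubscriber {

    // Illustrative stand-in for a dynamic time setting such as
    // indices.dlm.poll_interval; the default used here is made up.
    static final Setting<TimeValue> POLL_INTERVAL = Setting.timeSetting(
        "indices.dlm.poll_interval",
        TimeValue.timeValueMinutes(5),
        Setting.Property.Dynamic,
        Setting.Property.NodeScope
    );

    private volatile TimeValue pollInterval;

    PollIntervalSubscriber(ClusterSettings clusterSettings) {
        // Step 1: read the current value (user-configured or default)
        // from ClusterSettings instead of the node-local Settings object.
        this.pollInterval = clusterSettings.get(POLL_INTERVAL);
        // Step 2: subscribe so later dynamic updates are picked up too.
        clusterSettings.addSettingsUpdateConsumer(POLL_INTERVAL, updated -> this.pollInterval = updated);
    }
}
```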

Comment on lines -96 to +108

```diff
- ByteSizeValue initialMaxHeadroom = percentageMode ? randomBytes : ByteSizeValue.MINUS_ONE;
+ ByteSizeValue initialMaxHeadroom = randomBytes;
```
Contributor:

Is percentage mode not needed here anymore? Below, at line 140, we still check it when we do the assertion.

@HiDAl (Author):

On this line, no, it's not needed anymore. This setting has logic for computing its default value that depends on a different setting. While checking this test I noticed the old behavior was incorrect: it used -1 just to make the test pass. The percentage logic at line 107 will either trigger the default-getter logic or not.
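
A hypothetical illustration of such a dependent default, mirroring the behaviour described for `cluster.routing.allocation.disk.watermark.high.max_headroom`; every key and value below is made up:

```java
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.unit.ByteSizeValue;

final class DependentDefaultSketch {

    static final Setting<String> WATERMARK = Setting.simpleString(
        "example.watermark",
        "90%",
        Setting.Property.Dynamic,
        Setting.Property.NodeScope
    );

    // The default resolves to "-1" (disabled) when the watermark setting
    // exists in the given Settings, and to a fixed byte size otherwise,
    // which is why the test cannot assume a single constant default.
    static final Setting<ByteSizeValue> MAX_HEADROOM = new Setting<>(
        "example.watermark.max_headroom",
        settings -> WATERMARK.exists(settings) ? "-1" : "150gb",
        value -> ByteSizeValue.parseBytesSizeValue(value, "example.watermark.max_headroom"),
        Setting.Property.Dynamic,
        Setting.Property.NodeScope
    );
}
```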

```java
assertThat(diskMetadata.describeFloodStageWatermark(), equalTo(updatedFloodStageWatermark));
assertThat(diskMetadata.floodStageMaxHeadroom(), equalTo(updatedFloodStageMaxHeadroom));
```
Contributor:

Is the removal of this flood headroom check on purpose?

@HiDAl (Author):

Nope, it's a couple of lines below :P (176-179)

Contributor:

Ah thanks 👍 🚀

@andreidan andreidan requested a review from gmarouli June 15, 2023 08:53
```java
// `cluster.routing.allocation.disk.watermark.high`. Check {@link CLUSTER_ROUTING_ALLOCATION_HIGH_DISK_MAX_HEADROOM_SETTING}
assertThat(
    diskMetadata.highMaxHeadroom(),
    equalTo(percentageMode ? maxHeadroomByNode.get(electedMaster) : ByteSizeValue.ofGb(150))
);
```
Contributor:

It took me a while to figure out the relation between percentageMode and the maxHeadroom. Maybe add a comment here saying that the headroom is not set in the settings when percentageMode is false. Or, if possible, set null in the hashmap that passes the value; that way, while reading the test, I can see how they relate.

What do you think?

@HiDAl (Author):

That's why I added the comment to each of these tests: to clearly state that the value depends on the values of other settings... craziness

```java
// The value of the setting `cluster.routing.allocation.disk.watermark.high.max_headroom` depends upon the existence of
// `cluster.routing.allocation.disk.watermark.high`.
// Check {@link DiskThresholdSettings.CLUSTER_ROUTING_ALLOCATION_HIGH_DISK_MAX_HEADROOM_SETTING}
assertThat(diskMetadata.highMaxHeadroom(), equalTo(percentageMode ? initialMaxHeadroom : ByteSizeValue.ofGb(150)));
```
Contributor:

Nit: if possible, I would prefer to use CLUSTER_ROUTING_ALLOCATION_HIGH_DISK_MAX_HEADROOM_SETTING.getDefault(...) instead of the hard-coded value, because then this test won't break if the default changes for some reason. Or at least create a constant, so if the value changes we can easily change it in one place. What do you think?
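
A sketch of what the suggested assertion might look like; `Settings.EMPTY` is an assumption standing in for whatever Settings instance the test actually has available:

```java
// Hedged sketch of the suggestion above, using Setting#getDefault so the
// assertion tracks the setting's default instead of a hard-coded 150gb.
assertThat(
    diskMetadata.highMaxHeadroom(),
    equalTo(
        percentageMode
            ? initialMaxHeadroom
            : DiskThresholdSettings.CLUSTER_ROUTING_ALLOCATION_HIGH_DISK_MAX_HEADROOM_SETTING.getDefault(Settings.EMPTY)
    )
);
```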

@HiDAl (Author):

oh nice idea, will change it

@gmarouli (Contributor) left a comment

Wow, nice catch with this! I can't believe we didn't see it earlier. Thanks for fixing it. I added some minor comments but LGTM.

@HiDAl HiDAl merged commit 994e927 into elastic:main Jun 15, 2023
12 checks passed
@HiDAl HiDAl deleted the fix-96219 branch June 15, 2023 13:24
@elasticsearchmachine (Collaborator)

💚 Backport successful

Status   Branch   Result
✅       8.8      #96870

HiDAl added a commit to HiDAl/elasticsearch that referenced this pull request Jun 15, 2023
…ervice` (elastic#96843)

elasticsearchmachine pushed a commit that referenced this pull request Jun 15, 2023
…ervice` (#96843) (#96870)

salvatore-campagna pushed a commit to salvatore-campagna/elasticsearch that referenced this pull request Jun 19, 2023
…ervice` (elastic#96843)

HiDAl added a commit to HiDAl/elasticsearch that referenced this pull request Jun 23, 2023
HiDAl added a commit to HiDAl/elasticsearch that referenced this pull request Jun 26, 2023
HiDAl added a commit to HiDAl/elasticsearch that referenced this pull request Jun 26, 2023
HiDAl added a commit to HiDAl/elasticsearch that referenced this pull request Jun 27, 2023
Development

Successfully merging this pull request may close these issues.

max_shards_in_cluster is not properly calculated from max_shards_per_node