Uses `ClusterSettings` instead of Node `Settings` in `HealthMetadataService` #96843
When building the first version of the `HealthMetadata` object, we were using the `Settings` object, which holds only the node's own settings. Changes to the cluster settings do not seem to be propagated to it, so we always used the default values of these settings. As a result, every time a new master was elected, the initial `HealthMetadata` was built with the default values instead of the settings configured by the customer.
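The difference between the two sources can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Elasticsearch code: `NodeSettingsSnapshot`, `LiveClusterSettings` and `InitialHealthMetadataBuilder` are hypothetical names, settings are plain strings, and the `90%` default is only an assumed placeholder. The node snapshot is copied once at startup, while the cluster-level view receives dynamic updates.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for node-level Settings: copied once at startup, never updated.
final class NodeSettingsSnapshot {
    private final Map<String, String> values;

    NodeSettingsSnapshot(Map<String, String> values) {
        this.values = Map.copyOf(values);
    }

    String get(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }
}

// Hypothetical stand-in for ClusterSettings: receives dynamic cluster-wide updates.
final class LiveClusterSettings {
    private final Map<String, String> values = new HashMap<>();

    // Called when a dynamic cluster-settings update is applied.
    void apply(String key, String value) {
        values.put(key, value);
    }

    String get(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }
}

final class InitialHealthMetadataBuilder {
    static final String HIGH_WATERMARK = "cluster.routing.allocation.disk.watermark.high";
    static final String DEFAULT_HIGH_WATERMARK = "90%"; // assumed default, illustration only

    // Buggy variant: the startup snapshot never sees dynamic updates, so the default wins.
    static String fromNodeSettings(NodeSettingsSnapshot nodeSettings) {
        return nodeSettings.get(HIGH_WATERMARK, DEFAULT_HIGH_WATERMARK);
    }

    // Fixed variant: reads the live cluster-level value.
    static String fromClusterSettings(LiveClusterSettings clusterSettings) {
        return clusterSettings.get(HIGH_WATERMARK, DEFAULT_HIGH_WATERMARK);
    }
}
```

With the watermark updated dynamically to `80%`, the node-settings variant still returns the default, while the cluster-settings variant returns `80%`.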
Pinging @elastic/es-data-management (Team:Data Management)
Hi @HiDAl, I've created a changelog YAML for you.
Thanks for fixing this Pablo 🚀
Sorry, it took me a while to test this.
Namely, I had in mind a potential race condition between `InsertHealthMetadata` and `UpsertHealthMetadataTask` (in case a setting is updated while there is also a master failover).
However, I don't think we're subject to it, as we have a single cluster state applier thread, and the cluster settings are applied (and their listeners notified here https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/cluster/service/ClusterApplierService.java#L490) before the cluster state listeners are notified https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/cluster/service/ClusterApplierService.java#L502
Equally, now that `InsertHealthMetadata` uses the `ClusterSettings`, it will see the changes to the cluster settings.
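The ordering argument can be sketched with a toy single-threaded applier. This is a simplified, hypothetical model (`ToyClusterApplier` is not `ClusterApplierService`) that only reproduces the order described above: settings are applied and settings listeners notified first, then cluster state listeners run, so a listener reacting to a master failover reads the already-updated value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of the applier ordering; not Elasticsearch's implementation.
final class ToyClusterApplier {
    private final List<Consumer<String>> settingsListeners = new ArrayList<>();
    private final List<Runnable> clusterStateListeners = new ArrayList<>();
    private String currentWatermark = "90%";

    void addSettingsListener(Consumer<String> listener) {
        settingsListeners.add(listener);
    }

    void addClusterStateListener(Runnable listener) {
        clusterStateListeners.add(listener);
    }

    String currentWatermark() {
        return currentWatermark;
    }

    // Applies one new cluster state carrying a (possibly updated) watermark setting.
    void applyClusterState(String newWatermark) {
        // Step 1: apply the settings and notify the settings update consumers.
        currentWatermark = newWatermark;
        for (Consumer<String> listener : settingsListeners) {
            listener.accept(newWatermark);
        }
        // Step 2: only afterwards notify cluster state listeners (e.g. a failover handler).
        for (Runnable listener : clusterStateListeners) {
            listener.run();
        }
    }
}
```

Because both steps run on the same thread in this order, a cluster state listener registered here always observes the value applied in step 1.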
Browsing the code a bit, I noticed we use this pattern for cluster settings:
1. read the default value using the `Settings` object
2. subscribe for change notifications via the `ClusterSettings` object

However, step 1 can also be performed using the `ClusterSettings`, as you've done here for the `indices.dlm.poll_interval` setting.
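A minimal sketch of doing both steps against the same cluster-level object. The real `ClusterSettings` works with typed `Setting` objects via `get(Setting)` and `addSettingsUpdateConsumer(Setting, Consumer)`; here `ToyClusterSettings` and `PollIntervalHolder` are hypothetical, keys are plain strings, and `5m` is an assumed placeholder default.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical, string-keyed stand-in for a ClusterSettings-like registry.
final class ToyClusterSettings {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, List<Consumer<String>>> consumers = new HashMap<>();

    ToyClusterSettings(Map<String, String> initial) {
        values.putAll(initial);
    }

    String get(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }

    void addSettingsUpdateConsumer(String key, Consumer<String> consumer) {
        consumers.computeIfAbsent(key, k -> new ArrayList<>()).add(consumer);
    }

    // Simulates a dynamic update: store the value, then notify subscribers.
    void update(String key, String value) {
        values.put(key, value);
        consumers.getOrDefault(key, List.of()).forEach(c -> c.accept(value));
    }
}

// Hypothetical component that reads and subscribes through the same object.
final class PollIntervalHolder {
    static final String KEY = "indices.dlm.poll_interval";
    private volatile String pollInterval;

    PollIntervalHolder(ToyClusterSettings clusterSettings) {
        // Step 1: read the current value from the cluster settings, not from node Settings.
        this.pollInterval = clusterSettings.get(KEY, "5m"); // "5m" is an assumed placeholder
        // Step 2: subscribe so later dynamic updates are picked up.
        clusterSettings.addSettingsUpdateConsumer(KEY, v -> this.pollInterval = v);
    }

    String pollInterval() {
        return pollInterval;
    }
}
```

Reading the initial value from the same source as the updates means a newly elected master starts from the configured value rather than the node-local default.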
```diff
- ByteSizeValue initialMaxHeadroom = percentageMode ? randomBytes : ByteSizeValue.MINUS_ONE;
+ ByteSizeValue initialMaxHeadroom = randomBytes;
```
Is percentage mode not needed here anymore? Below, at line 140, we check it when we do the assertion.
In this line, no, it's not needed anymore. This setting computes its default value from a different setting. While checking this test, I found the old behavior was incorrect: it used -1 just to make the test pass. The percentage logic at line 107 determines whether the default-getter logic is triggered or not.
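A hypothetical sketch of a setting whose default depends on another setting. The rule encoded here is an assumption modeled on the documented behavior of `cluster.routing.allocation.disk.watermark.high.max_headroom` (150GB when the high watermark is not set explicitly, -1 otherwise); check `DiskThresholdSettings` for the real logic. `MaxHeadroomDefault` is an illustrative name and values are plain byte counts rather than `ByteSizeValue`.

```java
import java.util.Map;

// Hypothetical sketch; the default rule below is an ASSUMPTION, not copied from Elasticsearch.
final class MaxHeadroomDefault {
    static final String HIGH_WATERMARK = "cluster.routing.allocation.disk.watermark.high";
    static final String HIGH_MAX_HEADROOM = "cluster.routing.allocation.disk.watermark.high.max_headroom";
    static final long ONE_GB = 1024L * 1024 * 1024;

    // The default depends on whether the *other* setting was explicitly configured.
    static long defaultBytes(Map<String, String> explicitSettings) {
        return explicitSettings.containsKey(HIGH_WATERMARK) ? -1L : 150 * ONE_GB;
    }

    // The effective value is the explicit one if present, else the computed default.
    static long effectiveBytes(Map<String, String> explicitSettings) {
        String raw = explicitSettings.get(HIGH_MAX_HEADROOM);
        return raw != null ? Long.parseLong(raw) : defaultBytes(explicitSettings);
    }
}
```

Under this assumed rule, the expected headroom in a test changes depending on which other settings it sets explicitly, even when the headroom itself is never configured.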
```java
assertThat(diskMetadata.describeFloodStageWatermark(), equalTo(updatedFloodStageWatermark));
assertThat(diskMetadata.floodStageMaxHeadroom(), equalTo(updatedFloodStageMaxHeadroom));
```
Is this flood headroom check removal on purpose?
Nope, it's a couple of lines below :P (176-179)
Same comment as https://github.com/elastic/elasticsearch/pull/96843/files#r1230675689
Ah thanks 👍 🚀
```java
// `cluster.routing.allocation.disk.watermark.high`. Check {@link CLUSTER_ROUTING_ALLOCATION_HIGH_DISK_MAX_HEADROOM_SETTING}
assertThat(
    diskMetadata.highMaxHeadroom(),
    equalTo(percentageMode ? maxHeadroomByNode.get(electedMaster) : ByteSizeValue.ofGb(150))
);
```
It took me a while to figure out the relation between `percentageMode` and the `maxHeadroom`. Maybe add a comment here that the headroom is not set in the settings if `percentageMode` is false. Or, if possible, set `null` in the hashmap that passes the value. This way, while reading the test, I can see how they relate.
What do you think?
That's why I added the comment to each of these tests, to clearly state that the value depends on the values of other settings... craziness
```java
// The value of the setting `cluster.routing.allocation.disk.watermark.high.max_headroom` depends upon the existence of
// `cluster.routing.allocation.disk.watermark.high`.
// Check {@link DiskThresholdSettings.CLUSTER_ROUTING_ALLOCATION_HIGH_DISK_MAX_HEADROOM_SETTING}
assertThat(diskMetadata.highMaxHeadroom(), equalTo(percentageMode ? initialMaxHeadroom : ByteSizeValue.ofGb(150)));
```
Nit: if possible, I would prefer to use `CLUSTER_ROUTING_ALLOCATION_HIGH_DISK_MAX_HEADROOM_SETTING.getDefault(...)` instead of the hard-coded value, so that if the default changes for some reason, this test does not break. Or at least create a constant, so that if it changes we can update it in one place. What do you think?
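A sketch of the suggestion: derive the expected value from the setting's own default logic instead of hard-coding `150GB`. `ToySetting` and `HeadroomExpectation` are hypothetical stand-ins (the real Elasticsearch `Setting` class is much richer), and the default rule inside is an assumption for illustration only.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a setting with a computed default; only getDefault is modeled.
final class ToySetting<T> {
    final String key;
    private final Function<Map<String, String>, T> defaultFn;

    ToySetting(String key, Function<Map<String, String>, T> defaultFn) {
        this.key = key;
        this.defaultFn = defaultFn;
    }

    // Computes the default from the other settings, like a default-getter would.
    T getDefault(Map<String, String> settings) {
        return defaultFn.apply(settings);
    }
}

final class HeadroomExpectation {
    // ASSUMED default rule, for illustration only (see DiskThresholdSettings for the real one).
    static final ToySetting<Long> HIGH_MAX_HEADROOM = new ToySetting<>(
        "cluster.routing.allocation.disk.watermark.high.max_headroom",
        s -> s.containsKey("cluster.routing.allocation.disk.watermark.high") ? -1L : 150L * 1024 * 1024 * 1024
    );

    // Asks the setting for its default instead of hard-coding 150GB, so the expectation
    // follows the setting if its default ever changes.
    static long expectedHighMaxHeadroom(boolean percentageMode, long initialMaxHeadroom, Map<String, String> settings) {
        return percentageMode ? initialMaxHeadroom : HIGH_MAX_HEADROOM.getDefault(settings);
    }
}
```

The trade-off is that the test no longer catches an unintended change to the default itself; a shared constant keeps that check while still avoiding scattered literals.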
oh nice idea, will change it
Wow, nice catch with this! I can't believe we didn't see it earlier. Thanks for fixing it. I added some minor comments but LGTM.
💚 Backport successful
…ervice` (elastic#96843) When building the first version of the `HealthMetadata` object, we were using the `Settings` object, which holds only the node's own settings. Changes to the cluster settings do not seem to be propagated to it, so we always used the default values of these settings. As a result, every time a new master was elected, the initial `HealthMetadata` was built with the default values instead of the settings configured by the customer.
…ervice` (#96843) (#96870) When building the first version of the `HealthMetadata` object, we were using the `Settings` object, which holds only the node's own settings. Changes to the cluster settings do not seem to be propagated to it, so we always used the default values of these settings. As a result, every time a new master was elected, the initial `HealthMetadata` was built with the default values instead of the settings configured by the customer.
…etadataService` (elastic#96843)" This reverts commit 994e927.
While building the first version of the `HealthMetadata` object, we used the `Settings` object, which holds the node's settings. Because of this, we always used the default values of the settings, so every time a new master was elected, the initial `HealthMetadata` was built with the default values instead of the settings configured by the user.

closes #96219