
Uses `ClusterSettings` instead of Node Settings in `HealthMetadataService` #96843

Merged
HiDAl merged 5 commits into elastic:main from HiDAl:fix-96219 on Jun 15, 2023

Conversation

@HiDAl (Contributor) commented Jun 14, 2023

While building the first version of the HealthMetadata object, we used the Settings object, which holds the Node's settings, so we always got the default values of the settings. As a result, every time a new master was elected, the initial HealthMetadata was built with the default values instead of the settings configured by the user.

closes #96219
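
In practice the fix amounts to resolving the initial values through ClusterSettings rather than the node-local Settings object. A minimal sketch of that idea, assuming Elasticsearch's internal `Setting`/`Settings`/`ClusterSettings` APIs; the class and method names below are illustrative, not the literal diff from this PR:

```java
import org.elasticsearch.common.settings.ClusterSettings;
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Settings;

class InitialHealthMetadataSketch {

    // Before: the node-local Settings object only contains values from the
    // node's own configuration, so a setting that was updated dynamically
    // resolves to its compiled-in default whenever a new master builds the
    // initial HealthMetadata.
    static <T> T initialValueBefore(Setting<T> setting, Settings nodeSettings) {
        return setting.get(nodeSettings);
    }

    // After: ClusterSettings tracks dynamic updates, so the first
    // HealthMetadata a freshly elected master publishes reflects the values
    // the user actually configured.
    static <T> T initialValueAfter(Setting<T> setting, ClusterSettings clusterSettings) {
        return clusterSettings.get(setting);
    }
}
```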

When building the first version of the `HealthMetadata` object, we were
using the `Settings` object, which holds the Node's settings and to which
dynamic updates do not seem to be propagated, hence we always used the
default values of the settings. This meant that every time a new master
was elected, the initial `HealthMetadata` was built with the default
values instead of the settings configured by the customer.
@HiDAl added labels >bug, Team:Data Management (Meta label for data/management team), auto-backport-and-merge (Automatically create backport pull requests and merge when ready), :Data Management/Health, v8.9.0, v8.8.2 on Jun 14, 2023
@HiDAl HiDAl requested a review from andreidan June 14, 2023 13:55
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine (Collaborator)

Hi @HiDAl, I've created a changelog YAML for you.

@andreidan (Contributor) left a comment

Thanks for fixing this Pablo 🚀

Sorry it took me a while to test this. Namely, I had in mind a potential race-condition concern between InsertHealthMetadata and UpsertHealthMetadataTask (in case a setting is updated while there's also a master failover).

However, I don't think we're subject to it, as we have one cluster state applier thread and the cluster settings are applied (and their listeners notified, see https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/cluster/service/ClusterApplierService.java#L490) before the cluster state listeners are notified (https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/cluster/service/ClusterApplierService.java#L502).

Equally, now that InsertHealthMetadata uses the ClusterSettings, it will see the changes to the cluster settings.

Browsing the code a bit, I noticed we use this pattern for cluster settings:

  1. read the default value using the Settings object
  2. subscribe for change notifications via the ClusterSettings object

However, step 1 can also be performed using the ClusterSettings, as you've done here for the indices.dlm.poll_interval setting (see the sketch below).
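
A minimal sketch of the combined pattern, assuming the internal `ClusterSettings` API (`get` to read the current value, `addSettingsUpdateConsumer` to subscribe); the setting definition and default below are illustrative, and the import paths assume a recent 8.x codebase:

```java
import org.elasticsearch.common.settings.ClusterSettings;
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.core.TimeValue;

class PollIntervalSubscriber {

    // Illustrative stand-in for a dynamic time setting such as
    // indices.dlm.poll_interval; the default used here is made up.
    static final Setting<TimeValue> POLL_INTERVAL = Setting.timeSetting(
        "indices.dlm.poll_interval",
        TimeValue.timeValueMinutes(5),
        Setting.Property.Dynamic,
        Setting.Property.NodeScope
    );

    private volatile TimeValue pollInterval;

    PollIntervalSubscriber(ClusterSettings clusterSettings) {
        // Step 1: read the current value (user-configured or default)
        // from ClusterSettings instead of the node-local Settings object.
        this.pollInterval = clusterSettings.get(POLL_INTERVAL);
        // Step 2: subscribe so later dynamic updates are picked up too.
        clusterSettings.addSettingsUpdateConsumer(POLL_INTERVAL, updated -> this.pollInterval = updated);
    }
}
```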

Comment on lines -96 to +108

```diff
- ByteSizeValue initialMaxHeadroom = percentageMode ? randomBytes : ByteSizeValue.MINUS_ONE;
+ ByteSizeValue initialMaxHeadroom = randomBytes;
```
Contributor:

Is percentage mode not needed here anymore? Below, at line 140, we still check it when we do the assertion.

@HiDAl (Author):

On this line, no, it's not needed anymore. This setting has logic for computing its default value that depends on a different setting. While checking this test I noticed the old behavior was incorrect: it used -1 just to make the test pass. The percentage logic at line 107 will either trigger the default-getter logic or not.
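
A hypothetical illustration of such a dependent default, mirroring the behaviour described for `cluster.routing.allocation.disk.watermark.high.max_headroom`; every key and value below is made up:

```java
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.unit.ByteSizeValue;

final class DependentDefaultSketch {

    static final Setting<String> WATERMARK = Setting.simpleString(
        "example.watermark",
        "90%",
        Setting.Property.Dynamic,
        Setting.Property.NodeScope
    );

    // The default resolves to "-1" (disabled) when the watermark setting
    // exists in the given Settings, and to a fixed byte size otherwise,
    // which is why the test cannot assume a single constant default.
    static final Setting<ByteSizeValue> MAX_HEADROOM = new Setting<>(
        "example.watermark.max_headroom",
        settings -> WATERMARK.exists(settings) ? "-1" : "150gb",
        value -> ByteSizeValue.parseBytesSizeValue(value, "example.watermark.max_headroom"),
        Setting.Property.Dynamic,
        Setting.Property.NodeScope
    );
}
```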

```java
assertThat(diskMetadata.describeFloodStageWatermark(), equalTo(updatedFloodStageWatermark));
assertThat(diskMetadata.floodStageMaxHeadroom(), equalTo(updatedFloodStageMaxHeadroom));
```
Contributor:

Is the removal of this flood headroom check on purpose?

@HiDAl (Author):

Nope, it's a couple of lines below :P (176-179)

Contributor:

Ah thanks 👍 🚀

@andreidan andreidan requested a review from gmarouli June 15, 2023 08:53
```java
// `cluster.routing.allocation.disk.watermark.high`. Check {@link CLUSTER_ROUTING_ALLOCATION_HIGH_DISK_MAX_HEADROOM_SETTING}
assertThat(
    diskMetadata.highMaxHeadroom(),
    equalTo(percentageMode ? maxHeadroomByNode.get(electedMaster) : ByteSizeValue.ofGb(150))
);
```
Contributor:

It took me a while to figure out the relation between percentageMode and the maxHeadroom. Maybe add a comment here saying that the headroom is not set in the settings when percentageMode is false. Or, if possible, set null in the hashmap that passes the value; that way, while reading the test, I can see how they relate.

What do you think?

@HiDAl (Author):

That's why I added the comment to each of these tests: to clearly state that the value depends on the values of other settings... craziness

```java
// The value of the setting `cluster.routing.allocation.disk.watermark.high.max_headroom` depends upon the existence of
// `cluster.routing.allocation.disk.watermark.high`.
// Check {@link DiskThresholdSettings.CLUSTER_ROUTING_ALLOCATION_HIGH_DISK_MAX_HEADROOM_SETTING}
assertThat(diskMetadata.highMaxHeadroom(), equalTo(percentageMode ? initialMaxHeadroom : ByteSizeValue.ofGb(150)));
```
Contributor:

Nit: if possible, I would prefer to use CLUSTER_ROUTING_ALLOCATION_HIGH_DISK_MAX_HEADROOM_SETTING.getDefault(...) instead of the hard-coded value, because then this test won't break if the default changes for some reason. Or at least create a constant, so if the value changes we can easily change it in one place. What do you think?
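
A sketch of what the suggested assertion might look like; `Settings.EMPTY` is an assumption standing in for whatever Settings instance the test actually has available:

```java
// Hedged sketch of the suggestion above, using Setting#getDefault so the
// assertion tracks the setting's default instead of a hard-coded 150gb.
assertThat(
    diskMetadata.highMaxHeadroom(),
    equalTo(
        percentageMode
            ? initialMaxHeadroom
            : DiskThresholdSettings.CLUSTER_ROUTING_ALLOCATION_HIGH_DISK_MAX_HEADROOM_SETTING.getDefault(Settings.EMPTY)
    )
);
```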

@HiDAl (Author):

oh nice idea, will change it

@gmarouli (Contributor) left a comment

Wow, nice catch with this! I can't believe we didn't see it earlier. Thanks for fixing it. I added some minor comments but LGTM.

@HiDAl HiDAl merged commit 994e927 into elastic:main Jun 15, 2023
12 checks passed
@HiDAl HiDAl deleted the fix-96219 branch June 15, 2023 13:24
@elasticsearchmachine (Collaborator)

💚 Backport successful

Status   Branch   Result
✅       8.8      #96870

HiDAl added a commit to HiDAl/elasticsearch that referenced this pull request Jun 15, 2023
…ervice` (elastic#96843)

elasticsearchmachine pushed a commit that referenced this pull request Jun 15, 2023
…ervice` (#96843) (#96870)

salvatore-campagna pushed a commit to salvatore-campagna/elasticsearch that referenced this pull request Jun 19, 2023
…ervice` (elastic#96843)

HiDAl added a commit to HiDAl/elasticsearch that referenced this pull request Jun 23, 2023
HiDAl added a commit to HiDAl/elasticsearch that referenced this pull request Jun 26, 2023
HiDAl added a commit to HiDAl/elasticsearch that referenced this pull request Jun 26, 2023
HiDAl added a commit to HiDAl/elasticsearch that referenced this pull request Jun 27, 2023
Development

Successfully merging this pull request may close these issues.

max_shards_in_cluster is not properly calculated from max_shards_per_node