Quieter logging from the DiskThresholdMonitor #48115

DaveCTurner · 2019-10-16T09:21:50Z

Today if an Elasticsearch node reaches a disk watermark then it will repeatedly
emit logging about it, which implies that some action needs to be taken by the
administrator. This is misleading. Elasticsearch strives to keep nodes under
the high watermark, but it is normal to have a few nodes occasionally exceed
this level. Nodes may be over the low watermark for an extended period without
any ill effects.

This commit enhances the logging emitted by the DiskThresholdMonitor to be
less misleading. The expected case of hitting the high watermark and
immediately relocating one or more shards to bring the node back under the
watermark again is reduced in severity to INFO. Additionally, INFO messages
are not emitted repeatedly.

Fixes #48038
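
As a rough illustration of the "not emitted repeatedly" behaviour described above, here is a minimal sketch. It is not the actual `DiskThresholdMonitor` code: the class `QuietWatermarkLogger`, its method names, the message text and the percentage-based threshold are all hypothetical, and log output is collected in a list rather than sent to a real logger so that the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch only: remembers which nodes are already known to be over
 * the high watermark so that the INFO message is emitted once per crossing
 * rather than on every disk usage update.
 */
public class QuietWatermarkLogger {

    // Nodes for which the "over high watermark" message has already been emitted.
    private final Set<String> nodesOverHighWatermark = new HashSet<>();

    // Stand-in for a real logger: every emitted message is appended here.
    private final List<String> emitted = new ArrayList<>();

    /** Called on each disk usage update for a node (assumed percentage-based threshold). */
    public void onDiskUsage(String nodeId, double usedPercent, double highWatermarkPercent) {
        if (usedPercent >= highWatermarkPercent) {
            // Set.add returns true only when the node was not already tracked,
            // so repeated updates over the watermark stay silent.
            if (nodesOverHighWatermark.add(nodeId)) {
                emitted.add("INFO high disk watermark exceeded on [" + nodeId + "], relocating shards");
            }
        } else {
            // Back under the watermark: forget the node so that a later crossing is logged again.
            nodesOverHighWatermark.remove(nodeId);
        }
    }

    /** Returns the messages emitted so far, in order. */
    public List<String> emitted() {
        return emitted;
    }
}
```

With this sketch, two consecutive updates over the threshold for the same node produce a single message. A further message appears only after the node has dropped back under the threshold and then exceeded it again.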


elasticmachine · 2019-10-16T09:21:53Z

Pinging @elastic/es-distributed (:Distributed/Allocation)

henningandersen

LGTM.

I left a couple of minor comments to consider.

server/src/main/java/org/elasticsearch/cluster/routing/allocation/DiskThresholdMonitor.java

...er/src/test/java/org/elasticsearch/cluster/routing/allocation/DiskThresholdMonitorTests.java

...java/org/elasticsearch/cluster/routing/allocation/decider/DiskThresholdDeciderUnitTests.java


DaveCTurner added >enhancement, :Distributed/Allocation, v8.0.0, v7.6.0 labels Oct 16, 2019

DaveCTurner requested review from ywelsch and henningandersen October 16, 2019 09:21

henningandersen approved these changes Oct 16, 2019


DaveCTurner added 5 commits October 18, 2019 13:27

Merge branch 'master' into 2019-10-16-quieter-disk-threshold-monitor-…

9d78efc

…logging

Rename parameters

b51d525

Reorder parameters

895ffa8

Assert messages emitted when passing two watermarks at once

ff041dc

Add some testing of behaviour if time does not pass

a866bcb

DaveCTurner merged commit e16bb9a into elastic:master Oct 18, 2019

DaveCTurner deleted the 2019-10-16-quieter-disk-threshold-monitor-logging branch October 18, 2019 13:44

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
