Detect excessive "Health changed to" messages and warn about them #368

Open
magiconair opened this issue Oct 9, 2017 · 3 comments

@magiconair
Contributor

The issue of having lots of log messages about changed health comes up on a regular basis.

2017/09/06 14:45:07 [INFO] consul: Health changed to #10372843

Users have generally asked how to suppress them by changing the log level. I've refused to implement that because a) fabio does not log excessively if the system is stable and b) seeing lots of these messages indicates that the system is either not stable (lots of flapping health checks) or is getting lots of writes, usually from volatile output like timestamps, pids, ... in the check output. This causes additional writes to the consul master and additional raft log replication and should be avoided. The fix is to remove the volatile data from the health checks, not to change the log level.

However, for other reasons I've now added leveled logging in #366, which means that users will suppress this message by changing the log level instead of acting upon it. Since the message is not self-explanatory and one has to understand the connection between consul, raft and how fabio uses them, it isn't clear what to do.

Therefore, fabio should detect excessive changes in the health checks and log those as WARN. The simplest way would be to determine the rate of change in changes/min or changes/sec and provide a configurable threshold above which fabio starts logging a warning.

See #345 and #205

@manos

manos commented Nov 3, 2017

@magiconair is this possible to add to 1.5.3, if it's not too late? :)

@magiconair
Contributor Author

@manos I don't have code for that yet but for now you can set the -discard-check-output option in Consul.

@manos

manos commented Nov 3, 2017

oh, right. Thanks :)

magiconair added a commit that referenced this issue Jan 5, 2018
This patch changes the log level for Consul raft log changes from
INFO to DEBUG so that most users will not see this anymore. High
change rate is usually an indicator for flapping services or health
checks with volatile output like timestamps but that problem should be
dealt with on the Consul side.

A PR for #368 will add an indicator for the change rate of the Consul
state.

Fixes #408