This repository has been archived by the owner on Aug 25, 2021. It is now read-only.

Set check_update_interval for healthchecks #674

Merged
lkysow merged 1 commit into master from check-output on Nov 6, 2020

@lkysow (Member) commented Nov 6, 2020

The check_update_interval setting controls how often Consul clients
update Consul servers when the output of a check has changed but its
status hasn't. It defaults to 5m because users would often create
checks that ran every minute or so and whose output contained a
timestamp that changed on every run. That caused a lot of writes to
the Consul servers even though the status of the check hadn't changed.
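
To make the write-pressure argument concrete, here is a minimal Go sketch that counts how many output-only updates would reach the servers in an hour. It is a toy model, not Consul's implementation: `ServerWrites` and `minutes` are hypothetical names, and the model assumes the check's output changes on every run while its status stays the same.

```go
package main

import (
	"fmt"
	"time"
)

// minutes is a small helper so call sites read naturally.
func minutes(n int) time.Duration { return time.Duration(n) * time.Minute }

// ServerWrites counts the output-only updates that reach the servers when a
// check runs once per `every` for a duration of `total`, its output changes on
// each run, and output-only changes are deferred until `interval` has passed
// since the last sync. Hypothetical helper, for illustration only.
func ServerWrites(every, interval, total time.Duration) int {
	if every <= 0 {
		return 0 // guard against a non-terminating loop
	}
	writes := 0
	lastSync := time.Duration(0)
	for t := every; t <= total; t += every {
		if t-lastSync >= interval {
			writes++
			lastSync = t
		}
	}
	return writes
}

func main() {
	// A check that runs every minute, observed for one hour.
	fmt.Println("interval=5m:", ServerWrites(minutes(1), minutes(5), minutes(60)))
	fmt.Println("interval=0s:", ServerWrites(minutes(1), 0, minutes(60)))
}
```

Under these assumptions the 5m interval yields 12 writes per hour, against 60 when output-only changes are never deferred.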

In our use case, we only update the check when the pod's readiness
status changes, so we won't be spamming updates. We do run into an
issue, though: we always set the check to failing when the pod is first
created and then, because of an issue with Consul where you can't set
the check output on first creation, we issue an update to the check
with the same status (critical) but with output "Pod is pending".
Without this change, that output takes 5 minutes to show up in the UI.
That might be okay if the pod immediately becomes healthy, but if it
doesn't, users will likely be confused about why the check is failing
with no output.

Additionally, if the pod's readiness probes fail immediately, we'll
update the output to something like "Readiness probes failing". Again,
the UI won't be updated because the status is still critical, so it
would show the check failing with no output.
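
The pod scenario above can be replayed with a small Go sketch. It is a toy model under stated assumptions rather than Consul's actual sync code: `Check`, `Client`, `Register`, `Update`, `Tick`, and `ServerOutputAfter` are hypothetical names, registration is assumed to always reach the servers, and a status change is assumed to sync immediately while an output-only change waits for the interval.

```go
package main

import (
	"fmt"
	"time"
)

// Check is a simplified health check: a status plus free-form output.
type Check struct {
	Status string
	Output string
}

// Client is a toy stand-in for a Consul client agent. It tracks the local
// check state, the state the servers last received, and when that sync happened.
type Client struct {
	interval time.Duration // stands in for check_update_interval
	local    Check
	server   Check
	lastSync time.Duration // offset from registration
}

// Register creates the check; in this model registration always reaches the servers.
func Register(interval time.Duration, c Check) *Client {
	return &Client{interval: interval, local: c, server: c}
}

// Update records a new local state at offset now and syncs it if allowed.
func (c *Client) Update(next Check, now time.Duration) {
	c.local = next
	c.Tick(now)
}

// Tick syncs a pending change at offset now. A status change syncs right away;
// an output-only change syncs once interval has elapsed since the last sync.
func (c *Client) Tick(now time.Duration) {
	if c.local == c.server {
		return
	}
	statusChanged := c.local.Status != c.server.Status
	if statusChanged || now-c.lastSync >= c.interval {
		c.server = c.local
		c.lastSync = now
	}
}

func minutes(n int) time.Duration { return time.Duration(n) * time.Minute }

// ServerOutputAfter registers a critical check with no output, immediately
// updates it to "Pod is pending" with the same status, and returns the output
// the servers (and so the UI) would hold after elapsed.
func ServerOutputAfter(interval, elapsed time.Duration) string {
	c := Register(interval, Check{Status: "critical"})
	c.Update(Check{Status: "critical", Output: "Pod is pending"}, 0)
	c.Tick(elapsed)
	return c.server.Output
}

func main() {
	for _, interval := range []time.Duration{minutes(5), 0} {
		for _, elapsed := range []time.Duration{0, minutes(1), minutes(5)} {
			fmt.Printf("interval=%v elapsed=%v server output=%q\n",
				interval, elapsed, ServerOutputAfter(interval, elapsed))
		}
	}
}
```

In this model, a 5m interval leaves the server-side output empty until the 5 minute mark, while a zero interval makes "Pod is pending" visible right away. The same holds for a later output-only update such as "Readiness probes failing".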

This PR also renames the test file to match the file it tests, per our convention.

@kschoche (Contributor) left a comment

🧨

@ndhanushkodi (Contributor) left a comment

Looks great 🎉

@lkysow merged commit 5f9f7c6 into master on Nov 6, 2020
@lkysow deleted the check-output branch on November 6, 2020 at 20:53