Set check_update_interval for healthchecks #674

lkysow · 2020-11-06T00:20:30Z

The check_update_interval setting controls how often Consul clients update the Consul servers when the output of their checks has changed but the status hasn't. It defaults to 5m because users often create checks that run every minute or so and whose output contains a timestamp that changes on every run. Syncing every one of those changes would cause a lot of writes to the Consul servers even though the status of the check never changed.

In our use case, we only update the check when the pod's readiness status changes, so we won't be spamming updates. We do run into an issue, though. We always register the check as failing when the pod is first created, and because of a Consul issue where the check output can't be set on first creation, we then issue a second update with the same status (critical) but with the output "Pod is pending". Without this setting, that output takes 5 minutes to show up in the UI. That might be okay if the pod becomes healthy right away, but if it doesn't, users will likely be confused about why the check is failing without any output.

Additionally, if the pod's readiness probes fail immediately, we'll update the output to something like "Readiness probes failing", but again the UI won't be updated because the status is still critical, so the UI would again show the check failing with no output.

(This also renames the test file to match the file it's testing, per our convention.)

kschoche

🧨

ndhanushkodi

Looks great 🎉

lkysow requested a review from kschoche November 6, 2020 00:20

lkysow mentioned this pull request Nov 6, 2020 in "Set failing check when pod is not yet running" hashicorp/consul-k8s#380 (Merged)

kschoche approved these changes Nov 6, 2020

lkysow requested a review from ndhanushkodi November 6, 2020 18:44

ndhanushkodi approved these changes Nov 6, 2020

lkysow merged commit 5f9f7c6 into master Nov 6, 2020

lkysow deleted the check-output branch November 6, 2020 20:53
