Improve healthcheck responsiveness #579
Merged
Description
By default, dig tries 3 times to get a response, and each attempt times out after 5 seconds. Before this change, a single Docker healthcheck failure really meant three failed queries and took a total of 15 seconds to fail. By default, Docker healthchecks also retry 3 times (at a 30 second interval) before considering a service unhealthy. Combined with the dig retries, this means it takes a total of 9 failed DNS queries before Docker considers the pihole unhealthy.

Looking at it the other way around: dig considers it a success if even 1 of its 3 queries gets a response, and Docker keeps the container healthy if even 1 of 3 checks succeeds. I'm not great at math, and order does make a difference, but I think that means as long as 1/9th of these queries are being answered, Docker thinks it's healthy.

Anyway, long story short, dig doesn't need its own retry logic since Docker already has a configurable retry. I also disable recursion since the goal is to test this specific instance.
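To make the arithmetic above concrete, here is a rough sketch in Python. It is not part of the change itself. The `HealthcheckConfig` class and both helper functions are hypothetical names made up for illustration. The "after" case assumes dig is limited to a single try with its 5 second timeout left at the default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthcheckConfig:
    """Hypothetical container for the numbers quoted in the description."""
    dig_tries: int       # queries dig sends per invocation (dig default: 3)
    dig_timeout_s: int   # seconds dig waits for each query (dig default: 5)
    docker_retries: int  # consecutive failed checks before "unhealthy" (Docker default: 3)


def worst_case_check_seconds(cfg: HealthcheckConfig) -> int:
    """Longest a single healthcheck can take when no query is answered."""
    return cfg.dig_tries * cfg.dig_timeout_s


def unanswered_queries_to_unhealthy(cfg: HealthcheckConfig) -> int:
    """Consecutive unanswered DNS queries needed before Docker reports unhealthy."""
    return cfg.dig_tries * cfg.docker_retries


# Defaults described above: 3 dig tries, 5 s timeout, 3 Docker retries.
before = HealthcheckConfig(dig_tries=3, dig_timeout_s=5, docker_retries=3)
# Assumption: dig sends one query per check, timeout unchanged.
after = HealthcheckConfig(dig_tries=1, dig_timeout_s=5, docker_retries=3)

for label, cfg in (("before", before), ("after", after)):
    print(
        f"{label}: up to {worst_case_check_seconds(cfg)} s per failed check, "
        f"{unanswered_queries_to_unhealthy(cfg)} unanswered queries to unhealthy"
    )
```

Under these assumptions a failed check drops from 15 seconds to 5, and the number of consecutive unanswered queries needed to flag the container drops from 9 to 3. The 30 second interval between checks is left out since it is the same in both cases.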
Motivation and Context
I have not had issues with this. I'm just trying to get familiar with this project and this stood out to me as a small potential improvement.
How Has This Been Tested?
I tested this by running dig commands directly, as well as building the image locally. I was unable to figure out how to cause a running image to become unhealthy but remain running.
Types of changes
Checklist: