Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve healthcheck responsiveness #579

Merged

Conversation

@lightswitch05
Copy link
Contributor

lightswitch05 commented Feb 25, 2020

  • Improve healthcheck responsiveness by disabling dig retry logic.
  • Disable recursive queries for the healthcheck.
  • Remove duplicate python import

Description

By default, dig will try 3 times to get a response back. Each attempt defaults to 5 seconds. Before this change, a single docker healtcheck failure would really mean three failures and would take a total of 15 seconds before failing. By default, docker healthchecks will retry 3 times before considering a service unhealhy (with a 30 second interval). Combined with dig retries, this means it would take a total of 9 failed DNS responses before it considers the pihole to be unhealthy. Combining the retry between dig and docker, dig considers it a success if even 1/3 responses are received - and docker considers it a success if only 1/3 of those attempts are successful. I'm not great at math - and order does make a difference - but I think that means as long as 1/9th of these queries are being answered - then docker thinks its healthy. Anyways, long story sort, dig doesn't need to have its own retry logic since docker already has a configurable retry. I also disable recurse since the goal is to test this specific instance.

Motivation and Context

I have not had issues with this. I'm just trying to get familiar with this project and this stood out to me as a small potential improvement.

How Has This Been Tested?

I tested this by running dig commands directly, as well as building the image locally. I was unable to figure out how to cause a running image to become unhealthy but remain running.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
@lightswitch05 lightswitch05 force-pushed the lightswitch05:feature/improve-healthcheck-responsiveness branch 4 times, most recently from 270a791 to cce4611 Feb 25, 2020
By default, dig will retry 2 times (for a total of 3 attempts) to get a response back. Each attempt defaults to 5 seconds. Before this change, a single docker healtcheck failure would really mean three failures and would take a total of 15 seconds before failing. By default, docker healthchecks will retry 3 times before considering a service unhealhy (with a 30 second interval). Combined with dig retries, this means it would take a total of 9 failed DNS responses before it considers the pihole to be unhealthy. Combining the retry between dig and docker, dig considers it a success if even 1/3 responses are recieved - and docker considers it a success if only 1/3 of those successes are successful. I'm not great at math - and order does make a difference - but I think that means as long as 1/9th of DNS queries are being answered - then docker thinks its healthy. Anyways, long story sort, dig doesn't need to have its own retry logic since docker already has a configuarable retry. I also disable recurse since the goal is to test this specific instance.

Also removed duplicate import statement.

Signed-off-by: Daniel <daniel@developerdan.com>
@lightswitch05 lightswitch05 force-pushed the lightswitch05:feature/improve-healthcheck-responsiveness branch from cce4611 to cf6d74a Feb 25, 2020
@diginc
diginc approved these changes Feb 25, 2020
@diginc diginc merged commit d102ba9 into pi-hole:dev Feb 25, 2020
1 check passed
1 check passed
DCO DCO
Details
@lightswitch05

This comment has been minimized.

Copy link
Contributor Author

lightswitch05 commented Feb 25, 2020

🍾 🎉 😄

@diginc

This comment has been minimized.

Copy link
Member

diginc commented Feb 25, 2020

Thanks for the PR. I wasn't aware this was a problem. Nice fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
You can’t perform that action at this time.