Services are registered without health check #7736
Comments
It looks like the health check is inheriting its name from the service. Do you have the same problem if you give the check a unique name?
Possibly related? #7709
Thank you for the tip. I tried it, but unfortunately it did not make a difference.
I just noticed this looks like the exact same problem as #3935. Any update on this, @preetapan? It is blocking us from using Nomad in our production environment; we can't afford to lose requests every time we deploy.
Hi folks, I'm going to close this issue as a dupe of #3935 in the interest of helping us surface some of the older papercuts we need to get fixed.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. |
Nomad version
0.10.4
Operating system and Environment details
Ubuntu 18.04.4
Issue
We have been experiencing some slight downtime on redeployments. It seems to be caused by Nomad registering the new services in Consul in two steps, first the service itself, then the health checks. This causes our load balancer (Traefik) to pick up the new set of instances right away, and then remove them again when the health check is added (in critical state). This happens very quickly, so the services are only registered for a split second in Traefik, but it is enough to lose some requests.
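To illustrate why the two-step registration matters, here is a minimal sketch of the routing decision, assuming the load balancer treats an instance as routable whenever every check Consul lists for it is passing. The function names, the check ID "demo-check", and the service ID "demo-webapp-1" are hypothetical; only the "Checks", "CheckID", "ServiceID", and "Status" fields follow the shape of Consul's health API entries. This is not Traefik's actual code.

```python
from typing import Any, Dict

Entry = Dict[str, Any]


def all_checks_passing(entry: Entry) -> bool:
    """Naive rule: routable when every listed check is passing."""
    return all(c.get("Status") == "passing" for c in entry.get("Checks", []))


def has_passing_service_check(entry: Entry) -> bool:
    """Stricter rule: also require at least one service-level check.

    Node-level checks such as serfHealth carry an empty ServiceID.
    """
    checks = entry.get("Checks", [])
    service_checks = [c for c in checks if c.get("ServiceID")]
    return bool(service_checks) and all(c.get("Status") == "passing" for c in checks)


# Hypothetical snapshots of one instance, shaped like Consul health entries.
just_registered = {
    "Checks": [{"CheckID": "serfHealth", "ServiceID": "", "Status": "passing"}]
}
check_attached = {
    "Checks": [
        {"CheckID": "serfHealth", "ServiceID": "", "Status": "passing"},
        {"CheckID": "demo-check", "ServiceID": "demo-webapp-1", "Status": "critical"},
    ]
}

for name, entry in [("just_registered", just_registered), ("check_attached", check_attached)]:
    print(name, all_checks_passing(entry), has_passing_service_check(entry))
```

Under the naive rule the freshly registered instance counts as routable because the only check present is the node-level serfHealth; the stricter rule rejects it until a service-level check exists and passes.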
Reproduction steps
For a split second you will see the "Checks" list only contain the "serfHealth" check.
The next call to the API will show two items in the "Checks" list, the new one being the one defined by the job.
The issue is reproducible with this job file:
Here are two consecutive calls to /v1/health/service/demo-webapp during a redeployment of the job above. From the first call it looks like the service is healthy, but it is only because of the missing health check.
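For anyone trying to catch the window themselves, the polling could be sketched roughly as follows. This assumes a Consul agent listening locally on the default HTTP port 8500; the helper names (fetch_entries, unchecked_instances, watch) and the polling interval are made up for illustration, and the window may be short enough that a given run misses it.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, List

# Assumption: a local Consul agent on its default HTTP port.
URL = "http://127.0.0.1:8500/v1/health/service/demo-webapp"

Entries = List[Dict[str, Any]]


def fetch_entries(url: str = URL) -> Entries:
    """Fetch the current health entries for the service."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.load(resp)


def unchecked_instances(entries: Entries) -> List[str]:
    """Return service IDs whose Checks list has no service-level check."""
    missing = []
    for entry in entries:
        checks = entry.get("Checks", [])
        # Node-level checks such as serfHealth have an empty ServiceID.
        if not any(c.get("ServiceID") for c in checks):
            missing.append(entry.get("Service", {}).get("ID", "<unknown>"))
    return missing


def watch(fetch: Callable[[], Entries] = fetch_entries,
          polls: int = 200, interval: float = 0.05) -> List[str]:
    """Poll repeatedly and collect instances seen without a service check."""
    seen: List[str] = []
    for _ in range(polls):
        seen.extend(unchecked_instances(fetch()))
        time.sleep(interval)
    return seen
```

Running watch() while the job is being redeployed should return the IDs of any instances that were observed with only node-level checks; an empty list means the window was not hit during that run.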