Switch from curl-based default Docker healthcheck to a CLI-based one #5342
I would say that if this were a development-environment change, `develop` would be OK, but since this changes the Dockerfile, it should go to `next`.
Agreed. I'll retarget it.
Could any of the Celery commands be used to get the status/availability of workers?
Sure, but I don't think we want the server container to report as unhealthy when no workers are running? That would cause a chicken-and-egg problem between the server container and the worker container(s).
Add migrations check to healthcheck. (14b56a9 to 4b1b4d9)
Also, the `celery inspect` commands are slow. They send out a ping and wait for a specified timeout for workers to report back.
Closes N/A (but relates to #5340)

What's Changed
- Switched the default Docker healthcheck from a `curl`-based one (which can fail if all request-processing workers/processes/threads are busy with other requests) to a CLI-based one calling `nautobot-server health_check`.
- Because `nautobot-server` commands always call `import_jobs_as_celery_tasks` when invoked (#4292), along with related startup code, `nautobot-server health_check` takes about 6 seconds to execute, so I increased the default `interval` and `timeout` for the healthcheck from 5s to 10s. We should be able to bring this back down once we improve `nautobot-server` startup time.
- Increased `start-period` from 5s to 5m, since we know that initial migrations may take several minutes to complete.
- Made the corresponding changes in `docker-compose.yml` and `docker-compose.final.yml`.
- Added a `health_check.contrib.migrations` implementation. This fails if any unapplied migrations are detected and passes if all migrations are in effect. This is needed because the `/health/` URL endpoint won't start responding until the nautobot-server process is serving responses, and therefore implicitly fails while migrations are in progress, whereas the `nautobot-server health_check` CLI spins up its own process to report back ASAP. We don't want the container to report as healthy before migrations are completed, as dependent containers/processes (celery worker, celery beat) are likely to encounter errors if they try to start up while migrations are in flight.

QUESTION: should this go into `develop` as a bug fix or `next` as a feature/behavior-change?

Screenshots
TODO
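
For reference, the healthcheck settings listed under "What's Changed" could be expressed roughly as follows. This is a sketch assembled from the values stated in the description (10s interval, 10s timeout, 5m start period), not a copy of the actual diff; the retry count is not mentioned there, so it is left at Docker's default.

```dockerfile
# Sketch only: the values come from the PR description; the real Dockerfile may differ.
# --retries is omitted because the description does not mention it (Docker defaults to 3).
# Docker treats exit status 0 from CMD as healthy and 1 as unhealthy.
HEALTHCHECK --interval=10s --timeout=10s --start-period=5m \
    CMD nautobot-server health_check
```

With a probe that takes about 6 seconds, a 10s timeout leaves some headroom. Failures during the start period do not count toward the retry limit, which is what lets long-running initial migrations finish before the container can be marked unhealthy.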