Stop builders gracefully on SIGTERM #6960
# Do not forward the SIGTERM signal to child processes
# (pip exits when it receives SIGTERM, which fails that command and stops the build)
signal.signal(signal.SIGTERM, sigterm_received)
I wonder where this code should really live. This task seems reasonable, but I wonder if there's a Celery-level function that would make more sense.
Yes, I had the same concern. I was thinking of readthedocs.worker, but that's executed on normal runs as well.
It seems there is a "correct" way of doing this via a Celery plugin. I just found https://github.com/MnogoByte/celery-graceful-stop, which replaces the SIGTERM handler for the workers, as in https://github.com/MnogoByte/celery-graceful-stop/blob/master/celery_graceful_stop/bootsteps.py#L35
I will keep reading to see if we can do something similar in a nicer way.
When the process running a build receives SIGTERM, we only log a message instead of passing the signal on to the child processes (`pip` running inside the Docker container, for example). Otherwise, the `pip` process also receives the SIGTERM and kills itself, producing an `exit_status != 0` that is registered as a BuildCommand and makes the whole build fail. After that, celery "stops gracefully" with a failed build.

This is useful combined with supervisor to make our builders stop gracefully in this scenario:

1. supervisorctl stop build
2. celery receives the SIGTERM and logs the warning message
3. supervisor waits for `stopwaitsecs` before sending SIGKILL
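
For reference, a minimal sketch of what a log-only handler could look like. The diff only shows the `signal.signal(...)` call, so the body of `sigterm_received`, the `state` dict, and the `install_sigterm_handler` helper below are assumptions for illustration, not the code in this PR. It also assumes a POSIX system, where sending SIGTERM to our own PID invokes the Python handler instead of ending the process.

```python
import logging
import os
import signal

log = logging.getLogger(__name__)

# Hypothetical bookkeeping so callers can check whether a stop was requested.
state = {"sigterm_received": False}


def sigterm_received(signum, frame):
    """Log the signal and return, so the running build is not interrupted."""
    state["sigterm_received"] = True
    log.warning("SIGTERM received. Waiting for the current build to finish.")


def install_sigterm_handler():
    """Replace the process-wide SIGTERM handler and return the previous one."""
    return signal.signal(signal.SIGTERM, sigterm_received)


if __name__ == "__main__":
    install_sigterm_handler()
    # Simulate what supervisor does on `supervisorctl stop build` (POSIX only).
    os.kill(os.getpid(), signal.SIGTERM)
    print("still running, stop requested:", state["sigterm_received"])
```

With a handler like this installed, the process keeps working after SIGTERM, and supervisor's `stopwaitsecs` becomes the upper bound on how long the build has to finish before SIGKILL arrives.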