Description
❌ This issue is not open for contribution. Visit Contributing guidelines to learn about the contributing process and how to find suitable issues.
Current behavior
During continuous deployment, when the containers running production code are shut down, in particular the Celery workers, tasks may get interrupted and may not be reprocessed after the new code and workers have started. Kubernetes gives the worker container 30 seconds of grace to shut down.
Desired behavior
With Celery 5.5.0, a new feature was added to enable a 'soft shutdown', which is a time-limited warm shutdown. A warm shutdown allows a worker to finish its running tasks before it actually terminates. We can take advantage of this feature by setting worker_soft_shutdown_timeout. This likely won't help long-running publishing tasks, but would likely help all others. To implement this, we should:
- upgrade to Celery 5.5.0 if we haven't already
- set worker_soft_shutdown_timeout=28 for Celery (28 seconds, slightly less than the 30 seconds of grace given by k8s)
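
As a rough sketch of the setting above, the timeout could be derived from the Kubernetes grace period rather than hard-coded. The names K8S_GRACE_SECONDS, SAFETY_MARGIN_SECONDS, soft_shutdown_timeout, CELERY_SHUTDOWN_SETTINGS and configure are hypothetical and not taken from Studio's code. Only worker_soft_shutdown_timeout is a Celery setting, and the sketch assumes Celery 5.5.0 or later is installed wherever configure is called.

```python
# Values from this issue: k8s gives 30 seconds of grace, and 28 is proposed.
K8S_GRACE_SECONDS = 30
SAFETY_MARGIN_SECONDS = 2  # hypothetical name for the gap between 30 and 28


def soft_shutdown_timeout(grace_seconds: float, margin_seconds: float) -> float:
    """Return a soft shutdown timeout that ends before the grace period does."""
    timeout = grace_seconds - margin_seconds
    if timeout <= 0:
        raise ValueError("margin must be smaller than the grace period")
    return timeout


# Hypothetical settings fragment; the key is the Celery setting named above.
CELERY_SHUTDOWN_SETTINGS = {
    "worker_soft_shutdown_timeout": soft_shutdown_timeout(
        K8S_GRACE_SECONDS, SAFETY_MARGIN_SECONDS
    ),
}


def configure(app):
    """Apply the settings to an existing Celery app (assumed to be 5.5.0+)."""
    app.conf.update(CELERY_SHUTDOWN_SETTINGS)
    return app
```

Deriving the value keeps the two numbers in sync if infrastructure ever changes the grace period, although a plain worker_soft_shutdown_timeout=28 in the existing Celery config would work just as well.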
We may also need the following to ensure a soft shutdown is triggered for SIGTERM:
- set export REMAP_SIGTERM="SIGQUIT" for the celery workers, and
- coordinate with infrastructure to ensure this is set for production
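
Since the variable has to be set by infrastructure, one option is a small startup check that makes a missing value visible in the worker logs. This is a minimal sketch, not something Celery provides: the function names sigterm_remapped_to_sigquit and warn_if_not_remapped are hypothetical, and the exact-match comparison against "SIGQUIT" is an assumption about how strictly the value should be checked.

```python
import logging
import os

logger = logging.getLogger(__name__)


def sigterm_remapped_to_sigquit(environ=None) -> bool:
    """Return True if REMAP_SIGTERM is set to exactly "SIGQUIT"."""
    env = os.environ if environ is None else environ
    return env.get("REMAP_SIGTERM") == "SIGQUIT"


def warn_if_not_remapped(environ=None) -> bool:
    """Log a warning when the variable is missing or has another value."""
    ok = sigterm_remapped_to_sigquit(environ)
    if not ok:
        logger.warning(
            "REMAP_SIGTERM is not set to SIGQUIT; a SIGTERM from k8s may "
            "not trigger the soft shutdown"
        )
    return ok
```

Calling warn_if_not_remapped() once at worker startup would be enough; it only reads the environment and does not change any signal handling.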
Value add
Better reliability for Studio's celery tasks
Possible tradeoffs
As mentioned, this will not accommodate tasks that normally take longer than 30 seconds, like channel publishing.
References
https://github.com/celery/celery/releases/tag/v5.5.0
https://docs.celeryq.dev/en/latest/userguide/workers.html#soft-shutdown
