Check for missing, unexpected or malfunctioning Celery processes, workers or servers through django-celery.
Celery works well as a distributed task queue, but when backed by Redis rather than a proper message bus, it can be hard to tell how well it is running under heavy load. Many of the default Celery monitoring and reporting tools fail to work under certain error conditions. In particular, concurrency-based setups with multiple processes can prove unreliable.
The script is a very basic way to perform a crude inventory of running processes, workers and servers against a known list. It can easily be extended to do cleverer things.
The script was built at Reincubate where its primary use has been to identify malfunctioning workers.
- The script is an example of simplicity over configurability. It deliberately avoids any Celery-specific functionality, such as the ping request, having found that such features cannot be trusted under load. Hack it if you need something different, or enhance it and submit a pull request.
Deploy it to a server which has django-celery installed and which is able to run ./manage.py celery status.
It can be run like so:
./manage.py celery status -t 10 | ./celery-worker-check.py specialworker-4@serverA otherworker-5@serverB workername-2@serverC
In this example, the script will look for 4 processes of a worker named specialworker running on serverA. It will look for 5 processes of otherworker running on serverB, and it will look for 2 processes of workername running on serverC.
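As a rough illustration of how an argument such as workername-2@serverC can be read, here is a minimal sketch. It is not the script's actual code: the function name expand_spec and the assumption that processes are numbered from 1 to N are hypothetical.

```python
import re

# Hypothetical pattern for arguments shaped like "name-N@server".
SPEC_RE = re.compile(r"^(?P<worker>.+)-(?P<count>\d+)@(?P<server>[^@]+)$")


def expand_spec(spec):
    """Expand 'name-N@server' into the N process names it implies."""
    match = SPEC_RE.match(spec)
    if match is None:
        raise ValueError("expected name-N@server, got %r" % spec)
    worker = match.group("worker")
    server = match.group("server")
    count = int(match.group("count"))
    # Assumption: processes are numbered 1..N; the real numbering may differ.
    return ["%s-%d@%s" % (worker, i, server) for i in range(1, count + 1)]


if __name__ == "__main__":
    for name in expand_spec("workername-2@serverC"):
        print(name)
```

Under that numbering assumption, the three arguments in the command above expand to 11 expected process names in total.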
Note the -t 10 argument, which makes the status command wait 10 seconds so that it can collect a full set of responses from loaded processes.
Output could look like this:
Server serverC was missing, accounts for 2 missing processes...
Worker specialworker@serverA was missing, accounts for 4 missing processes...
Worker newspecialworker@serverA was unexpectedly present, accounts for 4 processes...
Process otherworker-4@serverB was missing...
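The comparison behind such a report can be sketched as a set difference between expected and observed process names. This is a simplified illustration, not the script's implementation: it reports only at the process level, the function names are hypothetical, and it assumes each responding process appears in the status output as a line shaped like name@host: OK.

```python
def parse_status(text):
    """Collect process names from lines assumed to look like 'name@host: OK'."""
    seen = set()
    for line in text.splitlines():
        name, sep, state = line.strip().partition(": ")
        # Skip blank lines and summary lines that do not match the assumed shape.
        if sep and "@" in name and state.strip() == "OK":
            seen.add(name)
    return seen


def report(expected, seen):
    """Return one message per missing or unexpected process."""
    expected = set(expected)
    missing = sorted(expected - seen)
    unexpected = sorted(seen - expected)
    lines = ["Process %s was missing..." % name for name in missing]
    lines += ["Process %s was unexpectedly present..." % name for name in unexpected]
    return lines


if __name__ == "__main__":
    sample = "otherworker-1@serverB: OK\nstray-1@serverB: OK\n"
    wanted = ["otherworker-1@serverB", "otherworker-2@serverB"]
    for message in report(wanted, parse_status(sample)):
        print(message)
```

Grouping the missing processes by worker or by server, as in the output above, would be a further step on top of this comparison.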
Configure it to be run by cron so that it will automatically email reports on missing or unexpected servers or workers.
Before building this script we sought to find a pre-existing solution. Most notably, Celery fires an event when a worker or process dies, but this is not reliable under heavy load.