Automatic (or at least pro-active) monitoring for workers #847
Comments
Update:
Summary:
Details: The following script is used to send the heartbeat to Sentry:
#!/bin/bash -ex
SENTRY_CRON="https://xxxxxxx.ingest.sentry.io/api/xxxxxxx/cron/workers_firebxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/"
CONTAINER_NAME=mapswipe_workers_firebase_to_postgres
# 🟡 Notify Sentry your job is running:
curl "${SENTRY_CRON}?status=in_progress"
docker ps --format '{{.Names}}' | grep -wq "$CONTAINER_NAME"
sleep 2
# 🟢 Notify Sentry your job has completed successfully:
curl "${SENTRY_CRON}?status=ok"
The crontab configuration to run this script every hour is as follows:
0 */1 * * * /root/sentry-cron-monitor.sh
Future Improvements
To enhance our system's reliability and ensure timely execution of scheduled tasks, we should implement cron monitoring for each individual task using the Sentry Python SDK. This will let us detect and address task execution issues promptly.
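Until per-task monitoring is in place, the heartbeat script could also report a failure explicitly rather than relying on a missed check-in. The following is a minimal sketch, not the deployed script: the helper names (`name_in_list`, `status_for`, `send_checkin`, `monitor_container`) are hypothetical, `SENTRY_CRON` is assumed to be exported with the same check-in URL as above, and it assumes the check-in URL accepts `status=error` in the same way as `status=ok`. It also swaps `grep -w` for an exact whole-line match (`grep -Fx`), so a container whose name merely contains the target name does not count.

```shell
#!/bin/bash
# Sketch only. Assumes SENTRY_CRON is set in the environment to the
# check-in URL used by the original script (ending in a slash).

# Hypothetical helper: succeed if the exact name in $1 appears on stdin,
# one name per line (fixed-string, whole-line match).
name_in_list() {
  grep -Fxq -- "$1"
}

# Hypothetical helper: map an exit code to a check-in status.
status_for() {
  if [ "$1" -eq 0 ]; then
    echo ok
  else
    echo error
  fi
}

# Hypothetical helper: send one check-in with the given status.
send_checkin() {
  curl -fsS "${SENTRY_CRON}?status=$1" > /dev/null
}

# Hypothetical entry point: report in_progress, check that the container
# is listed by `docker ps`, then report ok or error accordingly.
monitor_container() {
  local container_name="$1"
  local rc=0
  send_checkin in_progress
  docker ps --format '{{.Names}}' | name_in_list "$container_name" || rc=$?
  send_checkin "$(status_for "$rc")"
  return "$rc"
}
```

To use it, the cron script would end with a call such as `monitor_container mapswipe_workers_firebase_to_postgres`. Sending `error` lets the monitor flag the failure at check time instead of waiting for the in-progress check-in to time out.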
For future improvements, we have created a ticket for in-depth task monitoring.
This is a great step forward! Out of curiosity (I'm researching monitoring options for similar use cases), did you look at other options?
Hey @laurentS, we mostly use one of these tools for monitoring (push/pull)
With Sentry, we can integrate monitoring within the codebase for centralized, easy configuration. For pull checks, a /health-check endpoint also seems like a good integration. In the future, we should add more of these checks in MapSwipe to identify issues proactively.
Based on recent (and multiple past) experiences, we need to explore a solution for automatically monitoring workers. Currently we don't find out until several days later, when people start complaining that their stats aren't updating.