Automatic (or at least pro-active) monitoring for workers #847

Katbaloo · 2024-05-21T13:56:09Z

As per recent (and multiple past) experiences, we need to explore a solution for automatically monitoring the workers. Currently we don't find out that something is wrong until several days later, when people start complaining that their stats aren't updating.

thenav56 · 2024-05-31T08:14:47Z

Update:

Summary:
We have set up a simple heartbeat check that sends an "ok" check-in to Sentry every hour. If the worker container stops running, the "ok" check-in is not sent, which triggers an email notification to the MapSwipe members of our Sentry organization.

Details:
We have set up a cron monitor to track the status of the worker Docker container using Sentry.
We can view the monitor here: Sentry Cron Monitor

The following script is used to send the heartbeat to Sentry:

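#!/bin/bash -ex
# -e: stop at the first failing command; -x: print each command as it runs

SENTRY_CRON="https://xxxxxxx.ingest.sentry.io/api/xxxxxxx/cron/workers_firebxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/"
CONTAINER_NAME=mapswipe_workers_firebase_to_postgres

# 🟡 Notify Sentry your job is running:
curl "${SENTRY_CRON}?status=in_progress"
# grep exits non-zero if no running container has this name; because of -e
# the script stops here and the "ok" check-in below is never sent.
docker ps --format '{{.Names}}' | grep -wq "$CONTAINER_NAME"
sleep 2

# 🟢 Notify Sentry your job has completed successfully:
curl "${SENTRY_CRON}?status=ok"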

The crontab entry that runs this script every hour (at minute 0) is:

0 */1 * * * /root/sentry-cron-monitor.sh
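With `bash -e`, a missing container makes the script stop at the `grep` line, so Sentry presumably flags the failure only when the open `in_progress` check-in times out. A minimal sketch of a variant that sends an explicit `error` check-in instead, assuming Sentry's HTTP check-in endpoint accepts `status=error` alongside `in_progress` and `ok`. The function names (`checkin`, `status_from_names`, `main`) are hypothetical, and the URL is the same redacted placeholder as above.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical variant of /root/sentry-cron-monitor.sh): report an
# explicit "error" check-in instead of aborting when the container is missing.
set -u

# Same redacted placeholder URL as the original; can be overridden via env.
SENTRY_CRON="${SENTRY_CRON:-https://xxxxxxx.ingest.sentry.io/api/xxxxxxx/cron/workers_firebxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/}"
CONTAINER_NAME="${CONTAINER_NAME:-mapswipe_workers_firebase_to_postgres}"

# Send one check-in (in_progress, ok or error). A failed request is reported
# by curl on stderr but does not stop the script.
checkin() {
  curl -fsS --max-time 10 -o /dev/null "${SENTRY_CRON}?status=$1" || true
}

# Read container names on stdin; print "ok" if NAME is listed as a whole word
# (the same `grep -w` rule as the original script), otherwise "error".
status_from_names() {
  if grep -wq -- "$1"; then
    echo ok
  else
    echo error
  fi
}

main() {
  checkin in_progress
  local names status
  # If the Docker daemon is unreachable, the list is empty and we report "error".
  names=$(docker ps --format '{{.Names}}') || names=""
  status=$(status_from_names "$CONTAINER_NAME" <<< "$names")
  checkin "$status"
}

# Run only when executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]:-}" == "$0" ]]; then
  main
fi
```

It could replace the original script under the same crontab entry. The container list is captured into a variable before `grep` reads it, which avoids a pipe into `grep -q`.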

Future Improvements

To improve reliability and ensure that scheduled tasks run on time, we should implement cron monitoring for each individual task using the Sentry Python SDK. The current heartbeat only tells us whether the container is running; per-task monitors would let us detect and address failures of individual tasks promptly.
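Until per-task monitors exist in the Python code, a shell-level stopgap is possible: wrap each scheduled command so that it sends its own `in_progress` check-in and then `ok` or `error` depending on the command's exit status. This is a different technique from the Python SDK approach (HTTP check-ins from the crontab rather than instrumentation inside the code). The function names are hypothetical, and each task would need its own Sentry monitor URL.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): run a command and report its outcome to a
# Sentry cron monitor via HTTP check-ins. One monitor URL per scheduled task.
set -u

# Send one check-in (in_progress, ok or error) to the given monitor URL.
# Failures are ignored so they never change the task's own result.
checkin() {
  curl -fsS --max-time 10 -o /dev/null "${1}?status=$2" || true
}

# Usage: run_monitored MONITOR_URL COMMAND [ARGS...]
# Returns the wrapped command's exit status unchanged.
run_monitored() {
  local url=$1
  shift
  checkin "$url" in_progress
  local rc=0
  "$@" || rc=$?
  if [ "$rc" -eq 0 ]; then
    checkin "$url" ok
  else
    checkin "$url" error
  fi
  return "$rc"
}
```

Because the wrapper passes the task's exit status through, anything already watching the cron job's result keeps working as before.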

thenav56 · 2024-05-31T08:18:42Z

For the future improvements, we have created a ticket for in-depth task monitoring:

Setup cron monitoring for all scheduled tasks python-mapswipe-workers#937

laurentS · 2024-05-31T14:31:43Z

This is a great step forward! Out of curiosity (I'm researching monitoring options for similar use cases), did you look at other options?

thenav56 · 2024-06-04T15:54:45Z

Hey @laurentS

We mostly use one of these tools for monitoring (push or pull):

https://uptimerobot.com/ (Push/Pull)
https://github.com/louislam/uptime-kuma (Push/Pull)
Sentry (Push)

With Sentry, we can integrate monitoring into the codebase itself, which keeps the configuration centralized and easy to maintain.
For example:
We have the Sentry cron configs here: https://github.com/IFRCGo/go-api/blob/59e8a463403f2cbdbcefec47ecc5c31dcadc8237/main/sentry.py#L107-L120
and they are tracked automatically with just a decorator, as in https://github.com/IFRCGo/go-api/blob/59e8a463403f2cbdbcefec47ecc5c31dcadc8237/api/management/commands/index_and_notify.py#L60

For pull checks, a /health-check endpoint also seems like a good integration.
For example:
https://github.com/IFRCGo/alert-hub-backend/blob/75445599b6cfb17fe2749d09edad84cabde403cb/main/settings.py#L105-L111
https://alerthub-api.ifrc.org/health-check/ - Running instance
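At its simplest, a pull check requests the /health-check URL on a schedule and treats anything other than a 2xx response (or no response at all) as down; tools like UptimeRobot or Uptime Kuma add the scheduling and alerting around that. A minimal sketch of the check itself, assuming a plain HTTP endpoint; the function names and the 10-second timeout are illustrative choices, not part of any existing setup.

```shell
#!/usr/bin/env bash
# Sketch of a pull-style check against a /health-check endpoint.
set -u

# Succeed only for 2xx status codes; "000" (no HTTP response) counts as down.
is_healthy_code() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: check_health URL
# Prints "URL -> CODE" and succeeds only if the endpoint answered with 2xx.
check_health() {
  local code
  # curl prints 000 for %{http_code} and exits non-zero when no response
  # arrives (DNS failure, refused connection, timeout); `|| true` keeps the
  # captured code instead of aborting callers that run with `set -e`.
  code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1") || true
  code=${code:-000}
  echo "$1 -> $code"
  is_healthy_code "$code"
}
```

For example, `check_health https://alerthub-api.ifrc.org/health-check/` run from cron, where a non-zero exit status is the signal to alert on. A pull check is most useful when it runs on a different machine than the service, since a check on the same host goes down with it.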

In the future, we should add more of these checks to MapSwipe to identify issues proactively.

Katbaloo assigned Katbaloo and kopitek8 and unassigned Katbaloo May 21, 2024

Katbaloo changed the title ~~Automatic (or at least pro-active) monitoring fro workers~~ Automatic (or at least pro-active) monitoring for workers May 23, 2024

kopitek8 assigned thenav56 and unassigned kopitek8 May 29, 2024

thenav56 mentioned this issue May 31, 2024

Setup cron monitoring for all scheduled tasks mapswipe/python-mapswipe-workers#937

Open

kopitek8 closed this as completed Jul 15, 2024
