Automatic (or at least pro-active) monitoring for workers #847

Closed
Katbaloo opened this issue May 21, 2024 · 4 comments

Katbaloo commented May 21, 2024

Following recent (and multiple past) incidents, we need to explore a solution for automatically monitoring workers. Currently we only find out several days later, when people start complaining that their stats aren't updating.

@Katbaloo Katbaloo assigned Katbaloo and kopitek8 and unassigned Katbaloo May 21, 2024
@Katbaloo Katbaloo changed the title Automatic (or at least pro-active) monitoring fro workers Automatic (or at least pro-active) monitoring for workers May 23, 2024
@kopitek8 kopitek8 assigned thenav56 and unassigned kopitek8 May 29, 2024
@thenav56
Contributor

thenav56 commented May 31, 2024

Update:

Summary:
We have set up a simple heartbeat check that sends an "okay" message to Sentry every 2 hours. If the container stops working, the check does not send the "okay" message, and Sentry sends an email notification to the Mapswipe members on Sentry.

Details:
We have set up a cron monitor to track the status of the worker Docker container using Sentry.
We can view the monitor here: Sentry Cron Monitor

The following script is used to send the heartbeat to Sentry:

#!/bin/bash -ex

SENTRY_CRON="https://xxxxxxx.ingest.sentry.io/api/xxxxxxx/cron/workers_firebxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/"
CONTAINER_NAME=mapswipe_workers_firebase_to_postgres

# 🟡 Notify Sentry your job is running:
curl "${SENTRY_CRON}?status=in_progress"
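# Because of -e, the script exits here if no running container matches the name,
# so the "ok" check-in below is never sent.
docker ps --format '{{.Names}}' | grep -wq "$CONTAINER_NAME"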
sleep 2

# 🟢 Notify Sentry your job has completed successfully:
curl "${SENTRY_CRON}?status=ok"

The crontab configuration to run this script every hour is as follows:

0 */1 * * * /root/sentry-cron-monitor.sh

Future Improvements

To enhance our system's reliability and ensure timely execution of scheduled tasks, we should implement cron monitoring for each individual task using the Sentry Python SDK. This will allow us to detect and address any issues with task execution promptly.

@thenav56
Contributor

For future improvements, we have created a ticket for in-depth task monitoring.

@laurentS
Member

This is a great step forward! Out of curiosity (I'm researching monitoring options for similar use cases), did you look at other options?

@thenav56
Contributor

thenav56 commented Jun 4, 2024

Hey @laurentS

We are mostly using one of these tools for monitoring (push/pull)

With Sentry, we can integrate monitoring into the codebase for centralized, easy configuration.
For example:
We have Sentry cron configs here: https://github.com/IFRCGo/go-api/blob/59e8a463403f2cbdbcefec47ecc5c31dcadc8237/main/sentry.py#L107-L120
These are tracked automatically with just a decorator, like https://github.com/IFRCGo/go-api/blob/59e8a463403f2cbdbcefec47ecc5c31dcadc8237/api/management/commands/index_and_notify.py#L60

For pull checks, a /health-check endpoint also seems to be a good integration.
For example:
https://github.com/IFRCGo/alert-hub-backend/blob/75445599b6cfb17fe2749d09edad84cabde403cb/main/settings.py#L105-L111
https://alerthub-api.ifrc.org/health-check/ - Running instance

In the future, we should add more of these checks in Mapswipe to identify issues proactively.
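
As an illustration of the pull-style check mentioned above, here is a minimal sketch of a probe that an external scheduler could run against a /health-check endpoint. The function names (`health_check`, `probe`) and the 10-second default timeout are assumptions for illustration, not part of any of the linked projects.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the endpoint responds within the timeout
# and curl -f does not see an HTTP status of 400 or above.
# $1 is the URL, $2 an optional timeout in seconds (assumed default: 10).
health_check() {
    curl -fsS -o /dev/null --max-time "${2:-10}" "$1"
}

# Hypothetical wrapper: print a one-line result suitable for a cron log
# and return non-zero when the endpoint is unhealthy.
probe() {
    if health_check "$1"; then
        echo "healthy: $1"
    else
        echo "unhealthy: $1"
        return 1
    fi
}
```

For example, `probe "https://alerthub-api.ifrc.org/health-check/"` would check the running instance linked above; the non-zero return code could then drive an alert.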

Projects
Status: Done and deployed