feat: add monitoring metrics to process_pending container #1760
AGGREGATION_LAST_SUCCESS_TIMESTAMP = Gauge(
    "aggregation_last_success_timestamp_seconds",
    "Timestamp of last successful batch processing",
    multiprocess_mode="livemax",
)
Saving a timestamp is really weird imo; it doesn't really fit with a time-series record. Instead, I would just add a Counter that counts how many successful writes there were (isn't that what AGGREGATION_RECORDS_WRITTEN does?). We can infer the approximate "last successful timestamp" simply by looking at the counter: if it stops going up, then we have a problem.
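To make the counter idea concrete, here is a minimal sketch in plain Python of how a "last success" time could be inferred from scraped counter samples. It is an illustration only: `Sample`, `last_increase_time`, `is_stalled`, and `max_idle_seconds` are hypothetical names, not part of this PR or of any Prometheus API, and the accuracy of the inferred time is limited by the scrape interval.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation of scraped data: (scrape time in seconds, counter value).
Sample = Tuple[float, float]


def last_increase_time(samples: Sequence[Sample]) -> Optional[float]:
    """Return the scrape time of the most recent sample whose counter value is
    higher than the previous sample's, or None if it never increased."""
    last = None
    for (_, prev_value), (ts, value) in zip(samples, samples[1:]):
        # A drop (e.g. a counter reset after a restart) is not counted as progress.
        if value > prev_value:
            last = ts
    return last


def is_stalled(samples: Sequence[Sample], now: float, max_idle_seconds: float) -> bool:
    """Return True when the counter has not increased within max_idle_seconds of now."""
    if not samples:
        return True  # assumption: no data at all is treated as a stall
    last = last_increase_time(samples)
    # If no increase was observed, measure idleness from the first sample.
    reference = last if last is not None else samples[0][0]
    return now - reference > max_idle_seconds


# Example: the counter last moved at t=10 and has been flat since.
samples = [(0.0, 5.0), (10.0, 7.0), (20.0, 7.0), (30.0, 7.0)]
print(last_increase_time(samples))
print(is_stalled(samples, now=30.0, max_idle_seconds=15.0))
```

In practice this check would normally live in an alert rule on the counter's rate rather than in application code; the sketch only shows why a flat counter carries roughly the same information as a stored timestamp.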
monitoring/prometheus.yml
static_configs:
  - targets: ['host.docker.internal:8003']
metrics_path: '/metrics/'
scrape_interval: 1s
High resolution here, I think 10s should be more than enough, no? How often does the cron job for processing pending even run?
Right now it seems to be configured to run every 1s. Anyway, we can certainly increase the scrape interval here.
- dashboard_db
ports:
  - target: 8001
    published: ${PENDING_AGGREGATIONS_METRICS_PORT:-8003}
Is there a need for a default here, given that you already have a default in the environment section?
backend/kernelCI_app/management/commands/process_pending_aggregations.py
MarceloRobert left a comment
LGTM, it's working well
Description
Add Prometheus metrics and a Grafana dashboard for the pending aggregations processor (process_pending_aggregations)
Changes
How to test