
feat: add monitoring metrics to process_pending container #1760

Merged
gustavobtflores merged 1 commit into kernelci:main from gustavobtflores:feat/process-pending-metrics on Feb 25, 2026


@gustavobtflores
Contributor

Description

Add Prometheus metrics and a Grafana dashboard for the pending aggregations processor (process_pending_aggregations).

Changes

  • Instrument process_pending_aggregations with Prometheus metrics (aggregation_records_written_total by table, aggregation_last_success_timestamp_seconds).
  • Expose the processor metrics endpoint via Docker Compose (8003:8001) and add a Prometheus scrape job for it.
  • Add the monitoring/aggregation_process.json Grafana dashboard and update monitoring docs with import steps.
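A minimal, stdlib-only sketch of what the two metrics listed above might expose on the /metrics/ endpoint, written directly against the Prometheus text exposition format. The PR itself uses prometheus_client; the class PendingAggregationsMetrics, its methods, the counter's help text and the table names used with it are hypothetical.

```python
from __future__ import annotations

import time
from collections import defaultdict


class PendingAggregationsMetrics:
    """Hypothetical in-memory stand-in for the processor's two metrics."""

    def __init__(self) -> None:
        # Counter: records written so far, keyed by the `table` label value.
        self.records_written: defaultdict[str, float] = defaultdict(float)
        # Gauge: Unix timestamp (seconds) of the last successful batch; 0 means "never".
        self.last_success: float = 0.0

    def record_batch(self, written_by_table: dict[str, int], now: float | None = None) -> None:
        """Add a successful batch's per-table row counts, then stamp the success time."""
        # Validate first so a bad batch leaves the counters untouched.
        if any(count < 0 for count in written_by_table.values()):
            raise ValueError("counters can only go up")
        for table, count in written_by_table.items():
            self.records_written[table] += count
        self.last_success = time.time() if now is None else now

    def render(self) -> str:
        """Render both metrics in the Prometheus text exposition format."""
        lines = [
            # Hypothetical help text; the PR's actual wording is not shown here.
            "# HELP aggregation_records_written_total Records written, by table",
            "# TYPE aggregation_records_written_total counter",
        ]
        for table in sorted(self.records_written):
            value = self.records_written[table]
            lines.append(f'aggregation_records_written_total{{table="{table}"}} {value}')
        lines += [
            "# HELP aggregation_last_success_timestamp_seconds "
            "Timestamp of last successful batch processing",
            "# TYPE aggregation_last_success_timestamp_seconds gauge",
            f"aggregation_last_success_timestamp_seconds {self.last_success}",
        ]
        return "\n".join(lines) + "\n"
```

Each label value of the counter becomes its own time series, which is what lets a dashboard panel break writes down per table.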

How to test

  1. docker compose -f docker-compose.monitoring.yml up -d
  2. docker compose --profile with_commands up -d pending_aggregations_processor
  3. In Prometheus (http://localhost:9090/targets), confirm that the kernelci-pending-aggregations-processor target is UP, then import monitoring/aggregation_process.json in Grafana and verify that the panels populate.
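Step 3 can also be checked from a script through the Prometheus HTTP API (GET /api/v1/targets) instead of the web UI. A stdlib-only sketch, assuming that kernelci-pending-aggregations-processor is the scrape job_name and that Prometheus listens on localhost:9090 as in the steps above; the helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # from the test steps above
# Assumption: the target shown on /targets is also the scrape job_name.
JOB = "kernelci-pending-aggregations-processor"


def fetch_targets(base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> dict:
    """Query the Prometheus HTTP API for the current scrape targets."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


def job_health(payload: dict, job: str) -> list[str]:
    """Health ("up", "down" or "unknown") of every active target belonging to `job`."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [t.get("health", "unknown") for t in targets if t.get("labels", {}).get("job") == job]


def job_is_up(payload: dict, job: str) -> bool:
    """True only if the job has at least one active target and all of them are up."""
    health = job_health(payload, job)
    return bool(health) and all(h == "up" for h in health)
```

With the monitoring stack and the processor running, job_is_up(fetch_targets(), JOB) should return True; a missing job returns False rather than raising, so the check can be used as a simple pass/fail gate.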

Comment on lines +28 to +32
AGGREGATION_LAST_SUCCESS_TIMESTAMP = Gauge(
    "aggregation_last_success_timestamp_seconds",
    "Timestamp of last successful batch processing",
    multiprocess_mode="livemax",
)
Contributor


Saving a timestamp is really weird imo; it doesn't really fit a time-series record. Instead, I would just add a Counter that counts how many successful writes there were (isn't that what AGGREGATION_RECORDS_WRITTEN does?). We can infer the approximate "last successful timestamp" just by looking at the counter: if it stops going up, then we have a problem.
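The reviewer's alternative, detecting a stall from the counter alone, can be sketched over (timestamp, value) samples. This is a hypothetical, stdlib-only illustration: it treats any drop as a counter reset (for example after a container restart) but, unlike PromQL's increase(), it does not extrapolate to the window edges. In Prometheus itself the same idea would be an alert on something like sum(increase(aggregation_records_written_total[10m])) == 0, where the 10-minute window is an arbitrary example.

```python
from __future__ import annotations


def counter_increase(samples: list[tuple[float, float]]) -> float:
    """Total increase across time-ordered (timestamp, value) samples."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset to 0, so the new value itself
        # is the increase since the reset.
        total += cur - prev if cur >= prev else cur
    return total


def is_stalled(samples: list[tuple[float, float]], window_s: float, now: float) -> bool:
    """True if the counter did not rise during the last `window_s` seconds."""
    recent = [(t, v) for t, v in samples if now - window_s <= t <= now]
    if len(recent) < 2:
        # Too few samples to measure any increase: treat as stalled.
        return True
    return counter_increase(recent) == 0
```

One caveat: the counter also stays flat when there is simply nothing pending, so the window has to be longer than a normal idle period.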

static_configs:
  - targets: ['host.docker.internal:8003']
metrics_path: '/metrics/'
scrape_interval: 1s
Contributor


That's a high resolution here; I think 10s should be more than enough, no? How often does the cron job for processing pending aggregations even run?

Contributor Author


Right now it seems to be configured to run every 1s. Anyway, we can increase the scrape interval here for sure.
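The 1s versus 10s trade-off can be worked through in a few lines. Because a counter is cumulative, scraping it every 10s loses per-second detail but not the total written, while storing a tenth of the samples per series. The workload below is made up for illustration, and scrape() is a hypothetical helper that simply samples a once-per-second record of the counter.

```python
from __future__ import annotations

from itertools import accumulate

SECONDS_PER_DAY = 86_400


def samples_per_day(scrape_interval_s: int) -> int:
    """Samples stored per time series per day at a given scrape interval."""
    return SECONDS_PER_DAY // scrape_interval_s


def scrape(counter_values: list[int], interval_s: int) -> list[int]:
    """Sample a counter (recorded once per second) every `interval_s` seconds."""
    return counter_values[::interval_s]


# Hypothetical workload: rows written by each once-per-second run over one minute.
counter = list(accumulate(i % 4 for i in range(61)))
fine, coarse = scrape(counter, 1), scrape(counter, 10)
# The counter is cumulative, so the 10 s scrape sees exactly the same total increase.
assert fine[-1] - fine[0] == coarse[-1] - coarse[0]
```

At 1s that is 86,400 samples per series per day; at 10s it is 8,640.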

@gustavobtflores force-pushed the feat/process-pending-metrics branch from ca5018d to 3aecbb3 on February 24, 2026 19:39
@gustavobtflores force-pushed the feat/process-pending-metrics branch from 3aecbb3 to e96b275 on February 24, 2026 23:27
  - dashboard_db
ports:
  - target: 8001
    published: ${PENDING_AGGREGATIONS_METRICS_PORT:-8003}
Collaborator


Is there a need for a default here, given that you already have a default in the environment section?
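For context on this question, a stdlib-only sketch of how the ${VAR:-default} form in the published: line is resolved. It follows the documented Compose interpolation rules (":-" falls back when the variable is unset or empty, "-" only when it is unset), which Compose applies using the shell environment and the project's .env file rather than a service's environment: section. The interpolate() function is a hypothetical re-implementation, not Compose's own code, and it ignores $$ escapes and forms such as ${VAR:?err}.

```python
from __future__ import annotations

import re
from collections.abc import Mapping

# Matches ${NAME}, ${NAME:-default} and ${NAME-default}; other forms are left untouched.
_VAR = re.compile(r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?:(?P<op>:?-)(?P<default>[^}]*))?\}")


def interpolate(value: str, env: Mapping[str, str]) -> str:
    """Substitute ${...} references in `value` using `env`."""

    def repl(m: re.Match) -> str:
        name, op, default = m.group("name"), m.group("op"), m.group("default")
        current = env.get(name)
        if op == ":-" and not current:  # unset OR empty -> default
            return default
        if op == "-" and current is None:  # only unset -> default
            return default
        return current or ""  # plain ${NAME} with NAME unset -> empty string

    return _VAR.sub(repl, value)
```

Under these rules, interpolate("${PENDING_AGGREGATIONS_METRICS_PORT:-8003}", {}) yields "8003", and any non-empty value of the variable wins over the default.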

@gustavobtflores force-pushed the feat/process-pending-metrics branch from e96b275 to 4e740ea on February 25, 2026 13:03
@MarceloRobert added the Metrics label (Related to open metrics, measurements or usage data) on Feb 25, 2026
@gustavobtflores force-pushed the feat/process-pending-metrics branch from 4e740ea to 8a275c7 on February 25, 2026 15:41
@gustavobtflores force-pushed the feat/process-pending-metrics branch from 8a275c7 to 24b755b on February 25, 2026 15:49
Collaborator

@MarceloRobert left a comment


LGTM, it's working well

@gustavobtflores added this pull request to the merge queue on Feb 25, 2026
Merged via the queue into kernelci:main with commit 25e2434 Feb 25, 2026
7 checks passed
@gustavobtflores mentioned this pull request on Mar 1, 2026
13 tasks

Labels

Metrics Related to open metrics, measurements or usage data


3 participants