Exporter: add reservation name as a label to slurm_node_info #1838

Merged
ali-sattari merged 1 commit into soperator-release-1.23 from ali/SCHED-215/reservation-metric
Nov 25, 2025

Conversation

ali-sattari (Collaborator) commented Nov 25, 2025

Problem

We need to alert when nodes stay stuck for too long in particular reservations (those created by extensive checks).

Solution

Add the name of the reservation that covers each node as a label (reservation_name) on the existing slurm_node_info metric.
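
For illustration only, the idea can be sketched in Python as below. This is not the exporter's actual code, which this description does not show. The helper names, the `node_name` label, and the "first reservation wins" rule for nodes covered by more than one reservation are all assumptions. Only the metric name `slurm_node_info` and the label `reservation_name` come from this PR.

```python
from __future__ import annotations


def escape_label_value(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def reservation_by_node(reservations: dict[str, list[str]]) -> dict[str, str]:
    """Invert reservation -> nodes into node -> reservation name."""
    covering: dict[str, str] = {}
    for name, nodes in reservations.items():
        for node in nodes:
            # Assumption: if several reservations cover a node, keep the first.
            covering.setdefault(node, name)
    return covering


def render_node_info(nodes: list[str], reservations: dict[str, list[str]]) -> list[str]:
    """Render one slurm_node_info sample per node, with a reservation_name label."""
    covering = reservation_by_node(reservations)
    lines = []
    for node in nodes:
        labels = {
            "node_name": node,  # hypothetical label name, not confirmed by the PR
            "reservation_name": covering.get(node, ""),  # empty when not reserved
        }
        rendered = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in labels.items()
        )
        # Info-style metrics conventionally carry a constant value of 1.
        lines.append(f"slurm_node_info{{{rendered}}} 1")
    return lines
```

Carrying the reservation as a label on an info-style metric means an alert can select nodes by `reservation_name` without a separate metric.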

Testing

  1. Deploy to a test cluster
  2. Create a reservation using scontrol create reservation nodes=worker-0 reservationname="example reservation" ...
  3. Wait ~30s for the next scrape, then check the slurm_node_info metric for the reservation_name label
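
The check in step 3 could be scripted roughly as follows. This is a sketch, and it assumes the exporter output has already been fetched as text, since the endpoint URL is not given here. The `node_name` label and the sample text are hypothetical, and the parser does not unescape label values.

```python
from __future__ import annotations

import re

# Matches key="value" pairs, allowing backslash-escaped characters in the value.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_node_info(metrics_text: str) -> list[dict[str, str]]:
    """Collect the label sets of all slurm_node_info samples in scraped text."""
    samples = []
    for line in metrics_text.splitlines():
        if not line.startswith("slurm_node_info{"):
            continue  # skips comments (# HELP / # TYPE) and other metrics
        label_part = line[line.index("{") + 1 : line.rindex("}")]
        samples.append(dict(LABEL_RE.findall(label_part)))
    return samples


def nodes_in_reservation(metrics_text: str, reservation: str) -> list[str]:
    """Return node names whose reservation_name label equals `reservation`."""
    return [
        sample.get("node_name", "")  # hypothetical label name
        for sample in parse_node_info(metrics_text)
        if sample.get("reservation_name") == reservation
    ]


# Made-up input for illustration; not real exporter output.
SAMPLE = """\
# TYPE slurm_node_info gauge
slurm_node_info{node_name="worker-0",reservation_name="example reservation"} 1
slurm_node_info{node_name="worker-1",reservation_name=""} 1
"""
```

With the made-up sample above, `nodes_in_reservation(SAMPLE, "example reservation")` selects only `worker-0`.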

Release Notes

Exporter: added reservation_name as a label to slurm_node_info metric.

ali-sattari changed the base branch from main to soperator-release-1.23 November 25, 2025 11:21
ali-sattari merged commit 7bc0f94 into soperator-release-1.23 Nov 25, 2025
7 of 8 checks passed
ali-sattari deleted the ali/SCHED-215/reservation-metric branch November 25, 2025 12:53
3 participants