Run hc_program passive checks more often #1759

Merged
rdjjke merged 1 commit into soperator-release-1.22 from SCHED-362/0
Nov 11, 2025

@rdjjke rdjjke commented Nov 11, 2025

Problem

Periodic passive checks that should undrain [user_problem] nodes run too rarely.

Solution

Run passive checks in "hc_program" context more often: every 2 minutes instead of 5 minutes.

Testing

  1. Create a new cluster.
  2. Run scontrol show config | grep HealthCheckInterval and confirm that the reported interval is 120 seconds (2 minutes).
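
The manual check above could also be scripted. The following is a minimal sketch, not part of this PR: the helper names are hypothetical, and it assumes that `scontrol show config` prints the setting on a line shaped like `HealthCheckInterval = 120 sec`, with the value in seconds.

```python
import re
import subprocess
from typing import Optional

# 2 minutes, the interval this PR is expected to produce.
EXPECTED_INTERVAL_SEC = 120


def parse_health_check_interval(config_text: str) -> Optional[int]:
    """Return HealthCheckInterval in seconds, or None if the line is absent.

    Assumes a line of the form "HealthCheckInterval = 120 sec".
    """
    match = re.search(
        r"^\s*HealthCheckInterval\s*=\s*(\d+)", config_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


def read_scontrol_config() -> str:
    """Run `scontrol show config` and return its output (needs a Slurm node)."""
    result = subprocess.run(
        ["scontrol", "show", "config"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    interval = parse_health_check_interval(read_scontrol_config())
    if interval != EXPECTED_INTERVAL_SEC:
        raise SystemExit(
            f"HealthCheckInterval is {interval}, expected {EXPECTED_INTERVAL_SEC}"
        )
    print("HealthCheckInterval OK")
```

Keeping the parsing separate from the `scontrol` call means the check can be exercised against captured output without access to a cluster.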

Release Notes

The Slurm HealthCheckProgram is now launched every 2 minutes instead of every 5 minutes.

@rdjjke rdjjke merged commit 9dfa16b into soperator-release-1.22 Nov 11, 2025
7 of 10 checks passed