A Python tool to manage preemptable Slurm jobs that are blocking pending jobs due to AssocGrpNodeLimit constraints.
This script identifies pending jobs blocked by AssocGrpNodeLimit and finds preemptable jobs running on nodes used by the same accounts. It then kills the lowest-priority preemptable jobs to free up resources.
This script is an interim fix while we work with SchedMD on a more permanent solution to an issue where jobs blocked by AssocGrpNodeLimit do not appear to preempt other jobs even though they should.
Slurm Account Name Format: Accounts are expected in the format <facility>:<repo> or <facility>:<repo>@<cluster> (e.g., fpd:g4sim@milano).
- 🔍 Identifies pending jobs blocked by AssocGrpNodeLimit
- 🎯 Finds preemptable jobs on nodes where the pending job's facility already has running jobs
- 📊 Kills lowest priority first, then prefers 'default' repos
- 🔒 Dry-run mode for safe testing
- 🏢 Facility-based consolidation analysis to identify node sharing patterns
- 📈 Fragmentation scoring to assess resource utilization efficiency
- Python 3.9+
- Slurm workload manager (with squeue, scontrol, and scancel commands)
- Access permissions to query and cancel Slurm jobs
Use the provided Makefile to set up everything automatically:
# Create virtual environment and install dependencies
make
# Or explicitly:
make dev
# Dry-run mode (safe - shows what would be done)
# Default: facility matching
make dry-run
# Dry-run with verbose logging
make dry-run-verbose
# Actually kill jobs (facility matching - default)
make run
# Run with verbose logging
make run-verbose
# Show all available commands
make help
# Activate virtual environment first
source venv/bin/activate
# Show help
python kill_preemptable_jobs.py --help
# Dry-run mode (default: facility matching)
python kill_preemptable_jobs.py --dry-run
# Dry-run with tables (shows resource details)
python kill_preemptable_jobs.py --dry-run -v
# Dry-run with full debug logging
python kill_preemptable_jobs.py --dry-run -vv
# Actually kill jobs (default: facility matching)
python kill_preemptable_jobs.py
# Kill jobs with tables
python kill_preemptable_jobs.py -v
# Kill jobs with full debug logging
python kill_preemptable_jobs.py -vv
- --dry-run: Show what would be done without actually killing jobs
- --verbose / -v: Control output verbosity (can be repeated)
  - No flag (default): Show only summaries and key status messages
  - -v: Show summaries + detailed tables (resource breakdowns, job lists)
  - -vv: Show summaries + tables + debug logs (all Slurm commands, data processing)
# Minimal output - just summaries (47 lines)
python kill_preemptable_jobs.py --dry-run
# Moderate output - summaries + tables (67 lines)
python kill_preemptable_jobs.py --dry-run -v
# Full output - everything including debug info (154+ lines)
python kill_preemptable_jobs.py --dry-run -vv
The tool follows an 11-step pipeline to manage preemptable jobs:
- Query Pending Jobs: Finds all pending non-preemptable jobs with AssocGrpNodeLimit reason
- Filter by Account Capacity: Validates pending jobs against account capacity
- Display Pending Jobs Summary: Shows summary by facility
- Find Occupied Nodes: Locates all nodes needed for pending jobs
- Locate Preemptable Jobs: Finds all running preemptable jobs on those nodes
- Get Node Memory Information: Collects memory constraints for resource calculations
- Process each account independently based on account limits and usage
- Calculate resource requirements per partition
- Determine jobs to terminate based on account limits
- Sort by priority (lowest first), then prefer 'default' repos
- Generate account results with kill decisions
- Group accounts by facility prefix (e.g., fpd:g4sim, fpd:analysis → facility fpd)
- Analyze node distribution across accounts within each facility
- Calculate fragmentation scores (0.0-1.0) indicating node sharing
- Identify consolidation opportunities where multiple accounts share nodes
- Display multi-account node details for visibility
Fragmentation Score Interpretation:
- < 0.1: Low fragmentation (excellent consolidation)
- 0.1 - 0.3: Moderate fragmentation (some sharing)
- > 0.3: High fragmentation (many nodes shared across accounts)
- Display total statistics across all accounts
- Show per-account breakdown of kill decisions
- Show detailed job listings grouped by account
- Mark jobs to be killed vs preserved
- Cancel minimum number of preemptable jobs to free required resources
- Or perform dry-run to show what would be done
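The selection rule in steps 7 and 11 (lowest priority first, then 'default' repos, stopping once enough has been freed) can be sketched as follows. This is a minimal illustration, not the script's actual code: the `Job` class, its fields, and `select_jobs_to_kill` are hypothetical names, and only node counts are considered here, whereas the real script also accounts for CPUs, GPUs, and memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: int
    priority: int  # Slurm priority; lower values are killed first
    repo: str      # repo part of the account name, e.g. "default"
    nodes: int     # nodes the job would free if cancelled


def select_jobs_to_kill(jobs, nodes_needed):
    """Walk jobs in kill order and stop once enough nodes are freed."""
    # Lowest priority first; at equal priority, 'default' repos sort first
    # because False < True.
    ordered = sorted(jobs, key=lambda j: (j.priority, j.repo != "default"))
    selected, freed = [], 0
    for job in ordered:
        if freed >= nodes_needed:
            break
        selected.append(job)
        freed += job.nodes
    return selected


# Made-up candidates for illustration only.
candidates = [
    Job(101, priority=50, repo="g4sim", nodes=2),
    Job(102, priority=10, repo="analysis", nodes=1),
    Job(103, priority=10, repo="default", nodes=1),
]
print([j.job_id for j in select_jobs_to_kill(candidates, nodes_needed=2)])
# prints [103, 102]
```

Job 101 survives because the two lower-priority jobs already free the two nodes required.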
┌─────────────────────────────────────────────────────────────────┐
│ Steps 1-6: Data Collection │
│ • Get pending jobs with AssocGrpNodeLimit │
│ • Filter by account capacity │
│ • Get nodes for pending jobs │
│ • Get and filter preemptable jobs │
│ • Get node memory information │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ Step 7: ACCOUNT-BASED PROCESSING │
│ • Process each account independently │
│ • Calculate resource requirements per partition │
│ • Determine jobs to terminate based on account limits │
│ • Generate account_results list │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ Step 8: FACILITY-BASED PROCESSING │
│ • Group accounts by facility prefix │
│ • Analyze node distribution per facility │
│ • Calculate fragmentation scores │
│ • Identify consolidation opportunities │
│ • Display multi-account node details │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ Step 9: CROSS-ACCOUNT SUMMARY │
│ • Display total statistics across all accounts │
│ • Show per-account breakdown of kill decisions │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ Step 10: Display All Jobs by Account │
│ • Show detailed job listings │
│ • Mark jobs to be killed vs preserved │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ Step 11: Execute Terminations │
│ • Kill jobs (or dry-run) │
└─────────────────────────────────────────────────────────────────┘
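The data-collection stage (steps 1-6) starts from squeue output. The sketch below shows one way such output could be parsed, assuming squeue is asked for pipe-separated fields with the format string `%i|%a|%P|%D|%r` (job ID, account, partition, node count, reason). The format string, the helper name `parse_pending_jobs`, and the sample text are assumptions for illustration; the script may query Slurm differently.

```python
# Assumed invocation (not taken from the script):
#   squeue --states=PENDING --noheader --format="%i|%a|%P|%D|%r"
SQUEUE_FORMAT = "%i|%a|%P|%D|%r"


def parse_pending_jobs(squeue_output, reason="AssocGrpNodeLimit"):
    """Return dicts for the jobs whose pending reason matches `reason`."""
    jobs = []
    for line in squeue_output.splitlines():
        fields = line.strip().split("|")
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        job_id, account, partition, nodes, job_reason = fields
        if job_reason != reason:
            continue
        jobs.append({
            "job_id": job_id,  # kept as a string; array jobs look like "123_4"
            "account": account,
            "partition": partition,
            "nodes": int(nodes),
        })
    return jobs


# Hand-written sample in the assumed format, not real squeue output.
sample = """\
1001|fpd:g4sim@milano|milano|4|AssocGrpNodeLimit
1002|atlas:default@milano|milano|1|Priority
1003|fpd:analysis@milano|milano|2|AssocGrpNodeLimit
"""
print([j["job_id"] for j in parse_pending_jobs(sample)])
# prints ['1001', '1003']
```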
In the account naming convention <facility>:<repo>@<partition>, the facility is the prefix before the colon. For example:
- In fpd:g4sim@milano: fpd is the facility, g4sim is the repo, milano is the partition
- Multiple accounts can share the same facility: fpd:g4sim, fpd:analysis, fpd:simulation
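The naming convention above can be parsed with plain string splitting. This is a minimal sketch under the stated format; `Account`, `parse_account`, and `group_by_facility` are hypothetical names, not functions from the script.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Account(NamedTuple):
    facility: str
    repo: str
    partition: Optional[str]  # None when the "@..." suffix is absent


def parse_account(name):
    """Split '<facility>:<repo>[@<partition>]' into its parts."""
    base, _, suffix = name.partition("@")
    facility, sep, repo = base.partition(":")
    if not sep or not facility or not repo:
        raise ValueError(f"unexpected account name: {name!r}")
    return Account(facility, repo, suffix or None)


def group_by_facility(names):
    """Map each facility prefix to the account names that belong to it."""
    groups = defaultdict(list)
    for name in names:
        groups[parse_account(name).facility].append(name)
    return dict(groups)


print(parse_account("fpd:g4sim@milano"))
# prints Account(facility='fpd', repo='g4sim', partition='milano')
```

Raising on names without a colon is a design choice for the sketch; how the script treats accounts that do not follow the convention is not stated here.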
The facility-based processing analyzes how jobs are distributed across nodes:
- Single-account nodes: Nodes where all jobs belong to one account (optimal consolidation)
- Multi-account nodes: Nodes shared by multiple accounts from the same facility (fragmentation)
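The split into single-account and multi-account nodes leads directly to a fragmentation score. The sketch below assumes the score is the fraction of in-use nodes shared by more than one account, which matches the sample output in this README (15 of 50 nodes shared, score 0.30); the actual formula may differ. The function names are hypothetical, and how the script treats a score that falls exactly on a threshold is not stated, so the comparison operators in `describe_fragmentation` are a guess.

```python
def analyze_nodes(node_accounts):
    """Summarize node sharing.

    node_accounts maps a node name to the set of accounts with jobs on it.
    """
    total = len(node_accounts)
    multi = sorted(n for n, accts in node_accounts.items() if len(accts) > 1)
    score = len(multi) / total if total else 0.0
    return {
        "total_nodes": total,
        "single_account_nodes": total - len(multi),
        "multi_account_nodes": multi,
        "fragmentation": score,
    }


def describe_fragmentation(score):
    """Label a score using the thresholds from the interpretation list."""
    if score < 0.1:
        return "low"
    if score <= 0.3:
        return "moderate"
    return "high"


# Made-up placement data for illustration only.
placement = {
    "node001": {"fpd:g4sim@milano", "fpd:analysis@milano"},
    "node002": {"fpd:g4sim@milano"},
    "node003": {"fpd:analysis@milano"},
    "node004": {"fpd:simulation@milano"},
}
result = analyze_nodes(placement)
print(result["fragmentation"], describe_fragmentation(result["fragmentation"]))
# prints 0.25 moderate
```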
================================================================================
FACILITY-BASED PROCESSING
================================================================================
Analyzing job placement by facility to identify consolidation opportunities.
Found 2 unique facility/facilities: atlas, fpd
----------------------------------------------------------------------------------------------------
FACILITY: fpd
----------------------------------------------------------------------------------------------------
Accounts in facility: 3
→ fpd:g4sim@milano
→ fpd:analysis@milano
→ fpd:simulation@milano
Total pending jobs: 45
Total preemptable jobs: 120
Total jobs to kill (from account-based): 25
Node Distribution Analysis:
→ Total nodes in use: 50
→ Nodes with single account: 35 (70.0%)
→ Nodes with multiple accounts: 15 (30.0%)
→ Fragmentation score: 0.30
⚠️ High fragmentation detected - multiple accounts sharing many nodes
Nodes with multiple accounts (consolidation opportunities):
→ node001: 2 accounts (fpd:g4sim@milano, fpd:analysis@milano), 8 jobs
→ node015: 2 accounts (fpd:analysis@milano, fpd:simulation@milano), 6 jobs
→ node023: 3 accounts (fpd:g4sim@milano, fpd:analysis@milano, fpd:simulation@milano), 12 jobs
... and 12 more nodes
- Visibility: Provides insights into resource fragmentation at the facility level
- Planning: Helps administrators understand consolidation opportunities
- Efficiency: Identifies cases where better job placement could improve resource utilization
- Debugging: Helps diagnose scheduling inefficiencies related to node sharing
- Non-Invasive: Currently read-only analysis that doesn't modify kill decisions
- Facility-Aware: Respects the facility grouping implicit in account names
- Partition-Aware: Maintains partition boundaries (jobs only analyzed within their partition)
While the current implementation focuses on analysis and reporting, future enhancements could include:
- Active consolidation that modifies kill decisions to prioritize better facility-level consolidation
- Node affinity preferences when selecting which preemptable jobs to kill
- Facility-level bin packing to optimize job placement
- Cross-account coordination within facilities
- CPUs & GPUs: Aggregated across nodes (can be summed across all jobs)
- Memory: Per-node constraint (uses largest job, as jobs can run on different nodes)
- The script kills the minimum number of preemptable jobs needed to satisfy resource requirements
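The aggregation rules above (sum CPUs and GPUs, take the largest memory request) can be sketched in a few lines. The dictionary keys and the function name `resource_requirements` are assumptions for illustration, not the script's data model.

```python
def resource_requirements(pending_jobs):
    """Combine per-job requests into one requirement.

    CPUs and GPUs are summed across jobs; memory is a per-node constraint,
    so only the largest single request matters.
    """
    jobs = list(pending_jobs)
    return {
        "cpus": sum(j["cpus"] for j in jobs),
        "gpus": sum(j["gpus"] for j in jobs),
        "mem_mb": max((j["mem_mb"] for j in jobs), default=0),
    }


# Made-up requests for illustration only.
pending = [
    {"cpus": 8, "gpus": 1, "mem_mb": 32000},
    {"cpus": 16, "gpus": 0, "mem_mb": 64000},
]
print(resource_requirements(pending))
# prints {'cpus': 24, 'gpus': 1, 'mem_mb': 64000}
```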
- make or make venv - Create virtual environment and install dependencies
- make install - Install/update dependencies in existing venv
- make dry-run - Run in dry-run mode with facility matching (default)
- make dry-run-verbose - Run in dry-run mode with debug logging (facility)
- make dry-run-full-account - Run in dry-run mode with full account matching
- make dry-run-full-account-verbose - Run in dry-run mode with full account matching & verbose
- make run - Run with facility matching and kill jobs (default)
- make run-verbose - Run with facility matching & verbose logging
- make run-full-account - Run with full account matching and kill jobs (narrower scope)
- make run-full-account-verbose - Run with full account matching & verbose logging
- make clean - Remove virtual environment and cache files
- make help - Show help message
slurm-whacker/
├── kill_preemptable_jobs.py # Main script
├── requirements.txt # Python dependencies
├── Makefile # Build automation
├── README.md # This file
├── CHANGELOG.md # Version history
├── QUICK_REFERENCE.md # Quick usage guide
└── TESTING.md # Testing guide
- click (>=8.0.4,<8.1.0) - Command line interface framework
- loguru (>=0.7.0) - Beautiful logging with colors
- python-hostlist (>=1.21) - Slurm hostlist parsing for performance optimization
# Remove virtual environment and cache files
make clean
- ⚠️ Always use --dry-run first to see what will be affected
- 🏢 Facility matching is the default - this affects all repos in the facility
- Only preemptable jobs (QoS "preemptable") will be targeted
- Jobs are killed in priority order (lowest priority first); at the same priority level, 'default' repos are killed before other repos
- Failed cancellations are logged but don't stop the process
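The dry-run and failure-handling behaviour in these notes can be sketched as a loop around scancel. This is an illustration under assumptions: `cancel_jobs` and its `run` parameter are hypothetical names, and `print` stands in for the script's loguru logging. The `run` parameter exists only so the loop can be exercised without a Slurm installation.

```python
import subprocess


def cancel_jobs(job_ids, dry_run=True, run=subprocess.run):
    """Cancel each job with scancel; return (cancelled, failed) job ID lists."""
    cancelled, failed = [], []
    for job_id in job_ids:
        cmd = ["scancel", str(job_id)]
        if dry_run:
            # Report the command without executing it.
            print(f"[dry-run] would run: {' '.join(cmd)}")
            continue
        try:
            run(cmd, check=True, capture_output=True, text=True)
            cancelled.append(job_id)
        except (subprocess.CalledProcessError, OSError) as exc:
            # Log and move on so one failure does not block the remaining jobs.
            print(f"failed to cancel {job_id}: {exc}")
            failed.append(job_id)
    return cancelled, failed
```

Defaulting `dry_run` to True in the sketch means a caller has to opt in to real cancellations; the script itself takes the opposite default and kills jobs unless --dry-run is passed.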
This tool is provided as-is for managing Slurm workloads.
Feel free to submit issues or pull requests for improvements.