Slurm Whacker

A Python tool to manage preemptable Slurm jobs that are blocking pending jobs due to AssocGrpNodeLimit constraints.

Overview

This script identifies pending jobs blocked by AssocGrpNodeLimit and finds preemptable jobs running on nodes used by the same accounts. It then kills the lowest-priority preemptable jobs to free up resources.

This script is an interim workaround while we work with SchedMD on a permanent fix: jobs blocked by AssocGrpNodeLimit do not appear to preempt other jobs, even though they should.

Slurm Account Name Format: Accounts are expected in the format <facility>:<repo> or <facility>:<repo>@<cluster> (e.g., fpd:g4sim@milano).
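As an illustration of the naming convention above, a minimal sketch of how such an account string could be split into its parts. The `SlurmAccount` type and `parse_account` function are hypothetical names used for this example only and are not taken from kill_preemptable_jobs.py.

```python
from typing import NamedTuple, Optional


class SlurmAccount(NamedTuple):
    """Hypothetical container for the parts of an account name."""
    facility: str
    repo: str
    cluster: Optional[str]  # None when the "@<cluster>" suffix is absent


def parse_account(name: str) -> SlurmAccount:
    """Split '<facility>:<repo>' or '<facility>:<repo>@<cluster>'."""
    base, at_sign, cluster = name.partition("@")
    facility, colon, repo = base.partition(":")
    if not colon or not facility or not repo:
        raise ValueError(f"unexpected account format: {name!r}")
    return SlurmAccount(facility, repo, cluster if at_sign else None)


print(parse_account("fpd:g4sim@milano"))
print(parse_account("fpd:g4sim"))
```

Rejecting names without a colon keeps malformed accounts from silently being grouped under a wrong facility.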

Features

🔍 Identifies pending jobs blocked by AssocGrpNodeLimit
🎯 Finds preemptable jobs on nodes where the pending job's facility already has running jobs
📊 Kills lowest priority first, then prefers 'default' repos
🔒 Dry-run mode for safe testing
🏢 Facility-based consolidation analysis to identify node sharing patterns
📈 Fragmentation scoring to assess resource utilization efficiency

Requirements

Python 3.9+
Slurm workload manager (with squeue, scontrol, and scancel commands)
Access permissions to query and cancel Slurm jobs

Setup

Quick Start

Use the provided Makefile to set up everything automatically:

# Create virtual environment and install dependencies
make

# Or explicitly:
make dev

Usage

Using Make (Recommended)

# Dry-run mode (safe - shows what would be done)
# Default: facility matching
make dry-run

# Dry-run with verbose logging
make dry-run-verbose

# Actually kill jobs (facility matching - default)
make run

# Run with verbose logging
make run-verbose

# Show all available commands
make help

Direct Python Usage

# Activate virtual environment first
source venv/bin/activate

# Show help
python kill_preemptable_jobs.py --help

# Dry-run mode (default: facility matching)
python kill_preemptable_jobs.py --dry-run

# Dry-run with tables (shows resource details)
python kill_preemptable_jobs.py --dry-run -v

# Dry-run with full debug logging
python kill_preemptable_jobs.py --dry-run -vv

# Actually kill jobs (default: facility matching)
python kill_preemptable_jobs.py

# Kill jobs with tables
python kill_preemptable_jobs.py -v

# Kill jobs with full debug logging
python kill_preemptable_jobs.py -vv

Command Line Options

--dry-run: Show what would be done without actually killing jobs
--verbose / -v: Control output verbosity (can be repeated)
- No flag (default): Show only summaries and key status messages
- -v: Show summaries + detailed tables (resource breakdowns, job lists)
- -vv: Show summaries + tables + debug logs (all Slurm commands, data processing)
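The repeatable verbosity flag described above can be sketched as follows. The project itself depends on click; this example uses argparse from the standard library only so that it runs without extra packages, and the `output_tiers` helper is a hypothetical name, not part of the script.

```python
import argparse

# Stand-in parser: the real tool uses click, this is an argparse approximation.
parser = argparse.ArgumentParser(description="verbosity sketch")
parser.add_argument("--dry-run", action="store_true",
                    help="show what would be done without killing jobs")
parser.add_argument("-v", "--verbose", action="count", default=0,
                    help="repeat for more output (-v tables, -vv debug logs)")


def output_tiers(verbosity: int) -> list:
    """Map the number of -v flags to the kinds of output listed above."""
    tiers = ["summaries"]
    if verbosity >= 1:
        tiers.append("tables")
    if verbosity >= 2:
        tiers.append("debug logs")
    return tiers


args = parser.parse_args(["--dry-run", "-vv"])
print(args.dry_run, output_tiers(args.verbose))
```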

Verbosity Examples

# Minimal output - just summaries (47 lines)
python kill_preemptable_jobs.py --dry-run

# Moderate output - summaries + tables (67 lines)
python kill_preemptable_jobs.py --dry-run -v

# Full output - everything including debug info (154+ lines)
python kill_preemptable_jobs.py --dry-run -vv

How It Works

Processing Pipeline

The tool follows an 11-step pipeline to manage preemptable jobs:

Steps 1-6: Data Collection

Query Pending Jobs: Finds all pending non-preemptable jobs whose pending reason is AssocGrpNodeLimit
Filter by Account Capacity: Validates pending jobs against account capacity
Display Pending Jobs Summary: Shows summary by facility
Find Occupied Nodes: Locates the nodes where the facilities of the pending jobs already have running jobs
Locate Preemptable Jobs: Finds all running preemptable jobs on those nodes
Get Node Memory Information: Collects memory constraints for resource calculations
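The first data-collection step can be sketched as a parser over squeue output. This is an assumption about how the query might look, not the script's actual implementation: the format string, the `blocked_jobs` helper, and the sample rows (including the "normal" QoS name) are illustrative. The parser takes text as input so it can be tried without a Slurm cluster.

```python
# Assumed invocation (not executed here): pending jobs, no header,
# pipe-separated job id, account, QoS, and pending reason.
SQUEUE_CMD = ["squeue", "--noheader", "--states=PENDING",
              "--format=%i|%a|%q|%r"]


def blocked_jobs(squeue_text, preemptable_qos="preemptable"):
    """Return non-preemptable pending jobs blocked by AssocGrpNodeLimit."""
    jobs = []
    for line in squeue_text.splitlines():
        fields = line.strip().split("|")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        job_id, account, qos, reason = fields
        if reason == "AssocGrpNodeLimit" and qos != preemptable_qos:
            jobs.append({"job_id": job_id, "account": account, "qos": qos})
    return jobs


# Made-up sample output for illustration only.
SAMPLE = """\
1001|fpd:g4sim@milano|normal|AssocGrpNodeLimit
1002|fpd:analysis@milano|preemptable|AssocGrpNodeLimit
1003|atlas:default@milano|normal|Priority
"""

print(blocked_jobs(SAMPLE))
```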

Step 7: Account-Based Processing

Process each account independently based on account limits and usage
Calculate resource requirements per partition
Determine jobs to terminate based on account limits
Sort by priority (lowest first), then prefer 'default' repos
Generate account results with kill decisions
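The ordering rule in this step (lowest priority first, then 'default' repos ahead of other repos at the same priority) can be expressed as a sort key. A minimal sketch, assuming jobs are dictionaries with `account` and `priority` fields; the field names and sample jobs are hypothetical.

```python
def kill_order_key(job):
    """Sort key: lowest priority first, 'default' repos before others on ties."""
    # Repo is the part between ':' and an optional '@' suffix.
    repo = job["account"].partition(":")[2].partition("@")[0]
    return (job["priority"], 0 if repo == "default" else 1)


# Made-up jobs for illustration only.
jobs = [
    {"job_id": "101", "account": "fpd:g4sim@milano", "priority": 50},
    {"job_id": "102", "account": "fpd:default@milano", "priority": 50},
    {"job_id": "103", "account": "fpd:analysis@milano", "priority": 10},
]

ordered = sorted(jobs, key=kill_order_key)
print([job["job_id"] for job in ordered])
```

Job 103 comes first because it has the lowest priority; 102 precedes 101 because both share priority 50 and 102 belongs to a 'default' repo.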

Step 8: Facility-Based Processing

Group accounts by facility prefix (e.g., fpd:g4sim, fpd:analysis → facility fpd)
Analyze node distribution across accounts within each facility
Calculate fragmentation scores (0.0-1.0) indicating node sharing
Identify consolidation opportunities where multiple accounts share nodes
Display multi-account node details for visibility
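Grouping accounts by facility prefix, as in the first item above, could look like the following sketch. The `group_by_facility` name and the sample account list are assumptions made for this example.

```python
from collections import defaultdict


def group_by_facility(accounts):
    """Group account names by the prefix before the first colon."""
    groups = defaultdict(list)
    for account in accounts:
        facility = account.partition(":")[0]
        groups[facility].append(account)
    return dict(groups)


# Made-up account names for illustration only.
accounts = ["fpd:g4sim", "fpd:analysis", "atlas:default"]
print(group_by_facility(accounts))
```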

Fragmentation Score Interpretation:

< 0.1: Low fragmentation (excellent consolidation)
0.1 - 0.3: Moderate fragmentation (some sharing)
> 0.3: High fragmentation (many nodes shared across accounts)
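One plausible way to compute and interpret the score is sketched below. It assumes the score is the fraction of in-use nodes that host jobs from more than one account, which is consistent with the example output in this README (15 of 50 nodes shared gives 0.30) but is not confirmed against the script. How the exact boundary values 0.1 and 0.3 are classified is also an assumption.

```python
def fragmentation_score(node_accounts):
    """node_accounts maps node name -> set of accounts with jobs on that node.

    Assumed definition: shared nodes divided by all nodes in use.
    """
    if not node_accounts:
        return 0.0
    shared = sum(1 for accounts in node_accounts.values() if len(accounts) > 1)
    return shared / len(node_accounts)


def interpret(score):
    """Apply the thresholds listed above (boundary handling is an assumption)."""
    if score < 0.1:
        return "low"
    if score <= 0.3:
        return "moderate"
    return "high"


# Made-up node usage for illustration only.
nodes = {
    "node001": {"fpd:g4sim", "fpd:analysis"},
    "node002": {"fpd:g4sim"},
    "node003": {"fpd:analysis"},
    "node004": {"fpd:simulation"},
}
score = fragmentation_score(nodes)
print(score, interpret(score))
```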

Step 9: Cross-Account Summary

Display total statistics across all accounts
Show per-account breakdown of kill decisions

Step 10: Display All Jobs by Account

Show detailed job listings grouped by account
Mark jobs to be killed vs preserved

Step 11: Execute Terminations

Cancel minimum number of preemptable jobs to free required resources
Or perform dry-run to show what would be done
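A minimal sketch of the termination step, assuming one `scancel <job_id>` call per job. The `cancel_jobs` function is a hypothetical name. As in the Safety Notes, a failed cancellation is reported and the loop continues. The example call uses dry-run mode, so nothing is cancelled.

```python
import subprocess


def cancel_jobs(job_ids, dry_run=True):
    """Cancel each job with scancel, or only report it when dry_run is set."""
    cancelled, failed = [], []
    for job_id in job_ids:
        if dry_run:
            print(f"[dry-run] would run: scancel {job_id}")
            continue
        try:
            subprocess.run(["scancel", str(job_id)], check=True,
                           capture_output=True, text=True)
            cancelled.append(job_id)
        except (subprocess.CalledProcessError, FileNotFoundError) as exc:
            # Report the failure but keep processing the remaining jobs.
            print(f"failed to cancel {job_id}: {exc}")
            failed.append(job_id)
    return cancelled, failed


cancel_jobs(["101", "102"], dry_run=True)
```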

Visual Workflow

┌─────────────────────────────────────────────────────────────────┐
│ Steps 1-6: Data Collection                                      │
│ • Get pending jobs with AssocGrpNodeLimit                       │
│ • Filter by account capacity                                    │
│ • Get nodes for pending jobs                                    │
│ • Get and filter preemptable jobs                               │
│ • Get node memory information                                   │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│ Step 7: ACCOUNT-BASED PROCESSING                                │
│ • Process each account independently                            │
│ • Calculate resource requirements per partition                 │
│ • Determine jobs to terminate based on account limits           │
│ • Generate account_results list                                 │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│ Step 8: FACILITY-BASED PROCESSING                               │
│ • Group accounts by facility prefix                             │
│ • Analyze node distribution per facility                        │
│ • Calculate fragmentation scores                                │
│ • Identify consolidation opportunities                          │
│ • Display multi-account node details                            │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│ Step 9: CROSS-ACCOUNT SUMMARY                                   │
│ • Display total statistics across all accounts                  │
│ • Show per-account breakdown of kill decisions                  │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│ Step 10: Display All Jobs by Account                            │
│ • Show detailed job listings                                    │
│ • Mark jobs to be killed vs preserved                           │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│ Step 11: Execute Terminations                                   │
│ • Kill jobs (or dry-run)                                        │
└─────────────────────────────────────────────────────────────────┘

Facility-Based Consolidation Analysis

What is a Facility?

In the account naming convention <facility>:<repo>@<partition>, the facility is the prefix before the colon. For example:

In fpd:g4sim@milano: fpd is the facility, g4sim is the repo, milano is the partition
Multiple accounts can share the same facility: fpd:g4sim, fpd:analysis, fpd:simulation

Node Distribution Analysis

The facility-based processing analyzes how jobs are distributed across nodes:

Single-account nodes: Nodes where all jobs belong to one account (optimal consolidation)
Multi-account nodes: Nodes shared by multiple accounts from the same facility (fragmentation)

Example Output

================================================================================
FACILITY-BASED PROCESSING
================================================================================
Analyzing job placement by facility to identify consolidation opportunities.

Found 2 unique facility/facilities: atlas, fpd

----------------------------------------------------------------------------------------------------
FACILITY: fpd
----------------------------------------------------------------------------------------------------
Accounts in facility: 3
  → fpd:g4sim@milano
  → fpd:analysis@milano
  → fpd:simulation@milano

Total pending jobs: 45
Total preemptable jobs: 120
Total jobs to kill (from account-based): 25

Node Distribution Analysis:
  → Total nodes in use: 50
  → Nodes with single account: 35 (70.0%)
  → Nodes with multiple accounts: 15 (30.0%)
  → Fragmentation score: 0.30

⚠️  High fragmentation detected - multiple accounts sharing many nodes

Nodes with multiple accounts (consolidation opportunities):
  → node001: 2 accounts (fpd:g4sim@milano, fpd:analysis@milano), 8 jobs
  → node015: 2 accounts (fpd:analysis@milano, fpd:simulation@milano), 6 jobs
  → node023: 3 accounts (fpd:g4sim@milano, fpd:analysis@milano, fpd:simulation@milano), 12 jobs
  ... and 12 more nodes

Benefits

Visibility: Provides insights into resource fragmentation at the facility level
Planning: Helps administrators understand consolidation opportunities
Efficiency: Identifies cases where better job placement could improve resource utilization
Debugging: Helps diagnose scheduling inefficiencies related to node sharing

Design Considerations

Non-Invasive: Currently read-only analysis that doesn't modify kill decisions
Facility-Aware: Respects the facility grouping implicit in account names
Partition-Aware: Maintains partition boundaries (jobs only analyzed within their partition)

Future Enhancement Opportunities

While the current implementation focuses on analysis and reporting, future enhancements could include:

Active consolidation that modifies kill decisions to prioritize better facility-level consolidation
Node affinity preferences when selecting which preemptable jobs to kill
Facility-level bin packing to optimize job placement
Cross-account coordination within facilities

Resource Calculation

CPUs & GPUs: Aggregated across nodes (can be summed across all jobs)
Memory: Per-node constraint (uses the largest single-job memory request, since jobs can run on different nodes)
The script kills the minimum number of preemptable jobs needed to satisfy resource requirements
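The aggregation rules above can be sketched as follows: CPUs and GPUs are summed over the pending jobs, while memory takes the largest single request. The selection loop walks preemptable jobs in kill order and stops as soon as enough CPUs and GPUs are freed. This is a simplification under stated assumptions: the field names are hypothetical, and the per-node memory check is left out of the selection loop.

```python
def required_resources(pending_jobs):
    """Sum CPUs and GPUs; take the largest memory request (per-node limit)."""
    return {
        "cpus": sum(job["cpus"] for job in pending_jobs),
        "gpus": sum(job["gpus"] for job in pending_jobs),
        "mem_gb": max((job["mem_gb"] for job in pending_jobs), default=0),
    }


def select_jobs_to_kill(ordered_preemptable, required):
    """Pick jobs in the given order until CPU and GPU needs are covered."""
    freed = {"cpus": 0, "gpus": 0}
    selected = []
    for job in ordered_preemptable:
        if all(freed[key] >= required[key] for key in freed):
            break  # requirement met, stop killing
        selected.append(job["job_id"])
        for key in freed:
            freed[key] += job[key]
    return selected


# Made-up jobs for illustration only.
pending = [
    {"cpus": 8, "gpus": 0, "mem_gb": 32},
    {"cpus": 4, "gpus": 1, "mem_gb": 64},
]
preemptable = [  # assumed to be already sorted in kill order
    {"job_id": "a", "cpus": 8, "gpus": 0},
    {"job_id": "b", "cpus": 8, "gpus": 1},
    {"job_id": "c", "cpus": 16, "gpus": 0},
]

need = required_resources(pending)
print(need, select_jobs_to_kill(preemptable, need))
```

Here jobs a and b together free 16 CPUs and 1 GPU, which covers the requirement of 12 CPUs and 1 GPU, so job c is preserved.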

Makefile Targets

Setup

make or make venv - Create virtual environment and install dependencies
make install - Install/update dependencies in existing venv

Dry-run (Safe Testing)

make dry-run - Run in dry-run mode with facility matching (default)
make dry-run-verbose - Run in dry-run mode with debug logging (facility)
make dry-run-full-account - Run in dry-run mode with full account matching
make dry-run-full-account-verbose - Run in dry-run mode with full account matching & verbose

Execute (Will Kill Jobs)

make run - Run with facility matching and kill jobs (default)
make run-verbose - Run with facility matching & verbose logging
make run-full-account - Run with full account matching and kill jobs (narrower scope)
make run-full-account-verbose - Run with full account matching & verbose logging

Maintenance

make clean - Remove virtual environment and cache files
make help - Show help message

Development

Project Structure

slurm-whacker/
├── kill_preemptable_jobs.py  # Main script
├── requirements.txt           # Python dependencies
├── Makefile                   # Build automation
├── README.md                  # This file
├── CHANGELOG.md               # Version history
├── QUICK_REFERENCE.md         # Quick usage guide
└── TESTING.md                 # Testing guide

Dependencies

click (>=8.0.4,<8.1.0) - Command line interface framework
loguru (>=0.7.0) - Beautiful logging with colors
python-hostlist (>=1.21) - Slurm hostlist parsing for performance optimization

Cleaning Up

# Remove virtual environment and cache files
make clean

Safety Notes

⚠️ Always use --dry-run first to see what will be affected
🏢 Facility matching is the default - this affects all repos in the facility
Only preemptable jobs (QoS "preemptable") will be targeted
Jobs are killed in priority order (lowest priority first); at the same priority level, 'default' repos are killed before other repos
Failed cancellations are logged but don't stop the process

License

This tool is provided as-is for managing Slurm workloads.

Contributing

Feel free to submit issues or pull requests for improvements.
