**Describe the bug**
LoRA fine-tuning crashes with `kIOGPUCommandBufferCallbackErrorImpactingInteractivity` when the laptop display is active. The macOS Metal GPU watchdog kills the training process because long-running MLX GPU command buffers block WindowServer display compositing. The crash is 100% reproducible with the display on (4/4 runs across macOS 26.2 and 26.3.1) and 100% avoidable with the display off: closing the lid with `caffeinate -s` to prevent sleep eliminates WindowServer GPU compositing, so the watchdog's "impacting interactivity" check has nothing to protect and never fires.
The workload is minimal: 2.75 GB peak memory on a 16 GB system (17% utilization), batch_size=1, max_seq_length=256, and every parameter at or more conservative than the mlx-lm defaults.
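For reference, the 17% figure above is just the peak-memory ratio:

```shell
# 2.75 GB peak / 16 GB unified memory
python3 -c "print(f'{2.75 / 16:.1%}')"   # prints 17.2%
```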
**To Reproduce**

```shell
pip install mlx-lm

# Training data: 105 JSONL chat examples, ~256 tokens each
# (Qwen format with system/user/assistant messages containing <think> + <tool_call> blocks)
# Short examples (~40 tokens, ~3.0 it/sec) do NOT crash
# Long examples (~256 tokens, ~0.8 it/sec) crash every time

python -m mlx_lm lora \
  --model mlx-community/Qwen3.5-2B-OptiQ-4bit \
  --train --data ./data \
  --config lora_config.yaml \
  --adapter-path ./adapters
```
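A hypothetical sketch of the training-data layout described in the comments above; the message contents are placeholders, not the actual dataset, and the assumption that mlx-lm reads a `valid.jsonl` alongside `train.jsonl` from the data directory is based on its standard training layout:

```shell
mkdir -p data

# One example in Qwen chat format with <think> and <tool_call> blocks;
# one JSON object per line, contents here are illustrative only
cat > data/train.jsonl <<'EOF'
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the weather in Paris?"}, {"role": "assistant", "content": "<think>The user wants the weather, so call the tool.</think>\n<tool_call>{\"name\": \"get_weather\", \"arguments\": {\"city\": \"Paris\"}}</tool_call>"}]}
EOF

# mlx-lm training also reads a validation split from the same directory
cp data/train.jsonl data/valid.jsonl
```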
Crashes within 1-5 minutes:

```
libc++abi: terminating due to uncaught exception of type std::runtime_error:
[METAL] Command buffer execution failed: Impacting Interactivity
(0000000e:kIOGPUCommandBufferCallbackErrorImpactingInteractivity)
```

`lora_config.yaml`:
```yaml
model: mlx-community/Qwen3.5-2B-OptiQ-4bit
fine_tune_type: lora
batch_size: 1
num_layers: 8
lora_parameters:
  rank: 8
  scale: 20.0
  dropout: 0.0
iters: 500
steps_per_eval: 500
learning_rate: 1e-5
grad_checkpoint: true
mask_prompt: true
max_seq_length: 256
```

Results across 5 runs:
| Run | macOS | Display | Background | Result |
|---|---|---|---|---|
| 1 | 26.2 | Open | Minimal | Crashed iter ~180 |
| 2 | 26.2 | Open | Normal OS activity | Crashed iter ~53 |
| 3 | 26.2 | Open | Normal OS activity | Crashed iter ~43 |
| 4 | 26.3.1 | Open | Normal OS activity | Crashed iter ~1 |
| 5 | 26.3.1 | Closed | caffeinate -s | Completed 500/500 |
**Expected behavior**
Training completes all 500 iterations. The workload peaks at 17% of available unified memory with the most conservative possible settings.
**Desktop (please complete the following information):**
- OS Version: macOS 26.2 Tahoe and macOS 26.3.1 Tahoe
- MLX Version: 0.31.1
- mlx-lm Version: 0.31.1
- Hardware: MacBook Pro M2 Pro, 16GB unified memory
- Python: 3.14.3
**Additional context**
- `MLX_MAX_OPS_PER_BUFFER=1` and `MLX_MAX_MB_PER_BUFFER=10` do not prevent the crash with the display active
- Normal background OS activity and window rendering accelerate the crash: more GPU contention from WindowServer compositing means earlier crashes
- The crash does not occur with short training examples (~40 tokens, 1.75GB peak, ~3.0 it/sec), only with longer sequences (~256 tokens, 2.75GB peak, ~0.8 it/sec) where individual GPU operations take ~1.2 seconds
- Workaround: run `caffeinate -s &`, then close the laptop lid. With no active display, WindowServer needs no GPU time for compositing, so the watchdog never triggers and training completes normally.
- Training data and config are available on request for reproduction
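A sketch of the workaround sequence, assuming a macOS machine with mlx-lm installed and the same config as above; the `DRY_RUN` guard is an addition here so the sketch only prints the commands unless explicitly executed:

```shell
# Dry-run by default so this sketch is safe to execute anywhere;
# set DRY_RUN=0 on a macOS machine to actually run the workaround.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# 1. Prevent sleep while the lid is closed (left running in the background)
run caffeinate -s &
CAFFEINATE_PID=$!

# 2. Start training, then close the lid; with no active display,
#    WindowServer stops compositing and the watchdog never fires
run python -m mlx_lm lora \
    --model mlx-community/Qwen3.5-2B-OptiQ-4bit \
    --train --data ./data \
    --config lora_config.yaml \
    --adapter-path ./adapters

# 3. After training completes, reopen the lid and release the wake assertion
run kill "$CAFFEINATE_PID"
```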