Description
Summary
In environments where no Metal device is visible (headless/sandboxed/virtualized/macOS automation sessions),
MLX initialization aborts the process with an uncaught Objective-C exception instead of returning
a recoverable Python error.
This makes downstream tooling fail hard, including quality/lint/test pipelines that do not need GPU execution.
Reproduction
- Run in a session with no visible Metal device.
- Execute:
python -c "import mlx.core as mx; print(mx.default_device())"
(An equivalent failure also occurs during import mlx via dependency probes.)
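To see the failure mode without taking down the calling process, the reproduction one-liner can be run in a child interpreter and its exit status inspected. This is a minimal sketch that assumes a POSIX host. classify_probe is a hypothetical helper written for this report and is not part of MLX. It separates the current behavior (the child is killed by a signal) from the behavior requested below (an ordinary Python exception with exit status 1).

```python
import signal
import subprocess
import sys

# The reproduction one-liner from this report.
MLX_PROBE = "import mlx.core as mx; print(mx.default_device())"


def classify_probe(code: str = MLX_PROBE, timeout: float = 60.0) -> str:
    """Run `code` in a fresh interpreter and describe how it ended.

    Hypothetical helper: running in a child process keeps a native abort
    from killing the caller.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok: " + proc.stdout.strip()
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by a
        # signal; abort() after an uncaught exception shows up as -6 (SIGABRT).
        signum = -proc.returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    # An uncaught Python exception exits with status 1; its last stderr line
    # is "ExceptionType: message".
    lines = proc.stderr.strip().splitlines()
    last_line = lines[-1] if lines else ""
    return f"python error (exit {proc.returncode}): {last_line}"


if __name__ == "__main__":
    print(classify_probe())
```

On an affected machine the report above implies this prints "killed by SIGABRT". After a fix along the lines suggested below, it would instead print a "python error (exit 1): RuntimeError: ..." line that CI tooling can match on.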
Actual behavior
The process aborts (SIGABRT, reported as exit status -6) with an uncaught exception similar to:
NSRangeException: -[__NSArray0 objectAtIndex:]: index 0 beyond bounds for empty array
Expected behavior
- No hard abort.
- Either:
  - raise a clear Python exception (e.g., RuntimeError: No Metal device available), or
  - allow a documented CPU/no-op mode for import-time checks.
- The error path should be machine-detectable so CI tools can handle it gracefully.
Why this matters
Hard aborts break non-inference workflows (lint/type/test/packaging checks) when MLX is installed
but no GPU is available. A recoverable error would allow callers to skip MLX-dependent runtime tests
without crashing the whole process.
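Until the abort becomes a catchable exception, a wrapping try/except in the test process cannot help, because the interpreter itself is terminated. One workaround a test suite can use today is to probe MLX once in a child process and skip runtime tests when the probe fails. This is a minimal sketch using the standard unittest module. The SKIP_MLX_RUNTIME_TESTS environment variable and the helper names are hypothetical. They belong to the test suite, not to MLX.

```python
import os
import subprocess
import sys
import unittest

# Hypothetical opt-out variable for this sketch; read by the test suite, not by MLX.
SKIP_ENV_VAR = "SKIP_MLX_RUNTIME_TESTS"


def mlx_runtime_usable(timeout: float = 60.0) -> bool:
    """Return True only if MLX can select a default device without failing.

    The probe runs in a child process because the current failure is an
    abort, which in-process exception handling cannot catch.
    """
    if os.environ.get(SKIP_ENV_VAR) == "1":
        return False
    try:
        proc = subprocess.run(
            [sys.executable, "-c", "import mlx.core as mx; mx.default_device()"],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    # Any non-zero status (ImportError, a Python exception, or a signal) means "skip".
    return proc.returncode == 0


# Probe once when the module is imported, so every decorated test shares the result.
MLX_USABLE = mlx_runtime_usable()


@unittest.skipUnless(MLX_USABLE, "MLX runtime (Metal device) not usable here")
class MLXRuntimeTests(unittest.TestCase):
    def test_default_device(self):
        import mlx.core as mx

        self.assertIsNotNone(mx.default_device())
```

Run it with python -m unittest as usual. Lint, type, and packaging steps never import mlx.core in-process, so they are unaffected. Once MLX raises a RuntimeError instead of aborting, the subprocess probe could be replaced by a plain in-process try/except.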
Suggested fix
- Guard the zero-device path before indexing into device arrays. load_device() in mlx/backend/metal/device.cpp is the primary fix site (empty-device guard + graceful error path).
- Convert this failure path to a typed Python exception instead of process termination.
- Optionally provide an env flag to skip Metal probing at import time in CI/headless contexts.
The main code path is:
- mlx/backend/metal/device.cpp: load_device()
  It currently does devices->object(0) without checking whether CopyAllDevices() returned an empty list, which matches the NSRangeException we're seeing.
- mlx/backend/metal/device.cpp: Device::Device()
  This calls load_device() during backend init.
- mlx/device.cpp
  default_device_ is initialized at static init time via metal::is_available(), so any hard failure in Metal probing can crash import-time flows instead of surfacing a recoverable error.
Related existing report: Issue #2691.