Part of #52
Goal
Expose MLX's runtime memory APIs through the C++ FFI so mlxcel can measure actual resident/peak memory and set allocator limits.
Why
There is currently no binding for get_active_memory / get_peak_memory / get_cache_memory / set_memory_limit / set_cache_limit / reset_peak_memory (confirmed absent from src/lib/mlx-cpp and the FFI). Without these bindings, the estimator cannot be validated against ground truth, peak usage cannot be observed, and the allocator cannot be capped to fail fast instead of thrashing or running out of memory.
Scope / implementation
- Add C++ wrappers in the mlx-cpp FFI bridge and re-export typed wrappers from mlxcel-core (e.g. a new memory module, or extend hardware.rs):
  - active_memory() -> u64, peak_memory() -> u64, cache_memory() -> u64
  - set_memory_limit(bytes), set_cache_limit(bytes), reset_peak_memory(), optionally clear_cache().
- Match the existing FFI conventions in src/lib/mlxcel-core/src/ffi* and the mlx-cpp crate (keep the raw ffi module private; re-export typed wrappers).
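A minimal sketch of the "private raw layer, typed public wrappers" split described above. The real bridge would link against MLX, so the raw layer is modelled here as a trait with an in-memory fake. `RawMemoryApi`, `FakeAllocator`, and `Memory` are hypothetical names used only for illustration, and the assumption that the limit setters return the previous limit should be checked against the MLX version in use.

```rust
// Hypothetical sketch: none of these types exist in mlxcel or MLX today.

/// Stand-in for the private raw `ffi` module. In the real bridge these
/// methods would forward to the MLX allocator functions named in this issue.
pub trait RawMemoryApi {
    fn get_active_memory(&self) -> usize;
    fn get_peak_memory(&self) -> usize;
    fn get_cache_memory(&self) -> usize;
    /// Assumption: returns the previous limit.
    fn set_memory_limit(&mut self, limit: usize) -> usize;
    /// Assumption: returns the previous limit.
    fn set_cache_limit(&mut self, limit: usize) -> usize;
    fn reset_peak_memory(&mut self);
    fn clear_cache(&mut self);
}

/// In-memory fake so the wrappers can be exercised without MLX or a GPU.
#[derive(Debug, Default)]
pub struct FakeAllocator {
    pub active: usize,
    pub peak: usize,
    pub cache: usize,
    pub memory_limit: usize,
    pub cache_limit: usize,
}

impl FakeAllocator {
    /// Simulate an allocation and update the peak counter.
    pub fn alloc(&mut self, bytes: usize) {
        self.active += bytes;
        self.peak = self.peak.max(self.active);
    }
}

impl RawMemoryApi for FakeAllocator {
    fn get_active_memory(&self) -> usize { self.active }
    fn get_peak_memory(&self) -> usize { self.peak }
    fn get_cache_memory(&self) -> usize { self.cache }
    fn set_memory_limit(&mut self, limit: usize) -> usize {
        std::mem::replace(&mut self.memory_limit, limit)
    }
    fn set_cache_limit(&mut self, limit: usize) -> usize {
        std::mem::replace(&mut self.cache_limit, limit)
    }
    fn reset_peak_memory(&mut self) { self.peak = 0; }
    fn clear_cache(&mut self) { self.cache = 0; }
}

/// Convert a `u64` byte count to `usize`, saturating on 32-bit targets.
fn to_usize(bytes: u64) -> usize {
    if bytes > usize::MAX as u64 { usize::MAX } else { bytes as usize }
}

/// Typed public surface: byte counts are always `u64`, independent of the
/// platform's `size_t` width. The raw layer stays private.
pub struct Memory<R: RawMemoryApi> {
    raw: R,
}

impl<R: RawMemoryApi> Memory<R> {
    pub fn new(raw: R) -> Self { Memory { raw } }
    pub fn active_memory(&self) -> u64 { self.raw.get_active_memory() as u64 }
    pub fn peak_memory(&self) -> u64 { self.raw.get_peak_memory() as u64 }
    pub fn cache_memory(&self) -> u64 { self.raw.get_cache_memory() as u64 }
    pub fn set_memory_limit(&mut self, bytes: u64) -> u64 {
        self.raw.set_memory_limit(to_usize(bytes)) as u64
    }
    pub fn set_cache_limit(&mut self, bytes: u64) -> u64 {
        self.raw.set_cache_limit(to_usize(bytes)) as u64
    }
    pub fn reset_peak_memory(&mut self) { self.raw.reset_peak_memory() }
    pub fn clear_cache(&mut self) { self.raw.clear_cache() }
}
```

Keeping the raw layer behind a trait is only one option. Its main benefit here is that the wrapper logic can be unit-tested on machines without Apple Silicon.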
Integration (required for completion — not standalone)
- Call active_memory() immediately after a successful model load and log "resident after load: X" (gated behind the estimator/inspect path or a debug log).
- Provide a set_memory_limit hook the preflight (sub-issue D) can use to fail fast.
- At minimum, the post-load resident measurement is logged in a real mlxcel generate run.
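One possible shape for the post-load log line, as a sketch. `format_bytes` and `resident_after_load_message` are hypothetical helpers, and the exact wording and units of the final log message are left to the implementation. The value passed in would come from active_memory() right after the load succeeds.

```rust
/// Format a byte count using binary units (KiB, MiB, GiB, TiB).
/// Hypothetical helper, not an existing mlxcel function.
pub fn format_bytes(bytes: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KiB", "MiB", "GiB", "TiB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    // Divide down until the value fits the current unit or units run out.
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        format!("{} B", bytes)
    } else {
        format!("{:.2} {}", value, UNITS[unit])
    }
}

/// Build the "resident after load" message from an active-memory reading.
/// The raw byte count is included so the value can be compared directly
/// with the estimator's output.
pub fn resident_after_load_message(active_bytes: u64) -> String {
    format!(
        "resident after load: {} ({} bytes)",
        format_bytes(active_bytes),
        active_bytes
    )
}
```

Building the message as a `String` keeps the helper independent of whichever logging facility the load path already uses.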
Acceptance criteria
- Functions are callable from Rust and return plausible non-zero values on Apple Silicon after allocating an array (FFI smoke test).
- reset_peak_memory() followed by a known allocation → peak_memory() reflects it.
- Wired into the load path so a real mlxcel generate logs resident-after-load.
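The peak-memory criterion could be checked with a small backend-agnostic helper like the sketch below. `peak_reflects_allocation` is a hypothetical name, and the closures stand in for whatever the final wrappers look like. Since MLX evaluates arrays lazily, the `allocate` step in a real smoke test would need to force evaluation before the peak is read. The check is a lower bound only: it assumes the reported peak is at least the size of the new allocation and does not prove that the reset itself took effect.

```rust
/// Reset the peak counter, perform a known allocation, and verify that the
/// reported peak is at least as large as that allocation.
/// Returns the observed peak on success.
pub fn peak_reflects_allocation(
    reset_peak: impl FnOnce(),
    allocate: impl FnOnce(u64),
    peak: impl Fn() -> u64,
    bytes: u64,
) -> Result<u64, String> {
    reset_peak();
    allocate(bytes);
    let observed = peak();
    if observed >= bytes {
        Ok(observed)
    } else {
        Err(format!(
            "peak {} bytes is below the known allocation of {} bytes",
            observed, bytes
        ))
    }
}
```

In a unit test the three closures can wrap a fake allocator. In the FFI smoke test they would wrap reset_peak_memory(), an array allocation, and peak_memory().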