Add buffer caching to no_gpu CPU allocator by dhiltgen · Pull Request #3554 · ml-explore/mlx

dhiltgen · 2026-05-16T15:58:54Z

Proposed changes

Split out from #3019

Integrate BufferCache into the CPU allocator to enable memory reuse for CPU-only builds. Previously the no_gpu allocator called malloc/free on every allocation with no caching, while the Metal and CUDA backends had buffer caching for better performance.

Track cached buffers by their physical capacity when they are reused so get_cache_memory(), active memory, and cache limit enforcement continue to reflect retained memory. Add a regression test for reusing a larger cached block for a smaller request.
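The accounting described above can be sketched with a toy model. This is not the MLX C++ code: `ToyCachingAllocator`, its fields, and the largest-first eviction order are all hypothetical, chosen only to show why a reused block must keep its physical capacity rather than the size of the latest request. A real cache would likely also bound how much larger than the request a reused block may be, which this sketch does not.

```python
import bisect


class ToyCachingAllocator:
    """Toy model of capacity-based accounting (hypothetical names, not MLX's API)."""

    def __init__(self, cache_limit=32 * 1024 * 1024):
        self.cache_limit = cache_limit
        self.active_memory = 0  # capacity of blocks currently handed out
        self.cache_memory = 0   # capacity of blocks retained in the cache
        self._free = []         # sorted capacities of cached blocks

    def malloc(self, size):
        if size == 0:
            # Zero-size requests never touch the cache.
            return {"size": 0, "capacity": 0}
        # Cache-first: take the smallest cached block that fits.
        i = bisect.bisect_left(self._free, size)
        if i < len(self._free):
            capacity = self._free.pop(i)
            self.cache_memory -= capacity
        else:
            capacity = size  # cache miss: stand-in for an OS malloc
        # Account for the physical capacity, not the requested size.
        self.active_memory += capacity
        return {"size": size, "capacity": capacity}

    def free(self, block):
        capacity = block["capacity"]
        if capacity == 0:
            return
        self.active_memory -= capacity
        bisect.insort(self._free, capacity)
        self.cache_memory += capacity
        # Enforce the limit; evicting the largest block first is an
        # assumption made for brevity, not the policy used by MLX.
        while self.cache_memory > self.cache_limit and self._free:
            self.cache_memory -= self._free.pop()


alloc = ToyCachingAllocator()
big = alloc.malloc(4096)
alloc.free(big)              # the 4096-byte block is now cached
small = alloc.malloc(1024)   # served from the cached 4096-byte block
alloc.free(small)            # goes back to the cache as 4096 bytes, not 1024
```

If the block were re-recorded as 1024 bytes on the second `free`, the cache would report less memory than it actually retains, and limit enforcement would drift accordingly.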

Changes:

- Add CpuCachedBuffer struct with intrusive freelist for object pooling
- Use BufferCache to recycle freed buffers with a 32MB default cache limit
- Preserve cached block capacity across reuse and avoid caching zero-size allocations
- Implement get_cache_memory(), set_cache_limit(), clear_cache() (were no-ops)
- Cache-first allocation path with fallback to OS malloc on cache miss
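For readers unfamiliar with the term, an intrusive freelist keeps the "next free" link inside the pooled object itself, so recycling a bookkeeping object needs no separate container node. The sketch below illustrates the general technique in Python. The PR text does not show the layout of CpuCachedBuffer, so `PooledBuffer`, `BufferPool`, and every field name here are hypothetical.

```python
class PooledBuffer:
    """Bookkeeping object with an embedded freelist link (hypothetical layout)."""

    __slots__ = ("data", "size", "next_free")

    def __init__(self):
        self.data = None
        self.size = 0
        self.next_free = None  # intrusive link, meaningful only while pooled


class BufferPool:
    """Recycles PooledBuffer objects through a singly linked freelist."""

    def __init__(self):
        self._head = None  # top of the freelist
        self.created = 0   # number of objects ever constructed

    def acquire(self, data, size):
        node = self._head
        if node is None:
            # Freelist empty: construct a fresh bookkeeping object.
            node = PooledBuffer()
            self.created += 1
        else:
            # Pop the head of the freelist and detach it.
            self._head = node.next_free
            node.next_free = None
        node.data, node.size = data, size
        return node

    def release(self, node):
        # Drop the payload and push the object onto the freelist.
        node.data = None
        node.size = 0
        node.next_free = self._head
        self._head = node


pool = BufferPool()
first = pool.acquire(bytearray(8), 8)
pool.release(first)
second = pool.acquire(bytearray(16), 16)  # reuses the object released above
```

In a C++ allocator the same pattern avoids a heap allocation for the bookkeeping struct on every buffer allocation, which is presumably the motivation here.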

zcbenz

Looks good to me, thanks!

zcbenz approved these changes May 17, 2026

zcbenz force-pushed the pr/allocator-cache branch from 26a9c8d to c79c931 on May 18, 2026 00:09

Avoid extra allocation

6b8de41

zcbenz force-pushed the pr/allocator-cache branch from c79c931 to 6b8de41 on May 18, 2026 00:13

zcbenz merged commit f831bdf into ml-explore:main May 18, 2026
16 checks passed

dhiltgen mentioned this pull request May 19, 2026

Synchronize no-GPU cache eviction with CPU streams #3566

Merged

zcbenz merged 2 commits into ml-explore:main from dhiltgen:pr/allocator-cache
