Add buffer caching to no_gpu CPU allocator #3554

Merged: zcbenz merged 2 commits into ml-explore:main from dhiltgen:pr/allocator-cache on May 18, 2026

@dhiltgen
Contributor

Proposed changes

Split out from #3019

Integrate BufferCache into the CPU allocator to enable memory reuse for CPU-only builds. Previously the no_gpu allocator called malloc/free on every allocation with no caching, while the Metal and CUDA backends had buffer caching for better performance.

Track cached buffers by their physical capacity when they are reused so get_cache_memory(), active memory, and cache limit enforcement continue to reflect retained memory. Add a regression test for reusing a larger cached block for a smaller request.

Changes:

  • Add CpuCachedBuffer struct with intrusive freelist for object pooling
  • Use BufferCache to recycle freed buffers with a 32MB default cache limit
  • Preserve cached block capacity across reuse and avoid caching zero-size allocations
  • Implement get_cache_memory(), set_cache_limit(), clear_cache() (were no-ops)
  • Cache-first allocation path with fallback to OS malloc on cache miss
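
The cache-first path and capacity accounting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `SketchAllocator` and `Block` are hypothetical names, a `std::multimap` keyed by capacity stands in for BufferCache, any cached block at least as large as the request is reused, and a freed block that would push the cache past its limit is released instead of cached.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <iterator>
#include <map>

// Hypothetical stand-in for a cached CPU buffer (not the PR's CpuCachedBuffer).
struct Block {
  void* data;
  std::size_t capacity; // physical size obtained from malloc; kept across reuse
};

class SketchAllocator {
 public:
  explicit SketchAllocator(std::size_t cache_limit)
      : cache_limit_(cache_limit) {}
  ~SketchAllocator() { clear_cache(); }

  Block* allocate(std::size_t size) {
    // Cache-first: take the smallest cached block that fits the request.
    auto it = cache_.lower_bound(size);
    if (size > 0 && it != cache_.end()) {
      Block* b = it->second;
      cache_.erase(it);
      cache_bytes_ -= b->capacity;
      active_bytes_ += b->capacity; // count capacity, not the requested size
      return b;
    }
    // Cache miss: fall back to the OS allocator.
    void* data = size > 0 ? std::malloc(size) : nullptr;
    if (size > 0 && data == nullptr) return nullptr;
    active_bytes_ += size;
    return new Block{data, size};
  }

  void free(Block* b) {
    if (b == nullptr) return;
    assert(active_bytes_ >= b->capacity);
    active_bytes_ -= b->capacity;
    // Zero-size blocks are never cached; neither are blocks over the limit.
    if (b->capacity == 0 || cache_bytes_ + b->capacity > cache_limit_) {
      release(b);
      return;
    }
    cache_.emplace(b->capacity, b);
    cache_bytes_ += b->capacity;
  }

  std::size_t set_cache_limit(std::size_t limit) {
    std::size_t old = cache_limit_;
    cache_limit_ = limit;
    // Evict the largest cached blocks until the cache fits the new limit.
    while (cache_bytes_ > cache_limit_) {
      auto last = std::prev(cache_.end());
      cache_bytes_ -= last->first;
      release(last->second);
      cache_.erase(last);
    }
    return old;
  }

  void clear_cache() {
    for (auto& entry : cache_) release(entry.second);
    cache_.clear();
    cache_bytes_ = 0;
  }

  std::size_t get_cache_memory() const { return cache_bytes_; }
  std::size_t get_active_memory() const { return active_bytes_; }

 private:
  static void release(Block* b) {
    std::free(b->data); // free(nullptr) is a no-op for zero-size blocks
    delete b;
  }

  std::multimap<std::size_t, Block*> cache_; // capacity -> cached block
  std::size_t cache_limit_;
  std::size_t cache_bytes_ = 0;
  std::size_t active_bytes_ = 0;
};
```

In this sketch, a 4096-byte block that is freed and then reused for a 1024-byte request still counts as 4096 bytes of active memory, and it returns to the cache as 4096 bytes when freed again. That is the case the regression test mentioned above is meant to cover.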

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

Collaborator

@zcbenz zcbenz left a comment

Looks good to me, thanks!

@zcbenz zcbenz force-pushed the pr/allocator-cache branch from 26a9c8d to c79c931 on May 18, 2026 00:09
@zcbenz zcbenz force-pushed the pr/allocator-cache branch from c79c931 to 6b8de41 on May 18, 2026 00:13
@zcbenz zcbenz merged commit f831bdf into ml-explore:main May 18, 2026
16 checks passed

2 participants