refactor(memory): introduce MemoryManager façade for VRAM ownership by kekzl · Pull Request #321 · kekzl/imp

kekzl · 2026-05-20T13:16:55Z

Summary

Phase 5 Track C. Introduce `MemoryManager` class in `src/memory/memory_manager.{h,cpp}` as a façade owning the engine's VRAM allocators + wrapping the budget/planner free functions. Engine field changes from `VRAMAllocator vram_alloc_` to `MemoryManager memory_manager_` — single entry point.

Stacking

#319 → #320 → this.

Design: façade pattern, not implementation-merge

The 5 underlying modules stay untouched:

`memory/vram_allocator` — owned directly by MemoryManager
`memory/pinned_allocator` + `memory/device_allocator` — owned via lazy `unique_ptr` (currently unused by Engine, but exposed for future use)
`runtime/vram_budget` + `runtime/storage_planner` — free functions wrapped as `compute_budget()` / `plan_storage_for()` methods

"Deeper merge" (kill the .cpp files and fold their bodies) would have inflated the diff by ~300 LOC and broken the façade-not-merge claim. Façade now, fold later if a second consumer ever appears.

LOC

New: `memory_manager.h` 97 LOC, `memory_manager.cpp` 13 LOC
Engine: 18 call-site migrations (17 internal + 1 executor pointer redirect at `engine.cpp:620`)

Why 18 not ~30

PinnedAllocator/DeviceAllocator were standalone classes that Engine never instantiated. `src/exec/*` has its own `vram_alloc_` pointer (GraphExecutor::set_vram_allocator), not Engine's — unchanged after this PR's redirect.

Pre-existing test failure noted

`StubModelTest.CreateContextAndInfer` segfaults; verified pre-existing by reproducing on parent commit `b746959`. Not introduced by this PR.

Test plan

`make build` green
`make verify-fast` green
Pre-push hook verify-fast green
Combined spec + code-quality reviewer ✅ Approved

Phase 5 Track C of `docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md`.

🤖 Generated with Claude Code

Phase 5 Track A. The gemma4 model-specific knobs no longer live in the global RuntimeConfig — they belong on the model, not on the runtime. Migration: - New ModelConfig::Overrides::Gemma4 struct in src/model/model_config.h - 6 caller files now read from model_->config().overrides.gemma4.* - engine_init_resolver mutates model_->config_.overrides.gemma4.* directly - GemmContext + GemmKernelArgs gained a `force_mmvq` field so the low-level GGUF small-M kernel (which has no Model in scope) can keep honoring the per-model override without a back-channel into the RuntimeConfig singleton - RuntimeConfig::Gemma4 struct and gemma4.* parser entries removed - imp.conf.example [gemma4] section dropped (knob is now per-model only) - test_gemm_kernel_registry tests updated to set args.force_mmvq instead of mutating RuntimeConfig::current() Eliminates the "model-name in a runtime singleton" code smell flagged in the original Phase 5 spec. Phase 5 Track A of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Phase 5 Track B. imp_generate and imp_generate_streaming are now thin wrappers that delegate to imp_prefill_with_params + a imp_decode_step loop. Both paths share a single generate_via_prefill_decode_loop helper, so the per-token loop, EOS / max_tokens / cancellation handling, sampling-params translation, and request lifecycle now live in exactly one place (the underlying primitives). Public C-API symbol surface is unchanged (ABI-stable). Behaviour is preserved — verified via new tests/test_api_generate.cpp and existing DegenerationTest suite (Qwen3-8B-Q8_0). Note: src/api/imp_api.cpp grew by +11 LOC because the new shared helper carries documentation for the prefill-sampled-first-token bookkeeping (consumed_output cursor reset) and the natural-finish stop signal. The net win is the elimination of two parallel prefill/decode loops, not raw line count — future per-token-behaviour changes no longer need parallel edits in imp_generate and imp_generate_streaming. Phase 5 Track B of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Phase 5 Track C of the architecture refactor roadmap. Engine carried a loose `VRAMAllocator vram_alloc_` field directly and depended on the free functions `compute_vram_budget` and `plan_storage` in src/runtime/{vram_budget,storage_planner}.cpp. Memory ownership was spread across the engine and two anonymous free functions. This PR introduces `MemoryManager` in src/memory/memory_manager.{h,cpp}: - Owns the VRAMAllocator directly (matches the only allocator the engine instantiates today). - Exposes PinnedAllocator + DeviceAllocator via lazy `unique_ptr<>` accessors so future call sites can reach them through the façade without changing engine init costs (no 64-MiB pinned pool until something asks). - Wraps `compute_vram_budget` as `compute_budget()` and `plan_storage` as `plan_storage_for()`. - Engine now has one `MemoryManager memory_manager_` field; the pre-existing `Engine::vram_allocator()` accessor delegates to it for source-compatibility, and `Engine::memory_manager()` is the new primary entry point. The five underlying modules (vram_allocator, device_allocator, pinned_allocator, vram_budget, storage_planner) are unchanged — this is a façade, not an implementation merge. 17 call sites in src/runtime/engine.cpp migrated from `vram_alloc_` / `compute_vram_budget(...)` to the new façade. Executor-internal call sites in src/exec/ keep using their own `vram_alloc_` raw pointer (set via `set_vram_allocator(&memory_manager_.vram_allocator())`) — they are not Engine-scope and stay out of this façade pass. `make verify-fast` passes (test-gpu has a pre-existing unrelated stub crash in StubModelTest.CreateContextAndInfer, confirmed by running the same target on the parent commit). Phase 5 Track C of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kekzl and others added 3 commits May 20, 2026 14:38

github-actions Bot enabled auto-merge (squash) May 20, 2026 13:17

github-actions Bot merged commit b1802c3 into main May 20, 2026
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(memory): introduce MemoryManager façade for VRAM ownership#321

refactor(memory): introduce MemoryManager façade for VRAM ownership#321
github-actions[bot] merged 3 commits into
mainfrom
refactor/memory-manager-facade

kekzl commented May 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kekzl commented May 20, 2026

Summary

Stacking

Design: façade pattern, not implementation-merge

LOC

Why 18 not ~30

Pre-existing test failure noted

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant