Skip to content

refactor(memory): introduce MemoryManager façade for VRAM ownership#321

Merged
github-actions[bot] merged 3 commits into
mainfrom
refactor/memory-manager-facade
May 20, 2026
Merged

refactor(memory): introduce MemoryManager façade for VRAM ownership#321
github-actions[bot] merged 3 commits into
mainfrom
refactor/memory-manager-facade

Conversation

@kekzl
Copy link
Copy Markdown
Owner

@kekzl kekzl commented May 20, 2026

Summary

Phase 5 Track C. Introduce `MemoryManager` class in `src/memory/memory_manager.{h,cpp}` as a façade owning the engine's VRAM allocators + wrapping the budget/planner free functions. Engine field changes from `VRAMAllocator vram_alloc_` to `MemoryManager memory_manager_` — single entry point.

Stacking

#319#320 → this.

Design: façade pattern, not implementation-merge

The 5 underlying modules stay untouched:

  • `memory/vram_allocator` — owned directly by MemoryManager
  • `memory/pinned_allocator` + `memory/device_allocator` — owned via lazy `unique_ptr` (currently unused by Engine, but exposed for future use)
  • `runtime/vram_budget` + `runtime/storage_planner` — free functions wrapped as `compute_budget()` / `plan_storage_for()` methods

"Deeper merge" (kill the .cpp files and fold their bodies) would have inflated the diff by ~300 LOC and broken the façade-not-merge claim. Façade now, fold later if a second consumer ever appears.

LOC

  • New: `memory_manager.h` 97 LOC, `memory_manager.cpp` 13 LOC
  • Engine: 18 call-site migrations (17 internal + 1 executor pointer redirect at `engine.cpp:620`)

Why 18 not ~30

PinnedAllocator/DeviceAllocator were standalone classes that Engine never instantiated. `src/exec/*` has its own `vram_alloc_` pointer (GraphExecutor::set_vram_allocator), not Engine's — unchanged after this PR's redirect.

Pre-existing test failure noted

`StubModelTest.CreateContextAndInfer` segfaults; verified pre-existing by reproducing on parent commit `b746959`. Not introduced by this PR.

Test plan

  • `make build` green
  • `make verify-fast` green
  • Pre-push hook verify-fast green
  • Combined spec + code-quality reviewer ✅ Approved

Phase 5 Track C of `docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md`.

🤖 Generated with Claude Code

kekzl and others added 3 commits May 20, 2026 14:38
Phase 5 Track A. The gemma4 model-specific knobs no longer live in the
global RuntimeConfig — they belong on the model, not on the runtime.

Migration:
- New ModelConfig::Overrides::Gemma4 struct in src/model/model_config.h
- 6 caller files now read from model_->config().overrides.gemma4.*
- engine_init_resolver mutates model_->config_.overrides.gemma4.* directly
- GemmContext + GemmKernelArgs gained a `force_mmvq` field so the
  low-level GGUF small-M kernel (which has no Model in scope) can keep
  honoring the per-model override without a back-channel into the
  RuntimeConfig singleton
- RuntimeConfig::Gemma4 struct and gemma4.* parser entries removed
- imp.conf.example [gemma4] section dropped (knob is now per-model only)
- test_gemm_kernel_registry tests updated to set args.force_mmvq instead
  of mutating RuntimeConfig::current()

Eliminates the "model-name in a runtime singleton" code smell flagged
in the original Phase 5 spec.

Phase 5 Track A of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 5 Track B. imp_generate and imp_generate_streaming are now thin
wrappers that delegate to imp_prefill_with_params + a imp_decode_step
loop. Both paths share a single generate_via_prefill_decode_loop
helper, so the per-token loop, EOS / max_tokens / cancellation
handling, sampling-params translation, and request lifecycle now live
in exactly one place (the underlying primitives).

Public C-API symbol surface is unchanged (ABI-stable). Behaviour is
preserved — verified via new tests/test_api_generate.cpp and existing
DegenerationTest suite (Qwen3-8B-Q8_0).

Note: src/api/imp_api.cpp grew by +11 LOC because the new shared
helper carries documentation for the prefill-sampled-first-token
bookkeeping (consumed_output cursor reset) and the natural-finish
stop signal. The net win is the elimination of two parallel
prefill/decode loops, not raw line count — future per-token-behaviour
changes no longer need parallel edits in imp_generate and
imp_generate_streaming.

Phase 5 Track B of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 5 Track C of the architecture refactor roadmap. Engine carried a
loose `VRAMAllocator vram_alloc_` field directly and depended on the
free functions `compute_vram_budget` and `plan_storage` in
src/runtime/{vram_budget,storage_planner}.cpp. Memory ownership was
spread across the engine and two anonymous free functions.

This PR introduces `MemoryManager` in src/memory/memory_manager.{h,cpp}:

- Owns the VRAMAllocator directly (matches the only allocator the
  engine instantiates today).
- Exposes PinnedAllocator + DeviceAllocator via lazy `unique_ptr<>`
  accessors so future call sites can reach them through the façade
  without changing engine init costs (no 64-MiB pinned pool until
  something asks).
- Wraps `compute_vram_budget` as `compute_budget()` and `plan_storage`
  as `plan_storage_for()`.
- Engine now has one `MemoryManager memory_manager_` field; the
  pre-existing `Engine::vram_allocator()` accessor delegates to it for
  source-compatibility, and `Engine::memory_manager()` is the new
  primary entry point.

The five underlying modules (vram_allocator, device_allocator,
pinned_allocator, vram_budget, storage_planner) are unchanged — this
is a façade, not an implementation merge. 17 call sites in
src/runtime/engine.cpp migrated from `vram_alloc_` /
`compute_vram_budget(...)` to the new façade. Executor-internal call
sites in src/exec/ keep using their own `vram_alloc_` raw pointer
(set via `set_vram_allocator(&memory_manager_.vram_allocator())`) —
they are not Engine-scope and stay out of this façade pass.

`make verify-fast` passes (test-gpu has a pre-existing unrelated stub
crash in StubModelTest.CreateContextAndInfer, confirmed by running
the same target on the parent commit).

Phase 5 Track C of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant