refactor(config): retire RuntimeConfig::current() singleton — Track D complete#342
Closed
kekzl wants to merge 4 commits into
Closed
refactor(config): retire RuntimeConfig::current() singleton — Track D complete#342kekzl wants to merge 4 commits into
kekzl wants to merge 4 commits into
Conversation
Phase 5 Track D, partial. Engine now owns RuntimeConfig as a per-instance field; GraphExecutor holds a non-owning pointer set via set_runtime_config(). Engine::init() snapshots RuntimeConfig::current() once into runtime_config_, then engine_init_resolver helpers mutate the per-Engine copy in place for arch-specific defaults (FP8-KV deterministic_gemm, Gemma-4 deterministic fallback) instead of writing back into the singleton via install(). Migrated 61 of 94 call sites — all of Engine::* and all of GraphExecutor::*. Files touched: engine.cpp, engine_init_resolver.cpp, executor_forward.cu, executor_forward_moe.cu, executor_attention.cu, executor_ssm_gdn.cu, executor_workspace_buffers.cu, executor_pre_dequant.cu, executor_ffn.cu. GraphExecutor::runtime_config() falls back to RuntimeConfig::current() when the pointer is null so existing unit tests that construct a bare executor without an owning Engine continue to work. 33 call sites remain — all free functions in compute/, quant/, runtime/, vision/, model/, plus the executor_debug.h inline diagnostic helpers and gemm_kernel_gguf.cu / executor_kernels.cu free dispatchers. A follow-up PR will thread const RuntimeConfig& through those modules and finally delete RuntimeConfig::current() / install(). Phase 5 Track D (partial) of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Track D main commit (128eba2) removed install() calls in two resolver helpers (FP8-KV + Gemma-4 deterministic_gemm auto-promote), replacing them with per-engine mutations only. However, gemm.cu:364 is a free-function reader still on the singleton (it's part of the 33-call follow-up scope), so the skip-benchmark deterministic branch no longer fires for these two paths — silent prefill-timing variance regression. Fix: keep a transitional install() dual-write alongside the per-engine mutation. Remove when the free-function migration follow-up PR lands. Follow-up to 128eba2 on the same branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… complete
Phase 5 Track D follow-up. Migrates the 41 remaining free-function
readers off the process-wide singleton and deletes the singleton itself.
Migration shape:
* Free functions inside Engine/Executor scope (gemm_dispatch, gemm-kernel
GGUF small-M handler, NVFP4 dispatch, attention prefill dispatch, MoE
per-tile context) now take their RuntimeConfig values via the existing
GemmContext / explicit parameters threaded from runtime_config().
The gemm.q4k_imma + gemm.no_mmvq / no_mmvq_q8_0 / no_dp4a_gemv flags
flow GemmContext -> GemmKernelArgs.
* Process-wide leaf utilities (graph_diag.h, executor_debug.h, pdl.cu,
cuda_graph.cu capture-mode cache, vision_encoder.cu graph gate, gemm.cu
log/deterministic flags, ffn_sparsity_probe.cu, attention_paged.cu
splitk_pipe, attention_mxfp4_prefill.cu mode cache, model/weight_upload
Pass-2 expert-offload budget, model/chat_template.cpp debug logging,
model/hf_config_loader.cpp GDN layout override) read from a new
src/runtime/process_diag.{h,cpp} module — POD getters seeded once at
startup via process_diag_install(). Trades the per-call singleton
accessor for narrow typed reads; engine_init_resolver mutates the
deterministic_gemm slot in place when FP8-KV / Gemma-4 arch resolvers
promote it (replaces the former RuntimeConfig::install dual-write).
* Tool mains (imp-cli, imp-server) load RuntimeConfig once, call
process_diag_install() for the leaf cache, then stash for Engine
consumption via set_pending_runtime_config(). imp_context_create's
Engine::init() pulls it via take_pending_runtime_config() — short-lived
handoff bounded to one Engine construction, no per-call accessor.
Server handlers re-stash on every model swap so live /v1/models POST
reload keeps working. build_config() takes the RuntimeConfig as
parameter (replaces the in-body RuntimeConfig::current() probe for
prefix_cache).
Deletions:
* RuntimeConfig::current() static accessor (was in config.h/.cpp)
* RuntimeConfig::install() singleton setter
* The static singleton storage in config.cpp
* The transitional RuntimeConfig::install(runtime_config_) dual-write
in engine_init_resolver.cpp (added by 5747169 to bridge the gap
until gemm.cu's free-function reader was migrated)
* The null-fallback to RuntimeConfig::current() inside
GraphExecutor::runtime_config() — bare GraphExecutor (no owning
Engine, tests only) now sees a cold default RuntimeConfig instead.
Tool-main approach: option (c) — process-static slot inside imp_api.cpp's
take_pending_runtime_config() (in src/runtime/config.cpp). Lifetime is
bounded to one Engine construction; cannot be re-read after take().
Chosen over a new ImpConfig field (would break the C ABI in include/imp/)
or a third Engine::init overload (would multiply the public Engine
surface for a one-shot startup wiring).
After this PR: grep -rn 'RuntimeConfig::current()\|RuntimeConfig::install' src/ tools/ tests/
returns ZERO non-comment matches. The singleton is gone.
make build OK
make verify-fast OK (fast filter)
Phase 5 Track D (follow-up complete) of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two minor fixes from code review of the Track D follow-up: 1. engine.h:215 stale comment said "Snapshot is initialized from RuntimeConfig::current()" but actual code uses take_pending_runtime_config(). 2. process_diag dead getters: server_prefix_cache + dump_tokens were declared, mirrored, and never read (server config reads via handlers' RuntimeConfig& parameter; dump_tokens has no consumer at all). Removed. Follow-up to 7f9e459 on the same branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
1 task
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Rebased from #324 onto current main (PR-stack cleanup; harness blocked force-push). Original content preserved below.
Summary
Phase 5 Track D follow-up. Retires the `RuntimeConfig::current()` singleton completely. 41 → 0 live calls. Singleton accessor + `install()` setter deleted from `config.{h,cpp}`.
Stacking
#319 → #320 → #321 → #322 → this.
Migration strategy
Two-pronged:
This separates per-Engine config (threaded properly) from process-wide policy (narrow typed cache) instead of having both in one global.
Tool-main handoff
Option (c) variant: process-static slot `set_pending_runtime_config()` / `take_pending_runtime_config()` for one-shot handoff from tool main to `Engine::init()`. Lifetime bounded to one Engine construction; the slot is cleared on take.
Transitional dual-write removed
Track D main (`5747169`) added `RuntimeConfig::install(runtime_config_)` calls in `engine_init_resolver` as a bridge until `gemm.cu`'s singleton reader was migrated. Now that `gemm.cu` reads via `process_diag_deterministic_gemm()`, the dual-write is gone.
LOC
38 files changed, +403 / −148. Two new files: `process_diag.{h,cpp}` (95 + 85 LOC = 180 total).
Code review follow-ups (folded in as `eccf9ae`)
Post-condition
```
$ grep -rn 'RuntimeConfig::current()|RuntimeConfig::install' src/ tools/ tests/
Only doc-comment matches remain (16 mentions describe the former existence).
```
The singleton is gone. Phase 5 Track D is fully closed. The architecture refactor roadmap is complete.
Test plan
Phase 5 Track D follow-up of `docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md`.
🤖 Generated with Claude Code