refactor(engine): extract Sampling helpers + Stop controller — final split#338
Closed
kekzl wants to merge 6 commits into
Closed
refactor(engine): extract Sampling helpers + Stop controller — final split#338kekzl wants to merge 6 commits into
kekzl wants to merge 6 commits into
Conversation
…its own TU Moves Engine::init_weights (~230 LOC) from src/runtime/engine.cpp to src/runtime/engine_weight_upload.cpp. Declaration in engine.h unchanged. Body byte-identical. Phase 4 of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…wn TU Moves Engine::init_kv_cache (~255 LOC) to src/runtime/engine_kv_cache_init.cpp. Declaration in engine.h unchanged. Body byte-identical. Phase 4 of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Combines three small init-tail methods into one TU: - Engine::init_features (~77 LOC) - Engine::build_banned_token_list (~85 LOC) - Engine::warmup (~91 LOC) All three run during engine init after weights + KV cache are set up; colocating them keeps the init-tail logic in one file. Declarations in engine.h unchanged. Bodies byte-identical. Phase 4 of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Moves 13 step/prefill/decode methods (~1250 LOC, the bulk of engine.cpp)
to src/runtime/engine_scheduler.cpp:
- step, step_async_graph_resume, step_schedule
- supports_chunked_prefill_, resolve_prefill_chunk_size_
- step_prefill, prefill_allocate_kv_blocks_,
prefill_upload_metadata_, step_prefill_one
- step_decode, decode_build_inference_state_,
step_decode_forward, step_decode_process_outputs
All declarations in engine.h unchanged. Bodies byte-identical.
This is the largest single drop in Phase 4 of the architecture refactor.
Phase 4 of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…split Moves 6 decode-loop tail methods to src/runtime/engine_sampling_stop.cpp: - is_stop_token, track_think_state, should_stop (stop check) - fill_sampling_params, upload_penalties, fill_recurrent_state (per-request setup) Two related clusters colocated because they share the per-request state-passing pattern (Request& + InferenceState&) and run in the decode loop tail. This is the FINAL per-subsystem extraction of Phase 4. After this PR, engine.cpp is at ~570 LOC — well under the <=800 LOC plan target. Declarations in engine.h unchanged. Bodies byte-identical. Phase 4 of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 tasks
CI failure on PR #337 / v2 #337 (= #325 chain head for scheduler): syntax error at engine_scheduler.cpp:1297 due to extra '}' after the namespace-scope function closing brace. Introduced during the local rebase conflict resolution when I replaced the scheduler function bodies with HEAD versions via sed cascade. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Rebased from #316 onto current main (PR-stack cleanup; harness blocked force-push). Original content preserved below.
Summary
Final per-subsystem extraction of Phase 4. Move 6 decode-loop tail methods to `src/runtime/engine_sampling_stop.cpp`:
Two related clusters colocated because they share per-request state-passing (`Request&` + `InferenceState&`) and run in the decode-loop tail.
Outcome
`engine.cpp`: 3112 → 570 LOC (target ≤800, beaten by ~30% margin).
Phase 4 cumulative: ~82% reduction. The 570-LOC façade contains only constructor/dtor, top-level `init`, `generate`, `add_request`, MTP wrappers, vision wrappers, and lightweight accessor/delegator glue — a coherent public-API surface.
Final Phase 4 layout
Stacking
#310-315 → this.
LOC
`engine.cpp` 776 → 570 (-206). New file 233 LOC.
Test plan
Phase 4 Task 7 of `docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md`.
🤖 Generated with Claude Code