refactor(pre-dequant): extract Phase 0 + 0b (NVFP4 loader) to its own TU #305

Closed
kekzl wants to merge 5 commits into main from refactor/pre-dequant-extract-phase0-nvfp4-loader

kekzl (Owner) commented May 20, 2026

Summary

Move `GraphExecutor::pre_dequant_phase0_promote_nvfp4_sidecars_` + `GraphExecutor::pre_dequant_phase0b_register_cutlass_nvfp4_` to a combined TU `src/exec/pre_dequant_phase0_nvfp4_loader.cu`. Both phases are NVFP4 loader-side concerns running consecutively — colocated by design.
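The kind of split described above can be sketched in plain C++. This is a minimal illustration, not the project's code: the class `Executor`, its method names, and the file names in the comments are hypothetical stand-ins. It shows that member functions declared once in a header can be defined in different translation units, so moving a definition to a new file needs no change to the declaration or to callers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// --- executor.h (hypothetical): the class declaration is not touched ---
class Executor {
public:
    void run_phase0();   // stand-in for a loader-side phase
    void run_phase0b();  // stand-in for the phase that runs right after it
    void run_phase1();   // stand-in for a phase that stays in the old file
    const std::vector<std::string>& log() const { return log_; }

private:
    std::vector<std::string> log_;
};

// --- phase0_loader.cpp (hypothetical new TU): two related phases together ---
// A real file would start with: #include "executor.h"
void Executor::run_phase0() { log_.push_back("phase0"); }
void Executor::run_phase0b() { log_.push_back("phase0b"); }

// --- executor_phases.cpp (hypothetical original TU): remaining phases ---
// A real file would start with: #include "executor.h"
void Executor::run_phase1() { log_.push_back("phase1"); }

// Callers only see the header, so they are unaffected by the move.
inline void usage_example() {
    Executor e;
    e.run_phase0();
    e.run_phase0b();
    e.run_phase1();
    assert(e.log().size() == 3);
}
```

In a real build each commented section would be its own file, and the linker would resolve the member definitions across object files.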

Stacking

Stack chain: #301 → #302 → #303 → #304 → this. Rebase as parents merge.

LOC

`executor_pre_dequant.cu`: 1891 → 1567 LOC (-324). New file: 350 LOC.

Test plan

  • `make build` green
  • `make verify-fast` green
  • Pre-push hook verify-fast green
  • Combined spec + code-quality reviewer ✅ Approved

Phase 3 Task 5 of `docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md`.

🤖 Generated with Claude Code

kekzl and others added 5 commits May 20, 2026 11:54

Anonymous-namespace and file-scope static helpers
(infer_tier_from_wcache, borrow_payload_from_wcache, nvfp4_beneficial,
deduct_budget, create_fused_weight_pair, for_each_dense_weight) move
to a new header src/exec/pre_dequant_internal.h so subsequent
per-phase extractions can share them across translation units.

No behavior change. executor_pre_dequant.cu still contains every phase
function -- those move to their own TUs in Tasks 2-7.
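One common way to share small helpers across translation units through a header is to mark them `inline`, so every TU that includes the header may carry the same definition without violating the one-definition rule. The sketch below assumes that approach; the source does not say which linkage the real header uses. The namespace `pre_dequant_detail`, the function `deduct_budget_sketch`, and its signature are hypothetical and are not the project's actual `deduct_budget`.

```cpp
#include <cassert>
#include <cstddef>

// --- pre_dequant_internal_sketch.h (hypothetical header) ---
// A real header would begin with an include guard or #pragma once.
namespace pre_dequant_detail {  // hypothetical namespace

// Tries to reserve `cost` bytes from `remaining`.
// Returns true and subtracts on success; returns false and leaves
// `remaining` unchanged when the budget is too small.
// `inline` allows identical definitions in every including TU.
inline bool deduct_budget_sketch(std::size_t& remaining, std::size_t cost) {
    if (cost > remaining) {
        return false;
    }
    remaining -= cost;
    return true;
}

}  // namespace pre_dequant_detail

// Any per-phase TU could then include the header and call the helper.
inline void budget_usage_example() {
    std::size_t remaining = 100;
    assert(pre_dequant_detail::deduct_budget_sketch(remaining, 40));
    assert(remaining == 60);
}
```

Keeping the helpers `static` or in an anonymous namespace inside the header would also compile, at the cost of one private copy per including TU.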

Phase 3 of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Moves GraphExecutor::pre_dequant_phase1_fp16_cache_ from
src/exec/executor_pre_dequant.cu to a new TU
src/exec/pre_dequant_phase1_fp16_cache.cu.

The declaration in src/exec/executor.h is unchanged. The function body
is byte-identical (verified by reading both before and after).

Phase 3 of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Moves GraphExecutor::pre_dequant_phase2_fp8_cache_ from
src/exec/executor_pre_dequant.cu to a new TU
src/exec/pre_dequant_phase2_fp8_cache.cu.

Declaration in src/exec/executor.h unchanged. Function body byte-identical.

Phase 3 of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Moves GraphExecutor::pre_dequant_phase4_tensor_registry_ (~378 LOC)
from src/exec/executor_pre_dequant.cu to a new TU
src/exec/pre_dequant_phase4_tensor_registry.cu.

Declaration in src/exec/executor.h unchanged. Function body byte-identical.

Phase 3 of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Moves GraphExecutor::pre_dequant_phase0_promote_nvfp4_sidecars_
(~230 LOC) and GraphExecutor::pre_dequant_phase0b_register_cutlass_nvfp4_
(~93 LOC) to a combined file
src/exec/pre_dequant_phase0_nvfp4_loader.cu.

Both phases are NVFP4 loader-side concerns that run consecutively;
colocating them keeps related logic in one place. Declarations
unchanged, bodies byte-identical.

Phase 3 of docs/superpowers/specs/2026-05-20-architecture-refactor-roadmap-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kekzl (Owner, Author) commented May 20, 2026

Superseded by #330 — rebased onto current main since this branch couldn't be force-pushed via the harness.

@kekzl kekzl closed this May 20, 2026
auto-merge was automatically disabled May 20, 2026 15:42

Pull request was closed
