
contract(trace-moe-gpu-sub-stages-v1): v1.2.0 → v1.3.0 — M-MOE-SUB cascade complete on main #1525

Merged
noahgift merged 1 commit into main from
contract/trace-moe-gpu-sub-stages-v1.3.0-cascade-complete
May 6, 2026

Conversation


noahgift commented May 6, 2026

Summary

Promotes contract `trace-moe-gpu-sub-stages-v1` from PROPOSED → ACTIVE_ALGORITHM_LEVEL now that all five cascade PRs have landed. Records SHIPPED status for M-MOE-SUB-1, M-MOE-SUB-2 (a + b + c + c.gpu), and M-MOE-SUB-3 (harness). M-MOE-SUB-4 stays PENDING (optional).

Cited PRs (all merged)

| PR | What | Step |
|----|------|------|
| #1507 | `moe_ffn_forward_layer_with_router` (CPU helper) | (c) |
| #1516 | `forward_qwen3_moe_traced_with_plan` (CPU body) | (a) |
| #1521 | `apr trace --save-tensor` GGUF MoE CLI wireup | (a) CLI |
| #1522 | `moe_ffn_forward_layer_cuda_with_router` (GPU helper) | (c.gpu) |
| #1523 | `forward_qwen3_moe_cuda_traced[_with_plan]` (GPU body) | (b) |
| #1524 | Heavy CPU-vs-GPU per-stage diff harness | M-MOE-SUB-3 |
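
For readers who haven't followed the cascade: steps (a)-(c) all revolve around a router-aware MoE FFN step, which picks the top-k experts per token and mixes their FFN outputs by renormalized router weights. Below is a minimal self-contained sketch of that computation; the function names, signatures, and top-k renormalization here are illustrative assumptions, not the actual API introduced in #1507/#1522.

```rust
// Illustrative sketch of a router-aware MoE FFN step. The real
// moe_ffn_forward_layer_with_router (#1507) differs in signature,
// memory layout, and quantization handling.

fn softmax(xs: &[f32]) -> Vec<f32> {
    let m = xs.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = xs.iter().map(|&v| (v - m).exp()).collect();
    let s: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / s).collect()
}

/// Pick the top-k experts by router logit, renormalize their weights
/// over the chosen set (assumed here, Qwen-MoE style), and mix the
/// per-expert FFN outputs into one hidden vector.
fn moe_ffn_step(
    x: &[f32],
    router_logits: &[f32],
    top_k: usize,
    expert_ffn: impl Fn(usize, &[f32]) -> Vec<f32>,
) -> Vec<f32> {
    let probs = softmax(router_logits);
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());
    let chosen = &idx[..top_k];
    let norm: f32 = chosen.iter().map(|&e| probs[e]).sum();
    let mut out = vec![0.0f32; x.len()];
    for &e in chosen {
        let y = expert_ffn(e, x); // the expert's FFN (SwiGLU in Qwen3-MoE)
        let w = probs[e] / norm;
        for (o, v) in out.iter_mut().zip(y) {
            *o += w * v;
        }
    }
    out
}

fn main() {
    // Dummy expert that scales its input by (expert index + 1).
    let y = moe_ffn_step(&[1.0, 2.0], &[0.1, 2.0, 0.5, 1.5], 2, |e, xs: &[f32]| {
        xs.iter().map(|&v| v * (e as f32 + 1.0)).collect()
    });
    println!("{y:?}");
}
```

The point of the cascade is that the CPU helper (c) and GPU helper (c.gpu) compute the same math, so the traced bodies (a, b) can be diffed stage by stage.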

What's left

- Operator-dispatched run of `falsify_moe_sub_002_cpu_gpu_traced_per_stage_diff` on lambda-vector (RTX 4090) with the cached 17.3 GB Qwen3-Coder GGUF (~30-60 min wall clock) → produces the layer-by-layer divergence table (a minimal sketch of that comparison follows this list).
- M-MOE-SUB-3: ALGORITHM_LEVEL → FUNCTIONAL upon the operator run.
- FALSIFY-MOE-SUB-003 → DISCHARGED is gated on the M-GPU-MOE-1.4 root-cause fix (the table identifies WHERE the divergence is; the fix is a separate PR class).
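
The divergence table itself reduces to a simple computation: for each (layer, stage) pair traced on both backends, report the worst element-wise difference. Here is a minimal sketch, assuming an in-memory map of traced tensors; the key type and stage names are illustrative, and the real harness from #1524 instead reads tensors saved via `apr trace --save-tensor`.

```rust
use std::collections::BTreeMap;

// (layer index, stage name); stage names like "MoeRouter" / "MoeFfnOut"
// mirror the granularity discussed for M-MOE-SUB-4, illustrative only.
type StageKey = (usize, &'static str);

fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max)
}

/// Print the worst per-entry divergence for every stage traced on both
/// backends; BTreeMap keeps the output ordered by layer, then stage.
fn print_divergence_table(
    cpu: &BTreeMap<StageKey, Vec<f32>>,
    gpu: &BTreeMap<StageKey, Vec<f32>>,
) {
    println!("{:<6} {:<12} {:>12}", "layer", "stage", "max|diff|");
    for (key, c) in cpu {
        if let Some(g) = gpu.get(key) {
            println!("{:<6} {:<12} {:>12.3e}", key.0, key.1, max_abs_diff(c, g));
        }
    }
}

fn main() {
    let mut cpu = BTreeMap::new();
    let mut gpu = BTreeMap::new();
    cpu.insert((0, "MoeRouter"), vec![0.10f32, 0.90]);
    gpu.insert((0, "MoeRouter"), vec![0.10f32, 0.91]);
    print_divergence_table(&cpu, &gpu); // layer 0, MoeRouter, max|diff| = 1e-2
}
```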

Test plan

- `pv validate contracts/trace-moe-gpu-sub-stages-v1.yaml` → 0 errors, 0 warnings
- All 5 PRs verified MERGED via `gh pr view <id> --json mergedAt`
- Auto-merge once required checks pass

🤖 Generated with Claude Code

contract(trace-moe-gpu-sub-stages-v1): v1.2.0 → v1.3.0 — M-MOE-SUB cascade complete on main

Promotes status PROPOSED → ACTIVE_ALGORITHM_LEVEL after all 5 cascade
PRs land. M-MOE-SUB-1, M-MOE-SUB-2 (a + b + c + c.gpu), M-MOE-SUB-3
(harness) status: PENDING → SHIPPED. M-MOE-SUB-4 stays PENDING
(optional; only needed if M-MOE-SUB-3's diff doesn't pinpoint the
divergence at MoeRouter / MoeFfnOut granularity).

Cited PRs (chronological)
=========================

- #1507 — moe_ffn_forward_layer_with_router (CPU helper, step c)
- #1516 — forward_qwen3_moe_traced_with_plan (CPU body, step a)
- #1521 — apr trace --save-tensor GGUF MoE CLI wireup (step a CLI)
- #1522 — moe_ffn_forward_layer_cuda_with_router (GPU helper, step c.gpu)
- #1523 — forward_qwen3_moe_cuda_traced (GPU body, step b)
- #1524 — heavy diff harness (M-MOE-SUB-3)

What's left
===========

- Operator-dispatched run of `falsify_moe_sub_002_cpu_gpu_traced_per_stage_diff`
  on lambda-vector RTX 4090 + cached 17.3 GB Qwen3-Coder GGUF
  (~30-60 min wall) → produces layer-by-layer divergence table.
- M-MOE-SUB-3 ALGORITHM_LEVEL → FUNCTIONAL upon operator run.
- FALSIFY-MOE-SUB-003 → DISCHARGED gated on M-GPU-MOE-1.4 root-cause fix.

Refs: contracts/trace-moe-gpu-sub-stages-v1.yaml
Refs: qwen3-moe-forward-gpu-v1 v1.4.0 M-GPU-MOE-1.4

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift enabled auto-merge (squash) May 6, 2026 01:19
noahgift merged commit 1ad1383 into main May 6, 2026
19 of 21 checks passed
noahgift deleted the contract/trace-moe-gpu-sub-stages-v1.3.0-cascade-complete branch May 6, 2026 02:03