fix(orchestrate): #1781 apr serve startup-ready timeout, configurable + size-aware #1782
Merged
…e + size-aware

Closes #1781. The hardcoded `Duration::from_secs(30)` in `AprServeDriver::wait_for_ready` blocked large MoE GGUFs from ever becoming ready: Qwen3-Coder-30B at 18.5 GB exceeds 30s on cold-cache loads.

Empirical evidence:
- paiml/claude-code-parity-apr M260 dispatch: 15/15 student-side driver_error with "apr serve did not become ready within 30s"
- Warm-cache load (after the failed bench had mmap'd the GGUF into the page cache): time-to-ready ~1s

Root cause (5-whys):
1. Why apr 0/15? → apr serve did not become ready in 30s
2. Why? → Cold-cache load of the 18.5 GB Qwen3-MoE GGUF exceeds 30s
3. Why the 30s? → Hardcoded Duration::from_secs(30) at line 143
4. Why hardcoded? → No env-var or model-size scaling
5. Why? → Designed for sub-2 GB models that load in <5s

Two-axis fix:
1. APR_SERVE_READY_TIMEOUT_S env override (operator escape hatch): `APR_SERVE_READY_TIMEOUT_S=120 apr code ...` sets the budget to exactly 120s. Clamped to a minimum of 1s to avoid a pathological zero timeout. Non-integer values fall through to the size-aware default.
2. Size-aware default (auto-scales by model file size):
   - 30s baseline + 1s per 500 MB above 2 GB
   - 1 GB model → 30s (unchanged)
   - 4 GB → 34s
   - 18 GB Qwen3-Coder-30B → 62s
   - 30 GB → 87s
   - Unknown size (stat failed) → 30s baseline

`AprServeDriver` gains a `model_size_bytes: Option<u64>` field, populated via `std::fs::metadata(&model_path)` at launch. Resolution is extracted to a free `pub fn compute_ready_timeout_secs(...)` so the logic is unit-testable without spawning a subprocess.

Error message updated to mention the env override: "apr serve did not become ready within Ns (override via APR_SERVE_READY_TIMEOUT_S)"

8 new tests (env override precedence, clamping, invalid override fallback, small-model baseline, size-aware scaling, unknown size, env-override-when-size-unknown, real Qwen3-Coder-30B 18.5 GB size). apr_serve module: 17 → 25 tests GREEN. Workspace cargo check clean; clippy clean; fmt clean (with the project's nightly-required fmt config).
Companion-side workaround: paiml/claude-code-parity-apr M262 (PR #239) shipped a pre-warm step in the bench scripts as an immediate measure. This PR is the proper upstream fix and makes that workaround unnecessary on hosts where APR_SERVE_READY_TIMEOUT_S can be set.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
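The resolution order described above (integer env override clamped to at least 1s, then the size-aware default, then the 30s baseline) can be sketched as a pure function. This is a minimal sketch, not the PR's code: the real `compute_ready_timeout_secs(...)` signature is elided, so `ready_timeout_secs`, `model_size_bytes`, and their parameters are hypothetical. It also assumes "GB"/"MB" mean binary units (GiB/MiB) with floor division; under that reading it reproduces all four listed budgets, whereas decimal units would give 86s instead of 87s for 30 GB.

```rust
use std::path::Path;

/// Baseline budget in seconds, per the PR description.
const BASELINE_SECS: u64 = 30;
/// Assumption: "2 GB" read as 2 GiB.
const SCALE_THRESHOLD_BYTES: u64 = 2 * 1024 * 1024 * 1024;
/// Assumption: "500 MB" read as 500 MiB; each full step adds one second.
const STEP_BYTES: u64 = 500 * 1024 * 1024;

/// Hypothetical stand-in for `compute_ready_timeout_secs(...)`.
/// `env_override` is the raw value of APR_SERVE_READY_TIMEOUT_S, if set.
pub fn ready_timeout_secs(env_override: Option<&str>, model_size_bytes: Option<u64>) -> u64 {
    // 1. An integer override wins, clamped to at least 1s.
    //    Anything that fails to parse as u64 falls through.
    if let Some(secs) = env_override.and_then(|v| v.parse::<u64>().ok()) {
        return secs.max(1);
    }
    match model_size_bytes {
        // 2. Size-aware default: +1s per full step above the threshold.
        Some(size) if size > SCALE_THRESHOLD_BYTES => {
            BASELINE_SECS + (size - SCALE_THRESHOLD_BYTES) / STEP_BYTES
        }
        // 3. Small model, or size unknown (stat failed): baseline.
        _ => BASELINE_SECS,
    }
}

/// Hypothetical helper: model size via `std::fs::metadata`, `None` when the stat fails.
pub fn model_size_bytes(model_path: &Path) -> Option<u64> {
    std::fs::metadata(model_path).ok().map(|meta| meta.len())
}
```

Passing the env value in as `Option<&str>` keeps the function free of process-global state, so tests never need to mutate the environment (`std::env::set_var` is `unsafe` as of the Rust 2024 edition). A driver could call it as `ready_timeout_secs(std::env::var("APR_SERVE_READY_TIMEOUT_S").ok().as_deref(), model_size_bytes(&model_path))` and wrap the result in `Duration::from_secs`.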
This was referenced May 18, 2026
noahgift added a commit that referenced this pull request on May 18, 2026
…eights (#1790)

Empty or undersized `weight.data` would cause a cryptic panic deep in `fused_matmul_f32`:

    thread '<unnamed>' panicked at matmul_fused.rs:211:54:
    index out of bounds: the len is 0 but the index is 56311808

Stack traces fire on every rayon worker simultaneously, with no indication that the root cause is an upstream tensor-loading bug.

Most likely root cause (per #1789): Qwen3-MoE-style models where the parent FFN tensor is registered with an empty data buffer because the actual weights live in per-expert slices (`ffn_up_exps`, `ffn_gate_exps`, `ffn_down_exps`) that the GGUF loader hasn't wired in.

This PR ships the DEFENSIVE GUARD only; it does NOT fix the underlying MoE F32 routing path (the deeper issue tracked in #1789). Instead it converts the cryptic panic into an actionable `RealizarError::InvalidShape` so the next investigator sees:

    matmul weight has EMPTY data buffer (in_dim=N, out_dim=M, qtype=0); likely a MoE per-expert tensor was registered with len-0 data — see aprender#1789

Two guards:
1. `weight.data.is_empty()` → InvalidShape with the empty-data hint
2. `weight.qtype == F32 && weight.data.len() < out_dim*in_dim*4` → InvalidShape with concrete have/need byte counts

Guard logic is extracted to a free `fn validate_matmul_weight_shape(...)` so it's unit-testable without constructing a full `OwnedQuantizedModel`.

6 new unit tests covering empty data, undersized F32, correctly-sized F32, oversized F32 (padding allowed), non-F32 only-checks-emptiness, and usize-overflow protection. matmul_fused module: 0 → 6 tests GREEN. `cargo check -p aprender-serve` clean; clippy clean on lib.

Empirical evidence: the paiml/claude-code-parity-apr M260 dispatch and the post-#1782 re-dispatch both hit this panic. The timeout fix in #1782 unblocked startup but exposed this downstream MoE-weight bug, filed as #1789 for the deeper MoE F32 routing fix.

Does NOT fix Qwen3-Coder-30B inference yet; that needs the MoE per-expert weight slicing fix tracked in #1789. This PR only stops the cryptic panic and gives actionable diagnostics.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
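The two guards can be sketched as a standalone check on a raw byte slice. This is an illustration under assumptions, not the PR's `validate_matmul_weight_shape(...)`, whose parameters are elided: `validate_weight` and `ShapeError` are hypothetical stand-ins (the latter for `RealizarError::InvalidShape`), and `QTYPE_F32 = 0` assumes GGML-style type numbering, where type 0 is F32. The F32 byte count uses `checked_mul`, so an enormous `in_dim * out_dim` is reported as an error instead of wrapping to a small number that would let an undersized buffer through.

```rust
/// Hypothetical stand-in for `RealizarError::InvalidShape`.
#[derive(Debug, PartialEq)]
pub enum ShapeError {
    InvalidShape(String),
}

/// Assumption: GGML-style numbering, where type 0 is F32.
pub const QTYPE_F32: u32 = 0;

/// Sketch of the two guards; the parameter list is hypothetical.
pub fn validate_weight(data: &[u8], qtype: u32, in_dim: usize, out_dim: usize) -> Result<(), ShapeError> {
    // Guard 1: an empty buffer, e.g. a MoE parent tensor whose weights
    // actually live in per-expert slices.
    if data.is_empty() {
        return Err(ShapeError::InvalidShape(format!(
            "matmul weight has EMPTY data buffer (in_dim={in_dim}, out_dim={out_dim}, qtype={qtype})"
        )));
    }
    // Guard 2 (F32 only): 4 bytes per element. checked_mul turns a
    // usize overflow into an error instead of a wrapped, too-small count.
    if qtype == QTYPE_F32 {
        let need = out_dim
            .checked_mul(in_dim)
            .and_then(|n| n.checked_mul(4))
            .ok_or_else(|| {
                ShapeError::InvalidShape(format!(
                    "F32 weight size overflows usize (in_dim={in_dim}, out_dim={out_dim})"
                ))
            })?;
        if data.len() < need {
            return Err(ShapeError::InvalidShape(format!(
                "F32 weight undersized: have {} bytes, need {need}",
                data.len()
            )));
        }
    }
    // Oversized F32 buffers (padding) and non-empty non-F32 buffers pass.
    Ok(())
}
```

Only emptiness is checked for quantized types because their byte size depends on the block layout of each format, which this sketch does not model.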
noahgift added a commit that referenced this pull request on May 18, 2026
… PROPOSED (#1794)

Two-axis bump: catch up to the companion-led v1.31.0 and ship the Phase 6 gate in one PR. Gate registry: 18 → 20 entries.

v1.31.0 SKIPPED (companion-led at companion-repo M236 / PR #221 squash 188a328 without aprender-side authoring); v1.30.0 → v1.32.0 directly, the same SKIP pattern v1.28.0 → v1.30.0 used for the auto-closed aprender#1705 PR.

## FALSIFY-CCPA-019 calibration_required_before_verdict (PROPOSED)

Codifies the M196-M224 4-bug-stack lesson. Any future verdict on CCPA-016/017/018 (promotion PROPOSED → ACTIVE_RUNTIME, or treating an evidence file as discharging the gate) requires a fresh calibration record (identity_pass + regression_fail, ≤30 days old) at evidence/calibration/calibration-runs.json.

Bidirectional sensitivity: a meter that ALWAYS passes would pass identity but also pass regression (caught); a meter that ALWAYS fails would fail regression correctly but also fail identity (caught). The freshness window catches infrastructure drift (rustc bumps, apr CLI changes, claude CLI changes) without weekly runs.

Test scaffold: companion-repo crates/ccpa-differ/tests/falsify_ccpa_019_calibration.rs (7 active synthetic + 1 #[ignore]'d live-evidence). The M234 calibration evidence (evidence/calibration/calibration-runs.json) records both the trivial in-house identity fixture and the decy#39 regression dispatch, and currently discharges the gate.

## FALSIFY-CCPA-020 contract_compliance_per_turn (PROPOSED)

Codifies the Phase 6 operator directive (companion-repo M250+): the right experiment for paiml-org is claude-bound-by-pmat-comply-and-pv vs apr-bound-by-pmat-comply-and-pv, NOT raw-vs-raw. Every paiml commit must pass pmat comply + pv validate to merge.

Per-turn pmat comply check --strict + pv validate fire on every Write/Edit in the under-contract regime (ArenaSession::with_compliance(N)). The compound oracle (cargo test + pmat comply + pv validate) gates OraclePassed.
Bidirectional sensitivity:
- Identity: clean-history-with-pass MUST satisfy
- Regression: pass-with-failing-compliance-turn MUST be falsified

Test scaffold: companion-repo crates/ccpa-arena/tests/falsify_ccpa_020_contract_compliance.rs (7 active synthetic + 1 #[ignore]'d live-evidence).

## Companion-side ship trail (M250-M264)

- M250: plan + n=20 corpus
- M252: schema
- M254: dispatch hook + trap
- M256: compound oracle
- M258: CCPA-020 gate
- M260: first valid n=15 calibration evidence
- M262: Toyota-Way root-cause + upstream fixes (#1782 timeout + #1790 matmul guard, both MERGED)
- M264: P6.6 bench runner (operator-dispatchable end-to-end)

## Activation path

CCPA-019 + CCPA-020 stay PROPOSED until the first operator-dispatched Phase 6 bench produces evidence/under-contract/scores.json AND a fresh calibration record. The ACTIVE_RUNTIME flip awaits both.

`pv validate contracts/claude-code-parity-apr-v1.yaml` clean.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
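The pass/fail rules of the two #1794 gates reduce to small predicates. This is a hedged sketch: `CalibrationRecord`, `TurnCompliance`, and both function names are hypothetical (the actual schema of evidence/calibration/calibration-runs.json and the arena's turn records are not shown), and reading CCPA-020 as "compound oracle passed AND no failing compliance turn" is an interpretation of the identity/regression cases listed for that gate.

```rust
/// Hypothetical calibration record shape.
pub struct CalibrationRecord {
    /// Meter verdict on the identity fixture (should be PASS).
    pub identity_pass: bool,
    /// Meter verdict on the known-regression fixture (should be FAIL).
    pub regression_pass: bool,
    /// Days since the calibration run.
    pub age_days: u64,
}

/// Freshness window from FALSIFY-CCPA-019.
pub const MAX_CALIBRATION_AGE_DAYS: u64 = 30;

/// CCPA-019: identity_pass + regression_fail, at most 30 days old.
/// An always-pass meter fails the regression check; an always-fail meter fails identity.
pub fn calibration_discharges_gate(rec: &CalibrationRecord) -> bool {
    rec.identity_pass && !rec.regression_pass && rec.age_days <= MAX_CALIBRATION_AGE_DAYS
}

/// Hypothetical per-turn result for one Write/Edit under contract.
pub struct TurnCompliance {
    pub pmat_comply_ok: bool,
    pub pv_validate_ok: bool,
}

/// CCPA-020 (interpretation): satisfied only if the compound oracle passed AND
/// no Write/Edit turn failed pmat comply or pv validate.
pub fn contract_compliance_satisfied(oracle_passed: bool, turns: &[TurnCompliance]) -> bool {
    oracle_passed && turns.iter().all(|t| t.pmat_comply_ok && t.pv_validate_ok)
}
```

Each predicate is falsifiable from both sides: the stale, always-pass, and always-fail calibration cases all return `false`, as does a history that ends in a passing oracle but contains one non-compliant turn.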
Summary
Closes #1781. The hardcoded `Duration::from_secs(30)` in `AprServeDriver::wait_for_ready` blocked large MoE GGUFs (Qwen3-Coder-30B, 18.5 GB) from ever becoming ready: a cold-cache load exceeds 30s, while a warm-cache load takes ~1s.

Root cause (5-whys)
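1. Why apr 0/15? → `apr serve did not become ready within 30s`
2. Why? → Cold-cache load of the 18.5 GB Qwen3-MoE GGUF exceeds 30s
3. Why the 30s? → Hardcoded `Duration::from_secs(30)` at `apr_serve.rs:143`
4. Why hardcoded? → No env-var or model-size scaling
5. Why? → Designed for sub-2 GB models that load in <5s

Fix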
Two-axis: env override + size-aware default.
Resolution order:
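1. `APR_SERVE_READY_TIMEOUT_S=N` env var (operator escape hatch; clamped ≥1s; non-integer values fall through)
2. Size-aware default: 30s baseline + 1s per 500 MB of model file above 2 GB
3. 30s baseline when the model size is unknown (stat failed)

Per-model budgets under the size-aware default:
- 1 GB → 30s (unchanged)
- 4 GB → 34s
- 18 GB (Qwen3-Coder-30B) → 62s
- 30 GB → 87s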
Error message updated: `apr serve did not become ready within Ns (override via APR_SERVE_READY_TIMEOUT_S)`
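A readiness wait that emits this message can be sketched as a deadline-based polling loop. This is an illustration under assumptions: the PR does not show how `wait_for_ready` probes the server, so `is_ready` is a hypothetical closure standing in for that check, and `wait_until_ready` and the poll interval are illustrative, not the driver's API.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical readiness wait: polls `is_ready` until it returns true or
/// `timeout` elapses. `is_ready` stands in for the driver's real probe.
pub fn wait_until_ready<F: FnMut() -> bool>(
    timeout: Duration,
    poll_every: Duration,
    mut is_ready: F,
) -> Result<(), String> {
    let deadline = Instant::now() + timeout;
    loop {
        if is_ready() {
            return Ok(());
        }
        if Instant::now() >= deadline {
            // Same wording as the updated message in the PR.
            return Err(format!(
                "apr serve did not become ready within {}s (override via APR_SERVE_READY_TIMEOUT_S)",
                timeout.as_secs()
            ));
        }
        thread::sleep(poll_every);
    }
}
```

Probing before testing the deadline gives the server at least one check even with a tiny budget; a caller would pass `Duration::from_secs(n)` built from the resolved timeout.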
Test plan
- 8 new tests in `apr_serve_tests.rs` (env precedence, clamping, invalid fallback, baseline, scaling, unknown size, env-when-size-unknown, real Qwen3-Coder-30B 18.5 GB)
- `cargo check -p aprender-orchestrate` clean
- `cargo clippy -p aprender-orchestrate --tests -- -D warnings` clean
- `cargo fmt --check` clean

Empirical evidence
- paiml/claude-code-parity-apr M260 dispatch: 15/15 student-side `driver_error` with this timeout
- `time apr serve run Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf` against an already-warm-cache model → ready in ~1s
- … `APR_SERVE_READY_TIMEOUT_S`

Doctrine
Toyota Way: fixed at the root cause instead of accepting the 30s budget as a hard constraint.