fix(orchestrate): #1781 apr serve startup-ready timeout — configurable + size-aware by noahgift · Pull Request #1782 · paiml/aprender

noahgift · 2026-05-18T07:15:54Z

Summary

Closes #1781. The hardcoded Duration::from_secs(30) in AprServeDriver::wait_for_ready blocked large MoE GGUFs (Qwen3-Coder-30B, 18.5 GB) from ever becoming ready — cold-cache load exceeds 30s; warm-cache is ~1s.

Root cause (5-whys)

1. Why did large MoE startups fail? → apr serve did not become ready within 30s
2. Why? → Cold-cache load of 18.5 GB Qwen3-MoE GGUF exceeds 30s
3. Why the 30s? → Hardcoded Duration::from_secs(30) at apr_serve.rs:143
4. Why hardcoded? → No env-var or model-size scaling
5. Why? → Designed for sub-2GB models that load in <5s

Fix

Two-axis: env override + size-aware default.

pub fn compute_ready_timeout_secs(
    model_size_bytes: Option<u64>,
    env_override: Option<&str>,
) -> u64 { ... }

Resolution order:

1. APR_SERVE_READY_TIMEOUT_S=N env var (operator escape hatch; clamped ≥1s)
2. Size-aware default: 30s baseline + 1s per 500 MB above 2 GB
3. Unknown size → 30s baseline

Per-model budgets under the size-aware default:

Model size	Budget
1 GB	30s (unchanged)
4 GB	34s
18 GB (Qwen3-Coder-30B)	62s
30 GB	87s
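
The resolution order and budget table above can be sketched as follows. This is a minimal illustration, not the merged implementation: it assumes binary units (2 GiB threshold, 500 MiB step) and floor division, a reading that happens to reproduce all four table rows, and it assumes surrounding whitespace in the override is trimmed. The constant names are hypothetical.

```rust
// Sketch only: constants and unit interpretation are assumptions.
const BASELINE_SECS: u64 = 30;
const THRESHOLD_BYTES: u64 = 2 * 1024 * 1024 * 1024; // assumed: 2 GiB
const STEP_BYTES: u64 = 500 * 1024 * 1024; // assumed: 500 MiB per extra second

/// Resolves the startup-ready budget in seconds.
/// 1. A parseable integer override wins, clamped to at least 1s.
/// 2. Otherwise the budget scales with the model size above the threshold.
/// 3. An unknown size falls back to the baseline.
pub fn compute_ready_timeout_secs(
    model_size_bytes: Option<u64>,
    env_override: Option<&str>,
) -> u64 {
    if let Some(raw) = env_override {
        // Non-integer (and negative) values fail to parse and fall through.
        if let Ok(secs) = raw.trim().parse::<u64>() {
            return secs.max(1);
        }
    }
    match model_size_bytes {
        // saturating_sub keeps models below the threshold at the baseline.
        Some(size) => BASELINE_SECS + size.saturating_sub(THRESHOLD_BYTES) / STEP_BYTES,
        None => BASELINE_SECS,
    }
}
```

A caller would typically pass `std::env::var("APR_SERVE_READY_TIMEOUT_S").ok().as_deref()` as the second argument; keeping the environment lookup outside the function is what makes the logic testable without touching process state.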

Error message updated:

apr serve did not become ready within Ns (override via APR_SERVE_READY_TIMEOUT_S)
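
For context, a deadline-bounded readiness loop that emits this message might look like the sketch below. How `AprServeDriver::wait_for_ready` actually probes the server is not shown in this PR, so the `probe` closure, the `poll_interval` parameter, and the function name `wait_until_ready` are all hypothetical.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `probe` until it reports ready or `timeout` elapses.
/// `probe` stands in for whatever readiness check the driver performs.
pub fn wait_until_ready<F: FnMut() -> bool>(
    mut probe: F,
    timeout: Duration,
    poll_interval: Duration,
) -> Result<(), String> {
    let deadline = Instant::now() + timeout;
    loop {
        if probe() {
            return Ok(());
        }
        if Instant::now() >= deadline {
            // Message shape follows the PR; it names the override so the
            // operator knows how to raise the budget.
            return Err(format!(
                "apr serve did not become ready within {}s (override via APR_SERVE_READY_TIMEOUT_S)",
                timeout.as_secs()
            ));
        }
        sleep(poll_interval);
    }
}
```

The probe runs at least once even with a zero timeout, so an already-ready server is never reported as timed out.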

Test plan

8 new unit tests in apr_serve_tests.rs (env precedence, clamping, invalid fallback, baseline, scaling, unknown size, env-when-size-unknown, real Qwen3-Coder-30B 18.5 GB)
apr_serve module: 17 → 25 tests GREEN
cargo check -p aprender-orchestrate clean
cargo clippy -p aprender-orchestrate --tests -- -D warnings clean
cargo fmt --check clean

Empirical evidence

paiml/claude-code-parity-apr M260 dispatch produced 15/15 student-side driver_error with this timeout
time apr serve run Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf against already-warm-cache model → ready in ~1s
Companion-side pre-warm workaround at paiml/claude-code-parity-apr M262 (PR #239); this PR obsoletes the workaround for hosts that set APR_SERVE_READY_TIMEOUT_S

Doctrine

Toyota Way: fixed at root cause instead of accepting the 30s budget as a hard constraint.

…e + size-aware

Closes #1781. Hardcoded `Duration::from_secs(30)` in `AprServeDriver::wait_for_ready` blocked large MoE GGUFs from ever becoming ready: Qwen3-Coder-30B at 18.5 GB exceeds 30s on cold-cache loads.

Empirical evidence:
- paiml/claude-code-parity-apr M260 dispatch: 15/15 student-side driver_error with "apr serve did not become ready within 30s"
- Warm-cache load (after the failed bench had mmap'd the GGUF into page cache): time-to-ready ~1s

Root cause (5-whys):
1. Why apr 0/15? → apr serve did not become ready in 30s
2. Why? → Cold-cache load of 18.5GB Qwen3-MoE GGUF exceeds 30s
3. Why the 30s? → Hardcoded Duration::from_secs(30) at line 143
4. Why hardcoded? → No env-var or model-size scaling
5. Why? → Designed for sub-2GB models that load in <5s

Two-axis fix:
1. APR_SERVE_READY_TIMEOUT_S env override (operator escape hatch): `APR_SERVE_READY_TIMEOUT_S=120 apr code ...` sets the budget to 120s verbatim. Clamped to minimum 1s to avoid pathological zero. Non-integer values fall through to the size-aware default.
2. Size-aware default (auto-scale by model file size):
   - 30s baseline + 1s per 500 MB above 2 GB
   - 1 GB model → 30s (unchanged)
   - 4 GB → 34s
   - 18 GB Qwen3-Coder-30B → 62s
   - 30 GB → 87s
   - Unknown size (stat failed) → 30s baseline

`AprServeDriver` gains a `model_size_bytes: Option<u64>` field populated via `std::fs::metadata(&model_path)` at launch. Resolution extracted to free `pub fn compute_ready_timeout_secs(...)` so the logic is unit-testable without spawning a subprocess.

Error message updated to mention the env override:
"apr serve did not become ready within Ns (override via APR_SERVE_READY_TIMEOUT_S)"

8 new tests (env override precedence, clamping, invalid override fallback, small-model baseline, size-aware scaling, unknown size, env-override-when-size-unknown, real Qwen3-Coder-30B 18.5 GB size). apr_serve module: 17 → 25 tests GREEN. Workspace cargo check clean; clippy clean; fmt clean (with project's nightly-required fmt config).

Companion-side workaround at paiml/claude-code-parity-apr M262 (PR #239) shipped pre-warm step in bench scripts as an immediate measure; this PR is the proper upstream fix that obsoletes the need for the workaround on hosts where APR_SERVE_READY_TIMEOUT_S can be set.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
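
The size lookup described in this commit message can be sketched as below. The helper name `model_size_bytes` is hypothetical (the commit only names the field and the `std::fs::metadata` call); the point is that a failed stat maps to `None`, which the resolution logic treats as "unknown size" rather than as an error.

```rust
use std::fs;
use std::path::Path;

/// Returns the model file size in bytes, or `None` if the file cannot be
/// stat'ed (missing file, permission error, and so on).
pub fn model_size_bytes(model_path: &Path) -> Option<u64> {
    fs::metadata(model_path).ok().map(|meta| meta.len())
}
```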

…eights (#1790)

Empty or undersized `weight.data` would cause a cryptic panic deep in `fused_matmul_f32`:

thread '<unnamed>' panicked at matmul_fused.rs:211:54: index out of bounds: the len is 0 but the index is 56311808

Stack traces fire on every rayon worker simultaneously, with no indication that the root cause is an upstream tensor-loading bug.

Most-likely root cause (per #1789): Qwen3-MoE-style models where the parent FFN tensor is registered with an empty data buffer because the actual weights live in per-expert slices (`ffn_up_exps`, `ffn_gate_exps`, `ffn_down_exps`) the GGUF loader hasn't wired in.

This PR ships the DEFENSIVE GUARD only; it does NOT fix the underlying MoE F32 routing path (which is the deeper issue tracked in #1789). Instead it converts the cryptic panic into an actionable `RealizarError::InvalidShape` so the next investigator sees:

matmul weight has EMPTY data buffer (in_dim=N, out_dim=M, qtype=0); likely a MoE per-expert tensor was registered with len-0 data — see aprender#1789

Two guards:
1. `weight.data.is_empty()` → InvalidShape with the empty-data hint
2. `weight.qtype == F32 && weight.data.len() < out_dim*in_dim*4` → InvalidShape with concrete have/need byte counts

Guard logic extracted to free `fn validate_matmul_weight_shape(...)` so it's unit-testable without constructing a full `OwnedQuantizedModel`.

6 new unit tests covering empty data, undersized F32, correctly-sized F32, oversized F32 (padding allowed), non-F32 only-checks-emptiness, and usize-overflow protection. matmul_fused module: 0 → 6 tests GREEN. `cargo check -p aprender-serve` clean; clippy clean on lib.

Empirical evidence: paiml/claude-code-parity-apr M260 dispatch + the post-#1782 re-dispatch both hit this panic. The timeout fix in #1782 unblocked startup but exposed this downstream MoE-weight bug. Filed as #1789 for the deeper MoE F32 routing fix.

Does NOT fix Qwen3-Coder-30B inference yet; needs the MoE per-expert weight slicing fix tracked in #1789. This PR only stops the cryptic panic and gives actionable diagnostics.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
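
The two guards in this commit message could be sketched roughly as follows. The real function takes types from the aprender codebase that are not shown here, so this version uses a stand-in error enum, plain integer parameters, and an assumed `QTYPE_F32 = 0` (inferred from the `qtype=0` in the quoted message); the error strings are approximations.

```rust
/// Stand-in for the project's error type (hypothetical).
#[derive(Debug, PartialEq)]
pub enum ShapeError {
    InvalidShape(String),
}

/// Assumed numeric tag for F32 weights.
pub const QTYPE_F32: u32 = 0;

/// Rejects empty buffers for every qtype, and undersized buffers for F32.
pub fn validate_matmul_weight_shape(
    data_len: usize,
    qtype: u32,
    in_dim: usize,
    out_dim: usize,
) -> Result<(), ShapeError> {
    // Guard 1: an empty buffer is never valid, whatever the quantization.
    if data_len == 0 {
        return Err(ShapeError::InvalidShape(format!(
            "matmul weight has EMPTY data buffer (in_dim={in_dim}, out_dim={out_dim}, qtype={qtype})"
        )));
    }
    // Guard 2: F32 needs 4 bytes per element; checked_mul avoids usize overflow.
    if qtype == QTYPE_F32 {
        let need = out_dim
            .checked_mul(in_dim)
            .and_then(|elems| elems.checked_mul(4))
            .ok_or_else(|| {
                ShapeError::InvalidShape(format!(
                    "matmul weight byte count overflows usize (in_dim={in_dim}, out_dim={out_dim})"
                ))
            })?;
        if data_len < need {
            return Err(ShapeError::InvalidShape(format!(
                "matmul F32 weight undersized: have {data_len} bytes, need {need}"
            )));
        }
    }
    // Oversized F32 buffers (padding) and non-empty quantized buffers pass.
    Ok(())
}
```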

… PROPOSED (#1794)

Two-axis bump: catch up to companion-led v1.31.0 + ship Phase 6 gate in one PR. Gate registry: 18 → 20 entries.

v1.31.0 SKIPPED (companion-led at companion-repo M236 / PR #221 squash 188a328 without aprender-side authoring); v1.30.0 → v1.32.0 directly, same SKIP pattern v1.28.0 → v1.30.0 used for the auto-closed aprender#1705 PR.

## FALSIFY-CCPA-019 calibration_required_before_verdict (PROPOSED)

Codifies the M196-M224 4-bug-stack lesson. Any future verdict on CCPA-016/017/018 (promotion PROPOSED → ACTIVE_RUNTIME OR treating an evidence file as discharging the gate) requires a fresh calibration record (identity_pass + regression_fail, ≤30 days old) at evidence/calibration/calibration-runs.json.

Bidirectional-sensitivity: a meter that ALWAYS-passes would pass identity but also pass regression (caught); a meter that ALWAYS-fails would fail regression correctly but also fail identity (caught). Freshness window catches infrastructure drift (rustc bumps, apr CLI changes, claude CLI changes) without weekly runs.

Test scaffold: companion-repo crates/ccpa-differ/tests/falsify_ccpa_019_calibration.rs (7 active synthetic + 1 #[ignore]'d live-evidence). The M234 calibration evidence (evidence/calibration/calibration-runs.json) records both the trivial in-house identity fixture + decy#39 regression dispatch; discharges the gate currently.

## FALSIFY-CCPA-020 contract_compliance_per_turn (PROPOSED)

Codifies the Phase 6 operator-directive (companion-repo M250+): the right experiment for paiml-org is claude-bound-by-pmat-comply-and-pv vs apr-bound-by-pmat-comply-and-pv, NOT raw-vs-raw. Every paiml commit must pass pmat comply + pv validate to merge.

Per-turn pmat comply check --strict + pv validate fire on every Write/Edit in the under-contract regime (ArenaSession::with_compliance(N)). Compound oracle (cargo test + pmat comply + pv validate) gates OraclePassed.

Bidirectional sensitivity:
- Identity: clean-history-with-pass MUST satisfy
- Regression: pass-with-failing-compliance-turn MUST be falsified

Test scaffold: companion-repo crates/ccpa-arena/tests/falsify_ccpa_020_contract_compliance.rs (7 active synthetic + 1 #[ignore]'d live-evidence).

## Companion-side ship trail (M250-M264)

M250 plan + n=20 corpus; M252 schema; M254 dispatch hook + trap; M256 compound oracle; M258 CCPA-020 gate; M260 first valid n=15 calibration evidence; M262 Toyota-Way root-cause + upstream fixes (#1782 timeout + #1790 matmul guard, both MERGED); M264 P6.6 bench runner (operator-dispatchable end-to-end).

## Activation path

CCPA-019 + CCPA-020 stay PROPOSED until first operator-dispatched Phase 6 bench produces evidence/under-contract/scores.json AND a fresh calibration record. ACTIVE_RUNTIME flip awaits both.

`pv validate contracts/claude-code-parity-apr-v1.yaml` clean.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
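
The CCPA-019 rule (identity must pass, regression must fail, record no older than 30 days) reduces to a small predicate. The sketch below is illustrative only: the struct, its field names, and the `age_days` representation are hypothetical and do not reflect the actual schema of calibration-runs.json.

```rust
/// Hypothetical in-memory view of one calibration record.
pub struct CalibrationRecord {
    /// The meter passed the known-good identity fixture.
    pub identity_pass: bool,
    /// The meter failed the known-bad regression fixture.
    pub regression_fail: bool,
    /// Age of the record in whole days.
    pub age_days: u32,
}

/// Freshness window stated in the gate description.
pub const MAX_AGE_DAYS: u32 = 30;

/// True only when the meter is shown to be sensitive in both directions
/// and the evidence is recent enough to rule out infrastructure drift.
pub fn calibration_discharges_gate(rec: &CalibrationRecord) -> bool {
    rec.identity_pass && rec.regression_fail && rec.age_days <= MAX_AGE_DAYS
}
```

An always-pass meter yields `regression_fail = false` and an always-fail meter yields `identity_pass = false`, so both degenerate meters are rejected by the same conjunction.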

noahgift enabled auto-merge (squash) May 18, 2026 07:19

Merge branch 'main' into fix/apr-serve-ready-timeout-1781

08ab310

noahgift merged commit ff9d0c9 into main May 18, 2026
10 checks passed

noahgift deleted the fix/apr-serve-ready-timeout-1781 branch May 18, 2026 08:45

This was referenced May 18, 2026

apr serve: matmul_fused.rs:211 panics with 'index out of bounds: len 0' on Qwen3-Coder-30B-MoE F32 weight #1789

Closed

fix(serve): #1789 matmul defensive guard against empty / undersized weights #1790

Merged

noahgift merged 2 commits into main from fix/apr-serve-ready-timeout-1781
