fix: remove parallel_sub_agents experiment from smoke-pi workflow #37344
The sub_agent_decomposition experiment was selecting the parallel_sub_agents variant in ~50% of runs. This path asked the Pi engine to launch background task agents and wait for async notifications, but Pi runs in single-pass mode (--no-session) and cannot receive async notifications or use task/read_agent tools. The agent would exhaust tokens attempting this but never call any safe output tool, triggering the "No Safe Outputs Generated" failure. Remove the experiment and always use the sequential execution path.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Pull request overview
Removes the sub_agent_decomposition A/B experiment from the smoke-pi workflow so Pi runs always follow the single-pass, sequential test flow (avoiding unsupported background task/read_agent behavior that could prevent safe outputs from being produced).
Changes:
- Deleted the `sub_agent_decomposition` experiment frontmatter from `smoke-pi.md`.
- Removed the conditional prompt branch that attempted parallel sub-agent execution; the workflow now always runs tests sequentially.
- Recompiled `smoke-pi.lock.yml` to eliminate experiment selection, environment wiring, artifacts, and the experiment state push job.
| File | Description |
|---|---|
| .github/workflows/smoke-pi.md | Removes experiment configuration and forces sequential test execution in the Pi prompt. |
| .github/workflows/smoke-pi.lock.yml | Regenerated lockfile with all experiment-related steps/outputs/jobs removed. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 0
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

🧪 Test Quality Sentinel completed test quality analysis. No test files were added or modified in this PR. The changes are limited to workflow files (`.github/workflows/smoke-pi.lock.yml` and `.github/workflows/smoke-pi.md`). Test Quality Sentinel skipped.

✅ PR Code Quality Reviewer completed the code quality review.

✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #37344 does not have the 'implementation' label and has 0 new lines of code in business logic directories (≤100 threshold).
Skills-Based Review 🧠
Applied /diagnose and /tdd — approving. The fix is correct and targeted; two non-blocking suggestions worth considering.
📋 Key Themes & Highlights
Key Themes
- No regression guard: The class of failure (an experiment variant requiring tools unavailable in Pi single-pass mode) can silently reoccur. Two inline comments suggest lightweight options.
- Stale branch cleanup: The `experiments/smokepi` branch referenced in the deleted infrastructure likely still exists with partial experiment state. Consider deleting or archiving it.
Positive Highlights
- ✅ Root cause properly eliminated, not just the symptom. The `parallel_sub_agents` variant was fundamentally incompatible with Pi single-pass mode; the whole experiment block is gone.
- ✅ Clean deletion (1 addition, 156 deletions) with no new logic introduced: zero risk of substituting one failure mode for another.
- ✅ Security surface reduced: the removed `push_experiments_state` job had `permissions: contents: write`; that write permission is no longer granted.
- ✅ Lock file correctly recompiled: all experiment wiring (`GH_AW_EXPERIMENT_SPEC`, variant selection env vars, `sub_agent_decomposition` output forwarding) is consistently absent.
- ✅ PR description is exemplary: it clearly explains the failure mechanism, the token cost, and the exact incompatibility, which makes future archaeology easy.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 543.5K · 163 AIC · ⌖ 13.5 AIC
```yaml
tags: [cost_optimization, smoke_tests, pi_engine]
# issue: PLACEHOLDER_ISSUE_NUMBER
engine:
  id: pi
```
[/diagnose] The `id: pi` config gives no indication that Pi runs in single-pass mode (`--print --mode json --no-session`) and therefore cannot support experiments requiring `task`/`read_agent` tools. Without this being documented here, the next engineer authoring an experiment for this workflow faces the same trap.
💡 Suggested addition
Add a comment directly below `id: pi`:

```yaml
engine:
  id: pi
  # Pi runs in single-pass mode (--print --mode json --no-session).
  # It has no task/read_agent tools and cannot receive async notifications.
  # Experiments MUST NOT require background sub-agents.
  model: copilot/gpt-5.4
```

This makes the constraint discoverable at the point where future experiments are authored rather than only surfacing in production failures.
```markdown
Wait for all five agents to complete (you'll receive notifications). Read each agent's result using `read_agent`. Aggregate the results into a unified report with ✅/❌ status for each test.

{{else}}
Execute the following tests sequentially in a single turn:
```
[/tdd] The bug was caught in production (~50% failure rate, ~17K tokens wasted per run). There is no regression test or compile-time check to detect if someone re-introduces an incompatible experiment variant for Pi.
💡 Possible guard options
1. Schema validation in `gh aw compile`: Reject experiment variants that reference `task`/`read_agent` when targeting the `pi` engine.
2. CI lint rule: A test that parses `smoke-pi.md` and asserts that no `experiments:` block uses `parallel_sub_agents` or any variant whose prompt body contains `mode: "background"`.
3. Guardrail comment in the `.md` source (as suggested on the `engine:` block): Even a prose comment near the `experiments:` documentation noting the Pi constraint would reduce rediscovery cost.
Option 1 is most robust; Option 3 is the lowest-friction starting point.
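The CI lint option could start as small as the sketch below. It is a hypothetical check, not part of `gh aw`: it assumes a workflow targets Pi when some line reads exactly `id: pi`, and it flags the two markers named above (`parallel_sub_agents` and `mode: "background"`) by plain substring search instead of real frontmatter parsing. `lintPiWorkflow` and `targetsPi` are illustrative names.

```go
package main

import (
	"fmt"
	"strings"
)

// forbiddenForPi lists markers for prompt paths that need tools Pi lacks in
// single-pass mode. Both come from the review comment; the list is an
// assumption, not an official gh-aw rule set.
var forbiddenForPi = []string{
	"parallel_sub_agents",
	`mode: "background"`,
}

// targetsPi reports whether any line of the workflow source is exactly
// "id: pi" after trimming indentation (a simplification of YAML parsing).
func targetsPi(src string) bool {
	for _, line := range strings.Split(src, "\n") {
		if strings.TrimSpace(line) == "id: pi" {
			return true
		}
	}
	return false
}

// lintPiWorkflow returns one message per forbidden marker occurrence, but only
// for workflows that target the pi engine.
func lintPiWorkflow(src string) []string {
	if !targetsPi(src) {
		return nil
	}
	var problems []string
	for i, line := range strings.Split(src, "\n") {
		for _, marker := range forbiddenForPi {
			if strings.Contains(line, marker) {
				problems = append(problems,
					fmt.Sprintf("line %d: %q is not supported by the pi engine", i+1, marker))
			}
		}
	}
	return problems
}

func main() {
	src := "---\nengine:\n  id: pi\n---\nLaunch each test as a task with mode: \"background\".\n"
	for _, p := range lintPiWorkflow(src) {
		fmt.Println(p)
	}
}
```

Substring matching keeps the check trivial but can misfire on prose that merely mentions these markers; Option 1 avoids that by validating the parsed experiment spec at compile time.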
REQUEST_CHANGES — pkg/agentdrain has blocking concurrency and correctness bugs
Seven issues found in the new agentdrain package. Four are high-severity and must be fixed before merge; three are medium-severity correctness/design problems.
### Blocking issues (4 high)
- Data race on `cluster.Size` (`miner.go:150–161`): the `*Cluster` pointer escapes the `RLock` window; a concurrent `Train` increments `Size` while `Analyze` reads it. Detectable with `go test -race`.
- TOCTOU: `isNew` goes stale (`miner.go:137–144`): inference and training are two separate lock acquisitions; a concurrent goroutine can create the cluster between them, making both callers report `IsNewTemplate=true` for the same first event.
- Orphaned miner on concurrent `LoadSnapshots` (`coordinator.go:38–44`): `minerFor` releases `c.mu.RLock` before the miner is used; if `LoadSnapshots` creates a new miner for that stage in the gap, training events are silently discarded.
- Write lock held during full deserialization (`coordinator.go:89–110`): JSON unmarshal, regex compilation, and tree rebuild all happen while `c.mu.Lock()` is held, blocking every `TrainEvent`/`AnalyzeEvent` call for the entire restore duration.
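The agentdrain source is not quoted in this thread, so the following is only a sketch of one fix pattern for the first two blocking issues, under stated assumptions: `Cluster`, `Miner`, `Result`, and the string-keyed map are hypothetical simplifications (a real Drain miner walks a parse tree). It does lookup-or-create and the `Size` increment under one write lock, and returns copied values instead of a `*Cluster` pointer.

```go
package main

import (
	"fmt"
	"sync"
)

// Cluster is a hypothetical stand-in for agentdrain's cluster type.
type Cluster struct {
	ID   int
	Size int
}

// Result carries copied values instead of a *Cluster, so callers never read
// shared cluster state after the lock is released.
type Result struct {
	ClusterID     int
	Size          int
	IsNewTemplate bool
}

// Miner is simplified: templates are looked up by exact string rather than
// through a Drain parse tree.
type Miner struct {
	mu       sync.RWMutex
	nextID   int // next cluster ID to allocate
	clusters map[string]*Cluster
}

func NewMiner() *Miner {
	return &Miner{clusters: make(map[string]*Cluster)}
}

// Train performs lookup-or-create and the Size increment under a single
// write-lock acquisition, so the "is this template new?" answer cannot go
// stale between inference and training.
func (m *Miner) Train(template string) Result {
	m.mu.Lock()
	defer m.mu.Unlock()
	c, ok := m.clusters[template]
	if !ok {
		c = &Cluster{ID: m.nextID}
		m.nextID++
		m.clusters[template] = c
	}
	c.Size++
	return Result{ClusterID: c.ID, Size: c.Size, IsNewTemplate: !ok}
}

// Analyze is read-only: it copies the fields it needs while holding the read
// lock and never hands out the *Cluster pointer.
func (m *Miner) Analyze(template string) (Result, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	c, ok := m.clusters[template]
	if !ok {
		return Result{}, false
	}
	return Result{ClusterID: c.ID, Size: c.Size}, true
}

func main() {
	m := NewMiner()
	results := make([]Result, 100)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = m.Train("user=<*> action=login")
		}(i)
	}
	wg.Wait()

	newCount := 0
	for _, r := range results {
		if r.IsNewTemplate {
			newCount++
		}
	}
	fmt.Println("new-template reports:", newCount) // 1 on every run
	if r, ok := m.Analyze("user=<*> action=login"); ok {
		fmt.Println("cluster size:", r.Size) // 100
	}
}
```

Holding one write lock per `Train` call serializes training. Switching `Size` to an `atomic.Int64` would remove the data race on its own, but it would not close the TOCTOU window between separate inference and training steps.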
### Non-blocking issues (3 medium)
- `NextID` not validated in `LoadJSON` (`persist.go:71`): a snapshot with cluster IDs ≥ `nextID` will cause silent ID collisions on the next `store.add`.
- Space in field values corrupts tokenization (`mask.go:77`): `k+"="+v` where `v` contains a space produces wrong token boundaries, silently building incorrect templates.
- `AnomalyScore` ceiling is 0.65, not 1.0 (`anomaly.go:57–64`): `maxScore=2.0` is never reachable given mutually exclusive flags; callers using thresholds above 0.65 will never trigger.
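For the `NextID` item, a minimal sketch of the missing check follows. The `snapshot` struct, its JSON tags, and `loadSnapshot` are hypothetical stand-ins for agentdrain's persisted state and `LoadJSON`; it assumes `NextID` means "the next ID to allocate" and that IDs start at 0, which matches the review's description that IDs ≥ `nextID` collide.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// snapshot is a hypothetical persisted-miner shape; agentdrain's real field
// names and JSON tags may differ.
type snapshot struct {
	NextID   int `json:"next_id"` // next cluster ID to allocate
	Clusters []struct {
		ID       int    `json:"id"`
		Template string `json:"template"`
	} `json:"clusters"`
}

// loadSnapshot decodes data and rejects snapshots whose NextID is not
// strictly greater than every stored cluster ID, since the next allocation
// would then collide with an existing cluster.
func loadSnapshot(data []byte) (*snapshot, error) {
	var s snapshot
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, fmt.Errorf("decode snapshot: %w", err)
	}
	maxID := -1 // IDs are assumed to start at 0
	for _, c := range s.Clusters {
		if c.ID > maxID {
			maxID = c.ID
		}
	}
	if s.NextID <= maxID {
		return nil, fmt.Errorf("snapshot next_id %d must exceed max cluster ID %d", s.NextID, maxID)
	}
	return &s, nil
}

func main() {
	_, err := loadSnapshot([]byte(`{"next_id":2,"clusters":[{"id":5,"template":"a"}]}`))
	fmt.Println(err)
}
```

Rejecting the snapshot is the strict option; setting `NextID = maxID + 1` instead would silently repair older snapshots at the cost of hiding whatever wrote the bad value.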
🔎 Code quality review by PR Code Quality Reviewer · sonnet46 419.7K · 14.3 AIC · ⌖ 19.7 AIC
The `smoke-pi` workflow was failing with "No Safe Outputs Generated" in ~50% of runs due to a `sub_agent_decomposition` A/B experiment that selected a `parallel_sub_agents` variant.

### Problem

The `parallel_sub_agents` path instructed the Pi agent to launch background `task` agents and await async notifications, but Pi runs in single-pass mode (`pi --print --mode json --no-session`), so it has no `task`/`read_agent` tools and no mechanism to receive async notifications. The agent consumed ~17K tokens attempting this, then terminated without calling any safe output tool, producing an empty `{"items":[]}` agent output.

### Changes

- `smoke-pi.md`: Remove the `sub_agent_decomposition` experiment block from the frontmatter and collapse the `{{#if}}…{{else}}…{{/if}}` conditional to always use the sequential execution path.
- `smoke-pi.lock.yml`: Recompiled; all experiment-related wiring (`GH_AW_EXPERIMENT_SPEC`, variant selection, conditional prompt rendering) is gone.