fix: remove parallel_sub_agents experiment from smoke-pi workflow by Copilot · Pull Request #37344 · github/gh-aw

Copilot · 2026-06-06T16:07:22Z

The smoke-pi workflow was failing with "No Safe Outputs Generated" ~50% of runs due to a sub_agent_decomposition A/B experiment that selected a parallel_sub_agents variant.

Problem

The parallel_sub_agents path instructed the Pi agent to launch background task agents and await async notifications — but Pi runs in single-pass mode (pi --print --mode json --no-session), so it has no task/read_agent tools and no mechanism to receive async notifications. The agent consumed ~17K tokens attempting this, then terminated without calling any safe output tool, producing an empty {"items":[]} agent output.

Changes

smoke-pi.md: Remove the sub_agent_decomposition experiment block from the frontmatter and collapse the {{#if}}…{{else}}…{{/if}} conditional to always use the sequential execution path.
smoke-pi.lock.yml: Recompiled — all experiment-related wiring (GH_AW_EXPERIMENT_SPEC, variant selection, conditional prompt rendering) is gone.

The sub_agent_decomposition experiment was selecting the parallel_sub_agents variant ~50% of runs. This path asked the Pi engine to launch background task agents and wait for async notifications, but Pi runs in single-pass mode (--no-session) and cannot receive async notifications or use task/read_agent tools. The agent would exhaust tokens attempting this but never call any safe output tool, triggering the "No Safe Outputs Generated" failure. Remove the experiment and always use the sequential execution path. Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot

Pull request overview

Removes the sub_agent_decomposition A/B experiment from the smoke-pi workflow so Pi runs always follow the single-pass, sequential test flow (avoiding unsupported background task/read_agent behavior that could prevent safe outputs from being produced).

Changes:

Deleted the sub_agent_decomposition experiment frontmatter from smoke-pi.md.
Removed the conditional prompt branch that attempted parallel sub-agent execution; the workflow now always runs tests sequentially.
Recompiled smoke-pi.lock.yml to eliminate experiment selection, environment wiring, artifacts, and the experiment state push job.

Show a summary per file

File	Description
.github/workflows/smoke-pi.md	Removes experiment configuration and forces sequential test execution in the Pi prompt.
.github/workflows/smoke-pi.lock.yml	Regenerated lockfile with all experiment-related steps/outputs/jobs removed.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 2/2 changed files
Comments generated: 0

github-actions · 2026-06-06T16:24:48Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions · 2026-06-06T16:24:49Z

🧪 Test Quality Sentinel completed test quality analysis.

No test files were added or modified in this PR. The changes are limited to workflow files (.github/workflows/smoke-pi.lock.yml and .github/workflows/smoke-pi.md). Test Quality Sentinel skipped.

github-actions · 2026-06-06T16:24:50Z

✅ PR Code Quality Reviewer completed the code quality review.

github-actions · 2026-06-06T16:25:05Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #37344 does not have the 'implementation' label and has 0 new lines of code in business logic directories (≤100 threshold).

github-actions

Skills-Based Review 🧠

Applied /diagnose and /tdd — approving. The fix is correct and targeted; two non-blocking suggestions worth considering.

📋 Key Themes & Highlights

Key Themes

No regression guard: The class of failure (experiment variant requiring tools unavailable in Pi single-pass mode) can silently reoccur. Two inline comments suggest lightweight options.
Stale branch cleanup: experiments/smokepi branch referenced in the deleted infrastructure likely still exists with partial experiment state. Consider deleting or archiving it.

Positive Highlights

✅ Root cause properly eliminated — not just the symptom. The parallel_sub_agents variant was fundamentally incompatible with Pi single-pass mode; the whole experiment block is gone.
✅ Clean deletion (1 addition, 156 deletions) with no new logic introduced — zero risk of substituting one failure mode for another.
✅ Security surface reduced: the removed push_experiments_state job had permissions: contents: write; that write permission is no longer granted.
✅ Lock file correctly recompiled — all experiment wiring (GH_AW_EXPERIMENT_SPEC, variant selection env vars, sub_agent_decomposition output forwarding) is consistently absent.
✅ PR description is exemplary: clearly explains the mechanism of failure, token cost, and the exact incompatibility — makes future archaeology easy.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 543.5K · 163 AIC · ⌖ 13.5 AIC

github-actions · 2026-06-06T16:32:27Z

-    tags: [cost_optimization, smoke_tests, pi_engine]
-    # issue: PLACEHOLDER_ISSUE_NUMBER
 engine:
  id: pi


[/diagnose] The id: pi config gives no indication that Pi runs in single-pass mode (--print --mode json --no-session) and therefore cannot support experiments requiring task/read_agent tools. Without this being documented here, the next engineer authoring an experiment for this workflow faces the same trap.

💡 Suggested addition

Add a comment directly below id: pi:

engine: id: pi # Pi runs in single-pass mode (--print --mode json --no-session). # It has no task/read_agent tools and cannot receive async notifications. # Experiments MUST NOT require background sub-agents. model: copilot/gpt-5.4

This makes the constraint discoverable at the point where future experiments are authored rather than only surfacing in production failures.

github-actions · 2026-06-06T16:32:27Z

-Wait for all five agents to complete (you'll receive notifications). Read each agent's result using `read_agent`. Aggregate the results into a unified report with ✅/❌ status for each test.
-
-{{else}}
 Execute the following tests sequentially in a single turn:


[/tdd] The bug was caught in production (~50% failure rate, ~17K tokens wasted per run). There is no regression test or compile-time check to detect if someone re-introduces an incompatible experiment variant for Pi.

💡 Possible guard options

Schema validation in gh aw compile: Reject experiment variants that reference task/read_agent when targeting the pi engine.

CI lint rule: A test that parses smoke-pi.md and asserts no experiments: block uses parallel_sub_agents or any variant whose prompt body contains mode: "background".

Guardrail comment in the .md source (as suggested on the engine: block): Even a prose comment near experiments: documentation noting the Pi constraint would reduce rediscovery cost.

Option 1 is most robust; Option 3 is the lowest-friction starting point.

github-actions

REQUEST_CHANGES — `pkg/agentdrain` has blocking concurrency and correctness bugs

Seven issues found in the new agentdrain package. Four are high-severity and must be fixed before merge; three are medium-severity correctness/design problems.

### Blocking issues (4 high)

Data race on cluster.Size (miner.go:150–161) — *Cluster pointer escapes the RLock window; concurrent Train increments Size while Analyze reads it. Detectable with go test -race.
TOCTOU isNew stale (miner.go:137–144) — inference and training are two separate lock acquisitions; a concurrent goroutine can create the cluster between them, making both callers report IsNewTemplate=true for the same first event.
Orphaned miner on concurrent LoadSnapshots (coordinator.go:38–44) — minerFor releases c.mu.RLock before the miner is used; if LoadSnapshots creates a new miner for that stage in the gap, training events are silently discarded.
Write lock held during full deserialization (coordinator.go:89–110) — JSON unmarshal, regex compilation, and tree rebuild all happen while c.mu.Lock() is held, blocking every TrainEvent/AnalyzeEvent call for the entire restore duration.

### Non-blocking issues (3 medium)

NextID not validated in LoadJSON (persist.go:71) — a snapshot with cluster IDs ≥ nextID will cause silent ID collisions on the next store.add.
Space in field values corrupts tokenization (mask.go:77) — k+"="+v where v contains a space produces wrong token boundaries, building incorrect templates silently.
AnomalyScore ceiling is 0.65, not 1.0 (anomaly.go:57–64) — maxScore=2.0 is never reachable given mutually exclusive flags; callers using thresholds above 0.65 will never trigger.

🔎 Code quality review by PR Code Quality Reviewer · sonnet46 419.7K · 14.3 AIC · ⌖ 19.7 AIC

Initial plan

feb6e22

Copilot AI assigned Copilot and pelikhan Jun 6, 2026

Copilot AI linked an issue Jun 6, 2026 that may be closed by this pull request

[aw] Smoke Pi failed #37343

Closed

Copilot started work on behalf of pelikhan June 6, 2026 16:07 View session

github-actions Bot mentioned this pull request Jun 6, 2026

Smoke Test: Copilot - 27067076007 #37353

Closed

Copilot AI changed the title ~~[WIP] Fix Smoke Pi workflow failure issue~~ fix: remove parallel_sub_agents experiment from smoke-pi workflow Jun 6, 2026

Copilot finished work on behalf of pelikhan June 6, 2026 16:22

Copilot AI requested a review from pelikhan June 6, 2026 16:22

pelikhan marked this pull request as ready for review June 6, 2026 16:22

Copilot AI review requested due to automatic review settings June 6, 2026 16:22

Copilot started reviewing on behalf of pelikhan June 6, 2026 16:22 View session

Copilot AI reviewed Jun 6, 2026

View reviewed changes

pelikhan merged commit 41186ce into main Jun 6, 2026
51 of 63 checks passed

pelikhan deleted the copilot/aw-smoke-pi-failure-fix branch June 6, 2026 16:24

github-actions Bot mentioned this pull request Jun 6, 2026

[aw] No-Op Runs #37180

Closed

github-actions Bot approved these changes Jun 6, 2026

View reviewed changes

github-actions Bot requested changes Jun 6, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: remove parallel_sub_agents experiment from smoke-pi workflow#37344

fix: remove parallel_sub_agents experiment from smoke-pi workflow#37344
pelikhan merged 2 commits into
mainfrom
copilot/aw-smoke-pi-failure-fix

Copilot AI commented Jun 6, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

github-actions Bot commented Jun 6, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 6, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 6, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 6, 2026 •

edited

Loading

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot Jun 6, 2026

Uh oh!

github-actions Bot Jun 6, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Copilot AI commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Uh oh!

github-actions Bot commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Skills-Based Review 🧠

Key Themes

Positive Highlights

Uh oh!

github-actions Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

REQUEST_CHANGES — pkg/agentdrain has blocking concurrency and correctness bugs

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Jun 6, 2026 •

edited

Loading

github-actions Bot commented Jun 6, 2026 •

edited

Loading

github-actions Bot commented Jun 6, 2026 •

edited

Loading

github-actions Bot commented Jun 6, 2026 •

edited

Loading

github-actions Bot commented Jun 6, 2026 •

edited

Loading

REQUEST_CHANGES — `pkg/agentdrain` has blocking concurrency and correctness bugs