Skip to content

fix: remove parallel_sub_agents experiment from smoke-pi workflow#37344

Merged
pelikhan merged 2 commits into
mainfrom
copilot/aw-smoke-pi-failure-fix
Jun 6, 2026
Merged

fix: remove parallel_sub_agents experiment from smoke-pi workflow#37344
pelikhan merged 2 commits into
mainfrom
copilot/aw-smoke-pi-failure-fix

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Jun 6, 2026

The smoke-pi workflow was failing with "No Safe Outputs Generated" ~50% of runs due to a sub_agent_decomposition A/B experiment that selected a parallel_sub_agents variant.

Problem

The parallel_sub_agents path instructed the Pi agent to launch background task agents and await async notifications — but Pi runs in single-pass mode (pi --print --mode json --no-session), so it has no task/read_agent tools and no mechanism to receive async notifications. The agent consumed ~17K tokens attempting this, then terminated without calling any safe output tool, producing an empty {"items":[]} agent output.

Changes

  • smoke-pi.md: Remove the sub_agent_decomposition experiment block from the frontmatter and collapse the {{#if}}…{{else}}…{{/if}} conditional to always use the sequential execution path.
  • smoke-pi.lock.yml: Recompiled — all experiment-related wiring (GH_AW_EXPERIMENT_SPEC, variant selection, conditional prompt rendering) is gone.

Copilot AI linked an issue Jun 6, 2026 that may be closed by this pull request
The sub_agent_decomposition experiment was selecting the
parallel_sub_agents variant ~50% of runs. This path asked the Pi engine
to launch background task agents and wait for async notifications, but Pi
runs in single-pass mode (--no-session) and cannot receive async
notifications or use task/read_agent tools. The agent would exhaust tokens
attempting this but never call any safe output tool, triggering the
"No Safe Outputs Generated" failure.

Remove the experiment and always use the sequential execution path.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix Smoke Pi workflow failure issue fix: remove parallel_sub_agents experiment from smoke-pi workflow Jun 6, 2026
Copilot AI requested a review from pelikhan June 6, 2026 16:22
@pelikhan pelikhan marked this pull request as ready for review June 6, 2026 16:22
Copilot AI review requested due to automatic review settings June 6, 2026 16:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Removes the sub_agent_decomposition A/B experiment from the smoke-pi workflow so Pi runs always follow the single-pass, sequential test flow (avoiding unsupported background task/read_agent behavior that could prevent safe outputs from being produced).

Changes:

  • Deleted the sub_agent_decomposition experiment frontmatter from smoke-pi.md.
  • Removed the conditional prompt branch that attempted parallel sub-agent execution; the workflow now always runs tests sequentially.
  • Recompiled smoke-pi.lock.yml to eliminate experiment selection, environment wiring, artifacts, and the experiment state push job.
Show a summary per file
File Description
.github/workflows/smoke-pi.md Removes experiment configuration and forces sequential test execution in the Pi prompt.
.github/workflows/smoke-pi.lock.yml Regenerated lockfile with all experiment-related steps/outputs/jobs removed.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 2/2 changed files
  • Comments generated: 0

@pelikhan pelikhan merged commit 41186ce into main Jun 6, 2026
51 of 63 checks passed
@pelikhan pelikhan deleted the copilot/aw-smoke-pi-failure-fix branch June 6, 2026 16:24
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 6, 2026

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 6, 2026

🧪 Test Quality Sentinel completed test quality analysis.

No test files were added or modified in this PR. The changes are limited to workflow files (.github/workflows/smoke-pi.lock.yml and .github/workflows/smoke-pi.md). Test Quality Sentinel skipped.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 6, 2026

PR Code Quality Reviewer completed the code quality review.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 6, 2026

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #37344 does not have the 'implementation' label and has 0 new lines of code in business logic directories (≤100 threshold).

@github-actions github-actions Bot mentioned this pull request Jun 6, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /diagnose and /tdd — approving. The fix is correct and targeted; two non-blocking suggestions worth considering.

📋 Key Themes & Highlights

Key Themes

  • No regression guard: The class of failure (experiment variant requiring tools unavailable in Pi single-pass mode) can silently reoccur. Two inline comments suggest lightweight options.
  • Stale branch cleanup: experiments/smokepi branch referenced in the deleted infrastructure likely still exists with partial experiment state. Consider deleting or archiving it.

Positive Highlights

  • ✅ Root cause properly eliminated — not just the symptom. The parallel_sub_agents variant was fundamentally incompatible with Pi single-pass mode; the whole experiment block is gone.
  • ✅ Clean deletion (1 addition, 156 deletions) with no new logic introduced — zero risk of substituting one failure mode for another.
  • ✅ Security surface reduced: the removed push_experiments_state job had permissions: contents: write; that write permission is no longer granted.
  • ✅ Lock file correctly recompiled — all experiment wiring (GH_AW_EXPERIMENT_SPEC, variant selection env vars, sub_agent_decomposition output forwarding) is consistently absent.
  • ✅ PR description is exemplary: clearly explains the mechanism of failure, token cost, and the exact incompatibility — makes future archaeology easy.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 543.5K · 163 AIC · ⌖ 13.5 AIC

tags: [cost_optimization, smoke_tests, pi_engine]
# issue: PLACEHOLDER_ISSUE_NUMBER
engine:
id: pi
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] The id: pi config gives no indication that Pi runs in single-pass mode (--print --mode json --no-session) and therefore cannot support experiments requiring task/read_agent tools. Without this being documented here, the next engineer authoring an experiment for this workflow faces the same trap.

💡 Suggested addition

Add a comment directly below id: pi:

engine:
  id: pi
  # Pi runs in single-pass mode (--print --mode json --no-session).
  # It has no task/read_agent tools and cannot receive async notifications.
  # Experiments MUST NOT require background sub-agents.
  model: copilot/gpt-5.4

This makes the constraint discoverable at the point where future experiments are authored rather than only surfacing in production failures.

Wait for all five agents to complete (you'll receive notifications). Read each agent's result using `read_agent`. Aggregate the results into a unified report with ✅/❌ status for each test.

{{else}}
Execute the following tests sequentially in a single turn:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The bug was caught in production (~50% failure rate, ~17K tokens wasted per run). There is no regression test or compile-time check to detect if someone re-introduces an incompatible experiment variant for Pi.

💡 Possible guard options
  1. Schema validation in gh aw compile: Reject experiment variants that reference task/read_agent when targeting the pi engine.
  2. CI lint rule: A test that parses smoke-pi.md and asserts no experiments: block uses parallel_sub_agents or any variant whose prompt body contains mode: "background".
  3. Guardrail comment in the .md source (as suggested on the engine: block): Even a prose comment near experiments: documentation noting the Pi constraint would reduce rediscovery cost.

Option 1 is most robust; Option 3 is the lowest-friction starting point.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REQUEST_CHANGES — pkg/agentdrain has blocking concurrency and correctness bugs

Seven issues found in the new agentdrain package. Four are high-severity and must be fixed before merge; three are medium-severity correctness/design problems.

### Blocking issues (4 high)
  1. Data race on cluster.Size (miner.go:150–161) — *Cluster pointer escapes the RLock window; concurrent Train increments Size while Analyze reads it. Detectable with go test -race.

  2. TOCTOU isNew stale (miner.go:137–144) — inference and training are two separate lock acquisitions; a concurrent goroutine can create the cluster between them, making both callers report IsNewTemplate=true for the same first event.

  3. Orphaned miner on concurrent LoadSnapshots (coordinator.go:38–44) — minerFor releases c.mu.RLock before the miner is used; if LoadSnapshots creates a new miner for that stage in the gap, training events are silently discarded.

  4. Write lock held during full deserialization (coordinator.go:89–110) — JSON unmarshal, regex compilation, and tree rebuild all happen while c.mu.Lock() is held, blocking every TrainEvent/AnalyzeEvent call for the entire restore duration.

### Non-blocking issues (3 medium)
  1. NextID not validated in LoadJSON (persist.go:71) — a snapshot with cluster IDs ≥ nextID will cause silent ID collisions on the next store.add.

  2. Space in field values corrupts tokenization (mask.go:77) — k+"="+v where v contains a space produces wrong token boundaries, building incorrect templates silently.

  3. AnomalyScore ceiling is 0.65, not 1.0 (anomaly.go:57–64) — maxScore=2.0 is never reachable given mutually exclusive flags; callers using thresholds above 0.65 will never trigger.

🔎 Code quality review by PR Code Quality Reviewer · sonnet46 419.7K · 14.3 AIC · ⌖ 19.7 AIC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[aw] Smoke Pi failed

3 participants