
Add A/B experiment wiring for smoke-pi sub-agent decomposition #34027

Merged
pelikhan merged 2 commits into main from copilot/ab-advisor-experiment-campaign-smoke-pi
May 22, 2026


Copilot AI commented May 22, 2026

smoke-pi currently runs all five Pi smoke checks in a single agent turn. This change introduces an A/B experiment on sub_agent_decomposition so the workflow can compare the current single-agent path against a parallel sub-agent implementation using the built-in experiments runtime.

  • Experiment metadata

    • Adds experiments.sub_agent_decomposition to .github/workflows/smoke-pi.md
    • Configures:
      • variants: single_agent, parallel_sub_agents
      • primary metric: effective_token_count
      • secondary and guardrail metrics
      • sample size, weights, start date, analysis type, and tags
    • Leaves the issue field as a placeholder comment until a real tracking issue number is available
  • Prompt branching

    • Replaces the single static test-requirements block with variant-specific Handlebars branches
    • parallel_sub_agents instructs the workflow to launch five background task agents, one per smoke check, then aggregate results via read_agent
    • The else branch preserves the existing sequential, single-turn behavior as the baseline
  • Compiled workflow regeneration

    • Recompiles smoke-pi so .github/workflows/smoke-pi.lock.yml includes:
      • experiment spec serialization
      • runtime variant selection wiring
      • prompt interpolation inputs for sub_agent_decomposition
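Runtime variant selection can be pictured as a weighted draw over the configured variants. The sketch below is a minimal Python illustration, not the compiled workflow's actual logic: it assumes a hash of the experiment name and run id decides the arm, and the `SPEC` dict, its 50/50 weights, and `select_variant` are all hypothetical (the PR does not show the committed weight values).

```python
import hashlib

# Hypothetical spec mirroring fields the PR lists; the 50/50 weights are
# placeholders, not the values committed in smoke-pi.md.
SPEC = {
    "name": "sub_agent_decomposition",
    "variants": ["single_agent", "parallel_sub_agents"],
    "weights": [50, 50],
}


def select_variant(spec: dict, run_id: str) -> str:
    """Map a run id to a variant in proportion to the configured weights."""
    variants, weights = spec["variants"], spec["weights"]
    if not variants or len(variants) != len(weights):
        raise ValueError("variants and weights must be non-empty and aligned")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    total = sum(weights)
    # Hash experiment name + run id so a given run always lands on the same arm.
    digest = hashlib.sha256(f"{spec['name']}:{run_id}".encode()).digest()
    point = int.from_bytes(digest[:8], "big") % total
    cumulative = 0
    for variant, weight in zip(variants, weights):
        cumulative += weight
        if point < cumulative:
            return variant
    raise AssertionError("unreachable: point < total always matches a bucket")


if __name__ == "__main__":
    for run_id in ["1001", "1002", "1003"]:
        print(run_id, select_variant(SPEC, run_id))
```

Hashing keeps the assignment stable for a given run id, whereas a plain random draw would honor the weights without that stability.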

Example of the new prompt split:

```handlebars
{{#if experiments.sub_agent_decomposition == "parallel_sub_agents"}}
Launch five parallel `task` agents using mode: "background" to execute each smoke test independently.
{{else}}
Execute the following tests sequentially in a single turn:
{{/if}}
```
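The branch above reduces to one equality check with a fallback. A minimal Python sketch, assuming the selected variant arrives as a `sub_agent_decomposition` key in a plain dict; `render_test_requirements` and that dict shape are hypothetical stand-ins for the workflow's prompt interpolation:

```python
BASELINE_INTRO = "Execute the following tests sequentially in a single turn:"
PARALLEL_INTRO = (
    'Launch five parallel `task` agents using mode: "background" '
    "to execute each smoke test independently."
)


def render_test_requirements(experiments: dict) -> str:
    """Mirror the {{#if ...}} / {{else}} branch for one variant assignment."""
    if experiments.get("sub_agent_decomposition") == "parallel_sub_agents":
        return PARALLEL_INTRO
    # Any other value, including a missing key, falls through to the baseline.
    return BASELINE_INTRO


if __name__ == "__main__":
    for variant in ["single_agent", "parallel_sub_agents"]:
        print(variant, "->", render_test_requirements({"sub_agent_decomposition": variant}))
```

Because every value other than `parallel_sub_agents` takes the else branch, a missing or misspelled variant quietly runs the baseline instead of failing.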

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add A/B test for sub_agent_strategy in smoke-pi campaign" to "Add A/B experiment wiring for smoke-pi sub-agent decomposition" May 22, 2026
Copilot AI requested a review from pelikhan May 22, 2026 14:44
@pelikhan pelikhan marked this pull request as ready for review May 22, 2026 14:46
Copilot AI review requested due to automatic review settings May 22, 2026 14:46
@pelikhan pelikhan merged commit 25aa24e into main May 22, 2026
@pelikhan pelikhan deleted the copilot/ab-advisor-experiment-campaign-smoke-pi branch May 22, 2026 14:46

Copilot AI left a comment


Pull request overview

Introduces an A/B experiment (sub_agent_decomposition) for the smoke-pi workflow to compare the existing single-agent smoke run against a parallel background sub-agent approach, with the compiled lock workflow regenerated to include experiments runtime wiring.

Changes:

  • Added experiments.sub_agent_decomposition metadata to smoke-pi workflow frontmatter (variants, metrics, weights, analysis).
  • Branched the “Test Requirements” prompt to run either sequential tests (baseline) or five parallel background task agents with aggregation via read_agent.
  • Regenerated smoke-pi.lock.yml to serialize the experiment spec, select variants at runtime, persist experiment state, and pass variant inputs into prompt interpolation.
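The PR names effective_token_count as the primary metric and configures a sample size and analysis type, but does not spell out the analysis method here. As one possible illustration, assuming per-run effective_token_count values have been accumulated for each variant (the persisted experiment state is a plausible home for them, though that is an assumption), a Welch's t-statistic comparison could look like this; `compare_primary_metric` and its return shape are hypothetical:

```python
import math
from statistics import mean, variance


def compare_primary_metric(samples: dict, baseline: str, treatment: str, min_n: int) -> dict:
    """Compare mean effective_token_count between two arms (hypothetical helper)."""
    if min_n < 2:
        raise ValueError("min_n must be at least 2 to estimate a variance")
    a, b = samples[baseline], samples[treatment]
    if len(a) < min_n or len(b) < min_n:
        # Sample size not reached in at least one arm: report counts only.
        return {"ready": False, "n": {baseline: len(a), treatment: len(b)}}
    mean_a, mean_b = mean(a), mean(b)
    # Welch's standard error: no equal-variance assumption between arms.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return {
        "ready": True,
        "mean": {baseline: mean_a, treatment: mean_b},
        "relative_change": (mean_b - mean_a) / mean_a if mean_a else float("nan"),
        "welch_t": (mean_b - mean_a) / se if se > 0 else 0.0,
    }


if __name__ == "__main__":
    runs = {
        "single_agent": [100, 110, 90, 100],      # illustrative numbers only
        "parallel_sub_agents": [80, 85, 75, 80],  # illustrative numbers only
    }
    print(compare_primary_metric(runs, "single_agent", "parallel_sub_agents", min_n=4))
```

The helper only produces a statistic; turning it into a ship/no-ship decision, and checking the guardrail metrics, still depends on the analysis type the experiment configures.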
| File | Description |
| --- | --- |
| .github/workflows/smoke-pi.md | Adds experiment spec and Handlebars branching for single-agent vs parallel sub-agent smoke execution. |
| .github/workflows/smoke-pi.lock.yml | Recompiled workflow to include experiment selection, state restore/push, artifact wiring, and prompt env var injection. |

Copilot's findings


  • Files reviewed: 2/2 changed files
  • Comments generated: 3

Comment on lines +89 to +98

```handlebars
{{#if experiments.sub_agent_decomposition == "parallel_sub_agents"}}
Launch five parallel `task` agents using mode: "background" to execute each smoke test independently. Use the `task` agent type with `description` field for each:

1. **GitHub MCP Test Agent**: Fetch 2 merged PR titles from ${{ github.repository }}
2. **Web Fetch Test Agent**: Fetch https://github.com and verify "GitHub" in response using web-fetch MCP
3. **File I/O Test Agent**: Create `/tmp/gh-aw/agent/smoke-test-pi-${{ github.run_id }}.txt` with timestamp
4. **Bash Test Agent**: Verify file creation with `cat` command
5. **Build Test Agent**: Run `GOCACHE=/tmp/go-cache GOMODCACHE=/tmp/go-mod make build`

Wait for all five agents to complete (you'll receive notifications). Read each agent's result using `read_agent`. Aggregate the results into a unified report with ✅/❌ status for each test.
```
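The fan-out/aggregate pattern in the snippet above can be sketched with a thread pool standing in for background `task` agents and collected futures standing in for `read_agent`. One detail worth noting: test 4 verifies the file that test 3 creates, so if all five start at once the `cat` check can run before the file exists. This minimal Python sketch keeps that pair sequential inside one worker. The stub checks, the function names, and the temp-directory file path are all hypothetical; the real checks use GitHub MCP, web-fetch MCP, and `make build`.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stubs: the real checks call GitHub MCP, web-fetch MCP and `make build`.
def github_mcp_test():
    return {"GitHub MCP": True}


def web_fetch_test():
    return {"Web Fetch": True}


def build_test():
    return {"Build": True}


def file_io_then_bash(path):
    # Test 4 reads the file test 3 writes, so the pair runs in order in one worker.
    results = {"File I/O": False, "Bash": False}
    try:
        with open(path, "w") as f:
            f.write(time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()))
        results["File I/O"] = os.path.exists(path)
        with open(path) as f:
            results["Bash"] = bool(f.read().strip())
    except OSError:
        pass  # leave any unfinished check marked as failed
    return results


def run_smoke_tests(workdir):
    path = os.path.join(workdir, "smoke-test-pi.txt")
    jobs = [github_mcp_test, web_fetch_test, lambda: file_io_then_bash(path), build_test]
    report = {}
    # Fan out every job at once, then collect results in submission order.
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        for partial in pool.map(lambda job: job(), jobs):
            report.update(partial)
    return report


def format_report(report):
    return "\n".join(f"{'✅' if ok else '❌'} {name}" for name, ok in report.items())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(format_report(run_smoke_tests(tmp)))
```

Grouping dependent steps into one job keeps the other three checks fully parallel while removing the ordering race between file creation and verification.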



Development

Successfully merging this pull request may close these issues.

[ab-advisor] Experiment campaign for smoke-pi: A/B test sub_agent_strategy

3 participants