[ab-advisor] Add sub_agent_strategy A/B experiment to smoke-temporary-id workflow #34020

Merged
pelikhan merged 3 commits into main from copilot/ab-advisor-experiment-campaign-smoke-temporary-id on May 22, 2026

Copilot AI commented May 22, 2026

Implements the sub_agent_strategy experiment campaign on smoke-temporary-id, testing whether decomposing issue creation into parallel sub-agents reduces token consumption vs. the current single-agent approach.

Frontmatter

Added a rich experiments.sub_agent_strategy block:

  • Variants: single_agent (control) / sub_agents (treatment), 50/50 weight
  • Primary metric: effective_token_count; secondary: run_duration_seconds, issue_creation_success_rate
  • Guardrails: all_issues_created == 3, temporary_id_resolution_rate >= 0.95
  • min_samples: 20, analysis_type: t_test, start_date: 2026-05-23
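As a rough illustration of how these settings could be evaluated once runs accumulate, the sketch below checks the two guardrails and computes a two-sample t statistic on effective_token_count. The PR does not say which t-test variant the tooling runs or whether min_samples is counted per variant; Welch's unequal-variance test and a per-variant count are both assumptions here, and every function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

MIN_SAMPLES = 20  # min_samples from the frontmatter


def guardrails_hold(all_issues_created, temporary_id_resolution_rate):
    """Guardrails from the frontmatter: == 3 and >= 0.95."""
    return all_issues_created == 3 and temporary_id_resolution_rate >= 0.95


def ready_for_analysis(control, treatment):
    """Assumption: min_samples applies to each variant separately."""
    return len(control) >= MIN_SAMPLES and len(treatment) >= MIN_SAMPLES


def welch_t(control, treatment):
    """Return Welch's t statistic and its approximate degrees of freedom.

    A negative t means the treatment (sub_agents) used fewer tokens
    on average than the control (single_agent).
    """
    n1, n2 = len(control), len(treatment)
    v1 = variance(control) / n1      # squared standard error, control
    v2 = variance(treatment) / n2    # squared standard error, treatment
    t = (mean(treatment) - mean(control)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

Runs that fail a guardrail would presumably be flagged rather than silently averaged in, but how the real analysis treats them is not described in this PR.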

Workflow body

Wrapped prompt in two {{#if}} branches:

{{#if experiments.sub_agent_strategy == 'single_agent'}}
## Single-Agent Mode
Create all issues in this context.
...3 create_issue JSON blocks...
{{/if}}

{{#if experiments.sub_agent_strategy == 'sub_agents'}}
## Sub-Agent Mode
Launch 3 background `task` agents (one per issue) in parallel, wait for completion...
{{/if}}

## Final Step: Add Summary Comment
...shared add_comment block for both variants...

The add_comment step is intentionally outside both conditional blocks so it runs regardless of variant.
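To make the branching concrete, here is a minimal Python sketch of how such variant-gated blocks resolve. It is not the gh-aw compiler: the regex, the `render` function, and the shortened template are illustrative assumptions, and nested blocks are not handled.

```python
import re

# Matches {{#if experiments.NAME == 'VALUE'}} ... {{/if}} (no nesting).
IF_BLOCK = re.compile(
    r"\{\{#if experiments\.(\w+) == '([^']*)'\}\}\n?(.*?)\{\{/if\}\}\n?",
    re.DOTALL,
)


def render(template, experiments):
    """Keep the body of each block whose variant matches; drop the others."""
    def pick(match):
        name, value, body = match.groups()
        return body if experiments.get(name) == value else ""
    return IF_BLOCK.sub(pick, template)


# Shortened stand-in for the workflow body described above.
TEMPLATE = """{{#if experiments.sub_agent_strategy == 'single_agent'}}
## Single-Agent Mode
{{/if}}
{{#if experiments.sub_agent_strategy == 'sub_agents'}}
## Sub-Agent Mode
{{/if}}
## Final Step: Add Summary Comment
"""

print(render(TEMPLATE, {"sub_agent_strategy": "sub_agents"}))
```

Because the final heading sits outside both blocks, it survives whichever variant is selected, which is the property the shared add_comment step relies on.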

Schema adaptations

  • Dropped issue: "#aw_campaign" — schema requires an integer issue number
  • Dropped direction: from guardrail metric entries — not a recognized field (name + threshold only)
  • Used single-quoted Handlebars expressions (== 'value') per compiler requirement

Lock file regenerated via gh aw compile smoke-temporary-id.

Copilot AI and others added 2 commits May 22, 2026 14:25
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…kflow

- Add experiments.sub_agent_strategy block to frontmatter (variants: single_agent, sub_agents)
- Wrap workflow body with {{#if}} conditional blocks for each variant
- single_agent: original single-context approach (3 create_issue calls in one context)
- sub_agents: coordinator launches 3 background task agents (one per issue) in parallel
- Final add_comment step is shared outside both variant blocks
- Regenerate smoke-temporary-id.lock.yml via gh aw compile

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add A/B test for sub_agent_strategy in smoke-temporary-id" to "[ab-advisor] Add sub_agent_strategy A/B experiment to smoke-temporary-id workflow" May 22, 2026
Copilot AI requested a review from pelikhan May 22, 2026 14:30
@pelikhan pelikhan marked this pull request as ready for review May 22, 2026 14:31
Copilot AI review requested due to automatic review settings May 22, 2026 14:31
@pelikhan pelikhan merged commit 737991d into main May 22, 2026
@pelikhan pelikhan deleted the copilot/ab-advisor-experiment-campaign-smoke-temporary-id branch May 22, 2026 14:31

Copilot AI left a comment

Pull request overview

Adds an A/B experiment (experiments.sub_agent_strategy) to the smoke-temporary-id workflow to compare the existing single-agent issue creation flow against a parallelized “sub-agent” strategy, and regenerates the compiled lock artifacts accordingly.

Changes:

  • Introduces rich experiments.sub_agent_strategy frontmatter (variants, metrics, guardrails, weights, analysis metadata).
  • Splits the workflow prompt into two experiment-gated branches: single-agent vs. background sub-agent orchestration.
  • Regenerates workflow lockfiles and updates the actions lock to include docker/metadata-action@v6.

| File | Description |
| --- | --- |
| .github/workflows/smoke-temporary-id.md | Adds sub_agent_strategy experiment metadata and conditional prompt branches for single-agent vs sub-agent execution. |
| .github/workflows/smoke-temporary-id.lock.yml | Recompiled lockfile reflecting experiment plumbing and updated pinned runtime/action/container details. |
| .github/workflows/release.lock.yml | Updates pinned docker/metadata-action reference to the v6 entry. |
| .github/aw/actions-lock.json | Adds a lock entry for docker/metadata-action@v6 (SHA pinned). |

Copilot's findings

Comments suppressed due to low confidence (3)

.github/workflows/smoke-temporary-id.lock.yml:520

  • The workflow installs AWF v0.25.49, which is a downgrade relative to other lockfiles in this repository that install v0.25.51. If the intent is only to add the experiment, consider recompiling with the same gh-aw/AWF versions used elsewhere to avoid behavioral drift across workflows.
      - name: Install GitHub Copilot CLI
        run: bash "${RUNNER_TEMP}/gh-aw/actions/install_copilot_cli.sh" 1.0.48
        env:
          GH_HOST: github.com
      - name: Install AWF binary
        run: bash "${RUNNER_TEMP}/gh-aw/actions/install_awf_binary.sh" v0.25.49
      - name: Determine automatic lockdown mode for GitHub MCP Server
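One way to catch this kind of drift before merging is to scan every lockfile for the AWF version it installs. The following is a small sketch under the assumption that each lockfile contains an `install_awf_binary.sh` line shaped like the one quoted above; the function name and the dict-of-file-contents input are hypothetical.

```python
import re
from collections import defaultdict

# Captures the version argument that follows install_awf_binary.sh.
AWF_INSTALL = re.compile(r'install_awf_binary\.sh"?\s+(v[\d.]+)')


def awf_versions(lockfiles):
    """Map each AWF version to the lockfile names that install it.

    `lockfiles` maps a file name to that file's text content.
    """
    found = defaultdict(list)
    for name, text in sorted(lockfiles.items()):
        for version in AWF_INSTALL.findall(text):
            found[version].append(name)
    return dict(found)
```

More than one key in the result means the workflows are not all compiled against the same AWF release.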

.github/workflows/smoke-temporary-id.lock.yml:548

  • The image is pre-pulled using a digest-pinned reference (gh-aw-mcpg:v0.3.9@sha256:…), but later docker run uses only the tag (gh-aw-mcpg:v0.3.9). Depending on how download_docker_images.sh tags images locally, this can cause an extra pull or run a different digest than the one pinned here; align the docker run reference with the digest-pinned value (or stop using the digest in the pre-pull) to keep execution reproducible.
        run: bash "${RUNNER_TEMP}/gh-aw/actions/restore_inline_sub_agents.sh"
      - name: Download container images
        run: bash "${RUNNER_TEMP}/gh-aw/actions/download_docker_images.sh" ghcr.io/github/gh-aw-firewall/agent:0.25.49 ghcr.io/github/gh-aw-firewall/api-proxy:0.25.49 ghcr.io/github/gh-aw-firewall/squid:0.25.49 ghcr.io/github/gh-aw-mcpg:v0.3.9@sha256:64828b42a4482f58fab16509d7f8f495a6d97c972a98a68aff20543531ac0388 ghcr.io/github/github-mcp-server:v1.0.4 node:lts-alpine@sha256:d1b3b4da11eefd5941e7f0b9cf17783fc99d9c6fc34884a665f40a06dbdfc94f
      - name: Generate Safe Outputs Config
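The mismatch described in this comment can be checked mechanically by comparing the references that are pre-pulled with the references passed to docker run. This is only a sketch: the helper names are hypothetical, and it assumes image references of the form `repo:tag` or `repo:tag@sha256:…` as they appear in the lockfile.

```python
def split_ref(ref):
    """Split 'repo:tag@sha256:...' into ('repo:tag', digest or None)."""
    name, _, digest = ref.partition("@")
    return name, digest or None


def unpinned_runs(pulled_refs, run_refs):
    """Return the digest-pinned form of every run reference that uses a
    bare tag (or a different digest) although the same repo:tag was
    pre-pulled with a digest."""
    pinned = {}
    for ref in pulled_refs:
        name, digest = split_ref(ref)
        if digest:
            pinned[name] = digest
    flagged = []
    for ref in run_refs:
        name, digest = split_ref(ref)
        if name in pinned and digest != pinned[name]:
            flagged.append(f"{name}@{pinned[name]}")
    return flagged
```

An empty result means every run reference already matches its pre-pulled digest; each returned string is the reference the run command would need in order to execute exactly the image that was pulled.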

.github/workflows/smoke-temporary-id.lock.yml:804

  • MCP_GATEWAY_DOCKER_COMMAND runs ghcr.io/github/gh-aw-mcpg:v0.3.9 by tag, but earlier the workflow pre-downloads ghcr.io/github/gh-aw-mcpg:v0.3.9@sha256:…. If the digest-pinned pull doesn’t create/refresh the :v0.3.9 tag locally, docker run may pull a different image than intended. Consider using the same digest-pinned reference in docker run (or pull by tag consistently) so the executed image is deterministic.
          DOCKER_SOCK_GID=$(stat -c '%g' "$DOCKER_SOCK_PATH" 2>/dev/null || echo '0')
          export MCP_GATEWAY_DOCKER_COMMAND='docker run -i --rm --network host --add-host host.docker.internal:127.0.0.1 --user '"${MCP_GATEWAY_UID}"':'"${MCP_GATEWAY_GID}"' --group-add '"${DOCKER_SOCK_GID}"' -v '"${DOCKER_SOCK_PATH}"':/var/run/docker.sock -e MCP_GATEWAY_PORT -e MCP_GATEWAY_DOMAIN -e MCP_GATEWAY_API_KEY -e MCP_GATEWAY_PAYLOAD_DIR -e MCP_GATEWAY_PAYLOAD_SIZE_THRESHOLD -e DOCKER_HOST=unix:///var/run/docker.sock -e DEBUG -e MCP_GATEWAY_LOG_DIR -e GH_AW_MCP_LOG_DIR -e GH_AW_SAFE_OUTPUTS -e GH_AW_SAFE_OUTPUTS_CONFIG_PATH -e GH_AW_SAFE_OUTPUTS_TOOLS_PATH -e GH_AW_ASSETS_BRANCH -e GH_AW_ASSETS_MAX_SIZE_KB -e GH_AW_ASSETS_ALLOWED_EXTS -e DEFAULT_BRANCH -e GITHUB_MCP_SERVER_TOKEN -e GITHUB_MCP_GUARD_MIN_INTEGRITY -e GITHUB_MCP_GUARD_REPOS -e GITHUB_REPOSITORY -e GITHUB_SERVER_URL -e GITHUB_SHA -e GITHUB_WORKSPACE -e GITHUB_TOKEN -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e GITHUB_JOB -e GITHUB_ACTION -e GITHUB_EVENT_NAME -e GITHUB_EVENT_PATH -e GITHUB_ACTOR -e GITHUB_ACTOR_ID -e GITHUB_TRIGGERING_ACTOR -e GITHUB_WORKFLOW -e GITHUB_WORKFLOW_REF -e GITHUB_WORKFLOW_SHA -e GITHUB_REF -e GITHUB_REF_NAME -e GITHUB_REF_TYPE -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GH_AW_SAFE_OUTPUTS_PORT -e GH_AW_SAFE_OUTPUTS_API_KEY -e GITHUB_AW_OTEL_TRACE_ID -e GITHUB_AW_OTEL_PARENT_SPAN_ID -e OTEL_EXPORTER_OTLP_HEADERS -v /tmp/gh-aw/mcp-payloads:/tmp/gh-aw/mcp-payloads:rw -v /opt:/opt:ro -v /tmp:/tmp:rw -v '"${GITHUB_WORKSPACE}"':'"${GITHUB_WORKSPACE}"':rw ghcr.io/github/gh-aw-mcpg:v0.3.9'
          
  • Files reviewed: 4/4 changed files
  • Comments generated: 2

Comment on lines 110 to 114
"temporary_id": "aw_test03",
"parent": "aw_test01",
"title": "Sub-Issue 2: Test Different ID Length",
"body": "This is sub-issue 2 with an 8-character temporary ID.\n\nParent: #aw_test01\nRelated: #aw_test02\n\nTesting that longer temporary IDs (8 chars) work correctly."
}
Comment on lines 50 to +54
  # Container images used:
- # - ghcr.io/github/gh-aw-firewall/agent:0.25.51
- # - ghcr.io/github/gh-aw-firewall/api-proxy:0.25.51
- # - ghcr.io/github/gh-aw-firewall/squid:0.25.51
- # - ghcr.io/github/gh-aw-mcpg:v0.3.17
+ # - ghcr.io/github/gh-aw-firewall/agent:0.25.49
+ # - ghcr.io/github/gh-aw-firewall/api-proxy:0.25.49
+ # - ghcr.io/github/gh-aw-firewall/squid:0.25.49
+ # - ghcr.io/github/gh-aw-mcpg:v0.3.9@sha256:64828b42a4482f58fab16509d7f8f495a6d97c972a98a68aff20543531ac0388

Development

Successfully merging this pull request may close these issues.

[ab-advisor] Experiment campaign for smoke-temporary-id: A/B test sub_agent_strategy

3 participants