[ab-advisor] Add sub_agent_strategy A/B experiment to smoke-temporary-id workflow #34020
Merged
pelikhan merged 3 commits on May 22, 2026
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…kflow
- Add experiments.sub_agent_strategy block to frontmatter (variants: single_agent, sub_agents)
- Wrap workflow body with {{#if}} conditional blocks for each variant
- single_agent: original single-context approach (3 create_issue calls in one context)
- sub_agents: coordinator launches 3 background task agents (one per issue) in parallel
- Final add_comment step is shared outside both variant blocks
- Regenerate smoke-temporary-id.lock.yml via gh aw compile
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
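The variant gating described in the commit message can be illustrated with a small stand-alone renderer. This is a minimal sketch, not the gh-aw compiler's implementation: the `render` function and the regular expression are hypothetical, and it assumes un-nested `{{#if experiments.<name> == '<value>'}} ... {{/if}}` blocks only.

```python
import re

# Hypothetical pattern for un-nested experiment conditionals (assumption:
# the real compiler may support a richer syntax than this sketch does).
IF_BLOCK = re.compile(
    r"\{\{#if experiments\.(\w+) == '([^']*)'\}\}(.*?)\{\{/if\}\}",
    re.DOTALL,
)


def render(template: str, experiments: dict) -> str:
    """Keep the body of each block whose variant matches; drop the others."""

    def keep_or_drop(match):
        name, expected, body = match.groups()
        return body if experiments.get(name) == expected else ""

    return IF_BLOCK.sub(keep_or_drop, template)


TEMPLATE = """\
{{#if experiments.sub_agent_strategy == 'single_agent'}}
## Single-Agent Mode
{{/if}}
{{#if experiments.sub_agent_strategy == 'sub_agents'}}
## Sub-Agent Mode
{{/if}}
## Final Step: Add Summary Comment
"""

rendered = render(TEMPLATE, {"sub_agent_strategy": "sub_agents"})
print(rendered)
```

Because the final section sits outside both blocks, it survives rendering for either variant, which matches the intent stated for the shared `add_comment` step.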
Copilot AI changed the title from "[WIP] Add A/B test for sub_agent_strategy in smoke-temporary-id" to "[ab-advisor] Add sub_agent_strategy A/B experiment to smoke-temporary-id workflow" on May 22, 2026
Pull request overview
Adds an A/B experiment (experiments.sub_agent_strategy) to the smoke-temporary-id workflow to compare the existing single-agent issue creation flow against a parallelized “sub-agent” strategy, and regenerates the compiled lock artifacts accordingly.
Changes:
- Introduces rich `experiments.sub_agent_strategy` frontmatter (variants, metrics, guardrails, weights, analysis metadata).
- Splits the workflow prompt into two experiment-gated branches: single-agent vs. background sub-agent orchestration.
- Regenerates workflow lockfiles and updates the actions lock to include `docker/metadata-action@v6`.
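How a weighted variant split could work is not shown in this PR, so the following is only a sketch under stated assumptions: `assign_variant` and `WEIGHTS` are hypothetical names, and hashing a stable run key is one common way to get a reproducible assignment, not necessarily what gh-aw does.

```python
import hashlib

# Hypothetical weights mirroring an even split between the two variants.
WEIGHTS = {"single_agent": 50, "sub_agents": 50}


def assign_variant(run_key: str, weights: dict) -> str:
    """Map a stable key (for example a run ID) to a variant by weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Hash the key so the same run always lands in the same bucket.
    digest = hashlib.sha256(run_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    cumulative = 0
    for variant, weight in weights.items():
        cumulative += weight
        if bucket < cumulative:
            return variant
    raise AssertionError("unreachable: bucket is always below the total weight")


counts = {name: 0 for name in WEIGHTS}
for run_id in range(1000):
    counts[assign_variant(f"run-{run_id}", WEIGHTS)] += 1
print(counts)
```

A deterministic assignment keeps reruns of the same workflow run in the same arm, which avoids mixing variants within one sample.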
| File | Description |
|---|---|
| `.github/workflows/smoke-temporary-id.md` | Adds sub_agent_strategy experiment metadata and conditional prompt branches for single-agent vs sub-agent execution. |
| `.github/workflows/smoke-temporary-id.lock.yml` | Recompiled lockfile reflecting experiment plumbing and updated pinned runtime/action/container details. |
| `.github/workflows/release.lock.yml` | Updates pinned docker/metadata-action reference to the v6 entry. |
| `.github/aw/actions-lock.json` | Adds a lock entry for docker/metadata-action@v6 (SHA pinned). |
Copilot's findings
Comments suppressed due to low confidence (3)
.github/workflows/smoke-temporary-id.lock.yml:520
- The workflow installs AWF `v0.25.49`, which is a downgrade relative to other lockfiles in this repository that install `v0.25.51`. If the intent is only to add the experiment, consider recompiling with the same gh-aw/AWF versions used elsewhere to avoid behavioral drift across workflows.
    - name: Install GitHub Copilot CLI
      run: bash "${RUNNER_TEMP}/gh-aw/actions/install_copilot_cli.sh" 1.0.48
      env:
        GH_HOST: github.com
    - name: Install AWF binary
      run: bash "${RUNNER_TEMP}/gh-aw/actions/install_awf_binary.sh" v0.25.49
    - name: Determine automatic lockdown mode for GitHub MCP Server
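The version drift flagged above could be detected mechanically. The sketch below is illustrative only: `awf_versions` is a hypothetical helper, `other.lock.yml` is a made-up file name standing in for any lockfile that installs `v0.25.51`, and the regular expression assumes the install line looks like the one quoted in this review.

```python
import re
from collections import defaultdict

# Matches the version argument passed to install_awf_binary.sh.
AWF_INSTALL = re.compile(r'install_awf_binary\.sh"\s+(v[0-9][0-9.]*)')


def awf_versions(lockfiles: dict) -> dict:
    """Group lockfile names by the AWF version they install."""
    by_version = defaultdict(list)
    for name, text in lockfiles.items():
        for version in AWF_INSTALL.findall(text):
            by_version[version].append(name)
    return dict(by_version)


# Inline samples; in practice the text would be read from .github/workflows/*.lock.yml.
lockfiles = {
    "smoke-temporary-id.lock.yml":
        'run: bash "${RUNNER_TEMP}/gh-aw/actions/install_awf_binary.sh" v0.25.49',
    "other.lock.yml":  # hypothetical file name
        'run: bash "${RUNNER_TEMP}/gh-aw/actions/install_awf_binary.sh" v0.25.51',
}

versions = awf_versions(lockfiles)
if len(versions) > 1:
    print("AWF version drift:", versions)
```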
.github/workflows/smoke-temporary-id.lock.yml:548
- The image is pre-pulled using a digest-pinned reference (`gh-aw-mcpg:v0.3.9@sha256:…`), but later `docker run` uses only the tag (`gh-aw-mcpg:v0.3.9`). Depending on how `download_docker_images.sh` tags images locally, this can cause an extra pull or run a different digest than the one pinned here; align the `docker run` reference with the digest-pinned value (or stop using the digest in the pre-pull) to keep execution reproducible.
      run: bash "${RUNNER_TEMP}/gh-aw/actions/restore_inline_sub_agents.sh"
    - name: Download container images
      run: bash "${RUNNER_TEMP}/gh-aw/actions/download_docker_images.sh" ghcr.io/github/gh-aw-firewall/agent:0.25.49 ghcr.io/github/gh-aw-firewall/api-proxy:0.25.49 ghcr.io/github/gh-aw-firewall/squid:0.25.49 ghcr.io/github/gh-aw-mcpg:v0.3.9@sha256:64828b42a4482f58fab16509d7f8f495a6d97c972a98a68aff20543531ac0388 ghcr.io/github/github-mcp-server:v1.0.4 node:lts-alpine@sha256:d1b3b4da11eefd5941e7f0b9cf17783fc99d9c6fc34884a665f40a06dbdfc94f
    - name: Generate Safe Outputs Config
.github/workflows/smoke-temporary-id.lock.yml:804
- `MCP_GATEWAY_DOCKER_COMMAND` runs `ghcr.io/github/gh-aw-mcpg:v0.3.9` by tag, but earlier the workflow pre-downloads `ghcr.io/github/gh-aw-mcpg:v0.3.9@sha256:…`. If the digest-pinned pull doesn’t create/refresh the `:v0.3.9` tag locally, `docker run` may pull a different image than intended. Consider using the same digest-pinned reference in `docker run` (or pull by tag consistently) so the executed image is deterministic.
DOCKER_SOCK_GID=$(stat -c '%g' "$DOCKER_SOCK_PATH" 2>/dev/null || echo '0')
export MCP_GATEWAY_DOCKER_COMMAND='docker run -i --rm --network host --add-host host.docker.internal:127.0.0.1 --user '"${MCP_GATEWAY_UID}"':'"${MCP_GATEWAY_GID}"' --group-add '"${DOCKER_SOCK_GID}"' -v '"${DOCKER_SOCK_PATH}"':/var/run/docker.sock -e MCP_GATEWAY_PORT -e MCP_GATEWAY_DOMAIN -e MCP_GATEWAY_API_KEY -e MCP_GATEWAY_PAYLOAD_DIR -e MCP_GATEWAY_PAYLOAD_SIZE_THRESHOLD -e DOCKER_HOST=unix:///var/run/docker.sock -e DEBUG -e MCP_GATEWAY_LOG_DIR -e GH_AW_MCP_LOG_DIR -e GH_AW_SAFE_OUTPUTS -e GH_AW_SAFE_OUTPUTS_CONFIG_PATH -e GH_AW_SAFE_OUTPUTS_TOOLS_PATH -e GH_AW_ASSETS_BRANCH -e GH_AW_ASSETS_MAX_SIZE_KB -e GH_AW_ASSETS_ALLOWED_EXTS -e DEFAULT_BRANCH -e GITHUB_MCP_SERVER_TOKEN -e GITHUB_MCP_GUARD_MIN_INTEGRITY -e GITHUB_MCP_GUARD_REPOS -e GITHUB_REPOSITORY -e GITHUB_SERVER_URL -e GITHUB_SHA -e GITHUB_WORKSPACE -e GITHUB_TOKEN -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e GITHUB_JOB -e GITHUB_ACTION -e GITHUB_EVENT_NAME -e GITHUB_EVENT_PATH -e GITHUB_ACTOR -e GITHUB_ACTOR_ID -e GITHUB_TRIGGERING_ACTOR -e GITHUB_WORKFLOW -e GITHUB_WORKFLOW_REF -e GITHUB_WORKFLOW_SHA -e GITHUB_REF -e GITHUB_REF_NAME -e GITHUB_REF_TYPE -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GH_AW_SAFE_OUTPUTS_PORT -e GH_AW_SAFE_OUTPUTS_API_KEY -e GITHUB_AW_OTEL_TRACE_ID -e GITHUB_AW_OTEL_PARENT_SPAN_ID -e OTEL_EXPORTER_OTLP_HEADERS -v /tmp/gh-aw/mcp-payloads:/tmp/gh-aw/mcp-payloads:rw -v /opt:/opt:ro -v /tmp:/tmp:rw -v '"${GITHUB_WORKSPACE}"':'"${GITHUB_WORKSPACE}"':rw ghcr.io/github/gh-aw-mcpg:v0.3.9'
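The tag-versus-digest mismatch raised in the two comments above can be checked by comparing the pre-pulled references with the ones passed to `docker run`. This is a minimal sketch: `split_ref` and `unpinned_runs` are hypothetical helpers, and the reference lists are copied by hand from the snippets quoted in this review rather than parsed from the lockfile.

```python
from __future__ import annotations


def split_ref(ref: str) -> tuple[str, str | None]:
    """Split 'name:tag@sha256:...' into ('name:tag', digest or None)."""
    tagged, _, digest = ref.partition("@")
    return tagged, digest or None


def unpinned_runs(pulled: list[str], run: list[str]) -> list[str]:
    """Return run references that drop a digest used in the pre-pull."""
    pinned = {tagged for tagged, digest in map(split_ref, pulled) if digest}
    return [ref for ref in run if split_ref(ref)[1] is None and ref in pinned]


pulled = [
    "ghcr.io/github/gh-aw-mcpg:v0.3.9@sha256:"
    "64828b42a4482f58fab16509d7f8f495a6d97c972a98a68aff20543531ac0388",
    "ghcr.io/github/github-mcp-server:v1.0.4",
]
run = ["ghcr.io/github/gh-aw-mcpg:v0.3.9"]

mismatches = unpinned_runs(pulled, run)
print(mismatches)
```

An empty result would mean every image pulled by digest is also run by digest (or was never pinned in the first place).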
- Files reviewed: 4/4 changed files
- Comments generated: 2
Comment on lines 110 to 114

      "temporary_id": "aw_test03",
      "parent": "aw_test01",
      "title": "Sub-Issue 2: Test Different ID Length",
      "body": "This is sub-issue 2 with an 8-character temporary ID.\n\nParent: #aw_test01\nRelated: #aw_test02\n\nTesting that longer temporary IDs (8 chars) work correctly."
    }
Comment on lines 50 to +54

    # Container images used:
    # - ghcr.io/github/gh-aw-firewall/agent:0.25.51
    # - ghcr.io/github/gh-aw-firewall/api-proxy:0.25.51
    # - ghcr.io/github/gh-aw-firewall/squid:0.25.51
    # - ghcr.io/github/gh-aw-mcpg:v0.3.17
    # - ghcr.io/github/gh-aw-firewall/agent:0.25.49
    # - ghcr.io/github/gh-aw-firewall/api-proxy:0.25.49
    # - ghcr.io/github/gh-aw-firewall/squid:0.25.49
    # - ghcr.io/github/gh-aw-mcpg:v0.3.9@sha256:64828b42a4482f58fab16509d7f8f495a6d97c972a98a68aff20543531ac0388
Implements the `sub_agent_strategy` experiment campaign on `smoke-temporary-id`, testing whether decomposing issue creation into parallel sub-agents reduces token consumption vs. the current single-agent approach.

Frontmatter
Added rich `experiments.sub_agent_strategy` block:
- Variants: `single_agent` (control) / `sub_agents` (treatment), 50/50 weight
- Primary metric: `effective_token_count`; secondary: `run_duration_seconds`, `issue_creation_success_rate`
- Guardrails: `all_issues_created == 3`, `temporary_id_resolution_rate >= 0.95`
- Analysis: `min_samples: 20`, `analysis_type: t_test`, `start_date: 2026-05-23`

Workflow body
Wrapped prompt in two `{{#if}}` branches:

    {{#if experiments.sub_agent_strategy == 'single_agent'}}
    ## Single-Agent Mode
    Create all issues in this context.
    ...3 create_issue JSON blocks...
    {{/if}}

    {{#if experiments.sub_agent_strategy == 'sub_agents'}}
    ## Sub-Agent Mode
    Launch 3 background `task` agents (one per issue) in parallel, wait for completion...
    {{/if}}

    ## Final Step: Add Summary Comment
    ...shared add_comment block for both variants...

The `add_comment` step is intentionally outside both conditional blocks so it runs regardless of variant.

Schema adaptations
- Removed `issue: "#aw_campaign"`: the schema requires an integer issue number
- Removed `direction:` from guardrail metric entries: not a recognized field (`name` + `threshold` only)
- Conditions use an explicit equality comparison (`== 'value'`) per compiler requirement

Lock file regenerated via `gh aw compile smoke-temporary-id`.
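To make the analysis settings from the frontmatter concrete, here is a rough sketch of how the guardrails and the t-test could be evaluated once runs are collected. Everything in it is an assumption for illustration: the function names are hypothetical, the sample values are invented, and the PR only says `analysis_type: t_test`, so the choice of Welch's unequal-variance statistic is mine rather than something the source confirms.

```python
from __future__ import annotations

from math import sqrt
from statistics import mean, variance

MIN_SAMPLES = 20  # min_samples from the experiment frontmatter


def guardrails_ok(all_issues_created: int, temporary_id_resolution_rate: float) -> bool:
    """Apply the two guardrail thresholds listed in the PR description."""
    return all_issues_created == 3 and temporary_id_resolution_rate >= 0.95


def welch_t(control: list[float], treatment: list[float]) -> float | None:
    """Welch's t statistic (treatment minus control); None until both arms have enough samples."""
    if min(len(control), len(treatment)) < MIN_SAMPLES:
        return None
    standard_error = sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )
    if standard_error == 0:
        raise ValueError("samples have zero variance")
    return (mean(treatment) - mean(control)) / standard_error


# Invented effective_token_count samples, for illustration only.
control = [100.0] * 10 + [110.0] * 10
treatment = [90.0] * 10 + [100.0] * 10

t_stat = welch_t(control, treatment)
print(guardrails_ok(3, 0.97), t_stat)
```

A negative statistic here would indicate fewer tokens in the treatment arm; turning it into a p-value needs the t distribution, which this sketch leaves out.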