[ab-advisor] A/B experiment: sub_agent_strategy for agent-persona-explorer #33753

Merged
pelikhan merged 3 commits into main from copilot/ab-advisor-experiment-campaign-agent-persona-explo
May 21, 2026

Copilot AI commented May 21, 2026

Implements the sub_agent_strategy A/B experiment campaign for agent-persona-explorer, testing whether consolidating all scenario testing into a single sub-agent call (batch) reduces token costs vs. the current per-scenario invocation approach.

Frontmatter changes

  • Adds experiments: sub_agent_strategy block with variants [per_scenario, batch], 50/50 weight, t_test analysis, effective_tokens as primary metric, and guardrails requiring discussion_created == 1 and scenarios_analyzed >= 3
  • Omits the issue: #aw_campaign placeholder. YAML treats #aw_campaign as a comment, so the value parses as null and fails the schema's integer constraint; it can be added once a campaign issue number exists
  • Replaces the direction: sub-field on guardrail items with the schema-compliant combined threshold: form (e.g. "==1", ">=3")
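
To illustrate the combined threshold: form mentioned in the list above, here is a minimal Python sketch of how a string such as "==1" or ">=3" could be split into a comparison and a number and then checked against observed metrics. The helper names (parse_threshold, guardrail_passes, failed_guardrails) and the set of accepted operators are assumptions made for illustration; the actual schema and evaluator are not shown in this PR.

```python
import operator
import re

# Hypothetical helpers: the real schema's parsing rules are not shown in this PR.
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}
# Two-character operators are listed first so ">=" is not read as ">".
_THRESHOLD_RE = re.compile(r"^\s*(==|!=|>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_threshold(spec):
    """Split a combined threshold such as ">=3" into (comparison, number)."""
    match = _THRESHOLD_RE.match(spec)
    if match is None:
        raise ValueError(f"unrecognized threshold: {spec!r}")
    symbol, number = match.groups()
    return _OPS[symbol], float(number)


def guardrail_passes(spec, observed):
    """Return True when the observed value satisfies the threshold."""
    compare, limit = parse_threshold(spec)
    return compare(observed, limit)


def failed_guardrails(guardrails, metrics):
    """List the guardrail names whose thresholds are not met."""
    return [
        name
        for name, spec in guardrails.items()
        if not guardrail_passes(spec, metrics[name])
    ]


# The two guardrails named in the PR description, with assumed metric values.
guardrails = {"discussion_created": "==1", "scenarios_analyzed": ">=3"}
violations = failed_guardrails(
    guardrails, {"discussion_created": 1, "scenarios_analyzed": 2}
)
```

Folding the operator into the value keeps each guardrail to a single field, which is presumably why the schema prefers it over a separate direction: key.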

Phase 3 conditional blocks

{{#if experiments.sub_agent_strategy == 'batch' }}
Invoke the "agentic-workflows" custom agent **once** with all 3-4 selected scenarios...
{{else}}
For each selected scenario, invoke the "agentic-workflows" custom agent tool...
{{/if}}
  • batch: single sub-agent call with all scenarios in a structured list; response parsed for per-scenario assessments
  • per_scenario ({{else}}): original behavior preserved unchanged
  • Shared Important guidelines remain outside the conditional
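
The difference between the two branches comes down to how scenarios are grouped into sub-agent calls. The Python sketch below models only that grouping, under the assumption that the variant arrives as a plain string; plan_sub_agent_calls is a hypothetical name, and the real workflow expresses this choice in the prompt template rather than in code.

```python
def plan_sub_agent_calls(variant, scenarios):
    """Group scenarios into sub-agent calls for the given experiment variant.

    Each inner list represents one invocation of the sub-agent.
    """
    scenarios = list(scenarios)
    if not scenarios:
        return []
    if variant == "batch":
        # One call that carries every selected scenario.
        return [scenarios]
    # Mirrors the {{else}} branch: any other value falls back to
    # the original per-scenario behavior.
    return [[scenario] for scenario in scenarios]


# Scenario names are placeholders, not taken from the workflow.
selected = ["scenario-a", "scenario-b", "scenario-c"]
batch_calls = plan_sub_agent_calls("batch", selected)
per_scenario_calls = plan_sub_agent_calls("per_scenario", selected)
```

With three scenarios, the batch plan contains one call and the per-scenario plan contains three, which is the call-count difference the experiment is meant to measure in tokens.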

Lock file

Regenerated via gh aw compile --approve. The --approve flag was needed because the prior lock file referenced docker/build-push-action and docker/setup-buildx-action, which are no longer emitted; this is a pre-existing delta unrelated to this change.

Copilot AI and others added 2 commits May 21, 2026 13:18
…xplorer

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add experiment campaign for agent-persona-explorer with A/B test" to "[ab-advisor] A/B experiment: sub_agent_strategy for agent-persona-explorer" May 21, 2026
Copilot AI requested a review from pelikhan May 21, 2026 13:24
@pelikhan pelikhan marked this pull request as ready for review May 21, 2026 13:25
Copilot AI review requested due to automatic review settings May 21, 2026 13:25
@pelikhan pelikhan merged commit d19111c into main May 21, 2026
@pelikhan pelikhan deleted the copilot/ab-advisor-experiment-campaign-agent-persona-explo branch May 21, 2026 13:25

Copilot AI left a comment

Pull request overview

Implements an A/B experiment (sub_agent_strategy) for agent-persona-explorer to compare per-scenario sub-agent invocations vs. a single batched sub-agent invocation, aiming to reduce token usage while maintaining quality. Also includes regenerated workflow lockfiles reflecting updated generated firewall/model alias configuration.

Changes:

  • Adds an experiments.sub_agent_strategy frontmatter block to agent-persona-explorer.md (variants, metrics, guardrails, analysis metadata).
  • Adds Phase 3 conditional prompt logic to switch between batch and per_scenario sub-agent invocation strategies.
  • Regenerates multiple workflow *.lock.yml files (notably updating the embedded AWF config model alias map).
| File | Description |
| --- | --- |
| .github/workflows/agent-persona-explorer.md | Adds sub_agent_strategy experiment metadata and prompt conditionals for batch vs per-scenario execution. |
| .github/workflows/workflow-health-manager.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/workflow-generator.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/video-analyzer.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/test-workflow.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/test-quality-sentinel.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/test-project-url-default.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/test-dispatcher.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/super-linter.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/slide-deck-maintainer.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/security-review.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/security-compliance.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/repo-tree-map.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/refiner.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/q.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/pr-description-caveman.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/poem-bot.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/plan.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/pdf-summary.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/metrics-collector.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/mergefest.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/lint-monster.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/jsweep.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/gpclean.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/example-permissions-warning.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/dev.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/dev-hawk.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/craft.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/contribution-check.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/code-simplifier.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/brave.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/bot-detection.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/archie.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |
| .github/workflows/ace-editor.lock.yml | Regenerated lockfile; embedded AWF config/model alias map updated. |

Copilot's findings

  • Files reviewed: 120/234 changed files
  • Comments generated: 1

description: "Test whether batch scenario testing reduces token costs vs. per-scenario sub-agent calls"
hypothesis: "H0: no change in effective_tokens or duration. H1: batch reduces tokens by ≥20% and duration by ≥15% without quality loss"
metric: effective_tokens
secondary_metrics: [run_duration_minutes, scenarios_tested, output_quality_score]
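
As a rough illustration of how the hypothesis above could be checked on effective_tokens, the Python sketch below computes the relative reduction of the batch arm against the per-scenario arm and a Welch t statistic with its degrees of freedom. The PR only says t_test, so the choice of Welch's unequal-variance form is an assumption, as are the function names. The token counts are made-up inputs for illustration and are not results from this experiment.

```python
import math
import statistics


def relative_reduction(control, treatment):
    """Fractional drop of the treatment mean relative to the control mean."""
    control_mean = statistics.fmean(control)
    return (control_mean - statistics.fmean(treatment)) / control_mean


def welch_t(control, treatment):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error of each sample mean.
    se2_c = statistics.variance(control) / len(control)
    se2_t = statistics.variance(treatment) / len(treatment)
    t_stat = (statistics.fmean(control) - statistics.fmean(treatment)) / math.sqrt(
        se2_c + se2_t
    )
    dof = (se2_c + se2_t) ** 2 / (
        se2_c ** 2 / (len(control) - 1) + se2_t ** 2 / (len(treatment) - 1)
    )
    return t_stat, dof


# Made-up token counts for illustration only; NOT experiment results.
per_scenario_tokens = [1000, 1100, 900, 1050, 950]
batch_tokens = [780, 820, 760, 800, 790]

reduction = relative_reduction(per_scenario_tokens, batch_tokens)
t_stat, dof = welch_t(per_scenario_tokens, batch_tokens)

# H1 in the frontmatter asks for a token reduction of at least 20%.
meets_token_target = reduction >= 0.20
```

Turning t_stat and dof into a p-value needs a Student's t distribution, which the Python standard library does not provide, so this sketch stops at the statistic.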

Development

Successfully merging this pull request may close these issues.

[ab-advisor] Experiment campaign for agent-persona-explorer: A/B test sub_agent_strategy

3 participants