Add `output_format` A/B experiment to daily-code-metrics workflow by Copilot · Pull Request #32535 · github/gh-aw

Copilot · 2026-05-16T02:36:43Z

This updates daily-code-metrics to run an A/B experiment on report verbosity (full_detail vs executive_summary) so engagement, token cost, runtime, and chart volume can be measured from real runs. The workflow prompt now supports variant-specific chart generation and report structure while preserving the existing full-detail behavior as control.

Experiment wiring in workflow frontmatter
- Added experiments.output_format with variants, hypothesis, primary/secondary metrics, guardrails, weights, and sampling window.
- Included a numeric `issue` placeholder so the frontmatter compiles, as the current schema requires this field.
Variant-gated chart instructions
- Wrapped chart requirements in conditional template blocks.
- full_detail: existing 6-chart specification.
- executive_summary: reduced to 2 charts (quality_score_breakdown, historical_trends).
Variant-gated report template
- Added conditional report body branch for executive_summary:
  - key metrics line
  - 2 chart embeds
  - 3 recommendation bullets
  - no <details> metric tables
- Kept existing full report template in else branch for control.
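The executive_summary branch described above can be sketched in Python. This is only an illustration of the listed structure (one key metrics line, 2 chart embeds, 3 recommendation bullets, no `<details>` tables). The function name `render_executive_summary` and its argument shapes are hypothetical and do not appear in the workflow.

```python
def render_executive_summary(metrics, charts, recommendations):
    """Build a short report body: one metrics line, 2 charts, 3 bullets.

    metrics: dict of metric name -> value (hypothetical shape)
    charts: dict of chart title -> asset URL (hypothetical shape)
    recommendations: list of recommendation strings
    """
    if len(charts) != 2:
        raise ValueError("executive_summary expects exactly 2 charts")
    if len(recommendations) != 3:
        raise ValueError("executive_summary expects exactly 3 recommendations")

    # Single key metrics line instead of collapsible metric tables.
    metrics_line = " | ".join(f"**{name}**: {value}" for name, value in metrics.items())

    lines = [metrics_line, "", "### 📊 Key Visualizations", ""]
    lines += [f"![{title}]({url})" for title, url in charts.items()]
    lines.append("")
    lines += [f"- {item}" for item in recommendations]
    return "\n".join(lines)
```

Validating the chart and bullet counts up front keeps a malformed summary from being posted, which would otherwise distort the `chart_count` secondary metric.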
Prompt guidance alignment
- Updated chart/report guidance text to explicitly describe variant-dependent output expectations.
- Added note to read GH_AW_EXPERIMENTS_OUTPUT_FORMAT in Python logic for branch selection.
Compiled workflow output
- Regenerated .github/workflows/daily-code-metrics.lock.yml to include experiment activation/state plumbing.
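A minimal sketch of the branch selection mentioned in the prompt guidance, assuming the variant name arrives verbatim in `GH_AW_EXPERIMENTS_OUTPUT_FORMAT`. Falling back to the control variant when the variable is unset or unrecognized is an assumption, not documented behavior, and the helper names `select_variant` and `charts_for` are hypothetical.

```python
import os

# Variant names come from the PR description.
VARIANTS = ("full_detail", "executive_summary")
CONTROL = "full_detail"
# The two charts the PR keeps for the executive_summary variant.
EXECUTIVE_SUMMARY_CHARTS = ("quality_score_breakdown", "historical_trends")


def select_variant(env=None):
    """Return the active variant; fall back to control (assumed behavior)."""
    env = os.environ if env is None else env
    value = env.get("GH_AW_EXPERIMENTS_OUTPUT_FORMAT", "").strip()
    return value if value in VARIANTS else CONTROL


def charts_for(variant, all_charts):
    """Keep every chart for full_detail, only the two summary charts otherwise."""
    if variant == "executive_summary":
        return [name for name in all_charts if name in EXECUTIVE_SUMMARY_CHARTS]
    return list(all_charts)
```

Passing the environment as an argument keeps the selection logic testable without mutating `os.environ`.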

experiments:
  output_format:
    variants: [full_detail, executive_summary]
    metric: discussion_engagement_score
    secondary_metrics: [output_token_count, run_duration_seconds, chart_count]

{{#if experiments.output_format == 'executive_summary' }}
### 📊 Key Visualizations
...2 charts...
{{else}}
### 📊 Visualizations
...existing full-detail sections...
{{/if}}

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot

Pull request overview

Adds an output_format A/B experiment to the daily-code-metrics workflow so real-run metrics (engagement, tokens, runtime, chart volume) can be compared between full_detail and executive_summary report variants.

Changes:

Adds experiments.output_format frontmatter metadata and gates chart + report templates on the selected variant.
Updates the compiled workflow (.lock.yml) to restore/pick/persist experiment state and plumb GH_AW_EXPERIMENTS_OUTPUT_FORMAT.
Regenerates daily-model-inventory.lock.yml (appears unrelated to the PR’s stated scope).
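To make the "pick and persist" idea concrete, here is a hedged Python sketch of weighted variant assignment with a small JSON state file. It is not the gh-aw implementation: the state file layout, the function name `pick_and_persist`, and the assumption that weights map to variants in order are all hypothetical.

```python
import json
import random
from pathlib import Path


def pick_and_persist(state_path, variants, weights, rng=None):
    """Pick a weighted variant for this run and append it to a JSON state file.

    The state layout ({"assignments": [...]}) is a hypothetical example.
    """
    rng = rng or random.Random()
    path = Path(state_path)

    # Restore earlier assignments if a state file already exists.
    state = json.loads(path.read_text()) if path.exists() else {"assignments": []}

    # Weighted draw; weights are assumed to line up with variants by position.
    variant = rng.choices(variants, weights=weights, k=1)[0]

    # Persist the updated history for later runs.
    state["assignments"].append(variant)
    path.write_text(json.dumps(state))
    return variant, state
```

With the PR's `weight: [50, 50]`, each run would have an equal chance of landing in either variant under this sketch.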

| File | Description |
| --- | --- |
| `.github/workflows/daily-code-metrics.md` | Defines the experiment and adds variant-gated instructions for chart generation and report structure. |
| `.github/workflows/daily-code-metrics.lock.yml` | Compiled workflow updates to select variants, pass env vars, and persist experiment state. |
| `.github/workflows/daily-model-inventory.lock.yml` | Lock file regen (heredoc marker/hash churn; likely incidental). |

Copilot's findings

Files reviewed: 3/3 changed files
Comments generated: 5

 ## Data Visualization with Python

+{{#if experiments.output_format == 'full_detail' }}
 Generate **6 high-quality charts** to visualize code metrics and trends using Python, matplotlib, and seaborn. All charts must be uploaded as assets and embedded in the discussion report.


+# Generate required charts for selected variant
+# Chart: Quality Score Breakdown
 # ... implementation ...

-# Chart 6: Historical Trends
+# Chart: Historical Trends


+    min_samples: 20
+    weight: [50, 50]
+    start_date: "2026-05-16"
+    issue: 1


+          cat > "${RUNNER_TEMP}/gh-aw/safeoutputs/config.json" << GH_AW_SAFE_OUTPUTS_CONFIG_4230166978ab76c7_EOF
          {"create_discussion":{"category":"audits","close_older_discussions":true,"expires":72,"fallback_to_issue":true,"max":1,"title_prefix":"[daily-code-metrics] "},"create_report_incomplete_issue":{},"missing_data":{},"missing_tool":{},"noop":{"max":1,"report-as-issue":"true"},"push_repo_memory":{"memories":[{"dir":"/tmp/gh-aw/repo-memory/default","id":"default","max_file_count":100,"max_file_size":102400,"max_patch_size":51200}]},"report_incomplete":{},"upload_asset":{"allowed-exts":[".png",".jpg",".jpeg",".svg"],"branch":"assets/${GITHUB_WORKFLOW}","max":5,"max-size":10240}}
-          GH_AW_SAFE_OUTPUTS_CONFIG_7874ab997f8429ba_EOF
+          GH_AW_SAFE_OUTPUTS_CONFIG_4230166978ab76c7_EOF


+          cat << 'GH_AW_PROMPT_04c8251975e742fc_EOF'
          <system>
-          GH_AW_PROMPT_4beb7ac5e590862b_EOF
+          GH_AW_PROMPT_04c8251975e742fc_EOF
          cat "${RUNNER_TEMP}/gh-aw/prompts/xpia.md"
          cat "${RUNNER_TEMP}/gh-aw/prompts/temp_folder_prompt.md"
          cat "${RUNNER_TEMP}/gh-aw/prompts/markdown.md"
          cat "${RUNNER_TEMP}/gh-aw/prompts/playwright_prompt.md"
          cat "${RUNNER_TEMP}/gh-aw/prompts/safe_outputs_prompt.md"
-          cat << 'GH_AW_PROMPT_4beb7ac5e590862b_EOF'
+          cat << 'GH_AW_PROMPT_04c8251975e742fc_EOF'
          <safe-output-tools>
          Tools: create_issue, missing_tool, missing_data, noop
          </safe-output-tools>
-          GH_AW_PROMPT_4beb7ac5e590862b_EOF
+          GH_AW_PROMPT_04c8251975e742fc_EOF


Initial plan

bda2dc8

Copilot AI assigned Copilot and pelikhan May 16, 2026

Copilot started work on behalf of pelikhan May 16, 2026 02:41

Copilot AI linked an issue May 16, 2026 that may be closed by this pull request

[ab-advisor] Experiment campaign for daily-code-metrics: A/B test output_format #32524

Closed

8 tasks

Copilot AI and others added 3 commits May 16, 2026 02:45

Plan experiment output format update

40ce23c

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Add output_format experiment to daily-code-metrics workflow

cee1e29

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Document experiment variant env wiring in daily-code-metrics

dd62826

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Experiment with output format for daily-code-metrics~~ Add output_format A/B experiment to daily-code-metrics workflow May 16, 2026

Copilot finished work on behalf of pelikhan May 16, 2026 02:54

Copilot AI requested a review from pelikhan May 16, 2026 02:54

pelikhan marked this pull request as ready for review May 16, 2026 02:55

Copilot AI review requested due to automatic review settings May 16, 2026 02:55

pelikhan merged commit 0392b07 into main May 16, 2026

pelikhan deleted the copilot/ab-advisor-experiment-campaign-again branch May 16, 2026 02:55

Copilot started reviewing on behalf of pelikhan May 16, 2026 02:56

Copilot AI reviewed May 16, 2026

github-actions Bot mentioned this pull request May 16, 2026

[docs] Update documentation for features from 2026-05-16 #32605

Closed

pelikhan merged 4 commits into main from copilot/ab-advisor-experiment-campaign-again
