Add prompt_style A/B experiment to daily-file-diet with concise/detailed prompt branching#35931

Closed
Copilot wants to merge 4 commits into main from copilot/ab-advisor-experiment-prompt-style

Contributor

Copilot AI commented May 30, 2026

This updates daily-file-diet to run a prompt_style experiment so we can compare the existing dense prompt against a lean variant without changing the workflow’s core behavior. The goal is to measure whether the concise prompt preserves issue quality while reducing token/runtime cost.

  • Experiment configuration (frontmatter)

    • Added experiments.prompt_style using object form with:
      • variants: [detailed, concise]
      • primary metric: issue_completeness_score
      • secondary metrics: effective_token_count, run_duration_ms
      • guardrails for issue creation success and empty output
      • min_samples, weight, start_date, and tracking issue metadata
  • Prompt variant routing

    • Wrapped prompt body in a prompt_style conditional.
    • detailed path keeps the existing full protocol as baseline.
    • concise path uses a compressed instruction set that still requires:
      • largest Go file detection
      • 800-line threshold behavior
      • Serena semantic analysis
      • issue output with split proposals, test plan, and acceptance checklist
  • Compiled workflow artifact update

    • Regenerated daily-file-diet.lock.yml to include experiment activation/assignment plumbing and prompt interpolation for the selected variant.
{{#if experiments.prompt_style == 'concise'}}
Find the largest `.go` source file... If it has **800+ lines**, open a GitHub issue titled
`[file-diet] Refactor <filename> (<N> lines)` with findings, 2–4 split proposals,
a test coverage plan, and an acceptance checklist. Use `serena` for semantic analysis.
{{#else}}
[existing detailed prompt]
{{/if}}
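
To make the routing above concrete, here is a minimal Python sketch of how a single `{{#if experiments.<name> == '<value>'}} … {{#else}} … {{/if}}` block could be resolved once a variant has been assigned. It is an illustration only, not the workflow compiler's actual implementation: `render_prompt`, `TEMPLATE`, and the regex are hypothetical names introduced here, and the sketch assumes exactly one non-nested conditional of the shape quoted in this PR.

```python
import re

# Hypothetical mini-renderer (assumption: one flat, non-nested conditional
# with the exact delimiter shape quoted in the PR description).
_COND = re.compile(
    r"\{\{#if experiments\.(\w+) == '([^']*)'\}\}\n?"  # opening tag
    r"(.*?)"                                            # body when the test matches
    r"\{\{#else\}\}\n?"
    r"(.*?)"                                            # fallback body
    r"\{\{/if\}\}\n?",
    re.DOTALL,
)


def render_prompt(template: str, experiments: dict[str, str]) -> str:
    """Keep the matching branch of each conditional and drop the other."""

    def pick(match: re.Match) -> str:
        name, expected, if_body, else_body = match.groups()
        # An unassigned or unknown variant falls through to the else branch,
        # which mirrors the detailed prompt acting as the baseline/fallback.
        return if_body if experiments.get(name) == expected else else_body

    return _COND.sub(pick, template)


# Placeholder bodies stand in for the real prompts.
TEMPLATE = """{{#if experiments.prompt_style == 'concise'}}
CONCISE PROMPT
{{#else}}
DETAILED PROMPT
{{/if}}
"""

if __name__ == "__main__":
    print(render_prompt(TEMPLATE, {"prompt_style": "concise"}), end="")
    print(render_prompt(TEMPLATE, {"prompt_style": "detailed"}), end="")
```

Keeping the detailed prompt in the else branch means a run with no assignment behaves like the pre-experiment workflow.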

Copilot AI and others added 3 commits May 30, 2026 14:39
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Create A/B test for prompt_style in daily-file-diet Add prompt_style A/B experiment to daily-file-diet with concise/detailed prompt branching May 30, 2026
Copilot AI requested a review from pelikhan May 30, 2026 14:50
@pelikhan pelikhan marked this pull request as ready for review May 30, 2026 14:54
Copilot AI review requested due to automatic review settings May 30, 2026 14:54
@pelikhan
Collaborator

@copilot merge main and recompile

Contributor

Copilot AI left a comment

Pull request overview

Adds a prompt_style A/B experiment (detailed vs. concise) to daily-file-diet so the existing verbose refactoring prompt can be compared against a compressed variant on issue quality and token/runtime cost, and regenerates the corresponding lock file plus several other lock files affected by an unrelated checkout-prompt format change.

Changes:

  • Add object-form experiments.prompt_style frontmatter (variants, hypothesis, primary/secondary/guardrail metrics, weight, start date, tracking issue) to daily-file-diet.md.
  • Wrap the prompt body in {{#if experiments.prompt_style == 'concise'}} … {{#else}} … {{/if}} with a one-paragraph concise variant and the original detailed prompt as fallback.
  • Regenerate daily-file-diet.lock.yml with experiment plumbing (restore/pick/upload steps, artifact, push_experiments_state job, prompt env injection) and pick up unrelated checkout-prompt text reformatting in eight other lock files.
| File | Description |
| --- | --- |
| .github/workflows/daily-file-diet.md | Adds prompt_style experiment frontmatter and conditional prompt branching. |
| .github/workflows/daily-file-diet.lock.yml | Regenerated lock file adding experiment selection, artifact, and push job. |
| .github/workflows/smoke-update-cross-repo-pr.lock.yml | Regenerated checkout-prompt line formatting only. |
| .github/workflows/smoke-create-cross-repo-pr.lock.yml | Regenerated checkout-prompt line formatting only. |
| .github/workflows/smoke-codex.lock.yml | Checkout-prompt formatting plus heredoc marker rehash. |
| .github/workflows/smoke-claude.lock.yml | Regenerated checkout-prompt line formatting only. |
| .github/workflows/schema-feature-coverage.lock.yml | Checkout-prompt formatting plus heredoc marker rehash. |
| .github/workflows/schema-consistency-checker.lock.yml | Regenerated checkout-prompt line formatting only. |
| .github/workflows/pr-sous-chef.lock.yml | Regenerated checkout-prompt line formatting only. |
| .github/workflows/glossary-maintainer.lock.yml | Regenerated checkout-prompt line formatting only. |

Copilot's findings

  • Files reviewed: 10/10 changed files
  • Comments generated: 1

{{#runtime-import? .github/shared-instructions.md}}

{{#if experiments.prompt_style == 'concise'}}
Find the largest `.go` source file in the repository (exclude `*_test.go`, `third_party/`, and generated files such as `*_gen.go`, `*.pb.go`, and files with `Code generated` markers). If it has fewer than **800 lines**, stop and report no action is needed. If it has **800+ lines**, open a GitHub issue titled **`[file-diet] Refactor <filename> (<N> lines)`** with: a summary of findings, 2–4 concrete split proposals with rationale, a test coverage plan, and an acceptance checklist. Use `serena` for semantic analysis.
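
The selection rule in the concise prompt can be sketched in Python to show what the agent is being asked to decide. This is an illustration under stated assumptions, not part of the workflow: "largest" is taken to mean most lines, the `Code generated` marker is only looked for in the first five lines, `<filename>` is rendered as the base name, and all function names are hypothetical.

```python
from pathlib import Path

THRESHOLD = 800  # line threshold stated in the prompt
GENERATED_SUFFIXES = ("_gen.go", ".pb.go")


def is_candidate(path: Path, root: Path) -> bool:
    """Apply the prompt's exclusions: tests, third_party/, generated files."""
    if path.suffix != ".go" or path.name.endswith("_test.go"):
        return False
    if "third_party" in path.relative_to(root).parts:
        return False
    if path.name.endswith(GENERATED_SUFFIXES):
        return False
    # Assumption: a generated-file marker sits near the top of the file.
    with path.open(encoding="utf-8", errors="replace") as fh:
        head = [next(fh, "") for _ in range(5)]
    return not any("Code generated" in line for line in head)


def count_lines(path: Path) -> int:
    with path.open("rb") as fh:
        return sum(1 for _ in fh)


def largest_go_file(root: Path):
    """Return (path, line_count) for the largest candidate, or None."""
    best = None
    for path in root.rglob("*.go"):
        if path.is_file() and is_candidate(path, root):
            n = count_lines(path)
            if best is None or n > best[1]:
                best = (path, n)
    return best


def plan(root):
    """Return the issue title to open, or None when no action is needed."""
    best = largest_go_file(Path(root))
    if best is None or best[1] < THRESHOLD:
        return None
    path, n = best
    return f"[file-diet] Refactor {path.name} ({n} lines)"
```

Calling `plan(".")` from a repository root returns `None` when the largest eligible file is under 800 lines, which corresponds to the prompt's "stop and report no action is needed" branch.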
@pelikhan pelikhan closed this May 30, 2026
Copilot stopped work on behalf of pelikhan due to an error May 30, 2026 15:03
Copilot AI requested a review from pelikhan May 30, 2026 15:03
Development

Successfully merging this pull request may close these issues.

[ab-advisor] Experiment campaign for daily-file-diet: A/B test prompt_style

3 participants