feat(daily-semgrep-scan): add semgrep_output_format A/B experiment #32802

Merged
pelikhan merged 2 commits into main from copilot/experiment-campaign-output-format on May 17, 2026

Conversation

Copilot AI commented May 17, 2026

Instruments the daily-semgrep-scan workflow with a 3-variant A/B experiment to test whether output structure affects code scanning alert creation rate and completeness.

Frontmatter changes

  • Added experiments.semgrep_output_format block with variants bullet_list, structured_sections, prose
  • Weighted 34/33/33, proportion_test analysis, 30-run minimum, guardrail run_success_rate >= 0.85
  • Dropped the unsupported direction: min field; expressed threshold as ">=0.85" (consistent with other workflows)
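For readers unfamiliar with the analysis settings above, here is a minimal sketch of what a proportion test plus the guardrail and run minimum could look like. It is not the gh-aw implementation: the function names are hypothetical, the test shown is a standard pooled two-proportion z-test, and treating the 30-run minimum as per-variant is an assumption.

```python
import math


def two_proportion_z_test(successes_a, runs_a, successes_b, runs_b):
    """Two-sided pooled z-test for a difference between two proportions.

    Returns (z, p_value). Hypothetical helper, not part of gh-aw.
    """
    p_a = successes_a / runs_a
    p_b = successes_b / runs_b
    pooled = (successes_a + successes_b) / (runs_a + runs_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / runs_a + 1 / runs_b))
    if se == 0:
        # Both variants succeeded (or failed) on every run: no evidence of a difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value for a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def guardrail_ok(successful_runs, total_runs, threshold=0.85):
    """Mirror of the run_success_rate >= 0.85 guardrail."""
    return total_runs > 0 and successful_runs / total_runs >= threshold


def ready_for_analysis(runs_per_variant, min_runs=30):
    """Assumption: the 30-run minimum applies to each variant separately."""
    return all(n >= min_runs for n in runs_per_variant.values())
```

With three variants, a comparison like this would be run pairwise (for example bullet_list vs. prose), and only once `ready_for_analysis` and `guardrail_ok` both hold.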

Prompt changes

Extended the single-line prompt with {{#if}} conditional blocks (single-quoted comparisons per gh-aw convention):

Scan the repository for SQL injection vulnerabilities using Semgrep.

{{#if experiments.semgrep_output_format == 'bullet_list' }}
Report each finding as a flat bullet point in this format:
- **[SEVERITY]** `<file>:<line>` — Rule: `<rule_id>` — <message>

Create one code scanning alert per finding.
{{/if}}
{{#if experiments.semgrep_output_format == 'structured_sections' }}
Structure your findings report with:
1. A summary table: | Severity | Count |
2. Sections grouped by severity (Critical, High, Medium, Low), then by rule ID
3. For each finding: file path, line number, rule, and recommended fix

Create one code scanning alert per finding.
{{/if}}
{{#if experiments.semgrep_output_format == 'prose' }}
Write a narrative security assessment describing the vulnerability patterns found. Embed
specific findings (file, line, rule) within the prose. Conclude with a prioritized
remediation list.

Create one code scanning alert per finding.
{{/if}}
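To make the intended gating concrete, here is a toy renderer showing how exactly one of the three blocks would survive for a given variant. This is only a sketch under assumptions: `render` and `IF_BLOCK` are hypothetical names, the renderer accepts either quote style, it does not handle nested blocks, and it is not the gh-aw template engine.

```python
import re

# Matches a non-nested block of the form:
#   {{#if some.key == 'value' }} ... {{/if}}
# Either quote style is accepted here; the real engine may differ.
IF_BLOCK = re.compile(
    r"\{\{#if\s+(?P<key>[\w.]+)\s*==\s*(?P<q>['\"])(?P<value>.*?)(?P=q)\s*\}\}\n?"
    r"(?P<body>.*?)"
    r"\{\{/if\}\}\n?",
    re.DOTALL,
)


def render(template, context):
    """Keep the body of each block whose comparison holds; drop the others."""

    def replace(match):
        if context.get(match["key"]) == match["value"]:
            return match["body"]
        return ""

    return IF_BLOCK.sub(replace, template)
```

Calling `render(prompt, {"experiments.semgrep_output_format": "prose"})` on the prompt above would leave the opening sentence plus only the narrative-assessment instructions.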

Lock file

Recompiled daily-semgrep-scan.lock.yml — compiles clean (one expected "experimental feature" advisory).

…workflow

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Experiment campaign for daily-semgrep-scan output format" to "feat(daily-semgrep-scan): add semgrep_output_format A/B experiment" May 17, 2026
Copilot AI requested a review from pelikhan May 17, 2026 12:21
pelikhan marked this pull request as ready for review May 17, 2026 12:22
Copilot AI review requested due to automatic review settings May 17, 2026 12:22
pelikhan merged commit 7757f81 into main May 17, 2026
pelikhan deleted the copilot/experiment-campaign-output-format branch May 17, 2026 12:22
Copilot AI left a comment

Pull request overview

Adds an A/B/n experiment to the daily-semgrep-scan agentic workflow to test whether Semgrep findings output structure affects code scanning alert creation and report completeness.

Changes:

  • Introduces experiments.semgrep_output_format (3 variants, weighted 34/33/33) in the workflow markdown frontmatter and adds variant-specific prompt sections.
  • Recompiles the lock workflow to restore/pick experiment assignments, persist experiment state to a dedicated git branch, and thread the chosen variant into prompt interpolation/execution.
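As a rough illustration of the 34/33/33 weighting, the pick step could be sketched as below. The names `VARIANT_WEIGHTS` and `pick_variant` are hypothetical, and the compiled lock workflow may assign variants differently (for instance by balancing against state restored from the git branch); this only shows a plain weighted random draw.

```python
import random

# Weights taken from the PR description; the dict name is an assumption.
VARIANT_WEIGHTS = {"bullet_list": 34, "structured_sections": 33, "prose": 33}


def pick_variant(weights, rng):
    """Draw one variant name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    # A seeded generator keeps the draw reproducible for debugging.
    print(pick_variant(VARIANT_WEIGHTS, random.Random(2026)))
```

Passing the generator in explicitly keeps the function testable; a real workflow would likely persist the chosen variant so that later steps in the same run see the same value.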
| File | Description |
| --- | --- |
| .github/workflows/daily-semgrep-scan.md | Defines the semgrep_output_format experiment and adds conditional prompt blocks per variant. |
| .github/workflows/daily-semgrep-scan.lock.yml | Compiled workflow wiring for experiment state restore/pick, artifact handling, and pushing state back to git. |

Copilot's findings

Comments suppressed due to low confidence (2)

.github/workflows/daily-semgrep-scan.md:51

  • Same issue as above: using single quotes in the equality check prevents the condition from being evaluated as a comparison by the current template engine, so the block will be kept for all runs. Use double quotes for the RHS so the experiment gating works.
{{#if experiments.semgrep_output_format == 'structured_sections' }}

.github/workflows/daily-semgrep-scan.md:59

  • Same issue as above: this {{#if}} comparison uses single quotes, but the template condition evaluator only supports comparisons against double-quoted strings. As written, this block will always render. Change to double quotes to ensure only the selected variant’s section is included.
{{#if experiments.semgrep_output_format == 'prose' }}
  • Files reviewed: 2/2 changed files
  • Comments generated: 2
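The PR description says single-quoted comparisons follow gh-aw convention, while the suppressed comments above claim the evaluator needs double quotes; this thread does not settle which is right. If the double-quote claim holds, a small check along these lines could flag and rewrite the affected conditions. It is a sketch only: the function names and the regular expression are hypothetical, and it covers only simple `key == 'value'` comparisons.

```python
import re

# Matches {{#if key == 'value' }} where the right-hand side is single-quoted.
SINGLE_QUOTED_IF = re.compile(
    r"(\{\{#if\s+[\w.]+\s*==\s*)'([^'\"]*)'(\s*\}\})"
)


def find_single_quoted_conditions(text):
    """Return (line_number, line) pairs for single-quoted {{#if}} comparisons."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if SINGLE_QUOTED_IF.search(line)
    ]


def to_double_quotes(text):
    """Rewrite single-quoted right-hand sides to double quotes."""
    return SINGLE_QUOTED_IF.sub(r'\1"\2"\3', text)
```

Running `find_single_quoted_conditions` over `.github/workflows/daily-semgrep-scan.md` would list each condition the comments refer to, without changing the file.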


Scan the repository for SQL injection vulnerabilities using Semgrep.

{{#if experiments.semgrep_output_format == 'bullet_list' }}
Report each finding as a flat bullet point in this format:
- **[SEVERITY]** `<file>:<line>` — Rule: `<rule_id>` — <message>
Development

Successfully merging this pull request may close these issues.

[ab-advisor] Experiment campaign for daily-semgrep-scan: A/B test output_format

3 participants