feat(daily-semgrep-scan): add semgrep_output_format A/B experiment #32802
Merged
7 tasks
…workflow Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Experiment campaign for daily-semgrep-scan output format" to "feat(daily-semgrep-scan): add semgrep_output_format A/B experiment" on May 17, 2026.
Contributor
Pull request overview
Adds an A/B/n experiment to the daily-semgrep-scan agentic workflow to test whether Semgrep findings output structure affects code scanning alert creation and report completeness.
Changes:
- Introduces `experiments.semgrep_output_format` (3 variants, weighted 34/33/33) in the workflow markdown frontmatter and adds variant-specific prompt sections.
- Recompiles the lock workflow to restore/pick experiment assignments, persist experiment state to a dedicated git branch, and thread the chosen variant into prompt interpolation/execution.
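The 34/33/33 weighting can be pictured with a small sketch. The variant names and weights come from this PR, but `pick_variant` is a hypothetical helper written for illustration. How the compiled lock workflow actually picks and restores assignments is not shown here.

```python
import random

# Variant names and the 34/33/33 split are taken from the PR overview.
VARIANT_WEIGHTS = {
    "bullet_list": 34,
    "structured_sections": 33,
    "prose": 33,
}


def pick_variant(weights, rng):
    """Return one variant name with probability proportional to its weight.

    Hypothetical helper: it only illustrates weighted assignment and is not
    the selection logic of the compiled workflow.
    """
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    rng = random.Random()
    print(pick_variant(VARIANT_WEIGHTS, rng))
```

Passing the random generator in explicitly keeps the pick reproducible in tests (seed it) while leaving real runs unseeded.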
Show a summary per file
| File | Description |
|---|---|
| .github/workflows/daily-semgrep-scan.md | Defines the semgrep_output_format experiment and adds conditional prompt blocks per variant. |
| .github/workflows/daily-semgrep-scan.lock.yml | Compiled workflow wiring for experiment state restore/pick, artifact handling, and pushing state back to git. |
Copilot's findings
Comments suppressed due to low confidence (2)
.github/workflows/daily-semgrep-scan.md:51
- Same issue as above: using single quotes in the equality check prevents the condition from being evaluated as a comparison by the current template engine, so the block will be kept for all runs. Use double quotes for the RHS so the experiment gating works.
{{#if experiments.semgrep_output_format == 'structured_sections' }}
.github/workflows/daily-semgrep-scan.md:59
- Same issue as above: this {{#if}} comparison uses single quotes, but the template condition evaluator only supports comparisons against double-quoted strings. As written, this block will always render. Change to double quotes to ensure only the selected variant’s section is included.
{{#if experiments.semgrep_output_format == 'prose' }}
- Files reviewed: 2/2 changed files
- Comments generated: 2
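The failure mode the two suppressed comments describe can be reproduced with a toy evaluator. This is a sketch under the reviewer's assumption that only double-quoted comparisons are recognised and that unrecognised `{{#if}}` markers leave their body in place. It is not the gh-aw template engine, and the comments were suppressed as low confidence, so the real engine may behave differently. `render`, `COMPARE`, and `LEFTOVER` are hypothetical names.

```python
import re

# Hypothetical rule: a comparison is only recognised when the right-hand
# side is a double-quoted string.
COMPARE = re.compile(
    r'\{\{#if\s+([\w.]+)\s*==\s*"([^"]*)"\s*\}\}(.*?)\{\{/if\}\}',
    re.DOTALL,
)
# Any remaining {{#if ...}} or {{/if}} marker counts as unrecognised.
LEFTOVER = re.compile(r"\{\{#if[^}]*\}\}|\{\{/if\}\}")


def render(template, values):
    """Render non-nested {{#if name == "value"}} blocks (toy model only)."""

    def keep_or_drop(match):
        name, expected, body = match.groups()
        return body if values.get(name) == expected else ""

    rendered = COMPARE.sub(keep_or_drop, template)
    # Unrecognised markers are stripped, so their body always renders.
    return LEFTOVER.sub("", rendered)


if __name__ == "__main__":
    values = {"experiments.semgrep_output_format": "prose"}
    single = "{{#if experiments.semgrep_output_format == 'bullet_list' }}A{{/if}}"
    double = '{{#if experiments.semgrep_output_format == "bullet_list" }}A{{/if}}'
    print(repr(render(single, values)))  # body kept although variant is prose
    print(repr(render(double, values)))  # body dropped
```

Under this assumed behavior, every variant's section would reach the agent on every run, which would blur the differences the experiment is meant to measure.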
    Scan the repository for SQL injection vulnerabilities using Semgrep.

    {{#if experiments.semgrep_output_format == 'bullet_list' }}
    Report each finding as a flat bullet point in this format:
    - **[SEVERITY]** `<file>:<line>` — Rule: `<rule_id>` — <message>
Instruments the `daily-semgrep-scan` workflow with a 3-variant A/B experiment to test whether output structure affects code scanning alert creation rate and completeness.

Frontmatter changes

- `experiments.semgrep_output_format` block with variants `bullet_list`, `structured_sections`, `prose`
- `proportion_test` analysis, 30-run minimum, guardrail `run_success_rate >= 0.85`
- `direction: min` field; expressed threshold as `">=0.85"` (consistent with other workflows)

Prompt changes

Extended the single-line prompt with `{{#if}}` conditional blocks (single-quoted comparisons per gh-aw convention):

    Scan the repository for SQL injection vulnerabilities using Semgrep.

    {{#if experiments.semgrep_output_format == 'bullet_list' }}
    Report each finding as a flat bullet point in this format:
    - **[SEVERITY]** `<file>:<line>` — Rule: `<rule_id>` — <message>
    Create one code scanning alert per finding.
    {{/if}}

    {{#if experiments.semgrep_output_format == 'structured_sections' }}
    Structure your findings report with:
    1. A summary table: | Severity | Count |
    2. Sections grouped by severity (Critical, High, Medium, Low), then by rule ID
    3. For each finding: file path, line number, rule, and recommended fix
    Create one code scanning alert per finding.
    {{/if}}

    {{#if experiments.semgrep_output_format == 'prose' }}
    Write a narrative security assessment describing the vulnerability patterns found.
    Embed specific findings (file, line, rule) within the prose.
    Conclude with a prioritized remediation list.
    Create one code scanning alert per finding.
    {{/if}}

Lock file

Recompiled `daily-semgrep-scan.lock.yml`: compiles clean (one expected "experimental feature" advisory).
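For readers unfamiliar with the analysis settings in the frontmatter, here is a minimal sketch of what a proportion test with a run minimum and a success-rate guardrail could look like. The constants 30 and 0.85 come from this PR. Everything else is an assumption made for illustration: the function names are hypothetical, the minimum is treated as per variant, and a pooled two-proportion z-test stands in for whatever `proportion_test` actually computes.

```python
import math

MIN_RUNS = 30     # from the PR: 30-run minimum (assumed here to be per variant)
GUARDRAIL = 0.85  # from the PR: run_success_rate >= 0.85


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants share one rate.

    Assumption: a pooled two-proportion z-test; the real analysis may differ.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both samples are all successes or all failures: no evidence of a gap.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def ready_to_analyze(runs_per_variant):
    """True once every variant has reached the minimum number of runs."""
    return all(n >= MIN_RUNS for n in runs_per_variant.values())


def guardrail_ok(successful_runs, total_runs):
    """True when the observed run success rate meets the guardrail."""
    return total_runs > 0 and successful_runs / total_runs >= GUARDRAIL


if __name__ == "__main__":
    z, p = two_proportion_z_test(27, 30, 15, 30)
    print(f"z={z:.2f}, p={p:.4f}")
```

With three variants, a pairwise test like this would be repeated for each pair, so a multiple-comparison correction is worth considering before reading too much into a single small p-value.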