🧪 Experiment Campaign: blog-auditor
Workflow file: .github/workflows/blog-auditor.md
Selected dimension: prompt_style
Triggered by: ab-testing-advisor on 2026-05-16
Background
The blog-auditor workflow performs a weekly automated audit of the GitHub Next Agentic Workflows blog page, validating accessibility, content integrity, and code snippet correctness using Playwright. The current prompt is extremely prescriptive — it includes exact bash commands, explicit code blocks for every step, and full Markdown templates for both pass and fail reports. This level of detail may be consuming unnecessary tokens and forcing the agent into rigid step-by-step execution rather than adapting intelligently. Testing a concise prompt style vs. the current detailed one will reveal whether the extra verbosity adds measurable quality or just cost.
Hypothesis
Null hypothesis (H0): A concise prompt variant produces the same discussion quality and validation correctness as the current detailed prompt.
Alternative hypothesis (H1): A concise prompt reduces effective token consumption and run duration by ≥20% while maintaining equivalent audit correctness (zero false-negative audits, same keyword-check pass rate).
Experiment Configuration
Add the following experiments: block to the workflow frontmatter:
experiments:
  prompt_style:
    variants: [detailed, concise]
    description: "Tests whether a high-level goal-oriented prompt produces the same audit quality as the current step-by-step detailed instructions"
    hypothesis: "H0: no change in audit correctness or discussion quality. H1: concise variant reduces token cost ≥20% with no degradation in validation accuracy"
    metric: effective_token_count
    secondary_metrics: [run_duration_ms, discussion_created, validation_pass_rate]
    guardrail_metrics:
      - name: empty_output_rate
        direction: min
        threshold: 0
      - name: missed_validation_failures
        direction: min
        threshold: 0
    min_samples: 20
    weight: [50, 50]
    start_date: "2026-05-16"
    analysis_type: mann_whitney
    tags: [prompt-engineering, cost-optimization, blog-auditor]
    notify:
      issue: 0
Note: Replace issue: 0 with this issue's number after creation.
Variant descriptions:
detailed: Current behavior — full step-by-step instructions with explicit bash commands, exact markdown templates, and exhaustive success criteria checklist.
concise: High-level goal-oriented instructions that describe what to achieve (navigate, validate, report) without prescribing exact commands or pre-written markdown templates. The agent selects its own approach.
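A quick sanity check of the experiments: block can catch mismatches before compiling. The sketch below mirrors a few of its fields as a plain Python dict and checks that the weights line up with the variants and that the placeholder issue number has been replaced. The function name check_experiment and the rules it enforces (including "weights sum to 100") are assumptions for illustration, not the gh-aw compiler's actual validation.

```python
# Hypothetical pre-flight check; NOT the gh-aw compiler's real validation logic.
experiment = {
    "variants": ["detailed", "concise"],
    "metric": "effective_token_count",
    "min_samples": 20,
    "weight": [50, 50],
    "notify": {"issue": 0},  # placeholder from the config above
}


def check_experiment(cfg):
    """Return a list of human-readable problems found in an experiment config."""
    problems = []
    variants = cfg.get("variants", [])
    weights = cfg.get("weight", [])
    if len(variants) < 2:
        problems.append("need at least two variants")
    if len(set(variants)) != len(variants):
        problems.append("variant names must be unique")
    if len(weights) != len(variants):
        problems.append("one weight per variant is required")
    elif sum(weights) != 100:
        # Assumed rule: weights are percentages.
        problems.append("weights should sum to 100")
    if cfg.get("min_samples", 0) <= 0:
        problems.append("min_samples must be positive")
    if cfg.get("notify", {}).get("issue", 0) == 0:
        problems.append("notify.issue is still the placeholder 0")
    return problems


if __name__ == "__main__":
    for problem in check_experiment(experiment):
        print("WARNING:", problem)
```

With the config as written, the only warning is the placeholder issue number, which matches the note about replacing issue: 0 after the issue is created.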
Workflow Changes Required
Wrap the detailed instruction body with a conditional block and add a concise alternative. The experiments.prompt_style reference is resolved at compile time.
Before (current body opening):
## Audit Process
### Phase 1: Navigate and Capture Blog Content
Use Playwright to navigate to the target URL and capture the accessibility snapshot:
1. **Navigate to URL**: Run `playwright-cli browser_navigate --url (githubnext.com/redacted)` to load the page
...
After (conditional wrap):
{{#if experiments.prompt_style == "concise" }}
## Audit Process
Navigate to `(githubnext.com/redacted)` using Playwright, capture the accessibility snapshot, and validate:
- HTTP status is 200
- Final URL is within `githubnext.com` / `www.githubnext.com`
- Content length exceeds 5,000 characters
- All required keywords present: `agentic-workflows`, `GitHub`, `workflow`, `compiler`
- Any YAML/Markdown workflow code snippets pass `gh aw compile --no-emit --validate`
Create a discussion in the **Audits** category titled `[audit] Agentic Workflows blog audit - PASSED` (or `FAILED`). Include a summary table of each check with pass/fail status and the values observed. For failures, add suggested remediation steps.
{{else}}
## Audit Process
### Phase 1: Navigate and Capture Blog Content
... (existing detailed content unchanged)
{{/if}}
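To picture which instructions each variant ends up with, the toy resolver below expands a single if/else block of the shape shown above. It is only an illustration of the branching: the regular expression, the resolve function, and the placeholder template bodies are assumptions, and the real gh-aw template handling may differ (for example, it may support nesting or other operators).

```python
import re

# Matches one {{#if experiments.NAME == "VALUE" }} ... {{else}} ... {{/if}} block.
# Toy grammar only: no nesting, no other operators.
IF_BLOCK = re.compile(
    r'\{\{#if\s+experiments\.(\w+)\s*==\s*"([^"]*)"\s*\}\}'
    r'(.*?)\{\{else\}\}(.*?)\{\{/if\}\}',
    re.DOTALL,
)

# Placeholder bodies standing in for the real prompt text.
TEMPLATE = '''{{#if experiments.prompt_style == "concise" }}
## Audit Process
(concise instructions)
{{else}}
## Audit Process
(existing detailed instructions)
{{/if}}
'''


def resolve(template, assignments):
    """Keep the branch selected by `assignments`, e.g. {"prompt_style": "concise"}."""
    def pick(match):
        name, expected, then_body, else_body = match.groups()
        chosen = then_body if assignments.get(name) == expected else else_body
        return chosen.strip("\n")
    return IF_BLOCK.sub(pick, template)


if __name__ == "__main__":
    print(resolve(TEMPLATE, {"prompt_style": "concise"}))
    print(resolve(TEMPLATE, {"prompt_style": "detailed"}))
```

Any assignment other than "concise" falls through to the else branch, so the detailed prompt remains the default behavior.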
Success Metrics
| Metric | Type | Target |
| --- | --- | --- |
| effective_token_count | Primary | ≥20% reduction in concise variant |
| run_duration_ms | Secondary | ≥15% reduction in concise variant |
| discussion_created | Secondary | 100% in both variants |
| validation_pass_rate | Secondary | Equal across variants |
| empty_output_rate | Guardrail | Must remain 0% |
| missed_validation_failures | Guardrail | Must remain 0% |
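The primary target and the guardrails in the table can be checked mechanically once both variants have enough runs. The sketch below is one possible way to do that. It compares medians (an assumption; the issue does not say whether the reduction is measured on means or medians), and the function names and sample numbers are invented for illustration, not real run data.

```python
from statistics import median


def percent_reduction(baseline, treatment):
    """Percent drop of the treatment median relative to the baseline median."""
    base = median(baseline)
    return 100.0 * (base - median(treatment)) / base


def evaluate(detailed_tokens, concise_tokens, guardrails):
    """Summarise the primary target (>=20% token reduction) and guardrails (all 0)."""
    reduction = percent_reduction(detailed_tokens, concise_tokens)
    return {
        "token_reduction_pct": round(reduction, 1),
        "primary_target_met": reduction >= 20.0,
        "guardrails_ok": all(rate == 0 for rate in guardrails.values()),
    }


if __name__ == "__main__":
    # Invented token counts, for illustration only.
    detailed = [100_000, 120_000, 110_000]
    concise = [80_000, 90_000, 85_000]
    rates = {"empty_output_rate": 0.0, "missed_validation_failures": 0.0}
    print(evaluate(detailed, concise, rates))
```

A guardrail breach should block adoption of the concise variant regardless of the token savings, which is why guardrails_ok is reported separately from primary_target_met.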
Statistical Design
- Variants: detailed (baseline), concise (treatment)
- Assignment: Round-robin via gh-aw experiments runtime (cache-based)
- Minimum runs per variant: 20
- Expected experiment duration: ~40 weeks at the current weekly cadence (2 variants × 20 runs); adding a workflow_dispatch trigger for manual runs during the experiment would accelerate sampling
- Analysis approach: Mann-Whitney U test on token count distributions (non-parametric, appropriate for skewed token distributions)
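For readers who want to see what the planned analysis computes, here is a minimal, dependency-free sketch of a two-sided Mann-Whitney U test using the normal approximation with tie and continuity corrections. It is not the implementation behind analysis_type: mann_whitney (which this issue does not show), and the sample token counts are invented.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U statistic for x, p-value). U counts the (x, y) pairs where
    the x value is larger, with ties counted as one half.
    """
    n1, n2 = len(x), len(y)
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign average ranks to tied values and accumulate the tie correction.
    ranks = [0.0] * len(combined)
    tie_term = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0

    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all observations identical: no evidence of a shift

    z = max((abs(u_x - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    p_value = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return u_x, p_value


if __name__ == "__main__":
    # Invented token counts, for illustration only.
    detailed = [118_000, 131_000, 109_000, 142_000, 125_000]
    concise = [92_000, 101_000, 88_000, 97_000, 110_000]
    u, p = mann_whitney_u(detailed, concise)
    print(f"U = {u}, p = {p:.4f}")
```

The normal approximation is generally considered reasonable at the planned 20 runs per variant; for much smaller samples an exact test would be the safer choice.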
Implementation Steps
- Add the experiments: section to the frontmatter (update the issue: field with this issue number)
- Add the {{#if experiments.prompt_style == "concise" }} conditional block around the audit instructions
- Run gh aw compile blog-auditor to regenerate the lock file
- … /tmp/gh-aw/experiments/state.json
Infrastructure Status
✅ All three advanced experiment schema fields (analysis_type, tags, notify) are fully implemented in pkg/workflow/compiler_experiments.go and actions/setup/js/pick_experiment.cjs. No infrastructure sub-issue is required.
References
- .github/workflows/blog-auditor.md
Generated by 🧪 Daily A/B Testing Advisor · ● 3.6M