🧪 Experiment Campaign: deep-report
Workflow file: .github/workflows/deep-report.md
Selected dimension: output_format
Triggered by: ab-testing-advisor on 2026-05-05
Background
DeepReport is an intelligence-gathering agent that runs daily (weekdays ~15:00 UTC) and synthesizes patterns, trends, and actionable tasks from across all agent-generated discussions, workflow logs, and recent issues. The prompt is extremely detailed (~250 lines) and the current report format prescribes seven distinct sections with heavy formatting requirements. This verbosity may inflate token usage, increase run duration, and reduce readability — making output format an ideal dimension to experiment on.
Hypothesis
Null hypothesis (H0): The report format has no significant effect on discussion engagement score or token consumption compared to the current full-briefing baseline.
Alternative hypothesis (H1): A denser executive_brief format reduces token consumption by ≥20% while maintaining or improving discussion engagement (reactions + replies), whereas the annotated_brief variant adds inline source citations that improve the actionability of findings.
Experiment Configuration
Add the following `experiments:` block to the workflow frontmatter:
experiments:
  output_format:
    variants: [full_briefing, executive_brief, annotated_brief]
    description: "Tests whether report verbosity and structure affect token cost and discussion engagement"
    hypothesis: "H0: no change in discussion engagement or token cost. H1: executive_brief reduces token usage by ≥20% without reducing engagement; annotated_brief improves actionability."
    metric: token_count
    secondary_metrics: [discussion_reactions, discussion_replies, output_char_length, run_duration_ms]
    guardrail_metrics:
      - name: empty_output_rate
        direction: min
        threshold: 0
      - name: issue_creation_success_rate
        direction: min
        threshold: 0.8
    min_samples: 15
    owner: "`@team-agents`"
    weight: [34, 33, 33]
    start_date: "2026-05-06"
    analysis_type: mann_whitney
    tags: [output-format, token-cost, engagement, daily]
    issue: 0
Variant descriptions:
- `full_briefing`: Current behavior, with seven verbose sections (Executive Summary, Pattern Analysis, Trend Intelligence, Notable Findings, Predictions and Recommendations, Actionable Agentic Tasks, Source Attribution).
- `executive_brief`: Condensed single-pass report with a 3-sentence executive summary, a flat bullet list of the top-5 patterns/findings, and the 7 actionable tasks. No trend prose or prediction section.
- `annotated_brief`: Same condensed structure as `executive_brief`, but every finding carries an inline citation (discussion/issue/run URL) directly next to the claim rather than in a separate attribution section.
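The `experiments:` block can be sanity-checked before it is committed. The sketch below is illustrative only: it mirrors a few `output_format` fields as a plain Python dict and assumes that `weight` entries pair positionally with `variants` and are meant to sum to 100. Neither rule is confirmed by the gh-aw schema, and `check_experiment` and `weeks_to_complete` are hypothetical helpers, not part of gh-aw.

```python
import math

# Hypothetical mirror of the `output_format` experiment from the frontmatter.
experiment = {
    "variants": ["full_briefing", "executive_brief", "annotated_brief"],
    "weight": [34, 33, 33],
    "min_samples": 15,
}

RUNS_PER_WEEK = 5  # weekday-only schedule, as stated in the campaign


def check_experiment(exp):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if len(exp["weight"]) != len(exp["variants"]):
        problems.append("weight and variants differ in length")
    if sum(exp["weight"]) != 100:  # assumed convention: weights are percentages
        problems.append("weights do not sum to 100")
    if len(set(exp["variants"])) != len(exp["variants"]):
        problems.append("duplicate variant names")
    return problems


def weeks_to_complete(exp, runs_per_week=RUNS_PER_WEEK):
    """Weeks needed to reach min_samples for every variant, one variant per run."""
    total_runs = exp["min_samples"] * len(exp["variants"])
    return math.ceil(total_runs / runs_per_week)


print(check_experiment(experiment))   # []
print(weeks_to_complete(experiment))  # 9
```

With 15 samples for each of 3 variants and about 5 runs per week, the arithmetic gives 45 runs over 9 weeks, provided runs are spread evenly across variants.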
Workflow Changes Required
Replace the Report Structure section guidance in the prompt body to make it variant-aware:
Before:
## Report Structure
Generate an intelligence briefing with the following sections:
### 🔍 Executive Summary
...
### 📊 Pattern Analysis
...
### 📈 Trend Intelligence
...
### 🚨 Notable Findings
...
### 🔮 Predictions and Recommendations
...
### ✅ Actionable Agentic Tasks (Quick Wins)
...
### 📚 Source Attribution
...
After (using Handlebars conditional blocks):
## Report Structure
{{#if experiments.output_format "executive_brief"}}
Generate a **condensed intelligence brief** with these sections only:
1. **🔍 Executive Summary** — 3 sentences: overall health, top finding, urgent action.
2. **🚨 Top 5 Findings** — Flat bullet list, one line each, most impactful first.
3. **✅ Actionable Agentic Tasks** — Exactly 7 items as before.
{{else}}{{#if experiments.output_format "annotated_brief"}}
Generate a **condensed intelligence brief with inline citations** with these sections only:
1. **🔍 Executive Summary** — 3 sentences with at least one cited source link per sentence.
2. **🚨 Top 5 Findings** — Flat bullet list, one line each, each ending with `([source](url))`.
3. **✅ Actionable Agentic Tasks** — Exactly 7 items as before, each linking its evidence.
{{else}}
Generate an intelligence briefing with the following sections:
### 🔍 Executive Summary
...
### 📊 Pattern Analysis
...
### 📈 Trend Intelligence
...
### 🚨 Notable Findings
...
### 🔮 Predictions and Recommendations
...
### ✅ Actionable Agentic Tasks (Quick Wins)
...
### 📚 Source Attribution
...
{{/if}}{{/if}}
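The conditional template resolves to one of three report shapes. As a rough illustration of that branching (not of how gh-aw actually renders templates), the Python sketch below maps a variant name to the sections it should produce and falls back to the full briefing for any other value, mirroring the final `{{else}}` branch. `sections_for`, `FULL_BRIEFING`, and `CONDENSED` are hypothetical names introduced for this example.

```python
FULL_BRIEFING = [
    "Executive Summary",
    "Pattern Analysis",
    "Trend Intelligence",
    "Notable Findings",
    "Predictions and Recommendations",
    "Actionable Agentic Tasks (Quick Wins)",
    "Source Attribution",
]

CONDENSED = [
    "Executive Summary",
    "Top 5 Findings",
    "Actionable Agentic Tasks",
]


def sections_for(variant):
    """Return the section list and citation style implied by a variant name."""
    if variant in ("executive_brief", "annotated_brief"):
        return {
            "sections": CONDENSED,
            # Only annotated_brief asks for a source link next to each claim.
            "inline_citations": variant == "annotated_brief",
        }
    # Any other value (including full_briefing) keeps the current behavior.
    return {"sections": FULL_BRIEFING, "inline_citations": False}


for name in ("full_briefing", "executive_brief", "annotated_brief"):
    plan = sections_for(name)
    print(name, len(plan["sections"]), plan["inline_citations"])
```

Falling back to the full briefing for unknown values keeps the baseline behavior intact if the variant is missing or misspelled.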
Success Metrics
| Metric | Type | Target |
| --- | --- | --- |
| `token_count` | Primary | ≥20% reduction vs `full_briefing` for `executive_brief` |
| `discussion_reactions` | Secondary | Must not drop vs baseline |
| `output_char_length` | Secondary | Observe directionality |
| `run_duration_ms` | Secondary | Expect reduction for brief variants |
| `empty_output_rate` | Guardrail | Must remain 0 |
| `issue_creation_success_rate` | Guardrail | Must stay ≥0.8 |
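The primary target and the two guardrails can be checked mechanically once per-run data exists. The sketch below is one possible reading, under stated assumptions: the ≥20% reduction is measured on the median `token_count` per variant (the campaign does not say which summary statistic to use), and the sample numbers are made up for illustration, not measured results. `relative_reduction` and `evaluate` are hypothetical helpers.

```python
from statistics import median


def relative_reduction(baseline, candidate):
    """Fractional drop of the candidate median relative to the baseline median."""
    base = median(baseline)
    return (base - median(candidate)) / base


def evaluate(baseline_tokens, brief_tokens, empty_output_rate, issue_success_rate):
    """Check the primary target (>=20% reduction) and both guardrails."""
    reduction = relative_reduction(baseline_tokens, brief_tokens)
    return {
        "token_reduction": reduction,
        "primary_met": reduction >= 0.20,
        "guardrails_ok": empty_output_rate == 0 and issue_success_rate >= 0.8,
    }


# Made-up token counts, purely for illustration.
baseline = [1000, 1100, 900, 1050, 950]  # full_briefing, median 1000
brief = [700, 760, 720, 800, 740]        # executive_brief, median 740

result = evaluate(baseline, brief, empty_output_rate=0.0, issue_success_rate=0.9)
print(result["primary_met"], result["guardrails_ok"])  # True True
```

Meeting the threshold on medians does not by itself show the difference is statistically significant; that is the job of the test named in the statistical design.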
Statistical Design
- Variants: `full_briefing` (34%), `executive_brief` (33%), `annotated_brief` (33%)
- Assignment: Round-robin via `gh-aw` experiments runtime (cache-based, repo storage)
- Minimum runs per variant: 15 (total ≥45 runs)
- Expected frequency: ~5 runs/week (weekdays only)
- Expected experiment duration: ~9 weeks from `start_date`
- Analysis approach: Mann-Whitney U test on token counts and output length (non-parametric, robust to non-normal distributions)
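For reference, the Mann-Whitney U test compares two samples by rank, so with three variants it would presumably be applied pairwise (for example, each brief variant against `full_briefing`); that pairing is an assumption, not something the campaign specifies. The pure-Python sketch below uses the normal approximation with tie and continuity corrections. It is not the gh-aw analysis code, and `mann_whitney_u` is a hypothetical helper. With only about 15 samples per variant the approximation is rough, and an established implementation such as `scipy.stats.mannwhitneyu` may be preferable.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic for x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    # Tag each value with its group (0 for x, 1 for y) and sort by value.
    combined = sorted(
        (value, group) for group, sample in enumerate((x, y)) for value in sample
    )
    ranks = [0.0] * len(combined)
    tie_term = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share the average 1-based rank
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a difference
        return u_x, 1.0
    z = max(abs(u_x - mean_u) - 0.5, 0.0) / math.sqrt(var_u)  # continuity correction
    p_value = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return u_x, p_value


# Toy data: every value in the first sample is below every value in the second.
u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u)  # 0.0
```

A U of 0 for the first sample means none of its values exceeds any value in the second, the most extreme separation possible for these sample sizes.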
Implementation Steps
1. Add the `experiments:` section to the frontmatter (YAML block above).
2. Update the Report Structure section to use the `{{#if experiments.output_format "<variant>"}}...{{else}}...{{/if}}` syntax.
3. Run `gh aw compile deep-report` to regenerate the lock file.
4. Check the experiment state in `/tmp/gh-aw/experiments/state.json`.
Infrastructure Status
✅ Infrastructure is complete. All three advanced experiment schema fields (`analysis_type`, `tags`, `notify`) are fully implemented in both `pkg/workflow/compiler_experiments.go` and `actions/setup/js/pick_experiment.cjs`. No sub-issue is needed.
References
- `.github/workflows/deep-report.md`
Generated by Daily A/B Testing Advisor · ● 568.7K