
[ab-advisor] Experiment campaign for daily-issues-report: A/B test output_format #30573

🧪 Experiment Campaign: daily-issues-report

Workflow file: `.github/workflows/daily-issues-report.md`
Selected dimension: `output_format`
Triggered by: ab-testing-advisor on 2026-05-06


Background

The daily-issues-report workflow generates a comprehensive daily discussion post analyzing up to 1000 repository issues — including clustering by theme, trend charts, and actionable metrics. The report body is substantial, embedding two charts, multiple tables, and detailed recommendations inside a `<details>` collapse block. Experimenting on `output_format` (whether the full detail is hidden behind a collapsible or presented inline, and how metrics are grouped) is high-impact because it directly affects how actionable and readable the output is for the team consuming it each morning.

Hypothesis

Null hypothesis (H0): The output format variant does not affect discussion engagement (reaction count + reply count) compared to the baseline collapsible format.

Alternative hypothesis (H1): The inline format (no `<details>` wrapper, structured `###` sections directly visible) produces ≥20% higher engagement because readers don't need to expand content to see the charts and recommendations.

Experiment Configuration

Add the following `experiments:` block to the workflow frontmatter:

```yaml
experiments:
  output_format:
    variants: [collapsible, inline]
    description: "Test whether hiding report details behind a <details> block vs. presenting them inline affects discussion engagement"
    hypothesis: "H0: no change in discussion engagement score. H1: inline format produces ≥20% higher reactions+replies by making charts and recommendations immediately visible"
    metric: discussion_engagement_score
    secondary_metrics: [output_length_chars, run_duration_ms]
    guardrail_metrics:
      - name: empty_output_rate
        direction: min
        threshold: 0
    min_samples: 30
    owner: "@team-agents"
    weight: [50, 50]
    start_date: "2026-05-07"
    issue: 0
    analysis_type: mann_whitney
    tags: [output, readability, engagement]
```

Variant descriptions:

- `collapsible`: Current behavior — the full report detail (charts, tables, recommendations) is wrapped in a `<details><summary>📊 Full Report Details</summary>` block. Only the 2–3 paragraph summary is immediately visible.
- `inline`: The charts, cluster table, metrics tables, and recommendations sections are rendered directly at the top level under `###` headers with no collapse block, making all content immediately visible on page load.

Workflow Changes Required

The following before/after diff shows the minimal change needed in the Phase 6 report body template:

Before (Phase 6, Discussion Format → Body section):

```markdown
<details>
<summary>📊 Full Report Details</summary>

### 📈 Issue Activity Trends
...
</details>
```

After (using handlebars experiment conditional):

```markdown
{{#if experiments.output_format collapsible}}<details>
<summary>📊 Full Report Details</summary>{{/if}}

### 📈 Issue Activity Trends
...

{{#if experiments.output_format collapsible}}</details>{{/if}}
```

This keeps the two variants structurally identical in content while toggling the visibility wrapper.

Success Metrics

| Metric | Type | Target |
| --- | --- | --- |
| `discussion_engagement_score` (reactions + replies) | Primary | ≥20% lift in inline variant |
| `output_length_chars` | Secondary | Should not diverge >10% between variants |
| `run_duration_ms` | Secondary | No meaningful difference expected |
| `empty_output_rate` | Guardrail | Must remain 0 for both variants |
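The metric definitions above can be sketched in a few lines. This is an illustrative Python sketch, not the gh-aw implementation; the function names and the assumption that reactions and replies arrive as plain integers are mine.

```python
def engagement_score(reactions: int, replies: int) -> int:
    """Primary metric: total reactions plus replies on the daily discussion."""
    return reactions + replies

def length_divergence(len_a: int, len_b: int) -> float:
    """Secondary check: relative divergence of output length between variants
    (should stay at or below 0.10 per the table above)."""
    return abs(len_a - len_b) / max(len_a, len_b)

def lift(control_mean: float, treatment_mean: float) -> float:
    """Relative lift of the treatment over the control (target: >= 0.20)."""
    return (treatment_mean - control_mean) / control_mean
```

For example, a day with 5 reactions and 3 replies scores 8, and a treatment mean of 12 against a control mean of 10 is a 20% lift — exactly the H1 threshold.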

Statistical Design

- Variants: `collapsible` (control), `inline` (treatment)
- Assignment: round-robin via the gh-aw experiments runtime (repo-memory state)
- Minimum runs per variant: 30. Note that detecting a medium effect (d = 0.5) at α = 0.05 with power 0.80 (β = 0.20) nominally requires roughly 64 samples per group for a two-sample test, and slightly more for Mann-Whitney; results at n = 30 should therefore be read as directional rather than conclusive.
- Expected experiment duration: ~60 days (the workflow runs once daily at 6 AM UTC)
- Analysis approach: Mann-Whitney U rank-sum test on engagement scores (non-parametric; suitable since daily engagement is non-normal and counts are small integers)
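The analysis step above can be sketched with a stdlib-only Mann-Whitney U implementation. This uses a normal approximation for the p-value with no tie correction on the variance, so treat it as illustrative rather than a drop-in replacement for `scipy.stats.mannwhitneyu`.

```python
from statistics import NormalDist

def mann_whitney_u(control, treatment):
    """Return (U, two-sided p) for treatment vs. control engagement scores."""
    combined = sorted([(v, "c") for v in control] + [(v, "t") for v in treatment])
    n = len(combined)
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average ranks to runs of tied values
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j
    rank_sum_t = sum(r for r, (_, g) in zip(ranks, combined) if g == "t")
    n_t, n_c = len(treatment), len(control)
    u = rank_sum_t - n_t * (n_t + 1) / 2
    mean_u = n_t * n_c / 2
    sd_u = (n_t * n_c * (n_t + n_c + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    return u, 2 * (1 - NormalDist().cdf(abs(z)))
```

With 30 daily scores per variant the normal approximation is reasonable; for the final write-up, a library implementation with exact/tie-corrected p-values would be preferable.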

Implementation Steps

- Add the `experiments:` section to the frontmatter (replace `issue: 0` with the real issue number after creation)
- Add conditional `<details>` wrapping blocks to the Phase 6 discussion body using `{{#if experiments.output_format collapsible}}` / `{{else}}` / `{{/if}}`
- Run `gh aw compile daily-issues-report` to regenerate the lock file
- Monitor the experiment artifact uploaded per run to `/tmp/gh-aw/experiments/state.json`
- After 60+ days (30 runs per variant), fetch run artifacts and compare the Mann-Whitney U statistic
- Document findings and promote the winning variant by removing the losing branch
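The round-robin assignment persisted in repo-memory state can be sketched as below. The `state.json` schema here is an assumption for illustration; the real picker lives in `actions/setup/js/pick_experiment.cjs` and may store different fields.

```python
import json
from pathlib import Path

def pick_variant(state_path: Path, variants: list[str]) -> str:
    """Alternate deterministically between variants, persisting a run counter.

    Hypothetical sketch: the state file holds {"runs": <int>}, and the
    variant index is the counter modulo the number of variants.
    """
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"runs": 0}
    variant = variants[state["runs"] % len(variants)]
    state["runs"] += 1
    state_path.write_text(json.dumps(state))
    return variant
```

Round-robin (rather than random) assignment guarantees the 50/50 split in the config is met exactly and that each variant reaches the 30-run minimum at the same time.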

Infrastructure Note

✅ Experiment infrastructure is complete. All three advanced fields — `analysis_type`, `tags`, and `notify` — are fully implemented in both `pkg/workflow/compiler_experiments.go` and `actions/setup/js/pick_experiment.cjs`. No sub-issue for infrastructure improvements is needed.

Generated by Daily A/B Testing Advisor