
[ab-advisor] Experiment campaign for daily-issues-report: A/B test output_format #30573

🧪 Experiment Campaign: daily-issues-report

Workflow file: `.github/workflows/daily-issues-report.md`
Selected dimension: `output_format`
Triggered by: ab-testing-advisor on 2026-05-06


Background

The daily-issues-report workflow generates a comprehensive daily discussion post analyzing up to 1000 repository issues — including clustering by theme, trend charts, and actionable metrics. The report body is substantial, embedding two charts, multiple tables, and detailed recommendations inside a `<details>` collapse block. Experimenting on `output_format` (whether the full detail is hidden behind a collapsible or presented inline, and how metrics are grouped) is high-impact because it directly affects how actionable and readable the output is for the team consuming it each morning.

Hypothesis

Null hypothesis (H0): The output format variant does not affect discussion engagement (reaction count + reply count) compared to the baseline collapsible format.

Alternative hypothesis (H1): The inline format (no `<details>` wrapper, structured `###` sections directly visible) produces ≥20% higher engagement because readers don't need to expand content to see the charts and recommendations.

Experiment Configuration

Add the following `experiments:` block to the workflow frontmatter:

```yaml
experiments:
  output_format:
    variants: [collapsible, inline]
    description: "Test whether hiding report details behind a <details> block vs. presenting them inline affects discussion engagement"
    hypothesis: "H0: no change in discussion engagement score. H1: inline format produces ≥20% higher reactions+replies by making charts and recommendations immediately visible"
    metric: discussion_engagement_score
    secondary_metrics: [output_length_chars, run_duration_ms]
    guardrail_metrics:
      - name: empty_output_rate
        direction: min
        threshold: 0
    min_samples: 30
    owner: "@team-agents"
    weight: [50, 50]
    start_date: "2026-05-07"
    issue: 0
    analysis_type: mann_whitney
    tags: [output, readability, engagement]
```

Variant descriptions:

- `collapsible`: Current behavior — the full report detail (charts, tables, recommendations) is wrapped in a `<details><summary>📊 Full Report Details</summary>` block. Only the 2–3 paragraph summary is immediately visible.
- `inline`: The charts, cluster table, metrics tables, and recommendations sections are rendered directly at the top level under `###` headers with no collapse block, making all content immediately visible on page load.

Workflow Changes Required

The following before/after diff shows the minimal change needed in the Phase 6 report body template:

Before (Phase 6, Discussion Format → Body section):

```markdown
<details>
<summary>📊 Full Report Details</summary>

### 📈 Issue Activity Trends
...
</details>
```

After (using handlebars experiment conditional):

```markdown
{{#if experiments.output_format collapsible}}<details>
<summary>📊 Full Report Details</summary>{{/if}}

### 📈 Issue Activity Trends
...

{{#if experiments.output_format collapsible}}</details>{{/if}}
```

This keeps the two variants structurally identical in content while toggling the visibility wrapper.

Success Metrics

| Metric | Type | Target |
| --- | --- | --- |
| `discussion_engagement_score` (reactions + replies) | Primary | ≥20% lift in inline variant |
| `output_length_chars` | Secondary | Should not diverge >10% between variants |
| `run_duration_ms` | Secondary | No meaningful difference expected |
| `empty_output_rate` | Guardrail | Must remain 0 for both variants |
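The metric definitions above can be sketched in a few lines. This is an illustrative Python sketch, not the gh-aw implementation; the function names and the assumption that reactions and replies arrive as plain integers are mine.

```python
def engagement_score(reactions: int, replies: int) -> int:
    """Primary metric: total reactions plus replies on the daily discussion."""
    return reactions + replies

def length_divergence(len_a: int, len_b: int) -> float:
    """Secondary check: relative divergence of output length between variants
    (should stay at or below 0.10 per the table above)."""
    return abs(len_a - len_b) / max(len_a, len_b)

def lift(control_mean: float, treatment_mean: float) -> float:
    """Relative lift of the treatment over the control (target: >= 0.20)."""
    return (treatment_mean - control_mean) / control_mean
```

For example, a day with 5 reactions and 3 replies scores 8, and a treatment mean of 12 against a control mean of 10 is a 20% lift — exactly the H1 threshold.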

Statistical Design

- Variants: `collapsible` (control), `inline` (treatment)
- Assignment: round-robin via the gh-aw experiments runtime (repo-memory state)
- Minimum runs per variant: 30. Note that detecting a medium effect (d = 0.5) at α = 0.05 with power 0.80 (β = 0.20) nominally requires roughly 64 samples per group for a two-sample test, and slightly more for Mann-Whitney; results at n = 30 should therefore be read as directional rather than conclusive.
- Expected experiment duration: ~60 days (the workflow runs once daily at 6 AM UTC)
- Analysis approach: Mann-Whitney U rank-sum test on engagement scores (non-parametric; suitable since daily engagement is non-normal and counts are small integers)
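The analysis step above can be sketched with a stdlib-only Mann-Whitney U implementation. This uses a normal approximation for the p-value with no tie correction on the variance, so treat it as illustrative rather than a drop-in replacement for `scipy.stats.mannwhitneyu`.

```python
from statistics import NormalDist

def mann_whitney_u(control, treatment):
    """Return (U, two-sided p) for treatment vs. control engagement scores."""
    combined = sorted([(v, "c") for v in control] + [(v, "t") for v in treatment])
    n = len(combined)
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average ranks to runs of tied values
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j
    rank_sum_t = sum(r for r, (_, g) in zip(ranks, combined) if g == "t")
    n_t, n_c = len(treatment), len(control)
    u = rank_sum_t - n_t * (n_t + 1) / 2
    mean_u = n_t * n_c / 2
    sd_u = (n_t * n_c * (n_t + n_c + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    return u, 2 * (1 - NormalDist().cdf(abs(z)))
```

With 30 daily scores per variant the normal approximation is reasonable; for the final write-up, a library implementation with exact/tie-corrected p-values would be preferable.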

Implementation Steps

- Add the `experiments:` section to the frontmatter (replace `issue: 0` with the real issue number after creation)
- Add conditional `<details>` wrapping blocks to the Phase 6 discussion body using `{{#if experiments.output_format collapsible}}` / `{{else}}` / `{{/if}}`
- Run `gh aw compile daily-issues-report` to regenerate the lock file
- Monitor the experiment artifact uploaded per run to `/tmp/gh-aw/experiments/state.json`
- After 60+ days (30 runs per variant), fetch run artifacts and compare the Mann-Whitney U statistic
- Document findings and promote the winning variant by removing the losing branch
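The round-robin assignment persisted in repo-memory state can be sketched as below. The `state.json` schema here is an assumption for illustration; the real picker lives in `actions/setup/js/pick_experiment.cjs` and may store different fields.

```python
import json
from pathlib import Path

def pick_variant(state_path: Path, variants: list[str]) -> str:
    """Alternate deterministically between variants, persisting a run counter.

    Hypothetical sketch: the state file holds {"runs": <int>}, and the
    variant index is the counter modulo the number of variants.
    """
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"runs": 0}
    variant = variants[state["runs"] % len(variants)]
    state["runs"] += 1
    state_path.write_text(json.dumps(state))
    return variant
```

Round-robin (rather than random) assignment guarantees the 50/50 split in the config is met exactly and that each variant reaches the 30-run minimum at the same time.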

Infrastructure Note

✅ Experiment infrastructure is complete. All three advanced fields — `analysis_type`, `tags`, and `notify` — are fully implemented in both `pkg/workflow/compiler_experiments.go` and `actions/setup/js/pick_experiment.cjs`. No sub-issue for infrastructure improvements is needed.

Generated by Daily A/B Testing Advisor