[ab-advisor] Improve experiment infrastructure: schema, reporting & audit

### 🔧 Experiment Infrastructure Improvements

**Parent campaign**: #aw_campaign1  
**Triggered by**: `ab-testing-advisor` on 2026-05-25

> **Area 1 gate**: All three candidate fields (`analysis_type`, `tags`, `notify`) were checked via `field-presence-checker`. All three are **partially implemented** — parsed by the Go compiler but never acted on by `pick_experiment.cjs` at runtime. This sub-issue is therefore warranted.

---

### Area 1: Frontmatter Schema — Complete the Half-Implemented Fields

The `field-presence-checker` found all three fields parsed on the Go side but dead on the JS runtime side:

| Field | Go compiler | `pick_experiment.cjs` | Gap |
|---|---|---|---|
| `analysis_type` | ✅ Parsed → `cfg.AnalysisType` | ❌ JSDoc-only, never read | Picker never selects or surfaces the declared test type |
| `tags` | ✅ Parsed → `cfg.Tags` | ❌ JSDoc-only, never filtered | Tags cannot be used to filter/group experiments in dashboards |
| `notify` | ✅ Parsed → `cfg.Notify{Discussion,Issue}` | ❌ JSDoc-only, no alerts dispatched | Significance alerts are never sent |

<details><summary>Proposed pick_experiment.cjs changes</summary>

**`analysis_type`** — surface in the step summary so downstream analysis scripts know which test to apply:
```js
// In writeSummary(), add:
if (config.analysis_type) {
  core.summary.addRaw(`- **Analysis type**: \`${config.analysis_type}\`\n`);
}
```

**`tags`** — write tags into the experiment state artifact so dashboards can filter:
```js
// In writeState(), add a `tags` field to the state object (other fields elided):
const state = { /* ...existing fields... */ tags: config.tags ?? [] };
```

**`notify`** — dispatch a notification when `min_samples` is reached for all variants:
```js
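// After variant counts are updated, notify once when the experiment first matures.
// `state.notified` is a hypothetical flag persisted with the state so later runs do not re-alert.
if (config.notify && !state.notified && allVariantsReachedMinSamples(state, config)) {
  await dispatchNotification(config.notify, experimentName, state);
  state.notified = true;
}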
```
The `dispatchNotification` function should post a comment to the issue in `notify.issue` or a discussion reply to `notify.discussion`, summarising current variant tallies and prompting a human to run analysis.

</details>
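
`allVariantsReachedMinSamples` is not defined anywhere in the proposal. Below is a minimal sketch of it, plus a hypothetical `buildNotificationBody` that formats the variant tallies for `dispatchNotification` to post. Both assume a state object of the form `{ counts: { <variant>: <runs> } }` and a config carrying `variants` and `min_samples`. These shapes are assumptions, not the actual `pick_experiment.cjs` types.

```javascript
// Assumed shapes (hypothetical, not the actual pick_experiment.cjs types):
//   state  = { counts: { [variant]: number } }   // runs assigned per variant
//   config = { variants: string[], min_samples: number }

// True only when every declared variant has at least `min_samples` runs.
function allVariantsReachedMinSamples(state, config) {
  const counts = (state && state.counts) || {};
  const variants = Array.isArray(config.variants) ? config.variants : [];
  return (
    variants.length > 0 &&
    variants.every((v) => (counts[v] ?? 0) >= config.min_samples)
  );
}

// Hypothetical helper: Markdown body summarising the tallies for the alert.
function buildNotificationBody(experimentName, state, config) {
  const counts = (state && state.counts) || {};
  const rows = config.variants.map((v) => `| \`${v}\` | ${counts[v] ?? 0} |`);
  return [
    `Experiment \`${experimentName}\` has reached ${config.min_samples} runs per variant.`,
    '',
    '| Variant | Runs |',
    '|---|---|',
    ...rows,
    '',
    'Please run the analysis before declaring a winner.',
  ].join('\n');
}
```

For `notify.issue`, the body could be posted with Octokit's `github.rest.issues.createComment({ owner, repo, issue_number, body })`. Repository discussions are exposed through the GraphQL API instead, so `notify.discussion` would need the `addDiscussionComment` mutation, which takes the discussion's node ID.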

---

### Area 2: Reporting & Dashboards

Propose a `daily-experiment-report` workflow that:

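1. **Aggregates** run artifacts from the last N days: downloads `experiments/state.json` from each workflow run and merges variant counters per experiment name. For continuous metrics, the raw per-run values must be carried along too, because the Mann-Whitney U statistic is computed from the ranks of individual observations, not from counters.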
2. **Computes running statistics**: for proportion metrics — sample size, observed rate, Wilson confidence interval per variant; for continuous metrics — mean, variance, Mann-Whitney U statistic.
3. **Detects significance**: flags when p < 0.05 (two-sample z-test for proportions; Mann-Whitney for continuous). Logs the current leading variant.
4. **Generates an ASCII comparison table** for the workflow step summary:

```
Experiment: daily-doc-healer / prompt_style  (n=18 detailed, n=17 concise)
┌──────────┬──────────────────┬──────────┬──────────┐
│ Variant  │ pr_creation_rate │  tokens  │  p-value │
├──────────┼──────────────────┼──────────┼──────────┤
│ detailed │  0.78 [0.55,0.92]│  12 400  │          │
│ concise  │  0.76 [0.53,0.91]│   9 800* │  0.031 ✓ │
└──────────┴──────────────────┴──────────┴──────────┘
* statistically significant at p<0.05
```

5. **Posts to a discussion** with label `experiment-results`, including the table and the current winning variant.
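
Steps 2 and 3 name standard tests without giving formulas. The sketch below implements them in plain, dependency-free JavaScript: the Wilson score interval, a pooled two-proportion z-test, and a Mann-Whitney U test with tie correction. The function names are illustrative, not part of gh-aw. All three rely on normal approximations, with the normal CDF taken from the Abramowitz & Stegun erf approximation. These approximations are rough at the roughly 17 to 18 runs per variant in the example, so an exact test (such as Fisher's exact test for proportions) would be more conservative at that size.

```javascript
// Illustrative statistics helpers for the report script (not gh-aw code).

// Standard normal CDF using the Abramowitz & Stegun 7.1.26 erf approximation (|error| < 1.5e-7).
function normalCdf(x) {
  const t = 1 / (1 + (0.3275911 * Math.abs(x)) / Math.SQRT2);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-(x * x) / 2); // erf(|x| / sqrt(2))
  return x >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Wilson score interval for `successes` out of `n` (z = 1.96 gives ~95%).
function wilsonInterval(successes, n, z = 1.96) {
  if (n === 0) return [0, 1];
  const p = successes / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z / denom) * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n));
  return [Math.max(0, center - half), Math.min(1, center + half)];
}

// Two-sided pooled two-proportion z-test for x1/n1 vs x2/n2.
function twoProportionZTest(x1, n1, x2, n2) {
  const pooled = (x1 + x2) / (n1 + n2);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2));
  if (se === 0) return { z: 0, p: 1 };
  const z = (x1 / n1 - x2 / n2) / se;
  return { z, p: 2 * (1 - normalCdf(Math.abs(z))) };
}

// Two-sided Mann-Whitney U test for samples a and b (normal approximation, tie-corrected).
function mannWhitneyU(a, b) {
  const n1 = a.length;
  const n2 = b.length;
  if (n1 === 0 || n2 === 0) return { u: 0, z: 0, p: 1 };
  const pooled = [...a.map((v) => ({ v, fromA: true })), ...b.map((v) => ({ v, fromA: false }))];
  pooled.sort((x, y) => x.v - y.v);
  let rankSumA = 0;
  let tieTerm = 0; // sum of (t^3 - t) over groups of tied values
  for (let i = 0; i < pooled.length; ) {
    let j = i;
    while (j + 1 < pooled.length && pooled[j + 1].v === pooled[i].v) j++;
    const avgRank = (i + j) / 2 + 1; // 1-based average rank of the tie group
    const t = j - i + 1;
    tieTerm += t ** 3 - t;
    for (let k = i; k <= j; k++) if (pooled[k].fromA) rankSumA += avgRank;
    i = j + 1;
  }
  const n = n1 + n2;
  const u = rankSumA - (n1 * (n1 + 1)) / 2;
  const variance = ((n1 * n2) / 12) * (n + 1 - tieTerm / (n * (n - 1)));
  if (variance <= 0) return { u, z: 0, p: 1 };
  const z = (u - (n1 * n2) / 2) / Math.sqrt(variance);
  return { u, z, p: 2 * (1 - normalCdf(Math.abs(z))) };
}
```

A report row for a proportion metric would then combine `wilsonInterval(successes, runs)` per variant with `twoProportionZTest` against the reference variant.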

**Implementation checklist**:
- [ ] Create `.github/workflows/daily-experiment-report.md`
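- [ ] Script: list the runs from the last N days with `gh run list --workflow <workflow> --json databaseId,createdAt`, then download each run's state with `gh run download <run-id> --name experiments-state --dir <dir>` (without a `<run-id>`, `gh run download` fetches only the latest matching artifact)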
- [ ] Script: merge JSON state files and compute statistics (Python or Node)
- [ ] Wire `analysis_type` field so the report uses the declared test (once Area 1 is implemented)
- [ ] Post result to discussion via `safeoutputs`
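
The merge step in the checklist can be sketched as a pure function over the parsed `state.json` files. The file shape below, including the per-run `values` arrays that Mann-Whitney needs, is an assumption, and so is the name `mergeStates`. Summing is only correct if each artifact records that run's own assignments. If the picker instead persists cumulative totals across runs, the report should take the most recent file per experiment rather than adding the files together.

```javascript
// Assumed state.json shape (hypothetical; the real artifact may differ):
// { "experiments": { "<name>": { "counts": { "<variant>": 3 },
//                                "values": { "<variant>": [1.5, 2.0] } } } }

// Merge per-run state files: add counters and concatenate raw metric values per variant.
function mergeStates(states) {
  const merged = {};
  for (const state of states) {
    for (const [name, exp] of Object.entries(state.experiments ?? {})) {
      const target = (merged[name] ??= { counts: {}, values: {} });
      for (const [variant, count] of Object.entries(exp.counts ?? {})) {
        target.counts[variant] = (target.counts[variant] ?? 0) + count;
      }
      for (const [variant, values] of Object.entries(exp.values ?? {})) {
        (target.values[variant] ??= []).push(...values);
      }
    }
  }
  return merged;
}
```

Each downloaded file could be loaded with `JSON.parse(fs.readFileSync(path, 'utf8'))` before being passed in.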

---

### Area 3: Audit & OTEL Integration

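**OTEL resource attributes** — in `pick_experiment.cjs`, after a variant is assigned, export the experiment context as OTEL resource attributes, appending to any existing value rather than overwriting it:
```js
// Percent-encode values (',' and '=' would otherwise break parsing) and keep existing attributes.
const attrs = `experiment.name=${encodeURIComponent(experimentName)},experiment.variant=${encodeURIComponent(chosenVariant)}`;
const prev = process.env.OTEL_RESOURCE_ATTRIBUTES;
core.exportVariable('OTEL_RESOURCE_ATTRIBUTES', prev ? `${prev},${attrs}` : attrs);
```
Because `core.exportVariable` writes to `$GITHUB_ENV`, the value reaches the later steps of the job. Spans emitted by those steps, from any OpenTelemetry SDK that reads the standard variable, then carry the experiment context, enabling trace filtering by variant in Grafana/Jaeger.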
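
`OTEL_RESOURCE_ATTRIBUTES` is a comma-separated list of `key=value` pairs. Overwriting it discards attributes set elsewhere, and plain appending repeats keys if the picker runs twice in one job. The sketch below is a hypothetical helper, `mergeResourceAttributes`, that replaces earlier values for the same keys. It also percent-encodes new keys and values, since the OpenTelemetry environment-variable spec requires `,` and `=` inside them to be encoded.

```javascript
// Hypothetical helper: merge attributes into an OTEL_RESOURCE_ATTRIBUTES-style string.
// Existing pairs are kept as-is (assumed already encoded); new keys/values are percent-encoded
// and replace any earlier value for the same key while keeping its position.
function mergeResourceAttributes(existing, attrs) {
  const pairs = new Map();
  for (const entry of (existing ?? '').split(',')) {
    const eq = entry.indexOf('=');
    if (eq > 0) pairs.set(entry.slice(0, eq).trim(), entry.slice(eq + 1).trim());
  }
  for (const [key, value] of Object.entries(attrs)) {
    pairs.set(encodeURIComponent(key), encodeURIComponent(String(value)));
  }
  return [...pairs].map(([k, v]) => `${k}=${v}`).join(',');
}
```

In the picker, this could be used as `core.exportVariable('OTEL_RESOURCE_ATTRIBUTES', mergeResourceAttributes(process.env.OTEL_RESOURCE_ATTRIBUTES, { 'experiment.name': experimentName, 'experiment.variant': chosenVariant }))`.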

**`gh aw audit` surface** — add experiment assignment to the audit log entry for each run:
```
run #12345  workflow=daily-doc-healer  variant=concise  experiment=prompt_style
```
With the proposed `--experiment` and `--variant` flags, these entries would be filterable via `gh aw audit --experiment prompt_style --variant concise`.
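
The proposed flags imply a simple AND filter over audit entries, with either flag optional. The semantics are sketched here in JavaScript for consistency with the other snippets; the `gh aw` CLI itself is presumably implemented in Go alongside the compiler. The sketch assumes the whitespace-separated `key=value` line format shown above, and the helpers `parseAuditLine` and `filterAuditEntries` are hypothetical.

```javascript
// Parse "run #12345  workflow=...  variant=...  experiment=..." into an object (assumed format).
function parseAuditLine(line) {
  const match = /^run #(\d+)\s+(.*)$/.exec(line.trim());
  if (!match) return null;
  const entry = { run: Number(match[1]) };
  for (const token of match[2].split(/\s+/)) {
    const eq = token.indexOf('=');
    if (eq > 0) entry[token.slice(0, eq)] = token.slice(eq + 1);
  }
  return entry;
}

// Keep entries matching every filter that was supplied; omitted filters match anything.
function filterAuditEntries(lines, { experiment, variant } = {}) {
  return lines
    .map(parseAuditLine)
    .filter((e) => e !== null)
    .filter((e) => experiment === undefined || e.experiment === experiment)
    .filter((e) => variant === undefined || e.variant === variant);
}
```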

**Step summary** — `pick_experiment.cjs` should already append to `$GITHUB_STEP_SUMMARY`; ensure it includes:
```markdown
| Field | Value |
|---|---|
| Experiment | `prompt_style` |
| Assigned variant | `concise` |
| Analysis type | `proportion_test` |
| Run count (this variant) | 7 / 30 min_samples |
```
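
The table above can be rendered by a small function that returns a string, which keeps it testable. The result can then be written with `core.summary.addRaw(table).write()` from `@actions/core`. The sketch assumes the picker already knows the assigned variant's run count and `min_samples`; `buildSummaryTable` and its parameter names are hypothetical.

```javascript
// Hypothetical helper: Markdown table for $GITHUB_STEP_SUMMARY describing the assignment.
function buildSummaryTable({ experiment, variant, analysisType, runCount, minSamples }) {
  const rows = [
    ['Experiment', `\`${experiment}\``],
    ['Assigned variant', `\`${variant}\``],
  ];
  // analysis_type is optional in frontmatter, so only show it when declared.
  if (analysisType) rows.push(['Analysis type', `\`${analysisType}\``]);
  rows.push(['Run count (this variant)', `${runCount} / ${minSamples} min_samples`]);
  return ['| Field | Value |', '|---|---|', ...rows.map(([f, v]) => `| ${f} | ${v} |`)].join('\n') + '\n';
}
```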

**Implementation checklist**:
- [ ] Emit `OTEL_RESOURCE_ATTRIBUTES` in `pick_experiment.cjs` after variant selection
- [ ] Update `gh aw audit` to read and display experiment metadata from run annotations
- [ ] Add experiment fields to step summary output in `pick_experiment.cjs`
- [ ] Document the `--experiment` / `--variant` filter flags in `gh aw audit --help`

---

### References

- Compiler: `pkg/workflow/compiler_experiments.go`
- Picker: `actions/setup/js/pick_experiment.cjs`
- Parent campaign: #aw_campaign1
- [A/B Testing in gh-aw](https://github.com/github/gh-aw/blob/main/.github/aw/github-agentic-workflows.md)







> Generated by [🧪 Daily A/B Testing Advisor](https://github.com/github/gh-aw/actions/runs/26398563077) · sonnet46 1.7M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fab-testing-advisor%22&type=issues)
> - [x] expires  on Jun 8, 2026, 11:46 AM UTC
