🔧 Experiment Infrastructure Improvements
Parent campaign: #aw_campaign1
Triggered by: ab-testing-advisor on 2026-05-25
Area 1 gate: All three candidate fields (analysis_type, tags, notify) were checked via field-presence-checker. All three are partially implemented — parsed by the Go compiler but never acted on by pick_experiment.cjs at runtime. This sub-issue is therefore warranted.
Area 1: Frontmatter Schema — Complete the Half-Implemented Fields
The field-presence-checker found all three fields parsed on the Go side but dead on the JS runtime side:
| Field | Go compiler | pick_experiment.cjs | Gap |
|---|---|---|---|
| analysis_type | ✅ Parsed → cfg.AnalysisType | ❌ JSDoc-only, never read | Picker never selects or surfaces the declared test type |
| tags | ✅ Parsed → cfg.Tags | ❌ JSDoc-only, never filtered | Tags cannot be used to filter/group experiments in dashboards |
| notify | ✅ Parsed → cfg.Notify{Discussion,Issue} | ❌ JSDoc-only, no alerts dispatched | Significance alerts are never sent |
Proposed pick_experiment.cjs changes
analysis_type — surface in the step summary so downstream analysis scripts know which test to apply:
// In writeSummary(), add:
if (config.analysis_type) {
core.summary.addRaw(`- **Analysis type**: \`${config.analysis_type}\`\n`);
}
tags — write tags into the experiment state artifact so dashboards can filter:
// In writeState(), add to the state object:
tags: config.tags ?? [],
notify — dispatch a notification when min_samples is reached for all variants:
// After variant counts are updated, check if experiment is mature:
if (config.notify && allVariantsReachedMinSamples(state, config)) {
await dispatchNotification(config.notify, experimentName, state);
}
The dispatchNotification function should post a comment to the issue in notify.issue or a discussion reply to notify.discussion, summarising current variant tallies and prompting a human to run analysis.
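The maturity check above is named but not defined in this issue. A minimal sketch of what it could look like follows, assuming a hypothetical state shape (`state.counts` keyed by variant, plus an optional `state.notified` flag) and a hypothetical config shape (`config.variants`, `config.min_samples`); neither shape is confirmed by the current picker code. The `shouldNotify` guard is an added suggestion so that a mature experiment does not trigger a new notification on every subsequent run.

```javascript
// Assumed (hypothetical) shapes, not taken from pick_experiment.cjs:
//   state  = { counts: { [variant]: number }, notified?: boolean }
//   config = { variants: string[], min_samples: number }

/** Returns true once every declared variant has at least min_samples runs. */
function allVariantsReachedMinSamples(state, config) {
  const min = Number(config.min_samples);
  if (!Number.isFinite(min) || min <= 0) return false; // no usable threshold
  const variants = config.variants ?? [];
  if (variants.length === 0) return false;
  return variants.every((v) => (state.counts?.[v] ?? 0) >= min);
}

/** Returns true only the first time the experiment is seen as mature. */
function shouldNotify(state, config) {
  return !state.notified && allVariantsReachedMinSamples(state, config);
}

/** Builds a plain-text body summarising the current variant tallies. */
function formatTallies(experimentName, state, config) {
  const lines = (config.variants ?? []).map(
    (v) => `- ${v}: ${state.counts?.[v] ?? 0} / ${config.min_samples} runs`
  );
  return [
    `Experiment ${experimentName} reached min_samples for all variants.`,
    ...lines,
    'Please run the analysis.',
  ].join('\n');
}
```

If this approach were adopted, the picker would set `state.notified = true` after a successful dispatch and persist it with the rest of the state artifact.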
Area 2: Reporting & Dashboards
Propose a daily-experiment-report workflow that:
- Aggregates run artifacts from the last N days: downloads experiments/state.json from each workflow run and merges variant counters per experiment name.
- Computes running statistics: for proportion metrics — sample size, observed rate, Wilson confidence interval per variant; for continuous metrics — mean, variance, Mann-Whitney U statistic.
- Detects significance: flags when p < 0.05 (two-sample z-test for proportions; Mann-Whitney for continuous). Logs the current leading variant.
- Generates an ASCII comparison table as a workflow step summary artifact:
Experiment: daily-doc-healer / prompt_style (n=18 detailed, n=17 concise)
┌──────────┬──────────────────┬──────────┬──────────┐
│ Variant  │ pr_creation_rate │ tokens   │ p-value  │
├──────────┼──────────────────┼──────────┼──────────┤
│ detailed │ 0.78 [0.55,0.92] │ 12 400   │          │
│ concise  │ 0.76 [0.53,0.91] │ 9 800*   │ 0.031 ✓  │
└──────────┴──────────────────┴──────────┴──────────┘
* statistically significant at p<0.05
- Posts to a discussion with label experiment-results, including the table and the current winning variant.
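The proportion statistics named in the list above (Wilson confidence interval and two-sample z-test) can be sketched in plain JavaScript without dependencies. This is an illustrative sketch only: the function names are hypothetical, the z-test uses the pooled-variance form, and the normal CDF relies on the Abramowitz-Stegun erf approximation, which is accurate to roughly 1e-7. The Mann-Whitney test for continuous metrics is not covered here.

```javascript
/** Standard normal CDF via the Abramowitz-Stegun erf approximation (7.1.26). */
function normalCdf(z) {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
    t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

/** Wilson score interval for a binomial proportion (z = 1.96 gives ~95%). */
function wilsonInterval(successes, n, z = 1.96) {
  if (n === 0) return { rate: NaN, low: 0, high: 1 };
  const p = successes / n;
  const denom = 1 + (z * z) / n;
  const center = (p + (z * z) / (2 * n)) / denom;
  const half =
    (z * Math.sqrt((p * (1 - p)) / n + (z * z) / (4 * n * n))) / denom;
  return { rate: p, low: center - half, high: center + half };
}

/** Two-sided, pooled two-proportion z-test. */
function twoProportionZTest(x1, n1, x2, n2) {
  const pooled = (x1 + x2) / (n1 + n2);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2));
  if (se === 0) return { z: 0, pValue: 1 }; // all successes or all failures
  const z = (x1 / n1 - x2 / n2) / se;
  return { z, pValue: 2 * (1 - normalCdf(Math.abs(z))) };
}
```

For example, `wilsonInterval(14, 18)` corresponds to a variant with 14 PR-creating runs out of 18. With samples this small the normal approximation behind the z-test is rough, so the report may want to state the sample sizes next to any significance flag.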
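Implementation checklist:
- Create .github/workflows/daily-experiment-report.md
- Download state artifacts with gh run download --name experiments-state
- Read the analysis_type field so the report uses the declared test (once Area 1 is implemented)
- Post the report via safeoutputs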
Area 3: Audit & OTEL Integration
OTEL span attributes — in pick_experiment.cjs, after a variant is assigned, emit span attributes:
core.exportVariable('OTEL_RESOURCE_ATTRIBUTES',
`experiment.name=${experimentName},experiment.variant=${chosenVariant}`);
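Because core.exportVariable only affects steps that run after the picker, this attaches the experiment context as resource attributes to every OTEL span emitted by subsequent steps, enabling trace filtering by variant in Grafana/Jaeger.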
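One caveat with the call above is that it replaces any value already present in OTEL_RESOURCE_ATTRIBUTES (for example a service.name set earlier in the job). A minimal sketch of a merge helper follows; `mergeResourceAttributes` is a hypothetical name, not an existing function in the picker. It keeps existing pairs, lets the new keys win on conflict, and percent-encodes new values, since the OpenTelemetry environment-variable format treats `,` and `=` as delimiters.

```javascript
/**
 * Merges new key/value pairs into an existing OTEL_RESOURCE_ATTRIBUTES string.
 * Existing pairs are kept verbatim; new values are percent-encoded.
 */
function mergeResourceAttributes(existing, additions) {
  const pairs = new Map();
  for (const part of (existing ?? '').split(',')) {
    const idx = part.indexOf('=');
    if (idx > 0) {
      pairs.set(part.slice(0, idx).trim(), part.slice(idx + 1).trim());
    }
  }
  for (const [key, value] of Object.entries(additions)) {
    pairs.set(key, encodeURIComponent(String(value)));
  }
  return [...pairs].map(([k, v]) => `${k}=${v}`).join(',');
}
```

In the picker this could be used as `core.exportVariable('OTEL_RESOURCE_ATTRIBUTES', mergeResourceAttributes(process.env.OTEL_RESOURCE_ATTRIBUTES, { 'experiment.name': experimentName, 'experiment.variant': chosenVariant }))`.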
gh aw audit surface — add experiment assignment to the audit log entry for each run:
run #12345 workflow=daily-doc-healer variant=concise experiment=prompt_style
Filterable via gh aw audit --experiment prompt_style --variant concise.
Step summary — pick_experiment.cjs should already append to $GITHUB_STEP_SUMMARY; ensure it includes:
| Field | Value |
|---|---|
| Experiment | `prompt_style` |
| Assigned variant | `concise` |
| Analysis type | `proportion_test` |
| Run count (this variant) | 7 / 30 min_samples |
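The table above could be produced by a small pure function and then passed to core.summary.addRaw, which the picker already uses for the step summary. The sketch below is illustrative: `renderExperimentSummary` and its parameter names are hypothetical, and the Analysis type row is omitted when the field is not declared.

```javascript
/** Renders the experiment assignment as a markdown table string. */
function renderExperimentSummary({ experiment, variant, analysisType, runCount, minSamples }) {
  const rows = [
    ['Experiment', `\`${experiment}\``],
    ['Assigned variant', `\`${variant}\``],
  ];
  // analysis_type is optional in the frontmatter, so only show it when set.
  if (analysisType) rows.push(['Analysis type', `\`${analysisType}\``]);
  rows.push(['Run count (this variant)', `${runCount} / ${minSamples} min_samples`]);
  const body = rows.map(([field, value]) => `| ${field} | ${value} |`).join('\n');
  return `| Field | Value |\n|---|---|\n${body}\n`;
}
```

Keeping the rendering separate from the core.summary call makes the output easy to unit test without a GitHub Actions runtime.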
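Implementation checklist:
- Export OTEL_RESOURCE_ATTRIBUTES in pick_experiment.cjs after variant selection
- Update gh aw audit to read and display experiment metadata from run annotations
- Add the step summary table to pick_experiment.cjs
- Document the --experiment / --variant filter flags in gh aw audit --help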
References
- Compiler: pkg/workflow/compiler_experiments.go
- Picker: actions/setup/js/pick_experiment.cjs
- Parent campaign: #aw_campaign1
- A/B Testing in gh-aw
Generated by 🧪 Daily A/B Testing Advisor · sonnet46 1.7M · ◷