Conversation
…dit enhancements - Extend ExperimentConfig with Hypothesis, SecondaryMetrics, GuardrailMetrics, MinSamples, Owner fields - Add GuardrailMetric struct for threshold-based experiment guardrails - Parse new fields in extractOneExperimentConfig with extractStringSlice/extractGuardrailMetrics helpers - Fix TypeScript type error in ExperimentConfig JSDoc typedef (pre-existing issue) - Enhance writeSummary() with hypothesis display, guardrail metrics, progress bars, and ready-for-analysis flag - Add OTEL span attribute injection for experiment assignments to shared/observability-otlp.md - Enhance daily-experiment-report.md with min_samples-aware significance detection, guardrail evaluation, tracking issue notifications (Steps 8 & 9), and lifecycle label management - Add tests for new Go fields and JavaScript summary features Agent-Logs-Url: https://github.com/github/gh-aw/sessions/818f45d8-b494-4013-bbe3-6ae5f558672c Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
@copilot update experiments docs |
🧪 Test Quality Sentinel ReportTest Quality Score: 76/100
Test Classification DetailsView all 5 tests
Flagged Tests — Suggestions for Improvement
|
There was a problem hiding this comment.
✅ Test Quality Sentinel: 76/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%). All 5 new tests verify observable behavioral contracts. Main opportunity: add negative/edge-case tests for the new hypothesis, guardrail_metrics, and min_samples fields.
…_samples, hypothesis fields - Show bare-array vs object form side-by-side - Document all new fields: hypothesis, secondary_metrics, guardrail_metrics, min_samples, owner, weight, start_date, end_date, issue - Add step summary example showing progress bars and guardrail thresholds - Add audit filtering section (--experiment / --variant flags) - Update frontmatter reference table with full object-form field list Agent-Logs-Url: https://github.com/github/gh-aw/sessions/37750d81-ae35-4ca7-9a55-d94e37ce564e Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Extends the experiment (A/B) metadata schema and reporting/telemetry surface area to support richer experiment definitions (hypothesis/guardrails/min_samples/owner), improved step summaries, tracking-issue automation, and OTEL span attributes for variant filtering.
Changes:
- Extend
ExperimentConfigschema to include hypothesis, secondary metrics, guardrail metrics, min sample gating, and owner metadata. - Enhance
pick_experiment.cjsstep summary with sampling progress/ready-for-analysis, plus hypothesis and guardrail rendering (with added tests). - Update experiment-reporting workflow docs and OTEL documentation; regenerate related lock workflows.
Show a summary per file
| File | Description |
|---|---|
| pkg/workflow/frontmatter_types.go | Adds new experiment schema fields and GuardrailMetric type. |
| pkg/workflow/compiler_experiments.go | Parses new experiment frontmatter fields; adds helpers for string slices and guardrail metrics. |
| pkg/workflow/compiler_experiments_test.go | Adds test coverage for the newly parsed experiment metadata fields. |
| actions/setup/js/pick_experiment.cjs | Updates JS typedefs and step summary rendering (hypothesis/guardrails/min_samples progress). |
| actions/setup/js/pick_experiment.test.cjs | Adds step-summary tests for hypothesis, guardrails, and min_samples progress/ready state. |
| .github/workflows/shared/observability-otlp.md | Documents emitting OTEL span attributes from experiment assignments. |
| .github/workflows/daily-fact.lock.yml | Regenerated lockfile incorporating the updated observability prompt content. |
| .github/workflows/daily-experiment-report.md | Expands daily report spec to include guardrails, min_samples gating, and tracking-issue notifications/labels. |
| .github/workflows/daily-experiment-report.lock.yml | Regenerated lockfile reflecting safe-output additions and permissions changes. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 9/9 changed files
- Comments generated: 3
| const minSamples = cfg.min_samples || 0; | ||
| const variants = cfg.variants || []; | ||
| const counts = state.counts[name] || {}; | ||
| const allReady = minSamples > 0 && variants.every(v => (counts[v] || 0) >= minSamples); | ||
| if (allReady) { |
| add-comment: | ||
| max: 10 | ||
| add-labels: | ||
| max: 10 |
| name, _ := m["name"].(string) | ||
| threshold, _ := m["threshold"].(string) | ||
| if name == "" { | ||
| continue | ||
| } |
Updated Changes:
|
|
@copilot review all comments |
…lidation, and label lifecycle - pick_experiment.cjs: guard min_samples progress bar against 0/negative values using Number.isInteger + > 0 check; eliminates potential RangeError from repeat(negative) - compiler_experiments.go: skip guardrail entries with empty threshold (in addition to empty name) so invalid guardrail configs don't propagate silently - daily-experiment-report.md: clarify Step 9 labels are additive only (remove the "Remove when" column that implied remove-labels was supported — only add-labels is configured in safe-outputs) - Tests: add Go test case for guardrail entry with empty/missing threshold; add two JS tests for min_samples: 0 and min_samples: -5 (progress bar should not be shown) Agent-Logs-Url: https://github.com/github/gh-aw/sessions/a5b8e895-a9f7-4c16-9a6d-0cf7a286cb93 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/a5b8e895-a9f7-4c16-9a6d-0cf7a286cb93 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Addressed all three review comments in f76262a + c5f0227:
Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
|
@copilot update main json schema |
…samples/owner to experiments schema Extends the `experiments` object-form properties in `main_workflow_schema.json` with: - `hypothesis` (string) — hypothesis statement surfaced in step summaries/reports - `secondary_metrics` (string[]) — additional metrics to collect alongside the primary metric - `guardrail_metrics` (object[]) — each entry requires `name` + `threshold` (pattern-validated); entries without a threshold are rejected at schema level - `min_samples` (integer, minimum 1) — minimum runs per variant before PROMOTE is eligible - `owner` (string) — owner/team label for experiment tracking Also updates the top-level description and the example to demonstrate all fields. Rebuilds the binary to embed the updated schema. Agent-Logs-Url: https://github.com/github/gh-aw/sessions/848f602b-d364-43bf-9f1c-29c3bf57d4b7 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- threshold pattern changed from \\S+ to -?\\d+(\\.\\d+)? so only numeric values are accepted (e.g. >=0.95, ==0, <=0.05); rejects double-operator forms like >=>=0.95 - min_samples gets default: 20 to make the schema self-documenting Agent-Logs-Url: https://github.com/github/gh-aw/sessions/848f602b-d364-43bf-9f1c-29c3bf57d4b7 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Updated
|
|
@copilot add integration test workflows in pkg/cli/workflows to parse experiments |
…ing for min_samples/issue/weight - Fix compiler_experiments.go: add uint64 case to switch blocks for min_samples, issue, and weight items (goccy/go-yaml returns YAML integers as uint64, not int/int64/float64) - Add unit test case: "uint64 integer values for min_samples and issue are parsed correctly" in TestExtractExperimentConfigsFromFrontmatter - Add pkg/cli/workflows/test-experiments-bare-array.md: bare-array form with two experiments (prompt_style, model_temp) - Add pkg/cli/workflows/test-experiments-rich-schema.md: full object form with all new fields (hypothesis, secondary_metrics, guardrail_metrics, min_samples, owner, weight, issue, dates) - Add pkg/cli/compile_experiments_integration_test.go with two tests: TestCompileExperimentsBareArrayForm — checks that both experiments appear in GH_AW_EXPERIMENT_SPEC and pick_experiment.cjs is wired; TestCompileExperimentsRichSchema — parses the spec JSON and asserts all new fields round-trip correctly through the compiler Agent-Logs-Url: https://github.com/github/gh-aw/sessions/cd061a98-3485-49e6-aab5-d40a5af3ca66 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/cd061a98-3485-49e6-aab5-d40a5af3ca66 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Workflow fixtures (
Integration tests (
While writing the tests I discovered a bug: Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
Extends the A/B experiment infrastructure with richer schema fields,
min_samples-gated significance detection, guardrail metric enforcement, tracking issue lifecycle automation, and OTEL span attributes for variant filtering in APM tools.Schema (
ExperimentConfig)New fields in
frontmatter_types.go(parsed incompiler_experiments.go):New Go types:
GuardrailMetric{Name, Threshold}+extractStringSlice/extractGuardrailMetricshelpers. Guardrail entries with a missing or emptythresholdare silently skipped so invalid configs don't propagate.Bug fix:
goccy/go-yamlreturns YAML integer values asuint64(notint/int64/float64). Addeduint64cases to the switch blocks formin_samples,issue, andweightincompiler_experiments.go— previously these integer fields were silently dropped from the compiled lock file.JSON Schema (
main_workflow_schema.json)The main workflow JSON schema is updated with the new experiment object-form properties:
hypothesis(string) — hypothesis statement surfaced in step summaries and reportssecondary_metrics(string[]) — additional metrics to collect alongside the primary metricguardrail_metrics(object[]) — each entry requiresname+threshold;thresholdis pattern-validated as^(>=|<=|==|>|<)-?\d+(\.\d+)?$to reject non-numeric or malformed expressionsmin_samples(integer ≥ 1, default 20) — minimum runs per variant before PROMOTE is eligibleowner(string) — owner/team attributionThe top-level description and examples are updated to demonstrate all new fields.
Step Summary (
pick_experiment.cjs)min_samples:A: ████████░░░░░░░░░░░░ 8/25 (32%)min_samplesmin_samples: 0or negative values are treated as unset and the progress section is skipped entirely (preventsRangeErrorfromrepeat(negative))ExperimentConfigtypedef ({T|undefined}→[propName]optional syntax) that was silently preventing the test suite from runningDaily Experiment Report (
daily-experiment-report.md)ABANDONregardless of p-valuePROMOTEonn >= min_samplesper variant (was hardcoded n≥20)min_samplesprogress column and per-guardrail pass/fail rowsadd-commentnotifications to tracking issues for three conditions: all variants hitmin_samples, experiment reaches significance, or a guardrail failsexperiment:active/experiment:ready-for-analysis/experiment:concludedlabels on tracking issues viaadd-labelssafe-output (labels are additive only; removal is manual)OTEL (
shared/observability-otlp.md)Documents how to read
assignments.jsonand emitgh_aw.experiment.<name>span attributes viaotlp.cjs, enabling variant-based filtering in Datadog/Honeycomb.Integration Tests (
pkg/cli/)Two workflow fixtures and a Go integration test file verify that experiments compile correctly end-to-end:
test-experiments-bare-array.md— two bare-array experiments; test asserts both appear inGH_AW_EXPERIMENT_SPECwith correct variantstest-experiments-rich-schema.md— full object form with all new fields; test parses the spec JSON and assertshypothesis,secondary_metrics,guardrail_metrics,min_samples,issue,variants, andweightall round-trip correctly through the compiler