feat: experiment infrastructure — schema additions, per-run state, OTEL attributes, audit accuracy by Copilot · Pull Request #29985 · github/gh-aw

Copilot · 2026-05-03T18:19:34Z

The experiment infrastructure lacked traceability (state.json had no per-run history), OTEL observability, and three useful schema fields (analysis_type, tags, notify).

Schema additions (1a/1b/1c)

New fields on the experiments object form:

experiments:
  prompt_style:
    variants: [concise, detailed]
    analysis_type: t_test          # t_test | mann_whitney | proportion_test | bayesian_ab
    tags: [cost, prompting]
    notify:
      issue: 5678                  # post significance alert here (favors issue over discussion)

- ExperimentConfig Go struct, main_workflow_schema.json, and JS typedef updated in sync
- Extracted extractIntField() helper in the Go compiler to remove duplicated numeric coercion; now validates inputs (rejects negatives, non-integral float64, and out-of-range values)

Per-run metadata in `state.json` (1d)

pick_experiment.cjs now appends a run record to state.runs on every invocation:

{
  "counts": { "prompt_style": { "concise": 18, "detailed": 17 } },
  "runs": [
    { "run_id": "25276903561", "timestamp": "2026-05-03T10:41:00.000Z", "assignments": { "prompt_style": "concise" } }
  ]
}

state.runs is capped to the last 512 entries (MAX_RUN_HISTORY) to prevent unbounded growth in state.json size over time.
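
A minimal sketch of how such a cap might be applied when a run record is appended. `appendRunRecord` is a hypothetical helper name used only for illustration; `MAX_RUN_HISTORY`, `state.runs`, and the record shape come from this PR, and the actual code in pick_experiment.cjs may be structured differently.

```javascript
// Hypothetical sketch, not the actual pick_experiment.cjs implementation.
const MAX_RUN_HISTORY = 512;

/**
 * Append one run record and keep only the newest MAX_RUN_HISTORY entries.
 * `state` is the parsed state.json object; legacy files may lack `runs`.
 */
function appendRunRecord(state, runId, assignments, now = new Date()) {
  if (!Array.isArray(state.runs)) {
    state.runs = []; // legacy state: no runs field yet
  }
  state.runs.push({
    run_id: runId,
    timestamp: now.toISOString(),
    assignments: { ...assignments }, // copy so later mutation does not leak in
  });
  if (state.runs.length > MAX_RUN_HISTORY) {
    // Drop the oldest records; the most recent run stays last.
    state.runs = state.runs.slice(-MAX_RUN_HISTORY);
  }
  return state;
}
```

Trimming from the front keeps the most recent record at the end of the array, which is the entry the audit extraction reads.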

extractExperimentData in audit_report_experiments.go now reads the last run record directly when present, replacing the fragile max-count heuristic. Legacy state files (no runs field) fall back to the heuristic transparently.

OTEL resource attributes (3a)

After variant selection, pick_experiment.cjs exports experiment assignments into OTEL_RESOURCE_ATTRIBUTES:

experiment.prompt_style=concise,experiment.model_temp=low

Appends to any pre-existing value so other OTEL instrumentation is not clobbered.
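
The append behavior can be sketched as follows. `buildOtelResourceAttributes` is a hypothetical function name, and the sketch assumes experiment names and variants need no escaping; how the value is exported to later steps is outside its scope.

```javascript
// Hypothetical sketch, not the actual pick_experiment.cjs implementation.

/**
 * Build a new OTEL_RESOURCE_ATTRIBUTES value by appending
 * experiment.<name>=<variant> pairs to whatever was already set.
 * Assumes names and variants contain no ',' or '=' characters.
 */
function buildOtelResourceAttributes(existing, assignments) {
  const pairs = Object.entries(assignments).map(
    ([name, variant]) => `experiment.${name}=${variant}`
  );
  if (pairs.length === 0) {
    return existing || ""; // nothing assigned: leave the value untouched
  }
  return existing ? `${existing},${pairs.join(",")}` : pairs.join(",");
}
```

For example, passing `"service.name=gh-aw"` and `{ prompt_style: "concise", model_temp: "low" }` would keep the existing pair first and add the two experiment pairs after it.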

Step summary format (3c)

Summary table now shows Counts (current/total) — count for the selected variant over total assigned runs (derived from state.runs when present, falling back to incremented counts). This correctly reflects all runs including those outside the active date window.
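
One possible reading of this format, sketched under assumptions: the selected variant's count comes from `state.counts`, and the total is the number of `state.runs` records that assign the experiment, with the summed counts as the legacy fallback. `summaryCounts` is a hypothetical name and the real derivation in pick_experiment.cjs may differ.

```javascript
// Hypothetical sketch, not the actual pick_experiment.cjs implementation.

/**
 * Return the "current / total" cell for one experiment.
 * current: cumulative count for the selected variant.
 * total:   runs that assigned this experiment (from state.runs when present),
 *          otherwise the sum of all variant counts.
 */
function summaryCounts(state, name, selected) {
  const counts = (state.counts && state.counts[name]) || {};
  const current = counts[selected] || 0;
  let total;
  if (Array.isArray(state.runs) && state.runs.length > 0) {
    total = state.runs.filter(r => r.assignments && name in r.assignments).length;
  } else {
    total = Object.values(counts).reduce((a, b) => a + b, 0);
  }
  return `${current} / ${total}`;
}
```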

…n metadata, and OTEL attributes
- Schema additions (1a/1b/1c): add analysis_type, tags, notify fields to ExperimentConfig Go struct, JSON schema, and JS typedef
- Per-run metadata (1d): extend state.json with runs array tracking run_id, timestamp, and assignments per invocation; update audit extraction to prefer direct run records over count heuristic
- OTEL attributes (3a): emit experiment.NAME=VARIANT as OTEL_RESOURCE_ATTRIBUTES after each run
- Step summary (3c): update table to show Counts (this/total) format
- Tests: update and extend JS and Go unit tests for all new features"

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/0d88e2f1-f86f-43f6-bda4-5e39f2e410a2
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot

Pull request overview

Enhances the experiment infrastructure to improve traceability, schema richness, observability (OTEL), and audit accuracy by recording per-run assignments and adding new experiment metadata fields.

Changes:

- Add analysis_type, tags, and notify fields to the experiment rich schema across Go types, JSON schema, and JS typedefs.
- Persist per-run assignment history in state.json and update audit extraction to use the latest run record when present (fallback to legacy heuristic).
- Export experiment assignments into OTEL_RESOURCE_ATTRIBUTES and update the step summary table format.

| File | Description |
| --- | --- |
| pkg/workflow/frontmatter_types.go | Extends `ExperimentConfig` with `analysis_type`, `tags`, and `notify` (new `ExperimentNotify` type). |
| pkg/workflow/compiler_experiments.go | Parses the new experiment fields and factors numeric coercion into `extractIntField()`. |
| pkg/parser/schemas/main_workflow_schema.json | Updates the workflow schema to include the new experiment fields and an example. |
| pkg/cli/workflows/test-experiments-rich-schema.md | Updates the rich-schema test workflow frontmatter with new experiment fields. |
| pkg/cli/compile_experiments_integration_test.go | Adds integration assertions for `analysis_type`, `tags`, and `notify`. |
| pkg/cli/audit_report_experiments_test.go | Adds tests for `extractExperimentData` using the new `runs` history and legacy fallback. |
| pkg/cli/audit_report_experiments.go | Adds `runs` support to state parsing and uses last run record when available. |
| actions/setup/js/pick_experiment.test.cjs | Extends JS tests for legacy state handling, per-run run history, and OTEL env exporting. |
| actions/setup/js/pick_experiment.cjs | Writes per-run records into `state.runs`, exports OTEL resource attributes, and changes the summary table to current/total counts. |

Copilot's findings

Files reviewed: 9/9 changed files
Comments generated: 5

+	// When per-run records are available, use the most recent run's assignments directly
+	// instead of inferring them from cumulative counts.
+	if len(state.Runs) > 0 {
+		lastRun := state.Runs[len(state.Runs)-1]
+		if len(lastRun.Assignments) > 0 {
+			experimentDataLog.Printf("Using run record from run_id=%s (timestamp=%s)", lastRun.RunID, lastRun.Timestamp)
+			return &ExperimentData{
+				Assignments:      lastRun.Assignments,
+				CumulativeCounts: state.Counts,
+			}
+		}
+	}


    const selected = assignments[name];
-    const variants = configs[name]?.variants || [];
    const counts = state.counts[name] || {};
-    const countsStr = variants.map(v => `${v}: ${counts[v] || 0}`).join(", ");
-    lines.push(`| \`${name}\` | **${selected}** | ${variants.join(", ")} | ${countsStr} |`);
+    const thisCount = counts[selected] || 0;
+    // counts values are always numbers (set by recordVariant which uses integer arithmetic).
+    const countValues = /** @type {number[]} */ Object.values(counts);
+    const totalCount = countValues.reduce((a, b) => a + b, 0);
+    lines.push(`| \`${name}\` | **${selected}** | ${thisCount} / ${totalCount} |`);


+  if (Object.keys(assignments).length > 0) {
+    // Append a per-run record to state.runs so each assignment is traceable.
+    const runId = process.env.GITHUB_RUN_ID || "";
+    const timestamp = new Date().toISOString();
+    if (!state.runs) {
+      state.runs = [];
+    }
+    state.runs.push({ run_id: runId, timestamp, assignments: { ...assignments } });
+  }


+// extractIntField converts a numeric any value to int.
+// Returns (int(value), true) on success; (0, false) when val is nil or not a supported numeric type.
+// Note: uint64 values larger than math.MaxInt are truncated on 32-bit systems (int is 32-bit there).
+// Experiment counts and IDs are expected to be well within int range, so this is acceptable.
+func extractIntField(val any) (int, bool) {
+	switch n := val.(type) {
+	case int:
+		return n, true
+	case int64:
+		return int(n), true
+	case uint64:
+		return int(n), true
+	case float64:
+		return int(n), true
+	}
+	return 0, false
+}


github-actions · 2026-05-03T18:59:44Z

Commit pushed: 33222ed

🏗️ ADR gate enforced by Design Decision Gate 🏗️

github-actions · 2026-05-03T18:59:45Z

🏗️ Design Decision Gate — ADR Required

This PR makes significant changes to core business logic (230 new lines in pkg/ and actions/) but does not have a linked Architecture Decision Record (ADR).

AI has analyzed the PR diff and generated a draft ADR to help you get started:

📄 Draft ADR: docs/adr/29985-experiment-per-run-state-otel-integration-and-schema-extensions.md

The draft covers the three interlocking decisions this PR makes:

- Per-run state in state.json: each variant selection is appended as a record with run_id, timestamp, and assignments; the Go audit reporter reads the last record directly instead of using the fragile max-count heuristic.
- OTEL resource attributes: experiment assignments are exported into OTEL_RESOURCE_ATTRIBUTES so every span in the job carries the cohort for filtering in Honeycomb/Grafana.
- Schema extensions (analysis_type, tags, notify): three new optional fields added to the experiment object form, propagated through the JSON schema, Go structs, and GH_AW_EXPERIMENT_SPEC.

What to do next

- Review the draft ADR committed to your branch. It was generated from the PR diff.
- Complete any missing sections: add context the AI could not infer, refine the decision rationale, and verify the listed alternatives match what was actually considered.
- Commit the finalized ADR to docs/adr/ on your branch.
- Reference the ADR in this PR body by adding a line such as:

ADR: ADR-29985: Experiment Per-Run State, OTEL Integration, and Schema Extensions

Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision.

Why ADRs Matter

ADRs create a searchable, permanent record of why the codebase looks the way it does. Future contributors (and your future self) will thank you.

📋 Michael Nygard ADR Format Reference

An ADR must contain these four sections to be considered complete:

- Context: What is the problem? What forces are at play?
- Decision: What did you decide? Why?
- Alternatives Considered: What else could have been done?
- Consequences: What are the trade-offs (positive and negative)?

All ADRs are stored in docs/adr/ as Markdown files numbered by PR number.

🔒 This PR cannot merge until an ADR is linked in the PR body.

References: §25287740913

🏗️ ADR gate enforced by Design Decision Gate 🏗️ · ● 223.7K · ◷

github-actions · 2026-05-03T19:04:44Z

test

🧪 Test quality analysis by Test Quality Sentinel · ● 825.7K · ◷

github-actions

🧪 Test Quality Sentinel Report

Test Quality Score: 78/100

⚠️ Acceptable — some edge-case coverage gaps

| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 15 |
| ✅ Design tests (behavioral contracts) | 15 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 9 (60%) |
| Duplicate test clusters | 0 |
| Test inflation detected | Yes, 2 files exceed 2:1 ratio |
| 🚨 Coding-guideline violations | 0 |

Test Classification Details

| Test | File | Classification | Notes |
| --- | --- | --- | --- |
| "returns empty state when file does not exist" | `pick_experiment.test.cjs` | ✅ Design | Edge case: nonexistent file |
| "returns empty state on invalid JSON" | `pick_experiment.test.cjs` | ✅ Design | Edge case: invalid JSON |
| "round-trips state through save and load" | `pick_experiment.test.cjs` | ✅ Design | Happy path: save/load round-trip |
| "initialises runs to [] when loading legacy state without runs field" | `pick_experiment.test.cjs` | ✅ Design | Edge case: backward-compat migration |
| "preserves existing runs array when loading state" | `pick_experiment.test.cjs` | ✅ Design | Happy path: state fidelity |
| "appends a run record to state.runs after picking variants" | `pick_experiment.test.cjs` | ✅ Design | Happy path: new behavioral contract |
| "accumulates run records across multiple runs" | `pick_experiment.test.cjs` | ✅ Design | Happy path: multi-run accumulation |
| "does not append a run record when no experiments are assigned" | `pick_experiment.test.cjs` | ✅ Design | Edge case: empty spec |
| "exports OTEL_RESOURCE_ATTRIBUTES with experiment assignments" | `pick_experiment.test.cjs` | ✅ Design | Happy path: OTEL integration |
| "appends to existing OTEL_RESOURCE_ATTRIBUTES" | `pick_experiment.test.cjs` | ✅ Design | Edge case: pre-existing env var |
| "does not export OTEL_RESOURCE_ATTRIBUTES when no experiments are assigned" | `pick_experiment.test.cjs` | ✅ Design | Edge case: empty spec |
| `TestExtractExperimentDataWithRuns` / "uses last run record when runs array is present" | `audit_report_experiments_test.go` | ✅ Design | Happy path: new lookup strategy |
| `TestExtractExperimentDataWithRuns` / "falls back to heuristic when runs array is empty" | `audit_report_experiments_test.go` | ✅ Design | Edge case: empty runs |
| `TestExtractExperimentDataWithRuns` / "falls back to heuristic when runs field is absent (legacy state)" | `audit_report_experiments_test.go` | ✅ Design | Edge case: legacy state compat |
| `TestExtractExperimentDataWithRuns` / "skips last run record with empty assignments" | `audit_report_experiments_test.go` | ✅ Design | Edge case: empty assignments map |

Test Inflation (Advisory)

Two test files have a line-addition ratio > 2:1 compared to their production counterparts:

| Test file | Test lines added | Production lines added | Ratio |
| --- | --- | --- | --- |
| `pick_experiment.test.cjs` | +108 | +42 | 2.6:1 |
| `audit_report_experiments_test.go` | +90 | +23 | 3.9:1 |

This is not a quality failure here — the new tests are substantive and cover genuine behavioral contracts, not padding. However, the high ratio for audit_report_experiments_test.go (3.9:1) is worth noting: 90 lines of tests for 23 lines of production code. Inspect whether all four subtests are truly exercising distinct branches, or whether some can be collapsed.

Missing Coverage (Suggestions)

A few scenarios not yet covered by the new tests:

- pick_experiment.test.cjs, OTEL attribute format: No test verifies the exact experiment.<name>=<variant> key format for multi-experiment specs (e.g., two features assigned simultaneously). The current tests only exercise single-feature specs.
- audit_report_experiments_test.go, multi-feature runs records: TestExtractExperimentDataWithRuns only tests single-feature assignment maps. A test with assignments: {"a": "X", "b": "Y"} would confirm the function handles all features in the last run record.
- compile_experiments_integration_test.go, no new test function: The 15 new lines are additional assertions appended to the existing TestCompileExperimentsRichSchema. These are correctly structured and high value; consider splitting into a dedicated sub-test for analysis_type/tags/notify fields to improve isolation.

Language Support

Tests analyzed:

- 🐹 Go (*_test.go): 5 test scenarios (4 via table-driven TestExtractExperimentDataWithRuns + integration assertions); unit (//go:build !integration) and integration (//go:build integration) build tags both present ✅
- 🟨 JavaScript (*.test.cjs): 11 tests (vitest, vi.fn() / mockCore for external I/O, which are legitimate mocking targets)

Verdict

✅ Check passed. 0% of new tests are implementation tests (threshold: 30%). All 15 tests verify behavioral contracts. Test inflation is advisory only — the tests themselves are substantive. No coding-guideline violations detected.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

- Assert on observable outputs, return values, or state changes
- Cover error paths and boundary conditions
- Would catch a behavioral regression if deleted
- Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

- Assert on internal function calls (mocking internals)
- Only test the happy path with typical inputs
- Break during legitimate refactoring even when behavior is correct
- Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References:

§25287740931

pelikhan · 2026-05-03T19:05:38Z

@copilot merge main and recompile

Review all comments.
Favor issues for notify.

…validate extractIntField, favor issue in notify, robust tests Agent-Logs-Url: https://github.com/github/gh-aw/sessions/394e4a02-4cb4-4c79-b88f-417d7c638103 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-05-03T19:24:30Z

@copilot merge main and recompile

Review all comments....

Done in b1e7cb8:

- Merged main and recompiled all workflows
- Favor issue in notify: updated schema example, test fixture, and integration test to use notify.issue
- All review comments addressed:
  - extractExperimentData doc comment updated to describe the new runs-based path and legacy fallback
  - extractIntField now rejects negatives, non-integral float64, and out-of-range values
  - totalCount in step summary now counts from state.runs when present (no more 0/0 for date-gated runs)
  - state.runs is capped to last 100 entries (MAX_RUN_HISTORY) to prevent unbounded growth
  - JSDoc cast fixed to (/** @type {number[]} */ (expr)) parenthesized form

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

https://api.github.com/graphql
- Triggering command: /usr/bin/gh /usr/bin/gh api graphql -f query=query($owner: String!, $name: String!) { repository(owner: $owner, name: $name) { hasDiscussionsEnabled } } -f owner=github -f name=gh-aw (http block)
- Triggering command: /usr/bin/gh /usr/bin/gh api graphql -f query=query($owner: String!, $name: String!) { repository(owner: $owner, name: $name) { hasDiscussionsEnabled } } -f owner=github -f name=gh-aw %H %ct %D (http block)
- Triggering command: /usr/bin/gh /usr/bin/gh api graphql -f query=query($owner: String!, $name: String!) { repository(owner: $owner, name: $name) { hasDiscussionsEnabled } } -f owner=github -f name=gh-aw b/gh-aw/pkg/workapi (http block)
https://api.github.com/repos/actions/checkout/git/ref/tags/v6
- Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v6 --jq [.object.sha, .object.type] | @tsv nt (http block)
- Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v6 --jq [.object.sha, .object.type] | @tsv ithub/workflows (http block)
- Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v6 --jq [.object.sha, .object.type] | @tsv /ref/tags/v9 (http block)
https://api.github.com/repos/actions/download-artifact/git/ref/tags/v8
- Triggering command: /usr/bin/gh gh api /repos/actions/download-artifact/git/ref/tags/v8 --jq [.object.sha, .object.type] | @tsv vitest run $name) { hasDiscussionsEnabled } } ignore (http block)
- Triggering command: /usr/bin/gh gh api /repos/actions/download-artifact/git/ref/tags/v8 --jq [.object.sha, .object.type] | @tsv "vitest" run --no-file-parallelism pick_experiment (http block)
https://api.github.com/repos/actions/github-script/git/ref/tags/v9
- Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v9 --jq [.object.sha, .object.type] | @tsv nt (http block)
- Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v9 --jq [.object.sha, .object.type] | @tsv (http block)
- Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v9 --jq [.object.sha, .object.type] | @tsv ns.json and pkg/workflow/data/action_pins.json..." structure-again de r (http block)
https://api.github.com/repos/actions/setup-node/git/ref/tags/v6
- Triggering command: /usr/bin/gh gh api /repos/actions/setup-node/git/ref/tags/v6 --jq [.object.sha, .object.type] | @tsv (http block)
- Triggering command: /usr/bin/gh gh api /repos/actions/setup-node/git/ref/tags/v6 --jq [.object.sha, .object.type] | @tsv te 'scripts/**/*.js' --ignore-path .prettierignore --log-level=e!../../../pkg/workflow/js/**/*.jinfocmp (http block)
https://api.github.com/repos/astral-sh/setup-uv/git/ref/tags/eac588ad8def6316056a12d4907a9d4d84ff7a3b
- Triggering command: /usr/bin/gh gh api /repos/astral-sh/setup-uv/git/ref/tags/eac588ad8def6316056a12d4907a9d4d84ff7a3b --jq [.object.sha, .object.type] | @tsv w/js/**/*.json' --ignore-path (http block)
https://api.github.com/repos/docker/metadata-action/git/ref/tags/v6
- Triggering command: /usr/bin/gh gh api /repos/docker/metadata-action/git/ref/tags/v6 --jq [.object.sha, .object.type] | @tsv .js' --ignore-paremote.origin.url (http block)
- Triggering command: /usr/bin/gh gh api /repos/docker/metadata-action/git/ref/tags/v6 --jq [.object.sha, .object.type] | @tsv .js' --ignore-path .prettierignoremote.origin.url (http block)
https://api.github.com/repos/docker/setup-buildx-action/git/ref/tags/v4
- Triggering command: /usr/bin/gh gh api /repos/docker/setup-buildx-action/git/ref/tags/v4 --jq [.object.sha, .object.type] | @tsv .js' --ignore-path .prettierignore --log-level=error security ode -nxv (http block)
- Triggering command: /usr/bin/gh gh api /repos/docker/setup-buildx-action/git/ref/tags/v4 --jq [.object.sha, .object.type] | @tsv .js' --ignore-path .prettierigno-f nux-x86_64 ules/.bin/node (http block)
https://api.github.com/repos/github/gh-aw
- Triggering command: /usr/bin/gh gh api /repos/github/gh-aw --jq .default_branch ithub/workflows --no-file-parallowner=github /home/REDACTED/.lo-f (http block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

pelikhan · 2026-05-03T19:26:11Z

@copilot set MAX_RUN_HISTORY to 512

Copilot · 2026-05-03T19:33:45Z

@copilot set MAX_RUN_HISTORY to 512

Done in 505ee93 — MAX_RUN_HISTORY updated to 512.

Initial plan

d7f5bc6

Copilot AI assigned Copilot and pelikhan May 3, 2026

Copilot started work on behalf of pelikhan May 3, 2026 18:19

Copilot AI linked an issue May 3, 2026 that may be closed by this pull request

[ab-advisor] Improve experiment infrastructure: schema, reporting & audit #29911

Closed

9 tasks

Copilot AI and others added 3 commits May 3, 2026 18:41

refactor: address code review - extract extractIntField helper, simpl…

932574d

…ify JSDoc, improve comment Agent-Logs-Url: https://github.com/github/gh-aw/sessions/0d88e2f1-f86f-43f6-bda4-5e39f2e410a2 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

refactor: add overflow comment to extractIntField, clarify counts com…

a828306

…ment, improve summary header wording Agent-Logs-Url: https://github.com/github/gh-aw/sessions/0d88e2f1-f86f-43f6-bda4-5e39f2e410a2 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Improve experiment infrastructure for schema, reporting, and audit~~ feat: experiment infrastructure — schema additions, per-run state, OTEL attributes, audit accuracy May 3, 2026

Copilot finished work on behalf of pelikhan May 3, 2026 18:49

Copilot AI requested a review from pelikhan May 3, 2026 18:49

pelikhan marked this pull request as ready for review May 3, 2026 18:52

Copilot AI review requested due to automatic review settings May 3, 2026 18:52

Copilot started reviewing on behalf of pelikhan May 3, 2026 18:53

Copilot AI reviewed May 3, 2026

docs(adr): add draft ADR-29985 for experiment per-run state, OTEL int…

33222ed

…egration, and schema extensions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions Bot approved these changes May 3, 2026

Copilot started work on behalf of pelikhan May 3, 2026 19:08

Copilot AI and others added 2 commits May 3, 2026 19:12

Merge remote-tracking branch 'origin/main' into copilot/improve-exper…

024bfb9

…iment-infrastructure-again Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan May 3, 2026 19:29

Copilot started work on behalf of pelikhan May 3, 2026 19:29

fix: increase MAX_RUN_HISTORY to 512

505ee93

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/9acb139f-07b1-4e87-bc05-7d0026cfdaad Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan May 3, 2026 19:34

pelikhan merged commit bc8b014 into main May 3, 2026

pelikhan deleted the copilot/improve-experiment-infrastructure-again branch May 3, 2026 19:38

pelikhan merged 8 commits into main from copilot/improve-experiment-infrastructure-again
