
feat: add experiment information as OTLP telemetry attribute#29593

Merged
pelikhan merged 5 commits into main from
copilot/add-experiment-information-attribute
May 1, 2026

Conversation


Copilot AI commented May 1, 2026

Summary

Adds experiment assignment information as attributes on the OTLP (OpenTelemetry) spans, so that observability backends can correlate traces with A/B experiment variants.

Changes

actions/setup/js/pick_experiment.cjs

  • After selecting variants, writes current-run assignments to /tmp/gh-aw/experiments/assignments.json alongside the existing state file (which tracks counts)
  • Write is guarded: only occurs when at least one experiment is successfully assigned (skips empty-spec and all-invalid-variant cases)
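The guarded write described above might look roughly like the following. This is a minimal sketch, not the code from `pick_experiment.cjs`: the helper name `writeAssignments`, its boolean return value, and the trailing newline are assumptions made for illustration, and the demo writes to a throwaway temp directory rather than `/tmp/gh-aw/experiments`.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

/**
 * Hypothetical helper: persist the current run's assignments next to the state file.
 * Returns true when a file was written and false when the write was skipped.
 */
function writeAssignments(stateDir, assignments) {
  // Guard: skip the write when no experiment was assigned
  // (empty spec, or every experiment was rejected as invalid).
  if (Object.keys(assignments).length === 0) return false;
  fs.mkdirSync(stateDir, { recursive: true });
  fs.writeFileSync(path.join(stateDir, "assignments.json"), JSON.stringify(assignments) + "\n");
  return true;
}

// Demo against a temporary directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "gh-aw-assignments-"));
const wroteEmpty = writeAssignments(demoDir, {});
const wroteOne = writeAssignments(demoDir, { caveman: "yes" });
console.log(wroteEmpty, wroteOne);
```

Because the file is rewritten in full on each call, a later run simply replaces the previous run's assignments, which matches the successive-run overwrite behaviour mentioned under Tests.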

actions/setup/js/send_otlp_span.cjs

  • Adds EXPERIMENT_ASSIGNMENTS_PATH constant (/tmp/gh-aw/experiments/assignments.json)
  • Adds readExperimentAssignments() helper that reads the assignments file (returns null if absent/invalid — non-fatal)
  • Adds buildExperimentAttributes() helper that converts assignments to OTLP span attributes:
    • One gh-aw.experiment.<name> attribute per experiment (e.g. gh-aw.experiment.caveman=yes)
    • A combined gh-aw.experiments JSON string attribute (only when at least one valid variant exists)
  • Both sendJobSetupSpan and sendJobConclusionSpan now include experiment attributes
  • All three helpers exported from the module
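As an illustration of the non-fatal read described above, here is a minimal sketch. The function name `readAssignmentsSketch` is hypothetical, and rejecting arrays as "non-object" values is an assumption; the PR only states that absent or invalid files yield `null`.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

/**
 * Hypothetical reader: returns the parsed assignments object, or null when the
 * file is missing, is not valid JSON, or does not contain a plain object.
 */
function readAssignmentsSketch(filePath) {
  try {
    const parsed = JSON.parse(fs.readFileSync(filePath, "utf8"));
    // Assumption: arrays and null count as invalid, alongside primitives.
    if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) return null;
    return parsed;
  } catch {
    // Missing file (ENOENT) or malformed JSON: telemetry must not fail the job.
    return null;
  }
}

// Demo against a temporary file.
const readDir = fs.mkdtempSync(path.join(os.tmpdir(), "gh-aw-read-"));
const demoFile = path.join(readDir, "assignments.json");

const missing = readAssignmentsSketch(demoFile);
fs.writeFileSync(demoFile, "{not json");
const corrupt = readAssignmentsSketch(demoFile);
fs.writeFileSync(demoFile, '"just a string"');
const nonObject = readAssignmentsSketch(demoFile);
fs.writeFileSync(demoFile, '{"caveman":"yes"}');
const valid = readAssignmentsSketch(demoFile);
console.log({ missing, corrupt, nonObject, valid });
```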

Tests

  • pick_experiment.test.cjs: 3 new tests covering assignments file creation, successive-run overwrite, and no-write cases
  • send_otlp_span.test.cjs: 10 new tests covering readExperimentAssignments, buildExperimentAttributes, and experiment attribute inclusion in both setup and conclusion spans

Attribute design

| Attribute | Value | Example |
| --- | --- | --- |
| `gh-aw.experiment.<name>` | Selected variant string | `gh-aw.experiment.caveman=yes` |
| `gh-aw.experiments` | JSON of all assignments | `{"caveman":"yes","style":"detailed"}` |

The per-experiment attributes allow filtering on individual variants in dashboards. The aggregated JSON attribute enables substring search in backends that don't support per-attribute filtering.
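To make the two lookup styles concrete, the sketch below builds the attributes in the OTLP/JSON key/value shape and queries them both ways. The helper `stringAttr` is a hypothetical stand-in for the module's own attribute builder, which is not shown in this PR, and the in-memory `find` calls only imitate what a backend filter or substring search would do.

```javascript
// Hypothetical stand-in for the module's attribute builder:
// OTLP/JSON encodes a string attribute as { key, value: { stringValue } }.
function stringAttr(key, value) {
  return { key, value: { stringValue: value } };
}

const assignments = { caveman: "yes", style: "detailed" };

const attributes = [
  // One attribute per experiment, in sorted name order.
  ...Object.keys(assignments)
    .sort()
    .map(name => stringAttr(`gh-aw.experiment.${name}`, assignments[name])),
  // One aggregated attribute holding every assignment as a JSON string.
  stringAttr("gh-aw.experiments", JSON.stringify(assignments)),
];

// Per-attribute filtering, as a dashboard would do it.
const cavemanVariant = attributes.find(a => a.key === "gh-aw.experiment.caveman").value.stringValue;

// Substring search on the aggregated attribute, for backends without per-attribute filters.
const aggregated = attributes.find(a => a.key === "gh-aw.experiments").value.stringValue;
const matchesStyle = aggregated.includes('"style":"detailed"');

console.log(cavemanVariant, matchesStyle);
```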

Copilot AI and others added 2 commits May 1, 2026 18:38
- pick_experiment.cjs now writes assignments.json alongside state.json
  so downstream spans can read the current variant without recomputing
- send_otlp_span.cjs gains readExperimentAssignments() and
  buildExperimentAttributes() helpers
- sendJobSetupSpan and sendJobConclusionSpan both include
  gh-aw.experiment.<name> (one per experiment) and gh-aw.experiments
  (JSON of all assignments) span attributes
- Tests added for pick_experiment.cjs (assignments file writing) and
  send_otlp_span.cjs (experiment attributes in setup and conclusion spans)

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/f0ab1e69-b2cd-46eb-977e-8c582c5c65a1

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…gnments.json write

- buildExperimentAttributes only appends gh-aw.experiments when
  at least one valid variant attribute was produced
- pick_experiment.cjs only writes assignments.json when at least
  one experiment was successfully assigned
- Added test for all-empty-variant-string case in buildExperimentAttributes
- Added test for <2-variant experiment skipping in pick_experiment main

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/f0ab1e69-b2cd-46eb-977e-8c582c5c65a1

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI requested a review from pelikhan May 1, 2026 18:42
@pelikhan pelikhan marked this pull request as ready for review May 1, 2026 18:50
Copilot AI review requested due to automatic review settings May 1, 2026 18:50

Copilot AI left a comment

Pull request overview

Adds A/B experiment assignment metadata to OTLP spans so observability backends can correlate traces with experiment variants.

Changes:

  • Persist current-run experiment assignments to an assignments.json file in pick_experiment.cjs.
  • Read assignments in send_otlp_span.cjs and emit them as OTLP span attributes (gh-aw.experiment.<name> and gh-aw.experiments).
  • Add/extend test coverage for both the assignments file behavior and OTLP attribute emission.
Show a summary per file

| File | Description |
| --- | --- |
| `actions/setup/js/pick_experiment.cjs` | Writes current-run experiment assignments to `assignments.json` when at least one assignment exists. |
| `actions/setup/js/pick_experiment.test.cjs` | Adds tests validating `assignments.json` creation/overwrite and no-write scenarios. |
| `actions/setup/js/send_otlp_span.cjs` | Reads experiment assignments and injects them into setup/conclusion OTLP spans; exports new helpers/constants. |
| `actions/setup/js/send_otlp_span.test.cjs` | Adds tests for reading/building experiment attributes and ensuring spans include/omit them appropriately. |

Copilot's findings

  • Files reviewed: 4/4 changed files
  • Comments generated: 2

Comment on lines +212 to +218
/**
 * Path to the experiment assignments file written by pick_experiment.cjs.
 * Contains a JSON object mapping experiment name → selected variant for the
 * current workflow run. Example: `{"caveman":"yes","style":"detailed"}`.
 * @type {string}
 */
const EXPERIMENT_ASSIGNMENTS_PATH = "/tmp/gh-aw/experiments/assignments.json";
Comment on lines +254 to +268
function buildExperimentAttributes(assignments) {
  if (!assignments || typeof assignments !== "object") return [];
  const names = Object.keys(assignments).sort();
  if (names.length === 0) return [];
  const attrs = [];
  for (const name of names) {
    const variant = assignments[name];
    if (typeof variant === "string" && variant) {
      attrs.push(buildAttr(`gh-aw.experiment.${name}`, variant));
    }
  }
  if (attrs.length > 0) {
    attrs.push(buildAttr("gh-aw.experiments", JSON.stringify(assignments)));
  }
  return attrs;
}
@github-actions github-actions Bot mentioned this pull request May 1, 2026

github-actions Bot commented May 1, 2026

🧪 Test Quality Sentinel Report

Test Quality Score: 83/100

Excellent

| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 17 |
| ✅ Design tests (behavioral contracts) | 17 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 13 (76%) |
| Duplicate test clusters | 0 |
| Test inflation detected | ⚠️ Yes: `pick_experiment.test.cjs` (58 test lines / 9 production lines = 6.4:1) |
| 🚨 Coding-guideline violations | None |

Test Classification Details

View all 17 test classifications

| Test | File | Classification | Issues Detected |
| --- | --- | --- | --- |
| writes assignments.json alongside state.json after picking variants | `pick_experiment.test.cjs` | ✅ Design | Verifies file existence and content on disk |
| overwrites assignments.json on successive runs reflecting the current variant | `pick_experiment.test.cjs` | ✅ Design | Tests state transition across two invocations |
| does not write assignments.json when spec is empty | `pick_experiment.test.cjs` | ✅ Design | Edge case: empty spec |
| does not write assignments.json when all experiments have fewer than 2 variants | `pick_experiment.test.cjs` | ✅ Design | Edge case: boundary condition on variant count |
| includes `gh-aw.experiment.<name>` and `gh-aw.experiments` attributes when assignments file exists (setup) | `send_otlp_span.test.cjs` | ✅ Design | Verifies all three OTLP attribute values in the emitted span |
| omits experiment attributes when assignments file is absent (setup) | `send_otlp_span.test.cjs` | ✅ Design | Edge case: absent file → no attributes |
| returns null when the assignments file does not exist | `send_otlp_span.test.cjs` | ✅ Design | Edge case: ENOENT → null return |
| returns null when the assignments file contains invalid JSON | `send_otlp_span.test.cjs` | ✅ Design | Edge case: corrupt file → null |
| returns null when the assignments file contains a non-object value | `send_otlp_span.test.cjs` | ✅ Design | Edge case: wrong type → null |
| returns the parsed assignments object when the file is valid | `send_otlp_span.test.cjs` | ✅ Design | Happy-path return value |
| returns an empty array for null input | `send_otlp_span.test.cjs` | ✅ Design | Edge case: null input |
| returns an empty array for an empty assignments object | `send_otlp_span.test.cjs` | ✅ Design | Edge case: empty object |
| builds one attribute per experiment plus the aggregated `gh-aw.experiments` attribute | `send_otlp_span.test.cjs` | ✅ Design | Verifies attribute structure and aggregated JSON |
| skips assignments with non-string or empty-string variants and still adds `gh-aw.experiments` for valid ones | `send_otlp_span.test.cjs` | ✅ Design | Edge case: mixed valid/invalid variants |
| returns empty array and omits `gh-aw.experiments` when all variants are empty strings | `send_otlp_span.test.cjs` | ✅ Design | Edge case: all variants invalid |
| includes `gh-aw.experiment.<name>` and `gh-aw.experiments` attributes in conclusion span | `send_otlp_span.test.cjs` | ✅ Design | Verifies attributes in the conclusion span |
| omits experiment attributes in conclusion span when assignments file is absent | `send_otlp_span.test.cjs` | ✅ Design | Edge case: absent file → no attributes in conclusion span |

Score Breakdown

| Component | Score |
| --- | --- |
| Behavioral coverage (17/17 design tests) | 40/40 |
| Error/edge case coverage (13/17 tests = 76%) | 23/30 |
| Low duplication (0 clusters) | 20/20 |
| Proportional growth (inflation penalty: pick_experiment 6.4:1 > 2:1) | 0/10 |
| Total | 83/100 |

The only deduction is the test inflation flag on pick_experiment.test.cjs (58 lines added vs. 9 production lines). In practice, this is benign — the 4 new test scenarios are substantive and cover meaningful behavioral contracts (file-system side-effects, state transitions, empty/boundary inputs). The inflation ratio is mechanically high because the new production code is concise.
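The figures in the report can be cross-checked with a few lines of arithmetic. This is only a sanity check on the published numbers: the report does not state its rounding rule, so treating the edge-case component as the tested share of 30 points rounded to the nearest integer is an assumption.

```javascript
// Component scores copied from the Score Breakdown.
const components = { behavioral: 40, edgeCases: 23, lowDuplication: 20, proportionalGrowth: 0 };
const total = Object.values(components).reduce((sum, n) => sum + n, 0);

// 13 of 17 tests cover error or edge cases.
const edgeShare = 13 / 17;
const edgePercent = Math.round(edgeShare * 100);
// Assumption: component score = share of 30 points, rounded to nearest.
const edgePoints = Math.round(edgeShare * 30);

// 58 added test lines against 9 added production lines.
const inflationRatio = (58 / 9).toFixed(1);

console.log({ total, edgePercent, edgePoints, inflationRatio });
```

Under that assumption the numbers agree with the report: the components sum to 83, the edge-case share rounds to 76% and 23 points, and the inflation ratio rounds to 6.4.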


Language Support

Tests analyzed:

  • 🟨 JavaScript (*.test.cjs): 17 tests (vitest)

No Go test files were added or modified in this PR.


Notes on Mocking Strategy

The tests make heavy use of vi.spyOn(fs, "readFileSync") and vi.stubGlobal("fetch", ...). Both are external I/O (filesystem and network), which is acceptable under the project's guidelines. No internal business-logic functions are mocked. All assertions target observable outputs (OTLP span attribute values, file contents, return values of helper functions).


Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). All 17 tests verify behavioral contracts — including a strong set of error and edge-case scenarios.


📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §25228044513

🧪 Test quality analysis by Test Quality Sentinel · ● 515.3K ·


@github-actions github-actions Bot left a comment

✅ Test Quality Sentinel: 83/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 17 new vitest tests verify behavioral contracts with strong edge-case coverage.


pelikhan commented May 1, 2026

…tered experiments JSON

- readExperimentAssignments() now derives the file path from
  GH_AW_EXPERIMENT_STATE_DIR env var (same as pick_experiment.cjs),
  falling back to EXPERIMENT_ASSIGNMENTS_PATH when not set
- buildExperimentAttributes() builds gh-aw.experiments JSON from the
  filtered valid assignments only (excludes empty/non-string variants),
  making it consistent with the per-experiment attributes
- Updated/added tests for both fixes

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/5ccb46be-572d-4706-9cd6-c6f4ee50a256

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
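The path resolution described in this commit could be sketched as follows. The function name `resolveAssignmentsPath` and the `env` parameter are hypothetical, chosen so the example can run without touching `process.env`; treating an empty `GH_AW_EXPERIMENT_STATE_DIR` as unset is also an assumption.

```javascript
const path = require("node:path");

// Default from the PR; used only when the env var is not set.
const DEFAULT_ASSIGNMENTS_PATH = "/tmp/gh-aw/experiments/assignments.json";

/**
 * Hypothetical resolver: prefer <GH_AW_EXPERIMENT_STATE_DIR>/assignments.json,
 * falling back to the hardcoded default.
 */
function resolveAssignmentsPath(env) {
  const dir = env.GH_AW_EXPERIMENT_STATE_DIR;
  // path.posix keeps forward slashes regardless of the host OS.
  return dir ? path.posix.join(dir, "assignments.json") : DEFAULT_ASSIGNMENTS_PATH;
}

const fromEnv = resolveAssignmentsPath({ GH_AW_EXPERIMENT_STATE_DIR: "/custom/state" });
const fromDefault = resolveAssignmentsPath({});
console.log(fromEnv, fromDefault);
```

In real use the argument would be `process.env`, so the writer and the reader resolve the same directory from the same variable.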

Copilot AI commented May 1, 2026

> @copilot review all comments, fix https://github.com/github/gh-aw/actions/runs/25228047833/job/73976430444?pr=29593

Fixed both review comments in ad409cb:

  1. EXPERIMENT_ASSIGNMENTS_PATH / configurable dir: readExperimentAssignments() now checks GH_AW_EXPERIMENT_STATE_DIR and reads from <dir>/assignments.json when set, falling back to the hardcoded default only when the env var is absent — keeping it in sync with pick_experiment.cjs.

  2. gh-aw.experiments consistency: buildExperimentAttributes() now builds the aggregated JSON from the same filtered set used for per-experiment attributes (only valid, non-empty string variants), so both are always consistent.
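The consistency fix in item 2 can be illustrated with a sketch that builds the aggregated JSON from the filtered set. This is not the merged code: `buildExperimentAttributesSketch` and `stringAttr` are hypothetical names, with `stringAttr` standing in for the module's own attribute builder.

```javascript
// Hypothetical stand-in for the module's attribute builder (OTLP/JSON string attribute).
function stringAttr(key, value) {
  return { key, value: { stringValue: value } };
}

/**
 * Sketch of the fixed behaviour: the aggregated JSON contains only the
 * assignments that also produced a per-experiment attribute.
 */
function buildExperimentAttributesSketch(assignments) {
  if (!assignments || typeof assignments !== "object") return [];
  const valid = {};
  const attrs = [];
  for (const name of Object.keys(assignments).sort()) {
    const variant = assignments[name];
    // Keep only non-empty string variants.
    if (typeof variant === "string" && variant) {
      valid[name] = variant;
      attrs.push(stringAttr(`gh-aw.experiment.${name}`, variant));
    }
  }
  // Aggregate from the filtered object, not from the raw input.
  if (attrs.length > 0) attrs.push(stringAttr("gh-aw.experiments", JSON.stringify(valid)));
  return attrs;
}

const mixed = buildExperimentAttributesSketch({ style: "detailed", caveman: "", broken: 42 });
const allInvalid = buildExperimentAttributesSketch({ caveman: "" });
console.log(JSON.stringify(mixed), allInvalid.length);
```

With the mixed input, only `style` survives, so the aggregated attribute no longer mentions `caveman` or `broken`.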

The CI failure in create_issue.test.cjs is a pre-existing flaky test unrelated to these changes.

@pelikhan pelikhan merged commit aca5a8c into main May 1, 2026
6 of 7 checks passed
@pelikhan pelikhan deleted the copilot/add-experiment-information-attribute branch May 1, 2026 19:09