feat: select randomly when cache is empty instead of always picking first variant by Copilot · Pull Request #29988 · github/gh-aw

Copilot · 2026-05-03T18:27:07Z

Summary

When multiple experiment variants share the lowest invocation count — including the very first run when the GitHub Actions cache is empty and all counts are zero — the previous implementation always defaulted to the first declared variant. This introduced a systematic bias: on every fresh repository or new branch, the first variant was guaranteed to be selected rather than having an equal chance alongside its peers.

This PR changes the tie-breaking strategy to uniform random selection from the tied variants. The least-used round-robin balancing is preserved; only ties are now broken randomly instead of deterministically by declaration order.

Changes

`actions/setup/js/pick_experiment.cjs`

`pickVariant()`: rewritten to collect all tied variants in a single pass and return `tied[Math.floor(Math.random() * tied.length)]`. This replaces the previous two-pass approach with one pass over the variants (both are O(n); only the constant factor changes).
Updated the file-header algorithm comment to reflect random tie-breaking.
Updated the step-summary footer text.
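
The single-pass tie collection described above can be sketched roughly as follows. This is a minimal illustration, not the code from `pick_experiment.cjs`: the argument order and the `state.counts` shape are taken from the test snippets in this PR, while the injectable `rng` parameter, the `simulate` helper, and the step that increments the chosen variant's count are assumptions added for the example.

```javascript
// Hypothetical sketch of least-used selection with uniform random tie-breaking.
// `rng` is an illustrative parameter so the tie-break can be pinned in tests.
function pickVariant(name, variants, state, rng = Math.random) {
  const counts = (state.counts && state.counts[name]) || {};
  let minCount = Infinity;
  let tied = [];
  for (const variant of variants) {
    const count = counts[variant] ?? 0; // variants with no record count as zero
    if (count < minCount) {
      minCount = count;
      tied = [variant]; // new minimum found: restart the tie list
    } else if (count === minCount) {
      tied.push(variant); // same minimum: extend the tie list
    }
  }
  return tied[Math.floor(rng() * tied.length)];
}

// Assumption: each run increments the count of the variant it selected.
function simulate(runs) {
  const state = { counts: { f: {} } };
  for (let i = 0; i < runs; i++) {
    const chosen = pickVariant("f", ["A", "B", "C"], state);
    state.counts.f[chosen] = (state.counts.f[chosen] ?? 0) + 1;
  }
  return state.counts.f;
}

const finalCounts = simulate(9);
console.log(finalCounts);
```

Under these assumptions, nine runs over three variants always end with each variant selected three times. Only the order in which they are picked within each round varies, which is the sense in which the least-used balancing is preserved.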

`actions/setup/js/pick_experiment.test.cjs`

Replaced deterministic "selects first on tie" unit tests with probabilistic tests that verify every tied variant is reachable, including from an empty state (200 iterations each; in the two-variant tests, the probability that a given variant never appears is (1/2)^200).
Added vi.spyOn(Math, 'random').mockReturnValue(0) with explanatory comments to the main() integration tests that need a deterministic first-run outcome.
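
As a rough check on the flakiness of a 200-iteration reachability loop, a union bound over the variants gives an upper limit on the chance that at least one variant is never drawn. The `flakeBound` helper below is hypothetical and assumes independent, uniform picks among `n` tied variants.

```javascript
// Union bound: each of the n variants is missed in every one of `trials`
// picks with probability ((n - 1) / n) ** trials, so the chance that at
// least one variant is missed is at most n times that.
function flakeBound(n, trials) {
  return n * Math.pow((n - 1) / n, trials);
}

const twoVariantBound = flakeBound(2, 200);
const threeVariantBound = flakeBound(3, 200);
console.log(twoVariantBound, threeVariantBound);
```

For two variants the bound is exact and equals 2 · (1/2)^200, on the order of 1e-60. For three variants it is looser but still far below any practical flake rate, on the order of 1e-35.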

`docs/adr/29534-frontmatter-ab-experiments-variant-selection.md`

Rule 6 updated: ties MUST now be broken by uniform random selection.
Alternative 2 updated: clarifies that random selection is rejected as the sole strategy but retained as the tie-breaking mechanism within the least-used algorithm.

`docs/src/content/docs/guides/experiments.md`

Statistical balancing paragraph updated to mention random tie-breaking on the first run.

`pkg/parser/schemas/main_workflow_schema.json`

experiments field description updated to mention random tie-breaking.

Testing

All 48 JavaScript tests pass (npx vitest run). All Go unit tests pass (make test-unit).

…irst variant

When multiple experiment variants share the lowest invocation count (including the initial empty-cache state where all counts are zero), pick one uniformly at random instead of always defaulting to the first declared variant. This prevents systematic bias toward the first variant on fresh repositories or whenever counts happen to be equal.

Changes:
- pick_experiment.cjs: update pickVariant() to collect all tied variants and return tied[Math.floor(Math.random() * tied.length)]
- pick_experiment.test.cjs: update deterministic tie-break unit tests to verify randomness; add vi.spyOn(Math, 'random') in main() tests that need a predictable first-run outcome
- ADR-29534: update Rule 6 and Alternative 2 to reflect random tie-breaking
- experiments.md guide: update statistical balancing paragraph
- main_workflow_schema.json: update experiments field description

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/58f16353-bee5-464f-91f9-9fee1f89f203
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

…ness

- Use a single-pass algorithm in pickVariant(): build the tied array during the minCount scan (O(n) instead of O(2n))
- Increase probabilistic tie-break tests from 100 to 200 iterations for consistency with the three-variant test; add comments documenting the negligible (1/2)^200 flakiness bound

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/58f16353-bee5-464f-91f9-9fee1f89f203
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot

Pull request overview

This PR removes deterministic bias in experiment variant selection by breaking least-used ties with uniform random selection (not declaration order), especially for the first run when the cache is empty.

Changes:

Updated pickVariant() to select uniformly at random among tied least-used variants.
Updated Vitest coverage to accommodate the new tie-breaking behavior and keep main() tests deterministic via mocking.
Updated ADR/docs/schema text to document random tie-breaking.

Show a summary per file

| File | Description |
| --- | --- |
| actions/setup/js/pick_experiment.cjs | Implements uniform-random tie-breaking for least-used selection and updates summary text/comments. |
| actions/setup/js/pick_experiment.test.cjs | Updates tests for tie behavior and adds `Math.random` mocking for deterministic integration tests. |
| docs/adr/29534-frontmatter-ab-experiments-variant-selection.md | Updates normative rule to require uniform-random tie-breaking. |
| docs/src/content/docs/guides/experiments.md | Documents random tie-breaking on first run / equal counts. |
| pkg/parser/schemas/main_workflow_schema.json | Updates schema description to mention random tie-breaking. |

Copilot's findings

Comments suppressed due to low confidence (4)

actions/setup/js/pick_experiment.test.cjs:68

This test is probabilistic (expects all variants to appear within 200 random trials), which makes it non-deterministic and can cause rare flakes. Consider mocking Math.random() with specific return values to cover multiple tie indices deterministically instead of relying on chance.

    it("randomly selects from all tied variants when all counts are equal", () => {
      const state = { counts: { f: { A: 1, B: 1, C: 1 } } };
      // Run many times and verify all three variants are selected.
      const results = new Set();
      for (let i = 0; i < 200; i++) {
        results.add(pickVariant("f", ["A", "B", "C"], state));
      }
      expect(results).toContain("A");
      expect(results).toContain("B");
      expect(results).toContain("C");
    });

actions/setup/js/pick_experiment.test.cjs:80

This test uses repeated calls with real Math.random() to prove both variants are reachable. That makes the test non-deterministic (rare CI flakes are possible). Prefer a deterministic approach by mocking Math.random() to select each tied variant at least once and asserting the results.

    it("handles unknown experiment name (no counts yet) by picking randomly", () => {
      const state = { counts: {} };
      // Both variants must be reachable from an empty state.
      // The probability that one variant never appears in 200 trials is (1/2)^200, which is negligible.
      const results = new Set();
      for (let i = 0; i < 200; i++) {
        results.add(pickVariant("new", ["X", "Y"], state));
      }
      expect(results).toContain("X");
      expect(results).toContain("Y");
    });

actions/setup/js/pick_experiment.test.cjs:173

Mocking Math.random() inside the test and restoring it at the end is fragile: if an assertion throws before cleanup, the mocked Math.random() can leak into later tests. Consider using an afterEach(() => vi.restoreAllMocks()) hook, or restore the specific spy in a try/finally block.

      // Force Math.random → 0 so the first tied variant ("A") is selected.
      vi.spyOn(Math, "random").mockReturnValue(0);
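
The try/finally option mentioned in this comment can be illustrated without any test framework. The `withFixedRandom` helper below is hypothetical and not part of the PR. In Vitest, the comment's other suggestion, a suite-level `afterEach(() => vi.restoreAllMocks())`, serves the same purpose.

```javascript
// Hypothetical helper: pins Math.random to `value` only while `fn` runs.
function withFixedRandom(value, fn) {
  const original = Math.random;
  Math.random = () => value;
  try {
    return fn();
  } finally {
    Math.random = original; // restored even when fn throws
  }
}

const realRandom = Math.random;
const tiedVariants = ["A", "B"];

// With Math.random pinned to 0, the first tied variant is selected.
const picked = withFixedRandom(0, () =>
  tiedVariants[Math.floor(Math.random() * tiedVariants.length)]
);
console.log(picked); // "A"

// A failing body does not leak the pinned Math.random into later code.
let threw = false;
try {
  withFixedRandom(0, () => {
    throw new Error("assertion failed");
  });
} catch (err) {
  threw = true;
}
console.log(threw, Math.random === realRandom); // true true
```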

actions/setup/js/pick_experiment.test.cjs:203

vi.restoreAllMocks() restores every spy in the file, and combined with the per-test vi.clearAllMocks() calls this makes test isolation harder to reason about. Prefer keeping a reference to the Math.random spy and restoring just that spy (or use a suite-level afterEach restore) to avoid unexpected interactions as more spies/mocks are added.

      vi.restoreAllMocks();
      vi.clearAllMocks();

Files reviewed: 5/5 changed files
Comments generated: 1

+    it("selects one of the tied variants randomly when counts are equal", () => {
      const state = { counts: {} };
-      expect(pickVariant("f", ["A", "B"], state)).toBe("A");
+      // Run many times and verify both variants are eventually selected.
+      // The probability that one variant never appears in 200 trials is (1/2)^200, which is negligible.
+      const results = new Set();
+      for (let i = 0; i < 200; i++) {
+        results.add(pickVariant("f", ["A", "B"], state));
+      }
+      expect(results).toContain("A");
+      expect(results).toContain("B");
    });


github-actions · 2026-05-03T18:35:04Z

🧪 Test Quality Sentinel Report

Test Quality Score: 79/100

⚠️ Acceptable — test quality is solid, one process note

| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 8 |
| ✅ Design tests (behavioral contracts) | 8 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 5 (63%) |
| Duplicate test clusters | 0 |
| Test inflation detected | Yes (52 lines test / 12 lines prod = 4.3:1) |
| 🚨 Coding-guideline violations | 0 |

Test Classification Details

View All 8 Tests

| Test | File | Classification | Notes |
| --- | --- | --- | --- |
| `selects one of the tied variants randomly when counts are equal` | `pick_experiment.test.cjs` | ✅ Design | Probabilistic 200-trial tie-breaking; edge case |
| `randomly selects from all tied variants when all counts are equal` | `pick_experiment.test.cjs` | ✅ Design | 3-way tie edge case; 200-trial probabilistic |
| `handles unknown experiment name (no counts yet) by picking randomly` | `pick_experiment.test.cjs` | ✅ Design | Empty-cache edge case; both variants must be reachable |
| `runs main() and assigns feature1 = A` (Math.random mock added) | `pick_experiment.test.cjs` | ✅ Design | Integration: pins randomness for determinism |
| `persists state between calls to simulate multi-run balance` (mock added) | `pick_experiment.test.cjs` | ✅ Design | Integration: state-persistence behavioral contract |
| `creates assignments.json with all experiment results` (mock added) | `pick_experiment.test.cjs` | ✅ Design | Integration: output file content verified |
| `overwrites assignments.json on successive runs` (mock added) | `pick_experiment.test.cjs` | ✅ Design | Integration: file overwrite + round-robin verified |
| `multiple experiments in single config` (mock added) | `pick_experiment.test.cjs` | ✅ Design | Integration: first-tied-variant per experiment |

Flagged Tests — Process Note

⚠️ Test inflation — `pick_experiment.test.cjs`

Ratio: 52 lines added in test vs 12 in production (4.3:1, threshold 2:1)

Why this is justified, not a concern: The behavior change introduces randomness. The three unit tests correctly use 200-trial probabilistic loops to assert reachability of all variants — a standard pattern for testing random selection. The integration tests required vi.spyOn(Math, 'random').mockReturnValue(0) to pin randomness for determinism. Both are appropriate responses to the new random behavior. The inflation ratio triggers the rubric flag but the test quality is high.

Language Support

Tests analyzed:

🟨 JavaScript (*.test.cjs): 8 tests (vitest)
🐹 Go (*_test.go): 0 tests changed in this PR

Verdict

✅ Check passed. 0% of new/modified tests are implementation tests (threshold: 30%). All 8 tests verify behavioral contracts. The vi.spyOn(Math, 'random') usage in integration tests is appropriate — Math.random is a runtime primitive, pinning it makes integration tests deterministic without obscuring behavioral logic.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does — observable outputs, state changes, error handling. Would catch a behavioral regression if deleted.

Implementation Tests (Low Value) verify how the system does it internally. Break during legitimate refactoring even when behavior is correct.

References: §25287203291

🧪 Test quality analysis by Test Quality Sentinel · ● 638.6K · ◷

github-actions

✅ Test Quality Sentinel: 79/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%). All 8 modified tests verify behavioral contracts. Test inflation (4.3:1) is justified by the probabilistic testing approach required for the new random tie-breaking behavior.

pelikhan · 2026-05-03T18:36:56Z

@copilot review all comments

…andom mocks

Use vi.spyOn(Math, 'random').mockReturnValueOnce() with specific values instead of running 200-iteration loops that rely on real randomness. Each test now asserts an exact variant selection for a controlled random input, eliminating any possibility of CI flakiness.

- 2-variant tie: 0 → "A", 0.5 → "B"
- 3-variant tie: 0 → "A", 0.4 → "B", 0.7 → "C"
- empty-cache 2-variant: 0 → "X", 0.5 → "Y"

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/c7876ebb-f617-451d-b588-19e0a6b7aa98
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-05-03T18:43:42Z

> @copilot review all comments

Replaced all three probabilistic loop tests with deterministic vi.spyOn(Math, "random").mockReturnValueOnce() tests in commit 3950933. Each test now asserts an exact variant selection for a specific controlled random value (e.g. 0 → index 0, 0.5 → index 1 for 2-variant; 0, 0.4, 0.7 for 3-variant), eliminating any possibility of CI flakiness.
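
The mapping from each controlled random value to a selected variant follows from the `Math.floor(Math.random() * tied.length)` expression quoted in the PR description. The sketch below recomputes it for the values listed here. The `tieIndex` helper and the `cases` table are illustrative names, not code from the test file.

```javascript
// Index chosen among `tiedCount` tied variants for a given Math.random() value.
function tieIndex(randomValue, tiedCount) {
  return Math.floor(randomValue * tiedCount);
}

// Mocked values and tied variant lists as described in the commit message.
const cases = [
  { tied: ["A", "B"], value: 0 },
  { tied: ["A", "B"], value: 0.5 },
  { tied: ["A", "B", "C"], value: 0 },
  { tied: ["A", "B", "C"], value: 0.4 },
  { tied: ["A", "B", "C"], value: 0.7 },
  { tied: ["X", "Y"], value: 0 },
  { tied: ["X", "Y"], value: 0.5 },
];

const outcomes = cases.map(({ tied, value }) => tied[tieIndex(value, tied.length)]);
console.log(outcomes.join(" ")); // A B A B C X Y
```

Because `Math.random()` returns values in [0, 1), the computed index always stays within the bounds of the tied list.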

Copilot AI and others added 2 commits May 3, 2026 18:25

Copilot AI assigned Copilot and pelikhan May 3, 2026

Copilot created this pull request from a session on behalf of pelikhan May 3, 2026 18:27 View session

Copilot finished work on behalf of pelikhan May 3, 2026 18:28

Copilot AI requested a review from pelikhan May 3, 2026 18:28

pelikhan marked this pull request as ready for review May 3, 2026 18:28

Copilot AI review requested due to automatic review settings May 3, 2026 18:28

Copilot started reviewing on behalf of pelikhan May 3, 2026 18:29 View session

Copilot AI reviewed May 3, 2026

github-actions Bot mentioned this pull request May 3, 2026

[aw] No-Op Runs #29134

Open

github-actions Bot approved these changes May 3, 2026

Copilot started work on behalf of pelikhan May 3, 2026 18:38 View session

Copilot finished work on behalf of pelikhan May 3, 2026 18:44

pelikhan merged commit a6a14b1 into main May 3, 2026

pelikhan deleted the copilot/update-cache-selection-logics branch May 3, 2026 18:45

pelikhan merged 3 commits into main from copilot/update-cache-selection-logics

3 participants
