
feat: select randomly when cache is empty instead of always picking first variant #29988

Merged
pelikhan merged 3 commits into main from copilot/update-cache-selection-logics
May 3, 2026

Contributor

Copilot AI commented May 3, 2026

Summary

When multiple experiment variants share the lowest invocation count — including the very first run when the GitHub Actions cache is empty and all counts are zero — the previous implementation always defaulted to the first declared variant. This introduced a systematic bias: on every fresh repository or new branch, the first variant was guaranteed to be selected rather than having an equal chance alongside its peers.

This PR changes the tie-breaking strategy to uniform random selection from the tied variants. The least-used round-robin balancing is preserved; only ties are now broken randomly instead of deterministically by declaration order.

Changes

actions/setup/js/pick_experiment.cjs

  • pickVariant(): rewritten to collect all tied variants in a single pass and return tied[Math.floor(Math.random() * tied.length)]. The single pass replaces the earlier two-pass approach (find the minimum count, then scan again for ties); both are O(n), but the variants are now scanned only once.
  • Updated the file-header algorithm comment to reflect random tie-breaking.
  • Updated the step-summary footer text.
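The least-used selection with random tie-breaking described above can be sketched as follows. This is a minimal, hypothetical sketch, not the code in pick_experiment.cjs: it assumes the pickVariant(name, variants, state) call shape seen in the tests, with state.counts[name][variant] holding prior invocation counts and missing entries treated as zero, and it leaves updating the counts to the caller. The simulate driver is also hypothetical and assumes each run increments the chosen variant's count.

```javascript
// Minimal sketch (hypothetical, not the PR's implementation): least-used
// selection with uniform random tie-breaking. Assumes state.counts[name]
// maps variant -> prior invocation count; missing entries count as 0.
function pickVariant(name, variants, state) {
  const counts = (state.counts && state.counts[name]) || {};
  let minCount = Infinity;
  let tied = [];
  // Single pass: track the lowest count and every variant sharing it.
  for (const variant of variants) {
    const count = counts[variant] ?? 0;
    if (count < minCount) {
      minCount = count;
      tied = [variant];
    } else if (count === minCount) {
      tied.push(variant);
    }
  }
  // For a non-empty tied list, Math.random() in [0, 1) keeps the index
  // in [0, tied.length), giving each tied variant probability 1/tied.length.
  return tied[Math.floor(Math.random() * tied.length)];
}

// Hypothetical driver: simulate runs from an empty cache, assuming each run
// increments the selected variant's count before the next run.
function simulate(variants, runs) {
  const state = { counts: {} };
  for (let i = 0; i < runs; i++) {
    const chosen = pickVariant("exp", variants, state);
    const counts = (state.counts.exp = state.counts.exp || {});
    counts[chosen] = (counts[chosen] ?? 0) + 1;
  }
  return state.counts.exp;
}

console.log(simulate(["A", "B", "C"], 7));
```

Because ties are only broken among the least-used variants, counts starting from an empty cache never differ by more than one: after seven runs over three variants each count is 2 or 3, and only the choice of which variant gets the extra run is random.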

actions/setup/js/pick_experiment.test.cjs

  • Replaced deterministic "selects first on tie" unit tests with probabilistic tests that verify both variants are reachable from an empty state (200 iterations each; the test fails if either variant never appears, so the failure probability is 2·(1/2)^200 = (1/2)^199).
  • Added vi.spyOn(Math, 'random').mockReturnValue(0) with explanatory comments to the main() integration tests that need a deterministic first-run outcome.
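Assuming each call draws an independent uniform value from Math.random(), the chance that a 200-trial reachability loop fails can be computed directly. With two tied variants the "A never appears" and "B never appears" events are mutually exclusive, so their probabilities add exactly; with three tied variants (the A/B/C test with equal counts), a union bound over the three variants suffices:

```latex
\begin{aligned}
P_{\text{fail}}^{(2)} &= \Pr[\text{A never}] + \Pr[\text{B never}]
  = 2\left(\tfrac{1}{2}\right)^{200} = 2^{-199} \approx 1.2 \times 10^{-60} \\
P_{\text{fail}}^{(3)} &\le 3\left(\tfrac{2}{3}\right)^{200} \approx 1.8 \times 10^{-35}
\end{aligned}
```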

docs/adr/29534-frontmatter-ab-experiments-variant-selection.md

  • Rule 6 updated: ties MUST now be broken by uniform random selection.
  • Alternative 2 updated: clarifies that random selection is rejected as the sole strategy but retained as the tie-breaking mechanism within the least-used algorithm.

docs/src/content/docs/guides/experiments.md

  • Statistical balancing paragraph updated to mention random tie-breaking on the first run.

pkg/parser/schemas/main_workflow_schema.json

  • experiments field description updated to mention random tie-breaking.

Testing

All 48 JavaScript tests pass (npx vitest run). All Go unit tests pass (make test-unit).

Copilot AI and others added 2 commits May 3, 2026 18:25
…irst variant

When multiple experiment variants share the lowest invocation count
(including the initial empty-cache state where all counts are zero),
pick one uniformly at random instead of always defaulting to the first
declared variant. This prevents systematic bias toward the first variant
on fresh repositories or whenever counts happen to be equal.

Changes:
- pick_experiment.cjs: update pickVariant() to collect all tied variants
  and return tied[Math.floor(Math.random() * tied.length)]
- pick_experiment.test.cjs: update deterministic tie-break unit tests to
  verify randomness; add vi.spyOn(Math, 'random') in main() tests that
  need a predictable first-run outcome
- ADR-29534: update Rule 6 and Alternative 2 to reflect random tie-breaking
- experiments.md guide: update statistical balancing paragraph
- main_workflow_schema.json: update experiments field description

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/58f16353-bee5-464f-91f9-9fee1f89f203

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…ness

- Use a single-pass algorithm in pickVariant(): build the tied array
  during the minCount scan (O(n) instead of O(2n))
- Increase probabilistic tie-break tests from 100 to 200 iterations for
  consistency with the three-variant test; add comments documenting the
  negligible (1/2)^200 flakiness bound

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/58f16353-bee5-464f-91f9-9fee1f89f203

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI requested a review from pelikhan May 3, 2026 18:28
pelikhan marked this pull request as ready for review May 3, 2026 18:28
Copilot AI review requested due to automatic review settings May 3, 2026 18:28
Contributor

Copilot AI left a comment

Pull request overview

This PR removes deterministic bias in experiment variant selection by breaking least-used ties with uniform random selection (not declaration order), especially for the first run when the cache is empty.

Changes:

  • Updated pickVariant() to select uniformly at random among tied least-used variants.
  • Updated Vitest coverage to accommodate the new tie-breaking behavior and keep main() tests deterministic via mocking.
  • Updated ADR/docs/schema text to document random tie-breaking.
Summary per file:

  • actions/setup/js/pick_experiment.cjs: Implements uniform-random tie-breaking for least-used selection and updates summary text/comments.
  • actions/setup/js/pick_experiment.test.cjs: Updates tests for tie behavior and adds Math.random mocking for deterministic integration tests.
  • docs/adr/29534-frontmatter-ab-experiments-variant-selection.md: Updates normative rule to require uniform-random tie-breaking.
  • docs/src/content/docs/guides/experiments.md: Documents random tie-breaking on first run / equal counts.
  • pkg/parser/schemas/main_workflow_schema.json: Updates schema description to mention random tie-breaking.

Copilot's findings

Comments suppressed due to low confidence (4)

actions/setup/js/pick_experiment.test.cjs:68

  • This test is probabilistic (expects all variants to appear within 200 random trials), which makes it non-deterministic and can cause rare flakes. Consider mocking Math.random() with specific return values to cover multiple tie indices deterministically instead of relying on chance.
    it("randomly selects from all tied variants when all counts are equal", () => {
      const state = { counts: { f: { A: 1, B: 1, C: 1 } } };
      // Run many times and verify all three variants are selected.
      const results = new Set();
      for (let i = 0; i < 200; i++) {
        results.add(pickVariant("f", ["A", "B", "C"], state));
      }
      expect(results).toContain("A");
      expect(results).toContain("B");
      expect(results).toContain("C");
    });

actions/setup/js/pick_experiment.test.cjs:80

  • This test uses repeated calls with real Math.random() to prove both variants are reachable. That makes the test non-deterministic (rare CI flakes are possible). Prefer a deterministic approach by mocking Math.random() to select each tied variant at least once and asserting the results.
    it("handles unknown experiment name (no counts yet) by picking randomly", () => {
      const state = { counts: {} };
      // Both variants must be reachable from an empty state.
      // The probability that one variant never appears in 200 trials is (1/2)^200, which is negligible.
      const results = new Set();
      for (let i = 0; i < 200; i++) {
        results.add(pickVariant("new", ["X", "Y"], state));
      }
      expect(results).toContain("X");
      expect(results).toContain("Y");
    });

actions/setup/js/pick_experiment.test.cjs:173

  • Mocking Math.random() inside the test and restoring it at the end is fragile: if an assertion throws before cleanup, the mocked Math.random() can leak into later tests. Consider using an afterEach(() => vi.restoreAllMocks()) hook, or restore the specific spy in a try/finally block.
      // Force Math.random → 0 so the first tied variant ("A") is selected.
      vi.spyOn(Math, "random").mockReturnValue(0);

actions/setup/js/pick_experiment.test.cjs:203

  • vi.restoreAllMocks() restores every spy in the file, and combined with the per-test vi.clearAllMocks() calls this makes test isolation harder to reason about. Prefer keeping a reference to the Math.random spy and restoring just that spy (or use a suite-level afterEach restore) to avoid unexpected interactions as more spies/mocks are added.
      vi.restoreAllMocks();
      vi.clearAllMocks();

  • Files reviewed: 5/5 changed files
  • Comments generated: 1
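The cleanup concern raised in the review comments can be illustrated without Vitest. The sketch below uses a hypothetical withMockedRandom helper (not part of the PR) that pins Math.random and restores it in a finally block, so a failing assertion cannot leak the stub into later tests. In a Vitest suite the equivalent is keeping the spy returned by vi.spyOn(Math, "random") and calling its mockRestore() in a finally block, or registering afterEach(() => vi.restoreAllMocks()) for the whole suite.

```javascript
// Hypothetical helper (not part of the PR): run fn with Math.random pinned
// to a fixed value, restoring the original even if fn throws.
function withMockedRandom(value, fn) {
  const original = Math.random;
  Math.random = () => value;
  try {
    return fn();
  } finally {
    // Runs on both normal return and exception, so the stub never leaks.
    Math.random = original;
  }
}

const realRandom = Math.random;
const tied = ["A", "B"];

// With Math.random() pinned to 0, index 0 ("A") is selected.
const picked = withMockedRandom(0, () => tied[Math.floor(Math.random() * tied.length)]);
console.log(picked); // A

// A failing assertion inside fn still restores the real Math.random.
try {
  withMockedRandom(0, () => {
    throw new Error("simulated assertion failure");
  });
} catch (err) {
  // Swallowed here only to demonstrate the cleanup.
}
console.log(Math.random === realRandom); // true
```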

Comment on lines 36 to 46

    it("selects one of the tied variants randomly when counts are equal", () => {
      const state = { counts: {} };
      expect(pickVariant("f", ["A", "B"], state)).toBe("A");
      // Run many times and verify both variants are eventually selected.
      // The probability that one variant never appears in 200 trials is (1/2)^200, which is negligible.
      const results = new Set();
      for (let i = 0; i < 200; i++) {
        results.add(pickVariant("f", ["A", "B"], state));
      }
      expect(results).toContain("A");
      expect(results).toContain("B");
    });

github-actions bot mentioned this pull request May 3, 2026
Contributor

github-actions Bot commented May 3, 2026

🧪 Test Quality Sentinel Report

Test Quality Score: 79/100

⚠️ Acceptable — test quality is solid, one process note

  • New/modified tests analyzed: 8
  • ✅ Design tests (behavioral contracts): 8 (100%)
  • ⚠️ Implementation tests (low value): 0 (0%)
  • Tests with error/edge cases: 5 (63%)
  • Duplicate test clusters: 0
  • Test inflation detected: Yes (52 lines test / 12 lines prod = 4.3:1)
  • 🚨 Coding-guideline violations: 0

Test Classification Details

  • selects one of the tied variants randomly when counts are equal (pick_experiment.test.cjs): ✅ Design. Probabilistic 200-trial tie-breaking; edge case
  • randomly selects from all tied variants when all counts are equal (pick_experiment.test.cjs): ✅ Design. 3-way tie edge case; 200-trial probabilistic
  • handles unknown experiment name (no counts yet) by picking randomly (pick_experiment.test.cjs): ✅ Design. Empty-cache edge case; both variants must be reachable
  • runs main() and assigns feature1 = A (Math.random mock added) (pick_experiment.test.cjs): ✅ Design. Integration: pins randomness for determinism
  • persists state between calls to simulate multi-run balance (mock added) (pick_experiment.test.cjs): ✅ Design. Integration: state-persistence behavioral contract
  • creates assignments.json with all experiment results (mock added) (pick_experiment.test.cjs): ✅ Design. Integration: output file content verified
  • overwrites assignments.json on successive runs (mock added) (pick_experiment.test.cjs): ✅ Design. Integration: file overwrite + round-robin verified
  • multiple experiments in single config (mock added) (pick_experiment.test.cjs): ✅ Design. Integration: first-tied-variant per experiment

Flagged Tests — Process Note

⚠️ Test inflation — pick_experiment.test.cjs

Ratio: 52 lines added in test vs 12 in production (4.3:1, threshold 2:1)

Why this is justified, not a concern: The behavior change introduces randomness. The three unit tests correctly use 200-trial probabilistic loops to assert reachability of all variants — a standard pattern for testing random selection. The integration tests required vi.spyOn(Math, 'random').mockReturnValue(0) to pin randomness for determinism. Both are appropriate responses to the new random behavior. The inflation ratio triggers the rubric flag but the test quality is high.


Language Support

Tests analyzed:

  • 🟨 JavaScript (*.test.cjs): 8 tests (vitest)
  • 🐹 Go (*_test.go): 0 tests changed in this PR

Verdict

Check passed. 0% of new/modified tests are implementation tests (threshold: 30%). All 8 tests verify behavioral contracts. The vi.spyOn(Math, 'random') usage in integration tests is appropriate — Math.random is a runtime primitive, pinning it makes integration tests deterministic without obscuring behavioral logic.


📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does — observable outputs, state changes, error handling. Would catch a behavioral regression if deleted.

Implementation Tests (Low Value) verify how the system does it internally. Break during legitimate refactoring even when behavior is correct.

References: §25287203291

🧪 Test quality analysis by Test Quality Sentinel · ● 638.6K

Contributor

github-actions bot left a comment

✅ Test Quality Sentinel: 79/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%). All 8 modified tests verify behavioral contracts. Test inflation (4.3:1) is justified by the probabilistic testing approach required for the new random tie-breaking behavior.

Collaborator

pelikhan commented May 3, 2026

@copilot review all comments

…andom mocks

Use vi.spyOn(Math, 'random').mockReturnValueOnce() with specific values
instead of running 200-iteration loops that rely on real randomness.
Each test now asserts an exact variant selection for a controlled random
input, eliminating any possibility of CI flakiness.

- 2-variant tie: 0 → "A", 0.5 → "B"
- 3-variant tie: 0 → "A", 0.4 → "B", 0.7 → "C"
- empty-cache 2-variant: 0 → "X", 0.5 → "Y"

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/c7876ebb-f617-451d-b588-19e0a6b7aa98

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 3, 2026

@copilot review all comments

Replaced all three probabilistic loop tests with deterministic vi.spyOn(Math, "random").mockReturnValueOnce() tests in commit 3950933. Each test now asserts an exact variant selection for a specific controlled random value (e.g. 0 → index 0, 0.5 → index 1 for 2-variant; 0, 0.4, 0.7 for 3-variant), eliminating any possibility of CI flakiness.
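The mocked values in that commit work because tied[Math.floor(r * n)] selects index k exactly when k/n ≤ r < (k + 1)/n. The values 0 and 0.5 are exactly representable, and 0.4 and 0.7 sit inside the three-variant buckets, away from the boundaries 1/3 and 2/3, which are not exactly representable in binary floating point. A small plain-JavaScript check of that mapping, with indexFor as a hypothetical helper name:

```javascript
// Index selected by tied[Math.floor(r * n)] for a mocked Math.random() value r.
const indexFor = (r, n) => Math.floor(r * n);

const cases = [
  { tied: ["A", "B"], r: 0 },
  { tied: ["A", "B"], r: 0.5 },
  { tied: ["A", "B", "C"], r: 0 },
  { tied: ["A", "B", "C"], r: 0.4 },
  { tied: ["A", "B", "C"], r: 0.7 },
];

const results = cases.map(({ tied, r }) => tied[indexFor(r, tied.length)]);
console.log(results.join(", ")); // A, B, A, B, C
```

In Vitest the same sequence can be fed to consecutive calls by chaining, for example vi.spyOn(Math, "random").mockReturnValueOnce(0).mockReturnValueOnce(0.5).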

pelikhan merged commit a6a14b1 into main May 3, 2026
pelikhan deleted the copilot/update-cache-selection-logics branch May 3, 2026 18:45