feat: select randomly when cache is empty instead of always picking first variant#29988
feat: select randomly when cache is empty instead of always picking first variant#29988
Conversation
…irst variant When multiple experiment variants share the lowest invocation count (including the initial empty-cache state where all counts are zero), pick one uniformly at random instead of always defaulting to the first declared variant. This prevents systematic bias toward the first variant on fresh repositories or whenever counts happen to be equal. Changes: - pick_experiment.cjs: update pickVariant() to collect all tied variants and return tied[Math.floor(Math.random() * tied.length)] - pick_experiment.test.cjs: update deterministic tie-break unit tests to verify randomness; add vi.spyOn(Math, 'random') in main() tests that need a predictable first-run outcome - ADR-29534: update Rule 6 and Alternative 2 to reflect random tie-breaking - experiments.md guide: update statistical balancing paragraph - main_workflow_schema.json: update experiments field description Agent-Logs-Url: https://github.com/github/gh-aw/sessions/58f16353-bee5-464f-91f9-9fee1f89f203 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…ness - Use a single-pass algorithm in pickVariant(): build the tied array during the minCount scan (O(n) instead of O(2n)) - Increase probabilistic tie-break tests from 100 to 200 iterations for consistency with the three-variant test; add comments documenting the negligible (1/2)^200 flakiness bound Agent-Logs-Url: https://github.com/github/gh-aw/sessions/58f16353-bee5-464f-91f9-9fee1f89f203 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR removes deterministic bias in experiment variant selection by breaking least-used ties with uniform random selection (not declaration order), especially for the first run when the cache is empty.
Changes:
- Updated
pickVariant()to select uniformly at random among tied least-used variants. - Updated Vitest coverage to accommodate the new tie-breaking behavior and keep
main()tests deterministic via mocking. - Updated ADR/docs/schema text to document random tie-breaking.
Show a summary per file
| File | Description |
|---|---|
| actions/setup/js/pick_experiment.cjs | Implements uniform-random tie-breaking for least-used selection and updates summary text/comments. |
| actions/setup/js/pick_experiment.test.cjs | Updates tests for tie behavior and adds Math.random mocking for deterministic integration tests. |
| docs/adr/29534-frontmatter-ab-experiments-variant-selection.md | Updates normative rule to require uniform-random tie-breaking. |
| docs/src/content/docs/guides/experiments.md | Documents random tie-breaking on first run / equal counts. |
| pkg/parser/schemas/main_workflow_schema.json | Updates schema description to mention random tie-breaking. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (4)
actions/setup/js/pick_experiment.test.cjs:68
- This test is probabilistic (expects all variants to appear within 200 random trials), which makes it non-deterministic and can cause rare flakes. Consider mocking
Math.random()with specific return values to cover multiple tie indices deterministically instead of relying on chance.
it("randomly selects from all tied variants when all counts are equal", () => {
const state = { counts: { f: { A: 1, B: 1, C: 1 } } };
// Run many times and verify all three variants are selected.
const results = new Set();
for (let i = 0; i < 200; i++) {
results.add(pickVariant("f", ["A", "B", "C"], state));
}
expect(results).toContain("A");
expect(results).toContain("B");
expect(results).toContain("C");
});
actions/setup/js/pick_experiment.test.cjs:80
- This test uses repeated calls with real
Math.random()to prove both variants are reachable. That makes the test non-deterministic (rare CI flakes are possible). Prefer a deterministic approach by mockingMath.random()to select each tied variant at least once and asserting the results.
it("handles unknown experiment name (no counts yet) by picking randomly", () => {
const state = { counts: {} };
// Both variants must be reachable from an empty state.
// The probability that one variant never appears in 200 trials is (1/2)^200, which is negligible.
const results = new Set();
for (let i = 0; i < 200; i++) {
results.add(pickVariant("new", ["X", "Y"], state));
}
expect(results).toContain("X");
expect(results).toContain("Y");
});
actions/setup/js/pick_experiment.test.cjs:173
- Mocking
Math.random()inside the test and restoring it at the end is fragile: if an assertion throws before cleanup, the mockedMath.random()can leak into later tests. Consider using anafterEach(() => vi.restoreAllMocks())hook, or restore the specific spy in atry/finallyblock.
// Force Math.random → 0 so the first tied variant ("A") is selected.
vi.spyOn(Math, "random").mockReturnValue(0);
actions/setup/js/pick_experiment.test.cjs:203
vi.restoreAllMocks()restores every spy in the file, and combined with the per-testvi.clearAllMocks()calls this makes test isolation harder to reason about. Prefer keeping a reference to theMath.randomspy and restoring just that spy (or use a suite-levelafterEachrestore) to avoid unexpected interactions as more spies/mocks are added.
vi.restoreAllMocks();
vi.clearAllMocks();
- Files reviewed: 5/5 changed files
- Comments generated: 1
| it("selects one of the tied variants randomly when counts are equal", () => { | ||
| const state = { counts: {} }; | ||
| expect(pickVariant("f", ["A", "B"], state)).toBe("A"); | ||
| // Run many times and verify both variants are eventually selected. | ||
| // The probability that one variant never appears in 200 trials is (1/2)^200, which is negligible. | ||
| const results = new Set(); | ||
| for (let i = 0; i < 200; i++) { | ||
| results.add(pickVariant("f", ["A", "B"], state)); | ||
| } | ||
| expect(results).toContain("A"); | ||
| expect(results).toContain("B"); | ||
| }); |
🧪 Test Quality Sentinel ReportTest Quality Score: 79/100
Test Classification DetailsView All 8 Tests
Flagged Tests — Process Note
|
There was a problem hiding this comment.
✅ Test Quality Sentinel: 79/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%). All 8 modified tests verify behavioral contracts. Test inflation (4.3:1) is justified by the probabilistic testing approach required for the new random tie-breaking behavior.
|
@copilot review all comments |
…andom mocks Use vi.spyOn(Math, 'random').mockReturnValueOnce() with specific values instead of running 200-iteration loops that rely on real randomness. Each test now asserts an exact variant selection for a controlled random input, eliminating any possibility of CI flakiness. - 2-variant tie: 0 → "A", 0.5 → "B" - 3-variant tie: 0 → "A", 0.4 → "B", 0.7 → "C" - empty-cache 2-variant: 0 → "X", 0.5 → "Y" Agent-Logs-Url: https://github.com/github/gh-aw/sessions/c7876ebb-f617-451d-b588-19e0a6b7aa98 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Replaced all three probabilistic loop tests with deterministic |
Summary
When multiple experiment variants share the lowest invocation count — including the very first run when the GitHub Actions cache is empty and all counts are zero — the previous implementation always defaulted to the first declared variant. This introduced a systematic bias: on every fresh repository or new branch, the first variant was guaranteed to be selected rather than having an equal chance alongside its peers.
This PR changes the tie-breaking strategy to uniform random selection from the tied variants. The least-used round-robin balancing is preserved; only ties are now broken randomly instead of deterministically by declaration order.
Changes
actions/setup/js/pick_experiment.cjspickVariant(): rewritten to collect all tied variants in a single pass and returntied[Math.floor(Math.random() * tied.length)]. This is also O(n) instead of the previous O(2n) two-pass approach.actions/setup/js/pick_experiment.test.cjsvi.spyOn(Math, 'random').mockReturnValue(0)with explanatory comments to themain()integration tests that need a deterministic first-run outcome.docs/adr/29534-frontmatter-ab-experiments-variant-selection.mddocs/src/content/docs/guides/experiments.mdpkg/parser/schemas/main_workflow_schema.jsonexperimentsfield description updated to mention random tie-breaking.Testing
All 48 JavaScript tests pass (
npx vitest run). All Go unit tests pass (make test-unit).