fix(bench): anchor KNOWN_REGRESSIONS staleness to recorded baseline (#1703) #1704

Merged
carlos-alm merged 1 commit into main from fix/staleness-anchor-baseline-1703 on Jun 24, 2026

@carlos-alm

Contributor

Problem

The Pre-publish benchmark gate's "KNOWN_REGRESSIONS entries are not stale" test measured each exemption's age against package.json. But package.json is bumped at release time, while the benchmark baseline is only recorded after publish (via the benchmark.yml workflow_run PR). During that window the package version races ahead of the recorded baseline.

This bit PR #1701 (release 3.15.0): the recorded baseline was stuck at 3.13.0, so the 6 still-live 3.13.0:* exemptions — the actual baseline for the dev-vs-baseline comparison — were flagged as ">1 minor behind 3.15.0" and failed the gate, even though they were not yet dead.
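The arithmetic behind the false positive can be sketched as follows. This is a minimal illustration, not the code in the test file: the `parseSemverSketch` and `minorGapSketch` helpers are hypothetical stand-ins for the existing `parseSemver` and `minorGap` utilities, whose real signatures are not shown in this PR, and the sketch assumes both versions share a major version.

```typescript
// Hypothetical stand-ins for the real parseSemver/minorGap helpers.
type SemverParts = { major: number; minor: number; patch: number };

function parseSemverSketch(v: string): SemverParts | null {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) return null;
  return { major: Number(m[1]), minor: Number(m[2]), patch: Number(m[3]) };
}

// Minor-version distance from an exemption's version to the anchor.
// Assumption: only meaningful when both share a major version.
function minorGapSketch(entry: string, anchor: string): number | null {
  const e = parseSemverSketch(entry);
  const a = parseSemverSketch(anchor);
  if (!e || !a || e.major !== a.major) return null;
  return a.minor - e.minor;
}

const exemptionVersion = "3.13.0";
// Old anchor: package.json, already bumped to the release version.
const gapVsPackageJson = minorGapSketch(exemptionVersion, "3.15.0");
// New anchor: the latest recorded benchmark baseline.
const gapVsBaseline = minorGapSketch(exemptionVersion, "3.13.0");

console.log(gapVsPackageJson, gapVsBaseline); // 2 0
```

With a ">1 minor behind" threshold, a gap of 2 fails the gate while a gap of 0 does not, which matches the behaviour described for #1701.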

Fix

Anchor staleness to the latest recorded benchmark version (computed from the committed build/query/incremental history) instead of package.json. An exemption is only flagged once a newer baseline that actually supersedes it has landed — which is exactly when it becomes dead weight.

  • Extracted two pure helpers: latestRecordedVersion(histories) (highest non-dev, non-SKIP_VERSIONS release across all history files) and findStaleEntries(entries, anchorVersion).
  • The staleness test now resolves the baseline from history; if no history is recorded it no-ops (that failure mode is already covered by the "has at least one engine to compare" tests).
  • Added always-on unit tests (not gated behind RUN_REGRESSION_GUARD) covering the anchor selection, dev/SKIP_VERSIONS exclusion, semver (not lexical) ordering, and the exact #1703 case: a 3.13.0 entry stays live when the baseline is 3.13.0 but is correctly flagged once the baseline reaches 3.15.0.
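The two helpers described above might look roughly like the sketch below. The record shape (`{ version: string }`), the contents of `SKIP_VERSIONS`, the `version:label` entry-key format, and the same-major assumption are all illustrative guesses; only the helper names and their described behaviour come from this PR.

```typescript
// Assumed shapes; the real history files and exemption keys may differ.
type HistoryRecord = { version: string };
type Triple = [number, number, number];

const SKIP_VERSIONS = new Set<string>(["3.14.0"]); // hypothetical contents

function parseTriple(v: string): Triple | null {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

// Numeric (not lexical) comparison of major, minor, patch.
function compareTriple(a: Triple, b: Triple): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Highest non-dev, non-skipped release across all histories, or null if none.
function latestRecordedVersion(histories: HistoryRecord[][]): string | null {
  let best: string | null = null;
  let bestParsed: Triple | null = null;
  for (const history of histories) {
    for (const { version } of history) {
      if (version === "dev" || SKIP_VERSIONS.has(version)) continue;
      const parsed = parseTriple(version);
      if (!parsed) continue;
      if (!bestParsed || compareTriple(parsed, bestParsed) > 0) {
        best = version;
        bestParsed = parsed;
      }
    }
  }
  return best;
}

// Entries whose version prefix is more than one minor behind the anchor.
// Assumption: entries and anchor share a major version.
function findStaleEntries(entries: string[], anchorVersion: string): string[] {
  const anchor = parseTriple(anchorVersion);
  if (!anchor) return [];
  return entries.filter((entry) => {
    const idx = entry.indexOf(":");
    if (idx < 0) return false;
    const v = parseTriple(entry.slice(0, idx));
    return v !== null && v[0] === anchor[0] && anchor[1] - v[1] > 1;
  });
}

const histories: HistoryRecord[][] = [
  [{ version: "dev" }, { version: "3.13.0" }, { version: "3.9.0" }],
  [{ version: "3.14.0" }, { version: "3.12.0" }],
];
const anchor = latestRecordedVersion(histories);
const stale = anchor === null ? [] : findStaleEntries(["3.13.0:build", "3.11.0:query"], anchor);
console.log(anchor, stale); // 3.13.0 [ '3.11.0:query' ]
```

Keeping both functions pure (no file reads, no environment checks) is what allows them to be exercised by unit tests that run on every `npm test`, independent of the gated data-driven suite.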

The guard still forces pruning — it just fires when a superseding baseline lands rather than prematurely at version-bump time.

Verification

  • RUN_REGRESSION_GUARD=1 full suite + unit tests: 25 passed.
  • Normal npm test path (gated suite skipped): 8 unit tests run, 17 skipped — the anchor logic is now covered even when the data-driven guard isn't.
  • tsc --noEmit: clean. Biome: clean.

Test-only change; no Rust mirror needed.

Closes #1703

The 'KNOWN_REGRESSIONS entries are not stale' guard measured each entry's
age against package.json, which is bumped at release time before the
post-publish benchmark-recording PR lands. When the package version jumped
>1 minor ahead of the latest recorded baseline (e.g. 3.15.0 while data was
stuck at 3.13.0), still-live exemptions keyed to the current baseline were
wrongly flagged stale and failed the pre-publish benchmark gate.

Anchor staleness to the latest recorded benchmark version instead, computed
from the committed history. An exemption is only flagged once a newer
baseline that supersedes it has actually landed. Extract the anchor and
stale-detection into pure helpers (latestRecordedVersion, findStaleEntries)
and add always-on unit tests covering the anchoring, dev/SKIP_VERSIONS
exclusion, and the exact #1703 regression case.

Closes #1703
@greptile-apps

greptile-apps Bot commented Jun 24, 2026

Contributor

Greptile Summary

This PR fixes a false-positive in the pre-publish benchmark gate by changing the staleness anchor for KNOWN_REGRESSIONS entries from package.json to the latest version present in the committed benchmark history files. This prevents still-live exemptions from being flagged during the release window, when package.json is bumped before the post-publish benchmark-recording PR lands.

  • Extracted two pure helpers — latestRecordedVersion (highest non-dev, non-SKIP_VERSIONS release across all history files using proper semver numeric comparison) and findStaleEntries (compares each exemption's version prefix against the anchor via the existing minorGap helper) — replacing the inline package.json read in the staleness test.
  • Added 8 always-on unit tests (not gated behind RUN_REGRESSION_GUARD) that cover anchor selection, dev/SKIP_VERSIONS exclusion, lexical-vs-semver ordering, and the exact #1703 failure shape.
  • When no baseline history is recorded, the staleness test now no-ops gracefully, with a comment noting that the missing-history failure mode is already covered by separate "has at least one engine to compare" assertions.
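The lexical-vs-semver distinction matters as soon as a minor version reaches two digits. The sketch below is illustrative only (the `compareSemverSketch` function and the version list are made up for this example) and shows how a plain string sort would pick the wrong anchor:

```typescript
// Hypothetical comparator: compares major, minor, patch as numbers.
function compareSemverSketch(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

const recorded = ["3.9.0", "3.13.0", "3.2.1"];

// Default sort compares strings character by character, so "3.9.0" > "3.13.0".
const lexicalOrder = [...recorded].sort();
// Numeric comparison treats 13 as greater than 9.
const semverOrder = [...recorded].sort(compareSemverSketch);

const lexicalMax = lexicalOrder[lexicalOrder.length - 1];
const semverMax = semverOrder[semverOrder.length - 1];

console.log(lexicalMax, semverMax); // 3.9.0 3.13.0
```

Under lexical ordering the anchor would have been 3.9.0, so no 3.x exemption would ever appear stale; numeric comparison selects 3.13.0 as intended.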

Confidence Score: 5/5

Test-only change that fixes a false-positive gate failure; no production code or benchmark data is modified.

The two new helper functions are straightforward, correctly use the existing parseSemver and minorGap utilities, and preserve all prior behavior. The unit tests are unconditionally run and cover the exact failure scenario from #1703 as well as important edge cases.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| tests/benchmarks/regression-guard.test.ts | Replaces package.json-anchored staleness check with latestRecordedVersion() computed from committed history; adds two pure helper functions and 8 unconditional unit tests covering the #1703 regression. Logic, tests, and documentation are all sound. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[KNOWN_REGRESSIONS staleness test runs] --> B{baselineVersion = latestRecordedVersion}
    B --> C{baselineVersion === null?}
    C -- Yes --> D[No-op: skip staleness check]
    C -- No --> E[findStaleEntries KNOWN_REGRESSIONS, baselineVersion]
    E --> F{For each entry with version: prefix}
    F --> G[minorGap entryVersion, baselineVersion]
    G --> H{gap > 1?}
    H -- Yes --> I[Mark as stale]
    H -- No --> J[Keep as live]
    I --> K{any stale entries?}
    J --> K
    K -- Yes --> L[Test fails]
    K -- No --> M[Test passes]

Reviews (1): Last reviewed commit: "fix(bench): anchor KNOWN_REGRESSIONS sta..."

@carlos-alm carlos-alm merged commit fb61cf2 into main Jun 24, 2026
25 checks passed
@carlos-alm carlos-alm deleted the fix/staleness-anchor-baseline-1703 branch June 24, 2026 23:28
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 24, 2026