Skip to content

ci: fast-fail regression matrix + preflight gate before expensive jobs#877

Merged
jrusso1020 merged 3 commits into
mainfrom
ci/fast-fail-and-preflight-gate
May 15, 2026
Merged

ci: fast-fail regression matrix + preflight gate before expensive jobs#877
jrusso1020 merged 3 commits into
mainfrom
ci/fast-fail-and-preflight-gate

Conversation

@jrusso1020
Copy link
Copy Markdown
Collaborator

Summary

Stop burning 60+ runner-minutes on expensive jobs (Docker regression shards, Chrome perf/parity renders, Windows-latest renders, catalog-preview renders) when the PR is already failing lint or format.

Changes

  • regression.yml — matrix fail-fast: falsetrue so the first failing render-regression shard cancels the other 9. New preflight job (lint + oxfmt --check) gates regression-shards.
  • player-perf.yml — matrix fail-fast: falsetrue. New preflight job gates perf-shards.
  • preview-regression.yml — new preflight job gates preview-parity.
  • windows-render.yml — new preflight job (runs on ubuntu-latest) gates both render-windows and test-windows (each on windows-latest).
  • catalog-previews.yml — new preflight job gates render-previews.

Each preflight installs deps, runs bun run lint, and bun run format:check. Typecheck is intentionally not in preflight — it's slow (~10 min) and any TS error will still surface in each heavy job's own bun run build step.

Net effect

  • Lint/format-broken PR: dies at ~2 min of preflight instead of waiting 30–60 min for all the heavy renders to complete.
  • Green PR: ~2 min added to critical path (preflight runs in parallel with ci.yml's own lint/format/typecheck before heavy jobs kick off).
  • First failing regression or perf shard now cancels its siblings — faster signal at the cost of "see every failing shard at once".

Test plan

  • CI runs on this PR and the preflight gates appear before the heavy jobs in the Checks tab.
  • Confirm regression and player-perf summary required-check names are unchanged (still match branch protection).
  • On a deliberate lint-breaking PR (smoke test off this branch), confirm regression-shards / perf-shards / etc. don't start.

🤖 Generated with Claude Code

Don't burn 60+ runner-minutes on regression shards, perf shards,
preview-parity, Windows renders, or catalog-preview renders when
the PR is already failing lint or format.

- regression: matrix fail-fast: false → true (first failing shard
  cancels the rest), plus a new preflight (lint + format:check)
  job gating regression-shards.
- player-perf: matrix fail-fast → true, plus preflight gate.
- preview-regression, windows-render, catalog-previews: preflight
  gate added; heavy jobs now needs: [..., preflight].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@miguel-heygen miguel-heygen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved — clean, well-scoped CI cost optimization.

What this does: Adds a ~25s preflight gate (lint + format) before all expensive CI jobs, and flips regression + perf matrices to fail-fast: true. Broken lint/format now dies in 2 min instead of burning 30-60 min of runner time.

Verified:

  • Preflight pattern is consistent across all 5 workflows (checkout → setup-bun → setup-node → install → lint → format:check)
  • All action SHAs are pinned ✓
  • needs chains are correct — preflight gates the heavy jobs, and inherits the same if conditions from the changes filter so skip-propagation works properly
  • windows-render.yml preflight correctly uses ref: ${{ github.event.inputs.ref }} matching the existing workflow_dispatch checkout pattern
  • CI on this PR confirms the behavior: 4 preflights passed (~25s each), regression preflight+shards correctly skipped (no code changes detected)

One observation (not blocking): The 5 identical preflight job definitions could eventually be extracted into a reusable workflow (workflow_call), but that's a separate follow-up — the duplication is fine at this scale and easier to audit.

fail-fast: true tradeoff is reasonable: faster signal on the first failure is more valuable than seeing all failures at once, especially for pixel-regression tests where the root cause is usually shared.

Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid follow-up to #855 — preflight gate is wired correctly and the empirical proof is in this PR's own run (every preflight finished in 21–28s, well under the 5-min budget). Trade-offs (fail-fast: true, missing fail-shard visibility) are called out honestly in the PR body.

Calibrated strengths

  • windows-render.yml:60-74 — running the Windows-render preflight on ubuntu-latest is the right call; lint/format don't need a Windows runner and the cost asymmetry is huge.
  • windows-render.yml:65-67 — preserving ref: ${{ github.event.inputs.ref }} on the preflight checkout matches the rest of the file's workflow_dispatch handling; easy detail to miss.
  • Summary-job semantics survive cleanly: preflight failure → shards result: skipped → existing if [ "${{ needs.<shards>.result }}" != "success" ]; then exit 1 path on regression (regression.yml:130) and player-perf (player-perf.yml:145) still fails the required check. Gate preserved without touching the summary contract — branch protection names stay intact (PR-body test plan #2 is satisfied by construction).
  • Heavy-job needs: updated to [changes, preflight] everywhere (regression.yml:38, player-perf.yml:43→needs, preview-regression.yml:46→needs, windows-render.yml:60,365) — no site missed.

Findings

important — bun install cache is missing on every new preflight. Each of the 5 preflights does a cold bun install --frozen-lockfile. Today the bun cache lives on the runner's local FS, gone after the job. oven-sh/setup-bun does not auto-cache — needs an explicit actions/cache keyed on bun.lock, or the v2-action's caching opt-in if available. Each preflight today: ~25s total, of which install is a meaningful slice. Across 5 preflights that's ~30-60s of redundant network/disk. Same nit raised on #855 cold-cache contention; would land it here too rather than carry across more workflows. Concrete fix:

- uses: actions/cache@v4
  with:
    path: ~/.bun/install/cache
    key: bun-${{ runner.os }}-${{ hashFiles('bun.lock') }}

important — fail-fast: true on regression-shards amplifies flakes. The 10 shards are bin-packed by duration (regression.yml:42-65) but heterogeneous in surface (HDR / render-compat / styles-a..g / fast). A flake in ONE Docker-render shard now cancels ALL nine others — author has to rerun the entire matrix to see whether the failure was the flake or whether another shard would have failed too. On hf#855 we already flagged regression flakiness as a real risk. Worth instrumenting: track regression-shards rerun rate over the next 1-2 weeks; if it spikes, revisit. Not a blocker because (a) the trade-off is honestly framed in the PR body, (b) flake amplification is recoverable via rerun, vs. the wasted-runner-minutes problem is not.

nit — preflight redundancy with ci.yml's Lint + Format. ci.yml:69,86 already runs Lint and Format on every PR. Preflight re-runs both 5 more times. ~25s × 5 ≈ 2 min of redundant runner time per green PR. The reason this is unavoidable today is cross-workflow needs: doesn't exist on GHA — can't gate a job in workflow A on a job in workflow B without polling. Leaving as nit because the alternatives (collapse workflows, polling via workflow_run, repurpose a single shared preflight workflow with workflow_call) are all bigger refactors than this PR's scope. Worth a follow-up ticket if total preflight runner-min becomes a budget concern.

nit — typecheck deferred to per-job build is the right call. PR body explains why and the reasoning checks out: typecheck is ~10 min, would dominate the preflight critical path, and a TS error still surfaces in each heavy job's own bun run build step (which can't compile broken types). Just noting agreement here.

Verdict: APPROVE
Reasoning: Gate is structurally sound and CI-proven on this PR; the perf wins (preflight ≈25s, blocks 30-60 min of heavy jobs on lint-broken PRs) outweigh the redundancy and fail-fast trade-offs which are honestly enumerated. The two important findings are both observability/efficiency follow-ups, not correctness issues.

— Vai

jrusso1020 and others added 2 commits May 15, 2026 22:36
Each of the 5 preflight gates was doing a cold bun install, costing
~30-60s of redundant install time per PR. Cache the install dir
keyed on bun.lock so subsequent preflights (and reruns) hit warm.

Addresses Vai's review on #877.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same 5-step preflight body (setup-bun, setup-node, cache, install,
lint, format:check) was duplicated across 5 workflows. Move it to
.github/actions/preflight/action.yml so future tweaks (adding
typecheck, swapping the cache key, etc.) are a single-file change.

Net diff: +33 / -65.

Addresses the "shared preflight" follow-up Vai called out on #877.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jrusso1020 jrusso1020 merged commit fb90025 into main May 15, 2026
39 checks passed
@jrusso1020 jrusso1020 deleted the ci/fast-fail-and-preflight-gate branch May 15, 2026 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants