ci: fast-fail regression matrix + preflight gate before expensive jobs #877
Conversation
Don't burn 60+ runner-minutes on regression shards, perf shards, preview-parity, Windows renders, or catalog-preview renders when the PR is already failing lint or format.

- regression: matrix `fail-fast: false` → `true` (the first failing shard cancels the rest), plus a new `preflight` (lint + `format:check`) job gating `regression-shards`.
- player-perf: matrix `fail-fast` → `true`, plus preflight gate.
- preview-regression, windows-render, catalog-previews: preflight gate added; heavy jobs now `needs: [..., preflight]`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
miguel-heygen
left a comment
Approved — clean, well-scoped CI cost optimization.
What this does: Adds a ~25s preflight gate (lint + format) before all expensive CI jobs, and flips the regression + perf matrices to `fail-fast: true`. Broken lint/format now dies in ~2 minutes instead of burning 30-60 minutes of runner time.
Verified:
- Preflight pattern is consistent across all 5 workflows (checkout → setup-bun → setup-node → install → lint → format:check)
- All action SHAs are pinned ✓
- `needs` chains are correct — preflight gates the heavy jobs and inherits the same `if` conditions from the `changes` filter, so skip-propagation works properly
- `windows-render.yml` preflight correctly uses `ref: ${{ github.event.inputs.ref }}`, matching the existing `workflow_dispatch` checkout pattern
- CI on this PR confirms the behavior: 4 preflights passed (~25s each), regression preflight+shards correctly skipped (no code changes detected)
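For reference, the verified six-step pattern roughly corresponds to a job shaped like this (a sketch — step versions and names are illustrative; the real workflows pin full action SHAs):

```yaml
# Illustrative preflight shape — not the PR's exact YAML; actions are SHA-pinned in the real files.
preflight:
  name: Preflight (lint + format)
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4      # pinned by SHA in the actual workflows
    - uses: oven-sh/setup-bun@v2
    - uses: actions/setup-node@v4
    - run: bun install --frozen-lockfile
    - run: bun run lint
    - run: bun run format:check
```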
One observation (not blocking): the 5 identical preflight job definitions could eventually be extracted into a reusable workflow (`workflow_call`), but that's a separate follow-up — the duplication is fine at this scale and easier to audit.
`fail-fast: true` tradeoff is reasonable: faster signal on the first failure is more valuable than seeing all failures at once, especially for pixel-regression tests where the root cause is usually shared.
vanceingalls
left a comment
Solid follow-up to #855 — preflight gate is wired correctly and the empirical proof is in this PR's own run (every preflight finished in 21–28s, well under the 5-min budget). Trade-offs (fail-fast: true, missing fail-shard visibility) are called out honestly in the PR body.
Calibrated strengths
- `windows-render.yml:60-74` — running the Windows-render preflight on `ubuntu-latest` is the right call; lint/format don't need a Windows runner and the cost asymmetry is huge.
- `windows-render.yml:65-67` — preserving `ref: ${{ github.event.inputs.ref }}` on the preflight checkout matches the rest of the file's `workflow_dispatch` handling; easy detail to miss.
- Summary-job semantics survive cleanly: preflight failure → shards `result: skipped` → the existing `if [ "${{ needs.<shards>.result }}" != "success" ]; then exit 1` path on `regression` (`regression.yml:130`) and `player-perf` (`player-perf.yml:145`) still fails the required check. Gate preserved without touching the summary contract — branch protection names stay intact (PR-body test plan #2 is satisfied by construction).
- Heavy-job `needs:` updated to `[changes, preflight]` everywhere (`regression.yml:38`, `player-perf.yml:43`, `preview-regression.yml:46`, `windows-render.yml:60,365`) — no site missed.
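The gating and summary contract above can be sketched as follows (job and output names are illustrative, not the repo's exact YAML):

```yaml
# Illustrative sketch of the needs/if gating and summary-job contract described above.
regression-shards:
  needs: [changes, preflight]               # preflight failure → shards end up "skipped"
  if: needs.changes.outputs.code == 'true'  # output name assumed for illustration
  runs-on: ubuntu-latest
  strategy:
    fail-fast: true
  steps:
    - run: echo "render shard work here"

regression:                                 # summary job — required-check name unchanged
  needs: [regression-shards]
  if: always()
  runs-on: ubuntu-latest
  steps:
    - run: |
        # "skipped" (preflight failed) and "failure" both fail the required check
        if [ "${{ needs.regression-shards.result }}" != "success" ]; then exit 1; fi
```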
Findings
important — `bun install` cache is missing on every new preflight. Each of the 5 preflights does a cold `bun install --frozen-lockfile`. Today the bun cache lives on the runner's local FS, gone after the job. `oven-sh/setup-bun` does not auto-cache — this needs an explicit `actions/cache` step keyed on `bun.lock`, or the v2 action's caching opt-in if available. Each preflight today takes ~25s total, of which install is a meaningful slice; across 5 preflights that's ~30-60s of redundant network/disk work. The same nit was raised on #855 (cold-cache contention); better to land it here than carry it across more workflows. Concrete fix:
```yaml
- uses: actions/cache@v4
  with:
    path: ~/.bun/install/cache
    key: bun-${{ runner.os }}-${{ hashFiles('bun.lock') }}
```

important — `fail-fast: true` on regression-shards amplifies flakes. The 10 shards are bin-packed by duration (`regression.yml:42-65`) but heterogeneous in surface (HDR / render-compat / styles-a..g / fast). A flake in ONE Docker-render shard now cancels ALL nine others — the author has to rerun the entire matrix to see whether the failure was the flake or whether another shard would have failed too. On #855 we already flagged regression flakiness as a real risk. Worth instrumenting: track the regression-shards rerun rate over the next 1-2 weeks; if it spikes, revisit. Not a blocker because (a) the trade-off is honestly framed in the PR body, and (b) flake amplification is recoverable via rerun, whereas the wasted-runner-minutes problem is not.
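For context, the setting under discussion is the matrix-level flag (a sketch — the shard list is abbreviated and the names are illustrative; the real matrix has 10 bin-packed shards):

```yaml
# Illustrative only — actual shard names and matrix layout live in regression.yml.
regression-shards:
  strategy:
    fail-fast: true   # was false — the first failing shard now cancels the other nine
    matrix:
      shard: [hdr, render-compat, styles-a, fast]  # abbreviated for illustration
```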
nit — preflight redundancy with `ci.yml`'s Lint + Format. `ci.yml:69,86` already runs Lint and Format on every PR. Preflight re-runs both 5 more times: ~25s × 5 ≈ 2 min of redundant runner time per green PR. The reason this is unavoidable today is that cross-workflow `needs:` doesn't exist on GHA — you can't gate a job in workflow A on a job in workflow B without polling. Leaving as a nit because the alternatives (collapse workflows, polling via `workflow_run`, repurpose a single shared preflight workflow with `workflow_call`) are all bigger refactors than this PR's scope. Worth a follow-up ticket if total preflight runner-minutes become a budget concern.
nit — typecheck deferred to per-job build is the right call. The PR body explains why, and the reasoning checks out: typecheck is ~10 min and would dominate the preflight critical path, and a TS error still surfaces in each heavy job's own `bun run build` step (which can't compile broken types). Just noting agreement here.
Verdict: APPROVE
Reasoning: Gate is structurally sound and CI-proven on this PR; the perf wins (preflight ≈25s, blocks 30-60 min of heavy jobs on lint-broken PRs) outweigh the redundancy and fail-fast trade-offs which are honestly enumerated. The two important findings are both observability/efficiency follow-ups, not correctness issues.
— Vai
Each of the 5 preflight gates was doing a cold bun install, costing ~30-60s of redundant install time per PR. Cache the install dir keyed on bun.lock so subsequent preflights (and reruns) hit warm. Addresses Vai's review on #877. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same 5-step preflight body (setup-bun, setup-node, cache, install, lint, format:check) was duplicated across 5 workflows. Move it to .github/actions/preflight/action.yml so future tweaks (adding typecheck, swapping the cache key, etc.) are a single-file change. Net diff: +33 / -65. Addresses the "shared preflight" follow-up Vai called out on #877. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
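A composite action along those lines might look like the following sketch (step details and the cache key are assumptions, not the merged file; checkout stays in each calling workflow so `windows-render.yml` can keep its `ref:` input on that step):

```yaml
# .github/actions/preflight/action.yml — illustrative sketch only, not the merged file
name: preflight
description: Shared lint + format gate run before expensive jobs
runs:
  using: composite
  steps:
    - uses: oven-sh/setup-bun@v2
    - uses: actions/setup-node@v4
    - uses: actions/cache@v4
      with:
        path: ~/.bun/install/cache
        key: bun-${{ runner.os }}-${{ hashFiles('bun.lock') }}
    - run: bun install --frozen-lockfile
      shell: bash                 # composite run steps must declare a shell
    - run: bun run lint
      shell: bash
    - run: bun run format:check
      shell: bash
```

Each workflow's `preflight` job would then reduce to a checkout step followed by `uses: ./.github/actions/preflight`.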
Summary
Stop burning 60+ runner-minutes on expensive jobs (Docker regression shards, Chrome perf/parity renders, Windows-latest renders, catalog-preview renders) when the PR is already failing lint or format.
Changes
- `regression.yml` — matrix `fail-fast: false` → `true` so the first failing render-regression shard cancels the other 9. New `preflight` job (lint + `oxfmt --check`) gates `regression-shards`.
- `player-perf.yml` — matrix `fail-fast: false` → `true`. New `preflight` job gates `perf-shards`.
- `preview-regression.yml` — new `preflight` job gates `preview-parity`.
- `windows-render.yml` — new `preflight` job (runs on `ubuntu-latest`) gates both `render-windows` and `test-windows` (each on `windows-latest`).
- `catalog-previews.yml` — new `preflight` job gates `render-previews`.

Each preflight installs deps, runs `bun run lint`, and `bun run format:check`. Typecheck is intentionally not in preflight — it's slow (~10 min) and any TS error will still surface in each heavy job's own `bun run build` step.

Net effect

`ci.yml`'s own lint/format/typecheck before heavy jobs kick off).

Test plan

- `regression` and `player-perf` summary required-check names are unchanged (still match branch protection).
- When preflight fails, `regression-shards` / `perf-shards` / etc. don't start.

🤖 Generated with Claude Code