fix(ci): pin chrome-headless-shell + clamp PSNR checkpoint to a valid frame #926
Open
jrusso1020 wants to merge 2 commits into
… frame

Two narrow fixes to keep the regression suite green and reproducible. Stale baselines from the sub-composition refactor (PR #918) are being regenerated separately in PR #925; this PR is just the structural fixes that PR can't make on its own.

1. **Pin `chrome-headless-shell` in `Dockerfile.test`** to `148.0.7778.167` instead of `@stable`. `@stable` is a moving tag; every Chrome stable promotion shifts pixel output enough to fail PSNR on the golden baselines, so the regression suite silently broke whenever the `Dockerfile.test` image was rebuilt against a freshly promoted stable. Pinning to the version `@stable` currently resolves to (matching what main's regenerated baselines were captured under) makes Chrome bumps an explicit action batched with a baseline regen. The comment on the `RUN` line spells out the bump procedure.
2. **Clamp the last PSNR checkpoint to a frame the video stream actually contains.** `runTestSuite` samples 100 checkpoints across the `min(rendered, snapshot)` container duration. Container duration includes audio padding past the last video frame: many-cuts is 5.654s of container vs 5.6s of video, which at 30fps is 168 frames. At i=99 the raw container duration mapped to time 5.59746s → frame index 168 (round(5.59746 × 30)), one past the last frame the stream contains. ffmpeg's `psnr` filter emits no `average:` line for a non-existent frame, so the harness crashed with `Unable to parse PSNR output at 5.59746s`. The crash is pre-existing on plain `origin/main`, which PR #918 admin-merged through on shard-2. Miguel's regen via `--update` didn't catch it because `--update` only writes the snapshot; it doesn't validate. Subtracting one frame interval from the sampling duration guarantees the last checkpoint always lands on a real frame.
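
The failure mode in item 2 can be illustrated with a small sketch. This is not the harness's actual parser: the function name `parseAveragePsnr` and the regex are hypothetical, and the only details taken from the description are the `average:` field in the `psnr` filter output and the error message text.

```typescript
// Hypothetical parser, for illustration only (not the code in regression-harness.ts).
// ffmpeg's psnr filter prints a summary line containing "average:<value>".
// When the requested frame does not exist, no such line is printed.
function parseAveragePsnr(ffmpegOutput: string, time: number): number {
  const match = /average:(inf|[0-9.]+)/.exec(ffmpegOutput);
  if (!match) {
    // Same message shape as the crash quoted in the description.
    throw new Error(`Unable to parse PSNR output at ${time}s`);
  }
  // Identical frames are reported as "inf".
  return match[1] === "inf" ? Infinity : Number(match[1]);
}

// A summary line for an existing frame parses cleanly.
const parsedPsnr = parseAveragePsnr(
  "[Parsed_psnr_0 @ 0x0] PSNR y:41.2 u:44.0 v:43.7 average:42.1 min:42.1 max:42.1",
  1.0,
);

// Empty output, as for a frame past the end of the stream, raises the error.
let parseError = "";
try {
  parseAveragePsnr("", 5.59746);
} catch (e) {
  parseError = (e as Error).message;
}

console.log(parsedPsnr, parseError);
```

Under these assumptions the crash is a symptom of asking for a frame that is not there, which is why the fix adjusts the sampling duration and leaves the parsing alone.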

Verified locally inside `Dockerfile.test`:

    bun run --cwd packages/producer docker:build:test
    bun run --cwd packages/producer docker:test many-cuts            # ✅ green
    bun run --cwd packages/producer docker:test style-3-prod \
      style-5-prod sub-composition-video                             # ✅ green
miguel-heygen approved these changes May 17, 2026
What
Two narrow fixes to keep the regression suite green and reproducible. Stale baselines from the sub-comp refactor are being regenerated separately in #925; this PR is just the structural fixes that #925 can't make on its own.

1. Pin `chrome-headless-shell` in `Dockerfile.test` to `148.0.7778.167` instead of `@stable`.
2. Fix a pre-existing PSNR-parse crash on `many-cuts`.

Why
Chrome pin
`@stable` is a moving tag. Every Chrome stable promotion shifts pixel output enough to fail PSNR on the golden baselines, so the regression suite silently broke whenever the `Dockerfile.test` image was rebuilt against a freshly promoted stable. Pinning to the version `@stable` currently resolves to (matching what main's regenerated baselines were captured under) makes Chrome bumps an explicit action batched with a baseline regen. The comment on the `RUN` line spells out the bump procedure.

PSNR-parse crash on `many-cuts`

`runTestSuite` samples 100 checkpoints across the `min(rendered, snapshot)` container duration. Container duration includes audio padding past the last video frame: `many-cuts` is 5.654s of container vs 5.6s of video, which at 30fps is 168 frames (indices 0 to 167). At i=99 the raw container duration mapped to time 5.59746s → frame index 168 (`round(5.59746 × 30)`), one past the last frame the stream contains. ffmpeg's `psnr` filter emits no `average:` line for a non-existent frame, so the harness crashed with `Unable to parse PSNR output at 5.59746s`. The crash is pre-existing on plain `origin/main` (#918 admin-merged through this same failure on shard-2). Miguel's regen via `--update` doesn't catch it because `--update` only writes the snapshot; it doesn't validate.

Subtracting one frame interval from the sampling duration guarantees the last checkpoint always lands on a real frame.
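
The `many-cuts` arithmetic can be checked with a short sketch. The helper `checkpointFrame` and the mapping `time = duration × i / 100` are assumptions inferred from the figures quoted above (5.654 × 99 / 100 = 5.59746); they are not copied from `regression-harness.ts`. Only the numbers and the one-frame subtraction come from this PR.

```typescript
// Hypothetical reconstruction of the checkpoint arithmetic, for illustration only.
const CHECKPOINTS = 100;

// Assumed mapping: checkpoint i samples time = duration * i / CHECKPOINTS,
// and the frame index is that time rounded to the nearest frame.
function checkpointFrame(duration: number, fps: number, i: number): number {
  const time = (duration * i) / CHECKPOINTS;
  return Math.round(time * fps);
}

const fps = 30;
const containerDuration = 5.654; // many-cuts container, including audio padding
const frameCount = 168; // 5.6s of video at 30fps, valid indices 0..167

// Before the fix: the last checkpoint is sampled against the raw container duration.
const rawLast = checkpointFrame(containerDuration, fps, CHECKPOINTS - 1);

// After the fix: subtract one frame interval before sampling.
const sampleDuration = Math.max(0, containerDuration - 1 / fps);
const clampedLast = checkpointFrame(sampleDuration, fps, CHECKPOINTS - 1);

console.log({ rawLast, clampedLast, lastValidFrame: frameCount - 1 });
```

With these inputs the raw duration yields frame 168, which is out of range, while the shortened duration yields frame 167, the last frame in the stream.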
How

- `Dockerfile.test`: `chrome-headless-shell@stable` → `chrome-headless-shell@148.0.7778.167` (+ a comment documenting the bump procedure).
- `packages/producer/src/regression-harness.ts`: introduce `sampleDuration = max(0, videoDuration - 1/fps)` and use it in place of `videoDuration` when computing the per-checkpoint `time`. Also reuses the already-resolved `fps` variable inside the loop (it was being recomputed via `fpsToNumber(...)` on every call to `psnrAtCheckpoint`).

Test plan
Local Docker reproduction:

- `many-cuts` no longer crashes with `Unable to parse PSNR output`

Coordination with #925

#925 (Miguel) regenerates the `style-1-prod` and `style-12-prod` baselines that drifted after #918's compiler refactor. That's content-level regen; this PR is structural. They're independent and either can land first; the other will then merge cleanly. Closes #919 (which had both changes plus baselines that would conflict with #925).