test(smoke): screenshot-regression mode (Phase 3) #1051
Merged
Aaronontheweb merged 4 commits on May 18, 2026
Adds a `screenshots` profile to run-smoke.sh that captures PNGs of stable TUI states via VHS Screenshot directives and compares them byte-for-byte against committed baselines.

- screenshot-preamble.tape: determinism-pinned preamble (CursorBlink false, pinned VHS theme) so captures are byte-stable across runs.
- tests/smoke/tapes/screenshots/: capture tapes for 6 frames: help, wizard provider-picker / security-posture / identity, provider-manager empty list / add-name.
- run-smoke.sh `screenshots` mode: provisions like `light`, runs the capture tapes, and `cmp`s each PNG against tests/smoke/screenshots/<frame>.approved.png; on a missing baseline or mismatch it saves the actual (and an ImageMagick diff) for review and fails.
- run-native-tape.sh: honors TAPE_PREAMBLE / TAPE_BODY_DIR so the screenshots mode can point it at the screenshot preamble + tapes.
- install-vhs.sh: enabled the pinned VHS SHA256 (was SKIP_VERIFY); byte-stable screenshots require a pinned VHS binary.
- smoke.yml: new "Screenshot Regression (Linux)" job.

Baseline PNGs are committed separately, from a reviewed CI capture run. The first run has no baselines and fails by design, uploading the candidate PNGs for human review.
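The compare step described above can be sketched roughly as follows. This is a minimal illustration and not the code in run-smoke.sh: the function name `compare_frame`, the review directory, and the `.actual.png` / `.diff.png` file names are assumptions made for the example. Only the `<frame>.approved.png` baseline name comes from the commit message.

```shell
# compare_frame <actual.png> <baseline_dir> <review_dir>
# Hypothetical helper: returns 0 when the capture is byte-identical to its
# committed baseline, 1 on a missing baseline or a mismatch.
compare_frame() {
  local actual="$1"
  local baseline_dir="$2"
  local review_dir="$3"
  local frame approved

  frame="$(basename "$actual" .png)"
  approved="$baseline_dir/$frame.approved.png"
  mkdir -p "$review_dir"

  if [ ! -f "$approved" ]; then
    echo "MISSING baseline: $frame" >&2
    cp "$actual" "$review_dir/$frame.actual.png"   # keep the candidate for review
    return 1
  fi

  # cmp -s prints nothing and exits 0 only when the files are byte-identical.
  if cmp -s "$actual" "$approved"; then
    echo "OK: $frame"
    return 0
  fi

  echo "MISMATCH: $frame" >&2
  cp "$actual" "$review_dir/$frame.actual.png"

  # ImageMagick's compare exits non-zero when the images differ, so its
  # status is ignored; the diff image is a review aid, not the verdict.
  if command -v compare >/dev/null 2>&1; then
    compare "$actual" "$approved" "$review_dir/$frame.diff.png" >/dev/null 2>&1 || true
  fi
  return 1
}
```

A caller would loop over the captured PNGs, call `compare_frame` for each, and fail the run if any call returned non-zero.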
The first screenshot run captured 2 of 6 frames mid-render — the wizard provider-picker caught only the header, the identity step was blank, and the provider-manager "empty" frame caught the post-Down highlight. VHS Screenshot can fire before the TUI finishes painting and can let the next keystroke leak into the capture. Bracket every Screenshot with `Sleep 1s` (settle the frame before, isolate it from the next input after). A settled static screen captured at +1s is still deterministic; the no-Sleep rule remains for flow-tape step sync.
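One way to keep the bracketing consistent is to generate the three-line fragment instead of typing it by hand. The helper below is only a sketch: `emit_screenshot` is a hypothetical name, and the actual tapes in this PR may simply write the `Sleep 1s` lines inline.

```shell
# emit_screenshot <png-path>
# Prints a VHS tape fragment: settle the frame, capture it, then pause again
# so the next keystroke cannot leak into the capture.
emit_screenshot() {
  printf 'Sleep 1s\nScreenshot %s\nSleep 1s\n' "$1"
}

# Usage (hypothetical frame name), appending to a tape being assembled:
#   emit_screenshot "provider-manager-empty.png" >> capture.tape
#   printf 'Down\n' >> capture.tape   # next input, after the trailing Sleep
```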
Baselines captured from the screenshot-regression CI run on this branch and reviewed frame-by-frame: help usage; wizard provider-picker / security-posture / identity; provider-manager empty-list / add-name. With these committed, the Screenshot Regression job compares fresh captures against them instead of failing on a missing baseline.
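Promoting reviewed candidates to baselines amounts to a rename into the baseline directory. The sketch below assumes the downloaded artifact contains files named `<frame>.actual.png`; that naming and the `approve_frames` helper are illustrative assumptions, and only the `<frame>.approved.png` target name is taken from this PR.

```shell
# approve_frames <candidate_dir> <baseline_dir>
# Hypothetical helper: copies every reviewed <frame>.actual.png into the
# baseline directory as <frame>.approved.png.
approve_frames() {
  local candidate_dir="$1"
  local baseline_dir="$2"
  local candidate frame

  mkdir -p "$baseline_dir"
  for candidate in "$candidate_dir"/*.actual.png; do
    [ -f "$candidate" ] || continue   # the glob matched nothing
    frame="$(basename "$candidate" .actual.png)"
    cp "$candidate" "$baseline_dir/$frame.approved.png"
    echo "approved: $frame"
  done
}
```

Copying rather than moving leaves the downloaded candidates in place, so a frame can be compared again before the baselines are committed.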
Summary
Phase 3 of the smoke-testing restructure. Adds a `screenshots` profile to the native harness that captures PNGs of stable TUI states via VHS `Screenshot` directives and compares them byte-for-byte against committed baselines. Harness-native: pure shell, no .NET test project, nothing in the VS/Rider test explorer.

- `screenshot-preamble.tape`: determinism-pinned preamble (CursorBlink false, pinned VHS theme).
- `tests/smoke/tapes/screenshots/`: capture tapes for 6 frames: `--help`; wizard provider-picker / security-posture / identity; provider-manager empty-list / add-name.
- `run-smoke.sh screenshots`: provisions like `light`, runs the capture tapes, `cmp`s each PNG against `tests/smoke/screenshots/<frame>.approved.png`. Missing baseline or mismatch → saves the actual (+ ImageMagick diff) for review and fails.
- `install-vhs.sh`: enabled the pinned VHS SHA256 (was `SKIP_VERIFY`); byte-stable screenshots require a pinned VHS binary.
- `smoke.yml`: new `Screenshot Regression (Linux)` job.

Expected first-run behavior
The `Screenshot Regression (Linux)` check will fail on this PR's first run: there are no baseline PNGs yet. That run uploads the captured candidates as the `smoke-screenshots-*` artifact. I'll review those, commit the approved baselines to this branch as `tests/smoke/screenshots/<frame>.approved.png`, and the check will go green. `Screenshot Regression` is not a required check, so the red status does not block.

Test plan
- `Screenshot Regression` captures + uploads 6 candidate PNGs
- `Native Smoke (Linux)` + `pr_validation` still green
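For reviewers unfamiliar with the pinned-SHA256 step mentioned in the summary, a generic checksum gate looks roughly like the sketch below. It is not the code in `install-vhs.sh`: the function name `verify_sha256` is hypothetical, and it assumes GNU coreutils `sha256sum` is available, as on typical Linux CI runners.

```shell
# verify_sha256 <file> <expected-hex-digest>
# Hypothetical helper: returns 0 only when the file's SHA-256 matches the pin.
verify_sha256() {
  local file="$1"
  local expected="$2"
  local actual

  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha256sum "$file" | cut -d' ' -f1)"

  if [ "$actual" != "$expected" ]; then
    echo "SHA256 mismatch for $file: expected $expected, got $actual" >&2
    return 1
  fi
}
```

An install script would call this on the downloaded binary and abort before installing if it returned non-zero.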