test(smoke): screenshot-regression mode (Phase 3) #1051

Merged
Aaronontheweb merged 4 commits into netclaw-dev:dev from Aaronontheweb:smoke-phase3-screenshots
May 18, 2026

@Aaronontheweb
Collaborator

Summary

Phase 3 of the smoke-testing restructure. Adds a screenshots profile to the native harness that captures PNGs of stable TUI states via VHS Screenshot directives and compares them byte-for-byte against committed baselines. Harness-native — pure shell, no .NET test project, nothing in the VS/Rider test explorer.

  • screenshot-preamble.tape — determinism-pinned preamble (CursorBlink false, pinned VHS theme).
  • tests/smoke/tapes/screenshots/ — capture tapes for 6 frames: --help; wizard provider-picker / security-posture / identity; provider-manager empty-list / add-name.
  • run-smoke.sh screenshots — provisions like light, runs the capture tapes, cmps each PNG against tests/smoke/screenshots/<frame>.approved.png. Missing baseline or mismatch → saves the actual (+ ImageMagick diff) for review and fails.
  • install-vhs.sh — enabled the pinned VHS SHA256 (was SKIP_VERIFY); byte-stable screenshots require a pinned VHS binary.
  • smoke.yml — new Screenshot Regression (Linux) job.
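
The comparison step described above can be sketched roughly as follows. This is a minimal illustration, not the actual run-smoke.sh code: the function name `compare_frames`, its arguments, and the `<frame>.actual.png` / `<frame>.diff.png` output names are assumptions made for the example. Only the `<frame>.approved.png` baseline naming, the use of `cmp`, and the ImageMagick diff come from the description.

```shell
# Hypothetical sketch: byte-for-byte comparison of captured PNGs against baselines.
# compare_frames CAPTURE_DIR BASELINE_DIR REVIEW_DIR
compare_frames() {
  local capture_dir="$1" baseline_dir="$2" review_dir="$3"
  local failed=0 actual frame approved
  mkdir -p "$review_dir"
  for actual in "$capture_dir"/*.png; do
    [ -e "$actual" ] || continue   # skip the unexpanded glob when nothing was captured
    frame=$(basename "$actual" .png)
    approved="$baseline_dir/$frame.approved.png"
    if [ ! -f "$approved" ]; then
      echo "MISSING baseline: $frame"
      cp "$actual" "$review_dir/$frame.actual.png"
      failed=1
    elif ! cmp -s "$actual" "$approved"; then
      echo "MISMATCH: $frame"
      cp "$actual" "$review_dir/$frame.actual.png"
      # Best-effort visual diff; ImageMagick's `compare` exits non-zero
      # when the images differ, so its status is ignored here.
      if command -v compare >/dev/null 2>&1; then
        compare "$approved" "$actual" "$review_dir/$frame.diff.png" >/dev/null 2>&1 || true
      fi
      failed=1
    else
      echo "OK: $frame"
    fi
  done
  return "$failed"
}
```

Calling `compare_frames captures tests/smoke/screenshots review` would return non-zero if any frame is missing a baseline or differs from it, leaving the actual capture in `review` for inspection.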

Expected first-run behavior

⚠️ The Screenshot Regression (Linux) check will fail on this PR's first run — there are no baseline PNGs yet. That run uploads the captured candidates as the smoke-screenshots-* artifact. I'll review those, commit the approved baselines to this branch as tests/smoke/screenshots/<frame>.approved.png, and the check will go green. Screenshot Regression is not a required check, so the red status does not block.

Test plan

  • First run: Screenshot Regression captures + uploads 6 candidate PNGs
  • Baselines reviewed and committed; second run green
  • Native Smoke (Linux) + pr_validation still green

Adds a `screenshots` profile to run-smoke.sh that captures PNGs of
stable TUI states via VHS Screenshot directives and compares them
byte-for-byte against committed baselines.

- screenshot-preamble.tape — determinism-pinned preamble (CursorBlink
  false, pinned VHS theme) so captures are byte-stable across runs.
- tests/smoke/tapes/screenshots/ — capture tapes for 6 frames: help,
  wizard provider-picker / security-posture / identity, provider-manager
  empty list / add-name.
- run-smoke.sh `screenshots` mode — provisions like `light`, runs the
  capture tapes, cmp's each PNG against tests/smoke/screenshots/
  <frame>.approved.png; on a missing baseline or mismatch it saves the
  actual (and an ImageMagick diff) for review and fails.
- run-native-tape.sh — honors TAPE_PREAMBLE / TAPE_BODY_DIR so the
  screenshots mode can point it at the screenshot preamble + tapes.
- install-vhs.sh — enabled the pinned VHS SHA256 (was SKIP_VERIFY);
  byte-stable screenshots require a pinned VHS binary.
- smoke.yml — new "Screenshot Regression (Linux)" job.
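
Pinning the VHS binary amounts to checking the downloaded file against a known digest before installing it. The sketch below shows one way that check might look. It is not taken from install-vhs.sh: the function name `verify_sha256` is hypothetical, and it assumes GNU `sha256sum` is available, as on a typical Linux runner.

```shell
# Hypothetical sketch: refuse to continue unless FILE matches the pinned digest.
# verify_sha256 FILE EXPECTED_HEX
verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<hex digest>  <filename>"; keep only the digest.
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [ "$actual" != "$expected" ]; then
    echo "SHA256 mismatch for $file: expected $expected, got $actual" >&2
    return 1
  fi
}
```

An installer would call this right after the download and abort on a non-zero status, so a changed upstream binary cannot silently alter the rendered pixels.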

Baseline PNGs are committed separately, from a reviewed CI capture run —
the first run has no baselines and fails by design, uploading the
candidate PNGs for human review.

@Aaronontheweb added the tests label (All issues related to testing, quality assurance, and smoke testing.) on May 18, 2026

The first screenshot run captured 2 of 6 frames mid-render — the wizard
provider-picker caught only the header, the identity step was blank, and
the provider-manager "empty" frame caught the post-Down highlight. VHS
Screenshot can fire before the TUI finishes painting and can let the
next keystroke leak into the capture.

Bracket every Screenshot with `Sleep 1s` (settle the frame before, isolate
it from the next input after). A settled static screen captured at +1s is
still deterministic; the no-Sleep rule remains for flow-tape step sync.
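
The bracketing rule can be illustrated with a small tape body plus a check that enforces it. Both are sketches under stated assumptions: the frame file name and the `Down` key press in the example tape are illustrative, and `check_screenshot_brackets` is a hypothetical helper that is not part of the harness. `Sleep`, `Screenshot`, and `Down` are standard VHS tape commands.

```shell
# Write an illustrative capture tape body: settle, capture, settle, then send input.
write_example_tape() {
  cat > "$1" <<'EOF'
Sleep 1s
Screenshot provider-manager-empty.png
Sleep 1s
Down
EOF
}

# Hypothetical lint: fail if any Screenshot line is not directly
# preceded and followed by a "Sleep 1s" line.
# check_screenshot_brackets TAPE_FILE
check_screenshot_brackets() {
  awk '
    { line[NR] = $0 }
    END {
      bad = 0
      for (i = 1; i <= NR; i++) {
        if (line[i] ~ /^Screenshot[ \t]/) {
          if (line[i-1] != "Sleep 1s" || line[i+1] != "Sleep 1s") {
            printf "line %d not bracketed: %s\n", i, line[i]
            bad = 1
          }
        }
      }
      exit bad
    }
  ' "$1"
}
```

With this ordering, the `Down` key press cannot reach the TUI until a full second after the capture, which is the leak the first run exposed.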

Baselines captured from the screenshot-regression CI run on this branch
and reviewed frame-by-frame: help usage; wizard provider-picker /
security-posture / identity; provider-manager empty-list / add-name.

With these committed, the Screenshot Regression job compares fresh
captures against them instead of failing on a missing baseline.
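
Promoting reviewed candidates to baselines is essentially a rename into the baseline directory. The helper below is a hypothetical sketch of that step. It assumes the reviewed candidates sit in one directory as `<frame>.png`, which may not match the layout of the uploaded CI artifact.

```shell
# Hypothetical sketch: copy each reviewed candidate to <frame>.approved.png.
# promote_baselines CANDIDATE_DIR BASELINE_DIR
promote_baselines() {
  local candidate_dir="$1" baseline_dir="$2" candidate frame
  mkdir -p "$baseline_dir"
  for candidate in "$candidate_dir"/*.png; do
    [ -e "$candidate" ] || continue   # nothing to promote
    frame=$(basename "$candidate" .png)
    cp "$candidate" "$baseline_dir/$frame.approved.png"
    echo "approved: $frame"
  done
}
```

The copied files would then be committed like any other source change, so every baseline update shows up in review.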

@Aaronontheweb merged commit c48bb60 into netclaw-dev:dev on May 18, 2026
14 checks passed