Skip to content

v1.4.3 — click-anchored narration + always-release pan-zoom

Choose a tag to compare

@marcushyett marcushyett released this 01 May 06:39
· 23 commits to main since this release

Two structural fixes for the "I can't follow what's happening" feedback on the videos.

Narration — click-anchored windows

The body is now partitioned into windows by the agent's clicks (with very-close clicks merged so we don't get sub-second slivers). Each window has a fixed start, end, duration. The narrator writes ONE beat per window: they pick the words, the editor pins the timing to the click. Because clicks are when meaningful state changes happen on screen, narration anchored to a click is automatically anchored to the moment the viewer cares about. There's no possible drift between voice and visuals.

The narrator prompt was rewritten with much stronger guidance:

  • Audience framing: "imagine the viewer is a teammate who has never seen this codebase. They have ~60 seconds and need to leave understanding what was built, why it matters, how it works, and whether the tests prove it works."
  • Plan the story arc first: read PR body, map goals to windows, decide setup / execute / confirm for each window, THEN write beats.
  • Name UI elements specifically in the developer's own vocabulary ("the new Bulk Archive button I added at the top") not generically ("a button").
  • Concrete good vs bad examples for setup beats, confirm beats, transitions, and silent-investigation moments.
  • Beats may be empty for windows with nothing meaningful to say — silent gap beats 4-word filler.

Goal-aware structuring: the narrator gets the test plan goals as the demo's chapter list and is told to march through them in order, each chapter spanning multiple windows.

Plus: fixed an upstream issue where the agent recorded ref=e123 (Playwright's internal handle) instead of the human-readable element description, so the narrator can now actually say "clicked Sign In" instead of "clicked ref=e123".

Pan-zoom — always release between clicks

The previous "ride zoom between clicks" mode held a 2× zoom on the last-clicked region for up to 12 seconds. If anything happened far from that region during the hold — a toast in the corner, a counter at the top, items reordering elsewhere — it was clipped out of frame.

The new cycle is per-click and always releases:

  1. APPROACH (0.5s before click): ease zoom 1.0 → 2.0
  2. click fires
  3. HOLD (0.7s): stay at 2.0 on click site so the viewer reads the immediate reaction
  4. RELEASE (0.5s): ease zoom 2.0 → 1.0, neutral framing
  5. neutral 1.0 until next APPROACH

During the gap between clicks the viewer always sees the full page after the hold releases — any visible change anywhere on screen is in frame by construction.

When approach overlaps previous click's hold/release (very tight click pairs), approach wins so the camera arrives on time.

Versions

  • CLI: 1.4.3
  • Plugin: 1.4.3

v1 tag force-updated.