Releases: marcushyett/tik-test
tik-test review — PR #34
Auto-generated video review of PR #34 in marcushyett/tik-test.
tik-test review — PR #34
Auto-generated video review of PR #34 in marcushyett/tik-test.
tik-test review — PR #34
Auto-generated video review of PR #34 in marcushyett/tik-test.
tik-test review — PR #34
Auto-generated video review of PR #34 in marcushyett/tik-test.
v1.4.7 — voice-paced captions + verification stamps + demo discipline
Three viewer-facing fixes bundled.
1. Captions now track voice exactly
WordCaption was clamping each word to a 0.28s minimum frame budget. When narration sped up (up to 1.5×) to fit a click window, captions visibly lagged the voice — every page would finish a few hundred ms after the spoken line. Fixes:
- Drop the per-word floor when voice is provided; let the highlight track the voice exactly.
- Cap the effective caption span at the segment duration so a sequence cut-off can't make captions outrun the on-screen voice.
- Pick the latest started caption page over an earlier still-fading one — a slow page-out can no longer visually block a page that's already started speaking.
2. Verification stamps over the video
New animated overlay drawn at the EXACT moment the agent declared each goal's outcome — a green check (pass), red cross (fail), or grey dash (skipped) with the goal's shortLabel below it. 1.8s lifetime, de-overlaps when goals end close together. Punctuates the recording so viewers see "this thing was just verified" without having to wait for the outro checklist.
Hand-drawn SVG (no emoji) with stroke-dash drawing animation, spring scale-in, and label slide-up.
3. Demo discipline — kills the "click and re-check the same thing" loop
The agent was clicking + screenshotting + re-snapshotting + re-clicking the same surface multiple times per goal. From the viewer's perspective, that's dead video — the same thing happening over and over with no forward progress.
Both fast and meticulous prompts now open with an explicit:
ONE GOAL = ONE DEMO BEAT — ACT (1–3 user actions) → VERIFY (one check) → OUTCOME → STOP. The first verification IS the verdict. Once you've seen the success state once, your next message is OUTCOME. No 'just to be sure' second screenshot.
The stuck-loop tripwire now fires on the second occurrence (was third) and explicitly names the cases: re-screenshotting / re-snapshotting / re-clicking / re-typing a surface you already touched in this goal.
Versions kept in sync
package.json→ 1.4.7plugin/.claude-plugin/plugin.json→ 1.4.7v1tag force-pushed to point at this commit
v1.4.6 — missing-content protocol + realistic desktop viewport
Two fixes bundled.
1. Missing-content protocol — kills the most common agent loop
When the agent took an action and the verification snapshot didn't show what it expected, it would loop: retry, screenshot, retry, screenshot. Especially bad on chat replies that scroll off, spinners that finish too fast, toasts that fade.
Both fast and meticulous prompts (src/goal-agent.ts) now have a mandatory rule: if the snapshot doesn't show the expected content, the next tool call MUST be __tikHistory.find({ text, sinceMs: 30000 }) — not a retry, not another screenshot.
The 30-second mutation buffer (added in v1.4.5) disambiguates the failure for the agent — it doesn't have to guess whether the element was ephemeral, scrolled off, or never rendered. Buffer hits → pass with "verified programmatically (time-travel): …". Buffer empty → real failure.
Mandatory for any goal whose success criterion mentions a chat/message/reply, a notification/toast/banner, a loading/spinner state, an animation, or any text appearing on a feed-style or scrolling surface.
2. Realistic 2026 desktop viewport (1920×1080)
Default desktop fallback bumped from 1280×800 → 1920×1080 to match the actual resolution most US users have in 2026 (StatCounter: ~21.5% share, top resolution). The snapWidth function gains a 1920 wide-desktop bucket so consumers who request 1920 keep the resolution end-to-end (1920 → 1080 canvas at 9/16 — clean repeating fraction). Aligned across runner.ts, config.ts, plan.ts, single-video-editor.ts, remotion/Root.tsx.
Consumers who explicitly set viewport: 1280x800 (or any other) in their tiktest.md are unaffected — only the unspecified default changes.
Versions kept in sync
package.json→ 1.4.6plugin/.claude-plugin/plugin.json→ 1.4.6v1tag force-pushed to point at this commit
v1.4.5 — page-side time-travel buffer for ephemeral UI
What changed
Agent loops on ephemeral state are the most expensive failure mode tik-test had: a chat reply scrolls out, a spinner completes, a toast fades — the agent takes a snapshot, sees nothing, retries, sees nothing again, eventually gives up or marks the feature broken when it actually worked.
Fixed permanently with one small page script:
window.__tikHistory — 30-second time-travel buffer
Records every meaningful DOM mutation (added / removed / text-changed / attr-changed) with text, bbox, timestamp, tag, testid, containerTestid, aria-label, role. Three queries:
__tikHistory.find({ text, tag, testid, containerTestid, kind, sinceMs })
__tikHistory.transients(sinceMs) // elements that appeared then disappeared
__tikHistory.stats()The agent uses these via the existing browser_evaluate tool — no new MCP infra. Trims itself on every push so memory stays ~900KB.
window.__tikFreeze — pause / resume animations
await __tikFreeze.pause() // pauses CSS animations + Web Animations API
// ... browser_take_screenshot ...
await __tikFreeze.resume()One-liner replacement for the inline freeze recipe the prompt already taught.
Agent prompt updates
Both fast and meticulous prompts gain a new tier 3 in the verification hierarchy: TIME-TRAVEL via __tikHistory, slotted between freeze-then-screenshot and generic programmatic fallback. The prompt includes literal example queries for the four canonical ephemeral cases:
- Chat-bot reply scrolled off:
__tikHistory.find({ text: 'expected reply', sinceMs: 30000 }) - Spinner completed too fast:
__tikHistory.transients(8000) - Toast faded:
__tikHistory.find({ containerTestid: 'toast-region', kind: 'added' }) aria-busyflipped:__tikHistory.find({ kind: 'attr-changed' })and look forchangedAttr === 'aria-busy'
Outcomes verified this way start with verified programmatically (time-travel): so reviewers can tell which tier produced the result.
Why this is the simple solution
- One page-side script (~120 lines), no external deps, no MCP changes, no tracing infra.
- Survives navigations because
addInitScriptruns before page scripts. - Agent reaches it via the tool it already uses (
browser_evaluate). - Buffer is bounded (~900KB max) and self-trimming.
Versions
- CLI:
1.4.5 - Plugin:
1.4.5
v1 tag force-updated.
v1.4.4 — ride zoom by default, release on off-target DOM mutations
What changed
v1.4.3 always-released zoom between clicks, which broke tight click sequences (form-fills, multi-step wizards) where the camera should stay magnified across consecutive nearby actions.
This restores ride-mode as the default and adds a real signal for when the camera SHOULD release: page-side MutationObserver data capturing which regions of the page mutate after each click.
The rule
- Mutation lands inside the clicked element's bbox (button press feedback, focus ring, dropdown on the click site) → expected, keep ride.
- Mutation lands outside the clicked element's bbox (toast in a corner, counter at the top updates, items reorder elsewhere) → editor flags that gap for release; pan-zoom drops to neutral so the viewer sees the full page during the change.
Implementation
runner.tsinjects two new page scripts viaaddInitScript:- The interaction recorder now also pushes
event.target.getBoundingClientRect()on each click via__tikRecordClickBbox, so we know exactly what region was clicked. - A new
MutationObserverscript (__tikRecordMutation) reports the bounding rect of every meaningful DOM change. Filters out invisible nodes, zero-size rects, off-screen rects, and throttles same-position mutations within 200ms so a long type-into-input flurry doesn't flood.
- The interaction recorder now also pushes
single-video-editor.tsmaps clicks + mutations to body-relative time, then for each click scans the post-click window (3.5s) for mutations whose bbox isn't contained in the padded click bbox. Off-target mutations build zoom-release intervals; overlapping intervals merge.SingleVideoReel.tsxpan-zoom v5: ride zoom by default (hold + pan to next click), release only when (a) the gap intersects a release interval from the editor, or (b) the next click is far from the previous one (> 40% of the viewport's shorter dimension).
Versions
- CLI:
1.4.4 - Plugin:
1.4.4
v1 tag force-updated.
v1.4.3 — click-anchored narration + always-release pan-zoom
Two structural fixes for the "I can't follow what's happening" feedback on the videos.
Narration — click-anchored windows
The body is now partitioned into windows by the agent's clicks (with very-close clicks merged so we don't get sub-second slivers). Each window has a fixed start, end, duration. The narrator writes ONE beat per window: they pick the words, the editor pins the timing to the click. Because clicks are when meaningful state changes happen on screen, narration anchored to a click is automatically anchored to the moment the viewer cares about. There's no possible drift between voice and visuals.
The narrator prompt was rewritten with much stronger guidance:
- Audience framing: "imagine the viewer is a teammate who has never seen this codebase. They have ~60 seconds and need to leave understanding what was built, why it matters, how it works, and whether the tests prove it works."
- Plan the story arc first: read PR body, map goals to windows, decide setup / execute / confirm for each window, THEN write beats.
- Name UI elements specifically in the developer's own vocabulary ("the new Bulk Archive button I added at the top") not generically ("a button").
- Concrete good vs bad examples for setup beats, confirm beats, transitions, and silent-investigation moments.
- Beats may be empty for windows with nothing meaningful to say — silent gap beats 4-word filler.
Goal-aware structuring: the narrator gets the test plan goals as the demo's chapter list and is told to march through them in order, each chapter spanning multiple windows.
Plus: fixed an upstream issue where the agent recorded ref=e123 (Playwright's internal handle) instead of the human-readable element description, so the narrator can now actually say "clicked Sign In" instead of "clicked ref=e123".
Pan-zoom — always release between clicks
The previous "ride zoom between clicks" mode held a 2× zoom on the last-clicked region for up to 12 seconds. If anything happened far from that region during the hold — a toast in the corner, a counter at the top, items reordering elsewhere — it was clipped out of frame.
The new cycle is per-click and always releases:
- APPROACH (0.5s before click): ease zoom 1.0 → 2.0
- click fires
- HOLD (0.7s): stay at 2.0 on click site so the viewer reads the immediate reaction
- RELEASE (0.5s): ease zoom 2.0 → 1.0, neutral framing
- neutral 1.0 until next APPROACH
During the gap between clicks the viewer always sees the full page after the hold releases — any visible change anywhere on screen is in frame by construction.
When approach overlaps previous click's hold/release (very tight click pairs), approach wins so the camera arrives on time.
Versions
- CLI:
1.4.3 - Plugin:
1.4.3
v1 tag force-updated.
v1.4.2 — narrator-defined timed body beats
What changed
The body narration is back to per-beat audio, but the beats are now defined by the narrator instead of the editor. Each beat carries an explicit startS + durS chosen against the body-moment timeline, gets its own TTS file, and is placed at the narrator-declared timestamp. The spoken word lands exactly when the corresponding visual happens — anchored sync by construction.
Why v1.4.0/v1.4.1 was wrong
A single continuous audio file with a global playbackRate has no anchor to the visuals. Even when the narrator perfectly summarised the journey, the words landed at unrelated times — feedback was 'narration talking about something completely different' from what was on screen.
The new contract
timed-narration.tsasks Claude for the body as{ beats: [{ startS, durS, text, captionText }, ...] }, sorted, non-overlapping, covering[0, masterDurS].normaliseBeatssnaps the narrator's beats into a clean contiguous timeline (preserves text + proportional duration; first beat starts at 0, last ends atbodyDurS).- The editor TTSes each beat separately, computes per-beat
playbackRate(clamped 0.7–1.5×), builds oneBodyChunkper beat. - The compositor renders one
<Sequence>per chunk with its own<Audio>+<WordCaption>.
Badges for silent investigative moments keep the 1.4.0 standalone-Sequence model so they stay tied to the original tool-window timestamps regardless of beat pacing.
Bonus fix
The agent was recording ref=e123 (Playwright's internal handle) as the action target, so the narrator had no idea what was actually clicked. Switched the priority so the human-readable element description wins — narrations can say 'clicked Sign In' instead of 'clicked ref=e123'.
Versions
- CLI:
1.4.2 - Plugin:
1.4.2
v1 tag force-updated.