Skip to content

Releases: marcushyett/tik-test

tik-test review — PR #34

02 May 23:55

Choose a tag to compare

Pre-release

Auto-generated video review of PR #34 in marcushyett/tik-test.

tik-test review — PR #34

02 May 23:36

Choose a tag to compare

Pre-release

Auto-generated video review of PR #34 in marcushyett/tik-test.

tik-test review — PR #34

02 May 22:34

Choose a tag to compare

Pre-release

Auto-generated video review of PR #34 in marcushyett/tik-test.

tik-test review — PR #34

02 May 10:08

Choose a tag to compare

Pre-release

Auto-generated video review of PR #34 in marcushyett/tik-test.

v1.4.7 — voice-paced captions + verification stamps + demo discipline

01 May 22:38

Choose a tag to compare

Three viewer-facing fixes bundled.

1. Captions now track voice exactly

WordCaption was clamping each word to a 0.28s minimum frame budget. When narration sped up (up to 1.5×) to fit a click window, captions visibly lagged the voice — every page would finish a few hundred ms after the spoken line. Fixes:

  • Drop the per-word floor when voice is provided; let the highlight track the voice exactly.
  • Cap the effective caption span at the segment duration so a sequence cut-off can't make captions outrun the on-screen voice.
  • Pick the latest started caption page over an earlier still-fading one — a slow page-out can no longer visually block a page that's already started speaking.

2. Verification stamps over the video

New animated overlay drawn at the EXACT moment the agent declared each goal's outcome — a green check (pass), red cross (fail), or grey dash (skipped) with the goal's shortLabel below it. 1.8s lifetime, de-overlaps when goals end close together. Punctuates the recording so viewers see "this thing was just verified" without having to wait for the outro checklist.

Hand-drawn SVG (no emoji) with stroke-dash drawing animation, spring scale-in, and label slide-up.

3. Demo discipline — kills the "click and re-check the same thing" loop

The agent was clicking + screenshotting + re-snapshotting + re-clicking the same surface multiple times per goal. From the viewer's perspective, that's dead video — the same thing happening over and over with no forward progress.

Both fast and meticulous prompts now open with an explicit:

ONE GOAL = ONE DEMO BEAT — ACT (1–3 user actions) → VERIFY (one check) → OUTCOME → STOP. The first verification IS the verdict. Once you've seen the success state once, your next message is OUTCOME. No 'just to be sure' second screenshot.

The stuck-loop tripwire now fires on the second occurrence (was third) and explicitly names the cases: re-screenshotting / re-snapshotting / re-clicking / re-typing a surface you already touched in this goal.

Versions kept in sync

  • package.json → 1.4.7
  • plugin/.claude-plugin/plugin.json → 1.4.7
  • v1 tag force-pushed to point at this commit

v1.4.6 — missing-content protocol + realistic desktop viewport

01 May 22:03

Choose a tag to compare

Two fixes bundled.

1. Missing-content protocol — kills the most common agent loop

When the agent took an action and the verification snapshot didn't show what it expected, it would loop: retry, screenshot, retry, screenshot. Especially bad on chat replies that scroll off, spinners that finish too fast, toasts that fade.

Both fast and meticulous prompts (src/goal-agent.ts) now have a mandatory rule: if the snapshot doesn't show the expected content, the next tool call MUST be __tikHistory.find({ text, sinceMs: 30000 }) — not a retry, not another screenshot.

The 30-second mutation buffer (added in v1.4.5) disambiguates the failure for the agent — it doesn't have to guess whether the element was ephemeral, scrolled off, or never rendered. Buffer hits → pass with "verified programmatically (time-travel): …". Buffer empty → real failure.

Mandatory for any goal whose success criterion mentions a chat/message/reply, a notification/toast/banner, a loading/spinner state, an animation, or any text appearing on a feed-style or scrolling surface.

2. Realistic 2026 desktop viewport (1920×1080)

Default desktop fallback bumped from 1280×8001920×1080 to match the actual resolution most US users have in 2026 (StatCounter: ~21.5% share, top resolution). The snapWidth function gains a 1920 wide-desktop bucket so consumers who request 1920 keep the resolution end-to-end (1920 → 1080 canvas at 9/16 — clean repeating fraction). Aligned across runner.ts, config.ts, plan.ts, single-video-editor.ts, remotion/Root.tsx.

Consumers who explicitly set viewport: 1280x800 (or any other) in their tiktest.md are unaffected — only the unspecified default changes.

Versions kept in sync

  • package.json → 1.4.6
  • plugin/.claude-plugin/plugin.json → 1.4.6
  • v1 tag force-pushed to point at this commit

v1.4.5 — page-side time-travel buffer for ephemeral UI

01 May 21:49

Choose a tag to compare

What changed

Agent loops on ephemeral state are the most expensive failure mode tik-test had: a chat reply scrolls out, a spinner completes, a toast fades — the agent takes a snapshot, sees nothing, retries, sees nothing again, eventually gives up or marks the feature broken when it actually worked.

Fixed permanently with one small page script:

window.__tikHistory — 30-second time-travel buffer

Records every meaningful DOM mutation (added / removed / text-changed / attr-changed) with text, bbox, timestamp, tag, testid, containerTestid, aria-label, role. Three queries:

__tikHistory.find({ text, tag, testid, containerTestid, kind, sinceMs })
__tikHistory.transients(sinceMs)  // elements that appeared then disappeared
__tikHistory.stats()

The agent uses these via the existing browser_evaluate tool — no new MCP infra. Trims itself on every push so memory stays ~900KB.

window.__tikFreeze — pause / resume animations

await __tikFreeze.pause()    // pauses CSS animations + Web Animations API
// ... browser_take_screenshot ...
await __tikFreeze.resume()

One-liner replacement for the inline freeze recipe the prompt already taught.

Agent prompt updates

Both fast and meticulous prompts gain a new tier 3 in the verification hierarchy: TIME-TRAVEL via __tikHistory, slotted between freeze-then-screenshot and generic programmatic fallback. The prompt includes literal example queries for the four canonical ephemeral cases:

  • Chat-bot reply scrolled off: __tikHistory.find({ text: 'expected reply', sinceMs: 30000 })
  • Spinner completed too fast: __tikHistory.transients(8000)
  • Toast faded: __tikHistory.find({ containerTestid: 'toast-region', kind: 'added' })
  • aria-busy flipped: __tikHistory.find({ kind: 'attr-changed' }) and look for changedAttr === 'aria-busy'

Outcomes verified this way start with verified programmatically (time-travel): so reviewers can tell which tier produced the result.

Why this is the simple solution

  • One page-side script (~120 lines), no external deps, no MCP changes, no tracing infra.
  • Survives navigations because addInitScript runs before page scripts.
  • Agent reaches it via the tool it already uses (browser_evaluate).
  • Buffer is bounded (~900KB max) and self-trimming.

Versions

  • CLI: 1.4.5
  • Plugin: 1.4.5

v1 tag force-updated.

v1.4.4 — ride zoom by default, release on off-target DOM mutations

01 May 06:51

Choose a tag to compare

What changed

v1.4.3 always-released zoom between clicks, which broke tight click sequences (form-fills, multi-step wizards) where the camera should stay magnified across consecutive nearby actions.

This restores ride-mode as the default and adds a real signal for when the camera SHOULD release: page-side MutationObserver data capturing which regions of the page mutate after each click.

The rule

  • Mutation lands inside the clicked element's bbox (button press feedback, focus ring, dropdown on the click site) → expected, keep ride.
  • Mutation lands outside the clicked element's bbox (toast in a corner, counter at the top updates, items reorder elsewhere) → editor flags that gap for release; pan-zoom drops to neutral so the viewer sees the full page during the change.

Implementation

  • runner.ts injects two new page scripts via addInitScript:
    • The interaction recorder now also pushes event.target.getBoundingClientRect() on each click via __tikRecordClickBbox, so we know exactly what region was clicked.
    • A new MutationObserver script (__tikRecordMutation) reports the bounding rect of every meaningful DOM change. Filters out invisible nodes, zero-size rects, off-screen rects, and throttles same-position mutations within 200ms so a long type-into-input flurry doesn't flood.
  • single-video-editor.ts maps clicks + mutations to body-relative time, then for each click scans the post-click window (3.5s) for mutations whose bbox isn't contained in the padded click bbox. Off-target mutations build zoom-release intervals; overlapping intervals merge.
  • SingleVideoReel.tsx pan-zoom v5: ride zoom by default (hold + pan to next click), release only when (a) the gap intersects a release interval from the editor, or (b) the next click is far from the previous one (> 40% of the viewport's shorter dimension).

Versions

  • CLI: 1.4.4
  • Plugin: 1.4.4

v1 tag force-updated.

v1.4.3 — click-anchored narration + always-release pan-zoom

01 May 06:39

Choose a tag to compare

Two structural fixes for the "I can't follow what's happening" feedback on the videos.

Narration — click-anchored windows

The body is now partitioned into windows by the agent's clicks (with very-close clicks merged so we don't get sub-second slivers). Each window has a fixed start, end, duration. The narrator writes ONE beat per window: they pick the words, the editor pins the timing to the click. Because clicks are when meaningful state changes happen on screen, narration anchored to a click is automatically anchored to the moment the viewer cares about. There's no possible drift between voice and visuals.

The narrator prompt was rewritten with much stronger guidance:

  • Audience framing: "imagine the viewer is a teammate who has never seen this codebase. They have ~60 seconds and need to leave understanding what was built, why it matters, how it works, and whether the tests prove it works."
  • Plan the story arc first: read PR body, map goals to windows, decide setup / execute / confirm for each window, THEN write beats.
  • Name UI elements specifically in the developer's own vocabulary ("the new Bulk Archive button I added at the top") not generically ("a button").
  • Concrete good vs bad examples for setup beats, confirm beats, transitions, and silent-investigation moments.
  • Beats may be empty for windows with nothing meaningful to say — silent gap beats 4-word filler.

Goal-aware structuring: the narrator gets the test plan goals as the demo's chapter list and is told to march through them in order, each chapter spanning multiple windows.

Plus: fixed an upstream issue where the agent recorded ref=e123 (Playwright's internal handle) instead of the human-readable element description, so the narrator can now actually say "clicked Sign In" instead of "clicked ref=e123".

Pan-zoom — always release between clicks

The previous "ride zoom between clicks" mode held a 2× zoom on the last-clicked region for up to 12 seconds. If anything happened far from that region during the hold — a toast in the corner, a counter at the top, items reordering elsewhere — it was clipped out of frame.

The new cycle is per-click and always releases:

  1. APPROACH (0.5s before click): ease zoom 1.0 → 2.0
  2. click fires
  3. HOLD (0.7s): stay at 2.0 on click site so the viewer reads the immediate reaction
  4. RELEASE (0.5s): ease zoom 2.0 → 1.0, neutral framing
  5. neutral 1.0 until next APPROACH

During the gap between clicks the viewer always sees the full page after the hold releases — any visible change anywhere on screen is in frame by construction.

When approach overlaps previous click's hold/release (very tight click pairs), approach wins so the camera arrives on time.

Versions

  • CLI: 1.4.3
  • Plugin: 1.4.3

v1 tag force-updated.

v1.4.2 — narrator-defined timed body beats

01 May 05:49

Choose a tag to compare

What changed

The body narration is back to per-beat audio, but the beats are now defined by the narrator instead of the editor. Each beat carries an explicit startS + durS chosen against the body-moment timeline, gets its own TTS file, and is placed at the narrator-declared timestamp. The spoken word lands exactly when the corresponding visual happens — anchored sync by construction.

Why v1.4.0/v1.4.1 was wrong

A single continuous audio file with a global playbackRate has no anchor to the visuals. Even when the narrator perfectly summarised the journey, the words landed at unrelated times — feedback was 'narration talking about something completely different' from what was on screen.

The new contract

  1. timed-narration.ts asks Claude for the body as { beats: [{ startS, durS, text, captionText }, ...] }, sorted, non-overlapping, covering [0, masterDurS].
  2. normaliseBeats snaps the narrator's beats into a clean contiguous timeline (preserves text + proportional duration; first beat starts at 0, last ends at bodyDurS).
  3. The editor TTSes each beat separately, computes per-beat playbackRate (clamped 0.7–1.5×), builds one BodyChunk per beat.
  4. The compositor renders one <Sequence> per chunk with its own <Audio> + <WordCaption>.

Badges for silent investigative moments keep the 1.4.0 standalone-Sequence model so they stay tied to the original tool-window timestamps regardless of beat pacing.

Bonus fix

The agent was recording ref=e123 (Playwright's internal handle) as the action target, so the narrator had no idea what was actually clicked. Switched the priority so the human-readable element description wins — narrations can say 'clicked Sign In' instead of 'clicked ref=e123'.

Versions

  • CLI: 1.4.2
  • Plugin: 1.4.2

v1 tag force-updated.