v1.4.5 — page-side time-travel buffer for ephemeral UI

@marcushyett marcushyett released this 01 May 21:49

What changed

Agent loops on ephemeral state were the most expensive failure mode tik-test had. A chat reply scrolls out of view, a spinner completes, or a toast fades; the agent takes a snapshot, sees nothing, retries, sees nothing again, and eventually gives up or marks the feature broken when it actually worked.

This release fixes that with one small page script:

window.__tikHistory — 30-second time-travel buffer

Records every meaningful DOM mutation (added / removed / text-changed / attr-changed). Each record carries text, bbox, timestamp, tag, testid, containerTestid, aria-label, and role. The buffer exposes three queries:

__tikHistory.find({ text, tag, testid, containerTestid, kind, sinceMs })
__tikHistory.transients(sinceMs)  // elements that appeared then disappeared
__tikHistory.stats()

The agent calls these through the existing browser_evaluate tool, so no new MCP infra is needed. The buffer trims itself on every push, which keeps memory bounded at roughly 900KB.
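To make the trim-on-push idea concrete, here is a minimal sketch of a time-windowed buffer with the same three query names. It is not tik-test's actual script: createHistory, push, the id field used to pair an "added" record with its "removed" record, the maxRecords cap, and the shape of stats() are all assumptions made for illustration. In a real page, a MutationObserver callback would presumably be what calls push.

```javascript
// Hypothetical sketch of a bounded, time-windowed mutation buffer.
// Only find / transients / stats come from the release notes; everything
// else (createHistory, push, id, maxRecords) is assumed for illustration.
function createHistory({ windowMs = 30000, maxRecords = 5000, now = () => Date.now() } = {}) {
  const records = [];

  // Drop records older than the window, then enforce the hard cap.
  function trim() {
    const cutoff = now() - windowMs;
    while (records.length && records[0].ts < cutoff) records.shift();
    while (records.length > maxRecords) records.shift();
  }

  // Timestamp the record and trim immediately, so memory stays bounded.
  function push(record) {
    records.push({ ...record, ts: now() });
    trim();
  }

  // Match on any subset of fields; text is a substring match.
  function find({ text, tag, testid, containerTestid, kind, sinceMs = windowMs } = {}) {
    const cutoff = now() - sinceMs;
    return records.filter((r) =>
      r.ts >= cutoff &&
      (text === undefined || (r.text || '').includes(text)) &&
      (tag === undefined || r.tag === tag) &&
      (testid === undefined || r.testid === testid) &&
      (containerTestid === undefined || r.containerTestid === containerTestid) &&
      (kind === undefined || r.kind === kind));
  }

  // Elements that were added and later removed within the last sinceMs.
  function transients(sinceMs = windowMs) {
    const recent = find({ sinceMs });
    return recent.filter((added) =>
      added.kind === 'added' &&
      recent.some((r) => r.kind === 'removed' && r.id === added.id && r.ts >= added.ts));
  }

  function stats() {
    return { count: records.length, oldestTs: records.length ? records[0].ts : null };
  }

  return { push, find, transients, stats };
}
```

Injecting the clock through the now option is a design choice for this sketch only: it lets the trimming and the sinceMs filters be exercised without waiting for real time to pass.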

window.__tikFreeze — pause / resume animations

await __tikFreeze.pause()    // pauses CSS animations + Web Animations API
// ... browser_take_screenshot ...
await __tikFreeze.resume()

This is a one-line replacement for the inline freeze recipe that the prompt previously taught.
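A pause / resume pair like this can be built on the standard Web Animations API, where document.getAnimations() returns CSS animations and transitions as well as script-created animations. The sketch below is one assumed way to do it, not the shipped implementation; createFreeze and its return values are hypothetical names.

```javascript
// Hypothetical sketch of a pause / resume helper.
// createFreeze and the returned counts are assumptions for illustration.
function createFreeze(doc) {
  let paused = [];
  return {
    async pause() {
      // Remember only animations that were running, so resume() does not
      // restart animations that had already finished or were already paused.
      paused = doc.getAnimations().filter((a) => a.playState === 'running');
      paused.forEach((a) => a.pause());
      return paused.length;
    },
    async resume() {
      const count = paused.length;
      paused.forEach((a) => a.play());
      paused = [];
      return count;
    },
  };
}

// In a page, this could be wired up as:
//   window.__tikFreeze = createFreeze(document);
```

Taking the document as a parameter keeps the helper usable against a stub, which is how it can be checked outside a browser.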

Agent prompt updates

Both fast and meticulous prompts gain a new tier 3 in the verification hierarchy: TIME-TRAVEL via __tikHistory, slotted between freeze-then-screenshot and generic programmatic fallback. The prompt includes literal example queries for the four canonical ephemeral cases:

  • Chat-bot reply scrolled off: __tikHistory.find({ text: 'expected reply', sinceMs: 30000 })
  • Spinner completed too fast: __tikHistory.transients(8000)
  • Toast faded: __tikHistory.find({ containerTestid: 'toast-region', kind: 'added' })
  • aria-busy flipped: __tikHistory.find({ kind: 'attr-changed' }) and look for changedAttr === 'aria-busy'
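The aria-busy case needs one extra filtering step after the query. A minimal sketch of that step follows, assuming find returns an array of plain record objects that carry kind and changedAttr; ariaBusyChanges is a hypothetical helper name, not part of tik-test.

```javascript
// Hypothetical helper: keep only the attr-changed records for aria-busy.
// Assumes each record is a plain object with kind and changedAttr fields.
function ariaBusyChanges(records) {
  return records.filter(
    (r) => r.kind === 'attr-changed' && r.changedAttr === 'aria-busy'
  );
}

// Inside the page, an expression passed to browser_evaluate could look like:
//   () => __tikHistory.find({ kind: 'attr-changed' })
//           .filter((r) => r.changedAttr === 'aria-busy')
```

A non-empty result suggests the busy flag flipped inside the buffer window, even if it had already settled by the time a snapshot was taken.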

Outcomes verified this way start with the prefix "verified programmatically (time-travel):" so reviewers can tell which tier produced the result.

Why this is the simple solution

  • One page-side script (~120 lines), no external deps, no MCP changes, no tracing infra.
  • Survives navigations because it is installed with addInitScript, which runs on every new document before the page's own scripts.
  • Agent reaches it via the tool it already uses (browser_evaluate).
  • Buffer is bounded (~900KB max) and self-trimming.
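For readers unfamiliar with addInitScript, the sketch below shows how a page script is typically registered, assuming a Playwright BrowserContext. The function name installPageScripts and the script path are hypothetical; the release notes do not say how tik-test wires this up.

```javascript
// Hypothetical installer, assuming a Playwright BrowserContext.
// Playwright evaluates an init script in each new document before the
// page's own scripts, so the helpers are reinstalled after a navigation.
async function installPageScripts(context, scriptPath) {
  await context.addInitScript({ path: scriptPath });
}

// Usage (the path is a placeholder, not tik-test's real file layout):
//   await installPageScripts(context, './page-scripts/tik-history.js');
```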

Versions

  • CLI: 1.4.5
  • Plugin: 1.4.5

v1 tag force-updated.