Release v0.8.0 · reticlehq/reticle

[0.8.0] — 2026-06-20

The "developers love it" release. 0.7.0 won the agent; 0.8.0 wins the human — the dev who watches the
agent work, points at what's wrong, and trusts the green.

Added

Human review marks — "annotate the bug where you see it" (packages/browser, packages/server,
packages/protocol). A dev-only "Flag a bug" button rides with the presenter: the human toggles
it, clicks the element that looks wrong, types what's wrong, and Iris drops a numbered pin + emits a
HUMAN_MARK. The mark carries the element's re-resolvable anchor (the same durable address a
recorded flow uses) and the source file:line — so the agent fixes the exact element and code,
not a guess. The agent drains marks with the new iris_review tool: each pending mark comes with
a ready-to-act fix hint (Open src/Checkout.tsx:42 and fix: <note>. Then iris_review { resolve: m1 }),
reading never consumes a mark, and resolve retires it once fixed. Off the deterministic benchmark
path (human-driven) — pnpm bench unchanged.
First-run readiness + loop intro — iris_wait_ready (packages/server). Call it right after
init: it blocks until the app's SDK connects (returns instantly if a session already exists, so zero
latency on the happy path and on the benchmark), or times out with a recovery hint. Smooths the
most common first-5-minutes footgun — the agent's first real call racing the WebSocket connect. Its
ready response also carries a one-line loop guide (look → act → observe → assert → regress, plus
the human-flag → iris_review loop), so a fresh agent learns how to drive Iris on its first call
without reading docs. Pure, injected clock/sleep; off the benchmark path.
Deterministic visual regression — iris_viewport (packages/server). Pin the driven page to a
fixed viewport size (clamped to sane bounds) so a screenshot baseline is reproducible across machines
— the last missing piece of CI-stable visual diffing, alongside the already-shipped iris_visual_diff
masks (neutralize volatile regions) and a frozen clock (iris_clock). Drive-only, additive; off the
benchmark path. Provider-driven and tested via a fake page like iris_network_mock.
CDP network mock / intercept — iris_network_mock (packages/server). On a driven page
(iris drive), stub a request deterministically: return a 500, force offline (abort), or delay a
response — so "verify the app handles a failed payment" is one declared rule, no backend changes. The
matcher is pure (first rule whose url-substring + optional method matches wins → fulfill/abort/continue)
and the Playwright page.route wiring is driven in tests with a fake Page/Route. Needs a driven
browser; returns a recommendation to iris drive otherwise. Off the agent/benchmark path.
iris status shows sessions + health at a glance (packages/server). The daemon exposes a
local GET /status; iris status now reports each connected tab (url, throttled, stale, pending
human marks) and the session count — not just "running: pid". The plan's "no more pkill in a README"
daemon DX. Local-only, off the agent/benchmark path.
Actionable error recovery (packages/server). Every tool error returned to the agent now carries
a recovery hint when the failure is recognized — the no-session footgun, multiple/unknown sessions,
a throttled tab, a missing baseline/recording, the pairing-token config — so the first 5 minutes never
dead-end on "what do I do now?". Conservative: an unrecognized error gets no invented advice.
The panel always reflects the agent's real state — iris_yield (packages/server,
packages/browser, packages/protocol). A human watching the browser must never see "live" when the
agent has actually stopped. The agent signals its turn boundary with iris_yield({ mode: "waiting" })
(done responding, will resume on your next message) or { mode: "ask", note } (blocked, needs your
answer — the question shows on the panel); the session is revived automatically on the agent's next
call. Taught as the mandatory last step in the session lease, the loop guide, and the skill — and it's
agent-independent (Codex / OpenCode / Claude / Hermes). The panel renders each handback distinctly
via a PRESENTER tone: waiting = calm teal ✋, ask = amber ❓ pulse, agent crashed/disconnected =
amber ⚠ pulse, a clean end = calm green. When the last agent's MCP connection drops, the daemon ends
every session and pushes the "switch to your terminal" notice (verified end-to-end through a SIGKILL-ed
agent). Off the benchmark path.
Don't lose a panel prompt in the death-race (packages/server, packages/protocol). If the human
types a message into the panel at the exact moment the agent stops, it would land in a dead inbox; now
both the agent-detach and idle paths fold any unread note into the end banner — quoted and labeled
Undelivered (paste into your terminal): "…" — so the words are surfaced back, not silently dropped.
Replay a saved flow from the panel — no agent (packages/browser, packages/server,
packages/protocol). The daemon pushes the saved-flow names to the HUD on connect; the human clicks
▶ on a flow and it re-runs with no agent in the loop — the page animates via the normal replay path
and the ✓ / ⚠ drift / ✗ verdict lands in the same activity log they watch the agent in. The dev plays
the regression suite directly. Off the benchmark path (a panel-driven control, not a tool).

Changed

Internal cohesion split (no behavior change): SessionManager moved to its own
session-manager.ts, and the on-disk-artifact constants to flow-constants.ts, bringing both
parent files back under the 500-line cap. All public import paths unchanged (re-exported).

Fixed

Panel composer is now multi-line (packages/browser). The HUD message box was a single-line
<input> that sent on any Enter; it's a <textarea> now — Enter sends, Shift+Enter inserts a
newline, and it auto-grows to fit.
Flag mode keeps the right cursors (packages/browser). In "Flag a bug" mode every element showed
the crosshair, including the Flag button and its popover — which are clickable; they keep the pointer
cursor now. And the hover outline that boxes the element under the cursor no longer snaps jumpily: it
waits for the cursor to rest (~130 ms), then glides into place on an ease and fades in.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

v0.8.0

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

[0.8.0] — 2026-06-20

Added

Changed

Fixed

Uh oh!