v0.8.0
[0.8.0] — 2026-06-20
The "developers love it" release. 0.7.0 won the agent; 0.8.0 wins the human — the dev who watches the
agent work, points at what's wrong, and trusts the green.
Added
- Human review marks — "annotate the bug where you see it" (
packages/browser,packages/server,
packages/protocol). A dev-only "Flag a bug" button rides with the presenter: the human toggles
it, clicks the element that looks wrong, types what's wrong, and Iris drops a numbered pin + emits a
HUMAN_MARK. The mark carries the element's re-resolvable anchor (the same durable address a
recorded flow uses) and the sourcefile:line— so the agent fixes the exact element and code,
not a guess. The agent drains marks with the newiris_reviewtool: each pending mark comes with
a ready-to-actfixhint (Open src/Checkout.tsx:42 and fix: <note>. Then iris_review { resolve: m1 }),
reading never consumes a mark, andresolveretires it once fixed. Off the deterministic benchmark
path (human-driven) —pnpm benchunchanged. - First-run readiness + loop intro —
iris_wait_ready(packages/server). Call it right after
init: it blocks until the app's SDK connects (returns instantly if a session already exists, so zero
latency on the happy path and on the benchmark), or times out with arecoveryhint. Smooths the
most common first-5-minutes footgun — the agent's first real call racing the WebSocket connect. Its
ready response also carries a one-lineloopguide (look → act → observe → assert → regress, plus
the human-flag →iris_reviewloop), so a fresh agent learns how to drive Iris on its first call
without reading docs. Pure, injected clock/sleep; off the benchmark path. - Deterministic visual regression —
iris_viewport(packages/server). Pin the driven page to a
fixed viewport size (clamped to sane bounds) so a screenshot baseline is reproducible across machines
— the last missing piece of CI-stable visual diffing, alongside the already-shippediris_visual_diff
masks(neutralize volatile regions) and a frozen clock (iris_clock). Drive-only, additive; off the
benchmark path. Provider-driven and tested via a fake page likeiris_network_mock. - CDP network mock / intercept —
iris_network_mock(packages/server). On a driven page
(iris drive), stub a request deterministically: return a500, force offline (abort), or delay a
response — so "verify the app handles a failed payment" is one declared rule, no backend changes. The
matcher is pure (first rule whose url-substring + optional method matches wins → fulfill/abort/continue)
and the Playwrightpage.routewiring is driven in tests with a fake Page/Route. Needs a driven
browser; returns arecommendationtoiris driveotherwise. Off the agent/benchmark path. iris statusshows sessions + health at a glance (packages/server). The daemon exposes a
localGET /status;iris statusnow reports each connected tab (url, throttled, stale, pending
human marks) and the session count — not just "running: pid". The plan's "no more pkill in a README"
daemon DX. Local-only, off the agent/benchmark path.- Actionable error recovery (
packages/server). Every tool error returned to the agent now carries
arecoveryhint when the failure is recognized — the no-session footgun, multiple/unknown sessions,
a throttled tab, a missing baseline/recording, the pairing-token config — so the first 5 minutes never
dead-end on "what do I do now?". Conservative: an unrecognized error gets no invented advice. - The panel always reflects the agent's real state —
iris_yield(packages/server,
packages/browser,packages/protocol). A human watching the browser must never see "live" when the
agent has actually stopped. The agent signals its turn boundary withiris_yield({ mode: "waiting" })
(done responding, will resume on your next message) or{ mode: "ask", note }(blocked, needs your
answer — the question shows on the panel); the session is revived automatically on the agent's next
call. Taught as the mandatory last step in the session lease, the loop guide, and the skill — and it's
agent-independent (Codex / OpenCode / Claude / Hermes). The panel renders each handback distinctly
via a PRESENTERtone: waiting = calm teal ✋, ask = amber ❓ pulse, agent crashed/disconnected =
amber ⚠ pulse, a clean end = calm green. When the last agent's MCP connection drops, the daemon ends
every session and pushes the "switch to your terminal" notice (verified end-to-end through a SIGKILL-ed
agent). Off the benchmark path. - Don't lose a panel prompt in the death-race (
packages/server,packages/protocol). If the human
types a message into the panel at the exact moment the agent stops, it would land in a dead inbox; now
both the agent-detach and idle paths fold any unread note into the end banner — quoted and labeled
Undelivered (paste into your terminal): "…"— so the words are surfaced back, not silently dropped. - Replay a saved flow from the panel — no agent (
packages/browser,packages/server,
packages/protocol). The daemon pushes the saved-flow names to the HUD on connect; the human clicks
▶ on a flow and it re-runs with no agent in the loop — the page animates via the normal replay path
and the ✓ / ⚠ drift / ✗ verdict lands in the same activity log they watch the agent in. The dev plays
the regression suite directly. Off the benchmark path (a panel-driven control, not a tool).
Changed
- Internal cohesion split (no behavior change):
SessionManagermoved to its own
session-manager.ts, and the on-disk-artifact constants toflow-constants.ts, bringing both
parent files back under the 500-line cap. All public import paths unchanged (re-exported).
Fixed
- Panel composer is now multi-line (
packages/browser). The HUD message box was a single-line
<input>that sent on any Enter; it's a<textarea>now — Enter sends, Shift+Enter inserts a
newline, and it auto-grows to fit. - Flag mode keeps the right cursors (
packages/browser). In "Flag a bug" mode every element showed
the crosshair, including the Flag button and its popover — which are clickable; they keep the pointer
cursor now. And the hover outline that boxes the element under the cursor no longer snaps jumpily: it
waits for the cursor to rest (~130 ms), then glides into place on an ease and fades in.