Skip to content

v0.8.0

Choose a tag to compare

@divshekhar divshekhar released this 20 Jun 09:22

[0.8.0] — 2026-06-20

The "developers love it" release. 0.7.0 won the agent; 0.8.0 wins the human — the dev who watches the
agent work, points at what's wrong, and trusts the green.

Added

  • Human review marks — "annotate the bug where you see it" (packages/browser, packages/server,
    packages/protocol). A dev-only "Flag a bug" button rides with the presenter: the human toggles
    it, clicks the element that looks wrong, types what's wrong, and Iris drops a numbered pin + emits a
    HUMAN_MARK. The mark carries the element's re-resolvable anchor (the same durable address a
    recorded flow uses) and the source file:line — so the agent fixes the exact element and code,
    not a guess. The agent drains marks with the new iris_review tool: each pending mark comes with
    a ready-to-act fix hint (Open src/Checkout.tsx:42 and fix: <note>. Then iris_review { resolve: m1 }),
    reading never consumes a mark, and resolve retires it once fixed. Off the deterministic benchmark
    path (human-driven) — pnpm bench unchanged.
  • First-run readiness + loop intro — iris_wait_ready (packages/server). Call it right after
    init: it blocks until the app's SDK connects (returns instantly if a session already exists, so zero
    latency on the happy path and on the benchmark), or times out with a recovery hint. Smooths the
    most common first-5-minutes footgun — the agent's first real call racing the WebSocket connect. Its
    ready response also carries a one-line loop guide (look → act → observe → assert → regress, plus
    the human-flag → iris_review loop), so a fresh agent learns how to drive Iris on its first call
    without reading docs. Pure, injected clock/sleep; off the benchmark path.
  • Deterministic visual regression — iris_viewport (packages/server). Pin the driven page to a
    fixed viewport size (clamped to sane bounds) so a screenshot baseline is reproducible across machines
    — the last missing piece of CI-stable visual diffing, alongside the already-shipped iris_visual_diff
    masks (neutralize volatile regions) and a frozen clock (iris_clock). Drive-only, additive; off the
    benchmark path. Provider-driven and tested via a fake page like iris_network_mock.
  • CDP network mock / intercept — iris_network_mock (packages/server). On a driven page
    (iris drive), stub a request deterministically: return a 500, force offline (abort), or delay a
    response — so "verify the app handles a failed payment" is one declared rule, no backend changes. The
    matcher is pure (first rule whose url-substring + optional method matches wins → fulfill/abort/continue)
    and the Playwright page.route wiring is driven in tests with a fake Page/Route. Needs a driven
    browser; returns a recommendation to iris drive otherwise. Off the agent/benchmark path.
  • iris status shows sessions + health at a glance (packages/server). The daemon exposes a
    local GET /status; iris status now reports each connected tab (url, throttled, stale, pending
    human marks) and the session count — not just "running: pid". The plan's "no more pkill in a README"
    daemon DX. Local-only, off the agent/benchmark path.
  • Actionable error recovery (packages/server). Every tool error returned to the agent now carries
    a recovery hint when the failure is recognized — the no-session footgun, multiple/unknown sessions,
    a throttled tab, a missing baseline/recording, the pairing-token config — so the first 5 minutes never
    dead-end on "what do I do now?". Conservative: an unrecognized error gets no invented advice.
  • The panel always reflects the agent's real state — iris_yield (packages/server,
    packages/browser, packages/protocol). A human watching the browser must never see "live" when the
    agent has actually stopped. The agent signals its turn boundary with iris_yield({ mode: "waiting" })
    (done responding, will resume on your next message) or { mode: "ask", note } (blocked, needs your
    answer — the question shows on the panel); the session is revived automatically on the agent's next
    call. Taught as the mandatory last step in the session lease, the loop guide, and the skill — and it's
    agent-independent (Codex / OpenCode / Claude / Hermes). The panel renders each handback distinctly
    via a PRESENTER tone: waiting = calm teal ✋, ask = amber ❓ pulse, agent crashed/disconnected =
    amber ⚠ pulse, a clean end = calm green. When the last agent's MCP connection drops, the daemon ends
    every session and pushes the "switch to your terminal" notice (verified end-to-end through a SIGKILL-ed
    agent). Off the benchmark path.
  • Don't lose a panel prompt in the death-race (packages/server, packages/protocol). If the human
    types a message into the panel at the exact moment the agent stops, it would land in a dead inbox; now
    both the agent-detach and idle paths fold any unread note into the end banner — quoted and labeled
    Undelivered (paste into your terminal): "…" — so the words are surfaced back, not silently dropped.
  • Replay a saved flow from the panel — no agent (packages/browser, packages/server,
    packages/protocol). The daemon pushes the saved-flow names to the HUD on connect; the human clicks
    on a flow and it re-runs with no agent in the loop — the page animates via the normal replay path
    and the ✓ / ⚠ drift / ✗ verdict lands in the same activity log they watch the agent in. The dev plays
    the regression suite directly. Off the benchmark path (a panel-driven control, not a tool).

Changed

  • Internal cohesion split (no behavior change): SessionManager moved to its own
    session-manager.ts, and the on-disk-artifact constants to flow-constants.ts, bringing both
    parent files back under the 500-line cap. All public import paths unchanged (re-exported).

Fixed

  • Panel composer is now multi-line (packages/browser). The HUD message box was a single-line
    <input> that sent on any Enter; it's a <textarea> now — Enter sends, Shift+Enter inserts a
    newline
    , and it auto-grows to fit.
  • Flag mode keeps the right cursors (packages/browser). In "Flag a bug" mode every element showed
    the crosshair, including the Flag button and its popover — which are clickable; they keep the pointer
    cursor now. And the hover outline that boxes the element under the cursor no longer snaps jumpily: it
    waits for the cursor to rest (~130 ms), then glides into place on an ease and fades in.