Releases: reticlehq/reticle

v1.2.0

@divshekhar divshekhar released this 27 Jun 09:12

[1.2.0] — 2026-06-27

The multi-agent release. One Chromium now serves many agents at once — a leased browser pool gives each its own isolated context, and project-scoped session identity keeps several apps on one machine from cross-talking. Plus a polish pass: the benchmark suite runs unattended, CI stops going red on dependency advisories it can't control, the daemon-readiness window is tunable, and the docs + README are rewritten to lead with value. Measured: 16 flows across 8 contexts in 5.2s vs 35.4s serial — 6.78× faster.

Added

  • BrowserPool — one Chromium, N isolated leased contexts. A fleet of agents shares one browser instead of launching one each. Leases carry a TTL + heartbeat with a reaper for orphans, iris_lease_acquire waits for the tab to connect, and iris_sessions shows projectId + leased.
  • Project-scoped session identity (on by default). Sessions resolve against a stable build-stamped projectId (Next / HTML / .iris.json, auto-stamped by the Vite plugin), so concurrent apps never steal each other's session.
  • SvelteKit support in iris init for projects the Vite plugin can't inject into.
  • Real-Chromium + multi-agent CI suites — framework-connect tests (Vite/React, Next App Router, Remix, Astro), the browser-pool path, and single-page crash isolation.
  • IRIS_DAEMON_READY_TIMEOUT_MS — tune how long the MCP proxy waits for the daemon to become ready (default 10s) for slow machines / CI.
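
The lease lifecycle described above (a cap, a TTL, a heartbeat, and a reaper for orphans) can be sketched roughly as follows. This is a minimal illustration, not Iris's BrowserPool: LeasePool, acquire, heartbeat, release, and reap are hypothetical names, and the clock is injected so the behavior is deterministic.

```typescript
// Hypothetical sketch: none of these names come from the Iris codebase.
type Lease = { id: string; expiresAt: number };

class LeasePool {
  private leases = new Map<string, Lease>();
  private nextId = 1;
  private cap: number;
  private ttlMs: number;
  private now: () => number;

  constructor(cap: number, ttlMs: number, now: () => number = Date.now) {
    this.cap = cap;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  /** Hand out a lease, or return null when the pool is at its cap. */
  acquire(): Lease | null {
    this.reap(); // reclaim expired leases before checking the cap
    if (this.leases.size >= this.cap) return null;
    const lease: Lease = { id: `lease-${this.nextId++}`, expiresAt: this.now() + this.ttlMs };
    this.leases.set(lease.id, lease);
    return lease;
  }

  /** Extend a live lease; returns false if it expired or was released. */
  heartbeat(id: string): boolean {
    const lease = this.leases.get(id);
    if (!lease || lease.expiresAt <= this.now()) return false;
    lease.expiresAt = this.now() + this.ttlMs;
    return true;
  }

  release(id: string): void {
    this.leases.delete(id);
  }

  /** Drop every lease whose TTL lapsed without a heartbeat; returns the reaped ids. */
  reap(): string[] {
    const t = this.now();
    const reaped: string[] = [];
    this.leases.forEach((lease, id) => {
      if (lease.expiresAt <= t) {
        this.leases.delete(id);
        reaped.push(id);
      }
    });
    return reaped;
  }

  get size(): number {
    return this.leases.size;
  }
}
```

An agent that crashes simply stops heartbeating, so its lease lapses and the next reap() frees the slot for another agent.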

Changed

  • Daemon resilience + per-page fault isolation. One bad page can't sink the fleet: page faults are isolated, the pool enforces its cap under burst, aborted acquires clean up, and stale daemon pidfiles are reclaimed (no ghost ports).
  • Docs lead with value and read for everyone. README rewritten — value-upfront hero, a "who you are → what you get" table (vibe coder / engineer / QA / founder), and a "How to use it" walkthrough. New multi-agent testing guide; benchmark images + numbers refreshed; benchmark passes renamed to plain names (observation-cost / agent-loop / replay).
  • The benchmark self-boots. pnpm bench now starts and tears down its own fixtures (demo + api) with env-tunable readiness (BENCH_*), so the suite runs unattended.
  • CI hardened against flaky reds. The security-audit step is non-blocking (a new transitive advisory no longer fails an unrelated PR), the e2e job retries with cleanup, and pre-commit matches CI step order.

Fixed

  • @syrin/iris/next withIris no longer crashes the host build (a bundled __require.resolve).
  • iris init detects the monorepo package manager and gives correct guidance for non-Vite/Next apps (CRA / webpack).
  • Clearer edge errors — an unopenable leased URL says why; the browser warns when the bridge is unreachable on first connect.
  • Skill & docs corrections for the public integration path (MCP registration, iris init flow, stale-npx cache as the main -32000 cause).

Removed

  • Unused public exports: ObserverType / UpdateStatus (@syrin/iris-protocol), buildClock (@syrin/iris-test), and the test-only IRIS_VITE_PLUGIN_NAME re-export from @syrin/iris/vite. No real consumers.

v1.0.0

@divshekhar divshekhar released this 22 Jun 07:09

[1.0.0] — 2026-06-22

The 1.0 release. Iris is stable, documented, and benchmarked end to end: every package is versioned
1.0.0 under the open-core license split, and the same verify loop that wins on a toy app stays the
cheapest way to observe a real production dashboard.

The headline is the "lean responses" pass — same observations, fewer tokens. On the cross-tool
detection benchmark Iris's average observation cost drops 959 → 815 tokens with detection unchanged at
1.0 and zero false positives, lifting Verification Efficiency past the best external tool (12.27 vs
10.55) while remaining the only tool that catches every regression. Re-verifying a saved suite costs
47 tokens with no model and 0% flake, up to 2,574× cheaper than re-driving it with an LLM.

Added

  • Honest, reproducible benchmarks with a small-app vs real-app story. A committed benchmark image
    set (re-run efficiency, the two-apps small-vs-real comparison, the per-tool cost on the real Syrin
    dashboard, and a capability matrix) rendered from a public source pipeline (assets/benchmarks +
    a shared design system), with the methodology written up in docs/benchmarks.md.
    On a real production dashboard Iris observes a page for 1,023 tokens vs Chrome DevTools MCP's 1,357
    and Playwright MCP's 2,193, and is the only tool that asserts success from the app's own signal.
  • Documentation set — an architecture overview, the benchmarks explainer,
    an expanded getting-started, and a Mintlify configuration so the docs
    publish as a site.
  • Open-source project hygiene: CONTRIBUTING.md, CODE_OF_CONDUCT.md, issue and pull-request
    templates, plus contributor / stargazer / forker recognition in the README.

Changed

  • iris_act collapses a clean action to its consequence — the effect block now omits fields at
    their uninformative default (an absent dispatched/targetMatched/visible/enabled means true;
    an absent focusMoved/occludedBy means null; an absent occluded/scrolledIntoView/
    valueChanged/defaultPrevented means false), so a successful click returns just domMutatedWithin
    and any real signal still surfaces. No information is lost — absence always means the boring value.
  • MCP tool results serialize as compact JSON by default — the agent-facing text content drops the
    two-space indentation (the typed structuredContent is unchanged), ~40% cheaper on the structured
    payloads that dominate. Set IRIS_ENCODING=pretty for the previous indented form; IRIS_ENCODING=toon
    remains the densest tabular encoding.
  • iris_act_and_wait returns a reaction digest, not the full timeline: trace is now
    { window_ms, summary } (the counts that answer "what did the app do?") plus a since cursor; the full
    per-event timeline is one iris_observe { since } away when the counts aren't enough. On a large DOM the
    dropped events array was the bulk of the loop cost — a verify loop on a 5,000-row grid falls from ~531 to
    ~279 tokens with the consequence still asserted from the row:approved signal.
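
Because absence always means the boring value, a consumer can rebuild the full iris_act effect block by layering the lean response over the defaults. The sketch below assumes exactly the field defaults listed above. The hydrateEffect helper, the EFFECT_DEFAULTS table, and the sample values are hypothetical, not part of Iris.

```typescript
// Field names and implied defaults are taken from the release note above;
// hydrateEffect and EFFECT_DEFAULTS are hypothetical helper names.
const EFFECT_DEFAULTS: Record<string, unknown> = {
  // absent means true
  dispatched: true,
  targetMatched: true,
  visible: true,
  enabled: true,
  // absent means null
  focusMoved: null,
  occludedBy: null,
  // absent means false
  occluded: false,
  scrolledIntoView: false,
  valueChanged: false,
  defaultPrevented: false,
};

/** Expand a lean effect block: any field the response omits takes its default. */
function hydrateEffect(lean: Record<string, unknown>): Record<string, unknown> {
  return { ...EFFECT_DEFAULTS, ...lean };
}

// A clean click carries only domMutatedWithin (the value here is illustrative).
const clean = hydrateEffect({ domMutatedWithin: 40 });

// A real signal still overrides the default when the tool reports it.
const blocked = hydrateEffect({ occluded: true, occludedBy: "#cookie-banner" });
```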

Fixed

  • Multiple apps on one machine no longer collide or orphan the daemon. Several Next.js / React apps
    (or browser tabs) can run at once: the @syrin/iris-next integration now defaults to a unique per-tab
    session id (SESSION_AUTO) instead of a shared constant, so two Next apps never silently evict each
    other. A bridge/daemon port collision now fails fast with a clear error instead of hanging forever
    and leaving an orphaned process — the listen() calls finally handle EADDRINUSE.
  • License files now carry a real copyright. Filled the Apache-2.0 appendix in every SDK package
    license so no [yyyy] / [name of copyright owner] placeholders remain.
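
The fail-fast behavior on a port collision follows a common Node pattern: register an error listener before calling listen(), and reject on EADDRINUSE rather than waiting for a listening event that will never fire. The sketch below is an illustration of that pattern under assumed names (Listenable, explainListenError, listenOrFail). It is not the actual bridge or daemon code.

```typescript
// Hypothetical sketch: helper names and the Listenable shape are illustrative.
type ErrnoLike = Error & { code?: string };

// Minimal structural view of a server: once/off for events, plus listen().
interface Listenable {
  once(event: string, cb: (...args: any[]) => void): unknown;
  off(event: string, cb: (...args: any[]) => void): unknown;
  listen(port: number, host: string): unknown;
}

/** Turn a raw listen() failure into an actionable error. */
function explainListenError(err: ErrnoLike, port: number, host: string): Error {
  if (err.code === "EADDRINUSE") {
    return new Error(`${host}:${port} is already in use; stop the other process or choose another port`);
  }
  return err; // unrecognized failures pass through untouched
}

/** Resolve once listening; reject immediately on a listen error instead of hanging. */
function listenOrFail(server: Listenable, port: number, host = "127.0.0.1"): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const onError = (err: ErrnoLike) => {
      server.off("listening", onListening);
      reject(explainListenError(err, port, host));
    };
    const onListening = () => {
      server.off("error", onError);
      resolve();
    };
    server.once("error", onError);
    server.once("listening", onListening);
    server.listen(port, host);
  });
}
```

A server created with createServer() from node:net emits both events and should fit the Listenable shape, so a caller can await listenOrFail(server, port) and exit with the message if it rejects.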

Security

  • Daemon mode now enforces the documented auth contract. iris serve / the MCP daemon previously
    built its bridge without forwarding the pairing token, bind host, or origin allow-list, so
    IRIS_TOKEN / IRIS_HOST / IRIS_ALLOWED_ORIGINS were silently ignored in daemon mode. They are now
    honored identically to the in-process path. (Residual risk was bounded — the daemon is loopback-pinned —
    but the advertised control is now actually enforced.)
  • Every security-critical environment variable is a single named constant (IrisEnv in
    @syrin/iris-protocol). A typo in an inline 'IRIS_TOKEN' string could previously have disabled auth
    silently; the names now live in exactly one place.
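
One way to get the "names live in exactly one place" property is a const object whose values form a string-literal union, so a mistyped name fails to compile instead of silently reading undefined. The variable names below come from the note above. The object's keys, the IrisEnvName type, and the readEnv helper are assumptions for illustration, and the real IrisEnv in @syrin/iris-protocol may be shaped differently.

```typescript
// Assumed shape: only the three environment variable names are from the release note.
const IrisEnv = {
  TOKEN: "IRIS_TOKEN",
  HOST: "IRIS_HOST",
  ALLOWED_ORIGINS: "IRIS_ALLOWED_ORIGINS",
} as const;

// Union of the literal names: "IRIS_TOKEN" | "IRIS_HOST" | "IRIS_ALLOWED_ORIGINS".
type IrisEnvName = (typeof IrisEnv)[keyof typeof IrisEnv];

/** Read one variable by its named constant; an empty string counts as unset. */
function readEnv(env: Record<string, string | undefined>, name: IrisEnvName): string | undefined {
  const value = env[name];
  return value === undefined || value === "" ? undefined : value;
}

// Usage: pass process.env (or any map) plus a constant, never an inline string.
const token = readEnv({ IRIS_TOKEN: "example-token" }, IrisEnv.TOKEN);
```

With this typing, a call such as readEnv(env, "IRIS_TOKN") is rejected by the compiler, which is the failure mode the change is meant to close.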

v0.8.0

@divshekhar divshekhar released this 20 Jun 09:22

[0.8.0] — 2026-06-20

The "developers love it" release. 0.7.0 won the agent; 0.8.0 wins the human — the dev who watches the
agent work, points at what's wrong, and trusts the green.

Added

  • Human review marks — "annotate the bug where you see it" (packages/browser, packages/server,
    packages/protocol). A dev-only "Flag a bug" button rides with the presenter: the human toggles
    it, clicks the element that looks wrong, types what's wrong, and Iris drops a numbered pin + emits a
    HUMAN_MARK. The mark carries the element's re-resolvable anchor (the same durable address a
    recorded flow uses) and the source file:line — so the agent fixes the exact element and code,
    not a guess. The agent drains marks with the new iris_review tool: each pending mark comes with
    a ready-to-act fix hint (Open src/Checkout.tsx:42 and fix: <note>. Then iris_review { resolve: m1 }),
    reading never consumes a mark, and resolve retires it once fixed. Off the deterministic benchmark
    path (human-driven) — pnpm bench unchanged.
  • First-run readiness + loop intro — iris_wait_ready (packages/server). Call it right after
    init: it blocks until the app's SDK connects (returns instantly if a session already exists, so zero
    latency on the happy path and on the benchmark), or times out with a recovery hint. Smooths the
    most common first-5-minutes footgun — the agent's first real call racing the WebSocket connect. Its
    ready response also carries a one-line loop guide (look → act → observe → assert → regress, plus
    the human-flag → iris_review loop), so a fresh agent learns how to drive Iris on its first call
    without reading docs. Pure, injected clock/sleep; off the benchmark path.
  • Deterministic visual regression — iris_viewport (packages/server). Pin the driven page to a
    fixed viewport size (clamped to sane bounds) so a screenshot baseline is reproducible across machines
    — the last missing piece of CI-stable visual diffing, alongside the already-shipped iris_visual_diff
    masks (neutralize volatile regions) and a frozen clock (iris_clock). Drive-only, additive; off the
    benchmark path. Provider-driven and tested via a fake page like iris_network_mock.
  • CDP network mock / intercept — iris_network_mock (packages/server). On a driven page
    (iris drive), stub a request deterministically: return a 500, force offline (abort), or delay a
    response — so "verify the app handles a failed payment" is one declared rule, no backend changes. The
    matcher is pure (first rule whose url-substring + optional method matches wins → fulfill/abort/continue)
    and the Playwright page.route wiring is driven in tests with a fake Page/Route. Needs a driven
    browser; returns a recommendation to iris drive otherwise. Off the agent/benchmark path.
  • iris status shows sessions + health at a glance (packages/server). The daemon exposes a
    local GET /status; iris status now reports each connected tab (url, throttled, stale, pending
    human marks) and the session count — not just "running: pid". The plan's "no more pkill in a README"
    daemon DX. Local-only, off the agent/benchmark path.
  • Actionable error recovery (packages/server). Every tool error returned to the agent now carries
    a recovery hint when the failure is recognized — the no-session footgun, multiple/unknown sessions,
    a throttled tab, a missing baseline/recording, the pairing-token config — so the first 5 minutes never
    dead-end on "what do I do now?". Conservative: an unrecognized error gets no invented advice.
  • The panel always reflects the agent's real state — iris_yield (packages/server,
    packages/browser, packages/protocol). A human watching the browser must never see "live" when the
    agent has actually stopped. The agent signals its turn boundary with iris_yield({ mode: "waiting" })
    (done responding, will resume on your next message) or { mode: "ask", note } (blocked, needs your
    answer — the question shows on the panel); the session is revived automatically on the agent's next
    call. Taught as the mandatory last step in the session lease, the loop guide, and the skill — and it's
    agent-independent (Codex / OpenCode / Claude / Hermes). The panel renders each handback distinctly
    via a PRESENTER tone: waiting = calm teal ✋, ask = amber ❓ pulse, agent crashed/disconnected =
    amber ⚠ pulse, a clean end = calm green. When the last agent's MCP connection drops, the daemon ends
    every session and pushes the "switch to your terminal" notice (verified end-to-end through a SIGKILL-ed
    agent). Off the benchmark path.
  • Don't lose a panel prompt in the death-race (packages/server, packages/protocol). If the human
    types a message into the panel at the exact moment the agent stops, it would land in a dead inbox; now
    both the agent-detach and idle paths fold any unread note into the end banner — quoted and labeled
    Undelivered (paste into your terminal): "…" — so the words are surfaced back, not silently dropped.
  • Replay a saved flow from the panel — no agent (packages/browser, packages/server,
    packages/protocol). The daemon pushes the saved-flow names to the HUD on connect; the human clicks
    on a flow and it re-runs with no agent in the loop — the page animates via the normal replay path
    and the ✓ / ⚠ drift / ✗ verdict lands in the same activity log they watch the agent in. The dev plays
    the regression suite directly. Off the benchmark path (a panel-driven control, not a tool).
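
The iris_network_mock matcher is described above as pure: the first rule whose url-substring and optional method match wins, and the outcome is fulfill, abort, or continue. A minimal sketch of such a matcher follows. MockRule, Decision, and decide are hypothetical names, and the case-insensitive method comparison is an assumption.

```typescript
// Hypothetical types: the rule and decision shapes are illustrative, not Iris's schema.
type MockRule = {
  urlIncludes: string;
  method?: string; // optional; when omitted, the rule matches any method
  action: { kind: "fulfill"; status: number } | { kind: "abort" };
};

type Decision = MockRule["action"] | { kind: "continue" };

/** First matching rule wins; with no match the request continues untouched. */
function decide(rules: readonly MockRule[], req: { url: string; method: string }): Decision {
  for (const rule of rules) {
    const urlMatches = req.url.indexOf(rule.urlIncludes) !== -1;
    const methodMatches =
      rule.method === undefined || rule.method.toUpperCase() === req.method.toUpperCase();
    if (urlMatches && methodMatches) return rule.action;
  }
  return { kind: "continue" };
}

// "Verify the app handles a failed payment" as one declared rule.
const rules: MockRule[] = [
  { urlIncludes: "/api/payments", method: "POST", action: { kind: "fulfill", status: 500 } },
  { urlIncludes: "/api/", action: { kind: "abort" } },
];
const decision = decide(rules, { url: "https://shop.test/api/payments", method: "post" });
```

Keeping the decision separate from the route wiring is what lets the rule logic be tested without a browser.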

Changed

  • Internal cohesion split (no behavior change): SessionManager moved to its own
    session-manager.ts, and the on-disk-artifact constants to flow-constants.ts, bringing both
    parent files back under the 500-line cap. All public import paths unchanged (re-exported).

Fixed

  • Panel composer is now multi-line (packages/browser). The HUD message box was a single-line
    <input> that sent on any Enter; it's a <textarea> now: Enter sends, Shift+Enter inserts a
    newline, and it auto-grows to fit.
  • Flag mode keeps the right cursors (packages/browser). In "Flag a bug" mode every element showed
    the crosshair, including the Flag button and its popover — which are clickable; they keep the pointer
    cursor now. And the hover outline that boxes the element under the cursor no longer snaps jumpily: it
    waits for the cursor to rest (~130 ms), then glides into place on an ease and fades in.

v0.6.10

@divshekhar divshekhar released this 18 Jun 14:34
ad2ff44

[0.6.10] — 2026-06-18

Added

  • Deterministic waiting — the settled predicate (packages/server). A new predicate
    { kind: "settled", quietMs } passes once network + structural-DOM activity has been quiet for
    quietMs (default 500ms); ambient dom.text/animation churn (count-ups, spinners) is ignored so
    an animated page can still settle. Usable in iris_wait_for and iris_assert, and composable inside
    allOf with the consequence you expect. Replaces fixed sleeps — the #1 cause of flaky agent tests.
  • iris_act_and_wait auto-settle (packages/server). Omit until and the tool waits for the page
    to settle instead of requiring a predicate — "act, then wait for quiet" is now a single zero-config
    call, the documented alternative to a sleep.
  • iris_query token controls (packages/server) — limit (cap returned descriptors; reports
    total + truncated so a trim is never silent) and count_only (return just the match count).
  • iris_network / iris_console token controls (packages/server) — limit (keep the most
    recent N matches, reporting total + droppedOldest) and a cost:{bytes,tokens} hint, matching the
    other read tools so the agent can self-budget everywhere.
  • iris_domain mustHold per flow (packages/server) — each flow now reports the success
    consequence that must hold for it (signal name / net URL), so an agent can answer "what are the
    critical flows and what must hold for each?" from the domain model alone.
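
The settled predicate can be thought of as a quiet-window check over recent activity that skips ambient churn. The sketch below illustrates the idea only. The ActivityEvent shape, the isSettled helper, and the "network" and "dom.structure" kind names are assumptions, while the 500ms default and the ignored dom.text and animation kinds follow the description above.

```typescript
// Hypothetical event shape: kind is an activity label, at is a millisecond timestamp.
type ActivityEvent = { kind: string; at: number };

// Ambient churn (count-ups, spinners) never resets the quiet window.
const AMBIENT_KINDS = ["dom.text", "animation"];

/** True once no network or structural-DOM activity has occurred for quietMs. */
function isSettled(events: readonly ActivityEvent[], now: number, quietMs = 500): boolean {
  let lastRelevant = -Infinity; // with no relevant activity at all, the page counts as settled
  for (const e of events) {
    if (AMBIENT_KINDS.indexOf(e.kind) !== -1) continue;
    if (e.at > lastRelevant) lastRelevant = e.at;
  }
  return now - lastRelevant >= quietMs;
}

// A spinner keeps ticking after the last request, yet the page still settles.
const events: ActivityEvent[] = [
  { kind: "network", at: 1000 },
  { kind: "dom.structure", at: 1100 },
  { kind: "animation", at: 1550 },
];
const settledAt1600 = isSettled(events, 1600); // 500ms after the last structural change
```

A real implementation would more likely re-arm a timer on each relevant event than rescan a list, but the pass condition is the same.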

Changed

  • Self-healing now verifies the consequence before persisting (packages/server). iris_flow_heal
    with apply:true re-replays the healed flow and re-asserts its success consequence; if a rebound
    locator resolves but the flow no longer satisfies its intent, the write is refused
    (status:consequence_broken, file untouched). It heals the locator, never the intent.

Fixed

  • Browser observers fully restore patched globals on teardown (packages/browser). The network,
    route, and console observers stored a bound copy and assigned it back on teardown, so window.fetch
    / history.pushState / console.* were never restored to their original identity. They now keep the
    true original for restore and a bound copy only for invocation.
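
The distinction this fix relies on is that Function.prototype.bind returns a new function, so assigning the bound copy back on teardown never restores the original identity. A minimal sketch of the corrected pattern follows. patchMethod is a hypothetical helper and the fake target stands in for window or console.

```typescript
type Fn = (...args: any[]) => any;

/**
 * Patch target[key] with a wrapper and return a teardown function.
 * Hypothetical helper: it keeps the true original for restore and a bound copy for calls.
 */
function patchMethod(target: Record<string, any>, key: string, wrap: (invoke: Fn) => Fn): () => void {
  const original: Fn = target[key]; // true original, used only for restore
  const invoke: Fn = original.bind(target); // bound copy, used only to call through
  target[key] = wrap(invoke);
  return () => {
    target[key] = original; // identity is restored, not a bound look-alike
  };
}

// Stand-in for a global such as console: observe calls, then tear down.
const seen: string[] = [];
const fakeConsole = {
  log(msg: string): string {
    return `logged:${msg}`;
  },
};
const originalLog = fakeConsole.log;
const restore = patchMethod(fakeConsole, "log", (invoke) => (msg: string) => {
  seen.push(msg);
  return invoke(msg);
});
```

After restore(), fakeConsole.log === originalLog holds again, which is what other code comparing or re-patching the global depends on.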

v0.5.0

@divshekhar divshekhar released this 15 Jun 22:01

[0.5.0] — 2026-06-15

Added

  • iris mcp — smart proxy with auto-start (packages/server). Run iris mcp --drive <url> and you're
    done: it starts the daemon if one isn't running, waits for it to be ready, then bridges Claude Code's stdin/stdout to the daemon's SSE endpoint. Users no longer manage the daemon manually.
  • iris mcp --drive <url> / iris serve --drive <url> — pass a URL and Iris launches its own
    Playwright browser at that URL, giving the agent full autonomous control without relying on the user's open browser tab.
  • iris mcp --headed: the --headed flag opts in to a visible browser window so you can watch exactly what the agent is doing.
  • Three new update MCP tools (packages/server):
    • iris_version_info — returns the installed version, execution kind (npx / global / local), and
      whether a newer version is available on npm.
    • iris_apply_update — upgrades Iris in place; requires confirm: true to actually run.
    • iris_rollback — downgrades to the previous version; requires confirm: true.
  • Presenter mode (packages/browser, packages/server) — iris.connect({ present: true }) mounts a
    dev-only HUD overlay that the agent can control: iris_narrate shows a caption, iris_highlight
    draws a ring around any element. The HUD is excluded from snapshots and tree-shaken in production.
  • Unified SKILL.md at repo root — a single skill file auto-detects mode: setup wizard on first
    run (no .iris.json), live-app testing on every run after. Covers Claude Code, OpenCode, Codex CLI, Cursor, Windsurf, VS Code, and Zed MCP config formats.
  • .iris.json project config — written after first-run setup; persists port, headed,
    framework, and harnesses so subsequent runs need zero questions.
  • dev:iris script in apps/demo — second Vite dev server on port 4310, isolated from the user's normal dev port.

Fixed

  • All-throttled session auto-selection (packages/server). When every connected tab is hidden
    (e.g. user is in VS Code with Chrome on another desktop), SessionManager.resolve() now picks the session with the freshest heartbeat instead of throwing "multiple sessions connected".
  • Presenter HUD shows on bridge connect — the overlay now mounts as soon as the SDK connects to the bridge, not only after the first iris_narrate call.
  • iris_narrate MCP schema validation — relaxed the output schema so the tool no longer rejects responses from narration calls.
  • iris_inspect / iris_clock output schemas — relaxed to pass through extra fields instead of stripping them, fixing spurious validation errors.
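
The all-throttled fallback above amounts to picking the maximum heartbeat when no tab is active. The sketch below illustrates that rule only. The Session shape and the resolveSession function are hypothetical, and the handling of the non-throttled cases is an assumption rather than a description of SessionManager.resolve().

```typescript
// Hypothetical session shape: lastHeartbeat is a millisecond timestamp.
type Session = { id: string; throttled: boolean; lastHeartbeat: number };

function resolveSession(sessions: readonly Session[]): Session {
  if (sessions.length === 0) throw new Error("no session connected");

  const active = sessions.filter((s) => !s.throttled);
  const onlyActive = active[0];
  if (active.length === 1 && onlyActive !== undefined) return onlyActive;
  // Assumed behavior: several visible tabs remain ambiguous and still raise.
  if (active.length > 1) throw new Error("multiple sessions connected");

  // Every tab is throttled (hidden): fall back to the freshest heartbeat.
  return sessions.reduce((best, s) => (s.lastHeartbeat > best.lastHeartbeat ? s : best));
}

// Both tabs are hidden; the one that reported most recently is chosen.
const chosen = resolveSession([
  { id: "tab-a", throttled: true, lastHeartbeat: 1000 },
  { id: "tab-b", throttled: true, lastHeartbeat: 4000 },
]);
```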