Releases: reticlehq/reticle
v1.2.0
[1.2.0] — 2026-06-27
The multi-agent release. One Chromium now serves many agents at once — a leased browser pool gives each its own isolated context, and project-scoped session identity keeps several apps on one machine from cross-talking. Plus a polish pass: the benchmark suite runs unattended, CI stops going red on dependency advisories it can't control, the daemon-readiness window is tunable, and the docs + README are rewritten to lead with value. Measured: 16 flows across 8 contexts in 5.2s vs 35.4s serial — 6.78× faster.
Added
- BrowserPool: one Chromium, N isolated leased contexts. A fleet of agents shares one browser instead of launching one each. Leases carry a TTL + heartbeat with a reaper for orphans, `iris_lease_acquire` waits for the tab to connect, and `iris_sessions` shows `projectId` + `leased`.
- Project-scoped session identity (on by default). Sessions resolve against a stable build-stamped `projectId` (Next / HTML / `.iris.json`, auto-stamped by the Vite plugin), so concurrent apps never steal each other's session.
- SvelteKit support in `iris init` for projects the Vite plugin can't inject into.
- Real-Chromium + multi-agent CI suites: framework-connect tests (Vite/React, Next App Router, Remix, Astro), the browser-pool path, and single-page crash isolation.
- `IRIS_DAEMON_READY_TIMEOUT_MS`: tune how long the MCP proxy waits for the daemon to become ready (default 10s) for slow machines / CI.
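The lease lifecycle described above (TTL, heartbeat, reaper for orphans, a cap on concurrent leases) can be sketched roughly as follows. This is a minimal illustration, not Iris's implementation: `LeasePool`, its method names, and the injected clock are all hypothetical.

```typescript
// Hypothetical sketch of a leased pool. None of these names come from Iris.
type Lease = { id: string; expiresAt: number };

class LeasePool {
  private leases = new Map<string, Lease>();
  private nextId = 1;
  private cap: number;       // maximum concurrent leases (assumed)
  private ttlMs: number;     // lifetime granted by acquire() and heartbeat()
  private now: () => number; // injected clock, so tests need no timers

  constructor(cap: number, ttlMs: number, now: () => number = Date.now) {
    this.cap = cap;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  /** Grants a lease, or returns null when the pool is at its cap. */
  acquire(): Lease | null {
    this.reap(); // reclaim orphans before deciding the pool is full
    if (this.leases.size >= this.cap) return null;
    const lease: Lease = { id: `lease-${this.nextId++}`, expiresAt: this.now() + this.ttlMs };
    this.leases.set(lease.id, lease);
    return lease;
  }

  /** Extends a live lease; returns false if it expired or was reaped. */
  heartbeat(id: string): boolean {
    const lease = this.leases.get(id);
    if (!lease || lease.expiresAt <= this.now()) return false;
    lease.expiresAt = this.now() + this.ttlMs;
    return true;
  }

  release(id: string): void {
    this.leases.delete(id);
  }

  /** Drops every lease whose TTL lapsed; returns how many were reclaimed. */
  reap(): number {
    let reaped = 0;
    this.leases.forEach((lease, id) => {
      if (lease.expiresAt <= this.now()) {
        this.leases.delete(id);
        reaped++;
      }
    });
    return reaped;
  }

  get size(): number {
    return this.leases.size;
  }
}
```

An agent that stops sending heartbeats simply ages out, so a crashed holder frees its slot without any explicit cleanup call.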
Changed
- Daemon resilience + per-page fault isolation. One bad page can't sink the fleet: page faults are isolated, the pool enforces its cap under burst, aborted acquires clean up, and stale daemon pidfiles are reclaimed (no ghost ports).
- Docs lead with value and read for everyone. README rewritten — value-upfront hero, a "who you are → what you get" table (vibe coder / engineer / QA / founder), and a "How to use it" walkthrough. New multi-agent testing guide; benchmark images + numbers refreshed; benchmark passes renamed to plain names (observation-cost / agent-loop / replay).
- The benchmark self-boots. `pnpm bench` now starts and tears down its own fixtures (demo + api) with env-tunable readiness (`BENCH_*`), so the suite runs unattended.
- CI hardened against flaky reds. The security-audit step is non-blocking (a new transitive advisory no longer fails an unrelated PR), the e2e job retries with cleanup, and pre-commit matches CI step order.
Fixed
- `@syrin/iris/next` `withIris` no longer crashes the host build (a bundled `__require.resolve`).
- `iris init` detects the monorepo package manager and gives correct guidance for non-Vite/Next apps (CRA / webpack).
- Clearer edge errors: an unopenable leased URL says why; the browser warns when the bridge is unreachable on first connect.
- Skill & docs corrections for the public integration path (MCP registration, `iris init` flow, stale-`npx` cache as the main `-32000` cause).
Removed
- Unused public exports: `ObserverType` / `UpdateStatus` (`@syrin/iris-protocol`), `buildClock` (`@syrin/iris-test`), and the test-only `IRIS_VITE_PLUGIN_NAME` re-export from `@syrin/iris/vite`. No real consumers.
v1.0.0
[1.0.0] — 2026-06-22
The 1.0 release. Iris is stable, documented, and benchmarked end to end: every package is versioned
1.0.0 under the open-core license split, and the same verify loop that wins on a toy app stays the
cheapest way to observe a real production dashboard.
The headline is the "lean responses" pass — same observations, fewer tokens. On the cross-tool
detection benchmark Iris's average observation cost drops 959 → 815 tokens with detection unchanged at
1.0 and zero false positives, lifting Verification Efficiency past the best external tool (12.27 vs
10.55) while remaining the only tool that catches every regression. Re-verifying a saved suite costs
47 tokens with no model and 0% flake, up to 2,574× cheaper than re-driving it with an LLM.
Added
- Honest, reproducible benchmarks with a small-app vs real-app story. A committed benchmark image
  set (re-run efficiency, the two-apps small-vs-real comparison, the per-tool cost on the real Syrin
  dashboard, and a capability matrix) rendered from a public source pipeline (`assets/benchmarks` +
  a shared design system), with the methodology written up in `docs/benchmarks.md`.
  On a real production dashboard Iris observes a page for 1,023 tokens vs Chrome DevTools MCP's 1,357
  and Playwright MCP's 2,193, and is the only tool that asserts success from the app's own signal.
- Documentation set: an architecture overview, the benchmarks explainer,
  an expanded getting-started, and a Mintlify configuration so the docs
  publish as a site.
- Open-source project hygiene: `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, issue and pull-request
  templates, plus contributor / stargazer / forker recognition in the README.
Changed
- `iris_act` collapses a clean action to its consequence: the effect block now omits fields at
  their uninformative default (an absent `dispatched` / `targetMatched` / `visible` / `enabled` means `true`;
  an absent `focusMoved` / `occludedBy` means `null`; an absent `occluded` / `scrolledIntoView` /
  `valueChanged` / `defaultPrevented` means `false`), so a successful click returns just `domMutatedWithin`
  and any real signal still surfaces. No information is lost: absence always means the boring value.
- MCP tool results serialize as compact JSON by default: the agent-facing `text` content drops the
  two-space indentation (the typed `structuredContent` is unchanged), ~40% cheaper on the structured
  payloads that dominate. Set `IRIS_ENCODING=pretty` for the previous indented form; `IRIS_ENCODING=toon`
  remains the densest tabular encoding.
- `iris_act_and_wait` returns a reaction digest, not the full timeline: `trace` is now
  `{ window_ms, summary }` (the counts that answer "what did the app do?") plus a `since` cursor; the full
  per-event timeline is one `iris_observe { since }` away when the counts aren't enough. On a large DOM the
  dropped events array was the bulk of the loop cost: a verify loop on a 5,000-row grid falls from ~531 to
  ~279 tokens with the consequence still asserted from the `row:approved` signal.
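The "absence means the default" rule for the `iris_act` effect block can be illustrated with a small round-trip. The field names and default values are taken from the note above; the value types, the helper names `compactEffect` / `expandEffect`, and the numeric `domMutatedWithin` are assumptions made for this sketch only.

```typescript
// Defaults as listed in the release note. Value types are assumptions.
const EFFECT_DEFAULTS = {
  dispatched: true, targetMatched: true, visible: true, enabled: true,
  focusMoved: null, occludedBy: null,
  occluded: false, scrolledIntoView: false, valueChanged: false, defaultPrevented: false,
} as const;

type Effect = Record<string, unknown>;

/** Drops every field that equals its documented default (hypothetical helper). */
function compactEffect(full: Effect): Effect {
  const defaults = EFFECT_DEFAULTS as Effect;
  const out: Effect = {};
  for (const key of Object.keys(full)) {
    const isDefault = key in defaults && defaults[key] === full[key];
    if (!isDefault) out[key] = full[key];
  }
  return out;
}

/** Restores omitted fields, so a reader can treat absence as the default. */
function expandEffect(sparse: Effect): Effect {
  return { ...EFFECT_DEFAULTS, ...sparse };
}

// A clean click: every flag sits at its default, so only the mutation survives.
const cleanClick = compactEffect({ ...EFFECT_DEFAULTS, domMutatedWithin: 12 });

// Compact (no indentation) vs pretty (two-space) serialization of the same value.
const compactText = JSON.stringify(cleanClick);
const prettyText = JSON.stringify(cleanClick, null, 2);
```

Because every omitted field has exactly one possible meaning, `expandEffect(compactEffect(x))` reproduces `x` for any effect built from these fields, which is the sense in which no information is lost.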
Fixed
- Multiple apps on one machine no longer collide or orphan the daemon. Several Next.js / React apps
  (or browser tabs) can run at once: the `@syrin/iris-next` integration now defaults to a unique per-tab
  session id (`SESSION_AUTO`) instead of a shared constant, so two Next apps never silently evict each
  other. A bridge/daemon port collision now fails fast with a clear error instead of hanging forever
  and leaving an orphaned process; the `listen()` calls finally handle `EADDRINUSE`.
- License files now carry a real copyright. Filled the Apache-2.0 appendix in every SDK package
  license so no `[yyyy]` / `[name of copyright owner]` placeholders remain.
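A fail-fast `listen()` along the lines of the port-collision fix might look like the sketch below. It is written against a small structural interface rather than a specific server class (Node's `net.Server` and `http.Server` should both fit it); `listenOrFail`, `listenError`, and the `127.0.0.1` default are hypothetical names and values chosen for illustration.

```typescript
// Minimal structural view of a server, so the sketch needs no imports.
interface Listenable {
  once(event: string, listener: (...args: any[]) => void): unknown;
  listen(port: number, host: string): unknown;
}

/** Turns a raw listen error into a message that says what to do next. */
function listenError(err: { code?: string; message?: string }, host: string, port: number): Error {
  if (err.code === "EADDRINUSE") {
    return new Error(`Port ${port} on ${host} is already in use; stop the other process or pick another port.`);
  }
  return new Error(`Could not listen on ${host}:${port}: ${err.message ?? "unknown error"}`);
}

/** Resolves once listening; rejects right away on a bind error instead of hanging. */
function listenOrFail(server: Listenable, port: number, host = "127.0.0.1"): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    // Register the error handler BEFORE listen(), or the failure is never observed.
    server.once("error", (err: { code?: string; message?: string }) => {
      reject(listenError(err, host, port));
    });
    server.once("listening", () => resolve());
    server.listen(port, host);
  });
}
```

A caller would typically print the rejection message and exit with a non-zero status, so no half-started process is left behind.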
Security
- Daemon mode now enforces the documented auth contract. `iris serve` / the MCP daemon previously
  built its bridge without forwarding the pairing `token`, bind `host`, or origin allow-list, so
  `IRIS_TOKEN` / `IRIS_HOST` / `IRIS_ALLOWED_ORIGINS` were silently ignored in daemon mode. They are now
  honored identically to the in-process path. (Residual risk was bounded, since the daemon is
  loopback-pinned, but the advertised control is now actually enforced.)
- Every security-critical environment variable is a single named constant (`IrisEnv` in
  `@syrin/iris-protocol`). A typo in an inline `'IRIS_TOKEN'` string could previously have disabled auth
  silently; the names now live in exactly one place.
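The value of keeping environment-variable names in one constant is that a misspelling becomes a compile error rather than a silently missing setting. The sketch below assumes a shape for `IrisEnv`; only the three variable names come from the notes above, while the object keys, `readAuth`, the loopback default, and the comma-separated origin format are illustrative guesses.

```typescript
// Variable names are from the release notes; the object shape is an assumption.
const IrisEnv = {
  TOKEN: "IRIS_TOKEN",
  HOST: "IRIS_HOST",
  ALLOWED_ORIGINS: "IRIS_ALLOWED_ORIGINS",
} as const;

type EnvName = (typeof IrisEnv)[keyof typeof IrisEnv];
type Env = Record<string, string | undefined>;

interface BridgeAuth {
  token: string | undefined;
  host: string;
  allowedOrigins: string[];
}

/** Reads auth settings through the named constants only (hypothetical helper). */
function readAuth(env: Env): BridgeAuth {
  // `name` is typed as EnvName, so get("IRIS_TOKN") would not compile.
  const get = (name: EnvName): string | undefined => env[name];
  return {
    token: get(IrisEnv.TOKEN),
    host: get(IrisEnv.HOST) ?? "127.0.0.1", // assumed loopback default
    allowedOrigins: (get(IrisEnv.ALLOWED_ORIGINS) ?? "")
      .split(",") // assumed comma-separated list
      .map((origin) => origin.trim())
      .filter((origin) => origin.length > 0),
  };
}
```

Both the in-process path and the daemon path can then call the same reader, which is one way to keep the two from drifting apart again.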
v0.8.0
[0.8.0] — 2026-06-20
The "developers love it" release. 0.7.0 won the agent; 0.8.0 wins the human — the dev who watches the
agent work, points at what's wrong, and trusts the green.
Added
- Human review marks: "annotate the bug where you see it" (`packages/browser`, `packages/server`,
  `packages/protocol`). A dev-only "Flag a bug" button rides with the presenter: the human toggles
  it, clicks the element that looks wrong, types what's wrong, and Iris drops a numbered pin + emits a
  `HUMAN_MARK`. The mark carries the element's re-resolvable anchor (the same durable address a
  recorded flow uses) and the source `file:line`, so the agent fixes the exact element and code,
  not a guess. The agent drains marks with the new `iris_review` tool: each pending mark comes with
  a ready-to-act `fix` hint (`Open src/Checkout.tsx:42 and fix: <note>. Then iris_review { resolve: m1 }`),
  reading never consumes a mark, and `resolve` retires it once fixed. Off the deterministic benchmark
  path (human-driven); `pnpm bench` unchanged.
- First-run readiness + loop intro: `iris_wait_ready` (`packages/server`). Call it right after
  init: it blocks until the app's SDK connects (returns instantly if a session already exists, so zero
  latency on the happy path and on the benchmark), or times out with a `recovery` hint. Smooths the
  most common first-5-minutes footgun: the agent's first real call racing the WebSocket connect. Its
  ready response also carries a one-line `loop` guide (look → act → observe → assert → regress, plus
  the human-flag → `iris_review` loop), so a fresh agent learns how to drive Iris on its first call
  without reading docs. Pure, injected clock/sleep; off the benchmark path.
- Deterministic visual regression: `iris_viewport` (`packages/server`). Pin the driven page to a
  fixed viewport size (clamped to sane bounds) so a screenshot baseline is reproducible across machines.
  This is the last missing piece of CI-stable visual diffing, alongside the already-shipped
  `iris_visual_diff` `masks` (neutralize volatile regions) and a frozen clock (`iris_clock`). Drive-only,
  additive; off the benchmark path. Provider-driven and tested via a fake page like `iris_network_mock`.
- CDP network mock / intercept: `iris_network_mock` (`packages/server`). On a driven page
  (`iris drive`), stub a request deterministically: return a `500`, force offline (abort), or delay a
  response, so "verify the app handles a failed payment" is one declared rule, no backend changes. The
  matcher is pure (first rule whose url-substring + optional method matches wins → fulfill/abort/continue)
  and the Playwright `page.route` wiring is driven in tests with a fake Page/Route. Needs a driven
  browser; returns a `recommendation` to `iris drive` otherwise. Off the agent/benchmark path.
- `iris status` shows sessions + health at a glance (`packages/server`). The daemon exposes a
  local `GET /status`; `iris status` now reports each connected tab (url, throttled, stale, pending
  human marks) and the session count, not just "running: pid". The plan's "no more pkill in a README"
  daemon DX. Local-only, off the agent/benchmark path.
- Actionable error recovery (`packages/server`). Every tool error returned to the agent now carries
  a `recovery` hint when the failure is recognized (the no-session footgun, multiple/unknown sessions,
  a throttled tab, a missing baseline/recording, the pairing-token config), so the first 5 minutes never
  dead-end on "what do I do now?". Conservative: an unrecognized error gets no invented advice.
- The panel always reflects the agent's real state: `iris_yield` (`packages/server`,
  `packages/browser`, `packages/protocol`). A human watching the browser must never see "live" when the
  agent has actually stopped. The agent signals its turn boundary with `iris_yield({ mode: "waiting" })`
  (done responding, will resume on your next message) or `{ mode: "ask", note }` (blocked, needs your
  answer; the question shows on the panel); the session is revived automatically on the agent's next
  call. Taught as the mandatory last step in the session lease, the loop guide, and the skill, and it's
  agent-independent (Codex / OpenCode / Claude / Hermes). The panel renders each handback distinctly
  via a `PRESENTER` `tone`: waiting = calm teal ✋, ask = amber ❓ pulse, agent crashed/disconnected =
  amber ⚠ pulse, a clean end = calm green. When the last agent's MCP connection drops, the daemon ends
  every session and pushes the "switch to your terminal" notice (verified end-to-end through a SIGKILL-ed
  agent). Off the benchmark path.
- Don't lose a panel prompt in the death-race (`packages/server`, `packages/protocol`). If the human
  types a message into the panel at the exact moment the agent stops, it would land in a dead inbox; now
  both the agent-detach and idle paths fold any unread note into the end banner, quoted and labeled
  `Undelivered (paste into your terminal): "…"`, so the words are surfaced back, not silently dropped.
- Replay a saved flow from the panel, no agent (`packages/browser`, `packages/server`,
  `packages/protocol`). The daemon pushes the saved-flow names to the HUD on connect; the human clicks
  ▶ on a flow and it re-runs with no agent in the loop. The page animates via the normal replay path
  and the ✓ / ⚠ drift / ✗ verdict lands in the same activity log they watch the agent in. The dev plays
  the regression suite directly. Off the benchmark path (a panel-driven control, not a tool).
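The pure matcher described for `iris_network_mock` (first rule whose URL substring and optional method match wins, otherwise the request continues) can be sketched as below. The rule and decision shapes, including `urlIncludes` and `delayMs`, are hypothetical and are not the tool's real schema.

```typescript
// Hypothetical rule shape. Field names are illustrative, not the tool's schema.
type MockAction =
  | { kind: "fulfill"; status: number; body?: string; delayMs?: number }
  | { kind: "abort" };

interface MockRule {
  urlIncludes: string; // substring that must appear in the request URL
  method?: string;     // optional; when absent, any method matches
  action: MockAction;
}

type Decision = MockAction | { kind: "continue" };

/** First matching rule wins; no match means the request goes through untouched. */
function decide(rules: MockRule[], req: { url: string; method: string }): Decision {
  for (const rule of rules) {
    const urlOk = req.url.indexOf(rule.urlIncludes) !== -1;
    const methodOk =
      rule.method === undefined || rule.method.toUpperCase() === req.method.toUpperCase();
    if (urlOk && methodOk) return rule.action;
  }
  return { kind: "continue" };
}

// "Verify the app handles a failed payment" as one declared rule,
// followed by a broader rule that takes every other API call offline.
const rules: MockRule[] = [
  { urlIncludes: "/api/pay", method: "POST", action: { kind: "fulfill", status: 500 } },
  { urlIncludes: "/api/", action: { kind: "abort" } },
];
```

Keeping the decision free of any browser object is what lets it be unit-tested directly; the wiring layer only has to translate a `Decision` into the corresponding route call.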
Changed
- Internal cohesion split (no behavior change): `SessionManager` moved to its own
  `session-manager.ts`, and the on-disk-artifact constants to `flow-constants.ts`, bringing both
  parent files back under the 500-line cap. All public import paths unchanged (re-exported).
Fixed
- Panel composer is now multi-line (`packages/browser`). The HUD message box was a single-line
  `<input>` that sent on any Enter; it's a `<textarea>` now: Enter sends, Shift+Enter inserts a
  newline, and it auto-grows to fit.
- Flag mode keeps the right cursors (`packages/browser`). In "Flag a bug" mode every element showed
  the crosshair, including the Flag button and its popover, which are clickable; they keep the pointer
  cursor now. And the hover outline that boxes the element under the cursor no longer snaps jumpily: it
  waits for the cursor to rest (~130 ms), then glides into place on an ease and fades in.
v0.6.10
[0.6.10] — 2026-06-18
Added
- Deterministic waiting: the `settled` predicate (`packages/server`). A new predicate
  `{ kind: "settled", quietMs }` passes once network + structural-DOM activity has been quiet for
  `quietMs` (default 500ms); ambient `dom.text` / animation churn (count-ups, spinners) is ignored so
  an animated page can still settle. Usable in `iris_wait_for` and `iris_assert`, and composable inside
  `allOf` with the consequence you expect. Replaces fixed sleeps, the #1 cause of flaky agent tests.
- `iris_act_and_wait` auto-settle (`packages/server`). Omit `until` and the tool waits for the page
  to settle instead of requiring a predicate, so "act, then wait for quiet" is now a single zero-config
  call, the documented alternative to a sleep.
- `iris_query` token controls (`packages/server`): `limit` (cap returned descriptors; reports
  `total` + `truncated` so a trim is never silent) and `count_only` (return just the match count).
- `iris_network` / `iris_console` token controls (`packages/server`): `limit` (keep the most
  recent N matches, reporting `total` + `droppedOldest`) and a `cost:{bytes,tokens}` hint, matching the
  other read tools so the agent can self-budget everywhere.
- `iris_domain` `mustHold` per flow (`packages/server`): each flow now reports the success
  consequence that must hold for it (signal name / net URL), so an agent can answer "what are the
  critical flows and what must hold for each?" from the domain model alone.
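The quiet-window idea behind the `settled` predicate can be shown with a small pure function: only network and structural-DOM activity resets the window, while ambient text and animation churn is ignored. The event shape and the kind labels other than `dom.text` are assumptions for this sketch; only the 500 ms default comes from the note above.

```typescript
// Kind labels other than "dom.text" are assumed names for this sketch.
type ActivityKind = "network" | "dom.structure" | "dom.text" | "animation";

interface Activity {
  kind: ActivityKind;
  at: number; // timestamp in ms
}

// Ambient churn (count-ups, spinners) must not keep the page "busy".
const AMBIENT: ActivityKind[] = ["dom.text", "animation"];

/** True once no non-ambient activity has happened for at least quietMs. */
function isSettled(events: Activity[], now: number, quietMs = 500): boolean {
  let lastRelevant = -Infinity;
  for (const event of events) {
    const ambient = AMBIENT.indexOf(event.kind) !== -1;
    if (!ambient && event.at > lastRelevant) lastRelevant = event.at;
  }
  return now - lastRelevant >= quietMs;
}

// A page with a spinner: structural work stops at 300 ms, the animation keeps going.
const timeline: Activity[] = [
  { kind: "network", at: 100 },
  { kind: "dom.structure", at: 300 },
  { kind: "dom.text", at: 700 },
  { kind: "animation", at: 750 },
];
```

With this timeline the page counts as settled from 800 ms onward even though the spinner never stops, which is the behavior a fixed sleep cannot express.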
Changed
- Self-healing now verifies the consequence before persisting (`packages/server`). `iris_flow_heal`
  with `apply:true` re-replays the healed flow and re-asserts its success consequence; if a rebound
  locator resolves but the flow no longer satisfies its intent, the write is refused
  (`status:consequence_broken`, file untouched). It heals the locator, never the intent.
Fixed
- Browser observers fully restore patched globals on teardown (`packages/browser`). The network,
  route, and console observers stored a bound copy and assigned it back on teardown, so `window.fetch`
  / `history.pushState` / `console.*` were never restored to their original identity. They now keep the
  true original for restore and a bound copy only for invocation.
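The identity problem in that fix comes from `Function.prototype.bind` always returning a new function, so writing a bound copy back never restores the original reference. A minimal sketch of the corrected pattern follows; `patchMethod` is a hypothetical helper, not the observers' actual code.

```typescript
type AnyFn = (...args: any[]) => any;

/**
 * Wraps target[key] and returns a teardown that puts back the exact original
 * function object (hypothetical helper for illustration).
 */
function patchMethod(
  target: Record<string, any>,
  key: string,
  onCall: (args: unknown[]) => void,
): () => void {
  const original: AnyFn = target[key];         // true original: kept only for restore
  const invoke: AnyFn = original.bind(target); // bound copy: used only to call through
  target[key] = (...args: unknown[]) => {
    onCall(args);
    return invoke(...args);
  };
  return () => {
    target[key] = original; // NOT `invoke`: a bound copy has a different identity
  };
}

// Stand-in for a global such as window.fetch or console.log.
const host = {
  greet(name: string): string {
    return `hi ${name}`;
  },
};
const originalGreet = host.greet;
const calls: unknown[][] = [];
const restore = patchMethod(host, "greet", (args) => { calls.push(args); });
```

After `restore()` runs, `host.greet === originalGreet` holds again, which matters for any other code that compares or re-patches the same global.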
v0.5.0
[0.5.0] — 2026-06-15
Added
- `iris mcp`: smart proxy with auto-start (`packages/server`). Run `iris mcp --drive <url>` and you're
  done: it starts the daemon if one isn't running, waits for it to be ready, then bridges Claude Code's stdin/stdout to the daemon's SSE endpoint. Users no longer manage the daemon manually.
- `iris mcp --drive <url>` / `iris serve --drive <url>`: pass a URL and Iris launches its own
  Playwright browser at that URL, giving the agent full autonomous control without relying on the user's open browser tab.
- `iris mcp --headed` / `--headed` flag: opt in to a visible browser window so you can watch exactly what the agent is doing.
- Three new update MCP tools (`packages/server`):
  - `iris_version_info`: returns the installed version, execution kind (npx / global / local), and
    whether a newer version is available on npm.
  - `iris_apply_update`: upgrades Iris in place; requires `confirm: true` to actually run.
  - `iris_rollback`: downgrades to the previous version; requires `confirm: true`.
- Presenter mode (`packages/browser`, `packages/server`): `iris.connect({ present: true })` mounts a
  dev-only HUD overlay that the agent can control: `iris_narrate` shows a caption, `iris_highlight`
  draws a ring around any element. The HUD is excluded from snapshots and tree-shaken in production.
- Unified `SKILL.md` at repo root: a single skill file auto-detects mode: setup wizard on first
  run (no `.iris.json`), live-app testing on every run after. Covers Claude Code, OpenCode, Codex CLI, Cursor, Windsurf, VS Code, and Zed MCP config formats.
- `.iris.json` project config: written after first-run setup; persists `port`, `headed`,
  `framework`, and `harnesses` so subsequent runs need zero questions.
- `dev:iris` script in `apps/demo`: second Vite dev server on port 4310, isolated from the user's normal dev port.
Fixed
- All-throttled session auto-selection (`packages/server`). When every connected tab is hidden
  (e.g. user is in VS Code with Chrome on another desktop), `SessionManager.resolve()` now picks the session with the freshest heartbeat instead of throwing `"multiple sessions connected"`.
- Presenter HUD shows on bridge connect: the overlay now mounts as soon as the SDK connects to the bridge, not only after the first `iris_narrate` call.
- `iris_narrate` MCP schema validation: relaxed the output schema so the tool no longer rejects responses from narration calls.
- `iris_inspect` / `iris_clock` output schemas: relaxed to pass through extra fields instead of stripping them, fixing spurious validation errors.