Skip to content

jasonkneen/agent-simulator

Repository files navigation

Agent Simulator

Live iOS Simulator preview with React Native inspector, accessibility-tree driven tools, and an MCP bridge. Drive iOS apps from your browser or an AI agent — see the live screen, click any element to get its real source code, tap / swipe / type without moving the macOS cursor.

Built for three audiences at once:

  1. RN / Expo devs who want a Figma-style "layers panel" for their running app, with real App.tsx:42 source lookups.
  2. QA & demo flows that want a scriptable sim without the pain of XCUITest — accessibility-label taps, swipes, keyboard input, screenshots.
  3. AI agents (Claude Desktop, OpenAI Agents SDK, Cursor, …) that need a first-class tool surface to see what's on the screen and act on it. Everything the browser can do is exposed over MCP.
┌──────────────────────────┐    ┌─────────────────────────────┐
│ Browser UI               │    │ MCP client                  │
│ /   (agent-simulator)    │    │ (Claude Desktop, Codex, …)  │
│  - live MJPEG stream     │    └────────────────┬────────────┘
│  - iOS a11y layer tree   │                     │ stdio
│  - React component tree  │                     ↓
│  - props + source window │    ┌─────────────────────────────┐
│  - tap/swipe/drive/type  │    │ mcp/server.mjs              │
└─────────────┬────────────┘    │ 15 MCP tools over WS        │
              │ WS + HTTP       └────────────────┬────────────┘
              ↓                                  │ WS
┌──────────────────────────────────────────────────┐
│ server.js (Node)                                 │
│ static · /api/config · /api/tree · /api/source   │
│ WS bridge · Metro /symbolicate proxy             │
└──┬────────┬──────────────────────────────────────┘
   │ stdin  │ WS (opt.)
   ↓        ↓
┌──────────┐ ┌─────────────────────┐
│sim-server│ │ Expo / RN app with  │
│ (Rust)   │ │ agent-simulator     │
│ MJPEG +  │ │ metro plugin        │
│ axe HID  │ │ (inspector bridge)  │
└─────┬────┘ └─────────────────────┘
      │
  iOS Simulator (axe + simctl)

Features

Cursor-free input Every tap / swipe / key goes through axeFBSimulatorHIDCoreSimulator. No macOS cursor movement, no Simulator.app focus requirement, pixel-exact at any zoom.
Drive any iOS app Taps, drags, mouse-wheel scroll, multitask, home, lock, keyboard passthrough. Works on Settings, Maps, Messages — not just RN apps.
iOS accessibility tree Populated from axe describe-ui on boot. Every on-screen element with its label, role, and frame — before you click anything.
React component tree For RN apps running the metro plugin, inspect any rendered component and get the full fiber stack with real source paths (React 19 _debugStack + Metro /symbolicate).
Real source code panel Select any component and the Properties panel fetches the actual file and renders a code window around the target line. Works for your code, RN's Libraries, and Expo shims.
MCP bridge 15 tools: sim_tree, sim_tap_by_label, sim_swipe, sim_type, sim_inspect, sim_source, sim_screenshot, and more. Every browser capability exposed to AI agents.

Requirements

  • macOS with Xcode + at least one iOS simulator installed.

  • Node ≥ 18 (or Bun — scripts are bun-friendly).

  • Rust toolchain (for building sim-server once).

  • axe — the cursor-free input driver. One-line install:

    brew install cameroncooke/axe/axe
  • A React Native / Expo app with the agent-simulator metro plugin, only if you want React component inspection. Driving, screenshots, and the iOS accessibility tree all work against any iOS app.


Quick start (1 minute)

# 1. Clone & install
git clone https://github.com/jkneen/agent-simulator
cd agent-simulator
bun install

# 2. Build the Rust sim-server (once)
(cd sim-server && cargo build --release)

# 3. Build the web UI (once, or after UI changes)
(cd web-app && bun install && bun run build)

# 4. Boot any simulator yourself, then start the server
xcrun simctl boot "iPhone 17 Pro" || true
open -a Simulator
bun start

# 5. Open the UI
open http://localhost:3200

The UI will attach to whatever iOS simulator is booted. Open Settings, Maps, Messages, or any app — you can drive it immediately. The iOS layer tree on the left is populated from the accessibility API; the React tree is empty until you load an RN app with the inspector bridge (see below).


One-command demo (the fun one)

We ship an Expo demo app and a launcher that starts everything for you.

bun demo

This will:

  1. Boot the first available iPhone simulator (or reuse the booted one).
  2. Start Metro for examples/expo-demo on port 8081.
  3. Launch the demo via exp://127.0.0.1:8081 in Expo Go.
  4. Start the agent-simulator Node server on :3200.
  5. Open http://localhost:3200 in your browser.

Ctrl-C cleans up both processes. Once it's running you can:

  • Click anywhere in the preview to drive the app (the tap animates with a pulse ring, and the Layers panel expands to the nearest accessibility element).
  • Hit I to enter Inspect mode, hover / click a React component, and watch the Properties panel fill with the component name, source file, and actual JSX code.
  • Tap the + on the counter — the Code panel opens examples/expo-demo/App.tsx:42 with the <Text style={styles.pillText}>+</Text> line highlighted.

Flags:

bun demo --no-open             # skip launching the browser
bun demo --port=3300           # use a different agent-simulator port
bun demo --metro-port=8082     # use a different Metro port
bun demo --device=<UDID>       # target a specific simulator

Adding agent-simulator to your own RN / Expo app

If you already have an app, add the metro plugin so the inspector bridge injects at boot:

// metro.config.js
const { getDefaultConfig } = require("expo/metro-config");
const { withAgentSimulator } = require("agent-simulator/runtime/metro-plugin");

module.exports = withAgentSimulator(getDefaultConfig(__dirname));

…then install this package alongside:

bun add -D agent-simulator

Or, if you don't want to touch Metro config, import the bridge at your app's entry point before registerRootComponent:

// index.ts
import "agent-simulator/runtime/inspector-bridge";
import { registerRootComponent } from "expo";
import App from "./App";
registerRootComponent(App);

When your app launches with either setup, the BRIDGE pill in the toolbar turns green and inspect clicks produce full component stacks with real source locations.


The UI

Region What it shows
Header Device chip, stream + bridge status pills, Inspect toggle, Home / Multitask / Lock / Rotate, panel toggles, theme menu.
Layers panel (left) Toggles between the iOS accessibility tree (populated automatically) and the React component tree (populated on inspect). Click any layer to select, hover to highlight, the tree auto-expands ancestors when you tap inside the sim.
Simulator canvas Live MJPEG. In drive mode, the native cursor is hidden and replaced by a round in-preview pointer; taps fire a ping ring, drags become swipes, the mouse wheel scrolls, the keyboard types into the focused field. Inspect mode replaces the pointer with a crosshair and draws selection overlays.
Properties panel (right) Selected layer's accessibility properties, React component name + source, and a highlighted code window around the JSX call site or component definition.
Status bar UDID, mode flags, last-inspect frame count, keyboard shortcut hints.

Keyboard shortcuts

  • I — toggle Inspect (disabled when no RN bridge)
  • Esc — cancel Inspect / unpin selection
  • Any other keystroke while the preview is focused → forwarded to the sim (works for any text field)

MCP bridge

mcp/server.mjs is a standalone Model Context Protocol server. It speaks stdio to any MCP host and bridges to the running agent-simulator server over WebSocket. All 15 tools:

Tool Purpose
sim_info Device UDID / name / stream URL / device-point size.
sim_tree Full iOS accessibility tree. flat: true returns a labelled-element list for cheap LLM context.
sim_tap Tap at (x, y) sim-ratio.
sim_tap_by_label Look up an AX element by label / value (optional type filter) and tap its centre — robust to layout changes.
sim_swipe Single-call native swipe with durationMs.
sim_type Type printable text into the focused TextField.
sim_key Press one USB-HID keycode (Return=40, Backspace=42, Esc=41, arrows 80–83…).
sim_button home / lock / power / side-button / siri / apple-pay.
sim_multitask Open the app switcher.
sim_screenshot Returns the current frame as an MCP image content.
sim_inspect Inspect the RN component at (x, y) — returns a source-symbolicated component stack.
sim_source Read a code window around file:line for any absolute path in the workspace.
sim_select_by_source Given fileName + line, probes a grid until it finds a rendered component whose source matches.
sim_subscribe_selections Streams inspectResult events back as MCP logging notifications.
sim_unsubscribe_selections Stops the above.

Configure an MCP client

For Claude Desktop, add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "agent-simulator": {
      "command": "node",
      "args": ["/absolute/path/to/agent-simulator/mcp/server.mjs"],
      "env": { "SIM_PREVIEW_URL": "http://localhost:3200" }
    }
  }
}

Then start the agent-simulator server (bun start or bun demo). The MCP server connects on demand.

For any other MCP host (OpenAI Agents SDK, Cursor, Continue, Codex, custom harnesses): spawn node /path/to/mcp/server.mjs with SIM_PREVIEW_URL pointing at your running server.

Example agent flow

1. sim_tree flat:true                       → see every element on screen
2. sim_tap_by_label "+"                     → tap the counter's plus
3. sim_tap_by_label "Your name" type:"TextField"
4. sim_type "hello agent"
5. sim_key 40                               → Return
6. sim_inspect x:0.86 y:0.24                → returns App.tsx:42
7. sim_source file:".../App.tsx" line:42    → read the JSX

Architecture

Three binaries, one simulator:

  1. sim-server (Rust) — captures the simulator screen via simctl io screenshot, encodes MJPEG, serves /stream.mjpeg, /snapshot.jpg, /api/tree, and /health. Reads tap / swipe / type commands from stdin and shells out to axe.
  2. server.js (Node) — orchestrates sim-server, serves the Vite-built UI, exposes /api/config / /api/tree / /api/source, bridges browser and MCP WebSocket clients, proxies stack frames through Metro's /symbolicate for React inspection.
  3. web-app (React + shadcn + Tailwind) — the UI. Dark / light themes, two trees, resizable panels, keyboard capture, drag-to-swipe, wheel-to-scroll.

Touch injection

Clicks from the UI (or MCP) travel:

browser ── WS ──> server.js ── stdin ──> sim-server ── exec ──> axe tap -x … -y … --udid …
                                                                       │
                                                                       └──> FBSimulatorHID → CoreSimulator
                                                                           (device-point coords, cursor-free)

Coordinates are carried as [0, 1] ratios end-to-end; sim-server converts to device points using the root AXFrame from axe describe-ui (cached with a 5 s TTL, invalidated on rotate). Every gesture is one subprocess call — no per-step stdin traffic.

/api/tree and /api/source

  • GET /api/tree — proxies axe describe-ui --udid <udid> via sim-server and returns the raw JSON. The UI pulls this on boot and on refresh; the MCP sim_tree tool wraps it.
  • GET /api/source?file=ABS&line=N&context=M — reads a window of lines around line from an absolute path, returns {file, lines, startLine0, endLine0, targetLine0, language}. Powers the Code panel in the UI and the sim_source MCP tool.

React source resolution

React 19 removed fiber._debugSource. Sources now live inside fiber._debugStack, an Error object whose stack string points at bundle-URL line/col pairs. On every inspectResult:

  1. The inspector-bridge (in the RN app) parses _debugStack, collects up to 12 candidate frames per fiber, attaches them as bundleFrames: […].
  2. server.js batch-POSTs every bundle frame to http://localhost:8081/symbolicate in one round trip and picks the first candidate per fiber that isn't inside React's own createElement / jsxDEV plumbing.
  3. The UI receives a regular stack[i].source with a real absolute file path.

This means every user component, RN internal component (RCTView, ScrollView, …), and Expo shim resolves to actual code on disk.


Stream quality and the FPS ceiling

Click the Quality dropdown in the toolbar. Five presets plus fine-tune sliders for FPS, JPEG quality, scale, and capture mode.

Preset Target FPS Resolution Mode Notes
Eco 2 300×650 MJPEG battery-friendly, background monitoring
Balanced 3 400×870 MJPEG default — matches the CSS size the browser paints
Smooth up to 10 400×870 MJPEG good for scrolling
Max up to 30 400×870 MJPEG upper limit of the MJPEG path
Fluid up to 60 400×870 BGRA + keepalive experimental, see caveats below

Changing a preset fires a WebSocket setCapture, which makes the server respawn sim-server with the new arguments and broadcast a configChanged event. The browser <img> re-subscribes to the new stream URL automatically. Total switch time is about half a second.

Click accuracy is decoupled from resolution

Every tap / swipe carries [0, 1] ratios. They get converted to device points server-side using the root AXFrame from axe describe-ui, so a 300×650 Eco stream has exactly the same click precision as a 1206×2622 native-retina stream. Lower resolutions only cost you pixels on screen, not targeting.

Honest FPS numbers

Measured on an iPhone 17 Pro simulator on an M-series Mac:

Path Requested Actual
axe screenshot (one-shot) 280–370 ms per call — ~3 fps ceiling
axe stream-video --format raw --fps 30 30 ~5–10 fps (axe's screenshot loop is the bottleneck)
axe stream-video --format bgra 60 First frame, then stalls — FBVideoStreamConfiguration is hard-coded to framesPerSecond: nil, IOSurface callback stays silent on current iOS sims

So the "up to 60 fps" on the Fluid preset is aspirational: the architecture is in place, but axe as shipped cannot deliver 60 fps today. To unblock real 60 fps you'd need one of:

  1. Fork axe and pass a concrete framesPerSecond value into FBVideoStreamConfiguration on the BGRA path.
  2. Replace the axe stream-video subprocess with a Swift helper that links FBSimulatorControl.xcframework directly and drives SimDevice / SimDeviceIOClient ourselves.

Both are real half-day projects. The current wire protocol is already a drop-in replacement — swapping in a faster backend is one file.

Keepalive in Fluid mode

Because BGRA otherwise delivers a single frame and then nothing, sim-server runs a screenshot keepalive alongside the BGRA reader in that mode. If no frame arrives for 300 ms, it fires one axe screenshot, decodes the PNG, re-encodes as JPEG, and broadcasts it — so the preview is never blank. When BGRA does fire (animation-heavy apps, scrolling), the keepalive stays silent.

Net effect in practice:

  • Idle screen → ~3 fps from keepalive.
  • Active scrolling → keepalive silent, BGRA may or may not contribute.
  • Worst case is never 0 fps.

Env-var overrides

The server accepts the same knobs as environment variables so you can pick defaults without the UI:

SP_FPS=10 SP_QUALITY=60 SP_SCALE=0.33 SP_CAPTURE=mjpeg bun start
SP_CAPTURE=bgra bun start   # force BGRA path from boot

Tips

  • Use axe's simulator conventions (device points, not Mac pixels) if you're writing agent scripts directly. The UI handles the ratio ↔ point conversion.
  • axe describe-ui is a great reference: axe describe-ui --udid <UDID> | jq '.[0]'.
  • The Metro cache is invalidated when inspector-bridge.js changes. If the bridge version log doesn't advance after an edit, restart Metro with expo start --clear.
  • SP_FPS and SP_QUALITY env vars tune MJPEG performance (defaults: fps=3 q=55).

Development

# Re-build the Rust server after Rust edits
(cd sim-server && cargo build --release)

# Re-build the UI after web-app edits
(cd web-app && bun run build)

# Run the MCP server standalone
bun mcp

# Tail MCP traffic with the official inspector
npx @modelcontextprotocol/inspector node mcp/server.mjs

Project layout:

agent-simulator/
├── server.js              Node orchestrator, WS bridge, HTTP
├── scripts/demo.mjs       One-command demo launcher
├── sim-server/            Rust MJPEG + axe driver
├── runtime/               RN metro plugin + inspector bridge
├── web-app/               Vite + React + shadcn UI
├── web/                   Legacy single-file UI (served at /classic)
├── mcp/                   MCP server (15 tools)
├── examples/expo-demo/    Expo app with the plugin pre-configured
└── LICENSE                AGPL-3.0-or-later

License

AGPL-3.0-or-later. See LICENSE.

If you're embedding agent-simulator inside a commercial product, or you need a different license, open an issue — I'm happy to talk about dual-licensing.

About

An iOS simulator in a browser you can inspect

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages