MCPUI makes web UIs legible to LLMs — the same way humans use them: by knowing which buttons to click, not which APIs to call.
Most browser agents today scrape raw DOM or guess from screenshots. That works poorly on real products: layouts are noisy, vision models miss fine structure, and every failed guess burns tokens on trial-and-error paths. When the UI changes, those agents break. When the UI is unfamiliar, they perform worst.
MCPUI is a small instrumentation layer for React apps plus a Python agent and MCP tool server. Your app publishes a stable, agent-oriented page description (/layout.md and PageSpec JSON). The agent plans and acts on component ids and bounds, not CSS selectors or pixels alone.
| Pain | What happens |
|---|---|
| Vision is weak | LLMs do not reliably “see” UI the way humans do; screenshot-only agents misread controls, spacing, and state. |
| DOM is hostile | Production DOM is huge, dynamic, and full of implementation detail that is irrelevant to the user’s goal. |
| Infrastructure mismatch | Browsers, test runners, and MCP hosts were built for humans or scripts with fixed selectors — not for agents that reason in natural language. |
| Token waste | Agents explore dead-end pathways (wrong element, wrong tab, wrong modal) and pay for every mistake. |
| Cold start | Performance collapses on UIs the agent has never seen before. |
| Brittleness | A renamed class, moved button, or new modal layer breaks selector- or DOM-based automation overnight. |
Humans do not navigate apps via internal APIs. They look at the screen, find the control that matches their intent, and click it. MCPUI gives agents that same contract: a concise map of what is on the page and what they can do next.
APIs and MCP tools are essential for structured, fast, permissioned automation — but they do not replace UI agents:
- Coverage — Many flows only exist in the product UI (onboarding wizards, admin consoles, third-party embeds, feature flags, A/B layouts). There is no stable public API.
- Human parity — Users accomplish goals through clicks; agents that only speak HTTP cannot verify what the user actually sees (validation messages, disabled states, loading spinners).
- Integration lag — New screens ship in the frontend long before every action is exposed as an MCP tool.
- Trust boundary — Some operations should run in a real browser session (cookies, CSRF, SSO redirects, file uploads) rather than as raw API calls from an LLM host.
- Composability — MCPUI’s `accomplish_task` tool meets agents where they already work (Cursor, Claude Desktop) while still using the same layout contract as the standalone CLI.
MCPUI is not “APIs vs browser.” The idea is to instrument the UI once, then let planners and navigators use stable component ids instead of reinventing the page on every run.
Wrap interactive regions with @mcpui/react McpuiCapture components. On each render (and when you call useMcpuiRefresh), the client:
- Measures viewport-relative bounds for each captured control
- Builds a PageSpec (JSON) and `/layout.md` (markdown summary)
- Exposes them via the Vite dev plugin or your production static route
Agents read a short, stable document — not ten thousand DOM nodes.
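To make the idea concrete, here is a minimal Python sketch of how a PageSpec-like structure could be flattened into a short markdown summary. The field names (`id`, `kind`, `label`, `bounds`) and the bullet format are assumptions for illustration, not the actual `@mcpui/spec` schema or the real `/layout.md` format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical captured control; field names are illustrative only."""
    id: str
    kind: str  # e.g. "page", "input", "action"
    label: str
    bounds: tuple[float, float, float, float]  # x, y, width, height in viewport px


def render_layout(components: list[Component]) -> str:
    """Render one markdown bullet per component (assumed format)."""
    lines = []
    for c in components:
        x, y, w, h = c.bounds
        lines.append(
            f'- [{c.kind}] {c.id}: "{c.label}" @ ({x:.0f},{y:.0f} {w:.0f}x{h:.0f})'
        )
    return "\n".join(lines)
```

A page with a few dozen captured controls yields a few dozen lines, which is the size difference the paragraph above refers to.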
The browser agent (CLI or MCP accomplish_task) runs a headed Chromium session:
- Navigate to the start URL and load layout markdown (`window.__MCPUI__.getSpec()`, `/layout.md`, or local `MCPUI.md`).
- Planner (LLM): task + initial layout (+ optional site workflows) → structured `TaskPlan`.
- Loop: refresh layout → Navigator (LLM) picks one validated action → Playwright executes via component centroids (real mouse/keyboard).
- Stop when the navigator marks the task done or `max_steps` is reached.
Actions reference component ids from the layout doc. The dispatcher rejects unknown or invisible/disabled components before clicking.
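The loop and the dispatcher check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool server's code: `LayoutComponent`, `Action`, `validate`, and `run_loop` are hypothetical names, and the navigator LLM is replaced by a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LayoutComponent:
    """Hypothetical layout entry; the real parser's fields may differ."""
    id: str
    x: float
    y: float
    width: float
    height: float
    visible: bool = True
    enabled: bool = True


@dataclass
class Action:
    """Hypothetical navigator output: one action, or done=True to stop."""
    name: str = "click"
    component_id: Optional[str] = None
    done: bool = False


def centroid(c: LayoutComponent) -> tuple[float, float]:
    """Centre point of the component's bounds."""
    return (c.x + c.width / 2, c.y + c.height / 2)


def validate(action: Action, layout: dict[str, LayoutComponent]) -> LayoutComponent:
    """Reject unknown, invisible, or disabled targets before any click."""
    comp = layout.get(action.component_id or "")
    if comp is None:
        raise ValueError(f"unknown component: {action.component_id!r}")
    if not comp.visible or not comp.enabled:
        raise ValueError(f"component not interactable: {comp.id!r}")
    return comp


def run_loop(
    refresh_layout: Callable[[], dict[str, LayoutComponent]],
    pick_action: Callable[[dict[str, LayoutComponent]], Action],
    click: Callable[[float, float], None],
    max_steps: int = 10,
) -> bool:
    """Return True if the navigator reported done within max_steps."""
    for _ in range(max_steps):
        layout = refresh_layout()     # re-read the layout on every step
        action = pick_action(layout)  # stands in for the Navigator LLM call
        if action.done:
            return True
        comp = validate(action, layout)
        click(*centroid(comp))
    return False
```

In a real run, `click` could wrap Playwright's `page.mouse.click(x, y)`, and `refresh_layout` would re-parse the layout document after each action.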
When gbrain is enabled, successful runs are captured as site workflows under mcpui-tool-server/sites/<host>/workflows/, indexed into gbrain, and retrieved on future tasks:
- Capture — After a successful run, write or update a workflow markdown file (steps, success criteria, action history) and sync to gbrain.
- Retrieve — The planner searches gbrain for similar past tasks and injects hints (adapt to current layout, do not blindly replay).
- MCP — `search_site_knowledge` exposes the same index to any connected agent.
The agent gets cheaper and more reliable on repeat intents without hard-coding selectors.
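As a rough illustration of the capture step, the sketch below writes a workflow markdown file under `sites/<host>/workflows/`. The file naming (`slugify`), the markdown layout, and the function names are assumptions; the gbrain sync is omitted because its interface is not described here.

```python
import re
from pathlib import Path


def slugify(task: str) -> str:
    """Lowercase, hyphen-separated file stem derived from the task text."""
    return re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")


def capture_workflow(root: Path, host: str, task: str, steps: list[str]) -> Path:
    """Write (or overwrite) a workflow file for a successful run."""
    path = root / "sites" / host / "workflows" / f"{slugify(task)}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    body = [f"# {task}", "", "## Steps"]
    body += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    path.write_text("\n".join(body) + "\n", encoding="utf-8")
    return path
```

Keying the file on the task text means a repeat of the same intent updates one file rather than accumulating near-duplicates, which is one plausible reading of "write or update" above.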
```mermaid
flowchart TB
  subgraph app ["Your React app (@mcpui/react + @mcpui/vite)"]
    UI[UI components]
    Cap[McpuiCapture + PageSpec]
    LM["/layout.md + /mcpui.json"]
    UI --> Cap --> LM
  end
  subgraph agent ["mcpui-tool-server"]
    PW[Playwright Chromium]
    Parse[MCPUI parser]
    PL[Planner LLM]
    NV[Navigator LLM]
    AD[Action dispatch]
    PW --> Parse
    Parse --> PL
    Parse --> NV
    PL --> NV
    NV --> AD --> PW
  end
  subgraph knowledge ["Site knowledge (optional)"]
    WF["sites/&lt;host&gt;/workflows/*.md"]
    GB[gbrain index]
    WF --> GB
    GB -.->|search hints| PL
    AD -.->|capture success| WF
  end
  subgraph mcp ["MCP host e.g. Cursor"]
    AT[accomplish_task]
    SK[search_site_knowledge]
    AT --> agent
    SK --> GB
  end
  LM <-->|refresh each step| PW
```
Packages in this repo
| Path | Role |
|---|---|
| [mcpui-client-packages/](mcpui-client-packages/) | npm: `@mcpui/spec`, `@mcpui/react`, `@mcpui/vite` |
| [mcpui-tool-server/](mcpui-tool-server/) | Python: browser agent CLI, MCP server, gbrain integration |
React only for instrumentation. Vue, Svelte, and plain HTML apps are not supported by the client packages yet. The agent can still run against any URL, but with `--no-mock` (mock fallback disabled) only if you serve `/layout.md` yourself.
| Kind | Typical use |
|---|---|
| `page` | Screen root (one per view); drives overview text |
| `label` | Headings, static text regions |
| `input` | Text fields, text areas |
| `action` | Buttons, links treated as actions |
| `selector` | Dropdowns, radios, segmented controls |
| `tooltip` | Secondary hints (when captured) |
Each captured control needs a stable testid and human label. Bounds are computed in viewport client coordinates (1280×720 default in the agent).
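A small pre-flight check can catch captures that break this rule. The Python sketch below is an assumption-laden illustration: it treats each control as a plain dict with `testid` and `label` keys and assumes testids should be unique per page, which the text implies but does not state.

```python
from collections import Counter


def lint_controls(controls: list[dict[str, str]]) -> list[str]:
    """Return human-readable problems; an empty list means no issues found."""
    problems = []
    counts = Counter(c.get("testid", "") for c in controls)
    for testid, n in counts.items():
        if not testid:
            problems.append("control without a testid")
        elif n > 1:
            problems.append(f"duplicate testid: {testid}")
    for c in controls:
        if not c.get("label", "").strip():
            problems.append(f"missing label: {c.get('testid', '?')}")
    return problems
```

Running a check like this in a unit test keeps ids stable across refactors, since a renamed or duplicated testid fails before an agent ever sees the page.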
| Action | Description |
|---|---|
| `click` | Click component by id (centroid) |
| `type` | Type text into an input component |
| `hover` | Hover component centroid |
| `press` | Keyboard key (e.g. Enter, Tab) |
| `navigate` | Go to URL (respects site domain policy in MCP mode) |
| `click_at` | Fallback pixel click in viewport |
| `screenshot` | Save PNG |
| `back` / `forward` / `reload` | Browser navigation |
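One way to keep navigator output honest is to check each proposed action against this table before dispatch. The sketch below is hypothetical: the action names come from the table, but the payload keys (`action`, `id`, `text`, `key`, `url`, `x`, `y`) are assumed for illustration and may not match the real schema.

```python
# Required payload fields per action name (field names are assumptions).
REQUIRED_FIELDS: dict[str, set[str]] = {
    "click": {"id"},
    "type": {"id", "text"},
    "hover": {"id"},
    "press": {"key"},
    "navigate": {"url"},
    "click_at": {"x", "y"},
    "screenshot": set(),
    "back": set(),
    "forward": set(),
    "reload": set(),
}


def check_action(action: dict) -> None:
    """Raise ValueError if the action name is unknown or fields are missing."""
    name = action.get("action")
    if name not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action: {name!r}")
    missing = REQUIRED_FIELDS[name] - set(action)
    if missing:
        raise ValueError(f"{name} is missing fields: {sorted(missing)}")
```

Rejecting malformed actions here means a bad LLM response costs one retry rather than a stray click.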
Without a real MCPUI embed, the agent can fall back to a mock layout generated from live DOM (lower quality; stderr warns). Use --no-mock in production-style runs once /layout.md is served.
```bash
cd mcpui-client-packages && pnpm install && pnpm build
```

In your app:

```bash
pnpm add @mcpui/react @mcpui/spec @mcpui/vite
```

```tsx
// main.tsx
import { createRoot } from "react-dom/client";
import { Mcpui } from "@mcpui/react";
import App from "./App"; // adjust to your app's root component

createRoot(document.getElementById("root")!).render(
  <Mcpui.Provider>
    <App />
  </Mcpui.Provider>,
);
```

```tsx
import { Mcpui, useMcpuiRefresh } from "@mcpui/react";

// `cart` is passed as a prop here for illustration; use whatever state drives the page.
function CheckoutPage({ cart }: { cart: unknown }) {
  useMcpuiRefresh([cart]); // re-capture when state changes
  return (
    <Mcpui.Capture testid="page-checkout" kind="page" label="Checkout">
      <Mcpui.Capture testid="email" kind="input" label="Email" asChild>
        <input type="email" />
      </Mcpui.Capture>
      <Mcpui.Capture testid="pay-btn" kind="action" label="Pay now" asChild>
        <button type="submit">Pay</button>
      </Mcpui.Capture>
    </Mcpui.Capture>
  );
}
```

```ts
// vite.config.ts
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";
import { mcpuiDevPlugin } from "@mcpui/vite";

export default defineConfig({
  plugins: [react(), mcpuiDevPlugin()],
});
```

Serve `/layout.md` in production (the Vite plugin does this in dev). See mcpui-client-packages/README.md.
Reference app: shopping-list example (source).
```bash
cd mcpui-tool-server
cp .env.example .env   # OPENAI_API_KEY=...
make install
make playwright
```

Against your dev server or deployed URL:

```bash
poetry run browser-agent run "Add milk to the list" https://your-app.example/list --keep-open
```

Fixture demo (no React required):

```bash
make serve-fixture   # terminal 1
make demo            # terminal 2
```

Full CLI flags and pacing: mcpui-tool-server/README.md.
```bash
# .env
MCPUI_SITE_HOST=shop.example.com
MCPUI_SITE_ORIGIN=https://shop.example.com
OPENAI_API_KEY=sk-...
```

```bash
make mcp-server
```

Cursor / Claude Desktop config (paths adjusted to your machine):

```json
{
  "mcpServers": {
    "mcpui-shop": {
      "command": "/path/to/mcpui-tool-server/.venv/bin/python",
      "args": ["-m", "mcpui_tool_server.server"],
      "env": {
        "MCPUI_SITE_HOST": "shop.example.com",
        "MCPUI_SITE_ORIGIN": "https://shop.example.com",
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```

| MCP tool | Purpose |
|---|---|
| `accomplish_task` | Run a natural-language task on the bound site |
| `get_site_info` | Host, origin, gbrain metadata |
| `search_site_knowledge` | Query captured workflows (requires gbrain) |
Use a separate MCP server entry per hostname — navigation to other hosts is rejected.
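The host restriction can be pictured as a check like the one below. This is a sketch, not the server's implementation: `is_allowed` is a hypothetical helper, and exact hostname matching (no subdomain wildcards) is an assumption.

```python
from urllib.parse import urlparse


def is_allowed(url: str, site_host: str) -> bool:
    """True only for http(s) URLs whose hostname equals the bound host."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    return (parsed.hostname or "").lower() == site_host.lower()
```

Comparing the parsed hostname, rather than searching the URL string, avoids accepting lookalikes such as `shop.example.com.evil.net`.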
Install gbrain, then in `.env`:

```bash
MCPUI_GBRAIN_ENABLED=1
MCPUI_GBRAIN_CAPTURE=1
```

Curate `mcpui-tool-server/sites/<your-host>/capabilities.md`, import, and register:

```bash
make gbrain-register-site SITE_HOST=shop.example.com
make gbrain-import-site SITE_HOST=shop.example.com
```

Successful `accomplish_task` / CLI runs append or create `sites/<host>/workflows/*.md` and sync to gbrain for future planner hints.
```bash
# Run each command from the repo root.
# Client packages
cd mcpui-client-packages && pnpm install && pnpm build && pnpm test
# Tool server
cd mcpui-tool-server && make install && make test
```

Published npm packages: [@mcpui/spec](https://www.npmjs.com/package/@mcpui/spec), [@mcpui/react](https://www.npmjs.com/package/@mcpui/react), [@mcpui/vite](https://www.npmjs.com/package/@mcpui/vite).
See LICENSE.