MCPUI

MCPUI makes web UIs legible to LLMs so that agents can use them the way humans do: by knowing which buttons to click, not which APIs to call.

Most browser agents today scrape raw DOM or guess from screenshots. That works poorly on real products: layouts are noisy, vision models miss fine structure, and every failed guess burns tokens on trial-and-error paths. When the UI changes, those agents break. When the UI is unfamiliar, they perform worst.

MCPUI is a small instrumentation layer for React apps plus a Python agent and MCP tool server. Your app publishes a stable, agent-oriented page description (/layout.md and PageSpec JSON). The agent plans and acts on component ids and bounds, not CSS selectors or pixels alone.

The problem

Browser agents struggle on today’s web

| Pain | What happens |
| --- | --- |
| Vision is weak | LLMs do not reliably “see” UI the way humans do; screenshot-only agents misread controls, spacing, and state. |
| DOM is hostile | Production DOM is huge, dynamic, and full of implementation detail that is irrelevant to the user’s goal. |
| Infrastructure mismatch | Browsers, test runners, and MCP hosts were built for humans or for scripts with fixed selectors, not for agents that reason in natural language. |
| Token waste | Agents explore dead-end pathways (wrong element, wrong tab, wrong modal) and pay for every mistake. |
| Cold start | Performance collapses on UIs the agent has never seen before. |
| Brittleness | A renamed class, moved button, or new modal layer breaks selector- or DOM-based automation overnight. |

Humans do not navigate apps via internal APIs. They look at the screen, find the control that matches their intent, and click it. MCPUI gives agents that same contract: a concise map of what is on the page and what they can do next.

Why a browser agent is still necessary (even with APIs and MCP)

APIs and MCP tools are essential for structured, fast, permissioned automation — but they do not replace UI agents:

Coverage — Many flows only exist in the product UI (onboarding wizards, admin consoles, third-party embeds, feature flags, A/B layouts). There is no stable public API.
Human parity — Users accomplish goals through clicks; agents that only speak HTTP cannot verify what the user actually sees (validation messages, disabled states, loading spinners).
Integration lag — New screens ship in the frontend long before every action is exposed as an MCP tool.
Trust boundary — Some operations should run in a real browser session (cookies, CSRF, SSO redirects, file uploads) rather than as raw API calls from an LLM host.
Composability — MCPUI’s accomplish_task tool meets agents where they already work (Cursor, Claude Desktop) while still using the same layout contract as the standalone CLI.

MCPUI is not “APIs vs browser.” The idea is to instrument the UI once, then let planners and navigators use stable component ids instead of rediscovering the page on every run.

How MCPUI solves it

1. Instrument the React app

Wrap interactive regions with @mcpui/react McpuiCapture components. On each render (and when you call useMcpuiRefresh), the client:

Measures viewport-relative bounds for each captured control
Builds a PageSpec (JSON) and /layout.md (markdown summary)
Exposes them via the Vite dev plugin or your production static route

Agents read a short, stable document — not ten thousand DOM nodes.
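
To make the idea of a short page description concrete, here is a minimal sketch of a PageSpec-like structure and a markdown summary rendered from it. The field names, the `render_layout_md` helper, and the output format are assumptions for illustration only; the real schema is defined by `@mcpui/spec` and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One captured control. Field names are illustrative assumptions."""
    testid: str
    kind: str       # e.g. "input" or "action"
    label: str
    x: float        # viewport-relative left edge, in CSS pixels
    y: float        # viewport-relative top edge, in CSS pixels
    width: float
    height: float


@dataclass
class PageSpec:
    """Hypothetical stand-in for the JSON page description."""
    url: str
    components: list[Component] = field(default_factory=list)


def render_layout_md(spec: PageSpec) -> str:
    """Render one markdown bullet per captured component."""
    lines = [f"# Layout for {spec.url}", ""]
    for c in spec.components:
        lines.append(
            f'- [{c.kind}] {c.testid}: "{c.label}" '
            f"at ({c.x:.0f}, {c.y:.0f}) size {c.width:.0f}x{c.height:.0f}"
        )
    return "\n".join(lines)


spec = PageSpec(
    url="/checkout",
    components=[
        Component("email", "input", "Email", 40, 120, 300, 32),
        Component("pay-btn", "action", "Pay now", 40, 180, 120, 40),
    ],
)
layout_md = render_layout_md(spec)
print(layout_md)
```

The point of the sketch is the size of the document: two controls produce two lines, regardless of how many DOM nodes sit underneath them.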

2. Plan once, then observe → act

The browser agent (CLI or MCP accomplish_task) runs a headed Chromium session:

Navigate to the start URL and load layout markdown (window.__MCPUI__.getSpec(), /layout.md, or local MCPUI.md).
Planner (LLM): task + initial layout (+ optional site workflows) → structured TaskPlan.
Loop: refresh layout → Navigator (LLM) picks one validated action → Playwright executes via component centroids (real mouse/keyboard).
Stop when the navigator marks the task done or max_steps is reached.
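
The control flow of the steps above can be sketched in a few lines. This is not the tool server's code: `run_task`, the callables passed to it, and the action dictionaries are hypothetical names, and the planner and navigator are replaced by scripted stubs so the example runs without an LLM or a browser.

```python
def run_task(task, get_layout, plan, choose_action, execute, max_steps=10):
    """Plan once, then loop: refresh layout, pick one action, execute it."""
    task_plan = plan(task, get_layout())      # planner runs a single time
    history = []
    for _ in range(max_steps):
        layout = get_layout()                 # refresh before every decision
        action = choose_action(task_plan, layout, history)
        if action["type"] == "done":          # navigator marks the task done
            return {"done": True, "steps": history}
        execute(action)
        history.append(action)
    return {"done": False, "steps": history}  # max_steps reached


# Stubs standing in for the page, the planner LLM, and the navigator LLM.
page = {"typed": "", "submitted": False}


def get_layout():
    return f"input item value={page['typed']!r}; action add-btn"


def plan(task, layout):
    return ["type the item name", "click the add button"]


def choose_action(task_plan, layout, history):
    scripted = [
        {"type": "type", "id": "item", "text": "milk"},
        {"type": "click", "id": "add-btn"},
    ]
    if len(history) < len(scripted):
        return scripted[len(history)]
    return {"type": "done"}


def execute(action):
    if action["type"] == "type":
        page["typed"] = action["text"]
    elif action["type"] == "click":
        page["submitted"] = True


result = run_task("Add milk to the list", get_layout, plan, choose_action, execute)
print(result["done"], len(result["steps"]))
```

Keeping the planner outside the loop means the expensive planning call happens once, while each loop iteration only needs the current layout and the action history.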

Actions reference component ids from the layout doc. The dispatcher rejects unknown or invisible/disabled components before clicking.
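
A minimal sketch of that validation step, assuming each component carries bounds plus visibility and enabled flags. The `Target` type and `resolve_click` function are hypothetical names, not the dispatcher's actual API.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """Illustrative component record; the real layout schema may differ."""
    id: str
    x: float
    y: float
    width: float
    height: float
    visible: bool = True
    enabled: bool = True


def centroid(t: Target) -> tuple[float, float]:
    """Center point of the component's bounding box."""
    return (t.x + t.width / 2, t.y + t.height / 2)


def resolve_click(components: dict[str, Target], component_id: str) -> tuple[float, float]:
    """Return click coordinates, or raise before anything is clicked."""
    target = components.get(component_id)
    if target is None:
        raise ValueError(f"unknown component: {component_id}")
    if not target.visible or target.width <= 0 or target.height <= 0:
        raise ValueError(f"component is not visible: {component_id}")
    if not target.enabled:
        raise ValueError(f"component is disabled: {component_id}")
    return centroid(target)


components = {
    "pay-btn": Target("pay-btn", 40, 180, 120, 40),
    "hidden-btn": Target("hidden-btn", 0, 0, 80, 30, visible=False),
    "disabled-btn": Target("disabled-btn", 0, 60, 80, 30, enabled=False),
}
print(resolve_click(components, "pay-btn"))
```

Rejecting an action before it reaches the browser keeps a bad navigator guess from changing page state.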

3. Improve over time with gbrain (optional)

When gbrain is enabled, successful runs are captured as site workflows under mcpui-tool-server/sites/<host>/workflows/, indexed into gbrain, and retrieved on future tasks:

Capture — After a successful run, write or update a workflow markdown file (steps, success criteria, action history) and sync to gbrain.
Retrieve — The planner searches gbrain for similar past tasks and injects hints (adapt to current layout, do not blindly replay).
MCP — search_site_knowledge exposes the same index to any connected agent.

The agent gets cheaper and more reliable on repeat intents without hard-coding selectors.
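
The retrieve step can be illustrated with a deliberately simple stand-in. The sketch below ranks past workflows by word overlap (Jaccard similarity) with the new task; this is an assumption chosen for readability and is not how gbrain indexes or searches. The workflow dictionaries and the `retrieve_hints` name are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric words of a task description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two task descriptions, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve_hints(task, workflows, k=2, min_score=0.2):
    """Return up to k past workflows whose task text resembles the new task."""
    scored = sorted(
        ((jaccard(task, w["task"]), w) for w in workflows),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [w for score, w in scored if score >= min_score][:k]


workflows = [
    {"task": "Add milk to the list", "steps": ["type item", "click add-btn"]},
    {"task": "Remove an item from the list", "steps": ["click remove-btn"]},
    {"task": "Change account password", "steps": ["open settings"]},
]
hints = retrieve_hints("Add eggs to the list", workflows)
print([h["task"] for h in hints])
```

Whatever the retrieval method, the hints are meant to be adapted to the current layout by the planner rather than replayed step for step.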

Architecture

flowchart TB
  subgraph app ["Your React app (@mcpui/react + @mcpui/vite)"]
    UI[UI components]
    Cap[McpuiCapture + PageSpec]
    LM["/layout.md + /mcpui.json"]
    UI --> Cap --> LM
  end

  subgraph agent ["mcpui-tool-server"]
    PW[Playwright Chromium]
    Parse[MCPUI parser]
    PL[Planner LLM]
    NV[Navigator LLM]
    AD[Action dispatch]
    PW --> Parse
    Parse --> PL
    Parse --> NV
    PL --> NV
    NV --> AD --> PW
  end

  subgraph knowledge ["Site knowledge (optional)"]
    WF[sites/ host /workflows/*.md]
    GB[gbrain index]
    WF --> GB
    GB -.->|search hints| PL
    AD -.->|capture success| WF
  end

  subgraph mcp ["MCP host e.g. Cursor"]
    AT[accomplish_task]
    SK[search_site_knowledge]
    AT --> agent
    SK --> GB
  end

  LM <-->|refresh each step| PW

Packages in this repo

| Path | Role |
| --- | --- |
| [`mcpui-client-packages/`](mcpui-client-packages/) | npm: `@mcpui/spec`, `@mcpui/react`, `@mcpui/vite` |
| [`mcpui-tool-server/`](mcpui-tool-server/) | Python: browser agent CLI, MCP server, gbrain integration |

Supported surface (today)

React only for instrumentation. Vue, Svelte, and plain HTML apps are not supported by the client packages yet. The agent can still run against any URL: it uses /layout.md if you serve it yourself, and otherwise falls back to a mock layout generated from the live DOM unless --no-mock is set.

Component kinds (`McpuiCapture` `kind`)

| Kind | Typical use |
| --- | --- |
| `page` | Screen root (one per view); drives overview text |
| `label` | Headings, static text regions |
| `input` | Text fields, text areas |
| `action` | Buttons, links treated as actions |
| `selector` | Dropdowns, radios, segmented controls |
| `tooltip` | Secondary hints (when captured) |

Each captured control needs a stable testid and a human-readable label. Bounds are computed in viewport client coordinates (the agent's default viewport is 1280×720).

Agent actions

| Action | Description |
| --- | --- |
| `click` | Click component by id (centroid) |
| `type` | Type text into an input component |
| `hover` | Hover component centroid |
| `press` | Keyboard key (e.g. `Enter`, `Tab`) |
| `navigate` | Go to URL (respects site domain policy in MCP mode) |
| `click_at` | Fallback pixel click in viewport |
| `screenshot` | Save PNG |
| `back` / `forward` / `reload` | Browser navigation |

Without a real MCPUI embed, the agent can fall back to a mock layout generated from live DOM (lower quality; stderr warns). Use --no-mock in production-style runs once /layout.md is served.
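
The fallback described above amounts to a small decision, sketched here under the assumption that `--no-mock` turns a missing /layout.md into an error instead of a silent downgrade. `load_layout` and the two callables are hypothetical names; the real CLI's behavior is documented in mcpui-tool-server/README.md.

```python
import sys


def load_layout(fetch_layout_md, build_mock_from_dom, no_mock=False):
    """Prefer the published layout; fall back to a DOM-derived mock if allowed.

    fetch_layout_md returns the layout text, or None when nothing is served.
    """
    layout = fetch_layout_md()
    if layout is not None:
        return layout, "mcpui"
    if no_mock:
        raise RuntimeError("/layout.md is not served and the mock fallback is disabled")
    print("warning: using mock layout generated from live DOM", file=sys.stderr)
    return build_mock_from_dom(), "mock"


# An instrumented app serves a layout; an uninstrumented one does not.
served = load_layout(lambda: "# Layout\n- [action] pay-btn", lambda: "# Mock layout")
mocked = load_layout(lambda: None, lambda: "# Mock layout")
print(served[1], mocked[1])
```

Returning the source alongside the layout lets the caller log which path was taken, which helps when a run behaves worse than expected.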

Set it up for your app

1. Instrument your React app

cd mcpui-client-packages && pnpm install && pnpm build

In your app:

pnpm add @mcpui/react @mcpui/spec @mcpui/vite

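// main.tsx
import { createRoot } from "react-dom/client";
import { Mcpui } from "@mcpui/react";
import App from "./App"; // assumes your root component lives in ./App

// Wrap the whole tree so every capture registers with one provider.
createRoot(document.getElementById("root")!).render(
  <Mcpui.Provider>
    <App />
  </Mcpui.Provider>,
);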

import { Mcpui, useMcpuiRefresh } from "@mcpui/react";

// `cart` stands in for whatever state changes the rendered controls.
function CheckoutPage({ cart }: { cart: readonly unknown[] }) {
  useMcpuiRefresh([cart]); // re-capture when state changes
  return (
    <Mcpui.Capture testid="page-checkout" kind="page" label="Checkout">
      <Mcpui.Capture testid="email" kind="input" label="Email" asChild>
        <input type="email" />
      </Mcpui.Capture>
      <Mcpui.Capture testid="pay-btn" kind="action" label="Pay now" asChild>
        <button type="submit">Pay</button>
      </Mcpui.Capture>
    </Mcpui.Capture>
  );
}

// vite.config.ts
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";
import { mcpuiDevPlugin } from "@mcpui/vite";

export default defineConfig({
  plugins: [react(), mcpuiDevPlugin()],
});

Serve /layout.md in production (the Vite plugin does this in dev). See mcpui-client-packages/README.md.

Reference app: shopping-list example (source).

2. Run the browser agent locally

cd mcpui-tool-server
cp .env.example .env   # OPENAI_API_KEY=...
make install
make playwright

Against your dev server or deployed URL:

poetry run browser-agent run "Add milk to the list" https://your-app.example/list --keep-open

Fixture demo (no React required):

make serve-fixture   # terminal 1
make demo            # terminal 2

Full CLI flags and pacing: mcpui-tool-server/README.md.

3. Expose tasks via MCP (one hostname per server)

# .env
MCPUI_SITE_HOST=shop.example.com
MCPUI_SITE_ORIGIN=https://shop.example.com
OPENAI_API_KEY=sk-...

make mcp-server

Cursor / Claude Desktop config (paths adjusted to your machine):

{
  "mcpServers": {
    "mcpui-shop": {
      "command": "/path/to/mcpui-tool-server/.venv/bin/python",
      "args": ["-m", "mcpui_tool_server.server"],
      "env": {
        "MCPUI_SITE_HOST": "shop.example.com",
        "MCPUI_SITE_ORIGIN": "https://shop.example.com",
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

| MCP tool | Purpose |
| --- | --- |
| `accomplish_task` | Run a natural-language task on the bound site |
| `get_site_info` | Host, origin, gbrain metadata |
| `search_site_knowledge` | Query captured workflows (requires gbrain) |

Use a separate MCP server entry per hostname — navigation to other hosts is rejected.
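
A host policy like this can be expressed as an exact hostname comparison. The sketch below is an assumption about how such a check could look, not the server's actual implementation; in particular, whether subdomains of `MCPUI_SITE_HOST` are accepted is not stated here, so the example rejects them.

```python
from urllib.parse import urlparse


def is_allowed(url: str, site_host: str) -> bool:
    """Allow only http(s) URLs whose hostname equals the bound site host."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    # urlparse lowercases the hostname and strips any port.
    return (parsed.hostname or "") == site_host.lower()


site_host = "shop.example.com"
for candidate in (
    "https://shop.example.com/cart",
    "https://shop.example.com.evil.net/cart",
    "https://other.example.com/",
):
    print(candidate, is_allowed(candidate, site_host))
```

Comparing the parsed hostname rather than searching the URL string avoids being fooled by lookalike hosts that merely contain the allowed name.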

4. Enable gbrain learning (optional)

Install gbrain, then in .env:

MCPUI_GBRAIN_ENABLED=1
MCPUI_GBRAIN_CAPTURE=1

Curate mcpui-tool-server/sites/<your-host>/capabilities.md, then register and import the site:

make gbrain-register-site SITE_HOST=shop.example.com
make gbrain-import-site SITE_HOST=shop.example.com

Successful accomplish_task / CLI runs append or create sites/<host>/workflows/*.md and sync to gbrain for future planner hints.

Development

# Client packages
cd mcpui-client-packages && pnpm install && pnpm build && pnpm test

# Tool server
cd mcpui-tool-server && make install && make test

Published npm packages: [@mcpui/spec](https://www.npmjs.com/package/@mcpui/spec), [@mcpui/react](https://www.npmjs.com/package/@mcpui/react), [@mcpui/vite](https://www.npmjs.com/package/@mcpui/vite).

License

See LICENSE.
