MCPUI

MCPUI makes web UIs legible to LLMs — the same way humans use them: by knowing which buttons to click, not which APIs to call.

Most browser agents today scrape raw DOM or guess from screenshots. That works poorly on real products: layouts are noisy, vision models miss fine structure, and every failed guess burns tokens on trial-and-error paths. When the UI changes, those agents break. When the UI is unfamiliar, they perform worst.

MCPUI is a small instrumentation layer for React apps plus a Python agent and MCP tool server. Your app publishes a stable, agent-oriented page description (/layout.md and PageSpec JSON). The agent plans and acts on component ids and bounds, not CSS selectors or pixels alone.


The problem

Browser agents struggle on today’s web

| Pain | What happens |
| --- | --- |
| Vision is weak | LLMs do not reliably “see” UI the way humans do; screenshot-only agents misread controls, spacing, and state. |
| DOM is hostile | Production DOM is huge, dynamic, and full of implementation detail that is irrelevant to the user’s goal. |
| Infrastructure mismatch | Browsers, test runners, and MCP hosts were built for humans or for scripts with fixed selectors, not for agents that reason in natural language. |
| Token waste | Agents explore dead-end pathways (wrong element, wrong tab, wrong modal) and pay for every mistake. |
| Cold start | Performance collapses on UIs the agent has never seen before. |
| Brittleness | A renamed class, moved button, or new modal layer breaks selector- or DOM-based automation overnight. |

Humans do not navigate apps via internal APIs. They look at the screen, find the control that matches their intent, and click it. MCPUI gives agents that same contract: a concise map of what is on the page and what they can do next.

Why a browser agent is still necessary (even with APIs and MCP)

APIs and MCP tools are essential for structured, fast, permissioned automation — but they do not replace UI agents:

  • Coverage — Many flows only exist in the product UI (onboarding wizards, admin consoles, third-party embeds, feature flags, A/B layouts). There is no stable public API.
  • Human parity — Users accomplish goals through clicks; agents that only speak HTTP cannot verify what the user actually sees (validation messages, disabled states, loading spinners).
  • Integration lag — New screens ship in the frontend long before every action is exposed as an MCP tool.
  • Trust boundary — Some operations should run in a real browser session (cookies, CSRF, SSO redirects, file uploads) rather than as raw API calls from an LLM host.
  • Composability — MCPUI’s accomplish_task tool meets agents where they already work (Cursor, Claude Desktop) while still using the same layout contract as the standalone CLI.

MCPUI is not “APIs vs browser.” The idea is to instrument the UI once, then let planners and navigators use stable component ids instead of rediscovering the page on every run.


How MCPUI solves it

1. Instrument the React app

Wrap interactive regions with @mcpui/react McpuiCapture components. On each render (and when you call useMcpuiRefresh), the client:

  • Measures viewport-relative bounds for each captured control
  • Builds a PageSpec (JSON) and /layout.md (markdown summary)
  • Exposes them via the Vite dev plugin or your production static route

Agents read a short, stable document — not ten thousand DOM nodes.
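To make the idea of a short, stable page description concrete, here is a minimal Python sketch of a page spec rendered to a markdown summary. The field names (`testid`, `kind`, `label`, bounds) mirror the props used elsewhere in this README, but the class layout and the output format are assumptions for illustration, not the actual PageSpec schema or the real /layout.md format.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One captured control (illustrative fields, not the real schema)."""
    testid: str
    kind: str      # e.g. "page", "input", "action"
    label: str
    x: float       # viewport-relative bounds
    y: float
    width: float
    height: float


@dataclass
class PageSpec:
    url: str
    components: list[Component] = field(default_factory=list)


def render_layout_md(spec: PageSpec) -> str:
    """Render a compact markdown summary, one bullet per component."""
    lines = [f"# Layout for {spec.url}", ""]
    for c in spec.components:
        lines.append(
            f'- `{c.testid}` ({c.kind}) "{c.label}" '
            f"at x={c.x:g}, y={c.y:g}, w={c.width:g}, h={c.height:g}"
        )
    return "\n".join(lines)


spec = PageSpec(
    url="/checkout",
    components=[
        Component("email", "input", "Email", 40, 120, 320, 36),
        Component("pay-btn", "action", "Pay now", 40, 180, 120, 40),
    ],
)
print(render_layout_md(spec))
```

The point of the sketch is the size difference: two captured controls become two lines of text, regardless of how many DOM nodes sit behind them.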

2. Plan once, then observe → act

The browser agent (CLI or MCP accomplish_task) runs a headed Chromium session:

  1. Navigate to the start URL and load layout markdown (window.__MCPUI__.getSpec(), /layout.md, or local MCPUI.md).
  2. Planner (LLM): task + initial layout (+ optional site workflows) → structured TaskPlan.
  3. Loop: refresh layout → Navigator (LLM) picks one validated action → Playwright executes via component centroids (real mouse/keyboard).
  4. Stop when the navigator marks the task done or max_steps is reached.

Actions reference component ids from the layout doc. The dispatcher rejects unknown or invisible/disabled components before clicking.
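The loop and the dispatcher check described above can be sketched roughly as follows. This is not the agent's actual code: `Component`, `Action`, `validate`, and `run_loop` are hypothetical names, the navigator is a scripted stub rather than an LLM, and feeding rejection messages back into the history is an assumption about how a rejected action might be handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    id: str
    visible: bool = True
    enabled: bool = True


@dataclass
class Action:
    name: str                          # "click", "type", "done", ...
    component_id: Optional[str] = None
    text: Optional[str] = None


def validate(action: Action, layout: dict[str, Component]) -> Optional[str]:
    """Return an error string if the action must be rejected, else None."""
    if action.component_id is None:
        return None  # actions such as "press" target no component
    comp = layout.get(action.component_id)
    if comp is None:
        return f"unknown component: {action.component_id}"
    if not comp.visible:
        return f"component not visible: {action.component_id}"
    if not comp.enabled:
        return f"component disabled: {action.component_id}"
    return None


def run_loop(refresh_layout, navigator, execute, max_steps=10):
    """Refresh layout, ask the navigator for one action, validate, execute."""
    history: list[str] = []
    for _ in range(max_steps):
        layout = refresh_layout()
        action = navigator(layout, history)
        if action.name == "done":
            return True, history
        error = validate(action, layout)
        if error:
            history.append(f"rejected: {error}")  # assumed feedback path
            continue
        execute(action)
        history.append(f"{action.name}:{action.component_id}")
    return False, history


# Scripted stand-in for the navigator LLM.
layout = {
    "email": Component("email"),
    "pay-btn": Component("pay-btn", enabled=False),
}
script = iter([
    Action("type", "email", "a@b.co"),
    Action("click", "ghost"),   # not in the layout, so it is rejected
    Action("done"),
])
done, history = run_loop(lambda: layout, lambda l, h: next(script), lambda a: None)
print(done, history)
```

Validating against the freshly refreshed layout on every step is what keeps a stale plan from clicking a control that has since disappeared or been disabled.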

3. Improve over time with gbrain (optional)

When gbrain is enabled, successful runs are captured as site workflows under mcpui-tool-server/sites/<host>/workflows/, indexed into gbrain, and retrieved on future tasks:

  • Capture: After a successful run, write or update a workflow markdown file (steps, success criteria, action history) and sync to gbrain.
  • Retrieve: The planner searches gbrain for similar past tasks and injects hints (adapt to current layout, do not blindly replay).
  • MCP: search_site_knowledge exposes the same index to any connected agent.

The agent gets cheaper and more reliable on repeat intents without hard-coding selectors.
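As a rough illustration of the capture step, the sketch below writes a workflow markdown file under sites/<host>/workflows/. Only that directory layout comes from this README; the file naming (`slugify`), the markdown structure, and the `capture_workflow` helper are assumptions, and the gbrain sync is deliberately left out because its interface is not described here.

```python
import re
from pathlib import Path


def slugify(task: str) -> str:
    """Turn a task description into a filesystem-safe file stem."""
    slug = re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")
    return slug or "task"


def capture_workflow(root: Path, host: str, task: str, actions: list[str]) -> Path:
    """Write (or overwrite) a workflow file for a successful run."""
    path = root / "sites" / host / "workflows" / f"{slugify(task)}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    steps = "\n".join(f"{i}. {a}" for i, a in enumerate(actions, start=1))
    path.write_text(f"# {task}\n\n## Steps\n\n{steps}\n", encoding="utf-8")
    return path
```

A call such as `capture_workflow(Path("mcpui-tool-server"), "shop.example.com", "Add milk to the list", [...])` would then produce one markdown file per distinct task phrasing, which is the unit a later search could retrieve as a planner hint.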


Architecture

flowchart TB
  subgraph app ["Your React app (@mcpui/react + @mcpui/vite)"]
    UI[UI components]
    Cap[McpuiCapture + PageSpec]
    LM["/layout.md + /mcpui.json"]
    UI --> Cap --> LM
  end

  subgraph agent ["mcpui-tool-server"]
    PW[Playwright Chromium]
    Parse[MCPUI parser]
    PL[Planner LLM]
    NV[Navigator LLM]
    AD[Action dispatch]
    PW --> Parse
    Parse --> PL
    Parse --> NV
    PL --> NV
    NV --> AD --> PW
  end

  subgraph knowledge ["Site knowledge (optional)"]
    WF[sites/ host /workflows/*.md]
    GB[gbrain index]
    WF --> GB
    GB -.->|search hints| PL
    AD -.->|capture success| WF
  end

  subgraph mcp ["MCP host e.g. Cursor"]
    AT[accomplish_task]
    SK[search_site_knowledge]
    AT --> agent
    SK --> GB
  end

  LM <-->|refresh each step| PW

Packages in this repo

| Path | Role |
| --- | --- |
| [mcpui-client-packages/](mcpui-client-packages/) | npm: @mcpui/spec, @mcpui/react, @mcpui/vite |
| [mcpui-tool-server/](mcpui-tool-server/) | Python: browser agent CLI, MCP server, gbrain integration |

Supported surface (today)

React only for instrumentation. Vue, Svelte, and plain HTML apps are not supported by the client packages yet. The agent can still run against any URL: with --no-mock, only if you serve /layout.md yourself; without it, the agent falls back to a mock layout generated from the live DOM.

Component kinds (McpuiCapture kind)

| Kind | Typical use |
| --- | --- |
| page | Screen root (one per view); drives overview text |
| label | Headings, static text regions |
| input | Text fields, text areas |
| action | Buttons, links treated as actions |
| selector | Dropdowns, radios, segmented controls |
| tooltip | Secondary hints (when captured) |

Each captured control needs a stable testid and human label. Bounds are computed in viewport client coordinates (1280×720 default in the agent).
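Since the agent acts on component centroids, it may help to see the arithmetic. The sketch below assumes bounds are stored as a top-left corner plus width and height in viewport client coordinates; the `Bounds` class and its method names are hypothetical, and only the 1280×720 default comes from this README.

```python
from dataclasses import dataclass

VIEWPORT_WIDTH, VIEWPORT_HEIGHT = 1280, 720  # agent default stated above


@dataclass(frozen=True)
class Bounds:
    x: float       # left edge, viewport client coordinates
    y: float       # top edge
    width: float
    height: float

    def centroid(self) -> tuple[float, float]:
        """Centre point of the box, where a click or hover would land."""
        return (self.x + self.width / 2, self.y + self.height / 2)

    def centroid_in_viewport(self, vw=VIEWPORT_WIDTH, vh=VIEWPORT_HEIGHT) -> bool:
        """True if the centroid falls inside the visible viewport."""
        cx, cy = self.centroid()
        return 0 <= cx < vw and 0 <= cy < vh


pay = Bounds(x=40, y=180, width=120, height=40)
print(pay.centroid())              # (100.0, 200.0)
print(pay.centroid_in_viewport())  # True
```

A control below the fold, for example `Bounds(x=40, y=900, width=120, height=40)`, has a centroid at y=920 and would fail the viewport check, which is one reason bounds need to be refreshed after scrolling.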

Agent actions

| Action | Description |
| --- | --- |
| click | Click component by id (centroid) |
| type | Type text into an input component |
| hover | Hover component centroid |
| press | Keyboard key (e.g. Enter, Tab) |
| navigate | Go to URL (respects site domain policy in MCP mode) |
| click_at | Fallback pixel click in viewport |
| screenshot | Save PNG |
| back / forward / reload | Browser navigation |

Without a real MCPUI embed, the agent can fall back to a mock layout generated from live DOM (lower quality; stderr warns). Use --no-mock in production-style runs once /layout.md is served.
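The fallback behaviour can be summarised as a small decision function. This is a sketch of the logic as described in the paragraph above, not the agent's implementation: `choose_layout`, `LayoutUnavailable`, and the `no_mock` parameter (standing in for the --no-mock flag) are hypothetical names.

```python
import sys
from typing import Callable, Optional


class LayoutUnavailable(RuntimeError):
    """Raised when no real layout exists and the mock fallback is disabled."""


def choose_layout(
    real_layout: Optional[str],
    build_mock: Callable[[], str],
    no_mock: bool,
) -> tuple[str, str]:
    """Return (source, markdown), preferring the app-published layout."""
    if real_layout:
        return "mcpui", real_layout
    if no_mock:
        raise LayoutUnavailable("no /layout.md served and mock fallback is disabled")
    # Lower-quality path: warn on stderr, then derive a layout from the DOM.
    print("warning: using mock layout generated from live DOM", file=sys.stderr)
    return "mock", build_mock()
```

Failing loudly under `no_mock=True` is the useful property for production-style runs: a missing /layout.md surfaces as an error instead of silently degrading to DOM guessing.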


Set it up for your app

1. Instrument your React app

cd mcpui-client-packages && pnpm install && pnpm build

In your app:

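pnpm add @mcpui/react @mcpui/spec @mcpui/vite

// main.tsx
import { createRoot } from "react-dom/client";
import { Mcpui } from "@mcpui/react";
import App from "./App"; // assumed location of your root component

// Wrap the whole app so captured components can register themselves.
createRoot(document.getElementById("root")!).render(
  <Mcpui.Provider>
    <App />
  </Mcpui.Provider>,
);

// CheckoutPage.tsx (any page component)
import { Mcpui, useMcpuiRefresh } from "@mcpui/react";

// `cart` stands for whatever state drives this page; here it arrives as a prop.
function CheckoutPage({ cart }: { cart: unknown[] }) {
  useMcpuiRefresh([cart]); // re-capture when state changes
  return (
    <Mcpui.Capture testid="page-checkout" kind="page" label="Checkout">
      <Mcpui.Capture testid="email" kind="input" label="Email" asChild>
        <input type="email" />
      </Mcpui.Capture>
      <Mcpui.Capture testid="pay-btn" kind="action" label="Pay now" asChild>
        <button type="submit">Pay</button>
      </Mcpui.Capture>
    </Mcpui.Capture>
  );
}

// vite.config.ts
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";
import { mcpuiDevPlugin } from "@mcpui/vite";

export default defineConfig({
  plugins: [react(), mcpuiDevPlugin()],
});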

Serve /layout.md in production (the Vite plugin does this in dev). See mcpui-client-packages/README.md.

Reference app: shopping-list example (source).

2. Run the browser agent locally

cd mcpui-tool-server
cp .env.example .env   # OPENAI_API_KEY=...
make install
make playwright

Against your dev server or deployed URL:

poetry run browser-agent run "Add milk to the list" https://your-app.example/list --keep-open

Fixture demo (no React required):

make serve-fixture   # terminal 1
make demo            # terminal 2

Full CLI flags and pacing: mcpui-tool-server/README.md.

3. Expose tasks via MCP (one hostname per server)

# .env
MCPUI_SITE_HOST=shop.example.com
MCPUI_SITE_ORIGIN=https://shop.example.com
OPENAI_API_KEY=sk-...

Then start the server:

make mcp-server

Cursor / Claude Desktop config (paths adjusted to your machine):

{
  "mcpServers": {
    "mcpui-shop": {
      "command": "/path/to/mcpui-tool-server/.venv/bin/python",
      "args": ["-m", "mcpui_tool_server.server"],
      "env": {
        "MCPUI_SITE_HOST": "shop.example.com",
        "MCPUI_SITE_ORIGIN": "https://shop.example.com",
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

| MCP tool | Purpose |
| --- | --- |
| accomplish_task | Run a natural-language task on the bound site |
| get_site_info | Host, origin, gbrain metadata |
| search_site_knowledge | Query captured workflows (requires gbrain) |

Use a separate MCP server entry per hostname — navigation to other hosts is rejected.
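A host policy like this usually comes down to comparing the hostname of each navigation target with the configured site host. The sketch below is one way to do that with the standard library; `is_allowed` is a hypothetical helper, and the exact-match rule (no subdomains, port ignored) is an assumption rather than the server's documented behaviour.

```python
from urllib.parse import urlsplit


def is_allowed(url: str, site_host: str) -> bool:
    """Allow only http(s) URLs whose hostname equals the bound site host."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False  # rejects javascript:, file:, data:, ...
    return (parts.hostname or "").lower() == site_host.lower()


print(is_allowed("https://shop.example.com/cart", "shop.example.com"))
print(is_allowed("https://shop.example.com.evil.io/", "shop.example.com"))
```

Comparing the parsed hostname, rather than checking whether the URL string contains the host, is what keeps look-alike domains such as the second example from passing.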

4. Enable gbrain learning (optional)

Install gbrain, then in .env:

MCPUI_GBRAIN_ENABLED=1
MCPUI_GBRAIN_CAPTURE=1

Curate mcpui-tool-server/sites/<your-host>/capabilities.md, then register and import the site:

make gbrain-register-site SITE_HOST=shop.example.com
make gbrain-import-site SITE_HOST=shop.example.com

Successful accomplish_task / CLI runs append or create sites/<host>/workflows/*.md and sync to gbrain for future planner hints.


Development

# Client packages
cd mcpui-client-packages && pnpm install && pnpm build && pnpm test

# Tool server
cd mcpui-tool-server && make install && make test

Published npm packages: [@mcpui/spec](https://www.npmjs.com/package/@mcpui/spec), [@mcpui/react](https://www.npmjs.com/package/@mcpui/react), [@mcpui/vite](https://www.npmjs.com/package/@mcpui/vite).

License

See LICENSE.
