MacWright is an MCP server for reliable native macOS desktop control from AI agents. It exposes 72 tools for screenshots, mouse, keyboard, scroll, clipboard, window management, native UI automation via the Accessibility API, Safari browser automation, AppleScript, shell commands, and visual UI parsing.
Use it when you want Hermes Agent, Claude Desktop, or another MCP client to operate a Mac like a careful power user: read the UI, choose the most semantic control path, act, and verify the result.
Macwright.Demo.mp4
The demo shows MacWright controlling a real macOS desktop through MCP, following the core workflow it is built for:
- Inspecting the live screen and native app state.
- Choosing actions through semantic UI tools before falling back to coordinates.
- Driving mouse, keyboard, window, and app interactions on macOS.
- Verifying changes after actions instead of assuming the desktop responded.
- macOS (tested on Apple Silicon)
- Node.js >= 18
- cliclick (brew install cliclick)
- Optional: OmniParser server on port 8650 for screen_parse
MacWright is macOS-only desktop automation. It depends on macOS Accessibility, Screen Recording, AppleScript/JXA, Safari Apple Events, screencapture, pbcopy/pbpaste, and cliclick.
Linux and Windows are not supported targets for the native desktop-control tools. The schema tests may run on other platforms, but real mouse, keyboard, screenshot, window, Safari, AppleScript, and Accessibility behavior requires a configured Mac.
npm test runs portable MCP schema validation. Native desktop smoke coverage is explicit:
npm run test:macos-smoke
Prefer the highest-level reliable interface before falling back to coordinates:
- Safari/web pages: use page_snapshot, find_element, click_element, fill_form, and wait tools.
- Native apps with Accessibility support: use read_ui, then ax_click or ax_action.
- Visual fallback: use screenshot or screen_parse, then click/type with verification.
- Raw input: use mouse, keyboard, and clipboard tools only when semantic tools are unavailable.
- Always verify: use wait_for_ui, wait_for_change, screenshot, screenshotAfterClick, or verifyChange after important actions.
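The preference order above can be read as a small decision function. This is only an illustrative sketch: the Target type, its fields, and chooseControlPath are hypothetical names invented here and are not part of MacWright's API.

```typescript
// Hypothetical description of what the agent is about to act on.
interface Target {
  inSafariPage: boolean;     // the content lives in a Safari web page
  hasAccessibility: boolean; // the native app exposes an Accessibility tree
  canSeeScreen: boolean;     // screenshots are available (Screen Recording granted)
}

type ControlPath = "dom" | "accessibility" | "visual" | "raw-input";

// Walk the preference order from most semantic to least semantic.
function chooseControlPath(t: Target): ControlPath {
  if (t.inSafariPage) return "dom";               // page_snapshot, find_element, click_element, fill_form
  if (t.hasAccessibility) return "accessibility"; // read_ui, then ax_click or ax_action
  if (t.canSeeScreen) return "visual";            // screenshot or screen_parse, then click/type
  return "raw-input";                             // mouse, keyboard, clipboard as a last resort
}

// A native app with an Accessibility tree is driven through the AX tools.
const nativeAppPath = chooseControlPath({
  inSafariPage: false,
  hasAccessibility: true,
  canSeeScreen: true,
});
console.log(nativeAppPath); // accessibility
```

Whatever path is chosen, the last bullet still applies: verify after each important action.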
Coordinates passed to mouse tools are logical macOS screen coordinates. If you are clicking based on a downscaled screenshot returned by MacWright, pass screenshotCoords: true so MacWright scales them back to logical screen coordinates.
git clone https://github.com/ruchit-p/macwright.git
cd macwright
npm run setup
npm run setup handles the local bootstrap: installs cliclick via Homebrew when needed, installs npm dependencies, builds the project, checks macOS permissions, opens System Settings if approvals are missing, and prints ready-to-paste MCP config for Claude Desktop, Hermes Agent, and other stdio MCP clients.
After granting Accessibility or Screen Recording permissions, restart the app that launches MacWright, then run:
./setup.sh --doctor
To install manually instead:
npm install && npm run build
brew install cliclick
Then grant Accessibility permission to your terminal in System Settings > Privacy & Security > Accessibility.
Add MacWright to ~/.hermes/config.yaml and restart Hermes:
mcp_servers:
  macwright:
    command: "node"
    args: ["/absolute/path/to/macwright/dist/index.js"]
    timeout: 120
    connect_timeout: 30
Then ask Hermes to use the mcp_macwright_* tools. On macOS, grant Accessibility and Screen Recording permissions to the app or service that launches Hermes/MacWright, then restart that app after changing permissions.
npm run setup prints a ready-to-copy MCP config with the absolute dist/index.js path.
Paste into ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "macwright": {
      "command": "node",
      "args": ["/absolute/path/to/macwright/dist/index.js"]
    }
  }
}
npm run setup prints this with the correct absolute path filled in.
macOS privacy permissions are enforced by TCC and cannot be bypassed silently by scripts. Grant permissions to the controlling app that launches MacWright, for example Hermes runner, Claude Desktop, Terminal, iTerm, or Ghostty. Restart that app after changing permissions.
| Permission | Needed for | How to grant |
|---|---|---|
| Accessibility | mouse, keyboard, scroll, drag, read_ui, ax_click, window focus/resize, System Events UI reads | System Settings > Privacy & Security > Accessibility |
| Screen Recording | screenshot, screen_parse, wait_for_change, visual verification | System Settings > Privacy & Security > Screen Recording |
| Automation prompts | AppleScript, System Events, app-specific control, Safari automation | macOS prompts when first used; approve the controlling app |
| Input Monitoring | may be requested for keyboard/input events on some systems | System Settings > Privacy & Security > Input Monitoring |
| Safari JavaScript from Apple Events | safari_js and DOM automation helpers | Safari > Settings > Advanced > enable developer features, then Safari > Develop > Allow JavaScript from Apple Events |
Run ./setup.sh --doctor to check the current controlling app. Run ./setup.sh --open-settings to open the relevant Privacy panes.
MacWright can control your desktop and includes powerful tools such as run_shell and run_applescript. Treat it like local admin automation. Do not expose the stdio server over an unauthenticated network bridge, and only connect it to MCP clients you trust. See SECURITY.md.
screenshot — Capture the screen or a region. Returns JPEG by default (131KB vs 11.8MB for PNG).
{ "format": "jpg", "maxWidth": 1280 }
{ "format": "png" }
{ "x": 100, "y": 100, "width": 800, "height": 600 }
Avg: 431ms (JPEG 1280px) | 4110ms (PNG full)
screen_parse — Use OmniParser V2 to detect all UI elements on screen via YOLO + OCR + Florence-2. Returns structured elements with labels, types, and screen coordinates ready for clicking. Use for native app automation or anti-bot websites where DOM tools can't work. Requires OmniParser server on port 8650.
{}
{ "x": 100, "y": 100, "width": 800, "height": 600 }
{ "timeout": 90000 }
Returns elements with screen-pixel coordinates:
{ "elements": [{ "id": 0, "type": "text", "label": "File", "x": 126, "y": 13 },
               { "id": 34, "type": "icon", "label": "Allow", "x": 924, "y": 375, "interactive": true }],
  "total": 220, "textElements": 89, "iconElements": 131, "elapsed": 8.09 }
Avg: 8-20s depending on screen complexity
click — Left click at coordinates. Avg: 284ms.
{ "x": 864, "y": 476 }
{ "x": 864, "y": 476, "screenshot": true }
double_click — Double-click. Avg: 469ms.
right_click — Right-click (context menu). Avg: 280ms.
triple_click — Triple-click (select line/paragraph).
move_mouse — Move cursor without clicking. Supports smooth movement with steps/duration. Avg: 266ms.
drag — Click-drag between two points. Avg: 362ms.
{ "startX": 100, "startY": 100, "endX": 500, "endY": 300 }
get_mouse_position — Returns current cursor position as {"x": N, "y": N}. Avg: 255ms.
hover_and_wait — Move mouse to coordinates or an AX element by label, wait, then screenshot. Great for revealing tooltips.
{ "x": 500, "y": 300, "delay": 1000 }
{ "text": "Wi-Fi", "app": "System Settings" }
inspect_element — Inspect the accessibility element at screen coordinates. Returns role, title, value, and available actions.
All mouse tools accept optional wait (ms) and screenshot (bool) params to chain actions. All support screenshotCoords: true to auto-scale coordinates from screenshot pixels to screen pixels.
scroll — Scroll at screen coordinates using CGEvents. Moves mouse to position first, then scrolls. Target app must be frontmost. CGEvents scroll whatever element is under the cursor — use this for custom overflow containers (divs with overflow:auto/scroll). For page-level scrolling in Safari, prefer scroll_page.
{ "x": 500, "y": 500, "amount": 5 }
{ "x": 500, "y": 500, "amount": -5 }
{ "x": 500, "y": 500, "amount": 3, "horizontalAmount": 2 }
- amount: lines to scroll (positive = down, negative = up)
- horizontalAmount: optional horizontal scroll (positive = right, negative = left)
Requires Accessibility permission for your terminal app.
type_text — Type a string at the current cursor. Avg: 644ms (AppleScript) / 1204ms (cliclick fallback). Unicode/emoji auto-detected and pasted via clipboard.
{ "text": "Hello, World!" }
{ "text": "search query", "slowly": true }
{ "text": "search term", "submit": true, "wait": 1000 }
{ "text": "hello", "field": "Search", "app": "Finder" }
- slowly: type one character at a time (50ms delay); triggers autocomplete/keystroke handlers
- submit: press Enter after typing
- clear: select all (Cmd+A) before typing; replaces existing content
- field: target a specific input field by its accessibility label
- fields: batch mode; fill multiple fields in one call
press_key — Press a key or key combination. Avg: 342–762ms.
{ "key": "a", "modifiers": ["cmd"] }
{ "key": "return" }
{ "key": "tab" }
Named keys: return, tab, esc, delete, space, arrow-up/down/left/right, f1–f16
Modifiers: cmd, ctrl, alt, shift
get_clipboard — Read clipboard text. Supports text, HTML, image, and file formats. Avg: 205ms.
set_clipboard — Write text to clipboard. Avg: 202ms.
{ "text": "https://example.com" }
get_screen_size — Returns screen dimensions, display info, dock position, dark mode status. Avg: 509ms.
get_frontmost_app — Returns frontmost app name, window title, bounds, menus, and focused element. Avg: 340ms.
list_windows — Lists all visible apps with names, titles, and window bounds. Avg: 375ms.
open_app — Launch or activate an app by name. Supports opening files/URLs, waiting for windows. Avg: 236ms.
{ "name": "Safari" }
{ "name": "TextEdit", "url": "/path/to/file.txt" }
focus_window — Bring an app's window to front. Supports title filter and bounds. Avg: 320ms.
{ "app": "Finder" }
resize_window — Resize and/or move an application window. Supports snapping to halves, thirds, quarters.
{ "app": "Safari", "x": 100, "y": 100, "width": 1200, "height": 800 }
{ "app": "Safari", "half": "left" }
{ "app": "Safari", "maximize": true }
close_window — Close an app's window. Supports title filter, close all, and save options.
quit_app — Quit an application. Supports force quit.
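Snapping a window to a half, as resize_window offers, comes down to rectangle arithmetic. The sketch below assumes a display described in logical coordinates and ignores the menu bar and Dock. Bounds and halfBounds are hypothetical names, and MacWright's own computation may differ.

```typescript
// A rectangle in logical screen coordinates.
interface Bounds {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Split the display vertically and return the requested half.
// The right half absorbs the extra pixel when the width is odd.
function halfBounds(display: Bounds, side: "left" | "right"): Bounds {
  const leftWidth = Math.floor(display.width / 2);
  if (side === "left") {
    return { x: display.x, y: display.y, width: leftWidth, height: display.height };
  }
  return {
    x: display.x + leftWidth,
    y: display.y,
    width: display.width - leftWidth,
    height: display.height,
  };
}

// Example display size taken from the test machine mentioned in this README.
const exampleDisplay: Bounds = { x: 0, y: 0, width: 1728, height: 952 };
const leftHalf = halfBounds(exampleDisplay, "left");
const rightHalf = halfBounds(exampleDisplay, "right");
console.log(leftHalf);  // { x: 0, y: 0, width: 864, height: 952 }
console.log(rightHalf); // { x: 864, y: 0, width: 864, height: 952 }
```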
These tools interact with native macOS applications using the Accessibility API — no screenshots or coordinates needed.
read_ui — Read all UI elements (buttons, inputs, checkboxes, menus, etc.) from a native app. Returns labels, roles, coordinates, and state.
{ "app": "System Settings" }
{ "app": "Finder", "role": "button" }
{ "app": "System Settings", "text": "Wi-Fi", "exact": true }
{ "app": "Notes", "interactiveOnly": true }
ax_click — Click a native UI element by its text label. Auto-retries until found.
{ "text": "General", "app": "System Settings" }
{ "text": "Allow", "app": "Safari", "timeout": 5000 }
ax_action — Perform accessibility actions: press, confirm, showMenu, increment, decrement, setValue, pick (for popups/dropdowns), focus, getValue.
{ "text": "Dark Mode", "action": "press", "app": "System Settings" }
{ "text": "Font Size", "action": "pick", "value": "14", "app": "TextEdit" }
{ "text": "Volume", "action": "getValue", "app": "System Settings" }
ax_drag — Drag from one AX element to another by label.
ax_read_table — Read native table/outline data. Supports row search, click/double-click on matched rows.
wait_for_ui — Poll the accessibility tree until an element appears (or disappears with gone: true).
{ "text": "Connected", "app": "System Settings", "timeout": 10000 }
{ "text": "Loading", "app": "Safari", "gone": true }
get_selected_text — Read selected text from any app via AXSelectedText.
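wait_for_ui is described as polling the accessibility tree until an element appears or, with gone: true, disappears. A minimal sketch of that kind of loop follows. The clock and the delay are injected so the logic can be exercised without real waiting; a real client would await a timer instead. pollUntil and PollOptions are hypothetical names, and the 200 ms interval is an assumption.

```typescript
interface PollOptions {
  timeoutMs: number;
  intervalMs: number;
  now: () => number;           // injected clock, in milliseconds
  sleep: (ms: number) => void; // injected delay
}

// Returns true if check() succeeds before the deadline, false on timeout.
function pollUntil(check: () => boolean, opts: PollOptions): boolean {
  const deadline = opts.now() + opts.timeoutMs;
  while (true) {
    if (check()) return true;
    if (opts.now() >= deadline) return false;
    opts.sleep(opts.intervalMs);
  }
}

// Fake clock: sleeping just advances a counter.
let fakeClock = 0;
const fakeTime = {
  now: () => fakeClock,
  sleep: (ms: number) => { fakeClock += ms; },
};

// The element "appears" once 600 ms have elapsed on the fake clock.
const appeared = pollUntil(() => fakeClock >= 600, {
  timeoutMs: 10000,
  intervalMs: 200,
  ...fakeTime,
});
console.log(appeared, fakeClock); // true 600
```

For the gone: true case the caller would pass the negated condition, for example a check that returns true once the element is no longer found.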
click_menu_item — Click a menu bar item by path, list shortcuts, or execute keyboard shortcuts.
{ "menuPath": ["File", "Save"], "app": "TextEdit" }
{ "menuPath": ["Edit"], "list": true }
dismiss_sheet — Detect and dismiss modal sheets/dialogs.
open_spotlight — Open Spotlight, type a query, and launch apps.
navigate_file_dialog — Navigate macOS file open/save dialogs via Cmd+Shift+G.
navigate_system_pref — Open and navigate to a System Settings section.
safari_url — Navigate Safari to a URL or get the current URL. No Accessibility needed. Avg: 331–372ms. Supports waitUntil: 'load' or 'domcontentloaded' for reliable page load waiting (polls readyState, 15s max).
{ "url": "https://example.com", "waitUntil": "load", "screenshot": true }
{ "url": "https://example.com", "wait": 1000, "screenshot": true }
{}
safari_js — Execute JavaScript in the current Safari tab. Requires "Allow JavaScript from Apple Events" in Safari Developer settings. Multi-line code with return is auto-wrapped in an IIFE. Objects/arrays auto-serialized as JSON. Avg: ~350ms.
{ "code": "document.title" }
{ "code": "window.scrollY" }
{ "code": "(function(){ const x = 42; return x; })()" }
{ "code": "document.body.textContent", "iframe": "#editor_iframe" }
- iframe: CSS selector of a same-origin iframe to execute code inside
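The note that multi-line code containing return is wrapped in an IIFE can be pictured with a small helper. The detection rule below (more than one line plus the word return) is an assumed heuristic for illustration, and wrapForEval is a hypothetical name. The actual check inside safari_js is not documented here.

```typescript
// Wrap a snippet in an immediately invoked function expression when it
// spans several lines and contains a return statement; otherwise pass it through.
function wrapForEval(code: string): string {
  const isMultiLine = code.trim().indexOf("\n") !== -1;
  const hasReturn = /\breturn\b/.test(code);
  if (isMultiLine && hasReturn) {
    return "(function () {\n" + code + "\n})()";
  }
  return code;
}

// A single expression is left untouched.
console.log(wrapForEval("document.title")); // document.title

// A multi-line body with return becomes a callable expression.
const wrapped = wrapForEval("const x = 42;\nreturn x;");
console.log(wrapped);
```

Without the wrapper, a top-level return is a syntax error when the snippet is evaluated as a script, which is why the wrapping matters.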
safari_navigate_back — Go back in Safari history (equivalent to pressing the back button).
safari_navigate_forward — Go forward in Safari history (Cmd+]).
safari_tabs — Manage Safari tabs: list, open, close, or switch.
{ "action": "list" }
{ "action": "new", "url": "https://example.com" }
{ "action": "select", "index": 2 }
{ "action": "close" }
safari_reload — Reload the current Safari page. Equivalent to Cmd+R.
scroll_page — Scroll the current Safari page. Focus-free — works without making Safari frontmost. Returns new scroll position. Prefer over scroll for web page content.
{ "y": 500 }
{ "x": 300, "y": 0 }
{ "y": 0, "absolute": true }
- absolute: use window.scrollTo() instead of scrollBy() to scroll to an exact position
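The difference between relative and absolute scrolling can be sketched as a pure function. This models only the vertical axis and the browser's clamping to the scrollable range; nextScrollY and maxY are hypothetical names used for illustration.

```typescript
// Compute the resulting vertical scroll position.
// Relative mode adds y to the current position (scrollBy semantics);
// absolute mode jumps to y (scrollTo semantics).
// The result is clamped to [0, maxY], as a browser does.
function nextScrollY(currentY: number, y: number, absolute: boolean, maxY: number): number {
  const target = absolute ? y : currentY + y;
  return Math.min(Math.max(target, 0), maxY);
}

console.log(nextScrollY(300, 500, false, 4000)); // 800
console.log(nextScrollY(300, 0, true, 4000));    // 0
console.log(nextScrollY(3900, 500, false, 4000)); // 4000
```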
find_element — Resolve a CSS selector to screen coordinates using the viewport formula. Returns center point ready for click().
{ "selector": "[name='custname']" }
{ "selector": "button.submit", "all": true }
{ "selector": "a", "href": "/issues" }
click_element — Click a DOM element by CSS selector in one step (scrolls into view, then clicks). Supports auto-waiting.
{ "selector": "a.btn-primary", "wait": 1000 }
{ "selector": "button", "doubleClick": true }
{ "selector": "a[data-hovercard-type='issue']", "jsClick": true }
{ "selector": "#finish", "timeout": 10000 }
- timeout: auto-wait for element to appear before clicking (polls every 200ms)
- doubleClick: double-click instead of single click
- modifiers: hold modifier keys (Alt, Control, Meta, Shift)
- jsClick: use DOM .click() instead of a native mouse click; needed for SPAs (GitHub, React apps) where native clicks don't trigger JS navigation
- href: filter by URL pattern (substring match on href attribute)
click_text — Click a visible element by its text content. No CSS selector needed. Supports auto-waiting.
{ "text": "Submit" }
{ "text": "Sign in", "elementType": "button" }
{ "text": "Exact match", "exact": true }
{ "text": "Issues", "jsClick": true }
hover_element — Move mouse over a DOM element without clicking. Triggers CSS :hover states, dropdown menus, tooltips.
wait_for_element — Wait until a CSS selector appears in the DOM. Useful after navigation or AJAX calls.
wait_for_text — Wait until specific text appears (or disappears with gone: true) on the page.
wait_for_url — Wait until the current URL contains (or stops containing with gone:true) a pattern.
wait_for_function — Wait until a JavaScript expression returns truthy. Most flexible wait tool.
get_page_text — Extract clean visible text from the page (strips scripts/styles/nav). Optionally scoped to a CSS selector.
fill_form — Fill multiple form fields at once. Supports text inputs, textareas, checkboxes, radio buttons, and select dropdowns. Triggers input/change events for React/Vue/Angular compatibility.
{
  "fields": [
    { "selector": "[name='custname']", "value": "Alice" },
    { "selector": "[name='size']", "value": "large", "type": "select" },
    { "selector": "[name='topping']", "value": "true", "type": "checkbox" }
  ]
}
get_element_attr — Get a property or attribute of a DOM element.
screenshot_element — Take a cropped screenshot of a specific DOM element by CSS selector.
get_page_info — Get the current page's title, URL, and document ready state in one call.
get_form_fields — Discover all form fields on the page with their types, names, and current values.
get_links — Extract all links from the current page with their text and URLs.
is_element_visible — Check whether an element exists AND is visible in the current viewport.
scroll_to_element — Scroll the page until an element is visible using scrollIntoView().
focus_element — Focus a DOM element without clicking (triggers focus event).
page_snapshot — Get a comprehensive snapshot of the current Safari page in one call: URL, title, h1, meta description, visible text preview, link count, form field count, image count, scroll position, viewport size, page height, and heading outline (h1-h3).
scroll_element — Scroll a specific DOM element (overflow container) in Safari.
drag_element — Drag one DOM element to another in Safari using CSS selectors.
get_table_data — Extract data from an HTML table as JSON rows keyed by header text.
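"JSON rows keyed by header text" means each body row becomes an object whose keys are the column headers. A minimal sketch of that mapping, assuming the header and body cells have already been read as strings (rowsToRecords is a hypothetical name, not MacWright's implementation):

```typescript
// Turn a header row plus body rows into objects keyed by header text.
// Missing cells become empty strings.
function rowsToRecords(headers: string[], rows: string[][]): Record<string, string>[] {
  return rows.map((cells) => {
    const record: Record<string, string> = {};
    headers.forEach((header, i) => {
      record[header.trim()] = (cells[i] ?? "").trim();
    });
    return record;
  });
}

const records = rowsToRecords(
  ["Tool", "Avg Time"],
  [
    ["click", "284ms"],
    ["drag", "362ms"],
  ]
);
console.log(JSON.stringify(records));
// [{"Tool":"click","Avg Time":"284ms"},{"Tool":"drag","Avg Time":"362ms"}]
```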
run_shell — Execute a shell command and return stdout/stderr/exit code. Much faster than Terminal+clipboard. Homebrew PATH is included (jq, python3, git, node all available).
{ "command": "git log --oneline -10", "cwd": "/path/to/repo" }
{ "command": "jq '.field' file.json" }
{ "command": "python3 -c 'import json; ...'", "timeout": 10000 }
Non-zero exit codes are shown in output text but do NOT set isError (grep/diff/test return 1 legitimately). Only timeouts set isError.
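The exit-code rule above can be expressed as a small mapping from a finished command to a tool result. The ShellOutcome and ToolResult shapes and the output wording are assumptions made for this sketch. Only the rule itself (non-zero exit is reported in the text, and only a timeout sets isError) comes from the description.

```typescript
// Hypothetical shape of a finished shell command.
interface ShellOutcome {
  exitCode: number;
  timedOut: boolean;
  stdout: string;
  stderr: string;
}

// Hypothetical shape of what the tool hands back to the MCP client.
interface ToolResult {
  text: string;
  isError: boolean;
}

function toToolResult(outcome: ShellOutcome): ToolResult {
  // A timeout is the only condition reported as an error.
  if (outcome.timedOut) {
    return { text: "Command timed out", isError: true };
  }
  const parts = [outcome.stdout];
  if (outcome.stderr !== "") parts.push("stderr: " + outcome.stderr);
  // A non-zero exit code is surfaced in the text but is not an error.
  if (outcome.exitCode !== 0) parts.push("exit code: " + outcome.exitCode);
  return { text: parts.join("\n"), isError: false };
}

// grep with no match exits with 1, which is a legitimate result.
const grepMiss = toToolResult({ exitCode: 1, timedOut: false, stdout: "", stderr: "" });
console.log(grepMiss.isError); // false
```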
run_applescript — Run AppleScript or JavaScript for Automation (JXA). Avg: 228ms.
{ "script": "return \"hello\"" }
{ "script": "Application('Finder').name()", "language": "JavaScript" }
Multiline scripts use a temp file approach to avoid escaping issues.
wait — Pause for N milliseconds. ~190ms server overhead.
{ "ms": 1000 }
wait_for_change — Take a baseline screenshot and poll until the screen changes. Useful for waiting on animations or loading states.
{ "app": "Safari", "timeout": 10000 }
{ "app": "Finder", "stable": true }
- stable: wait until the screen stops changing (two consecutive identical screenshots)
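The stable option's rule, two consecutive identical screenshots, can be sketched over a sequence of frame fingerprints. How MacWright compares screenshots is not stated here. The sketch assumes each frame has been reduced to a hash string, and firstStableFrame is a hypothetical name.

```typescript
// Return the index of the first frame that is identical to the one before it,
// or -1 if the sequence never settles.
function firstStableFrame(frameHashes: string[]): number {
  for (let i = 1; i < frameHashes.length; i++) {
    if (frameHashes[i] === frameHashes[i - 1]) return i;
  }
  return -1;
}

// An animation that changes twice and then settles on the fourth capture.
console.log(firstStableFrame(["a1", "b7", "c3", "c3"])); // 3
console.log(firstStableFrame(["a1", "b7", "c3"]));       // -1
```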
send_notification — Show a macOS notification banner. No permissions needed.
{ "title": "Done", "message": "Task complete", "sound": true }
MacWright uses logical pixels (not physical pixels). On a Retina display the logical size is smaller than the panel's physical resolution; the test machine, for example, reports 1728x952.
Screenshots default to 1280px wide. To pass screenshot coordinates directly to tools, use screenshotCoords: true — this auto-scales from screenshot pixels to screen pixels. No manual math needed.
{ "x": 640, "y": 400, "screenshotCoords": true }
When combined with app, uses window-based scaling for higher accuracy.
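The arithmetic behind screenshotCoords: true is a proportional scale from screenshot pixels to logical screen coordinates. A minimal sketch, assuming the screenshot keeps the screen's aspect ratio so one factor applies to both axes (Point and screenshotToScreen are hypothetical names, and MacWright's own rounding may differ):

```typescript
interface Point {
  x: number;
  y: number;
}

// Scale a point picked on a downscaled screenshot back to logical screen coordinates.
function screenshotToScreen(p: Point, screenshotWidth: number, screenWidth: number): Point {
  const scale = screenWidth / screenshotWidth;
  return { x: Math.round(p.x * scale), y: Math.round(p.y * scale) };
}

// A 1280px-wide screenshot of a 1728-wide logical screen has a factor of 1.35.
const scaled = screenshotToScreen({ x: 640, y: 400 }, 1280, 1728);
console.log(scaled); // { x: 864, y: 540 }
```

Passing screenshotCoords: true lets the server do this conversion, so the manual version is only needed when reasoning about where a click will land.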
// Fill and submit a web form
safari_url({ "url": "https://example.com/form", "wait": 2000 })
page_snapshot() // understand the page
get_form_fields() // discover all form fields
fill_form({ "fields": [
  { "selector": "[name='email']", "value": "user@example.com" },
  { "selector": "[name='plan']", "value": "Pro", "type": "select" },
  { "selector": "[name='agree']", "value": "true", "type": "checkbox" }
]})
click_element({ "selector": "button[type='submit']", "wait": 2000 })
wait_for_url({ "pattern": "/success" })
get_page_text({ "selector": ".confirmation" })

// Extract link titles from a page
safari_url({ "url": "https://news.ycombinator.com", "wait": 2000 })
safari_js({ "code": "Array.from(document.querySelectorAll('.titleline a')).map(a => ({title: a.textContent, url: a.href}))" })
// Process with shell:
run_shell({ "command": "python3 -c 'import json,sys; data=json.load(sys.stdin); [print(f\"{i+1}.\", d[\"title\"]) for i,d in enumerate(data)]'", "input": "<output from safari_js>" })

// Sign in on a web page
safari_url({ "url": "https://example.com", "wait": 2000 })
click_text({ "text": "Sign in" })
wait_for_element({ "selector": "#login-form" })
fill_form({ "fields": [{ "selector": "#email", "value": "user@test.com" }] })
click_text({ "text": "Continue", "elementType": "button" })

// Open the Wi-Fi pane in System Settings
open_app({ "name": "System Settings" })
ax_click({ "text": "Wi-Fi", "app": "System Settings" })
wait_for_ui({ "text": "Wi-Fi", "app": "System Settings" })
read_ui({ "app": "System Settings" })

// Inspect the Displays pane
navigate_system_pref({ "section": "Displays" })
read_ui({ "app": "System Settings", "interactiveOnly": true })
ax_click({ "text": "Resolution", "app": "System Settings" })
| Tool | Avg Time | Notes |
|---|---|---|
| screenshot (JPEG 1280px) | 431ms | ~131KB — recommended default |
| screenshot (PNG full res) | 4110ms | ~11.8MB — use only when needed |
| get_screen_size | 509ms | JXA NSScreen |
| get_frontmost_app | 340ms | AppleScript |
| list_windows | 375ms | AppleScript visible apps |
| get_clipboard | 205ms | pbpaste |
| set_clipboard | 202ms | pbcopy via spawn |
| click | 284ms | cliclick, 10-click avg |
| move_mouse | 266ms | cliclick |
| double_click | 469ms | cliclick |
| right_click | 280ms | cliclick |
| drag | 362ms | cliclick |
| type_text | 329–644ms | AppleScript primary, cliclick fallback |
| press_key | 342–762ms | AppleScript primary, cliclick fallback |
| scroll | 346ms | cliclick + JXA CGEvent |
| open_app | 236ms | /usr/bin/open |
| focus_window | 320ms | AppleScript activate |
| run_applescript | 228ms | osascript |
| get_mouse_position | 255ms | cliclick p: |
| wait | ~190ms overhead | setTimeout + MCP round-trip |
| safari_url | 331–372ms | AppleScript |
| safari_js | ~350ms | Needs Safari Developer setting |
src/
├── index.ts # MCP server entry point
└── tools/
    ├── screenshot.ts # screenshot, screen_parse
    ├── screen.ts # get_screen_size
    ├── mouse.ts # click, double_click, right_click, triple_click, move_mouse, drag, get_mouse_position, hover_and_wait, inspect_element
    ├── keyboard.ts # type_text, press_key
    ├── scroll.ts # scroll
    ├── clipboard.ts # get_clipboard, set_clipboard
    ├── shell.ts # run_shell
    ├── applescript.ts # run_applescript
    ├── notification.ts # send_notification
    ├── wait.ts # wait, wait_for_change
    ├── window/
    │   ├── management.ts # get_frontmost_app, list_windows, get_screen_size helpers
    │   ├── window-actions.ts # open_app, focus_window, resize_window, close_window, quit_app
    │   ├── dialog.ts # dismiss_sheet, navigate_file_dialog, open_spotlight
    │   ├── system.ts # click_menu_item, navigate_system_pref
    │   ├── native-ui.ts # wait_for_ui, ax_click, ax_action, ax_drag
    │   └── read-ui.ts # read_ui, ax_read_table, get_selected_text
    └── safari/
        ├── navigation.ts # safari_url, safari_navigate_back/forward, safari_tabs, safari_reload
        ├── elements.ts # find_element, click_element, click_text, hover_element, focus_element, scroll_to_element
        ├── waiting.ts # wait_for_element, wait_for_text, wait_for_url, wait_for_function
        ├── forms.ts # fill_form, get_form_fields
        ├── page-reading.ts # get_page_text, get_page_info, get_links, get_element_attr, is_element_visible, get_table_data
        ├── page-snapshot.ts # page_snapshot, screenshot_element
        ├── scrolling.ts # scroll_page, scroll_element, drag_element
        └── tabs.ts # safari_tabs
npm test # portable schema/tool-catalog validation
npm run test:macos-smoke # native desktop smoke suite on a configured Mac
All tools verified on Apple Silicon Mac (M-series), screen resolution 1728x952. Safari-specific tools require "Allow JavaScript from Apple Events" in Safari developer settings.
- Fork the repository
- Create a feature branch
- Make your changes in src/tools/
- Run npm run build to compile
- Run npm test to verify the MCP schema/tool catalog
- Run npm run test:macos-smoke when native desktop behavior changed
- Submit a pull request