mac-cua

The computer use agent that doesn't take over your computer.
An open-source MCP server for macOS that lets AI control desktop apps
in the background — without ever touching your mouse or stealing your focus.

Quickstart · Background-First · Tools · Setup · Architecture

Demo

Task: Search Google for the population of Tokyo, calculate what percentage that is of the world population (8.1 billion), write the result in Notes, then open the Wikipedia page for Tokyo in Safari and find the mayor's name. Only CUA tools — no other tools allowed.

Result from Claude:

Searched Google for "population of Tokyo" — found approximately 14.1–14.2 million residents

Calculated the percentage: 14.15 million / 8.1 billion = 0.17% of the world's population

Wrote the result in Notes — created a new note titled "Tokyo Population vs World Population"

Opened the Wikipedia page for Tokyo in Safari and found that the leader of Tokyo is Governor Yuriko Koike (Tokyo has a Governor rather than a mayor, since it's officially the Tokyo Metropolitan Prefecture)

Most computer use agents take over your screen. They grab your cursor, steal window focus, and lock you out while they work. You sit and watch.

mac-cua works differently. It sends input events directly to target processes using CGEventPostToPid — a macOS API that delivers clicks, keystrokes, and gestures to a specific app without moving your cursor or activating any window. The AI works in the background. You keep working in the foreground. At the same time. On the same machine.

OpenAI and Perplexity both shipped computer use agents in April 2026, locked to their platforms and behind paywalls. mac-cua offers the same capability as an open MCP tool. Plug it into Claude Code, Cursor, Codex, or any MCP client. Free, open source, Apache 2.0.

Background-First

This is the core idea behind mac-cua, and it influences every design decision.

  Traditional computer use agent:            mac-cua:

  +----------------------------------+       +----------------------------------+
  |  YOUR SCREEN                     |       |  YOUR SCREEN                     |
  |                                  |       |                                  |
  |  +----------------------------+  |       |  +----------------------------+  |
  |  |                            |  |       |  |                            |  |
  |  |  [Agent controls this]     |  |       |  |  You're working here.      |  |
  |  |  You're locked out.        |  |       |  |  Writing code, browsing,   |  |
  |  |  Cursor hijacked.          |  |       |  |  whatever you want.        |  |
  |  |  Focus stolen.             |  |       |  |                            |  |
  |  |  Don't touch anything.     |  |       |  |  Your cursor. Your focus.  |  |
  |  |                            |  |       |  |                            |  |
  |  +----------------------------+  |       |  +----------------------------+  |
  |                                  |       |                                  |
  |  Cursor: [Agent's]               |       |  Meanwhile, in the background:   |
  |  Focus:  [Agent's]               |       |  mac-cua clicks, types, scrolls  |
  |  You:    Watching.               |       |  in Safari, Music, Finder...     |
  +----------------------------------+       +----------------------------------+

How it stays invisible

What	How
Mouse clicks	`CGEventPostToPid` sends click events to the target PID. Your cursor doesn't move.
Keyboard input	Key events are posted to the target process, not the global event stream.
Window focus	mac-cua reads window state without activating windows. Temporary activation happens only when strictly required (e.g., key-window targeting) and is immediately released.
Screenshots	GPU-accelerated `ScreenCaptureKit` captures specific windows by ID — works even if the window is behind other windows.
AX tree reads	Accessibility API queries are read-only and non-intrusive. They don't trigger any visual changes.
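
As a rough illustration of the process-targeted delivery described in the table above, the sketch below posts one left click to a single PID through PyObjC's Quartz bindings. The helper names `click_plan` and `post_click_to_pid` are hypothetical and are not taken from mac-cua's `input.py`. The Quartz import is deferred so the snippet can at least be loaded on a machine without PyObjC.

```python
from typing import List, Tuple

# Illustrative only: these helper names are assumptions, not mac-cua's API.


def click_plan(x: float, y: float) -> List[Tuple[str, Tuple[float, float]]]:
    """Return the ordered (event_name, point) pairs that make up one left click."""
    point = (float(x), float(y))
    return [("kCGEventLeftMouseDown", point), ("kCGEventLeftMouseUp", point)]


def post_click_to_pid(pid: int, x: float, y: float) -> None:
    """Post a left click to one process instead of the global event stream.

    Needs macOS with pyobjc-framework-Quartz and the Accessibility permission.
    """
    import Quartz  # deferred: only importable on macOS with PyObjC installed

    for event_name, point in click_plan(x, y):
        event = Quartz.CGEventCreateMouseEvent(
            None,                         # no explicit event source
            getattr(Quartz, event_name),  # mouse-down first, then mouse-up
            point,                        # global display coordinates
            Quartz.kCGMouseButtonLeft,
        )
        # Process-targeted post: the event is delivered to `pid` only.
        Quartz.CGEventPostToPid(pid, event)
```

Keeping the event plan separate from the posting call makes the ordering easy to check without touching the window server.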

A note on focus

Most operations are fully invisible, but a few macOS APIs have limitations that can cause a brief focus flash:

Launching an app — macOS activates apps when they start; mac-cua yields focus back immediately
Scroll events — some apps require momentary focus to receive scroll input
Key-window targeting — certain actions need the window to be key window briefly

These flashes are sub-second and mac-cua restores your previous focus automatically. The vast majority of interactions — clicks, typing, value setting, screenshots, tree reads — are completely invisible.
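
The save-and-restore behaviour described above can be pictured as a small context manager. This is a minimal sketch under stated assumptions: `preserve_focus` and `FakeDesktop` are hypothetical names, and the two callables stand in for whatever mac-cua's `focus.py` actually uses to read and set the frontmost app.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


@contextmanager
def preserve_focus(
    get_frontmost: Callable[[], Optional[str]],
    activate: Callable[[str], None],
) -> Iterator[None]:
    """Remember the frontmost app, run the block, then hand focus back."""
    previous = get_frontmost()
    try:
        yield
    finally:
        # Restore only if there was a frontmost app and focus actually moved.
        if previous is not None and get_frontmost() != previous:
            activate(previous)


class FakeDesktop:
    """In-memory stand-in for the window server, used only for this demo."""

    def __init__(self, frontmost: str) -> None:
        self.frontmost = frontmost

    def get_frontmost(self) -> str:
        return self.frontmost

    def activate(self, bundle_id: str) -> None:
        self.frontmost = bundle_id


desktop = FakeDesktop("com.example.editor")
with preserve_focus(desktop.get_frontmost, desktop.activate):
    desktop.activate("com.apple.Safari")  # simulated launch that takes focus
print(desktop.frontmost)
```

Putting the restore step in `finally` means focus is returned even if the wrapped action raises.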

What this means in practice

You can browse the web while mac-cua fills out a form in another app
You can write code while mac-cua navigates System Settings to change a preference
You can be in a video call while mac-cua organizes files in Finder
The agent never interrupts you. If a conflict arises, you win — mac-cua detects user interruption and backs off

Why mac-cua?

	Codex CUA	Perplexity Computer	mac-cua
Cost	$20–200/mo (ChatGPT tier)	$200/mo (Max only)	Free
Source	Closed	Closed	Open (Apache 2.0)
LLM	GPT only	Perplexity-routed	Any model
Protocol	Proprietary (in-app)	Proprietary (in-app)	MCP (open standard)
Integration	Codex app only	Perplexity app only	Claude Code, Cursor, VS Code, Codex, Zed, any MCP client
Background mode	Yes (virtual cursor)	Unknown	Yes (CGEventPostToPid)
Accessibility API	Yes (AX tree + screenshots)	Screenshots + AppleScript	Yes (AX tree + screenshots)
Platform	macOS only	macOS only	macOS
Availability	Not in EU/UK/CH	Waitlist (Max subscribers)	Everyone, everywhere

Quickstart

Prerequisites

macOS 13+ (Ventura or later)
Python 3.13+
uv package manager

Install

git clone https://github.com/hyprcat/mac-cua.git
cd mac-cua
uv sync

Run

uv run python main.py

On first launch, macOS will prompt for two permissions:

Permission	Why
Accessibility	Read UI element trees and perform actions on elements
Screen Recording	Capture window screenshots without activating windows

Grant both, and the MCP server starts on stdio — ready for your AI tool to connect.

Setup Your AI Tool

mac-cua is a standard MCP stdio server. It works with any tool that supports the Model Context Protocol — no plugins, no extensions, just config.

Note: Replace /path/to/mac-cua with the actual path where you cloned the repo.

Claude Code

Option A — CLI command (recommended):

claude mcp add mac-cua -- uv run --directory /path/to/mac-cua python main.py

Option B — Manual config in ~/.claude.json or project .mcp.json:

{
  "mcpServers": {
    "mac-cua": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mac-cua", "python", "main.py"]
    }
  }
}

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "mac-cua": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mac-cua", "python", "main.py"]
    }
  }
}

Restart Claude Desktop after saving.

Cursor

Option A — Project-level: Create .cursor/mcp.json in your project root:

{
  "mcpServers": {
    "mac-cua": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mac-cua", "python", "main.py"]
    }
  }
}

Option B — Global: Create ~/.cursor/mcp.json with the same content.

VS Code (GitHub Copilot)

Create .vscode/mcp.json in your project root:

{
  "servers": {
    "mac-cua": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mac-cua", "python", "main.py"]
    }
  }
}

Requires the GitHub Copilot extension with MCP support enabled.

Windsurf

Open Windsurf Settings > MCP and add:

{
  "mcpServers": {
    "mac-cua": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mac-cua", "python", "main.py"]
    }
  }
}

Or edit ~/.codeium/windsurf/mcp_config.json directly.

Codex (OpenAI CLI)

Create or edit ~/.codex/config.toml:

[mcp_servers.mac-cua]
command = "uv"
args = ["run", "--directory", "/path/to/mac-cua", "python", "main.py"]

Amp

Create .amp/mcp.json in your project root (or ~/.amp/mcp.json globally):

{
  "mcpServers": {
    "mac-cua": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mac-cua", "python", "main.py"]
    }
  }
}

Zed

Add to your Zed settings.json (Zed > Settings > Open Settings):

{
  "context_servers": {
    "mac-cua": {
      "command": {
        "path": "uv",
        "args": ["run", "--directory", "/path/to/mac-cua", "python", "main.py"]
      }
    }
  }
}

Cline (VS Code Extension)

Open Cline settings in VS Code, navigate to MCP Servers, and add:

{
  "mcpServers": {
    "mac-cua": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mac-cua", "python", "main.py"]
    }
  }
}

Any other MCP client

mac-cua is a standard MCP stdio server. Point your client at:

Command:  uv
Args:     run --directory /path/to/mac-cua python main.py
Protocol: stdio

No API keys, no accounts, no network calls. It runs locally on your Mac.
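
Since most of the JSON-based clients above share the same `mcpServers` block, a short script can generate it for your clone path. The function name `mcp_server_entry` is made up for this example and is not part of mac-cua.

```python
import json
from pathlib import Path


def mcp_server_entry(repo_path: str) -> dict:
    """Build the `mcpServers` block used by the JSON-based clients above."""
    # Expand "~" so the client receives a full path to the clone.
    directory = str(Path(repo_path).expanduser())
    return {
        "mcpServers": {
            "mac-cua": {
                "command": "uv",
                "args": ["run", "--directory", directory, "python", "main.py"],
            }
        }
    }


config = mcp_server_entry("/path/to/mac-cua")
print(json.dumps(config, indent=2))
```

Clients that use a different top-level key (VS Code uses `servers`, Zed uses `context_servers`) need the block adapted by hand.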

How It Works

mac-cua reads apps through two complementary channels and acts through background-targeted input:

                        +-----------------------+
                        |      LLM Client       |
                        |  (Claude, GPT, etc.)  |
                        +-----------+-----------+
                                    |
                              MCP (stdio)
                                    |
                        +-----------+-----------+
                        |     mac-cua Server    |
                        +-----------+-----------+
                                    |
                    +---------------+---------------+
                    |                               |
          +---------+---------+           +---------+---------+
          |   Accessibility   |           |    Screenshots    |
          |    API (AXTree)   |           | (ScreenCaptureKit)|
          +---------+---------+           +---------+---------+
                    |                               |
          Structured element              Visual pixel-level
          tree with roles,                context via GPU-
          states, actions                 accelerated window
          (read-only, non-               capture (works even
           intrusive)                     behind other windows)
                    |                               |
                    +---------------+---------------+
                                    |
                        +-----------+-----------+
                        |   Background Input    |
                        |   CGEventPostToPid    |
                        |                       |
                        |  Your cursor: unmoved |
                        |  Your focus: untouched|
                        +-----------------------+

Every tool call returns a fresh snapshot — the accessibility tree and a screenshot together — so the LLM always sees the current state before deciding what to do next.
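
One way to picture the fresh-snapshot model, together with the snapshot-local indices listed under Key Design Decisions, is a session that rejects any index tied to an older snapshot. The classes below are a hypothetical sketch, not the contents of `app/session.py`.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional

_snapshot_ids = itertools.count(1)


class StaleSnapshotError(Exception):
    """An element index was used with a snapshot that is no longer current."""


@dataclass
class Snapshot:
    elements: List[str]
    snapshot_id: int = field(default_factory=lambda: next(_snapshot_ids))


class Session:
    def __init__(self) -> None:
        self.current: Optional[Snapshot] = None

    def capture(self, elements: List[str]) -> Snapshot:
        """Replace the current snapshot; older indices become invalid."""
        self.current = Snapshot(list(elements))
        return self.current

    def resolve(self, snapshot_id: int, index: int) -> str:
        """Look up an element, but only within the snapshot that indexed it."""
        if self.current is None or snapshot_id != self.current.snapshot_id:
            raise StaleSnapshotError(f"snapshot {snapshot_id} is stale")
        return self.current.elements[index]


session = Session()
first = session.capture(["AXTextField 'Address'", "AXButton 'Reload'"])
print(session.resolve(first.snapshot_id, 0))

# The UI changed, so index 0 now refers to a different element.
second = session.capture(["AXButton 'Back'", "AXTextField 'Address'"])
try:
    session.resolve(first.snapshot_id, 0)
except StaleSnapshotError as exc:
    print("rejected:", exc)
```

Failing loudly on a stale index is safer than silently clicking whatever element now occupies that position.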

Tools

Nine MCP tools cover the full range of desktop interaction, all operating in the background.

Discovery

Tool	Description
`list_apps`	List running and recently-used apps with bundle IDs and usage stats
`get_app_state`	Capture a window's accessibility tree + screenshot. Called each turn before interaction.

Interaction

Tool	Description
`click`	Click by element index or pixel coordinates. Supports double-click, right-click. All clicks are background-targeted.
`type_text`	Type literal text via background keyboard input; keys go to the target process, not your focused app
`press_key`	Send key combos in xdotool syntax (`super+c`, `Return`, `Tab`) to a specific process
`set_value`	Directly set an accessibility element's value; no focus or typing needed
`scroll`	Scroll a specific element by direction and page count
`drag`	Drag between two pixel coordinates
`perform_secondary_action`	Invoke non-primary AX actions (expand, collapse, zoom, raise)
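
To make the xdotool-style combos accepted by `press_key` more concrete, here is a small sketch of how such a string could be split into a modifier mask and a key code. The modifier masks and virtual key codes are the standard macOS values, but the table is deliberately tiny, mapping `super` to the Command key is an assumption based on the `super+c` example, and `parse_combo` is a hypothetical name rather than the API of `keys.py`.

```python
from typing import Tuple

# CGEventFlags modifier masks as defined in Apple's Core Graphics headers.
MODIFIER_FLAGS = {
    "shift": 1 << 17,  # kCGEventFlagMaskShift
    "ctrl": 1 << 18,   # kCGEventFlagMaskControl
    "alt": 1 << 19,    # kCGEventFlagMaskAlternate
    "super": 1 << 20,  # kCGEventFlagMaskCommand (assumed mapping for "super")
}

# A handful of macOS virtual key codes; a real table would be far larger.
KEY_CODES = {"c": 8, "v": 9, "return": 36, "tab": 48, "space": 49, "escape": 53}


def parse_combo(combo: str) -> Tuple[int, int]:
    """Split an xdotool-style combo into (modifier_flags, key_code)."""
    *modifiers, key = combo.split("+")
    flags = 0
    for name in modifiers:
        flags |= MODIFIER_FLAGS[name.lower()]  # KeyError on unknown modifier
    return flags, KEY_CODES[key.lower()]


print(parse_combo("super+c"))
print(parse_combo("Return"))
```

Unknown names raise `KeyError` here; a fuller implementation would likely report them as a typed error instead.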

Reliability Hierarchy

When multiple tools could accomplish the same thing, prefer them in this order:

  Most reliable                                          Least reliable
  +-------------------+------------------+-----------------+------------------+
  | AX secondary      | set_value        | click by        | click by         |
  | action            |                  | element         | coordinates      |
  +-------------------+------------------+-----------------+------------------+
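
The ordering in the diagram can be expressed as a simple fallback chain. This is a sketch only: the `Element` fields and the `choose_tool` helper are hypothetical, and the tool names come from the tables above.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Element:
    index: Optional[int] = None  # None when the AX tree does not expose it
    secondary_actions: Set[str] = field(default_factory=set)
    settable: bool = False


def choose_tool(
    element: Element,
    wanted_action: Optional[str] = None,
    new_value: Optional[str] = None,
) -> str:
    """Pick the most reliable tool that can satisfy the request."""
    if element.index is not None:
        if wanted_action in element.secondary_actions:
            return "perform_secondary_action"
        if new_value is not None and element.settable:
            return "set_value"
        return "click (element index)"
    # No usable AX element: fall back to pixel coordinates.
    return "click (coordinates)"


disclosure = Element(index=4, secondary_actions={"expand", "collapse"})
url_field = Element(index=12, settable=True)
canvas_spot = Element()

print(choose_tool(disclosure, wanted_action="expand"))
print(choose_tool(url_field, new_value="https://example.com"))
print(choose_tool(canvas_spot))
```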

Example Workflow

Here's what a typical interaction looks like. Notice: every step happens in the background.

# 1. Discover what's running
list_apps()
# => Safari (running), Music (running), Finder (running), ...

# 2. Get the current state of Safari (screenshot + AX tree)
get_app_state(app="Safari")
# => You don't even see Safari activate. mac-cua reads it silently.

# 3. Click the URL bar (element 12 from the tree)
click(app="Safari", element_index="12")
# => Click delivered to Safari's process. Your cursor didn't move.

# 4. Set the URL
set_value(app="Safari", element_index="12", value="https://example.com")
# => Value set directly via AX API. No typing animation. No focus change.

# 5. Press Enter
press_key(app="Safari", key="Return")
# => Key event sent to Safari. You didn't feel a thing.

# 6. Verify it worked
get_app_state(app="Safari")
# => Fresh screenshot shows the page loaded. All in the background.

Architecture

Three clean layers. No framework magic.

Layer 1 ─ MCP Protocol         app/server.py         Thin. Validates, delegates, formats.
Layer 2 ─ Session Manager       app/session.py        Per-app lifecycle, snapshots, recovery.
Layer 3 ─ Platform Backend      app/_lib/             One module per macOS subsystem.

Platform Backend Modules

Module	Responsibility
`accessibility.py`	AX tree walking, batch attribute reads, element actions
`screenshot.py`	`CGWindowListCreateImage`, window ID resolution
`screen_capture.py`	GPU-accelerated `ScreenCaptureKit` capture
`input.py`	`CGEventPostToPid` — background mouse, keyboard, typing
`apps.py`	`NSWorkspace` app discovery, launch, PID/AX caching
`focus.py`	Focus tracking, user interruption detection, conflict resolution
`virtual_cursor.py`	Background cursor, input strategy, app-type detection
`selection.py`	Text selection extraction and formatting
`tree.py`	AX tree → indexed text serialization
`pruning.py`	Smart tree pruning to fit LLM context windows
`keys.py`	xdotool syntax → CGKeyCode + modifier mapping
`event_tap.py`	`CGEventTap` wrapper with auto-reenable
`safety.py`	App/URL blocklists, SSRF protection
`retry.py`	Exponential backoff policies
`elicitation.py`	App approval store (session + persistent)
`lifecycle.py`	Per-turn cleanup and step tracking
`errors.py`	Typed exceptions and AX error code table
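
To give a feel for what "AX tree → indexed text serialization" and pruning might involve, the sketch below flattens a nested tree into indexed lines and optionally cuts it off at a maximum depth. The dictionary shape, the output format, and the depth-based pruning rule are all assumptions made for illustration; the actual formats used by `tree.py` and `pruning.py` are not described in this README.

```python
from typing import Any, Dict, List, Optional

Node = Dict[str, Any]


def serialize(root: Node, max_depth: Optional[int] = None) -> List[str]:
    """Flatten a tree into indented lines, numbering elements in visit order."""
    lines: List[str] = []

    def walk(node: Node, depth: int) -> None:
        if max_depth is not None and depth > max_depth:
            return  # prune: skip this node and everything below it
        label = f" {node['title']!r}" if node.get("title") else ""
        lines.append(f"{'  ' * depth}[{len(lines)}] {node['role']}{label}")
        for child in node.get("children", []):
            walk(child, depth + 1)

    walk(root, 0)
    return lines


window = {
    "role": "AXWindow",
    "title": "Notes",
    "children": [
        {"role": "AXToolbar", "children": [{"role": "AXButton", "title": "New Note"}]},
        {"role": "AXTextArea", "title": "Body"},
    ],
}

print("\n".join(serialize(window)))
print("\n".join(serialize(window, max_depth=1)))
```

Because indices are assigned in visit order, pruning changes the numbering, which is one reason an index only makes sense within the snapshot that produced it.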

Key Design Decisions

CGEventPostToPid, never CGEventPost — all input is process-targeted. The global event stream (your cursor, your keyboard) is never touched
Window capture without activation — ScreenCaptureKit captures by window ID, even if the window is fully occluded
User interruption detection — if you start using an app the agent is working in, mac-cua detects the conflict and yields to you
Snapshot-local indices — element indices are valid only for the snapshot that produced them; no stale references
Cross-app robustness — detects and adapts to Native Cocoa, Electron, Safari, Chrome, Java, and Qt apps
Event-driven settling — wait_for_settle with per-tool timeouts and debounce, not fixed sleep() calls
Per-app guidance — custom operational hints per bundle ID (e.g., app/guidance/com.apple.Music.md)
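
The event-driven settling idea from the list above can be sketched with a condition variable: every UI change notification restarts a quiet window, and the waiter returns once that window passes with no further changes. The class name `SettleWaiter`, its method signatures, and the timing values are assumptions; only the name `wait_for_settle` appears in this README.

```python
import threading
import time


class SettleWaiter:
    """Debounce UI change events instead of sleeping for a fixed duration."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._last_event = time.monotonic()

    def notify_change(self) -> None:
        """Call from an event callback whenever the UI reports a change."""
        with self._cond:
            self._last_event = time.monotonic()
            self._cond.notify_all()

    def wait_for_settle(self, quiet_period: float, timeout: float) -> bool:
        """Return True once no change arrived for `quiet_period`, else False."""
        deadline = time.monotonic() + timeout
        with self._cond:
            while True:
                now = time.monotonic()
                quiet_left = self._last_event + quiet_period - now
                if quiet_left <= 0:
                    return True
                time_left = deadline - now
                if time_left <= 0:
                    return False
                # Wake early if a new change arrives and resets the window.
                self._cond.wait(min(quiet_left, time_left))


waiter = SettleWaiter()
for delay in (0.02, 0.04):  # simulate a short burst of UI change events
    threading.Timer(delay, waiter.notify_change).start()
settled = waiter.wait_for_settle(quiet_period=0.05, timeout=1.0)
print(settled)
```

A per-tool timeout then becomes just a different `timeout` argument per call, which matches the spirit of the design note even if the real implementation differs.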

Supported Apps

mac-cua works with any macOS application that exposes an accessibility tree:

Native Cocoa — Finder, Safari, Music, System Settings, Notes, Calendar
Electron — VS Code, Slack, Discord, Notion
Chromium — Chrome, Arc, Edge
Java — JetBrains IDEs (IntelliJ, PyCharm, WebStorm)
Qt — Various Qt-based applications

Apps with minimal accessibility exposure fall back to screenshot-based coordinate interaction automatically.

Safety

App blocklist — prevents interaction with system security processes (Keychain, login)
URL blocklist — SSRF protection for web-based interactions
App approval flow — session and persistent approval gates before controlling new apps
Step limits — per-turn cleanup and step tracking to prevent runaway loops
Background-only — cannot inject events globally; input is always process-targeted
User wins — interruption detection yields control back to you immediately
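
The first four safeguards can be read as a sequence of checks made before each action. The sketch below is hypothetical throughout: the bundle IDs are examples rather than the project's real blocklist, the step limit of 50 is invented, and `ApprovalStore` and `gate` are illustrative names, not the API of `safety.py` or `elicitation.py`.

```python
from dataclasses import dataclass, field
from typing import Set

# Example entries only; the real blocklist is not shown in this README.
BLOCKED_APPS = frozenset({"com.apple.keychainaccess", "com.apple.loginwindow"})
MAX_STEPS_PER_TURN = 50  # hypothetical limit


@dataclass
class ApprovalStore:
    session: Set[str] = field(default_factory=set)     # forgotten on exit
    persistent: Set[str] = field(default_factory=set)  # remembered across runs

    def approve(self, bundle_id: str, remember: bool = False) -> None:
        target = self.persistent if remember else self.session
        target.add(bundle_id)

    def is_approved(self, bundle_id: str) -> bool:
        return bundle_id in self.session or bundle_id in self.persistent


def gate(bundle_id: str, store: ApprovalStore, steps_taken: int) -> str:
    """Return "blocked", "needs_approval", "step_limit", or "allowed"."""
    if bundle_id in BLOCKED_APPS:
        return "blocked"  # the blocklist wins even over an earlier approval
    if not store.is_approved(bundle_id):
        return "needs_approval"
    if steps_taken >= MAX_STEPS_PER_TURN:
        return "step_limit"
    return "allowed"


store = ApprovalStore()
print(gate("com.apple.Safari", store, steps_taken=0))
store.approve("com.apple.Safari")
print(gate("com.apple.Safari", store, steps_taken=3))
print(gate("com.apple.keychainaccess", store, steps_taken=0))
```

Checking the blocklist before the approval store means a user approval can never unlock a blocked app in this sketch.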

Development

# Install dependencies
uv sync

# Run tests
uv run pytest

# Run a specific test
uv run pytest tests/test_safety.py -v

# Run the server
uv run python main.py

Project Structure

mac-cua/
  main.py                  Entry point, permissions, logging
  app/
    server.py              MCP protocol layer
    session.py             Session lifecycle & orchestration
    response.py            Response dataclasses
    guidance/              Per-app operational hints
    _lib/                  Platform backend (17 modules, ~7300 LOC)
  tests/                   136 tests
  specs/                   Tool reference docs

Contributing

Contributions are welcome! mac-cua is a community-driven project and we'd love your help.

Fork the repo
Create a branch (git checkout -b my-feature)
Make your changes — add tests if applicable
Run the test suite (uv run pytest)
Open a Pull Request

Whether it's a bug fix, new app guidance file, documentation improvement, or a whole new feature — all contributions are appreciated.

If you find mac-cua useful, consider giving it a star. It helps others discover the project.

License

Apache License 2.0 — use it, fork it, ship it, sell it. No strings attached.

Acknowledgments

mac-cua was inspired by Codex computer use (OpenAI, April 2026) and Personal Computer (Perplexity, April 2026). Both showed that background desktop automation is the future — mac-cua brings that capability to everyone as an open-source MCP tool that works with any LLM.

Built with MCP for universal LLM compatibility, PyObjC for macOS integration, and ScreenCaptureKit for GPU-accelerated background capture.
