Codex-style computer use for Pi on macOS and Linux/X11.
pi-computer-use gives Pi agents a computer-use surface for visible desktop windows. On macOS it prefers Accessibility (AX) targets such as @e1, returns semantic state after every action, and attaches screenshots only when AX coverage is too weak for reliable operation. On Linux it currently provides an X11 coordinate-based MVP for screenshots, window selection, pointer actions, scrolling, and keyboard input.
- Quick Start
- Linux/X11 Prerequisites
- What It Adds to Pi
- Examples
- How It Works
- Documentation
- Development & Benchmarks
- Release & Install Notes
- License
- See Also
Install the Pi package:
pi install git:github.com/injaneity/pi-computer-use#v0.2.0
On macOS, start Pi in interactive mode. On the first session, grant macOS permissions to:
~/.pi/agent/helpers/pi-computer-use/bridge
Required macOS permissions:
- Accessibility
- Screen Recording
On Linux, make sure you are running an X11/Xorg session and install the system packages listed under Linux/X11 Prerequisites. No Linux desktop permission prompt is expected.
Then call screenshot first in a Pi session. It selects the controlled window and returns the latest state. On macOS this may include AX refs such as @e1; on Linux, use coordinates from the returned screenshot.
screenshot({ app: "Safari" })
click({ ref: "@e1" })
set_text({ ref: "@e2", text: "hello" })
Linux coordinate example:
screenshot({ app: "Firefox" })
click({ x: 320, y: 180, captureId: "..." })
type_text({ text: "hello from Linux/X11" })
Use /computer-use in Pi to inspect the effective config and config sources.
The Linux helper is a Node.js script that drives an X11 desktop through standard command-line tools. Install these before using the package on Ubuntu/Debian-style systems:
sudo apt update
sudo apt install wmctrl xdotool imagemagick x11-utils x11-apps
The install/runtime setup checks for these tools and prints a clear hint if anything is missing. Neither npm install nor pi install installs operating-system packages for you.
Linux support requires an X11 session with DISPLAY set. If your desktop login screen offers a choice, select an Xorg/X11 session instead of Wayland.
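As a rough illustration of the kind of preflight check described above, the following TypeScript sketch reports which prerequisites appear to be missing. It is not the package's actual setup check: `checkLinuxPrereqs`, `PrereqReport`, and the `isOnPath` callback are hypothetical names, and the tool list only covers commands this README names explicitly.

```typescript
// Hypothetical preflight sketch; not the package's real setup check.
type Env = Record<string, string | undefined>;

interface PrereqReport {
  ok: boolean;
  problems: string[];
}

// Commands named in this README; the real helper may require more.
const REQUIRED_TOOLS = ["wmctrl", "xdotool", "xprop"];

function checkLinuxPrereqs(
  env: Env,
  isOnPath: (tool: string) => boolean,
): PrereqReport {
  const problems: string[] = [];

  // X11 clients need DISPLAY to locate the X server.
  if (!env.DISPLAY) {
    problems.push("DISPLAY is not set; log in to an Xorg/X11 session");
  }

  // XDG_SESSION_TYPE is commonly "x11" or "wayland" on systemd-based desktops.
  if (env.XDG_SESSION_TYPE === "wayland") {
    problems.push("session type is wayland; select Xorg/X11 at the login screen");
  }

  // Report every missing command rather than stopping at the first one.
  for (const tool of REQUIRED_TOOLS) {
    if (!isOnPath(tool)) {
      problems.push(`missing command: ${tool}`);
    }
  }

  return { ok: problems.length === 0, problems };
}
```

In a real script, `env` would typically be `process.env`, and `isOnPath` could shell out to `command -v`. Keeping both as parameters makes the check easy to exercise without a desktop session.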
- Public tools: screenshot, click, double_click, move_mouse, drag, scroll, keypress, type_text, set_text, wait, computer_actions.
- AX target refs in tool results on macOS, with capabilities such as canSetValue, canPress, canFocus, canScroll, and adjust.
- Ref-first actions on macOS such as click({ ref: "@eN" }), scroll({ ref: "@eN" }), and set_text({ ref: "@eN", text }).
- Coordinate-based actions on macOS and Linux/X11 using coordinates from the latest screenshot.
- Batched actions through computer_actions, with one post-action state update plus per-action execution metadata.
- Execution metadata that reports stealth for background-safe AX paths and default for focus/raw-event fallbacks.
- Full pointer and keyboard primitive coverage for common GUI flows, with AX-first equivalents where available on macOS.
- Browser-aware targeting, including isolated browser window preference where appropriate on macOS.
- Optional strict AX mode for macOS background-safe operation without foreground focus, raw pointer events, raw keyboard events, or cursor takeover.
- Official QA benchmark harness in benchmarks/.
Prefer AX refs over coordinates when a matching target exists. This is primarily the macOS workflow:
click({ ref: "@e1" })
scroll({ ref: "@e3", scrollY: 600 })
Use coordinates from the latest screenshot when there is no suitable AX target or when running on Linux/X11:
click({ x: 320, y: 180, captureId: "..." })
Replace text through AX value semantics on macOS:
set_text({ ref: "@e2", text: "https://example.com" })
keypress({ keys: ["Enter"] })
Type through foreground keyboard events on Linux/X11:
click({ x: 320, y: 180, captureId: "..." })
type_text({ text: "https://example.com" })
keypress({ keys: ["Enter"] })
Batch obvious actions when no intermediate inspection is needed:
computer_actions({
captureId: "...",
actions: [
{ type: "click", ref: "@e1" },
{ type: "set_text", ref: "@e2", text: "https://example.com" },
{ type: "keypress", keys: ["Enter"] }
]
})
Linux/X11 batch example:
computer_actions({
captureId: "...",
actions: [
{ type: "click", x: 320, y: 180 },
{ type: "type_text", text: "https://example.com" },
{ type: "keypress", keys: ["Enter"] }
]
})
See docs/usage.md for the full workflow and tool patterns.
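The "prefer AX refs, fall back to coordinates" rule from the examples above can be sketched as a small planning function. This is only an illustration: the `AxTarget` shape, its `label` field, and `planClick` are assumptions made for the example, since this README does not document the exact format of tool results. Only the two click argument shapes and the canPress capability come from the text above.

```typescript
// Hypothetical shapes for illustration; the real tool result format may differ.
interface AxTarget {
  ref: string;       // e.g. "@e1"
  label: string;     // assumed human-readable name, not documented here
  canPress: boolean; // capability named in this README
}

// The two click shapes shown in the examples above.
type ClickAction =
  | { type: "click"; ref: string }
  | { type: "click"; x: number; y: number };

// Prefer an AX ref when a pressable target matches; otherwise fall back
// to coordinates taken from the latest screenshot.
function planClick(
  targets: AxTarget[],
  wantedLabel: string,
  fallback: { x: number; y: number },
): ClickAction {
  const match = targets.find((t) => t.canPress && t.label === wantedLabel);
  return match
    ? { type: "click", ref: match.ref }
    : { type: "click", x: fallback.x, y: fallback.y };
}
```

On Linux/X11, where no AX refs are returned, `targets` would be empty and the function always yields the coordinate form.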
pi-computer-use has three pieces:
- The Pi extension in extensions/computer-use.ts registers the public tools and the /computer-use command.
- The TypeScript bridge in src/bridge.ts manages the current window, capture IDs, AX refs when available, fallback policy, batching, and execution metadata.
- A platform helper implements the desktop protocol:
  - macOS: the native Swift helper in native/macos/bridge.swift talks to Accessibility, ScreenCaptureKit, AppKit, and CoreGraphics.
  - Linux/X11: the Node.js helper in native/linux/bridge.mjs talks to X11 through wmctrl, xdotool, xprop, and ImageMagick/X11 screenshot tools.
The macOS result is semantic-first GUI control: Pi sees useful AX targets first, falls back to screenshots only when needed, and reports whether each action stayed background-safe. The Linux/X11 backend currently provides a coordinate-first fallback surface for common desktop automation tasks.
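To make the background-safety reporting more concrete, here is a minimal sketch of how a caller might inspect per-action execution metadata. Only the values stealth and default come from this README. The `ActionResult` shape, the `execution` field name, and both function names are assumptions for illustration and may not match the real result format.

```typescript
// Hypothetical metadata shape; only the "stealth" and "default" values
// come from this README.
type ExecutionMode = "stealth" | "default";

interface ActionResult {
  type: string;             // e.g. "click", "set_text"
  execution: ExecutionMode; // assumed field name
}

// Returns the actions that fell back to foreground focus or raw events.
function foregroundFallbacks(results: ActionResult[]): ActionResult[] {
  return results.filter((r) => r.execution === "default");
}

// True when every action in the batch stayed on a background-safe AX path.
function stayedBackgroundSafe(results: ActionResult[]): boolean {
  return foregroundFallbacks(results).length === 0;
}
```

A caller that needs to avoid cursor takeover could use a check like this to decide whether enabling strict AX mode is worth the reduced coverage.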
- Usage guide: tool workflow, AX refs, text input, browser flows, batching, and strict AX mode.
- Configuration: config files, environment overrides, browser control, and stealth mode.
- Development: local setup, helper builds, validation, release signing notes, and PR workflow.
- Troubleshooting: permissions, helper setup, stale refs, browser refusal, Linux/X11 prerequisites, and strict mode errors.
- Benchmarks: benchmark commands, metrics, regression policy, and local comparison workflow.
- Contributing: issue-first contribution rules and PR checklist.
Install dependencies:
npm install
Run checks:
npm test
Run the local checkout in Pi without loading another installed copy:
pi --no-extensions -e .
Run the default QA benchmark:
npm run benchmark:qa
Run the wider benchmark that may open apps:
npm run benchmark:qa:full
The package is published on npm as @injaneity/pi-computer-use.
npm install @injaneity/pi-computer-use
npm install @injaneity/pi-computer-use@0.2.0
Pi installs should pin a GitHub release tag:
pi install git:github.com/injaneity/pi-computer-use#v0.2.0
pi install -l git:github.com/injaneity/pi-computer-use#v0.2.0
pi install /absolute/path/to/pi-computer-use
Remove:
pi remove git:github.com/injaneity/pi-computer-use#v0.2.0
npm remove @injaneity/pi-computer-use
For a different release, replace v0.2.0 or 0.2.0 with the version you want to pin.
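If you script installs, the version substitution can be sketched as a tiny formatter. `installSpecs` is a hypothetical helper written for this example and is not part of the package. It only reproduces the two spec formats shown above.

```typescript
// Builds the pinned install specs shown above for a given release version.
// `installSpecs` is a hypothetical helper, not part of the package.
function installSpecs(version: string): { pi: string; npm: string } {
  // Accept either "0.2.0" or "v0.2.0" and normalize to the bare number.
  const bare = version.replace(/^v/, "");
  if (!/^\d+\.\d+\.\d+$/.test(bare)) {
    throw new Error(`unexpected version: ${version}`);
  }
  return {
    // Git tags carry a leading "v"; npm versions do not.
    pi: `git:github.com/injaneity/pi-computer-use#v${bare}`,
    npm: `@injaneity/pi-computer-use@${bare}`,
  };
}
```

The resulting strings can be passed to pi install and npm install respectively.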
MIT

