Skip to content

ishandotsh/pi-computer-use

 
 

Repository files navigation

pi-computer-use

pi-computer-use

npm license platform ci

Codex-style computer use for Pi on macOS and Linux/X11.

pi-computer-use gives Pi agents a computer-use surface for visible desktop windows. On macOS it prefers Accessibility (AX) targets such as @e1, returns semantic state after every action, and attaches screenshots only when AX coverage is too weak for reliable operation. On Linux it currently provides an X11 coordinate-based MVP for screenshots, window selection, pointer actions, scrolling, and keyboard input.

Table of Contents

Quick Start

Install the Pi package:

pi install git:github.com/injaneity/pi-computer-use#v0.2.0

On macOS, start Pi in interactive mode. On the first session, grant macOS permissions to:

~/.pi/agent/helpers/pi-computer-use/bridge

Required macOS permissions:

  • Accessibility
  • Screen Recording

On Linux, make sure you are running an X11/Xorg session and install the system prerequisites in Linux/X11 Prerequisites. No Linux desktop permission prompt is expected.

Then call screenshot first in a Pi session. It selects the controlled window and returns the latest state. On macOS this may include AX refs such as @e1; on Linux, use coordinates from the returned screenshot.

screenshot({ app: "Safari" })
click({ ref: "@e1" })
set_text({ ref: "@e2", text: "hello" })

Linux coordinate example:

screenshot({ app: "Firefox" })
click({ x: 320, y: 180, captureId: "..." })
type_text({ text: "hello from Linux/X11" })

Use /computer-use in Pi to inspect the effective config and config sources.

Linux/X11 Prerequisites

The Linux helper is a Node.js script that drives an X11 desktop through standard command-line tools. Install these before using the package on Ubuntu/Debian-style systems:

sudo apt update
sudo apt install wmctrl xdotool imagemagick x11-utils x11-apps

The install/runtime setup checks for these tools and prints a clear hint if anything is missing. npm install/pi install does not install operating-system packages for you.

Linux support requires an X11 session with DISPLAY set. If your desktop login screen offers a choice, select an Xorg/X11 session instead of Wayland.

What It Adds to Pi

  • Public tools: screenshot, click, double_click, move_mouse, drag, scroll, keypress, type_text, set_text, wait, computer_actions.
  • AX target refs in tool results on macOS, with capabilities such as canSetValue, canPress, canFocus, canScroll, and adjust.
  • Ref-first actions on macOS such as click({ ref: "@eN" }), scroll({ ref: "@eN" }), and set_text({ ref: "@eN", text }).
  • Coordinate-based actions on macOS and Linux/X11 using coordinates from the latest screenshot.
  • Batched actions through computer_actions, with one post-action state update plus per-action execution metadata.
  • Execution metadata that reports stealth for background-safe AX paths and default for focus/raw-event fallbacks.
  • Full pointer and keyboard primitive coverage for common GUI flows, with AX-first equivalents where available on macOS.
  • Browser-aware targeting, including isolated browser window preference where appropriate on macOS.
  • Optional strict AX mode for macOS background-safe operation without foreground focus, raw pointer events, raw keyboard events, or cursor takeover.
  • Official QA benchmark harness in benchmarks/.

Examples

Prefer AX refs over coordinates when a matching target exists. This is primarily the macOS workflow:

click({ ref: "@e1" })
scroll({ ref: "@e3", scrollY: 600 })

Use coordinates from the latest screenshot when there is no suitable AX target or when running on Linux/X11:

click({ x: 320, y: 180, captureId: "..." })

Replace text through AX value semantics on macOS:

set_text({ ref: "@e2", text: "https://example.com" })
keypress({ keys: ["Enter"] })

Type through foreground keyboard events on Linux/X11:

click({ x: 320, y: 180, captureId: "..." })
type_text({ text: "https://example.com" })
keypress({ keys: ["Enter"] })

Batch obvious actions when no intermediate inspection is needed:

computer_actions({
  captureId: "...",
  actions: [
    { type: "click", ref: "@e1" },
    { type: "set_text", ref: "@e2", text: "https://example.com" },
    { type: "keypress", keys: ["Enter"] }
  ]
})

Linux/X11 batch example:

computer_actions({
  captureId: "...",
  actions: [
    { type: "click", x: 320, y: 180 },
    { type: "type_text", text: "https://example.com" },
    { type: "keypress", keys: ["Enter"] }
  ]
})

See docs/usage.md for the full workflow and tool patterns.

How It Works

pi-computer-use has three pieces:

  1. The Pi extension in extensions/computer-use.ts registers the public tools and /computer-use command.
  2. The TypeScript bridge in src/bridge.ts manages the current window, capture IDs, AX refs when available, fallback policy, batching, and execution metadata.
  3. A platform helper implements the desktop protocol:
    • macOS: the native Swift helper in native/macos/bridge.swift talks to Accessibility, ScreenCaptureKit, AppKit, and CoreGraphics.
    • Linux/X11: the Node.js helper in native/linux/bridge.mjs talks to X11 through wmctrl, xdotool, xprop, and ImageMagick/X11 screenshot tools.

The macOS result is semantic-first GUI control: Pi sees useful AX targets first, falls back to screenshots only when needed, and reports whether each action stayed background-safe. The Linux/X11 backend currently provides a coordinate-first fallback surface for common desktop automation tasks.

Documentation

  • Usage guide: tool workflow, AX refs, text input, browser flows, batching, and strict AX mode.
  • Configuration: config files, environment overrides, browser control, and stealth mode.
  • Development: local setup, helper builds, validation, release signing notes, and PR workflow.
  • Troubleshooting: permissions, helper setup, stale refs, browser refusal, Linux/X11 prerequisites, and strict mode errors.
  • Benchmarks: benchmark commands, metrics, regression policy, and local comparison workflow.
  • Contributing: issue-first contribution rules and PR checklist.

Development & Benchmarks

Install dependencies:

npm install

Run checks:

npm test

Run the local checkout in Pi without loading another installed copy:

pi --no-extensions -e .

Run the default QA benchmark:

npm run benchmark:qa

Run the wider benchmark that may open apps:

npm run benchmark:qa:full

Release & Install Notes

The package is published on npm as @injaneity/pi-computer-use.

npm install @injaneity/pi-computer-use
npm install @injaneity/pi-computer-use@0.2.0

Pi installs should pin a GitHub release tag:

pi install git:github.com/injaneity/pi-computer-use#v0.2.0
pi install -l git:github.com/injaneity/pi-computer-use#v0.2.0
pi install /absolute/path/to/pi-computer-use

Remove:

pi remove git:github.com/injaneity/pi-computer-use#v0.2.0
npm remove @injaneity/pi-computer-use

For a different release, replace v0.2.0 or 0.2.0 with the version you want to pin.

Screenshots

pi-computer-use screenshot

License

MIT

See Also

About

control your applications using pi-coding-agent. fully invisible.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • TypeScript 59.2%
  • Swift 28.5%
  • JavaScript 12.3%