Release v0.2.0 — Accessibility-First Architecture · jellythomas/agent-eyes

What is agent-eyes?

Accessibility-tree vision for AI agents — see and interact with any application without screenshots.

Instead of pixel-based screen capture, agent-eyes reads the OS accessibility tree to give AI agents a structured, semantic view of every UI element on screen.
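
To make "structured, semantic view" concrete, here is a minimal sketch of how an accessibility tree can be modelled and queried. The UINode class, its fields, and find_by_role are hypothetical names chosen for illustration; they are not agent-eyes' actual API, and real trees carry many more attributes (position, state, actions).

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class UINode:
    """One element in a hypothetical, simplified accessibility tree."""
    role: str                      # e.g. "window", "button", "text_field"
    name: str = ""                 # accessible label exposed by the app
    children: list["UINode"] = field(default_factory=list)

    def walk(self) -> Iterator["UINode"]:
        """Yield this node and all of its descendants, depth-first."""
        yield self
        for child in self.children:
            yield from child.walk()


def find_by_role(root: UINode, role: str) -> list[UINode]:
    """Return every node at or under root whose role matches exactly."""
    return [node for node in root.walk() if node.role == role]


# Illustrative tree for a made-up login window (not real app output).
login = UINode("window", "Sign in", [
    UINode("text_field", "Email"),
    UINode("text_field", "Password"),
    UINode("button", "Submit"),
])

print([n.name for n in find_by_role(login, "text_field")])
```

An agent working from a tree like this can ask for "the text fields" or "the Submit button" by role and name, rather than guessing coordinates from pixels.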

Key Advantages

No screenshots needed — works through accessibility APIs, not pixels
Cross-platform — macOS (AXUIElement), Windows (UI Automation), Linux (AT-SPI2)
Native + Web — interact with desktop apps and Chrome tabs from one MCP server
Shadow mode — control Chrome in the background without stealing window focus
Human-like input — real keyboard/mouse events that trigger all event listeners
28 tools — orientation, reading UI, interaction, app management, Chrome/web, shadow mode

Installation

uvx agent-eyes

Or add to Claude Code (~/.claude.json):

{
  "mcpServers": {
    "agent-eyes": {
      "command": "uvx",
      "args": ["agent-eyes"]
    }
  }
}
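
If ~/.claude.json already lists other MCP servers, the entry above should be merged in rather than pasted over the file. The sketch below shows one way to do that on an in-memory dict; the add_agent_eyes helper and the "other-server" entry are assumptions for illustration, and reading or writing the actual file is left out.

```python
import json


def add_agent_eyes(config: dict) -> dict:
    """Return a copy of config with the agent-eyes server entry added.

    Existing entries under "mcpServers" are preserved; the input is not mutated.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["agent-eyes"] = {"command": "uvx", "args": ["agent-eyes"]}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}
updated = add_agent_eyes(existing)
print(json.dumps(updated, indent=2))
```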

What's in this release

Accessibility-first architecture — no screenshots, pure accessibility tree navigation
Cross-platform native adapters (macOS, Windows, Linux)
Chrome DevTools Protocol (CDP) integration for web content
Shadow mode for background browser control
Human-like keyboard/mouse input simulation
OCR fallback for apps with sparse accessibility trees
28 MCP tools covering full desktop + web interaction
Packaged for PyPI and runnable directly with uvx

Supported Platforms

Platform	Native Adapter	Web (Chrome)	Shadow Mode
macOS	AXUIElement + pyobjc	CDP + AppleScript fallback	Yes
Windows	UI Automation + pywinauto	CDP	Yes
Linux	AT-SPI2 + pyatspi	CDP	Yes

Requirements: Python 3.10+ • Chrome launched with --remote-debugging-port=9222 (needed only for the web tools)
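
Before using the web tools, it can help to confirm that both requirements are met. The following is a rough preflight sketch using only the standard library. It relies on the /json/version HTTP endpoint that Chrome exposes on its remote debugging port; the function names and the 127.0.0.1 default host are assumptions, and this check is not part of agent-eyes itself.

```python
import http.client
import json
import sys
import urllib.request


def cdp_version_url(host: str = "127.0.0.1", port: int = 9222) -> str:
    """Build the URL of Chrome's DevTools version endpoint."""
    return f"http://{host}:{port}/json/version"


def probe_chrome(host: str = "127.0.0.1", port: int = 9222, timeout: float = 2.0):
    """Return Chrome's version info as a dict, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(cdp_version_url(host, port), timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, or a non-JSON reply all count as "not ready".
        return None


def python_ok(minimum: tuple = (3, 10)) -> bool:
    """Check that the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    info = probe_chrome()
    if info is None:
        print("Chrome debugging port not reachable on 9222")
    else:
        print("Chrome reachable:", info.get("Browser", "unknown"))
```

If the probe returns None, the native desktop tools may still work; only the Chrome/web tools depend on the debugging port.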
