v0.2.0 — Accessibility-First Architecture

@jellythomas released this 21 Mar 04:08

What is agent-eyes?

Accessibility-tree vision for AI agents — see and interact with any application without screenshots.

Instead of pixel-based screen capture, agent-eyes reads the OS accessibility tree to give AI agents a structured, semantic view of every UI element on screen.
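To make "structured, semantic view" concrete, here is a minimal sketch of how an accessibility tree might be modelled and searched. The `UINode` class, its fields, and the `walk`/`find` helpers are hypothetical names chosen for illustration; they are not agent-eyes' actual schema or tool API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class UINode:
    """Hypothetical accessibility-tree node (illustrative, not agent-eyes' schema)."""
    role: str                      # e.g. "window", "button", "text_field"
    name: str = ""                 # accessible label, if the element has one
    children: list[UINode] = field(default_factory=list)


def walk(node: UINode) -> Iterator[UINode]:
    """Yield the node and all of its descendants in depth-first order."""
    yield node
    for child in node.children:
        yield from walk(child)


def find(root: UINode, role: str, name: str) -> UINode | None:
    """Return the first node matching both role and name, or None."""
    return next((n for n in walk(root) if n.role == role and n.name == name), None)


# A toy "Save As" dialog expressed as a tree instead of pixels.
window = UINode("window", "Save As", [
    UINode("text_field", "File name"),
    UINode("group", "", [UINode("button", "Cancel"), UINode("button", "Save")]),
])

save_button = find(window, "button", "Save")
```

An agent working from a tree like this can locate the Save button by role and label rather than by guessing coordinates from an image.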

Key Advantages

  • No screenshots needed — works through accessibility APIs, not pixels
  • Cross-platform — macOS (AXUIElement), Windows (UI Automation), Linux (AT-SPI2)
  • Native + Web — interact with desktop apps and Chrome tabs from one MCP server
  • Shadow mode — control Chrome in the background without stealing window focus
  • Human-like input — real keyboard/mouse events that trigger all event listeners
  • 28 tools — orientation, reading UI, interaction, app management, Chrome/web, shadow mode

Installation

uvx agent-eyes

Or add to Claude Code (~/.claude.json):

{
  "mcpServers": {
    "agent-eyes": {
      "command": "uvx",
      "args": ["agent-eyes"]
    }
  }
}
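If the config file already lists other MCP servers, the entry above has to be merged in rather than pasted over them. A minimal sketch of that merge, assuming the file is plain JSON with a top-level `mcpServers` object as shown; the helper names `add_agent_eyes` and `update_config_file` are made up for this example.

```python
import json
from pathlib import Path


def add_agent_eyes(config: dict) -> dict:
    """Return a copy of config with the agent-eyes server entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep any existing servers
    servers["agent-eyes"] = {"command": "uvx", "args": ["agent-eyes"]}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the JSON config (if present), add the entry, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_agent_eyes(existing), indent=2) + "\n")
```

Calling `update_config_file(Path.home() / ".claude.json")` would apply the change to the file named above; back up the file first, since it may hold other settings.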

What's in this release

  • Accessibility-first architecture — no screenshots, pure accessibility tree navigation
  • Cross-platform native adapters (macOS, Windows, Linux)
  • Chrome DevTools Protocol (CDP) integration for web content
  • Shadow mode for background browser control
  • Human-like keyboard/mouse input simulation
  • OCR fallback for apps with sparse accessibility trees
  • 28 MCP tools covering full desktop + web interaction
  • PyPI-ready packaging via uvx

Supported Platforms

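Platform | Native Adapter            | Web (Chrome)               | Shadow Mode
-------- | ------------------------- | -------------------------- | -----------
macOS    | AXUIElement + pyobjc      | CDP + AppleScript fallback | Yes
Windows  | UI Automation + pywinauto | CDP                        | Yes
Linux    | AT-SPI2 + pyatspi         | CDP                        | Yes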

Requirements: Python 3.10+ • Chrome launched with --remote-debugging-port=9222 (needed only for the web tools)