v0.2.0 — Accessibility-First Architecture
What is agent-eyes?
Accessibility-tree vision for AI agents — see and interact with any application without screenshots.
Instead of pixel-based screen capture, agent-eyes reads the OS accessibility tree. This gives AI agents a structured, semantic view of the UI elements on screen: their roles, labels, states, and hierarchy.
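Here is a minimal sketch of what a "structured, semantic view" can look like once a tree has been read. The `AXNode` type and the `render_outline` helper are hypothetical illustrations. They are not agent-eyes' actual data model or output format. A real adapter would fill such nodes from AXUIElement, UI Automation, or AT-SPI2.

```python
from dataclasses import dataclass, field


@dataclass
class AXNode:
    """Hypothetical, simplified accessibility node (not agent-eyes' real type)."""
    role: str                  # e.g. "window", "button", "text field"
    name: str = ""             # accessible name / label, may be empty
    enabled: bool = True
    children: list["AXNode"] = field(default_factory=list)


def render_outline(node: AXNode, depth: int = 0) -> list[str]:
    """Flatten the tree into indented 'role "name"' lines an agent can read as text."""
    label = f'{node.role} "{node.name}"' if node.name else node.role
    if not node.enabled:
        label += " (disabled)"
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(render_outline(child, depth + 1))
    return lines


# Example: a save dialog as an accessibility API might describe it.
window = AXNode("window", "Save As", children=[
    AXNode("text field", "File name"),
    AXNode("button", "Cancel"),
    AXNode("button", "Save", enabled=False),
])
print("\n".join(render_outline(window)))
```

An agent working from this outline can target "the Save button" by role and name instead of by pixel coordinates.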
Key Advantages
- No screenshots needed — works through accessibility APIs, not pixels
- Cross-platform — macOS (AXUIElement), Windows (UI Automation), Linux (AT-SPI2)
- Native + Web — interact with desktop apps and Chrome tabs from one MCP server
- Shadow mode — control Chrome in the background without stealing window focus
- Human-like input — real keyboard/mouse events that trigger all event listeners
- 28 tools — orientation, reading UI, interaction, app management, Chrome/web, shadow mode
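Background control like shadow mode can be approximated with plain Chrome DevTools Protocol commands. The `Target.createTarget` command accepts a `background` flag that opens a tab without bringing it to the foreground. The sketch below only builds the JSON messages. The `cdp_command` helper is a hypothetical name. Sending the messages requires a WebSocket connection to Chrome, which is not shown. This is also not necessarily how agent-eyes implements shadow mode.

```python
import itertools
import json

# Monotonically increasing message ids; each CDP reply echoes the id it answers.
_message_ids = itertools.count(1)


def cdp_command(method: str, **params) -> str:
    """Serialize one CDP command as the JSON text sent over the DevTools WebSocket."""
    return json.dumps({"id": next(_message_ids), "method": method, "params": params})


# Open a new tab without bringing it to the foreground (no focus stealing).
open_tab = cdp_command("Target.createTarget", url="https://example.com", background=True)
print(open_tab)
```

The id counter lets a client match asynchronous replies to the commands that produced them.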
Installation
```bash
uvx agent-eyes
```
Or add it to Claude Code (`~/.claude.json`):
```json
{
  "mcpServers": {
    "agent-eyes": {
      "command": "uvx",
      "args": ["agent-eyes"]
    }
  }
}
```
What's in this release
- Accessibility-first architecture — no screenshots, pure accessibility tree navigation
- Cross-platform native adapters (macOS, Windows, Linux)
- Chrome DevTools Protocol (CDP) integration for web content
- Shadow mode for background browser control
- Human-like keyboard/mouse input simulation
- OCR fallback for apps with sparse accessibility trees
- 28 MCP tools covering full desktop + web interaction
- PyPI-ready packaging, runnable directly with `uvx`
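One plausible way to decide when the OCR fallback should take over is to measure how much labeled content the accessibility tree actually exposes. Canvas-based or custom-drawn apps often report a handful of unnamed containers. The sketch assumes a hypothetical `Node` type and an arbitrary threshold of three named nodes. agent-eyes' real criterion is not documented here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal accessibility node: a role, an optional name, children."""
    role: str
    name: str = ""
    children: list["Node"] = field(default_factory=list)


def iter_nodes(node: Node):
    """Depth-first walk over the node and all of its descendants."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def needs_ocr_fallback(root: Node, min_named: int = 3) -> bool:
    """Assumed heuristic: fall back to OCR when fewer than `min_named`
    nodes carry a non-empty accessible name."""
    named = sum(1 for n in iter_nodes(root) if n.name.strip())
    return named < min_named


# A canvas-drawn game exposes almost nothing; a native form exposes plenty.
canvas_app = Node("window", "Game", [Node("group")])
login_form = Node("window", "Login", [
    Node("text field", "User"),
    Node("text field", "Password"),
    Node("button", "Sign in"),
])
print(needs_ocr_fallback(canvas_app), needs_ocr_fallback(login_form))
```

Counting named nodes rather than total nodes matters here. A tree can be large yet useless if none of its elements carry labels.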
Supported Platforms
| Platform | Native Adapter | Web (Chrome) | Shadow Mode |
|---|---|---|---|
| macOS | AXUIElement + pyobjc | CDP + AppleScript fallback | Yes |
| Windows | UI Automation + pywinauto | CDP | Yes |
| Linux | AT-SPI2 + pyatspi | CDP | Yes |
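Picking the native adapter from the Supported Platforms table comes down to a dispatch on the running OS. The sketch below keys on Python's `sys.platform` values. The `native_adapter` function and the `ADAPTERS` mapping are hypothetical illustrations, not agent-eyes' internal API. A real implementation would more likely import the adapter module lazily than return a string.

```python
import sys

# Backends taken from the Supported Platforms table; keys are sys.platform prefixes.
ADAPTERS = {
    "darwin": "AXUIElement + pyobjc",
    "win32": "UI Automation + pywinauto",
    "linux": "AT-SPI2 + pyatspi",
}


def native_adapter(platform: str = sys.platform) -> str:
    """Return the native accessibility backend for a sys.platform value."""
    for prefix, adapter in ADAPTERS.items():
        if platform.startswith(prefix):
            return adapter
    raise RuntimeError(f"unsupported platform: {platform!r}")


print(native_adapter())
```

Matching on prefixes also tolerates older variants such as `linux2` reported by legacy interpreters.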
Requirements: Python 3.10+ • Chrome launched with `--remote-debugging-port=9222` (needed only for the web tools)
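Before using the web tools, it can help to confirm that Chrome's remote-debugging endpoint is reachable. Chrome serves a JSON list of debuggable targets at `/json/list` on the debugging port. The helpers below fetch that list with the standard library and keep only ordinary tabs. Both `page_targets` and `list_tabs` are hypothetical names for a standalone check, not part of agent-eyes.

```python
import json
import urllib.request

CDP_URL = "http://127.0.0.1:9222"  # matches --remote-debugging-port=9222


def page_targets(listing_json: str) -> list[dict]:
    """Keep only 'page' targets (ordinary tabs) from a /json/list response."""
    return [t for t in json.loads(listing_json) if t.get("type") == "page"]


def list_tabs(base_url: str = CDP_URL, timeout: float = 2.0) -> list[dict]:
    """Fetch Chrome's target list over the DevTools HTTP endpoint.
    Raises urllib.error.URLError if Chrome is not listening on the port."""
    with urllib.request.urlopen(f"{base_url}/json/list", timeout=timeout) as resp:
        return page_targets(resp.read().decode("utf-8"))
```

With Chrome running, `list_tabs()` returns one dict per open tab, including its `title`, `url`, and `webSocketDebuggerUrl`. Recent Chrome releases may ignore `--remote-debugging-port` for the default profile. In that case, launch Chrome with a separate `--user-data-dir` as well.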