PROJECT STATUS: Research / Pre-Alpha
This project is in an early research phase. The architecture is designed and documented, but implementation has not yet begun. See RESEARCH.md for the development roadmap and how to contribute.
AI-Powered Visual Automation for Edge Hardware
PixelPilotMCP is an intelligent RPA (Robotic Process Automation) tool designed to run on edge hardware with Google Coral TPU acceleration. It connects to remote systems via RDP/VNC, performs real-time window detection and OCR using the Coral's ML capabilities, and integrates with AI assistants through the Model Context Protocol (MCP) for intelligent decision-making.
- Edge-First Design - Optimized for low-power SBCs with Google Coral TPU
- Visual Automation - Real-time screen analysis, window detection, and OCR
- Remote Control - Connect to systems via RDP or VNC protocols
- MCP Integration - Expose automation capabilities as MCP tools for AI assistants
- Autonomous Operation - Run unattended automation workflows
- Scriptable - Python-based scripting for custom automation tasks
- Hardware Accelerated - Leverage Coral TPU for fast inference
```text
┌─────────────────────────────────────────────────────────────────┐
│                          AI Assistant                           │
│                     (Claude, etc. via MCP)                      │
└──────────────────────────┬──────────────────────────────────────┘
                           │ MCP Protocol
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                      PixelPilotMCP Server                       │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │  MCP Tools  │  │   Scripts   │  │     Decision Engine     │  │
│  └──────┬──────┘  └──────┬──────┘  └────────────┬────────────┘  │
│         │                │                      │               │
│         └────────────────┼──────────────────────┘               │
│                          ▼                                      │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                        Core Engine                        │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────────┐  │  │
│  │  │   Vision    │  │    Input    │  │    Connection     │  │  │
│  │  │   (Coral)   │  │ (Mouse/KB)  │  │     (RDP/VNC)     │  │  │
│  │  └──────┬──────┘  └──────┬──────┘  └─────────┬─────────┘  │  │
│  └─────────┼────────────────┼───────────────────┼────────────┘  │
└────────────┼────────────────┼───────────────────┼───────────────┘
             │                │                   │
             ▼                ▼                   ▼
    ┌─────────────────┐ ┌───────────┐   ┌───────────────────┐
    │  Google Coral   │ │  Virtual  │   │   Remote System   │
    │       TPU       │ │   Input   │   │   (via RDP/VNC)   │
    └─────────────────┘ └───────────┘   └───────────────────┘
```
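The diagram above can be read as three backend layers (Vision, Input, Connection) behind one orchestrating Core Engine that MCP tools and scripts call into. Below is a minimal Python sketch of that composition under stated assumptions: every name in it (`Element`, `Connection`, `Vision`, `Input`, `CoreEngine`, `click_label`, `input_layer`) is hypothetical and not part of any PixelPilotMCP API, and the in-memory fakes stand in for real RDP/VNC, Coral and input backends.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class Element:
    """A detected UI element: label, bounding box (x, y, w, h) and score."""
    label: str
    x: int
    y: int
    w: int
    h: int
    score: float


class Connection(Protocol):
    """Remote session (RDP or VNC) that provides screen frames."""
    def capture(self) -> bytes: ...


class Input(Protocol):
    """Mouse/keyboard injection into the remote session."""
    def click(self, x: int, y: int) -> None: ...


class Vision(Protocol):
    """Detection/OCR backend, e.g. a model running on the Coral TPU."""
    def detect(self, frame: bytes) -> list[Element]: ...


@dataclass
class CoreEngine:
    """Hypothetical orchestrator combining the three layers."""
    connection: Connection
    vision: Vision
    input_layer: Input

    def click_label(self, label: str, min_score: float = 0.5) -> Element | None:
        frame = self.connection.capture()
        candidates = [e for e in self.vision.detect(frame)
                      if e.label == label and e.score >= min_score]
        if not candidates:
            return None
        best = max(candidates, key=lambda e: e.score)
        # Click the centre of the highest-scoring match.
        self.input_layer.click(best.x + best.w // 2, best.y + best.h // 2)
        return best


# In-memory fakes standing in for real RDP/VNC, Coral and input backends.
class FakeConnection:
    def capture(self) -> bytes:
        return b"fake-frame"


class FakeVision:
    def detect(self, frame: bytes) -> list[Element]:
        return [Element("Submit", 100, 200, 80, 30, 0.92),
                Element("Cancel", 200, 200, 80, 30, 0.88)]


@dataclass
class FakeInput:
    clicks: list[tuple[int, int]] = field(default_factory=list)

    def click(self, x: int, y: int) -> None:
        self.clicks.append((x, y))


engine = CoreEngine(FakeConnection(), FakeVision(), FakeInput())
hit = engine.click_label("Submit")
print(hit.label, engine.input_layer.clicks)  # Submit [(140, 215)]
```

Keeping each layer behind a small `typing.Protocol` would let the Coral-backed detector or the RDP/VNC backend be swapped for fakes in tests, which may suit a project that has no hardware-backed implementation yet.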
Minimum hardware:
- Single Board Computer (Raspberry Pi 4, Orange Pi, etc.)
- Google Coral USB Accelerator or Coral M.2 / Mini PCIe Accelerator
- 2GB RAM minimum (4GB recommended)
- Network connectivity to target systems

Recommended hardware:
- Coral Dev Board or similar with integrated TPU
- 4GB+ RAM
- Gigabit Ethernet for low-latency remote connections
- SSD storage for faster model loading

Software requirements:
- Python 3.10+
- PyCoral / TensorFlow Lite runtime
- FreeRDP or compatible RDP client library
- VNC client library (e.g., vncdotool)
- Edge TPU runtime
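Before installing, it may help to see which of the software requirements above are already present. This is a hypothetical pre-flight check, not a shipped tool. The import names (`pycoral`, `tflite_runtime`, `vncdotool`), the FreeRDP binary names and the assumption that the Edge TPU runtime is discoverable as the shared library `libedgetpu` all reflect a typical Debian-style install and may differ on your system.

```python
import ctypes.util
import importlib.util
import shutil
import sys

# Assumed import names for the Python packages in the requirements list.
PYTHON_MODULES = {
    "PyCoral": "pycoral",
    "TensorFlow Lite runtime": "tflite_runtime",
    "vncdotool": "vncdotool",
}
# FreeRDP command-line client; the binary name differs between versions/distros.
FREERDP_BINARIES = ("xfreerdp", "xfreerdp3")


def check_requirements() -> dict[str, bool]:
    """Return {requirement: found?} without importing any of the packages."""
    results = {"Python 3.10+": sys.version_info >= (3, 10)}
    for label, module in PYTHON_MODULES.items():
        # find_spec locates a top-level package without executing it.
        results[label] = importlib.util.find_spec(module) is not None
    results["FreeRDP client"] = any(shutil.which(b) for b in FREERDP_BINARIES)
    # Looks for libedgetpu via the linker cache; libraries installed outside
    # the standard search path may be missed.
    results["Edge TPU runtime"] = ctypes.util.find_library("edgetpu") is not None
    return results


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{'OK' if ok else 'MISSING':8}{name}")
```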
```bash
# Clone the repository
git clone https://github.com/emesix/PixelPilotMCP.git
cd PixelPilotMCP

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -e .

# Install Coral Edge TPU runtime (follow Google's guide)
# https://coral.ai/docs/accelerator/get-started/
```

For complete OS installation and hardware setup on the Advantech ARK-1124H edge device (or similar hardware), see the comprehensive Installation Guide. This covers:
- Debian 12 base OS installation and partitioning
- System hardening (SSH, firewall, automatic updates)
- Google Coral Edge TPU setup
- Python environment and dependencies
- Remote desktop client stack (RDP/VNC)
- Vision and OCR stack (OpenCV, Tesseract)
- Web interface with Nginx
- MCP server configuration
- Systemd service setup
- Network configuration
- Post-install verification
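For the post-install verification step, a hypothetical check that an Edge TPU is visible to the host could look like the sketch below. It assumes PyCoral's `pycoral.utils.edgetpu.list_edge_tpus()` helper when PyCoral is installed, and otherwise falls back to looking for the `/dev/apex_N` device nodes that the PCIe/M.2 driver creates; the function name `find_edge_tpus` is illustrative only.

```python
import glob


def find_edge_tpus() -> list[str]:
    """List Edge TPUs visible to this host as 'type:path' strings."""
    try:
        # PyCoral's enumeration helper covers both USB and PCIe devices.
        from pycoral.utils.edgetpu import list_edge_tpus
    except ImportError:
        # Without PyCoral, only PCIe/M.2 modules can be spotted: the apex
        # driver creates /dev/apex_N nodes. USB accelerators are missed here.
        return [f"pci:{path}" for path in sorted(glob.glob("/dev/apex_*"))]
    # Each entry is a dict with a 'type' ('usb' or 'pci') and a 'path'.
    return [f"{tpu['type']}:{tpu['path']}" for tpu in list_edge_tpus()]


if __name__ == "__main__":
    tpus = find_edge_tpus()
    print("\n".join(tpus) if tpus else "No Edge TPU found")
```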
Note: These commands are planned but not yet implemented. The project is currently in the research phase.

```bash
# Start the MCP server (planned)
pixelpilot serve

# Run a script (planned)
pixelpilot run examples/basic_automation.py

# Connect to a remote system (planned)
pixelpilot connect rdp://192.168.1.100
```

See RESEARCH.md for the implementation roadmap.
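The planned `pixelpilot connect` command takes a URI such as `rdp://192.168.1.100`. Here is a minimal sketch of how such a target might be parsed, assuming the scheme selects the protocol and a missing port falls back to the standard defaults (3389 for RDP, 5900 for VNC display :0). `Target` and `parse_target` are hypothetical names, not the CLI's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlsplit

# Standard ports: RDP listens on 3389, VNC display :0 on 5900.
DEFAULT_PORTS = {"rdp": 3389, "vnc": 5900}


@dataclass(frozen=True)
class Target:
    protocol: str
    host: str
    port: int
    username: str | None = None


def parse_target(uri: str) -> Target:
    """Turn 'rdp://[user@]host[:port]' or 'vnc://...' into a Target."""
    parts = urlsplit(uri)  # urlsplit already lower-cases the scheme
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported protocol {parts.scheme!r}; use rdp:// or vnc://")
    if not parts.hostname:
        raise ValueError(f"no host in {uri!r}")
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return Target(parts.scheme, parts.hostname, port, parts.username)


print(parse_target("rdp://192.168.1.100"))
# Target(protocol='rdp', host='192.168.1.100', port=3389, username=None)
```

Rejecting unknown schemes early would let the CLI report a clear error before any network connection is attempted.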
PixelPilotMCP exposes its automation capabilities as MCP tools, allowing AI assistants to:
- Screen Analysis - Capture and analyze remote screen content
- Element Detection - Find windows, buttons, text fields, and UI elements
- OCR - Extract text from screen regions
- Input Control - Send mouse clicks, keyboard input, and gestures
- Script Execution - Run pre-defined automation scripts
- Session Management - Connect/disconnect from remote systems
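MCP clients invoke tools with a JSON-RPC `tools/call` request whose `params` carry the tool `name` and its `arguments`; the server answers with a `result` holding a `content` list and an `isError` flag. The sketch below shows one way a server could route such a request to a click handler resembling the `pixelpilot_click_element` call that follows. The hard-coded detections, the handler logic and the confidence-threshold behaviour are assumptions for illustration, and a real server would more likely build on an MCP SDK than hand-roll the envelope.

```python
import json
from typing import Any, Callable

# Pretend vision output: (element_type, text, score, centre_x, centre_y).
DETECTIONS = [
    ("button", "Submit", 0.91, 640, 480),
    ("button", "Cancel", 0.62, 540, 480),
]


def click_element(element_type: str, text: str, confidence: float = 0.8) -> str:
    """Hypothetical handler: pick the best match at or above `confidence`."""
    matches = [d for d in DETECTIONS
               if d[0] == element_type and d[1] == text and d[2] >= confidence]
    if not matches:
        raise LookupError(f"no {element_type} '{text}' at confidence >= {confidence}")
    _, _, score, x, y = max(matches, key=lambda d: d[2])
    # A real implementation would send the click through the Input layer here.
    return f"clicked {text} at ({x}, {y}), score {score:.2f}"


TOOLS: dict[str, Callable[..., str]] = {"pixelpilot_click_element": click_element}


def handle_request(raw: str) -> dict[str, Any]:
    req = json.loads(raw)
    base = {"jsonrpc": "2.0", "id": req.get("id")}
    if req.get("method") != "tools/call":
        return {**base, "error": {"code": -32601, "message": "Method not found"}}
    name = req["params"]["name"]
    if name not in TOOLS:
        return {**base, "error": {"code": -32602, "message": f"Unknown tool: {name}"}}
    try:
        text = TOOLS[name](**req["params"].get("arguments", {}))
        return {**base, "result": {"content": [{"type": "text", "text": text}],
                                   "isError": False}}
    except LookupError as exc:
        # Tool execution failures go inside the result, not in a JSON-RPC error.
        return {**base, "result": {"content": [{"type": "text", "text": str(exc)}],
                                   "isError": True}}


request = json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "pixelpilot_click_element",
               "arguments": {"element_type": "button", "text": "Submit", "confidence": 0.8}},
})
print(handle_request(request)["result"]["content"][0]["text"])
# clicked Submit at (640, 480), score 0.91
```

The two failure paths differ on purpose: an unknown tool name is a protocol-level error, while a tool that runs but finds nothing reports `isError: true` inside the result, so the AI assistant can read the message and adapt.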
Example tool call (simplified):

```json
{
  "tool": "pixelpilot_click_element",
  "arguments": {
    "element_type": "button",
    "text": "Submit",
    "confidence": 0.8
  }
}
```

Add to your MCP configuration:
```json
{
  "mcpServers": {
    "pixelpilot": {
      "command": "pixelpilot",
      "args": ["serve", "--stdio"]
    }
  }
}
```

Roadmap (detailed in RESEARCH.md):
- Project structure and architecture
- Core RDP/VNC connection handling
- Basic screen capture and analysis
- Coral TPU integration for inference
- Window and UI element detection models
- OCR integration with Coral acceleration
- Mouse and keyboard input simulation
- Basic MCP tool exposure
- Action planning and decision engine
- Error recovery and retry logic
- Session state management
- Script execution framework
- Scratch-style visual scripting interface
- Drag-and-drop automation workflow builder
- Block-based programming for non-developers
- Visual debugging and step-through execution
- Export to Python scripts
- Multi-session management
- Distributed execution across multiple edge devices
- Recording and playback of user actions
- Integration with popular automation frameworks
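For the planned error recovery and retry logic, one common approach (not necessarily the one the project will adopt) is to retry transient connection failures with exponential backoff and jitter. The helper below is a hypothetical sketch; which exceptions count as transient and the default delays are assumptions, and `flaky_reconnect` is an illustrative stand-in for re-establishing an RDP/VNC session.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 4, base_delay: float = 0.5,
          retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `action`, retrying `retry_on` failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays of base_delay * 1, 2, 4, ... plus up to 10% random jitter.
            sleep(base_delay * (2 ** attempt) * (1 + 0.1 * random.random()))
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky_reconnect() -> str:
    """Fails twice, then succeeds, like a session that drops briefly."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("RDP session dropped")
    return "connected"


result = retry(flaky_reconnect, sleep=lambda s: None)  # skip real waiting
print(result, calls["n"])  # connected 3
```

Injecting `sleep` keeps the helper testable without real delays; non-transient errors such as a failed login would propagate immediately instead of being retried.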
```text
PixelPilotMCP/
├── src/pixelpilot/
│   ├── core/          # Core engine and orchestration
│   ├── vision/        # Coral TPU, detection, OCR
│   ├── input/         # Mouse/keyboard control
│   ├── connection/    # RDP/VNC handling
│   ├── mcp/           # MCP server and tools
│   └── scripts/       # User script support
├── tests/             # Test suite
├── docs/              # Documentation
├── examples/          # Example scripts
└── configs/           # Configuration files
```
We welcome contributions! This project is in the research phase and needs help with:
- Research: Evaluating libraries, models, and approaches (see RESEARCH.md)
- Prototyping: Building proof-of-concept implementations
- Documentation: Improving architecture docs and adding examples
- Testing: Setting up test infrastructure
See CONTRIBUTING.md for guidelines.
```bash
# Clone and setup
git clone https://github.com/emesix/PixelPilotMCP.git
cd PixelPilotMCP
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check src/
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Google Coral for edge TPU hardware and software
- Model Context Protocol for AI integration standards
- The open-source RDP/VNC communities
PixelPilotMCP - Bringing intelligent automation to the edge.