PixelPilotMCP

PROJECT STATUS: Research / Pre-Alpha. This project is in an early research phase. The architecture is designed and documented, but implementation has not yet begun. See RESEARCH.md for the development roadmap and how to contribute.

AI-Powered Visual Automation for Edge Hardware

PixelPilotMCP is an intelligent RPA (Robotic Process Automation) tool designed to run on edge hardware with Google Coral TPU acceleration. It is designed to connect to remote systems via RDP/VNC, perform real-time window detection and OCR using the Coral's ML capabilities, and integrate with AI assistants through the Model Context Protocol (MCP) for intelligent decision-making.

Features

  • Edge-First Design - Optimized for low-power SBCs with Google Coral TPU
  • Visual Automation - Real-time screen analysis, window detection, and OCR
  • Remote Control - Connect to systems via RDP or VNC protocols
  • MCP Integration - Expose automation capabilities as MCP tools for AI assistants
  • Autonomous Operation - Run unattended automation workflows
  • Scriptable - Python-based scripting for custom automation tasks
  • Hardware Accelerated - Leverage Coral TPU for fast inference

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                        AI Assistant                              │
│                    (Claude, etc. via MCP)                        │
└─────────────────────────────┬───────────────────────────────────┘
                              │ MCP Protocol
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      PixelPilotMCP Server                        │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │  MCP Tools  │  │   Scripts   │  │    Decision Engine      │  │
│  └──────┬──────┘  └──────┬──────┘  └────────────┬────────────┘  │
│         │                │                      │                │
│         └────────────────┼──────────────────────┘                │
│                          ▼                                       │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                      Core Engine                           │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐    │  │
│  │  │   Vision    │  │    Input    │  │   Connection    │    │  │
│  │  │  (Coral)    │  │ (Mouse/KB)  │  │   (RDP/VNC)     │    │  │
│  │  └──────┬──────┘  └──────┬──────┘  └────────┬────────┘    │  │
│  └─────────┼────────────────┼──────────────────┼─────────────┘  │
└────────────┼────────────────┼──────────────────┼────────────────┘
             │                │                  │
             ▼                ▼                  ▼
┌─────────────────┐    ┌───────────┐    ┌───────────────────┐
│   Google Coral  │    │  Virtual  │    │   Remote System   │
│      TPU        │    │   Input   │    │   (via RDP/VNC)   │
└─────────────────┘    └───────────┘    └───────────────────┘
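The layering in the diagram can be sketched in Python as three small interfaces wired together by a core engine. This is only an illustration of the intended structure, not the project's code: every class and method name below (`Detection`, `Connection.capture`, `Vision.detect`, `Input.click`, `CoreEngine.click_label`) is hypothetical, and the fake implementations stand in for the Coral, virtual input, and RDP/VNC backends.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Detection:
    """One UI element found on a frame (hypothetical shape)."""
    label: str
    x: int
    y: int


class Connection(Protocol):
    def capture(self) -> bytes: ...


class Vision(Protocol):
    def detect(self, frame: bytes) -> list[Detection]: ...


class Input(Protocol):
    def click(self, x: int, y: int) -> None: ...


class CoreEngine:
    """Wires the three lower layers together, mirroring the diagram."""

    def __init__(self, connection: Connection, vision: Vision, input_: Input) -> None:
        self.connection = connection
        self.vision = vision
        self.input = input_

    def click_label(self, label: str) -> bool:
        """Capture a frame, find the first element with this label, click it."""
        frame = self.connection.capture()
        for det in self.vision.detect(frame):
            if det.label == label:
                self.input.click(det.x, det.y)
                return True
        return False


# Stand-ins for the real backends, so the sketch runs without any hardware.
class FakeConnection:
    def capture(self) -> bytes:
        return b"frame"


class FakeVision:
    def detect(self, frame: bytes) -> list[Detection]:
        return [Detection("Submit", 120, 340)]


class RecordingInput:
    def __init__(self) -> None:
        self.clicks: list[tuple[int, int]] = []

    def click(self, x: int, y: int) -> None:
        self.clicks.append((x, y))
```

Keeping the layers behind narrow interfaces like these would let the vision backend be swapped (for example, CPU inference during development) without touching the MCP tools or scripts above it.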

Hardware Requirements

Minimum

  • Single Board Computer (Raspberry Pi 4, Orange Pi, etc.)
  • Google Coral USB Accelerator or Coral M.2/Mini PCIe
  • 2GB RAM minimum (4GB recommended)
  • Network connectivity to target systems

Recommended

  • Coral Dev Board or similar with integrated TPU
  • 4GB+ RAM
  • Gigabit Ethernet for low-latency remote connections
  • SSD storage for faster model loading

Software Requirements

  • Python 3.10+
  • PyCoral / TensorFlow Lite runtime
  • FreeRDP or compatible RDP client library
  • VNC client library (e.g., vncdotool)
  • Edge TPU runtime

Installation

Quick Start

# Clone the repository
git clone https://github.com/emesix/PixelPilotMCP.git
cd PixelPilotMCP

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -e .

# Install Coral Edge TPU runtime (follow Google's guide)
# https://coral.ai/docs/accelerator/get-started/

Full Installation Guide

For complete OS installation and hardware setup on the Advantech ARK-1124H edge device (or similar hardware), see the comprehensive Installation Guide. This covers:

  • Debian 12 base OS installation and partitioning
  • System hardening (SSH, firewall, automatic updates)
  • Google Coral Edge TPU setup
  • Python environment and dependencies
  • Remote desktop client stack (RDP/VNC)
  • Vision and OCR stack (OpenCV, Tesseract)
  • Web interface with Nginx
  • MCP server configuration
  • Systemd service setup
  • Network configuration
  • Post-install verification

Usage

Note: These commands are planned but not yet implemented. The project is currently in the research phase.

# Start the MCP server (planned)
pixelpilot serve

# Run a script (planned)
pixelpilot run examples/basic_automation.py

# Connect to a remote system (planned)
pixelpilot connect rdp://192.168.1.100

See RESEARCH.md for the implementation roadmap.
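Since the commands above are not implemented yet, the following is a minimal sketch of how such a command-line interface might be laid out with the standard library's `argparse`. The subcommand names and the `--stdio` flag come from this README; the helper names `parse_target` and `build_parser`, and the fallback to the standard RDP (3389) and VNC (5900) ports, are assumptions made for illustration.

```python
import argparse
from urllib.parse import urlsplit

# Standard default ports, used when the target URL does not name one.
DEFAULT_PORTS = {"rdp": 3389, "vnc": 5900}


def parse_target(url: str) -> tuple[str, str, int]:
    """Split a target such as rdp://192.168.1.100 into (scheme, host, port)."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in {url!r}")
    return parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS[parts.scheme]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the three planned subcommands (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="pixelpilot")
    sub = parser.add_subparsers(dest="command", required=True)

    serve = sub.add_parser("serve", help="start the MCP server")
    serve.add_argument("--stdio", action="store_true",
                       help="speak MCP over stdin/stdout")

    run = sub.add_parser("run", help="run an automation script")
    run.add_argument("script")

    connect = sub.add_parser("connect", help="connect to a remote system")
    connect.add_argument("target", type=parse_target)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Validating the target URL at parse time means a malformed address is reported as a usage error before any connection is attempted.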

MCP Integration

PixelPilotMCP is designed to expose its automation capabilities as MCP tools, giving AI assistants access to:

  • Screen Analysis - Capture and analyze remote screen content
  • Element Detection - Find windows, buttons, text fields, and UI elements
  • OCR - Extract text from screen regions
  • Input Control - Send mouse clicks, keyboard input, and gestures
  • Script Execution - Run pre-defined automation scripts
  • Session Management - Connect/disconnect from remote systems

Example MCP Tool Usage

{
  "tool": "pixelpilot_click_element",
  "arguments": {
    "element_type": "button",
    "text": "Submit",
    "confidence": 0.8
  }
}
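One way a server could resolve the arguments in a call like the one above is sketched below. It assumes, hypothetically, that the vision layer returns a list of detected elements with a type, recognized text, detection score, and bounding box, and that `confidence` is a minimum score threshold. The `UIElement` class and `resolve_click` function are illustrative names, not part of any existing API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UIElement:
    """A detected UI element (hypothetical shape of a vision-layer result)."""
    element_type: str
    text: str
    score: float
    box: tuple[int, int, int, int]  # left, top, right, bottom in pixels


def resolve_click(
    elements: list[UIElement],
    element_type: str,
    text: str,
    confidence: float = 0.8,
) -> Optional[tuple[int, int]]:
    """Return the (x, y) centre of the best matching element, or None.

    A match must have the requested type, the same text (ignoring case and
    surrounding whitespace), and a score at or above the threshold.
    """
    wanted = text.strip().lower()
    matches = [
        e for e in elements
        if e.element_type == element_type
        and e.text.strip().lower() == wanted
        and e.score >= confidence
    ]
    if not matches:
        return None
    best = max(matches, key=lambda e: e.score)
    left, top, right, bottom = best.box
    return ((left + right) // 2, (top + bottom) // 2)
```

Returning `None` instead of clicking the nearest guess keeps a low-confidence detection from triggering an unintended action on the remote system.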

Configuring with Claude Code

Add to your MCP configuration:

{
  "mcpServers": {
    "pixelpilot": {
      "command": "pixelpilot",
      "args": ["serve", "--stdio"]
    }
  }
}
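If an MCP configuration already lists other servers, the entry above needs to be merged in rather than pasted over the file. A minimal sketch of that merge follows, using only the standard library. The function name `add_pixelpilot` and the `other` server in the example are hypothetical, and where the configuration file lives depends on your client.

```python
import json

# The server entry shown in this README.
PIXELPILOT_ENTRY = {"command": "pixelpilot", "args": ["serve", "--stdio"]}


def add_pixelpilot(config: dict) -> dict:
    """Return a copy of an MCP config with the pixelpilot entry added.

    Existing servers are preserved; an existing "pixelpilot" entry is replaced.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["pixelpilot"] = dict(PIXELPILOT_ENTRY)
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Hypothetical existing configuration with one other server.
    existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
    print(json.dumps(add_pixelpilot(existing), indent=2))
```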

Roadmap

Phase 1: Foundation (Current)

  • Project structure and architecture
  • Core RDP/VNC connection handling
  • Basic screen capture and analysis
  • Coral TPU integration for inference

Phase 2: Vision & Control

  • Window and UI element detection models
  • OCR integration with Coral acceleration
  • Mouse and keyboard input simulation
  • Basic MCP tool exposure

Phase 3: Intelligence

  • Action planning and decision engine
  • Error recovery and retry logic
  • Session state management
  • Script execution framework

Phase 4: Visual Scripting

  • Scratch-style visual scripting interface
  • Drag-and-drop automation workflow builder
  • Block-based programming for non-developers
  • Visual debugging and step-through execution
  • Export to Python scripts

Phase 5: Advanced Features

  • Multi-session management
  • Distributed execution across multiple edge devices
  • Recording and playback of user actions
  • Integration with popular automation frameworks

Project Structure

PixelPilotMCP/
├── src/pixelpilot/
│   ├── core/          # Core engine and orchestration
│   ├── vision/        # Coral TPU, detection, OCR
│   ├── input/         # Mouse/keyboard control
│   ├── connection/    # RDP/VNC handling
│   ├── mcp/           # MCP server and tools
│   └── scripts/       # User script support
├── tests/             # Test suite
├── docs/              # Documentation
├── examples/          # Example scripts
└── configs/           # Configuration files

Contributing

We welcome contributions! This project is in the research phase and needs help with:

  • Research: Evaluating libraries, models, and approaches (see RESEARCH.md)
  • Prototyping: Building proof-of-concept implementations
  • Documentation: Improving architecture docs and adding examples
  • Testing: Setting up test infrastructure

See CONTRIBUTING.md for guidelines.

Development Setup

# Clone and setup
git clone https://github.com/emesix/PixelPilotMCP.git
cd PixelPilotMCP
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check src/

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments


PixelPilotMCP - Bringing intelligent automation to the edge.
