😆 You’re already a mature LLM, so you should learn to operate the computer by yourself.
🛠️ MCP server for LLM-controlled computer operations: screen capture, window management, mouse & keyboard automation.
ControlMCP is a Model Context Protocol (MCP) server that gives LLMs the ability to see and control a computer — take screenshots, manage windows, move/click the mouse, type on the keyboard, and chain all of these into complex automation workflows.
The repository also ships with a reusable agent skill at skills/computer-control/. It packages desktop-operation SOPs, shortcut guidance, JetBrains IDE workflows, and screenshot-to-click coordinate rules for agents that support skills.
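Install from source:

```bash
git clone https://github.com/nix18/ControlMCP.git
cd ControlMCP
pip install -e .
```

The install provides a `control-mcp` command:

```bash
control-mcp
```

The server communicates over stdio (the standard MCP transport). Configure your MCP client to launch the `control-mcp` command; the client starts the server process and talks to it over stdin/stdout.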

Add it to your MCP client config (e.g. Claude Desktop, Cursor):

```json
{
  "mcpServers": {
    "control-mcp": {
      "command": "control-mcp",
      "args": []
    }
  }
}
```

High-level control plane tools:

| Tool | Description |
|---|---|
| `plan_desktop_task` | Convert a vague desktop instruction into a structured plan |
| `execute_desktop_plan` | Run a structured plan through the guarded executor |
| `get_execution_status` | Query the current status of a high-level execution run |
| `confirm_sensitive_action` | Explicitly approve or reject a sensitive action |
| `recover_execution_context` | Rebuild context after shortcut misuse or UI drift |
| `record_workflow_experience` | Persist reusable workflow experience |
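
The plan produced by `plan_desktop_task` is defined by the server, and its schema is not documented in this README. As a rough mental model only, the Python sketch below uses hypothetical `PlanStep` and `DesktopPlan` dataclasses (invented names and fields) to show the kind of structure a plan-then-execute flow works with: ordered tool calls, a verification hint per step, and a flag marking steps that should go through `confirm_sensitive_action`.

```python
from dataclasses import dataclass, field


@dataclass
class PlanStep:
    # Hypothetical schema for illustration; the real plan format may differ.
    tool: str                 # e.g. "focus_window", "key_press"
    args: dict
    verify: str = ""          # what to check after the step runs
    sensitive: bool = False   # payment/password/asset-related step


@dataclass
class DesktopPlan:
    plan_id: str
    instruction: str
    steps: list = field(default_factory=list)

    def steps_needing_confirmation(self):
        """Return indices of steps that require explicit confirmation."""
        return [i for i, step in enumerate(self.steps) if step.sensitive]


plan = DesktopPlan(
    plan_id="plan_abc123",
    instruction="Switch to PyCharm and run the current config",
    steps=[
        PlanStep("focus_window", {"title": "PyCharm"}, verify="PyCharm is in front"),
        # Shift+F10 runs the current configuration in the default
        # JetBrains keymap on Windows/Linux.
        PlanStep("key_press", {"keys": ["shift", "f10"]}, verify="Run panel shows output"),
    ],
)
print(plan.steps_needing_confirmation())
```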

Screen capture tools:

| Tool | Description |
|---|---|
| `capture_screen` | Full-screen or per-monitor screenshot |
| `capture_region` | Region screenshot (x, y, width, height) |
| `capture_scroll_region` | Stitch a long screenshot while scrolling inside a fixed region |
| `get_screen_info` | List all monitors with their resolutions |
| `read_screenshot_base64` | Read a screenshot file as Base64 text |
| `resolve_grid_target` | Convert a grid cell + anchor into precise screen coordinates |
| `click_grid_target` | Resolve screenshot grid metadata and click directly |
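
`capture_scroll_region` stitches a long screenshot while scrolling inside a fixed region. How ControlMCP matches consecutive frames is not described here; one common approach is to find the largest overlap between the bottom rows of the stitched image so far and the top rows of the next frame. The sketch below models each frame as a list of row fingerprints (for example, hashes of pixel rows); it illustrates that technique and is not the project's implementation.

```python
def find_overlap(prev_rows, next_rows):
    """Return the largest k such that prev_rows[-k:] == next_rows[:k] (0 if none)."""
    max_k = min(len(prev_rows), len(next_rows))
    for k in range(max_k, 0, -1):
        if prev_rows[-k:] == next_rows[:k]:
            return k
    return 0


def stitch(frames):
    """Concatenate frames top to bottom, dropping rows each new frame repeats."""
    if not frames:
        return []
    result = list(frames[0])
    for frame in frames[1:]:
        k = find_overlap(result, frame)
        result.extend(frame[k:])
    return result


# Each letter stands for one row fingerprint.
print(stitch([list("abcd"), list("cdef"), list("fg")]))
```

Matching on the largest overlap can misalign frames whose edges are uniform (blank margins, identical list rows), so a practical stitcher usually needs a minimum overlap or an extra content check.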

Window management tools:

| Tool | Description |
|---|---|
| `list_windows` | List all visible windows |
| `find_windows` | Find windows by title substring |
| `focus_window` | Bring a window to the foreground |
| `capture_window` | Focus + screenshot a specific window |
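
`find_windows` matches by title substring; whether that match is case-sensitive, and how results are ordered, is not specified here. The sketch below assumes a case-insensitive match over hypothetical `{"title": ...}` records and ranks exact matches ahead of prefix matches, which helps pick the right target when several windows share a word in their titles.

```python
def find_windows(windows, query):
    """Return windows whose title contains query, case-insensitively.

    windows: list of dicts with a "title" key (hypothetical record shape).
    Order: exact title matches, then prefix matches, then other matches.
    """
    q = query.casefold()
    matches = [w for w in windows if q in w["title"].casefold()]

    def rank(window):
        title = window["title"].casefold()
        # False sorts before True, so exact and prefix matches come first.
        return (title != q, not title.startswith(q))

    return sorted(matches, key=rank)


windows = [
    {"title": "main.py - MyProject - PyCharm"},
    {"title": "PyCharm"},
    {"title": "Terminal"},
]
print([w["title"] for w in find_windows(windows, "pycharm")])
```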

Mouse control tools:

| Tool | Description |
|---|---|
| `mouse_click` | Click at coordinates (single/double/multi/hold) |
| `mouse_drag` | Drag from point A to point B |
| `mouse_move` | Move the cursor without clicking |
| `mouse_position` | Get the current cursor position |
| `mouse_scroll` | Scroll the mouse wheel up or down |
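
`mouse_drag` moves from point A to point B. Some applications only register a drag if the cursor passes through intermediate positions while the button is held. The sketch below computes evenly spaced waypoints between the two points; `drag_path` and its `steps` parameter are hypothetical and are not documented `mouse_drag` arguments.

```python
def drag_path(start, end, steps=10):
    """Return steps + 1 integer points from start to end, inclusive."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    (x0, y0), (x1, y1) = start, end
    return [
        (round(x0 + (x1 - x0) * i / steps), round(y0 + (y1 - y0) * i / steps))
        for i in range(steps + 1)
    ]


print(drag_path((0, 0), (100, 40), steps=4))
```

With PyAutoGUI, for example, you could call `mouseDown()`, then `moveTo(x, y)` for each waypoint, then `mouseUp()`.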

Keyboard control tools:

| Tool | Description |
|---|---|
| `key_press` | Press keys or hotkey combinations |
| `key_hold` | Hold keys for a duration |
| `key_type` | Type text character by character |
| `key_sequence` | Execute a timed sequence of key actions |
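
`key_sequence` executes a timed sequence of key actions. The exact step schema is defined by the server; assuming each step carries an action name, its keys, an optional delay before it runs, and an optional duration for holds (all hypothetical field names), the dry-run sketch below converts a sequence into absolute start times, which is a quick way to sanity-check timing before sending it.

```python
def schedule(sequence):
    """Return (start_seconds, action, keys) tuples for a key sequence.

    Each step is a dict with hypothetical keys: "action", "keys",
    optional "delay" (seconds before the step) and optional "duration"
    (seconds a hold keeps the keys down).
    """
    t = 0.0
    timeline = []
    for step in sequence:
        t += step.get("delay", 0.0)
        timeline.append((round(t, 3), step["action"], tuple(step["keys"])))
        t += step.get("duration", 0.0)  # a hold occupies time before the next step
    return timeline


seq = [
    {"action": "press", "keys": ["ctrl", "s"]},
    {"action": "hold", "keys": ["shift"], "delay": 0.5, "duration": 1.0},
    {"action": "press", "keys": ["enter"], "delay": 0.25},
]
print(schedule(seq))
```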

Combined operations:

| Tool | Description |
|---|---|
| `mouse_and_keyboard` | Execute a mixed sequence of mouse + keyboard + wait + screenshot actions |
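
Because `mouse_and_keyboard` runs a whole sequence in one call, a malformed step can fail midway through. A client can check the sequence before sending it. The sketch below validates only the three action types used in this README's combined example (`click`, `key_press`, `key_type`, with the fields shown there); the field names for wait and screenshot actions are not documented here, so they are deliberately left out. `build_actions` is a hypothetical helper.

```python
import json

# Required fields per action type, taken from the README's combined example.
# Other action types (wait, screenshot) are intentionally not modeled here.
REQUIRED_FIELDS = {
    "click": ("x", "y"),
    "key_press": ("keys",),
    "key_type": ("text",),
}


def build_actions(*actions):
    """Validate actions and wrap them as a mouse_and_keyboard tool call."""
    for i, action in enumerate(actions):
        kind = action.get("action")
        if kind not in REQUIRED_FIELDS:
            raise ValueError(f"action {i}: unsupported type {kind!r}")
        missing = [f for f in REQUIRED_FIELDS[kind] if f not in action]
        if missing:
            raise ValueError(f"action {i} ({kind}): missing {missing}")
    return {"tool": "mouse_and_keyboard", "args": {"actions": list(actions)}}


call = build_actions(
    {"action": "click", "x": 500, "y": 300},
    {"action": "key_press", "keys": ["ctrl", "a"]},
    {"action": "key_type", "text": "New text"},
)
print(json.dumps(call))
```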

Additional actions:

| Tool | Description |
|---|---|
| `clipboard_get` | Get clipboard text |
| `clipboard_set` | Set clipboard text |
| `launch_app` | Launch an application |
| `launch_url` | Open a URL in the browser |
| `wait` | Pause for N seconds |
| `get_pixel_color` | Get the RGB color at screen coordinates |
| `hotkey` | Press a keyboard shortcut |
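
`get_pixel_color` and `wait` can be combined to poll until a UI element changes, for example a status indicator turning green. The sketch below is a client-side helper, not a ControlMCP tool: `read_pixel` is a hypothetical callable standing in for whatever calls `get_pixel_color` in your client and returns an `(r, g, b)` tuple, and the per-channel tolerance is an assumption to absorb JPEG or anti-aliasing noise.

```python
import time


def colors_match(a, b, tolerance=10):
    """True if every RGB channel differs by at most `tolerance`."""
    return all(abs(x - y) <= tolerance for x, y in zip(a, b))


def wait_for_color(read_pixel, x, y, target, timeout=10.0, interval=0.5, tolerance=10):
    """Poll read_pixel(x, y) until it matches target or the timeout expires.

    Returns the matching color, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        color = read_pixel(x, y)
        if colors_match(color, target, tolerance):
            return color
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```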
See docs/TUTORIAL.md for comprehensive usage examples.

```jsonc
// Plan a vague desktop task first
{"tool": "plan_desktop_task", "args": {"instruction": "Switch to PyCharm and run the current config"}}
// Execute a generated plan
{"tool": "execute_desktop_plan", "args": {"plan_id": "plan_abc123"}}
// Take a screenshot
{"tool": "capture_screen", "args": {}}
// Take a sharper screenshot when text clarity matters
{"tool": "capture_window", "args": {"title": "PyCharm", "quality": 75, "sharpen": true}}
// Read that screenshot as Base64 text for non-multimodal models
{"tool": "read_screenshot_base64", "args": {"file_path": "/tmp/screen.jpg"}}
// Click at (500, 300)
{"tool": "mouse_click", "args": {"x": 500, "y": 300}}
// Combined: click → select all → type
{"tool": "mouse_and_keyboard", "args": {"actions": [
  {"action": "click", "x": 500, "y": 300},
  {"action": "key_press", "keys": ["ctrl", "a"]},
  {"action": "key_type", "text": "New text"}
]}}
```

ControlMCP now supports a control-plane-first workflow for higher-precision desktop automation:

- Normalize the user instruction with `plan_desktop_task`
- Review or directly execute the structured plan
- Let the guarded executor choose a faster observation strategy (`capture_window` / `capture_region` / `capture_scroll_region`)
- Verify each critical step and recover when context is lost
- Require explicit confirmation for payment/password/asset-related actions
- Save successful workflow experience for future runs
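
The gating part of this workflow can be sketched on the client side. In the Python below, `run_step`, `verify_step` and `confirm` are hypothetical callables standing in for tool calls (for example, `confirm` could wrap `confirm_sensitive_action`), and the keyword list used to flag sensitive steps is an assumption, not ControlMCP's own classifier.

```python
# Assumed keyword list; the server's real sensitivity rules are not documented here.
SENSITIVE_WORDS = ("payment", "password", "transfer", "wallet")


def is_sensitive(description):
    text = description.lower()
    return any(word in text for word in SENSITIVE_WORDS)


def execute_plan(steps, run_step, verify_step, confirm):
    """Run steps in order; stop on a rejected confirmation or a failed check.

    steps: human-readable step descriptions.
    Returns (status, completed_steps).
    """
    done = []
    for step in steps:
        if is_sensitive(step) and not confirm(step):
            return "rejected", done
        run_step(step)
        if not verify_step(step):
            # Hand over to a recovery path (e.g. recover_execution_context).
            return "needs_recovery", done
        done.append(step)
    return "completed", done
```

Stopping on the first failed verification, rather than continuing, keeps later steps from acting on a window or panel that is no longer in the expected state.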

For small or visually ambiguous targets, you can also ask `capture_screen`, `capture_region`, or `capture_window` to generate a second, grid-overlaid image (`grid_file_path`) using `grid_rows` and `grid_cols`. Then convert the chosen cell + anchor into screen coordinates with `resolve_grid_target` before clicking, or use `click_grid_target` to resolve and click in one step.

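
The cell-to-coordinate conversion is simple arithmetic. The exact conventions of `resolve_grid_target` (0- or 1-based cells, anchor names, scaling) are not documented in this README, so the sketch below assumes 0-based `row`/`col`, the anchor names shown, and a captured region whose top-left corner sits at `(left, top)` in screen coordinates with no downscaling. `resolve_grid_cell` is a hypothetical name.

```python
# Anchor offsets as fractions of one cell's width/height (assumed names).
ANCHORS = {
    "top_left": (0.0, 0.0),
    "top_right": (1.0, 0.0),
    "center": (0.5, 0.5),
    "bottom_left": (0.0, 1.0),
    "bottom_right": (1.0, 1.0),
}


def resolve_grid_cell(left, top, width, height, rows, cols, row, col, anchor="center"):
    """Map a 0-based grid cell + anchor to integer screen coordinates."""
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("cell outside grid")
    cell_w, cell_h = width / cols, height / rows
    fx, fy = ANCHORS[anchor]
    x = left + (col + fx) * cell_w
    y = top + (row + fy) * cell_h
    return int(x), int(y)


# An 800x600 region at (100, 200) split into 6 rows x 8 columns of 100 px cells.
print(resolve_grid_cell(100, 200, 800, 600, rows=6, cols=8, row=2, col=3))
```

Corner anchors land exactly on cell borders, so `"center"` is usually the safer choice for clicking.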
| Document | Description |
|---|---|
| README.md | This file |
| README.zh-CN.md | Chinese version of this file |
| docs/REQUIREMENTS.md | Requirements analysis |
| docs/ARCHITECTURE.md | Architecture design |
| docs/MODULE_DESIGN.md | Module design |
| docs/FUNCTIONAL_DESIGN.md | Functional design |
| docs/TUTORIAL.md | Tutorial & examples |
| skills/computer-control/ | Agent Skill: computer operation SOPs |
| skills/computer-control/README.md | Skill-specific install and usage guide |
| skills/computer-control/docs/window-management.md | Window rescue and window shortcut reference |
| skills/computer-control/docs/idea-run-workflow.md | JetBrains IDE run/log observation workflow |
The skills/computer-control/ folder contains a ready-to-use Agent Skill that teaches LLMs how to operate computers proficiently.

- `SKILL.md`: the main skill instructions, SOPs, shortcut tables, and common failure patterns
- `docs/coordinate-system.md`: coordinate conversion reference for screenshot-to-click workflows
- `docs/window-management.md`: window maximize/restore/snap shortcuts and window recovery workflow
- `docs/idea-run-workflow.md`: JetBrains IDE startup, run-panel switching, and log stabilization workflow
- `README.md`: skill-local installation and usage notes

Key capabilities:

- Keyboard-first automation: prefer shortcuts over UI clicking whenever possible
- Plan-before-act control plane: normalize ambiguous instructions before touching the desktop
- Window recovery: fix minimized, half-screen, or partially restored windows before further actions
- Coordinate-safe clicking: convert screenshot-local coordinates into screen coordinates explicitly
- IDE workflows: IntelliJ IDEA / PyCharm run-configuration selection, run-panel switching, and log monitoring
- Sensitive-action gating: require confirmation before payment/password/asset-related steps
- Operational fallback: when JetBrains shortcuts do not behave as expected, check the local `ReferenceCard.pdf` or the official JetBrains documentation
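
Coordinate-safe clicking comes down to undoing two transformations: the offset of the captured region or window on screen, and any downscaling applied to the saved image. A minimal sketch, assuming you know the region's screen origin plus both the captured size and the saved image size (the parameter names are illustrative, not ControlMCP response fields; the skill's `docs/coordinate-system.md` holds the project's own rules):

```python
def image_to_screen(px, py, region_left, region_top,
                    region_width, region_height, image_width, image_height):
    """Convert a point in a (possibly resized) screenshot to screen coordinates."""
    scale_x = region_width / image_width    # > 1 when the image was downscaled
    scale_y = region_height / image_height
    return (round(region_left + px * scale_x),
            round(region_top + py * scale_y))


# A 1600x900 window on a second monitor at x=1920, saved as an 800x450 JPEG:
# the point (400, 225) in the image maps back to the screen as follows.
print(image_to_screen(400, 225, 1920, 0, 1600, 900, 800, 450))
```

On HiDPI displays you may additionally need to account for the OS scaling factor if the capture is in physical pixels while the mouse API expects logical ones.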
You can either copy skills/computer-control/ into your agent's skill directory, or add it via a symbolic link.

Option 1: copy the directory

```bash
# Codex CLI
cp -r skills/computer-control ~/.codex/skills/
# Claude Code
cp -r skills/computer-control ~/.claude/skills/
# OpenCode
cp -r skills/computer-control ~/.config/opencode/skills/
```

Option 2: create a symbolic link

On macOS / Linux:

```bash
# Codex CLI
ln -s "$(pwd)/skills/computer-control" ~/.codex/skills/computer-control
# Claude Code
ln -s "$(pwd)/skills/computer-control" ~/.claude/skills/computer-control
# OpenCode
ln -s "$(pwd)/skills/computer-control" ~/.config/opencode/skills/computer-control
```

On Windows (run Command Prompt as Administrator if required):

```bat
mklink /D "%USERPROFILE%\.codex\skills\computer-control" "%CD%\skills\computer-control"
mklink /D "%USERPROFILE%\.claude\skills\computer-control" "%CD%\skills\computer-control"
mklink /D "%USERPROFILE%\.config\opencode\skills\computer-control" "%CD%\skills\computer-control"
```

Using a symbolic link is convenient while iterating on the skill, because changes in this repository are reflected immediately in the agent's skills directory.
If your agent supports custom skill paths, you can also reference this folder directly.
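
The copy and link options can also be scripted in a cross-platform way. The sketch below is a hypothetical helper, not something shipped in this repository; it uses `shutil.copytree` for copying and `Path.symlink_to` for linking, and creates the agent's skills directory if it is missing. On Windows, creating a symlink without administrator rights or Developer Mode raises `OSError`.

```python
import shutil
from pathlib import Path


def install_skill(repo_root, skills_dir, mode="link"):
    """Install skills/computer-control into skills_dir by symlink or copy.

    repo_root:  path to the ControlMCP checkout.
    skills_dir: the agent's skill directory, e.g. "~/.claude/skills".
    Returns the path of the installed skill.
    """
    source = Path(repo_root).resolve() / "skills" / "computer-control"
    if not source.is_dir():
        raise FileNotFoundError(source)
    target = Path(skills_dir).expanduser() / "computer-control"
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists() or target.is_symlink():
        raise FileExistsError(target)
    if mode == "link":
        target.symlink_to(source, target_is_directory=True)
    elif mode == "copy":
        shutil.copytree(source, target)
    else:
        raise ValueError("mode must be 'link' or 'copy'")
    return target
```

For example, `install_skill(".", "~/.claude/skills")` run from the repository root mirrors the Claude Code `ln -s` command.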
After installation, invoke it naturally in prompts such as:

```text
Use $computer-control to restart the IDEA app and wait until logs stop updating
Use $computer-control to maximize the target window and capture it
Use $computer-control to operate PyCharm with keyboard shortcuts first
```

For skill-specific details, see skills/computer-control/README.md.
```text
ControlMCP/
├── README.md # This file
├── README.zh-CN.md # Chinese README
├── LICENSE # GNU GPLv3 license
├── pyproject.toml # Package config
├── src/
│ └── control_mcp/
│ ├── __init__.py
│ ├── server.py # MCP server + tool registration
│ ├── schemas/
│ │ ├── __init__.py
│ │ └── responses.py # Structured response types
│ ├── tools/
│ │ ├── __init__.py
│ │ ├── screen.py # Screen capture tools
│ │ ├── window.py # Window management tools
│ │ ├── mouse.py # Mouse control tools
│ │ ├── keyboard.py # Keyboard control tools
│ │ ├── combined.py # Combined operations
│ │ └── actions.py # Additional actions
│ └── utils/
│ ├── __init__.py
│ ├── capture.py # Capture utilities (JPEG, resize)
│ ├── _win_window.py # Windows backend
│ ├── _mac_window.py # macOS backend
│ └── _linux_window.py # Linux backend
├── skills/
│ └── computer-control/ # Agent Skill: computer operation SOPs
│ ├── SKILL.md # Main skill instructions
│ ├── docs/
│ │ ├── coordinate-system.md # Coordinate system reference
│ │ ├── window-management.md # Window management reference
│ │ └── idea-run-workflow.md # JetBrains IDE run/log workflow
│ └── README.md # Skill install & usage guide
├── docs/
│ ├── REQUIREMENTS.md
│ ├── ARCHITECTURE.md
│ ├── MODULE_DESIGN.md
│ ├── FUNCTIONAL_DESIGN.md
│ ├── TUTORIAL.md
│ └── zh-CN/ # Chinese documentation
│ ├── REQUIREMENTS.md
│ ├── ARCHITECTURE.md
│ ├── MODULE_DESIGN.md
│ ├── FUNCTIONAL_DESIGN.md
│ └── TUTORIAL.md
└── tests/
├── __init__.py
├── test_schemas.py # 22 tests
├── test_screen.py # 6 tests
├── test_window.py # 11 tests
├── test_mouse.py # 13 tests
├── test_keyboard.py # 16 tests
├── test_combined.py # 12 tests
└── test_actions.py # 13 tests
```

Platform support:

| Platform | Screen Capture | Window Management | Mouse/Keyboard |
|---|---|---|---|
| Windows | ✅ mss | ✅ pygetwindow | ✅ pyautogui |
| macOS | ✅ mss | ✅ Quartz | ✅ pyautogui |
| Linux | ✅ mss | ✅ xlib | ✅ pyautogui |
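
The window-management column maps to the three backend modules under `src/control_mcp/utils/`. How the server actually selects a backend is not shown in this README; a typical approach, sketched below as an assumption, is to branch on `sys.platform` and import the matching module lazily. The module paths come from the project tree; `backend_module_name` and `load_backend` are hypothetical.

```python
import importlib
import sys

# Module paths from the project tree; the dispatch logic itself is illustrative.
BACKENDS = {
    "win32": "control_mcp.utils._win_window",
    "darwin": "control_mcp.utils._mac_window",
    "linux": "control_mcp.utils._linux_window",
}


def backend_module_name(name=None):
    """Return the window-backend module path for a sys.platform value."""
    name = name or sys.platform
    for prefix, module in BACKENDS.items():
        if name.startswith(prefix):
            return module
    raise RuntimeError(f"unsupported platform: {name}")


def load_backend(name=None):
    """Import the backend lazily so other platforms' dependencies are never loaded."""
    return importlib.import_module(backend_module_name(name))
```

Importing lazily matters here because each backend depends on a platform-only library (pygetwindow, Quartz, or Xlib per the table above).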

Licensed under the GNU General Public License v3.0 (GPLv3); see `LICENSE`.