Allows an LLM/coding agent like GitHub Copilot or Claude to use Windows 11 with mouse, keyboard and windows tools. Take screenshots to allow the LLM to see what it is doing.

sbroenne/mcp-windows

Windows MCP Server

A Model Context Protocol (MCP) server that lets an LLM or coding agent such as GitHub Copilot or Claude operate Windows 11 through mouse, keyboard, and window-management tools. It can also take screenshots so the LLM can see what it is doing.

Designed for computer use, QA and RPA scenarios.

🤖 Co-designed with Claude Opus 4.5 via GitHub Copilot - This project was developed with AI pair programming, using Claude Opus 4.5 through GitHub Copilot to design, create, and test a robust, production-ready Windows automation solution.

Features

🖱️ Mouse Control

  • Click, double-click, right-click, middle-click
  • Move cursor to absolute coordinates
  • Drag operations with hold/release
  • Scroll up/down/left/right
  • Multi-monitor support with DPI awareness
  • Modifier key support (Ctrl+click, Shift+click, etc.)

⌨️ Keyboard Control

  • Unicode text typing (layout-independent) - type any character in any language
  • Virtual key presses - Enter, Tab, Escape, F1-F24, navigation keys
  • Key combinations - Ctrl+S, Alt+Tab, Ctrl+Shift+P, Win+L
  • Key sequences - multi-key macros with configurable timing
  • Hold/release keys - for Shift-select and other hold operations
  • Special keys - Copilot key (Windows 11), media controls, browser keys
  • Layout detection - query current keyboard layout (BCP-47 format)

🪟 Window Management

  • List windows - enumerate all visible top-level windows with titles, handles, process info, and bounds
  • Find windows - locate windows by title (substring or regex matching)
  • Activate windows - bring windows to foreground with focus
  • Get foreground - report which window currently has focus
  • Control state - minimize, maximize, restore, and close windows
  • Move/resize - position and size windows to specified coordinates
  • Wait for window - wait for a window to appear with configurable timeout
  • Multi-monitor support - full awareness of monitor index and DPI
  • UWP/Store apps - proper detection and handling
  • Cloaking detection - filter out virtual desktop and shell-managed windows
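
The "Find windows" feature matches titles by substring or by regular expression. The sketch below illustrates the two modes on a list of window records. The record shape, the handle values, and the case-insensitive comparison are assumptions made for this example, not documented server behavior.

```python
import re

def find_windows(windows, title, use_regex=False):
    """Return the window records whose title matches `title`.

    Case-insensitive matching is an assumption of this sketch.
    """
    if use_regex:
        pattern = re.compile(title, re.IGNORECASE)
        return [w for w in windows if pattern.search(w["title"])]
    needle = title.casefold()
    return [w for w in windows if needle in w["title"].casefold()]

# Hypothetical records, shaped like a simplified `list` result.
windows = [
    {"handle": 101, "title": "Untitled - Notepad"},
    {"handle": 202, "title": "README.md - Visual Studio Code"},
]

by_substring = find_windows(windows, "notepad")
by_regex = find_windows(windows, r"\.md\b", use_regex=True)
```

Substring matching is the simpler default; a regex is useful when the title contains a variable part such as a file name.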

📸 Screenshot Capture

  • LLM-Optimized by Default - JPEG format, auto-scaling to 1568px, quality 85 for minimal token usage
  • Capture primary screen - full screenshot of the main display
  • Capture specific monitor - screenshot any connected display by index
  • Capture window - screenshot a specific window (even if partially obscured)
  • Capture region - screenshot an arbitrary rectangular area
  • Capture all monitors - composite screenshot of entire virtual desktop
  • Format options - JPEG (default), PNG, WebP with configurable quality (1-100)
  • Auto-scaling - defaults to 1568px width (LLM vision model native limit); disable with maxWidth: 0
  • Output modes - inline base64 (default) or file path for zero-overhead file workflows
  • Cursor inclusion - optionally include mouse cursor in captures
  • Multi-monitor aware - supports extended desktop configurations
  • DPI aware - correct pixel dimensions on high-DPI displays

Why Choose Windows MCP?

Comprehensive Windows Automation - Unlike generic computer control tools, Windows MCP is purpose-built for Windows with native API integration. It handles Windows-specific challenges (UIPI elevation blocks, secure desktop restrictions, virtual desktops) that generic solutions miss.

Multi-Monitor & DPI-Aware - Correctly handles multi-monitor setups, DPI scaling, and virtual desktops, all of which are critical in modern Windows environments. Most alternatives struggle with coordinate translation and DPI awareness.

Full Windows API Coverage - Direct P/Invoke to Windows APIs (SendInput, SetWindowPos, GetWindowText, GdiPlus) provides reliable, low-level control. No browser automation tricks or approximate solutions.

Security-Conscious Design - Detects and gracefully handles elevated windows (UIPI), UAC prompts, and lock screens. Respects Windows security model instead of bypassing it.

Performance - Synchronous I/O on dedicated thread pool prevents blocking the LLM. Configurable delays for stability without sacrificing speed.

Active Development - Release workflows, comprehensive testing, VS Code extension, and clear contribution guidelines show this is a maintained project, not abandoned.

Installation

Option 1: VS Code Extension (Recommended)

Install the Windows MCP extension from the VS Code Marketplace for one-click deployment:

  1. Open VS Code
  2. Go to Extensions (Ctrl+Shift+X)
  3. Search for "Windows MCP"
  4. Click Install

The extension automatically configures the MCP server and makes it available to GitHub Copilot.

Option 2: Download from Releases

Download pre-built binaries from the GitHub Releases page:

  1. Download the latest mcp-windows-v*.zip
  2. Extract to your preferred location
  3. Add to your MCP client configuration (see Manual Configuration under Usage)

Usage

VS Code Extension

If you installed via the VS Code extension, the MCP server is automatically configured. No manual setup required.

Manual Configuration (For Downloaded Releases)

If you downloaded from the releases page, add to your MCP client configuration:

{
  "servers": {
    "windows": {
      "command": "dotnet",
      "args": ["path/to/extracted/Sbroenne.WindowsMcp.dll"],
      "env": {}
    }
  }
}

Note: Releases are framework-dependent and require the .NET 8 Runtime to be installed.
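
If you prefer to generate that configuration entry rather than edit it by hand, a minimal Python sketch could look like the following. The install path is hypothetical; substitute the folder you extracted the release to. Where the resulting JSON has to be saved depends on your MCP client and is not covered here.

```python
import json
from pathlib import PureWindowsPath

def build_server_config(dll_path: str) -> dict:
    """Return a config entry that launches the server through `dotnet`."""
    return {
        "servers": {
            "windows": {
                "command": "dotnet",
                "args": [str(PureWindowsPath(dll_path))],
                "env": {},
            }
        }
    }

# Hypothetical install location, used only for illustration.
config = build_server_config(r"C:\Tools\mcp-windows\Sbroenne.WindowsMcp.dll")
print(json.dumps(config, indent=2))
```

`json.dumps` takes care of escaping the backslashes in the Windows path, which is an easy mistake to make when writing the file manually.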

Tools

mouse_control

Control mouse input on Windows.

| Action | Description | Required Parameters |
|---|---|---|
| click | Left-click at coordinates | x, y |
| double_click | Double-click at coordinates | x, y |
| right_click | Right-click at coordinates | x, y |
| middle_click | Middle-click at coordinates | x, y |
| move | Move cursor to coordinates | x, y |
| drag | Drag from current position to coordinates | x, y |
| scroll | Scroll at coordinates | x, y, direction, amount |
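
MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such requests for `mouse_control`. The argument name `action`, the direction value `"down"`, and the integer `amount` are assumptions inferred from the table and the feature list; check the tool schema the server reports before relying on them.

```python
import json
from itertools import count

_ids = count(1)  # simple request-id generator for this sketch

def tool_call(tool: str, **arguments) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

# Argument names below are assumed, not confirmed by this README.
click = tool_call("mouse_control", action="click", x=640, y=360)
scroll = tool_call("mouse_control", action="scroll", x=640, y=360,
                   direction="down", amount=3)
print(json.dumps(click))
```

In practice an MCP client library builds these messages for you; the sketch only shows what travels over the wire.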

keyboard_control

Control keyboard input on Windows.

| Action | Description | Required Parameters |
|---|---|---|
| type | Type text using Unicode input | text |
| press | Press and release a key | key |
| key_down | Hold a key down | key |
| key_up | Release a held key | key |
| combo | Key + modifiers combination | key, modifiers |
| sequence | Multiple keys in order | keys |
| release_all | Release all held keys | none |
| get_keyboard_layout | Query current layout | none |
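
A client can check the required parameters from this table before sending a request. The following is a minimal sketch; the helper name `missing_params` is hypothetical, and the server performs its own validation regardless.

```python
# Required parameters per keyboard_control action, copied from the table.
REQUIRED = {
    "type": {"text"},
    "press": {"key"},
    "key_down": {"key"},
    "key_up": {"key"},
    "combo": {"key", "modifiers"},
    "sequence": {"keys"},
    "release_all": set(),
    "get_keyboard_layout": set(),
}

def missing_params(action: str, arguments: dict) -> list:
    """Return the required parameter names absent from `arguments`."""
    if action not in REQUIRED:
        raise ValueError(f"unknown keyboard_control action: {action!r}")
    return sorted(REQUIRED[action] - set(arguments))

print(missing_params("combo", {"key": "s"}))
```

An empty list means the call carries everything the table lists as required.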

window_management

Control windows on the Windows desktop.

| Action | Description | Required Parameters |
|---|---|---|
| list | List all visible windows | none |
| find | Find windows by title | title |
| activate | Bring window to foreground | handle |
| get_foreground | Get current foreground window | none |
| minimize | Minimize window | handle |
| maximize | Maximize window | handle |
| restore | Restore window from min/max | handle |
| close | Close window (sends WM_CLOSE) | handle |
| move | Move window to position | handle, x, y |
| resize | Resize window | handle, width, height |
| set_bounds | Move and resize atomically | handle, x, y, width, height |
| wait_for | Wait for window to appear | title |
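
The `wait_for` action waits for a window to appear, with a configurable timeout and polling interval (see Configuration). The loop below is a generic polling sketch of that idea in Python, not the server's actual implementation. The default values mirror the documented 30000 ms timeout and 250 ms interval; the `probe` callback and the injectable `clock` and `sleep` parameters are assumptions added so the loop can be exercised without real delays.

```python
import time

def wait_for(probe, timeout_ms=30000, interval_ms=250,
             clock=time.monotonic, sleep=time.sleep):
    """Call `probe` until it returns a non-None value or the timeout expires."""
    deadline = clock() + timeout_ms / 1000
    while True:
        result = probe()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"no window appeared within {timeout_ms} ms")
        sleep(interval_ms / 1000)
```

A shorter interval finds the window sooner at the cost of more enumeration work per second.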

screenshot_control

Capture screenshots on Windows.

| Action | Description | Required Parameters |
|---|---|---|
| capture | Capture screenshot | target |
| list_monitors | List all connected monitors | none |

Capture Targets:

| Target | Description | Additional Parameters |
|---|---|---|
| primary_screen | Capture primary monitor | none |
| monitor | Capture specific monitor | monitor_index |
| window | Capture specific window | window_handle |
| region | Capture rectangular region | x, y, width, height |

Optional Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| include_cursor | boolean | false | Include mouse cursor in capture |
| imageFormat | string | "jpeg" | Output format: "jpeg", "png", "webp" |
| quality | integer | 85 | Compression quality for JPEG/WebP (1-100) |
| maxWidth | integer | 1568 | Max width in pixels (LLM-optimized); 0 to disable scaling |
| maxHeight | integer | null | Max height in pixels (optional) |
| outputMode | string | "inline" | "inline" (base64) or "file" (save to disk) |
| outputPath | string | null | Custom file path when using file output mode |
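
To get a feel for what `maxWidth` and `maxHeight` do to a capture, the sketch below computes output dimensions under two assumptions that this README does not state explicitly: the aspect ratio is preserved, and images smaller than the limits are never upscaled. The function name `scaled_size` is hypothetical.

```python
def scaled_size(width, height, max_width=1568, max_height=None):
    """Estimate output dimensions after auto-scaling.

    Assumes aspect-ratio-preserving downscaling only; a limit of 0 or
    None disables that constraint.
    """
    scale = 1.0
    if max_width and width > max_width:
        scale = min(scale, max_width / width)
    if max_height and height > max_height:
        scale = min(scale, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))

print(scaled_size(3840, 2160))               # 4K capture, default limit
print(scaled_size(3840, 2160, max_width=0))  # scaling disabled
```

Under these assumptions a 3840 x 2160 capture comes out at 1568 x 882 with the defaults, and is left untouched when `maxWidth` is 0.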

Supported Keys

Function Keys

f1 through f24

Navigation

up, down, left, right, home, end, pageup, pagedown, insert, delete

Control

enter, tab, escape, space, backspace

Modifiers

ctrl, shift, alt, win

Media

volumemute, volumedown, volumeup, mediaplaypause, medianexttrack, mediaprevtrack, mediastop

Special

copilot (Windows 11 Copilot+ PCs)

Browser

browserback, browserforward, browserrefresh, browserstop, browsersearch, browserfavorites, browserhome
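
The named keys above can be collected into a lookup set so a client can catch typos before the server returns InvalidKey. This sketch covers only the names listed in this section; ordinary character keys are not included, and the lower-casing step is an assumption about how names are compared.

```python
NAMED_KEYS = (
    {f"f{n}" for n in range(1, 25)}  # f1 through f24
    | {"up", "down", "left", "right", "home", "end",
       "pageup", "pagedown", "insert", "delete"}
    | {"enter", "tab", "escape", "space", "backspace"}
    | {"ctrl", "shift", "alt", "win"}
    | {"volumemute", "volumedown", "volumeup", "mediaplaypause",
       "medianexttrack", "mediaprevtrack", "mediastop"}
    | {"copilot"}
    | {"browserback", "browserforward", "browserrefresh", "browserstop",
       "browsersearch", "browserfavorites", "browserhome"}
)

def is_named_key(name: str) -> bool:
    """Return True if `name` is one of the named keys listed above."""
    return name.strip().lower() in NAMED_KEYS

print(is_named_key("PageUp"), is_named_key("f25"))
```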

Error Handling

The server handles common Windows security scenarios:

| Error Code | Description |
|---|---|
| ElevatedWindowActive | Target window is running as Administrator |
| SecureDesktopActive | UAC prompt or lock screen is active |
| InvalidKey | Unrecognized key name |
| InputBlocked | Input was blocked by UIPI |
| Timeout | Operation timed out |
| InvalidMonitorIndex | Monitor index out of range |
| InvalidWindowHandle | Window handle is invalid or window no longer exists |
| WindowMinimized | Cannot capture minimized window |
| WindowNotVisible | Window is not visible |
| InvalidRegion | Capture region has invalid dimensions |
| CaptureFailed | Screenshot capture operation failed |
| SizeLimitExceeded | Requested capture exceeds maximum allowed size |
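
An agent or client may want to react differently depending on the error code. The mapping below is one possible client-side grouping, shown purely as an illustration: the category names and the choice of category for each code are assumptions of this example, not behavior prescribed by the server.

```python
# Illustrative grouping only; adjust to your own workflow.
SUGGESTED_RESPONSE = {
    "ElevatedWindowActive": "ask_user",
    "SecureDesktopActive": "ask_user",
    "InputBlocked": "ask_user",
    "Timeout": "retry",
    "CaptureFailed": "retry",
    "WindowMinimized": "make_window_visible",
    "WindowNotVisible": "make_window_visible",
    "InvalidKey": "fix_arguments",
    "InvalidMonitorIndex": "fix_arguments",
    "InvalidWindowHandle": "fix_arguments",
    "InvalidRegion": "fix_arguments",
    "SizeLimitExceeded": "fix_arguments",
}

def suggest_response(error_code: str) -> str:
    """Return a suggested reaction, or "report" for unlisted codes."""
    return SUGGESTED_RESPONSE.get(error_code, "report")

print(suggest_response("WindowMinimized"))
```

The security-related codes are grouped under `ask_user` because, as noted under Security Considerations, they reflect restrictions the server respects rather than works around.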

Configuration

Environment Variables

| Variable | Default | Description |
|---|---|---|
| MCP_WINDOWS_KEYBOARD_CHUNK_DELAY_MS | 10 | Delay between text chunks |
| MCP_WINDOWS_KEYBOARD_KEY_DELAY_MS | 10 | Delay between key presses |
| MCP_WINDOWS_KEYBOARD_SEQUENCE_DELAY_MS | 50 | Delay between sequence keys |
| MCP_WINDOWS_MOUSE_MOVE_DELAY_MS | 10 | Delay after mouse move |
| MCP_WINDOWS_MOUSE_CLICK_DELAY_MS | 50 | Delay after mouse click |
| MCP_WINDOWS_WINDOW_TIMEOUT_MS | 5000 | Default window operation timeout |
| MCP_WINDOWS_WINDOW_WAITFOR_TIMEOUT_MS | 30000 | Default wait_for timeout |
| MCP_WINDOWS_WINDOW_PROPERTY_TIMEOUT_MS | 100 | Timeout for querying window properties |
| MCP_WINDOWS_WINDOW_POLLING_INTERVAL_MS | 250 | Polling interval for wait_for |
| MCP_WINDOWS_WINDOW_ACTIVATION_MAX_RETRIES | 3 | Max retries for window activation |
| MCP_WINDOWS_SCREENSHOT_TIMEOUT_MS | 5000 | Screenshot operation timeout |
| MCP_WINDOWS_SCREENSHOT_MAX_PIXELS | 33177600 | Maximum capture size (default 8K) |
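
These variables would typically be supplied through the `env` object of the MCP client configuration shown under Manual Configuration, although that depends on your client passing `env` through to the server process. The sketch below builds such an object from integer overrides and rejects misspelled names. The helper `env_block` is hypothetical; environment values are always strings, so the integers are converted.

```python
# Documented defaults, copied from the table above.
DEFAULTS = {
    "MCP_WINDOWS_KEYBOARD_CHUNK_DELAY_MS": 10,
    "MCP_WINDOWS_KEYBOARD_KEY_DELAY_MS": 10,
    "MCP_WINDOWS_KEYBOARD_SEQUENCE_DELAY_MS": 50,
    "MCP_WINDOWS_MOUSE_MOVE_DELAY_MS": 10,
    "MCP_WINDOWS_MOUSE_CLICK_DELAY_MS": 50,
    "MCP_WINDOWS_WINDOW_TIMEOUT_MS": 5000,
    "MCP_WINDOWS_WINDOW_WAITFOR_TIMEOUT_MS": 30000,
    "MCP_WINDOWS_WINDOW_PROPERTY_TIMEOUT_MS": 100,
    "MCP_WINDOWS_WINDOW_POLLING_INTERVAL_MS": 250,
    "MCP_WINDOWS_WINDOW_ACTIVATION_MAX_RETRIES": 3,
    "MCP_WINDOWS_SCREENSHOT_TIMEOUT_MS": 5000,
    "MCP_WINDOWS_SCREENSHOT_MAX_PIXELS": 33177600,  # 7680 x 4320 (8K)
}

def env_block(overrides: dict) -> dict:
    """Validate override names and return them as an `env` mapping of strings."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"unknown variables: {unknown}")
    return {name: str(int(value)) for name, value in overrides.items()}

env = env_block({
    "MCP_WINDOWS_MOUSE_CLICK_DELAY_MS": 100,
    "MCP_WINDOWS_WINDOW_WAITFOR_TIMEOUT_MS": 60000,
})
print(env)
```

Only variables that differ from the defaults need to be listed; anything omitted keeps its documented default.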

Testing

# Run all tests
dotnet test

# Run unit tests only
dotnet test --filter "FullyQualifiedName~Unit"

# Run integration tests only (requires Windows desktop session)
dotnet test --filter "FullyQualifiedName~Integration"

Security Considerations

  • UIPI: Windows User Interface Privilege Isolation blocks input to elevated windows from non-elevated processes
  • Secure Desktop: Input cannot be sent during UAC prompts or lock screen
  • Input Simulation: The server uses SendInput, which is the standard Windows API for simulating input

License

MIT License - see LICENSE file for details.

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines on:

  • Setting up the development environment
  • Branch naming and commit conventions
  • Testing requirements
  • Pull request process
  • Code style standards
  • Release procedures

Start with Getting Started if you're new to the project.
