Desktop automation for AI agents that doesn't look like a robot. Bezier mouse paths with muscle-frequency tremor. Bigram-aware typing with realistic typos. No WebDriver. No kernel hooks. No
navigator.webdriver.
ghosthand is a single PowerShell module + SKILL.md manifest that
lets any agent — GitHub Copilot CLI, Claude Code, Cursor, Hermes, or any
Anthropic-compatible runtime — drive a real Windows desktop the way a person
does. It sees the screen via screenshots, clicks and types at human speed,
navigates the browser by keyboard, and reads structured UI state via Windows
UI Automation.
The trick: most desktop automation is trivially distinguishable from a human
(instant cursor teleports, batched keystrokes, navigator.webdriver=true).
ghosthand intentionally produces realistic input timing on the OS layer where
real users live. Cost: actions take real human time. Benefit: they look like
a human did them — because, structurally, they are.
| ghosthand | PyAutoGUI / nut.js | Playwright / Puppeteer | Selenium | |
|---|---|---|---|---|
| Realistic mouse paths (Bezier + tremor) | ✅ | ❌ teleport | n/a | n/a |
| Bigram-aware human typing cadence | ✅ | ❌ uniform | ❌ uniform | ❌ uniform |
navigator.webdriver stays false |
✅ | n/a | ❌ true | ❌ true |
| Works in any app, not just browsers | ✅ | ✅ | ❌ browser-only | ❌ browser-only |
| Reads structured UI tree (UIA) | ✅ | ❌ pixels only | ✅ DOM | ✅ DOM |
| Kernel hooks / virtual HID required | ❌ user-mode | ❌ user-mode | n/a | n/a |
| Ships as an agent skill | ✅ | ❌ | ❌ | ❌ |
- Sees the screen. Background watcher writes a fresh screenshot to
state/current.pngevery ~250 ms. Multi-monitor + virtual-screen capture. - Drives the mouse like a human. Bezier paths with side-bias, 8–13 Hz sinusoidal tremor (real muscle frequency), sub-pixel correction near the target, optional overshoot for long travels, Fitts-law timing, pre-click hover dwell.
- Types like a human. Bigram-aware log-normal cadence (common pairs like th fast, same-finger pairs like ed slower), occasional adjacent-key typos with backspace correction, configurable WPM and typo rate.
- Talks to the UI tree, not pixels. Native UIA
PropertyConditionfiltering (~100 ms on busy pages), accessibility-snapshot trees with auto-generated refs (w1e36), retry on transient COM errors, built-inWait-Forprimitives. - Drives the browser by clicking and typing, not by attaching a debugger.
navigator.webdriverstaysfalsebecause there is no WebDriver. Captures page text via the browser's own UIA tree. - Detects CAPTCHAs without solving them. Surfaces the detection so a human can step in.
git clone https://github.com/niteshdangi/ghosthand ~/.copilot/skills/ghosthand
# or for Claude Code:
git clone https://github.com/niteshdangi/ghosthand ~/.claude/skills/ghosthandThe agent auto-discovers skills in those folders. Tell it something like "open Notepad and type a haiku" — it'll invoke the skill and execute.
git clone https://github.com/niteshdangi/ghosthand
cd ghosthand
. ./Ghosthand.ps1
# Move the cursor with overshoot + tremor + sub-pixel correction
Move-MouseHuman -X 800 -Y 400
# Type with bigram-aware human cadence + occasional typos
Send-Text "Hello, world!"
# Drive the browser
$br = Open-Url 'https://github.com/trending'
Click-LinkByText -Text 'microsoft / vscode' -Partial
$pageText = Get-PageText -ProcessId $br.ProcessId. ./Ghosthand.ps1
Start-ScreenWatcher # vision daemon
$br = Open-Url 'https://github.com/trending' # human-typed URL
Click-LinkByText -Text 'microsoft / vscode' -Partial # navigate
$null = Wait-ForUrlContains -Pattern '/microsoft/vscode'
Click-UIElement -NameLike 'Issues (' # repo issues tab
$null = Wait-ForUrlContains -Pattern '/issues'
Get-PageText -ProcessId $br.ProcessId | Select-Object -First 50
Stop-ScreenWatcherThat whole flow takes ~25 s, mostly browser render time. See
SKILL.md for the latency-budget table.
This is the bit most desktop-automation libraries skip.
Mouse:
- Bezier curve, not straight line. Two control points biased perpendicular to the travel vector — left-handers and right-handers curve differently; ghosthand picks one and stays consistent within a session.
- 8–13 Hz tremor layered on top. That's the real frequency of physiological hand tremor. Anti-bot systems looking at velocity power-spectra see the right bump.
- Fitts' law timing — small targets take longer than big ones, by the same coefficients real humans hit on mouse studies.
- Overshoot + correction on long travels. Real arms ballistically overshoot and self-correct; ghosthand does the same with a sub-pixel approach phase.
- Pre-click hover dwell — humans don't click the millisecond they arrive.
Keyboard:
- Bigram-aware cadence. A log-normal distribution per bigram, with common
English pairs (
th,er,in) faster and same-finger pairs (ed,tr) slower — matches measured typist data. - Adjacent-key typos at a configurable rate, with realistic backspace correction latency. (Type "Hellp" → pause → backspace → "o".)
- Per-keystroke jitter so two consecutive
es don't fire at identical intervals.
See src/Realism.ps1 for the constants and the
distribution code.
| File | Purpose |
|---|---|
Ghosthand.psd1 |
Module manifest (version, exports) |
Ghosthand.psm1 |
Module entry — sources src/*.ps1 in order |
Ghosthand.ps1 |
Back-compat shim for . ./Ghosthand.ps1 callers |
src/ |
Per-concern PowerShell sources (12 files: Types, State, Realism, Logging, Vision, Mouse, Keyboard, Window, UIA, UIASnapshot, Browser, Bundle) |
watcher.ps1 |
Background screenshot daemon |
SKILL.md |
Manifest + agent-facing usage rules and recipes |
tests/ |
Pester smoke tests (run via Invoke-Pester ./tests) |
CHANGELOG.md |
Versioned change log |
SECURITY.md |
Threat model + reporting |
CONTRIBUTING.md |
Dev/test guide + PR rules |
v0.1.0 — alpha. Works end-to-end on common Windows workflows (dashboard navigation, file management, text editing). Documented limitations and known sharp edges below.
| Platform | Status |
|---|---|
| Windows 10 / 11 | ✅ supported |
| macOS | 🚧 not yet implemented — PRs welcome |
| Linux (X11 / Wayland) | 🚧 not yet implemented — PRs welcome |
The current implementation is built on Win32 P/Invoke (SetCursorPos,
mouse_event, keybd_event) plus Windows UI Automation. Adding macOS
support means implementing the same surface against Quartz Event Services
(input) and the macOS Accessibility API (AX) for UI-tree access, with a
small platform-abstraction layer in Ghosthand.ps1. Linux support would
similarly use XTest/uinput plus AT-SPI. Both are scoped, tractable
contributions if you'd like to take a swing — see
CONTRIBUTING.md for the rules of engagement, and
open an issue first to coordinate the design before sending a large PR.
- Personal AI assistants that run on your own desktop.
- Long-form GUI flows where no API is available (third-party dashboards, legacy apps, configuration panels).
- Disclosed, legitimate browser automation: research, content drafting, reading email, summarising pages — the kind of thing you'd do yourself.
- Accessibility-style flows where reading the UIA tree is genuinely useful.
- Games / kernel anti-cheat bypass. Synthetic input sets the
LLMHF_INJECTEDflag in every event record. Anti-cheat drivers see it. Don't. - Platform Terms-of-Service evasion. Most major sites prohibit automated account creation, automated engagement, and undisclosed bot-account operation. The skill detects CAPTCHAs but won't solve them. PRs that add ToS-evasion features will be closed without review.
- Multi-user, multi-machine, or RDP-session targeting. Single user, single primary console session.
- Replacement for proper APIs when one exists. If a service has a real API, use that instead.
- Multi-monitor on mixed-DPI setups. The capture math handles negative monitor offsets, but combinations of 100% + 150% + 200% scaling have not been stress-tested.
- Firefox. Tested informally; Edge and Chrome are the maintained targets.
- Non-English UI. Bigram timing tables are tuned for English. The module still works for any text, just with less-realistic timing.
Wait-PageLoaduses title-stability as a proxy for page-load completion. Pages with live notification counters (Gmail "(5)" → "(6)") can fool it into timing out.Find-DropdownByLabelis a heuristic — it picks the nearest interactive element below a label. Occasionally matches the wrong control on dense forms; pass-MaxYDistance/-MaxXDistanceto tighten.Focus-Windowis best-effort — Windows foreground-lock policies can causeSetForegroundWindowto silently fail or return the wrong status. Always check.Successand useWait-ForForegroundWindowto confirm.- If a script is killed mid-
Send-Hotkey, modifier keys can stay logically down. RunReset-InputStateto recover, or document this prominently in any tooling that wraps the skill.
Most desktop automation libraries either:
- Use kernel-mode hooks or virtual HIDs that fail at the user level (and sometimes get flagged by AV).
- Drive a debugger-attached browser, which web platforms detect via
navigator.webdriverand similar tells. - Optimise for "fast" — instant cursor teleports, batched keystrokes — which produces input timing patterns that are trivially distinguishable from a human.
ghosthand takes the third path seriously. It uses ordinary user-mode
SetCursorPos / mouse_event / keybd_event, drives a normal user-launched
browser, and intentionally produces realistic timing. The cost is that
actions take real human time. The benefit is that they look like a real
human did them, on real hardware — because, structurally, they are.
PRs welcome — see CONTRIBUTING.md. Particularly looking
for: macOS port (Quartz + AX), Linux port (XTest/uinput + AT-SPI), Firefox
parity, and bigram tables for non-English layouts.
MIT — see LICENSE.