ghosthand

Desktop automation for AI agents that doesn't look like a robot. Bezier mouse paths with muscle-frequency tremor. Bigram-aware typing with realistic typos. No WebDriver. No kernel hooks. No navigator.webdriver.

ghosthand is a single PowerShell module + SKILL.md manifest that lets any agent — GitHub Copilot CLI, Claude Code, Cursor, Hermes, or any Anthropic-compatible runtime — drive a real Windows desktop the way a person does. It sees the screen via screenshots, clicks and types at human speed, navigates the browser by keyboard, and reads structured UI state via Windows UI Automation.

The trick: most desktop automation is trivially distinguishable from a human (instant cursor teleports, batched keystrokes, navigator.webdriver=true). ghosthand intentionally produces realistic input timing on the OS layer where real users live. Cost: actions take real human time. Benefit: they look like a human did them — because, structurally, they are.

Why ghosthand vs. the alternatives

| | ghosthand | PyAutoGUI / nut.js | Playwright / Puppeteer | Selenium |
|---|---|---|---|---|
| Realistic mouse paths (Bezier + tremor) | ✅ | ❌ teleport | n/a | n/a |
| Bigram-aware human typing cadence | ✅ | ❌ uniform | ❌ uniform | ❌ uniform |
| `navigator.webdriver` stays `false` | ✅ | n/a | ❌ true | ❌ true |
| Works in any app, not just browsers | ✅ | ✅ | ❌ browser-only | ❌ browser-only |
| Reads structured UI tree (UIA) | ✅ | ❌ pixels only | ✅ DOM | ✅ DOM |
| Kernel hooks / virtual HID required | ❌ user-mode | ❌ user-mode | n/a | n/a |
| Ships as an agent skill | ✅ | ❌ | ❌ | ❌ |

What it does

Sees the screen. Background watcher writes a fresh screenshot to state/current.png every ~250 ms. Multi-monitor + virtual-screen capture.
Drives the mouse like a human. Bezier paths with side-bias, 8–13 Hz sinusoidal tremor (real muscle frequency), sub-pixel correction near the target, optional overshoot for long travels, Fitts-law timing, pre-click hover dwell.
Types like a human. Bigram-aware log-normal cadence (common pairs such as `th` are fast, same-finger pairs such as `ed` are slower), occasional adjacent-key typos with backspace correction, configurable WPM and typo rate.
Talks to the UI tree, not pixels. Native UIA PropertyCondition filtering (~100 ms on busy pages), accessibility-snapshot trees with auto-generated refs (w1e36), retry on transient COM errors, built-in Wait-For primitives.
Drives the browser by clicking and typing, not by attaching a debugger. navigator.webdriver stays false because there is no WebDriver. Captures page text via the browser's own UIA tree.
Detects CAPTCHAs without solving them. Surfaces the detection so a human can step in.

Quick start

As a GitHub Copilot CLI / Claude Code skill

```shell
git clone https://github.com/niteshdangi/ghosthand ~/.copilot/skills/ghosthand
# or for Claude Code:
git clone https://github.com/niteshdangi/ghosthand ~/.claude/skills/ghosthand
```

The agent auto-discovers skills in those folders. Tell it something like "open Notepad and type a haiku" — it'll invoke the skill and execute.

As a standalone PowerShell module

```powershell
git clone https://github.com/niteshdangi/ghosthand
cd ghosthand
. ./Ghosthand.ps1

# Move the cursor with overshoot + tremor + sub-pixel correction
Move-MouseHuman -X 800 -Y 400

# Type with bigram-aware human cadence + occasional typos
Send-Text "Hello, world!"

# Drive the browser
$br = Open-Url 'https://github.com/trending'
Click-LinkByText -Text 'microsoft / vscode' -Partial
$null = Wait-ForUrlContains -Pattern '/microsoft/vscode'   # let navigation finish before reading
$pageText = Get-PageText -ProcessId $br.ProcessId
```

A minimal end-to-end demo

```powershell
. ./Ghosthand.ps1
Start-ScreenWatcher                                     # vision daemon
$br = Open-Url 'https://github.com/trending'            # human-typed URL
Click-LinkByText -Text 'microsoft / vscode' -Partial    # navigate
$null = Wait-ForUrlContains -Pattern '/microsoft/vscode'
Click-UIElement -NameLike 'Issues  ('                   # repo issues tab
$null = Wait-ForUrlContains -Pattern '/issues'
Get-PageText -ProcessId $br.ProcessId | Select-Object -First 50
Stop-ScreenWatcher
```

That whole flow takes ~25 s, mostly browser render time. See SKILL.md for the latency-budget table.

How the realism actually works

This is the bit most desktop-automation libraries skip.

Mouse:

Bezier curve, not straight line. Two control points biased perpendicular to the travel vector — left-handers and right-handers curve differently; ghosthand picks one and stays consistent within a session.
8–13 Hz tremor layered on top. That's the real frequency band of physiological hand tremor, so anti-bot systems looking at velocity power spectra see the right bump.
Fitts' law timing: small targets take longer than big ones, using coefficients in line with those measured for real users in mouse-pointing studies.
Overshoot + correction on long travels. Real arms ballistically overshoot and self-correct; ghosthand does the same with a sub-pixel approach phase.
Pre-click hover dwell — humans don't click the millisecond they arrive.
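The mouse rules above can be sketched in plain PowerShell with no Win32 calls, so the math is visible on its own. This is a minimal illustration under stated assumptions, not ghosthand's implementation: the function names `Get-FittsDurationMs` and `Get-HumanMousePath`, the Fitts coefficients (a = 100 ms, b = 150 ms per bit), the control-point bias and the tremor amplitude are all hypothetical values chosen for the example; the real constants live in src/Realism.ps1. It covers the side-biased cubic Bezier, a sinusoidal tremor drawn from the 8–13 Hz band, and Fitts' law duration in its Shannon form, MT = a + b·log2(D/W + 1). Overshoot, the sub-pixel approach phase and hover dwell are left out.

```powershell
# Hypothetical sketch, not ghosthand's src/Realism.ps1. All constants are assumptions.

function Get-FittsDurationMs {
    param(
        [double]$Distance,        # travel distance in pixels
        [double]$TargetWidth,     # target size along the travel axis, in pixels
        [double]$A = 100,         # assumed intercept (ms)
        [double]$B = 150          # assumed slope (ms per bit)
    )
    # Shannon form of Fitts' law: MT = a + b * log2(D / W + 1)
    $indexOfDifficulty = [Math]::Log($Distance / [Math]::Max($TargetWidth, 1.0) + 1, 2)
    return $A + $B * $indexOfDifficulty
}

function Get-HumanMousePath {
    param(
        [double]$X0, [double]$Y0,             # start
        [double]$X1, [double]$Y1,             # target
        [double]$TargetWidth = 20,
        [int]$Side = 1,                       # +1 / -1: bulge side, kept fixed per session
        [double]$TremorPx = 0.8,              # assumed tremor amplitude (pixels)
        [int]$StepMs = 8,                     # sampling interval for emitted points
        [System.Random]$Rng = [System.Random]::new()
    )
    $dx = $X1 - $X0
    $dy = $Y1 - $Y0
    $dist = [Math]::Sqrt($dx * $dx + $dy * $dy)
    $durationMs = Get-FittsDurationMs -Distance $dist -TargetWidth $TargetWidth

    # Unit normal to the travel vector; both control points are pushed along it,
    # so the curve bows consistently to one side.
    $nx = 0.0; $ny = 0.0
    if ($dist -gt 0) { $nx = -$dy / $dist; $ny = $dx / $dist }
    $bias = $Side * $dist * (0.05 + 0.10 * $Rng.NextDouble())
    $c1x = $X0 + 0.3 * $dx + $nx * $bias;       $c1y = $Y0 + 0.3 * $dy + $ny * $bias
    $c2x = $X0 + 0.7 * $dx + $nx * $bias * 0.5; $c2y = $Y0 + 0.7 * $dy + $ny * $bias * 0.5

    # One tremor frequency per movement, drawn from the 8-13 Hz physiological band.
    $freqHz = 8 + 5 * $Rng.NextDouble()
    $phase = 2 * [Math]::PI * $Rng.NextDouble()

    $steps = [Math]::Max(2, [int][Math]::Ceiling($durationMs / $StepMs))
    for ($i = 0; $i -le $steps; $i++) {
        $t = $i / $steps
        $u = 1 - $t
        # Cubic Bezier: B(t) = u^3*P0 + 3u^2*t*C1 + 3u*t^2*C2 + t^3*P1
        $bx = $u * $u * $u * $X0 + 3 * $u * $u * $t * $c1x + 3 * $u * $t * $t * $c2x + $t * $t * $t * $X1
        $by = $u * $u * $u * $Y0 + 3 * $u * $u * $t * $c1y + 3 * $u * $t * $t * $c2y + $t * $t * $t * $Y1
        $ms = $t * $durationMs
        # Sinusoidal tremor along the normal; the sin(pi*t) envelope is zero at both
        # ends, so the path starts and finishes on the exact coordinates.
        $wobble = $TremorPx * [Math]::Sin([Math]::PI * $t) * [Math]::Sin(2 * [Math]::PI * $freqHz * $ms / 1000 + $phase)
        [pscustomobject]@{ T = [Math]::Round($ms); X = $bx + $nx * $wobble; Y = $by + $ny * $wobble }
    }
}
```

Each emitted point carries a millisecond offset `T`; replaying the points with SetCursorPos at those offsets would reproduce the movement. A smaller `-TargetWidth` raises the index of difficulty and therefore the duration, which is the Fitts behaviour described above.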

Keyboard:

Bigram-aware cadence. A log-normal distribution per bigram, with common English pairs (th, er, in) faster and same-finger pairs (ed, tr) slower — matches measured typist data.
Adjacent-key typos at a configurable rate, with realistic backspace correction latency. (Type "Hellp" → pause → backspace → "o".)
Per-keystroke jitter, so repeated keys (two consecutive `e`s, say) don't fire at identical intervals.

See src/Realism.ps1 for the constants and the distribution code.
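The keyboard rules above can be sketched as a plan builder: each keystroke's delay is drawn from a log-normal distribution whose median comes from the WPM and is scaled per bigram, and a typo inserts a neighbouring key, a longer pause and a backspace before the intended key. This is a minimal sketch, not ghosthand's Send-Text: the function names `Get-LogNormalDelayMs` and `Get-TypingPlan`, the bigram multipliers, the sigma values and the neighbour map (which covers only a handful of keys) are hypothetical. The block only builds a list of key/delay pairs and sends no input.

```powershell
# Hypothetical sketch, not ghosthand's Send-Text. Multipliers and neighbour map are assumptions.

function Get-LogNormalDelayMs {
    param([double]$MedianMs, [double]$Sigma, [System.Random]$Rng)
    # Box-Muller transform: a standard normal sample from two uniforms.
    # 1 - NextDouble() lies in (0, 1], so Log() never sees zero.
    $u1 = 1.0 - $Rng.NextDouble()
    $u2 = $Rng.NextDouble()
    $z = [Math]::Sqrt(-2.0 * [Math]::Log($u1)) * [Math]::Cos(2.0 * [Math]::PI * $u2)
    # For a log-normal, median = e^mu, so mu = ln(median).
    return [Math]::Exp([Math]::Log($MedianMs) + $Sigma * $z)
}

function Get-TypingPlan {
    param(
        [string]$Text,
        [int]$Wpm = 60,
        [double]$TypoRate = 0.03,
        [System.Random]$Rng = [System.Random]::new()
    )
    # Illustrative per-bigram multipliers on the median delay (assumed, not measured):
    # below 1 = common pairs, above 1 = same-finger pairs.
    $bigram = @{ 'th' = 0.6; 'er' = 0.65; 'in' = 0.65; 'ed' = 1.4; 'tr' = 1.35 }
    # A tiny slice of the QWERTY neighbour map, enough for the example.
    $neighbours = @{ 'e' = 'wr'; 'l' = 'kp'; 'o' = 'ip'; 't' = 'ry'; 'a' = 'sq' }
    $baseMs = 60000.0 / ($Wpm * 5)     # by convention one "word" is 5 characters
    $plan = [System.Collections.Generic.List[object]]::new()
    $prev = ''
    foreach ($ch in $Text.ToCharArray()) {
        $key = [string]$ch
        $lower = $key.ToLowerInvariant()
        $pair = ($prev + $key).ToLowerInvariant()
        $mult = if ($bigram.ContainsKey($pair)) { $bigram[$pair] } else { 1.0 }
        $delay = Get-LogNormalDelayMs -MedianMs ($baseMs * $mult) -Sigma 0.25 -Rng $Rng
        if ($neighbours.ContainsKey($lower) -and $Rng.NextDouble() -lt $TypoRate) {
            # Typo: hit a neighbouring key, pause to notice, backspace, then retype.
            $options = $neighbours[$lower]
            $wrong = [string]$options[$Rng.Next($options.Length)]
            $plan.Add([pscustomobject]@{ Key = $wrong; DelayMs = $delay })
            $plan.Add([pscustomobject]@{ Key = '{BACKSPACE}'; DelayMs = (Get-LogNormalDelayMs -MedianMs 350 -Sigma 0.3 -Rng $Rng) })
            $delay = Get-LogNormalDelayMs -MedianMs $baseMs -Sigma 0.25 -Rng $Rng
        }
        $plan.Add([pscustomobject]@{ Key = $key; DelayMs = $delay })
        $prev = $key
    }
    return $plan
}
```

Replaying a plan and applying its `{BACKSPACE}` entries always yields the original text, which makes a cheap invariant to test against. Because every delay is an independent draw, two consecutive identical keys get different intervals, which is the per-keystroke jitter described above.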

What's in the box

| File | Purpose |
|---|---|
| `Ghosthand.psd1` | Module manifest (version, exports) |
| `Ghosthand.psm1` | Module entry; sources `src/*.ps1` in order |
| `Ghosthand.ps1` | Back-compat shim for `. ./Ghosthand.ps1` callers |
| `src/` | Per-concern PowerShell sources (12 files: Types, State, Realism, Logging, Vision, Mouse, Keyboard, Window, UIA, UIASnapshot, Browser, Bundle) |
| `watcher.ps1` | Background screenshot daemon |
| `SKILL.md` | Manifest + agent-facing usage rules and recipes |
| `tests/` | Pester smoke tests (run via `Invoke-Pester ./tests`) |
| `CHANGELOG.md` | Versioned change log |
| `SECURITY.md` | Threat model + reporting |
| `CONTRIBUTING.md` | Dev/test guide + PR rules |

Status

v0.1.0 — alpha. Works end-to-end on common Windows workflows (dashboard navigation, file management, text editing). Documented limitations and known sharp edges below.

Platform support

| Platform | Status |
|---|---|
| Windows 10 / 11 | ✅ supported |
| macOS | 🚧 not yet implemented (PRs welcome) |
| Linux (X11 / Wayland) | 🚧 not yet implemented (PRs welcome) |

The current implementation is built on Win32 P/Invoke (SetCursorPos, mouse_event, keybd_event) plus Windows UI Automation. Adding macOS support means implementing the same surface against Quartz Event Services (input) and the macOS Accessibility API (AX) for UI-tree access, with a small platform-abstraction layer in the module (Ghosthand.psm1 and src/). Linux support would similarly use XTest/uinput plus AT-SPI. Both are scoped, tractable contributions if you'd like to take a swing. See CONTRIBUTING.md for the rules of engagement, and open an issue first to coordinate the design before sending a large PR.

Scope and limits

What this is for

Personal AI assistants that run on your own desktop.
Long-form GUI flows where no API is available (third-party dashboards, legacy apps, configuration panels).
Disclosed, legitimate browser automation: research, content drafting, reading email, summarising pages — the kind of thing you'd do yourself.
Accessibility-style flows where reading the UIA tree is genuinely useful.

What this is not for

Games / kernel anti-cheat bypass. Synthetic input is marked as injected (LLMHF_INJECTED on mouse events, LLKHF_INJECTED on keyboard events), and anti-cheat drivers can see that flag. Don't.
Platform Terms-of-Service evasion. Most major sites prohibit automated account creation, automated engagement, and undisclosed bot-account operation. The skill detects CAPTCHAs but won't solve them. PRs that add ToS-evasion features will be closed without review.
Multi-user, multi-machine, or RDP-session targeting. Single user, single primary console session.
Replacement for proper APIs when one exists. If a service has a real API, use that instead.

Things that work but aren't extensively tested

Multi-monitor on mixed-DPI setups. The capture math handles negative monitor offsets, but combinations of 100% + 150% + 200% scaling have not been stress-tested.
Firefox. Tested informally; Edge and Chrome are the maintained targets.
Non-English UI. Bigram timing tables are tuned for English. The module still works for any text, just with less-realistic timing.
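The negative-offset handling mentioned above comes down to shifting every point by the virtual-screen origin. A small sketch with hypothetical function names (`Get-VirtualScreenBounds`, `ConvertTo-BitmapPoint`) and a made-up layout: a 1280-wide secondary monitor placed left of a 1920×1080 primary. On Windows the same origin and size are reported by GetSystemMetrics with SM_XVIRTUALSCREEN, SM_YVIRTUALSCREEN, SM_CXVIRTUALSCREEN and SM_CYVIRTUALSCREEN. The sketch assumes every rectangle is already in physical pixels; a process that is not per-monitor DPI aware may be handed DPI-virtualized coordinates instead, which is where mixed-scaling setups get risky.

```powershell
# Hypothetical helpers; ghosthand's real capture code may differ.
# Assumes a per-monitor-DPI-aware process, so every rectangle is in physical pixels.

function Get-VirtualScreenBounds {
    param([object[]]$Monitors)   # each item: Left, Top, Width, Height
    $left   = ($Monitors | ForEach-Object { $_.Left }            | Measure-Object -Minimum).Minimum
    $top    = ($Monitors | ForEach-Object { $_.Top }             | Measure-Object -Minimum).Minimum
    $right  = ($Monitors | ForEach-Object { $_.Left + $_.Width } | Measure-Object -Maximum).Maximum
    $bottom = ($Monitors | ForEach-Object { $_.Top + $_.Height } | Measure-Object -Maximum).Maximum
    [pscustomobject]@{ Left = $left; Top = $top; Width = $right - $left; Height = $bottom - $top }
}

function ConvertTo-BitmapPoint {
    param([double]$X, [double]$Y, $Bounds)
    # Screen coordinates can be negative (a monitor left of or above the primary);
    # bitmap pixels start at 0, so shift by the virtual-screen origin.
    [pscustomobject]@{ X = $X - $Bounds.Left; Y = $Y - $Bounds.Top }
}

# Example layout: secondary monitor to the left of the primary.
$monitors = @(
    @{ Left = 0;     Top = 0; Width = 1920; Height = 1080 },   # primary
    @{ Left = -1280; Top = 0; Width = 1280; Height = 1024 }    # secondary
)
$vs = Get-VirtualScreenBounds -Monitors $monitors
ConvertTo-BitmapPoint -X (-100) -Y 50 -Bounds $vs   # X = 1180, Y = 50
```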

Known sharp edges

Wait-PageLoad uses title-stability as a proxy for page-load completion. Pages with live notification counters (Gmail "(5)" → "(6)") can fool it into timing out.
Find-DropdownByLabel is a heuristic — it picks the nearest interactive element below a label. Occasionally matches the wrong control on dense forms; pass -MaxYDistance / -MaxXDistance to tighten.
Focus-Window is best-effort — Windows foreground-lock policies can cause SetForegroundWindow to silently fail or return the wrong status. Always check .Success and use Wait-ForForegroundWindow to confirm.
If a script is killed mid-Send-Hotkey, modifier keys can stay logically down. Run Reset-InputState to recover, or document this prominently in any tooling that wraps the skill.
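To see why a live counter defeats title-stability detection, here is a minimal sketch of the idea Wait-PageLoad is described as using: poll the title and succeed once it reads the same several times in a row. The function name `Wait-TitleStable`, its parameters and its defaults are hypothetical, and the title source is an injected scriptblock so the logic runs without a browser.

```powershell
# Hypothetical sketch of title-stability waiting; not ghosthand's Wait-PageLoad.
function Wait-TitleStable {
    param(
        [scriptblock]$GetTitle,       # returns the current window title
        [int]$StableCount = 3,        # identical consecutive reads required
        [int]$IntervalMs = 250,
        [int]$TimeoutMs = 15000
    )
    $sw = [System.Diagnostics.Stopwatch]::StartNew()
    $last = $null
    $same = 0
    while ($sw.ElapsedMilliseconds -lt $TimeoutMs) {
        $title = & $GetTitle
        if ($same -gt 0 -and $title -ceq $last) { $same++ } else { $same = 1; $last = $title }
        if ($same -ge $StableCount) { return $true }   # treat as "page loaded"
        Start-Sleep -Milliseconds $IntervalMs
    }
    return $false   # title kept changing, e.g. a live unread counter
}

# A fake title whose counter changes between reads never settles.
$state = @{ Reads = 0 }
Wait-TitleStable -GetTitle { $state.Reads++; "Inbox ($($state.Reads))" } -IntervalMs 10 -TimeoutMs 300   # False
```

Because the fake title differs on every read, the stability count never reaches three and the wait times out, mirroring the Gmail "(5)" → "(6)" case.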

Why this exists

Most desktop automation libraries either:

Use kernel-mode hooks or virtual HIDs that fail at the user level (and sometimes get flagged by AV).
Drive a debugger-attached browser, which web platforms detect via navigator.webdriver and similar tells.
Optimise for "fast" — instant cursor teleports, batched keystrokes — which produces input timing patterns that are trivially distinguishable from a human.

ghosthand avoids all three. It uses ordinary user-mode SetCursorPos / mouse_event / keybd_event, drives a normal user-launched browser, and intentionally produces realistic timing. The cost is that actions take real human time. The benefit is that they look like a real human did them, on real hardware, because structurally they are.

Contributing

PRs welcome — see CONTRIBUTING.md. Particularly looking for: macOS port (Quartz + AX), Linux port (XTest/uinput + AT-SPI), Firefox parity, and bigram tables for non-English layouts.

License

MIT — see LICENSE.
