Add ui send-keys (#562) and ui drag + scroll --wheel (#498) input-injection verbs#587
Conversation
Introduce winapp ui send-keys to synthesize keyboard input, the keyboard counterpart to click. Supports named keys (down, enter, tab, f5), whitespace-separated sequences, modifier combos (ctrl+shift+t), literal text, and raw virtual keys (vk=0xNN). Two transports via --via: post-message (default, HWND-targeted, bypasses UIPI) and send-input (OS-wide). --target focuses an element via UIA before sending. Standard -w/-a/--json options. - KeyStringParser: token grammar to KeyAction list - IKeyboardInput/RealKeyboardInput/KeyboardInput: PostMessage + SendInput transports - UiSendKeysCommand wired into DI, UiCommand, HostBuilderExtensions - UiSendKeysResult JSON model - Unit tests (parser + command) and FakeKeyboardInput - Docs, skill fragment, regenerated cli-schema + npm SDK wrapper Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add `winapp ui drag <selector> <fromX,fromY> <toX,toY>` to synthesize a mouse drag: press at a point inside an element, move to another point in small steps (so the app sees a realistic WM_MOUSEMOVE stream), then release. Coordinates are x,y offsets from the element's top-left corner. Use --right for a right-button drag. Useful for reorder/resize handles, sliders, canvas drawing, and drag-and-drop. Also add `--wheel <delta>` to `ui scroll` to synthesize real mouse-wheel input over an element (120 = one notch), for testing wheel handlers (zoom, custom scroll) that ScrollPattern can't exercise. - Extend IMouseInput/RealMouseInput/MouseInput with Drag + ScrollWheel (SendInput, virtual-desktop normalized like the existing click/hover) - UiDragCommand wired into DI, UiCommand, HostBuilderExtensions - UiDragResult JSON model; UiScrollResult gains a wheel field - Unit tests for drag + scroll-wheel; FakeMouseInput records both - Docs, skill fragment, regenerated cli-schema + npm SDK wrappers Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The SendInput transport previously typed literal text via KEYEVENTF_UNICODE packets, which deliver the character (WM_CHAR -> TextChanged) but raise KeyDown with VK_PACKET rather than the real key. Downstream WinUI 3/WPF handlers that key off KeyDown with the correct virtual key did not see a faithful keystroke. Map each character to its virtual key (plus Shift) via VkKeyScan and emit real key-down/up events so the target sees a genuine KeyDown with the correct VK and the OS composes the matching WM_CHAR (TextChanged) - one full keystroke per character. Fall back to a Unicode packet for characters not reachable on the current layout or requiring Ctrl/AltGr, so the exact character still lands. PostMessage typed text keeps WM_CHAR (correct text + TextChanged across integrity levels, no per-char KeyDown); docs clarify when to use which transport for per-keystroke fidelity. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…help (#562) The --via option and command help now state that typed text raises a real per-character KeyDown only on send-input (post-message raises TextChanged but not a per-character KeyDown), and that named keys/combos raise KeyDown on both. Recommends --via send-input for per-keystroke KeyDown on a WinUI 3/WPF TextBox. Regenerated cli-schema.json and the ui-automation skill from the CLI help. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Build Metrics ReportBinary Sizes
Test Results✅ 1545 passed, 1 skipped out of 1546 tests in 428.7s (+221 tests, +73.6s vs. baseline) Test Coverage❌ 18.1% line coverage, 37.3% branch coverage · ✅ +0.7% vs. baseline CLI Startup Time41ms median (x64, Updated 2026-06-29 23:15:26 UTC · commit |
There was a problem hiding this comment.
Pull request overview
This PR adds three input-injection capabilities to winapp ui, the keyboard/mouse gestures needed to migrate a WinUI 3 E2E suite off WinAppDriver: a new ui send-keys verb (synthetic keyboard input via PostMessage or SendInput), a new ui drag verb (button-press → stepped move → release), and a --wheel option on ui scroll (real mouse-wheel input). The implementation cleanly follows the existing click/hover conventions: command + nested DI handler, source-generated JSON result models, UiErrors/UiJsonError envelopes, and new testable abstractions (IKeyboardInput, IMouseInput.Drag/ScrollWheel) with Real* production and Fake* test implementations. Docs, the CLI schema, the Copilot/Claude skills, and the npm TS SDK are regenerated accordingly.
Changes:
- New
UiSendKeysCommand+KeyStringParser/KeyboardInput(named keys, sequences, modifier combos,vk=0xNN, literal text;--via post-message|send-input,--targetfocus). - New
UiDragCommand+MouseInput.Drag; new--wheelonUiScrollCommand+MouseInput.ScrollWheel; DI/registration, JSON result models, andNativeMethods.txtP/Invoke additions. - 34 new unit tests (parser + commands), plus regenerated docs/schema/skills/npm wrappers.
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs |
New send-keys command + handler, transport parsing, target focus. |
src/winapp-CLI/WinApp.Cli/Commands/UiDragCommand.cs |
New drag command + handler with element-relative point parsing. |
src/winapp-CLI/WinApp.Cli/Commands/UiScrollCommand.cs |
Adds --wheel option and SendInput wheel path; reports element hwnd. |
src/winapp-CLI/WinApp.Cli/Commands/UiCommand.cs |
Registers drag/send-keys subcommands. |
src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs |
New key-string grammar parser (chords, named keys, vk=, text). |
src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs |
PostMessage/SendInput keyboard synthesis with documented limits. |
src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs |
Adds Drag and ScrollWheel; repeats the SendInput-failure guard (nit). |
src/winapp-CLI/WinApp.Cli/Helpers/I{Keyboard,Mouse}Input.cs, Real*, UiJsonContext.cs |
Interfaces, prod impls, new JSON result models. |
src/winapp-CLI/WinApp.Cli/Helpers/HostBuilderExtensions.cs |
DI registration of IKeyboardInput and the two new handlers. |
src/winapp-CLI/WinApp.Cli/NativeMethods.txt |
Adds PostMessage/MapVirtualKey/VkKeyScan + WM_*/VIRTUAL_KEY. |
src/winapp-CLI/WinApp.Cli.Tests/* |
KeyStringParserTests, send-keys/drag/wheel command tests, FakeKeyboardInput. |
src/winapp-npm/src/winapp-commands.ts |
Regenerated uiDrag/uiSendKeys wrappers + wheel on uiScroll. |
docs/ui-automation.md, docs/fragments/skills/.../ui-automation.md, .github/plugin/.../SKILL.md, .claude/.../SKILL.md, docs/cli-schema.json, scripts/generate-llm-docs.ps1 |
Regenerated/updated docs, schema, skills, and skill→command map. |
Note on doc sync (not commentable inline — these files are not in the diff): two hand-written command lists were not updated for the new verbs — docs/usage.md (### ui "Commands", lines 1076-1091) and .github/plugin/agents/winapp.agent.md ("Key subcommands", lines 207-221). Both enumerate click/hover/etc. by hand and should gain drag, send-keys, and the scroll --wheel option to stay in sync.
Inline findings: one test_coverage nit (the new --wheel zero-size guard is untested while the analogous hover/click guards are covered) and one maintainability nit (the SendInput sent == 0 guard is now duplicated four times in MouseInput.cs).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…size path (#587) - Extract MouseInput.SendInputs(Span<INPUT>) helper so the SendInput zero-return / 'window may be elevated' guard lives in one place instead of being duplicated across SendMove/SendClick/SendButton/ScrollWheel. - Add Scroll_Wheel_ZeroSizeElement_ReturnsError covering the new --wheel CodeZeroSize guard (exit 1, no ScrollWheel call), matching the Hover_ZeroSizeElement_ReturnsError coverage convention. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The auto-generated schema/skill docs already cover the new ui drag and send-keys verbs, but two hand-written command lists were missed: - docs/usage.md (ui Commands list) - .github/plugin/agents/winapp.agent.md (Key subcommands), mirrored to .claude/agents/winapp.md via sync-claude-plugin.ps1 Add drag and send-keys entries matching the existing sibling-verb style in each list. scroll --wheel is already covered by the existing scroll entry, so no separate entry was added. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR Review —
|
| Dimension | Result |
|---|---|
| security | ⚠ 2 findings (1 folded into H2) |
| correctness | ⚠ 4 findings |
| cli-ux | ⚠ 2 findings (1 folded into H1) |
| alternative-solution | ✓ clean |
| test-coverage | ⚠ 4 findings |
| docs-and-samples | ✓ clean (1 finding disputed & dropped by cross-model) |
| packaging | ⚠ 1 finding |
| multi-model | 2/3 highs confirmed · 1 disputed/dropped · +1 new high |
Findings
| ID | File:lines | Domain | Finding |
|---|---|---|---|
| H1 | src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:86-113 |
correctness, cli-ux | Literal text split on whitespace → send-keys "Hello world" types Helloworld; documented example is broken |
| H2 | src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs:142-151 |
correctness, security | Keyboard SendInput short write (sent < length) treated as success → stuck modifier |
| H3 | src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs:54-77,143-179 |
correctness, security | Mouse SendInputs short write treated as success + no finally cleanup → stuck mouse button (new, found by cross-model; absorbs the drag-cleanup finding) |
| M1 | src/winapp-CLI/WinApp.Cli/Commands/UiScrollCommand.cs:85-130 |
cli-ux | scroll accepts conflicting --direction/--to/--wheel and silently prefers --wheel |
| M2 | src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs:126-134 |
security | --via send-input doesn't verify focus landed on the target before sending → may inject into the wrong window |
| M3 | src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs:126-134 |
test-coverage | send-keys/drag/scroll --wheel not exercised by the WinUI e2e script |
| M4 | src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:91-124 |
correctness | Any token containing + is parsed as a chord → realistic literals like C++ / a+b throw |
| M5 | src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:119-145 |
test-coverage | Parser negative/edge coverage shallow (malformed combos, {} literals, Unicode) |
| M6 | src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs:58-69 |
test-coverage | Modifier down/up pairing and release-on-failure (stuck-key) untested |
| L1 | src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs:54-77 |
test-coverage | Drag interpolation / button-event sequencing not covered |
| L2 | src/winapp-npm/src/winapp-commands.ts:1003-1010 |
packaging | Generated npm wrapper descriptions stale vs cli-schema.json |
Details
H1 — src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:86-113 · correctness + cli-ux · Multi-model: confirmed
- Severity: high · Confidence: high
- Finding: Literal text containing spaces is split into separate
TextInputactions, so the space is lost —send-keys "Hello world"typesHelloworld. - Evidence:
var tokens = keys.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries);(87) → each token falls through toactions.Add(new TextInput(token));(113); the space is never reintroduced when sending chars (KeyboardInput.cs:73-76, 128-132). The documented examplewinapp ui send-keys "Hello world"(docs/ui-automation.md:289) is therefore broken. (Unknown tokens being typed as literal text is intentional/documented; the whitespace loss is the bug.) - Recommendation: Preserve whitespace within literal runs, or add an explicit literal path (e.g.
--text <string>) that types the string verbatim. Fix or remove the broken"Hello world"doc example to match the chosen behavior.
H2 — src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs:142-151 · correctness + security · Multi-model: confirmed
- Severity: high · Confidence: high
- Finding: A partial
SendInputwrite is treated as success, which can leave a modifier-down event without its matching key-up (a "stuck modifier" that affects the user's whole session). - Evidence:
var sent = PInvoke.SendInput((uint)array.Length, pInputs, sizeof(INPUT));(145) thenif (sent == 0)(146) — only zero is treated as failure. The batch can contain modifier-down, key-down/up, and modifier-up events (KeyboardInput.cs:113-125), sosent < array.Lengthis silently accepted with no retry or best-effort release. - Recommendation: Treat
sent != array.Lengthas failure, and in afinally/cleanup path issue best-effort key-up for any modifiers pressed in the batch.
H3 — src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs:54-77,143-179 · correctness + security · Multi-model: new finding
- Severity: high · Confidence: high
- Finding: The mouse path has the same defect:
SendInputsaccepts short writes as success, andDragpresses the button via separate calls with nofinallycleanup — so a mid-gesture failure can leave a mouse button logically held down. - Evidence:
DragdoesSendButton(downFlag)(62) …SendButton(upFlag)(76) as separate calls with no try/finally;SendClickbuilds a two-event down/up batch (145-156); butSendInputsonly checksif (sent == 0)(172-173). A short write strands the button-down. (This was independently flagged by the correctness pass for the drag-cleanup angle,MouseInput.cs:62-76.) - Recommendation: Require
sent == inputs.LengthinSendInputs, and wrap click/drag/right-drag gestures intry/finallythat always attempts the button-up. Consider extracting the shared "send N inputs, verify count, release-on-failure" logic so keyboard (H2) and mouse use one correct implementation.
M1 — src/winapp-CLI/WinApp.Cli/Commands/UiScrollCommand.cs:85-130 · cli-ux
- Severity: medium · Confidence: high
- Finding:
ui scrollaccepts mutually conflicting modes and silently prefers--wheel, so--direction/--toare ignored without warning. - Evidence: Validation only rejects when all three are null (85); the handler branches on
wheel is int delta(107) and only usesdirection/toin theelse(130). - Recommendation: Require exactly one of
--direction/--to/--wheel; on conflict, return a non-zero invalid-arguments error naming the conflicting flags.
M2 — src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs:126-134 · security
- Severity: medium · Confidence: high
- Finding: With
--via send-input, focus success isn't verified immediately before sending, so keystrokes can land in whatever window is actually foreground (a different app, a UAC prompt, etc.). - Evidence: Lines 126-134 call
SetForegroundWindow, wait, then send input without checkingGetForegroundWindowagainst the resolved target. - Recommendation: For
send-input, verify the foreground window belongs to the resolved target HWND/process right before sending and fail (non-zero) otherwise.
M3 — src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs:126-134 · test-coverage
- Severity: medium · Confidence: high
- Finding: The new real-input verbs aren't in the WinUI e2e script that claims to cover all
uicommands. - Evidence:
scripts/test-e2e-winui-ui.ps1Phase 3 exercises click/hover/scroll-pattern only — nosend-keys,drag, orscroll --wheel. (Thescroll --wheelzero-size unit path is covered.) - Recommendation: Add e2e steps for
send-keys,drag, andscroll --wheelagainst the WinUI sample.
M4 — src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:91-124 · correctness
- Severity: medium · Confidence: high
- Finding: Any multi-character token containing
+is treated as a chord, so realistic literals likeC++ora+bthrow instead of typing. - Evidence:
if (token.Contains('+') && token.Length > 1)(92) thenthrow new FormatException($"Invalid key combo '{token}'...");(124). - Recommendation: Only treat a token as a chord when the leading segments are valid modifiers; provide an escape/literal path for
+.
M5 — src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:119-145 · test-coverage
- Severity: medium · Confidence: high
- Finding: Parser negative/edge coverage is shallow.
- Evidence: Tests cover happy paths plus empty input, invalid
vk=, and unknown modifier, but not malformed combos (ctrl+,+a,ctrl++a,ctrl+bogus), brace-like/empty{}literals, or Unicode tokens (KeyStringParserTests.cs). - Recommendation: Add
[DataRow]-driven tests for malformed combos and the intended literal behavior of brace/+/Unicode inputs — these would directly pin H1 and M4.
M6 — src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs:58-69 · test-coverage
- Severity: medium · Confidence: high
- Finding: Modifier down/up pairing and stuck-key-on-failure behavior are untested.
- Evidence: Command tests use
FakeKeyboardInputand assert transport/action counts, but nothing observes theKeyDown(main) → KeyUp(main) → KeyUp(modifier)order or simulates a failure after a modifier-down. - Recommendation: Add a keyboard-event sink (or fault-injection wrapper) so tests assert modifier release order and release-on-exception — the regression test for H2.
L1 — src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs:54-77 · test-coverage
- Severity: low · Confidence: high
- Finding: Drag interpolation and button-event sequencing aren't covered.
- Evidence:
UiCommandTestsassertFakeMouseInput.Dragreceives endpoints/right-button, but not the multi-step movement stream or down/up ordering implemented inMouseInput. - Recommendation: Add a low-level mouse-event sink test (or e2e drag scenario) validating the interpolated path and button ordering — the regression test for H3.
L2 — src/winapp-npm/src/winapp-commands.ts:1003-1010 · packaging
- Severity: low · Confidence: high
- Finding: The autogenerated npm command surface is stale vs
docs/cli-schema.jsonforsend-keysdescriptions. - Evidence:
winapp-commands.tslacks the newer per-keystroke/TextChanged wording present atdocs/cli-schema.json:3308-3309;npm run generate-commands:checkreports out of date. - Recommendation: Re-run
npm run generate-commandsfrom the updated schema and commit the regeneratedwinapp-commands.ts.
Disputed / dropped by cross-model
- A docs finding claimed the
send-keysKEY-STRING grammar was undocumented. Dropped:docs/ui-automation.md:295-314already has an explicit Key grammar block (named keys, sequences, modifier combos, literal text, raw VKs) plus transport/KeyDown behavior, mirrored in the skill fragment. The cross-model pass also noted that{KEYNAME}/{{/}}escaping isn't implemented inKeyStringParser.cs, so documenting it would be wrong. The only genuine doc issue is the broken"Hello world"example, folded into H1.
Generated by the winappcli pr-review skill · 8 specialist passes + GPT-5.4 cross-model verification · 2026-06-23. No code was built or executed; verify findings before acting. 🤖
Hands-on test results —
|
| Verb | Result |
|---|---|
drag DragSlider 225,24 400,24 |
slider value 50 → 91; reverse drag → 9. Offsets-from-top-left + screen mapping correct. --right and zero-size (start==end) work. |
scroll ScrollList --wheel -360 / 360 |
offset 0 → 89 → 0 (delta sign correct, clamps at top). |
scroll ScrollList --direction down / --to |
ScrollPattern paging works (offset → 441). |
send-keys --via send-input typed text |
Unicode é€, symbols a1!, case-insensitive CTRL+A/DELETE, raw vk=0x0d, named keys/arrows — all correct, real per-char KeyDown+Char observed. |
JSON output (--json) |
clean, parseable, no log lines mixed in. |
| Arg validation | bad from/to points, missing args, invalid --via → exit 1 with actionable messages. |
🐞 Confirmed bugs & usability issues
1. send-keys "Hello world" types Helloworld — space dropped (confirms static-review H1) · High
Ran send-keys "Hello world" --target InputBox --via send-input. Event log:
KeyDown H, e, l, l, o, w, o, r, l, d (no space keystroke) → InputBox = "Helloworld"
Exit code 0, prints ✅ Sent keys "Hello world". The space is the token separator (keys.Split(null, RemoveEmptyEntries) in KeyStringParser.cs:87) and is never re-emitted. The documented example winapp ui send-keys "Hello world" (docs/ui-automation.md:289) does not work. The only way to type a space today is the space named token: send-keys "Hello space world" → correctly produced Hello world. Recommend a literal path (e.g. --text "<string>") and fixing the doc example.
2. Default post-message transport silently no-ops typed text on WinUI 3 · High / Medium
Ran send-keys "x y z" --target InputBox --via post-message (post-message is the default). Result: InputBox stayed empty, EventLog stayed empty — zero events — yet the CLI printed ✅ Sent keys. WM_CHAR posted to the WinUI host HWND is not turned into text by the XAML input pipeline. Since post-message is the default and WinUI 3 is winapp's primary framework, a user running the documented happy path (winapp ui send-keys "hello" -a MyApp) gets a false success with nothing typed. (Named keys do deliver KeyDown via post-message — see below — only WM_CHAR text is dropped.) Recommend: detect/report when post-message text delivery is unsupported, or default typed-text to send-input, and at minimum make the success message honest.
3. Accelerators / combos don't fire via post-message · Medium
send-keys "ctrl+t" (a KeyboardAccelerator) via post-message did not fire (click counter unchanged, nothing logged). Via send-input it fired (Accelerator Ctrl+T, counter incremented). Worth calling out explicitly in --via help that combos/accelerators require send-input. (Named non-modifier keys like down down up do log KeyDown via post-message, so behavior is inconsistent across token kinds.)
4. Cannot type text containing + (confirms static-review M4) · Medium
send-keys "C++" → ❌ Invalid key combo 'C++'. send-keys "a+b" → ❌ Unknown modifier 'a' in 'a+b'. Any token with + is forced through chord parsing, so realistic literals fail.
5. scroll accepts conflicting modes (confirms static-review M1) · Medium
scroll ScrollList --direction down --wheel 120 passes validation (no error) and silently prefers --wheel. Recommend requiring exactly one of --direction / --to / --wheel.
6. Minor — inconsistent error formatting
The "no scroll mode specified" error prints [ERROR] - Specify --direction... (raw logger format) while every other ui error uses the ❌ symbol. Cosmetic but visible.
🔐 Security assessment (the headline ask: Ctrl+Alt+Del & system inputs)
Ctrl+Alt+Del — safe, cannot be synthesized ✅
Fired send-keys "ctrl+alt+del" --via send-input at the app. The Secure Attention Sequence did not trigger — no lock screen, no secure desktop, session stayed fully interactive. The three keys arrived at the app as ordinary KeyDowns (Control, Menu, Delete). This is a Windows kernel guarantee (SAS is unsynthesizable via SendInput), not something this tool implements — but the net result is that ctrl+alt+del is harmless here.
No blocklist for any other system combo ⚠️
The parser + injector accept and would inject every dangerous combo via --via send-input, with no warning, confirmation, or allowlist (verified each is accepted at parse stage):
win+l (lock workstation) · win+r (Run) · win+d (show desktop) · win+e · ctrl+shift+esc (Task Manager) · alt+tab · alt+f4 · ctrl+esc (Start) · printscreen · raw vk=0x5B (Win).
Via send-input these execute system-wide — e.g. win+l would lock the machine. I also confirmed alt+f4 via the (non-elevated, window-scoped) post-message transport closed the target window with no confirmation.
Why this is moderate rather than critical
- The default transport is window-scoped and cannot trigger system hotkeys.
win+lvia post-message just posts harmless messages to the target window — the dangerous path requires the explicit, documented--via send-input. - UIPI already protects elevated targets: SendInput from a non-elevated CLI into a higher-integrity window fails, and the code surfaces a clear error. You can't drive an elevated app without elevating the CLI.
- For an automation/test tool, synthesizing
win+letc. is partly by design (you may want to test lock/again behavior).
Recommended hardening
- Verify foreground before
send-input(ties to static-review M2).send-keys/drag/scroll --wheelcallSetForegroundWindowthen sleep 100 ms and inject without checking it succeeded.SetForegroundWindowsilently fails from a background process under the foreground-lock timeout, so an injected sequence — including a system combo — can land on whatever window is actually foreground, not the intended target. CheckGetForegroundWindow()belongs to the resolved HWND/process and fail otherwise. - Guard system-reserved combos. Warn (or require an explicit
--allow-system-keys) when synthesizingwin+*,ctrl+shift+esc,ctrl+alt+delvia send-input. - Fix the short-write handling (static-review H2/H3). Both keyboard and mouse paths treat a partial
SendInputwrite as success with no release-on-failure. A partial send that strandsWin/Ctrl/a mouse button down is materially more dangerous when those are system modifiers — combine the fix with best-effort key-up/button-up in afinally.
Suggested test additions for the PR
- A parser test asserting
"Hello world"round-trips a space (would catch Bump form-data from 4.0.3 to 4.0.4 in /sample #1). - A WinUI e2e step typing text via both transports and asserting the resulting TextBox value (would catch Bump form-data from 4.0.3 to 4.0.4 in /sample #1 and Bump tmp from 0.2.3 to 0.2.4 in /sample #2).
- A
scrolltest asserting conflicting--direction+--wheelis rejected (Added C# SDK. #5).
Hands-on testing performed against a packaged WinUI 3 harness on Windows; the harness was removed after testing. Generated by the GitHub Copilot CLI agent · 2026-06-23. 🤖
Fixes for the pr-review findings on the UI input-injection verbs: - H1: send-keys preserves whitespace in literal text - adjacent literal tokens coalesce into one TextInput so "Hello world" types verbatim. - H2: SendInput (keyboard) requires full delivery; on a short/zero write it releases any held keys to avoid a stuck modifier, with distinct zero-vs-partial error messages. - H3: mouse Drag releases the button in a finally block if a move/up throws; SendInputs requires full delivery (was sent==0). - M1: scroll rejects conflicting --direction/--to/--wheel instead of silently preferring one. - M2: --via send-input verifies the target owns the foreground window before injecting (new foreground_not_target error code). - M3: e2e script now exercises send-keys (literal + named key) and scroll --wheel against the WinUI sample app. - M4: a '+' token is only a chord when leading segments are real modifiers, so 'C++', 'a+b' type literally; 'ctrl+bogus' still errors. - M5: expanded KeyStringParser unit tests (literals, coalescing, vk edge cases, modifier-led unknown main key). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… formatting Follow-up to Nikola's empirical test report on the input verbs: - #2/#3: send-keys via post-message warns when literal text is sent, since WinUI 3 / XAML drops posted WM_CHAR (typed text silently no-ops). Docs corrected to recommend --via send-input for typed text on WinUI/WPF. - Hardening #1: extend the foreground-ownership guard (was send-keys only) to drag and scroll --wheel via a shared ForegroundGuard helper, so an OS-wide gesture can't land on the wrong window when SetForegroundWindow silently fails. Returns foreground_not_target. - #6: the "no scroll mode specified" error now uses the same ❌ symbol as every other ui error (was raw [ERROR] format). System-reserved-combo blocklist (report hardening #2) intentionally left for discussion — it changes the tool contract and partly conflicts with legitimate automation (testing win+l etc.). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Thanks @nmetulev — really thorough, both the static review and the hands-on test pass. I pushed two commits ( Fixed
Intentionally left (open question)
All 95 parser/command tests pass; the 5 pre-existing |
Drag (inline review thread): support `drag <from> <to>` where each endpoint is an element selector (resolves to its center) or app x,y coordinates as reported by `ui inspect`, alongside the retained legacy 3-arg `drag <selector> <fromOffset> <toOffset>` form (disambiguated by the third positional arg). Adds From/To to the JSON result. Hardening #2 (system-reserved combos): send-keys --via send-input now warns (advisory, still sends) before synthesizing win+*, ctrl+shift+esc, ctrl+alt+del, alt+tab, alt+f4, ctrl+esc, lone win/printscreen, via a new SystemKeyGuard helper. Bug #3: --via help now notes accelerators only fire via send-input. Tests: expand KeyStringParser coverage (modifier/key aliases, extended flags, function keys, raw-vk, whitespace, coalescing edge cases); new SystemKeyGuardTests; new 2-arg drag tests (selector/coords combinations, error paths); send-keys system-combo warn-only test. Docs: ui-automation.md, SKILL.md (x2), agent docs (x2), usage.md, and the drag/send-keys blocks in cli-schema.json updated for both forms. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…clearer legacy error - SystemKeyGuard: warn on Alt+PrintScreen (active-window screenshot); add test row. - Drag tests: add per-call FindSingleResults queue to the UIA fake so selector->selector resolves two distinct elements; assert both endpoint centers; add coords->selector and to-selector-not-found cases. - UiDragCommand: legacy offset parse errors now explain that 3 args selects the legacy '<selector> <fromOffset> <toOffset>' form and 2 args is selector/coords -> selector/coords. - docs fragment (SKILL source): sync dual-form drag + system-combo send-input note. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ardTests
CI builds with -warnaserror, so the inline 'new[] { ... }' constant arrays in
CollectionAssert.AreEqual were promoted from CA1861 warnings to errors. Switch to
element-wise Assert.AreEqual (count + indexer) which is equivalent and CA1861-clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes the three forward-compat gaps the microsoft-ui-reactor E2E harness hit when mapping its temporary SendInput fallback onto the new input verbs: - send-keys: add a `text=` token escape so a literal value that collides with a key/modifier name (enter, del, up, ctrl+a) is typed verbatim. Mirrors `vk=`, is checked first, and still coalesces with adjacent literal words. - drag: add `--dwell-ms` (hold at the destination before releasing) so drop targets / merge overlays that arm from a sustained hover can latch before the button-up — required for tear-off/merge drags that a plain A->B drag misses. - drag: add `--hold-ms` (hold at the press point before moving). With <from> == <to> this is a press-and-hold / long-press gesture. Threads holdMs/dwellMs through IMouseInput.Drag -> MouseInput.Drag and the UiDragResult JSON. 14 new unit tests (text= escape, hold/dwell flow, negative validation, defaults). Regenerated cli-schema.json, the ui-automation SKILL, and the npm command wrappers (the wrappers had drifted behind the schema and are now back in sync); updated docs/ui-automation.md by hand. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Re-test of
|
| Prev finding | Fix | Verified |
|---|---|---|
H1 "Hello world" → Helloworld |
CoalesceText re-joins adjacent literal tokens |
live: InputBox = Hello world (space preserved); unit Parse_QuotedPhrase_PreservesSpacesAsSingleLiteral |
M4 C++ / a+b rejected |
TryParseChord only treats modifier-led tokens as chords |
live: "C++" types C++; a+b, 1+1=2, + parse as text; ctrl+bogus still rejected |
M1 scroll --direction --wheel both accepted |
modeCount > 1 → error |
live: scroll --direction down --to bottom → ❌ Specify only one of … |
| M2 send-input lands on wrong window | new ForegroundGuard on all 3 gesture verbs |
live both ways: refuses (exit 1 + JSON foreground_not_target) when not foreground; works once foregrounded |
| H2 keyboard short-write | sent != length → release held keys + throw |
code KeyboardInput.SendViaSendInput + ReleaseHeldKeys |
| H3 drag button left down | try/finally best-effort button-up |
code MouseInput.Drag |
| round-1 "false success" on post-message text | warns before injecting | code UiSendKeysCommand (but see N6 — the warning is now over-eager) |
| round-1 "no system-combo guard" | new SystemKeyGuard advisory warning |
live: ctrl+alt+del / printscreen via send-input → ⚠ warning, still exit 0 |
Other live confirmations: drag thumb-grab moved a slider 50→89; scroll --wheel changed offset (sign correct); Unicode é€ lands verbatim. Unit tests: KeyStringParser 79 / SystemKeyGuard 28 / UiCommand 81 — all green. Zero-size → zero_size_element JSON error; missing element → "The UI may have changed … prefer targeting by AutomationId."
🐞 Findings
N1 · send-keys literal text mis-fires on any word that is a key name — High (usability) — confirmed live
Words that collide with key/modifier names fire keys instead of typing, with no escape syntax:
send-keys … |
InputBox got | What happened |
|---|---|---|
"the end" |
the |
end pressed End |
"go home" |
go |
home pressed Home |
"press down" |
press |
down pressed Down arrow |
"Hi enter Bye" |
HiBye |
enter fired Enter between the words |
Colliding words: space enter tab esc escape up down left right home end del delete ins insert pageup pagedown backspace return apps printscreen capslock f1..f16 + modifiers ctrl shift alt win cmd super meta menu control. So send-keys "<arbitrary prose>" is not reliable for typing text. Recommend a dedicated literal path, e.g. --text "<string>" (verbatim, no tokenization), and/or an escape ({end} = key vs end = word). This also fixes N2.
N2 · Literal text can't contain consecutive/edge whitespace — Low–Med — confirmed live
CoalesceText re-joins with a single space: "a b" → a b, " pad " → pad, tab → single space. Can't type double spaces, tabs, or indentation. (Naturally solved by --text.)
N5 · gesture verbs silently miss a moving/animating target (report success anyway) — Medium, NEW — the "changing size between checks" answer
With the chaos timer moving & resizing the target every ~200 ms:
| Condition | Clicks that actually landed |
|---|---|
| Chaos OFF (stable) | 6 / 6 |
| Chaos ON (moving) | 0 / 12 |
Every one of the 12 misses still reported ✅ click … at (x,y) and exit 0. The verbs resolve element bounds, then inject at those screen coordinates after a settle/foreground delay (~50–150 ms click, ~500 ms full drag); if the element moves/resizes in that window the gesture hits the stale rectangle (empty space) and the element never receives it — yet the command reports success. Related: a drag starting off the grabbable part (slider track vs thumb) also no-ops while reporting ✅. So click / drag / scroll --wheel give no signal that the gesture actually hit the target under shifting UI. Suggested: (a) re-read bounds immediately before the press / verify cursor-still-over-element; (b) optionally verify a post-condition (focus/value change) and return non-zero on miss; (c) prefer UIA-pattern verbs (invoke, set-value, scroll --direction/--to) for moving targets — those acted correctly throughout.
N6 · post-message-text behavior + warning are framework-dependent — Low–Med, NEW (from the Electron pass)
send-keys text via post-message is dropped by WinUI 3/XAML but consumed by Electron/Chromium (and typically Win32/WPF). Yet the CLI emits its warning unconditionally ("…may not be delivered to WinUI 3 / XAML apps (WM_CHAR is dropped)…") even when targeting Electron, where it works — so it false-alarms on non-XAML apps. Recommend making the warning framework-aware (only for XAML/WinUI targets) or softening it.
N3 · misleading "elevated" error when there is no input desktop — Low
On a locked/secure desktop, SendInput returns 0 and the older ui click/ui hover report "the target window may be elevated … run as administrator" — wrong cause. The new verbs avoid this (their ForegroundGuard fires first). Consider a "no interactive desktop (session locked?)" message and applying the foreground guard to click/hover too.
N4 · send-input / drag / scroll --wheel require an unlocked, foreground, interactive session — Informational (by design)
A consequence of the (correct) ForegroundGuard: these refuse unless the target is genuinely foreground, so they can't be driven from a background/headless/CI or locked context. The UIA-path verbs (set-value, invoke, scroll --direction/--to) remain headless-friendly. Worth a docs callout.
🔐 Security — re-assessment (substantially improved)
- Ctrl+Alt+Del: unsynthesizable (kernel-protected SAS).
- Locked / secure-desktop boundary — confirmed empirically: with the box locked (
LogonUIon the secure desktop),GetForegroundWindow()=0 andSendInput=0 — a user-session process cannot inject into the lock screen (can't type a password into LogonUI). Strong OS-enforced boundary. ForegroundGuardprevents the M2 "wrong window" multiplier (verified live, both directions).SystemKeyGuardis advisory — warns, doesn't block — forwin+<key>, lonewin,ctrl+alt+del,ctrl+shift+esc,ctrl+esc,alt+tab,alt+esc,alt+f4,printscreen,alt+printscreenvia send-input (verified live). Reasonable for an automation tool; gate behind--allow-system-keysif hard blocking is ever wanted.- Short-write handling: keyboard releases held keys before throwing; drag releases the button in
finally— a partialSendInputcan't strand a (possibly system) modifier/button.
Net: the round-1 dangerous-combo concern is now mitigated by layered guards (foreground verification + advisory warning + SAS/secure-desktop protection + short-write recovery) rather than a hard blocklist — a sound design.
🧩 Cross-framework — WinUI 3 vs Electron / Chromium
| Behavior | WinUI 3 / XAML | Electron / Chromium |
|---|---|---|
send-keys text via send-input |
✓ types | ✓ types |
send-keys text via post-message |
✗ dropped (WM_CHAR) | ✓ works |
H1 "Hello world" |
✓ Hello world |
✓ Hello world |
M4 "C++" |
✓ C++ |
✓ C++ |
N1 "the end" |
"the" + End key | identical |
N2 "a b" |
a b |
identical |
drag slider |
50→89 | 50→81 |
scroll --wheel |
✓ | ✓ (0→200) |
scroll --direction (UIA) |
✓ | ✓ (200→654) |
Parser-level findings (H1/M4/N1/N2) are framework-independent. Electron targeting caveat: DOM elements are visible to ui inspect/search/--target/drag-scroll resolution only when renderer accessibility is active (I launched with --force-renderer-accessibility; Chromium also enables it on-demand when a UIA client attaches) — worth a docs note for Electron users.
🤖 Agent-ergonomics feedback (opportunities to improve)
Separate from correctness — candid notes from driving ui heavily as an automated agent (no eyes on the screen). The verbs are powerful and well-guarded, but the ergonomics currently assume a human who can see the result; for an agent that costs extra round-trips (tokens) and risks false conclusions.
- Self-verifying actions (highest impact).
click/drag/scroll --wheelreport✅/exit 0 from injecting at coordinates, not from the element receiving anything (see N5). With no eyes, my only recourse is a separateget-value/inspectcall, so every logical action becomes resolve → act → verify (3 invocations). Returning post-action state in the JSON envelope and/or flipping exit code on no-effect would collapse that to one call and kill the silent-miss class. - Foreground requirement is hard to satisfy headlessly. The
ForegroundGuardis the right security call, but from a background/agent context the verbs just returnforeground_not_target; I had to hand-rollAttachThreadInput+SPI_SETFOREGROUNDLOCKTIMEOUT=0to test at all. A first-class--activate(reliably foreground + verify) and a loud doc note would save every agent from reinventing that. --textfor literal typing (see N1/N2) — removes the single most likely way an LLM getssend-keyswrong (prose words firing keys).- State readback hygiene.
get-valuesilently falls back to the accessible Name when there's no value (an empty TextBox read back as"Input box"made me think it had content) — tag the source ({"value":"","source":"name"}). And theAuto-selected HWND … pass -w 393384 …banner prints to stdout interleaved with the value, which both wastes tokens and corrupts parsing — route advisory chatter to stderr and never mix it into a value payload. Autoscroll-into-view(or an explicitoffscreen:truein resolve output) would also drop a mandatory extra call. - Output / token hygiene. Every
inspect/searchappends a "Found N elements. Use the first token as selector, e.g. …" footer — useful once for a human, pure overhead per agent call; suppress under--json/a terse mode. Make--jsonpurely machine-readable everywhere (structured errors and warnings — extend the greatforeground_not_target/zero_size_elementcodes to all outcomes; keep❌⚠✅prose for human mode only). A batch/script mode (run several verbs against one resolved session) would amortize process-startup + re-resolution — the biggest token win. And drop the unconditional post-message warning on non-XAML targets (N6) so warnings stay trustworthy.
Worth keeping as-is: the error copy is genuinely excellent for agents ("The UI may have changed … prefer targeting by AutomationId — these survive layout changes"), the structured error codes are exactly right, and the guard layering is coherent. The ask is mainly to make each call self-verifying and machine-clean so an agent doesn't reconstruct "did that work?" from extra inspections.
Verdict
The round-1 fixes hold up under live injection on both WinUI 3 and Electron. Open items, by impact: N1 (literal-text/key ambiguity → add --text), N5 (gesture silent-miss on moving UI, success-coded → outcome verification), N6 (post-message warning not framework-aware), N2 (whitespace doc note), and the minor N3/N4 polish.
Live testing on WinUI 3 (chaos/zero-size/resize harness) and an Electron/Chromium harness; both removed after testing. Generated by the GitHub Copilot CLI agent · 2026-06-23. 🤖
| - `ui screenshot -a <app> [--output file.png] [--json] [--focus] [--capture-screen]` — capture window as PNG. Default uses Windows.Graphics.Capture (composited surface — preserves rounded corners and works while occluded), with PrintWindow as fallback. Use `--focus` to bring the window to the foreground first; use `--capture-screen` for popup overlays not owned by the target window. | ||
| - `ui invoke <selector> -a <app>` — activate element by slug or text search. Auto-walks to invokable ancestor for non-invokable elements. | ||
| - `ui hover <selector> -a <app> [--dwell-time <ms>]` — move mouse to element center to trigger tooltips, flyouts, and hover states. Use with `ui screenshot --capture-screen` to capture the result. | ||
| - `ui drag <from> <to> -a <app> [--right]` — press the mouse button at one point, move to another, and release (reorder, resize, sliders, drag-and-drop). Each of `<from>`/`<to>` is an element selector (drags from/to its center) or app coordinates `x,y` as reported by `ui inspect`. Legacy form `ui drag <selector> <fromX,fromY> <toX,toY>` is still accepted, where the two `x,y` are pixel offsets from the element's top-left corner. |
There was a problem hiding this comment.
There is no legacy form, it never shipped - remove that to keep it simple
There was a problem hiding this comment.
Removed — there's no legacy form anymore. drag is now only drag <from> <to>, where each endpoint is an element selector (drags from/to its center) or app-relative x,y coords from ui inspect. I dropped the to-offset argument, the legacy code branch, and the ElementId field from the JSON result (only the legacy path ever set it), and cleaned up every reference — both agent files, both SKILL.md (+ the shared template), docs/ui-automation.md, the generated cli-schema.json/SKILL tables, and the npm command wrapper. The legacy drag tests were removed and the remaining drag tests converted to the two-arg form.
| # Raw app coordinates → app coordinates | ||
| winapp ui drag 120,200 480,200 -a myapp | ||
|
|
||
| # Legacy: offsets from the element's top-left corner |
There was a problem hiding this comment.
same here, no such thing as legacy
There was a problem hiding this comment.
Removed — there's no legacy form anymore. drag is now only drag <from> <to>, where each endpoint is an element selector (drags from/to its center) or app-relative x,y coords from ui inspect. I dropped the to-offset argument, the legacy code branch, and the ElementId field from the JSON result (only the legacy path ever set it), and cleaned up every reference — both agent files, both SKILL.md (+ the shared template), docs/ui-automation.md, the generated cli-schema.json/SKILL tables, and the npm command wrapper. The legacy drag tests were removed and the remaining drag tests converted to the two-arg form.
| Press the mouse button at one point, move to another, then release. Two forms: | ||
|
|
||
| - **Preferred — `drag <from> <to>`:** each endpoint is either an **element selector** (drags from/to the element's center) or **app coordinates `x,y`** exactly as reported by `winapp ui inspect`. Mix and match freely (selector→selector, selector→coords, coords→coords). | ||
| - **Legacy — `drag <selector> <fromX,fromY> <toX,toY>`:** the two `x,y` values are pixel **offsets from the element's top-left corner**. Disambiguated by the third positional argument; existing scripts keep working unchanged. |
There was a problem hiding this comment.
here too and many other places re legacy
There was a problem hiding this comment.
Removed — there's no legacy form anymore. drag is now only drag <from> <to>, where each endpoint is an element selector (drags from/to its center) or app-relative x,y coords from ui inspect. I dropped the to-offset argument, the legacy code branch, and the ElementId field from the JSON result (only the legacy path ever set it), and cleaned up every reference — both agent files, both SKILL.md (+ the shared template), docs/ui-automation.md, the generated cli-schema.json/SKILL tables, and the npm command wrapper. The legacy drag tests were removed and the remaining drag tests converted to the two-arg form.
| **Choosing a transport / known limits:** | ||
| - `post-message` is the default because it bypasses UIPI and doesn't depend on the window being foreground. Limits: it cannot trigger global hotkeys registered through `WH_KEYBOARD_LL` low-level hooks (those tap input upstream of any window queue), and apps that read raw key state via `GetAsyncKeyState` may not observe held modifiers. For classic Win32/WinForms apps whose controls are separate child windows, target the control's native window handle with `-w` (or `--target`) so keys reach the right control. WinUI 3 / WPF apps have a single window and route keys to the internally focused element, so targeting the top-level window works. | ||
| - `send-input` produces fully real input (modifiers visible to `GetAsyncKeyState`, fires low-level hooks) but goes to whatever window is foreground and is **blocked by UIPI when injecting from an elevated process into a lower-integrity (AppContainer/AppX) target**. If `send-input` reports a failure, the target is likely elevated or an AppX app — use `post-message`, or run the CLI at a matching integrity level. As a safety guard, `send-input` verifies the target window is actually in the foreground immediately before injecting and **fails (`foreground_not_target`) rather than typing into the wrong window** if focus could not be brought to it — focus or click the window first. | ||
| - **System-reserved combos** (`win+l`, `win+r`, `ctrl+shift+esc`, `ctrl+alt+del`, `alt+tab`, `alt+f4`, `ctrl+esc`, lone `win`/`printscreen`, …) act on the OS/shell rather than just the target when sent OS-wide. `send-input` **emits a warning** before synthesizing them (e.g. `win+l` would lock the session) but still sends them — synthesizing them can be the legitimate intent of a test. `post-message` is window-scoped and is not affected (a posted `win+l` is harmless, though a posted `alt+f4` still closes the target window). |
There was a problem hiding this comment.
Agreed — changed from warn-and-send to reject. send-keys --via send-input now errors with invalid_arguments and sends nothing when it sees a system-reserved combo (win+l, alt+f4, ctrl+alt+del, alt+tab, …), since injecting those OS-wide has effects well beyond the target window. --via post-message stays unaffected because it's window-scoped (a posted win+l is harmless), so it remains the escape hatch if you genuinely need to deliver one to a specific window. Updated the SystemKeyGuard doc + prose and the tests (SendKeys_SystemCombo_ViaSendInput_IsRejected asserts exit 1 + 0 sends; added SendKeys_SystemCombo_ViaPostMessage_StillSends).
| **Options:** | ||
| - `--direction <up|down|left|right>` — Scroll incrementally via `ScrollPattern`. | ||
| - `--to <top|bottom>` — Jump to the start/end via `ScrollPattern`. | ||
| - `--wheel <delta>` — Synthesize mouse-wheel input over the element's center via `SendInput`. One notch is `120`; positive scrolls up/away, negative down/toward. Bypasses `ScrollPattern`. |
There was a problem hiding this comment.
Good call — --wheel now takes notches, so 1 = one notch up and -1 = one notch down. The reason it was 120 is that 120 is the Windows WHEEL_DELTA constant: SendInput's MOUSEEVENTF_WHEEL measures rotation in units where one physical detent = 120, so the old value was the raw Win32 unit leaking through the CLI surface. The command now multiplies the notch count by WHEEL_DELTA (120) internally — e.g. --wheel 3 injects a delta of 360 — so callers think in notches while SendInput still gets the right magnitude. Updated the option description, the example (--wheel -1), and the tests (--wheel -1 → fake delta -120; added Scroll_Wheel_MultipleNotches_ScaleByWheelDelta for --wheel 3 → 360).
… wheel in notches - ui drag: remove the never-shipped legacy 3-arg form entirely; only `drag <from> <to>` (selector or app coords per endpoint). Drop the to-offset arg, legacy branch, and the ElementId result field (legacy-only). - ui send-keys: reject system-reserved combos on --via send-input (error + no send) since OS-wide injection reaches beyond the target; --via post-message (window-scoped) unaffected. Document the text= escape for typing literal phrases that collide with key names. - ui scroll --wheel: take notches instead of raw WHEEL_DELTA units (1 = one notch); scale by WHEEL_DELTA (120) internally before SendInput. - Update tests, docs (ui-automation.md, SKILL template + generated tables, agent files, cli-schema.json, npm wrapper). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… click/hover, docs Addresses the consolidated pr-review report on the input-injection verbs. - M2/M3 (security): move the foreground guard to AFTER the stabilize awaits in drag/scroll, immediately before injection, so a focus steal during the awaited re-resolve can't route the OS-wide gesture to the wrong window. - M4: extract the N5 target-stability re-resolve into GestureTargeting and reuse it across click/hover/drag/scroll; add moving-target + settle-after-move tests. - M5: extract pure ForegroundGuard.Classify so the no_interactive_desktop vs foreground_not_target selection is unit-tested without a live desktop. - M6: gate the post-message WM_CHAR warning behind FrameworkHint.IsLikelyXaml and extract ShouldWarnPostMessageTextDropped, so the gate is unit-tested and only warns on XAML targets instead of false-alarming Win32/WPF/Electron apps. - L1: click/hover re-resolve their target just before injecting (refuse on a moving/vanished target) instead of acting on a stale rect. - L2: drag target_moved errors report the caller's selector token, not "from"/"to". - L3: decode backslash whitespace escapes in text= values (\s \t \n \r \\) so multi/leading whitespace survives the tokenizer; add coverage. - MouseInput/KeyboardInput: distinguish a locked/secure desktop (no interactive desktop) from elevation/UIPI in the SendInput failure message. - Docs: document the interactive-desktop requirement, target_moved re-resolve, the text= escape table, and the XAML-only post-message caveat across ui-automation.md, the fragment, and both SKILL.md mirrors (version unchanged). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Thanks again @nmetulev — the rounds 2–3 report was incredibly useful. Everything actionable is now in, across
Security. Per the thread, system-reserved combos via Agent-ergonomics. The two that live inside these verbs are in: literal typing (N1, via New coverage: |
--wheel is expressed in notches now (UiScrollCommand scales by WHEEL_DELTA=120 internally), so --wheel -120 meant 120 notches down (-14400 raw delta) instead of a single detent. The scroll E2E assertion now exercises one notch down, matching the documented notch semantics. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Round 4 — updated PR (
|
| Finding | Fix in this PR | Verified |
|---|---|---|
| N1 prose words fire keys | new text= escape token (text=enter types "enter") |
live WinUI: text=enter→"enter", text=ctrl+a→"ctrl+a", bare enter→key still fires |
| N2 whitespace lost | backslash escapes in text= (\s \t \n \r \\) |
live: text=a\s\sb → "a b" (double space) |
| N3 misleading "elevated" error when locked | new no_interactive_desktop code + message |
verified live while the box was locked on send-keys/drag/scroll --wheel |
| N6 post-message warning false-alarmed on Electron | FrameworkHint.IsLikelyXaml(hwnd) scopes it to XAML |
live: WinUI warns, Electron no warning + text types fine |
| round-3 "no system-combo block" | SystemKeyGuard now blocks (was advisory) |
live: ctrl+alt+del/printscreen/alt+f4/alt+tab via send-input → exit 1, refused, not sent |
| ergonomics: JSON/value interleaving | JSON errors now → stderr; success envelopes → stdout | confirmed |
Also new: drag --hold-ms (long-press) / --dwell-ms (drop settle); drag endpoints accept selector or x,y; a final ForegroundGuard re-check after the stabilize re-resolve closes the focus-steal race. Unit tests green: KeyStringParser 97, SystemKeyGuard 28, FrameworkHint 13, InjectionGuard 9, UiCommand 89.
🎯 N5 (silent-miss on a moving target) — the "changing size between checks" ask: much better, not fully closed
New GestureTargeting.ResolveStableAsync re-reads the element's bounds immediately before injecting, refuses zero-size / never-settling targets with target_moved / zero_size_element (non-zero exit), and clicks at the element's current position rather than a stale rectangle. Measured against the round-3 baseline (chaos ON = 0/12 landed, 12/12 silent ✅):
| Harness (chaos ON, ~200 ms moves) | reported ✅ | actually landed | false success | refused (zero_size/target_moved) |
|---|---|---|---|---|
| WinUI (3×10 clicks) | ~2–4 /10 | matches when stable | ~2 /10 | 6–8 zero_size, 0–1 target_moved |
| Electron (10 clicks) | 10/10 | 4/10 | 6/10 | 0 |
| (baseline chaos OFF) | — | 6/6, 5/5 | 0 | 0 |
So false-success dropped from ~100% to ~20% (WinUI) but is still ~60% on Electron. A reported success can still miss a continuously-moving target: the target moves during the ~50 ms cursor-settle inside MouseInput.Click() after ResolveStableAsync already returned "settled" — and on a smoothly-animating UI (Electron) neither target_moved nor zero_size fires, so the miss is silent. Suggest re-verifying the cursor-is-over-element (or re-reading position) immediately before the button-down, dropping/shortening the internal settle, or verifying a post-condition and returning non-zero on no-effect. (Continuously-moving targets are an edge case; for normal "foreground settles then static" UI the re-resolve is exactly right.)
🐞 New findings
- F1 (Medium):
clickandhoverwere given the N5 re-resolve but not theForegroundGuard.TryEnsureForegroundgate thatdrag/scroll --wheel/send-keyshave. Two effects (verified live while locked): (a) inconsistent error — click/hover returninternal_error("SendInput failed…") where the other three return the cleanno_interactive_desktop; (b) click/hover still lack the wrong-window (M2) protection — ifSetForegroundWindowsilently fails and another window is foreground, they'd inject into it. Recommend routing click/hover throughTryEnsureForegroundtoo. - F2 (Low): the
text=escape (the N1/N2 fix) is documented indocs/ui-automation.mdbut absent fromsend-keys --help/ thekeysargument description /cli-schema.json(0 matches) — so it's not discoverable via--helpor the npm-wrapper / LLM-skill surface. Addtext=<literal>to the argument description. - F4 (Low/compat): the
dragargument shape changed todrag <from> <to>(each a selector or absolutex,y) from the priordrag <selector> <from-offset> <to-offset>. Breaking for existing scripts; pre-release, so just worth a changelog note.
Verdict (round 4)
Strong iteration — every prior finding is fixed or materially mitigated, with good new tests and structured/stderr JSON. The two things I'd still act on: F1 (give click/hover the same foreground guard) and F3/N5 (close the residual re-resolve→button-down race so a ✅ always means the gesture landed). Both are about the same north star from the ergonomics note: a success result should guarantee the action actually happened.
Live testing on WinUI 3 (chaos/zero-size/resize harness) and an Electron/Chromium harness; both removed after testing. Generated by the GitHub Copilot CLI agent · 2026-06-25. 🤖
…race close + text= discoverability Round-4 test report (PR #587) flagged three actionable items; this addresses them. F1 — click/hover now gated by the foreground guard (parity with drag / scroll --wheel / send-keys). Introduce IForegroundGuard (RealForegroundGuard wraps the existing static ForegroundGuard.TryEnsureForeground so the classification logic stays the single source of truth) and inject it into all five gesture verbs. click/hover now fail with the clean no_interactive_desktop error on a locked session instead of a misleading internal_error, and gain wrong-window (foreground_not_target) protection. The DI seam also makes the deny path unit-testable without a live desktop. F3/N5 — close the residual re-resolve to button-down race on click. After the stable re-resolve and foreground check, position the cursor (new IMouseInput.MoveCursor), run the cursor-settle here, then do a final GestureTargeting.ConfirmStillAsync read immediately before the button-down, which now uses settleMs:0 (no second internal settle). A target that drifts during the settle window is caught and refused with target_moved instead of reporting a false success after landing on empty space. F2 — document the text=<literal> escape (incl. \s \t \n \r \\ backslash escapes) in the send-keys keys-argument description so it is discoverable via --help / cli-schema.json / the npm SDK, not only docs/ui-automation.md. Regenerated cli-schema.json, plugin/Claude skills, and the npm command SDK. F4 — the drag arg-shape change is pre-release and already reflected in help/docs; the repo has no CHANGELOG, so no changelog entry is added. Tests: 1509 pass, 5 pre-existing E2E_Node* env failures (need the npm wrapper build, unrelated), 1 skip. New tests cover click/hover guard deny+allow paths, the move-during-settle race, settleMs:0 on the press, and text= discoverability. Live click/drag verification against a real app remains blocked by the lock screen — these findings are unit-test-verified only. Refs #587 #562 #498 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Thanks for the round-4 report — addressed the two verdict items plus the F2 quick win in F1 (click/hover foreground guard) — F3/N5 (residual re-resolve→button-down race) — closed for F2 ( F4 (changelog) — the repo has no CHANGELOG, and the breaking Verification — F1 and F3 are deterministic logic fixes with full unit coverage; they do not depend on a live desktop. The F3 race is replicable in-process: the regression test drives a target that reads as "settled" in Unit suite: 1509 pass, 1 skip, 5 pre-existing |
send-keys' per-token text= escape (e.g. text=enter) is easy to forget and awkward when the entire payload is literal text. Add a command-level --verbatim flag that types the whole keys argument as a single literal: no named-key/combo/vk=/text= interpretation, and exact whitespace preserved (the normal path collapses internal whitespace runs to a single space). - KeyStringParser.ParseVerbatim: whole-string TextInput, no tokenizing or escape decoding; throws on empty/whitespace like Parse. - UiSendKeysCommand: --verbatim option; branch the parse path on it; help and keys-arg description point at it. Verbatim text is a TextInput, so the send-input system-combo guard correctly does not refuse e.g. "win+l". - Tests: parser (literal / exact whitespace / no escape decode / empty) and command (types literal not keys, preserves whitespace, system-combo-as-text, missing keys still errors, option documented). - Docs + skill fragment + agent subcommand list updated; regenerated cli-schema.json and npm SDK now carry --verbatim. Refs #562, #587 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Added a Why: the per-token
Full suite green (1523 pass; the 5 |
Address the pr-review findings on the send-keys (#562) and drag / scroll --wheel (#498) input-injection verbs. High: - H1 ForegroundGuard: compare the foreground window's root ancestor (GetAncestor GA_ROOT) to the target instead of PID-matching, so a child/owned window of the target counts as "foreground" and a same-PID but different top-level window does not. Drops the unsafe block. - H2 drag from-point settle race: move the cursor, wait, re-confirm the from element is still there (ConfirmStillAsync), re-check the foreground, then press with settleMs:0. Mirrors the click fix so "success" means the button-down landed on the resolved target, not empty space. Medium: - M1 KeyStringParser: a modifier-led chord with an empty segment around '+' ("ctrl++a", "ctrl+", "shift++enter") now throws instead of silently dropping the empty and pressing a different chord. Non-modifier-led tokens ("a+b", "C++", "ctrl+a+b") stay literal text. - M2 scroll --wheel: same cursor-settle / confirm / foreground sequence as drag before synthesizing the wheel. - M3 click: re-check the foreground immediately before the button-down (second gate), closing the same settle-window race on the success path. - M7 drag: a comma-shaped from/to token that isn't a valid x,y pair ("100,", "100,200,300") returns invalid_arguments instead of being treated as a selector and misreported as element-not-found. - M8 --verbatim: whitespace-only is legitimate verbatim content (types the spaces) rather than a "nothing to send" error; only genuinely empty is rejected. - M9 send-keys --via send-input: refuse OS-wide injection when no target window resolves (keystrokes have no location to verify the foreground against) instead of typing blindly into whatever has focus. - M10 send-keys: the human log reports the action count, not the raw keystrokes (which may carry secrets in persisted CI logs); --json still includes `keys` for opt-in callers. Tests: - M4 split the 1.5k-line UiCommandTests.cs into partial-class files (UiCommandTests.SendKeys.cs, UiCommandTests.Mouse.cs) so no file exceeds the ~1000-line guideline; behavior-preserving. - M5 foreground-deny tests for drag, scroll --wheel, and send-keys --via send-input. - M6 drag from-point moves-during-settle test (mirrors the click case). - Added malformed-chord, verbatim-whitespace, malformed-coordinate, and no-resolvable-target tests. 1540 pass / 5 pre-existing E2E_Node env fails / 1 skip. cli-schema content unchanged (no Description edits). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Self-review hardening round (commit 24d5575)Ran a multi-dimension review over this branch and addressed every finding it surfaced (2 high, 10 medium). All changes are unit-covered and the full suite is green: 1540 pass / 1 skip, with the only failures being the 5 pre-existing High
Medium
Tests
|
Round 5 — updated PR (
|
| Round-4 finding | Fix in this PR | Verified |
|---|---|---|
| F1 click & hover lacked the foreground guard | ForegroundGuard is now an injected IForegroundGuard and TryEnsureForeground is wired into all 5 gesture verbs (click, hover, drag, scroll --wheel, send-keys) |
code: hover :123, click :130+:157; + unit tests; live: click/hover correctly foreground-then-act |
| F3 moving-target "success" could still miss | MoveCursor → settle → ConfirmStillAsync (fresh UIA re-read) → inject with settleMs: 0, so the button-down happens immediately after a confirm |
the headline result — see below |
F2 text= undocumented in --help |
text= and the new --verbatim are now in the keys arg description + option help |
confirmed in send-keys --help |
| my ergonomics ask: a safe "type a string" path | new --verbatim types the whole argument literally, exact whitespace preserved |
live: --verbatim "a b c" → get-value returns "a b c" (len 8) on both WinUI + Electron |
Unit tests: KeyStringParser 112, SystemKeyGuard 28, FrameworkHint 13, ForegroundGuard 6, UiCommand 111.
🎯 F3 (the round-4 silent-miss) — essentially fixed
Re-ran the moving-target break-it test (now at a faster 150 ms mutation). The new confirm-immediately-before-button-down closes almost the entire race:
| Harness, chaos ON | round 4 false-success | round 5 false-success | round-5 refusals |
|---|---|---|---|
| Electron (15 clicks) | ~6/10 (~60%) | 1/15 (~7%) | 14 target_moved |
| WinUI (15 clicks) | ~2/10 (~20%) | 0/15 (0%) | 2 target_moved + 13 zero_size |
| (baseline chaos OFF) | — | 6/6 land (both) | 0 |
drag and scroll --wheel against a moving target also now refuse cleanly with target_moved (exit 1, JSON). The residual ~7% on Electron is the irreducible microsecond gap between ConfirmStillAsync's UIA read and the OS SendInput button-down — about as tight as coordinate injection can get. A reported ✅ now almost always means the gesture landed (vs round-4 where, under animation, it frequently didn't).
✅ Regressions all clean on the refactored build
- N6 framework-aware warning: post-message text warns on WinUI/XAML, silent on Electron (and types fine) — both correct.
- SystemKeyGuard still hard-blocks
ctrl+alt+del/alt+f4/win+lvia send-input (exit 1) on both. text=per-token escape intact (text=the text=end→ "the end"); bareenterstill fires the key.drag --hold-ms/--dwell-msparse fine.
Verdict (round 5)
This closes the loop on the round-4 review — F1, F2, F3 are all addressed, --verbatim lands the ergonomics ask, and I found no new defects this round. Nicely done. The only thing left from my notes is philosophical (the text=/--verbatim "type a string" ergonomics are now solved two ways; the default bare-text path is still key-interpreted, which is the right call given --verbatim exists and is documented).
One thing I could not stage live: the no_interactive_desktop half of F1 (click/hover returning the clean locked-desktop error instead of round-4's internal_error) needs a locked session to demonstrate — verified in code (the guard now runs before the SendInput call) and by the consistent wiring across all five verbs.
Live testing on WinUI 3 + Electron/Chromium chaos harnesses (target moving/resizing every ~150 ms); both removed after testing. Generated by the GitHub Copilot CLI agent · 2026-06-30. 🤖
Summary
Implements the two missing input-injection features in
winapp ui, the only blockers for migrating the microsoft-ui-reactor WinUI 3 E2E suite off WinAppDriver. Before this, nouiverb could synthesize keyboard input or perform mouse drag/wheel gestures.Two focused commits, one per verb:
winapp ui send-keys(#562)Synthetic keyboard input — the keyboard counterpart to
click.down,enter,tab,f5), whitespace-separated sequences (down down enter), modifier combos (ctrl+shift+t), literal text (hello), and raw virtual keys (vk=0xNN).--via:post-message(default — HWND-targeted, bypasses UIPI) andsend-input(OS-wide).--target <selector>focuses an element (UIA) before sending. Standard-w/-a/--json.WH_KEYBOARD_LLglobal hotkeys; SendInput is UIPI-blocked elevated → AppX.winapp ui drag+ui scroll --wheel(#498)winapp ui drag <selector> <fromX,fromY> <toX,toY>— press, move in steps (realisticWM_MOUSEMOVEstream), release. Coordinates are x,y offsets from the element's top-left corner.--rightfor right-button drag.ui scroll --wheel <delta>— synthesizes real mouse-wheel input (120 = one notch) for wheel handlers (zoom/custom scroll) thatScrollPatterncan't exercise.Implementation notes
click/hoververb conventions: command + nested DI handler, source-generated JSON result models,UiErrors/UiJsonErrorenvelopes,SetForegroundWindow+SendInput(virtual-desktop normalized) for mouse,PostMessage/SendInputfor keys.IKeyboardInput(PostMessage + SendInput) andIMouseInput.Drag/ScrollWheel, withFakeKeyboardInput/FakeMouseInputrecording calls.docs/ui-automation.md, skill fragment) andcli-schema.json+ plugin/Claude skills + npm TS SDK wrappers regenerated.Testing
dotnet buildclean; full unit suite 1340 passed, 1 skipped (the only excluded tests are the pre-existingE2E_Node*env tests, unrelated).vk=0xNNtranslation). Live SendInput-based drag verification was blocked only by the machine being on the Windows lock screen (the lock screen intercepts injected input); the drag path is identical to the shippedclick/hoverSendInput code and the cursor was confirmed to move to the correct physical coordinates.Closes nothing — issues #562 and #498 are intentionally left open.