Add ui send-keys (#562) and ui drag + scroll --wheel (#498) input-injection verbs by azchohfi · Pull Request #587 · microsoft/winappCli

azchohfi · 2026-06-22T20:32:04Z

Summary

Implements the two missing input-injection features in winapp ui, the only blockers for migrating the microsoft-ui-reactor WinUI 3 E2E suite off WinAppDriver. Before this, no ui verb could synthesize keyboard input or perform mouse drag/wheel gestures.

Two focused commits, one per verb:

`winapp ui send-keys` (#562)

Synthetic keyboard input — the keyboard counterpart to click.

Named keys (down, enter, tab, f5), whitespace-separated sequences (down down enter), modifier combos (ctrl+shift+t), literal text (hello), and raw virtual keys (vk=0xNN).
Two transports via --via: post-message (default — HWND-targeted, bypasses UIPI) and send-input (OS-wide).
--target <selector> focuses an element (UIA) before sending. Standard -w/-a/--json.
Documented hard limits: PostMessage can't fire WH_KEYBOARD_LL global hotkeys; SendInput is UIPI-blocked elevated → AppX.

`winapp ui drag` + `ui scroll --wheel` (#498)

winapp ui drag <selector> <fromX,fromY> <toX,toY> — press, move in steps (realistic WM_MOUSEMOVE stream), release. Coordinates are x,y offsets from the element's top-left corner. --right for right-button drag.
ui scroll --wheel <delta> — synthesizes real mouse-wheel input (120 = one notch) for wheel handlers (zoom/custom scroll) that ScrollPattern can't exercise.
Mouse drag/scroll-wheel were prioritized per the issue; hover/scroll already existed.

Implementation notes

Mirrors the existing click/hover verb conventions: command + nested DI handler, source-generated JSON result models, UiErrors/UiJsonError envelopes, SetForegroundWindow + SendInput (virtual-desktop normalized) for mouse, PostMessage/SendInput for keys.
New testable abstractions: IKeyboardInput (PostMessage + SendInput) and IMouseInput.Drag/ScrollWheel, with FakeKeyboardInput/FakeMouseInput recording calls.
Docs updated (docs/ui-automation.md, skill fragment) and cli-schema.json + plugin/Claude skills + npm TS SDK wrappers regenerated.

Testing

dotnet build clean; full unit suite 1340 passed, 1 skipped (the only excluded tests are the pre-existing E2E_Node* env tests, unrelated).
26 new send-keys tests (parser + command) and 8 new drag/scroll-wheel tests; coordinate math, JSON shape, transports, and error paths all covered.
PostMessage keyboard transport verified live end-to-end against a WinForms TextBox (literal text, Enter→newline, vk=0xNN translation). Live SendInput-based drag verification was blocked only by the machine being on the Windows lock screen (the lock screen intercepts injected input); the drag path is identical to the shipped click/hover SendInput code and the cursor was confirmed to move to the correct physical coordinates.

Closes nothing — issues #562 and #498 are intentionally left open.

Introduce winapp ui send-keys to synthesize keyboard input, the keyboard counterpart to click. Supports named keys (down, enter, tab, f5), whitespace-separated sequences, modifier combos (ctrl+shift+t), literal text, and raw virtual keys (vk=0xNN). Two transports via --via: post-message (default, HWND-targeted, bypasses UIPI) and send-input (OS-wide). --target focuses an element via UIA before sending. Standard -w/-a/--json options. - KeyStringParser: token grammar to KeyAction list - IKeyboardInput/RealKeyboardInput/KeyboardInput: PostMessage + SendInput transports - UiSendKeysCommand wired into DI, UiCommand, HostBuilderExtensions - UiSendKeysResult JSON model - Unit tests (parser + command) and FakeKeyboardInput - Docs, skill fragment, regenerated cli-schema + npm SDK wrapper Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add `winapp ui drag <selector> <fromX,fromY> <toX,toY>` to synthesize a mouse drag: press at a point inside an element, move to another point in small steps (so the app sees a realistic WM_MOUSEMOVE stream), then release. Coordinates are x,y offsets from the element's top-left corner. Use --right for a right-button drag. Useful for reorder/resize handles, sliders, canvas drawing, and drag-and-drop. Also add `--wheel <delta>` to `ui scroll` to synthesize real mouse-wheel input over an element (120 = one notch), for testing wheel handlers (zoom, custom scroll) that ScrollPattern can't exercise. - Extend IMouseInput/RealMouseInput/MouseInput with Drag + ScrollWheel (SendInput, virtual-desktop normalized like the existing click/hover) - UiDragCommand wired into DI, UiCommand, HostBuilderExtensions - UiDragResult JSON model; UiScrollResult gains a wheel field - Unit tests for drag + scroll-wheel; FakeMouseInput records both - Docs, skill fragment, regenerated cli-schema + npm SDK wrappers Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The SendInput transport previously typed literal text via KEYEVENTF_UNICODE packets, which deliver the character (WM_CHAR -> TextChanged) but raise KeyDown with VK_PACKET rather than the real key. Downstream WinUI 3/WPF handlers that key off KeyDown with the correct virtual key did not see a faithful keystroke. Map each character to its virtual key (plus Shift) via VkKeyScan and emit real key-down/up events so the target sees a genuine KeyDown with the correct VK and the OS composes the matching WM_CHAR (TextChanged) - one full keystroke per character. Fall back to a Unicode packet for characters not reachable on the current layout or requiring Ctrl/AltGr, so the exact character still lands. PostMessage typed text keeps WM_CHAR (correct text + TextChanged across integrity levels, no per-char KeyDown); docs clarify when to use which transport for per-keystroke fidelity. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…help (#562) The --via option and command help now state that typed text raises a real per-character KeyDown only on send-input (post-message raises TextChanged but not a per-character KeyDown), and that named keys/combos raise KeyDown on both. Recommends --via send-input for per-keystroke KeyDown on a WinUI 3/WPF TextBox. Regenerated cli-schema.json and the ui-automation skill from the CLI help. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-22T21:12:11Z

Build Metrics Report

Binary Sizes

Artifact	Baseline	Current	Delta
CLI (ARM64)	31.57 MB	31.71 MB	📈 +142.5 KB (+0.44%)
CLI (x64)	31.90 MB	32.04 MB	📈 +141.0 KB (+0.43%)
MSIX (ARM64)	13.25 MB	13.30 MB	📈 +50.0 KB (+0.37%)
MSIX (x64)	14.09 MB	14.15 MB	📈 +64.0 KB (+0.44%)
NPM Package	27.65 MB	27.77 MB	📈 +120.6 KB (+0.43%)
NuGet Package	27.68 MB	27.79 MB	📈 +118.5 KB (+0.42%)
VS Code Extension	20.47 MB	20.60 MB	📈 +127.1 KB (+0.61%)

Test Results

✅ 1545 passed, 1 skipped out of 1546 tests in 428.7s (+221 tests, +73.6s vs. baseline)

Test Coverage

❌ 18.1% line coverage, 37.3% branch coverage · ✅ +0.7% vs. baseline

CLI Startup Time

41ms median (x64, winapp --version) · ✅ no change vs. baseline

Updated 2026-06-29 23:15:26 UTC · commit 24d5575 · workflow run

Copilot

Pull request overview

This PR adds three input-injection capabilities to winapp ui, the keyboard/mouse gestures needed to migrate a WinUI 3 E2E suite off WinAppDriver: a new ui send-keys verb (synthetic keyboard input via PostMessage or SendInput), a new ui drag verb (button-press → stepped move → release), and a --wheel option on ui scroll (real mouse-wheel input). The implementation cleanly follows the existing click/hover conventions: command + nested DI handler, source-generated JSON result models, UiErrors/UiJsonError envelopes, and new testable abstractions (IKeyboardInput, IMouseInput.Drag/ScrollWheel) with Real* production and Fake* test implementations. Docs, the CLI schema, the Copilot/Claude skills, and the npm TS SDK are regenerated accordingly.

Changes:

New UiSendKeysCommand + KeyStringParser/KeyboardInput (named keys, sequences, modifier combos, vk=0xNN, literal text; --via post-message|send-input, --target focus).
New UiDragCommand + MouseInput.Drag; new --wheel on UiScrollCommand + MouseInput.ScrollWheel; DI/registration, JSON result models, and NativeMethods.txt P/Invoke additions.
34 new unit tests (parser + commands), plus regenerated docs/schema/skills/npm wrappers.

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
`src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs`	New send-keys command + handler, transport parsing, target focus.
`src/winapp-CLI/WinApp.Cli/Commands/UiDragCommand.cs`	New drag command + handler with element-relative point parsing.
`src/winapp-CLI/WinApp.Cli/Commands/UiScrollCommand.cs`	Adds `--wheel` option and SendInput wheel path; reports element hwnd.
`src/winapp-CLI/WinApp.Cli/Commands/UiCommand.cs`	Registers `drag`/`send-keys` subcommands.
`src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs`	New key-string grammar parser (chords, named keys, vk=, text).
`src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs`	PostMessage/SendInput keyboard synthesis with documented limits.
`src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs`	Adds `Drag` and `ScrollWheel`; repeats the SendInput-failure guard (nit).
`src/winapp-CLI/WinApp.Cli/Helpers/I{Keyboard,Mouse}Input.cs`, `Real*`, `UiJsonContext.cs`	Interfaces, prod impls, new JSON result models.
`src/winapp-CLI/WinApp.Cli/Helpers/HostBuilderExtensions.cs`	DI registration of `IKeyboardInput` and the two new handlers.
`src/winapp-CLI/WinApp.Cli/NativeMethods.txt`	Adds PostMessage/MapVirtualKey/VkKeyScan + WM_*/VIRTUAL_KEY.
`src/winapp-CLI/WinApp.Cli.Tests/*`	`KeyStringParserTests`, send-keys/drag/wheel command tests, `FakeKeyboardInput`.
`src/winapp-npm/src/winapp-commands.ts`	Regenerated `uiDrag`/`uiSendKeys` wrappers + `wheel` on `uiScroll`.
`docs/ui-automation.md`, `docs/fragments/skills/.../ui-automation.md`, `.github/plugin/.../SKILL.md`, `.claude/.../SKILL.md`, `docs/cli-schema.json`, `scripts/generate-llm-docs.ps1`	Regenerated/updated docs, schema, skills, and skill→command map.

Note on doc sync (not commentable inline — these files are not in the diff): two hand-written command lists were not updated for the new verbs — docs/usage.md (### ui "Commands", lines 1076-1091) and .github/plugin/agents/winapp.agent.md ("Key subcommands", lines 207-221). Both enumerate click/hover/etc. by hand and should gain drag, send-keys, and the scroll --wheel option to stay in sync.

Inline findings: one test_coverage nit (the new --wheel zero-size guard is untested while the analogous hover/click guards are covered) and one maintainability nit (the SendInput sent == 0 guard is now duplicated four times in MouseInput.cs).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…size path (#587) - Extract MouseInput.SendInputs(Span<INPUT>) helper so the SendInput zero-return / 'window may be elevated' guard lives in one place instead of being duplicated across SendMove/SendClick/SendButton/ScrollWheel. - Add Scroll_Wheel_ZeroSizeElement_ReturnsError covering the new --wheel CodeZeroSize guard (exit 1, no ScrollWheel call), matching the Hover_ZeroSizeElement_ReturnsError coverage convention. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.

The auto-generated schema/skill docs already cover the new ui drag and send-keys verbs, but two hand-written command lists were missed: - docs/usage.md (ui Commands list) - .github/plugin/agents/winapp.agent.md (Key subcommands), mirrored to .claude/agents/winapp.md via sync-claude-plugin.ps1 Add drag and send-keys entries matching the existing sibling-verb style in each list. scroll --wheel is already covered by the existing scroll entry, so no separate entry was added. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

nmetulev · 2026-06-23T17:16:00Z

🤖 AI-generated review — produced by the pr-review skill (8 parallel review dimensions + a cross-model verification pass on GPT-5.4, orchestrated by Claude Opus 4.8). This is automated analysis, not a human review: findings may contain false positives, and nothing here has been built or tested. Treat it as a checklist of things to verify, not a merge gate.

PR Review — `azchohfi-ui-input-verbs` vs `origin/main` (7 commits, 27 files, +2,008 / −26)

Summary

Critical: 0 High: 3 Medium: 6 Low: 2

Nice work adding a testability seam (IKeyboardInput/IMouseInput + fakes) — alternative-solution came back clean and the design is sound. The Highs cluster around two real themes in the synthetic-input layer: (1) the key-string parser splits literal text on whitespace, so send-keys "Hello world" types Helloworld; and (2) both the keyboard and mouse SendInput paths treat a partial write (sent < length) as success, which can strand a held modifier or mouse button. The cross-model pass confirmed H1/H2, dropped a docs finding (the key grammar is in fact documented), and surfaced the mouse-path variant of the short-write bug as a new High.

Coverage

Dimension	Result
security	⚠ 2 findings (1 folded into H2)
correctness	⚠ 4 findings
cli-ux	⚠ 2 findings (1 folded into H1)
alternative-solution	✓ clean
test-coverage	⚠ 4 findings
docs-and-samples	✓ clean (1 finding disputed & dropped by cross-model)
packaging	⚠ 1 finding
multi-model	2/3 highs confirmed · 1 disputed/dropped · +1 new high

Findings

ID	File:lines	Domain	Finding
H1	`src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:86-113`	correctness, cli-ux	Literal text split on whitespace → `send-keys "Hello world"` types `Helloworld`; documented example is broken
H2	`src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs:142-151`	correctness, security	Keyboard `SendInput` short write (`sent < length`) treated as success → stuck modifier
H3	`src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs:54-77,143-179`	correctness, security	Mouse `SendInputs` short write treated as success + no `finally` cleanup → stuck mouse button (new, found by cross-model; absorbs the drag-cleanup finding)
M1	`src/winapp-CLI/WinApp.Cli/Commands/UiScrollCommand.cs:85-130`	cli-ux	`scroll` accepts conflicting `--direction`/`--to`/`--wheel` and silently prefers `--wheel`
M2	`src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs:126-134`	security	`--via send-input` doesn't verify focus landed on the target before sending → may inject into the wrong window
M3	`src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs:126-134`	test-coverage	`send-keys`/`drag`/`scroll --wheel` not exercised by the WinUI e2e script
M4	`src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:91-124`	correctness	Any token containing `+` is parsed as a chord → realistic literals like `C++` / `a+b` throw
M5	`src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:119-145`	test-coverage	Parser negative/edge coverage shallow (malformed combos, `{}` literals, Unicode)
M6	`src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs:58-69`	test-coverage	Modifier down/up pairing and release-on-failure (stuck-key) untested
L1	`src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs:54-77`	test-coverage	Drag interpolation / button-event sequencing not covered
L2	`src/winapp-npm/src/winapp-commands.ts:1003-1010`	packaging	Generated npm wrapper descriptions stale vs `cli-schema.json`

Details

H1 — `src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:86-113` · correctness + cli-ux · Multi-model: confirmed

Severity: high · Confidence: high
Finding: Literal text containing spaces is split into separate TextInput actions, so the space is lost — send-keys "Hello world" types Helloworld.
Evidence: var tokens = keys.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries); (87) → each token falls through to actions.Add(new TextInput(token)); (113); the space is never reintroduced when sending chars (KeyboardInput.cs:73-76, 128-132). The documented example winapp ui send-keys "Hello world" (docs/ui-automation.md:289) is therefore broken. (Unknown tokens being typed as literal text is intentional/documented; the whitespace loss is the bug.)
Recommendation: Preserve whitespace within literal runs, or add an explicit literal path (e.g. --text <string>) that types the string verbatim. Fix or remove the broken "Hello world" doc example to match the chosen behavior.

H2 — `src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs:142-151` · correctness + security · Multi-model: confirmed

Severity: high · Confidence: high
Finding: A partial SendInput write is treated as success, which can leave a modifier-down event without its matching key-up (a "stuck modifier" that affects the user's whole session).
Evidence: var sent = PInvoke.SendInput((uint)array.Length, pInputs, sizeof(INPUT)); (145) then if (sent == 0) (146) — only zero is treated as failure. The batch can contain modifier-down, key-down/up, and modifier-up events (KeyboardInput.cs:113-125), so sent < array.Length is silently accepted with no retry or best-effort release.
Recommendation: Treat sent != array.Length as failure, and in a finally/cleanup path issue best-effort key-up for any modifiers pressed in the batch.

H3 — `src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs:54-77,143-179` · correctness + security · Multi-model: new finding

Severity: high · Confidence: high
Finding: The mouse path has the same defect: SendInputs accepts short writes as success, and Drag presses the button via separate calls with no finally cleanup — so a mid-gesture failure can leave a mouse button logically held down.
Evidence: Drag does SendButton(downFlag) (62) … SendButton(upFlag) (76) as separate calls with no try/finally; SendClick builds a two-event down/up batch (145-156); but SendInputs only checks if (sent == 0) (172-173). A short write strands the button-down. (This was independently flagged by the correctness pass for the drag-cleanup angle, MouseInput.cs:62-76.)
Recommendation: Require sent == inputs.Length in SendInputs, and wrap click/drag/right-drag gestures in try/finally that always attempts the button-up. Consider extracting the shared "send N inputs, verify count, release-on-failure" logic so keyboard (H2) and mouse use one correct implementation.

M1 — `src/winapp-CLI/WinApp.Cli/Commands/UiScrollCommand.cs:85-130` · cli-ux

Severity: medium · Confidence: high
Finding: ui scroll accepts mutually conflicting modes and silently prefers --wheel, so --direction/--to are ignored without warning.
Evidence: Validation only rejects when all three are null (85); the handler branches on wheel is int delta (107) and only uses direction/to in the else (130).
Recommendation: Require exactly one of --direction / --to / --wheel; on conflict, return a non-zero invalid-arguments error naming the conflicting flags.

M2 — `src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs:126-134` · security

Severity: medium · Confidence: high
Finding: With --via send-input, focus success isn't verified immediately before sending, so keystrokes can land in whatever window is actually foreground (a different app, a UAC prompt, etc.).
Evidence: Lines 126-134 call SetForegroundWindow, wait, then send input without checking GetForegroundWindow against the resolved target.
Recommendation: For send-input, verify the foreground window belongs to the resolved target HWND/process right before sending and fail (non-zero) otherwise.

M3 — `src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs:126-134` · test-coverage

Severity: medium · Confidence: high
Finding: The new real-input verbs aren't in the WinUI e2e script that claims to cover all ui commands.
Evidence: scripts/test-e2e-winui-ui.ps1 Phase 3 exercises click/hover/scroll-pattern only — no send-keys, drag, or scroll --wheel. (The scroll --wheel zero-size unit path is covered.)
Recommendation: Add e2e steps for send-keys, drag, and scroll --wheel against the WinUI sample.

M4 — `src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:91-124` · correctness

Severity: medium · Confidence: high
Finding: Any multi-character token containing + is treated as a chord, so realistic literals like C++ or a+b throw instead of typing.
Evidence: if (token.Contains('+') && token.Length > 1) (92) then throw new FormatException($"Invalid key combo '{token}'..."); (124).
Recommendation: Only treat a token as a chord when the leading segments are valid modifiers; provide an escape/literal path for +.

M5 — `src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:119-145` · test-coverage

Severity: medium · Confidence: high
Finding: Parser negative/edge coverage is shallow.
Evidence: Tests cover happy paths plus empty input, invalid vk=, and unknown modifier, but not malformed combos (ctrl+, +a, ctrl++a, ctrl+bogus), brace-like/empty {} literals, or Unicode tokens (KeyStringParserTests.cs).
Recommendation: Add [DataRow]-driven tests for malformed combos and the intended literal behavior of brace/+/Unicode inputs — these would directly pin H1 and M4.

M6 — `src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs:58-69` · test-coverage

Severity: medium · Confidence: high
Finding: Modifier down/up pairing and stuck-key-on-failure behavior are untested.
Evidence: Command tests use FakeKeyboardInput and assert transport/action counts, but nothing observes the KeyDown(main) → KeyUp(main) → KeyUp(modifier) order or simulates a failure after a modifier-down.
Recommendation: Add a keyboard-event sink (or fault-injection wrapper) so tests assert modifier release order and release-on-exception — the regression test for H2.

L1 — `src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs:54-77` · test-coverage

Severity: low · Confidence: high
Finding: Drag interpolation and button-event sequencing aren't covered.
Evidence: UiCommandTests assert FakeMouseInput.Drag receives endpoints/right-button, but not the multi-step movement stream or down/up ordering implemented in MouseInput.
Recommendation: Add a low-level mouse-event sink test (or e2e drag scenario) validating the interpolated path and button ordering — the regression test for H3.

L2 — `src/winapp-npm/src/winapp-commands.ts:1003-1010` · packaging

Severity: low · Confidence: high
Finding: The autogenerated npm command surface is stale vs docs/cli-schema.json for send-keys descriptions.
Evidence: winapp-commands.ts lacks the newer per-keystroke/TextChanged wording present at docs/cli-schema.json:3308-3309; npm run generate-commands:check reports out of date.
Recommendation: Re-run npm run generate-commands from the updated schema and commit the regenerated winapp-commands.ts.

Disputed / dropped by cross-model

A docs finding claimed the send-keys KEY-STRING grammar was undocumented. Dropped: docs/ui-automation.md:295-314 already has an explicit Key grammar block (named keys, sequences, modifier combos, literal text, raw VKs) plus transport/KeyDown behavior, mirrored in the skill fragment. The cross-model pass also noted that {KEYNAME}/{{/}} escaping isn't implemented in KeyStringParser.cs, so documenting it would be wrong. The only genuine doc issue is the broken "Hello world" example, folded into H1.

_{Generated by the winappcli pr-review skill · 8 specialist passes + GPT-5.4 cross-model verification · 2026-06-23. No code was built or executed; verify findings before acting. 🤖}

nmetulev · 2026-06-23T22:32:26Z

🤖 AI-generated hands-on test report — produced by the GitHub Copilot CLI agent (Claude Opus 4.8). This complements the earlier automated pr-review comment with empirical testing: I built the CLI from this branch, built a purpose-made WinUI 3 target app (with AutomationIds + a ground-truth event log capturing KeyDown / TextChanged / Char / slider value / scroll offset / clicks), and fired every new verb at it. Findings below were observed at runtime, not just read from the diff. Verify before acting.

Hands-on test results — `ui send-keys` / `ui drag` / `scroll --wheel`

Setup: Debug build of winapp.exe; packaged WinUI 3 InputTestApp as the injection target; results read back via winapp ui get-value <EventLog/InputBox/SliderValue/ScrollState/ButtonState>.

TL;DR

Mouse verbs (drag, scroll --wheel) and send-input text injection work well. The problems are concentrated in send-keys: the documented "Hello world" example is broken, and the default transport silently does nothing for typed text on WinUI 3 while reporting success. On security: Ctrl+Alt+Del cannot be synthesized (good), but there is no guard at all on other system combos via --via send-input.

✅ What works (verified)

Verb	Result
`drag DragSlider 225,24 400,24`	slider value 50 → 91; reverse drag → 9. Offsets-from-top-left + screen mapping correct. `--right` and zero-size (start==end) work.
`scroll ScrollList --wheel -360 / 360`	offset 0 → 89 → 0 (delta sign correct, clamps at top).
`scroll ScrollList --direction down` / `--to`	ScrollPattern paging works (offset → 441).
`send-keys --via send-input` typed text	Unicode `é€`, symbols `a1!`, case-insensitive `CTRL+A`/`DELETE`, raw `vk=0x0d`, named keys/arrows — all correct, real per-char KeyDown+Char observed.
JSON output (`--json`)	clean, parseable, no log lines mixed in.
Arg validation	bad `from`/`to` points, missing args, invalid `--via` → exit 1 with actionable messages.

🐞 Confirmed bugs & usability issues

1. `send-keys "Hello world"` types `Helloworld` — space dropped (confirms static-review H1) · High

Ran send-keys "Hello world" --target InputBox --via send-input. Event log:

KeyDown H, e, l, l, o, w, o, r, l, d   (no space keystroke)  →  InputBox = "Helloworld"

Exit code 0, prints ✅ Sent keys "Hello world". The space is the token separator (keys.Split(null, RemoveEmptyEntries) in KeyStringParser.cs:87) and is never re-emitted. The documented example winapp ui send-keys "Hello world" (docs/ui-automation.md:289) does not work. The only way to type a space today is the space named token: send-keys "Hello space world" → correctly produced Hello world. Recommend a literal path (e.g. --text "<string>") and fixing the doc example.

2. Default `post-message` transport silently no-ops typed text on WinUI 3 · High / Medium

Ran send-keys "x y z" --target InputBox --via post-message (post-message is the default). Result: InputBox stayed empty, EventLog stayed empty — zero events — yet the CLI printed ✅ Sent keys. WM_CHAR posted to the WinUI host HWND is not turned into text by the XAML input pipeline. Since post-message is the default and WinUI 3 is winapp's primary framework, a user running the documented happy path (winapp ui send-keys "hello" -a MyApp) gets a false success with nothing typed. (Named keys do deliver KeyDown via post-message — see below — only WM_CHAR text is dropped.) Recommend: detect/report when post-message text delivery is unsupported, or default typed-text to send-input, and at minimum make the success message honest.

3. Accelerators / combos don't fire via `post-message` · Medium

send-keys "ctrl+t" (a KeyboardAccelerator) via post-message did not fire (click counter unchanged, nothing logged). Via send-input it fired (Accelerator Ctrl+T, counter incremented). Worth calling out explicitly in --via help that combos/accelerators require send-input. (Named non-modifier keys like down down up do log KeyDown via post-message, so behavior is inconsistent across token kinds.)

4. Cannot type text containing `+` (confirms static-review M4) · Medium

send-keys "C++" → ❌ Invalid key combo 'C++'. send-keys "a+b" → ❌ Unknown modifier 'a' in 'a+b'. Any token with + is forced through chord parsing, so realistic literals fail.

5. `scroll` accepts conflicting modes (confirms static-review M1) · Medium

scroll ScrollList --direction down --wheel 120 passes validation (no error) and silently prefers --wheel. Recommend requiring exactly one of --direction / --to / --wheel.

6. Minor — inconsistent error formatting

The "no scroll mode specified" error prints [ERROR] - Specify --direction... (raw logger format) while every other ui error uses the ❌ symbol. Cosmetic but visible.

🔐 Security assessment (the headline ask: Ctrl+Alt+Del & system inputs)

Ctrl+Alt+Del — safe, cannot be synthesized ✅

Fired send-keys "ctrl+alt+del" --via send-input at the app. The Secure Attention Sequence did not trigger — no lock screen, no secure desktop, session stayed fully interactive. The three keys arrived at the app as ordinary KeyDowns (Control, Menu, Delete). This is a Windows kernel guarantee (SAS is unsynthesizable via SendInput), not something this tool implements — but the net result is that ctrl+alt+del is harmless here.

No blocklist for any other system combo ⚠️

The parser + injector accept and would inject every dangerous combo via --via send-input, with no warning, confirmation, or allowlist (verified each is accepted at parse stage):
win+l (lock workstation) · win+r (Run) · win+d (show desktop) · win+e · ctrl+shift+esc (Task Manager) · alt+tab · alt+f4 · ctrl+esc (Start) · printscreen · raw vk=0x5B (Win).
Via send-input these execute system-wide — e.g. win+l would lock the machine. I also confirmed alt+f4 via the (non-elevated, window-scoped) post-message transport closed the target window with no confirmation.

Why this is moderate rather than critical

The default transport is window-scoped and cannot trigger system hotkeys. win+l via post-message just posts harmless messages to the target window — the dangerous path requires the explicit, documented --via send-input.
UIPI already protects elevated targets: SendInput from a non-elevated CLI into a higher-integrity window fails, and the code surfaces a clear error. You can't drive an elevated app without elevating the CLI.
For an automation/test tool, synthesizing win+l etc. is partly by design (you may want to test lock/again behavior).

Recommended hardening

Verify foreground before send-input (ties to static-review M2). send-keys/drag/scroll --wheel call SetForegroundWindow then sleep 100 ms and inject without checking it succeeded. SetForegroundWindow silently fails from a background process under the foreground-lock timeout, so an injected sequence — including a system combo — can land on whatever window is actually foreground, not the intended target. Check GetForegroundWindow() belongs to the resolved HWND/process and fail otherwise.
Guard system-reserved combos. Warn (or require an explicit --allow-system-keys) when synthesizing win+*, ctrl+shift+esc, ctrl+alt+del via send-input.
Fix the short-write handling (static-review H2/H3). Both keyboard and mouse paths treat a partial SendInput write as success with no release-on-failure. A partial send that strands Win/Ctrl/a mouse button down is materially more dangerous when those are system modifiers — combine the fix with best-effort key-up/button-up in a finally.

Suggested test additions for the PR

A parser test asserting "Hello world" round-trips a space (would catch Bump form-data from 4.0.3 to 4.0.4 in /sample #1).
A WinUI e2e step typing text via both transports and asserting the resulting TextBox value (would catch Bump form-data from 4.0.3 to 4.0.4 in /sample #1 and Bump tmp from 0.2.3 to 0.2.4 in /sample #2).
A scroll test asserting conflicting --direction+--wheel is rejected (Added C# SDK. #5).

_{Hands-on testing performed against a packaged WinUI 3 harness on Windows; the harness was removed after testing. Generated by the GitHub Copilot CLI agent · 2026-06-23. 🤖}

Fixes for the pr-review findings on the UI input-injection verbs: - H1: send-keys preserves whitespace in literal text - adjacent literal tokens coalesce into one TextInput so "Hello world" types verbatim. - H2: SendInput (keyboard) requires full delivery; on a short/zero write it releases any held keys to avoid a stuck modifier, with distinct zero-vs-partial error messages. - H3: mouse Drag releases the button in a finally block if a move/up throws; SendInputs requires full delivery (was sent==0). - M1: scroll rejects conflicting --direction/--to/--wheel instead of silently preferring one. - M2: --via send-input verifies the target owns the foreground window before injecting (new foreground_not_target error code). - M3: e2e script now exercises send-keys (literal + named key) and scroll --wheel against the WinUI sample app. - M4: a '+' token is only a chord when leading segments are real modifiers, so 'C++', 'a+b' type literally; 'ctrl+bogus' still errors. - M5: expanded KeyStringParser unit tests (literals, coalescing, vk edge cases, modifier-led unknown main key). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… formatting Follow-up to Nikola's empirical test report on the input verbs: - #2/#3: send-keys via post-message warns when literal text is sent, since WinUI 3 / XAML drops posted WM_CHAR (typed text silently no-ops). Docs corrected to recommend --via send-input for typed text on WinUI/WPF. - Hardening #1: extend the foreground-ownership guard (was send-keys only) to drag and scroll --wheel via a shared ForegroundGuard helper, so an OS-wide gesture can't land on the wrong window when SetForegroundWindow silently fails. Returns foreground_not_target. - #6: the "no scroll mode specified" error now uses the same ❌ symbol as every other ui error (was raw [ERROR] format). System-reserved-combo blocklist (report hardening #2) intentionally left for discussion — it changes the tool contract and partly conflicts with legitimate automation (testing win+l etc.). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

azchohfi · 2026-06-23T22:41:32Z

Thanks @nmetulev — really thorough, both the static review and the hands-on test pass. I pushed two commits (7642645, b511ed7) addressing the findings. Summary:

Fixed

Finding	Fix
H1 / hands-on #1 — `"Hello world"` → `Helloworld`	Parser now coalesces adjacent literal tokens into one `TextInput` with the space preserved, so quoted phrases type verbatim. `down enter world` stays distinct. Regression test added.
H2 — keyboard short/zero `SendInput` write reported as success	`SendViaSendInput` now requires `sent == length`; on a short/zero write it releases any held keys (reverse-order key-up) so a modifier can't stick, with distinct zero-vs-partial messages.
H3 — mouse short-write + no release-on-throw	`Drag` releases the button in a `finally` if a move/up throws; `SendInputs` requires full delivery (was `sent == 0`).
M1 / hands-on #5 — `scroll` accepts conflicting modes	Rejects more than one of `--direction`/`--to`/`--wheel` with `invalid_arguments`. Test added.
M2 + hands-on hardening #1 — no foreground verify before `send-input`	Added a shared `ForegroundGuard`; `send-keys --via send-input`, `drag`, and `scroll --wheel` now verify the target owns the foreground window and fail with `foreground_not_target` rather than acting on the wrong window.
M4 / hands-on #4 — can't type tokens containing `+`	A `+` token is only a chord when its leading segments are real modifiers, so `C++`, `a+b`, `1+1` type literally; `ctrl+bogus` still errors. Tests added.
M5 — parser edge tests	Expanded `KeyStringParserTests` (literals, coalescing, distinct runs, vk valid/invalid/zero/overflow, modifier-led unknown main key).
M3 — e2e coverage	`test-e2e-winui-ui.ps1` now exercises `send-keys` (literal text with spaces + named key) and `scroll --wheel` against the WinUI sample.
hands-on #2/#3 — post-message silently no-ops typed text on WinUI 3	`send-keys --via post-message` now warns when literal text is sent (XAML drops posted `WM_CHAR`); docs corrected to recommend `--via send-input` for typed text on WinUI/WPF.
hands-on #6 — inconsistent error formatting	The "no scroll mode" error now uses the same ❌ symbol as every other `ui` error.

Intentionally left (open question)

Hardening Bump tmp from 0.2.3 to 0.2.4 in /sample #2 — system-reserved-combo blocklist / --allow-system-keys. I left this out for now because it changes the tool's contract and partly conflicts with legitimate automation (you might want to test win+l / lock behavior). The dangerous path already requires the explicit --via send-input, the default transport is window-scoped, and UIPI protects elevated targets. Happy to add a guard (warn-only, or opt-in --allow-system-keys) if you'd prefer — your call on the policy.
M6 / L1 — low-level keyboard-release-order and drag-interpolation unit tests. The release-order (H2) and interpolation (L1/H3) logic lives in the static KeyboardInput/MouseInput helpers that call PInvoke.SendInput directly; the IKeyboardInput/IMouseInput fakes only observe KeyAction lists / drag coords, not the synthesized INPUT array, so these can't be asserted without a SendInput injection seam. I covered them at the e2e layer (the new send-keys/scroll steps) instead of adding a seam purely for the test — let me know if you'd rather I introduce the seam.

All 95 parser/command tests pass; the 5 pre-existing EndToEndTests failures are environmental (the node wrapper needs npm run build in src/winapp-npm) and unrelated to these changes.

Drag (inline review thread): support `drag <from> <to>` where each endpoint is an element selector (resolves to its center) or app x,y coordinates as reported by `ui inspect`, alongside the retained legacy 3-arg `drag <selector> <fromOffset> <toOffset>` form (disambiguated by the third positional arg). Adds From/To to the JSON result. Hardening #2 (system-reserved combos): send-keys --via send-input now warns (advisory, still sends) before synthesizing win+*, ctrl+shift+esc, ctrl+alt+del, alt+tab, alt+f4, ctrl+esc, lone win/printscreen, via a new SystemKeyGuard helper. Bug #3: --via help now notes accelerators only fire via send-input. Tests: expand KeyStringParser coverage (modifier/key aliases, extended flags, function keys, raw-vk, whitespace, coalescing edge cases); new SystemKeyGuardTests; new 2-arg drag tests (selector/coords combinations, error paths); send-keys system-combo warn-only test. Docs: ui-automation.md, SKILL.md (x2), agent docs (x2), usage.md, and the drag/send-keys blocks in cli-schema.json updated for both forms. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…clearer legacy error - SystemKeyGuard: warn on Alt+PrintScreen (active-window screenshot); add test row. - Drag tests: add per-call FindSingleResults queue to the UIA fake so selector->selector resolves two distinct elements; assert both endpoint centers; add coords->selector and to-selector-not-found cases. - UiDragCommand: legacy offset parse errors now explain that 3 args selects the legacy '<selector> <fromOffset> <toOffset>' form and 2 args is selector/coords -> selector/coords. - docs fragment (SKILL source): sync dual-form drag + system-combo send-input note. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ardTests CI builds with -warnaserror, so the inline 'new[] { ... }' constant arrays in CollectionAssert.AreEqual were promoted from CA1861 warnings to errors. Switch to element-wise Assert.AreEqual (count + indexer) which is equivalent and CA1861-clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Closes the three forward-compat gaps the microsoft-ui-reactor E2E harness hit when mapping its temporary SendInput fallback onto the new input verbs: - send-keys: add a `text=` token escape so a literal value that collides with a key/modifier name (enter, del, up, ctrl+a) is typed verbatim. Mirrors `vk=`, is checked first, and still coalesces with adjacent literal words. - drag: add `--dwell-ms` (hold at the destination before releasing) so drop targets / merge overlays that arm from a sustained hover can latch before the button-up — required for tear-off/merge drags that a plain A->B drag misses. - drag: add `--hold-ms` (hold at the press point before moving). With <from> == <to> this is a press-and-hold / long-press gesture. Threads holdMs/dwellMs through IMouseInput.Drag -> MouseInput.Drag and the UiDragResult JSON. 14 new unit tests (text= escape, hold/dwell flow, negative validation, defaults). Regenerated cli-schema.json, the ui-automation SKILL, and the npm command wrappers (the wrappers had drifted behind the schema and are now back in sync); updated docs/ui-automation.md by hand. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

nmetulev · 2026-06-24T05:08:31Z

🤖 AI-generated hands-on test report — rounds 2–3 (consolidated) (GitHub Copilot CLI, Claude Opus 4.8). After the PR was updated (80ea1d3) I rebuilt the CLI, ran the expanded unit suite, and stress-tested the verbs against two purpose-built harnesses — a dynamic WinUI 3 app (chaos timer that moves/resizes a target every ~200 ms, plus zero-size / add-remove-element / window-resize controls) and a throwaway Electron/Chromium app. Covers code review, unit tests, and live SendInput delivery. (The test box was briefly locked mid-run, which itself produced a useful security confirmation — noted below.)

Re-test of `ui send-keys` / `ui drag` / `scroll --wheel`

✅ Round-1 findings — all addressed, and confirmed live end-to-end

Prev finding	Fix	Verified
H1 `"Hello world"` → `Helloworld`	`CoalesceText` re-joins adjacent literal tokens	live: InputBox = `Hello world` (space preserved); unit `Parse_QuotedPhrase_PreservesSpacesAsSingleLiteral`
M4 `C++` / `a+b` rejected	`TryParseChord` only treats modifier-led tokens as chords	live: `"C++"` types `C++`; `a+b`, `1+1=2`, `+` parse as text; `ctrl+bogus` still rejected
M1 `scroll --direction --wheel` both accepted	`modeCount > 1` → error	live: `scroll --direction down --to bottom` → `❌ Specify only one of …`
M2 send-input lands on wrong window	new `ForegroundGuard` on all 3 gesture verbs	live both ways: refuses (exit 1 + JSON `foreground_not_target`) when not foreground; works once foregrounded
H2 keyboard short-write	`sent != length` → release held keys + throw	code `KeyboardInput.SendViaSendInput` + `ReleaseHeldKeys`
H3 drag button left down	`try/finally` best-effort button-up	code `MouseInput.Drag`
round-1 "false success" on post-message text	warns before injecting	code `UiSendKeysCommand` (but see N6 — the warning is now over-eager)
round-1 "no system-combo guard"	new `SystemKeyGuard` advisory warning	live: `ctrl+alt+del` / `printscreen` via send-input → ⚠ warning, still exit 0

Other live confirmations: drag thumb-grab moved a slider 50→89; scroll --wheel changed offset (sign correct); Unicode é€ lands verbatim. Unit tests: KeyStringParser 79 / SystemKeyGuard 28 / UiCommand 81 — all green. Zero-size → zero_size_element JSON error; missing element → "The UI may have changed … prefer targeting by AutomationId."

🐞 Findings

N1 · `send-keys` literal text mis-fires on any word that is a key name — High (usability) — confirmed live

Words that collide with key/modifier names fire keys instead of typing, with no escape syntax:

`send-keys …`	InputBox got	What happened
`"the end"`	`the`	`end` pressed End
`"go home"`	`go`	`home` pressed Home
`"press down"`	`press`	`down` pressed Down arrow
`"Hi enter Bye"`	`HiBye`	`enter` fired Enter between the words

Colliding words: space enter tab esc escape up down left right home end del delete ins insert pageup pagedown backspace return apps printscreen capslock f1..f16 + modifiers ctrl shift alt win cmd super meta menu control. So send-keys "<arbitrary prose>" is not reliable for typing text. Recommend a dedicated literal path, e.g. --text "<string>" (verbatim, no tokenization), and/or an escape ({end} = key vs end = word). This also fixes N2.

N2 · Literal text can't contain consecutive/edge whitespace — Low–Med — confirmed live

CoalesceText re-joins with a single space: "a b" → a b, " pad " → pad, tab → single space. Can't type double spaces, tabs, or indentation. (Naturally solved by --text.)

N5 · gesture verbs silently miss a moving/animating target (report success anyway) — Medium, NEW — the "changing size between checks" answer

With the chaos timer moving & resizing the target every ~200 ms:

Condition	Clicks that actually landed
Chaos OFF (stable)	6 / 6
Chaos ON (moving)	0 / 12

Every one of the 12 misses still reported ✅ click … at (x,y) and exit 0. The verbs resolve element bounds, then inject at those screen coordinates after a settle/foreground delay (~50–150 ms click, ~500 ms full drag); if the element moves/resizes in that window the gesture hits the stale rectangle (empty space) and the element never receives it — yet the command reports success. Related: a drag starting off the grabbable part (slider track vs thumb) also no-ops while reporting ✅. So click / drag / scroll --wheel give no signal that the gesture actually hit the target under shifting UI. Suggested: (a) re-read bounds immediately before the press / verify cursor-still-over-element; (b) optionally verify a post-condition (focus/value change) and return non-zero on miss; (c) prefer UIA-pattern verbs (invoke, set-value, scroll --direction/--to) for moving targets — those acted correctly throughout.

N6 · post-message-text behavior + warning are framework-dependent — Low–Med, NEW (from the Electron pass)

send-keys text via post-message is dropped by WinUI 3/XAML but consumed by Electron/Chromium (and typically Win32/WPF). Yet the CLI emits its warning unconditionally ("…may not be delivered to WinUI 3 / XAML apps (WM_CHAR is dropped)…") even when targeting Electron, where it works — so it false-alarms on non-XAML apps. Recommend making the warning framework-aware (only for XAML/WinUI targets) or softening it.

N3 · misleading "elevated" error when there is no input desktop — Low

On a locked/secure desktop, SendInput returns 0 and the older ui click/ui hover report "the target window may be elevated … run as administrator" — wrong cause. The new verbs avoid this (their ForegroundGuard fires first). Consider a "no interactive desktop (session locked?)" message and applying the foreground guard to click/hover too.

N4 · send-input / drag / scroll --wheel require an unlocked, foreground, interactive session — Informational (by design)

A consequence of the (correct) ForegroundGuard: these refuse unless the target is genuinely foreground, so they can't be driven from a background/headless/CI or locked context. The UIA-path verbs (set-value, invoke, scroll --direction/--to) remain headless-friendly. Worth a docs callout.

🔐 Security — re-assessment (substantially improved)

Ctrl+Alt+Del: unsynthesizable (kernel-protected SAS).
Locked / secure-desktop boundary — confirmed empirically: with the box locked (LogonUI on the secure desktop), GetForegroundWindow()=0 and SendInput=0 — a user-session process cannot inject into the lock screen (can't type a password into LogonUI). Strong OS-enforced boundary.
ForegroundGuard prevents the M2 "wrong window" multiplier (verified live, both directions).
SystemKeyGuard is advisory — warns, doesn't block — for win+<key>, lone win, ctrl+alt+del, ctrl+shift+esc, ctrl+esc, alt+tab, alt+esc, alt+f4, printscreen, alt+printscreen via send-input (verified live). Reasonable for an automation tool; gate behind --allow-system-keys if hard blocking is ever wanted.
Short-write handling: keyboard releases held keys before throwing; drag releases the button in finally — a partial SendInput can't strand a (possibly system) modifier/button.

Net: the round-1 dangerous-combo concern is now mitigated by layered guards (foreground verification + advisory warning + SAS/secure-desktop protection + short-write recovery) rather than a hard blocklist — a sound design.

🧩 Cross-framework — WinUI 3 vs Electron / Chromium

Behavior	WinUI 3 / XAML	Electron / Chromium
`send-keys` text via send-input	✓ types	✓ types
`send-keys` text via post-message	✗ dropped (WM_CHAR)	✓ works
H1 `"Hello world"`	✓ `Hello world`	✓ `Hello world`
M4 `"C++"`	✓ `C++`	✓ `C++`
N1 `"the end"`	"the" + End key	identical
N2 `"a b"`	`a b`	identical
`drag` slider	50→89	50→81
`scroll --wheel`	✓	✓ (0→200)
`scroll --direction` (UIA)	✓	✓ (200→654)

Parser-level findings (H1/M4/N1/N2) are framework-independent. Electron targeting caveat: DOM elements are visible to ui inspect/search/--target/drag-scroll resolution only when renderer accessibility is active (I launched with --force-renderer-accessibility; Chromium also enables it on-demand when a UIA client attaches) — worth a docs note for Electron users.

🤖 Agent-ergonomics feedback (opportunities to improve)

Separate from correctness — candid notes from driving ui heavily as an automated agent (no eyes on the screen). The verbs are powerful and well-guarded, but the ergonomics currently assume a human who can see the result; for an agent that costs extra round-trips (tokens) and risks false conclusions.

Self-verifying actions (highest impact). click/drag/scroll --wheel report ✅/exit 0 from injecting at coordinates, not from the element receiving anything (see N5). With no eyes, my only recourse is a separate get-value/inspect call, so every logical action becomes resolve → act → verify (3 invocations). Returning post-action state in the JSON envelope and/or flipping exit code on no-effect would collapse that to one call and kill the silent-miss class.
Foreground requirement is hard to satisfy headlessly. The ForegroundGuard is the right security call, but from a background/agent context the verbs just return foreground_not_target; I had to hand-roll AttachThreadInput + SPI_SETFOREGROUNDLOCKTIMEOUT=0 to test at all. A first-class --activate (reliably foreground + verify) and a loud doc note would save every agent from reinventing that.
--text for literal typing (see N1/N2) — removes the single most likely way an LLM gets send-keys wrong (prose words firing keys).
State readback hygiene. get-value silently falls back to the accessible Name when there's no value (an empty TextBox read back as "Input box" made me think it had content) — tag the source ({"value":"","source":"name"}). And the Auto-selected HWND … pass -w 393384 … banner prints to stdout interleaved with the value, which both wastes tokens and corrupts parsing — route advisory chatter to stderr and never mix it into a value payload. Auto scroll-into-view (or an explicit offscreen:true in resolve output) would also drop a mandatory extra call.
Output / token hygiene. Every inspect/search appends a "Found N elements. Use the first token as selector, e.g. …" footer — useful once for a human, pure overhead per agent call; suppress under --json/a terse mode. Make --json purely machine-readable everywhere (structured errors and warnings — extend the great foreground_not_target/zero_size_element codes to all outcomes; keep ❌⚠✅ prose for human mode only). A batch/script mode (run several verbs against one resolved session) would amortize process-startup + re-resolution — the biggest token win. And drop the unconditional post-message warning on non-XAML targets (N6) so warnings stay trustworthy.

Worth keeping as-is: the error copy is genuinely excellent for agents ("The UI may have changed … prefer targeting by AutomationId — these survive layout changes"), the structured error codes are exactly right, and the guard layering is coherent. The ask is mainly to make each call self-verifying and machine-clean so an agent doesn't reconstruct "did that work?" from extra inspections.

Verdict

The round-1 fixes hold up under live injection on both WinUI 3 and Electron. Open items, by impact: N1 (literal-text/key ambiguity → add --text), N5 (gesture silent-miss on moving UI, success-coded → outcome verification), N6 (post-message warning not framework-aware), N2 (whitespace doc note), and the minor N3/N4 polish.

_{Live testing on WinUI 3 (chaos/zero-size/resize harness) and an Electron/Chromium harness; both removed after testing. Generated by the GitHub Copilot CLI agent · 2026-06-23. 🤖}

nmetulev · 2026-06-24T02:47:11Z

 - `ui screenshot -a <app> [--output file.png] [--json] [--focus] [--capture-screen]` — capture window as PNG. Default uses Windows.Graphics.Capture (composited surface — preserves rounded corners and works while occluded), with PrintWindow as fallback. Use `--focus` to bring the window to the foreground first; use `--capture-screen` for popup overlays not owned by the target window.
 - `ui invoke <selector> -a <app>` — activate element by slug or text search. Auto-walks to invokable ancestor for non-invokable elements.
 - `ui hover <selector> -a <app> [--dwell-time <ms>]` — move mouse to element center to trigger tooltips, flyouts, and hover states. Use with `ui screenshot --capture-screen` to capture the result.
+- `ui drag <from> <to> -a <app> [--right]` — press the mouse button at one point, move to another, and release (reorder, resize, sliders, drag-and-drop). Each of `<from>`/`<to>` is an element selector (drags from/to its center) or app coordinates `x,y` as reported by `ui inspect`. Legacy form `ui drag <selector> <fromX,fromY> <toX,toY>` is still accepted, where the two `x,y` are pixel offsets from the element's top-left corner.


There is no legacy form, it never shipped - remove that to keep it simple

Removed — there's no legacy form anymore. drag is now only drag <from> <to>, where each endpoint is an element selector (drags from/to its center) or app-relative x,y coords from ui inspect. I dropped the to-offset argument, the legacy code branch, and the ElementId field from the JSON result (only the legacy path ever set it), and cleaned up every reference — both agent files, both SKILL.md (+ the shared template), docs/ui-automation.md, the generated cli-schema.json/SKILL tables, and the npm command wrapper. The legacy drag tests were removed and the remaining drag tests converted to the two-arg form.

nmetulev · 2026-06-24T02:50:23Z

+# Raw app coordinates → app coordinates
+winapp ui drag 120,200 480,200 -a myapp
+
+# Legacy: offsets from the element's top-left corner


same here, no such thing as legacy

Removed — there's no legacy form anymore. drag is now only drag <from> <to>, where each endpoint is an element selector (drags from/to its center) or app-relative x,y coords from ui inspect. I dropped the to-offset argument, the legacy code branch, and the ElementId field from the JSON result (only the legacy path ever set it), and cleaned up every reference — both agent files, both SKILL.md (+ the shared template), docs/ui-automation.md, the generated cli-schema.json/SKILL tables, and the npm command wrapper. The legacy drag tests were removed and the remaining drag tests converted to the two-arg form.

nmetulev · 2026-06-24T05:05:24Z

+Press the mouse button at one point, move to another, then release. Two forms:
+
+- **Preferred — `drag <from> <to>`:** each endpoint is either an **element selector** (drags from/to the element's center) or **app coordinates `x,y`** exactly as reported by `winapp ui inspect`. Mix and match freely (selector→selector, selector→coords, coords→coords).
+- **Legacy — `drag <selector> <fromX,fromY> <toX,toY>`:** the two `x,y` values are pixel **offsets from the element's top-left corner**. Disambiguated by the third positional argument; existing scripts keep working unchanged.


here too and many other places re legacy

Removed — there's no legacy form anymore. drag is now only drag <from> <to>, where each endpoint is an element selector (drags from/to its center) or app-relative x,y coords from ui inspect. I dropped the to-offset argument, the legacy code branch, and the ElementId field from the JSON result (only the legacy path ever set it), and cleaned up every reference — both agent files, both SKILL.md (+ the shared template), docs/ui-automation.md, the generated cli-schema.json/SKILL tables, and the npm command wrapper. The legacy drag tests were removed and the remaining drag tests converted to the two-arg form.

nmetulev · 2026-06-24T05:07:06Z

+**Choosing a transport / known limits:**
+- `post-message` is the default because it bypasses UIPI and doesn't depend on the window being foreground. Limits: it cannot trigger global hotkeys registered through `WH_KEYBOARD_LL` low-level hooks (those tap input upstream of any window queue), and apps that read raw key state via `GetAsyncKeyState` may not observe held modifiers. For classic Win32/WinForms apps whose controls are separate child windows, target the control's native window handle with `-w` (or `--target`) so keys reach the right control. WinUI 3 / WPF apps have a single window and route keys to the internally focused element, so targeting the top-level window works.
+- `send-input` produces fully real input (modifiers visible to `GetAsyncKeyState`, fires low-level hooks) but goes to whatever window is foreground and is **blocked by UIPI when injecting from an elevated process into a lower-integrity (AppContainer/AppX) target**. If `send-input` reports a failure, the target is likely elevated or an AppX app — use `post-message`, or run the CLI at a matching integrity level. As a safety guard, `send-input` verifies the target window is actually in the foreground immediately before injecting and **fails (`foreground_not_target`) rather than typing into the wrong window** if focus could not be brought to it — focus or click the window first.
+- **System-reserved combos** (`win+l`, `win+r`, `ctrl+shift+esc`, `ctrl+alt+del`, `alt+tab`, `alt+f4`, `ctrl+esc`, lone `win`/`printscreen`, …) act on the OS/shell rather than just the target when sent OS-wide. `send-input` **emits a warning** before synthesizing them (e.g. `win+l` would lock the session) but still sends them — synthesizing them can be the legitimate intent of a test. `post-message` is window-scoped and is not affected (a posted `win+l` is harmless, though a posted `alt+f4` still closes the target window).


We shouldn't allow these IMO

Agreed — changed from warn-and-send to reject. send-keys --via send-input now errors with invalid_arguments and sends nothing when it sees a system-reserved combo (win+l, alt+f4, ctrl+alt+del, alt+tab, …), since injecting those OS-wide has effects well beyond the target window. --via post-message stays unaffected because it's window-scoped (a posted win+l is harmless), so it remains the escape hatch if you genuinely need to deliver one to a specific window. Updated the SystemKeyGuard doc + prose and the tests (SendKeys_SystemCombo_ViaSendInput_IsRejected asserts exit 1 + 0 sends; added SendKeys_SystemCombo_ViaPostMessage_StillSends).

nmetulev · 2026-06-24T05:08:36Z

+**Options:**
+- `--direction <up|down|left|right>` — Scroll incrementally via `ScrollPattern`.
+- `--to <top|bottom>` — Jump to the start/end via `ScrollPattern`.
+- `--wheel <delta>` — Synthesize mouse-wheel input over the element's center via `SendInput`. One notch is `120`; positive scrolls up/away, negative down/toward. Bypasses `ScrollPattern`.


Why 120 and not just 1?

Good call — --wheel now takes notches, so 1 = one notch up and -1 = one notch down. The reason it was 120 is that 120 is the Windows WHEEL_DELTA constant: SendInput's MOUSEEVENTF_WHEEL measures rotation in units where one physical detent = 120, so the old value was the raw Win32 unit leaking through the CLI surface. The command now multiplies the notch count by WHEEL_DELTA (120) internally — e.g. --wheel 3 injects a delta of 360 — so callers think in notches while SendInput still gets the right magnitude. Updated the option description, the example (--wheel -1), and the tests (--wheel -1 → fake delta -120; added Scroll_Wheel_MultipleNotches_ScaleByWheelDelta for --wheel 3 → 360).

… wheel in notches - ui drag: remove the never-shipped legacy 3-arg form entirely; only `drag <from> <to>` (selector or app coords per endpoint). Drop the to-offset arg, legacy branch, and the ElementId result field (legacy-only). - ui send-keys: reject system-reserved combos on --via send-input (error + no send) since OS-wide injection reaches beyond the target; --via post-message (window-scoped) unaffected. Document the text= escape for typing literal phrases that collide with key names. - ui scroll --wheel: take notches instead of raw WHEEL_DELTA units (1 = one notch); scale by WHEEL_DELTA (120) internally before SendInput. - Update tests, docs (ui-automation.md, SKILL template + generated tables, agent files, cli-schema.json, npm wrapper). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… click/hover, docs Addresses the consolidated pr-review report on the input-injection verbs. - M2/M3 (security): move the foreground guard to AFTER the stabilize awaits in drag/scroll, immediately before injection, so a focus steal during the awaited re-resolve can't route the OS-wide gesture to the wrong window. - M4: extract the N5 target-stability re-resolve into GestureTargeting and reuse it across click/hover/drag/scroll; add moving-target + settle-after-move tests. - M5: extract pure ForegroundGuard.Classify so the no_interactive_desktop vs foreground_not_target selection is unit-tested without a live desktop. - M6: gate the post-message WM_CHAR warning behind FrameworkHint.IsLikelyXaml and extract ShouldWarnPostMessageTextDropped, so the gate is unit-tested and only warns on XAML targets instead of false-alarming Win32/WPF/Electron apps. - L1: click/hover re-resolve their target just before injecting (refuse on a moving/vanished target) instead of acting on a stale rect. - L2: drag target_moved errors report the caller's selector token, not "from"/"to". - L3: decode backslash whitespace escapes in text= values (\s \t \n \r \\) so multi/leading whitespace survives the tokenizer; add coverage. - MouseInput/KeyboardInput: distinguish a locked/secure desktop (no interactive desktop) from elevation/UIPI in the SendInput failure message. - Docs: document the interactive-desktop requirement, target_moved re-resolve, the text= escape table, and the XAML-only post-message caveat across ui-automation.md, the fragment, and both SKILL.md mirrors (version unchanged). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

azchohfi · 2026-06-24T19:27:06Z

Thanks again @nmetulev — the rounds 2–3 report was incredibly useful. Everything actionable is now in, across ae59996, 0d200e5, and 007af17. Per finding:

Finding	Status	What changed
N1 literal text fires keys on collision words (`"the end"` → `the` + End)	✅ Fixed	Per-token `text=` escape — `send-keys "text=the text=end"` types `the end`, while bare `end`/`down`/`enter` still press the key. The keys path is untouched for real key presses. (`ae59996`)
N2 can't type consecutive/edge whitespace	✅ Fixed	Backslash escapes inside `text=`: `\s \t \n \r \\`. `text=a\s\sb` → `a b`, leading/trailing padding survives; unknown escapes pass through verbatim. (`007af17`)
N3 misleading "elevated" error when there's no input desktop	✅ Fixed	The SendInput-failure message now distinguishes a locked/secure desktop (new `no_interactive_desktop` code) from elevation/UIPI, and `click`/`hover` re-resolve before injecting too. (`007af17`)
N4 injection needs an unlocked, foreground, interactive session	✅ Documented	Interactive-desktop requirement is now called out in `ui-automation.md` + both SKILL mirrors, with the UIA-path verbs (`set-value`/`invoke`/`scroll --direction\|--to`) flagged as the headless-friendly alternative. (`007af17`)
N5 gesture silent-miss on a moving target, success-coded	✅ Fixed	New `GestureTargeting.ResolveStableAsync` re-reads bounds immediately before injection on click / hover / drag / scroll --wheel; a target that's moving (> 2 px across reads), vanished, or zero-size now returns `target_moved` (exit 1) instead of a false `✅`. Stable targets behave exactly as before. (`007af17`)
N6 post-message warning not framework-aware	✅ Fixed	The WM_CHAR-dropped warning is now gated behind `FrameworkHint.IsLikelyXaml(hwnd)` — only XAML/WinUI targets get it; Electron/Win32/WPF (where post-message text actually lands) no longer false-alarm. (`007af17`)

Security. Per the thread, system-reserved combos via --via send-input (win+l, alt+f4, ctrl+shift+esc, …) are now hard-rejected — invalid_arguments, exit 1, nothing synthesized — rather than advisory-warned, and we deliberately did not add a --allow-system-keys escape hatch. The secure-desktop boundary you confirmed empirically (GetForegroundWindow=0 / SendInput=0 on the lock screen) is unchanged.

Agent-ergonomics. The two that live inside these verbs are in: literal typing (N1, via text=) and self-verifying on a moving/gone target (N5, exit 1 on miss). The broader platform asks — full post-action state in the JSON envelope for every action, a first-class --activate, get-value source-tagging, stdout/stderr split, --json purity everywhere, and a batch/session mode — are cross-cutting changes that reach well beyond these three input verbs, so I'd rather land them as focused follow-ups than expand this PR. Happy to file those and fold any specific one in here if you'd prefer it ride along.

New coverage: FrameworkHintTests, InjectionGuardTests (foreground-classification + post-message warn-gate truth tables), moving-target / settle-after-move cases in UiCommandTests, and the text= escape rows in KeyStringParserTests.

Copilot

Pull request overview

Copilot reviewed 38 out of 38 changed files in this pull request and generated 1 comment.

--wheel is expressed in notches now (UiScrollCommand scales by WHEEL_DELTA=120 internally), so --wheel -120 meant 120 notches down (-14400 raw delta) instead of a single detent. The scroll E2E assertion now exercises one notch down, matching the documented notch semantics. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

nmetulev · 2026-06-25T23:03:31Z

🤖 AI-generated test report — round 4 (GitHub Copilot CLI, Claude Opus 4.8). Follow-up to the rounds 2–3 consolidated report, now that the PR was updated (65c4c93) in response to those reviews. I rebuilt, re-ran the unit suite (236 PR-relevant tests green), code-reviewed the deltas, and re-ran the live break-it matrix on fresh WinUI 3 + Electron harnesses (chaos timer moving/resizing a target every ~200 ms).

Round 4 — updated PR (`65c4c93`): fixes verified + new findings

✅ Every round-2/3 finding addressed (verified live unless noted)

Finding	Fix in this PR	Verified
N1 prose words fire keys	new `text=` escape token (`text=enter` types "enter")	live WinUI: `text=enter`→"enter", `text=ctrl+a`→"ctrl+a", bare `enter`→key still fires
N2 whitespace lost	backslash escapes in `text=` (`\s \t \n \r \\`)	live: `text=a\s\sb` → "a b" (double space)
N3 misleading "elevated" error when locked	new `no_interactive_desktop` code + message	verified live while the box was locked on send-keys/drag/scroll --wheel
N6 post-message warning false-alarmed on Electron	`FrameworkHint.IsLikelyXaml(hwnd)` scopes it to XAML	live: WinUI warns, Electron no warning + text types fine
round-3 "no system-combo block"	`SystemKeyGuard` now blocks (was advisory)	live: `ctrl+alt+del`/`printscreen`/`alt+f4`/`alt+tab` via send-input → exit 1, refused, not sent
ergonomics: JSON/value interleaving	JSON errors now → stderr; success envelopes → stdout	confirmed

Also new: drag --hold-ms (long-press) / --dwell-ms (drop settle); drag endpoints accept selector or x,y; a final ForegroundGuard re-check after the stabilize re-resolve closes the focus-steal race. Unit tests green: KeyStringParser 97, SystemKeyGuard 28, FrameworkHint 13, InjectionGuard 9, UiCommand 89.

🎯 N5 (silent-miss on a moving target) — the "changing size between checks" ask: much better, not fully closed

New GestureTargeting.ResolveStableAsync re-reads the element's bounds immediately before injecting, refuses zero-size / never-settling targets with target_moved / zero_size_element (non-zero exit), and clicks at the element's current position rather than a stale rectangle. Measured against the round-3 baseline (chaos ON = 0/12 landed, 12/12 silent ✅):

Harness (chaos ON, ~200 ms moves)	reported ✅	actually landed	false success	refused (`zero_size`/`target_moved`)
WinUI (3×10 clicks)	~2–4 /10	matches when stable	~2 /10	6–8 zero_size, 0–1 target_moved
Electron (10 clicks)	10/10	4/10	6/10	0
(baseline chaos OFF)	—	6/6, 5/5	0	0

So false-success dropped from ~100% to ~20% (WinUI) but is still ~60% on Electron. A reported success can still miss a continuously-moving target: the target moves during the ~50 ms cursor-settle inside MouseInput.Click() after ResolveStableAsync already returned "settled" — and on a smoothly-animating UI (Electron) neither target_moved nor zero_size fires, so the miss is silent. Suggest re-verifying the cursor-is-over-element (or re-reading position) immediately before the button-down, dropping/shortening the internal settle, or verifying a post-condition and returning non-zero on no-effect. (Continuously-moving targets are an edge case; for normal "foreground settles then static" UI the re-resolve is exactly right.)

🐞 New findings

F1 (Medium): click and hover were given the N5 re-resolve but not the ForegroundGuard.TryEnsureForeground gate that drag/scroll --wheel/send-keys have. Two effects (verified live while locked): (a) inconsistent error — click/hover return internal_error ("SendInput failed…") where the other three return the clean no_interactive_desktop; (b) click/hover still lack the wrong-window (M2) protection — if SetForegroundWindow silently fails and another window is foreground, they'd inject into it. Recommend routing click/hover through TryEnsureForeground too.
F2 (Low): the text= escape (the N1/N2 fix) is documented in docs/ui-automation.md but absent from send-keys --help / the keys argument description / cli-schema.json (0 matches) — so it's not discoverable via --help or the npm-wrapper / LLM-skill surface. Add text=<literal> to the argument description.
F4 (Low/compat): the drag argument shape changed to drag <from> <to> (each a selector or absolute x,y) from the prior drag <selector> <from-offset> <to-offset>. Breaking for existing scripts; pre-release, so just worth a changelog note.

Verdict (round 4)

Strong iteration — every prior finding is fixed or materially mitigated, with good new tests and structured/stderr JSON. The two things I'd still act on: F1 (give click/hover the same foreground guard) and F3/N5 (close the residual re-resolve→button-down race so a ✅ always means the gesture landed). Both are about the same north star from the ergonomics note: a success result should guarantee the action actually happened.

_{Live testing on WinUI 3 (chaos/zero-size/resize harness) and an Electron/Chromium harness; both removed after testing. Generated by the GitHub Copilot CLI agent · 2026-06-25. 🤖}

…race close + text= discoverability Round-4 test report (PR #587) flagged three actionable items; this addresses them. F1 — click/hover now gated by the foreground guard (parity with drag / scroll --wheel / send-keys). Introduce IForegroundGuard (RealForegroundGuard wraps the existing static ForegroundGuard.TryEnsureForeground so the classification logic stays the single source of truth) and inject it into all five gesture verbs. click/hover now fail with the clean no_interactive_desktop error on a locked session instead of a misleading internal_error, and gain wrong-window (foreground_not_target) protection. The DI seam also makes the deny path unit-testable without a live desktop. F3/N5 — close the residual re-resolve to button-down race on click. After the stable re-resolve and foreground check, position the cursor (new IMouseInput.MoveCursor), run the cursor-settle here, then do a final GestureTargeting.ConfirmStillAsync read immediately before the button-down, which now uses settleMs:0 (no second internal settle). A target that drifts during the settle window is caught and refused with target_moved instead of reporting a false success after landing on empty space. F2 — document the text=<literal> escape (incl. \s \t \n \r \\ backslash escapes) in the send-keys keys-argument description so it is discoverable via --help / cli-schema.json / the npm SDK, not only docs/ui-automation.md. Regenerated cli-schema.json, plugin/Claude skills, and the npm command SDK. F4 — the drag arg-shape change is pre-release and already reflected in help/docs; the repo has no CHANGELOG, so no changelog entry is added. Tests: 1509 pass, 5 pre-existing E2E_Node* env failures (need the npm wrapper build, unrelated), 1 skip. New tests cover click/hover guard deny+allow paths, the move-during-settle race, settleMs:0 on the press, and text= discoverability. Live click/drag verification against a real app remains blocked by the lock screen — these findings are unit-test-verified only. Refs #587 #562 #498 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

azchohfi · 2026-06-26T03:43:17Z

Thanks for the round-4 report — addressed the two verdict items plus the F2 quick win in 4517a1e.

F1 (click/hover foreground guard) — click and hover now route through the same foreground gate as drag/scroll --wheel/send-keys. Rather than duplicate the static call, I introduced an IForegroundGuard DI seam (RealForegroundGuard wraps the existing ForegroundGuard.TryEnsureForeground, so the Classify logic stays the single source of truth) and injected it into all five gesture verbs. So click/hover now return the clean no_interactive_desktop on a locked session (instead of the misleading internal_error/SendInput failure) and gain the wrong-window foreground_not_target protection. The seam also makes the deny path unit-testable — added click+hover deny/allow tests.

F3/N5 (residual re-resolve→button-down race) — closed for click. After the stable re-resolve and foreground check, the command now positions the cursor (new IMouseInput.MoveCursor), runs the cursor-settle itself, then does a final GestureTargeting.ConfirmStillAsync read immediately before the button-down — and the press now uses settleMs: 0 so there's no second internal settle window. A target that drifts during the settle is caught and refused with target_moved instead of a false success. New test simulates a target that settles in ResolveStableAsync then moves on the confirm read → exit 1, no click injected.

F2 (text= discoverability) — added text=<literal> (incl. the \s \t \n \r \\ escapes) to the send-keys keys-argument description, so it now shows in --help, cli-schema.json, and the npm SDK — not just docs/ui-automation.md. Regenerated schema + plugin/Claude skills + npm commands.

F4 (changelog) — the repo has no CHANGELOG, and the breaking drag <from> <to> shape is already reflected in the verb help and docs/ui-automation.md, so there is nothing to add there. Happy to add one if you would like a CHANGELOG introduced.

Verification — F1 and F3 are deterministic logic fixes with full unit coverage; they do not depend on a live desktop. The F3 race is replicable in-process: the regression test drives a target that reads as "settled" in ResolveStableAsync and then drifts on the confirm read, and asserts the command refuses with target_moved and injects no click. I confirmed the test genuinely exercises the race by deleting the confirm read and watching it fail (a click lands on the moved target), then restoring it to green. The only check that still needs an unlocked session is the physical end-to-end input smoke — real-app pixel landing for click/drag and the WinUI 3 per-keystroke KeyDown spot-check — which exercises the OS input layer, not this logic. The PR stays draft for that manual gate.

Unit suite: 1509 pass, 1 skip, 5 pre-existing E2E_Node* env failures (need the npm wrapper build, unrelated to this change).

send-keys' per-token text= escape (e.g. text=enter) is easy to forget and awkward when the entire payload is literal text. Add a command-level --verbatim flag that types the whole keys argument as a single literal: no named-key/combo/vk=/text= interpretation, and exact whitespace preserved (the normal path collapses internal whitespace runs to a single space). - KeyStringParser.ParseVerbatim: whole-string TextInput, no tokenizing or escape decoding; throws on empty/whitespace like Parse. - UiSendKeysCommand: --verbatim option; branch the parse path on it; help and keys-arg description point at it. Verbatim text is a TextInput, so the send-input system-combo guard correctly does not refuse e.g. "win+l". - Tests: parser (literal / exact whitespace / no escape decode / empty) and command (types literal not keys, preserves whitespace, system-combo-as-text, missing keys still errors, option documented). - Docs + skill fragment + agent subcommand list updated; regenerated cli-schema.json and npm SDK now carry --verbatim. Refs #562, #587 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

azchohfi · 2026-06-29T21:44:00Z

Added a --verbatim flag to ui send-keys (commit 4092cae).

Why: the per-token text= escape (e.g. text=enter) is easy to forget and awkward when the whole payload is literal text. --verbatim types the entire keys argument as one literal — no named-key/combo/vk=/text= parsing — and preserves exact internal whitespace (the normal path collapses runs of spaces to one, which text= could only fix with \s).

winapp ui send-keys "down down enter" -a myapp --verbatim   # types the words, doesn't press Down/Down/Enter
winapp ui send-keys "a  b" -a myapp --verbatim              # keeps the double space

KeyStringParser.ParseVerbatim — whole-string TextInput, no tokenizing/escape-decoding; throws on empty/whitespace like Parse.
Because verbatim text is a TextInput (not a chord), the --via send-input system-combo guard correctly does not refuse e.g. win+l typed as literal text.
New unit tests (parser + command), and the help text, cli-schema.json, npm SDK, docs, skill fragment, and agent subcommand list all updated.

Full suite green (1523 pass; the 5 E2E_Node* failures are the pre-existing env ones needing npm run build, unrelated). Still draft.

Address the pr-review findings on the send-keys (#562) and drag / scroll --wheel (#498) input-injection verbs. High: - H1 ForegroundGuard: compare the foreground window's root ancestor (GetAncestor GA_ROOT) to the target instead of PID-matching, so a child/owned window of the target counts as "foreground" and a same-PID but different top-level window does not. Drops the unsafe block. - H2 drag from-point settle race: move the cursor, wait, re-confirm the from element is still there (ConfirmStillAsync), re-check the foreground, then press with settleMs:0. Mirrors the click fix so "success" means the button-down landed on the resolved target, not empty space. Medium: - M1 KeyStringParser: a modifier-led chord with an empty segment around '+' ("ctrl++a", "ctrl+", "shift++enter") now throws instead of silently dropping the empty and pressing a different chord. Non-modifier-led tokens ("a+b", "C++", "ctrl+a+b") stay literal text. - M2 scroll --wheel: same cursor-settle / confirm / foreground sequence as drag before synthesizing the wheel. - M3 click: re-check the foreground immediately before the button-down (second gate), closing the same settle-window race on the success path. - M7 drag: a comma-shaped from/to token that isn't a valid x,y pair ("100,", "100,200,300") returns invalid_arguments instead of being treated as a selector and misreported as element-not-found. - M8 --verbatim: whitespace-only is legitimate verbatim content (types the spaces) rather than a "nothing to send" error; only genuinely empty is rejected. - M9 send-keys --via send-input: refuse OS-wide injection when no target window resolves (keystrokes have no location to verify the foreground against) instead of typing blindly into whatever has focus. - M10 send-keys: the human log reports the action count, not the raw keystrokes (which may carry secrets in persisted CI logs); --json still includes `keys` for opt-in callers. Tests: - M4 split the 1.5k-line UiCommandTests.cs into partial-class files (UiCommandTests.SendKeys.cs, UiCommandTests.Mouse.cs) so no file exceeds the ~1000-line guideline; behavior-preserving. - M5 foreground-deny tests for drag, scroll --wheel, and send-keys --via send-input. - M6 drag from-point moves-during-settle test (mirrors the click case). - Added malformed-chord, verbatim-whitespace, malformed-coordinate, and no-resolvable-target tests. 1540 pass / 5 pre-existing E2E_Node env fails / 1 skip. cli-schema content unchanged (no Description edits). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

azchohfi · 2026-06-29T22:58:53Z

Self-review hardening round (commit `24d5575`)

Ran a multi-dimension review over this branch and addressed every finding it surfaced (2 high, 10 medium). All changes are unit-covered and the full suite is green: 1540 pass / 1 skip, with the only failures being the 5 pre-existing E2E_Node* tests that need src/winapp-npm/dist/cli.js (environment, unrelated to this branch).

High

H1 — ForegroundGuard now compares the foreground window's root ancestor (GetAncestor(GA_ROOT)) to the target rather than matching PID, so a child/owned window of the target counts as "in front" and a same-PID-but-different-top-level window does not. Removed the unsafe block.
H2 — drag from-point settle race (the same class as the click N5/F3 fix): move the cursor, wait, re-confirm the from element is still there, re-check the foreground, then press with settleMs:0. A ✅ now means the button-down landed on the resolved target, not on space it drifted away from.

Medium

M2 / M3 — applied the same cursor-settle → confirm → second foreground gate to scroll --wheel and to click's success path.
M1 — a modifier-led chord with an empty segment (ctrl++a, ctrl+, shift++enter) now errors instead of silently dropping the empty and pressing a different chord; non-modifier-led tokens (a+b, C++, ctrl+a+b) stay literal.
M7 — a comma-shaped drag endpoint that isn't a valid x,y pair (100,, 100,200,300) returns invalid_arguments instead of being treated as a selector and misreported as element-not-found.
M8 — --verbatim treats a whitespace-only argument as content to type (it's the whole point of verbatim), not a "nothing to send" error.
M9 — send-keys --via send-input refuses OS-wide injection when no target window resolves, rather than typing blindly into whatever has focus.
M10 — the human log reports the action count, not the raw keystrokes (CI logs are persisted and a sequence may carry a typed secret); --json still includes keys for opt-in callers.

Tests

M4 — split the 1.5k-line UiCommandTests.cs into partial-class files (UiCommandTests.SendKeys.cs, UiCommandTests.Mouse.cs) so no file exceeds the ~1000-line guideline. Behavior-preserving; same [TestClass], same fixtures.
M5 / M6 — added foreground-deny tests (drag, scroll --wheel, send-keys send-input) and a drag from-point moves-during-settle test, plus the malformed-chord / verbatim-whitespace / malformed-coordinate / no-resolvable-target cases above.

cli-schema.json content is unchanged (no Description edits this round — verified the live --cli-schema is byte-identical to the committed file modulo the dev version stamp). Still a draft.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

nmetulev · 2026-06-30T21:23:54Z

🤖 AI-generated test report — round 5 (GitHub Copilot CLI, Claude Opus 4.8). The PR was updated again (24d5575, v0.4.1) directly in response to the round-4 review. I rebuilt, re-ran the unit suite (270 PR-relevant tests green), code-reviewed the deltas, and re-ran the live break-it matrix on fresh WinUI 3 + Electron harnesses (chaos timer moving/resizing a target every ~150 ms).

Round 5 — updated PR (`24d5575`): round-4 findings fixed

✅ Every round-4 finding addressed (verified live on both frameworks unless noted)

Round-4 finding	Fix in this PR	Verified
F1 click & hover lacked the foreground guard	`ForegroundGuard` is now an injected `IForegroundGuard` and `TryEnsureForeground` is wired into all 5 gesture verbs (click, hover, drag, scroll --wheel, send-keys)	code: hover `:123`, click `:130`+`:157`; + unit tests; live: click/hover correctly foreground-then-act
F3 moving-target "success" could still miss	`MoveCursor` → settle → `ConfirmStillAsync` (fresh UIA re-read) → inject with `settleMs: 0`, so the button-down happens immediately after a confirm	the headline result — see below
F2 `text=` undocumented in `--help`	`text=` and the new `--verbatim` are now in the `keys` arg description + option help	confirmed in `send-keys --help`
my ergonomics ask: a safe "type a string" path	new `--verbatim` types the whole argument literally, exact whitespace preserved	live: `--verbatim "a b c"` → `get-value` returns `"a b c"` (len 8) on both WinUI + Electron

Unit tests: KeyStringParser 112, SystemKeyGuard 28, FrameworkHint 13, ForegroundGuard 6, UiCommand 111.

🎯 F3 (the round-4 silent-miss) — essentially fixed

Re-ran the moving-target break-it test (now at a faster 150 ms mutation). The new confirm-immediately-before-button-down closes almost the entire race:

Harness, chaos ON	round 4 false-success	round 5 false-success	round-5 refusals
Electron (15 clicks)	~6/10 (~60%)	1/15 (~7%)	14 `target_moved`
WinUI (15 clicks)	~2/10 (~20%)	0/15 (0%)	2 `target_moved` + 13 `zero_size`
(baseline chaos OFF)	—	6/6 land (both)	0

drag and scroll --wheel against a moving target also now refuse cleanly with target_moved (exit 1, JSON). The residual ~7% on Electron is the irreducible microsecond gap between ConfirmStillAsync's UIA read and the OS SendInput button-down — about as tight as coordinate injection can get. A reported ✅ now almost always means the gesture landed (vs round-4 where, under animation, it frequently didn't).

✅ Regressions all clean on the refactored build

N6 framework-aware warning: post-message text warns on WinUI/XAML, silent on Electron (and types fine) — both correct.
SystemKeyGuard still hard-blocks ctrl+alt+del / alt+f4 / win+l via send-input (exit 1) on both.
text= per-token escape intact (text=the text=end → "the end"); bare enter still fires the key.
drag --hold-ms/--dwell-ms parse fine.

Verdict (round 5)

This closes the loop on the round-4 review — F1, F2, F3 are all addressed, --verbatim lands the ergonomics ask, and I found no new defects this round. Nicely done. The only thing left from my notes is philosophical (the text=/--verbatim "type a string" ergonomics are now solved two ways; the default bare-text path is still key-interpreted, which is the right call given --verbatim exists and is documented).

One thing I could not stage live: the no_interactive_desktop half of F1 (click/hover returning the clean locked-desktop error instead of round-4's internal_error) needs a locked session to demonstrate — verified in code (the guard now runs before the SendInput call) and by the consistent wiring across all five verbs.

_{Live testing on WinUI 3 + Electron/Chromium chaos harnesses (target moving/resizing every ~150 ms); both removed after testing. Generated by the GitHub Copilot CLI agent · 2026-06-30. 🤖}

azchohfi and others added 4 commits June 22, 2026 13:11

azchohfi requested a review from Copilot June 22, 2026 22:30

Copilot started reviewing on behalf of azchohfi June 22, 2026 22:32 View session

Copilot AI reviewed Jun 22, 2026

View reviewed changes

Comment thread src/winapp-CLI/WinApp.Cli/Commands/UiScrollCommand.cs

Comment thread src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs Outdated

azchohfi requested a review from Copilot June 23, 2026 00:00

Copilot started reviewing on behalf of azchohfi June 23, 2026 00:00 View session

Copilot AI reviewed Jun 23, 2026

View reviewed changes

Comment thread src/winapp-CLI/WinApp.Cli/Commands/UiCommand.cs

azchohfi marked this pull request as ready for review June 23, 2026 00:09

azchohfi and others added 2 commits June 22, 2026 17:18

Merge branch 'main' into azchohfi-ui-input-verbs

9724f38

nmetulev added the DO-NOT-MERGE label Jun 23, 2026

nmetulev requested changes Jun 23, 2026

View reviewed changes

Comment thread .claude/agents/winapp.md Outdated

azchohfi and others added 2 commits June 23, 2026 15:35

azchohfi and others added 3 commits June 23, 2026 16:37

azchohfi removed the DO-NOT-MERGE label Jun 24, 2026

azchohfi requested a review from nmetulev June 24, 2026 00:18

nmetulev requested changes Jun 24, 2026

View reviewed changes

azchohfi and others added 2 commits June 24, 2026 11:00

azchohfi requested review from Copilot and nmetulev June 24, 2026 20:53

Copilot started reviewing on behalf of azchohfi June 24, 2026 20:54 View session

Copilot AI reviewed Jun 24, 2026

View reviewed changes

Comment thread scripts/test-e2e-winui-ui.ps1 Outdated

azchohfi mentioned this pull request Jun 24, 2026

Remove InputInjector workaround once winappCli #587 (send-keys/drag/scroll verbs) ships microsoft/microsoft-ui-reactor#652

Open

7 tasks

Merge branch 'main' into azchohfi-ui-input-verbs

d993de2

nmetulev and others added 3 commits June 26, 2026 10:46

Merge branch 'main' into azchohfi-ui-input-verbs

19af75a

Merge branch 'main' into azchohfi-ui-input-verbs

5a6ed0e

azchohfi requested a review from Copilot June 29, 2026 23:05

Copilot started reviewing on behalf of azchohfi June 29, 2026 23:05 View session

Copilot AI reviewed Jun 29, 2026

nmetulev approved these changes Jun 30, 2026

View reviewed changes

nmetulev merged commit 89def9f into main Jun 30, 2026
23 checks passed

nmetulev deleted the azchohfi-ui-input-verbs branch June 30, 2026 21:32

Uh oh!

Conversation

azchohfi commented Jun 22, 2026

Summary

winapp ui send-keys (#562)

winapp ui drag + ui scroll --wheel (#498)

Implementation notes

Testing

Uh oh!

github-actions Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Build Metrics Report

Binary Sizes

Test Results

Test Coverage

CLI Startup Time

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

nmetulev commented Jun 23, 2026

PR Review — azchohfi-ui-input-verbs vs origin/main (7 commits, 27 files, +2,008 / −26)

Summary

Coverage

Findings

Details

H1 — src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:86-113 · correctness + cli-ux · Multi-model: confirmed

H2 — src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs:142-151 · correctness + security · Multi-model: confirmed

H3 — src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs:54-77,143-179 · correctness + security · Multi-model: new finding

M1 — src/winapp-CLI/WinApp.Cli/Commands/UiScrollCommand.cs:85-130 · cli-ux

M2 — src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs:126-134 · security

M3 — src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs:126-134 · test-coverage

M4 — src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:91-124 · correctness

M5 — src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:119-145 · test-coverage

M6 — src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs:58-69 · test-coverage

L1 — src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs:54-77 · test-coverage

L2 — src/winapp-npm/src/winapp-commands.ts:1003-1010 · packaging

Disputed / dropped by cross-model

Uh oh!

nmetulev commented Jun 23, 2026

Hands-on test results — ui send-keys / ui drag / scroll --wheel

TL;DR

✅ What works (verified)

🐞 Confirmed bugs & usability issues

1. send-keys "Hello world" types Helloworld — space dropped (confirms static-review H1) · High

2. Default post-message transport silently no-ops typed text on WinUI 3 · High / Medium

3. Accelerators / combos don't fire via post-message · Medium

4. Cannot type text containing + (confirms static-review M4) · Medium

5. scroll accepts conflicting modes (confirms static-review M1) · Medium

6. Minor — inconsistent error formatting

🔐 Security assessment (the headline ask: Ctrl+Alt+Del & system inputs)

Ctrl+Alt+Del — safe, cannot be synthesized ✅

No blocklist for any other system combo ⚠️

Why this is moderate rather than critical

Recommended hardening

Suggested test additions for the PR

Uh oh!

Uh oh!

azchohfi commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Fixed

Intentionally left (open question)

Uh oh!

nmetulev commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-test of ui send-keys / ui drag / scroll --wheel

✅ Round-1 findings — all addressed, and confirmed live end-to-end

🐞 Findings

N1 · send-keys literal text mis-fires on any word that is a key name — High (usability) — confirmed live

N2 · Literal text can't contain consecutive/edge whitespace — Low–Med — confirmed live

N5 · gesture verbs silently miss a moving/animating target (report success anyway) — Medium, NEW — the "changing size between checks" answer

`winapp ui send-keys` (#562)

`winapp ui drag` + `ui scroll --wheel` (#498)

github-actions Bot commented Jun 22, 2026 •

edited

Loading

PR Review — `azchohfi-ui-input-verbs` vs `origin/main` (7 commits, 27 files, +2,008 / −26)

H1 — `src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:86-113` · correctness + cli-ux · Multi-model: confirmed

H2 — `src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs:142-151` · correctness + security · Multi-model: confirmed

H3 — `src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs:54-77,143-179` · correctness + security · Multi-model: new finding

M1 — `src/winapp-CLI/WinApp.Cli/Commands/UiScrollCommand.cs:85-130` · cli-ux

M2 — `src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs:126-134` · security

M3 — `src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs:126-134` · test-coverage

M4 — `src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:91-124` · correctness

M5 — `src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:119-145` · test-coverage

M6 — `src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs:58-69` · test-coverage

L1 — `src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs:54-77` · test-coverage

L2 — `src/winapp-npm/src/winapp-commands.ts:1003-1010` · packaging

Hands-on test results — `ui send-keys` / `ui drag` / `scroll --wheel`

1. `send-keys "Hello world"` types `Helloworld` — space dropped (confirms static-review H1) · High

2. Default `post-message` transport silently no-ops typed text on WinUI 3 · High / Medium

3. Accelerators / combos don't fire via `post-message` · Medium

4. Cannot type text containing `+` (confirms static-review M4) · Medium

5. `scroll` accepts conflicting modes (confirms static-review M1) · Medium

azchohfi commented Jun 23, 2026 •

edited

Loading

nmetulev commented Jun 24, 2026 •

edited

Loading

Re-test of `ui send-keys` / `ui drag` / `scroll --wheel`

N1 · `send-keys` literal text mis-fires on any word that is a key name — High (usability) — confirmed live

Round 4 — updated PR (`65c4c93`): fixes verified + new findings

azchohfi commented Jun 26, 2026 •

edited

Loading

Self-review hardening round (commit `24d5575`)