Skip to content

Add ui send-keys (#562) and ui drag + scroll --wheel (#498) input-injection verbs#587

Merged
nmetulev merged 22 commits into
mainfrom
azchohfi-ui-input-verbs
Jun 30, 2026
Merged

Add ui send-keys (#562) and ui drag + scroll --wheel (#498) input-injection verbs#587
nmetulev merged 22 commits into
mainfrom
azchohfi-ui-input-verbs

Conversation

@azchohfi

Copy link
Copy Markdown
Collaborator

Summary

Implements the two missing input-injection features in winapp ui, the only blockers for migrating the microsoft-ui-reactor WinUI 3 E2E suite off WinAppDriver. Before this, no ui verb could synthesize keyboard input or perform mouse drag/wheel gestures.

Two focused commits, one per verb:

winapp ui send-keys (#562)

Synthetic keyboard input — the keyboard counterpart to click.

  • Named keys (down, enter, tab, f5), whitespace-separated sequences (down down enter), modifier combos (ctrl+shift+t), literal text (hello), and raw virtual keys (vk=0xNN).
  • Two transports via --via: post-message (default — HWND-targeted, bypasses UIPI) and send-input (OS-wide).
  • --target <selector> focuses an element (UIA) before sending. Standard -w/-a/--json.
  • Documented hard limits: PostMessage can't fire WH_KEYBOARD_LL global hotkeys; SendInput is UIPI-blocked elevated → AppX.

winapp ui drag + ui scroll --wheel (#498)

  • winapp ui drag <selector> <fromX,fromY> <toX,toY> — press, move in steps (realistic WM_MOUSEMOVE stream), release. Coordinates are x,y offsets from the element's top-left corner. --right for right-button drag.
  • ui scroll --wheel <delta> — synthesizes real mouse-wheel input (120 = one notch) for wheel handlers (zoom/custom scroll) that ScrollPattern can't exercise.
  • Mouse drag/scroll-wheel were prioritized per the issue; hover/scroll already existed.

Implementation notes

  • Mirrors the existing click/hover verb conventions: command + nested DI handler, source-generated JSON result models, UiErrors/UiJsonError envelopes, SetForegroundWindow + SendInput (virtual-desktop normalized) for mouse, PostMessage/SendInput for keys.
  • New testable abstractions: IKeyboardInput (PostMessage + SendInput) and IMouseInput.Drag/ScrollWheel, with FakeKeyboardInput/FakeMouseInput recording calls.
  • Docs updated (docs/ui-automation.md, skill fragment) and cli-schema.json + plugin/Claude skills + npm TS SDK wrappers regenerated.

Testing

  • dotnet build clean; full unit suite 1340 passed, 1 skipped (the only excluded tests are the pre-existing E2E_Node* env tests, unrelated).
  • 26 new send-keys tests (parser + command) and 8 new drag/scroll-wheel tests; coordinate math, JSON shape, transports, and error paths all covered.
  • PostMessage keyboard transport verified live end-to-end against a WinForms TextBox (literal text, Enter→newline, vk=0xNN translation). Live SendInput-based drag verification was blocked only by the machine being on the Windows lock screen (the lock screen intercepts injected input); the drag path is identical to the shipped click/hover SendInput code and the cursor was confirmed to move to the correct physical coordinates.

Closes nothing — issues #562 and #498 are intentionally left open.

azchohfi and others added 4 commits June 22, 2026 13:11
Introduce winapp ui send-keys to synthesize keyboard input, the
keyboard counterpart to click. Supports named keys (down, enter,
tab, f5), whitespace-separated sequences, modifier combos
(ctrl+shift+t), literal text, and raw virtual keys (vk=0xNN).

Two transports via --via: post-message (default, HWND-targeted,
bypasses UIPI) and send-input (OS-wide). --target focuses an element
via UIA before sending. Standard -w/-a/--json options.

- KeyStringParser: token grammar to KeyAction list
- IKeyboardInput/RealKeyboardInput/KeyboardInput: PostMessage +
  SendInput transports
- UiSendKeysCommand wired into DI, UiCommand, HostBuilderExtensions
- UiSendKeysResult JSON model
- Unit tests (parser + command) and FakeKeyboardInput
- Docs, skill fragment, regenerated cli-schema + npm SDK wrapper

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add `winapp ui drag <selector> <fromX,fromY> <toX,toY>` to synthesize a
mouse drag: press at a point inside an element, move to another point in
small steps (so the app sees a realistic WM_MOUSEMOVE stream), then
release. Coordinates are x,y offsets from the element's top-left corner.
Use --right for a right-button drag. Useful for reorder/resize handles,
sliders, canvas drawing, and drag-and-drop.

Also add `--wheel <delta>` to `ui scroll` to synthesize real mouse-wheel
input over an element (120 = one notch), for testing wheel handlers
(zoom, custom scroll) that ScrollPattern can't exercise.

- Extend IMouseInput/RealMouseInput/MouseInput with Drag + ScrollWheel
  (SendInput, virtual-desktop normalized like the existing click/hover)
- UiDragCommand wired into DI, UiCommand, HostBuilderExtensions
- UiDragResult JSON model; UiScrollResult gains a wheel field
- Unit tests for drag + scroll-wheel; FakeMouseInput records both
- Docs, skill fragment, regenerated cli-schema + npm SDK wrappers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The SendInput transport previously typed literal text via KEYEVENTF_UNICODE
packets, which deliver the character (WM_CHAR -> TextChanged) but raise KeyDown
with VK_PACKET rather than the real key. Downstream WinUI 3/WPF handlers that
key off KeyDown with the correct virtual key did not see a faithful keystroke.

Map each character to its virtual key (plus Shift) via VkKeyScan and emit real
key-down/up events so the target sees a genuine KeyDown with the correct VK and
the OS composes the matching WM_CHAR (TextChanged) - one full keystroke per
character. Fall back to a Unicode packet for characters not reachable on the
current layout or requiring Ctrl/AltGr, so the exact character still lands.

PostMessage typed text keeps WM_CHAR (correct text + TextChanged across
integrity levels, no per-char KeyDown); docs clarify when to use which
transport for per-keystroke fidelity.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…help (#562)

The --via option and command help now state that typed text raises a real
per-character KeyDown only on send-input (post-message raises TextChanged but
not a per-character KeyDown), and that named keys/combos raise KeyDown on both.
Recommends --via send-input for per-keystroke KeyDown on a WinUI 3/WPF TextBox.
Regenerated cli-schema.json and the ui-automation skill from the CLI help.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

github-actions Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Build Metrics Report

Binary Sizes

Artifact Baseline Current Delta
CLI (ARM64) 31.57 MB 31.71 MB 📈 +142.5 KB (+0.44%)
CLI (x64) 31.90 MB 32.04 MB 📈 +141.0 KB (+0.43%)
MSIX (ARM64) 13.25 MB 13.30 MB 📈 +50.0 KB (+0.37%)
MSIX (x64) 14.09 MB 14.15 MB 📈 +64.0 KB (+0.44%)
NPM Package 27.65 MB 27.77 MB 📈 +120.6 KB (+0.43%)
NuGet Package 27.68 MB 27.79 MB 📈 +118.5 KB (+0.42%)
VS Code Extension 20.47 MB 20.60 MB 📈 +127.1 KB (+0.61%)

Test Results

1545 passed, 1 skipped out of 1546 tests in 428.7s (+221 tests, +73.6s vs. baseline)

Test Coverage

18.1% line coverage, 37.3% branch coverage · ✅ +0.7% vs. baseline

CLI Startup Time

41ms median (x64, winapp --version) · ✅ no change vs. baseline


Updated 2026-06-29 23:15:26 UTC · commit 24d5575 · workflow run

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds three input-injection capabilities to winapp ui, the keyboard/mouse gestures needed to migrate a WinUI 3 E2E suite off WinAppDriver: a new ui send-keys verb (synthetic keyboard input via PostMessage or SendInput), a new ui drag verb (button-press → stepped move → release), and a --wheel option on ui scroll (real mouse-wheel input). The implementation cleanly follows the existing click/hover conventions: command + nested DI handler, source-generated JSON result models, UiErrors/UiJsonError envelopes, and new testable abstractions (IKeyboardInput, IMouseInput.Drag/ScrollWheel) with Real* production and Fake* test implementations. Docs, the CLI schema, the Copilot/Claude skills, and the npm TS SDK are regenerated accordingly.

Changes:

  • New UiSendKeysCommand + KeyStringParser/KeyboardInput (named keys, sequences, modifier combos, vk=0xNN, literal text; --via post-message|send-input, --target focus).
  • New UiDragCommand + MouseInput.Drag; new --wheel on UiScrollCommand + MouseInput.ScrollWheel; DI/registration, JSON result models, and NativeMethods.txt P/Invoke additions.
  • 34 new unit tests (parser + commands), plus regenerated docs/schema/skills/npm wrappers.

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs New send-keys command + handler, transport parsing, target focus.
src/winapp-CLI/WinApp.Cli/Commands/UiDragCommand.cs New drag command + handler with element-relative point parsing.
src/winapp-CLI/WinApp.Cli/Commands/UiScrollCommand.cs Adds --wheel option and SendInput wheel path; reports element hwnd.
src/winapp-CLI/WinApp.Cli/Commands/UiCommand.cs Registers drag/send-keys subcommands.
src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs New key-string grammar parser (chords, named keys, vk=, text).
src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs PostMessage/SendInput keyboard synthesis with documented limits.
src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs Adds Drag and ScrollWheel; repeats the SendInput-failure guard (nit).
src/winapp-CLI/WinApp.Cli/Helpers/I{Keyboard,Mouse}Input.cs, Real*, UiJsonContext.cs Interfaces, prod impls, new JSON result models.
src/winapp-CLI/WinApp.Cli/Helpers/HostBuilderExtensions.cs DI registration of IKeyboardInput and the two new handlers.
src/winapp-CLI/WinApp.Cli/NativeMethods.txt Adds PostMessage/MapVirtualKey/VkKeyScan + WM_*/VIRTUAL_KEY.
src/winapp-CLI/WinApp.Cli.Tests/* KeyStringParserTests, send-keys/drag/wheel command tests, FakeKeyboardInput.
src/winapp-npm/src/winapp-commands.ts Regenerated uiDrag/uiSendKeys wrappers + wheel on uiScroll.
docs/ui-automation.md, docs/fragments/skills/.../ui-automation.md, .github/plugin/.../SKILL.md, .claude/.../SKILL.md, docs/cli-schema.json, scripts/generate-llm-docs.ps1 Regenerated/updated docs, schema, skills, and skill→command map.

Note on doc sync (not commentable inline — these files are not in the diff): two hand-written command lists were not updated for the new verbs — docs/usage.md (### ui "Commands", lines 1076-1091) and .github/plugin/agents/winapp.agent.md ("Key subcommands", lines 207-221). Both enumerate click/hover/etc. by hand and should gain drag, send-keys, and the scroll --wheel option to stay in sync.

Inline findings: one test_coverage nit (the new --wheel zero-size guard is untested while the analogous hover/click guards are covered) and one maintainability nit (the SendInput sent == 0 guard is now duplicated four times in MouseInput.cs).


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/winapp-CLI/WinApp.Cli/Commands/UiScrollCommand.cs
Comment thread src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs Outdated
…size path (#587)

- Extract MouseInput.SendInputs(Span<INPUT>) helper so the SendInput
  zero-return / 'window may be elevated' guard lives in one place instead
  of being duplicated across SendMove/SendClick/SendButton/ScrollWheel.
- Add Scroll_Wheel_ZeroSizeElement_ReturnsError covering the new --wheel
  CodeZeroSize guard (exit 1, no ScrollWheel call), matching the
  Hover_ZeroSizeElement_ReturnsError coverage convention.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.

Comment thread src/winapp-CLI/WinApp.Cli/Commands/UiCommand.cs
@azchohfi azchohfi marked this pull request as ready for review June 23, 2026 00:09
azchohfi and others added 2 commits June 22, 2026 17:18
The auto-generated schema/skill docs already cover the new ui drag and
send-keys verbs, but two hand-written command lists were missed:
- docs/usage.md (ui Commands list)
- .github/plugin/agents/winapp.agent.md (Key subcommands), mirrored to
  .claude/agents/winapp.md via sync-claude-plugin.ps1

Add drag and send-keys entries matching the existing sibling-verb style in
each list. scroll --wheel is already covered by the existing scroll entry,
so no separate entry was added.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@nmetulev

Copy link
Copy Markdown
Member

🤖 AI-generated review — produced by the pr-review skill (8 parallel review dimensions + a cross-model verification pass on GPT-5.4, orchestrated by Claude Opus 4.8). This is automated analysis, not a human review: findings may contain false positives, and nothing here has been built or tested. Treat it as a checklist of things to verify, not a merge gate.

PR Review — azchohfi-ui-input-verbs vs origin/main (7 commits, 27 files, +2,008 / −26)

Summary

Critical: 0 High: 3 Medium: 6 Low: 2

Nice work adding a testability seam (IKeyboardInput/IMouseInput + fakes) — alternative-solution came back clean and the design is sound. The Highs cluster around two real themes in the synthetic-input layer: (1) the key-string parser splits literal text on whitespace, so send-keys "Hello world" types Helloworld; and (2) both the keyboard and mouse SendInput paths treat a partial write (sent < length) as success, which can strand a held modifier or mouse button. The cross-model pass confirmed H1/H2, dropped a docs finding (the key grammar is in fact documented), and surfaced the mouse-path variant of the short-write bug as a new High.

Coverage

Dimension Result
security ⚠ 2 findings (1 folded into H2)
correctness ⚠ 4 findings
cli-ux ⚠ 2 findings (1 folded into H1)
alternative-solution ✓ clean
test-coverage ⚠ 4 findings
docs-and-samples ✓ clean (1 finding disputed & dropped by cross-model)
packaging ⚠ 1 finding
multi-model 2/3 highs confirmed · 1 disputed/dropped · +1 new high

Findings

ID File:lines Domain Finding
H1 src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:86-113 correctness, cli-ux Literal text split on whitespace → send-keys "Hello world" types Helloworld; documented example is broken
H2 src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs:142-151 correctness, security Keyboard SendInput short write (sent < length) treated as success → stuck modifier
H3 src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs:54-77,143-179 correctness, security Mouse SendInputs short write treated as success + no finally cleanup → stuck mouse button (new, found by cross-model; absorbs the drag-cleanup finding)
M1 src/winapp-CLI/WinApp.Cli/Commands/UiScrollCommand.cs:85-130 cli-ux scroll accepts conflicting --direction/--to/--wheel and silently prefers --wheel
M2 src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs:126-134 security --via send-input doesn't verify focus landed on the target before sending → may inject into the wrong window
M3 src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs:126-134 test-coverage send-keys/drag/scroll --wheel not exercised by the WinUI e2e script
M4 src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:91-124 correctness Any token containing + is parsed as a chord → realistic literals like C++ / a+b throw
M5 src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:119-145 test-coverage Parser negative/edge coverage shallow (malformed combos, {} literals, Unicode)
M6 src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs:58-69 test-coverage Modifier down/up pairing and release-on-failure (stuck-key) untested
L1 src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs:54-77 test-coverage Drag interpolation / button-event sequencing not covered
L2 src/winapp-npm/src/winapp-commands.ts:1003-1010 packaging Generated npm wrapper descriptions stale vs cli-schema.json

Details

H1 — src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:86-113 · correctness + cli-ux · Multi-model: confirmed

  • Severity: high · Confidence: high
  • Finding: Literal text containing spaces is split into separate TextInput actions, so the space is lost — send-keys "Hello world" types Helloworld.
  • Evidence: var tokens = keys.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries); (87) → each token falls through to actions.Add(new TextInput(token)); (113); the space is never reintroduced when sending chars (KeyboardInput.cs:73-76, 128-132). The documented example winapp ui send-keys "Hello world" (docs/ui-automation.md:289) is therefore broken. (Unknown tokens being typed as literal text is intentional/documented; the whitespace loss is the bug.)
  • Recommendation: Preserve whitespace within literal runs, or add an explicit literal path (e.g. --text <string>) that types the string verbatim. Fix or remove the broken "Hello world" doc example to match the chosen behavior.

H2 — src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs:142-151 · correctness + security · Multi-model: confirmed

  • Severity: high · Confidence: high
  • Finding: A partial SendInput write is treated as success, which can leave a modifier-down event without its matching key-up (a "stuck modifier" that affects the user's whole session).
  • Evidence: var sent = PInvoke.SendInput((uint)array.Length, pInputs, sizeof(INPUT)); (145) then if (sent == 0) (146) — only zero is treated as failure. The batch can contain modifier-down, key-down/up, and modifier-up events (KeyboardInput.cs:113-125), so sent < array.Length is silently accepted with no retry or best-effort release.
  • Recommendation: Treat sent != array.Length as failure, and in a finally/cleanup path issue best-effort key-up for any modifiers pressed in the batch.

H3 — src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs:54-77,143-179 · correctness + security · Multi-model: new finding

  • Severity: high · Confidence: high
  • Finding: The mouse path has the same defect: SendInputs accepts short writes as success, and Drag presses the button via separate calls with no finally cleanup — so a mid-gesture failure can leave a mouse button logically held down.
  • Evidence: Drag does SendButton(downFlag) (62) … SendButton(upFlag) (76) as separate calls with no try/finally; SendClick builds a two-event down/up batch (145-156); but SendInputs only checks if (sent == 0) (172-173). A short write strands the button-down. (This was independently flagged by the correctness pass for the drag-cleanup angle, MouseInput.cs:62-76.)
  • Recommendation: Require sent == inputs.Length in SendInputs, and wrap click/drag/right-drag gestures in try/finally that always attempts the button-up. Consider extracting the shared "send N inputs, verify count, release-on-failure" logic so keyboard (H2) and mouse use one correct implementation.

M1 — src/winapp-CLI/WinApp.Cli/Commands/UiScrollCommand.cs:85-130 · cli-ux

  • Severity: medium · Confidence: high
  • Finding: ui scroll accepts mutually conflicting modes and silently prefers --wheel, so --direction/--to are ignored without warning.
  • Evidence: Validation only rejects when all three are null (85); the handler branches on wheel is int delta (107) and only uses direction/to in the else (130).
  • Recommendation: Require exactly one of --direction / --to / --wheel; on conflict, return a non-zero invalid-arguments error naming the conflicting flags.

M2 — src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs:126-134 · security

  • Severity: medium · Confidence: high
  • Finding: With --via send-input, focus success isn't verified immediately before sending, so keystrokes can land in whatever window is actually foreground (a different app, a UAC prompt, etc.).
  • Evidence: Lines 126-134 call SetForegroundWindow, wait, then send input without checking GetForegroundWindow against the resolved target.
  • Recommendation: For send-input, verify the foreground window belongs to the resolved target HWND/process right before sending and fail (non-zero) otherwise.

M3 — src/winapp-CLI/WinApp.Cli/Commands/UiSendKeysCommand.cs:126-134 · test-coverage

  • Severity: medium · Confidence: high
  • Finding: The new real-input verbs aren't in the WinUI e2e script that claims to cover all ui commands.
  • Evidence: scripts/test-e2e-winui-ui.ps1 Phase 3 exercises click/hover/scroll-pattern only — no send-keys, drag, or scroll --wheel. (The scroll --wheel zero-size unit path is covered.)
  • Recommendation: Add e2e steps for send-keys, drag, and scroll --wheel against the WinUI sample.

M4 — src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:91-124 · correctness

  • Severity: medium · Confidence: high
  • Finding: Any multi-character token containing + is treated as a chord, so realistic literals like C++ or a+b throw instead of typing.
  • Evidence: if (token.Contains('+') && token.Length > 1) (92) then throw new FormatException($"Invalid key combo '{token}'..."); (124).
  • Recommendation: Only treat a token as a chord when the leading segments are valid modifiers; provide an escape/literal path for +.

M5 — src/winapp-CLI/WinApp.Cli/Helpers/KeyStringParser.cs:119-145 · test-coverage

  • Severity: medium · Confidence: high
  • Finding: Parser negative/edge coverage is shallow.
  • Evidence: Tests cover happy paths plus empty input, invalid vk=, and unknown modifier, but not malformed combos (ctrl+, +a, ctrl++a, ctrl+bogus), brace-like/empty {} literals, or Unicode tokens (KeyStringParserTests.cs).
  • Recommendation: Add [DataRow]-driven tests for malformed combos and the intended literal behavior of brace/+/Unicode inputs — these would directly pin H1 and M4.

M6 — src/winapp-CLI/WinApp.Cli/Helpers/KeyboardInput.cs:58-69 · test-coverage

  • Severity: medium · Confidence: high
  • Finding: Modifier down/up pairing and stuck-key-on-failure behavior are untested.
  • Evidence: Command tests use FakeKeyboardInput and assert transport/action counts, but nothing observes the KeyDown(main) → KeyUp(main) → KeyUp(modifier) order or simulates a failure after a modifier-down.
  • Recommendation: Add a keyboard-event sink (or fault-injection wrapper) so tests assert modifier release order and release-on-exception — the regression test for H2.

L1 — src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs:54-77 · test-coverage

  • Severity: low · Confidence: high
  • Finding: Drag interpolation and button-event sequencing aren't covered.
  • Evidence: UiCommandTests assert FakeMouseInput.Drag receives endpoints/right-button, but not the multi-step movement stream or down/up ordering implemented in MouseInput.
  • Recommendation: Add a low-level mouse-event sink test (or e2e drag scenario) validating the interpolated path and button ordering — the regression test for H3.

L2 — src/winapp-npm/src/winapp-commands.ts:1003-1010 · packaging

  • Severity: low · Confidence: high
  • Finding: The autogenerated npm command surface is stale vs docs/cli-schema.json for send-keys descriptions.
  • Evidence: winapp-commands.ts lacks the newer per-keystroke/TextChanged wording present at docs/cli-schema.json:3308-3309; npm run generate-commands:check reports out of date.
  • Recommendation: Re-run npm run generate-commands from the updated schema and commit the regenerated winapp-commands.ts.

Disputed / dropped by cross-model

  • A docs finding claimed the send-keys KEY-STRING grammar was undocumented. Dropped: docs/ui-automation.md:295-314 already has an explicit Key grammar block (named keys, sequences, modifier combos, literal text, raw VKs) plus transport/KeyDown behavior, mirrored in the skill fragment. The cross-model pass also noted that {KEYNAME}/{{/}} escaping isn't implemented in KeyStringParser.cs, so documenting it would be wrong. The only genuine doc issue is the broken "Hello world" example, folded into H1.

Generated by the winappcli pr-review skill · 8 specialist passes + GPT-5.4 cross-model verification · 2026-06-23. No code was built or executed; verify findings before acting. 🤖

@nmetulev

Copy link
Copy Markdown
Member

🤖 AI-generated hands-on test report — produced by the GitHub Copilot CLI agent (Claude Opus 4.8). This complements the earlier automated pr-review comment with empirical testing: I built the CLI from this branch, built a purpose-made WinUI 3 target app (with AutomationIds + a ground-truth event log capturing KeyDown / TextChanged / Char / slider value / scroll offset / clicks), and fired every new verb at it. Findings below were observed at runtime, not just read from the diff. Verify before acting.

Hands-on test results — ui send-keys / ui drag / scroll --wheel

Setup: Debug build of winapp.exe; packaged WinUI 3 InputTestApp as the injection target; results read back via winapp ui get-value <EventLog/InputBox/SliderValue/ScrollState/ButtonState>.

TL;DR

Mouse verbs (drag, scroll --wheel) and send-input text injection work well. The problems are concentrated in send-keys: the documented "Hello world" example is broken, and the default transport silently does nothing for typed text on WinUI 3 while reporting success. On security: Ctrl+Alt+Del cannot be synthesized (good), but there is no guard at all on other system combos via --via send-input.


✅ What works (verified)

Verb Result
drag DragSlider 225,24 400,24 slider value 50 → 91; reverse drag → 9. Offsets-from-top-left + screen mapping correct. --right and zero-size (start==end) work.
scroll ScrollList --wheel -360 / 360 offset 0 → 890 (delta sign correct, clamps at top).
scroll ScrollList --direction down / --to ScrollPattern paging works (offset → 441).
send-keys --via send-input typed text Unicode é€, symbols a1!, case-insensitive CTRL+A/DELETE, raw vk=0x0d, named keys/arrows — all correct, real per-char KeyDown+Char observed.
JSON output (--json) clean, parseable, no log lines mixed in.
Arg validation bad from/to points, missing args, invalid --via → exit 1 with actionable messages.

🐞 Confirmed bugs & usability issues

1. send-keys "Hello world" types Helloworld — space dropped (confirms static-review H1) · High

Ran send-keys "Hello world" --target InputBox --via send-input. Event log:

KeyDown H, e, l, l, o, w, o, r, l, d   (no space keystroke)  →  InputBox = "Helloworld"

Exit code 0, prints ✅ Sent keys "Hello world". The space is the token separator (keys.Split(null, RemoveEmptyEntries) in KeyStringParser.cs:87) and is never re-emitted. The documented example winapp ui send-keys "Hello world" (docs/ui-automation.md:289) does not work. The only way to type a space today is the space named token: send-keys "Hello space world" → correctly produced Hello world. Recommend a literal path (e.g. --text "<string>") and fixing the doc example.

2. Default post-message transport silently no-ops typed text on WinUI 3 · High / Medium

Ran send-keys "x y z" --target InputBox --via post-message (post-message is the default). Result: InputBox stayed empty, EventLog stayed empty — zero events — yet the CLI printed ✅ Sent keys. WM_CHAR posted to the WinUI host HWND is not turned into text by the XAML input pipeline. Since post-message is the default and WinUI 3 is winapp's primary framework, a user running the documented happy path (winapp ui send-keys "hello" -a MyApp) gets a false success with nothing typed. (Named keys do deliver KeyDown via post-message — see below — only WM_CHAR text is dropped.) Recommend: detect/report when post-message text delivery is unsupported, or default typed-text to send-input, and at minimum make the success message honest.

3. Accelerators / combos don't fire via post-message · Medium

send-keys "ctrl+t" (a KeyboardAccelerator) via post-message did not fire (click counter unchanged, nothing logged). Via send-input it fired (Accelerator Ctrl+T, counter incremented). Worth calling out explicitly in --via help that combos/accelerators require send-input. (Named non-modifier keys like down down up do log KeyDown via post-message, so behavior is inconsistent across token kinds.)

4. Cannot type text containing + (confirms static-review M4) · Medium

send-keys "C++"❌ Invalid key combo 'C++'. send-keys "a+b"❌ Unknown modifier 'a' in 'a+b'. Any token with + is forced through chord parsing, so realistic literals fail.

5. scroll accepts conflicting modes (confirms static-review M1) · Medium

scroll ScrollList --direction down --wheel 120 passes validation (no error) and silently prefers --wheel. Recommend requiring exactly one of --direction / --to / --wheel.

6. Minor — inconsistent error formatting

The "no scroll mode specified" error prints [ERROR] - Specify --direction... (raw logger format) while every other ui error uses the symbol. Cosmetic but visible.


🔐 Security assessment (the headline ask: Ctrl+Alt+Del & system inputs)

Ctrl+Alt+Del — safe, cannot be synthesized

Fired send-keys "ctrl+alt+del" --via send-input at the app. The Secure Attention Sequence did not trigger — no lock screen, no secure desktop, session stayed fully interactive. The three keys arrived at the app as ordinary KeyDowns (Control, Menu, Delete). This is a Windows kernel guarantee (SAS is unsynthesizable via SendInput), not something this tool implements — but the net result is that ctrl+alt+del is harmless here.

No blocklist for any other system combo ⚠️

The parser + injector accept and would inject every dangerous combo via --via send-input, with no warning, confirmation, or allowlist (verified each is accepted at parse stage):
win+l (lock workstation) · win+r (Run) · win+d (show desktop) · win+e · ctrl+shift+esc (Task Manager) · alt+tab · alt+f4 · ctrl+esc (Start) · printscreen · raw vk=0x5B (Win).
Via send-input these execute system-wide — e.g. win+l would lock the machine. I also confirmed alt+f4 via the (non-elevated, window-scoped) post-message transport closed the target window with no confirmation.

Why this is moderate rather than critical

  • The default transport is window-scoped and cannot trigger system hotkeys. win+l via post-message just posts harmless messages to the target window — the dangerous path requires the explicit, documented --via send-input.
  • UIPI already protects elevated targets: SendInput from a non-elevated CLI into a higher-integrity window fails, and the code surfaces a clear error. You can't drive an elevated app without elevating the CLI.
  • For an automation/test tool, synthesizing win+l etc. is partly by design (you may want to test lock/again behavior).

Recommended hardening

  1. Verify foreground before send-input (ties to static-review M2). send-keys/drag/scroll --wheel call SetForegroundWindow then sleep 100 ms and inject without checking it succeeded. SetForegroundWindow silently fails from a background process under the foreground-lock timeout, so an injected sequence — including a system combo — can land on whatever window is actually foreground, not the intended target. Check GetForegroundWindow() belongs to the resolved HWND/process and fail otherwise.
  2. Guard system-reserved combos. Warn (or require an explicit --allow-system-keys) when synthesizing win+*, ctrl+shift+esc, ctrl+alt+del via send-input.
  3. Fix the short-write handling (static-review H2/H3). Both keyboard and mouse paths treat a partial SendInput write as success with no release-on-failure. A partial send that strands Win/Ctrl/a mouse button down is materially more dangerous when those are system modifiers — combine the fix with best-effort key-up/button-up in a finally.

Suggested test additions for the PR

Hands-on testing performed against a packaged WinUI 3 harness on Windows; the harness was removed after testing. Generated by the GitHub Copilot CLI agent · 2026-06-23. 🤖

Comment thread .claude/agents/winapp.md Outdated
azchohfi and others added 2 commits June 23, 2026 15:35
Fixes for the pr-review findings on the UI input-injection verbs:

- H1: send-keys preserves whitespace in literal text - adjacent literal
  tokens coalesce into one TextInput so "Hello world" types verbatim.
- H2: SendInput (keyboard) requires full delivery; on a short/zero write
  it releases any held keys to avoid a stuck modifier, with distinct
  zero-vs-partial error messages.
- H3: mouse Drag releases the button in a finally block if a move/up
  throws; SendInputs requires full delivery (was sent==0).
- M1: scroll rejects conflicting --direction/--to/--wheel instead of
  silently preferring one.
- M2: --via send-input verifies the target owns the foreground window
  before injecting (new foreground_not_target error code).
- M3: e2e script now exercises send-keys (literal + named key) and
  scroll --wheel against the WinUI sample app.
- M4: a '+' token is only a chord when leading segments are real
  modifiers, so 'C++', 'a+b' type literally; 'ctrl+bogus' still errors.
- M5: expanded KeyStringParser unit tests (literals, coalescing, vk
  edge cases, modifier-led unknown main key).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… formatting

Follow-up to Nikola's empirical test report on the input verbs:

- #2/#3: send-keys via post-message warns when literal text is sent, since
  WinUI 3 / XAML drops posted WM_CHAR (typed text silently no-ops). Docs
  corrected to recommend --via send-input for typed text on WinUI/WPF.
- Hardening #1: extend the foreground-ownership guard (was send-keys only)
  to drag and scroll --wheel via a shared ForegroundGuard helper, so an
  OS-wide gesture can't land on the wrong window when SetForegroundWindow
  silently fails. Returns foreground_not_target.
- #6: the "no scroll mode specified" error now uses the same ❌ symbol as
  every other ui error (was raw [ERROR] format).

System-reserved-combo blocklist (report hardening #2) intentionally left
for discussion — it changes the tool contract and partly conflicts with
legitimate automation (testing win+l etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@azchohfi

azchohfi commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator Author

Thanks @nmetulev — really thorough, both the static review and the hands-on test pass. I pushed two commits (7642645, b511ed7) addressing the findings. Summary:

Fixed

Finding Fix
H1 / hands-on #1"Hello world"Helloworld Parser now coalesces adjacent literal tokens into one TextInput with the space preserved, so quoted phrases type verbatim. down enter world stays distinct. Regression test added.
H2 — keyboard short/zero SendInput write reported as success SendViaSendInput now requires sent == length; on a short/zero write it releases any held keys (reverse-order key-up) so a modifier can't stick, with distinct zero-vs-partial messages.
H3 — mouse short-write + no release-on-throw Drag releases the button in a finally if a move/up throws; SendInputs requires full delivery (was sent == 0).
M1 / hands-on #5scroll accepts conflicting modes Rejects more than one of --direction/--to/--wheel with invalid_arguments. Test added.
M2 + hands-on hardening #1 — no foreground verify before send-input Added a shared ForegroundGuard; send-keys --via send-input, drag, and scroll --wheel now verify the target owns the foreground window and fail with foreground_not_target rather than acting on the wrong window.
M4 / hands-on #4 — can't type tokens containing + A + token is only a chord when its leading segments are real modifiers, so C++, a+b, 1+1 type literally; ctrl+bogus still errors. Tests added.
M5 — parser edge tests Expanded KeyStringParserTests (literals, coalescing, distinct runs, vk valid/invalid/zero/overflow, modifier-led unknown main key).
M3 — e2e coverage test-e2e-winui-ui.ps1 now exercises send-keys (literal text with spaces + named key) and scroll --wheel against the WinUI sample.
hands-on #2/#3 — post-message silently no-ops typed text on WinUI 3 send-keys --via post-message now warns when literal text is sent (XAML drops posted WM_CHAR); docs corrected to recommend --via send-input for typed text on WinUI/WPF.
hands-on #6 — inconsistent error formatting The "no scroll mode" error now uses the same ❌ symbol as every other ui error.

Intentionally left (open question)

  • Hardening Bump tmp from 0.2.3 to 0.2.4 in /sample #2 — system-reserved-combo blocklist / --allow-system-keys. I left this out for now because it changes the tool's contract and partly conflicts with legitimate automation (you might want to test win+l / lock behavior). The dangerous path already requires the explicit --via send-input, the default transport is window-scoped, and UIPI protects elevated targets. Happy to add a guard (warn-only, or opt-in --allow-system-keys) if you'd prefer — your call on the policy.
  • M6 / L1 — low-level keyboard-release-order and drag-interpolation unit tests. The release-order (H2) and interpolation (L1/H3) logic lives in the static KeyboardInput/MouseInput helpers that call PInvoke.SendInput directly; the IKeyboardInput/IMouseInput fakes only observe KeyAction lists / drag coords, not the synthesized INPUT array, so these can't be asserted without a SendInput injection seam. I covered them at the e2e layer (the new send-keys/scroll steps) instead of adding a seam purely for the test — let me know if you'd rather I introduce the seam.

All 95 parser/command tests pass; the 5 pre-existing EndToEndTests failures are environmental (the node wrapper needs npm run build in src/winapp-npm) and unrelated to these changes.

azchohfi and others added 3 commits June 23, 2026 16:37
Drag (inline review thread): support `drag <from> <to>` where each
endpoint is an element selector (resolves to its center) or app x,y
coordinates as reported by `ui inspect`, alongside the retained legacy
3-arg `drag <selector> <fromOffset> <toOffset>` form (disambiguated by
the third positional arg). Adds From/To to the JSON result.

Hardening #2 (system-reserved combos): send-keys --via send-input now
warns (advisory, still sends) before synthesizing win+*, ctrl+shift+esc,
ctrl+alt+del, alt+tab, alt+f4, ctrl+esc, lone win/printscreen, via a new
SystemKeyGuard helper. Bug #3: --via help now notes accelerators only
fire via send-input.

Tests: expand KeyStringParser coverage (modifier/key aliases, extended
flags, function keys, raw-vk, whitespace, coalescing edge cases); new
SystemKeyGuardTests; new 2-arg drag tests (selector/coords combinations,
error paths); send-keys system-combo warn-only test.

Docs: ui-automation.md, SKILL.md (x2), agent docs (x2), usage.md, and the
drag/send-keys blocks in cli-schema.json updated for both forms.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…clearer legacy error

- SystemKeyGuard: warn on Alt+PrintScreen (active-window screenshot); add test row.
- Drag tests: add per-call FindSingleResults queue to the UIA fake so selector->selector
  resolves two distinct elements; assert both endpoint centers; add coords->selector and
  to-selector-not-found cases.
- UiDragCommand: legacy offset parse errors now explain that 3 args selects the legacy
  '<selector> <fromOffset> <toOffset>' form and 2 args is selector/coords -> selector/coords.
- docs fragment (SKILL source): sync dual-form drag + system-combo send-input note.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ardTests

CI builds with -warnaserror, so the inline 'new[] { ... }' constant arrays in
CollectionAssert.AreEqual were promoted from CA1861 warnings to errors. Switch to
element-wise Assert.AreEqual (count + indexer) which is equivalent and CA1861-clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@azchohfi azchohfi requested a review from nmetulev June 24, 2026 00:18
Closes the three forward-compat gaps the microsoft-ui-reactor E2E harness hit
when mapping its temporary SendInput fallback onto the new input verbs:

- send-keys: add a `text=` token escape so a literal value that collides with a
  key/modifier name (enter, del, up, ctrl+a) is typed verbatim. Mirrors `vk=`,
  is checked first, and still coalesces with adjacent literal words.
- drag: add `--dwell-ms` (hold at the destination before releasing) so drop
  targets / merge overlays that arm from a sustained hover can latch before the
  button-up — required for tear-off/merge drags that a plain A->B drag misses.
- drag: add `--hold-ms` (hold at the press point before moving). With
  <from> == <to> this is a press-and-hold / long-press gesture.

Threads holdMs/dwellMs through IMouseInput.Drag -> MouseInput.Drag and the
UiDragResult JSON. 14 new unit tests (text= escape, hold/dwell flow, negative
validation, defaults). Regenerated cli-schema.json, the ui-automation SKILL,
and the npm command wrappers (the wrappers had drifted behind the schema and
are now back in sync); updated docs/ui-automation.md by hand.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@nmetulev

nmetulev commented Jun 24, 2026

Copy link
Copy Markdown
Member

🤖 AI-generated hands-on test report — rounds 2–3 (consolidated) (GitHub Copilot CLI, Claude Opus 4.8). After the PR was updated (80ea1d3) I rebuilt the CLI, ran the expanded unit suite, and stress-tested the verbs against two purpose-built harnesses — a dynamic WinUI 3 app (chaos timer that moves/resizes a target every ~200 ms, plus zero-size / add-remove-element / window-resize controls) and a throwaway Electron/Chromium app. Covers code review, unit tests, and live SendInput delivery. (The test box was briefly locked mid-run, which itself produced a useful security confirmation — noted below.)

Re-test of ui send-keys / ui drag / scroll --wheel

✅ Round-1 findings — all addressed, and confirmed live end-to-end

Prev finding Fix Verified
H1 "Hello world"Helloworld CoalesceText re-joins adjacent literal tokens live: InputBox = Hello world (space preserved); unit Parse_QuotedPhrase_PreservesSpacesAsSingleLiteral
M4 C++ / a+b rejected TryParseChord only treats modifier-led tokens as chords live: "C++" types C++; a+b, 1+1=2, + parse as text; ctrl+bogus still rejected
M1 scroll --direction --wheel both accepted modeCount > 1 → error live: scroll --direction down --to bottom❌ Specify only one of …
M2 send-input lands on wrong window new ForegroundGuard on all 3 gesture verbs live both ways: refuses (exit 1 + JSON foreground_not_target) when not foreground; works once foregrounded
H2 keyboard short-write sent != length → release held keys + throw code KeyboardInput.SendViaSendInput + ReleaseHeldKeys
H3 drag button left down try/finally best-effort button-up code MouseInput.Drag
round-1 "false success" on post-message text warns before injecting code UiSendKeysCommand (but see N6 — the warning is now over-eager)
round-1 "no system-combo guard" new SystemKeyGuard advisory warning live: ctrl+alt+del / printscreen via send-input → ⚠ warning, still exit 0

Other live confirmations: drag thumb-grab moved a slider 50→89; scroll --wheel changed offset (sign correct); Unicode é€ lands verbatim. Unit tests: KeyStringParser 79 / SystemKeyGuard 28 / UiCommand 81 — all green. Zero-size → zero_size_element JSON error; missing element → "The UI may have changed … prefer targeting by AutomationId."


🐞 Findings

N1 · send-keys literal text mis-fires on any word that is a key name — High (usability) — confirmed live

Words that collide with key/modifier names fire keys instead of typing, with no escape syntax:

send-keys … InputBox got What happened
"the end" the end pressed End
"go home" go home pressed Home
"press down" press down pressed Down arrow
"Hi enter Bye" HiBye enter fired Enter between the words

Colliding words: space enter tab esc escape up down left right home end del delete ins insert pageup pagedown backspace return apps printscreen capslock f1..f16 + modifiers ctrl shift alt win cmd super meta menu control. So send-keys "<arbitrary prose>" is not reliable for typing text. Recommend a dedicated literal path, e.g. --text "<string>" (verbatim, no tokenization), and/or an escape ({end} = key vs end = word). This also fixes N2.

N2 · Literal text can't contain consecutive/edge whitespace — Low–Med — confirmed live

CoalesceText re-joins with a single space: "a b"a b, " pad "pad, tab → single space. Can't type double spaces, tabs, or indentation. (Naturally solved by --text.)

N5 · gesture verbs silently miss a moving/animating target (report success anyway) — Medium, NEW — the "changing size between checks" answer

With the chaos timer moving & resizing the target every ~200 ms:

Condition Clicks that actually landed
Chaos OFF (stable) 6 / 6
Chaos ON (moving) 0 / 12

Every one of the 12 misses still reported ✅ click … at (x,y) and exit 0. The verbs resolve element bounds, then inject at those screen coordinates after a settle/foreground delay (~50–150 ms click, ~500 ms full drag); if the element moves/resizes in that window the gesture hits the stale rectangle (empty space) and the element never receives it — yet the command reports success. Related: a drag starting off the grabbable part (slider track vs thumb) also no-ops while reporting . So click / drag / scroll --wheel give no signal that the gesture actually hit the target under shifting UI. Suggested: (a) re-read bounds immediately before the press / verify cursor-still-over-element; (b) optionally verify a post-condition (focus/value change) and return non-zero on miss; (c) prefer UIA-pattern verbs (invoke, set-value, scroll --direction/--to) for moving targets — those acted correctly throughout.

N6 · post-message-text behavior + warning are framework-dependent — Low–Med, NEW (from the Electron pass)

send-keys text via post-message is dropped by WinUI 3/XAML but consumed by Electron/Chromium (and typically Win32/WPF). Yet the CLI emits its warning unconditionally ("…may not be delivered to WinUI 3 / XAML apps (WM_CHAR is dropped)…") even when targeting Electron, where it works — so it false-alarms on non-XAML apps. Recommend making the warning framework-aware (only for XAML/WinUI targets) or softening it.

N3 · misleading "elevated" error when there is no input desktop — Low

On a locked/secure desktop, SendInput returns 0 and the older ui click/ui hover report "the target window may be elevated … run as administrator" — wrong cause. The new verbs avoid this (their ForegroundGuard fires first). Consider a "no interactive desktop (session locked?)" message and applying the foreground guard to click/hover too.

N4 · send-input / drag / scroll --wheel require an unlocked, foreground, interactive session — Informational (by design)

A consequence of the (correct) ForegroundGuard: these refuse unless the target is genuinely foreground, so they can't be driven from a background/headless/CI or locked context. The UIA-path verbs (set-value, invoke, scroll --direction/--to) remain headless-friendly. Worth a docs callout.


🔐 Security — re-assessment (substantially improved)

  • Ctrl+Alt+Del: unsynthesizable (kernel-protected SAS).
  • Locked / secure-desktop boundary — confirmed empirically: with the box locked (LogonUI on the secure desktop), GetForegroundWindow()=0 and SendInput=0 — a user-session process cannot inject into the lock screen (can't type a password into LogonUI). Strong OS-enforced boundary.
  • ForegroundGuard prevents the M2 "wrong window" multiplier (verified live, both directions).
  • SystemKeyGuard is advisory — warns, doesn't block — for win+<key>, lone win, ctrl+alt+del, ctrl+shift+esc, ctrl+esc, alt+tab, alt+esc, alt+f4, printscreen, alt+printscreen via send-input (verified live). Reasonable for an automation tool; gate behind --allow-system-keys if hard blocking is ever wanted.
  • Short-write handling: keyboard releases held keys before throwing; drag releases the button in finally — a partial SendInput can't strand a (possibly system) modifier/button.

Net: the round-1 dangerous-combo concern is now mitigated by layered guards (foreground verification + advisory warning + SAS/secure-desktop protection + short-write recovery) rather than a hard blocklist — a sound design.


🧩 Cross-framework — WinUI 3 vs Electron / Chromium

Behavior WinUI 3 / XAML Electron / Chromium
send-keys text via send-input ✓ types ✓ types
send-keys text via post-message dropped (WM_CHAR) works
H1 "Hello world" Hello world Hello world
M4 "C++" C++ C++
N1 "the end" "the" + End key identical
N2 "a b" a b identical
drag slider 50→89 50→81
scroll --wheel ✓ (0→200)
scroll --direction (UIA) ✓ (200→654)

Parser-level findings (H1/M4/N1/N2) are framework-independent. Electron targeting caveat: DOM elements are visible to ui inspect/search/--target/drag-scroll resolution only when renderer accessibility is active (I launched with --force-renderer-accessibility; Chromium also enables it on-demand when a UIA client attaches) — worth a docs note for Electron users.


🤖 Agent-ergonomics feedback (opportunities to improve)

Separate from correctness — candid notes from driving ui heavily as an automated agent (no eyes on the screen). The verbs are powerful and well-guarded, but the ergonomics currently assume a human who can see the result; for an agent that costs extra round-trips (tokens) and risks false conclusions.

  1. Self-verifying actions (highest impact). click/drag/scroll --wheel report /exit 0 from injecting at coordinates, not from the element receiving anything (see N5). With no eyes, my only recourse is a separate get-value/inspect call, so every logical action becomes resolve → act → verify (3 invocations). Returning post-action state in the JSON envelope and/or flipping exit code on no-effect would collapse that to one call and kill the silent-miss class.
  2. Foreground requirement is hard to satisfy headlessly. The ForegroundGuard is the right security call, but from a background/agent context the verbs just return foreground_not_target; I had to hand-roll AttachThreadInput + SPI_SETFOREGROUNDLOCKTIMEOUT=0 to test at all. A first-class --activate (reliably foreground + verify) and a loud doc note would save every agent from reinventing that.
  3. --text for literal typing (see N1/N2) — removes the single most likely way an LLM gets send-keys wrong (prose words firing keys).
  4. State readback hygiene. get-value silently falls back to the accessible Name when there's no value (an empty TextBox read back as "Input box" made me think it had content) — tag the source ({"value":"","source":"name"}). And the Auto-selected HWND … pass -w 393384 … banner prints to stdout interleaved with the value, which both wastes tokens and corrupts parsing — route advisory chatter to stderr and never mix it into a value payload. Auto scroll-into-view (or an explicit offscreen:true in resolve output) would also drop a mandatory extra call.
  5. Output / token hygiene. Every inspect/search appends a "Found N elements. Use the first token as selector, e.g. …" footer — useful once for a human, pure overhead per agent call; suppress under --json/a terse mode. Make --json purely machine-readable everywhere (structured errors and warnings — extend the great foreground_not_target/zero_size_element codes to all outcomes; keep ❌⚠✅ prose for human mode only). A batch/script mode (run several verbs against one resolved session) would amortize process-startup + re-resolution — the biggest token win. And drop the unconditional post-message warning on non-XAML targets (N6) so warnings stay trustworthy.

Worth keeping as-is: the error copy is genuinely excellent for agents ("The UI may have changed … prefer targeting by AutomationId — these survive layout changes"), the structured error codes are exactly right, and the guard layering is coherent. The ask is mainly to make each call self-verifying and machine-clean so an agent doesn't reconstruct "did that work?" from extra inspections.

Verdict

The round-1 fixes hold up under live injection on both WinUI 3 and Electron. Open items, by impact: N1 (literal-text/key ambiguity → add --text), N5 (gesture silent-miss on moving UI, success-coded → outcome verification), N6 (post-message warning not framework-aware), N2 (whitespace doc note), and the minor N3/N4 polish.

Live testing on WinUI 3 (chaos/zero-size/resize harness) and an Electron/Chromium harness; both removed after testing. Generated by the GitHub Copilot CLI agent · 2026-06-23. 🤖

Comment thread .claude/agents/winapp.md Outdated
- `ui screenshot -a <app> [--output file.png] [--json] [--focus] [--capture-screen]` — capture window as PNG. Default uses Windows.Graphics.Capture (composited surface — preserves rounded corners and works while occluded), with PrintWindow as fallback. Use `--focus` to bring the window to the foreground first; use `--capture-screen` for popup overlays not owned by the target window.
- `ui invoke <selector> -a <app>` — activate element by slug or text search. Auto-walks to invokable ancestor for non-invokable elements.
- `ui hover <selector> -a <app> [--dwell-time <ms>]` — move mouse to element center to trigger tooltips, flyouts, and hover states. Use with `ui screenshot --capture-screen` to capture the result.
- `ui drag <from> <to> -a <app> [--right]` — press the mouse button at one point, move to another, and release (reorder, resize, sliders, drag-and-drop). Each of `<from>`/`<to>` is an element selector (drags from/to its center) or app coordinates `x,y` as reported by `ui inspect`. Legacy form `ui drag <selector> <fromX,fromY> <toX,toY>` is still accepted, where the two `x,y` are pixel offsets from the element's top-left corner.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no legacy form, it never shipped - remove that to keep it simple

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed — there's no legacy form anymore. drag is now only drag <from> <to>, where each endpoint is an element selector (drags from/to its center) or app-relative x,y coords from ui inspect. I dropped the to-offset argument, the legacy code branch, and the ElementId field from the JSON result (only the legacy path ever set it), and cleaned up every reference — both agent files, both SKILL.md (+ the shared template), docs/ui-automation.md, the generated cli-schema.json/SKILL tables, and the npm command wrapper. The legacy drag tests were removed and the remaining drag tests converted to the two-arg form.

Comment thread .claude/skills/winapp-ui-automation/SKILL.md
# Raw app coordinates → app coordinates
winapp ui drag 120,200 480,200 -a myapp

# Legacy: offsets from the element's top-left corner

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, no such thing as legacy

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed — there's no legacy form anymore. drag is now only drag <from> <to>, where each endpoint is an element selector (drags from/to its center) or app-relative x,y coords from ui inspect. I dropped the to-offset argument, the legacy code branch, and the ElementId field from the JSON result (only the legacy path ever set it), and cleaned up every reference — both agent files, both SKILL.md (+ the shared template), docs/ui-automation.md, the generated cli-schema.json/SKILL tables, and the npm command wrapper. The legacy drag tests were removed and the remaining drag tests converted to the two-arg form.

Comment thread docs/ui-automation.md Outdated
Press the mouse button at one point, move to another, then release. Two forms:

- **Preferred — `drag <from> <to>`:** each endpoint is either an **element selector** (drags from/to the element's center) or **app coordinates `x,y`** exactly as reported by `winapp ui inspect`. Mix and match freely (selector→selector, selector→coords, coords→coords).
- **Legacy — `drag <selector> <fromX,fromY> <toX,toY>`:** the two `x,y` values are pixel **offsets from the element's top-left corner**. Disambiguated by the third positional argument; existing scripts keep working unchanged.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too and many other places re legacy

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed — there's no legacy form anymore. drag is now only drag <from> <to>, where each endpoint is an element selector (drags from/to its center) or app-relative x,y coords from ui inspect. I dropped the to-offset argument, the legacy code branch, and the ElementId field from the JSON result (only the legacy path ever set it), and cleaned up every reference — both agent files, both SKILL.md (+ the shared template), docs/ui-automation.md, the generated cli-schema.json/SKILL tables, and the npm command wrapper. The legacy drag tests were removed and the remaining drag tests converted to the two-arg form.

Comment thread docs/ui-automation.md Outdated
**Choosing a transport / known limits:**
- `post-message` is the default because it bypasses UIPI and doesn't depend on the window being foreground. Limits: it cannot trigger global hotkeys registered through `WH_KEYBOARD_LL` low-level hooks (those tap input upstream of any window queue), and apps that read raw key state via `GetAsyncKeyState` may not observe held modifiers. For classic Win32/WinForms apps whose controls are separate child windows, target the control's native window handle with `-w` (or `--target`) so keys reach the right control. WinUI 3 / WPF apps have a single window and route keys to the internally focused element, so targeting the top-level window works.
- `send-input` produces fully real input (modifiers visible to `GetAsyncKeyState`, fires low-level hooks) but goes to whatever window is foreground and is **blocked by UIPI when injecting from an elevated process into a lower-integrity (AppContainer/AppX) target**. If `send-input` reports a failure, the target is likely elevated or an AppX app — use `post-message`, or run the CLI at a matching integrity level. As a safety guard, `send-input` verifies the target window is actually in the foreground immediately before injecting and **fails (`foreground_not_target`) rather than typing into the wrong window** if focus could not be brought to it — focus or click the window first.
- **System-reserved combos** (`win+l`, `win+r`, `ctrl+shift+esc`, `ctrl+alt+del`, `alt+tab`, `alt+f4`, `ctrl+esc`, lone `win`/`printscreen`, …) act on the OS/shell rather than just the target when sent OS-wide. `send-input` **emits a warning** before synthesizing them (e.g. `win+l` would lock the session) but still sends them — synthesizing them can be the legitimate intent of a test. `post-message` is window-scoped and is not affected (a posted `win+l` is harmless, though a posted `alt+f4` still closes the target window).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't allow these IMO

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — changed from warn-and-send to reject. send-keys --via send-input now errors with invalid_arguments and sends nothing when it sees a system-reserved combo (win+l, alt+f4, ctrl+alt+del, alt+tab, …), since injecting those OS-wide has effects well beyond the target window. --via post-message stays unaffected because it's window-scoped (a posted win+l is harmless), so it remains the escape hatch if you genuinely need to deliver one to a specific window. Updated the SystemKeyGuard doc + prose and the tests (SendKeys_SystemCombo_ViaSendInput_IsRejected asserts exit 1 + 0 sends; added SendKeys_SystemCombo_ViaPostMessage_StillSends).

Comment thread docs/ui-automation.md Outdated
**Options:**
- `--direction <up|down|left|right>` — Scroll incrementally via `ScrollPattern`.
- `--to <top|bottom>` — Jump to the start/end via `ScrollPattern`.
- `--wheel <delta>` — Synthesize mouse-wheel input over the element's center via `SendInput`. One notch is `120`; positive scrolls up/away, negative down/toward. Bypasses `ScrollPattern`.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 120 and not just 1?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — --wheel now takes notches, so 1 = one notch up and -1 = one notch down. The reason it was 120 is that 120 is the Windows WHEEL_DELTA constant: SendInput's MOUSEEVENTF_WHEEL measures rotation in units where one physical detent = 120, so the old value was the raw Win32 unit leaking through the CLI surface. The command now multiplies the notch count by WHEEL_DELTA (120) internally — e.g. --wheel 3 injects a delta of 360 — so callers think in notches while SendInput still gets the right magnitude. Updated the option description, the example (--wheel -1), and the tests (--wheel -1 → fake delta -120; added Scroll_Wheel_MultipleNotches_ScaleByWheelDelta for --wheel 3 → 360).

azchohfi and others added 2 commits June 24, 2026 11:00
… wheel in notches

- ui drag: remove the never-shipped legacy 3-arg form entirely; only `drag <from> <to>`
  (selector or app coords per endpoint). Drop the to-offset arg, legacy branch, and the
  ElementId result field (legacy-only).
- ui send-keys: reject system-reserved combos on --via send-input (error + no send) since
  OS-wide injection reaches beyond the target; --via post-message (window-scoped) unaffected.
  Document the text= escape for typing literal phrases that collide with key names.
- ui scroll --wheel: take notches instead of raw WHEEL_DELTA units (1 = one notch); scale
  by WHEEL_DELTA (120) internally before SendInput.
- Update tests, docs (ui-automation.md, SKILL template + generated tables, agent files,
  cli-schema.json, npm wrapper).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… click/hover, docs

Addresses the consolidated pr-review report on the input-injection verbs.

- M2/M3 (security): move the foreground guard to AFTER the stabilize awaits in
  drag/scroll, immediately before injection, so a focus steal during the awaited
  re-resolve can't route the OS-wide gesture to the wrong window.
- M4: extract the N5 target-stability re-resolve into GestureTargeting and reuse it
  across click/hover/drag/scroll; add moving-target + settle-after-move tests.
- M5: extract pure ForegroundGuard.Classify so the no_interactive_desktop vs
  foreground_not_target selection is unit-tested without a live desktop.
- M6: gate the post-message WM_CHAR warning behind FrameworkHint.IsLikelyXaml and
  extract ShouldWarnPostMessageTextDropped, so the gate is unit-tested and only
  warns on XAML targets instead of false-alarming Win32/WPF/Electron apps.
- L1: click/hover re-resolve their target just before injecting (refuse on a
  moving/vanished target) instead of acting on a stale rect.
- L2: drag target_moved errors report the caller's selector token, not "from"/"to".
- L3: decode backslash whitespace escapes in text= values (\s \t \n \r \\) so
  multi/leading whitespace survives the tokenizer; add coverage.
- MouseInput/KeyboardInput: distinguish a locked/secure desktop (no interactive
  desktop) from elevation/UIPI in the SendInput failure message.
- Docs: document the interactive-desktop requirement, target_moved re-resolve, the
  text= escape table, and the XAML-only post-message caveat across ui-automation.md,
  the fragment, and both SKILL.md mirrors (version unchanged).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@azchohfi

Copy link
Copy Markdown
Collaborator Author

Thanks again @nmetulev — the rounds 2–3 report was incredibly useful. Everything actionable is now in, across ae59996, 0d200e5, and 007af17. Per finding:

Finding Status What changed
N1 literal text fires keys on collision words ("the end"the + End) ✅ Fixed Per-token text= escape — send-keys "text=the text=end" types the end, while bare end/down/enter still press the key. The keys path is untouched for real key presses. (ae59996)
N2 can't type consecutive/edge whitespace ✅ Fixed Backslash escapes inside text=: \s \t \n \r \\. text=a\s\sba b, leading/trailing padding survives; unknown escapes pass through verbatim. (007af17)
N3 misleading "elevated" error when there's no input desktop ✅ Fixed The SendInput-failure message now distinguishes a locked/secure desktop (new no_interactive_desktop code) from elevation/UIPI, and click/hover re-resolve before injecting too. (007af17)
N4 injection needs an unlocked, foreground, interactive session ✅ Documented Interactive-desktop requirement is now called out in ui-automation.md + both SKILL mirrors, with the UIA-path verbs (set-value/invoke/scroll --direction|--to) flagged as the headless-friendly alternative. (007af17)
N5 gesture silent-miss on a moving target, success-coded ✅ Fixed New GestureTargeting.ResolveStableAsync re-reads bounds immediately before injection on click / hover / drag / scroll --wheel; a target that's moving (> 2 px across reads), vanished, or zero-size now returns target_moved (exit 1) instead of a false . Stable targets behave exactly as before. (007af17)
N6 post-message warning not framework-aware ✅ Fixed The WM_CHAR-dropped warning is now gated behind FrameworkHint.IsLikelyXaml(hwnd) — only XAML/WinUI targets get it; Electron/Win32/WPF (where post-message text actually lands) no longer false-alarm. (007af17)

Security. Per the thread, system-reserved combos via --via send-input (win+l, alt+f4, ctrl+shift+esc, …) are now hard-rejectedinvalid_arguments, exit 1, nothing synthesized — rather than advisory-warned, and we deliberately did not add a --allow-system-keys escape hatch. The secure-desktop boundary you confirmed empirically (GetForegroundWindow=0 / SendInput=0 on the lock screen) is unchanged.

Agent-ergonomics. The two that live inside these verbs are in: literal typing (N1, via text=) and self-verifying on a moving/gone target (N5, exit 1 on miss). The broader platform asks — full post-action state in the JSON envelope for every action, a first-class --activate, get-value source-tagging, stdout/stderr split, --json purity everywhere, and a batch/session mode — are cross-cutting changes that reach well beyond these three input verbs, so I'd rather land them as focused follow-ups than expand this PR. Happy to file those and fold any specific one in here if you'd prefer it ride along.

New coverage: FrameworkHintTests, InjectionGuardTests (foreground-classification + post-message warn-gate truth tables), moving-target / settle-after-move cases in UiCommandTests, and the text= escape rows in KeyStringParserTests.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 38 out of 38 changed files in this pull request and generated 1 comment.

Comment thread scripts/test-e2e-winui-ui.ps1 Outdated
--wheel is expressed in notches now (UiScrollCommand scales by WHEEL_DELTA=120
internally), so --wheel -120 meant 120 notches down (-14400 raw delta) instead
of a single detent. The scroll E2E assertion now exercises one notch down,
matching the documented notch semantics.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@nmetulev

Copy link
Copy Markdown
Member

🤖 AI-generated test report — round 4 (GitHub Copilot CLI, Claude Opus 4.8). Follow-up to the rounds 2–3 consolidated report, now that the PR was updated (65c4c93) in response to those reviews. I rebuilt, re-ran the unit suite (236 PR-relevant tests green), code-reviewed the deltas, and re-ran the live break-it matrix on fresh WinUI 3 + Electron harnesses (chaos timer moving/resizing a target every ~200 ms).

Round 4 — updated PR (65c4c93): fixes verified + new findings

✅ Every round-2/3 finding addressed (verified live unless noted)

Finding Fix in this PR Verified
N1 prose words fire keys new text= escape token (text=enter types "enter") live WinUI: text=enter→"enter", text=ctrl+a→"ctrl+a", bare enter→key still fires
N2 whitespace lost backslash escapes in text= (\s \t \n \r \\) live: text=a\s\sb"a b" (double space)
N3 misleading "elevated" error when locked new no_interactive_desktop code + message verified live while the box was locked on send-keys/drag/scroll --wheel
N6 post-message warning false-alarmed on Electron FrameworkHint.IsLikelyXaml(hwnd) scopes it to XAML live: WinUI warns, Electron no warning + text types fine
round-3 "no system-combo block" SystemKeyGuard now blocks (was advisory) live: ctrl+alt+del/printscreen/alt+f4/alt+tab via send-input → exit 1, refused, not sent
ergonomics: JSON/value interleaving JSON errors now → stderr; success envelopes → stdout confirmed

Also new: drag --hold-ms (long-press) / --dwell-ms (drop settle); drag endpoints accept selector or x,y; a final ForegroundGuard re-check after the stabilize re-resolve closes the focus-steal race. Unit tests green: KeyStringParser 97, SystemKeyGuard 28, FrameworkHint 13, InjectionGuard 9, UiCommand 89.

🎯 N5 (silent-miss on a moving target) — the "changing size between checks" ask: much better, not fully closed

New GestureTargeting.ResolveStableAsync re-reads the element's bounds immediately before injecting, refuses zero-size / never-settling targets with target_moved / zero_size_element (non-zero exit), and clicks at the element's current position rather than a stale rectangle. Measured against the round-3 baseline (chaos ON = 0/12 landed, 12/12 silent ✅):

Harness (chaos ON, ~200 ms moves) reported ✅ actually landed false success refused (zero_size/target_moved)
WinUI (3×10 clicks) ~2–4 /10 matches when stable ~2 /10 6–8 zero_size, 0–1 target_moved
Electron (10 clicks) 10/10 4/10 6/10 0
(baseline chaos OFF) 6/6, 5/5 0 0

So false-success dropped from ~100% to ~20% (WinUI) but is still ~60% on Electron. A reported success can still miss a continuously-moving target: the target moves during the ~50 ms cursor-settle inside MouseInput.Click() after ResolveStableAsync already returned "settled" — and on a smoothly-animating UI (Electron) neither target_moved nor zero_size fires, so the miss is silent. Suggest re-verifying the cursor-is-over-element (or re-reading position) immediately before the button-down, dropping/shortening the internal settle, or verifying a post-condition and returning non-zero on no-effect. (Continuously-moving targets are an edge case; for normal "foreground settles then static" UI the re-resolve is exactly right.)

🐞 New findings

  • F1 (Medium): click and hover were given the N5 re-resolve but not the ForegroundGuard.TryEnsureForeground gate that drag/scroll --wheel/send-keys have. Two effects (verified live while locked): (a) inconsistent error — click/hover return internal_error ("SendInput failed…") where the other three return the clean no_interactive_desktop; (b) click/hover still lack the wrong-window (M2) protection — if SetForegroundWindow silently fails and another window is foreground, they'd inject into it. Recommend routing click/hover through TryEnsureForeground too.
  • F2 (Low): the text= escape (the N1/N2 fix) is documented in docs/ui-automation.md but absent from send-keys --help / the keys argument description / cli-schema.json (0 matches) — so it's not discoverable via --help or the npm-wrapper / LLM-skill surface. Add text=<literal> to the argument description.
  • F4 (Low/compat): the drag argument shape changed to drag <from> <to> (each a selector or absolute x,y) from the prior drag <selector> <from-offset> <to-offset>. Breaking for existing scripts; pre-release, so just worth a changelog note.

Verdict (round 4)

Strong iteration — every prior finding is fixed or materially mitigated, with good new tests and structured/stderr JSON. The two things I'd still act on: F1 (give click/hover the same foreground guard) and F3/N5 (close the residual re-resolve→button-down race so a always means the gesture landed). Both are about the same north star from the ergonomics note: a success result should guarantee the action actually happened.

Live testing on WinUI 3 (chaos/zero-size/resize harness) and an Electron/Chromium harness; both removed after testing. Generated by the GitHub Copilot CLI agent · 2026-06-25. 🤖

…race close + text= discoverability

Round-4 test report (PR #587) flagged three actionable items; this addresses them.

F1 — click/hover now gated by the foreground guard (parity with drag / scroll
--wheel / send-keys). Introduce IForegroundGuard (RealForegroundGuard wraps the
existing static ForegroundGuard.TryEnsureForeground so the classification logic
stays the single source of truth) and inject it into all five gesture verbs.
click/hover now fail with the clean no_interactive_desktop error on a locked
session instead of a misleading internal_error, and gain wrong-window
(foreground_not_target) protection. The DI seam also makes the deny path
unit-testable without a live desktop.

F3/N5 — close the residual re-resolve to button-down race on click. After the
stable re-resolve and foreground check, position the cursor (new
IMouseInput.MoveCursor), run the cursor-settle here, then do a final
GestureTargeting.ConfirmStillAsync read immediately before the button-down,
which now uses settleMs:0 (no second internal settle). A target that drifts
during the settle window is caught and refused with target_moved instead of
reporting a false success after landing on empty space.

F2 — document the text=<literal> escape (incl. \s \t \n \r \\ backslash
escapes) in the send-keys keys-argument description so it is discoverable via
--help / cli-schema.json / the npm SDK, not only docs/ui-automation.md.
Regenerated cli-schema.json, plugin/Claude skills, and the npm command SDK.

F4 — the drag arg-shape change is pre-release and already reflected in
help/docs; the repo has no CHANGELOG, so no changelog entry is added.

Tests: 1509 pass, 5 pre-existing E2E_Node* env failures (need the npm wrapper
build, unrelated), 1 skip. New tests cover click/hover guard deny+allow paths,
the move-during-settle race, settleMs:0 on the press, and text= discoverability.
Live click/drag verification against a real app remains blocked by the lock
screen — these findings are unit-test-verified only.

Refs #587 #562 #498

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@azchohfi

azchohfi commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator Author

Thanks for the round-4 report — addressed the two verdict items plus the F2 quick win in 4517a1e.

F1 (click/hover foreground guard)click and hover now route through the same foreground gate as drag/scroll --wheel/send-keys. Rather than duplicate the static call, I introduced an IForegroundGuard DI seam (RealForegroundGuard wraps the existing ForegroundGuard.TryEnsureForeground, so the Classify logic stays the single source of truth) and injected it into all five gesture verbs. So click/hover now return the clean no_interactive_desktop on a locked session (instead of the misleading internal_error/SendInput failure) and gain the wrong-window foreground_not_target protection. The seam also makes the deny path unit-testable — added click+hover deny/allow tests.

F3/N5 (residual re-resolve→button-down race) — closed for click. After the stable re-resolve and foreground check, the command now positions the cursor (new IMouseInput.MoveCursor), runs the cursor-settle itself, then does a final GestureTargeting.ConfirmStillAsync read immediately before the button-down — and the press now uses settleMs: 0 so there's no second internal settle window. A target that drifts during the settle is caught and refused with target_moved instead of a false success. New test simulates a target that settles in ResolveStableAsync then moves on the confirm read → exit 1, no click injected.

F2 (text= discoverability) — added text=<literal> (incl. the \s \t \n \r \\ escapes) to the send-keys keys-argument description, so it now shows in --help, cli-schema.json, and the npm SDK — not just docs/ui-automation.md. Regenerated schema + plugin/Claude skills + npm commands.

F4 (changelog) — the repo has no CHANGELOG, and the breaking drag <from> <to> shape is already reflected in the verb help and docs/ui-automation.md, so there is nothing to add there. Happy to add one if you would like a CHANGELOG introduced.

Verification — F1 and F3 are deterministic logic fixes with full unit coverage; they do not depend on a live desktop. The F3 race is replicable in-process: the regression test drives a target that reads as "settled" in ResolveStableAsync and then drifts on the confirm read, and asserts the command refuses with target_moved and injects no click. I confirmed the test genuinely exercises the race by deleting the confirm read and watching it fail (a click lands on the moved target), then restoring it to green. The only check that still needs an unlocked session is the physical end-to-end input smoke — real-app pixel landing for click/drag and the WinUI 3 per-keystroke KeyDown spot-check — which exercises the OS input layer, not this logic. The PR stays draft for that manual gate.

Unit suite: 1509 pass, 1 skip, 5 pre-existing E2E_Node* env failures (need the npm wrapper build, unrelated to this change).

nmetulev and others added 3 commits June 26, 2026 10:46
send-keys' per-token text= escape (e.g. text=enter) is easy to forget and
awkward when the entire payload is literal text. Add a command-level
--verbatim flag that types the whole keys argument as a single literal:
no named-key/combo/vk=/text= interpretation, and exact whitespace preserved
(the normal path collapses internal whitespace runs to a single space).

- KeyStringParser.ParseVerbatim: whole-string TextInput, no tokenizing or
  escape decoding; throws on empty/whitespace like Parse.
- UiSendKeysCommand: --verbatim option; branch the parse path on it; help
  and keys-arg description point at it. Verbatim text is a TextInput, so the
  send-input system-combo guard correctly does not refuse e.g. "win+l".
- Tests: parser (literal / exact whitespace / no escape decode / empty) and
  command (types literal not keys, preserves whitespace, system-combo-as-text,
  missing keys still errors, option documented).
- Docs + skill fragment + agent subcommand list updated; regenerated
  cli-schema.json and npm SDK now carry --verbatim.

Refs #562, #587

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@azchohfi

Copy link
Copy Markdown
Collaborator Author

Added a --verbatim flag to ui send-keys (commit 4092cae).

Why: the per-token text= escape (e.g. text=enter) is easy to forget and awkward when the whole payload is literal text. --verbatim types the entire keys argument as one literal — no named-key/combo/vk=/text= parsing — and preserves exact internal whitespace (the normal path collapses runs of spaces to one, which text= could only fix with \s).

winapp ui send-keys "down down enter" -a myapp --verbatim   # types the words, doesn't press Down/Down/Enter
winapp ui send-keys "a  b" -a myapp --verbatim              # keeps the double space
  • KeyStringParser.ParseVerbatim — whole-string TextInput, no tokenizing/escape-decoding; throws on empty/whitespace like Parse.
  • Because verbatim text is a TextInput (not a chord), the --via send-input system-combo guard correctly does not refuse e.g. win+l typed as literal text.
  • New unit tests (parser + command), and the help text, cli-schema.json, npm SDK, docs, skill fragment, and agent subcommand list all updated.

Full suite green (1523 pass; the 5 E2E_Node* failures are the pre-existing env ones needing npm run build, unrelated). Still draft.

Address the pr-review findings on the send-keys (#562) and drag / scroll
--wheel (#498) input-injection verbs.

High:
- H1 ForegroundGuard: compare the foreground window's root ancestor
  (GetAncestor GA_ROOT) to the target instead of PID-matching, so a
  child/owned window of the target counts as "foreground" and a same-PID
  but different top-level window does not. Drops the unsafe block.
- H2 drag from-point settle race: move the cursor, wait, re-confirm the
  from element is still there (ConfirmStillAsync), re-check the foreground,
  then press with settleMs:0. Mirrors the click fix so "success" means the
  button-down landed on the resolved target, not empty space.

Medium:
- M1 KeyStringParser: a modifier-led chord with an empty segment around
  '+' ("ctrl++a", "ctrl+", "shift++enter") now throws instead of silently
  dropping the empty and pressing a different chord. Non-modifier-led
  tokens ("a+b", "C++", "ctrl+a+b") stay literal text.
- M2 scroll --wheel: same cursor-settle / confirm / foreground sequence as
  drag before synthesizing the wheel.
- M3 click: re-check the foreground immediately before the button-down
  (second gate), closing the same settle-window race on the success path.
- M7 drag: a comma-shaped from/to token that isn't a valid x,y pair
  ("100,", "100,200,300") returns invalid_arguments instead of being
  treated as a selector and misreported as element-not-found.
- M8 --verbatim: whitespace-only is legitimate verbatim content (types the
  spaces) rather than a "nothing to send" error; only genuinely empty is
  rejected.
- M9 send-keys --via send-input: refuse OS-wide injection when no target
  window resolves (keystrokes have no location to verify the foreground
  against) instead of typing blindly into whatever has focus.
- M10 send-keys: the human log reports the action count, not the raw
  keystrokes (which may carry secrets in persisted CI logs); --json still
  includes `keys` for opt-in callers.

Tests:
- M4 split the 1.5k-line UiCommandTests.cs into partial-class files
  (UiCommandTests.SendKeys.cs, UiCommandTests.Mouse.cs) so no file exceeds
  the ~1000-line guideline; behavior-preserving.
- M5 foreground-deny tests for drag, scroll --wheel, and send-keys
  --via send-input.
- M6 drag from-point moves-during-settle test (mirrors the click case).
- Added malformed-chord, verbatim-whitespace, malformed-coordinate, and
  no-resolvable-target tests.

1540 pass / 5 pre-existing E2E_Node env fails / 1 skip. cli-schema content
unchanged (no Description edits).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@azchohfi

Copy link
Copy Markdown
Collaborator Author

Self-review hardening round (commit 24d5575)

Ran a multi-dimension review over this branch and addressed every finding it surfaced (2 high, 10 medium). All changes are unit-covered and the full suite is green: 1540 pass / 1 skip, with the only failures being the 5 pre-existing E2E_Node* tests that need src/winapp-npm/dist/cli.js (environment, unrelated to this branch).

High

  • H1 — ForegroundGuard now compares the foreground window's root ancestor (GetAncestor(GA_ROOT)) to the target rather than matching PID, so a child/owned window of the target counts as "in front" and a same-PID-but-different-top-level window does not. Removed the unsafe block.
  • H2 — drag from-point settle race (the same class as the click N5/F3 fix): move the cursor, wait, re-confirm the from element is still there, re-check the foreground, then press with settleMs:0. A now means the button-down landed on the resolved target, not on space it drifted away from.

Medium

  • M2 / M3 — applied the same cursor-settle → confirm → second foreground gate to scroll --wheel and to click's success path.
  • M1 — a modifier-led chord with an empty segment (ctrl++a, ctrl+, shift++enter) now errors instead of silently dropping the empty and pressing a different chord; non-modifier-led tokens (a+b, C++, ctrl+a+b) stay literal.
  • M7 — a comma-shaped drag endpoint that isn't a valid x,y pair (100,, 100,200,300) returns invalid_arguments instead of being treated as a selector and misreported as element-not-found.
  • M8--verbatim treats a whitespace-only argument as content to type (it's the whole point of verbatim), not a "nothing to send" error.
  • M9send-keys --via send-input refuses OS-wide injection when no target window resolves, rather than typing blindly into whatever has focus.
  • M10 — the human log reports the action count, not the raw keystrokes (CI logs are persisted and a sequence may carry a typed secret); --json still includes keys for opt-in callers.

Tests

  • M4 — split the 1.5k-line UiCommandTests.cs into partial-class files (UiCommandTests.SendKeys.cs, UiCommandTests.Mouse.cs) so no file exceeds the ~1000-line guideline. Behavior-preserving; same [TestClass], same fixtures.
  • M5 / M6 — added foreground-deny tests (drag, scroll --wheel, send-keys send-input) and a drag from-point moves-during-settle test, plus the malformed-chord / verbatim-whitespace / malformed-coordinate / no-resolvable-target cases above.

cli-schema.json content is unchanged (no Description edits this round — verified the live --cli-schema is byte-identical to the committed file modulo the dev version stamp). Still a draft.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@nmetulev

Copy link
Copy Markdown
Member

🤖 AI-generated test report — round 5 (GitHub Copilot CLI, Claude Opus 4.8). The PR was updated again (24d5575, v0.4.1) directly in response to the round-4 review. I rebuilt, re-ran the unit suite (270 PR-relevant tests green), code-reviewed the deltas, and re-ran the live break-it matrix on fresh WinUI 3 + Electron harnesses (chaos timer moving/resizing a target every ~150 ms).

Round 5 — updated PR (24d5575): round-4 findings fixed

✅ Every round-4 finding addressed (verified live on both frameworks unless noted)

Round-4 finding Fix in this PR Verified
F1 click & hover lacked the foreground guard ForegroundGuard is now an injected IForegroundGuard and TryEnsureForeground is wired into all 5 gesture verbs (click, hover, drag, scroll --wheel, send-keys) code: hover :123, click :130+:157; + unit tests; live: click/hover correctly foreground-then-act
F3 moving-target "success" could still miss MoveCursor → settle → ConfirmStillAsync (fresh UIA re-read) → inject with settleMs: 0, so the button-down happens immediately after a confirm the headline result — see below
F2 text= undocumented in --help text= and the new --verbatim are now in the keys arg description + option help confirmed in send-keys --help
my ergonomics ask: a safe "type a string" path new --verbatim types the whole argument literally, exact whitespace preserved live: --verbatim "a b c"get-value returns "a b c" (len 8) on both WinUI + Electron

Unit tests: KeyStringParser 112, SystemKeyGuard 28, FrameworkHint 13, ForegroundGuard 6, UiCommand 111.

🎯 F3 (the round-4 silent-miss) — essentially fixed

Re-ran the moving-target break-it test (now at a faster 150 ms mutation). The new confirm-immediately-before-button-down closes almost the entire race:

Harness, chaos ON round 4 false-success round 5 false-success round-5 refusals
Electron (15 clicks) ~6/10 (~60%) 1/15 (~7%) 14 target_moved
WinUI (15 clicks) ~2/10 (~20%) 0/15 (0%) 2 target_moved + 13 zero_size
(baseline chaos OFF) 6/6 land (both) 0

drag and scroll --wheel against a moving target also now refuse cleanly with target_moved (exit 1, JSON). The residual ~7% on Electron is the irreducible microsecond gap between ConfirmStillAsync's UIA read and the OS SendInput button-down — about as tight as coordinate injection can get. A reported now almost always means the gesture landed (vs round-4 where, under animation, it frequently didn't).

✅ Regressions all clean on the refactored build

  • N6 framework-aware warning: post-message text warns on WinUI/XAML, silent on Electron (and types fine) — both correct.
  • SystemKeyGuard still hard-blocks ctrl+alt+del / alt+f4 / win+l via send-input (exit 1) on both.
  • text= per-token escape intact (text=the text=end → "the end"); bare enter still fires the key.
  • drag --hold-ms/--dwell-ms parse fine.

Verdict (round 5)

This closes the loop on the round-4 review — F1, F2, F3 are all addressed, --verbatim lands the ergonomics ask, and I found no new defects this round. Nicely done. The only thing left from my notes is philosophical (the text=/--verbatim "type a string" ergonomics are now solved two ways; the default bare-text path is still key-interpreted, which is the right call given --verbatim exists and is documented).

One thing I could not stage live: the no_interactive_desktop half of F1 (click/hover returning the clean locked-desktop error instead of round-4's internal_error) needs a locked session to demonstrate — verified in code (the guard now runs before the SendInput call) and by the consistent wiring across all five verbs.

Live testing on WinUI 3 + Electron/Chromium chaos harnesses (target moving/resizing every ~150 ms); both removed after testing. Generated by the GitHub Copilot CLI agent · 2026-06-30. 🤖

@nmetulev nmetulev merged commit 89def9f into main Jun 30, 2026
23 checks passed
@nmetulev nmetulev deleted the azchohfi-ui-input-verbs branch June 30, 2026 21:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants