feat(tool-server): implement await-ui-element tool by hubgan · Pull Request #396 · software-mansion/argent

hubgan · 2026-06-23T10:04:29Z

Summary

Adds an await-ui-element tool that blocks until a UI condition is satisfied or a timeout elapses.

This is one of the most impactful missing tools. Without it, agents have to poll manually with screenshot → describe → check loops — slow, token-heavy, and unreliable (fixed sleep delays either over-wait or fire before the screen settles). await-ui-element moves the poll loop server-side: one call in, a definitive verdict out.

Works on iOS, Android, and Chromium (CDP), polling the same accessibility / DOM tree as describe.

Conditions

visible <selector> — wait until an element appears on screen
hidden <selector> — wait until an element disappears
exists <selector> — wait until an element is in the tree
text <selector> <expectedText> — wait until the matched element contains the text

selector is { text?, identifier?, role? } — every field provided must match (case-insensitive substring). Polls internally; default timeoutMs 5000, default pollIntervalMs 400. Returns { success: boolean, elapsed: number }, with a note explaining what was seen on timeout.

{ "udid": "<UDID>", "condition": "visible", "selector": { "text": "Continue" } }

Also usable as a step inside run-sequence, so a tap → await-ui-element → tap chain runs in a single call.
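
A minimal sketch of the selector rule described above (every provided field must match as a case-insensitive substring). The `UiNode` and `Selector` shapes and the `matchesSelector` helper are illustrative names assumed for this example; they are not taken from the PR's code.

```typescript
// Hypothetical node shape; the real describe tree carries more fields.
interface UiNode {
  text?: string;
  identifier?: string;
  role?: string;
}

// Hypothetical selector shape mirroring { text?, identifier?, role? }.
type Selector = Partial<Pick<UiNode, "text" | "identifier" | "role">>;

// Case-insensitive substring test; a node without the field never matches.
function containsIgnoreCase(haystack: string | undefined, needle: string): boolean {
  return haystack !== undefined && haystack.toLowerCase().includes(needle.toLowerCase());
}

// Every field present in the selector must match; omitted fields are ignored.
function matchesSelector(node: UiNode, selector: Selector): boolean {
  const fields: Array<keyof Selector> = ["text", "identifier", "role"];
  return fields.every((field) => {
    const wanted = selector[field];
    return wanted === undefined || containsIgnoreCase(node[field], wanted);
  });
}
```

In this sketch an empty selector matches every node. Whether the real schema rejects an empty selector is not stated in the description.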

Notable details

Multi-match aware — a substring selector can hit several nodes, so conditions evaluate the whole match set: visible holds if any match is on-screen, hidden only if none is (a zero-area container can't flip the verdict).
hidden guard — a selector that matches nothing counts as already-hidden; the note flags when the selector never matched, so a typo doesn't read as a silent success.
Deadline-clamped poll — a large pollIntervalMs can't overshoot timeoutMs.
Abort-aware — both the time sleep and the poll loop stop promptly on cancellation.
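
The deadline clamp and the cancellation check can be sketched roughly as follows. This is not the tool's implementation: `pollUntil`, `clampedSleepMs`, `PollDeps`, and `AbortLike` are names assumed for illustration, and the clock and sleep are injected so the behaviour can be checked without real timers.

```typescript
// Minimal cancellation shape; the standard AbortSignal satisfies it structurally.
interface AbortLike {
  readonly aborted: boolean;
}

// Injected dependencies (assumed for this sketch) so a fake clock can drive the loop.
interface PollDeps {
  now: () => number;
  sleep: (ms: number) => Promise<void>;
}

interface PollResult {
  success: boolean;
  elapsed: number;
}

// Sleep for the poll interval, but never past the deadline.
function clampedSleepMs(now: number, deadline: number, pollIntervalMs: number): number {
  return Math.max(0, Math.min(pollIntervalMs, deadline - now));
}

async function pollUntil(
  check: () => Promise<boolean>,
  timeoutMs: number,
  pollIntervalMs: number,
  deps: PollDeps,
  signal?: AbortLike,
): Promise<PollResult> {
  const start = deps.now();
  const deadline = start + timeoutMs;
  while (true) {
    // Cancellation is checked before every poll.
    if (signal?.aborted) throw new Error("aborted");
    if (await check()) return { success: true, elapsed: deps.now() - start };
    // The deadline is tested after the check, so one final poll lands on the deadline.
    if (deadline - deps.now() <= 0) return { success: false, elapsed: deps.now() - start };
    await deps.sleep(clampedSleepMs(deps.now(), deadline, pollIntervalMs));
  }
}
```

With timeoutMs 1000 and pollIntervalMs 400, this loop sleeps 400, 400, then 200 ms, so the last poll happens at the deadline rather than after it. The sketch only checks for cancellation between polls; stopping promptly during a sleep would additionally need an abortable sleep.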

Changes

tools/await-ui-element/index.ts — new tool (factory, mirrors describe's service resolution)
utils/setup-registry.ts — registers the tool
tools/run-sequence/index.ts — allows await-ui-element as a step + arg-doc + example
test/await-ui-element.test.ts — new test suite
skills/rules/argent.md + argent-device-interact/SKILL.md — document await-ui-element, replace the "don't poll screenshot" guidance with it

Test plan

time sleeps and succeeds without touching a device
visible succeeds once the element appears; times out with a diagnostic note when it doesn't
exists / hidden / text happy paths; text timeout note reports last-seen text
hidden instant success and the "selector never matched" flag
poll sleep clamped to the deadline (large pollIntervalMs can't overshoot timeoutMs)
prompt cancellation aborts both the poll loop and the time sleep
multi-match evaluation (findAll / evaluateMatches)
schema validation (required fields per condition, durationMs cap)

latekvo

This sounds like a nitpick, but I think it might be quite important for the agent. Let's give this tool one of these names, whichever you believe fits best: wait-for, wait-for-ui, or await-ui-element.

The reason is that by default the only thing the agent sees is tool name.
The agent then has to manually call ToolSearch tool to view descriptions of the tools it wants to call.
Given that context, wait alone may be too broad or misleading to the agent.

Note that we already use await in our await_user_selection tool, so following that tool naming convention would make some sense.

latekvo

Reviewed end-to-end, including running the built tool-server against a live iOS simulator over HTTP: visible / text succeed once the element appears, hidden correctly refuses to fire while the element is still on screen, an unmet wait inside run-sequence stops the sequence so the trailing tap never runs, and the schema rejects text without expectedText. Cross-platform tree handling, abort/cancellation propagation, and the per-fetch deadline all check out. All eight earlier review threads are addressed in this revision (replied + resolved individually).

The inline notes below are minor and don't block.

One item that can't be anchored to a line: the PR title and description still document the previous wait tool and its time / durationMs sleep condition, which were removed in the rename to await-ui-element. The test-plan checkboxes and the Changes file paths (tools/wait/index.ts, test/wait.test.ts) are stale relative to the merged code and would carry into the squash-commit message — worth refreshing before merge.

latekvo · 2026-06-24T14:58:32Z

+  exists   — the selector matches an element anywhere in the tree.
+  visible  — the selector matches an element with a non-zero on-screen frame.
+  hidden   — the selector matches nothing, or only a zero-area element (e.g. a spinner that disappeared).
+  text     — the element matched by the selector contains expectedText (case-insensitive substring).


This describes text as checking "the element matched by the selector," but the selector is a substring match that can hit several nodes and the condition only inspects the first match in reading order. When a broad selector matches more than one element and a lower one — not the topmost — is the one containing expectedText, the wait reports failure even though an on-screen matching element does contain the text. The single-element phrasing doesn't signal that the verdict is keyed to the topmost match only, so an agent using a loose selector can be surprised by a false negative.
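
To make the false negative concrete, here is a small sketch contrasting a first-match check with a check over the whole match set. `MatchedNode`, `textHoldsFirstMatch`, and `textHoldsAnyMatch` are hypothetical names for illustration. The first function models the behaviour described in this comment, and the second is one possible alternative, not a claim about what the merged code does.

```typescript
// Hypothetical: nodes already matched by the selector, in reading order.
interface MatchedNode {
  text: string;
}

function includesIgnoreCase(haystack: string, needle: string): boolean {
  return haystack.toLowerCase().includes(needle.toLowerCase());
}

// Verdict keyed to the topmost match only (the behaviour described above).
function textHoldsFirstMatch(matches: MatchedNode[], expected: string): boolean {
  return matches.length > 0 && includesIgnoreCase(matches[0].text, expected);
}

// Alternative: the condition holds if any matched node contains the text.
function textHoldsAnyMatch(matches: MatchedNode[], expected: string): boolean {
  return matches.some((node) => includesIgnoreCase(node.text, expected));
}
```

For a selector such as { text: "Order" } that matches both "Order summary" and "Order total: 42", waiting for expectedText "total" fails under the first-match rule and succeeds under the any-match rule.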

latekvo · 2026-06-24T14:58:33Z

+
+// Every node matching the selector in the subtree, EXCLUDING `root` itself.
+// `root` is the synthetic full-screen container every describe adapter injects
+// (iOS `AXGroup`, Android `hierarchy`/`Screen`, Chromium `html`; frame


This rationale states every adapter injects a synthetic full-screen root with frame 0,0,1,1, but on Chromium the root is the real <html> element (describeChromium walks document.documentElement) and its frame comes from getBoundingClientRect, not a synthetic 0,0,1,1. Excluding it is still correct, but the stated reasoning misdescribes the Chromium case. A side effect worth noting: because <html>'s own id / aria-label / author role sit on the excluded root, a selector targeting those attributes matches nothing on Chromium.
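
A rough sketch of the root-excluding traversal, to illustrate the side effect mentioned above. Only the name `findAll` comes from the PR; its signature here, the `TreeNode` shape, and the predicate argument are assumptions made for the example.

```typescript
// Hypothetical tree node; the real describe node has more fields.
interface TreeNode {
  identifier?: string;
  role?: string;
  children: TreeNode[];
}

// Collects every descendant that satisfies the predicate, in pre-order,
// deliberately skipping `root` itself.
function findAll(root: TreeNode, predicate: (node: TreeNode) => boolean): TreeNode[] {
  const out: TreeNode[] = [];
  const visit = (node: TreeNode): void => {
    if (predicate(node)) out.push(node);
    node.children.forEach((child) => visit(child));
  };
  root.children.forEach((child) => visit(child));
  return out;
}
```

Because the walk starts at `root.children`, an attribute that exists only on the root (for example an id on <html> in the Chromium case) can never be matched, even though the same attribute on any descendant would be.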

latekvo · 2026-06-24T14:58:33Z

+// describe prunes off-screen / zero-size nodes on Chromium and the compressed
+// Android dump, and iOS AX only returns on-screen leaves — so a non-zero frame
+// area is a cheap, reliable proxy for "visible".
+function isVisible(node: DescribeNode): boolean {


findAll matches against every node in the tree, while describe's rendered body only emits nodes that pass its content/role filter. As a result a role- or identifier-only selector can match a structural container (e.g. an unlabeled AXGroup) that describe never lists, so visible / exists can hold for an element the agent never saw in describe's output. The comment frames the match set as mirroring format-tree, but only the root exclusion is shared — the per-node visibility gate is not.
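
The container case can be illustrated with a sketch of the frame-area proxy. Only `isVisible` is a name from the diff; the `Frame` and `FramedNode` shapes, `evaluateVisibility`, and the stricter `isListedAndVisible` gate are assumptions for this example, and the label check is only a stand-in for describe's actual content/role filter.

```typescript
// Hypothetical frame and node shapes.
interface Frame {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface FramedNode {
  role: string;
  label?: string;
  frame: Frame;
}

// Non-zero frame area as a proxy for "visible".
function isVisible(node: FramedNode): boolean {
  return node.frame.width * node.frame.height > 0;
}

// visible holds if any match is on-screen; hidden holds only if none is.
function evaluateVisibility(matches: FramedNode[]): { visible: boolean; hidden: boolean } {
  const visible = matches.some(isVisible);
  return { visible, hidden: !visible };
}

// Hypothetical stricter gate: also require a non-empty label, as a rough
// approximation of "describe would have listed this node".
function isListedAndVisible(node: FramedNode): boolean {
  return isVisible(node) && (node.label ?? "").trim().length > 0;
}
```

An unlabeled AXGroup with a non-zero frame passes `isVisible` but fails the stricter gate, which is the gap this comment points at.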

latekvo · 2026-06-24T14:58:33Z

+// such a tree is not evidence the element is gone, so we must not let `hidden`
+// resolve positively off it — otherwise "AX is down" reads as "element hidden".
+function isBlindRead(data: DescribeTreeData): boolean {
+  return data.tree.children.length === 0 && Boolean(data.hint || data.should_restart);


isBlindRead only treats an empty tree as unreliable when hint or should_restart is set, and only the iOS path ever sets those. On Android and Chromium an empty tree is always treated as a confirmed read, so a hidden (or text-absence) wait resolves immediately on any empty tree, including a momentary blank frame mid-navigation. When the element matched on an earlier poll (everMatched true) and a later poll lands on a transient empty tree, hidden returns success with no caveat — which can release a gated tap against a screen that only briefly went blank.
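
One possible guard against the transient-blank case, sketched under assumptions: require several consecutive empty reads before an empty tree is accepted as evidence that the element is gone. `PollRead` and `makeHiddenEvaluator` are hypothetical names, and this is not necessarily how the follow-up commit addresses the issue.

```typescript
// Hypothetical summary of a single poll of the tree.
interface PollRead {
  treeEmpty: boolean;     // the tree had no children at all
  matchVisible: boolean;  // at least one selector match was on-screen
}

// Returns a stateful evaluator for the `hidden` condition. An empty tree only
// counts as "hidden" after `requiredEmptyReads` consecutive empty reads.
function makeHiddenEvaluator(requiredEmptyReads: number): (read: PollRead) => boolean {
  let consecutiveEmpty = 0;
  return (read: PollRead): boolean => {
    if (read.matchVisible) {
      consecutiveEmpty = 0;
      return false; // element is still on screen
    }
    if (!read.treeEmpty) {
      consecutiveEmpty = 0;
      return true; // populated tree with no visible match: genuinely hidden
    }
    consecutiveEmpty += 1;
    return consecutiveEmpty >= requiredEmptyReads;
  };
}
```

With requiredEmptyReads set to 2, a single blank frame mid-navigation does not release the wait, while a tree that stays empty across two polls does. The trade-off is one extra poll interval of latency when the screen really is empty.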

hubgan marked this pull request as ready for review June 23, 2026 10:18

latekvo reviewed Jun 24, 2026

Comment thread packages/skills/skills/argent-device-interact/SKILL.md Outdated

latekvo reviewed Jun 24, 2026

hubgan added 3 commits June 24, 2026 15:57

feat(tool-server): implement wait tool

1b24139

chore: run format

9cc7bf7

fix: gate run-sequence and flows

03023d9

hubgan force-pushed the feat/wait-tool branch from 36fbe79 to 03023d9 Compare June 24, 2026 14:27

latekvo approved these changes Jun 24, 2026

hubgan changed the title ~~feat(tool-server): implement wait tool~~ feat(tool-server): implement await-ui-element tool Jun 24, 2026

fix: guard hidden against transient empty trees + correct match docs

f0a714d

hubgan merged commit 5558774 into main Jun 24, 2026
7 checks passed

hubgan deleted the feat/wait-tool branch June 24, 2026 16:30

hubgan merged 4 commits into main from feat/wait-tool

2 participants
