feat: Ref-Backed Locator for browser actions #1016
Merged
Introduces a unified target resolution system with fingerprint verification and structured error diagnostics.

Snapshot phase:
- Each interactive element now gets a fingerprint (tag, role, text, ariaLabel, id, testId) stored in window.__opencli_ref_identity
- Zero overhead: metadata is already available during DOM walk

Resolution phase (new target-resolver.ts):
- Numeric input → ref path with fingerprint verification
- CSS-like input → querySelectorAll with uniqueness check
- No more silent first-match: ambiguous selectors are rejected

Error model (new target-errors.ts):
- stale_ref: element identity changed since snapshot
- ambiguous: CSS selector matched multiple elements (with candidates)
- not_found: element not in DOM or invalid input
- All errors include actionable hints for AI agents

base-page.ts:
- click() and typeText() now use two-phase resolve-then-act
- Existing CDP fallback for click preserved
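A minimal sketch of the resolution phase and error model, written as plain TypeScript over an in-memory element model so it runs outside a browser. `ElementLike`, `Fingerprint`, `resolveTarget`, `renderTargetError` and the hint strings are hypothetical; in the PR, resolution is JavaScript generated by `resolveTargetJs()` and evaluated in the page, and the real identity check compares more fields than the exact tag/role/text equality used here.

```typescript
// Hypothetical sketch of the resolve phase and error model; not OpenCLI's actual code.
type TargetErrorCode = "stale_ref" | "ambiguous" | "not_found";

class TargetError extends Error {
  readonly code: TargetErrorCode;
  readonly hint: string;
  readonly candidates: string[];
  constructor(code: TargetErrorCode, message: string, hint: string, candidates: string[] = []) {
    super(message);
    this.name = "TargetError";
    this.code = code;
    this.hint = hint;
    this.candidates = candidates;
  }
}

// Minimal stand-in for a DOM element, carrying only the fields compared here.
interface ElementLike {
  tag: string;
  role?: string;
  text?: string;
}

// Identity recorded at snapshot time (the PR also records ariaLabel, id and testId).
type Fingerprint = ElementLike;

function sameIdentity(fp: Fingerprint, el: ElementLike): boolean {
  return fp.tag === el.tag && fp.role === el.role && fp.text === el.text;
}

function resolveTarget(
  input: string,
  liveRefs: Map<string, ElementLike>, // ref -> element currently carrying that ref
  identity: Map<string, Fingerprint>, // ref -> fingerprint captured during the snapshot
  queryAll: (selector: string) => ElementLike[],
): ElementLike {
  // Numeric input: ref path with fingerprint verification.
  if (/^\d+$/.test(input)) {
    const el = liveRefs.get(input);
    const fp = identity.get(input);
    if (!el || !fp) {
      throw new TargetError("not_found", `ref ${input} is not in the DOM`, "Take a fresh snapshot and use a current ref.");
    }
    if (!sameIdentity(fp, el)) {
      throw new TargetError("stale_ref", `ref ${input} changed since the snapshot`, "Take a fresh snapshot; the page has re-rendered.");
    }
    return el;
  }
  // CSS-like input: must match exactly one element, never a silent first match.
  const matches = queryAll(input);
  if (matches.length === 0) {
    throw new TargetError("not_found", `no element matches "${input}"`, "Check the selector or target a numeric ref.");
  }
  if (matches.length > 1) {
    const candidates = matches.map((m) => `<${m.tag}> ${m.text ?? ""}`.trim());
    throw new TargetError(
      "ambiguous",
      `"${input}" matched ${matches.length} elements`,
      "Use a more specific selector or one of the numeric refs.",
      candidates,
    );
  }
  return matches[0];
}

// CLI-side rendering: code, hint and candidates rather than the bare message.
function renderTargetError(e: TargetError): string {
  const lines = [`Error [${e.code}]: ${e.message}`, `Hint: ${e.hint}`];
  if (e.candidates.length > 0) {
    lines.push("Candidates:", ...e.candidates.map((c) => `  - ${c}`));
  }
  return lines.join("\n");
}
```

Rejecting multi-match selectors is stricter than taking the first match, but the candidates list gives an agent enough to pick a more specific target on its next turn.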
scrollTo now uses the same two-phase resolve-then-act pattern as click and typeText, getting fingerprint verification and structured error diagnostics (stale_ref/ambiguous/not_found) for free.
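The resolve-then-act pattern that click(), typeText() and now scrollTo() share can be sketched as one generic helper. `ResolveResult`, `resolveThenAct` and the `FakeEl` stand-in are hypothetical; in the PR each phase is JavaScript evaluated in the page (for example `resolveTargetJs()` followed by `clickResolvedJs()`), not a TypeScript callback.

```typescript
// Hypothetical sketch of two-phase resolve-then-act; names are illustrative only.
type ResolveResult<E> =
  | { ok: true; el: E }
  | { ok: false; code: "stale_ref" | "ambiguous" | "not_found"; hint: string };

// Phase 1 resolves and verifies; phase 2 runs only on an element that passed.
function resolveThenAct<E, R>(resolve: () => ResolveResult<E>, act: (el: E) => R): R {
  const res = resolve();
  if (!res.ok) {
    // Fail with the structured code instead of acting on a guessed element.
    throw new Error(`${res.code}: ${res.hint}`);
  }
  return act(res.el);
}

// In-memory stand-in for a page element.
interface FakeEl {
  clicks: number;
  scrolledIntoView: boolean;
}

// Each action supplies only its phase-2 behaviour; verification is shared.
function click(resolve: () => ResolveResult<FakeEl>): void {
  resolveThenAct(resolve, (el) => {
    el.clicks += 1;
  });
}

function scrollTo(resolve: () => ResolveResult<FakeEl>): void {
  resolveThenAct(resolve, (el) => {
    el.scrolledIntoView = true;
  });
}
```

Because the action callbacks never see an unverified element, a newly migrated action inherits the stale_ref/ambiguous/not_found behaviour without any action-specific checks.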
…getError in CLI

1. Fingerprint verification now uses the full identity vector (tag, id, testId, ariaLabel, role, text) instead of just tag/role/text. Strong identifiers (id, testId) are decisive; remaining signals use majority voting. Fixes false negatives where same-tag elements were swapped.
2. browserAction() now renders TargetError with code, hint, and candidates list instead of just the message string.
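One way to read "strong identifiers are decisive; remaining signals use majority voting" is sketched below. `Identity` and `matchesFingerprint` are hypothetical names, and two details are assumptions rather than taken from the PR: a tag mismatch always fails, and "majority" means strictly more than half of the weak signals the snapshot actually recorded. Text is compared by exact equality here for brevity.

```typescript
// Hypothetical sketch; field names mirror the PR's identity vector.
interface Identity {
  tag: string;
  id?: string;
  testId?: string;
  ariaLabel?: string;
  role?: string;
  text?: string;
}

function matchesFingerprint(fp: Identity, live: Identity): boolean {
  // Assumption: an element with a different tag is never the same element.
  if (fp.tag !== live.tag) return false;

  // Strong identifiers are decisive: if the snapshot recorded any, they settle it.
  const strong = (["id", "testId"] as const).filter((k) => Boolean(fp[k]));
  if (strong.length > 0) return strong.every((k) => fp[k] === live[k]);

  // Weak signals: majority vote over the fields the snapshot recorded.
  const weak = (["ariaLabel", "role", "text"] as const).filter((k) => Boolean(fp[k]));
  if (weak.length === 0) return true; // only the tag was available to compare
  const agree = weak.filter((k) => fp[k] === live[k]).length;
  return agree * 2 > weak.length; // strictly more than half must agree
}
```

With ids present, two same-tag buttons that trade places are caught at once because the recorded id no longer agrees; without ids, a single drifted label does not by itself flag an otherwise matching element.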
- browser get text/value/attributes now resolve via resolveTargetJs instead of raw querySelector, getting fingerprint verification and structured errors for free
- browser select uses selectResolvedJs on the __resolved element
- The type command's autocomplete detection uses isAutocompleteResolvedJs on the already-resolved element
- Fix empty-string text prefix match: fp.text="Login" with text="" no longer falsely passes the fingerprint check
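The empty-string bug comes from prefix matching: `"Login".startsWith("")` is `true` in JavaScript, so an element whose text became empty still matched its old fingerprint. Below is a minimal sketch of a guarded comparison, assuming text is matched by prefix in either direction (for example to tolerate truncated snapshot text); `textMatches`, `textMatchesUnguarded` and the trimming are hypothetical, not the PR's exact code.

```typescript
// Hypothetical sketch; the bidirectional prefix rule is an assumption.

// Unguarded variant: the empty string is a prefix of every string.
function textMatchesUnguarded(fpText: string, liveText: string): boolean {
  return fpText.startsWith(liveText) || liveText.startsWith(fpText);
}

// Guarded variant: empty text only matches empty text.
function textMatches(fpText: string, liveText: string): boolean {
  const a = fpText.trim();
  const b = liveText.trim();
  if (a === b) return true; // identical, including both empty
  if (a === "" || b === "") return false; // the fix: "" no longer prefix-matches "Login"
  return a.startsWith(b) || b.startsWith(a);
}
```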
Summary
- …`resolveTargetJs`) that replaces the ad-hoc 4-strategy fallback in `dom-helpers.ts`
- `stale_ref` (element identity changed), `ambiguous` (CSS matched multiple), `not_found` (element gone): all with actionable hints
- `base-page.ts`: Phase 1 resolves and validates the target, Phase 2 executes click/type on the verified element

Changes
- `src/browser/target-resolver.ts`: `resolveTargetJs()`, `clickResolvedJs()`, `typeResolvedJs()`
- `src/browser/target-errors.ts`: `TargetError` class with `code`, `hint`, `candidates`
- `src/browser/dom-snapshot.ts`: generates the `__opencli_ref_identity` fingerprint map during snapshot
- `src/browser/base-page.ts`: `click()` and `typeText()` use the two-phase resolve-then-act pattern

Design context
Discussed in a thread with @codex-coder. Key insight: OpenCLI doesn't need a full Locator DSL, because the agent loop gets a fresh snapshot each turn. The real gap is verification (detecting stale or ambiguous refs) and diagnostics (structured errors with actionable hints).
Test plan
- `opencli browser state` → click by ref → verify the fingerprint check works
- `stale_ref` error with hint
- `ambiguous` error with candidates list