Skip to content

feat(browser): compound expansion + cascading stale-ref + bbox 0.99 dedup#1116

Merged
jackwener merged 2 commits intomainfrom
feat/browser-agent-upgrades
Apr 21, 2026
Merged

feat(browser): compound expansion + cascading stale-ref + bbox 0.99 dedup#1116
jackwener merged 2 commits intomainfrom
feat/browser-agent-upgrades

Conversation

@jackwener
Copy link
Copy Markdown
Owner

Summary

Three agent-native upgrades inspired by browser-use, bundled as one PR because they share the same target / snapshot / find surface.

Priority ordering matches the agent-success lens WAWQAQ set: #2 compound (highest lift) → #1 cascading stale-ref → #4 bbox containment.

1. Compound-component expansion — src/browser/compound.ts (new)

browser find --css entries now carry a compound field for the three input categories agents burn turns on:

  • date / time / datetime-local / month / week{control, format: "YYYY-MM-DD", current, min?, max?} so agents stop typing "April 21 2026" into date inputs and silently failing.
  • select{control: "select", multiple, current, options: [{label, value, selected, disabled?}], options_total}. Snapshots cap options at 6; this caps at 50 with full options_total. Agents can match by label without opening the dropdown.
  • file{control: "file", multiple, accept?, current: [names]} so agents don't re-upload what's already there.

Helper is shipped as an inlined JS source string so it can be reused by snapshot / eval later without a second code path.

2. Cascading stale-ref — src/browser/target-resolver.ts

Strict fingerprint equality was stalling SPA re-renders / i18n label swaps. Ref path now walks three tiers before giving up:

tier condition
exact tag + every non-empty stored field agrees
stable tag + strong id (id / testId) agrees; soft signals may have drifted
reidentified original ref missing or mismatched, but fp uniquely identifies a live node — re-tag + refresh identity map

Every success envelope now carries match_level. Only when all three tiers miss do we emit stale_ref.

3. BBox 0.99 containment filter — src/browser/dom-snapshot.ts

Adds a second dedup tier on top of the existing 0.95 non-interactive one. When a parent is a propagator (tag a / button OR role button / link / menuitem / tab / option) and a child is interactive but undistinctive (no aria-label / id / data-testid / name / own form-control), fold it into the parent.

Removes the [1]<button> [2]<svg> [3]<span> ref-explosion on icon buttons. Nested <input> / <a href> stay distinct by design.

Test plan

  • src/browser/compound.test.ts — 18 cases exercising date / select / file variants via a sandbox-executed compoundInfoOf(el)
  • src/browser/find.test.ts — asserts compoundInfoOf is inlined and entries opt-in only when compound data exists
  • src/browser/target-resolver.test.ts — asserts all three match_level values appear, classifier + reidentifier are wired, fallthrough ordering is correct
  • src/browser/dom-snapshot.test.ts — asserts 0.99 tier coexists with 0.95 tier, PROPAGATING_ROLES / isBboxPropagator / isDistinctivelyInteractive are emitted
  • Full run: 287/287 in src/browser/ + src/cli.test.ts pass
  • tsc --noEmit clean

cc @codex-mini1 @First-principles-1 for review.

…edup

Three agent-native upgrades inspired by browser-use, landed as one PR
because they share the same target / snapshot / find surface.

  1. Compound expansion (compound.ts)
     Date/time/datetime-local/month/week, select, and file inputs now
     emit a `compound` JSON field on `browser find --css` entries with
     format, current value, min/max (date family), full options list
     + selected (select), accept / multiple / files[] (file). Kills
     the three biggest form-page failure modes (wrong date format,
     guessed options, re-uploaded files) without extra round-trips.

  2. Cascading stale-ref (target-resolver.ts)
     Numeric ref resolution now walks three tiers before giving up:
     exact → stable (tag + strong id match, soft signals drifted) →
     reidentified (original ref lost, fingerprint uniquely found a
     live element; re-tag + refresh identity). Every success envelope
     carries `match_level` so callers can tell which tier matched.
     SPA re-renders / i18n label swaps no longer stall agents.

  3. BBox 0.99 containment for interactive descendants (dom-snapshot.ts)
     Adds a second dedup tier on top of the existing 0.95 non-interactive
     one. When a parent is a propagator (tag a/button OR role button/
     link/menuitem/tab/option) and a child is interactive but
     undistinctive (no aria-label/id/testid/name/form-control), fold
     it into the parent — removes `[1]<button> [2]<svg> [3]<span>`
     noise on icon buttons.

Tests: 287/287 pass (src/browser + src/cli.test.ts). Typecheck clean.
- compound select: walk ALL options to collect selected labels, not just
  the first 50 we serialize. Fixes dropdowns where the selected entry
  sits past COMPOUND_SELECT_OPTIONS_CAP (e.g. country lists, timezones)
  reporting current: "" even though the user picked a valid option.
- match_level: propagate the cascading match tier
  (exact / stable / reidentified) through IPage.click/typeText/scrollTo,
  BasePage, and the cli command envelopes (click / type / select /
  get text|value|html|attributes). Agents now see in JSON that the
  resolver had to fall back, instead of the tier being swallowed.
- compound contract is now also emitted by `browser state`
  (per-ref compounds: sidecar) and by `browser get html --as json`
  (compound field on each node), not only by `browser find --css`.
  Closes the gap where agents using the default snapshot still
  round-tripped `find` for every date / select / file control.

Adds targeted regression tests for each blocker + updates cli.test.ts
mocks to the new envelope shape.
@jackwener jackwener merged commit 04a5a17 into main Apr 21, 2026
13 checks passed
jackwener added a commit that referenced this pull request Apr 21, 2026
Restore `skills/opencli-browser/SKILL.md`, deleted in #1094, rewritten for
the post-#1116 browser CLI surface: selector-first target contract,
`match_level { exact | stable | reidentified }`, compound fields for
date/time/select/file, structured error codes with `available` vs
`candidates`, new `find` / `extract` / `network --filter` commands,
html tree budgets, tabs/frames, cost guide, recipes, pitfalls.

Review tightened two contract-drift bugs before merge:
- `browser tab list` envelope field is `page`, not `targetId`
- `network --ttl` default is `24h`, not `~5min`

2 reviewers green (codex-mini1, First-principles-1); CI all-green.
luxiaolei pushed a commit to luxiaolei/OpenCLI that referenced this pull request Apr 22, 2026
…edup (jackwener#1116)

* feat(browser): compound expansion + cascading stale-ref + bbox 0.99 dedup

Three agent-native upgrades inspired by browser-use, landed as one PR
because they share the same target / snapshot / find surface.

  1. Compound expansion (compound.ts)
     Date/time/datetime-local/month/week, select, and file inputs now
     emit a `compound` JSON field on `browser find --css` entries with
     format, current value, min/max (date family), full options list
     + selected (select), accept / multiple / files[] (file). Kills
     the three biggest form-page failure modes (wrong date format,
     guessed options, re-uploaded files) without extra round-trips.

  2. Cascading stale-ref (target-resolver.ts)
     Numeric ref resolution now walks three tiers before giving up:
     exact → stable (tag + strong id match, soft signals drifted) →
     reidentified (original ref lost, fingerprint uniquely found a
     live element; re-tag + refresh identity). Every success envelope
     carries `match_level` so callers can tell which tier matched.
     SPA re-renders / i18n label swaps no longer stall agents.

  3. BBox 0.99 containment for interactive descendants (dom-snapshot.ts)
     Adds a second dedup tier on top of the existing 0.95 non-interactive
     one. When a parent is a propagator (tag a/button OR role button/
     link/menuitem/tab/option) and a child is interactive but
     undistinctive (no aria-label/id/testid/name/form-control), fold
     it into the parent — removes `[1]<button> [2]<svg> [3]<span>`
     noise on icon buttons.

Tests: 287/287 pass (src/browser + src/cli.test.ts). Typecheck clean.

* fix(browser): address reviewer blockers on PR jackwener#1116

- compound select: walk ALL options to collect selected labels, not just
  the first 50 we serialize. Fixes dropdowns where the selected entry
  sits past COMPOUND_SELECT_OPTIONS_CAP (e.g. country lists, timezones)
  reporting current: "" even though the user picked a valid option.
- match_level: propagate the cascading match tier
  (exact / stable / reidentified) through IPage.click/typeText/scrollTo,
  BasePage, and the cli command envelopes (click / type / select /
  get text|value|html|attributes). Agents now see in JSON that the
  resolver had to fall back, instead of the tier being swallowed.
- compound contract is now also emitted by `browser state`
  (per-ref compounds: sidecar) and by `browser get html --as json`
  (compound field on each node), not only by `browser find --css`.
  Closes the gap where agents using the default snapshot still
  round-tripped `find` for every date / select / file control.

Adds targeted regression tests for each blocker + updates cli.test.ts
mocks to the new envelope shape.

(cherry picked from commit 04a5a17)
luxiaolei pushed a commit to luxiaolei/OpenCLI that referenced this pull request Apr 22, 2026
Restore `skills/opencli-browser/SKILL.md`, deleted in jackwener#1094, rewritten for
the post-jackwener#1116 browser CLI surface: selector-first target contract,
`match_level { exact | stable | reidentified }`, compound fields for
date/time/select/file, structured error codes with `available` vs
`candidates`, new `find` / `extract` / `network --filter` commands,
html tree budgets, tabs/frames, cost guide, recipes, pitfalls.

Review tightened two contract-drift bugs before merge:
- `browser tab list` envelope field is `page`, not `targetId`
- `network --ttl` default is `24h`, not `~5min`

2 reviewers green (codex-mini1, First-principles-1); CI all-green.

(cherry picked from commit 2f66d48)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant