browser: add coordinate clicks and act verification#54452

Closed
dluttz wants to merge 1 commit into openclaw:main from dluttz:codex/browser-heygen-editor
Conversation

@dluttz dluttz commented Mar 25, 2026

No description provided.

@greptile-apps
Contributor

greptile-apps Bot commented Mar 25, 2026

Greptile Summary

This PR adds two primary capabilities to the browser automation layer:

  • Coordinate-based clicks (the click-coords action), dispatched via CDP Input.dispatchMouseEvent on the Playwright path, or via document.elementFromPoint plus synthetic JS events on the Chrome MCP / existing-session path.
  • Act verification: an optional post-action screenshot that is captured and persisted when verifyWith: "screenshot" is passed.

It also adds a pre-action scroll-into-view step before element-based click and type actions to handle off-screen targets, a lowercase scrollintoview alias, and a best-effort CDP bootstrap hook (bootstrapChromeMcpDevtools) that fires on session creation.

Key observations:

  • Behavioral inconsistency in click-coords: The Chrome MCP path returns 404 when document.elementFromPoint finds no element at the requested coordinates, while the Playwright/CDP path always returns 200 (CDP events fire unconditionally). A caller using the same endpoint across session types would see different status codes for the same empty-space click.
  • bootstrapChromeMcpDevtools exits on first command failure: If Target.setAutoAttach cannot be called (all 4 argument variants fail), the loop breaks and Runtime.enable / Log.enable are never attempted, even though those commands are independent. This is labeled best-effort, but continuing rather than breaking would be more thorough.
  • The verifyWith field added to individual batch sub-action types in BrowserActRequest is never consumed by executeSingleAction; only the top-level verifyWith on a batch action is used. This is not a runtime bug but may mislead future contributors.

Confidence Score: 3/5

  • Safe to merge after resolving the click-coords 404/200 inconsistency between Chrome MCP and Playwright backends.
  • The PR is well-tested and the core features (coordinate clicks, scroll-into-view, act verification) are correctly implemented. However, the inconsistent HTTP status for coordinate clicks on empty space (404 from Chrome MCP, 200 from Playwright) is a concrete behavioral divergence on the primary new feature that will surface in multi-backend automation pipelines. This warrants a 3 rather than a 4.
  • src/browser/routes/agent.act.ts — the click-coords case diverges in error semantics between the isExistingSession and Playwright paths.
Reviews (1): Last reviewed commit: "browser: add coordinate clicks and act v..."

Comment thread src/browser/routes/agent.act.ts
Comment on lines +741 to +742
    if (!result.success) {
      return jsonError(res, 404, "No element found at the requested coordinates.");

P1 Behavioral inconsistency between backends for coordinate clicks

The existing-session (Chrome MCP) path returns a 404 when document.elementFromPoint finds no element at the given coordinates, but the Playwright/CDP path (clickCoordsViaPlaywright) always returns 200 — CDP Input.dispatchMouseEvent fires unconditionally, regardless of whether an element is present at (x, y).

A caller hitting the same /act endpoint with kind: "click-coords" and coordinates on empty space will receive 404 on existing-session profiles and 200 on Playwright-attached profiles. This silent divergence can cause hard-to-debug test or automation failures when the same script is used across session types.

Consider either:

  • Removing the !result.success guard on the Chrome MCP path so both backends return 200 (and let the caller decide if the click had no effect), or
  • Adding an element-existence check on the Playwright path (e.g., inject a document.elementFromPoint evaluation before dispatching CDP events) so both paths agree on 404 semantics.
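The second option can be sketched as a shared pre-check that both backends pass through before dispatching. This is a minimal illustration only: `CoordClickBackend`, `elementExistsAt`, `dispatchClick`, and `ActResult` are hypothetical names invented for the sketch, not types from the PR.

```ts
// Hypothetical abstraction over the two backends; not the PR's actual types.
interface CoordClickBackend {
  // Should resolve to true when document.elementFromPoint(x, y) returns an element.
  elementExistsAt(x: number, y: number): Promise<boolean>;
  // Dispatches the click (CDP Input.dispatchMouseEvent or synthetic JS events).
  dispatchClick(x: number, y: number): Promise<void>;
}

type ActResult =
  | { status: 200; ok: true }
  | { status: 404; ok: false; error: string };

// Both backends share one code path, so empty-space clicks get the same status.
async function clickCoords(
  backend: CoordClickBackend,
  x: number,
  y: number,
): Promise<ActResult> {
  if (!(await backend.elementExistsAt(x, y))) {
    return { status: 404, ok: false, error: "No element found at the requested coordinates." };
  }
  await backend.dispatchClick(x, y);
  return { status: 200, ok: true };
}
```

With this shape, the 404 decision lives in one place, so a Playwright-backed and a Chrome MCP-backed implementation cannot drift apart on error semantics. The first option (always 200) would simply drop the pre-check.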

Comment thread src/browser/chrome-mcp.ts
Comment on lines +331 to +348
for (const command of commands) {
  let succeeded = false;
  for (const buildArgs of callVariants) {
    try {
      await client.callTool({
        name: rawToolName,
        arguments: buildArgs(command.method, command.params),
      });
      succeeded = true;
      break;
    } catch {
      // Best-effort bootstrap only. Current chrome-devtools-mcp builds usually
      // do not expose a raw CDP tool, so failures here should not break attach.
    }
  }
  if (!succeeded) {
    break;
  }

P2 bootstrapChromeMcpDevtools stops trying all remaining commands after first failure

When all 4 argument variants fail for a command, the outer loop breaks immediately. This means if Target.setAutoAttach fails (all 4 variants), neither Runtime.enable nor Log.enable will be attempted — even though they are independent CDP commands that could succeed on their own.

In practice this is fine if the reason for failure is a format mismatch (meaning the raw CDP tool exists but none of the 4 argument shapes match, so the other commands would also fail). However, if Target.setAutoAttach fails for a domain-specific reason (e.g., not available in the current browser context), Runtime.enable and Log.enable might still succeed. Consider continuing the loop with continue instead of break to maximise the bootstrap coverage:

if (!succeeded) {
  continue; // try remaining independent commands even if this one failed
}
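
For context, a fuller sketch of the loop with the suggested behavior follows. The `CdpCommand`, `ArgBuilder`, and `ToolClient` shapes and the function name `bootstrapBestEffort` are assumptions made for illustration; the real client and command list live in chrome-mcp.ts and may differ.

```ts
// Hypothetical minimal shapes, assumed for this sketch only.
type CdpCommand = { method: string; params?: Record<string, unknown> };
type ArgBuilder = (
  method: string,
  params?: Record<string, unknown>,
) => Record<string, unknown>;
interface ToolClient {
  callTool(req: { name: string; arguments: Record<string, unknown> }): Promise<unknown>;
}

// Tries every command and returns the methods that succeeded with any variant.
async function bootstrapBestEffort(
  client: ToolClient,
  rawToolName: string,
  commands: CdpCommand[],
  callVariants: ArgBuilder[],
): Promise<string[]> {
  const succeededMethods: string[] = [];
  for (const command of commands) {
    for (const buildArgs of callVariants) {
      try {
        await client.callTool({
          name: rawToolName,
          arguments: buildArgs(command.method, command.params),
        });
        succeededMethods.push(command.method);
        break; // this command worked; skip the remaining argument shapes
      } catch {
        // Best-effort: fall through to the next argument shape.
      }
    }
    // No early exit here: a failed command does not block the ones after it.
  }
  return succeededMethods;
}
```

Returning the list of successful methods is an addition of the sketch; it gives callers or tests a way to observe partial bootstrap without turning a failure into an error.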

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0e84d20509

Comment thread src/browser/chrome-mcp.ts
Comment on lines +670 to +673
dispatch("mousedown", 1);
dispatch("mouseup", 1);
dispatch("click", 1);
if (${params.doubleClick ? "true" : "false"}) {

P2 Respect requested mouse button for coordinate clicks

In clickChromeMcpCoords, the existing-session path always dispatches click (and optionally dblclick) regardless of the requested button, so button:"right"/"middle" does not produce native right/middle-click behavior (e.g. contextmenu/auxclick) and can trigger the wrong handlers while still returning success. This affects any /act click-coords call on Chrome MCP profiles that sets a non-left button.
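
A button-aware event plan could look roughly like the sketch below. It is a simplified assumption, not the PR's code: `planSyntheticClick` and `SyntheticEventStep` are hypothetical names, and real browsers order contextmenu and auxclick differently across platforms, so the sequence here is illustrative only.

```ts
type MouseButton = "left" | "middle" | "right";

interface SyntheticEventStep {
  type: "mousedown" | "mouseup" | "click" | "dblclick" | "auxclick" | "contextmenu";
  button: number; // MouseEvent.button: 0 = left, 1 = middle, 2 = right
  detail: number; // click count; set to 0 for contextmenu as a placeholder
}

// Builds the list of synthetic events to dispatch for one coordinate click.
function planSyntheticClick(button: MouseButton, doubleClick = false): SyntheticEventStep[] {
  const index = button === "left" ? 0 : button === "middle" ? 1 : 2;
  // Non-primary buttons fire auxclick instead of click.
  const finalType = button === "left" ? "click" : "auxclick";
  const press = (detail: number): SyntheticEventStep[] => [
    { type: "mousedown", button: index, detail },
    { type: "mouseup", button: index, detail },
    { type: finalType, button: index, detail },
  ];
  const steps = press(1);
  if (button === "right") {
    steps.push({ type: "contextmenu", button: index, detail: 0 });
  }
  // dblclick is only planned for the primary button in this sketch.
  if (doubleClick && button === "left") {
    steps.push(...press(2), { type: "dblclick", button: index, detail: 2 });
  }
  return steps;
}
```

Keeping the plan as data separates the button logic from the injected page script, which makes it straightforward to unit-test without a browser.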

Comment on lines +592 to +595
const verifyWith = parseVerifyWith(body.verifyWith);
if (Object.hasOwn(body, "verifyWith") && !verifyWith) {
  return jsonError(res, 400, "verifyWith must be screenshot");
}

P2 Reject or fulfill verifyWith on close actions

verifyWith is accepted at request validation for all action kinds, but the close branch still returns directly without using sendActSuccess, so /act with { kind: "close", verifyWith: "screenshot" } silently drops verification and returns a normal success payload. Clients that rely on verification evidence will misinterpret this as a verified action; this should either be explicitly rejected for close or implemented before closing.
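
The "explicitly reject" option might be sketched as follows. The `parseVerifyWith` here is a stand-in written for this example (the PR's own implementation is not shown in this thread), and `validateVerifyWith`, the `Validation` type, and the second error message are hypothetical.

```ts
type VerifyWith = "screenshot";

// Stand-in for the PR's parseVerifyWith; assumed to accept only "screenshot".
function parseVerifyWith(value: unknown): VerifyWith | undefined {
  return value === "screenshot" ? "screenshot" : undefined;
}

type Validation =
  | { ok: true; verifyWith?: VerifyWith }
  | { ok: false; status: 400; error: string };

// Validates verifyWith up front, including the close case that cannot honor it.
function validateVerifyWith(body: Record<string, unknown>): Validation {
  const hasField = Object.prototype.hasOwnProperty.call(body, "verifyWith");
  const verifyWith = parseVerifyWith(body.verifyWith);
  if (hasField && !verifyWith) {
    return { ok: false, status: 400, error: "verifyWith must be screenshot" };
  }
  // A closed tab cannot be screenshotted afterwards, so reject instead of silently dropping.
  if (verifyWith && body.kind === "close") {
    return { ok: false, status: 400, error: "verifyWith is not supported for close actions" };
  }
  return verifyWith ? { ok: true, verifyWith } : { ok: true };
}
```

Rejecting at validation time means a client never receives a plain success payload for a request that asked for verification evidence.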

@steipete
Contributor

Codex deep review: useful feature idea, but this branch is stale and too broad for current main.

What it is trying to solve:

  • add coordinate clicks for cases where ref/selector targeting is not enough
  • add optional post-act screenshot verification/artifact capture
  • improve existing-session interactions by scrolling refs into view before click/type

Current main evidence:

  • Browser code has moved under extensions/browser/src/browser/**; this PR still edits old src/browser/** paths.
  • extensions/browser/src/browser/routes/agent.act.normalize.ts still only accepts click by ref or selector; no click-coords action exists on main.
  • Existing-session interaction already has separate navigation/stale-element guard coverage now, so this should be rebased around the current extensions/browser/src/browser/routes/agent.act.ts path and existing-session limit files.

Best fix shape:

  • Split this into smaller PRs: coordinate click first; verification artifact capture second; Chrome MCP bootstrap only if current dependency still exposes a raw CDP tool worth using.
  • For coordinate click, make backend semantics identical. Either both Playwright/CDP and Chrome MCP return success for empty-space clicks, or both check document.elementFromPoint and return the same error shape.
  • Avoid per-subaction verifyWith schema unless executeSingleAction actually consumes it; otherwise keep verification top-level only.
  • Tests should target current extension paths and cover both Playwright/local and existing-session behavior.

I would not merge this branch as-is. I also would not close it as fixed: the coordinate-click capability is still absent on current main, but the implementation needs a current-path rewrite and narrower review surface.

@steipete
Contributor

Codex review follow-up: I landed the useful coordinate-click slice from this PR on current main as 209d50b52c (feat(browser): add coordinate click action).

What landed:

  • browser({ action: "act", kind: "clickCoords", x, y }) for managed and existing-session profiles.
  • openclaw browser click-coords <x> <y> CLI support.
  • Docs/changelog coverage plus focused regression tests.

I kept this narrower than the stale branch: the broader act verification/bootstrap pieces need a separate current-main design pass. Closing this PR because its source paths are obsolete and the remaining diff is no longer directly mergeable. Thanks @dluttz; the landing commit keeps co-author credit.
