browser: add coordinate clicks and act verification by dluttz · Pull Request #54452 · openclaw/openclaw

dluttz · 2026-03-25T12:14:45Z

No description provided.

greptile-apps · 2026-03-25T12:20:00Z

Greptile Summary

This PR adds two primary capabilities to the browser automation layer.

The first is coordinate-based clicks (a click-coords action). On the Playwright path these are dispatched via CDP Input.dispatchMouseEvent. On the Chrome MCP / existing-session path they use document.elementFromPoint plus synthetic JS events.

The second is act verification: an optional post-action screenshot, captured and persisted when verifyWith: "screenshot" is passed.

The PR also adds:
- a pre-action scroll-into-view step before element-based click and type actions, to handle off-screen targets;
- a scrollintoview (lowercase) alias;
- a best-effort CDP bootstrap hook (bootstrapChromeMcpDevtools) that fires on session creation.
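To make the Playwright/CDP side concrete, here is a minimal sketch of the Input.dispatchMouseEvent calls that a coordinate click could map to.

- The CDP method and its `type`, `x`, `y`, `button`, and `clickCount` parameters are real.
- The `buildCoordinateClick` helper and `CdpCall` type are hypothetical. They are not the PR's clickCoordsViaPlaywright.
- Nothing here inspects what lies under `(x, y)`. This is why the CDP path reports success even on empty space.

```ts
// Hypothetical helper: the CDP Input.dispatchMouseEvent calls for one
// coordinate click. Method and parameter names follow the CDP Input domain.
type MouseButton = "left" | "middle" | "right";

interface CdpCall {
  method: "Input.dispatchMouseEvent";
  params: {
    type: "mouseMoved" | "mousePressed" | "mouseReleased";
    x: number;
    y: number;
    button: MouseButton | "none";
    clickCount?: number;
  };
}

function buildCoordinateClick(
  x: number,
  y: number,
  button: MouseButton = "left",
  doubleClick = false,
): CdpCall[] {
  // Move first so hover state matches where the press lands.
  const calls: CdpCall[] = [
    { method: "Input.dispatchMouseEvent", params: { type: "mouseMoved", x, y, button: "none" } },
  ];
  const presses = doubleClick ? 2 : 1;
  for (let clickCount = 1; clickCount <= presses; clickCount++) {
    // A second press/release pair with clickCount 2 is what the browser
    // treats as the second half of a double click.
    calls.push(
      { method: "Input.dispatchMouseEvent", params: { type: "mousePressed", x, y, button, clickCount } },
      { method: "Input.dispatchMouseEvent", params: { type: "mouseReleased", x, y, button, clickCount } },
    );
  }
  // Nothing above checks for an element at (x, y): the events fire regardless.
  return calls;
}
```

Each entry would be sent in order as a CDP command, for example through a CDP session's `send(method, params)`.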

Key observations:

- Behavioral inconsistency in click-coords: The Chrome MCP path returns 404 when document.elementFromPoint finds no element at the requested coordinates, while the Playwright/CDP path always returns 200 (CDP events fire unconditionally). A caller using the same endpoint across session types would see different status codes for the same empty-space click.
- bootstrapChromeMcpDevtools exits on first command failure: If Target.setAutoAttach cannot be called (all 4 argument variants fail), the loop breaks and Runtime.enable / Log.enable are never attempted, even though those commands are independent. This is labeled best-effort, but continuing rather than breaking would be more thorough.
- The verifyWith field added to individual batch sub-action types in BrowserActRequest is never consumed by executeSingleAction; only the top-level verifyWith on a batch action is used. This is not a runtime bug but may mislead future contributors.

Confidence Score: 3/5

Safe to merge after resolving the click-coords 404/200 inconsistency between Chrome MCP and Playwright backends.
The PR is well-tested and the core features (coordinate clicks, scroll-into-view, act verification) are correctly implemented. However, the inconsistent HTTP status for coordinate clicks on empty space (404 from Chrome MCP, 200 from Playwright) is a concrete behavioral divergence on the primary new feature that will surface in multi-backend automation pipelines. This warrants a 3 rather than a 4.
src/browser/routes/agent.act.ts — the click-coords case diverges in error semantics between the isExistingSession and Playwright paths.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/browser/routes/agent.act.ts
Line: 741-742

Comment:
**Behavioral inconsistency between backends for coordinate clicks**

The existing-session (Chrome MCP) path returns a `404` when `document.elementFromPoint` finds no element at the given coordinates, but the Playwright/CDP path (`clickCoordsViaPlaywright`) always returns `200` — CDP `Input.dispatchMouseEvent` fires unconditionally, regardless of whether an element is present at `(x, y)`.

A caller hitting the same `/act` endpoint with `kind: "click-coords"` and coordinates on empty space will receive `404` on existing-session profiles and `200` on Playwright-attached profiles. This silent divergence can cause hard-to-debug test or automation failures when the same script is used across session types.

Consider either:
- Removing the `!result.success` guard on the Chrome MCP path so both backends return `200` (and let the caller decide if the click had no effect), or
- Adding an element-existence check on the Playwright path (e.g., inject a `document.elementFromPoint` evaluation before dispatching CDP events) so both paths agree on 404 semantics.

How can I resolve this? If you propose a fix, please make it concise.
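Here is a minimal sketch of the first option. It assumes a hypothetical normalization step that both backends feed into.

- The route always answers `200`.
- It surfaces an `elementFound` flag (a field name invented here) so callers can decide what an empty-space click means.
- The existing-session path can fill the flag from `document.elementFromPoint`.
- The CDP path does not check, so it reports `null`.

```ts
// Hypothetical normalization so both backends answer an empty-space
// coordinate click the same way (always 200, report what was hit).
type Backend = "playwright" | "existing-session";

interface BackendClickResult {
  backend: Backend;
  // true/false when the backend checked document.elementFromPoint;
  // null when it did not check (the CDP path dispatches unconditionally).
  elementFound: boolean | null;
}

interface ClickCoordsResponse {
  status: 200;
  body: { ok: true; elementFound: boolean | null };
}

function toClickCoordsResponse(result: BackendClickResult): ClickCoordsResponse {
  // The click was dispatched in every case, so "nothing there" is data
  // for the caller rather than a 404.
  return { status: 200, body: { ok: true, elementFound: result.elementFound } };
}
```

The second option would instead run the same `elementFromPoint` probe before the CDP dispatch and map a miss to one shared error shape on both paths.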

---

This is a comment left during a code review.
Path: src/browser/chrome-mcp.ts
Line: 331-348

Comment:
**`bootstrapChromeMcpDevtools` stops trying all remaining commands after first failure**

When all 4 argument variants fail for a command, the outer loop `break`s immediately. This means if `Target.setAutoAttach` fails (all 4 variants), neither `Runtime.enable` nor `Log.enable` will be attempted — even though they are independent CDP commands that could succeed on their own.

In practice this is fine if the reason for failure is a format mismatch (meaning the raw CDP tool exists but none of the 4 argument shapes match, so the other commands would also fail). However, if `Target.setAutoAttach` fails for a domain-specific reason (e.g., not available in the current browser context), `Runtime.enable` and `Log.enable` might still succeed. Consider continuing the loop with `continue` instead of `break` to maximise the bootstrap coverage:

```ts
if (!succeeded) {
  continue; // try remaining independent commands even if this one failed
}
```

How can I resolve this? If you propose a fix, please make it concise.
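The suggested change can be sketched as a standalone loop. Only the `continue`-instead-of-`break` control flow comes from the review. The following are assumptions for illustration:

- `RawCdpClient` and `CallVariant`;
- the returned list of enabled methods.

```ts
// Sketch of the bootstrap loop with `continue` semantics (shapes are hypothetical).
interface RawCdpClient {
  callTool(req: { name: string; arguments: Record<string, unknown> }): Promise<unknown>;
}

interface CdpCommand {
  method: string;
  params?: Record<string, unknown>;
}

type CallVariant = (method: string, params?: Record<string, unknown>) => Record<string, unknown>;

async function bootstrapDevtools(
  client: RawCdpClient,
  rawToolName: string,
  commands: CdpCommand[],
  callVariants: CallVariant[],
): Promise<string[]> {
  const enabled: string[] = [];
  for (const command of commands) {
    let succeeded = false;
    for (const buildArgs of callVariants) {
      try {
        await client.callTool({ name: rawToolName, arguments: buildArgs(command.method, command.params) });
        succeeded = true;
        break; // this argument shape worked; stop trying others for this command
      } catch {
        // Best-effort: fall through to the next argument shape.
      }
    }
    if (!succeeded) {
      continue; // try remaining independent commands even if this one failed
    }
    enabled.push(command.method);
  }
  return enabled;
}
```

With this shape, a client that rejects `Target.setAutoAttach` still gets `Runtime.enable` and `Log.enable` attempted. The return value also makes it easy to log which domains were actually enabled.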

Reviews (1): Last reviewed commit: "browser: add coordinate clicks and act v..."

greptile-apps · 2026-03-25T12:20:04Z

+              if (!result.success) {
+                return jsonError(res, 404, "No element found at the requested coordinates.");


Behavioral inconsistency between backends for coordinate clicks

The existing-session (Chrome MCP) path returns a 404 when document.elementFromPoint finds no element at the given coordinates, but the Playwright/CDP path (clickCoordsViaPlaywright) always returns 200 — CDP Input.dispatchMouseEvent fires unconditionally, regardless of whether an element is present at (x, y).

A caller hitting the same /act endpoint with kind: "click-coords" and coordinates on empty space will receive 404 on existing-session profiles and 200 on Playwright-attached profiles. This silent divergence can cause hard-to-debug test or automation failures when the same script is used across session types.

Consider either:

- Removing the !result.success guard on the Chrome MCP path so both backends return 200 (and let the caller decide if the click had no effect), or
- Adding an element-existence check on the Playwright path (e.g., inject a document.elementFromPoint evaluation before dispatching CDP events) so both paths agree on 404 semantics.


greptile-apps · 2026-03-25T12:20:05Z

+  for (const command of commands) {
+    let succeeded = false;
+    for (const buildArgs of callVariants) {
+      try {
+        await client.callTool({
+          name: rawToolName,
+          arguments: buildArgs(command.method, command.params),
+        });
+        succeeded = true;
+        break;
+      } catch {
+        // Best-effort bootstrap only. Current chrome-devtools-mcp builds usually
+        // do not expose a raw CDP tool, so failures here should not break attach.
+      }
+    }
+    if (!succeeded) {
+      break;
+    }


bootstrapChromeMcpDevtools stops trying all remaining commands after first failure

When all 4 argument variants fail for a command, the outer loop breaks immediately. This means if Target.setAutoAttach fails (all 4 variants), neither Runtime.enable nor Log.enable will be attempted — even though they are independent CDP commands that could succeed on their own.

In practice this is fine if the reason for failure is a format mismatch (meaning the raw CDP tool exists but none of the 4 argument shapes match, so the other commands would also fail). However, if Target.setAutoAttach fails for a domain-specific reason (e.g., not available in the current browser context), Runtime.enable and Log.enable might still succeed. Consider continuing the loop with continue instead of break to maximise the bootstrap coverage:

```ts
if (!succeeded) {
  continue; // try remaining independent commands even if this one failed
}
```


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0e84d20509


chatgpt-codex-connector · 2026-03-25T12:20:11Z

+      dispatch("mousedown", 1);
+      dispatch("mouseup", 1);
+      dispatch("click", 1);
+      if (${params.doubleClick ? "true" : "false"}) {


Respect requested mouse button for coordinate clicks

In clickChromeMcpCoords, the existing-session path always dispatches click (and optionally dblclick) regardless of the requested button, so button:"right"/"middle" does not produce native right/middle-click behavior (e.g. contextmenu/auxclick) and can trigger the wrong handlers while still returning success. This affects any /act click-coords call on Chrome MCP profiles that sets a non-left button.
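One way to respect the requested button is to derive the synthetic event sequence from it. This hypothetical helper follows common UI Events conventions:

- `MouseEvent.button` is 0/1/2 for left/middle/right.
- `buttons` is a bitmask: 1 = left, 2 = right, 4 = middle.
- Non-primary buttons fire `auxclick` rather than `click`.

Two parts of the sketch are assumptions:

- Exact event ordering, including whether `contextmenu` fires on press or release, varies by browser and platform.
- Rejecting `doubleClick` for non-left buttons is a policy chosen for this sketch.

```ts
type Button = "left" | "middle" | "right";

interface SyntheticMouseEvent {
  type: string;
  button: number; // MouseEvent.button: 0 left, 1 middle, 2 right
  buttons: number; // MouseEvent.buttons bitmask while the event fires
}

const BUTTON_CODE: Record<Button, number> = { left: 0, middle: 1, right: 2 };
const BUTTONS_MASK: Record<Button, number> = { left: 1, right: 2, middle: 4 };

function mouseEventSequence(button: Button, doubleClick = false): SyntheticMouseEvent[] {
  if (doubleClick && button !== "left") {
    // Fail loudly instead of silently dispatching left-button events.
    throw new Error("doubleClick is only supported for the left button");
  }
  const code = BUTTON_CODE[button];
  // Events fired after release carry no pressed buttons.
  const afterRelease = (type: string): SyntheticMouseEvent => ({ type, button: code, buttons: 0 });
  const press = (): SyntheticMouseEvent[] => [
    { type: "mousedown", button: code, buttons: BUTTONS_MASK[button] },
    afterRelease("mouseup"),
  ];

  if (button === "left") {
    const clickEvents: SyntheticMouseEvent[] = [...press(), afterRelease("click")];
    if (doubleClick) {
      clickEvents.push(...press(), afterRelease("click"), afterRelease("dblclick"));
    }
    return clickEvents;
  }

  // Non-primary buttons: auxclick instead of click; right also opens a context menu.
  const auxEvents: SyntheticMouseEvent[] = [...press(), afterRelease("auxclick")];
  if (button === "right") {
    auxEvents.push(afterRelease("contextmenu"));
  }
  return auxEvents;
}
```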


chatgpt-codex-connector · 2026-03-25T12:20:11Z

+    const verifyWith = parseVerifyWith(body.verifyWith);
+    if (Object.hasOwn(body, "verifyWith") && !verifyWith) {
+      return jsonError(res, 400, "verifyWith must be screenshot");
+    }


Reject or fulfill verifyWith on close actions

verifyWith is accepted at request validation for all action kinds, but the close branch still returns directly without using sendActSuccess, so /act with { kind: "close", verifyWith: "screenshot" } silently drops verification and returns a normal success payload. Clients that rely on verification evidence will misinterpret this as a verified action; this should either be explicitly rejected for close or implemented before closing.
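Here is a sketch of the "explicitly reject" option. It assumes verification is validated before the handler dispatches on `kind`. `parseVerifyWith` mirrors the helper name in the diff, but the following are hypothetical:

- the body of `parseVerifyWith`;
- `validateVerifyWith`;
- the `ValidationResult` shape.

```ts
type VerifyWith = "screenshot";

function parseVerifyWith(value: unknown): VerifyWith | undefined {
  return value === "screenshot" ? "screenshot" : undefined;
}

type ValidationResult =
  | { ok: true; verifyWith?: VerifyWith }
  | { ok: false; status: 400; error: string };

function validateVerifyWith(body: { kind: string; verifyWith?: unknown }): ValidationResult {
  if (!Object.prototype.hasOwnProperty.call(body, "verifyWith")) {
    return { ok: true };
  }
  const verifyWith = parseVerifyWith(body.verifyWith);
  if (!verifyWith) {
    return { ok: false, status: 400, error: "verifyWith must be screenshot" };
  }
  if (body.kind === "close") {
    // Assumption: once the page is closed there is nothing left to capture,
    // so reject up front instead of returning an unverified success.
    return { ok: false, status: 400, error: "verifyWith is not supported for close" };
  }
  return { ok: true, verifyWith };
}
```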


steipete · 2026-04-25T04:28:37Z

Codex deep review: useful feature idea, but this branch is stale and too broad for current main.

What it is trying to solve:

- add coordinate clicks for cases where ref/selector targeting is not enough
- add optional post-act screenshot verification/artifact capture
- improve existing-session interactions by scrolling refs into view before click/type
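For the third item, a best-effort pre-action scroll could be sketched as a wrapper around the action.

- `RefTarget` and `withScrollIntoView` are hypothetical.
- In the page, `scrollIntoView` would typically wrap the DOM's `element.scrollIntoView({ block: "center" })`.
- Swallowing a failed scroll is an assumption of this sketch, not confirmed PR behavior.

```ts
interface RefTarget {
  scrollIntoView(): Promise<void>;
  click(): Promise<void>;
  type(text: string): Promise<void>;
}

async function withScrollIntoView<T>(
  target: RefTarget,
  action: (target: RefTarget) => Promise<T>,
): Promise<T> {
  try {
    // Bring off-screen refs into the viewport before interacting.
    await target.scrollIntoView();
  } catch {
    // Assumption: scrolling is best-effort; let the action itself report failure.
  }
  return action(target);
}
```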

Current main evidence:

- Browser code has moved under extensions/browser/src/browser/**; this PR still edits old src/browser/** paths.
- extensions/browser/src/browser/routes/agent.act.normalize.ts still only accepts click by ref or selector; no click-coords action exists on main.
- Existing-session interaction already has separate navigation/stale-element guard coverage now, so this should be rebased around the current extensions/browser/src/browser/routes/agent.act.ts path and existing-session limit files.

Best fix shape:

- Split this into smaller PRs: coordinate click first; verification artifact capture second; Chrome MCP bootstrap only if current dependency still exposes a raw CDP tool worth using.
- For coordinate click, make backend semantics identical. Either both Playwright/CDP and Chrome MCP return success for empty-space clicks, or both check document.elementFromPoint and return the same error shape.
- Avoid per-subaction verifyWith schema unless executeSingleAction actually consumes it; otherwise keep verification top-level only.
- Tests should target current extension paths and cover both Playwright/local and existing-session behavior.
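For the verifyWith point, here is a minimal sketch of keeping verification top-level only. A hypothetical check flags batch sub-actions that carry their own verifyWith, so the request can be rejected instead of silently ignoring the field. The `BatchActRequest` and `SubAction` shapes are assumptions, not the repository's BrowserActRequest type.

```ts
interface SubAction {
  kind: string;
  [key: string]: unknown;
}

interface BatchActRequest {
  kind: "batch";
  actions: SubAction[];
  verifyWith?: "screenshot"; // the only place verification is honored
}

// Returns the indices of sub-actions that (incorrectly) carry their own verifyWith.
function findNestedVerifyWith(req: BatchActRequest): number[] {
  const offending: number[] = [];
  req.actions.forEach((action, index) => {
    if (Object.prototype.hasOwnProperty.call(action, "verifyWith")) {
      offending.push(index);
    }
  });
  return offending;
}
```

A route could return 400 listing these indices rather than accept a field that executeSingleAction never reads.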

I would not merge this branch as-is. I also would not close it as fixed: the coordinate-click capability is still absent on current main, but the implementation needs a current-path rewrite and narrower review surface.

steipete · 2026-04-25T06:31:50Z

Codex review follow-up: I landed the useful coordinate-click slice from this PR on current main as 209d50b52c (feat(browser): add coordinate click action).

What landed:

- browser({ action: "act", kind: "clickCoords", x, y }) for managed and existing-session profiles.
- openclaw browser click-coords <x> <y> CLI support.
- Docs/changelog coverage plus focused regression tests.
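Here is a hypothetical sketch of how a normalizer might validate the landed call shape { kind: "clickCoords", x, y }. This is not the code in 209d50b52c. The following are assumptions:

- the optional `button` and `doubleClick` fields;
- the non-negative check;
- the string-error return convention.

```ts
type ClickButton = "left" | "middle" | "right";

interface ClickCoordsAction {
  kind: "clickCoords";
  x: number;
  y: number;
  button: ClickButton;
  doubleClick: boolean;
}

function isClickButton(value: unknown): value is ClickButton {
  return value === "left" || value === "middle" || value === "right";
}

// Returns the normalized action, or an error message suitable for a 400.
function normalizeClickCoords(raw: Record<string, unknown>): ClickCoordsAction | string {
  const x = raw["x"];
  const y = raw["y"];
  if (typeof x !== "number" || typeof y !== "number" || !Number.isFinite(x) || !Number.isFinite(y)) {
    return "clickCoords requires finite numeric x and y";
  }
  if (x < 0 || y < 0) {
    return "clickCoords coordinates must be non-negative viewport pixels";
  }
  const button = raw["button"] ?? "left";
  if (!isClickButton(button)) {
    return "button must be left, middle, or right";
  }
  return { kind: "clickCoords", x, y, button, doubleClick: raw["doubleClick"] === true };
}
```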

I kept this narrower than the stale branch: the broader act verification/bootstrap pieces need a separate current-main design pass. Closing this PR because its source paths are obsolete and the remaining diff is no longer directly mergeable. Thanks @dluttz; the landing commit keeps co-author credit.

browser: add coordinate clicks and act verification

0e84d20

openclaw-barnacle Bot added the size: XL label Mar 25, 2026

greptile-apps Bot reviewed Mar 25, 2026


chatgpt-codex-connector Bot reviewed Mar 25, 2026


steipete closed this Apr 25, 2026

steipete mentioned this pull request Apr 26, 2026

Add browser action recording and coordinate-based clicking for canvas automation #8429

Closed


dluttz wants to merge 1 commit into openclaw:main from dluttz:codex/browser-heygen-editor


2 participants
