browser: add coordinate clicks and act verification#54452
browser: add coordinate clicks and act verification#54452dluttz wants to merge 1 commit intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR adds two primary capabilities to the browser automation layer: coordinate-based clicks ( Key observations:
Confidence Score: 3/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/browser/routes/agent.act.ts
Line: 741-742
Comment:
**Behavioral inconsistency between backends for coordinate clicks**
The existing-session (Chrome MCP) path returns a `404` when `document.elementFromPoint` finds no element at the given coordinates, but the Playwright/CDP path (`clickCoordsViaPlaywright`) always returns `200` — CDP `Input.dispatchMouseEvent` fires unconditionally, regardless of whether an element is present at `(x, y)`.
A caller hitting the same `/act` endpoint with `kind: "click-coords"` and coordinates on empty space will receive `404` on existing-session profiles and `200` on Playwright-attached profiles. This silent divergence can cause hard-to-debug test or automation failures when the same script is used across session types.
Consider either:
- Removing the `!result.success` guard on the Chrome MCP path so both backends return `200` (and let the caller decide if the click had no effect), or
- Adding an element-existence check on the Playwright path (e.g., inject a `document.elementFromPoint` evaluation before dispatching CDP events) so both paths agree on 404 semantics.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/browser/chrome-mcp.ts
Line: 331-348
Comment:
**`bootstrapChromeMcpDevtools` stops trying all remaining commands after first failure**
When all 4 argument variants fail for a command, the outer loop `break`s immediately. This means if `Target.setAutoAttach` fails (all 4 variants), neither `Runtime.enable` nor `Log.enable` will be attempted — even though they are independent CDP commands that could succeed on their own.
In practice this is fine if the reason for failure is a format mismatch (meaning the raw CDP tool exists but none of the 4 argument shapes match, so the other commands would also fail). However, if `Target.setAutoAttach` fails for a domain-specific reason (e.g., not available in the current browser context), `Runtime.enable` and `Log.enable` might still succeed. Consider continuing the loop with `continue` instead of `break` to maximise the bootstrap coverage:
```ts
if (!succeeded) {
continue; // try remaining independent commands even if this one failed
}
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "browser: add coordinate clicks and act v..." | Re-trigger Greptile |
| if (!result.success) { | ||
| return jsonError(res, 404, "No element found at the requested coordinates."); |
There was a problem hiding this comment.
Behavioral inconsistency between backends for coordinate clicks
The existing-session (Chrome MCP) path returns a 404 when document.elementFromPoint finds no element at the given coordinates, but the Playwright/CDP path (clickCoordsViaPlaywright) always returns 200 — CDP Input.dispatchMouseEvent fires unconditionally, regardless of whether an element is present at (x, y).
A caller hitting the same /act endpoint with kind: "click-coords" and coordinates on empty space will receive 404 on existing-session profiles and 200 on Playwright-attached profiles. This silent divergence can cause hard-to-debug test or automation failures when the same script is used across session types.
Consider either:
- Removing the
!result.successguard on the Chrome MCP path so both backends return200(and let the caller decide if the click had no effect), or - Adding an element-existence check on the Playwright path (e.g., inject a
document.elementFromPointevaluation before dispatching CDP events) so both paths agree on 404 semantics.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/browser/routes/agent.act.ts
Line: 741-742
Comment:
**Behavioral inconsistency between backends for coordinate clicks**
The existing-session (Chrome MCP) path returns a `404` when `document.elementFromPoint` finds no element at the given coordinates, but the Playwright/CDP path (`clickCoordsViaPlaywright`) always returns `200` — CDP `Input.dispatchMouseEvent` fires unconditionally, regardless of whether an element is present at `(x, y)`.
A caller hitting the same `/act` endpoint with `kind: "click-coords"` and coordinates on empty space will receive `404` on existing-session profiles and `200` on Playwright-attached profiles. This silent divergence can cause hard-to-debug test or automation failures when the same script is used across session types.
Consider either:
- Removing the `!result.success` guard on the Chrome MCP path so both backends return `200` (and let the caller decide if the click had no effect), or
- Adding an element-existence check on the Playwright path (e.g., inject a `document.elementFromPoint` evaluation before dispatching CDP events) so both paths agree on 404 semantics.
How can I resolve this? If you propose a fix, please make it concise.| for (const command of commands) { | ||
| let succeeded = false; | ||
| for (const buildArgs of callVariants) { | ||
| try { | ||
| await client.callTool({ | ||
| name: rawToolName, | ||
| arguments: buildArgs(command.method, command.params), | ||
| }); | ||
| succeeded = true; | ||
| break; | ||
| } catch { | ||
| // Best-effort bootstrap only. Current chrome-devtools-mcp builds usually | ||
| // do not expose a raw CDP tool, so failures here should not break attach. | ||
| } | ||
| } | ||
| if (!succeeded) { | ||
| break; | ||
| } |
There was a problem hiding this comment.
bootstrapChromeMcpDevtools stops trying all remaining commands after first failure
When all 4 argument variants fail for a command, the outer loop breaks immediately. This means if Target.setAutoAttach fails (all 4 variants), neither Runtime.enable nor Log.enable will be attempted — even though they are independent CDP commands that could succeed on their own.
In practice this is fine if the reason for failure is a format mismatch (meaning the raw CDP tool exists but none of the 4 argument shapes match, so the other commands would also fail). However, if Target.setAutoAttach fails for a domain-specific reason (e.g., not available in the current browser context), Runtime.enable and Log.enable might still succeed. Consider continuing the loop with continue instead of break to maximise the bootstrap coverage:
if (!succeeded) {
continue; // try remaining independent commands even if this one failed
}Prompt To Fix With AI
This is a comment left during a code review.
Path: src/browser/chrome-mcp.ts
Line: 331-348
Comment:
**`bootstrapChromeMcpDevtools` stops trying all remaining commands after first failure**
When all 4 argument variants fail for a command, the outer loop `break`s immediately. This means if `Target.setAutoAttach` fails (all 4 variants), neither `Runtime.enable` nor `Log.enable` will be attempted — even though they are independent CDP commands that could succeed on their own.
In practice this is fine if the reason for failure is a format mismatch (meaning the raw CDP tool exists but none of the 4 argument shapes match, so the other commands would also fail). However, if `Target.setAutoAttach` fails for a domain-specific reason (e.g., not available in the current browser context), `Runtime.enable` and `Log.enable` might still succeed. Consider continuing the loop with `continue` instead of `break` to maximise the bootstrap coverage:
```ts
if (!succeeded) {
continue; // try remaining independent commands even if this one failed
}
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0e84d20509
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| dispatch("mousedown", 1); | ||
| dispatch("mouseup", 1); | ||
| dispatch("click", 1); | ||
| if (${params.doubleClick ? "true" : "false"}) { |
There was a problem hiding this comment.
Respect requested mouse button for coordinate clicks
In clickChromeMcpCoords, the existing-session path always dispatches click (and optionally dblclick) regardless of the requested button, so button:"right"/"middle" does not produce native right/middle-click behavior (e.g. contextmenu/auxclick) and can trigger the wrong handlers while still returning success. This affects any /act click-coords call on Chrome MCP profiles that sets a non-left button.
Useful? React with 👍 / 👎.
| const verifyWith = parseVerifyWith(body.verifyWith); | ||
| if (Object.hasOwn(body, "verifyWith") && !verifyWith) { | ||
| return jsonError(res, 400, "verifyWith must be screenshot"); | ||
| } |
There was a problem hiding this comment.
Reject or fulfill verifyWith on close actions
verifyWith is accepted at request validation for all action kinds, but the close branch still returns directly without using sendActSuccess, so /act with { kind: "close", verifyWith: "screenshot" } silently drops verification and returns a normal success payload. Clients that rely on verification evidence will misinterpret this as a verified action; this should either be explicitly rejected for close or implemented before closing.
Useful? React with 👍 / 👎.
|
Codex deep review: useful feature idea, but this branch is stale and too broad for current What it is trying to solve:
Current
Best fix shape:
I would not merge this branch as-is. I also would not close it as fixed: the coordinate-click capability is still absent on current |
|
Codex review follow-up: I landed the useful coordinate-click slice from this PR on current What landed:
I kept this narrower than the stale branch: the broader act verification/bootstrap pieces need a separate current-main design pass. Closing this PR because its source paths are obsolete and the remaining diff is no longer directly mergeable. Thanks @dluttz; the landing commit keeps co-author credit. |
No description provided.