Codex/multiprocess browser improvements by softpudding · Pull Request #41 · softpudding/OpenBrowser

softpudding · 2026-03-22T14:21:24Z

Summary

This branch started as a per-conversation multi-process isolation change, but it also includes a broader set of browser automation, prompt, and evaluation improvements that landed along the way.

At a high level, this PR:

- adds worker-process execution for agent messages and browser command handling
- improves screenshot and highlight stability for background-tab automation
- extends element interaction support with swipe/carousel handling
- adds a new BlueBook evaluation site and datasets
- refreshes prompt guidance, model/tool selection behavior, and dependency pins
- fixes evaluation cost extraction and updates the checked-in evaluation report

Main Changes

1. Per-conversation multi-process execution

- Added optional multi-process mode in the agent manager, with one worker process per conversation.
- Introduced ProcessManager and BrowserExecutorBundle to encapsulate process lifecycle, worker queues, agent manager initialization, and command execution in workers.
- Updated agent message processing so SSE can stream from worker queues instead of only in-process threads.
- Added worker-side handling for:
  - agent message execution
  - browser command execution
  - pause control
  - clean shutdown
- Updated session/status handling around worker execution and disconnect paths.
- Added unit coverage for multi-process API flow, process lifecycle, bundle behavior, and related command/model cases.
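
The one-worker-per-conversation model described above can be sketched roughly as follows. This is an illustration only, not the ProcessManager or BrowserExecutorBundle code from this PR: `WorkerRegistry`, `worker_loop`, and the message dictionaries (`type`, `event`, and so on) are hypothetical names chosen for the example, and the agent and browser work is replaced by placeholders.

```python
import multiprocessing as mp


def worker_loop(conversation_id, inbox, outbox):
    """Runs inside one worker: handle messages until told to shut down.

    Accepts any queue with get()/put(), so the loop can be exercised with
    queue.Queue in tests and multiprocessing.Queue across processes.
    """
    while True:
        msg = inbox.get()  # blocks until the parent sends something
        kind = msg.get("type")
        reply = {"conversation_id": conversation_id}
        if kind == "agent_message":
            # Placeholder for real agent execution.
            reply.update(event="message_done", text=msg["text"])
        elif kind == "browser_command":
            # Placeholder for real browser command execution.
            reply.update(event="command_done", command=msg["command"])
        elif kind == "pause":
            reply.update(event="paused")
        elif kind == "shutdown":
            reply.update(event="stopped")
            outbox.put(reply)
            return  # clean exit ends the worker
        else:
            reply.update(event="error", detail=f"unknown type: {kind!r}")
        outbox.put(reply)


class WorkerRegistry:
    """Hypothetical registry keeping at most one worker per conversation."""

    def __init__(self):
        self._workers = {}  # conversation_id -> (process, inbox, outbox)

    def get_or_start(self, conversation_id):
        if conversation_id not in self._workers:
            inbox, outbox = mp.Queue(), mp.Queue()
            proc = mp.Process(
                target=worker_loop,
                args=(conversation_id, inbox, outbox),
                daemon=True,
            )
            proc.start()
            self._workers[conversation_id] = (proc, inbox, outbox)
        return self._workers[conversation_id]

    def shutdown(self, conversation_id, timeout=5.0):
        proc, inbox, _outbox = self._workers.pop(conversation_id)
        inbox.put({"type": "shutdown"})  # ask for a clean exit first
        proc.join(timeout)
        if proc.is_alive():
            proc.terminate()  # fall back to a hard stop
```

Keeping the loop independent of the queue implementation is a design choice for this sketch: the message handling can be unit tested in-process, while the registry alone deals with process start and teardown.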

Key files:
server/agent/manager.py
server/agent/api.py
server/api/routes/agent.py
server/core/process_manager.py
server/core/browser_executor_bundle.py
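
Streaming SSE from a worker queue, as described in this section, could look something like the generator below. It is a sketch under assumptions: `sse_frames`, the `event`/`data` item shape, and the `complete` terminal event are hypothetical and not taken from server/agent/api.py. Only the wire format (`event:` and `data:` lines ending in a blank line, and `:` comment lines) follows the Server-Sent Events format.

```python
import json
import queue


def sse_frames(outbox, timeout=1.0, done_event="complete"):
    """Yield SSE-formatted strings for items pulled from a worker queue.

    `outbox` may be a queue.Queue or a multiprocessing.Queue; both raise
    queue.Empty when get() times out.
    """
    while True:
        try:
            item = outbox.get(timeout=timeout)
        except queue.Empty:
            # Comment line keeps the connection alive while the worker is busy.
            yield ": keep-alive\n\n"
            continue
        payload = json.dumps(item["data"])
        yield f"event: {item['event']}\ndata: {payload}\n\n"
        if item["event"] == done_event:
            return  # the worker signalled the end of this response
```

A web framework's streaming response can wrap this generator directly, so the HTTP layer does not need to know whether events come from a thread or a separate process.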

2. Highlight readiness and screenshot stability

- Reworked highlight readiness to use a snapshot-first approach instead of relying on page-side polling loops that can be throttled in background tabs.
- Added consistency/stability handling around highlight capture and screenshot timing.
- Improved screenshot capture by waiting for page settle before capture, including fonts, viewport mutations, media readiness, and quiet windows.
- Adjusted collision/highlight behavior so element_type="any" better surfaces scrollable regions and remains more stable visually.
- Documented the new highlight-readiness design in project docs.
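
The "quiet window" idea behind waiting for page settle can be illustrated with a small language-neutral sketch, written here in Python even though the extension code itself is TypeScript. `wait_for_quiet` and all of its parameters are hypothetical; the function only shows the general pattern of requiring a period with no observed changes, bounded by an overall deadline so capture is never blocked indefinitely.

```python
import time


def wait_for_quiet(last_change_at, quiet_window=0.3, max_wait=3.0,
                   clock=time.monotonic, sleep=time.sleep, poll=0.05):
    """Return True once nothing has changed for `quiet_window` seconds.

    `last_change_at` is a callable returning the timestamp (on the same
    clock) of the most recent observed change, e.g. a layout mutation.
    Returns False if `max_wait` elapses first, so the caller can decide
    whether to capture anyway.
    """
    start = clock()
    while True:
        now = clock()
        if now - last_change_at() >= quiet_window:
            return True  # page has been quiet long enough
        if now - start >= max_wait:
            return False  # give up; the page never settled
        sleep(poll)
```

The clock and sleep functions are injectable so the timing logic can be tested deterministically without real delays.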

Key files:
extension/src/commands/highlight-detection.injected.js
extension/src/commands/highlight-detection.ts
extension/src/commands/screenshot.ts
extension/src/utils/layout-stability.ts
extension/src/utils/collision-detection.ts
AGENTS.md

3. Element interaction improvements, including swipe support

- Added swipe/carousel interaction support for swipable regions.
- Updated element-interaction prompts and tool guidance so the agent can distinguish scrollable containers from swipable carousel/slider regions.
- Fixed async JavaScript result handling for click flows where dialogs can open during JS-driven interactions, avoiding false failures.
- Expanded command/type support so swipe is treated as a first-class interaction.
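
Treating swipe as a first-class interaction generally means it appears in the command type itself rather than being emulated through another action. The sketch below shows one way that might look. It is not the model in server/models/commands.py: `Action`, `Direction`, `ElementCommand`, and their fields are hypothetical names for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(str, Enum):
    CLICK = "click"
    SCROLL = "scroll"  # moves content inside a scrollable container
    SWIPE = "swipe"    # advances a carousel/slider region


class Direction(str, Enum):
    LEFT = "left"
    RIGHT = "right"
    UP = "up"
    DOWN = "down"


@dataclass(frozen=True)
class ElementCommand:
    """Hypothetical command: one action applied to one highlighted element."""

    element_id: str
    action: Action
    direction: Optional[Direction] = None

    def __post_init__(self):
        directional = self.action in (Action.SCROLL, Action.SWIPE)
        if directional and self.direction is None:
            raise ValueError(f"{self.action.value} requires a direction")
        if not directional and self.direction is not None:
            raise ValueError(f"{self.action.value} does not take a direction")
```

Keeping scroll and swipe as separate enum members mirrors the distinction the prompts draw between scrollable containers and swipable carousels.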

Key files:
extension/src/commands/element-actions.ts
extension/src/background/index.ts
server/models/commands.py
server/agent/prompts/element_interaction_tool.j2
server/agent/prompts/highlight_tool.j2
server/agent/prompts/tab_tool.j2

4. Prompt and model/tooling refinements

- Refined highlight prompt guidance, especially around when to use keywords vs pagination.
- Tightened JavaScript prompt guidance so JS is positioned as a narrow system-specific fallback instead of a primary interaction path.
- Updated prompt context/tool profile behavior to better match model tier and browser tool availability.
- Bumped openhands-sdk / openhands-tools dependency revisions and refreshed the lockfile.

Key files:
server/agent/prompts/highlight_tool.j2
server/agent/prompts/javascript_tool.j2
server/agent/tools/base.py
server/agent/tools/browser_executor.py
server/agent/tools/prompt_context.py
pyproject.toml
uv.lock

5. Evaluation expansion and maintenance

- Added the new BlueBook evaluation site, assets, and two datasets:
  - bluebook_simple
  - bluebook_complex
- Updated the eval server/docs to expose the new scenario.
- Refreshed the checked-in evaluation report.
- Fixed evaluation cost extraction to use the final usage_metrics SSE snapshot rather than the first one. Reading the first snapshot underreported costs.
- Recomputed eval/evaluation_report.json from the saved SSE output.
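
The cost-extraction fix amounts to scanning the whole SSE stream and keeping the last usage snapshot instead of returning at the first match. A minimal sketch, assuming the snapshots arrive as `usage_metrics` events whose JSON payload has a `cost` field (both the event framing and the field name are assumptions here, not the actual code in eval/evaluate_browser_agent.py):

```python
import json


def final_usage_cost(sse_text, event_name="usage_metrics", cost_key="cost"):
    """Return the cost from the last matching SSE event, or None if absent."""
    last_snapshot = None
    current_event = None
    for line in sse_text.splitlines():
        if line.startswith("event:"):
            current_event = line[len("event:"):].strip()
        elif line.startswith("data:") and current_event == event_name:
            # Later snapshots overwrite earlier ones, so the final one wins.
            last_snapshot = json.loads(line[len("data:"):].strip())
        elif not line.strip():
            current_event = None  # a blank line ends the current SSE event
    return None if last_snapshot is None else last_snapshot.get(cost_key)
```

If usage snapshots are cumulative, an early snapshot only covers the first part of a run, which would explain why reading the first one underreports cost.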

Key files:
eval/bluebook/index.html
eval/bluebook/js/bluebook.js
eval/bluebook/css/bluebook.css
eval/dataset/bluebook_simple.yaml
eval/dataset/bluebook_complex.yaml
eval/server.py
eval/README.md
eval/evaluate_browser_agent.py
eval/evaluation_report.json

Testing

Added or updated tests for:

- multi-process agent API behavior
- process manager / worker lifecycle
- browser executor bundle behavior
- command model coverage
- prompt/profile behavior
- highlight detection and layout stability
- screenshot capture behavior
- element-action regression coverage
- eval client cost extraction behavior

Representative test files:
server/tests/unit/test_agent_api_multiprocess.py
server/tests/unit/test_agent_manager_process.py
server/tests/unit/test_browser_executor_bundle.py
server/tests/unit/test_eval_client.py
extension/src/__tests__/highlight-detection.test.ts
extension/src/__tests__/highlight-layout-stability.test.ts
extension/src/__tests__/element-actions-regression.test.ts

Notes

This branch also contains a few non-core artifacts alongside product changes, including:

- checked-in eval output/report updates
- lock files under eval/.locks/
- bug_report_highlight_any_scrollable.md

softpudding added 15 commits March 21, 2026 21:33

- 5e92b28 Run conversation messages in worker processes
- 122adee Avoid blocking pause on client disconnect
- 2f4522f Stabilize tab view screenshots before capture
- 58242b1 Add Bluebook eval assets and improve highlight detection
- 945878d Remove accidental evaluation and page artifacts
- f453bd2 Prioritize scrollable containers in any highlight
- 8653028 Use snapshot readiness for highlight stability
- 7d01690 Document highlight readiness design
- 860d822 Add swipe carousel interaction support
- f6bf0d3 refine browser prompt guidance and bump agent-sdk
- bbef65e Refresh BlueBook eval scenario
- 8caadbc Fix click async JavaScript result handling
- d9ad8a4 update eval results
- 71b6ee1 Fix eval usage metric cost extraction
- 6c4ab2f Fix test expectations and apply formatting

softpudding merged commit 5fbd1cd into main Mar 22, 2026
4 checks passed

softpudding deleted the codex/multiprocess-browser-improvements branch March 22, 2026 14:52

softpudding merged 15 commits into main from codex/multiprocess-browser-improvements