Codex/multiprocess browser improvements #41

Merged
softpudding merged 15 commits into main from codex/multiprocess-browser-improvements on Mar 22, 2026
@softpudding
Owner

Summary

This branch started as a per-conversation multi-process isolation change, but it also includes a broader set of browser automation, prompt, and evaluation improvements that landed along the way.

At a high level, this PR:

  • adds worker-process execution for agent messages and browser command handling
  • improves screenshot and highlight stability for background-tab automation
  • extends element interaction support with swipe/carousel handling
  • adds a new BlueBook evaluation site and datasets
  • refreshes prompt guidance, model/tool selection behavior, and dependency pins
  • fixes evaluation cost extraction and updates the checked-in evaluation report

Main Changes

1. Per-conversation multi-process execution

  • Added optional multi-process mode in the agent manager, with one worker process per conversation.
  • Introduced ProcessManager and BrowserExecutorBundle to encapsulate process lifecycle, worker queues, agent manager initialization, and command execution in workers.
  • Updated agent message processing so SSE can stream from worker queues instead of only in-process threads.
  • Added worker-side handling for:
    • agent message execution
    • browser command execution
    • pause control
    • clean shutdown
  • Updated session/status handling around worker execution and disconnect paths.
  • Added unit coverage for multi-process API flow, process lifecycle, bundle behavior, and related command/model cases.

Key files:
server/agent/manager.py
server/agent/api.py
server/api/routes/agent.py
server/core/process_manager.py
server/core/browser_executor_bundle.py
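
As a rough illustration of the worker flow described above, here is a minimal sketch, not the actual ProcessManager or BrowserExecutorBundle code. The message shapes (`type`, `event`), the function names `worker_loop`, `stream_sse`, and `start_worker`, and the shutdown convention are all assumptions made for this example. The loop handles the four message kinds listed above, and the generator shows how SSE frames could be produced by draining a worker's output queue.

```python
import json
import multiprocessing as mp
import queue
from typing import Any, Iterator, Tuple


def worker_loop(inbox: Any, outbox: Any) -> None:
    """Serve one conversation until a shutdown message arrives (hypothetical protocol)."""
    paused = False
    while True:
        msg = inbox.get()  # blocks until the parent sends something
        kind = msg.get("type")
        if kind == "shutdown":
            outbox.put({"event": "done"})  # tells the streamer to stop
            return
        if kind == "pause":
            paused = bool(msg.get("value", True))
            outbox.put({"event": "paused", "value": paused})
        elif kind == "agent_message":
            # A real worker would run the agent here; the sketch only echoes.
            outbox.put({"event": "agent_reply", "text": msg.get("text", "")})
        elif kind == "browser_command":
            outbox.put({"event": "command_result", "command": msg.get("command")})
        else:
            outbox.put({"event": "error", "detail": f"unknown type: {kind}"})


def stream_sse(outbox: Any) -> Iterator[str]:
    """Yield SSE-formatted frames from a queue until the 'done' event is seen."""
    while True:
        event = outbox.get()
        yield f"data: {json.dumps(event)}\n\n"
        if event.get("event") == "done":
            return


def start_worker() -> Tuple[mp.Process, Any, Any]:
    """Start one worker process for a conversation and return its queues."""
    inbox, outbox = mp.Queue(), mp.Queue()
    proc = mp.Process(target=worker_loop, args=(inbox, outbox), daemon=True)
    proc.start()
    return proc, inbox, outbox
```

Because `worker_loop` and `stream_sse` only rely on `get()` and `put()`, the same functions can be exercised in-process with `queue.Queue` in unit tests and with `multiprocessing.Queue` when a real worker process is started.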

2. Highlight readiness and screenshot stability

  • Reworked highlight readiness to use a snapshot-first approach instead of relying on page-side polling loops that can be throttled in background tabs.
  • Added consistency/stability handling around highlight capture and screenshot timing.
  • Improved screenshot capture by waiting for page settle before capture, including fonts, viewport mutations, media readiness, and quiet windows.
  • Adjusted collision/highlight behavior so element_type="any" better surfaces scrollable regions and remains more stable visually.
  • Documented the new highlight-readiness design in project docs.

Key files:
extension/src/commands/highlight-detection.injected.js
extension/src/commands/highlight-detection.ts
extension/src/commands/screenshot.ts
extension/src/utils/layout-stability.ts
extension/src/utils/collision-detection.ts
AGENTS.md
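
The "quiet window" idea behind the page-settle wait can be sketched as a small timing calculation. The real logic lives in the TypeScript extension; the Python below only illustrates the rule under assumed semantics: capture proceeds once no mutation has been observed for `quiet_ms`, or when `timeout_ms` is reached, whichever comes first. The function name and parameters are hypothetical.

```python
from typing import Sequence


def settle_time(
    mutation_times_ms: Sequence[float],
    quiet_ms: float,
    timeout_ms: float,
) -> float:
    """Return when capture would proceed, in ms since the wait started.

    mutation_times_ms: times at which page activity was observed.
    quiet_ms: how long the page must stay idle to count as settled.
    timeout_ms: upper bound on the wait, so a busy page cannot block capture.
    """
    last_activity = 0.0  # the wait itself starts the first quiet window
    for t in sorted(mutation_times_ms):
        if t >= timeout_ms:
            break  # anything at or after the deadline is irrelevant
        if t - last_activity >= quiet_ms:
            break  # a full quiet window elapsed before this mutation
        last_activity = t  # activity restarts the quiet window
    return min(last_activity + quiet_ms, timeout_ms)
```

For example, with mutations at 50, 120, and 400 ms and a 200 ms quiet window, the page counts as settled at 320 ms, because the window that started at 120 ms completes before the 400 ms mutation.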

3. Element interaction improvements, including swipe support

  • Added swipe/carousel interaction support for swipable regions.
  • Updated element-interaction prompts and tool guidance so the agent can distinguish scrollable containers from swipable carousel/slider regions.
  • Fixed async JavaScript result handling for click flows where dialogs can open during JS-driven interactions, avoiding false failures.
  • Expanded command/type support so swipe is treated as a first-class interaction.

Key files:
extension/src/commands/element-actions.ts
extension/src/background/index.ts
server/models/commands.py
server/agent/prompts/element_interaction_tool.j2
server/agent/prompts/highlight_tool.j2
server/agent/prompts/tab_tool.j2
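
To make "swipe as a first-class interaction" concrete, here is a hedged sketch of what a command model with a swipe action could look like. It is not the contents of server/models/commands.py: the class names, fields, and validation rule are hypothetical, and it uses only the standard library rather than whatever modeling library the project uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(str, Enum):
    CLICK = "click"
    SCROLL = "scroll"
    SWIPE = "swipe"  # carousel/slider regions, distinct from scrollable containers


class Direction(str, Enum):
    LEFT = "left"
    RIGHT = "right"
    UP = "up"
    DOWN = "down"


@dataclass(frozen=True)
class ElementCommand:
    action: Action
    element_id: str
    direction: Optional[Direction] = None

    def __post_init__(self) -> None:
        # Directional actions must say which way to move; others must not.
        needs_direction = self.action in (Action.SCROLL, Action.SWIPE)
        if needs_direction and self.direction is None:
            raise ValueError(f"{self.action.value} requires a direction")
        if not needs_direction and self.direction is not None:
            raise ValueError(f"{self.action.value} does not take a direction")
```

Validating at construction time means a malformed swipe is rejected on the server before it is ever sent to the extension.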

4. Prompt and model/tooling refinements

  • Refined highlight prompt guidance, especially around when to use keywords vs pagination.
  • Tightened JavaScript prompt guidance so JS is positioned as a narrow system-specific fallback instead of a primary interaction path.
  • Updated prompt context/tool profile behavior to better match model tier and browser tool availability.
  • Bumped openhands-sdk / openhands-tools dependency revisions and refreshed the lockfile.

Key files:
server/agent/prompts/highlight_tool.j2
server/agent/prompts/javascript_tool.j2
server/agent/tools/base.py
server/agent/tools/browser_executor.py
server/agent/tools/prompt_context.py
pyproject.toml
uv.lock

5. Evaluation expansion and maintenance

  • Added the new BlueBook evaluation site, assets, and two datasets:
    • bluebook_simple
    • bluebook_complex
  • Updated the eval server/docs to expose the new scenario.
  • Refreshed the checked-in evaluation report.
  • Fixed evaluation cost extraction to use the final usage_metrics SSE snapshot rather than the first one, which was underreporting costs.
  • Recomputed eval/evaluation_report.json from the saved SSE output.

Key files:
eval/bluebook/index.html
eval/bluebook/js/bluebook.js
eval/bluebook/css/bluebook.css
eval/dataset/bluebook_simple.yaml
eval/dataset/bluebook_complex.yaml
eval/server.py
eval/README.md
eval/evaluate_browser_agent.py
eval/evaluation_report.json
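
The cost-extraction fix can be illustrated with a short sketch. This is not the code in eval/evaluate_browser_agent.py; the SSE payload shape (a JSON object with `"type": "usage_metrics"` and a `"cost"` field) and the function name are assumptions. The point is that if each usage_metrics snapshot is cumulative, taking the first one reports only the earliest partial cost, while the last one reflects the whole run.

```python
import json
from typing import Iterable, Optional


def extract_final_cost(sse_lines: Iterable[str]) -> Optional[float]:
    """Return the cost from the last usage_metrics snapshot, or None if absent."""
    final = None
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip comments, event names, and blank separators
        try:
            payload = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # ignore frames that are not JSON
        if isinstance(payload, dict) and payload.get("type") == "usage_metrics":
            final = payload  # keep overwriting so the last snapshot wins
    return None if final is None else final.get("cost")
```

A usage example under the same assumed payload shape:

```python
lines = [
    'data: {"type": "usage_metrics", "cost": 0.01}',
    'data: {"type": "message", "text": "working"}',
    'data: {"type": "usage_metrics", "cost": 0.05}',
]
cost = extract_final_cost(lines)  # 0.05, not the first snapshot's 0.01
```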

Testing

Added or updated tests for:

  • multi-process agent API behavior
  • process manager / worker lifecycle
  • browser executor bundle behavior
  • command model coverage
  • prompt/profile behavior
  • highlight detection and layout stability
  • screenshot capture behavior
  • element-action regression coverage
  • eval client cost extraction behavior

Representative test files:
server/tests/unit/test_agent_api_multiprocess.py
server/tests/unit/test_agent_manager_process.py
server/tests/unit/test_browser_executor_bundle.py
server/tests/unit/test_eval_client.py
extension/src/__tests__/highlight-detection.test.ts
extension/src/__tests__/highlight-layout-stability.test.ts
extension/src/__tests__/element-actions-regression.test.ts

Notes

This branch also contains a few non-core artifacts alongside product changes, including:

  • checked-in eval output/report updates
  • lock files under eval/.locks/
  • bug_report_highlight_any_scrollable.md

@softpudding merged commit 5fbd1cd into main on Mar 22, 2026
4 checks passed
@softpudding deleted the codex/multiprocess-browser-improvements branch on March 22, 2026 at 14:52