Make OpenHands browser tools optional for non-web datasets #213

Merged
neubig merged 3 commits into main from refactor-optional-browser on May 18, 2026

@neubig neubig commented May 18, 2026

Summary

Extract the lazy-import refactor that was previously duplicated inside PRs #193 (CodeScout) and #197 (jupyter-agent) into its own change so those PRs can revert to dataset-only diffs.

Motivation

Non-web datasets (CodeScout, jupyter-agent, etc.) currently cannot run agents/openhands/std_to_sft.py on environments that do not have browsergym installed, because:

  • agents/openhands/system_prompt/tools/__init__.py does an unconditional from .browser import BrowserTool and browser.py imports browsergym at module load.
  • agents/openhands/std_to_sft.py constructs HTMLToAXTree(dataset) at module load even when the dataset has no WebObservation events.

The fix is to defer browser-related imports until they are actually needed.
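
As an illustration of the deferral idea, here is a minimal sketch of a helper that is built only on first use. It is not the code from this PR: `ExpensiveConverter`, `get_converter`, and `process_events` are hypothetical stand-ins for a converter with heavy optional dependencies and for the loop that consumes events.

```python
# Records each construction so the laziness is observable.
calls = []


class ExpensiveConverter:
    """Hypothetical stand-in for a converter with heavy optional dependencies."""

    def __init__(self):
        # In real code, the optional import would happen here or in get_converter().
        calls.append("constructed")

    def __call__(self, html):
        return f"axtree({len(html)} chars)"


_converter = None  # nothing is built at module load


def get_converter():
    """Build the converter on first use and reuse it afterwards."""
    global _converter
    if _converter is None:
        _converter = ExpensiveConverter()
    return _converter


def process_events(events):
    """Only 'web' events touch the converter; other events pass through."""
    out = []
    for kind, payload in events:
        if kind == "web":
            out.append(get_converter()(payload))
        else:
            out.append(payload)
    return out
```

With this shape, a dataset that contains no web events never constructs the converter, so its dependencies never need to be importable.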

Changes

  • agents/openhands/system_prompt/tools/__init__.py — Wrap the BrowserTool re-export in try/except ModuleNotFoundError. The handler only swallows the error when the missing module is browsergym (or a submodule); any other ImportError still propagates. BrowserTool is bound to None when browsergym is unavailable.
  • agents/openhands/system_prompt/system.py — Switch the top-level tool imports from the package __init__ to their direct submodules so module load no longer touches browser.py. Defer from agents.openhands.system_prompt.tools.browser import BrowserTool to inside the if codeact_enable_browsing: branch of get_tools.
  • agents/openhands/std_to_sft.py — Lazy-load scripts.html_to_axtree.HTMLToAXTree behind a get_generate_axtree() helper; it is only constructed when a WebObservation event is actually encountered. Also thread the existing --is_web CLI flag into get_system_message(codeact_enable_browsing=is_web) so non-web datasets actually get a non-web system prompt (today the default True is always used).
  • tests/test_openhands_sft_role_preservation.py — Loosen the fake get_system_message to *args, **kwargs to accept the new keyword argument.
  • tests/test_optional_browser.py (new) — Regression test (skipped when litellm is absent) that installs a sys.meta_path finder which raises ModuleNotFoundError for any browsergym* import, then asserts (a) agents.openhands.system_prompt.tools imports cleanly with BrowserTool is None and (b) get_system_message(codeact_enable_browsing=False) returns a prompt that does not advertise BrowserTool.
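
The guarded re-export described in the first bullet can be sketched roughly as follows. `optional_attr` is a hypothetical helper written for this example, not a function in the repository; the actual change wraps a plain `from .browser import BrowserTool` statement. The point of the sketch is the check on `exc.name`, which is what lets unrelated import failures keep propagating.

```python
import importlib


def optional_attr(module_name: str, attr: str, optional_root: str):
    """Return getattr(module, attr), or None when only `optional_root` is missing.

    Hypothetical helper for illustration; names are assumptions.
    """
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        missing = exc.name or ""
        # Swallow the error only for the optional package or one of its submodules.
        if missing == optional_root or missing.startswith(optional_root + "."):
            return None
        raise  # any other missing module is a real bug and should surface
    return getattr(module, attr)
```

Checking the module name rather than catching every `ImportError` means a typo or a genuinely broken dependency inside the browser module still fails loudly instead of silently disabling the tool.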

Validation

python -m pytest tests/ → 183 passed, 12 skipped, 4 warnings.

Evidence — end-to-end conversion of a non-web dataset without browsergym

Driver script (full source below): installs a sys.meta_path finder that raises ModuleNotFoundError for any browsergym* import, sets MY_DATASET=codeactinstruct (a non-web dataset already in the repo), imports the production agents.openhands.std_to_sft module, and calls main_with_args(line, is_web=False, api_env=None) on one record from datasets/codeactinstruct/sample_std.json.

Control run — on main (without this PR)

[sanity] browsergym blocked as expected: No module named 'browsergym'
Traceback (most recent call last):
  ...
  File ".../agents/openhands/std_to_sft.py", line 14, in <module>
    from agents.openhands.system_prompt.system import get_system_message
  File ".../agents/openhands/system_prompt/system.py", line 3, in <module>
    from agents.openhands.system_prompt.tools import (
  File ".../agents/openhands/system_prompt/tools/__init__.py", line 2, in <module>
    from .browser import BrowserTool
  File ".../agents/openhands/system_prompt/tools/browser.py", line 1, in <module>
    from browsergym.core.action.highlevel import HighLevelActionSet
ModuleNotFoundError: No module named 'browsergym'

→ The pipeline fails to import; converter is unusable.

Treatment run — on this branch (refactor-optional-browser)

[sanity] browsergym blocked as expected: No module named 'browsergym'
OK: produced SFT record with 8 conversation turns
system prompt length: 13399 chars (browser tools omitted)

→ The pipeline runs to completion, returns a well-formed SFT record (verified via json.loads + structural assertions on conversations/system), and the system prompt does not advertise BrowserTool (asserted in the driver).

Driver script

"""Driver: block any browsergym import (PEP 451 finder), then run std_to_sft on one record."""
import importlib.machinery
import sys


class _BlockBrowserGym:
    def find_spec(self, fullname, path=None, target=None):
        if fullname.startswith("browsergym"):
            return importlib.machinery.ModuleSpec(fullname, self)
        return None

    def create_module(self, spec):
        return None

    def exec_module(self, module):
        raise ModuleNotFoundError(
            f"No module named {module.__name__!r}", name=module.__name__
        )


sys.meta_path.insert(0, _BlockBrowserGym())

try:
    import browsergym  # noqa: F401
    print("UNEXPECTED: browsergym imported successfully", file=sys.stderr)
    sys.exit(99)
except ModuleNotFoundError as e:
    print(f"[sanity] browsergym blocked as expected: {e}", file=sys.stderr)

import os
os.environ["MY_DATASET"] = "codeactinstruct"

import importlib
std_to_sft = importlib.import_module("agents.openhands.std_to_sft")

import json
with open("datasets/codeactinstruct/sample_std.json") as f:
    sample = json.load(f)
record_line = json.dumps(sample[0])
out = std_to_sft.main_with_args(record_line, is_web=False, api_env=None)
if not out:
    print("FAIL: std_to_sft.main_with_args returned no output", file=sys.stderr)
    sys.exit(1)
parsed = json.loads(out)
assert "conversations" in parsed and isinstance(parsed["conversations"], list)
assert "system" in parsed
assert "BrowserTool" not in parsed["system"], "system prompt unexpectedly mentions BrowserTool"
print(f"OK: produced SFT record with {len(parsed['conversations'])} conversation turns")
print(f"system prompt length: {len(parsed['system'])} chars (browser tools omitted)")

Follow-up

Once this is merged, PRs #193 and #197 will be rebased onto main to drop their copies of these four files; their diffs should then contain only their respective datasets/ directory plus the README.md/agents/openhands/DATASETS.md catalog entries (already done — see #197 and #193).


This PR was prepared by an AI agent (OpenHands) on behalf of the user. Originating conversation context is available to the requester.

The following changes to the OpenHands agent pipeline and its tests
let non-web dataset converters run on machines that do not have
browsergym installed:

1. agents/openhands/system_prompt/tools/__init__.py wraps the
   'from .browser import BrowserTool' import in try/except
   ModuleNotFoundError. The except branch only swallows the error when
   the missing module is browsergym (or a submodule); any unrelated
   ImportError still propagates. The BrowserTool name is bound to None
   when browsergym is unavailable.

2. agents/openhands/system_prompt/system.py defers the BrowserTool
   import to inside the 'if codeact_enable_browsing:' branch of
   get_tools and switches the remaining tool imports to their direct
   submodules so the module-level import no longer touches browser.py.

3. agents/openhands/std_to_sft.py lazy-loads
   scripts.html_to_axtree.HTMLToAXTree behind get_generate_axtree(); it
   is only constructed when a WebObservation event is actually seen.
   process_row also threads the existing --is_web CLI flag through to
   get_system_message(codeact_enable_browsing=is_web) so non-web
   datasets actually get a non-web system prompt.

4. tests/test_openhands_sft_role_preservation.py loosens its fake
   get_system_message to '*args, **kwargs' so the new keyword argument
   used by std_to_sft.py does not break the fake.

5. A new regression test tests/test_optional_browser.py installs a
   meta_path finder that raises ModuleNotFoundError for any
   browsergym* import, then asserts that
   agents.openhands.system_prompt.tools imports cleanly (with
   BrowserTool is None) and that
   get_system_message(codeact_enable_browsing=False) returns a prompt
   that does not advertise BrowserTool.

This change was previously duplicated inside two unrelated dataset PRs
(#193 CodeScout and #197 jupyter-agent). Lifting it into its own PR
removes the duplication and lets those PRs revert to dataset-only
diffs.

This pull request was prepared by an AI agent (OpenHands) on behalf of
the user.

Co-authored-by: openhands <openhands@all-hands.dev>
neubig pushed a commit that referenced this pull request May 18, 2026
The previous tip of this PR carried a copy of the
'make OpenHands browser tools optional' refactor in
agents/openhands/system_prompt/tools/__init__.py,
agents/openhands/system_prompt/system.py,
agents/openhands/std_to_sft.py, and the
tests/test_openhands_sft_role_preservation.py fake. The same diff was
duplicated on #193 (CodeScout). That refactor has been extracted to
PR #213 ('Make OpenHands browser tools optional for non-web datasets').

Reset those four files to their main-branch state so this PR contains
only jupyter-agent dataset changes (datasets/jupyter-agent-dataset/* +
README.md + agents/openhands/DATASETS.md catalog row). Once #213 lands
and this branch is rebased onto main, the lazy-import semantics will
reappear via that PR.

Co-authored-by: openhands <openhands@all-hands.dev>
neubig pushed a commit that referenced this pull request May 18, 2026
The previous tip of this PR carried a copy of the
'make OpenHands browser tools optional' refactor in
agents/openhands/system_prompt/tools/__init__.py,
agents/openhands/system_prompt/system.py,
agents/openhands/std_to_sft.py, and the
tests/test_openhands_sft_role_preservation.py fake. The same diff was
duplicated on #197 (jupyter-agent). That refactor has been extracted to
PR #213 ('Make OpenHands browser tools optional for non-web datasets').

Reset those four files to their main-branch state so this PR contains
only CodeScout dataset changes (datasets/codescout/* +
agents/openhands/DATASETS.md catalog row). Once #213 lands and this
branch is rebased onto main, the lazy-import semantics will reappear
via that PR.

Co-authored-by: openhands <openhands@all-hands.dev>

@github-actions github-actions Bot left a comment

🟡 Acceptable — The lazy-import refactor is clean and correct. One must-fix per the PR evidence policy, plus a minor test annotation issue.

This review was generated by an AI agent (OpenHands) on behalf of the user.

Comment thread tests/test_optional_browser.py Outdated
            return self
        return None

    def load_module(self, name):  # pragma: no cover - exercised via import machinery

🟡 Suggestion: load_module IS exercised by the import machinery when find_module returns self — that's exactly the path that raises ModuleNotFoundError and exercises the try/except in __init__.py. The # pragma: no cover annotation incorrectly excludes a covered (and critical) line from coverage. Remove it.

Suggested change
-    def load_module(self, name):  # pragma: no cover - exercised via import machinery
+    def load_module(self, name):

generate_axtree = None


def get_generate_axtree():

🟠 Important (PR description): The PR description's Validation section only shows pytest output. Per the project's evidence policy, unit tests alone do not count as proof that the change works. Please add an Evidence section showing an actual end-to-end invocation — e.g. running std_to_sft.py on a non-web dataset in an environment without browsergym installed, with the resulting output pasted. A link to the originating OpenHands conversation (https://app.all-hands.dev/conversations/{id}) would also satisfy this requirement.

The previous version of tests/test_optional_browser.py reloaded
agents.openhands.system_prompt.tools by monkeypatching sys.meta_path
in-process. That fails in CI because the workflow's requirements.txt
does not install litellm, and reloading the tools package triggers
litellm imports from each tool module's top level (bash.py, finish.py,
etc.).

Two changes:

1. Run the import under a subprocess so the meta_path finder is the
   only entry on the fresh interpreter's import path. This avoids
   cross-test contamination with any tools modules that may already be
   cached in the parent's sys.modules.

2. Add a pytest.importorskip('litellm') guard. The optional-browser
   path is only reachable when litellm is installed (the tool modules
   import it unconditionally); in environments without litellm the
   import chain is broken before the BrowserTool try/except is even
   reached, so a regression test there would always fail for an
   unrelated reason.
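
A minimal sketch of the subprocess approach from item 1, under stated assumptions: `blocked_pkg` is a placeholder package name and `run_isolated` is a hypothetical helper, neither taken from the actual test file. The finder is installed by a preamble string that runs in a fresh interpreter, so nothing cached in the parent's `sys.modules` can interfere. In a pytest module, the guard from item 2 would be a `pytest.importorskip("litellm")` call before this code runs.

```python
import subprocess
import sys
import textwrap

# Preamble executed in the child interpreter: a PEP 451 finder that makes
# every import of the placeholder package fail with ModuleNotFoundError.
PREAMBLE = textwrap.dedent('''
    import importlib.machinery
    import sys

    class _Block:
        def find_spec(self, fullname, path=None, target=None):
            if fullname == "blocked_pkg" or fullname.startswith("blocked_pkg."):
                return importlib.machinery.ModuleSpec(fullname, self)
            return None

        def create_module(self, spec):
            return None

        def exec_module(self, module):
            raise ModuleNotFoundError(
                f"No module named {module.__name__!r}", name=module.__name__
            )

    sys.meta_path.insert(0, _Block())
''')


def run_isolated(body: str) -> subprocess.CompletedProcess:
    """Run `body` in a fresh interpreter after installing the blocking finder."""
    script = PREAMBLE + textwrap.dedent(body)
    return subprocess.run(
        [sys.executable, "-c", script], capture_output=True, text=True
    )


result = run_isolated('''
    try:
        import blocked_pkg
    except ModuleNotFoundError as exc:
        print("blocked:", exc.name)
''')
```

A real test would assert on `result.returncode` and `result.stdout` rather than on in-process state.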

Co-authored-by: openhands <openhands@all-hands.dev>

neubig commented May 18, 2026

Addressed both review comments:

1. # pragma: no cover annotation on _BlockBrowserGym.load_module — the suggestion was based on an intermediate state; the pushed version of tests/test_optional_browser.py does not contain that annotation. The current outer code path actually doesn't have a class-level load_module at all (the meta-path finder lives inside the subprocess preamble string), so coverage is not a concern. Verified with grep -n pragma tests/test_optional_browser.py → no matches.

2. Evidence section per project policy — added a full ## Evidence block to the PR description with both a control run (on main, fails with ModuleNotFoundError: No module named 'browsergym' at the import chain) and a treatment run (on this branch, produces a valid 8-turn SFT record with system prompt length 13,399 chars and no BrowserTool mention). The driver script is reproducible and pasted in full so a reviewer can re-run it.

Please re-review.

This comment was posted by an AI agent (OpenHands) on behalf of the user.

@neubig neubig added the review-this label ("Trigger the OpenHands PR review workflow") May 18, 2026

@github-actions github-actions Bot left a comment

🟡 Acceptable — The lazy-import refactor is clean and correct. The unresolved evidence thread on std_to_sft.py:28 still needs to be addressed before merge: the Validation section only shows pytest output, which does not satisfy the project's evidence policy. Please add an Evidence section with an actual end-to-end invocation (e.g. running std_to_sft.py --is_web=no through the pipeline on a non-web dataset) and its output.

One minor new note on the test blocker below.

This review was generated by an AI agent (OpenHands) on behalf of the user.

Comment thread tests/test_optional_browser.py Outdated
Comment on lines +47 to +53
    def find_module(self, name, path=None):
        if name.startswith("browsergym"):
            return self
        return None

    def load_module(self, name):
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)

🟡 Suggestion: _BlockBrowserGym uses the legacy PEP 302 find_module/load_module interface, which has been deprecated since Python 3.4. The import system's fallback to find_module was removed in Python 3.12, so a finder that implements only that method is ignored there; the modern find_spec/exec_module protocol works on current Python versions and is forward-compatible:

def find_spec(self, fullname, path, target=None):
    if fullname.startswith("browsergym"):
        import importlib.machinery
        return importlib.machinery.ModuleSpec(fullname, self)
    return None

def create_module(self, spec):
    return None

def exec_module(self, module):
    raise ModuleNotFoundError(f"No module named {module.__name__!r}", name=module.__name__)

Address inline review on #213: replace the legacy PEP 302
find_module/load_module pair with the modern PEP 451
find_spec/create_module/exec_module trio. The legacy interface has been
deprecated since Python 3.4, and the import system's find_module
fallback was removed in Python 3.12; the new interface is what the
import machinery has used internally since 3.4 and is
forward-compatible. Also moves the sanity check that the finder fires
into the test body and updates the module docstring to reference the
new protocol.

The test still passes locally with the same exit codes and assertion
output; behavior is unchanged.

Co-authored-by: openhands <openhands@all-hands.dev>

neubig commented May 18, 2026

Addressed the new review:

PEP 451 modernization (tests/test_optional_browser.py L53) — replaced the legacy PEP 302 find_module/load_module pair with the modern PEP 451 find_spec/create_module/exec_module trio in both the test file and the driver-script snippet in the PR description. Updated docstring to reference the new protocol. Test still passes locally with 2/2 green.

Evidence (PR description) — note that the ## Evidence section was added in the previous round; the prior re-review bot may have been looking at a cached version. The current PR body now has ## Summary / ## Motivation / ## Changes / ## Validation / ## Evidence — end-to-end conversion of a non-web dataset without browsergym (### Control / ### Treatment / ### Driver script) / ## Follow-up. The control run shows a real ModuleNotFoundError against main; the treatment run shows a real 8-turn SFT record produced from datasets/codeactinstruct/sample_std.json.

Re-running the driver locally with the updated PEP 451 finder produces the same output:

[sanity] browsergym blocked as expected: No module named 'browsergym'
OK: produced SFT record with 8 conversation turns
system prompt length: 13399 chars (browser tools omitted)

Re-applying review-this label to trigger another review.

This comment was posted by an AI agent (OpenHands) on behalf of the user.

neubig commented May 18, 2026

Addressed both points from the latest review:

  1. PEP 451 modernization (tests/test_optional_browser.py line 53) — replaced legacy find_module/load_module with find_spec/create_module/exec_module in both the test file (commit 5bf4e33) and the driver snippet in the PR description. Test still green locally (2/2).
  2. Evidence section — confirmed it's now in the PR description (see ## Evidence — end-to-end conversion of a non-web dataset without browsergym with Control/Treatment/Driver subsections). Earlier re-review may have hit a cached body.

Re-running the modernized driver locally:

[sanity] browsergym blocked as expected: No module named 'browsergym'
OK: produced SFT record with 8 conversation turns
system prompt length: 13399 chars (browser tools omitted)

This comment was posted by an AI agent (OpenHands) on behalf of the user.

@neubig neubig added and removed the review-this label ("Trigger the OpenHands PR review workflow") May 18, 2026

@github-actions github-actions Bot left a comment

🟢 Good taste — All three previously unresolved threads are resolved in the current code:

  • PEP 451 protocol: test_optional_browser.py now uses find_spec/create_module/exec_module throughout; the deprecated find_module/load_module interface is gone.
  • # pragma: no cover: No such annotation exists in the current file.
  • Evidence section: The PR description now includes a full end-to-end Evidence section with a control run (showing the pre-fix import failure) and a treatment run (showing the converter producing a valid SFT record), satisfying the project's evidence policy.

The lazy-import refactor is clean and correct. No new issues found.

This review was generated by an AI agent (OpenHands) on behalf of the user.

@neubig neubig merged commit 40b0489 into main May 18, 2026
6 of 7 checks passed
@neubig neubig deleted the refactor-optional-browser branch May 18, 2026 02:31