Make OpenHands browser tools optional for non-web datasets#213
Conversation
Two changes to the OpenHands agent pipeline let non-web dataset converters run on machines that do not have browsergym installed: 1. agents/openhands/system_prompt/tools/__init__.py wraps the 'from .browser import BrowserTool' import in try/except ModuleNotFoundError. The except branch only swallows the error when the missing module is browsergym (or a submodule); any unrelated ImportError still propagates. The BrowserTool name is bound to None when browsergym is unavailable. 2. agents/openhands/system_prompt/system.py defers the BrowserTool import to inside the 'if codeact_enable_browsing:' branch of get_tools and switches the remaining tool imports to their direct submodules so the module-level import no longer touches browser.py. 3. agents/openhands/std_to_sft.py lazy-loads scripts.html_to_axtree.HTMLToAXTree behind get_generate_axtree(); it is only constructed when a WebObservation event is actually seen. process_row also threads the existing --is_web CLI flag through to get_system_message(codeact_enable_browsing=is_web) so non-web datasets actually get a non-web system prompt. 4. tests/test_openhands_sft_role_preservation.py loosens its fake get_system_message to '*args, **kwargs' so the new keyword argument used by std_to_sft.py does not break the fake. 5. A new regression test tests/test_optional_browser.py installs a meta_path finder that raises ModuleNotFoundError for any browsergym* import, then asserts that agents.openhands.system_prompt.tools imports cleanly (with BrowserTool is None) and that get_system_message(codeact_enable_browsing=False) returns a prompt that does not advertise BrowserTool. This change was previously duplicated inside two unrelated dataset PRs (#193 CodeScout and #197 jupyter-agent). Lifting it into its own PR removes the duplication and lets those PRs revert to dataset-only diffs. This pull request was prepared by an AI agent (OpenHands) on behalf of the user. Co-authored-by: openhands <openhands@all-hands.dev>
The previous tip of this PR carried a copy of the 'make OpenHands browser tools optional' refactor in agents/openhands/system_prompt/tools/__init__.py, agents/openhands/system_prompt/system.py, agents/openhands/std_to_sft.py, and the tests/test_openhands_sft_role_preservation.py fake. The same diff was duplicated on #193 (CodeScout). That refactor has been extracted to PR #213 ('Make OpenHands browser tools optional for non-web datasets'). Reset those four files to their main-branch state so this PR contains only jupyter-agent dataset changes (datasets/jupyter-agent-dataset/* + README.md + agents/openhands/DATASETS.md catalog row). Once #213 lands and this branch is rebased onto main, the lazy-import semantics will reappear via that PR. Co-authored-by: openhands <openhands@all-hands.dev>
The previous tip of this PR carried a copy of the 'make OpenHands browser tools optional' refactor in agents/openhands/system_prompt/tools/__init__.py, agents/openhands/system_prompt/system.py, agents/openhands/std_to_sft.py, and the tests/test_openhands_sft_role_preservation.py fake. The same diff was duplicated on #197 (jupyter-agent). That refactor has been extracted to PR #213 ('Make OpenHands browser tools optional for non-web datasets'). Reset those four files to their main-branch state so this PR contains only CodeScout dataset changes (datasets/codescout/* + agents/openhands/DATASETS.md catalog row). Once #213 lands and this branch is rebased onto main, the lazy-import semantics will reappear via that PR. Co-authored-by: openhands <openhands@all-hands.dev>
| return self | ||
| return None | ||
|
|
||
| def load_module(self, name): # pragma: no cover - exercised via import machinery |
There was a problem hiding this comment.
🟡 Suggestion: load_module IS exercised by the import machinery when find_module returns self — that's exactly the path that raises ModuleNotFoundError and exercises the try/except in __init__.py. The # pragma: no cover annotation incorrectly excludes a covered (and critical) line from coverage. Remove it.
| def load_module(self, name): # pragma: no cover - exercised via import machinery | |
| def load_module(self, name): |
| generate_axtree = None | ||
|
|
||
|
|
||
| def get_generate_axtree(): |
There was a problem hiding this comment.
🟠 Important (PR description): The PR description's Validation section only shows pytest output. Per the project's evidence policy, unit tests alone do not count as proof that the change works. Please add an Evidence section showing an actual end-to-end invocation — e.g. running std_to_sft.py on a non-web dataset in an environment without browsergym installed, with the resulting output pasted. A link to the originating OpenHands conversation (https://app.all-hands.dev/conversations/{id}) would also satisfy this requirement.
The previous version of tests/test_optional_browser.py reloaded
agents.openhands.system_prompt.tools by monkeypatching sys.meta_path
in-process. That fails in CI because the workflow's requirements.txt
does not install litellm, and reloading the tools package triggers
litellm imports from each tool module's top level (bash.py, finish.py,
etc.).
Two changes:
1. Run the import under a subprocess so the meta_path finder is the
only entry on the fresh interpreter's import path. This avoids
cross-test contamination with any tools modules that may already be
cached in the parent's sys.modules.
2. Add a pytest.importorskip('litellm') guard. The optional-browser
path is only reachable when litellm is installed (the tool modules
import it unconditionally); in environments without litellm the
import chain is broken before the BrowserTool try/except is even
reached, so a regression test there would always fail for an
unrelated reason.
Co-authored-by: openhands <openhands@all-hands.dev>
|
Addressed both review comments: 1. 2. Evidence section per project policy — added a full Please re-review. This comment was posted by an AI agent (OpenHands) on behalf of the user. |
There was a problem hiding this comment.
🟡 Acceptable — The lazy-import refactor is clean and correct. The unresolved evidence thread on std_to_sft.py:28 still needs to be addressed before merge: the Validation section only shows pytest output, which does not satisfy the project's evidence policy. Please add an Evidence section with an actual end-to-end invocation (e.g. running std_to_sft.py --is_web=no through the pipeline on a non-web dataset) and its output.
One minor new note on the test blocker below.
This review was generated by an AI agent (OpenHands) on behalf of the user.
| def find_module(self, name, path=None): | ||
| if name.startswith("browsergym"): | ||
| return self | ||
| return None | ||
|
|
||
| def load_module(self, name): | ||
| raise ModuleNotFoundError(f"No module named {name!r}", name=name) |
There was a problem hiding this comment.
🟡 Suggestion: _BlockBrowserGym uses the deprecated PEP 302 find_module/load_module interface (deprecated since Python 3.4). It still works in Python 3.12, but the modern find_spec/exec_module protocol is forward-compatible with future Python versions:
def find_spec(self, fullname, path, target=None):
if fullname.startswith("browsergym"):
import importlib.machinery
return importlib.machinery.ModuleSpec(fullname, self)
return None
def create_module(self, spec):
return None
def exec_module(self, module):
raise ModuleNotFoundError(f"No module named {module.__name__!r}", name=module.__name__)Address inline review on #213: replace the legacy PEP 302 find_module/load_module pair with the modern PEP 451 find_spec/create_module/exec_module trio. The legacy interface is deprecated since Python 3.4 and may be removed in a future release; the new interface is what the import machinery has used internally since 3.4 and is forward-compatible. Also moves the sanity check that the finder fires into the test body and updates the module docstring to reference the new protocol. The test still passes locally with the same exit codes and assertion output; behavior is unchanged. Co-authored-by: openhands <openhands@all-hands.dev>
|
Addressed the new review: PEP 451 modernization ( Evidence (PR description) — note that the Re-running the driver locally with the updated PEP 451 finder produces the same output: Re-applying This comment was posted by an AI agent (OpenHands) on behalf of the user. |
|
Addressed both points from the latest review:
Re-running the modernized driver locally: This comment was posted by an AI agent (OpenHands) on behalf of the user. |
There was a problem hiding this comment.
🟢 Good taste — All three previously unresolved threads are resolved in the current code:
- PEP 451 protocol:
test_optional_browser.pynow usesfind_spec/create_module/exec_modulethroughout; the deprecatedfind_module/load_moduleinterface is gone. # pragma: no cover: No such annotation exists in the current file.- Evidence section: The PR description now includes a full end-to-end
Evidencesection with a control run (showing the pre-fix import failure) and a treatment run (showing the converter producing a valid SFT record), satisfying the project's evidence policy.
The lazy-import refactor is clean and correct. No new issues found.
This review was generated by an AI agent (OpenHands) on behalf of the user.
Summary
Extract the lazy-import refactor that was previously duplicated inside PRs #193 (CodeScout) and #197 (jupyter-agent) into its own change so those PRs can revert to dataset-only diffs.
Motivation
Non-web datasets (CodeScout, jupyter-agent, etc.) currently cannot run
agents/openhands/std_to_sft.pyon environments that do not havebrowsergyminstalled, because:agents/openhands/system_prompt/tools/__init__.pydoes an unconditionalfrom .browser import BrowserToolandbrowser.pyimportsbrowsergymat module load.agents/openhands/std_to_sft.pyconstructsHTMLToAXTree(dataset)at module load even when the dataset has noWebObservationevents.The fix is to defer browser-related imports until they are actually needed.
Changes
agents/openhands/system_prompt/tools/__init__.py— Wrap theBrowserToolre-export intry/except ModuleNotFoundError. The handler only swallows the error when the missing module isbrowsergym(or a submodule); any other ImportError still propagates.BrowserToolis bound toNonewhenbrowsergymis unavailable.agents/openhands/system_prompt/system.py— Switch the top-level tool imports from the package__init__to their direct submodules so module load no longer touchesbrowser.py. Deferfrom agents.openhands.system_prompt.tools.browser import BrowserToolto inside theif codeact_enable_browsing:branch ofget_tools.agents/openhands/std_to_sft.py— Lazy-loadscripts.html_to_axtree.HTMLToAXTreebehind aget_generate_axtree()helper; it is only constructed when aWebObservationevent is actually encountered. Also thread the existing--is_webCLI flag intoget_system_message(codeact_enable_browsing=is_web)so non-web datasets actually get a non-web system prompt (today the defaultTrueis always used).tests/test_openhands_sft_role_preservation.py— Loosen the fakeget_system_messageto*args, **kwargsto accept the new keyword argument.tests/test_optional_browser.py(new) — Regression test (skipped whenlitellmis absent) that installs asys.meta_pathfinder which raisesModuleNotFoundErrorfor anybrowsergym*import, then asserts (a)agents.openhands.system_prompt.toolsimports cleanly withBrowserTool is Noneand (b)get_system_message(codeact_enable_browsing=False)returns a prompt that does not advertiseBrowserTool.Validation
python -m pytest tests/→ 183 passed, 12 skipped, 4 warnings.Evidence — end-to-end conversion of a non-web dataset without browsergym
Driver script (full source below): installs a
sys.meta_pathfinder that raisesModuleNotFoundErrorfor anybrowsergym*import, setsMY_DATASET=codeactinstruct(a non-web dataset already in the repo), imports the productionagents.openhands.std_to_sftmodule, and callsmain_with_args(line, is_web=False, api_env=None)on one record fromdatasets/codeactinstruct/sample_std.json.Control run — on
main(without this PR)→ The pipeline fails to import; converter is unusable.
Treatment run — on this branch (
refactor-optional-browser)→ The pipeline runs to completion, returns a well-formed SFT record (verified via
json.loads+ structural assertions onconversations/system), and the system prompt does not advertiseBrowserTool(asserted in the driver).Driver script
Follow-up
Once this is merged, PRs #193 and #197 will be rebased onto
mainto drop their copies of these four files; their diffs should then contain only their respectivedatasets/directory plus theREADME.md/agents/openhands/DATASETS.mdcatalog entries (already done — see #197 and #193).This PR was prepared by an AI agent (OpenHands) on behalf of the user. Originating conversation context is available to the requester.