Grounding by jdchawla29 · Pull Request #113 · hud-evals/hud-python

jdchawla29 · 2025-09-10T19:01:42Z

This pull request introduces a new "grounded agent" architecture that separates visual grounding (element detection) from high-level reasoning, enabling more robust and modular browser automation. The main additions are a new agent class (GroundedOpenAIChatAgent), configuration and tooling for grounding, and an example script demonstrating its usage. It also updates the browser agent Docker image version and adds support for OpenRouter API keys.

Grounded Agent Architecture:

- Added GroundedOpenAIChatAgent in hud/agents/grounded_openai.py, which uses a dedicated vision model for grounding UI elements and a planning model (e.g., GPT-4o) for reasoning. The agent exposes a synthetic "computer" tool, intercepts tool calls for grounding, and separates screenshot management.
- Introduced the hud/tools/grounding module with Grounder, GroundedComputerTool, and GrounderConfig for managing grounding model configuration and execution. [1] [2]
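The division of labor described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in hud/agents/grounded_openai.py: `GrounderLike`, `StubGrounder`, `SketchGroundedTool`, the `element_description` argument, and the string results are all hypothetical stand-ins for Grounder and GroundedComputerTool. The planning model only names an element in words; the grounding step turns that description into pixel coordinates before the action runs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Protocol


class GrounderLike(Protocol):
    # Hypothetical interface: a vision model that resolves a description to (x, y).
    def locate(self, screenshot: bytes, description: str) -> tuple[int, int] | None: ...


@dataclass
class StubGrounder:
    # Stand-in for a real vision model: looks descriptions up in a fixed table.
    known: dict[str, tuple[int, int]]

    def locate(self, screenshot: bytes, description: str) -> tuple[int, int] | None:
        return self.known.get(description.lower())


@dataclass
class SketchGroundedTool:
    """Hypothetical 'computer' tool: grounds an element description, then acts."""

    grounder: GrounderLike
    take_screenshot: Callable[[], bytes]
    executed: list[tuple[str, int, int]] = field(default_factory=list)

    def __call__(self, action: str, element_description: str) -> str:
        # Capture a fresh screenshot for the grounding model.
        screenshot = self.take_screenshot()
        coords = self.grounder.locate(screenshot, element_description)
        if coords is None:
            # Report the failure to the planner as text instead of raising.
            return f"error: could not locate {element_description!r}"
        x, y = coords
        # A real tool would drive the browser here; the sketch just records the action.
        self.executed.append((action, x, y))
        return f"{action} at ({x}, {y})"


# In the real agent the planning model would emit these calls; here they are hand-written.
tool = SketchGroundedTool(
    grounder=StubGrounder(known={"submit button": (640, 480)}),
    take_screenshot=lambda: b"",
)
print(tool("click", "Submit button"))  # click at (640, 480)
print(tool("click", "Cancel link"))    # error: could not locate 'Cancel link'
```

Keeping the grounder behind a small interface is what makes the vision model swappable independently of the planning model, which is the modularity the PR description is after.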

Example and Configuration:

- Added examples/grounded_agent.py to demonstrate the grounded agent workflow, including environment setup, model configuration, and a sample form-filling task using separated vision and reasoning.
- Updated hud/settings.py to support the OPENROUTER_API_KEY for OpenRouter-based vision models.
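The PR names GrounderConfig and the new OPENROUTER_API_KEY setting but does not list their fields. As a hedged sketch, assuming a config object that carries a model id, an OpenAI-compatible base URL, and an API key read from the environment, it could look like the following. `SketchGrounderConfig`, its field names, and the placeholder model id are hypothetical; only the environment variable name (from this PR) and OpenRouter's public base URL are real.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, field

# OpenRouter's OpenAI-compatible API endpoint.
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def _key_from_env() -> str | None:
    # Read OPENROUTER_API_KEY at construction time; None if unset or empty.
    return os.environ.get("OPENROUTER_API_KEY") or None


@dataclass(frozen=True)
class SketchGrounderConfig:
    """Hypothetical stand-in for GrounderConfig; field names are illustrative."""

    model: str  # id of the vision model used for grounding
    api_base: str = OPENROUTER_BASE_URL
    api_key: str | None = field(default_factory=_key_from_env)

    def require_key(self) -> str:
        # Fail early with a clear message instead of at the first grounding call.
        if not self.api_key:
            raise RuntimeError("OPENROUTER_API_KEY is not set")
        return self.api_key


# An explicit key overrides the environment; "example/vision-model" is a placeholder id.
cfg = SketchGrounderConfig(model="example/vision-model", api_key="sk-or-placeholder")
print(cfg.api_base)  # https://openrouter.ai/api/v1
```

Resolving the key through a default factory means each config picks up the environment when it is created, while tests and scripts can still pass a key explicitly.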

Maintenance:

- Updated the Docker image version for the browser agent in examples/03_browser_agent_loop.py from 0.1.2 to 0.1.3 to ensure compatibility with the grounded agent.

- Introduced GroundedOpenAIChatAgent for separating visual grounding from reasoning.
- Added GroundedComputerTool for resolving element descriptions to coordinates.
- Implemented the Grounder class for API calls to grounding models and coordinate parsing.
- Created configuration and initialization for grounding models.
- Updated the Docker image version in the browser agent example.
- Added new grounding examples and configurations in the tools module.
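The Grounder is described as handling coordinate parsing, and later commits add error handling to that parsing and switch GroundedComputerTool to tuple coordinates. A minimal sketch of the idea, assuming the grounding model replies in free text that contains an x, y pair: `parse_coordinates` is a hypothetical helper, not the function in hud/tools/grounding, and it uses a simple "first two numbers" heuristic.

```python
from __future__ import annotations

import re

# Matches integers or decimals, optionally negative, e.g. "412", "305.5", "-3".
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_coordinates(
    reply: str, width: int | None = None, height: int | None = None
) -> tuple[int, int] | None:
    """Extract the first (x, y) pair from a grounding model's text reply.

    Returns None instead of raising when no usable pair is found, so the
    caller can report a grounding failure back to the planning model.
    """
    numbers = _NUMBER.findall(reply)
    if len(numbers) < 2:
        return None
    # Round fractional pixel values to the nearest integer.
    x, y = (round(float(n)) for n in numbers[:2])
    # Reject points outside the screenshot (bounds checked only when known).
    if x < 0 or y < 0:
        return None
    if width is not None and x >= width:
        return None
    if height is not None and y >= height:
        return None
    return (x, y)


print(parse_coordinates("The button is at (412, 305)."))          # (412, 305)
print(parse_coordinates("[640.4, 12]", width=1280, height=720))   # (640, 12)
print(parse_coordinates("No matching element found."))            # None
```

Returning None rather than raising lets the tool turn a bad reply into an error message for the planner. The heuristic misreads replies that mention other numbers before the coordinates, so a production parser would more likely match the specific output format of the chosen vision model.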

promptless · 2025-09-10T19:12:03Z

📝 Documentation updates detected!

New suggestion: Add comprehensive grounded agent architecture documentation for PR #113

hud/tools/grounding/grounded_tool.py

Co-authored-by: jaideep <jaideep@hud.so>

jdchawla29 added 3 commits September 10, 2025 09:48

Refactor grounded agent and tools

11fd858

notebook

50850f2

jdchawla29 added 4 commits September 10, 2025 12:24

ruff

068e498

based trace

121d6a8

ruff

3ac726e

pyright and ruff

f4e73a6

jdchawla29 added 3 commits September 10, 2025 15:20

run datasets with openai chat agents

8a8e61d

error handling in coordinate parsing

bc63bd7

ruff

18d9566

jdchawla29 requested a review from Parth220 September 10, 2025 22:38

fix: ruff and pyright

df2f263

Parth220 approved these changes Sep 11, 2025

hud/tools/grounding/grounded_tool.py (resolved)

Refactor: Use tuples for coordinates in GroundedComputerTool

27191f4

Co-authored-by: jaideep <jaideep@hud.so>

cursoragent and others added 2 commits September 11, 2025 00:45

Add tests for GroundedOpenAIChatAgent and GroundedComputerTool

e3b2b91

Co-authored-by: jaideep <jaideep@hud.so>

format tests

abe8d90

jdchawla29 merged commit c16e48e into main Sep 11, 2025
8 of 11 checks passed

jdchawla29 deleted the grounding branch September 11, 2025 00:59

jdchawla29 merged 14 commits into main from grounding

3 participants
