Grounding #113

Merged
jdchawla29 merged 14 commits into main from grounding
Sep 11, 2025

@jdchawla29
Collaborator

This pull request introduces a new "grounded agent" architecture that separates visual grounding (element detection) from high-level reasoning, enabling more robust and modular browser automation. The main additions are a new agent class (GroundedOpenAIChatAgent), the configuration and tooling it needs for grounding, and an example script demonstrating usage. The PR also bumps the browser agent Docker image version and adds support for OpenRouter API keys.

Grounded Agent Architecture:

  • Added GroundedOpenAIChatAgent in hud/agents/grounded_openai.py, which uses a dedicated vision model for grounding UI elements and a planning model (e.g., GPT-4o) for reasoning. The agent exposes a synthetic "computer" tool, intercepts tool calls for grounding, and separates screenshot management.
  • Introduced the hud/tools/grounding module with Grounder, GroundedComputerTool, and GrounderConfig for managing grounding model configuration and execution.
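
The division of labor described in the bullets above can be sketched roughly as follows. This is a minimal illustration and not the code in hud/agents/grounded_openai.py: `StubGrounder`, `GroundedToolSketch`, and the shape of the tool-call arguments (`element_description`, `x`, `y`) are hypothetical stand-ins, and the grounder here is a lookup table rather than a call to a vision model.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class StubGrounder:
    """Hypothetical stand-in for a vision model: maps descriptions to pixels."""

    known_elements: dict[str, tuple[int, int]]

    def locate(self, screenshot: bytes, description: str) -> Optional[tuple[int, int]]:
        # A real grounder would send the screenshot and description to a
        # vision model; this stub only consults a fixed table.
        return self.known_elements.get(description)


@dataclass
class GroundedToolSketch:
    """Hypothetical wrapper that turns described actions into coordinate actions."""

    grounder: StubGrounder
    executed: list[dict[str, Any]] = field(default_factory=list)

    def call(self, arguments: dict[str, Any], screenshot: bytes) -> dict[str, Any]:
        resolved = dict(arguments)
        description = resolved.pop("element_description", None)
        if description is not None:
            point = self.grounder.locate(screenshot, description)
            if point is None:
                raise LookupError(f"could not ground element: {description!r}")
            resolved["x"], resolved["y"] = point
        # In a real agent, this is where the coordinate-based action would be
        # forwarded to the environment's computer tool. Here it is only recorded.
        self.executed.append(resolved)
        return resolved


grounder = StubGrounder({"blue Submit button": (412, 630)})
tool = GroundedToolSketch(grounder)

# The planning model only names elements; it never produces coordinates itself.
planner_call = {"action": "click", "element_description": "blue Submit button"}
result = tool.call(planner_call, screenshot=b"")
print(result)
```

The point of the wrapper is that the planning model's tool schema can stay textual, while swapping the grounding model only touches the object behind `locate`.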

Example and Configuration:

  • Added examples/grounded_agent.py to demonstrate the grounded agent workflow, including environment setup, model configuration, and a sample form-filling task using separated vision and reasoning.
  • Updated hud/settings.py to support the OPENROUTER_API_KEY for OpenRouter-based vision models.
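
As a rough sketch of how an OpenRouter key might be wired into a grounding configuration: the example below reads `OPENROUTER_API_KEY` from the environment and bundles it with a model name. `GrounderSettingsSketch`, `load_grounder_settings`, and the model identifier are hypothetical and do not reflect the actual fields of hud/settings.py or GrounderConfig; the only externally grounded value is OpenRouter's public API base URL.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GrounderSettingsSketch:
    """Hypothetical settings bundle; field names are illustrative only."""

    api_base: str
    model: str
    api_key: str


def load_grounder_settings(
    model: str,
    env: Optional[Mapping[str, str]] = None,
    api_base: str = "https://openrouter.ai/api/v1",
) -> GrounderSettingsSketch:
    """Build settings from an environment mapping, failing early if the key is absent."""
    env = os.environ if env is None else env
    api_key = env.get("OPENROUTER_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    return GrounderSettingsSketch(api_base=api_base, model=model, api_key=api_key)


# Passing an explicit mapping keeps the example independent of the real environment.
settings = load_grounder_settings(
    "example/vision-model",  # placeholder model name, not a real identifier
    env={"OPENROUTER_API_KEY": "sk-demo"},
)
print(settings.api_base, settings.model)
```

Failing at load time rather than at the first grounding request makes a missing key easier to diagnose.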

Maintenance:

  • Updated the Docker image version for the browser agent in examples/03_browser_agent_loop.py from 0.1.2 to 0.1.3 to ensure compatibility with the grounded agent.

- Introduced GroundedOpenAIChatAgent for separating visual grounding from reasoning.
- Added GroundedComputerTool for resolving element descriptions to coordinates.
- Implemented Grounder class for API calls to grounding models and coordinate parsing.
- Created configuration and initialization for grounding models.
- Updated Docker image version in browser agent example.
- Added new grounding examples and configurations in the tools module.
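
The coordinate parsing mentioned in the commit message could look something like the sketch below. It is an assumption-laden illustration, not the Grounder implementation: the output format of the grounding model (a free-text "x, y" pair), the normalized-versus-pixel heuristic, and the function name `parse_point` are all hypothetical.

```python
import re
from typing import Optional

# Matches the first "number, number" pair, allowing signs and decimals.
_NUMBER_PAIR = re.compile(r"(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)")


def parse_point(text: str, width: int, height: int) -> Optional[tuple[int, int]]:
    """Extract the first "x, y" pair from model output and clamp it to the screen."""
    match = _NUMBER_PAIR.search(text)
    if match is None:
        return None
    x, y = float(match.group(1)), float(match.group(2))
    # Assumption: a pair with both values in [0, 1] is normalized and gets scaled
    # to pixels. This is ambiguous for pixel points such as (1, 1).
    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        x, y = x * width, y * height
    # Clamp so that a slightly off-screen prediction still yields a valid click.
    x_px = min(max(round(x), 0), width - 1)
    y_px = min(max(round(y), 0), height - 1)
    return x_px, y_px


print(parse_point("click at (412, 630)", 1280, 720))
print(parse_point("no coordinates here", 1280, 720))
```

Returning `None` for unparseable output lets the caller decide whether to retry the grounding request or report a failure to the planning model.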

@promptless
Contributor

promptless bot commented Sep 10, 2025

📝 Documentation updates detected!

New suggestion: Add comprehensive grounded agent architecture documentation for PR #113

jdchawla29 requested a review from Parth220 on September 10, 2025 22:38

Co-authored-by: jaideep <jaideep@hud.so>

jdchawla29 merged commit c16e48e into main on Sep 11, 2025
8 of 11 checks passed
jdchawla29 deleted the grounding branch on September 11, 2025 00:59