fix: route image_qa and self_reflection through the configured model (closes #3) #5

Merged
adamlu123 merged 1 commit into microsoft:main from mvanhorn:fix/3-webwright-inner-tool-llm-routing on May 27, 2026

@mvanhorn
Contributor

Summary

webwright -c base.yaml -c model_claude.yaml now runs end-to-end with only ANTHROPIC_API_KEY set. The image_qa and self_reflection inner tools route through the configured model registry instead of hardcoding the OpenAI Responses API.

Why this matters

Issue #3 (reported 2026-05-26 by @leachuk) shows the symptom: an Anthropic-only run fails at step 8 with RuntimeError: Missing OPENAI_API_KEY raised from src/webwright/tools/image_qa.py:66. The README and base.yaml:25 say "Export credentials for the chosen backend (e.g. OPENAI_API_KEY or ANTHROPIC_API_KEY)" but the inner tools never honored that contract — the comment at base.yaml:18 explicitly admitted OPENAI_API_KEY (always — used by self_reflection and image_qa tools).

The root cause was two _openai_config(args) helpers in tools/image_qa.py and tools/self_reflection.py reading os.environ.get("OPENAI_API_KEY") and posting directly to the OpenAI Responses API. Browser-use and Stagehand both let every layer of the agent pick any supported model; this change brings webwright to parity.

Changes

  • tools/image_qa.py and tools/self_reflection.py accept --model-config <path> and route the tool's vision call through webwright.models.get_model(...). The existing --api-key / OPENAI_API_KEY path is preserved so direct CLI use of these tools (without the agent loop) still works.
  • agents/default.py writes the resolved model: (or tools.<name>.model:) block to a per-workspace JSON file and passes --model-config <path> to the subprocess invocations.
  • models/base.py adds _complete_text_async and __call__, so model wrappers can be reused for inner-tool calls without going through the agent's full query pipeline. OpenAIModel and OpenRouterModel get a matching _build_text_payload. AnthropicModel already produces text via the same primitives the agent uses.
  • model_claude.yaml declares tools.image_qa.model and tools.self_reflection.model via a YAML anchor, so an Anthropic run inherits Anthropic for inner tools automatically. base.yaml and the README credentials section are updated to reflect the new reality.
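
The handoff between the agent and the tool subprocesses described in the list above might look roughly like the following. This is a minimal sketch, not the merged code: the function names, the JSON file name, the model_class key, and the "python -m" invocation are assumptions for illustration. Only the --model-config and --api-key flags and the OPENAI_API_KEY fallback are taken from the description.

```python
from __future__ import annotations

import argparse
import json
import os
import sys
import tempfile
from pathlib import Path


def write_model_config(workspace: Path, tool_name: str, model_block: dict) -> Path:
    """Agent side: persist the resolved model block for one tool."""
    path = workspace / f"{tool_name}_model_config.json"  # hypothetical file name
    path.write_text(json.dumps(model_block, indent=2))
    return path


def build_tool_argv(tool_module: str, config_path: Path, extra: list[str]) -> list[str]:
    """Agent side: command line for the tool subprocess (invocation style assumed)."""
    return [sys.executable, "-m", tool_module, "--model-config", str(config_path), *extra]


def parse_tool_args(argv: list[str]) -> dict:
    """Tool side: prefer --model-config, else fall back to the OpenAI key path."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-config")
    parser.add_argument("--api-key")
    args, _unknown = parser.parse_known_args(argv)
    if args.model_config:
        model_block = json.loads(Path(args.model_config).read_text())
        return {"source": "model-config", "model": model_block}
    api_key = args.api_key or os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Missing OPENAI_API_KEY")
    return {"source": "openai-fallback", "api_key": api_key}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = write_model_config(Path(tmp), "image_qa", {"model_class": "anthropic"})
        argv = build_tool_argv("webwright.tools.image_qa", cfg, [])
        # Skip the interpreter, "-m", and the module name when parsing.
        print(parse_tool_args(argv[3:]))
```

Passing a file path rather than inlining the model block on the command line keeps the subprocess interface to a single flag, and leaves the direct CLI path (--api-key or OPENAI_API_KEY) untouched when the flag is absent.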

Testing

tests/unit/test_tool_model_routing.py (2 tests, both passing) verifies that base.yaml + model_claude.yaml resolves the inner-tool model class to anthropic, and that _extract_model_config falls back to the top-level model: block when no per-tool override is present.
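
For readers without the repository at hand, the two checks can be approximated as below. This is a sketch under assumptions, not the contents of tests/unit/test_tool_model_routing.py: the configs are plain dicts standing in for the parsed YAML files, merge_configs is a guess at how repeated -c flags are layered, the model_class key and the "openai" default are illustrative, and extract_model_config is a stand-in for _extract_model_config.

```python
from __future__ import annotations

import copy


def merge_configs(base: dict, override: dict) -> dict:
    """Recursively overlay `override` on `base` (assumed layering for repeated -c flags)."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def extract_model_config(config: dict, tool_name: str) -> dict:
    """Stand-in for _extract_model_config: per-tool block wins, else top-level model."""
    tool_block = (config.get("tools") or {}).get(tool_name) or {}
    return tool_block.get("model") or config["model"]


# Hypothetical parsed contents of base.yaml (default backend assumed).
BASE = {"model": {"model_class": "openai"}, "tools": {}}

# Hypothetical parsed contents of model_claude.yaml; the shared dict mimics a YAML anchor.
_CLAUDE_MODEL = {"model_class": "anthropic"}
CLAUDE = {
    "model": _CLAUDE_MODEL,
    "tools": {
        "image_qa": {"model": _CLAUDE_MODEL},
        "self_reflection": {"model": _CLAUDE_MODEL},
    },
}


def test_claude_overlay_routes_inner_tools_to_anthropic():
    config = merge_configs(BASE, CLAUDE)
    for tool in ("image_qa", "self_reflection"):
        assert extract_model_config(config, tool)["model_class"] == "anthropic"


def test_falls_back_to_top_level_model():
    # No per-tool override present, so the top-level model block is used.
    assert extract_model_config(BASE, "image_qa") == {"model_class": "openai"}
```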

Fixes #3.

…loses microsoft#3)

Issue microsoft#3 reported that 'Fails to run with only ANTHROPIC_API_KEY'. The
image_qa and self_reflection tools hardcoded the OpenAI Responses API
through an _openai_config helper that read OPENAI_API_KEY even when the
agent loop ran on Claude.

This change adds a model_config flag to both tools, threads the resolved
model: block from base.yaml (and model_claude.yaml's anchored tools.*.model
blocks) into the subprocess invocation, and routes through the same
models registry the outer loop uses. The existing OpenAI fallback path
is preserved for direct CLI use.

Closes microsoft#3.
@adamlu123 merged commit c03b7ff into microsoft:main on May 27, 2026
1 check passed
adamlu123 added a commit that referenced this pull request on May 27, 2026
Follow-up to #5. Consolidates duplicated model-config helpers, drops legacy OpenAI HTTP code paths, and simplifies model_claude.yaml. Net -477 lines.