fix: route image_qa and self_reflection through the configured model (closes #3) by mvanhorn · Pull Request #5 · microsoft/Webwright

mvanhorn · 2026-05-26T13:59:36Z

Summary

webwright -c base.yaml -c model_claude.yaml now runs end-to-end with only ANTHROPIC_API_KEY set. The image_qa and self_reflection inner tools route through the configured model registry instead of hardcoding the OpenAI Responses API.

Why this matters

Issue #3 (reported 2026-05-26 by @leachuk) shows the symptom: an Anthropic-only run fails at step 8 with RuntimeError: Missing OPENAI_API_KEY, raised from src/webwright/tools/image_qa.py:66. The README and base.yaml:25 say "Export credentials for the chosen backend (e.g. OPENAI_API_KEY or ANTHROPIC_API_KEY)", but the inner tools never honored that contract. The comment at base.yaml:18 said so outright: OPENAI_API_KEY was required "always", because the self_reflection and image_qa tools used it.

The root cause was two _openai_config(args) helpers in tools/image_qa.py and tools/self_reflection.py reading os.environ.get("OPENAI_API_KEY") and posting directly to the OpenAI Responses API. Browser-use and Stagehand both let every layer of the agent pick any supported model; this change brings webwright to parity.
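To make the failure mode concrete, here is a minimal sketch contrasting a hardcoded OpenAI key lookup with a backend-aware one. Everything in it is hypothetical: legacy_openai_config is an illustrative reconstruction rather than the actual _openai_config code. Likewise, backend_credentials, PROVIDER_ENV_VARS, the model_class key, and the OPENROUTER_API_KEY variable name are assumptions made for the example.

```python
from typing import Mapping


# Hypothetical reconstruction of the pre-fix behavior: the inner tool always
# looks up OPENAI_API_KEY, regardless of which backend the agent loop uses.
def legacy_openai_config(environ: Mapping[str, str]) -> dict:
    key = environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Missing OPENAI_API_KEY")
    return {"api_key": key}


# Hypothetical mapping from a model class name to the env var its backend reads.
# These names are assumptions for illustration, not webwright's actual table.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def backend_credentials(model_config: Mapping[str, str], environ: Mapping[str, str]) -> dict:
    """Look up the credential for whichever backend the model config names."""
    var = PROVIDER_ENV_VARS[model_config["model_class"]]
    key = environ.get(var)
    if not key:
        raise RuntimeError(f"Missing {var}")
    return {"api_key": key}


if __name__ == "__main__":
    env = {"ANTHROPIC_API_KEY": "example-key"}  # an Anthropic-only environment
    try:
        legacy_openai_config(env)
    except RuntimeError as exc:
        print("legacy:", exc)  # legacy: Missing OPENAI_API_KEY
    print(backend_credentials({"model_class": "anthropic"}, env))  # {'api_key': 'example-key'}
```

With only ANTHROPIC_API_KEY set, the hardcoded lookup fails the same way issue #3 describes, while the config-driven lookup succeeds.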

Changes

tools/image_qa.py and tools/self_reflection.py accept --model-config <path> and route the tool's vision call through webwright.models.get_model(...). The existing --api-key / OPENAI_API_KEY path is preserved so direct CLI use of these tools (without the agent loop) still works.
agents/default.py writes the resolved model block (tools.<name>.model: when present, otherwise the top-level model:) to a per-workspace JSON file and passes --model-config <path> to the tool subprocess invocations.
models/base.py adds _complete_text_async and __call__, so model wrappers can be reused for inner-tool calls without going through the agent's full query pipeline. OpenAIModel and OpenRouterModel get a matching _build_text_payload. AnthropicModel already produces text via the same primitives the agent uses.
model_claude.yaml declares tools.image_qa.model and tools.self_reflection.model via a YAML anchor, so an Anthropic run inherits Anthropic for inner tools automatically. base.yaml and the README credentials section are updated to reflect the new reality.
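The handoff between the agent and an inner tool can be sketched end to end as follows. The agent writes the resolved model block to JSON, the tool reads it via --model-config and calls a wrapper's synchronous __call__, and the tool falls back to --api-key / OPENAI_API_KEY when no config is passed. This is a minimal sketch under stated assumptions, not webwright's code. BaseModel, StubModel, _REGISTRY, get_model, write_model_config, run_tool, and the model_class/model_name keys are all hypothetical, and the real webwright.models.get_model signature may differ. The subprocess boundary is collapsed into a direct function call so the example runs offline.

```python
import argparse
import asyncio
import json
import os
import tempfile
from pathlib import Path


class BaseModel:
    """Hypothetical model wrapper exposing a plain text call for inner tools."""

    def __init__(self, **config):
        self.config = config

    async def _complete_text_async(self, prompt: str) -> str:
        raise NotImplementedError

    def __call__(self, prompt: str) -> str:
        # Synchronous entry point for one-shot tool calls. asyncio.run() raises if
        # an event loop is already running in this thread, which is fine for a
        # tool that runs as its own subprocess.
        return asyncio.run(self._complete_text_async(prompt))


class StubModel(BaseModel):
    """Fake backend so the sketch runs without network access."""

    async def _complete_text_async(self, prompt: str) -> str:
        return f"[{self.config.get('model_name', 'stub')}] {prompt}"


# Hypothetical registry keyed by model class name.
_REGISTRY = {"stub": StubModel}


def get_model(config: dict) -> BaseModel:
    cfg = dict(config)
    cls = _REGISTRY[cfg.pop("model_class")]
    return cls(**cfg)


# Agent side: persist the resolved model block to a per-workspace JSON file.
def write_model_config(block: dict, workspace: Path, tool_name: str) -> Path:
    path = workspace / f"{tool_name}_model.json"
    path.write_text(json.dumps(block))
    return path


# Tool side: prefer --model-config, else keep the legacy OpenAI key path.
def run_tool(argv: list, prompt: str) -> str:
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-config", type=Path)
    parser.add_argument("--api-key")
    args = parser.parse_args(argv)
    if args.model_config:
        model = get_model(json.loads(args.model_config.read_text()))
        return model(prompt)
    key = args.api_key or os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Missing OPENAI_API_KEY")
    # Placeholder: a real tool would call the OpenAI API here.
    return f"(would call the OpenAI Responses API for: {prompt})"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg_path = write_model_config(
            {"model_class": "stub", "model_name": "demo"}, Path(tmp), "image_qa"
        )
        print(run_tool(["--model-config", str(cfg_path)], "What is on screen?"))
        # prints "[demo] What is on screen?"
```

Passing a file path rather than inline JSON keeps the subprocess command line short and lets each workspace carry its own resolved config.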

Testing

tests/unit/test_tool_model_routing.py (2 tests, both passing) verifies that base.yaml + model_claude.yaml resolves the inner-tool model class to anthropic, and that _extract_model_config falls back to the top-level model: block when no per-tool override is present.
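The two behaviors those tests check can be illustrated with a small sketch. The first is layering base.yaml under model_claude.yaml. The second is resolving a tool's model with a fallback to the top-level model: block. The code assumes that later -c files are deep-merged over earlier ones, which is an assumption and not confirmed by this PR. deep_merge, extract_model_config, and the model_class/model_name keys are hypothetical stand-ins, not the actual _extract_model_config. The model names are placeholders.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base`; the later file wins."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def extract_model_config(config: dict, tool_name: str) -> dict:
    """Prefer tools.<tool_name>.model, else fall back to the top-level model block."""
    tool_block = config.get("tools", {}).get(tool_name, {})
    return tool_block.get("model") or config["model"]


# Placeholder configs standing in for the parsed YAML files.
base = {"model": {"model_class": "openai", "model_name": "example-openai-model"}}
claude_model = {"model_class": "anthropic", "model_name": "example-claude-model"}
# A YAML anchor reuses one mapping in several places; the shared dict plays that role.
model_claude = {
    "model": claude_model,
    "tools": {
        "image_qa": {"model": claude_model},
        "self_reflection": {"model": claude_model},
    },
}

if __name__ == "__main__":
    merged = deep_merge(base, model_claude)
    print(extract_model_config(merged, "image_qa")["model_class"])  # anthropic
    print(extract_model_config(base, "image_qa")["model_class"])  # openai (fallback)
```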

Fixes #3.

…loses microsoft#3) Issue microsoft#3 reported 'Fails to run with only ANTHROPIC_API_KEY'. The image_qa and self_reflection tools hardcoded the OpenAI Responses API through an _openai_config helper that read OPENAI_API_KEY even when the agent loop ran on Claude. This change adds a --model-config flag to both tools, threads the resolved model: block from base.yaml (or the anchored tools.*.model blocks in model_claude.yaml) into the subprocess invocation, and routes the tools through the same models registry the outer loop uses. The existing OpenAI fallback path is preserved for direct CLI use. Closes microsoft#3.

Follow-up to #5. Consolidates duplicated model-config helpers, drops legacy OpenAI HTTP code paths, and simplifies model_claude.yaml. Net -477 lines.

mvanhorn mentioned this pull request May 26, 2026

Fails to run with only ANTHROPIC_API_KEY #3

Closed

adamlu123 merged commit c03b7ff into microsoft:main May 27, 2026
1 check passed

adamlu123 mentioned this pull request May 27, 2026

refactor: dedupe inner-tool model routing into a single helper #10

Merged

adamlu123 merged 1 commit into microsoft:main from mvanhorn:fix/3-webwright-inner-tool-llm-routing