A tiny, terminal-based web agent harness β readable end-to-end, SOTA on web agent benchmarks.
- π Blog: Webwright: A terminal is all you need for web agents
- π Website & demo videos: microsoft.github.io/Webwright
Webwright gives agents a terminal where it can launch multiple browswer sessions to inspect the page and complete a web task. It captures and inspects page screenshots/states only when needed. It drives a Playwright browser through a minimal agent loop with pluggable LLM backends. No multi-agent system, no graph engine, no plugin layer, no hidden orchestration β just a terminal, a browser, and a model.
Most web agents today treat the browser session itself as the workspace: at each step the model receives the current page state and predicts a single next operation β a click, a type, a DOM selector, or a short tool call. Whatever the format, the agent is locked into predicting one web action at a time inside a predefined interaction loop. That harness was useful when LLMs were weaker. As models get stronger at writing and debugging code, the same harness becomes a bottleneck.
Webwright takes a different stance: separate the agent from the browser, and treat the browser as something the agent can launch, inspect, and discard while developing a program. The persistent artifact is not the browser session β it's the code and logs in the local workspace.
- π§± Robust, reusable interaction with web environments β instead of fragile pixel-level actions, a coding agent with a terminal queries elements, waits for conditions, and handles dynamic behaviors like lazy loading or re-rendering. The resulting scripts can be rerun, adapted, and shared across tasks rather than rediscovered from scratch.
- β‘ Efficient composition of complex workflows β multi-step interactions like selecting a date or filling a form become a compact program. Loops, functions, and abstractions let the agent generalize across similar tasks (e.g. different dates) without re-predicting the same low-level sequences. Fewer interaction rounds, faster execution, less error accumulation on long horizons.
- π§ͺ Workspace-as-state, not browser-as-state β the agent can write exploratory scripts, spawn fresh browser sessions, and decide for itself when to capture screenshots and inspect failures, much like a human engineer iterating on an RPA script.
- πͺ Surprisingly effective despite being minimal β this stripped-down setup turns out to handle complex and especially long-horizon web tasks well (see Performance).
Most web agent frameworks bury the actual agent loop under layers of abstractions. Webwright takes the opposite stance:
- πͺΆ Lightweight by design β core agent loop in a single ~450-line file, Playwright environment in ~570 lines, CLI in ~150 lines.
- π§© Pluggable model backends β OpenAI, Anthropic, and OpenRouter, each ~150β200 lines.
- π Zero hidden frameworks β just
httpx,pydantic,playwright, andtyper. - π Flat prompt β observe β act loop β readable end-to-end, easy to debug, easy to fork.
- π§ͺ Run-artifact first β every run writes trajectories and screenshots to disk for inspection.
If you want a minimal, easy-to-debug starting point for browser-using agents instead of another heavyweight platform, this is it.
State-of-the-art on two real-website benchmarks with a 100-step budget β see the blog post for full details.
- π Online-Mind2Web (300 tasks): 86.7% with GPT-5.4 β highest among open-sourced harnesses in the AutoEval category. Claude Opus 4.7 reaches 84.7%, and is stronger on the hard split (80.5% vs. 76.6% for GPT-5.4 at N=100).
- π Odysseys (200 long-horizon tasks): 60.1% with GPT-5.4 (avg. 76.1 steps) β +15.6 points over the prior SOTA (Opus 4.6 at 44.5%, using xy-coordinate prediction and persistent browser) and +26.6 points over base GPT-5.4 (33.5% using xy-coordinate prediction and persistent browser).
- π§ Code-as-action beats coordinate prediction: Webwright substantially outperforms a reproduced GPT-5.4 screenshot+xy-coordinate baseline across all difficulty splits.
- π§° Small models + reusable tools: generated scripts can be packaged as parameterized CLI tools β even Qwen-3.5-9B completes tasks well on Online-Mind2Web sites with 5+ tools available.
webwright/
βββ pyproject.toml # package: webwright
βββ src/webwright/
β βββ run/cli.py # CLI entrypoint (`webwright`)
β βββ agents/default.py # core agent loop
β βββ environments/ # Playwright browser workspace
β βββ tools/ # image_qa, self_reflection
β βββ models/ # openai_model, anthropic_model, base
β βββ config/ # base.yaml, model_openai.yaml, model_claude.yaml
β βββ utils/
βββ tests/
βββ outputs/ # run artifacts (trajectories, screenshots)
- Python 3.10+
- Chromium installed through Playwright
- An API key for your chosen backend (OpenAI, Anthropic, or OpenRouter)
pip install -e .
playwright install chromiumExport credentials for the chosen backend (e.g. OPENAI_API_KEY or ANTHROPIC_API_KEY), then:
python -m webwright.run.cli \
-c base.yaml -c model_openai.yaml \
-t "Find the cheapest economy flight from SEA to JFK on 2026-05-15" \
--start-url https://www.google.com/flights \
--task-id demo_openai \
-o outputs/default| Flag | Description |
|---|---|
-c |
Config file(s) from src/webwright/config/ (stackable). |
-t |
Task instruction. |
--start-url |
Initial page. |
--task-id |
Output subfolder name. |
-o |
Output directory. |
Web-agent research is now benefiting from infrastructure originally designed for accessibility. Accessibility trees, ARIA metadata, and semantic page representations help assistive technologies expose web content to people with disabilities; today, the same signals also give LLM agents a machine-readable view of pages beyond pixels.
As builders, we have a responsibility to bring these advances back to the accessibility community. Webwright could support everyday assistive workflows such as:
- π forms and appointments
- π transportation lookups
- π service and price comparison
β¦while also acting as a repair layer for the web itself: inspecting pages, detecting missing labels, confusing controls, broken navigation, or inaccessible forms, and generating reusable scripts or overlays that make sites easier to understand and operate.
We encourage developers to propose ideas for using Webwright to move us closer to a more accessible and useful web for everyone.
- SWE-agent/mini-swe-agent β design inspiration for the minimal agent loop.
- Playwright β browser automation.
If you use Webwright in your research or build on it, please cite this repository:
@misc{webwright2026,
title = {Webwright: A terminal is all you need for web agents},
author = {Lu, Yadong and Xu, Lingrui and Huang, Chao and Awadallah, Ahmed},
year = {2026},
howpublished = {\url{https://github.com/microsoft/Webwright}},
note = {GitHub repository}
}