Skip to content

microsoft/Webwright

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

12 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Webwright: A Terminal Is All You Need for Web Agents

Webwright logo

A tiny, terminal-based web agent harness β€” readable end-to-end, SOTA on web agent benchmarks.

Python Playwright Backends Footprint

Webwright gives agents a terminal where it can launch multiple browswer sessions to inspect the page and complete a web task. It captures and inspects page screenshots/states only when needed. It drives a Playwright browser through a minimal agent loop with pluggable LLM backends. No multi-agent system, no graph engine, no plugin layer, no hidden orchestration β€” just a terminal, a browser, and a model.


πŸ’‘ Motivation: Beyond Step-by-Step Web Interaction in a Stateful Browser

Most web agents today treat the browser session itself as the workspace: at each step the model receives the current page state and predicts a single next operation β€” a click, a type, a DOM selector, or a short tool call. Whatever the format, the agent is locked into predicting one web action at a time inside a predefined interaction loop. That harness was useful when LLMs were weaker. As models get stronger at writing and debugging code, the same harness becomes a bottleneck.

Webwright takes a different stance: separate the agent from the browser, and treat the browser as something the agent can launch, inspect, and discard while developing a program. The persistent artifact is not the browser session β€” it's the code and logs in the local workspace.

  • 🧱 Robust, reusable interaction with web environments β€” instead of fragile pixel-level actions, a coding agent with a terminal queries elements, waits for conditions, and handles dynamic behaviors like lazy loading or re-rendering. The resulting scripts can be rerun, adapted, and shared across tasks rather than rediscovered from scratch.
  • ⚑ Efficient composition of complex workflows β€” multi-step interactions like selecting a date or filling a form become a compact program. Loops, functions, and abstractions let the agent generalize across similar tasks (e.g. different dates) without re-predicting the same low-level sequences. Fewer interaction rounds, faster execution, less error accumulation on long horizons.
  • πŸ§ͺ Workspace-as-state, not browser-as-state β€” the agent can write exploratory scripts, spawn fresh browser sessions, and decide for itself when to capture screenshots and inspect failures, much like a human engineer iterating on an RPA script.
  • πŸͺ„ Surprisingly effective despite being minimal β€” this stripped-down setup turns out to handle complex and especially long-horizon web tasks well (see Performance).

🌟 Why Webwright

Most web agent frameworks bury the actual agent loop under layers of abstractions. Webwright takes the opposite stance:

  • πŸͺΆ Lightweight by design β€” core agent loop in a single ~450-line file, Playwright environment in ~570 lines, CLI in ~150 lines.
  • 🧩 Pluggable model backends β€” OpenAI, Anthropic, and OpenRouter, each ~150–200 lines.
  • πŸ” Zero hidden frameworks β€” just httpx, pydantic, playwright, and typer.
  • πŸ” Flat prompt β†’ observe β†’ act loop β€” readable end-to-end, easy to debug, easy to fork.
  • πŸ§ͺ Run-artifact first β€” every run writes trajectories and screenshots to disk for inspection.

If you want a minimal, easy-to-debug starting point for browser-using agents instead of another heavyweight platform, this is it.


πŸ“Š Performance

State-of-the-art on two real-website benchmarks with a 100-step budget β€” see the blog post for full details.

  • πŸ† Online-Mind2Web (300 tasks): 86.7% with GPT-5.4 β€” highest among open-sourced harnesses in the AutoEval category. Claude Opus 4.7 reaches 84.7%, and is stronger on the hard split (80.5% vs. 76.6% for GPT-5.4 at N=100).
  • πŸš€ Odysseys (200 long-horizon tasks): 60.1% with GPT-5.4 (avg. 76.1 steps) β€” +15.6 points over the prior SOTA (Opus 4.6 at 44.5%, using xy-coordinate prediction and persistent browser) and +26.6 points over base GPT-5.4 (33.5% using xy-coordinate prediction and persistent browser).
  • 🧠 Code-as-action beats coordinate prediction: Webwright substantially outperforms a reproduced GPT-5.4 screenshot+xy-coordinate baseline across all difficulty splits.
  • 🧰 Small models + reusable tools: generated scripts can be packaged as parameterized CLI tools β€” even Qwen-3.5-9B completes tasks well on Online-Mind2Web sites with 5+ tools available.

οΏ½πŸ—ΊοΈ Project Map

webwright/
β”œβ”€β”€ pyproject.toml           # package: webwright
β”œβ”€β”€ src/webwright/
β”‚   β”œβ”€β”€ run/cli.py           # CLI entrypoint (`webwright`)
β”‚   β”œβ”€β”€ agents/default.py    # core agent loop
β”‚   β”œβ”€β”€ environments/        # Playwright browser workspace
β”‚   β”œβ”€β”€ tools/               # image_qa, self_reflection
β”‚   β”œβ”€β”€ models/              # openai_model, anthropic_model, base
β”‚   β”œβ”€β”€ config/              # base.yaml, model_openai.yaml, model_claude.yaml
β”‚   └── utils/
β”œβ”€β”€ tests/
└── outputs/                 # run artifacts (trajectories, screenshots)

πŸš€ Quick Start

Prerequisites

  • Python 3.10+
  • Chromium installed through Playwright
  • An API key for your chosen backend (OpenAI, Anthropic, or OpenRouter)

Install

pip install -e .
playwright install chromium

Run

Export credentials for the chosen backend (e.g. OPENAI_API_KEY or ANTHROPIC_API_KEY), then:

python -m webwright.run.cli \
    -c base.yaml -c model_openai.yaml \
    -t "Find the cheapest economy flight from SEA to JFK on 2026-05-15" \
    --start-url https://www.google.com/flights \
    --task-id demo_openai \
    -o outputs/default

🚩 Flags

Flag Description
-c Config file(s) from src/webwright/config/ (stackable).
-t Task instruction.
--start-url Initial page.
--task-id Output subfolder name.
-o Output directory.

β™Ώ Give Back to the Accessibility Community

Web-agent research is now benefiting from infrastructure originally designed for accessibility. Accessibility trees, ARIA metadata, and semantic page representations help assistive technologies expose web content to people with disabilities; today, the same signals also give LLM agents a machine-readable view of pages beyond pixels.

As builders, we have a responsibility to bring these advances back to the accessibility community. Webwright could support everyday assistive workflows such as:

  • πŸ“ forms and appointments
  • 🚌 transportation lookups
  • πŸ›’ service and price comparison

…while also acting as a repair layer for the web itself: inspecting pages, detecting missing labels, confusing controls, broken navigation, or inaccessible forms, and generating reusable scripts or overlays that make sites easier to understand and operate.

We encourage developers to propose ideas for using Webwright to move us closer to a more accessible and useful web for everyone.


Credits

Citation

If you use Webwright in your research or build on it, please cite this repository:

@misc{webwright2026,
  title        = {Webwright: A terminal is all you need for web agents},
  author       = {Lu, Yadong and Xu, Lingrui and Huang, Chao and Awadallah, Ahmed},
  year         = {2026},
  howpublished = {\url{https://github.com/microsoft/Webwright}},
  note         = {GitHub repository}
}

About

A simple web agent harness that achieves SOTA results on long horizon web tasks.

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors