microsoft/Webwright

Webwright

Turn Your Coding Models into State-of-the-Art Browser Agents

Webwright gives an LLM a terminal from which it can launch multiple browser sessions to inspect pages and complete a web task. It captures and inspects page screenshots and state only when needed. It requires each web task to be completed end to end within a rerunnable Python script, so your web agent's browsing history is a single code file. No multi-agent system, no graph engine, no plugin layer, no hidden orchestration: just a terminal, a browser, and a model.

Already have a favorite agent and wondering how to make Claude Code, Codex, Hermes, or OpenClaw more capable at browser tasks? Consider adding the Webwright plugin/skill!


💡 Motivation: Beyond Step-by-Step Web Interaction in a Stateful Browser

Most web agents today treat the browser session itself as the workspace: at each step the model receives the current page state and predicts a single next operation, such as a click, a type, a DOM selector, or a short tool call. Whatever the format, the agent is locked into predicting one web action at a time inside a predefined interaction loop. That harness was useful when LLMs were weaker. As models get stronger at writing and debugging code, the same harness becomes a bottleneck.

Webwright takes a different stance: separate the agent from the browser, and treat the browser as something the agent can launch, inspect, and discard while developing a program. The persistent artifact is not the browser session; it is the code and logs in the local workspace.

  • 🧱 Robust, reusable interaction with web environments: instead of fragile pixel-level actions, a coding agent with a terminal queries elements, waits for conditions, and handles dynamic behaviors like lazy loading or re-rendering. The resulting scripts can be rerun, adapted, and shared across tasks rather than rediscovered from scratch.
  • ⚡ Efficient composition of complex workflows: multi-step interactions like selecting a date or filling a form become a compact program. Loops, functions, and abstractions let the agent generalize across similar tasks (e.g. different dates) without re-predicting the same low-level sequences. The result is fewer interaction rounds, faster execution, and less error accumulation over long horizons.
  • 🧪 Workspace-as-state, not browser-as-state: the agent can write exploratory scripts, spawn fresh browser sessions, and decide for itself when to capture screenshots and inspect failures, much like a human engineer iterating on an RPA script.
  • 🪄 Surprisingly effective despite being minimal: this stripped-down setup turns out to handle complex and especially long-horizon web tasks well (see Performance).
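To make the first two points concrete, here is a minimal sketch of the kind of script a coding agent might write. It is not taken from Webwright itself: the helper names (`trip_dates`, `collect_results`), the URL, and the `li.result` selector are all hypothetical, and only standard Playwright sync-API calls are used. The sketch waits on a condition instead of sleeping blindly, scrolls to trigger lazy loading, and reuses one function across several date ranges.

```python
from datetime import date, timedelta


def trip_dates(first_departure: date, nights: int, count: int, step_days: int = 7):
    """Yield (depart, return) pairs so one script can cover many similar tasks."""
    for i in range(count):
        depart = first_departure + timedelta(days=i * step_days)
        yield depart, depart + timedelta(days=nights)


def collect_results(page, url: str, item_selector: str,
                    min_items: int = 10, max_scrolls: int = 20) -> list:
    """Load a page and scroll until enough lazily loaded items are present."""
    page.goto(url, wait_until="domcontentloaded")
    items = page.locator(item_selector)
    # Wait for a condition (first item visible) rather than a fixed sleep.
    items.first.wait_for(state="visible", timeout=15_000)
    for _ in range(max_scrolls):
        if items.count() >= min_items:
            break
        page.mouse.wheel(0, 2000)    # scroll down to trigger lazy loading
        page.wait_for_timeout(500)   # short pause so new items can render
    return items.all_inner_texts()


def main() -> None:
    # Imported lazily so the helpers above can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        for depart, ret in trip_dates(date(2026, 8, 15), nights=5, count=3):
            page = browser.new_page()  # fresh, disposable page per query
            # Hypothetical URL and selector, for illustration only.
            url = f"https://example.com/search?depart={depart}&return={ret}"
            texts = collect_results(page, url, "li.result")
            print(depart, ret, len(texts))
            page.close()
        browser.close()
```

Calling `main()` runs the three queries in sequence. Changing the date range or the number of trips is a one-line edit, not a new sequence of predicted clicks.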

🌟 Why Webwright

Most web agent frameworks bury the actual agent loop under layers of abstractions. Webwright takes the opposite stance:

  • 🪶 Lightweight by design: core agent loop in a single ~450-line file, Playwright environment in ~570 lines, CLI in ~150 lines.
  • 🧩 Pluggable model backends: OpenAI, Anthropic, and OpenRouter, each ~150 to 200 lines.
  • 🔍 Zero hidden frameworks: just httpx, pydantic, playwright, and typer.
  • 🔁 Flat prompt → observe → execute-script loop: readable end to end, easy to debug, easy to fork.
  • 🧪 Run-artifact first: every run writes trajectories and screenshots to disk for inspection.

If you want a minimal, easy-to-debug starting point for browser-using agents instead of another heavyweight platform, this is it.
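The flat loop and the run-artifact-first idea can be sketched in a few lines. This is an illustrative outline only, not the code in agents/default.py: `propose` is a hypothetical stand-in for a model backend that returns a Python script given the task and the last observation, and the `step_<n>.py` / `step_<n>.log` file names are assumptions.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, Optional


def run_script(path: Path, timeout: int = 60) -> tuple:
    """Execute a script in a subprocess and return (exit code, combined output)."""
    proc = subprocess.run(
        [sys.executable, str(path)],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode, proc.stdout + proc.stderr


def agent_loop(task: str, propose: Callable[[str, str], str],
               workdir: Path, max_steps: int = 5) -> Optional[Path]:
    """Write code, execute it, observe the output, and repair until it succeeds."""
    workdir.mkdir(parents=True, exist_ok=True)
    observation = ""
    for step in range(max_steps):
        code = propose(task, observation)        # model writes or repairs a script
        script = workdir / f"step_{step}.py"
        script.write_text(code)
        returncode, output = run_script(script)
        # Every attempt leaves its script and log on disk for later inspection.
        (workdir / f"step_{step}.log").write_text(output)
        if returncode == 0:
            return script                        # the rerunnable artifact
        observation = output                     # feed the failure back to the model
    return None
```

The state that survives between steps is the workspace directory, not a browser session: each script is free to launch and discard its own browser.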


🆚 How Webwright Differs From Other Browser-Agent Repos

How they differ at the architectural level:

| | Stagehand (Browserbase) | agent-browser (Vercel) | browser-use | Webwright |
|---|---|---|---|---|
| Paradigm | Hybrid: code + NL primitives (act / extract / agent) | CLI tool that another agent (Claude Code, Codex, etc.) calls | Autonomous LLM agent loop over DOM/AX snapshots | Coding agent with a terminal; browser is just an environment it spawns |
| Action space | Playwright code, or NL → LLM-translated Playwright | Discrete subcommands (open, click @e2, snapshot, eval) | Indexed click/type actions selected by the LLM | Free-form Python (writes Playwright scripts itself) |
| What is "state"? | The browser session | The browser session (held by daemon across CLI calls) | The browser session | The local workspace: code, screenshots, logs. Browser is disposable. |
| Loop shape | Imperative; agent() does multi-step when needed | One CLI invocation per micro-step | observe → predict next action → execute → repeat | write code → execute → inspect screenshots → repair (code-as-action) |

🎥 Demo

webwright_demo.mp4

📊 Performance

State-of-the-art on two real-website benchmarks with a 100-step budget; see the blog post for full details.

  • 🏆 Online-Mind2Web (300 tasks): 86.7% with GPT-5.4, the highest among open-source harnesses in the AutoEval category. Claude Opus 4.7 reaches 84.7% and is stronger on the hard split (80.5% vs. 76.6% for GPT-5.4 at N=100).
  • 🚀 Odysseys (200 long-horizon tasks): 60.1% with GPT-5.4 (avg. 76.1 steps), which is +15.6 points over the prior SOTA (Opus 4.6 at 44.5%, using a vision-based approach and a persistent browser) and +26.6 points over base GPT-5.4 (33.5%, using xy-coordinate prediction and a persistent browser).
  • 🧠 Code-as-action beats coordinate prediction: Webwright substantially outperforms a reproduced GPT-5.4 screenshot+xy-coordinate baseline across all difficulty splits.
  • 🧰 Small models + reusable tools: generated scripts can be packaged as parameterized CLI tools; even Qwen-3.5-9B completes tasks well on Online-Mind2Web sites with 5+ tools available.

[Figures: Odysseys long-horizon eval @ 100 steps; Online-Mind2Web AutoEval @ 100 steps]


πŸ—ΊοΈ Project Map

webwright/
β”œβ”€β”€ pyproject.toml           # package: webwright
β”œβ”€β”€ src/webwright/
β”‚   β”œβ”€β”€ run/cli.py           # CLI entrypoint (`webwright`)
β”‚   β”œβ”€β”€ agents/default.py    # core agent loop
β”‚   β”œβ”€β”€ environments/        # Playwright browser workspace
β”‚   β”œβ”€β”€ tools/               # image_qa, self_reflection
β”‚   β”œβ”€β”€ models/              # openai_model, anthropic_model, base
β”‚   β”œβ”€β”€ config/              # base.yaml, model_openai.yaml, model_claude.yaml
β”‚   └── utils/
β”œβ”€β”€ tests/
└── outputs/                 # run artifacts (trajectories, screenshots)

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • Chromium installed through Playwright
  • An API key for your chosen backend (OpenAI, Anthropic, or OpenRouter)

Install

pip install -e .
playwright install chromium

Run

Export credentials for the chosen backend (e.g. OPENAI_API_KEY or ANTHROPIC_API_KEY), then:

python -m webwright.run.cli \
    -c base.yaml -c model_openai.yaml \
    -t "Search for flights from SEA to JFK on 2026-08-15 to 2026-08-20" \
    --start-url https://www.google.com/flights \
    --task-id demo_openai \
    -o outputs/default

🚩 Flags

| Flag | Description |
|---|---|
| -c | Config file(s) from src/webwright/config/ (stackable). |
| -t | Task instruction. |
| --start-url | Initial page. |
| --task-id | Output subfolder name. |
| -o | Output directory. |
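If you want to launch runs from another Python program, the flags above map directly onto an argument list. The helper below is a small sketch, not part of Webwright: `build_command` is a hypothetical name, and it only assembles the documented flags for `python -m webwright.run.cli` without adding any others.

```python
import shlex
import sys


def build_command(task: str, start_url: str, task_id: str,
                  output_dir: str = "outputs/default",
                  configs=("base.yaml", "model_openai.yaml")) -> list:
    """Assemble the documented CLI invocation as an argument list."""
    cmd = [sys.executable, "-m", "webwright.run.cli"]
    for cfg in configs:
        cmd += ["-c", cfg]          # -c is stackable, one flag per config file
    cmd += [
        "-t", task,
        "--start-url", start_url,
        "--task-id", task_id,
        "-o", output_dir,
    ]
    return cmd


cmd = build_command(
    task="Search for flights from SEA to JFK on 2026-08-15 to 2026-08-20",
    start_url="https://www.google.com/flights",
    task_id="demo_openai",
)
print(shlex.join(cmd))  # shell-safe rendering of the command, for logging
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would start the run, provided the package is installed and the backend credentials are exported.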

🔌 Use as a Plugin

Webwright ships plugin manifests for both Claude Code (.claude-plugin/plugin.json) and OpenAI Codex (.codex-plugin/plugin.json), with the shared skill at skills/webwright/ and slash commands at skills/webwright/commands/. The host agent drives the Webwright loop natively, with no extra LLM API key or cost beyond your host subscription. Hosts that read PNG screenshots natively skip the OpenAI-backed image_qa / self_reflection tools.

Common runtime deps (install once after either path):

pip install -e .
playwright install chromium

Claude Code

Install

Install through the bundled marketplace inside Claude Code:

# 1. Add this repo as a Claude Code plugin marketplace
/plugin marketplace add microsoft/Webwright

# 2. Install the plugin from that marketplace
/plugin install webwright@webwright

Prefer a local checkout? Point the marketplace command at the cloned repo instead:

/plugin marketplace add /absolute/path/to/Webwright
/plugin install webwright@webwright

Use

Start a new Claude Code session after installing; plugins are loaded at session start and won't appear until you restart.

You can either ask Claude Code in plain English (the skill auto-activates from its description), or use one of the slash commands:

/webwright:run search Google Flights for flights from SEA to JFK on 2026-08-15 to 2026-08-20
/webwright:craft search a ticket on Google Flights from LAX to SFO depart June 7 return June 14

  • /webwright:run (or any plain prompt) produces a one-shot final_script.py for the literal task values.
  • /webwright:craft produces a reusable CLI tool: final_script.py becomes one parameterized function with a Google-style Args: docstring and an argparse wrapper whose flags default to the concrete task values, so you can rerun it later with different arguments, e.g. python final_script.py --origin JFK --destination LAX --depart-date 2026-07-01.

In both modes Claude Code scaffolds a workspace with plan.md, runs instrumented Playwright scripts under final_runs/run_<id>/, and visually self-verifies each critical point against the saved screenshots.
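For orientation, a script in the /webwright:craft style might have roughly the following shape. This is a hand-written sketch, not actual Webwright output: the function name `search_flights` and the `--return-date` flag are assumptions, the defaults reuse the SEA to JFK example values from this README, and the browser automation is replaced by a stub that only validates and echoes the query.

```python
import argparse
from datetime import date


def search_flights(origin: str, destination: str,
                   depart_date: str, return_date: str) -> dict:
    """Search round-trip flights (browser steps omitted in this sketch).

    Args:
        origin: IATA code of the departure airport, e.g. "SEA".
        destination: IATA code of the arrival airport, e.g. "JFK".
        depart_date: Outbound date in YYYY-MM-DD format.
        return_date: Return date in YYYY-MM-DD format.

    Returns:
        The normalized query. A generated script would return scraped results.
    """
    depart = date.fromisoformat(depart_date)   # raises ValueError on bad input
    ret = date.fromisoformat(return_date)
    if ret < depart:
        raise ValueError("return_date must not be before depart_date")
    # A generated script would drive Playwright here; this stub echoes the query.
    return {
        "origin": origin.upper(),
        "destination": destination.upper(),
        "depart_date": depart.isoformat(),
        "return_date": ret.isoformat(),
    }


def build_parser() -> argparse.ArgumentParser:
    """Argparse wrapper whose flags default to the concrete task values."""
    parser = argparse.ArgumentParser(description="Search round-trip flights.")
    parser.add_argument("--origin", default="SEA")
    parser.add_argument("--destination", default="JFK")
    parser.add_argument("--depart-date", default="2026-08-15")
    parser.add_argument("--return-date", default="2026-08-20")  # hypothetical flag
    return parser


def main(argv=None) -> dict:
    args = build_parser().parse_args(argv)
    result = search_flights(args.origin, args.destination,
                            args.depart_date, args.return_date)
    print(result)
    return result
```

With no arguments, `main([])` reproduces the original task. Passing `["--origin", "JFK", "--destination", "LAX", "--depart-date", "2026-07-01"]` reruns it with new values while the remaining flag keeps its default.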

OpenAI Codex

Install

Codex reads Claude-style marketplaces, so the same repo works as a Codex plugin marketplace. From the Codex CLI:

# 1. Add this repo as a Codex plugin marketplace
codex plugin marketplace add microsoft/Webwright

# 2. Open the plugin browser and install Webwright
codex
/plugins

Prefer a local checkout?

codex plugin marketplace add /absolute/path/to/Webwright

Then restart Codex so the new marketplace and plugin are picked up.

Use

In a new Codex thread, either ask in plain English (the skill auto-activates from its description) or invoke the bundled skill explicitly with @webwright:

@webwright search Google Flights for flights from SEA to JFK on 2026-08-15 to 2026-08-20

Codex scaffolds a workspace with plan.md, runs instrumented Playwright scripts under final_runs/run_<id>/, and visually self-verifies each critical point against the saved screenshots.

To turn the plugin off without uninstalling, set its entry in ~/.codex/config.toml to enabled = false and restart Codex.

🦞 OpenClaw

Install

Install directly from a local checkout (path, archive, npm spec, git repo, or clawhub: spec all work):

openclaw plugins install /absolute/path/to/Webwright
openclaw gateway restart   # reload so the plugin and skill are picked up

Verify:

openclaw plugins list | grep webwright
openclaw skills  list | grep webwright   # should show "✓ ready"

Use

The webwright skill is now available to any OpenClaw agent surface (CLI, Telegram, etc.). Invoke it by asking the agent in natural language, or via the slash commands shipped under skills/webwright/commands/, e.g. /webwright run <task>.

To uninstall: openclaw plugins uninstall webwright.

Hermes Agent

Install

Hermes Agent is a skills-compatible client, so the same skills/webwright/ folder loads as a Hermes skill. Symlink it into your Hermes user-skills directory:

mkdir -p ~/.hermes/skills
ln -sfn /absolute/path/to/Webwright/skills/webwright ~/.hermes/skills/webwright

No Hermes-specific manifest is needed; only SKILL.md is loaded.
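If the skill does not show up, a quick check of the symlink can rule out path mistakes. The helper below is a hypothetical convenience, not something shipped with Webwright or Hermes; it assumes SKILL.md sits at the top level of skills/webwright/.

```python
from pathlib import Path


def check_hermes_skill(skills_dir: Path, name: str = "webwright") -> list:
    """Return a list of problems with the linked skill; empty means it looks fine."""
    link = skills_dir / name
    if link.is_symlink() and not link.exists():
        return [f"{link} is a broken symlink"]
    if not link.is_dir():
        return [f"{link} does not exist or is not a directory"]
    if not (link / "SKILL.md").is_file():
        return [f"{link} has no SKILL.md"]
    return []
```

For the default location, call `check_hermes_skill(Path.home() / ".hermes" / "skills")` and print whatever it returns.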

Use

Start Hermes (hermes) and ask it to drive a web task in natural language; the skill auto-activates from its description. You can also invoke it explicitly with /webwright.

Note: the named subcommands shipped under skills/webwright/commands/ (/webwright:run, /webwright:craft) are a Claude Code / Codex convention and are inert in Hermes; the skill itself still works end-to-end.


Credits

Citation

If you use Webwright in your research or build on it, please cite this repository:

@misc{webwright2026,
  title        = {Webwright: A terminal is all you need for web agents},
  author       = {Lu, Yadong and Xu, Lingrui and Huang, Chao and Awadallah, Ahmed},
  year         = {2026},
  howpublished = {\url{https://github.com/microsoft/Webwright}},
  note         = {GitHub repository}
}

About

A simple SWE-style browser agent framework that achieves SOTA results on long-horizon web tasks.
