Webpilot is a browser tool.
It launches a real Chromium-based browser with a local extension runtime, exposes a WebSocket protocol, and lets a user, script, or LLM drive that browser through the same command surface.
The primary interface is the live DOM — not screenshots. discover, html, and q give you real page structure, real selectors, and real handles. Screenshots exist as a fallback for when layout or visual rendering is the actual question. For everything else, read the DOM.
What Webpilot does:
- starts and controls a real browser — no CDP, no detectable debugging port
- exposes the live DOM directly: navigation, element discovery, querying, interaction, cookies
- provides configurable cursor, click, typing, and scroll behavior
- works from the CLI, raw WebSocket, Node, or an MCP adapter
What Webpilot does not do:
- decide what to do next
- ship a tuned human profile
- ship site strategy, retries, or route doctrine
The user or LLM decides the workflow. Webpilot provides the browser runtime and commands.
npm install -g h17-webpilot
Create ~/h17-webpilot/config.js or human-browser.config.js in your project:
module.exports = {
  browser: "/Applications/Chromium.app/Contents/MacOS/Chromium",
  human: {
    calibrated: false,
    profileName: "public-default",
    cursor: {
      overshootRatio: 0,
    },
  },
};
The example file is human-browser.config.example.js.
npx webpilot start
npx webpilot start -d
This launches the browser and starts the local WebSocket bridge on ws://localhost:7331.
Use npx webpilot start -d if you want an append-only session log.
- default log path: ~/h17-webpilot/webpilot.log
- override path in config with framework.debug.sessionLogPath
npx webpilot -c 'go example.com'
npx webpilot -c 'discover'
npx webpilot -c 'click h1'
npx webpilot -c 'wait h1'
npx webpilot -c 'html'
npx webpilot -c 'cookies load ./cookies.json'
Use the same loop every time:
- inspect
- act
- verify
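The loop above can be sketched against the Node API. This is a minimal sketch, not a prescribed workflow: `inspectActVerify` is a hypothetical helper, and `page` is any object exposing the `query`, `click`, and `waitFor` methods of the Node API. Two behaviors are assumed rather than documented: that `query` resolves to a falsy value when nothing matches, and that `waitFor` rejects when the selector never appears.

```javascript
// Hypothetical helper: one inspect -> act -> verify pass.
// Assumes page.query/page.click/page.waitFor are async methods.
async function inspectActVerify(page, target, expectAfter) {
  // Inspect: confirm the target exists before touching it.
  // Assumption: query() resolves to a falsy value when nothing matches.
  const found = await page.query(target);
  if (!found) {
    return { ok: false, step: 'inspect', reason: `no match for ${target}` };
  }

  // Act: perform the interaction.
  await page.click(target);

  // Verify: wait for the selector that proves the action had an effect.
  // Assumption: waitFor() rejects if the selector never appears.
  try {
    await page.waitFor(expectAfter);
  } catch (err) {
    return { ok: false, step: 'verify', reason: String(err.message || err) };
  }
  return { ok: true, step: 'verify' };
}
```

A caller would pass the page returned by `startWithPage()`, for example `await inspectActVerify(page, 'h1', 'body')`, and decide for itself what to do when `ok` is false.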
npx webpilot
npx webpilot -c 'go example.com'
npx webpilot start
npx webpilot start -d
npx webpilot stop
Core commands:
- go <url>: navigate
- discover: list interactive elements with handles
- q <selector> / query <selector>: query elements
- wait <selector>: wait for a selector
- click <selector|handleId>: safe click
- type [selector] <text>: type with the configured public profile
- clear <selector>: clear an input
- key <name> / press <name>: send a key
- sd [px] [selector] / su [px] [selector]: scroll
- html: read page HTML
- ss: save a screenshot; use when layout or visual rendering is the question, not DOM structure
- cookies: dump cookies
- cookies load <file>: load cookies from a JSON array file
- frames: list frames
- npx webpilot start -d: start detached and append WS commands/events to ~/h17-webpilot/webpilot.log unless config overrides the path
Raw mode stays available:
npx webpilot -c 'human.click {"selector": "button[type=submit]"}'
npx webpilot -c '{"action": "dom.getHTML", "params": {}}'
Connect to ws://localhost:7331 and send JSON:
{ "id": "1", "action": "tabs.navigate", "params": { "url": "https://example.com" } }
Capability groups:
- tabs
- dom
- human
- cookies
- events
- framework
Full reference: protocol/PROTOCOL.md
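For scripts that talk to the socket directly, a small message builder keeps ids unique. The sketch below is illustrative only: `createRequestFactory` and `buildRequest` are hypothetical names, and the only thing taken from the protocol is the `{ id, action, params }` shape of the example message. Reply format and id correlation are left to protocol/PROTOCOL.md.

```javascript
// Hypothetical helper for building protocol messages.
// Only the { id, action, params } shape comes from the documented example.
function createRequestFactory() {
  let nextId = 0;
  return function buildRequest(action, params = {}) {
    nextId += 1;
    // String ids mirror the documented example ("id": "1").
    return JSON.stringify({ id: String(nextId), action, params });
  };
}

const buildRequest = createRequestFactory();
const message = buildRequest('tabs.navigate', { url: 'https://example.com' });
console.log(message);
// prints {"id":"1","action":"tabs.navigate","params":{"url":"https://example.com"}}
```

The resulting string can be handed to the `send` method of any WebSocket client connected to ws://localhost:7331.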
The Node API is a wrapper over the same WebSocket protocol.
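const { startWithPage } = require('h17-webpilot');
// Top-level await is unavailable in CommonJS, so wrap the calls in an async function.
async function main() {
  const { page } = await startWithPage();
  await page.navigate('https://example.com');
  await page.query('h1');
  await page.click('h1');
  await page.waitFor('body');
}
main().catch(console.error);
Useful methods: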
- navigate(url) / legacy goto(url)
- query(selector) / legacy $(selector)
- queryAll(selector) / legacy $$(selector)
- waitFor(selector) / legacy waitForSelector(selector)
- read() / legacy content()
- click(...) / legacy humanClick(...)
- type(...) / legacy humanType(...)
- scroll(...) / legacy humanScroll(...)
- clearInput(...) / legacy humanClearInput(...)
- pressKey(key)
- configure(config) / legacy setConfig(config)
Public config is split into:
- framework: runtime behavior, debug toggles, handle retention
- human: cursor, click, typing, scroll, and avoid rules
The public package exposes a lot of knobs on purpose. The user decides how much to tune. The package does not ship a strong profile.
Example:
module.exports = {
  framework: {
    debug: {
      cursor: true,
      sessionLogPath: '~/h17-webpilot/webpilot.log',
    },
  },
  human: {
    calibrated: false,
    profileName: 'public-default',
    cursor: {
      spreadRatio: 0.16,
      jitterRatio: 0,
      stutterChance: 0,
      driftThresholdPx: 0,
      overshootRatio: 0,
    },
    click: {
      thinkDelayMin: 35,
      thinkDelayMax: 90,
      maxShiftPx: 50,
    },
    type: {
      baseDelayMin: 8,
      baseDelayMax: 20,
      variance: 4,
      pauseChance: 0,
      pauseMin: 0,
      pauseMax: 0,
    },
  },
};
Auth/session bootstrap example:
module.exports = {
  browser: "/Applications/Chromium.app/Contents/MacOS/Chromium",
  boot: {
    cookiesPath: "./cookies.json",
    commands: [
      "go https://hugopalma.work",
      "cookies load ./cookies.json",
      { action: "framework.getConfig", params: {} },
    ],
  },
};
boot.cookiesPath loads a cookie jar before commands run.
boot.commands accepts:
- command strings like the CLI shorthands
- cookies load <file> entries
- raw objects: { action, params, tabId? }
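The accepted entry shapes can be illustrated with a small normalizer. This is a sketch of the two shapes only, not a description of how Webpilot parses `boot.commands` internally: `normalizeBootCommand` and the `kind` field are hypothetical, and defaulting `params` to an empty object is an assumption.

```javascript
// Hypothetical normalizer for boot.commands entries.
// It distinguishes CLI-style strings from raw { action, params, tabId? } objects.
function normalizeBootCommand(entry) {
  if (typeof entry === 'string') {
    const text = entry.trim();
    if (text === '') throw new TypeError('empty command string');
    // Shorthands such as "go <url>" or "cookies load <file>" stay as text.
    return { kind: 'shorthand', text };
  }
  if (entry && typeof entry === 'object' && typeof entry.action === 'string') {
    // Assumption: a missing params object is treated as empty.
    const raw = { kind: 'raw', action: entry.action, params: entry.params ?? {} };
    if (entry.tabId !== undefined) raw.tabId = entry.tabId;
    return raw;
  }
  throw new TypeError('boot command must be a string or { action, params, tabId? }');
}

const commands = [
  'go https://hugopalma.work',
  'cookies load ./cookies.json',
  { action: 'framework.getConfig', params: {} },
];
console.log(commands.map(normalizeBootCommand));
```

Validating entries this way before writing them to a config file catches malformed commands early, before the browser is launched.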
These defaults do not represent a human profile:
- typing is very fast
- overshoot is off
- jitter is off
- drift is off
They are there to show what is configurable. The package does not ship your final values.
If no config file exists, npx webpilot start will:
- detect installed browsers
- ask the user to choose one when needed
- generate ~/h17-webpilot/config.js
The generated config uses the same public defaults shown above.
If you start with npx webpilot start -d, session logging is enabled even if the config does not set it.
The path comes from framework.debug.sessionLogPath when present, otherwise it falls back to ~/h17-webpilot/webpilot.log.
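The fallback rule can be written out as a short function. This is a sketch under stated assumptions: `resolveSessionLogPath` is a hypothetical name, and expanding a leading `~` to the home directory is an assumption about how the path is interpreted, not confirmed behavior.

```javascript
const os = require('node:os');
const path = require('node:path');

// Default taken from the text above.
const DEFAULT_LOG_PATH = '~/h17-webpilot/webpilot.log';

// Hypothetical helper: prefer framework.debug.sessionLogPath, else the default.
function resolveSessionLogPath(config = {}) {
  const configured = config.framework?.debug?.sessionLogPath;
  const chosen =
    typeof configured === 'string' && configured !== '' ? configured : DEFAULT_LOG_PATH;

  // Assumption: a leading "~" refers to the current user's home directory.
  if (chosen === '~' || chosen.startsWith('~/')) {
    return path.join(os.homedir(), chosen.slice(1));
  }
  return chosen;
}

console.log(resolveSessionLogPath({}));
console.log(resolveSessionLogPath({ framework: { debug: { sessionLogPath: '/tmp/wp.log' } } }));
```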
Tested browsers:
- Chromium
- Helium
- Google Chrome
Notes:
- Defaults are for demonstration and development, not for behavior parity.
- The browser tool does not decide workflows.
- The user or LLM still has to choose selectors, waits, retries, and verification steps.
- dom.evaluate may hit CSP restrictions on some sites. DOM reading and interaction still work through the isolated content-script path.
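Because retries are left to the caller, a script typically wraps its own steps. The helper below is one possible sketch, not part of Webpilot: `withRetries`, its `attempts` and `delayMs` options, and the fixed delay between attempts are all illustrative choices.

```javascript
// Hypothetical retry helper; Webpilot itself does not retry anything.
async function withRetries(action, { attempts = 3, delayMs = 250 } = {}) {
  if (!Number.isInteger(attempts) || attempts < 1) {
    throw new RangeError('attempts must be a positive integer');
  }
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      // Pass the attempt number so the action can log or adapt.
      return await action(attempt);
    } catch (err) {
      lastError = err;
      // Pause before the next attempt, but not after the final one.
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the most recent error.
  throw lastError;
}
```

With the Node API this might be used as `await withRetries(() => page.waitFor('h1'))`, assuming `waitFor` rejects when the selector does not appear.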
SKILL.md explains how an LLM should use Webpilot as a browser tool.
Apache 2.0