26 changes: 26 additions & 0 deletions .changeset/webcontainer-runner.md
@@ -0,0 +1,26 @@
---
'@gemstack/ai-autopilot': minor
---

Add WebContainerRunner, the in-browser sandboxed runner

`WebContainerRunner` is the third real `Runner` adapter (after `LocalRunner` and
`DockerRunner`), wrapping StackBlitz's `@webcontainer/api`. It runs untrusted,
agent-authored code entirely inside a browser tab: an in-browser Node runtime, an
isolated filesystem, and an instant `preview()` URL for a dev server, with nothing
touching the host. This is the "sit on harnesses, don't compete" bet for the
browser: the same `Runner` interface, now backed by WebContainer.

It is browser-only by construction (WebContainer needs `SharedArrayBuffer`, so the
hosting page must be cross-origin isolated), so `@webcontainer/api` is an optional
peer dependency and is imported lazily: loading `@gemstack/ai-autopilot` in Node
never pulls it in. Guard with the new `webContainerAvailable()` and reach for
`DockerRunner` on the server.
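
The guard-then-fall-back pattern above might look roughly like the following. This is a minimal sketch, not the adapter's implementation: the body of `webContainerAvailable()` is not shown in this changeset, so `canBootWebContainer`, `pickRunnerKind`, and `IsolationScope` are hypothetical names, and the check assumes the standard `crossOriginIsolated` global plus the presence of `SharedArrayBuffer` is what matters.

```typescript
// Hypothetical stand-ins for illustration; the real guard is `webContainerAvailable()`.
type RunnerKind = 'webcontainer' | 'docker'

interface IsolationScope {
  crossOriginIsolated?: boolean
}

/** True only when the scope reports cross-origin isolation and SharedArrayBuffer exists. */
function canBootWebContainer(
  scope: IsolationScope = globalThis as unknown as IsolationScope,
): boolean {
  return scope.crossOriginIsolated === true && 'SharedArrayBuffer' in globalThis
}

/** Choose the sandboxed runner for the current environment (assumed policy). */
function pickRunnerKind(scope?: IsolationScope): RunnerKind {
  return canBootWebContainer(scope) ? 'webcontainer' : 'docker'
}

console.log(pickRunnerKind())
```

In an application the two branches would construct `WebContainerRunner` or `DockerRunner` respectively; returning a kind string here keeps the sketch free of either dependency.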

Because a WebContainer cannot boot in Node, boot-and-serve is proven by a headless
Chromium harness under `packages/ai-autopilot/harness/webcontainer/` that drives
the compiled adapter through boot, fs, exec (exit codes, cwd/env, timeout kill),
start, a real preview URL, an in-container serve check, dispose, and reboot. The
Node-only guards are covered by the default test suite.

Part of #109 (the Flue adapter stays gated on a live Flue environment).
39 changes: 39 additions & 0 deletions packages/ai-autopilot/harness/webcontainer/README.md
@@ -0,0 +1,39 @@
# WebContainer boot-and-serve harness

`WebContainerRunner` wraps [`@webcontainer/api`](https://webcontainers.io), which
runs **only inside a cross-origin-isolated browser** — it cannot boot in plain
Node. So, unlike `DockerRunner` (verified by a normal `node --test` suite against
a local daemon), the WebContainer adapter is proven by driving a **real headless
Chromium** against the compiled adapter. This directory is that proof.

## What it does

1. `server.mjs` serves a tiny page on `127.0.0.1` with the cross-origin
   isolation headers WebContainer needs (`Cross-Origin-Opener-Policy: same-origin`,
   `Cross-Origin-Embedder-Policy: require-corp`), plus the compiled adapter
   (`/dist`) and `@webcontainer/api` (`/api`) same-origin.
2. `index.html` imports the **real** `dist/runner/webcontainer.js` and drives it
   exactly as an app would: boot, fs round-trip, `exec` (exit codes, cwd/env,
   timeout kill), `start` a server, resolve its `preview()` URL, prove it serves
   by fetching it from inside the container, then `dispose` and reboot.
3. `drive.mjs` launches headless Chromium (via `playwright-core`), loads the
   page, and asserts every check passed.
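
The pass/fail contract between the page and the driver in step 3 can be sketched as below. The result shape mirrors what the page publishes on `window.__RESULT__` (`done`, `ok`, `checks`, `error`); the `summarize` helper and the interface names are hypothetical and only illustrate how a missing `checks` array and the exit code might be handled.

```typescript
// Shape of the object the harness page publishes when it finishes.
interface Check {
  name: string
  ok: boolean
  detail?: string
}

interface HarnessResult {
  done: boolean
  ok?: boolean
  checks?: Check[]
  error?: string
}

interface Summary {
  passed: number
  total: number
  exitCode: 0 | 1
}

/** Hypothetical helper: count passing checks and map the overall verdict to an exit code. */
function summarize(result: HarnessResult): Summary {
  // A timed-out or crashed run may carry no checks at all.
  const checks = result.checks ?? []
  const passed = checks.filter((c) => c.ok).length
  return { passed, total: checks.length, exitCode: result.ok ? 0 : 1 }
}

const demo = summarize({
  done: true,
  ok: false,
  checks: [
    { name: 'boot', ok: true },
    { name: 'preview', ok: false, detail: 'no URL' },
  ],
})
console.log(`${demo.passed}/${demo.total} checks passed, exit ${demo.exitCode}`)
```

Keying the exit code on `ok` rather than on the counts means a run that threw before recording any check still fails.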

## Run it

```bash
pnpm build # compile the adapter into dist/
node harness/webcontainer/drive.mjs # boot-and-serve proof; exits non-zero on any failed check
```

Requirements:

- **A Chromium browser.** Uses your system Google Chrome by default; falls back
  to a Playwright Chromium (`npx playwright install chromium`).
- **Network.** WebContainer downloads its runtime from StackBlitz on first boot,
  so this is not offline-hermetic (another reason it is opt-in, not part of
  `pnpm test`).

The Node-only guards (`webContainerAvailable()` is false outside a browser,
`boot()` throws a clear error in Node) are covered by `src/runner/webcontainer.test.ts`
and do run in the default suite.
51 changes: 51 additions & 0 deletions packages/ai-autopilot/harness/webcontainer/drive.mjs
@@ -0,0 +1,51 @@
// Boot-and-serve proof for WebContainerRunner. WebContainer runs only in a
// cross-origin-isolated browser, so this drives a real headless Chromium against
// the compiled adapter and asserts it boots, execs, serves, and tears down.
//
// Prereqs: build the package (`pnpm build`) and have Google Chrome installed, or
// a Playwright Chromium (`npx playwright install chromium`). Then:
// node harness/webcontainer/drive.mjs
//
// Needs network: WebContainer downloads its runtime from StackBlitz on first boot.
import { chromium } from 'playwright-core'
import { startServer } from './server.mjs'

async function launch() {
  // Prefer a system Chrome (no browser download); fall back to a Playwright Chromium.
  try {
    return await chromium.launch({ headless: true, channel: 'chrome' })
  } catch {
    try {
      return await chromium.launch({ headless: true })
    } catch (err) {
      console.error(
        'No usable Chromium. Install Google Chrome, or run `npx playwright install chromium`.\n' + err,
      )
      process.exit(2)
    }
  }
}

const { server, port } = await startServer()
const browser = await launch()
const page = await browser.newPage()
page.on('console', (m) => console.log('[page]', m.text()))
page.on('pageerror', (e) => console.log('[pageerror]', e.message))

let result
try {
  await page.goto(`http://127.0.0.1:${port}/`, { waitUntil: 'load' })
  await page.waitForFunction(() => window.__RESULT__ && window.__RESULT__.done, { timeout: 90000 })
  result = await page.evaluate(() => window.__RESULT__)
} catch (e) {
  // Navigation and evaluation failures land here too, not only the 90 s wait timing out.
  result = { done: false, ok: false, error: 'harness error: ' + e.message }
}

await browser.close()
server.close()

const passed = (result.checks ?? []).filter((c) => c.ok).length
const total = (result.checks ?? []).length
console.log(`\n${result.ok ? 'OK' : 'FAILED'} — ${passed}/${total} checks passed`)
if (result.error) console.log('error:', result.error)
process.exit(result.ok ? 0 : 1)
79 changes: 79 additions & 0 deletions packages/ai-autopilot/harness/webcontainer/index.html
@@ -0,0 +1,79 @@
<!doctype html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>WebContainerRunner boot-and-serve harness</title>
    <!-- The adapter dynamically imports the bare specifier `@webcontainer/api`; map it same-origin. -->
    <script type="importmap">
      { "imports": { "@webcontainer/api": "/api/index.js" } }
    </script>
  </head>
  <body>
    <pre id="log"></pre>
    <script type="module">
      // Drive the REAL compiled adapter (dist), exactly as an app would import it.
      import { WebContainerRunner, webContainerAvailable } from '/dist/runner/webcontainer.js'

      const logEl = document.getElementById('log')
      const log = (m) => { logEl.textContent += m + '\n'; console.log(m) }
      const checks = []
      const check = (name, ok, detail) => { checks.push({ name, ok: !!ok, detail }); log((ok ? 'PASS ' : 'FAIL ') + name + (detail ? ' ' + detail : '')) }
      window.__RESULT__ = { done: false }

      const httpServer =
        "const http=require('http');http.createServer((_,res)=>{res.writeHead(200);res.end('hello from webcontainer')}).listen(3000,()=>console.log('listening'))"

      try {
        check('webContainerAvailable() is true (cross-origin isolated)', webContainerAvailable(), 'isolated=' + self.crossOriginIsolated)

        const runner = new WebContainerRunner()
        const s = await runner.boot({ files: { 'server.js': httpServer, 'note.txt': 'seed' } })
        log('booted session ' + s.id)

        // --- filesystem round-trip ---
        check('fs.read seeds mounted files', (await s.fs.read('note.txt')) === 'seed')
        await s.fs.write('src/deep/a.txt', 'A')
        check('fs.exists on a nested write', await s.fs.exists('src/deep/a.txt'))
        const list = await s.fs.list()
        check('fs.list is recursive + sorted', JSON.stringify(list) === JSON.stringify(['note.txt', 'server.js', 'src/deep/a.txt']), JSON.stringify(list))
        await s.fs.remove('src')
        check('fs.remove deletes recursively', !(await s.fs.exists('src/deep/a.txt')))
        let threw = false
        try { await s.fs.read('../escape') } catch { threw = true }
        check('fs rejects paths escaping the workspace', threw)

        // --- exec ---
        const ver = await s.exec('node -v')
        check('exec runs in the browser Node runtime', ver.exitCode === 0 && /v\d+\./.test(ver.stdout), JSON.stringify(ver.stdout.trim()))
        check('exec propagates non-zero exit codes', (await s.exec('exit 7')).exitCode === 7)
        check('exec honors cwd + env', (await s.exec('echo $FOO', { env: { FOO: 'bar' } })).stdout.includes('bar'))
        const timed = await s.exec('node -e "setTimeout(()=>{}, 10000)"', { timeoutMs: 1000 })
        check('exec kills a command past its timeout (124)', timed.exitCode === 124, 'code=' + timed.exitCode)

        // --- start + preview + serve ---
        const dev = await s.start('node server.js')
        check('start returns a handle without blocking', dev.command === 'node server.js')
        const preview = await s.preview({ port: 3000, waitMs: 15000 })
        check('preview resolves a WebContainer URL for the running server', /webcontainer-api\.io/.test(preview.url), preview.url)

        // Serve proof: hit the loopback server from INSIDE the container and read the body.
        const served = await s.exec("node -e \"fetch('http://localhost:3000').then(r=>r.text()).then(t=>process.stdout.write(t))\"")
        check('the started server actually serves', served.stdout.includes('hello from webcontainer'), JSON.stringify(served.stdout.trim()))

        await dev.stop()
        await s.dispose()
        check('dispose blocks further exec', await (async () => { try { await s.exec('true'); return false } catch { return true } })())

        // Single-instance guard: a fresh boot after dispose must succeed.
        const s2 = await runner.boot()
        check('boot succeeds again after dispose (single-instance slot freed)', !!s2.id)
        await s2.dispose()

        window.__RESULT__ = { done: true, ok: checks.every(c => c.ok), checks }
      } catch (err) {
        log('ERROR ' + (err && err.stack || err))
        window.__RESULT__ = { done: true, ok: false, checks, error: String(err && err.message || err) }
      }
    </script>
  </body>
</html>
52 changes: 52 additions & 0 deletions packages/ai-autopilot/harness/webcontainer/server.mjs
@@ -0,0 +1,52 @@
import { createServer } from 'node:http'
import { readFile } from 'node:fs/promises'
import { fileURLToPath } from 'node:url'
import { dirname, join, normalize } from 'node:path'

const __dirname = dirname(fileURLToPath(import.meta.url))
const PKG_ROOT = join(__dirname, '..', '..')
const DIST_DIR = join(PKG_ROOT, 'dist')
// Resolve the installed @webcontainer/api so the page can serve it same-origin.
// It is import-only (no CJS main), so resolve its ESM entry and take the dist dir.
const API_DIR = dirname(fileURLToPath(import.meta.resolve('@webcontainer/api')))

const TYPES = {
  '.html': 'text/html',
  '.js': 'text/javascript',
  '.mjs': 'text/javascript',
  '.map': 'application/json',
  '.wasm': 'application/wasm',
}

const safeJoin = (base, rel) => join(base, normalize(rel).replace(/^(\.\.(\/|\\|$))+/, ''))

/** Start the COOP/COEP static server. Resolves with its chosen port. */
export function startServer() {
  const server = createServer(async (req, res) => {
    // Cross-origin isolation is what makes SharedArrayBuffer (and WebContainer) available.
    res.setHeader('Cross-Origin-Opener-Policy', 'same-origin')
    res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp')
    res.setHeader('Cross-Origin-Resource-Policy', 'cross-origin')

    const path = new URL(req.url, 'http://localhost').pathname
    const send = async (file) => {
      const body = await readFile(file)
      res.setHeader('Content-Type', TYPES['.' + file.split('.').pop()] ?? 'application/octet-stream')
      res.end(body)
    }
    try {
      if (path === '/favicon.ico') { res.statusCode = 204; return res.end() }
      if (path === '/' || path === '/index.html') return await send(join(__dirname, 'index.html'))
      if (path.startsWith('/dist/')) return await send(safeJoin(DIST_DIR, path.slice('/dist/'.length)))
      if (path.startsWith('/api/')) return await send(safeJoin(API_DIR, path.slice('/api/'.length)))
      res.statusCode = 404
      res.end('not found')
    } catch (err) {
      // A missing file is a 404, not a server fault; anything else stays a 500.
      res.statusCode = err?.code === 'ENOENT' ? 404 : 500
      res.end(String(err))
    }
  })
  return new Promise((resolve) => {
    server.listen(0, '127.0.0.1', () => resolve({ server, port: server.address().port }))
  })
}
10 changes: 10 additions & 0 deletions packages/ai-autopilot/package.json
@@ -51,8 +51,18 @@
    "@gemstack/ai-skills": "workspace:^",
    "zod": "^4.0.0"
  },
  "peerDependencies": {
    "@webcontainer/api": "^1.6.0"
  },
  "peerDependenciesMeta": {
    "@webcontainer/api": {
      "optional": true
    }
  },
  "devDependencies": {
    "@types/node": "^20.0.0",
    "@webcontainer/api": "^1.6.4",
    "playwright-core": "^1.49.0",
    "typescript": "^5.4.0"
  },
  "author": "Suleiman Shahbari"
5 changes: 5 additions & 0 deletions packages/ai-autopilot/src/index.ts
@@ -28,6 +28,7 @@
* - {@link FakeRunner} — in-memory runner for tests
* - {@link LocalRunner} — real host workspace (fs + child processes); the first real adapter
* - {@link DockerRunner} — sandboxed workspace in a container (via the `docker` CLI)
* - {@link WebContainerRunner} — sandboxed workspace in the browser (via `@webcontainer/api`)
* - {@link runnerTools} — expose a booted session to an agent as sandbox tools
*
* Surfaces run the same autopilot in the terminal, an in-page UI, or a
@@ -129,6 +130,9 @@ export {
  DockerRunner,
  DockerRunnerSession,
  dockerAvailable,
  WebContainerRunner,
  WebContainerRunnerSession,
  webContainerAvailable,
  RunnerError,
  runnerTools,
  type Runner,
@@ -147,6 +151,7 @@ export {
  type RecordedStart,
  type LocalRunnerOptions,
  type DockerRunnerOptions,
  type WebContainerRunnerOptions,
  type RunnerToolsOptions,
} from './runner/index.js'
export {
7 changes: 7 additions & 0 deletions packages/ai-autopilot/src/runner/index.ts
@@ -10,6 +10,7 @@
* - {@link FakeRunner} — in-memory runner for tests (the runner analog of `AiFake`)
* - {@link LocalRunner} — real host workspace (fs + child processes); the first real adapter
* - {@link DockerRunner} — sandboxed workspace in a container (via the `docker` CLI)
* - {@link WebContainerRunner} — sandboxed workspace in the browser (via `@webcontainer/api`)
* - {@link runnerTools} — expose a session to an agent as sandbox tools
*/
export type {
@@ -35,4 +36,10 @@ export {
} from './fake.js'
export { LocalRunner, LocalRunnerSession, type LocalRunnerOptions } from './local.js'
export { DockerRunner, DockerRunnerSession, dockerAvailable, type DockerRunnerOptions } from './docker.js'
export {
  WebContainerRunner,
  WebContainerRunnerSession,
  webContainerAvailable,
  type WebContainerRunnerOptions,
} from './webcontainer.js'
export { runnerTools, type RunnerToolsOptions } from './tools.js'
33 changes: 33 additions & 0 deletions packages/ai-autopilot/src/runner/webcontainer.test.ts
@@ -0,0 +1,33 @@
import { describe, it } from 'node:test'
import assert from 'node:assert/strict'
import { WebContainerRunner, webContainerAvailable } from './webcontainer.js'
import { RunnerError } from './types.js'

// These run in plain Node (no browser), so they cover the capability guard and
// the Node-side contract. The real boot-and-serve is proven by the browser
// harness under `harness/webcontainer/` (see its README).

describe('webContainerAvailable', () => {
  it('is false in a non-isolated (Node) context', () => {
    assert.equal(webContainerAvailable(), false)
  })
})

describe('WebContainerRunner', () => {
  it('identifies its kind', () => {
    assert.equal(new WebContainerRunner().kind, 'webcontainer')
  })

  it('constructs without importing @webcontainer/api (browser-only dep stays lazy)', () => {
    // No throw here means the module loaded without eagerly requiring the browser dep.
    assert.doesNotThrow(() => new WebContainerRunner({ coep: 'credentialless', preview: false }))
  })

  it('boot() rejects with a clear error when not cross-origin isolated', async () => {
    await assert.rejects(() => new WebContainerRunner().boot(), (err: unknown) => {
      assert.ok(err instanceof RunnerError)
      assert.match((err as Error).message, /cross-origin isolated/)
      return true
    })
  })
})