Browser2Video

Record smooth browser and terminal automation videos (MP4 @ 60fps) with subtitles, narration, and step metadata.

  • Modes: human (cursor overlay, click effects, natural pacing) or fast (no delays)
  • Recording: per-page CDP screencast, OS/Xvfb capture, or headless
  • Narration: OpenAI TTS with auto-translation, realtime playback, cached in .cache/tts/
  • Terminals: Real PTY terminals (mc, htop, vim) with xterm.js rendering
  • Multi-pane: Dockview-based grid layout with titled panels and dynamic tabs

Quick start

npm install browser2video
npx b2v doctor  # check environment

Create a scenario file my-scenario.ts:

import { createSession } from "browser2video";

const session = await createSession({ mode: "human", record: true });
const { step } = session;
const { actor } = await session.openPage({ url: "https://example.com" });

await step("Open page", async () => {
  await actor.waitFor("h1");
});

await step("Click the link", async () => {
  await actor.click("a");
});

const result = await session.finish();
console.log("Video:", result.video);

Run it:

npx b2v run my-scenario.ts --mode human --headed

Requirements

  • Node.js >= 22
  • ffmpeg in PATH (for video composition / audio mixing)

Starter examples

The package includes ready-to-run examples:

npx b2v run node_modules/browser2video/examples/simple-browser.ts
npx b2v run node_modules/browser2video/examples/terminal-echo.ts

Video examples

All videos are auto-generated on every push to main. Watch them at the video gallery.

Single-actor UI demo

A single browser page with form inputs, scrolling, drag-and-drop, canvas drawing, and React Flow nodes. Shows the basics of createSession + Actor interactions.

Scenario source

Basic UI demo

Narrated Kanban board

AI-narrated walkthrough of a Kanban board lifecycle. The narrator explains each column while the cursor highlights it. Uses session.step(caption, narration, fn) for concurrent speech and actions.

Scenario source

Kanban scenario

Multi-window collaboration

Records two browser windows side-by-side sharing a real-time synced todo list via Automerge. Demonstrates multi-pane video composition with session.openPage() called twice.

Scenario source

Collab scenario

Terminal UI (mc, htop, vim)

Interactive TUI apps (Midnight Commander, htop, vim) running in real PTY terminals via session.createTerminal(). Each terminal is its own pane with a scoped TerminalActor.

Scenario source

TUI terminals

Console logging

In-page console panel showing live log output during CRUD operations on a notes app.

Scenario source

Console logs

Documentation | Auto-generated scenario videos

CLI

npx b2v run my-scenario.ts --mode human --headed
npx b2v run my-scenario.ts --mode fast

Narration options

npx b2v run my-scenario.ts \
  --narrate \
  --voice onyx \
  --narrate-speed 1.0 \
  --realtime-audio

Narration language can be set via environment variable:

B2V_NARRATION_LANGUAGE=ru npx b2v run my-scenario.ts

MCP server (for AI agents)

The MCP server lets AI agents (Cursor, Claude, etc.) interactively control a browser and terminals with human-like behavior, recording, narration, and scenario export. It works alongside Playwright MCP, which connects to the same browser via CDP for page inspection.

For full tool parameters, schemas, and agent workflow details, see SKILL.md.

Setup

Add to your .cursor/mcp.json (or equivalent MCP config):

{
  "mcpServers": {
    "b2v": {
      "command": "npx",
      "args": ["-y", "-p", "browser2video", "b2v-mcp"],
      "env": { "B2V_CDP_PORT": "9222" }
    },
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp", "--cdp-endpoint", "http://localhost:9222"]
    }
  }
}

b2v handles human-like interactions, recording, terminals, and narration. Playwright MCP connects to the same browser via CDP for page inspection (snapshots, screenshots, evaluate). Both servers share the same browser instance through the CDP port.

Two modes of use

Interactive -- the agent controls the browser in real-time, step by step. Good for exploratory workflows, live demos, and building new scenarios on the fly.

Batch -- run a pre-written .ts scenario file as a subprocess. Good for repeatable recordings and CI.

Interactive workflow

1. b2v_start           -- launch browser with recording
2. b2v_open_page       -- open a URL (returns pageId)
3. browser_snapshot    -- (Playwright MCP) inspect the page to find selectors
4. b2v_click / b2v_type / b2v_drag -- human-like interactions
5. b2v_step            -- mark a recording step with a subtitle and optional narration
6. b2v_save_scenario   -- export steps as a replayable .ts file
7. b2v_finish          -- compose the final video; returns artifact paths

Interactive tools

| Category | Tools |
| --- | --- |
| Session | `b2v_start`, `b2v_finish`, `b2v_status` |
| Pages / terminals | `b2v_open_page`, `b2v_open_terminal`, `b2v_terminal_send`, `b2v_terminal_read` |
| Actor interactions | `b2v_click`, `b2v_click_at`, `b2v_type`, `b2v_press_key`, `b2v_hover`, `b2v_drag`, `b2v_scroll`, `b2v_select_text` |
| Recording / narration | `b2v_step`, `b2v_narrate` |
| Scenario builder | `b2v_add_step`, `b2v_save_scenario` |

Batch tools

| Tool | Description |
| --- | --- |
| `b2v_run` | Run a pre-written scenario file with recording and narration |
| `b2v_list_scenarios` | List available scenario files |
| `b2v_doctor` | Print environment diagnostics |

Troubleshooting

  • ffmpeg not found -- install ffmpeg and make sure it is in your PATH. Run b2v_doctor to verify.
  • CDP port conflict -- if port 9222 is busy, set a different port via B2V_CDP_PORT env var in both b2v and playwright server configs.
  • Session already running -- call b2v_finish (or restart the MCP server) before starting a new session.
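For example, to move both servers off the default port, change it in both entries of the MCP config shown earlier (a sketch; `9223` is an arbitrary free port):

```json
{
  "mcpServers": {
    "b2v": {
      "command": "npx",
      "args": ["-y", "-p", "browser2video", "b2v-mcp"],
      "env": { "B2V_CDP_PORT": "9223" }
    },
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp", "--cdp-endpoint", "http://localhost:9223"]
    }
  }
}
```

The port in `B2V_CDP_PORT` and in `--cdp-endpoint` must match, since both servers attach to the same browser instance.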

CI

GitHub Actions runs two jobs on every PR and push to main:

  • test-fast-docker — all scenarios in fast mode inside Docker (Linux)
  • test-human — all scenarios in human mode with screencast recording

After merge to main, a deploy workflow records all scenarios and publishes videos to the GH Pages video gallery.

Repo layout

packages/browser2video/  Core library, CLI, MCP server (published as "browser2video")
  examples/              Starter scenario scripts
  bin/                   CLI and MCP shims
  ops/                   MCP operation definitions
  schemas/               Zod schemas
apps/demo/               Vite + React demo app (target under test)
tests/scenarios/         Scenario test files
website/                 Docusaurus documentation site

Architecture

See docs/ARCHITECTURE.md.


API Reference

createSession(opts?): Promise<Session>

Create a new recording session. This is the main entry point.

import { createSession } from "browser2video";

const session = await createSession({
  mode: "human",           // "human" | "fast"
  record: true,            // enable video recording
  narration: {
    enabled: true,
    voice: "onyx",         // OpenAI TTS voice
    language: "ru",        // auto-translate narration
    realtime: true,        // play audio through speakers
  },
});

Options (SessionOptions):

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `mode` | `"human" \| "fast"` | auto | Execution mode |
| `record` | `boolean` | auto | Enable video recording |
| `outputDir` | `string` | auto | Artifacts output directory |
| `headed` | `boolean` | auto | Show browser window |
| `layout` | `LayoutConfig` | auto | Multi-pane grid layout via dockview |
| `delays` | `Partial<ActorDelays>` | - | Override actor timing |
| `ffmpegPath` | `string` | `"ffmpeg"` | Path to ffmpeg binary |
| `narration` | `NarrationOptions` | - | TTS narration config |

session.openPage(opts?): Promise<{ page, actor }>

Open a new browser page with optional URL and viewport.

const { page, actor } = await session.openPage({
  url: "http://localhost:5173/kanban",
  viewport: { width: 1280, height: 720 },
});

session.createTerminal(cmd?, opts?): Promise<TerminalActor>

Create a terminal pane running a command (or an interactive shell). Auto-starts a PTY server on first call, cleans up on finish().

const mc = await session.createTerminal("mc");       // run mc
const shell = await session.createTerminal();         // interactive shell

await mc.click(0.25, 0.25);                          // click in mc
await shell.typeAndEnter("ls -la");                   // run a command
await shell.waitForPrompt();                          // wait until idle
const output = await shell.readNew();                 // read new output

session.createGrid(panes, opts?): Promise<GridHandle>

Create a multi-pane dockview layout with terminals and browser iframes.

const grid = await session.createGrid(
  [
    { url: "http://localhost:3000", label: "Preview" },
    { label: "Editor" },
    { label: "Dev Server" },
  ],
  {
    viewport: { width: 1280, height: 720 },
    grid: [[0, 1], [0, 2]],  // Preview takes left column (2 rows)
  },
);

const [browser, editor, server] = grid.actors;

session.step(caption, fn) / session.step(caption, narration, fn)

Track a named step with optional concurrent narration.

const { step } = session;

// Simple step
await step("Click login", async () => {
  await actor.click('[data-testid="login-btn"]');
});

// Step with narration (speech and action run concurrently)
await step("Explain dashboard",
  "This is the main dashboard where you can see all your projects.",
  async () => {
    await actor.circleAround('[data-testid="dashboard"]');
  },
);

session.addCleanup(fn)

Register a cleanup function that runs automatically when finish() is called. No more try/finally wrappers.

const server = await startServer({ type: "vite", root: "apps/demo" });
session.addCleanup(() => server.stop());

session.finish(): Promise<SessionResult>

Stop recording, compose video, mix narration audio, generate subtitles, and run cleanup functions.

const result = await session.finish();
// result.video     — path to MP4
// result.thumbnail — path to PNG (last-frame screenshot)
// result.subtitles — path to WebVTT
// result.metadata  — path to JSON
// result.durationMs

Actor methods

The Actor provides human-like browser interactions:

| Method | Description |
| --- | --- |
| `actor.click(selector)` | Click an element with cursor movement and click effect |
| `actor.clickLocator(locator)` | Click a Playwright `Locator` (moves cursor first) |
| `actor.type(selector, text)` | Type text; auto-detects xterm.js terminals vs DOM inputs |
| `actor.typeAndEnter(selector, text)` | Type text and press Enter |
| `actor.pressKey(key)` | Press a keyboard key (e.g. `"Tab"`, `"ArrowDown"`, `"F3"`) |
| `actor.clickAt(x, y)` | Click at specific page coordinates (for canvas/terminal) |
| `actor.scroll(selector, deltaY)` | Scroll within an element or the page |
| `actor.drag(from, to)` | Drag from one element to another |
| `actor.draw(canvas, points)` | Draw on a canvas (normalized 0-1 coordinates) |
| `actor.circleAround(selector)` | Trace a spiral path around an element (for highlighting) |
| `actor.hover(selector)` | Move the cursor smoothly over an element |
| `actor.selectText(from, to?)` | Select text by dragging between elements |
| `actor.moveCursorTo(x, y)` | Move the cursor overlay to coordinates |
| `actor.goto(url)` | Navigate to a URL (auto-injects cursor) |
| `actor.waitFor(selector)` | Wait for an element to appear |
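Put together, a typical interaction sequence might look like the sketch below. The URL and `data-testid` selectors are illustrative placeholders; `session` and `actor` come from `createSession` / `session.openPage()` as shown earlier.

```typescript
// Sketch only: the URL and selectors are placeholders, not real endpoints.
const { actor } = await session.openPage({ url: "https://example.com/app" });

await actor.waitFor('[data-testid="search"]');                // wait for the input to render
await actor.typeAndEnter('[data-testid="search"]', "hello");  // type a query and submit
await actor.hover('[data-testid="first-result"]');            // glide the cursor over a result
await actor.circleAround('[data-testid="first-result"]');     // highlight it for viewers
await actor.click('[data-testid="first-result"]');            // then click through
```

In human mode each call animates the cursor overlay, so chaining methods like this produces a continuous, watchable motion rather than instant jumps.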

TerminalActor methods

session.createTerminal(cmd?, opts?) returns a TerminalActor scoped to its terminal pane:

| Method | Description |
| --- | --- |
| `term.click(relX, relY)` | Click at a relative position within the terminal |
| `term.type(text)` | Type text into the terminal |
| `term.typeAndEnter(text)` | Type text and press Enter |
| `term.waitForText(includes)` | Wait for text to appear in terminal output |
| `term.waitForPrompt()` | Wait for a shell prompt (`$` or `#`) |
| `term.isBusy()` | Check if the terminal is running a command |
| `term.waitUntilIdle()` | Wait until the terminal is idle (prompt visible) |
| `term.read()` | Read all visible terminal text |
| `term.readNew()` | Read only new lines since the last `read()`/`readNew()` |

startServer(config): Promise<ManagedServer>

Start a dev server (Vite, Next.js, static, or custom command).

import { startServer } from "browser2video";

const server = await startServer({ type: "vite", root: "apps/demo" });
console.log(server.baseURL); // "http://localhost:5173"

NarrationOptions

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `enabled` | `boolean` | `false` | Enable narration |
| `voice` | `string` | `"ash"` | OpenAI TTS voice |
| `speed` | `number` | `1.0` | Speech speed (0.25-4.0) |
| `model` | `string` | `"tts-1"` | OpenAI TTS model |
| `apiKey` | `string` | env | OpenAI API key |
| `cacheDir` | `string` | `.cache/tts` | Cache directory |
| `realtime` | `boolean` | `false` | Play through speakers |
| `language` | `string` | - | Auto-translate narration to this language code |

Environment variables

| Variable | Description |
| --- | --- |
| `OPENAI_API_KEY` | OpenAI API key for narration (auto-enables narration in human mode when present) |
| `B2V_MODE` | Override execution mode (`human` / `fast`) |
| `B2V_RECORD` | Override recording (`true` / `false`) |
| `B2V_VOICE` | Override TTS voice |
| `B2V_NARRATION_SPEED` | Override narration speed |
| `B2V_NARRATION_LANGUAGE` | Override narration language (e.g. `ru`) |
| `B2V_REALTIME_AUDIO` | Enable realtime audio (`true`) |
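These variables combine on the command line, so the same scenario file can be re-run with different settings without editing it. For example, a quick dry run in fast mode with recording disabled:

```shell
# Override mode and recording for a fast, video-less dry run
B2V_MODE=fast B2V_RECORD=false npx b2v run my-scenario.ts
```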

Playwright re-exports

For advanced usage, Playwright types and launchers are re-exported:

import { chromium, type Page, type Locator } from "browser2video";

Development (contributing)

git clone https://github.com/holiber/browser2video.git
cd browser2video
pnpm install
npx playwright install --with-deps chromium

# Run a scenario
node tests/scenarios/basic-ui.test.ts

# Typecheck
pnpm typecheck
