Browser2Video

Record smooth browser and terminal automation videos (MP4 @ 60fps) with subtitles, narration, and step metadata.

  • Modes: human (cursor overlay, click effects, natural pacing) or fast (no delays)
  • Recording: per-page CDP screencast, OS/Xvfb capture, or headless
  • Narration: OpenAI TTS with auto-translation, realtime playback, cached in .cache/tts/
  • Terminals: Real PTY terminals (mc, htop, vim) with xterm.js rendering
  • Multi-pane: Dockview-based grid layout with titled panels and dynamic tabs

Quick start

npm install browser2video
npx b2v doctor  # check environment

Create a scenario file my-scenario.ts:

import { createSession } from "browser2video";

const session = await createSession({ mode: "human", record: true });
const { step } = session;
const { actor } = await session.openPage({ url: "https://example.com" });

await step("Open page", async () => {
  await actor.waitFor("h1");
});

await step("Click the link", async () => {
  await actor.click("a");
});

const result = await session.finish();
console.log("Video:", result.video);

Run it:

npx b2v run my-scenario.ts --mode human --headed

Requirements

  • Node.js >= 22
  • ffmpeg in PATH (for video composition / audio mixing)

Starter examples

The package includes ready-to-run examples:

npx b2v run node_modules/browser2video/examples/simple-browser.ts
npx b2v run node_modules/browser2video/examples/terminal-echo.ts

Video examples

All videos are auto-generated on every push to main. Watch them at the video gallery.

Single-actor UI demo

A single browser page with form inputs, scrolling, drag-and-drop, canvas drawing, and React Flow nodes. Shows the basics of createSession + Actor interactions.

Scenario source

Basic UI demo

Narrated Kanban board

AI-narrated walkthrough of a Kanban board lifecycle. The narrator explains each column while the cursor highlights it. Uses session.step(caption, narration, fn) for concurrent speech and actions.

Scenario source

Kanban scenario

Multi-window collaboration

Records two browser windows side-by-side sharing a real-time synced todo list via Automerge. Demonstrates multi-pane video composition with session.openPage() called twice.

Scenario source

Collab scenario

Terminal UI (mc, htop, vim)

Interactive TUI apps (Midnight Commander, htop, vim) running in real PTY terminals via session.createTerminal(). Each terminal is its own pane with a scoped TerminalActor.

Scenario source

TUI terminals

Console logging

In-page console panel showing live log output during CRUD operations on a notes app.

Scenario source

Console logs

Documentation | Auto-generated scenario videos

CLI

npx b2v run my-scenario.ts --mode human --headed
npx b2v run my-scenario.ts --mode fast

Narration options

npx b2v run my-scenario.ts \
  --narrate \
  --voice onyx \
  --narrate-speed 1.0 \
  --realtime-audio

Narration language can be set via environment variable:

B2V_NARRATION_LANGUAGE=ru npx b2v run my-scenario.ts

MCP server (for AI agents)

The MCP server lets AI agents (Cursor, Claude, etc.) interactively control a browser and terminals with human-like behavior, recording, narration, and scenario export. It works alongside Playwright MCP, which connects to the same browser via CDP for page inspection.

For full tool parameters, schemas, and agent workflow details, see SKILL.md.

Setup

Add to your .cursor/mcp.json (or equivalent MCP config):

{
  "mcpServers": {
    "b2v": {
      "command": "npx",
      "args": ["-y", "-p", "browser2video", "b2v-mcp"],
      "env": { "B2V_CDP_PORT": "9222" }
    },
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp", "--cdp-endpoint", "http://localhost:9222"]
    }
  }
}

b2v handles human-like interactions, recording, terminals, and narration. Playwright MCP connects to the same browser via CDP for page inspection (snapshots, screenshots, evaluate). Both servers share the same browser instance through the CDP port.

Two modes of use

Interactive -- the agent controls the browser in real-time, step by step. Good for exploratory workflows, live demos, and building new scenarios on the fly.

Batch -- run a pre-written .ts scenario file as a subprocess. Good for repeatable recordings and CI.

Interactive workflow

1. b2v_start           -- launch browser with recording
2. b2v_open_page       -- open a URL (returns pageId)
3. browser_snapshot    -- (Playwright MCP) inspect the page to find selectors
4. b2v_click / b2v_type / b2v_drag -- human-like interactions
5. b2v_step            -- mark a recording step with a subtitle and optional narration
6. b2v_save_scenario   -- export steps as a replayable .ts file
7. b2v_finish          -- compose the final video; returns artifact paths

Interactive tools

| Category | Tools |
| --- | --- |
| Session | `b2v_start`, `b2v_finish`, `b2v_status` |
| Pages / terminals | `b2v_open_page`, `b2v_open_terminal`, `b2v_terminal_send`, `b2v_terminal_read` |
| Actor interactions | `b2v_click`, `b2v_click_at`, `b2v_type`, `b2v_press_key`, `b2v_hover`, `b2v_drag`, `b2v_scroll`, `b2v_select_text` |
| Recording / narration | `b2v_step`, `b2v_narrate` |
| Scenario builder | `b2v_add_step`, `b2v_save_scenario` |

Batch tools

| Tool | Description |
| --- | --- |
| `b2v_run` | Run a pre-written scenario file with recording and narration |
| `b2v_list_scenarios` | List available scenario files |
| `b2v_doctor` | Print environment diagnostics |

Troubleshooting

  • ffmpeg not found -- install ffmpeg and make sure it is in your PATH. Run b2v_doctor to verify.
  • CDP port conflict -- if port 9222 is busy, set a different port via B2V_CDP_PORT env var in both b2v and playwright server configs.
  • Session already running -- call b2v_finish (or restart the MCP server) before starting a new session.
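For example, to move both servers off the default port, change it in both entries of the MCP config shown earlier (a sketch; `9223` is an arbitrary free port):

```json
{
  "mcpServers": {
    "b2v": {
      "command": "npx",
      "args": ["-y", "-p", "browser2video", "b2v-mcp"],
      "env": { "B2V_CDP_PORT": "9223" }
    },
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp", "--cdp-endpoint", "http://localhost:9223"]
    }
  }
}
```

The port in `B2V_CDP_PORT` and in `--cdp-endpoint` must match, since both servers attach to the same browser instance.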

CI

GitHub Actions runs two jobs on every PR and push to main:

  • test-fast-docker — all scenarios in fast mode inside Docker (Linux)
  • test-human — all scenarios in human mode with screencast recording

After merge to main, a deploy workflow records all scenarios and publishes videos to the GH Pages video gallery.

Repo layout

packages/browser2video/  Core library, CLI, MCP server (published as "browser2video")
  examples/              Starter scenario scripts
  bin/                   CLI and MCP shims
  ops/                   MCP operation definitions
  schemas/               Zod schemas
apps/demo/               Vite + React demo app (target under test)
tests/scenarios/         Scenario test files
website/                 Docusaurus documentation site

Architecture

See docs/ARCHITECTURE.md.


API Reference

createSession(opts?): Promise<Session>

Create a new recording session. This is the main entry point.

import { createSession } from "browser2video";

const session = await createSession({
  mode: "human",           // "human" | "fast"
  record: true,            // enable video recording
  narration: {
    enabled: true,
    voice: "onyx",         // OpenAI TTS voice
    language: "ru",        // auto-translate narration
    realtime: true,        // play audio through speakers
  },
});

Options (SessionOptions):

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `mode` | `"human" \| "fast"` | auto | Execution mode |
| `record` | `boolean` | auto | Enable video recording |
| `outputDir` | `string` | auto | Artifacts output directory |
| `headed` | `boolean` | auto | Show browser window |
| `layout` | `LayoutConfig` | auto | Multi-pane grid layout via dockview |
| `delays` | `Partial<ActorDelays>` | - | Override actor timing |
| `ffmpegPath` | `string` | `"ffmpeg"` | Path to ffmpeg binary |
| `narration` | `NarrationOptions` | - | TTS narration config |

session.openPage(opts?): Promise<{ page, actor }>

Open a new browser page with optional URL and viewport.

const { page, actor } = await session.openPage({
  url: "http://localhost:5173/kanban",
  viewport: { width: 1280, height: 720 },
});

session.createTerminal(cmd?, opts?): Promise<TerminalActor>

Create a terminal pane running a command (or an interactive shell). Auto-starts a PTY server on first call, cleans up on finish().

const mc = await session.createTerminal("mc");       // run mc
const shell = await session.createTerminal();         // interactive shell

await mc.click(0.25, 0.25);                          // click in mc
await shell.typeAndEnter("ls -la");                   // run a command
await shell.waitForPrompt();                          // wait until idle
const output = await shell.readNew();                 // read new output

session.createGrid(panes, opts?): Promise<GridHandle>

Create a multi-pane dockview layout with terminals and browser iframes.

const grid = await session.createGrid(
  [
    { url: "http://localhost:3000", label: "Preview" },
    { label: "Editor" },
    { label: "Dev Server" },
  ],
  {
    viewport: { width: 1280, height: 720 },
    grid: [[0, 1], [0, 2]],  // Preview takes left column (2 rows)
  },
);

const [browser, editor, server] = grid.actors;

session.step(caption, fn) / session.step(caption, narration, fn)

Track a named step with optional concurrent narration.

const { step } = session;

// Simple step
await step("Click login", async () => {
  await actor.click('[data-testid="login-btn"]');
});

// Step with narration (speech and action run concurrently)
await step("Explain dashboard",
  "This is the main dashboard where you can see all your projects.",
  async () => {
    await actor.circleAround('[data-testid="dashboard"]');
  },
);

session.addCleanup(fn)

Register a cleanup function that runs automatically when finish() is called. No more try/finally wrappers.

const server = await startServer({ type: "vite", root: "apps/demo" });
session.addCleanup(() => server.stop());

session.finish(): Promise<SessionResult>

Stop recording, compose video, mix narration audio, generate subtitles, and run cleanup functions.

const result = await session.finish();
// result.video     — path to MP4
// result.thumbnail — path to PNG (last-frame screenshot)
// result.subtitles — path to WebVTT
// result.metadata  — path to JSON
// result.durationMs

Actor methods

The Actor provides human-like browser interactions:

| Method | Description |
| --- | --- |
| `actor.click(selector)` | Click an element with cursor movement and click effect |
| `actor.clickLocator(locator)` | Click a Playwright `Locator` (moves cursor first) |
| `actor.type(selector, text)` | Type text; auto-detects xterm.js terminals vs DOM inputs |
| `actor.typeAndEnter(selector, text)` | Type text and press Enter |
| `actor.pressKey(key)` | Press a keyboard key (e.g. `"Tab"`, `"ArrowDown"`, `"F3"`) |
| `actor.clickAt(x, y)` | Click at specific page coordinates (for canvas/terminal) |
| `actor.scroll(selector, deltaY)` | Scroll within an element or the page |
| `actor.drag(from, to)` | Drag from one element to another |
| `actor.draw(canvas, points)` | Draw on a canvas (normalized 0-1 coordinates) |
| `actor.circleAround(selector)` | Trace a spiral path around an element (for highlighting) |
| `actor.hover(selector)` | Move the cursor smoothly over an element |
| `actor.selectText(from, to?)` | Select text by dragging between elements |
| `actor.moveCursorTo(x, y)` | Move the cursor overlay to coordinates |
| `actor.goto(url)` | Navigate to a URL (auto-injects cursor) |
| `actor.waitFor(selector)` | Wait for an element to appear |
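Put together, a typical interaction sequence might look like the sketch below. The URL and `data-testid` selectors are illustrative placeholders; `session` and `actor` come from `createSession` / `session.openPage()` as shown earlier.

```typescript
// Sketch only: the URL and selectors are placeholders, not real endpoints.
const { actor } = await session.openPage({ url: "https://example.com/app" });

await actor.waitFor('[data-testid="search"]');                // wait for the input to render
await actor.typeAndEnter('[data-testid="search"]', "hello");  // type a query and submit
await actor.hover('[data-testid="first-result"]');            // glide the cursor over a result
await actor.circleAround('[data-testid="first-result"]');     // highlight it for viewers
await actor.click('[data-testid="first-result"]');            // then click through
```

In human mode each call animates the cursor overlay, so chaining methods like this produces a continuous, watchable motion rather than instant jumps.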

TerminalActor methods

session.createTerminal(cmd?, opts?) returns a TerminalActor scoped to its terminal pane:

| Method | Description |
| --- | --- |
| `term.click(relX, relY)` | Click at a relative position within the terminal |
| `term.type(text)` | Type text into the terminal |
| `term.typeAndEnter(text)` | Type text and press Enter |
| `term.waitForText(includes)` | Wait for text to appear in terminal output |
| `term.waitForPrompt()` | Wait for a shell prompt (`$` or `#`) |
| `term.isBusy()` | Check if the terminal is running a command |
| `term.waitUntilIdle()` | Wait until the terminal is idle (prompt visible) |
| `term.read()` | Read all visible terminal text |
| `term.readNew()` | Read only new lines since the last `read()`/`readNew()` |

startServer(config): Promise<ManagedServer>

Start a dev server (Vite, Next.js, static, or custom command).

import { startServer } from "browser2video";

const server = await startServer({ type: "vite", root: "apps/demo" });
console.log(server.baseURL); // "http://localhost:5173"

NarrationOptions

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `enabled` | `boolean` | `false` | Enable narration |
| `voice` | `string` | `"ash"` | OpenAI TTS voice |
| `speed` | `number` | `1.0` | Speech speed (0.25-4.0) |
| `model` | `string` | `"tts-1"` | OpenAI TTS model |
| `apiKey` | `string` | env | OpenAI API key |
| `cacheDir` | `string` | `.cache/tts` | Cache directory |
| `realtime` | `boolean` | `false` | Play through speakers |
| `language` | `string` | - | Auto-translate narration to this language code |

Environment variables

| Variable | Description |
| --- | --- |
| `OPENAI_API_KEY` | OpenAI API key for narration (auto-enables narration in human mode when present) |
| `B2V_MODE` | Override execution mode (`human` / `fast`) |
| `B2V_RECORD` | Override recording (`true` / `false`) |
| `B2V_VOICE` | Override TTS voice |
| `B2V_NARRATION_SPEED` | Override narration speed |
| `B2V_NARRATION_LANGUAGE` | Override narration language (e.g. `ru`) |
| `B2V_REALTIME_AUDIO` | Enable realtime audio (`true`) |
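These variables combine on the command line, so the same scenario file can be re-run with different settings without editing it. For example, a quick dry run in fast mode with recording disabled:

```shell
# Override mode and recording for a fast, video-less dry run
B2V_MODE=fast B2V_RECORD=false npx b2v run my-scenario.ts
```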

Playwright re-exports

For advanced usage, Playwright types and launchers are re-exported:

import { chromium, type Page, type Locator } from "browser2video";

Development (contributing)

git clone https://github.com/holiber/browser2video.git
cd browser2video
pnpm install
npx playwright install --with-deps chromium

# Run a scenario
node tests/scenarios/basic-ui.test.ts

# Typecheck
pnpm typecheck
