Releases: jbrecht/tutorial-forge
v0.8.0 — screencast recorder
New: --recorder screencast captures via CDP with explicit per-frame timestamps. The assembled video starts exactly at the recording clock's zero — the calibration-flash mechanism (source of two past bugs) is skipped entirely for screencast renders, and frames arrive only when content changes. recordVideo remains the default; both live behind a new Recorder interface. Honest scope note: Chromium delivers screencast frames at CSS-viewport size regardless of deviceScaleFactor, so high-DPI capture (the issue's other goal) is not achievable this way — documented on #5. Closes #5.
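Explicit per-frame timestamps mean the assembler can hold each frame until the next one arrives instead of assuming a fixed frame rate. The sketch below illustrates that idea only; `TimedFrame`, `frameDurations`, and the `endSec` parameter are hypothetical names, not tutorial-forge's internals.

```typescript
// Hypothetical shape: one captured frame plus its capture time in seconds.
interface TimedFrame {
  file: string;
  timestampSec: number;
}

interface PlacedFrame {
  file: string;
  startSec: number; // relative to the recording clock's zero
  durationSec: number; // how long the frame stays on screen
}

// Frames arrive only when content changes, so each frame is held until the
// next one; the last frame is held until the recording ends.
function frameDurations(
  frames: TimedFrame[],
  clockZeroSec: number,
  endSec: number,
): PlacedFrame[] {
  const sorted = [...frames].sort((a, b) => a.timestampSec - b.timestampSec);
  return sorted.map((frame, i) => {
    const nextSec = i + 1 < sorted.length ? sorted[i + 1].timestampSec : endSec;
    return {
      file: frame.file,
      startSec: frame.timestampSec - clockZeroSec,
      durationSec: Math.max(0, nextSec - frame.timestampSec),
    };
  });
}
```

A list like this maps naturally onto an image sequence with per-entry durations, which is one way a variable-rate capture can be turned into a constant-rate video.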
v0.7.0 — GIF export
New: --gif writes an optimized animated GIF next to the MP4 — narration captions burned in (GIFs are silent), two-pass palette, configurable width/fps. --gif-steps open-modal..create-event excerpts a step range, resolved from the timing manifest (and remapped through --idle-speedup when active). The attached GIF was produced by exactly that command against the example app. Closes #4.
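As a rough illustration of the step-range excerpt and the two-pass palette, the sketch below resolves a range against a manifest and builds the two ffmpeg invocations. The manifest shape (`StepTiming`), the function names, and the palette path are assumptions; caption burn-in and the idle-speedup remap are left out. The `palettegen` and `paletteuse` filters are standard ffmpeg filters.

```typescript
// Hypothetical manifest entry: where each step sits in the rendered video.
interface StepTiming {
  id: string;
  startMs: number;
  endMs: number;
}

interface TimeWindow {
  startMs: number;
  endMs: number;
}

// Resolve "first..last" (or a single step id) to a time window.
function resolveStepRange(range: string, manifest: StepTiming[]): TimeWindow {
  const [firstId, lastId = firstId] = range.split("..");
  const first = manifest.findIndex((s) => s.id === firstId);
  const last = manifest.findIndex((s) => s.id === lastId);
  if (first < 0 || last < 0 || last < first) {
    throw new Error(`Cannot resolve step range "${range}"`);
  }
  return { startMs: manifest[first].startMs, endMs: manifest[last].endMs };
}

// Pass 1 builds an optimized palette; pass 2 encodes the GIF with it.
function gifCommands(
  input: string,
  output: string,
  palette: string,
  win: TimeWindow,
  widthPx: number,
  fps: number,
): string[][] {
  const base = `fps=${fps},scale=${widthPx}:-1:flags=lanczos`;
  const trim = [
    "-ss", String(win.startMs / 1000),
    "-t", String((win.endMs - win.startMs) / 1000),
  ];
  return [
    ["ffmpeg", "-y", ...trim, "-i", input, "-vf", `${base},palettegen`, palette],
    [
      "ffmpeg", "-y", ...trim, "-i", input, "-i", palette,
      "-filter_complex", `${base}[v];[v][1:v]paletteuse`, output,
    ],
  ];
}
```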
v0.6.0 — burned-in captions everywhere
subtitles: 'burn' now works on every ffmpeg build. Instead of libass (which Homebrew's ffmpeg 8 no longer ships), each cue is rendered as a transparent caption pill by the same browser that records the tutorial, then composited with ffmpeg's built-in overlay filter — identical rendering on every machine, styled with CSS (captionStyle: { fontSizePx, maxWidthPx, bottomMarginPx }). Captions composite after scale and zoom, so --zoom never distorts them, and cue timing shares the same map as SRT/localization/idle-speedup. Closes #7.
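One way to composite pre-rendered caption images with ffmpeg's `overlay` filter is to chain one overlay per cue, each gated by a timeline `enable` expression. The sketch below builds such a filter graph as a string. It assumes the caption PNGs are passed as ffmpeg inputs 1..n after the main video and that `inputLabel` names the already scaled and zoomed stream; the function and type names are hypothetical.

```typescript
// Hypothetical cue shape: when a caption image should be visible.
interface CaptionCue {
  startSec: number;
  endSec: number;
}

// Builds a chain of overlay filters, one per cue. Each overlay is centered
// horizontally, sits bottomMarginPx above the bottom edge, and is only
// enabled between the cue's start and end times.
function captionOverlayGraph(
  cues: CaptionCue[],
  inputLabel: string,
  bottomMarginPx: number,
): { graph: string; outputLabel: string } {
  let current = inputLabel;
  const parts: string[] = [];
  cues.forEach((cue, i) => {
    const next = `cap${i + 1}`;
    const visible = `between(t,${cue.startSec.toFixed(3)},${cue.endSec.toFixed(3)})`;
    parts.push(
      `[${current}][${i + 1}:v]overlay=x=(W-w)/2:y=H-h-${bottomMarginPx}:enable='${visible}'[${next}]`,
    );
    current = next;
  });
  return { graph: parts.join(";"), outputLabel: current };
}
```

Because the overlays run on the stream named by `inputLabel`, placing this chain after the scale and zoom filters keeps the captions at their rendered size.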
v0.5.0 — idle speed-up
New: --idle-speedup (or idleSpeedup: true | { maxIdleMs, speed }) fast-forwards the boring parts — spinners, slow loads, long silent waits — at 3x by default. The design guarantee: narration playback and click choreography always play at 1x; only narration-free spans longer than maxIdleMs compress, with gentle 1x margins at each edge. Audio offsets, subtitle cues, and --zoom windows all remap through the same time map, in the same single ffmpeg pass. In the e2e suite a tutorial with a 4s silent wait renders 10.5s → 7.5s with narration still in sync. Closes #3.
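The shared time map can be pictured as a piecewise-linear function from recorded time to output time. The following is a minimal sketch of that idea, not the project's implementation: the type and function names are hypothetical, idle spans are assumed sorted and non-overlapping, and the `maxIdleMs` and `marginMs` values in the usage note are invented for illustration.

```typescript
interface IdleSpan {
  startMs: number;
  endMs: number;
}

interface Segment {
  srcStartMs: number;
  srcEndMs: number;
  speed: number; // 1 = real time
}

interface SpeedupOptions {
  maxIdleMs: number; // spans at or below this length are left alone
  speed: number; // playback speed inside compressed spans
  marginMs: number; // 1x margin kept at each edge of an idle span
}

// Split the recording into 1x and fast segments. Only the interior of a
// narration-free span longer than maxIdleMs is compressed.
function buildSegments(totalMs: number, idle: IdleSpan[], opts: SpeedupOptions): Segment[] {
  const segments: Segment[] = [];
  let cursor = 0;
  for (const span of idle) {
    const fastStart = span.startMs + opts.marginMs;
    const fastEnd = span.endMs - opts.marginMs;
    if (span.endMs - span.startMs <= opts.maxIdleMs || fastEnd <= fastStart) continue;
    if (fastStart > cursor) {
      segments.push({ srcStartMs: cursor, srcEndMs: fastStart, speed: 1 });
    }
    segments.push({ srcStartMs: fastStart, srcEndMs: fastEnd, speed: opts.speed });
    cursor = fastEnd;
  }
  if (cursor < totalMs) {
    segments.push({ srcStartMs: cursor, srcEndMs: totalMs, speed: 1 });
  }
  return segments;
}

// Map a recorded timestamp to its position in the sped-up output.
function remap(tMs: number, segments: Segment[]): number {
  let outMs = 0;
  for (const seg of segments) {
    if (tMs <= seg.srcStartMs) break;
    outMs += (Math.min(tMs, seg.srcEndMs) - seg.srcStartMs) / seg.speed;
  }
  return outMs;
}
```

With a 10 s recording, one idle span from 3 s to 7 s, `maxIdleMs: 1000`, `speed: 3`, and `marginMs: 500`, the interior 3.5 s to 6.5 s is compressed and `remap(10000, segments)` yields 8000. Audio offsets, subtitle cues, and zoom windows would all pass through the same `remap` so they stay aligned.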
v0.4.0 — failure diagnostics
Real apps fail in interesting ways; now the pipeline tells you how. Every step failure throws a StepError with artifact paths: a screenshot at failure and the recent browser console/pageerror/failed-request log. Run with --debug for the full treatment: a Playwright trace (npx playwright show-trace trace.zip), the complete console log, and before/after screenshots per step — work dir always kept. doctor now also flags ffmpeg builds without libass (subtitles: 'burn' needs it; Homebrew's ffmpeg 8 dropped it), and burn mode fails fast with a clear message instead of a cryptic filter error. Closes #6.
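To show what an error carrying artifact paths can look like to a caller, here is a stand-in sketch. The class below is not the library's `StepError`: its field names (`stepId`, `artifacts`, `screenshotPath`, `consoleLogPath`) and the `runStep` helper are assumptions, and the helper is synchronous only to keep the example short.

```typescript
// Assumed artifact set; the real paths and names may differ.
interface FailureArtifacts {
  screenshotPath: string;
  consoleLogPath: string;
}

// Stand-in error type that keeps the failing step and its artifacts together.
class StepError extends Error {
  readonly stepId: string;
  readonly artifacts: FailureArtifacts;

  constructor(stepId: string, artifacts: FailureArtifacts, cause: unknown) {
    const reason = cause instanceof Error ? cause.message : String(cause);
    super(`Step "${stepId}" failed: ${reason}`);
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, StepError.prototype);
    this.name = "StepError";
    this.stepId = stepId;
    this.artifacts = artifacts;
  }
}

// Run one step; on failure, capture artifacts and rethrow as StepError.
function runStep(
  stepId: string,
  action: () => void,
  capture: () => FailureArtifacts,
): void {
  try {
    action();
  } catch (cause) {
    throw new StepError(stepId, capture(), cause);
  }
}
```

A caller can then branch on `err instanceof StepError` and print the artifact paths next to the failure message.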
v0.3.0 — zoom-on-callout
New: --zoom (or zoom: true | { factor } in config) smoothly zooms toward each click target and back out — the camera leads the click by a beat, holds through what the click reveals, then releases. Composited in the post phase from callout boxes already in the timing manifest, so it adds nothing to recording time and is fully deterministic. Default factor 1.35.
The attached video is the example tutorial rendered with --zoom. Closes #2.
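The lead, hold, and release behaviour described for --zoom can be sketched as a zoom factor computed per output timestamp. Only the default factor of 1.35 comes from these notes; the `ZoomWindow` fields, the smoothstep easing, and the timing values in the usage note are assumptions.

```typescript
// Hypothetical timing for one click: all values in milliseconds.
interface ZoomWindow {
  clickMs: number; // when the click fires
  leadMs: number; // how long the camera is fully zoomed before the click
  holdMs: number; // how long the zoom holds after the click
  rampMs: number; // duration of each zoom-in and zoom-out ramp
}

// Classic smoothstep easing, clamped to [0, 1].
function smoothstep(x: number): number {
  const t = Math.min(1, Math.max(0, x));
  return t * t * (3 - 2 * t);
}

// Zoom factor at time tMs: 1 outside the window, `factor` while holding,
// eased in between. Because it depends only on tMs and the window, the
// result is deterministic.
function zoomAt(tMs: number, w: ZoomWindow, factor = 1.35): number {
  const holdStart = w.clickMs - w.leadMs;
  const holdEnd = w.clickMs + w.holdMs;
  const rampInStart = holdStart - w.rampMs;
  let amount: number;
  if (tMs <= holdStart) {
    amount = smoothstep((tMs - rampInStart) / w.rampMs);
  } else if (tMs <= holdEnd) {
    amount = 1;
  } else {
    amount = 1 - smoothstep((tMs - holdEnd) / w.rampMs);
  }
  return 1 + (factor - 1) * amount;
}
```

For a click at 2000 ms with `leadMs: 300`, `holdMs: 1000`, and `rampMs: 400`, the camera starts moving at 1300 ms, is fully zoomed from 1700 ms to 3000 ms, and is back at 1x by 3400 ms.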
v0.2.1
Fix: with a fast adapter setup(), the pre-roll trim point could land inside the magenta calibration flash, leaving solid-magenta opening frames (which also became the video thumbnail). The record phase now holds briefly after the flash and after setup so the final video opens on the settled app; the post phase clamps the trim point to never start before the flash has cleared. The e2e suite now asserts the output contains no flash. Demo assets regenerated.
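The clamp and the no-flash assertion can be illustrated with two small helpers. This is a sketch under assumptions: the function names are hypothetical, the flash is taken to be pure magenta (255, 0, 255), and the frame is assumed to be available as raw RGBA bytes.

```typescript
// Never start the output before the calibration flash has cleared.
function clampTrimPoint(requestedTrimMs: number, flashEndMs: number): number {
  return Math.max(requestedTrimMs, flashEndMs);
}

// Returns true when more than 95% of the pixels in an RGBA buffer are
// within `tolerance` of pure magenta.
function isFlashFrame(rgba: Uint8Array, tolerance = 16): boolean {
  const pixelCount = Math.floor(rgba.length / 4);
  if (pixelCount === 0) return false;
  let magenta = 0;
  for (let i = 0; i + 3 < rgba.length; i += 4) {
    const r = rgba[i];
    const g = rgba[i + 1];
    const b = rgba[i + 2];
    if (r >= 255 - tolerance && g <= tolerance && b >= 255 - tolerance) {
      magenta += 1;
    }
  }
  return magenta / pixelCount > 0.95;
}
```

An end-to-end check could decode the first few output frames and assert that `isFlashFrame` is false for each.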
v0.2.0 — localization
Render any tutorial in any language from one spec. Drop a <tutorial>.<lang>.json translation file (keyed by step id) next to the tutorial and run tutorial-forge render --lang es,fr. Each language is a full pipeline run — narration is re-synthesized and re-measured, so pacing adapts to each language's real speech duration. Outputs <id>.<lang>.mp4 + .srt.
- config.languages to render a language set by default; --lang to override
- ttsByLang for per-language voices (ElevenLabs multilingual usually needs none)
- Adapter and step callbacks now receive ctx.lang for localized apps (backward compatible)
- tutorial-forge list shows available languages
The attached getting-started.es.mp4 is the example app's tutorial in Spanish, generated from the same spec as the English one, plus one 8-line JSON file.
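As an illustration of a translation file keyed by step id, the sketch below overlays translated narration onto a tutorial's steps and derives the per-language output names. The `Step` shape, the assumption that each translation value is a plain narration string, and the fallback to the source text for missing keys are all guesses rather than documented behaviour.

```typescript
// Hypothetical step shape: only the fields this sketch needs.
interface Step {
  id: string;
  narration: string;
}

// Assumed translation file content: step id mapped to translated narration.
type Translation = Record<string, string>;

// Replace narration where a translation exists; keep the source text otherwise.
function localizeSteps(steps: Step[], translation: Translation): Step[] {
  return steps.map((step) => ({
    ...step,
    narration: translation[step.id] ?? step.narration,
  }));
}

// Output names follow the <id>.<lang>.mp4 + .srt pattern from the notes.
function outputNames(tutorialId: string, lang: string): { video: string; subtitles: string } {
  return {
    video: `${tutorialId}.${lang}.mp4`,
    subtitles: `${tutorialId}.${lang}.srt`,
  };
}
```

Since each language is a full pipeline run, the localized steps would then go through synthesis and measurement again, which is what lets pacing follow the translated speech duration.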
v0.1.1
Fix: the click-highlight callout ring now plays out fully before the click fires. Previously it sat above app content with a near-max z-index, so a click that opened a modal left the bright ring floating over the dimmed backdrop for ~800ms — visible flashing. The ring now announces the click target, fades, and then the click happens; the brief pulse remains as click feedback.
Demo assets regenerated with the fix.
v0.1.0 — first release
First public release. tutorial-forge turns scripted Playwright walkthroughs into narrated tutorial videos: tutorials are source code — when your UI changes, re-render instead of re-recording.
Install
pnpm add -D tutorial-forge tutorial-forge-cli playwright
npx playwright install chromium
pnpm exec tutorial-forge doctor # checks node, ffmpeg, browsers, TTS keys
Highlights
- Three-phase pipeline: TTS → record → post. Narration is synthesized and measured first; the browser holds each step on screen as long as its narration plays.
- TTS providers: ElevenLabs, OpenAI, Piper (local), Silent (CI) — with a content-hash cache so unchanged lines never re-synthesize.
- Steps receive the raw Playwright Page; an instrumentation proxy adds an animated cursor and click-highlight callouts without wrapping Playwright's API.
- Deterministic and CI-friendly: headless, fixed viewport, timing manifest, sidecar SRT subtitles, single-invocation FFmpeg merge.
The attached getting-started.mp4 was generated entirely by this release from the in-repo example app.
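The content-hash cache mentioned in the highlights can be sketched as a lookup keyed by a hash of everything that affects the audio. This is an illustrative in-memory version with a synchronous synthesizer; the request fields, class name, and choice of SHA-256 are assumptions, and a real cache would more likely be on disk and asynchronous.

```typescript
import { createHash } from "node:crypto";

// Assumed inputs that determine the synthesized audio.
interface TtsRequest {
  provider: string;
  voice: string;
  text: string;
}

// Hash the full request so a change to any field produces a new key.
function cacheKey(req: TtsRequest): string {
  return createHash("sha256")
    .update(JSON.stringify([req.provider, req.voice, req.text]))
    .digest("hex");
}

// In-memory cache: unchanged lines are served without calling the provider.
class TtsCache {
  private readonly entries = new Map<string, string>();

  getOrSynthesize(req: TtsRequest, synthesize: (req: TtsRequest) => string): string {
    const key = cacheKey(req);
    const cached = this.entries.get(key);
    if (cached !== undefined) return cached;
    const audioPath = synthesize(req);
    this.entries.set(key, audioPath);
    return audioPath;
  }
}
```

Editing one narration line changes only that line's key, so a re-render synthesizes just the edited line.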