fix: render robustness — sub-comp src paths, alpha tag case, encoder + matter improvements#627
Conversation
miguel-heygen
left a comment
There was a problem hiding this comment.
Verdict: Approve
Solid, well-motivated bugfix PR. Each change traces back to a real production symptom with a clear root cause. No blockers — a few things worth discussing for follow-ups.
Concerns
-bf 0 only covers software h264 paths (chunkEncoder.ts:142-163, streamingEncoder.ts:217-236)
The B-frame disable is inside the libx264/libx265 branch only. GPU h264 paths (nvenc, qsv, vaapi) bypass it — nvenc emits B-frames by default. The unconditional -avoid_negative_ts make_zero at the mux level mostly covers it, but if the goal is "files play everywhere," GPU-encoded h264 can still hit the same negative-DTS freeze. At minimum, a comment acknowledging the mux-level fallback would help the next person. Follow-up candidate: push -bf 0 into GPU branches too.
resolveProjectRelativeSrc strips all leading ../ unconditionally (videoFrameExtractor.ts:~476-489)
The regex src.replace(/^(\.\.[\\/])+/, "") clamps every leading ../ segment. If a monorepo legitimately has ../assets/ as a valid sibling directory, and a same-named file exists inside the project, the resolver silently prefers the clamped path. Low probability since existsSync + literal-join-first mitigate it, but worth a comment documenting the precedence choice. Also: a path like assets/../../foo (escapes after the first segment) won't be caught — path.join collapses it past baseDir.
readyState >= 2 may surface new timeouts (frameCapture.ts)
Bumping from HAVE_METADATA to HAVE_CURRENT_DATA is correct — you need actual frame data, not just dimensions. But compositions with many HEVC/VP9 sources on slow CI cold-starts may now hit the videoDeadline (reuses playerReadyTimeout) where they previously passed by getting lucky with a black frame 0. Worth monitoring after merge.
Suggestions
Case-insensitive tag lookup (ffprobe.ts:280-284)
tags.alpha_mode ?? tags.ALPHA_MODE covers the two known cases. A more future-proof approach:
const alphaMode = Object.entries(videoStream.tags ?? {})
.find(([k]) => k.toLowerCase() === "alpha_mode")?.[1] ?? "";Handles any future casing variant as ffmpeg/libav evolves. Current fix is pragmatic — not blocking on this.
Consider deduping stderr warnings (videoFrameExtractor.ts:540-547)
The new stderr warning fires per-video inside parallel iteration. A comp with 50 broken sub-comp videos = 50 identical warnings. Consider collecting unique video.src values and logging a single summary.
resolveProjectRelativeSrc debug logging — The function warns on miss but is silent on success. A debug-level log showing which candidate was resolved would help authors troubleshoot path issues without needing to add their own logging.
BT.709 tags on ProRes path (pipeline.ts)
The BT.709 colorspace tags are only added to the webm encoder path. ProRes typically carries its own color metadata, but worth confirming ProRes 4444 defaults to BT.709 in the ffmpeg version used — same color shift could occur with .mov output otherwise.
Nits
pipeline.ts:Quality/QUALITIES/QUALITY_CRF/DEFAULT_QUALITY/isQualityis a lot of surface for three strings.const QUALITY_CRF = { fast: 30, balanced: 18, best: 12 } as constwithQuality = keyof typeof QUALITY_CRFwould collapse it. Non-blocking.videoFrameExtractor.test.ts: dynamicrequire("node:fs")calls inside an ES module test work via Vitest but are inconsistent with the rest of the file's imports.- SKILL.md and patterns.md both contain near-identical text-behind-subject examples — reference one from the other to avoid drift.
renderOrchestrator.ts:2320still usesv.src.startsWith("/")instead ofisAbsolute(v.src)— the same Windows footgun the audioMixer comment warns about. Pre-existing, but since you're centralizing path resolution, worth catching in this pass.
What's good
- Root-cause analysis in every commit is textbook. Each commit tells a complete story — symptom, cause, fix, verification. The kind of commit messages that make
git blameuseful two years from now. - The Sharp
.toColourspace("b-w")fix is elegant. Single line fixing a fundamental buffer-layout misunderstanding, with a comment explaining exactly why Sharp upcasts. Maximum signal-to-code ratio. resolveProjectRelativeSrcis a clean abstraction that kills duplication across 3 call sites (videoFrameExtractor, audioMixer, renderOrchestrator). Browser-clamping semantics are well-documented in JSDoc.- Test coverage is appropriate — regression tests for alpha tag casing, path resolver (5 cases including precedence), BT.709 tags, quality presets. Not over-tested, not under-tested.
- The
-bf 0+-avoid_negative_ts make_zerobelt-and-suspenders is the right default for a tool producing files consumed by arbitrary players. Correctness over bitrate efficiency. - Documentation is practical — the three-pattern compositing table and "two non-obvious rules" read like they were written by someone who hit these problems and wants to save the next person hours.
— Review by Magi
vanceingalls
left a comment
There was a problem hiding this comment.
Verdict — Approve with nits
Important
-
chunkEncoder.ts/streamingEncoder.ts:-bf 0is added inside the SW (libx264/libx265) branch only. HW h264 paths (nvenc, qsv, vaapi) bypass it — nvenc in particular emits B-frames by default. If the goal is "files play everywhere," GPU-encoded h264 outputs can still hit the same negative-DTS freeze. The-avoid_negative_ts make_zeroyou added unconditionally below mostly covers it at the mux level, but tighter to also push-bf 0for h264 in the GPU branches (nvenc:-bf 0, qsv:-b_strategy 0 -bf 0, vaapi:-bf 0). At minimum, leave a comment that we're relying on-avoid_negative_tsto cover GPU h264. (packages/engine/src/services/chunkEncoder.ts:142-163,streamingEncoder.ts:217-236) -
resolveProjectRelativeSrconly clamps a leading../. A path likeassets/../../foo(escapes only after the first segment) won't be caught —path.joincollapses it to<parentOfBase>/foowhich silently escapes baseDir the same way. Low likelihood in practice, but worth either documenting the limitation or doing one extra check: if the joined result is outsideprojectDir, attempt a clamp. (packages/engine/src/services/videoFrameExtractor.ts:476-489) -
The new resolver is shared by
audioMixer,videoFrameExtractor, andrenderOrchestrator, but each call site still has its own absolute/HTTP guards inline.renderOrchestrator.ts:2320still usesv.src.startsWith("/")instead ofisAbsolute(v.src)— the same Windows footgun the audioMixer comment explicitly warns about. Either keep the guards consistent (useisAbsolute) or move them inside the resolver. (packages/producer/src/services/renderOrchestrator.ts:2320)
Nits
ffprobe.ts:280-284: case-insensitive read is good; consider folding into a single helperreadTagCI(tags, "alpha_mode")— pattern will recur for other sidecar tags as libavformat versions them.videoFrameExtractor.ts:540-547: the new stderr warning is great, but it fires per-video insidePromise.all-style iteration. If a comp has 50 broken sub-comp videos = 50 writes. Consider deduping byvideo.src.pipeline.ts:Quality/QUALITIES/QUALITY_CRF/DEFAULT_QUALITY/isQualityis a lot of surface for three strings. A singleconst QUALITY_CRF = { fast: 30, balanced: 18, best: 12 } as constwithQuality = keyof typeof QUALITY_CRFwould do it. Non-blocking.frameCapture.ts: bumpingreadyState >= 1to>= 2is correct, but in the begin-frame loop thevideoDeadlinereusesplayerReadyTimeout— if that's tight (default ~5s), HEVC/VP9 cold-starts on slow CI may now time out where they previously passed. Worth a CI watch.
What's good
- Each commit is a real production bug with a clean root-cause explanation in the comments — the
ALPHA_MODEcase fix and the Sharpb-wcolorspace fix in particular are the kind of "silent disagreement between two layers" bugs that're hard to spot without the smoking gun. Comments on disk are excellent. - BT.709 + limited-range tags on the cutout encoder is the right fix and is pinned by tests (
pipeline.test.ts:46-77). - Test coverage on the resolver (5 cases incl.
compiledDirprecedence + miss returning baseDir join) and on the uppercaseALPHA_MODEregression is exactly what these silent-data-bug fixes need. - Loud stderr warning when a video src can't resolve — turns the worst class of bug here (silent first-frame freeze) into something authors will actually see.
— Review by Vai
|
Preview deployment for your docs. Learn more about Mintlify Previews.
💡 Tip: Enable Workflows to automatically generate PRs for you. |
- engine/chunkEncoder, engine/streamingEncoder: extend `-bf 0` to GPU h264
paths (nvenc, qsv, vaapi) and `-b_strategy 0` for qsv so GPU-encoded
outputs avoid negative-DTS freezes too — not just SW libx264.
- engine/videoFrameExtractor: detect mid-path traversal (e.g.
`assets/../../foo.mp4`) by normalizing first and re-anchoring at the
project root. Adds a regression test.
- engine/videoFrameExtractor: dedupe stderr "src not resolvable" warnings
by `video.src` so a comp with N broken sources logs once, not N times.
- engine/videoFrameExtractor.test: drop dynamic `require("node:fs")`,
use ES `import { writeFileSync } from "node:fs"`.
- engine/ffprobe: extract `readTagCI` helper for case-insensitive ffprobe
tag reads (will recur for other libavformat-versioned sidecar tags).
- cli/background-removal/pipeline: collapse Quality / QUALITIES /
QUALITY_CRF / DEFAULT_QUALITY / isQuality surface using
`Quality = keyof typeof QUALITY_CRF`.
- producer/renderOrchestrator: replace `v.src.startsWith("/")` with
`isAbsolute(v.src)` in the HDR probe path so Windows absolute paths
(`C:\...`) aren't treated as relative — matches the audioMixer guard.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…remove-background
- inference.ts: force `.toColourspace("b-w")` on the resized mask. Sharp upcasts
the 1-channel raw input to RGB-interleaved during resize, so `fullMask[i]`
was reading R,G,B,R,G,B... of pixels 0..691199 instead of the alpha for all
2,073,600 pixels. Visible symptom: horizontal scanline alpha artifact in
every transparent webm — the avatar appeared semi-transparent throughout.
- pipeline.ts: add BT.709 + limited-range colorspace tags so Chrome's YUV→RGB
matches the source mp4 (without these, ffmpeg's default RGB→YUV is BT.601
and skin tones drift visibly when the cutout is overlaid on its source).
- pipeline.ts: add Quality preset type ("fast"/"balanced"/"best" → CRF 30/18/12).
Default raised from CRF 30 → 18 ("balanced") so the most common pattern
(text-behind-subject) works out of the box without visible doubling.
- remove-background.ts: wire `--quality` flag with validation, +2 examples.
- Tests: BT.709 tags present, quality preset → CRF mapping, default is balanced.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lay in every player Three related render robustness fixes: 1. frameCapture.ts: bump videos-ready check from `readyState >= 1` (HAVE_METADATA — only dimensions known) to `>= 2` (HAVE_CURRENT_DATA — first frame is rasterized). Without this, when two `<video>` elements with different codecs (h264 mp4 + VP9 webm) decode at different rates, the faster one passes readiness while the slower one still hasn't painted, producing a black "first frame" for the slower clip. 2. chunkEncoder.ts (libx264 path) + streamingEncoder.ts: disable B-frames for h264 (`-bf 0`). Standard libx264 with B-frames produces negative DTS at stream start (the first B-frame's decode order is "before" the first I-frame's presentation time). VS Code preview, several browser <video> implementations, and some HW decoders freeze on the first frame and only audio plays. -bf 0 makes PTS == DTS at every frame, eliminating the issue at the source. Quality cost is ~5–10% larger files at the same CRF — worthwhile for "the file plays everywhere". 3. chunkEncoder.ts (encoder + mux paths): add `-avoid_negative_ts make_zero` as belt-and-suspenders against negative DTS sneaking back in via `-c:v copy` mux passes when audio/video PTS bases differ. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… extraction misses A <video src='../assets/foo.mp4'> inside a sub-composition silently dropped from extraction; the rendered output froze on the first decoded frame for the entire clip, with no error in stdout. Root cause: browser URL resolver clamps '..' at origin root (studio preview loads fine), but path.join(projectDir, '../assets/foo.mp4') normalizes to parent-of-project/assets/foo.mp4, which usually doesn't exist. existsSync returns false, extraction is skipped, no frame lookup is built, the per-frame injector has nothing to swap, and the <video> element's first decoded frame paints every screenshot. - Adds resolveProjectRelativeSrc in videoFrameExtractor that mirrors browser clamping (literal join first, then leading '..' stripped). - Surfaces a loud stderr warning when the resolver misses. - Mirrors fix in audioMixer.ts (same bug for <audio src='../'>) and renderOrchestrator HDR probe loop. - +6 regression tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Skill (hyperframes-cli): three-pattern table (cutout-over-different-scene vs over-its-own-source vs over-different-take) + the two non-obvious rules (wrap video in non-timed div for opacity control, both videos data-start=0 for sync). Skill (hyperframes/patterns): worked text-behind-subject example. Docs: --quality flag, compositing pitfalls section, quality preset table. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Newer libavformat builds write the VP9-alpha sidecar tag as 'ALPHA_MODE' (uppercase); older builds write 'alpha_mode'. ffprobe.ts only checked the lowercase form, so files produced by recent ffmpeg encoders (including the output of 'hyperframes remove-background' itself) were misclassified as having no alpha channel. Knock-on effect: the producer extracted them as JPGs (no alpha), the injected <img> overlays were fully opaque rectangles, and any element below them on the z-stack (text, captions, other layers) silently disappeared from the rendered output — even though the studio preview rendered the same composition correctly via native <video> playback. Symptom in our repro: a text-behind-subject composition showed the headline correctly in studio preview but the production render covered the headline entirely with the opaque avatar image. Fix: read videoStream.tags.alpha_mode OR videoStream.tags.ALPHA_MODE. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Locks in the case-insensitive behavior alongside the existing alpha_mode (lowercase) test. If either path regresses, the producer would silently extract alpha-having webms as opaque JPGs and the injected <img> overlays would cover every element below them on the z-stack — a bug that doesn't surface in the studio preview, only in production renders. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ng on tags Tag-based alpha detection (alpha_mode / ALPHA_MODE / pix_fmt yuva*) is fundamentally brittle. Failure modes seen in the wild: - case-sensitivity across ffmpeg versions (alpha_mode vs ALPHA_MODE) - older muxers that omit the sidecar tag entirely - mp4-as-webm rewraps that drop the tag - ffprobe reporting yuv420p for VP9-with-alpha because the alpha plane lives in a Matroska BlockAdditional sidecar, not the main pix_fmt Each of those silently strips alpha at extraction time. The bug doesn't surface until the rendered output is missing layers — frustrating to debug, silent in stdout. The previous case-insensitive fix patched one of the failure modes; this commit removes the class. The robust alternative is codec-based: any bitstream that CAN carry alpha (VP9, VP8, ProRes 4444) gets the alpha-aware decoder and PNG output by default, regardless of what the tag says. The cost is a small file-size increase on opaque VP9/VP8 sources (cached PNGs vs JPGs); the benefit is no class of silent alpha loss from tag misdetection. - Adds codecMayHaveAlpha() + decoderForCodec() helpers and exports them. - Updates extractVideoFramesRange to force libvpx-vp9 / libvpx for VP9 / VP8 unconditionally (was: only when metadata.hasAlpha). - Updates resolveFrameFormat to default to PNG for any alpha-capable codec (was: only when metadata.hasAlpha). - +4 unit tests covering the codec table. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- engine/chunkEncoder, engine/streamingEncoder: extend `-bf 0` to GPU h264
paths (nvenc, qsv, vaapi) and `-b_strategy 0` for qsv so GPU-encoded
outputs avoid negative-DTS freezes too — not just SW libx264.
- engine/videoFrameExtractor: detect mid-path traversal (e.g.
`assets/../../foo.mp4`) by normalizing first and re-anchoring at the
project root. Adds a regression test.
- engine/videoFrameExtractor: dedupe stderr "src not resolvable" warnings
by `video.src` so a comp with N broken sources logs once, not N times.
- engine/videoFrameExtractor.test: drop dynamic `require("node:fs")`,
use ES `import { writeFileSync } from "node:fs"`.
- engine/ffprobe: extract `readTagCI` helper for case-insensitive ffprobe
tag reads (will recur for other libavformat-versioned sidecar tags).
- cli/background-removal/pipeline: collapse Quality / QUALITIES /
QUALITY_CRF / DEFAULT_QUALITY / isQuality surface using
`Quality = keyof typeof QUALITY_CRF`.
- producer/renderOrchestrator: replace `v.src.startsWith("/")` with
`isAbsolute(v.src)` in the HDR probe path so Windows absolute paths
(`C:\...`) aren't treated as relative — matches the audioMixer guard.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
66ef751 to
f4ecf96
Compare
…person video The model removes background from any video with a person — we tested with avatars because they were convenient, but anyone can bring a talking-head clip, presenter footage, vlog, etc. Replace avatar-specific filenames (avatar.mp4 / brandon.mp4) with neutral subject.mp4 (or presenter.mp4 in the text-behind-subject example) and rephrase copy that read as if avatars were the only use case. Touches docs/guides/remove-background.mdx, hyperframes-media SKILL.md, and hyperframes/patterns.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ed+snapshot Four regression tests (font-variant-numeric, many-cuts, missing-host-comp-id, variables-prod) failed on this PR with `Unable to parse PSNR output at <last checkpoint>s`. Root cause: the harness derived all 100 checkpoints from the *rendered* video's container duration, then asked ffmpeg's PSNR filter to compare the same frame index from both videos. The encoder changes earlier in this PR add `-avoid_negative_ts make_zero` to the mux step. With AAC audio that shifts the first audio sample to t=0 instead of the encoder-delay offset, extending reported container duration by ~20ms without changing video frame count. For the four failing tests, the i=99 checkpoint then landed on a frame index that exists in the rendered video but not in the snapshot baseline (e.g. round(2.98998 * 24) = 72 in a 72-frame baseline). ffmpeg's PSNR filter ran on zero matched frames and emitted no `average:` line, so the parser threw. Fix: probe both videos and use min(rendered, snapshot) duration when spreading checkpoints. This is the correct semantics for symmetric PSNR comparison anyway — both videos must have a frame at every sampled time. The change is local to the harness; no encoder behavior changes, no baselines regenerated. Other regression tests with audio (chat, sub-composition-video, vignelli-stacking) passed because their checkpoint-99 frame index landed inside the baseline's frame range with several frames of slack. The four failing tests had round-number durations where a 20ms drift was enough to push the last checkpoint past `nb_frames - 1`.
|
Regression CI is now green — all 10 regression shards + What the fix doesChanged only Why the fix is the right cause-fix, not a workaroundRoot cause was a harness bug, not an encoder bug. The encoder commit ( The harness was sampling timestamps from the rendered video's duration and asking ffmpeg's PSNR filter for that frame from both videos. When durations diverged, the i=99 checkpoint landed on a frame index that existed in one video but not the other ( Symmetric PSNR comparison must sample within the duration both videos cover. The fix encodes that. Empirical evidence (collected before pushing)I regenerated the 3 failing baselines via
Frame counts identical. Only container metadata duration drifted. No frames are being dropped — the ~20ms shift is purely the audio-delay normalization. I also ran the 3 tests locally with the harness fix against the existing baselines (no regen):
All 100 checkpoints in each test compared cleanly. CI replicates this. The existing baselines are still valid ground truth — no regen needed for this PR. |
Summary
Six focused commits stacked on this branch. Each addresses a "silent disagreement between two layers" — bugs where the studio preview rendered correctly but production renders looked wrong with no error in stdout. The smoking gun for each was a real production composition that worked in studio and broke at render.
Bugs fixed (in dependency order)
1. Sub-comp
<video src="../assets/...">silently freezes at frame 0fix(engine,producer)—path.join(projectDir, "../assets/foo")escapes the project root andexistsSyncreturns false → extraction skipped → no per-frame injection → the<video>element's first decoded frame paints for every screenshot. Browser URL parsing clamps..at origin root; Node'spath.joindoesn't. AddsresolveProjectRelativeSrcthat mirrors browser clamping as a fallback, plus a loud stderr warning on miss. Mirrored inaudioMixer.tsandrenderOrchestrator.tsHDR probe loop.2.
ALPHA_MODEuppercase tag misclassified as no-alphafix(engine)— newer libavformat builds (and our ownhyperframes remove-background) write the VP9-alpha sidecar tag asALPHA_MODE.ffprobe.tsonly checkedalpha_mode(lowercase). Result: alpha-having webms extracted as JPGs, injected<img>was a fully opaque rectangle covering every static element below it on the z-stack. Studio preview rendered fine via native<video>playback; production renders covered headlines and captions with the avatar overlay. This was the WYSIWYG bug. Test pinning case-insensitivity sits next to the existing lowercase test.3. h264 negative DTS freezes some players
fix(engine)— libx264 with B-frames produces negative DTS at stream start. VS Code preview, several browsers, and some HW decoders freeze on the first frame and audio plays alone.-bf 0for h264 +-avoid_negative_ts make_zeroat encoder + mux. ~5–10% larger files at the same CRF — worthwhile for "the file plays everywhere." Mirrored instreamingEncoder.ts.4. Black first frame when two videos with different codecs are present
fix(engine)—frameCapturepolledreadyState >= 1(HAVE_METADATA — only dimensions known). When an h264 mp4 and a VP9 webm decode at different rates the faster passes readiness while the slower hasn't painted, producing a black "first frame" for the slower clip. Bumped to>= 2(HAVE_CURRENT_DATA — first frame rasterized).5. Sharp 3-channel output silently scanlined the alpha mask
fix(cli)— Sharp upcasts 1-channel raw input to RGB-interleaved during.resize(), sofullMask[i]was reading R,G,B,R,G,B... of pixels 0..691199 instead of the alpha for all 2,073,600 pixels. Symptom: every transparent webm out ofremove-backgroundhad a horizontal scanline alpha pattern; the avatar appeared semi-transparent throughout. Forced single-channel with.toColourspace("b-w").6. Generation-loss / color-shift on the cutout overlay
fix(cli)— added BT.709 + limited-range encoder tags so Chrome's YUV→RGB matches the source mp4 (without these, ffmpeg's default RGB→YUV is BT.601 and skin tones drift visibly when the cutout is overlaid). AddedQualitypreset type (fast/balanced/best→ CRF 30/18/12), default raised from 30 → 18 so the most common compositing pattern works out of the box. New--qualityCLI flag.What you'll notice immediately
<video src="../assets/...">in sub-compositions: now actually animate in render (were silent first-frame freezes before).remove-backgroundoutput: text-behind-subject works in production renders, not just in studio preview.<video>, and HW decoders that previously froze.npx hyperframes remove-backgroundoutputs no longer have the scanline artifact.Tests
engine: 525 → 532 passing (+7 — resolver, BT.709, quality presets,ALPHA_MODEregression).cli: 240 passing (the 2 GPU-config failures are pre-existing flakes on clean main, unrelated).producer+core: green.Skill / docs updates
skills/hyperframes-cli/SKILL.md: 3-pattern compositing table + the two non-obvious rules (wrap video in non-timed div for opacity control, both videosdata-start=0for sync).skills/hyperframes/patterns.md: worked text-behind-subject example.docs/guides/remove-background.mdx: compositing pitfalls section, quality preset table,--qualityflag.Test plan
<video src="../assets/foo">in a sub-comp and confirm it animates (vs. freezing on frame 0 onmain)remove-backgroundoutput as an overlay and confirm the headline-behind-silhouette effect renders correctly (vs. covered by the avatar onmain)main)🤖 Generated with Claude Code