Merged
112 changes: 97 additions & 15 deletions docs/guides/remove-background.mdx
Original file line number Diff line number Diff line change
@@ -25,7 +25,7 @@ The CLI ships a built-in `remove-background` command that runs locally — no AP
</Step>
<Step title="Remove the background from your video">
```bash Terminal
npx hyperframes remove-background avatar.mp4 -o transparent.webm
npx hyperframes remove-background subject.mp4 -o transparent.webm
```

On the first run, the CLI downloads ~168 MB of model weights to `~/.cache/hyperframes/background-removal/models/`. Subsequent runs reuse the cache.
@@ -44,7 +44,7 @@
<!-- background layer -->
<img src="city.jpg" class="bg" />

<!-- transparent avatar floats on top -->
<!-- transparent subject floats on top -->
<video src="transparent.webm" autoplay muted loop playsinline></video>
</div>
```
@@ -75,8 +75,8 @@ The output is encoded with the exact ffmpeg flags Chrome's `<video>` element needs.
| `.png` | PNG with alpha | Single-image cutout (only when the input is also a single image) | varies |

```bash Terminal
npx hyperframes remove-background avatar.mp4 -o transparent.webm # web playback
npx hyperframes remove-background avatar.mp4 -o transparent.mov # editing
npx hyperframes remove-background subject.mp4 -o transparent.webm # web playback
npx hyperframes remove-background subject.mp4 -o transparent.mov # editing
npx hyperframes remove-background portrait.jpg -o cutout.png # still image
```

@@ -91,7 +91,7 @@ Real-world numbers from the [matting eval](https://www.heygenverse.com/a/0dd5a431-1832-4858-862d-de7fb7d02654):
| Linux x86 | CPU | ~1100 | ~16 min |
| macOS Intel | CPU | ~900 | ~13 min |

Matting is offline preprocessing — you run it once per asset and reuse the output. CPU-only is slow but always works; if you reuse the same avatar repeatedly, run it once on a faster machine and check the transparent output into your project.
Matting is offline preprocessing — you run it once per asset and reuse the output. CPU-only is slow but always works; if you reuse the same subject clip repeatedly, run it once on a faster machine and check the transparent output into your project.
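
To budget wall-clock time for your own clips, here is a rough estimator. It assumes the table's throughput column is per-frame latency in milliseconds (an assumption — calibrate against a short clip on your machine; the function name is illustrative, not part of the CLI):

```typescript
// Rough wall-clock estimate for offline matting.
// msPerFrame: per-frame latency (assumed meaning of the table column above).
function estimateMinutes(clipSeconds: number, fps: number, msPerFrame: number): number {
  const frames = clipSeconds * fps;
  return (frames * msPerFrame) / 60_000;
}

// A 29 s clip at 30 fps (~870 frames) at ~1100 ms/frame on CPU:
const minutes = estimateMinutes(29, 30, 1100); // ≈ 16 minutes
```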

## Picking a device explicitly

@@ -100,13 +100,13 @@
- **Force CPU on a GPU box** when you want to keep the GPU free for other work, or are debugging an EP-specific issue:

```bash Terminal
npx hyperframes remove-background avatar.mp4 -o transparent.webm --device cpu
npx hyperframes remove-background subject.mp4 -o transparent.webm --device cpu
```

- **Opt into CUDA** by setting `HYPERFRAMES_CUDA=1` and providing a GPU-enabled `onnxruntime-node` build (the bundled build is CPU + CoreML only, to keep the install small for the 99% of users who don't have a GPU):

```bash Terminal
HYPERFRAMES_CUDA=1 npx hyperframes remove-background avatar.mp4 -o transparent.webm --device cuda
HYPERFRAMES_CUDA=1 npx hyperframes remove-background subject.mp4 -o transparent.webm --device cuda
```

Run `npx hyperframes remove-background --info` to see what providers are detected on your machine and which one `auto` would pick.
@@ -115,7 +115,7 @@

The transparent WebM behaves like any other video element. The two patterns you'll use most:

**Avatar over a background image:**
**Subject over a background image:**

```html
<div style="position: relative; width: 1920px; height: 1080px;">
@@ -131,22 +131,104 @@
</div>
```

**Avatar over a HyperFrames scene:**
**Subject over a HyperFrames scene:**

```html
<!-- scene contents (text, animations, etc.) -->
<div class="title-card">Welcome</div>

<!-- avatar layered on top -->
<video src="transparent.webm" autoplay muted loop playsinline class="avatar"></video>
<!-- subject layered on top -->
<video src="transparent.webm" autoplay muted loop playsinline class="subject"></video>
```

The avatar inherits the composition's frame rate and timeline — it plays through once during the scene's duration, so match the source clip length to the scene length when possible. If the scene is longer than the clip, `loop` handles it.
The cutout inherits the composition's frame rate and timeline — it plays through once during the scene's duration, so match the source clip length to the scene length when possible. If the scene is longer than the clip, `loop` handles it.

<Tip>
When rendering a composition that contains a `<video>` element, the renderer reads the source via ffmpeg internally. Transparent WebMs are decoded with the alpha plane preserved.
</Tip>

## Compositing patterns and pitfalls

The cutout WebM is a **re-encoded copy** of the source mp4's RGB — the matting pipeline decodes the source to raw RGB, runs segmentation, and re-encodes to VP9 with alpha. That choice has consequences depending on what you put behind it.

### The three patterns

| Pattern | Behind the cutout | Result |
|---|---|---|
| **Cutout over a different scene** *(most common)* | Static image, gradient, animated bg, or unrelated footage | Clean. The cutout is the only source of the subject — no doubling, no edge halo. Use any `--quality`. |
| **Cutout over its own source mp4** *(text-behind-subject, talking-head with overlays)* | The same mp4 the cutout was generated from | Two RGB sources for the same person. At default `--quality balanced` (crf 18) the doubling is barely visible; at `--quality fast` (crf 30) you'll see a slight color shift / soft edge on the silhouette. Use `--quality best` (crf 12) for hero shots. |
| **Cutout over different footage of the same subject** | Another take of the same person | Looks like two overlapping people. Avoid — re-shoot or re-cut the source. |
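
The table reduces to a small decision rule. A hypothetical helper sketches it — `OverlayPattern` and `pickQuality` are illustrative names, not part of the hyperframes API:

```typescript
// Illustrative only — not part of the hyperframes CLI or API.
type OverlayPattern = "different-scene" | "own-source" | "same-subject-other-take";

// Maps the compositing pattern to a --quality preset, per the table above.
function pickQuality(pattern: OverlayPattern, heroShot = false): "fast" | "balanced" | "best" {
  switch (pattern) {
    case "different-scene":
      return "fast"; // any preset looks clean; smallest file wins
    case "own-source":
      return heroShot ? "best" : "balanced"; // low CRF makes the doubling visible
    case "same-subject-other-take":
      throw new Error("Re-shoot or re-cut the source — two overlapping people");
  }
}
```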

### Text-behind-subject: the recommended layout

Putting a headline *behind* a presenter so their silhouette occludes the text:

```html
<!-- z=1 base mp4: full lobby + presenter, plays the whole scene -->
<video
id="cf-base"
data-start="0" data-duration="6" data-media-start="0" data-track-index="0"
src="presenter.mp4"
muted playsinline
></video>

<!-- z=2 headline -->
<h1 id="cf-headline" style="position:absolute;top:50%;left:50%;
transform:translate(-50%,-50%); z-index:2;
color:#fff; text-shadow:0 6px 32px rgba(0,0,0,.55);
clip-path:inset(0 0 100% 0); font-size:220px; font-weight:900;">
MAKE IT IN HYPERFRAMES
</h1>

<!-- z=3 cutout: same source, alpha around presenter, hidden until the cut.
The wrapper carries the opacity, NOT the <video> itself. -->
<div class="cutout-wrap" style="position:absolute;inset:0;z-index:3;opacity:0">
<video
id="cf-cutout"
data-start="0" data-duration="6" data-media-start="0" data-track-index="1"
src="presenter.webm"
muted playsinline
></video>
</div>
```

```js
const tl = gsap.timeline({ paused: true });
const CUT = 3.3;

// Reveal the headline early
tl.to("#cf-headline", { clipPath: "inset(0 0 0% 0)", duration: 0.6, ease: "expo.out" }, 0.25);

// At the cut, flip the cutout wrapper visible — silhouette punches through the headline
tl.set(".cutout-wrap", { opacity: 1 }, CUT);

// Sentinel: extend timeline to the composition's full duration so the renderer
// doesn't bail past the last meaningful tween.
tl.set({}, {}, 6);
```

### Two non-obvious rules

**1. Wrap the cutout video in a non-timed `<div>` and animate the wrapper, not the video.**

The framework forces `opacity: 1` on any element with `data-start`/`data-duration` while the clip is active — that's how it controls clip visibility — so `opacity: 0` or an opacity tween set directly on the video element is silently overwritten. Wrap the video in a `<div>` with no `data-*` attributes; the wrapper is owned entirely by your CSS/GSAP.

**2. Both videos start at `data-start="0"` and decode in sync from t=0.**

It's tempting to "late-mount" the cutout (`data-start="3.3"` to match the cut). Don't — Chrome does a seek + decoder warm-up at mount, which can land one frame off the base mp4 at the cut moment. With both videos mounted from t=0 and the cutout's wrapper opacity-animated, both decoders advance the same way and stay frame-accurate.

### Quality preset and color match

When the cutout is overlaid on its own source mp4, the encoder's CRF directly affects how visible the doubling is at edges:

| `--quality` | CRF | File size (12s @ 1080p) | When to use |
|---|---|---|---|
| `fast` | 30 | ~2 MB | Cutout sits over an unrelated background and file size matters |
| `balanced` *(default)* | 18 | ~6 MB | Recommended for text-behind-subject and any pattern that overlays on the source |
| `best` | 12 | ~12 MB | Hero shots, masters, or anything you'll re-encode downstream |

The encoder also writes BT.709 + limited-range color metadata so Chrome's YUV→RGB pipeline matches the source mp4's. Without those tags, the cutout would render slightly differently from the underlying mp4 even at lossless quality (visible red/skin shift).
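
To see why the tags matter, here is a self-contained sketch of the mismatch: encode a skin-tone RGB with the BT.709 matrix, then decode it with BT.601. Values are normalized full-range [0,1] for simplicity (a real pipeline adds limited-range quantization on top); a mismatch in either direction produces the same class of shift:

```typescript
// Encode normalized RGB -> YCbCr with one matrix, decode with another.
type RGB = [number, number, number];

function rgbToYCbCr([r, g, b]: RGB, kr: number, kb: number) {
  const y = kr * r + (1 - kr - kb) * g + kb * b;
  return { y, cb: (b - y) / (2 * (1 - kb)), cr: (r - y) / (2 * (1 - kr)) };
}

function yCbCrToRgb(y: number, cb: number, cr: number, kr: number, kb: number): RGB {
  const r = y + 2 * (1 - kr) * cr;
  const b = y + 2 * (1 - kb) * cb;
  const g = (y - kr * r - kb * b) / (1 - kr - kb);
  return [r, g, b];
}

// BT.709: kr=0.2126, kb=0.0722.  BT.601: kr=0.299, kb=0.114.
const skin: RGB = [0.9, 0.6, 0.5];
const { y, cb, cr } = rgbToYCbCr(skin, 0.2126, 0.0722); // encoded as BT.709
const [r601] = yCbCrToRgb(y, cb, cr, 0.299, 0.114);     // decoded as BT.601
// r601 ≈ 0.873 vs the original 0.9 — a ~3% red drop, visible on skin tones.
```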

## What u²-net_human_seg is and isn't good for

The model is purpose-built for **portrait / human matting**. It excels when:
@@ -166,7 +248,7 @@ If your use case hits one of these, see the alternatives below.

## Alternatives — when the built-in command isn't the right tool

The CLI ships **one model on purpose** — the one that's MIT-licensed, runs everywhere, and produces production-quality output for HeyGen-style avatar workflows. The list below leads with **free, open-source tools** that pair naturally with HyperFrames. Each entry calls out the actual catch — license, install effort, hardware needs — so you can pick the right one for your situation. Full benchmarks are in the [matting eval](https://www.heygenverse.com/a/0dd5a431-1832-4858-862d-de7fb7d02654).
The CLI ships **one model on purpose** — the one that's MIT-licensed, runs everywhere, and produces production-quality output for person/portrait video. The list below leads with **free, open-source tools** that pair naturally with HyperFrames. Each entry calls out the actual catch — license, install effort, hardware needs — so you can pick the right one for your situation. Full benchmarks are in the [matting eval](https://www.heygenverse.com/a/0dd5a431-1832-4858-862d-de7fb7d02654).

### Free, open-source CLIs and libraries

@@ -208,7 +290,7 @@ ffmpeg -i frames-%04d.png -c:v libvpx-vp9 \

### How to choose

- **Avatars / portraits, web playback, MIT-clean** → use the built-in `hyperframes remove-background` (this is what it's tuned for).
- **Person / portrait video, web playback, MIT-clean** → use the built-in `hyperframes remove-background` (this is what it's tuned for).
- **Non-human subject** (product, animal, object) → `rembg` with `isnet-general-use`.
- **Maximum portrait quality, especially hair** → `BiRefNet` via Python.
- **Long video where edge flicker would be visible**, GPL is OK → `RVM`.
@@ -259,7 +341,7 @@ The decoded `frame0.png` should be RGBA and have non-trivial alpha values.
The pipeline auto-falls-back to CPU if CoreML fails to bind, with a warning. If you want to skip the CoreML attempt entirely, force CPU:

```bash Terminal
npx hyperframes remove-background avatar.mp4 -o transparent.webm --device cpu
npx hyperframes remove-background subject.mp4 -o transparent.webm --device cpu
```

### The alpha mask has rough or jagged edges
1 change: 1 addition & 0 deletions docs/packages/cli.mdx
@@ -367,6 +367,7 @@ This is suppressed in CI environments, non-TTY shells, and when `HYPERFRAMES_NO_
|------|-------------|
| `--output, -o` | Output path. Format inferred from extension: `.webm` (default), `.mov`, `.png` |
| `--device` | Execution provider: `auto` (default), `cpu`, `coreml`, `cuda` |
| `--quality` | WebM encoder preset: `fast` (crf 30, smallest), `balanced` (crf 18, default), `best` (crf 12, near-lossless). Higher quality keeps the cutout's RGB closer to the source mp4 — important when overlaying the cutout on its own source for text-behind-subject effects. Ignored for `.mov` / `.png`. |
| `--info` | Print detected execution providers and exit (no render) |
| `--json` | Output result as JSON |

6 changes: 6 additions & 0 deletions packages/cli/src/background-removal/inference.ts
@@ -159,10 +159,16 @@ async function postprocess(
}

// lanczos3 keeps soft edges; nearest leaves visible jaggies on hair.
// Sharp upcasts the single-channel raw input to a 3-channel buffer during
// resize, so the output is laid out as RGB-interleaved (R0,G0,B0,R1,G1,B1,...)
// even though all three channels carry the same grayscale value. Force the
// output back to single channel with toColourspace("b-w") so we can index
// it linearly as a mask.
const fullMask = await sharp(maskBuf, {
raw: { width: INPUT_SIZE, height: INPUT_SIZE, channels: 1 },
})
.resize(width, height, { kernel: "lanczos3", fit: "fill" })
.toColourspace("b-w")
.raw()
.toBuffer();

29 changes: 29 additions & 0 deletions packages/cli/src/background-removal/pipeline.test.ts
@@ -46,6 +46,35 @@ describe("background-removal/pipeline — buildEncoderArgs", () => {
expect(args[args.length - 1]).toBe("/tmp/out.webm");
});

it("webm preset tags BT.709 colorspace + limited range", () => {
// Without these tags, ffmpeg's RGB→YUV conversion uses the BT.601 default,
// and Chrome's YUV→RGB pass on the resulting webm produces a different
// RGB triple than the source mp4 (visible color shift on overlay). Pin
// BT.709 limited-range so the cutout matches modern Rec.709 sources.
const args = buildEncoderArgs("webm", 1920, 1080, 30, "/tmp/out.webm");
const csIdx = args.indexOf("-colorspace");
expect(csIdx).toBeGreaterThan(-1);
expect(args[csIdx + 1]).toBe("bt709");
const rangeIdx = args.indexOf("-color_range");
expect(rangeIdx).toBeGreaterThan(-1);
expect(args[rangeIdx + 1]).toBe("tv");
});

it("webm quality presets map to crf 30/18/12", () => {
const fast = buildEncoderArgs("webm", 1920, 1080, 30, "/tmp/o.webm", "fast");
const balanced = buildEncoderArgs("webm", 1920, 1080, 30, "/tmp/o.webm", "balanced");
const best = buildEncoderArgs("webm", 1920, 1080, 30, "/tmp/o.webm", "best");
const crf = (args: string[]) => args[args.indexOf("-crf") + 1];
expect(crf(fast)).toBe("30");
expect(crf(balanced)).toBe("18");
expect(crf(best)).toBe("12");
});

it("webm default quality is balanced (crf 18)", () => {
const args = buildEncoderArgs("webm", 1920, 1080, 30, "/tmp/o.webm");
expect(args[args.indexOf("-crf") + 1]).toBe("18");
});

it("mov preset emits ProRes 4444 + yuva444p10le", () => {
const args = buildEncoderArgs("mov", 1920, 1080, 30, "/tmp/out.mov");
expect(args).toContain("prores_ks");
50 changes: 46 additions & 4 deletions packages/cli/src/background-removal/pipeline.ts
@@ -20,11 +20,28 @@ import { type Device, type ModelId } from "./manager.js";

export type OutputFormat = "webm" | "mov" | "png";

export const QUALITY_CRF = {
fast: 30,
balanced: 18,
best: 12,
} as const;

export type Quality = keyof typeof QUALITY_CRF;

export const QUALITIES = Object.keys(QUALITY_CRF) as readonly Quality[];

export const DEFAULT_QUALITY: Quality = "balanced";

export const isQuality = (v: unknown): v is Quality =>
typeof v === "string" && (QUALITIES as readonly string[]).includes(v);

export interface RenderOptions {
inputPath: string;
outputPath: string;
device?: Device;
model?: ModelId;
/** Encoder CRF preset for `.webm`. See `QUALITY_CRF`. Ignored for `.mov`/`.png`. */
quality?: Quality;
onProgress?: (event: ProgressEvent) => void;
}

@@ -100,6 +117,7 @@ export function buildEncoderArgs(
height: number,
fps: number,
outputPath: string,
quality: Quality = DEFAULT_QUALITY,
): string[] {
const base = [
"-y",
@@ -123,7 +141,7 @@
"-b:v",
"0",
"-crf",
"30",
String(QUALITY_CRF[quality]),
"-deadline",
"good",
"-row-mt",
@@ -132,6 +150,19 @@
"0",
"-pix_fmt",
"yuva420p",
// Tag the output as BT.709 limited range so browsers use the same
// YUV→RGB matrix the source video was encoded with. Without these tags
// ffmpeg's default RGB→YUV conversion is BT.601, which causes a visible
// color shift (red/skin tones in particular) when the matted overlay is
// composited over the original mp4.
"-colorspace",
"bt709",
"-color_primaries",
"bt709",
"-color_trc",
"bt709",
"-color_range",
"tv",
"-metadata:s:v:0",
"alpha_mode=1",
"-an",
@@ -250,9 +281,20 @@ async function runPipeline(
});
const decoderExit = waitForExit(decoder, "ffmpeg decoder", () => decoderStderr);

const encoder = spawn("ffmpeg", buildEncoderArgs(format, width, height, fps || 30, outputPath), {
stdio: ["pipe", "ignore", "pipe"],
});
const encoder = spawn(
"ffmpeg",
buildEncoderArgs(
format,
width,
height,
fps || 30,
outputPath,
options.quality ?? DEFAULT_QUALITY,
),
{
stdio: ["pipe", "ignore", "pipe"],
},
);
let encoderStderr = "";
encoder.stderr?.on("data", (d: Buffer) => {
encoderStderr += d.toString();
Expand Down
Loading
Loading