Merged
112 changes: 97 additions & 15 deletions docs/guides/remove-background.mdx
Original file line number Diff line number Diff line change
@@ -25,7 +25,7 @@ The CLI ships a built-in `remove-background` command that runs locally — no AP
</Step>
<Step title="Remove the background from your video">
```bash Terminal
npx hyperframes remove-background avatar.mp4 -o transparent.webm
npx hyperframes remove-background subject.mp4 -o transparent.webm
```

On the first run, the CLI downloads ~168 MB of model weights to `~/.cache/hyperframes/background-removal/models/`. Subsequent runs reuse the cache.
@@ -44,7 +44,7 @@
<!-- background layer -->
<img src="city.jpg" class="bg" />

<!-- transparent avatar floats on top -->
<!-- transparent subject floats on top -->
<video src="transparent.webm" autoplay muted loop playsinline></video>
</div>
```
@@ -75,8 +75,8 @@ The output is encoded with the exact ffmpeg flags Chrome's `<video>` element needs.
| `.png` | PNG with alpha | Single-image cutout (only when the input is also a single image) | varies |

```bash Terminal
npx hyperframes remove-background avatar.mp4 -o transparent.webm # web playback
npx hyperframes remove-background avatar.mp4 -o transparent.mov # editing
npx hyperframes remove-background subject.mp4 -o transparent.webm # web playback
npx hyperframes remove-background subject.mp4 -o transparent.mov # editing
npx hyperframes remove-background portrait.jpg -o cutout.png # still image
```

@@ -91,7 +91,7 @@ Real-world numbers from the [matting eval](https://www.heygenverse.com/a/0dd5a431-1832-4858-862d-de7fb7d02654):
| Linux x86 | CPU | ~1100 | ~16 min |
| macOS Intel | CPU | ~900 | ~13 min |

Matting is offline preprocessing — you run it once per asset and reuse the output. CPU-only is slow but always works; if you reuse the same avatar repeatedly, run it once on a faster machine and check the transparent output into your project.
Matting is offline preprocessing — you run it once per asset and reuse the output. CPU-only is slow but always works; if you reuse the same subject clip repeatedly, run it once on a faster machine and check the transparent output into your project.
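
To budget wall-clock time for your own clips, here is a rough estimator. It assumes the table's throughput column is per-frame latency in milliseconds (an assumption — calibrate against a short clip on your machine; the function name is illustrative, not part of the CLI):

```typescript
// Rough wall-clock estimate for offline matting.
// msPerFrame: per-frame latency (assumed meaning of the table column above).
function estimateMinutes(clipSeconds: number, fps: number, msPerFrame: number): number {
  const frames = clipSeconds * fps;
  return (frames * msPerFrame) / 60_000;
}

// A 29 s clip at 30 fps (~870 frames) at ~1100 ms/frame on CPU:
const minutes = estimateMinutes(29, 30, 1100); // ≈ 16 minutes
```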

## Picking a device explicitly

@@ -100,13 +100,13 @@
- **Force CPU on a GPU box** when you want to keep the GPU free for other work, or are debugging an EP-specific issue:

```bash Terminal
npx hyperframes remove-background avatar.mp4 -o transparent.webm --device cpu
npx hyperframes remove-background subject.mp4 -o transparent.webm --device cpu
```

- **Opt into CUDA** by setting `HYPERFRAMES_CUDA=1` and providing a GPU-enabled `onnxruntime-node` build (the bundled build is CPU + CoreML only, to keep the install small for the 99% of users who don't have a GPU):

```bash Terminal
HYPERFRAMES_CUDA=1 npx hyperframes remove-background avatar.mp4 -o transparent.webm --device cuda
HYPERFRAMES_CUDA=1 npx hyperframes remove-background subject.mp4 -o transparent.webm --device cuda
```

Run `npx hyperframes remove-background --info` to see what providers are detected on your machine and which one `auto` would pick.
@@ -115,7 +115,7 @@

The transparent WebM behaves like any other video element. The two patterns you'll use most:

**Avatar over a background image:**
**Subject over a background image:**

```html
<div style="position: relative; width: 1920px; height: 1080px;">
@@ -131,22 +131,104 @@
</div>
```

**Avatar over a HyperFrames scene:**
**Subject over a HyperFrames scene:**

```html
<!-- scene contents (text, animations, etc.) -->
<div class="title-card">Welcome</div>

<!-- avatar layered on top -->
<video src="transparent.webm" autoplay muted loop playsinline class="avatar"></video>
<!-- subject layered on top -->
<video src="transparent.webm" autoplay muted loop playsinline class="subject"></video>
```

The avatar inherits the composition's frame rate and timeline — it plays through once during the scene's duration, so match the source clip length to the scene length when possible. If the scene is longer than the clip, `loop` handles it.
The cutout inherits the composition's frame rate and timeline — it plays through once during the scene's duration, so match the source clip length to the scene length when possible. If the scene is longer than the clip, `loop` handles it.

<Tip>
When rendering a composition that contains a `<video>` element, the renderer reads the source via ffmpeg internally. Transparent WebMs are decoded with the alpha plane preserved.
</Tip>

## Compositing patterns and pitfalls

The cutout WebM is a **re-encoded copy** of the source mp4's RGB — the matting pipeline decodes the source to raw RGB, runs segmentation, and re-encodes to VP9 with alpha. That choice has consequences depending on what you put behind it.

### The three patterns

| Pattern | Behind the cutout | Result |
|---|---|---|
| **Cutout over a different scene** *(most common)* | Static image, gradient, animated bg, or unrelated footage | Clean. The cutout is the only source of the subject — no doubling, no edge halo. Use any `--quality`. |
| **Cutout over its own source mp4** *(text-behind-subject, talking-head with overlays)* | The same mp4 the cutout was generated from | Two RGB sources for the same person. At default `--quality balanced` (crf 18) the doubling is barely visible; at `--quality fast` (crf 30) you'll see a slight color shift / soft edge on the silhouette. Use `--quality best` (crf 12) for hero shots. |
| **Cutout over different footage of the same subject** | Another take of the same person | Looks like two overlapping people. Avoid — re-shoot or re-cut the source. |
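
The table reduces to a small decision rule. A hypothetical helper sketches it — `OverlayPattern` and `pickQuality` are illustrative names, not part of the hyperframes API:

```typescript
// Illustrative only — not part of the hyperframes CLI or API.
type OverlayPattern = "different-scene" | "own-source" | "same-subject-other-take";

// Maps the compositing pattern to a --quality preset, per the table above.
function pickQuality(pattern: OverlayPattern, heroShot = false): "fast" | "balanced" | "best" {
  switch (pattern) {
    case "different-scene":
      return "fast"; // any preset looks clean; smallest file wins
    case "own-source":
      return heroShot ? "best" : "balanced"; // low CRF makes the doubling visible
    case "same-subject-other-take":
      throw new Error("Re-shoot or re-cut the source — two overlapping people");
  }
}
```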

### Text-behind-subject: the recommended layout

Putting a headline *behind* a presenter so their silhouette occludes the text:

```html
<!-- z=1 base mp4: full lobby + presenter, plays the whole scene -->
<video
id="cf-base"
data-start="0" data-duration="6" data-media-start="0" data-track-index="0"
src="presenter.mp4"
muted playsinline
></video>

<!-- z=2 headline -->
<h1 id="cf-headline" style="position:absolute;top:50%;left:50%;
transform:translate(-50%,-50%); z-index:2;
color:#fff; text-shadow:0 6px 32px rgba(0,0,0,.55);
clip-path:inset(0 0 100% 0); font-size:220px; font-weight:900;">
MAKE IT IN HYPERFRAMES
</h1>

<!-- z=3 cutout: same source, alpha around presenter, hidden until the cut.
The wrapper carries the opacity, NOT the <video> itself. -->
<div class="cutout-wrap" style="position:absolute;inset:0;z-index:3;opacity:0">
<video
id="cf-cutout"
data-start="0" data-duration="6" data-media-start="0" data-track-index="1"
src="presenter.webm"
muted playsinline
></video>
</div>
```

```js
const tl = gsap.timeline({ paused: true });
const CUT = 3.3;

// Reveal the headline early
tl.to("#cf-headline", { clipPath: "inset(0 0 0% 0)", duration: 0.6, ease: "expo.out" }, 0.25);

// At the cut, flip the cutout wrapper visible — silhouette punches through the headline
tl.set(".cutout-wrap", { opacity: 1 }, CUT);

// Sentinel: extend timeline to the composition's full duration so the renderer
// doesn't bail past the last meaningful tween.
tl.set({}, {}, 6);
```

### Two non-obvious rules

**1. Wrap the cutout video in a non-timed `<div>` and animate the wrapper, not the video.**

The framework forces `opacity: 1` on any element with `data-start`/`data-duration` while the clip is active — that's how it controls clip visibility — so `opacity: 0` or an opacity tween set directly on the video element is silently overwritten. Wrap the video in a `<div>` with no `data-*` attributes; the wrapper is owned entirely by your CSS/GSAP.

**2. Both videos start at `data-start="0"` and decode in sync from t=0.**

It's tempting to "late-mount" the cutout (`data-start="3.3"` to match the cut). Don't — Chrome does a seek + decoder warm-up at mount, which can land one frame off the base mp4 at the cut moment. With both videos mounted from t=0 and the cutout's wrapper opacity-animated, both decoders advance the same way and stay frame-accurate.

### Quality preset and color match

When the cutout is overlaid on its own source mp4, the encoder's CRF directly affects how visible the doubling is at edges:

| `--quality` | CRF | File size (12s @ 1080p) | When to use |
|---|---|---|---|
| `fast` | 30 | ~2 MB | Cutout sits over an unrelated background and file size matters |
| `balanced` *(default)* | 18 | ~6 MB | Recommended for text-behind-subject and any pattern that overlays on the source |
| `best` | 12 | ~12 MB | Hero shots, masters, or anything you'll re-encode downstream |

The encoder also writes BT.709 + limited-range color metadata so Chrome's YUV→RGB pipeline matches the source mp4's. Without those tags, the cutout would render slightly differently from the underlying mp4 even at lossless quality (visible red/skin shift).
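
To see why the tags matter, here is a self-contained sketch of the mismatch: encode a skin-tone RGB with the BT.709 matrix, then decode it with BT.601. Values are normalized full-range [0,1] for simplicity (a real pipeline adds limited-range quantization on top); a mismatch in either direction produces the same class of shift:

```typescript
// Encode normalized RGB -> YCbCr with one matrix, decode with another.
type RGB = [number, number, number];

function rgbToYCbCr([r, g, b]: RGB, kr: number, kb: number) {
  const y = kr * r + (1 - kr - kb) * g + kb * b;
  return { y, cb: (b - y) / (2 * (1 - kb)), cr: (r - y) / (2 * (1 - kr)) };
}

function yCbCrToRgb(y: number, cb: number, cr: number, kr: number, kb: number): RGB {
  const r = y + 2 * (1 - kr) * cr;
  const b = y + 2 * (1 - kb) * cb;
  const g = (y - kr * r - kb * b) / (1 - kr - kb);
  return [r, g, b];
}

// BT.709: kr=0.2126, kb=0.0722.  BT.601: kr=0.299, kb=0.114.
const skin: RGB = [0.9, 0.6, 0.5];
const { y, cb, cr } = rgbToYCbCr(skin, 0.2126, 0.0722); // encoded as BT.709
const [r601] = yCbCrToRgb(y, cb, cr, 0.299, 0.114);     // decoded as BT.601
// r601 ≈ 0.873 vs the original 0.9 — a ~3% red drop, visible on skin tones.
```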

## What u²-net_human_seg is and isn't good for

The model is purpose-built for **portrait / human matting**. It excels when:
@@ -166,7 +248,7 @@ If your use case hits one of these, see the alternatives below.

## Alternatives — when the built-in command isn't the right tool

The CLI ships **one model on purpose** — the one that's MIT-licensed, runs everywhere, and produces production-quality output for HeyGen-style avatar workflows. The list below leads with **free, open-source tools** that pair naturally with HyperFrames. Each entry calls out the actual catch — license, install effort, hardware needs — so you can pick the right one for your situation. Full benchmarks are in the [matting eval](https://www.heygenverse.com/a/0dd5a431-1832-4858-862d-de7fb7d02654).
The CLI ships **one model on purpose** — the one that's MIT-licensed, runs everywhere, and produces production-quality output for person/portrait video. The list below leads with **free, open-source tools** that pair naturally with HyperFrames. Each entry calls out the actual catch — license, install effort, hardware needs — so you can pick the right one for your situation. Full benchmarks are in the [matting eval](https://www.heygenverse.com/a/0dd5a431-1832-4858-862d-de7fb7d02654).

### Free, open-source CLIs and libraries

@@ -208,7 +290,7 @@ ffmpeg -i frames-%04d.png -c:v libvpx-vp9 \

### How to choose

- **Avatars / portraits, web playback, MIT-clean** → use the built-in `hyperframes remove-background` (this is what it's tuned for).
- **Person / portrait video, web playback, MIT-clean** → use the built-in `hyperframes remove-background` (this is what it's tuned for).
- **Non-human subject** (product, animal, object) → `rembg` with `isnet-general-use`.
- **Maximum portrait quality, especially hair** → `BiRefNet` via Python.
- **Long video where edge flicker would be visible**, GPL is OK → `RVM`.
@@ -259,7 +341,7 @@ The decoded `frame0.png` should be RGBA and have non-trivial alpha values.
The pipeline auto-falls-back to CPU if CoreML fails to bind, with a warning. If you want to skip the CoreML attempt entirely, force CPU:

```bash Terminal
npx hyperframes remove-background avatar.mp4 -o transparent.webm --device cpu
npx hyperframes remove-background subject.mp4 -o transparent.webm --device cpu
```

### The alpha mask has rough or jagged edges
1 change: 1 addition & 0 deletions docs/packages/cli.mdx
@@ -367,6 +367,7 @@ This is suppressed in CI environments, non-TTY shells, and when `HYPERFRAMES_NO_
|------|-------------|
| `--output, -o` | Output path. Format inferred from extension: `.webm` (default), `.mov`, `.png` |
| `--device` | Execution provider: `auto` (default), `cpu`, `coreml`, `cuda` |
| `--quality` | WebM encoder preset: `fast` (crf 30, smallest), `balanced` (crf 18, default), `best` (crf 12, near-lossless). Higher quality keeps the cutout's RGB closer to the source mp4 — important when overlaying the cutout on its own source for text-behind-subject effects. Ignored for `.mov` / `.png`. |
| `--info` | Print detected execution providers and exit (no render) |
| `--json` | Output result as JSON |

6 changes: 6 additions & 0 deletions packages/cli/src/background-removal/inference.ts
@@ -159,10 +159,16 @@ async function postprocess(
}

// lanczos3 keeps soft edges; nearest leaves visible jaggies on hair.
// Sharp upcasts the single-channel raw input to a 3-channel buffer during
// resize, so the output is laid out as RGB-interleaved (R0,G0,B0,R1,G1,B1,...)
// even though all three channels carry the same grayscale value. Force the
// output back to single channel with toColourspace("b-w") so we can index
// it linearly as a mask.
const fullMask = await sharp(maskBuf, {
raw: { width: INPUT_SIZE, height: INPUT_SIZE, channels: 1 },
})
.resize(width, height, { kernel: "lanczos3", fit: "fill" })
.toColourspace("b-w")
.raw()
.toBuffer();

29 changes: 29 additions & 0 deletions packages/cli/src/background-removal/pipeline.test.ts
@@ -46,6 +46,35 @@ describe("background-removal/pipeline — buildEncoderArgs", () => {
expect(args[args.length - 1]).toBe("/tmp/out.webm");
});

it("webm preset tags BT.709 colorspace + limited range", () => {
// Without these tags, ffmpeg's RGB→YUV conversion uses the BT.601 default,
// and Chrome's YUV→RGB pass on the resulting webm produces a different
// RGB triple than the source mp4 (visible color shift on overlay). Pin
// BT.709 limited-range so the cutout matches modern Rec.709 sources.
const args = buildEncoderArgs("webm", 1920, 1080, 30, "/tmp/out.webm");
const csIdx = args.indexOf("-colorspace");
expect(csIdx).toBeGreaterThan(-1);
expect(args[csIdx + 1]).toBe("bt709");
const rangeIdx = args.indexOf("-color_range");
expect(rangeIdx).toBeGreaterThan(-1);
expect(args[rangeIdx + 1]).toBe("tv");
});

it("webm quality presets map to crf 30/18/12", () => {
const fast = buildEncoderArgs("webm", 1920, 1080, 30, "/tmp/o.webm", "fast");
const balanced = buildEncoderArgs("webm", 1920, 1080, 30, "/tmp/o.webm", "balanced");
const best = buildEncoderArgs("webm", 1920, 1080, 30, "/tmp/o.webm", "best");
const crf = (args: string[]) => args[args.indexOf("-crf") + 1];
expect(crf(fast)).toBe("30");
expect(crf(balanced)).toBe("18");
expect(crf(best)).toBe("12");
});

it("webm default quality is balanced (crf 18)", () => {
const args = buildEncoderArgs("webm", 1920, 1080, 30, "/tmp/o.webm");
expect(args[args.indexOf("-crf") + 1]).toBe("18");
});

it("mov preset emits ProRes 4444 + yuva444p10le", () => {
const args = buildEncoderArgs("mov", 1920, 1080, 30, "/tmp/out.mov");
expect(args).toContain("prores_ks");
50 changes: 46 additions & 4 deletions packages/cli/src/background-removal/pipeline.ts
@@ -20,11 +20,28 @@ import { type Device, type ModelId } from "./manager.js";

export type OutputFormat = "webm" | "mov" | "png";

export const QUALITY_CRF = {
fast: 30,
balanced: 18,
best: 12,
} as const;

export type Quality = keyof typeof QUALITY_CRF;

export const QUALITIES = Object.keys(QUALITY_CRF) as readonly Quality[];

export const DEFAULT_QUALITY: Quality = "balanced";

export const isQuality = (v: unknown): v is Quality =>
typeof v === "string" && (QUALITIES as readonly string[]).includes(v);

export interface RenderOptions {
inputPath: string;
outputPath: string;
device?: Device;
model?: ModelId;
/** Encoder CRF preset for `.webm`. See `QUALITY_CRF`. Ignored for `.mov`/`.png`. */
quality?: Quality;
onProgress?: (event: ProgressEvent) => void;
}

@@ -100,6 +117,7 @@ export function buildEncoderArgs(
height: number,
fps: number,
outputPath: string,
quality: Quality = DEFAULT_QUALITY,
): string[] {
const base = [
"-y",
@@ -123,7 +141,7 @@
"-b:v",
"0",
"-crf",
"30",
String(QUALITY_CRF[quality]),
"-deadline",
"good",
"-row-mt",
@@ -132,6 +150,19 @@
"0",
"-pix_fmt",
"yuva420p",
// Tag the output as BT.709 limited range so browsers use the same
// YUV→RGB matrix the source video was encoded with. Without these tags
// ffmpeg's default RGB→YUV conversion is BT.601, which causes a visible
// color shift (red/skin tones in particular) when the matted overlay is
// composited over the original mp4.
"-colorspace",
"bt709",
"-color_primaries",
"bt709",
"-color_trc",
"bt709",
"-color_range",
"tv",
"-metadata:s:v:0",
"alpha_mode=1",
"-an",
@@ -250,9 +281,20 @@ async function runPipeline(
});
const decoderExit = waitForExit(decoder, "ffmpeg decoder", () => decoderStderr);

const encoder = spawn("ffmpeg", buildEncoderArgs(format, width, height, fps || 30, outputPath), {
stdio: ["pipe", "ignore", "pipe"],
});
const encoder = spawn(
"ffmpeg",
buildEncoderArgs(
format,
width,
height,
fps || 30,
outputPath,
options.quality ?? DEFAULT_QUALITY,
),
{
stdio: ["pipe", "ignore", "pipe"],
},
);
let encoderStderr = "";
encoder.stderr?.on("data", (d: Buffer) => {
encoderStderr += d.toString();
Expand Down
Loading
Loading