
feat(cli): add remove-background command for transparent video#612

Merged
jrusso1020 merged 2 commits into main from feat/cli-remove-background on May 4, 2026

Conversation

@jrusso1020 (Collaborator)

What

Adds hyperframes remove-background, a new CLI subcommand that mattes a video or image locally with a small AI model and outputs a transparent WebM, ProRes 4444 .mov, or RGBA PNG. The output drops directly into any composition's <video> or <img> element — no green screen, no API keys, no cloud upload.

npx hyperframes remove-background avatar.mp4 -o transparent.webm
npx hyperframes remove-background portrait.jpg -o cutout.png
npx hyperframes remove-background --info   # detected providers

Why

Putting an avatar (or any subject) over an arbitrary background is one of the most common asks for a HyperFrames composition. Today the user has to wire up a third-party tool (rembg, Runway, unscreen.com), get the encoding flags right (yuva420p + alpha_mode=1 — easy to miss; broken alpha in Chrome is silent), and integrate the output. Shipping it as a first-class CLI subcommand removes that friction and makes "transparent avatar over a scene" a one-liner.

The choice to run it locally (rather than punt to a SaaS) means: no API keys to configure, no upload of source media, runs fully offline once the model is cached.

How

  • Model: u2net_human_seg (MIT, ~168 MB ONNX) — the recommendation from a 12-cell matting eval that compared u²-net, MediaPipe SelfieSegmenter, BiRefNet, RVM, and BRIA RMBG-2.0 across CoreML, CUDA, and CPU. Single model on purpose: it's the only one that's MIT-clean, runs everywhere, and produces production-quality output for portraits at reasonable memory (~1.5 GB peak). Pre/postprocessing matches rembg's u²-net session exactly so output should be pixel-equivalent.
  • Runtime: onnxruntime-node for inference, sharp for resize, ffmpeg for decode/encode (already required by other CLI commands). Provider auto-detect picks CoreML on Apple Silicon, CUDA when HYPERFRAMES_CUDA=1, CPU otherwise. Provider failure falls back to CPU rather than aborting.
  • Pipeline: ffmpeg decode → per-frame ONNX → ffmpeg encode, with a streaming generator between the two and pre-allocated buffers for the hot path (Float32 input tensor, mask, RGBA output reused across every frame).
  • Encoder: VP9-with-alpha WebM is the default. The exact flag set (-pix_fmt yuva420p + -metadata:s:v:0 alpha_mode=1 + -auto-alt-ref 0) is what makes Chrome's <video> element decode the alpha plane. ProRes 4444 .mov and single-image PNG are also supported.
  • CLI shape: --info flag (no input required) prints detected providers + cache state; matches the tts --list pattern already in the codebase.
  • Reuse: hasFFmpeg was promoted from whisper/manager.ts and a sibling hasFFprobe added there (one helper, two consumers). Media metadata extraction reuses extractMediaMetadata from @hyperframes/engine rather than re-rolling ffprobe parsing.
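The provider auto-detect described above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's actual code: the function name and shape are assumptions; `HYPERFRAMES_CUDA` is the opt-in variable the PR documents.

```typescript
import * as os from "node:os";

type Provider = "coreml" | "cuda" | "cpu";

// Sketch of the execution-provider auto-detect: CoreML on Apple Silicon,
// CUDA when explicitly opted in, CPU otherwise. Provider *init* failures
// at runtime also fall back to CPU rather than aborting.
export function detectProvider(
  platform: string = os.platform(),
  arch: string = os.arch(),
  env: Record<string, string | undefined> = process.env,
): Provider {
  if (env.HYPERFRAMES_CUDA === "1") return "cuda"; // opt-in GPU build
  if (platform === "darwin" && arch === "arm64") return "coreml"; // Apple Silicon
  return "cpu"; // universal fallback
}
```

Defaulting the platform/arch/env parameters makes the selection matrix trivially unit-testable with plain values instead of mocked `os` calls.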

The internal directory is named packages/cli/src/background-removal/ and the public command is remove-background — earlier iterations called it matte, but most users wouldn't recognize the VFX term, and "Remove Background" matches what Adobe / Canva / Figma all use for the same thing. The docs intro keeps the "matting" term searchable for the audience that does know it.

Alternatives considered

  • Documentation guide instead of a command: rejected. The hard part of this is the encoder flags (alpha plane silently discarded with the wrong -pix_fmt), not the matting model — that's exactly what a CLI should encapsulate.
  • Multiple models out of the box (mediapipe --fast, BiRefNet --quality): rejected for v1. Eval data showed the speed/quality gap is huge (~12 ms/f vs ~263 ms/f) but matting is offline preprocessing where quality wins. One model is simpler and the docs guide lists rembg, BiRefNet, RVM, ComfyUI, DaVinci Resolve, and free-tier SaaS options for the cases where this model isn't the right fit.
  • Python implementation as a separate package (the original sketch): rejected. The repo is bun/TypeScript; adding a Python package would fragment the install story. onnxruntime-node with sharp covers everything we need natively.

Test plan

  • Unit tests added: manager.test.ts (provider selection across auto/cpu/coreml/cuda × Apple Silicon / Linux / HYPERFRAMES_CUDA=1) and pipeline.test.ts (output format inference, input kind detection, encoder args including alpha_mode metadata) — 17 new tests.
  • Full CLI test suite still green: 213/213 tests pass after the change.
  • bunx oxlint, bunx oxfmt --check, bunx tsc --noEmit (cli + core + studio) — all clean.
  • Manual end-to-end smoke (Linux x86, CPU EP, ~320×240 fixture):
    • remove-background --info reports correct provider state, cached/uncached
    • Video → .webm produces VP9-alpha; verified by decoding with -c:v libvpx-vp9 ... -pix_fmt rgba and checking the resulting PNG is RGBA with non-trivial alpha
    • Video → .mov produces ProRes 4444 with alpha
    • Image → .png produces single RGBA frame
    • Error paths (missing input, wrong output extension, bad --device) surface clean messages
  • Documentation: new docs/guides/remove-background.mdx (quick start, performance per platform, composition usage examples, limitations of u2net_human_seg, free alternative tools when this model isn't the right fit, troubleshooting); docs/packages/cli.mdx updated with the command reference; skills/hyperframes-cli/SKILL.md updated.
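The alpha-preserving encoder arguments that the tests pin down can be sketched as a plain argument list. The flag values are the ones named in the PR text; the helper name and surrounding shape are illustrative, not the PR's actual identifiers.

```typescript
// Sketch of the VP9-with-alpha encode arguments described in the PR.
// All three alpha-related flags are required together; per the PR,
// getting any one wrong silently discards the alpha plane.
export function vp9AlphaArgs(input: string, output: string): string[] {
  return [
    "-i", input,
    "-c:v", "libvpx-vp9",
    "-pix_fmt", "yuva420p",            // keep the alpha plane in the pixel format
    "-metadata:s:v:0", "alpha_mode=1", // makes Chrome's <video> decode the alpha
    "-auto-alt-ref", "0",              // per the PR, required for VP9 alpha
    output,
  ];
}
```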

Notes for review

  • First run downloads ~168 MB of weights to ~/.cache/hyperframes/background-removal/models/. Cached after that.
  • CoreML / CUDA paths weren't smoke-tested — the dev box is Linux CPU. The auto-detect logic is unit-tested with mocked os.platform/os.arch but the actual CoreML provider binding will need verification on Apple Silicon before release. Failure falls back to CPU, so worst case the user sees a CPU run with a warning rather than a hard error.
  • CUDA build is opt-in: the bundled onnxruntime-node is CPU + CoreML only, to keep the install size reasonable. Users with a GPU build set HYPERFRAMES_CUDA=1 to enable the CUDA path.

🤖 Generated with Claude Code

Adds `hyperframes remove-background` — a local-AI subcommand that mattes a
video or image with the u2net_human_seg ONNX model and emits a transparent
WebM (VP9-alpha), ProRes 4444 .mov, or RGBA PNG. Drops directly into any
composition's <video> tag — no green screen, no API keys, no upload.

Auto-picks the fastest available execution provider via onnxruntime-node:
CoreML on Apple Silicon, CUDA when HYPERFRAMES_CUDA=1, CPU otherwise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mintlify Bot commented May 4, 2026

Preview deployment for your docs.

Project: hyperframes — Status: 🟢 Ready — Updated (UTC): May 4, 2026, 4:19 AM

@miguel-heygen (Collaborator) left a comment


I reproduced two blocking issues on the current head d2ca45ef759dd1f635b815ebfae40941a8d2dc2b.

Repro notes:

  • Normalization: ran the actual u2net_human_seg.onnx model on one generated fixture twice, once with the PR std values and once with rembg's std values. The input tensor changed at all 307,200 values, and the final 320x320 mask changed on 8,317 pixels, with 239 pixels moving by at least 16 alpha levels and max alpha delta 78. That invalidates the PR's pixel-equivalence/quality claim.
  • Signal handling: drove the actual render() pipeline in a one-off Vitest repro with mocked child processes. When the encoder emitted exit(null) with signalCode = SIGTERM, render() still resolved with framesProcessed: 1.

I also tried to reproduce my earlier reusable-buffer concern using delayed child readers and the PR's exact drain wait behavior. I could not make that fail, so I am not carrying that as a requested-change blocker.

const INPUT_SIZE = 320;
const INPUT_PLANE = INPUT_SIZE * INPUT_SIZE;
const MEAN = [0.485, 0.456, 0.406] as const;
const STD = [1.0, 1.0, 1.0] as const;
Collaborator


This needs to use rembg's actual normalization std values, not 1.0. I reproduced this against the real ONNX model: the same generated fixture, run once with the PR std values and once with rembg's (0.229, 0.224, 0.225), changed all 307,200 input tensor values and changed the final 320x320 mask on 8,317 pixels, with max alpha delta 78. Since the command claims rembg-equivalent output, this is a blocker until the preprocessing matches the reference and has a regression test.

Collaborator Author


Fixed in 010c4f5. STD is now (0.229, 0.224, 0.225) to match U2netHumanSegSession.predict (the human-seg variant uses ImageNet std, not the (1,1,1) of the base u2net session — copied from the wrong reference).

Added a regression test (inference.test.ts) that pins both MEAN and STD to the rembg reference values with a link to u2net_human_seg.py:33 in the comment, so we don't drift again.

Local smoke shows the alpha range moved meaningfully (0-75 → 0-195 on the same fixture) — the model is now seeing the correct input distribution.
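A minimal sketch of the corrected preprocessing: the constants are the rembg reference values cited above; the function name, the RGBA input layout, and the CHW output layout are illustrative assumptions about the real implementation.

```typescript
// Corrected u2net_human_seg input normalization: scale RGB to [0, 1],
// then apply ImageNet mean/std per channel (the rembg reference values
// from u2net_human_seg.py — NOT the (1, 1, 1) std of the base u2net session).
const INPUT_SIZE = 320;
const MEAN = [0.485, 0.456, 0.406] as const;
const STD = [0.229, 0.224, 0.225] as const;

// rgba: interleaved RGBA bytes of a 320x320 resize; returns a CHW float32
// tensor ready to feed the ONNX session.
export function normalize(rgba: Uint8Array): Float32Array {
  const plane = INPUT_SIZE * INPUT_SIZE;
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    for (let c = 0; c < 3; c++) {
      out[c * plane + i] = (rgba[i * 4 + c] / 255 - MEAN[c]) / STD[c];
    }
  }
  return out;
}
```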

return new Promise<void>((resolve, reject) => {
proc.on("error", reject);
proc.on("exit", (code) => {
if (code === 0 || code === null) resolve();
Collaborator


This treats signaled ffmpeg exits as success. I reproduced it through the actual render() path with mocked child processes: when the encoder emitted exit(null) with signalCode = SIGTERM, render() still resolved successfully with framesProcessed: 1. Please reject unless code === 0 and include the signal in the error message, otherwise killed/terminated encoders can be reported as successful renders.

Collaborator Author


Fixed in 010c4f5. waitForExit now uses the (code, signal) callback signature and only resolves when code === 0 && !signal. Signal-killed exits reject with ${label} killed by ${signal}: <stderr tail> (e.g. ffmpeg encoder killed by SIGTERM: ...).

Added four tests in pipeline.test.ts covering: clean exit (resolves), (null, "SIGTERM") (rejects with signal name + stderr), non-zero code (rejects), (null, "SIGKILL") (rejects). The previous behaviour of treating null-code as success is now actively guarded against.

…xits

Address Miguel's review on #612.

- Normalization std was (1, 1, 1) — that's the base u2net session, not
  u2net_human_seg. Switch to ImageNet (0.229, 0.224, 0.225) to match
  rembg's U2netHumanSegSession reference. Add a parity test pinning the
  exact MEAN/STD values.
- waitForExit treated `code === null` as success, but per Node child_process
  docs that's the signal-killed case — a SIGTERM'd ffmpeg encoder was
  reporting success with a partial output. Switch to (code, signal) and
  reject with the signal in the error message. Add four signal-handling
  tests (clean exit, signal-killed, non-zero code, SIGKILL).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@miguel-heygen (Collaborator) left a comment


Re-reviewed current head 010c4f5. The two requested-change blockers are fixed: u2net_human_seg preprocessing now uses rembg's ImageNet std values and the ONNX repro now has zero tensor/mask delta against the rembg std path; signal-killed ffmpeg exits now reject, with focused regression coverage. Local validation: background-removal tests passed (23/23), CLI typecheck passed after generating runtime artifacts, and the direct waitForExit SIGTERM repro now rejects. No remaining blockers from my review.

@jrusso1020 merged commit aeae676 into main on May 4, 2026
33 checks passed
@jrusso1020 deleted the feat/cli-remove-background branch on May 4, 2026 at 19:40