feat(cli): add remove-background command for transparent video #612
jrusso1020 merged 2 commits into main from
Conversation
Adds `hyperframes remove-background` — a local-AI subcommand that mattes a video or image with the u2net_human_seg ONNX model and emits a transparent WebM (VP9-alpha), ProRes 4444 .mov, or RGBA PNG. Drops directly into any composition's <video> tag — no green screen, no API keys, no upload. Auto-picks the fastest available execution provider via onnxruntime-node: CoreML on Apple Silicon, CUDA when HYPERFRAMES_CUDA=1, CPU otherwise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
miguel-heygen
left a comment
I reproduced two blocking issues on the current head d2ca45ef759dd1f635b815ebfae40941a8d2dc2b.
Repro notes:
- Normalization: ran the actual `u2net_human_seg.onnx` model on one generated fixture twice, once with the PR std values and once with rembg's std values. The input tensor changed at all 307,200 values, and the final 320x320 mask changed on 8,317 pixels, with 239 pixels moving by at least 16 alpha levels and max alpha delta 78. That invalidates the PR's pixel-equivalence/quality claim.
- Signal handling: drove the actual `render()` pipeline in a one-off Vitest repro with mocked child processes. When the encoder emitted `exit(null)` with `signalCode = SIGTERM`, `render()` still resolved with `framesProcessed: 1`.
I also tried to reproduce my earlier reusable-buffer concern using delayed child readers and the PR's exact drain wait behavior. I could not make that fail, so I am not carrying that as a requested-change blocker.
```ts
const INPUT_SIZE = 320;
const INPUT_PLANE = INPUT_SIZE * INPUT_SIZE;
const MEAN = [0.485, 0.456, 0.406] as const;
const STD = [1.0, 1.0, 1.0] as const;
```
This needs to use rembg's actual normalization std values, not 1.0. I reproduced this against the real ONNX model: the same generated fixture, run once with the PR std values and once with rembg's (0.229, 0.224, 0.225), changed all 307,200 input tensor values and changed the final 320x320 mask on 8,317 pixels, with max alpha delta 78. Since the command claims rembg-equivalent output, this is a blocker until the preprocessing matches the reference and has a regression test.
Fixed in 010c4f5. STD is now (0.229, 0.224, 0.225) to match U2netHumanSegSession.predict (the human-seg variant uses ImageNet std, not the (1,1,1) of the base u2net session — copied from the wrong reference).
Added a regression test (inference.test.ts) that pins both MEAN and STD to the rembg reference values with a link to u2net_human_seg.py:33 in the comment, so we don't drift again.
Local smoke shows the alpha range moved meaningfully (0-75 → 0-195 on the same fixture) — the model is now seeing the correct input distribution.
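Concretely, the corrected preprocessing described above amounts to ImageNet normalization into a planar CHW Float32 tensor. The following is a minimal sketch under the layout the PR describes (320x320 input, buffers reused across frames); the `normalize` helper and its signature are illustrative, not the PR's actual API:

```typescript
// Illustrative sketch of the fixed u2net_human_seg preprocessing.
// MEAN/STD are rembg's ImageNet reference values; the planar CHW
// Float32 layout follows the PR description. Names are hypothetical.
const INPUT_SIZE = 320;
const INPUT_PLANE = INPUT_SIZE * INPUT_SIZE;
const MEAN = [0.485, 0.456, 0.406] as const;
const STD = [0.229, 0.224, 0.225] as const; // was (1, 1, 1) — the base u2net values

function normalize(rgb: Uint8Array, out: Float32Array): Float32Array {
  // rgb is interleaved RGB (3 bytes per pixel); out is a planar CHW tensor
  // of length 3 * INPUT_PLANE, pre-allocated and reused across frames.
  for (let px = 0; px < INPUT_PLANE; px++) {
    for (let c = 0; c < 3; c++) {
      out[c * INPUT_PLANE + px] = (rgb[px * 3 + c] / 255 - MEAN[c]) / STD[c];
    }
  }
  return out;
}
```

With STD at (1, 1, 1) each channel's dynamic range is roughly 4-5x too wide for what the model saw in training, which is consistent with the compressed 0-75 alpha range observed before the fix.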
```ts
return new Promise<void>((resolve, reject) => {
  proc.on("error", reject);
  proc.on("exit", (code) => {
    if (code === 0 || code === null) resolve();
```
This treats signaled ffmpeg exits as success. I reproduced it through the actual render() path with mocked child processes: when the encoder emitted exit(null) with signalCode = SIGTERM, render() still resolved successfully with framesProcessed: 1. Please reject unless code === 0 and include the signal in the error message, otherwise killed/terminated encoders can be reported as successful renders.
Fixed in 010c4f5. `waitForExit` now uses the `(code, signal)` callback signature and only resolves when `code === 0 && !signal`. Signal-killed exits reject with `${label} killed by ${signal}: <stderr tail>` (e.g. `ffmpeg encoder killed by SIGTERM: ...`).
Added four tests in pipeline.test.ts covering: clean exit (resolves), (null, "SIGTERM") (rejects with signal name + stderr), non-zero code (rejects), (null, "SIGKILL") (rejects). The previous behaviour of treating null-code as success is now actively guarded against.
…xits

Address Miguel's review on #612.

- Normalization std was (1, 1, 1) — that's the base u2net session, not u2net_human_seg. Switch to ImageNet (0.229, 0.224, 0.225) to match rembg's U2netHumanSegSession reference. Add a parity test pinning the exact MEAN/STD values.
- waitForExit treated `code === null` as success, but per Node child_process docs that's the signal-killed case — a SIGTERM'd ffmpeg encoder was reporting success with a partial output. Switch to (code, signal) and reject with the signal in the error message. Add four signal-handling tests (clean exit, signal-killed, non-zero code, SIGKILL).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
miguel-heygen
left a comment
Re-reviewed current head 010c4f5. The two requested-change blockers are fixed: u2net_human_seg preprocessing now uses rembg's ImageNet std values and the ONNX repro now has zero tensor/mask delta against the rembg std path; signal-killed ffmpeg exits now reject, with focused regression coverage. Local validation: background-removal tests passed (23/23), CLI typecheck passed after generating runtime artifacts, and the direct waitForExit SIGTERM repro now rejects. No remaining blockers from my review.
What
Adds `hyperframes remove-background`, a new CLI subcommand that mattes a video or image locally with a small AI model and outputs a transparent WebM, ProRes 4444 `.mov`, or RGBA PNG. The output drops directly into any composition's `<video>` or `<img>` element — no green screen, no API keys, no cloud upload.

```sh
npx hyperframes remove-background avatar.mp4 -o transparent.webm
npx hyperframes remove-background portrait.jpg -o cutout.png
npx hyperframes remove-background --info   # detected providers
```

Why
Putting an avatar (or any subject) over an arbitrary background is one of the most common asks for a HyperFrames composition. Today the user has to wire up a third-party tool (rembg, Runway, unscreen.com), get the encoding flags right (`yuva420p` + `alpha_mode=1` — easy to miss; broken alpha in Chrome is silent), and integrate the output. Shipping it as a first-class CLI subcommand removes that friction and makes "transparent avatar over a scene" a one-liner.

The choice to run it locally (rather than punt to a SaaS) means: no API keys to configure, no upload of source media, runs fully offline once the model is cached.
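To make that encoding recipe concrete, here is a sketch of the flag set as it might be assembled for an ffmpeg VP9-with-alpha encode. Only the three flags come from this PR; the helper name, argument grouping, and `libvpx-vp9` codec selection are illustrative assumptions:

```typescript
// Sketch: the VP9-alpha flags the PR identifies as required for Chrome's
// <video> element to decode the alpha plane. Helper is hypothetical.
function vp9AlphaArgs(input: string, output: string): string[] {
  return [
    "-i", input,
    "-c:v", "libvpx-vp9",
    "-pix_fmt", "yuva420p",            // alpha-capable pixel format
    "-metadata:s:v:0", "alpha_mode=1", // marks the alpha plane for browsers
    "-auto-alt-ref", "0",              // alt-ref frames break alpha in libvpx
    output,
  ];
}
```

Omitting any one of the three silently produces an opaque video in Chrome, which is why baking them into the CLI matters.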
How

- `u2net_human_seg` (MIT, ~168 MB ONNX) — the recommendation from a 12-cell matting eval that compared u²-net, MediaPipe SelfieSegmenter, BiRefNet, RVM, and BRIA RMBG-2.0 across CoreML, CUDA, and CPU. Single model on purpose: it's the only one that's MIT-clean, runs everywhere, and produces production-quality output for portraits at reasonable memory (~1.5 GB peak). Pre/postprocessing matches rembg's u2net_human_seg session exactly so output should be pixel-equivalent.
- `onnxruntime-node` for inference, `sharp` for resize, `ffmpeg` for decode/encode (already required by other CLI commands). Provider auto-detect picks CoreML on Apple Silicon, CUDA when `HYPERFRAMES_CUDA=1`, CPU otherwise. Provider failure falls back to CPU rather than aborting.
- `ffmpeg` decode → per-frame ONNX → `ffmpeg` encode, with a streaming generator between the two and pre-allocated buffers for the hot path (Float32 input tensor, mask, RGBA output reused across every frame).
- The VP9-alpha recipe (`-pix_fmt yuva420p` + `-metadata:s:v:0 alpha_mode=1` + `-auto-alt-ref 0`) is what makes Chrome's `<video>` element decode the alpha plane. ProRes 4444 `.mov` and single-image PNG are also supported.
- The `--info` flag (no input required) prints detected providers + cache state; matches the `tts --list` pattern already in the codebase.
- `hasFFmpeg` was promoted from `whisper/manager.ts` and a sibling `hasFFprobe` added there (one helper, two consumers). Media metadata extraction reuses `extractMediaMetadata` from `@hyperframes/engine` rather than re-rolling ffprobe parsing.

The internal directory is named `packages/cli/src/background-removal/` and the public command is `remove-background` — earlier iterations called it `matte`, but most users wouldn't recognize the VFX term, and "Remove Background" matches what Adobe / Canva / Figma all use for the same thing. The docs intro keeps the "matting" term searchable for the audience that does know it.

Alternatives considered
- …`-pix_fmt`), not the matting model — that's exactly what a CLI should encapsulate.
- …`--fast`, BiRefNet `--quality`): rejected for v1. Eval data showed the speed/quality gap is huge (~12 ms/f vs ~263 ms/f) but matting is offline preprocessing where quality wins. One model is simpler and the docs guide lists rembg, BiRefNet, RVM, ComfyUI, DaVinci Resolve, and free-tier SaaS options for the cases where this model isn't the right fit.
- `onnxruntime-node` with `sharp` covers everything we need natively.

Test plan
- `manager.test.ts` (provider selection across `auto`/`cpu`/`coreml`/`cuda` × Apple Silicon / Linux / `HYPERFRAMES_CUDA=1`) and `pipeline.test.ts` (output format inference, input kind detection, encoder args including alpha_mode metadata) — 17 new tests.
- `bunx oxlint`, `bunx oxfmt --check`, `bunx tsc --noEmit` (cli + core + studio) — all clean.
- `remove-background --info` reports correct provider state, cached/uncached.
- `.webm` produces VP9-alpha; verified by decoding with `-c:v libvpx-vp9 ... -pix_fmt rgba` and checking the resulting PNG is RGBA with non-trivial alpha.
- `.mov` produces ProRes 4444 with alpha.
- `.png` produces a single RGBA frame.
- …`--device`) surface clean messages.
- `docs/guides/remove-background.mdx` (quick start, performance per platform, composition usage examples, limitations of `u2net_human_seg`, free alternative tools when this model isn't the right fit, troubleshooting); `docs/packages/cli.mdx` updated with the command reference; `skills/hyperframes-cli/SKILL.md` updated.

Notes for review
- …`~/.cache/hyperframes/background-removal/models/`. Cached after that.
- …`os.platform`/`os.arch`, but the actual CoreML provider binding will need verification on Apple Silicon before release. Failure falls back to CPU, so worst case the user sees a CPU run with a warning rather than a hard error.
- …`onnxruntime-node` is CPU + CoreML only, to keep the install size reasonable. Users with a GPU build set `HYPERFRAMES_CUDA=1` to enable the CUDA path.

🤖 Generated with Claude Code
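As a footnote to the provider notes above, the selection rule (CoreML on Apple Silicon, CUDA when `HYPERFRAMES_CUDA=1`, CPU otherwise) could be sketched as below. The function name is hypothetical, and the precedence when the CUDA flag is set on Apple Silicon is a guess — the PR doesn't specify it; the real command also falls back to CPU when a chosen provider fails to load, which this selection-only sketch doesn't model:

```typescript
import os from "node:os";

type Provider = "coreml" | "cuda" | "cpu";

// Sketch of the auto-detect rule from the PR description. Precedence of
// the env flag over CoreML is an assumption, not confirmed by the PR.
function pickProvider(env = process.env, platform = os.platform(), arch = os.arch()): Provider {
  if (env.HYPERFRAMES_CUDA === "1") return "cuda";
  if (platform === "darwin" && arch === "arm64") return "coreml";
  return "cpu";
}
```

Keeping the rule a pure function of (env, platform, arch) is what makes the `manager.test.ts` matrix (platforms × devices × env) cheap to cover.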