Skip to content

fix(producer): audio drops + blank images on FFmpeg 4.x/CORS-restricted origins#1140

Merged
miguel-heygen merged 3 commits into
mainfrom
fix/amix-normalize-audio-mix
May 31, 2026
Merged

fix(producer): audio drops + blank images on FFmpeg 4.x/CORS-restricted origins#1140
miguel-heygen merged 3 commits into
mainfrom
fix/amix-normalize-audio-mix

Conversation

@miguel-heygen
Copy link
Copy Markdown
Collaborator

@miguel-heygen miguel-heygen commented May 31, 2026

Issues fixed

1. Audio silently dropped on FFmpeg 4.x/some 6.x builds

mixAudioTracks and mixTracks (audioExtractor) both used amix=...:normalize=0. The normalize option was added to FFmpeg's amix filter in FFmpeg 5.0 and is absent from many system packages (Ubuntu 20.04 ships 4.2.7). When unrecognized, FFmpeg aborts filter-graph initialization — processCompositionAudio returns success:false, runAudioStage sets hasAudio=false, and the assembled MP4 has no audio stream. No error surfaced to the user.

2. Images with crossorigin="anonymous" render blank

External <img> elements from CORS-restricted origins (e.g. S3 buckets without an Access-Control-Allow-Origin header for localhost) fail to load when crossorigin="anonymous" forces CORS mode against the renderer's localhost file server. The same strip was already applied to <video> elements (htmlCompiler.ts:261) but not images.

Changes

File Change
engine/audioMixer.ts Remove normalize=0 + weights= from amix, multiply master gain by tracks.length to compensate
producer/audioExtractor.ts Same fix for the video data-has-audio mixing path
producer/htmlCompiler.ts Strip crossorigin from <img> elements, matching the existing <video> strip

Math

amix with normalize=true (default) divides output by track count. Adding volume=N after the mix restores the net level — identical to normalize=0 on FFmpeg versions that support it.

Verification

Rendered the composition from the bug report (3 audio tracks + CORS-restricted S3 images) locally:

  • Before: video only, blank cards
  • After: video + 28s AAC audio, images loaded correctly

ffprobe confirms: video h264 28.0s + audio aac 28.0s

amix's normalize=0 option is absent from many FFmpeg builds (e.g.
FFmpeg 4.2 on Ubuntu 20.04). When the option is not recognized, FFmpeg
fails the entire filter graph initialization, processCompositionAudio
returns success:false, and the assembled video has no audio stream.

Replace normalize=0 + weights='1...' with the amix default behavior
(normalize=true, divides by track count) and multiply the master output
gain by the track count to restore the original per-track volumes.
The net volume is identical across all FFmpeg versions.

Fixes #1136-adjacent: reported as 'audio doesn't play' in rendered MP4.
Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed end-to-end. (Home flagged "no description" but the PR body is detailed — root cause, repro, fix, test plan all present.) Diagnosis verified, fix math checks out, but I found a sister-bug surface in audioExtractor.ts that has the same FFmpeg-4.x incompatibility and isn't touched here.

Diagnosis verified

  • amix normalize option was added in FFmpeg 5.0; Ubuntu 20.04 ships 4.2.7, plenty of 6.x packages omit it. ✓
  • Confirmed the team already knows about this — packages/producer/src/utils/audioRegression.ts:137-146 has an existing comment: "Avoids amix's normalize option (not available on ffmpeg 4.x) — we use volume=-1 + amix to" — institutional knowledge that didn't propagate when audioMixer.ts was last touched.

Fix math

The PR replaces amix=...:normalize=0 (raw sum) + volume=masterGain with amix=... (sum/N default) + volume=masterGain × N. Net:

  • Old: Σ inputs × masterGain
  • New: (Σ inputs / N) × (masterGain × N) = Σ inputs × masterGain

Mathematically identical at the output. The weights='1 1 1' clause being removed is correct — uniform weights are the default. ✓

Sister-bug — audioExtractor.ts has the SAME pattern

// packages/producer/src/services/audioExtractor.ts
const mixFilter = `${mixInputs}amix=inputs=${tracks.length}:duration=longest:normalize=0[out]`;

Same :normalize=0 clause, same FFmpeg-4.x abort, same silent-failure path (mixer returns success: false → assembled MP4 has no audio). This file is in the production rendering path (extracting audio from video tracks for compositions that have data-has-audio="true" on <video> elements). A user with a composition that uses video audio (per yesterday's hf#1140-adjacent work in this area) on FFmpeg 4.x hits the same bug.

This isn't a sample I had to dig for — audioExtractor.ts is in the same package and on the same code path that handles audio compositing. Worth folding into the same PR rather than shipping a half-fix.

Severity discipline (per the retro pattern)

Walking next-user-action from the broken state:

  • User renders a composition with audio on FFmpeg 4.x
  • Mixer silently fails → MP4 emitted with video only
  • User downloads, plays back, hears no audio
  • No error surfaced anywhere — user thinks they authored the composition wrong

This is production asset loss, not "minor compatibility." The audio data was assembled correctly upstream; it just never made it into the final MP4. PR body's "silent failure" framing is accurate. Severity should be treated as a P1-ish bug, not a P3 compatibility nit.

Test coverage

PR claims existing producer regression tests pass (same net volume). No new test specifically pinning the FFmpeg-4.x compatibility. A minimal hermetic test would be:

  1. Construct a filter graph via mixAudioTracks with mocked runFfmpeg
  2. Assert the filter string does NOT include normalize= or weights=
  3. Assert the post-mix volume= filter equals masterOutputGain * tracks.length

That pins the contract Vai's review will look for tomorrow if someone re-introduces the option. Non-blocking, but cheap to add.

Other observations

  • audioExtractor.ts (production) — definitely needs the same fix, per above.
  • examples/aws-lambda/scripts/eval.sh — also uses normalize=0 but it's an eval script for sample-cancellation testing, not the production path. If someone runs eval.sh on FFmpeg 4.x for a metrics check, it'd fail there too. Minor.
  • audioRegression.ts — already correct. Good precedent the audioMixer fix could have cited.

Verdict

Materially LGTM on the audioMixer.ts math + the fix shape. Real blocker: the sibling fix in audioExtractor.ts should be in the same PR. Same root cause, same blast radius, two-line diff. Right move is to extend this PR rather than leave the half-fix on main and follow up with another.

Holding the stamp — same protocol. James gates.

— Rames Jusso (hyperframes)

Two follow-up fixes:

1. htmlCompiler: strip crossorigin attribute from <img> elements during
   compilation. External images (e.g. S3) with crossorigin='anonymous'
   force CORS-mode requests against the renderer's localhost file server,
   which S3 rejects → images render blank. Matches the existing video
   strip at line 261.

2. audioExtractor: same amix normalize=0 bug as audioMixer.ts. The
   audioExtractor path is used for <video data-has-audio='true'> mixing
   in the CLI's local render pipeline; on FFmpeg 4.x it would also drop
   audio silently. Fix: remove normalize=0, compensate with volume=N.
@miguel-heygen miguel-heygen changed the title fix(engine): audio drops on FFmpeg builds without amix normalize option fix(producer): audio drops + blank images on FFmpeg 4.x/CORS-restricted origins May 31, 2026
Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: APPROVE

The fix is correct and the root cause analysis is solid. Here's the full breakdown.

Math

Old: amix normalize=0 passes samples through at full amplitude → output = Σ(track[i]) → volume=masterGain brings it to the intended level.

New: amix default (normalize=true) divides by N → output = Σ(track[i]) / N → volume=masterGain×N exactly restores it.

The compensation is exact because apad=whole_dur=totalDuration is preserved unchanged — all N inputs remain active for the same full duration, so amix always divides by a fixed N, not by the number of inputs still active at a given sample. If apad were ever dropped, this equivalence would drift. That's fine for this PR but worth keeping in mind if apad gets revisited.

Per-track volume= filters (the buildVolumeExpression path) are not touched — individual track gains are unchanged.

Scope

packages/producer/src/services/audioMixer.ts is a pure re-export from engine — there's no duplicate implementation to update. The fix covers both consumers.

formatFilterNumber on the compensated gain

Using formatFilterNumber here is correct. Without it, a 3-track composition with masterOutputGain=1 would emit volume=3 which is fine, but a non-unit PRODUCER_AUDIO_GAIN (e.g. 0.7) would produce volume=2.0999999999999996 — a malformed filter string on some FFmpeg builds. Good catch.

What's missing (nit only)

nit: The test at line 65 ([mixed]volume=1[out]) passes by coincidence — single track → 1 * 1 = 1. There's no multi-track test that exercises compensatedGain = masterOutputGain * tracks.length with N>1. A 2-track case asserting [mixed]volume=2[out] (or volume=1.4 with a custom gain) would lock in the semantics. Not a blocker — the math is simple and the comment is clear — but it's the only novel logic in this commit.

nit: Commit message says "Fixes #1136-adjacent" — if there's a linked issue, a clean Fixes #XXXX reference is preferable. If not, drop the -adjacent.

Test plan assessment

The manual repro (ffprobe showing video-only before, video+AAC after) and the note that "existing producer regression audio correlation tests pass" are both meaningful signals. The unit test suite does exercise processCompositionAudio with the runFfmpeg mock, and the filter-string assertions for the single-track default case pass correctly.

Ship it.

— Vai

@jrusso1020
Copy link
Copy Markdown
Collaborator

Correction to my earlier review: I claimed packages/producer/src/services/audioExtractor.ts was a sister-bug not fixed by this PR. That was wrong — the second commit (8f7f69d6) applies the same normalize=0 → default-amix + volume=N compensation to audioExtractor.ts. Both production paths (engine audioMixer.ts + producer audioExtractor.ts) are covered in this PR. Magi and Vai caught it.

Root cause of my mistake: I sourced the sister-bug finding from mcp__github__search_code, which indexes the repo's default branch, not the PR's HEAD. The hit on audioExtractor.ts was the pre-fix state on main, not the in-PR state. The right artifact to grep is the raw PR diff (patch-diff.githubusercontent.com/raw/.../1140.diff), which immediately shows audioExtractor.ts is touched here. Filed as a memory entry for next time.

Net review: math correct on both files, FFmpeg 2.5+ syntax (works everywhere realistic), dropout_transition=0 correctly retained, both production paths covered. The two open nits remain Vai's: no multi-track unit test pinning the compensatedGain = masterOutputGain × N math, and the issue link could be more precise. Stamp still held — James gates.

— Rames Jusso (hyperframes)

Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed at 8f7f69d6. Three files, 16+/5-, all three changes verified. Materially LGTM; holding the stamp per protocol — James gates.

What changed since my first pass

  • audioExtractor.ts — applies the same normalize=0 → default-amix + volume=N pattern (resolves what I incorrectly called a "missing sister-bug"; see my correction comment).
  • htmlCompiler.ts — new: strips crossorigin from <img>, mirroring the existing <video> strip.

audioExtractor.ts — verified

const mixFilter = `${mixInputs}amix=inputs=${tracks.length}:duration=longest[mixed]`;
const postMixGain = `[mixed]volume=${tracks.length}[out]`;

Math: pre-fix output was Σ inputs (raw sum, via normalize=0). Post-fix output is (Σ inputs / N) × N = Σ inputs. Identical. ✓ No master gain in this path (consistent with the original code's normalize=0[out] having no separate volume filter).

Small consistency nit, non-blocking: audioMixer.ts runs the compensated gain through formatFilterNumber(); audioExtractor.ts interpolates tracks.length raw. For pure integers it's fine — FFmpeg's volume filter parses volume=3 unambiguously — but you might want to mirror the helper for symmetry. Pure style, take or leave.

htmlCompiler.ts — verified

compiledHtml = compiledHtml.replace(/(<img\b[^>]*)\s+crossorigin(?:=["'][^"']*["'])?/gi, "$1");

Regex is the same shape as the <video> strip three lines above (<video\b[^>]*crossorigin(?:=…)?). Matches both quoted and unquoted forms, optional attribute value. ✓

Rationale checks out: the renderer captures via headless Chrome's BeginFrame screenshot path (compositor-level), not canvas readback (toDataURL/getImageData). CORS-tainted images render fine to the compositor; they just can't be read back into JS. The fix unblocks the actual render path.

One edge case worth knowing — not a blocker: if a composition uses WebGL textures (gl.texImage2D(img)), it DOES need CORS-clean images, and stripping crossorigin would silently break those textures. Blast radius is small (HTML/CSS compositions don't hit this) and the workaround for any future WebGL author is to host the asset on the localhost file server. Worth a one-line comment in the code if WebGL is ever a supported path, otherwise fine to leave.

Tests

Echo of Vai's nit, expanded:

  • Audio path: no multi-track unit test pins the compensatedGain = masterOutputGain × N math. With N=1 the existing test passes either formulation; would only catch a regression on N≥2.
  • CORS path: matching strip-crossorigin-from-<video> likely has a test; <img> should have a parallel one. Two-line addition to whatever the existing video-strip test asserts.

Both non-blocking — if Miguel wants to fold them in here vs. follow-up is a judgment call, but pinning the multi-track math seems cheap enough to do now.

Verdict

Math correct on both audio files. Both production paths covered (engine + producer). CORS-img strip is safe under the current render model. Test coverage is the only remaining open item. Stamp still held — James gates.

— Rames Jusso (hyperframes)

Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second-pass review covering both commits in full.

Commit 1 (22d33de) — audioMixer.ts: Already covered in prior round. Fix is correct.

Commit 2 (8f7f69d) — audioExtractor.ts + htmlCompiler.ts:

audioExtractor.ts (mixTracks): same pattern as audioMixer. normalize=0 dropped, replaced with amix default (divides by N) + volume=${tracks.length} post-mix gain. Math is correct — per-track volume is applied earlier in filterParts via volume=${track.volume}, so the chain is track.volume × (1/N) × N = track.volume, identical to the pre-patch intent. No masterOutputGain concept here, so no multiplier needed beyond N — correct.

htmlCompiler.ts (<img crossorigin> strip): Regex is identical in shape to the existing <video> strip immediately above. The renderer captures DOM frames via compositor screenshot (BeginFrame), not canvas readback, so crossorigin on <img> is not needed for rendering correctness. I traced the WebGL texture path in hyper-shader.ts — textures are sourced from captured frame blobs (blobToTextureSource), not from raw <img> elements, so stripping crossorigin from <img> does not affect shader transitions. Change is safe.

normalize=0 audit: Grepped the full codebase — only these two files used amix ... normalize=0. Both are fixed. No other sites.

CI: All required checks pass (CLI smoke, Build, Test, Typecheck, Lint, Format, Test: runtime contract, Semantic PR title, etc.). Pending checks are regression shards and Windows tests, which are non-required.

Clean across both commits. No blockers.

— Vai

…strip

- audioMixer.test.ts: assert filter has no normalize=/weights=; add
  3-track test confirming compensatedGain = masterGain × N = 3
- htmlCompiler.test.ts: parallel tests for img and video crossorigin
  strip (covers both elements, not just video)
@miguel-heygen miguel-heygen merged commit fc608ad into main May 31, 2026
39 of 41 checks passed
@miguel-heygen miguel-heygen deleted the fix/amix-normalize-audio-mix branch May 31, 2026 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants