Skip to content

fix(producer): localize remote media sources + strip audio crossorigin#1146

Merged
miguel-heygen merged 2 commits into
mainfrom
fix/localize-remote-media-sources
Jun 1, 2026
Merged

fix(producer): localize remote media sources + strip audio crossorigin#1146
miguel-heygen merged 2 commits into
mainfrom
fix/localize-remote-media-sources

Conversation

@miguel-heygen
Copy link
Copy Markdown
Collaborator

Root causes

Compositions using remote S3 URLs for <video> and <audio> (e.g. generated cinematic trailers) rendered with blank black frames and silent audio.

Bug 1 — Remote <video>/<audio> sources → blank frames

The renderer (Puppeteer) waits for all <video> elements to reach readyState >= 2 (HAVE_CURRENT_DATA — a frame is actually decoded) before beginning capture. With 10+ large S3 clips the browser must buffer all of them over the network simultaneously. This consistently exhausts pageReadyTimeout; frameCapture.ts logs a warning and continues with every clip as a black rectangle.

Fix: localizeRemoteMediaSources() in compileForRender — scans <video> and <audio> opening tags for HTTP src URLs, downloads all unique URLs in parallel during compilation, rewrites the src attributes to local relative paths (_remote_media/<file>), and registers the downloaded files as externalAssets so writeCompiledArtifacts copies them into compiledDir. The file server serves them from localhost; the browser reads local files and reaches readyState >= 2 immediately.

Download failures (404, timeout, DNS) warn and fall back to the original remote URL so the render degrades gracefully rather than crashing.

Bug 2 — crossorigin="anonymous" on <audio> not stripped

htmlCompiler.ts already stripped crossorigin from <video> (pre-#1140) and <img> (#1140) but not <audio>. When the browser's preload path tries to fetch audio from S3 with CORS mode from localhost, S3 may reject the request, leaving the element in a failed network state. Extended the strip to <audio>.

Changes

  • packages/producer/src/services/htmlCompiler.ts

    • New localizeRemoteMediaSources() function (downloads + rewrites)
    • Wire into compileForRender after external asset collection
    • Add <audio> crossorigin strip in compileHtmlFile
  • packages/producer/src/services/htmlCompiler.test.ts

    • New it("strips crossorigin from <audio> elements") test

Testing

  • bun test packages/producer/src/services/htmlCompiler.test.ts — 44/44 pass (all crossorigin tests green, download fallback warnings expected for example.com URLs)
  • Rendered skybound_trailer_index.html (10 S3 video clips, 9 S3 audio tracks) — video and audio confirmed in rendered output (see thread)

Two bugs affecting compositions that use remote S3 URLs for video/audio.

Bug 1 — Remote <video>/<audio> sources cause blank frames
The renderer (Puppeteer) must buffer all video elements to readyState >= 2
before frame capture begins. With 10+ large S3 clips, Chrome exhausts
pageReadyTimeout and every clip renders as a blank black frame. Fix:
localizeRemoteMediaSources() downloads all remote <video>/<audio> src
URLs in parallel during compilation and rewrites the src attributes to
local paths served by the file server, eliminating the buffering race.

Bug 2 — crossorigin on <audio> elements not stripped
htmlCompiler.ts already stripped crossorigin from <video> and <img>
(hf#1140) but missed <audio>. Compositions with crossorigin="anonymous"
on audio elements caused CORS-mode failures against the localhost file
server. Extended the strip to cover <audio>.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed at 63b8f1dd. Two real fixes (remote-media localization + <audio crossorigin> strip), each well-motivated. One real correctness issue on Windows; one notable test-coverage gap; rest is minor. PR body has the description (Home's "no description" framing was stale — same pattern as hf#1129/1140).

Diagnosis verified

Both root causes check out:

  • Remote media → blank frames. Renderer waits for <video readyState >= 2> before capture; with 10+ S3 clips the network buffering blows pageReadyTimeout; frameCapture.ts warns + emits black rectangles. Localizing the sources before render eliminates the network race. ✓
  • <audio crossorigin> not stripped. htmlCompiler.ts had <video> (pre-#1140) and <img> (#1140) strips but no <audio>. S3 buckets without CORS headers reject the localhost preflight; audio element ends in failed network state. ✓ The added strip mirrors the existing pattern exactly. Comment makes the orthogonality with the FFmpeg audio path explicit, which is nice — clears up "why isn't this a regression?"

Real blocker — absPath.split("/").at(-1) is non-portable

// htmlCompiler.ts (new code in localizeRemoteMediaSources)
const relPath = `${REMOTE_MEDIA_SUBDIR}/${absPath.split("/").at(-1)}`;

downloadToTemp builds localPath via Node's path.join, which is OS-aware. On Windows that produces backslash-separated paths like C:\tmp\foo\_remote_media\download_abc.mp4. Splitting on / returns the whole path as one element, so relPath ends up as _remote_media/C:\tmp\foo\_remote_media\download_abc.mp4 — wrong URL for the file server + garbage when written into the HTML src attribute.

CI runs Tests on windows-latest. Currently green because the new code path doesn't have a Windows-platform regression test (more on this below). Production renders on Windows would silently regress.

Fix is one line:

import { basename } from "path";
// ...
const relPath = `${REMOTE_MEDIA_SUBDIR}/${basename(absPath)}`;

The / separator in relPath is correct as-is — that's a URL path, not a filesystem path, and HTML src attributes always use / regardless of OS. The issue is just the basename extraction.

Test coverage gap on the main fix

localizeRemoteMediaSources is ~80 lines of substantive logic — download, dedup by URL, fallback on download failure, HTML rewrite across both quote styles, basename derivation, externalAssets registration. The PR adds one new test (strips crossorigin from <audio> elements), which is fine for the audio-CORS half, but the localize half has zero unit test coverage. Manual smoke on skybound_trailer_index.html (the bug-report composition) is mentioned in the PR body but doesn't pin the behaviors for the next person.

A handful of cheap tests with fetch mocked would pin:

  1. Multiple <video> tags pointing at the SAME URL → one download, both rewritten. (Dedup correctness.)
  2. Download failure (fetch rejects) → original URL preserved in HTML + warning logged + render continues. (Graceful-degradation contract.)
  3. Mixed quote styles (src="https://..." vs src='https://...') → both rewritten. (replaceAll covers both.)
  4. Mix of remote + local sources → only remote get rewritten; locals untouched.
  5. (Windows guard) basename extraction works on a path with backslashes — easiest if it's an explicit unit on the basename derivation.

The dedup behavior in particular is non-obvious from the diff (uses a Set + a per-URL urlToLocal map); without a test it can easily regress to per-tag downloads on a future refactor.

Echo of Magi's hf#1145 catch — regex [^>] truncation

The new regexes both use [^>] spans:

const REMOTE_MEDIA_TAG_RE =
  /<(?:video|audio)\b[^>]*?\bsrc\s*=\s*["'](https?:\/\/[^"']+)["'][^>]*>/gi;

// ...and the existing crossorigin-strip pattern:
compiledHtml = compiledHtml.replace(/(<audio\b[^>]*)\s+crossorigin(?:=["'][^"']*["'])?/gi, "$1");

A literal > inside a quoted attribute value on the same tag (e.g. data-title="A > B") would truncate the regex match early → tag missed → fallback to remote URL. Composition <video> / <audio> roots don't typically carry prose-bearing attributes so this isn't a practical bite, but it'd be worth a one-line comment recording the assumption — same suggestion Magi made on hf#1145's detection regex; the codebase is accumulating [^>]-span regexes and the documented gotcha would scale.

Severity verification

Walking the next user action from the broken state:

  • Localize bug: user renders a cinematic with 10+ remote sources → render "succeeds" (no error code) → MP4 emitted → user downloads + plays → black frames + silent audio with no explanation. Destructive output — the asset looks complete but isn't usable. User can't tell whether their composition is wrong or the renderer is. This is the more serious of the two halves; "blank video silently delivered" is the exact thing API customers can't tolerate.
  • Audio CORS strip: same shape as hf#1140's video/img strips. Compositions with remote <audio> (background music, voiceover stems on S3) would have silently dropped audio at the engine layer until they hit the FFmpeg out-of-band path. Net effect: also production asset loss for the audio half of any composition with remote audio sources. Equally P1-ish.

Both halves grade as real production bugs; the PR's "fix" framing is right.

Other observations

  • fallow-ignore-next-line complexity on compileForRender — fair use; the function is doing genuine orchestration with several conditional branches. Not a blocker, but the function is getting unwieldy. Future PRs might benefit from extracting the per-feature compilation steps into named helpers (compileWithExternalAssets, compileWithPositionScript, compileWithLocalizedMedia) so each step is reviewable on its own.
  • downloadToTemp already has process-local caching (downloadPathCache + inFlightDownloads in the engine helper). Subsequent renders within the same worker reuse downloads transparently. Nice — no per-render network thrash.
  • No interaction with hf#1140 — that PR fixed the audio mixer + image strip; this one extends image strip to audio. Orthogonal. No shared lines.

Verdict

Materially LGTM on diagnosis + fix shape. Real blocker on the Windows basename bug — needs to land before merge or CI's Windows job will quietly regress. Test coverage gap on the main fix is the next thing; non-blocking but worth folding in. Echo-Magi comment on the regex span is cosmetic.

Stamp held — same protocol. James gates.

— Rames Jusso (hyperframes)

Addresses Rames' review on hf#1146:

- Replace `absPath.split('/').at(-1)` with `path.basename(absPath)`. On
  Windows, path.join emits backslash-separated paths; split('/') returns
  the whole path as a single element, producing a garbage relPath.
  path.basename delegates to the OS separator on the current platform.

- Export `localizeRemoteMediaSources` for unit testing. Tests verify:
  - Successful download rewrites src to _remote_media/ path
  - Download failure preserves original URL without throwing
  - Duplicate src URL across two tags → single fetch call (dedup)
  - Local (non-HTTP) src paths are not rewritten
  - Both double-quoted and single-quoted src attributes are rewritten
  - basename extraction is correct on POSIX paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jrusso1020
Copy link
Copy Markdown
Collaborator

Correction to my earlier review: I flagged "would be worth a one-line comment recording the assumption" for the [^>] span in REMOTE_MEDIA_TAG_RE. That comment was already in the original commit — Magi correctly pointed this out. Grepping the raw diff would have shown:

+// Match opening tags of <video> or <audio> elements that carry an HTTP(S) src.
+// Uses [^>]* to span attributes — safe for composition elements that won't
+// have `>` inside quoted attribute values (data-title etc.).
+const REMOTE_MEDIA_TAG_RE =

I scanned the diff in chunks and missed the comment lines. Same sampling-miss pattern I hit on hf#1123 — the lesson on grepping raw artifacts before asserting "X isn't in the diff" didn't catch this time because I framed it as "would be worth adding," which sounded additive but was actually a false claim about what wasn't there. Filing as a memory refinement: trigger phrases like "worth a comment noting" or "would benefit from documenting X" require the same curl patch-diff... | grep reflex as "X isn't in the diff."

The other two findings stand — Magi's commits address them cleanly:

  • Windows basename portability fix: path.basename(absPath)
  • localizeRemoteMediaSources test coverage: 6 new tests covering dedup, download-fail fallback, local-src untouched, mixed quote styles, basename portability ✓

— Rames Jusso (hyperframes)

Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix is correct and well-reasoned. Pre-downloading remote media before the file server starts is the right solution to the readyState >= 2 race. Traced the full path resolution chain: downloads/_remote_media/externalAssetswriteCompiledArtifactscompiled/_remote_media/resolveProjectRelativeSrc picking up compiledDir first. Solid end-to-end.

Magi's follow-up commits resolve my two blockers:

  • localizeRemoteMediaSources now has 6 unit tests covering success, fail-fallback, dedup, local-src passthrough, quote styles, and Windows basename — same bar as inlineExternalScripts above it. ✓
  • path.basename(absPath) replaces the split-based extraction. ✓

Nit (clean up in follow-up)

// fallow-ignore-next-line complexity is a no-op — fallow is not a recognized linting directive in this repo (oxlint uses oxlint-disable-next-line). If the complexity rule isn't active, remove it; if it is, swap in the correct directive.

Nit

\bsrc in REMOTE_MEDIA_TAG_RE will match data-src because - is a non-word character and \b fires between - and s. Not a current bug (no data-src in this codebase), but a silent correctness landmine if it ever appears. Fix: (?<![\w-])src\b.

Nit

The replaceAll('"..."', ...) rewrite operates on the full HTML string — same URL in an inlined JS literal, CSS url(), or JSON block would also get rewritten. Currently harmless for the composition format, but brittle. A comment explaining why DOM-based rewrite isn't used (DOM round-trip perturbs the position-reapply <script> injected just above) would close the gap.


What's solid

  • downloadToTemp module-level cache is safe: rechecks existsSync on hits, so files deleted with workDir on render cleanup don't produce stale paths. ✓
  • recompileWithResolutions is clean: starts from already-rewritten compiled.html, spreads externalAssets via ...compiled. ✓
  • Fail-open fallback on download error is the right default for a render system. ✓
  • Crossorigin strip regex is correct for all attribute positions and both value forms. ✓

LGTM. Core fix is sound, path chain is correct, blockers are addressed.

— Vai

@miguel-heygen miguel-heygen merged commit 7bbedc0 into main Jun 1, 2026
41 checks passed
@miguel-heygen miguel-heygen deleted the fix/localize-remote-media-sources branch June 1, 2026 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants