Skip to content

fix(producer): localize remote @font-face src URLs before render#1155

Merged
miguel-heygen merged 4 commits into
mainfrom
fix-remote-fontface
Jun 2, 2026
Merged

fix(producer): localize remote @font-face src URLs before render#1155
miguel-heygen merged 4 commits into
mainfrom
fix-remote-fontface

Conversation

@miguel-heygen
Copy link
Copy Markdown
Collaborator

@miguel-heygen miguel-heygen commented Jun 1, 2026

Root cause

@font-face { src: url("https://s3.../font.ttf") } declarations with remote URLs fail silently during rendering. Chrome fetches the font from the local file server origin (http://localhost:PORT); S3 does not return Access-Control-Allow-Origin for that origin → CORS rejection → Chrome falls back to the next font in the stack (e.g. Arial). The composition renders with wrong typography but no visible error.

Reported: Beasty Style caption template using Komika Axis .ttf from S3 always falling back to Arial.

Fix

localizeRemoteFontFaces() — parallel to the existing localizeRemoteMediaSources() for <video>/<audio>:

  • Scans only <style> blocks (not JS) to avoid false-positive URL matches
  • Extracts HTTP url() references inside @font-face rules
  • Downloads each font file in parallel into _remote_media/
  • Rewrites url("https://...")url("_remote_media/filename.ttf") in the compiled HTML
  • Preserves original URL on download failure (warn + continue)
  • background-image: url(...) and other non-font-face URLs are untouched

The shared download+rewrite logic is extracted into downloadAndRewriteUrls() to eliminate duplication between the two localize functions.

Tests

6 new unit tests in htmlCompiler.test.ts:

  • Rewrites @font-face url() to _remote_media/
  • Ignores url() references outside @font-face (e.g. background-image)
  • Preserves original URL on download failure
  • Deduplicates: same font URL in two @font-face blocks → 1 download
  • No-op when no @font-face present
  • Ignores local (non-HTTP) src URLs

Test plan

  • bun test packages/producer/src/services/htmlCompiler.test.ts passes (56/56)
  • Render the Beasty Style template locally — Komika Axis renders correctly instead of falling back to Arial

Remote font URLs in @font-face blocks fail with a CORS rejection when
the renderer fetches them from http://localhost:PORT (S3 does not echo
the local origin in Access-Control-Allow-Origin). Chrome falls back to
the next font in the stack (e.g. Arial), producing wrong typography.

localizeRemoteFontFaces() scans <style> blocks, extracts HTTP url()
references inside @font-face rules, downloads them in parallel into
_remote_media/, and rewrites the CSS url() references to local paths —
the same pattern as localizeRemoteMediaSources() for <video>/<audio>.

Background url() references outside @font-face blocks are intentionally
left untouched to avoid downloading arbitrary images.

The shared download+rewrite logic is extracted into downloadAndRewriteUrls()
to eliminate duplication between the two localize functions.

Reported via the Beasty Style caption template (Komika Axis .ttf from S3
falling back to Arial on every cloud render).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed at 4191cd21. Diagnosis is right (S3 @font-face fails CORS from http://localhost:PORT); fix shape is right (parallel to localizeRemoteMediaSources); shared-helper extraction is good. The 6 tests cover the most important paths. One real security concern worth raising early — this PR opens a customer-controlled SSRF surface in the producer. A handful of smaller observations after.

Fix shape — solid

Verified the diagnosis end-to-end:

  • localizeRemoteMediaSources was the existing precedent (<video> / <audio> from hf#1146). This PR adds the symmetric localizeRemoteFontFaces for @font-face src: url(...).
  • The refactor extracts a shared downloadAndRewriteUrls helper that takes the URL-set, the directory, labels, and an optional extraRewrite callback. Clean factoring — the font path uses extraRewrite to handle the unquoted url(https://...) CSS form while the standard replaceAll handles double/single-quoted. Both URL forms are covered. ✓
  • Scoping to <style> blocks first, then @font-face { ... } blocks within, before extracting URLs is the right call — avoids matching url(...) in JavaScript string literals or background-image: declarations. The test "ignores url() references outside @font-face (e.g. background-image)" pins this. ✓
  • Process-level dedup via the shared downloadToTemp cache (downloadPathCache + inFlightDownloads in engine/src/utils/urlDownloader.ts) means subsequent renders in the same worker reuse downloads. Cold-start cost only on the first hit per URL. The "deduplicates: same font URL in two @font-face blocks → 1 download" test pins the per-call dedup; the process-cache layer is implicit but real.

Security — customer-controlled SSRF surface in the producer

The composition is customer-supplied (zip uploaded via the v3 API). This PR adds a new producer-side outbound-fetch surface driven by URLs the customer authors in their CSS. The current code:

const REMOTE_FONTFACE_URL_RE = /url\(["']?(https?:\/\/[^"')]+)["']?\)/gi;
// ...
const localPath = await downloadToTemp(url, remoteDir);
  • Accepts both http:// AND https:// — no scheme restriction.
  • No host validation169.254.169.254 (AWS metadata), 127.0.0.1, RFC1918 (10.x, 172.16-31.x, 192.168.x), and *.heygen.com are all valid targets.
  • No allowlist + no denylist.

A malicious composition could embed:

@font-face {
  font-family: "exfil";
  src: url("http://169.254.169.254/latest/meta-data/iam/security-credentials/<role>") format("truetype");
}

The producer fetches the URL, saves the bytes to _remote_media/, registers as an external asset. The file server then exposes _remote_media/<file> to localhost. A second @font-face in the same composition (or any JS in the composition that does fetch("/_remote_media/...") and embeds the bytes into a canvas/text element) can read the credentials back into the rendered video frame.

That's the full chain: customer-supplied composition → producer-side SSRF → bytes on disk → file-server-readable → exfil via rendered MP4 frame.

This is the same shape as the SSRF guard the EF API already applies to customer-supplied callback URLs (hyperframes_v3_dto.py:_validate_callback_url):

  • HTTPS-only
  • Explicit denylist on 169.254.169.254, localhost, 127., 0.0.0.0
  • The downstream webhook service adds DNS-resolution-based RFC1918 checks

Recommend mirroring that shape here. A minimal _validate_remote_font_url (HTTPS-only + the same hard-coded denylist + url_is_public if available) keeps the legitimate path (S3, Google Fonts, public CDNs) working while closing the SSRF surface. Same change applies to localizeRemoteMediaSources from hf#1146 — that one has the same gap. Could be one cleanup PR against downloadToTemp itself, since both callers go through it.

Severity grading: in pre-launch / internal-customer-only mode this is P1 "must address before GA." Post-launch with external API customers, this is P0. Either way it should land before the API surface accepts public traffic.

(I haven't built a working PoC. The chain depends on JS-in-composition being able to read /_remote_media/<file> and embed it into a rendered frame — Studio's file server allows that for the renderer's own loading, but I haven't verified arbitrary JS can read the same path. If the file server scopes reads to the renderer process only, the exfil path closes. Worth confirming before assigning final severity.)

Smaller observations

  • Test gap — format() preservation: the tests use src: url("...") format("truetype") but assertions check the URL rewrite, not that format("truetype") survives. Want a test like expect(result).toMatch(/format\("truetype"\)/) after the rewrite. Currently passes by accident because the rewrite only touches url(...) content, but a test would pin the contract.
  • Test gap — fallback chain: src: url("a.woff2") format("woff2"), url("a.ttf") format("truetype") is the standard fallback. Need a test that BOTH URLs get rewritten when both are remote. The dedup test covers same-URL-twice but not different-URLs-in-one-declaration. The current regex looks correct (the [^"')] character class means each url(...) matches independently), but no test pins it.
  • Test gap — unquoted url() form: the extraRewrite callback handles url(https://...) without quotes. None of the 6 tests use that form. Easy fix — one test that uses unquoted CSS.
  • Whitespace inside url(...): url( "https://..." ) (with leading/trailing whitespace inside the parens) is legal CSS but the regex captures only the URL while the replaceAll(\"${url}"`, ...)rewrite would miss because the matched substring in HTML is"https://..."noturl("https://..."). Looking at the helper closer, the standard double-quote rewrite anchors on the URL bytes (replaceAll(`"${url}"`, ...)) so whitespace outside the URL is fine. ✓ But it's worth checking the unquoted extraRewrite (replaceAll(`url(${url})`, ...)`) handles whitespace — looks like it doesn't. Edge case, not common in author-written CSS.
  • @import url(...) in CSS: the regex only scans inside @font-face { ... } blocks, so @import url(...) and other top-level url(...) references are correctly left alone. ✓ Sister case worth knowing: <link rel="stylesheet" href="..."> from a remote stylesheet that contains its own @font-face will NOT be localized because the producer doesn't follow the stylesheet. Out of scope for this PR (HTML-only scope, not transitive CSS).
  • Console.warn label refactor: the shared-helper extraction changed "Remote media download failed for ${url}""${warnLabel} ${url}" — the localizeRemoteMediaSources path now logs "Remote media download failed for" (no trailing word) and the font path logs "Remote font download failed for". Both read fine. ✓

DM context

PR body references the Beasty Style caption template + Komika Axis font (S3) as the specific repro. Home flagged a DM link that I can't read from this session. If there's mobile-template-specific context from Jake Moran in that DM (e.g. "this only happens on portrait 9:16 renders" or "only on Lambda, not local"), it isn't in the PR body. Worth confirming with Jake whether the repro is general (any remote @font-face) or mobile-specific. The fix shape (localize all remote @font-face regardless of context) is right for either case, but the dashboard signal Miguel was working from might tell us about a tighter pattern.

Severity walk

Without this fix:

  • Customer authors a composition with @font-face { src: url("https://s3.../font.ttf") }
  • Render proceeds; Chrome silently falls back to next font in stack (Arial)
  • User downloads MP4, sees wrong typography
  • User can't tell whether their composition is broken or the renderer is

Not destructive (the video renders + plays), but wrong-typography on a brand-specific composition is materially wrong output. Customer-impact-real for any template using non-system fonts via remote CDN, which is the common case for branded templates. P1.

Verdict

Materially LGTM on the fix mechanism + the refactor + the test coverage of the happy path. SSRF guard is the one thing I'd want before merge given this opens a customer-controlled outbound-fetch surface — same shape as the existing callback_url guard in the EF API, ~10 lines to add, can land as a downloadToTemp wrapper so both localizeRemoteMediaSources (from hf#1146) and localizeRemoteFontFaces (this PR) inherit the guard. The format-preservation + fallback-chain test gaps are non-blocking but cheap to fold in.

Stamp held — James gates.

— Rames Jusso (hyperframes)

Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix mechanism is correct. downloadAndRewriteUrls extraction is clean — localizeRemoteMediaSources and localizeRemoteFontFaces share a single download+rewrite core with no duplication. Pipeline ordering is verified: collectExternalAssets.processPath explicitly passes through http:///https:// URLs untouched, so remote @font-face URLs survive to be scanned. Test coverage matches the localizeRemoteMediaSources baseline: rewrite, fail-preserve, dedup, local-skip cases all covered. CI is on main, required checks green. ✓

One blocker before merge.


Blocker — customer-controlled SSRF surface

Compositions are customer-supplied (zip via /v3 API). This PR adds downloadToTemp calls driven by URLs the customer authors in their @font-face CSS. There's currently no host validation at that fetch layer:

  • http:// is accepted (not HTTPS-only)
  • 169.254.169.254 (AWS IMDS), RFC1918 ranges, localhost, and internal hostnames are all reachable
  • No allowlist, no denylist

Plausible exfil chain: @font-face { src: url("http://169.254.169.254/latest/meta-data/iam/...") } → producer fetches + saves to _remote_media/ → file server exposes to localhost → composition JS reads /_remote_media/<hash> → bytes land in rendered MP4 frames. Whether the exfil completes depends on whether the file server scopes reads to the renderer process only — worth confirming — but the fetch itself happens unconditionally.

This is the same shape as hf#1146 (<video>/<audio> localize). Both share downloadToTemp. The fix belongs in downloadToTemp or in a wrapper, not per-callsite — a single guard would close both surfaces at once. Pattern already exists in hyperframes_v3_dto.py:_validate_callback_url (HTTPS-only + hard-coded denylist on 169.254/127.0/localhost/0.0.0.0 + url_is_public-equivalent check).

Suggested guard in downloadToTemp or at the call-site before it:

  1. HTTPS-only (reject http://)
  2. Resolve hostname and deny RFC1918 + link-local + loopback
  3. Or scope allowed prefixes to known CDN/asset origins

Nits (no blockers)

CSS-comment false-positive: fontFaceRe captures everything between { and }, including comments. A commented-out src: url("https://...") inside a @font-face block will be fetched and rewritten into a comment — unused but spurious download. Strip /* ... */ before the URL scan to fix.

remoteMediaAssets misnomer: fonts aren't media. The caller aliases it (remoteFontAssets) so the blast radius is low, but remoteAssets or downloadedAssets would be more accurate if this type ever becomes public.

Misleading block comment on REMOTE_FONTFACE_URL_RE: the comment describes the broader @font-face scoping strategy (handled by fontFaceRe), not what the constant itself does. The constant just matches any url(https?://...) anywhere.


The fix is correct and the shared-helper approach is the right architecture. The SSRF guard is the only thing needed before merge — and fixing it here also patches the identical gap in hf#1146.

— Vai

…dresses)

Customer-supplied compositions can author @font-face src URLs (and <video>/
<audio> src attrs via the existing localize path) that point to private
infrastructure. Without a guard, the producer's downloadToTemp would fetch
http://169.254.169.254/... (AWS IMDS), RFC1918, loopback, etc., save the
response to _remote_media/, and expose it via the local file server.

assertPublicHttpsUrl() rejects:
  - Non-HTTPS (http://) — all composition fetches must use HTTPS
  - 169.254.x (AWS link-local / IMDS)
  - 127.x / localhost / 0.x (loopback / unspecified)
  - 10.x, 172.16–172.31, 192.168.x (RFC1918)
  - [::1], [fc...], [fd...] (IPv6 loopback + unique-local)

The guard fires before the cache check so a blocked URL never gets into
the in-flight map. Applies to both the font-face localize path (PR #1155)
and the existing video/audio localize path (PR #1146) since both call
downloadToTemp.

Note: DNS-rebinding bypasses are not closed by this check (hostname
comparison only, no DNS resolution). Acceptable risk for current threat
model; server-side DNS validation can be layered on later.

12 unit tests covering all blocked ranges + the allowed edge cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SSRF guard looks correct. fires at the top of before any cache or filesystem touch — the right placement. Coverage is solid: HTTPS-only enforcement, AWS IMDS (169.254.), loopback (127./localhost/[::1]), RFC1918 (10., 172.16–31, 192.168.), unspecified (0.*), IPv6 unique-local ([fc/fd]). The 172.16–31 range check via regex + integer comparison is the correct approach (prefix-only would over-block 172.32+). All 12 tests confirm the expected allow/block edges.

DNS-rebinding caveat is correctly documented in the code. Acceptable risk for this surface — server-side DNS pinning can layer on later if needed.

One very minor nit: the prefix would also block hostnames like (false positive). Not a real-world concern since no CDN uses such a hostname, but a comment noting the intent (unspecified address ) would make the case for it explicit.

This closes the blocker from my prior review. The original fix (font-face localization, shared-helper extraction, pipeline ordering) was already correct.

— Vai

miguel-heygen and others added 2 commits June 2, 2026 00:07
…uard

m[1] from RegExp.match() is typed string | undefined; parseInt requires string.
Use nullish coalescing to satisfy tsc without changing runtime behavior —
the regex guarantees m[1] is always defined when the match succeeds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bun:test is not available in CI — the engine package runs tests via vitest.
@miguel-heygen miguel-heygen merged commit b1b0378 into main Jun 2, 2026
40 checks passed
@miguel-heygen miguel-heygen deleted the fix-remote-fontface branch June 2, 2026 00:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants