fix(engine): byte-budget the frame data uri cache to bound memory at 4k#662
Conversation
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
vanceingalls
left a comment
There was a problem hiding this comment.
Verdict: request changes. The two-bound LRU is the right fix, and the math holds up at 4K. But there are two real issues that should land before this ships, plus a missing observability hook that I'd push hard on for a memory-budget knob in prod.
Blockers
-
blocker —
packages/engine/src/services/videoFrameInjector.ts:62-78(and the call site at line 124-130) — The 64 MB minimum onframeDataUriCacheBytesLimitMbis not sufficient to guarantee a single 4K frame is cacheable. A 4K PNG with a busy/noisy frame can easily exceed 33 MB raw, and base64 inflation is 4/3 + ~22-byte prefix = up to ~45-50 MB per data URI in the upper tail. With the 64 MB floor and a single 50 MB data URI, the post-insert eviction loop runswhile (totalBytes > bytesLimit && cache.size > 0)and evicts the entry it just inserted. The caller still gets the data URI through the returned promise, but every subsequentget()re-reads the file and re-base64s it — turning the cache into a CPU hot path under exactly the conditions it was designed to handle.Fix one of: (a) when inserting an entry that exceeds the budget alone, log + skip caching it (
if (dataUri.length > bytesLimit) return dataUribeforecache.set), or (b) raise the floor to a value that fits a worst-case 4K PNG (256 MB feels safer). I'd take (a) — caching a single oversized entry that immediately evicts itself is pure overhead. Either way, add a test that pins the behavior atbytesLimit = 32 * 1024with a 64 KB entry (current code: entry is inserted, then evicted,stats().entries === 0).#665's JSDoc acknowledges this edge case ("the post-insert eviction loop will drop the entry we just inserted") but neither PR fixes it. Acknowledging a footgun in a comment is not the same as guarding against it.
Important
-
important —
packages/engine/src/services/videoFrameInjector.ts:76— No observability on eviction. For a memory budget that the team is going to tune in prod (PRODUCER_FRAME_DATA_URI_CACHE_BYTES_MB), there's no way from logs/metrics to tell whether the budget is too tight (lots of evictions → cache thrash) or too loose (memory pressure but evictions never fire). Add a counter — at minimum log every Nth eviction with{ totalBytes, entries, evictedKey }at debug, or expose astats()-derived eviction count so renderOrchestrator can include it inRenderPerfSummary. Without this, the next "4K renders are slow" investigation has no telemetry to anchor on. -
important —
packages/engine/src/services/videoFrameInjector.ts:74-78—while (cache.size > entryLimit || totalBytes > bytesLimit)is correct, butevictOldest()at line 64-69 readscache.get(oldestKey)?.length ?? 0to decrement — this is the post-#665 shape. In this PR (#662) it's still using the parallelsizesMap. The parallel Map is fine (correct, simple) but #665 also needs to land cleanly on top — and #665's "drop the parallel Map" change relies on the invariant thatcache.get(key).lengthreturns the byte size we accounted for at insertion. That's true for raw strings, but if anyone ever wraps the value (e.g. caches aBufferor an object), the simplification silently breaks. Worth a load-bearing comment in this PR'sremember()that says "value MUST be a string whose.lengthequals the bytes-accounted-at-insert" so the assumption survives the next refactor. -
important —
packages/engine/src/services/videoFrameInjector.test.ts:36-46— The "evicts oldest entry when entry count exceeds limit" test assertsentries === 2after inserting 3 entries with limit 2, but doesn't assert which entry was evicted. The contract is "oldest" (i.e. LRU on insert order, sinceMap.keys().next()returns insertion order). Add: insert a, b, c with limit 2 → callcache.get(a)afterwards → that should be a miss (re-read from disk produces a fresh data URI). Today the test passes even if the cache evicted the wrong entry. -
important — Memory math sanity check (good news, mostly):
- 4K PNG raw ≈ 25-33 MB (matches PR table)
- Base64 data URI ≈ 33-44 MB per frame
- Default 1500 MB ÷ 33 MB = ~45 frames cached at 4K = 1.5s @ 30fps. For sequential render this is fine; for any code path that does seek-back / look-ahead across more than ~1.5s, the cache is effectively useless at 4K. Question for James: have you traced whether anything in the renderOrchestrator does non-sequential frame access? A perf regression at 4K from cache thrash isn't visible from
wc -lof the diff. - 1080p with 256 entries × 8 MB = 2 GB old behavior, but 1500 MB budget caps it at ~187 entries (~6s @ 30fps). This is a behavior regression for 1080p — the PR description acknowledges it ("actually slightly tighter, by design") but it should be tested. Add a 1080p regression test that proves
entries < 256at the default budget. If anyone tunes the budget down further, 1080p users want to know what the steady-state cache size is.
Nits
-
nit —
packages/engine/src/config.ts:80-87— JSDoc says "1080p with ~6 MB per JPEG frame" but the data URI math elsewhere uses PNG (25 MB raw → 33 MB encoded). Pick one and stick with it; the 1080p case is JPEG (needsAlpha ? "png" : "jpeg"in renderOrchestrator), the 4K case can be either. JSDoc could just say "data URI size scales with frame size" and skip the per-format numbers. -
nit —
packages/engine/src/services/videoFrameInjector.test.ts:57— Comment says "1 KB raw frame → ~1.4 KB base64" — the multiplier is 4/3 ≈ 1.33×, not 1.4×. Tiny precision; fine to leave. -
nit —
packages/engine/src/services/videoFrameInjector.ts:117—bytesLimitMb * 1024 * 1024— fine but1024 ** 2reads slightly cleaner.
Cross-PR
This PR composes correctly with #663 — the orchestrator's deviceScaleFactor path is what makes 4K data URIs actually appear, and this is the cache that holds them. Fine to land independently but the blocker should be fixed first.
Praise
The two-bound LRU is the right shape; the byte budget is the one that actually matters for memory safety. Per-entry size tracking, eviction-on-insert (vs. eviction-on-overflow next-call), and reset-on-rewrite (if (cache.has) totalBytes -= prev) are all correct. Test that the cache short-circuits via frameSrcResolver without polluting the cache is a nice touch.
— Vai
miguel-heygen
left a comment
There was a problem hiding this comment.
Approving this layer. I did not find a blocker in the byte-budgeted frame data URI cache diff.
What I checked:
- the cache now has both an entry-count bound and byte-budget bound
- replacing a key subtracts the previous cached string length before re-inserting
- the config option is wired through producer -> engine frame injection
- the top-of-stack focused verification that included this code passed after bootstrapping generated runtime artifacts (
videoFrameInjector.test.ts, producer render-orchestrator focused tests, core/studio route tests, CLI docker args/init tests)
Non-blocking follow-ups I would still consider useful:
- add
PRODUCER_FRAME_DATA_URI_CACHE_BYTES_MBcoverage inconfig.test.tsfor default/env/min clamp - assert the specific oldest entry is evicted, not just that the cache has fewer entries after byte-budget pressure
Note: the stack still has blockers in lower/higher layers, and this PR currently still shows an aggregate CHANGES_REQUESTED state from existing review state. My approval is for this PR diff itself.
I rechecked the live head before posting: 03fd975a4a8b879e40333be95e5ec1d7dea37504.
vanceingalls
left a comment
There was a problem hiding this comment.
Retracting my prior REQUEST_CHANGES on this PR — Magi's analysis is correct.
I re-traced createFrameSourceCache after Magi's pushback. The critical flow when a single frame exceeds bytesLimit:
remember()inserts the entry (cache.set+sizes.set+totalBytes += size).- The
while ((cache.size > entryLimit || totalBytes > bytesLimit) && cache.size > 0) { evictOldest() }loop runs until either the cache fits or empties. - If the just-inserted entry alone exceeds
bytesLimit, the loop evicts everything including the new entry — butremember()thenreturn dataUri, so the caller still gets the data URI.
The outcome on a 4K frame larger than the configured budget is cache thrash (re-read + re-encode on every get), not OOM or render failure. That's a perf concern, not a correctness blocker.
Magi also correctly pointed out that the default budget is 1500 MB, not 64 MB — 64 MB is the minimum floor that only fires when someone explicitly sets PRODUCER_FRAME_DATA_URI_CACHE_BYTES_MB to a low value via env var. At default, 4K frames (~33 MB encoded) cache comfortably (~45 frames at default).
Updated verdict: comment. The if (dataUri.length > bytesLimit) return dataUri skip-cache guard I suggested would still be a small win (saves the Map insert + evict cycle on each 4K frame call when an env-var override sets the budget too low), but it's a follow-up nit, not a blocker. My earlier characterization that this would OOM or break rendering was wrong — it degrades to "cache miss every time," which is the bug-tier I should have called originally.
Two of my prior important items still stand on their own merits:
- Eviction observability — a Datadog counter or log line on eviction fires gives you "is the budget right?" data from prod, especially valuable for users who set env-var overrides.
- 1080p regression test — pin the tighter behavior so future tweaks don't quietly regress the cache hit rate at 1080p.
Apologies for the noise of the retraction. Better than letting a stale REQUEST_CHANGES sit on top of clean-degradation code.
— Vai
dbbef86 to
cb0abed
Compare
512491a to
70842d6
Compare
vanceingalls
left a comment
There was a problem hiding this comment.
Re-review — APPROVE
Verdict shift: COMMENT (downgraded from REQUEST_CHANGES) → APPROVE.
Reference: prior review #662 (review) (REQUEST_CHANGES, retracted at #662 (review)).
Addressed since prior review (commit 70842d6b)
- Important — eviction observability:
packages/engine/src/services/videoFrameInjector.ts:25-32—FrameSourceCacheStatsnow exposesevictionsandoversizedRejectionscounters viacache.stats(). Comments explain how to interpret each. - Important — 1080p regression test:
packages/engine/src/services/videoFrameInjector.test.ts:131—at the production default (1500 MB), 1080p frames stay cachedpins that the byte-budget default keeps ≥6s of 1080p look-ahead at 30fps. A future tweak toframeDataUriCacheBytesLimitMbwill surface here. - Bonus — oversized-rejection self-test:
videoFrameInjector.test.ts:101—skips caching an entry that alone exceeds the byte budget (no self-eviction)directly tests the "graceful degradation" code path Magi defended. Verifies the rejected entry doesn't pollute internal state and a subsequent normal entry caches cleanly.
Still open
None of the prior items.
New findings (nit, follow-up)
- Stats are exposed but not consumed:
cache.stats()is wired through theFrameSourceCacheinterface and validated by tests, but thecreateVideoFrameInjectorcall site (videoFrameInjector.ts:158) doesn't surface the cache handle to the producer — meaning eviction counters can't be read by the orchestrator's perf summary or logged at job end. The observability hooks exist; they just aren't yet plugged into the producer's logging. Not blocking — easy follow-up: hoist the cache out ofcreateVideoFrameInjector(or exposegetStatson the returned hook) soexecuteRenderJobcan log{evictions, oversizedRejections}after the render completes.
— Vai
## What Adds `landscape-4k` (3840×2160) and `portrait-4k` (2160×3840) presets to `CanvasResolution` and `CANVAS_DIMENSIONS` in `@hyperframes/core`. Foundation for end-to-end 4K rendering support. ## Why There is no codified way today to mark a composition as 4K. The string union `CanvasResolution = "landscape" | "portrait"` is the only enum used by templates, generators, and stage-zoom math; without 4K members, scaffolds and helpers always emit 1080p dimensions even when the underlying engine + encoder pipeline can already handle larger viewports (Chrome `setViewport`, ffmpeg `libx264`/`libvpx-vp9`/`prores_ks` all scale fine). This is PR 1 of a 3-PR stack making 4K a first-class option: 1. **PR #660 (this)** — core types/constants + parser detection. 2. **PR #661** — `hyperframes init --resolution 4k` flag to scaffold 4K projects. 3. **PR #662** — byte-budget the frame data-URI cache so 4K renders don't OOM. ## How - `CanvasResolution` extended additively (no breaking change). - `CANVAS_DIMENSIONS` gains `landscape-4k` / `portrait-4k` entries. - `parseResolutionFromHtml` accepts `data-resolution="landscape-4k|portrait-4k"` and infers 4K from `data-composition-width`/`data-composition-height` (long side ≥ 2560 → UHD variant). - `parseResolutionFromCss` reuses the same dimension-aware classifier so inline `#stage { width/height }` styles are detected too. - Helper extracted: `resolveResolutionFromDimensions(w, h)`. - `cli/info.ts` cleanup: replaces a nested `parsed.resolution === "portrait" ? 1080 : 1920` ternary with a `CANVAS_DIMENSIONS[parsed.resolution]` lookup so it stays correct for 4K. Stage-CSS generators (`templates/base.ts`, `generators/hyperframes.ts`) already index `CANVAS_DIMENSIONS[resolution]`, so they pick up the new presets automatically. ## Test plan - [x] Unit tests added/updated — 4 new parser tests, 1 expanded constants test - [x] Manual testing performed — `bun run --cwd packages/core test` (679 pass), `bun run --cwd packages/cli test` (277 pass) - [ ] Documentation updated (deferred to PR #661 where the user-facing flag lands)
bced546 to
9f0074e
Compare
70842d6 to
8355b39
Compare

What
Replaces the entry-count-only LRU in
videoFrameInjectorwith a two-bound LRU that evicts on either entry count OR byte budget. AddsframeDataUriCacheBytesLimitMbconfig (default 1500 MB) plus aPRODUCER_FRAME_DATA_URI_CACHE_BYTES_MBenv var.Why
Today the cache evicts purely on entry count (
frameDataUriCacheLimit: 256). The base64 data URI of each cached frame scales with the source frame size:At 4K with a multi-worker render, the cache alone OOMs commodity boxes long before the entry cap fires. With supersampling or HDR PNG frames it gets worse. The byte budget keeps the steady-state memory bounded regardless of resolution while leaving 1080p behavior unchanged (256 × 8 MB ≈ 2 GB ≤ 1.5 GB budget — actually slightly tighter, by design).
This is PR 3 of the 4K stack. Stacked on #661.
How
EngineConfig.frameDataUriCacheBytesLimitMb: numberadded topackages/engine/src/config.ts. Default1500. Min clamp64. Env overridePRODUCER_FRAME_DATA_URI_CACHE_BYTES_MB.createFrameSourceCache(entryLimit, bytesLimit, frameSrcResolver?)now tracks per-entry byte size in a parallelMap. After eachremember(), evicts oldest entry whilecache.size > entryLimit || totalBytes > bytesLimit. Both bounds are upper bounds — eviction is greedy until both are satisfied.createVideoFrameInjectorplumbs the new config field through. Producer'srenderOrchestratorpassescfg.frameDataUriCacheBytesLimitMbto the injector.__testingexport exposescreateFrameSourceCachefor unit testing without spinning up Chrome.Test plan
videoFrameInjector.test.ts:bytes <= limitinvariant)frameSrcResolvershort-circuit doesn't pollute the cache; falls through to file read when resolver returns nullbun run --cwd packages/engine test— 541/541 passbunx vitest run packages/producer/src/services/renderOrchestrator.test.ts— 42/43 pass; the 1 failing test (rejects a maliciously crafted key…) also fails on cleanmainand is unrelated to this changeEngineConfig; no user-facing CLI flag added in this PR