fix: batch GSAP timeline construction to prevent main-thread hang (#1231) #1249
Compositions with thousands of tl.to() calls (e.g. 8,562 in the reported case) block Chrome's main thread synchronously during HTML parsing, preventing DOMContentLoaded from firing before Puppeteer's navigation timeout. This caused render jobs to hang indefinitely at 'Initializing calibration session...' with no error message.
Root cause: GSAP's timeline API is synchronous, so each tl.to() call registers a tween immediately on the main thread. A script with 8k+ calls holds the thread for seconds, starving the browser event loop and delaying DCL past the navigation timeout window.
Fix: install a property trap on window.gsap in HF_EARLY_STUB (injected at the top of <head>, before GSAP or user scripts load). When GSAP assigns itself to window.gsap, the setter intercepts the real gsap object and wraps gsap.timeline() to return a proxy that queues tween descriptors (to/from/fromTo/set) instead of calling them synchronously. A requestAnimationFrame-based flush loop drains 100 tweens per frame, yielding the main thread between batches so DCL can fire.
When the queue is drained, the stub sets window.__hfTimelinesBuilding = false and dispatches a 'hf-timelines-built' CustomEvent. init.ts checks this flag at DOMContentLoaded time; if building is still in progress it defers bindRootTimelineIfAvailable() until the event fires, then sets window.__renderReady = true as normal. pollHfReady continues to gate on both __renderReady and window.__hf.duration > 0, so the render pipeline does not start until the full timeline is bound.
- Batch size: 100 tweens/rAF tick (empirical; ~4ms/batch at 8k scale)
- Yield mechanism: requestAnimationFrame (cooperative, no setTimeout(0))
- Determinism: 'hf-timelines-built' event guarantees sequencing
- Proxy forwards: pause/seek/totalTime/time/duration/add/paused/timeScale/play delegate to the real timeline immediately
- No GSAP package changes; no navigation timeout increase
Fixes #1231
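The batch-and-yield loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the stub's actual code: createBatchedQueue, TweenCall, and the injected schedule callback are hypothetical names. In the browser, schedule would presumably wrap requestAnimationFrame, and onDrained would flip window.__hfTimelinesBuilding and dispatch the hf-timelines-built event.

```typescript
// Hypothetical sketch: names here are illustrative, not taken from hf-early-stub.ts.
type TweenCall = () => void;
type Schedule = (cb: () => void) => void;

interface BatchedQueue {
  enqueue(call: TweenCall): void;
  readonly building: boolean;
}

function createBatchedQueue(
  batchSize: number,
  schedule: Schedule,
  onDrained: () => void,
): BatchedQueue {
  const queue: TweenCall[] = [];
  let building = false;

  const flush = (): void => {
    // Apply at most `batchSize` queued calls, then hand control back to the scheduler.
    const batch = queue.splice(0, batchSize);
    for (const call of batch) call();
    if (queue.length > 0) {
      schedule(flush);
    } else {
      building = false;
      onDrained();
    }
  };

  return {
    enqueue(call) {
      queue.push(call);
      // Only the first enqueue of a burst schedules a flush.
      if (!building) {
        building = true;
        schedule(flush);
      }
    },
    get building() {
      return building;
    },
  };
}
```

Injecting the scheduler keeps the sketch testable without a browser: a test can collect the scheduled callbacks in an array and run them one "frame" at a time.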
b20c5eb to
0d725fd
Compare
jrusso1020
left a comment
Reviewed against the design checklist I sketched in the Slack thread. Architecture is sound; surfacing two correctness concerns that the PR's own description doesn't address + a couple of nice-to-haves. None look blocking for the customer-unblocking purpose of the PR; flagging them as follow-up scope.
Strengths
- Property-trap-on-window.gsap is exactly the right interception layer. Captures every call regardless of whether the user code goes through gsap.timeline() directly or via UMD's window.gsap assignment. The fact that GSAP isn't loaded yet when the stub runs is handled correctly via the configurable getter/setter.
- Explicit hf-timelines-built event + init.ts deferral of bindRootTimelineIfAvailable() is the correct determinism model. The renderer can't race ahead: pollHfReady gates on __renderReady, which only flips after the event fires. ✓
- Build pipeline mirrors @hyperframes/core's runtime-inline.ts pattern: compiled stub source in stubs/hf-early-stub.ts, esbuild → IIFE → generated TS module exporting getHfEarlyStub(). The previous 138-line inline JS string is gone, which is a maintainability win.
- Defensive try/catch around defineProperty + CustomEvent handles non-Chrome runtimes (tests, jsdom) gracefully.
- Batch size = 100/tick, rAF yield matches the design recommendation. Empirical "~4ms/batch" claim is consistent with GSAP's tween-registration cost on modern V8.
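For readers unfamiliar with the property-trap idea in the first bullet, here is a rough sketch of how a configurable getter/setter can intercept a library that assigns itself to a global later. installTrap and wrap are hypothetical names, and a plain object stands in for window; the real stub may differ in detail.

```typescript
// Hypothetical sketch: `installTrap` and `wrap` are illustrative, not the stub's real API.
function installTrap<T>(
  host: Record<string, unknown>,
  key: string,
  wrap: (real: T) => T,
): void {
  let current: T | undefined;
  try {
    Object.defineProperty(host, key, {
      configurable: true,
      enumerable: true,
      get: () => current,
      set: (value: T) => {
        // Runs when the library later assigns itself, e.g. host.gsap = realGsap.
        current = wrap(value);
      },
    });
  } catch {
    // Defensive: if the property cannot be redefined, leave the host untouched.
  }
}
```

Because the trap is installed before the library loads, the getter returns undefined until the assignment happens, after which every reader sees the wrapped object.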
Concerns
1. proxy.add() doesn't unwrap proxy children — passes proxy objects to real.add()
The HF runtime in packages/core/src/runtime/init.ts:574-580 does:
const compositeTimeline = gsapApi.timeline({ paused: true }); // PROXY
for (const candidate of candidates) {
  compositeTimeline.add(candidate.timeline, ...); // candidate.timeline is also a PROXY
}
And similar at :590-593 (fallbackTimeline.add(existingRootTimeline, 0)) and :633 (rootTimeline.add(candidate.timeline, startSec)).
Because user composition scripts get proxies from gsap.timeline(), and the runtime gets proxies from gsapApi.timeline() too, every .add() call composes proxy-into-proxy. The proxy's add() forwards args to real.add(...args) without unwrapping arg.__hfReal, so GSAP's real timeline ends up holding proxy references in its internal tween-graph linked list (_first/_next/_prev).
Empirically this likely works in your 8,562-tween test (the proxy's __hfReal has its tweens by the time bindRootTimelineIfAvailable fires, and proxy.duration() / proxy.seek() / proxy.totalTime() all forward correctly). But GSAP's internal iteration paths (e.g. getChildren(), internal label resolution, time-mapping) may misbehave on a proxy that lacks _dp / _first / _recent linkage in the way GSAP expects.
Suggested fix (one-liner in wrapTimeline()'s add method):
add(...args: unknown[]): TimelineProxy {
  const unwrapped = args.map((a) =>
    a && typeof a === "object" && "__hfReal" in (a as object)
      ? (a as TimelineProxy).__hfReal
      : a,
  );
  real.add(...unwrapped);
  return proxy;
},
This routes the real child timeline into GSAP's tween graph. Caller still gets proxy back for chaining.
If your test composition doesn't exercise the multi-sub-comp path (__timelines-registered children composed via init.ts:574), this bug stays dormant. Worth verifying explicitly before merging — pick a composition with 2+ sub-comps and check getChildren() returns sensible objects.
2. Value-returning setter methods leak the real timeline out of the proxy chain
totalTime(...args: unknown[]): unknown {
  return real.totalTime(...args);
},
GSAP's tl.totalTime(5) (setter form) returns this (the timeline) for chaining; tl.totalTime() (getter form) returns a number. The proxy unconditionally returns whatever GSAP returns. So a caller chaining tl.totalTime(5).to(el, {...}) gets the real timeline from .totalTime(5), then .to(...) runs synchronously against real, bypassing the batching.
Same applies to time(), paused(), timeScale(). The pattern is: when called with args (setter form), return proxy; when called without args (getter form), return the value.
totalTime(...args: unknown[]): unknown {
  const result = real.totalTime(...args);
  return args.length > 0 ? proxy : result;
},
Lower-impact than #1 (most callers don't chain past these), but architecturally cleaner.
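Since the same getter/setter pattern applies to time(), paused(), and timeScale(), one option is a small shared helper. The sketch below is only an illustration: chainSafe is a hypothetical name that does not appear in the PR, and the real proxy may well inline the check per method instead.

```typescript
// Hypothetical helper: `chainSafe` generalizes the args.length check shown for totalTime().
type Accessor = (...args: unknown[]) => unknown;

function chainSafe<P>(proxy: P, realMethod: Accessor): Accessor {
  return (...args: unknown[]): unknown => {
    const result = realMethod(...args);
    // Setter form (args given): keep the caller on the proxy chain.
    // Getter form (no args): pass the real value through.
    return args.length > 0 ? proxy : result;
  };
}
```

Under this assumption the proxy could define each accessor as chainSafe(proxy, (...a) => real.time(...a)) and so on, so a later .to(...) in the chain is still queued rather than applied synchronously.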
Nice-to-haves (not blockers)
- No unit tests for the batching logic. The proxy's chain semantics, the rAF flush loop, and the hf-timelines-built event dispatch all have correctness traps (above). One Vitest covering "queue 200 tweens, advance rAF, verify all bound + event fires" would lock in the contract.
- No telemetry hook for tweenCount + initDurationMs. This investigation thread (PR description references #1231) hit a diagnostic wall because PostHog render_error events lack tween-count visibility. Even a single console.log({ event: "hf_timeline_batching_done", tweenCount, initDurationMs }) line at the queue-drained moment would let us spot future pathological compositions before they file issues.
- kill() doesn't remove the proxy from activeProxies: small memory growth across long sessions. Probably never matters in render-per-process mode; matters for the studio preview path if a session creates+kills many timelines.
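To make the telemetry suggestion concrete, a counter like the following could produce the payload at the queue-drained moment. This is a sketch only: createBatchingStats and the injected now clock are hypothetical, and only the event name and the two field names come from the comment above.

```typescript
// Hypothetical sketch: only the event name and field names come from the review comment.
interface BatchingStats {
  event: "hf_timeline_batching_done";
  tweenCount: number;
  initDurationMs: number;
}

function createBatchingStats(now: () => number) {
  const startedAt = now();
  let tweenCount = 0;
  return {
    // Call once per queued to/from/fromTo/set descriptor.
    recordTween(): void {
      tweenCount += 1;
    },
    // Call when the queue is drained; returns the payload to log.
    finish(): BatchingStats {
      return {
        event: "hf_timeline_batching_done",
        tweenCount,
        initDurationMs: now() - startedAt,
      };
    },
  };
}
```

In the stub this would likely be fed by performance.now() and emitted with console.log(stats.finish()); passing the clock in keeps it easy to cover in a unit test.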
Render-mode latency math
For the 8,562-tween case: 86 batches × ~16ms rAF gaps = ~1.4s additional render init time vs. the "do it all synchronously" baseline. For normal compositions (<100 tweens, single batch), the cost is one rAF gap ≈ 16ms. Acceptable trade for going from infinite hang → working render on the pathological case + negligible overhead on the common case.
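The arithmetic above can be written down as a tiny estimate. addedInitLatencyMs is a hypothetical helper, and it assumes one batch per rAF tick at roughly 16ms per frame (a 60Hz display), with the ~4ms of per-batch work fitting inside each frame.

```typescript
// Hypothetical helper: estimates added init latency from batching alone.
function addedInitLatencyMs(tweenCount: number, batchSize = 100, frameMs = 16): number {
  // One rAF tick per batch; a partial final batch still costs a full tick.
  const batches = Math.ceil(tweenCount / batchSize);
  return batches * frameMs;
}

console.log(addedInitLatencyMs(8562)); // 86 batches * 16ms = 1376
console.log(addedInitLatencyMs(50)); // single batch = 16
```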
Verdict
Sound architecture, correctly addresses the customer issue, and the build/inject pipeline is well-engineered. The two correctness concerns above (proxy-unwrap + setter-chain) are latent risks that may not bite in the test composition but could bite multi-sub-comp compositions. If concern #1 is verified non-issue (or fixed), I'd say merge. If it does trip a real composition, it's a 5-line fix in wrapTimeline().
Posting as COMMENT so this doesn't block the customer-unblocking merge. Happy to follow up with the fix as a separate PR if you'd like.
— Rames Jusso
…args.length
Addresses two latent correctness concerns from code review:
1. proxy.add() now unwraps __hfReal from any proxy child before passing it to the real timeline. GSAP's internal tween graph (_first/_next/_prev linkage) requires real timeline instances; proxy objects lack internal fields like _dp that GSAP's iteration paths expect.
2. totalTime/time/paused/timeScale now return proxy when called in setter form (args.length > 0). Previously these returned the real timeline, causing callers who chain .to(...) after a setter call to bypass batching.
Also: build-hf-early-stub.ts now runs oxfmt on the generated output file so the format check passes in CI on every build.
Code review response
Both correctness concerns from the review have been addressed inline (commit 1. 2. Setter-form methods ( On the nice-to-haves:
Render-mode math matches: 8,562 / 100 × 16ms ≈ 1.4s is acceptable for the pathological case; normal compositions don't see it.
// Format the generated file so `oxfmt --check` passes in CI.
// Errors are intentionally swallowed (oxfmt is unavailable in some envs).
try {
  execSync(`bunx oxfmt ${outPath}`, { stdio: "ignore" });
The HF_BRIDGE_SCRIPT duration getter now returns 0 whenever window.__hfTimelinesBuilding is true (set by HF_EARLY_STUB while the rAF batch loop is draining queued tl.to() calls). pollHfReady in the engine polls until window.__hf.duration > 0, so returning 0 keeps the engine waiting until the hf-timelines-built event fires and all tweens are committed to the real GSAP timelines. Without this gate, normal compositions (style-6, style-13, vignelli) were being captured mid-batch — the real timelines were empty so GSAP could not seek them, producing frozen/blank frames in the output video.
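The gating described in this commit message can be illustrated with a small sketch. The flag names __hfTimelinesBuilding and __renderReady come from the text above; HfWindowLike, makeGatedDuration, and isReady are hypothetical stand-ins for the bridge getter and the engine's poll condition, not the actual HF_BRIDGE_SCRIPT or pollHfReady code.

```typescript
// Hypothetical sketch: a plain object stands in for window.
interface HfWindowLike {
  __hfTimelinesBuilding?: boolean;
  __renderReady?: boolean;
}

// Report a duration of 0 while queued tweens are still being flushed.
function makeGatedDuration(win: HfWindowLike, realDuration: () => number): () => number {
  return () => (win.__hfTimelinesBuilding ? 0 : realDuration());
}

// Mirrors the poll condition described above: ready flag set AND duration > 0.
function isReady(win: HfWindowLike, duration: () => number): boolean {
  return win.__renderReady === true && duration() > 0;
}
```

With this shape, a poller cannot observe a positive duration mid-batch, so capture cannot begin against timelines that are still empty.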
Regression fix pushed (commit The prior push introduced visual failures on Root cause: Fix: gate the duration getter to return Note:
Problem
Compositions with thousands of GSAP timeline construction calls can block Chrome's main thread during HTML parsing. In the reported case, the page can sit at Initializing calibration session... because the event loop is starved before the render bridge and runtime can publish a stable ready state.
Closes #1231
What this fixes
This PR batches early GSAP timeline construction in the producer's injected head stub, then only lets render capture proceed once the runtime has rebound the completed timelines and published render readiness.
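As a rough picture of the "bind only after construction completes" ordering, consider the sketch below. BuildSignal and bindWhenBuilt are hypothetical names; in the browser the signal would presumably be backed by window.__hfTimelinesBuilding and a one-shot listener for the hf-timelines-built event, with bind and markReady standing in for the runtime's binding step and the __renderReady flip.

```typescript
// Hypothetical sketch of the deferral ordering; not the actual init.ts code.
interface BuildSignal {
  building: boolean;
  onBuilt(listener: () => void): void;
}

function bindWhenBuilt(signal: BuildSignal, bind: () => void, markReady: () => void): void {
  const run = (): void => {
    bind(); // bind the root timeline first
    markReady(); // only then publish render readiness
  };
  if (signal.building) {
    // Construction still draining: wait for the built notification.
    signal.onBuilt(run);
  } else {
    run();
  }
}
```

The point of the ordering is that readiness is never published before binding, whichever branch is taken.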
The latest fixes on top of the original batching change keep that batching compatible with render-time seeking and runtime child-timeline binding:
- requestAnimationFrame while the early stub drains large construction queues
- window.__renderReady instead of forcing render readiness from the bridge
- totalTime() synchronous after construction is complete
- getChildren() through the lightweight timeline proxy after flushing queued construction calls, so runtime auto-nesting still sees child timelines
Root cause
GSAP applies each tl.to()/from()/fromTo()/set() call synchronously. Large compositions can execute thousands of those calls in one parser task, which delays browser lifecycle events and makes Puppeteer's readiness polling observe an incomplete runtime state.
The first batching pass solved the main-thread starvation, but it exposed two correctness edges:
- getChildren(), so runtime root-child binding could miss child timelines after batching
Verification
Local checks
- bun run --filter @hyperframes/producer build:hf-early-stub
- bun test packages/producer/src/services/fileServer.test.ts (26 tests passed)
- bunx oxlint packages/producer/stubs/hf-early-stub.ts packages/producer/src/generated/hf-early-stub-inline.ts packages/producer/src/services/fileServer.test.ts
- bunx oxfmt --check packages/producer/stubs/hf-early-stub.ts packages/producer/src/generated/hf-early-stub-inline.ts packages/producer/src/services/fileServer.test.ts
- bun run build
Devbox regression replay
Ran the failed CI shard set on devbox against the final patch:
bun run --cwd packages/producer test style-7-prod style-8-prod style-10-prod css-spinner-render-compat webm-transparency mp4-h264-sdr webm-vp9 --sequential --keep-temp
Result: 6 active suites passed, 0 failed, 0 skipped.
webm-transparency was excluded by the harness transparency tag, matching the CI shard behavior. The original blocker style-10-prod passed with 0 failed visual frames and audio passed.
Browser verification
Validated the rendered CLI smoke output in-browser through the agent-browser flow.
- /tmp/hf-pr-1249-proof/cli-smoke-render.png
- /tmp/hf-pr-1249-proof/cli-smoke-render.webm
CI
Live PR head: 2cfe1831ba6fc2130edaa1ea3d6f2d09a1e7eda4
All CI checks are green as of the latest run, including:
Notes
Mintlify Deployment is skipped by the integration and is the only non-passing check state in the rollup. No composition HTML was changed.