fix: batch GSAP timeline construction to prevent main-thread hang (#1231) by miguel-heygen · Pull Request #1249 · heygen-com/hyperframes

miguel-heygen · 2026-06-07T03:34:09Z

Problem

Compositions with thousands of GSAP timeline construction calls can block Chrome's main thread during HTML parsing. In the reported case, the page can sit at Initializing calibration session... because the event loop is starved before the render bridge and runtime can publish a stable ready state.

Closes #1231

What this fixes

This PR batches early GSAP timeline construction in the producer's injected head stub, then only lets render capture proceed once the runtime has rebound the completed timelines and published render readiness.

The latest fixes on top of the original batching change keep that batching compatible with render-time seeking and runtime child-timeline binding:

preserve virtual-time requestAnimationFrame while the early stub drains large construction queues
gate bridge duration on window.__renderReady instead of forcing render readiness from the bridge
keep render-time timeline controls like totalTime() synchronous after construction is complete
forward getChildren() through the lightweight timeline proxy after flushing queued construction calls, so runtime auto-nesting still sees child timelines

Root cause

GSAP applies each tl.to() / from() / fromTo() / set() call synchronously. Large compositions can execute thousands of those calls in one parser task, which delays browser lifecycle events and makes Puppeteer's readiness polling observe an incomplete runtime state.

The first batching pass solved the main-thread starvation, but it exposed two correctness edges:

render-time seeks were still being routed through the construction queue after enough timeline calls, causing late caption/timeline drift in regression renders
the lightweight timeline proxy did not expose getChildren(), so runtime root-child binding could miss child timelines after batching
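
The second edge can be illustrated with a minimal sketch, assuming a queue-backed proxy: reading getChildren() first applies every queued construction call, then forwards to the real timeline. FakeTimeline, wrapWithQueue, and pendingCount are hypothetical names used only for illustration; they are not the stub's actual code.

```typescript
// Hypothetical stand-in for a GSAP timeline; only the shape needed here.
interface FakeTimeline {
  to(target: string): void;
  getChildren(): string[];
}

function makeFakeTimeline(): FakeTimeline {
  const children: string[] = [];
  return {
    to(target) {
      children.push(target);
    },
    getChildren() {
      return [...children];
    },
  };
}

interface QueuedProxy {
  to(target: string): QueuedProxy;
  getChildren(): string[];
  pendingCount(): number;
}

function wrapWithQueue(real: FakeTimeline): QueuedProxy {
  const queued: Array<() => void> = [];
  // Apply every queued construction call synchronously, in order.
  const flushAll = (): void => {
    while (queued.length > 0) {
      const task = queued.shift();
      if (task) task();
    }
  };
  const proxy: QueuedProxy = {
    to(target) {
      queued.push(() => real.to(target)); // deferred, not applied yet
      return proxy;
    },
    getChildren() {
      flushAll(); // make sure the real timeline is complete before reading it
      return real.getChildren();
    },
    pendingCount() {
      return queued.length;
    },
  };
  return proxy;
}

const sketchRealTl = makeFakeTimeline();
const sketchProxyTl = wrapWithQueue(sketchRealTl);
sketchProxyTl.to("#a").to("#b");
const pendingBeforeRead = sketchProxyTl.pendingCount();
const childrenSeen = sketchProxyTl.getChildren();
const pendingAfterRead = sketchProxyTl.pendingCount();
```

Flushing on read trades a little synchronous work for correctness: a caller that inspects children never observes a half-built timeline.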

Verification

Local checks

bun run --filter @hyperframes/producer build:hf-early-stub
bun test packages/producer/src/services/fileServer.test.ts — 26 tests passed
bunx oxlint packages/producer/stubs/hf-early-stub.ts packages/producer/src/generated/hf-early-stub-inline.ts packages/producer/src/services/fileServer.test.ts
bunx oxfmt --check packages/producer/stubs/hf-early-stub.ts packages/producer/src/generated/hf-early-stub-inline.ts packages/producer/src/services/fileServer.test.ts
bun run build
pre-commit hook passed lint, format, fallow, typecheck, and commitlint

Devbox regression replay

Ran the failed CI shard set on devbox against the final patch:

bun run --cwd packages/producer test style-7-prod style-8-prod style-10-prod css-spinner-render-compat webm-transparency mp4-h264-sdr webm-vp9 --sequential --keep-temp

Result: 6 active suites passed, 0 failed, 0 skipped. webm-transparency was excluded by the harness transparency tag, matching the CI shard behavior. The original blocker style-10-prod passed with 0 failed visual frames and audio passed.

Browser verification

Validated the rendered CLI smoke output in-browser through the agent-browser flow.

screenshot: /tmp/hf-pr-1249-proof/cli-smoke-render.png
recording: /tmp/hf-pr-1249-proof/cli-smoke-render.webm

CI

Live PR head: 2cfe1831ba6fc2130edaa1ea3d6f2d09a1e7eda4

All CI checks are green as of the latest run, including:

Build, Lint, Format, Typecheck, Test, Fallow audit
CLI smoke and global install smoke
CodeQL
Windows tests and Windows render verification
Player perf and preview regression
regression shards 1 through 8

Notes

Mintlify Deployment is skipped by the integration and is the only non-passing check state in the rollup. No composition HTML was changed.

Compositions with thousands of tl.to() calls (e.g. 8,562 in the reported case) block Chrome's main thread synchronously during HTML parsing, preventing DOMContentLoaded from firing before Puppeteer's navigation timeout. This caused render jobs to hang indefinitely at 'Initializing calibration session...' with no error message.

Root cause: GSAP's timeline API is synchronous: each tl.to() call registers a tween immediately on the main thread. A script with 8k+ calls holds the thread for seconds, starving the browser event loop and delaying DCL past the navigation timeout window.

Fix: install a property trap on window.gsap in HF_EARLY_STUB (injected at the top of <head>, before GSAP or user scripts load). When GSAP assigns itself to window.gsap, the setter intercepts the real gsap object and wraps gsap.timeline() to return a proxy that queues tween descriptors (to/from/fromTo/set) instead of calling them synchronously. A requestAnimationFrame-based flush loop drains 100 tweens per frame, yielding the main thread between batches so DCL can fire.

When the queue is drained, the stub sets window.__hfTimelinesBuilding = false and dispatches a 'hf-timelines-built' CustomEvent. init.ts checks this flag at DOMContentLoaded time; if building is still in progress it defers bindRootTimelineIfAvailable() until the event fires, then sets window.__renderReady = true as normal. pollHfReady continues to gate on both __renderReady and window.__hf.duration > 0, so the render pipeline does not start until the full timeline is bound.

- Batch size: 100 tweens/rAF tick (empirical; ~4ms/batch at 8k scale)
- Yield mechanism: requestAnimationFrame (cooperative, no setTimeout(0))
- Determinism: 'hf-timelines-built' event guarantees sequencing
- Proxy forwards: pause/seek/totalTime/time/duration/add/paused/timeScale/play delegate to the real timeline immediately
- No GSAP package changes; no navigation timeout increase

Fixes #1231
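
The flush loop described in this commit message can be sketched roughly as follows. This is a simplified illustration, not the stub's code: createBatcher and Scheduler are hypothetical names, and the scheduler is injected so the sketch runs outside a browser (in the stub it would be requestAnimationFrame, and the drained callback would clear the building flag and dispatch the event).

```typescript
type Task = () => void;
// In a browser this would be (cb) => requestAnimationFrame(cb).
type Scheduler = (cb: () => void) => void;

interface Batcher {
  enqueue(task: Task): void;
  readonly building: boolean;
}

function createBatcher(schedule: Scheduler, batchSize: number, onDrained: () => void): Batcher {
  const tasks: Task[] = [];
  let scheduled = false;
  let building = false;

  const flush = (): void => {
    scheduled = false;
    // Run at most one batch, then yield back to the host.
    const batch = tasks.splice(0, batchSize);
    for (const task of batch) task();
    if (tasks.length > 0) {
      scheduled = true;
      schedule(flush);
    } else {
      building = false;
      onDrained(); // e.g. dispatch a "built" event here
    }
  };

  return {
    enqueue(task) {
      tasks.push(task);
      building = true;
      if (!scheduled) {
        scheduled = true;
        schedule(flush);
      }
    },
    get building() {
      return building;
    },
  };
}

// Drive the batcher with a manual scheduler so each "frame" is explicit.
const frameCallbacks: Array<() => void> = [];
let drainedEvents = 0;
let appliedTweens = 0;
const demoBatcher = createBatcher((cb) => { frameCallbacks.push(cb); }, 100, () => { drainedEvents += 1; });
for (let i = 0; i < 250; i++) demoBatcher.enqueue(() => { appliedTweens += 1; });

let framesUsed = 0;
while (frameCallbacks.length > 0) {
  const cb = frameCallbacks.shift();
  if (cb) cb();
  framesUsed += 1;
}
```

With 250 queued calls and a batch size of 100, the queue drains over three scheduled frames and the drained callback fires once.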

jrusso1020

Reviewed against the design checklist I sketched in the Slack thread. Architecture is sound; surfacing two correctness concerns that the PR's own description doesn't address + a couple of nice-to-haves. None look blocking for the customer-unblocking purpose of the PR; flagging them as follow-up scope.

Strengths

Property-trap-on-window.gsap is exactly the right interception layer. Captures every call regardless of whether the user code goes through gsap.timeline() directly or via UMD's window.gsap assignment. The fact that GSAP isn't loaded yet when the stub runs is handled correctly via the configurable getter/setter.
Explicit hf-timelines-built event + init.ts deferral of bindRootTimelineIfAvailable() is the correct determinism model. The renderer can't race ahead — pollHfReady gates on __renderReady which only flips after the event fires. ✓
Build pipeline mirrors @hyperframes/core's runtime-inline.ts pattern — compiled stub source in stubs/hf-early-stub.ts, esbuild → IIFE → generated TS module exporting getHfEarlyStub(). The previous 138-line inline JS string is gone, which is a maintainability win.
Defensive try/catch around defineProperty + CustomEvent — handles non-Chrome runtimes (tests, jsdom) gracefully.
Batch size = 100/tick, rAF yield matches the design recommendation. Empirical "~4ms/batch" claim is consistent with GSAP's tween-registration cost on modern V8.
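
For readers unfamiliar with the interception layer praised above, here is a minimal sketch of a property trap, assuming a plain object in place of window so it runs outside a browser. FakeGsap, installTrap, and fakeWindow are hypothetical names; the real stub wraps far more than this.

```typescript
// Hypothetical library shape; a real GSAP object has many more members.
interface FakeGsap {
  timeline(): { kind: string };
}

// `host` stands in for `window`. Returns false if the trap cannot be installed.
function installTrap(
  host: Record<string, unknown>,
  key: string,
  wrap: (lib: FakeGsap) => FakeGsap,
): boolean {
  let current: FakeGsap | undefined;
  try {
    Object.defineProperty(host, key, {
      configurable: true,
      enumerable: true,
      get: () => current,
      // Runs when the library script assigns itself to the global.
      set: (value: FakeGsap) => {
        current = wrap(value);
      },
    });
    return true;
  } catch {
    // For example, the property already exists and is non-configurable.
    return false;
  }
}

const fakeWindow: Record<string, unknown> = {};
const trapInstalled = installTrap(fakeWindow, "gsap", (lib) => ({
  ...lib,
  timeline: () => ({ kind: "proxy:" + lib.timeline().kind }),
}));

// Later, the library assigns itself to the global, as a UMD build would.
fakeWindow["gsap"] = { timeline: () => ({ kind: "real" }) };
const trappedKind = (fakeWindow["gsap"] as FakeGsap).timeline().kind;
```

Because the trap is installed before the library loads, user code that reads the global afterwards only ever sees the wrapped object.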

Concerns

1. `proxy.add()` doesn't unwrap proxy children — passes proxy objects to `real.add()`

The HF runtime in packages/core/src/runtime/init.ts:574-580 does:

const compositeTimeline = gsapApi.timeline({ paused: true });  // PROXY
for (const candidate of candidates) {
  compositeTimeline.add(candidate.timeline, ...);  // candidate.timeline is also a PROXY
}

And similar at :590-593 (fallbackTimeline.add(existingRootTimeline, 0)) and :633 (rootTimeline.add(candidate.timeline, startSec)).

Because user composition scripts get proxies from gsap.timeline(), and the runtime gets proxies from gsapApi.timeline() too, every .add() call composes proxy-into-proxy. The proxy's add() forwards args to real.add(...args) without unwrapping arg.__hfReal, so GSAP's real timeline ends up holding proxy references in its internal tween-graph linked list (_first/_next/_prev).

Empirically this likely works in your 8,562-tween test (the proxy's __hfReal has its tweens by the time bindRootTimelineIfAvailable fires, and proxy.duration() / proxy.seek() / proxy.totalTime() all forward correctly). But GSAP's internal iteration paths (e.g. getChildren(), internal label resolution, time-mapping) may misbehave on a proxy that lacks _dp / _first / _recent linkage in the way GSAP expects.

Suggested fix (one-liner in wrapTimeline()'s add method):

add(...args: unknown[]): TimelineProxy {
  const unwrapped = args.map((a) =>
    a && typeof a === "object" && "__hfReal" in (a as object)
      ? (a as TimelineProxy).__hfReal
      : a,
  );
  real.add(...unwrapped);
  return proxy;
},

This routes the real child timeline into GSAP's tween graph. Caller still gets proxy back for chaining.

If your test composition doesn't exercise the multi-sub-comp path (__timelines-registered children composed via init.ts:574), this bug stays dormant. Worth verifying explicitly before merging — pick a composition with 2+ sub-comps and check getChildren() returns sensible objects.

2. Value-returning setter methods leak the real timeline out of the proxy chain

totalTime(...args: unknown[]): unknown {
  return real.totalTime(...args);
},

GSAP's tl.totalTime(5) (setter form) returns this (the timeline) for chaining; tl.totalTime() (getter form) returns a number. The proxy unconditionally returns whatever GSAP returns. So a caller chaining tl.totalTime(5).to(el, {...}) gets the real timeline from .totalTime(5), then .to(...) runs synchronously against real — bypassing the batching.

Same applies to time(), paused(), timeScale(). The pattern is: when called with args (setter form), return proxy; when called without args (getter form), return the value.

totalTime(...args: unknown[]): unknown {
  const result = real.totalTime(...args);
  return args.length > 0 ? proxy : result;
},

Lower-impact than #1 (most callers don't chain past these), but architecturally cleaner.
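
Since the same getter/setter pattern repeats across several methods, it could be factored into one helper. The following is only a sketch under that assumption: chainable, Accessor, and makeRealTimeline are hypothetical names, and the fake timeline merely mimics the getter/setter convention described above.

```typescript
type Accessor = (...args: number[]) => unknown;

// Setter form (any args) returns the chain target; getter form returns the value.
function chainable(method: Accessor, chainTarget: () => unknown): Accessor {
  return (...args) => {
    const result = method(...args);
    return args.length > 0 ? chainTarget() : result;
  };
}

// Hypothetical stand-in for a real timeline with getter/setter accessors.
function makeRealTimeline(): { totalTime: Accessor; timeScale: Accessor } {
  let total = 0;
  let scale = 1;
  const real = {
    totalTime(...args: number[]): unknown {
      if (args.length === 0) return total;
      total = args[0] ?? total;
      return real; // setter form returns the timeline itself
    },
    timeScale(...args: number[]): unknown {
      if (args.length === 0) return scale;
      scale = args[0] ?? scale;
      return real;
    },
  };
  return real;
}

interface AccessorProxy {
  totalTime: Accessor;
  timeScale: Accessor;
}

const realAccessors = makeRealTimeline();
const accessorProxy: AccessorProxy = {
  totalTime: chainable((...a) => realAccessors.totalTime(...a), () => accessorProxy),
  timeScale: chainable((...a) => realAccessors.timeScale(...a), () => accessorProxy),
};

const setterResult = accessorProxy.totalTime(5); // the proxy, so chaining stays batched
const getterResult = accessorProxy.totalTime(); // the stored value
```

The helper keeps the rule in one place, so adding another accessor cannot accidentally reintroduce the leak.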

Nice-to-haves (not blockers)

No unit tests for the batching logic. The proxy's chain semantics, the rAF flush loop, and the hf-timelines-built event dispatch all have correctness traps (above). One Vitest covering "queue 200 tweens, advance rAF, verify all bound + event fires" would lock in the contract.
No telemetry hook for tweenCount + initDurationMs. This investigation thread (PR description references #1231) hit a diagnostic wall because PostHog render_error events lack tween-count visibility. Even a single console.log({ event: "hf_timeline_batching_done", tweenCount, initDurationMs }) line at the queue-drained moment would let us spot future pathological compositions before they file issues.
kill() doesn't remove the proxy from activeProxies: small memory growth across long sessions. Probably never matters in render-per-process mode; matters for the studio preview path if a session creates+kills many timelines.
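
The kill() cleanup in the last item might look like the sketch below. It assumes activeProxies is a Set, which the PR does not confirm; TrackedProxy and createTrackedProxy are hypothetical names.

```typescript
interface TrackedProxy {
  kill(): void;
}

// Assumption: the registry is a Set. The PR only names `activeProxies`.
const activeProxies = new Set<TrackedProxy>();

function createTrackedProxy(realKill: () => void): TrackedProxy {
  const proxy: TrackedProxy = {
    kill() {
      realKill(); // forward to the real timeline first
      activeProxies.delete(proxy); // then drop the registry reference
    },
  };
  activeProxies.add(proxy);
  return proxy;
}

let realKillCalls = 0;
const trackedA = createTrackedProxy(() => { realKillCalls += 1; });
const trackedB = createTrackedProxy(() => { realKillCalls += 1; });
trackedA.kill();
```

After trackedA.kill(), only trackedB remains registered, so a long-lived preview session would not accumulate dead proxies.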

Render-mode latency math

For the 8,562-tween case: 86 batches × ~16ms rAF gaps = ~1.4s additional render init time vs. the "do it all synchronously" baseline. For normal compositions (<100 tweens, single batch), the cost is one rAF gap ≈ 16ms. Acceptable trade for going from infinite hang → working render on the pathological case + negligible overhead on the common case.
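
The estimate above can be reproduced with a short calculation. This sketch assumes one batch per frame and a fixed 16 ms frame gap (roughly 60 Hz); estimateBatchingDelayMs is a hypothetical helper, and real rAF timing, especially under virtual time, may differ.

```typescript
// Rough added init time: one frame gap per batch, at least one batch.
function estimateBatchingDelayMs(tweenCount: number, batchSize = 100, frameMs = 16): number {
  const batches = Math.max(1, Math.ceil(tweenCount / batchSize));
  return batches * frameMs;
}

const pathologicalMs = estimateBatchingDelayMs(8562); // 86 batches * 16 ms = 1376 ms
const typicalMs = estimateBatchingDelayMs(40); // 1 batch * 16 ms = 16 ms
```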

Verdict

Sound architecture, correctly addresses the customer issue, and the build/inject pipeline is well-engineered. The two correctness concerns above (proxy-unwrap + setter-chain) are latent risks that may not bite in the test composition but could bite multi-sub-comp compositions. If concern #1 is verified to be a non-issue (or fixed), I'd say merge. If it does trip a real composition, it's a 5-line fix in wrapTimeline().

Posting as COMMENT so this doesn't block the customer-unblocking merge. Happy to follow up with the fix as a separate PR if you'd like.

— Rames Jusso

…args.length

Addresses two latent correctness concerns from code review:

1. proxy.add() now unwraps __hfReal from any proxy child before passing it to the real timeline. GSAP's internal tween graph (_first/_next/_prev linkage) requires real timeline instances; proxy objects lack internal fields like _dp that GSAP's iteration paths expect.

2. totalTime/time/paused/timeScale now return proxy when called in setter form (args.length > 0). Previously these returned the real timeline, causing callers who chain .to(...) after a setter call to bypass batching.

Also: build-hf-early-stub.ts now runs oxfmt on the generated output file so the format check passes in CI on every build.

miguel-heygen · 2026-06-07T04:13:36Z

Code review response

Both correctness concerns from the review have been addressed inline (commit de813adc):

1. proxy.add() now unwraps proxy children before passing them to the real timeline. Any argument that carries __hfReal is unwrapped so GSAP's internal tween graph holds real timeline references — proxy objects missing _dp etc. would have caused GSAP's internal iteration paths to misbehave with multi-sub-comp compositions.

2. Setter-form methods (totalTime/time/paused/timeScale) now return proxy when called with arguments (args.length > 0). Previously they returned the real timeline, leaking callers out of the batching chain if they chained .to(...) after a setter call (e.g. tl.totalTime(5).to(...)). Getter form (no args) still returns the raw value.

On the nice-to-haves:

Unit tests for batching logic — agreed, would be a solid follow-up. The stub runs in a browser context (depends on window, requestAnimationFrame, CustomEvent) so it needs a jsdom/happy-dom harness; out of scope for this fix PR but worth a dedicated issue.
Telemetry hook (tweenCount + initDurationMs) — also a good follow-up. Would close the diagnostic gap and give us early warning before a customer hits the hang again.
kill() not removing from activeProxies — acknowledged. The memory growth is bounded (proxies are short-lived per render), but cleaning up is cleaner. Tagged as follow-up.

Render-mode math matches — 8,562 / 100 × 16ms ≈ 1.4s is acceptable for the pathological case; normal compositions don't see it.

The HF_BRIDGE_SCRIPT duration getter now returns 0 whenever window.__hfTimelinesBuilding is true (set by HF_EARLY_STUB while the rAF batch loop is draining queued tl.to() calls). pollHfReady in the engine polls until window.__hf.duration > 0, so returning 0 keeps the engine waiting until the hf-timelines-built event fires and all tweens are committed to the real GSAP timelines. Without this gate, normal compositions (style-6, style-13, vignelli) were being captured mid-batch — the real timelines were empty so GSAP could not seek them, producing frozen/blank frames in the output video.

miguel-heygen · 2026-06-07T05:59:28Z

Regression fix pushed (commit 1b3e1a4b)

The prior push introduced visual failures on style-6-prod, style-13-prod, and vignelli-stacking — all three rendered blank/frozen frames during sections that should show GSAP-animated content.

Root cause: HF_BRIDGE_SCRIPT's __hf.duration getter was returning the real timeline's duration (via p.getDuration()) even while __hfTimelinesBuilding was true. The engine's pollHfReady condition is window.__hf.duration > 0, so it immediately passed — but the real GSAP timelines were still empty mid-batch. Frame capture started against empty timelines → animations frozen.

Fix: gate the duration getter to return 0 while window.__hfTimelinesBuilding is true. This keeps pollHfReady spinning until hf-timelines-built fires and all tweens are committed to the real GSAP timelines. One-line change in HF_BRIDGE_SCRIPT.
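
A minimal sketch of the gate described here, assuming the flags live on a host object (window in the browser). createBridge, isReadyToCapture, and BridgeHost are hypothetical names; the real bridge script and pollHfReady are not shown in this thread, and the 12.5 second duration is an arbitrary example value.

```typescript
interface BridgeHost {
  __hfTimelinesBuilding?: boolean;
  __renderReady?: boolean;
}

function createBridge(host: BridgeHost, getRealDuration: () => number) {
  return {
    // Report 0 while batching is in progress so pollers keep waiting.
    get duration(): number {
      return host.__hfTimelinesBuilding === true ? 0 : getRealDuration();
    },
  };
}

// Mirrors the polling condition described in the thread: ready flag plus positive duration.
function isReadyToCapture(host: BridgeHost, bridge: { readonly duration: number }): boolean {
  return host.__renderReady === true && bridge.duration > 0;
}

const bridgeHost: BridgeHost = { __hfTimelinesBuilding: true, __renderReady: false };
const sketchBridge = createBridge(bridgeHost, () => 12.5);

const durationWhileBuilding = sketchBridge.duration;
const readyWhileBuilding = isReadyToCapture(bridgeHost, sketchBridge);

// Simulate the queue draining and the runtime publishing readiness.
bridgeHost.__hfTimelinesBuilding = false;
bridgeHost.__renderReady = true;

const durationAfterBuild = sketchBridge.duration;
const readyAfterBuild = isReadyToCapture(bridgeHost, sketchBridge);
```

Using a getter rather than a stored value means the poller sees the real duration as soon as the flag clears, with no extra synchronization step.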

Note: gsap-letters-render-compat was already passing in CI — the fix is confirmed not to break that test, and the pollHfReady wait overhead for normal compositions (< 100 tweens) is a single rAF frame (~16ms).

miguel-heygen force-pushed the fix/gsap-tween-count-hang branch from b20c5eb to 0d725fd on June 7, 2026 03:59

sarichan777 mentioned this pull request Jun 7, 2026

v0.6.74 regression: CLI render hangs on "Initializing calibration session..." (macOS M4) (edge case) #1231

Closed

style: apply oxfmt formatting to producer stub files

63debf4

jrusso1020 reviewed Jun 7, 2026

github-advanced-security AI found potential problems Jun 7, 2026

Comment thread packages/producer/scripts/build-hf-early-stub.ts

// Format the generated file so `oxfmt --check` passes in CI.
// Errors are intentionally swallowed — oxfmt unavailable in some envs.
try {
  execSync(`bunx oxfmt ${outPath}`, { stdio: "ignore" });

miguel-heygen added 3 commits June 7, 2026 02:20

fix(producer): flush GSAP batching under virtual time

78fb936

fix(producer): gate render bridge on runtime readiness

758ad7b

fix(producer): preserve timeline child binding under batching

2cfe183

miguel-heygen merged commit ebd156b into main Jun 7, 2026
63 checks passed

miguel-heygen deleted the fix/gsap-tween-count-hang branch June 7, 2026 13:31

miguel-heygen merged 7 commits into main from fix/gsap-tween-count-hang
