fix(engine): fast-fail on zero duration instead of 45s timeout#1186
Conversation
When a composition's runtime finishes initializing but reports zero duration (no GSAP timeline and no data-duration attribute), the engine previously polled for the full 45-second timeout before failing. Now, after 10 seconds of polling, the engine checks whether the runtime has finished (window.__renderReady === true) with a working seek function but zero duration. If so, it fails immediately with a diagnostic message explaining what's wrong and how to fix it. This also improves the generic timeout error message to include runtime state (whether __player exists, __hf.seek, GSAP timelines, declared duration) so users can self-diagnose. PostHog data: 555-1,234 occurrences/day, each wasting 45s of user time.
jrusso1020
left a comment
There was a problem hiding this comment.
Review — Fast-fail on permanent zero-duration with diagnostic
Good shape. Replaces two near-identical inline poll loops with a single pollHfReady helper that fast-fails after 10s when the runtime has announced it's ready with a permanently zero duration. The diagnostic message is the strongest part — it tells the user exactly what attribute is missing and how to fix it.
What I verified
- Fast-fail guard is correctly narrow. The 10s early-exit fires only when
diag.renderReady && diag.hasSeek && diag.duration === 0. That triplet is the "the runtime finished initializing and is reporting zero" signal — not a "still loading, give it more time" signal. Legitimate slow-init compositions (large asset preload, deferred font loading, Three.js bootstrap) still get the fullplayerReadyTimeoutbudget because they don't haverenderReady=trueyet at 10s. ✓ - No regression on legitimate low-duration compositions.
duration > 0always passes the ready expression (window.__hf.duration > 0). Onlyduration === 0triggers the diagnostic path. A 1s composition withdata-duration="1"still resolves cleanly. ✓ - Diagnostic message is actionable. The hint chain ("
window.__playerwas never set" → "No GSAP timeline registered" → "Adddata-durationto root") walks the user from generic failure to specific fix. The PostHog data (555-1234 occurrences/day, #1 render error) suggests this is hitting the no-data-durationmistake; the diagnostic literally tells them which attribute to add. ✓ - Helper consolidation eliminates duplicate poll code. The previous BeginFrame branch (lines 766-779 in the old file) had a hand-rolled poll loop that subtly differed from the non-BeginFrame path. Both now route through
pollHfReady, so future contract changes touch one place. ✓ - Error wrapping in BeginFrame path is correct. The
try { await pollHfReady(...) } catch (err) { warmupState.running = false; throw err; }block (lines 766-770) preserves thewarmupState.running = falsecleanup that the old inline code did before throwing. ✓
Tiny ask, optional
- The diagnostic JS at
HF_READY_DIAGNOSTIC_EXPRreaches intowindow.__player,window.__renderReady,window.__timelines,window.__hf, anddocument.querySelector("[data-composition-id]"). Worth a comment block above that constant documenting which of those are contract (load-bearing for the runtime) vs diagnostic (read for error-message detail but not actually required to render). The next person debugging an unrelated startup issue will read this expression and wonder which fields are intentional handshake points. - The
FAST_FAIL_AFTER_MS = 10_000could live alongsideDEFAULT_CONFIGas a named config entry rather than a function-local const, so an integrator could lower it for tests or raise it for slow-CI environments. Non-blocking; the hardcoded 10s is reasonable for now.
No blockers. Defensive, low-risk, observability-data-driven. Ship.
Review by Jerrai (hyperframes specialist)
vanceingalls
left a comment
There was a problem hiding this comment.
Review
Good fix for a real, high-volume problem — the PostHog numbers and the actionable error message both land well. A few things need attention before merge.
Important: __renderReady is not a strong enough invariant to guarantee zero-duration is permanent
The fast-fail condition (renderReady && hasSeek && duration === 0) assumes that once the runtime finishes initializing, if __hf.duration is still 0 it will never become non-zero. That isn't quite right in BeginFrame mode.
__renderReady = true is set in init.ts (line 1544) before state.transportRafId = window.requestAnimationFrame(transportTick) (line 2026). The transport tick — which calls bindRootTimelineIfAvailable() and then clock.setDuration(dur) — runs on rAF. In BeginFrame mode, rAF only fires when the warmup loop issues frames. At the moment __renderReady is first seen as true inside the 10s polling window, it's possible the transport tick hasn't yet had a chance to bind the GSAP timeline and update the clock duration.
In practice this is unlikely to bite — the warmup loop runs at ~30fps so by 10s the tick has had ~300 opportunities — but the invariant is stated more strongly in the diagnostic message ("The runtime finished initializing but reported zero duration — this is permanent") than the code can actually guarantee. The correct guard is "renderReady AND hasSeek AND hasTimeline=false AND declaredDuration <= 0", not the current form. That would fast-fail only when there is provably nothing to wire up, rather than relying on a temporal assumption about the transport tick.
Alternatively, add a short stabilization delay (e.g., 500ms) after first seeing renderReady=true before declaring the zero-duration permanent — this gives the transport tick time to run.
Important: evaluateHfDiagnostic runs on every 100ms tick after 10s
After the 10-second mark, the loop calls evaluateHfDiagnostic(page) on every iteration before sleeping. For a composition that genuinely takes 11–44 seconds to initialize (e.g., heavy sub-composition with slow CDN, high-complexity GSAP timeline), this adds ~350 extra page.evaluate calls (2 per 100ms tick × 35s). That's cheap individually, but at 555–1234 failures/day multiplied across renders this is non-trivial CDP overhead.
The simple fix: only re-check the diagnostic every ~1s after the 10s mark (track a lastDiagCheckAt timestamp), or short-circuit once diag.renderReady is confirmed false the first time (if the runtime still isn't ready after 10s, it won't be ready in another 100ms — wait longer before re-querying).
Nit: elapsed calculation is unnecessarily indirect
const elapsed = timeoutMs - (deadline - Date.now());deadline was captured as Date.now() + timeoutMs, so this simplifies to Date.now() - startTime. Capture const startedAt = Date.now() before the loop and use that directly — the current form requires the reader to reverse-engineer the arithmetic.
Nit: hint ordering produces contradictory output
buildZeroDurationDiagnostic can emit both:
"window.__player was never set — the HyperFrames runtime did not initialize.""The runtime finished initializing but reported zero duration — this is permanent."
...in the same diagnostic if hasPlayer=false && hasSeek=true && duration===0 && renderReady=true. These two hints directly contradict each other. Gate the "runtime finished initializing" hint behind diag.hasPlayer being true.
Nit: test coverage gap
buildZeroDurationDiagnostic is a pure function with six boolean combinations driving four distinct hint paths, and it's fixing the #1 render error by volume. There's no test coverage for it. The PR's test plan has two unchecked boxes. pollHfReady's fast-fail logic is also not unit-tested. The engine package has a pattern for this (frameCapture-namePolyfill.test.ts). These don't need to be full Puppeteer tests — the diagnostic builder is pure, and the fast-fail logic can be exercised with a mock page.
Minor: fallow suppression scope
The fallow-ignore-file complexity at the top suppresses complexity checks for the entire 800+ line file. The fallow-ignore-next-line complexity on buildZeroDurationDiagnostic would have been sufficient to cover the new code — the file-level suppression is broader than needed.
CI
All required checks pass (CI: Build, Lint, Typecheck, Test, CLI smoke, Format; CodeQL; Windows verification). Regression shards are in-progress at review time — worth confirming they complete clean.
— Vai
miguel-heygen
left a comment
There was a problem hiding this comment.
Root cause confirmed: DEFAULT_CONFIG.playerReadyTimeout is 45s, both polling paths in frameCapture.ts wait for window.__hf.duration > 0, and with no GSAP timeline + no data-duration attribute both getDuration() and getDeclaredDuration() return 0 permanently. The __renderReady flag fires unconditionally after bindRootTimelineIfAvailable() regardless of timeline found — so renderReady=true, hasSeek=true, duration=0 is a stable terminal state.
On Vai's rAF timing concern: worth clarifying. The fast-fail only triggers after 10s in the poll loop, not synchronously. By 10s, any legitimate composition would have had its timeline registered and duration set via rAF ticks — even BeginFrame mode would have processed many frames by then. A composition still at duration=0 after 10s is almost certainly broken, not warming up.
One edge case: GSAP loaded from an external CDN (unusual in render envs but theoretically possible) could still be in-flight at 10s, triggering the fast-fail with a misleading 'add data-duration' message. Rare enough not to block.
The CDP round-trip frequency for evaluateHfDiagnostic is worth a follow-up throttle — 350 calls on a failed render is noisy. Not a merge blocker.
LGTM.
…-fail Two nit fixes in pollHfReady: 1. Throttle evaluateHfDiagnostic calls to once per ~1000ms after the 10s mark. Previously called on every 100ms loop tick, generating ~350 CDP round-trips per failed render. One check per second is sufficient to detect a permanently-zero composition. 2. Change fast-fail condition from 'duration === 0' to '!hasTimeline && declaredDuration <= 0'. A composition with a GSAP timeline but no data-duration attribute should not be fast-failed — GSAP sets duration synchronously before __renderReady via __timelines, so a non-empty __timelines is a reliable signal that duration will eventually be non-zero. Only compositions with NEITHER a GSAP timeline NOR a declared duration are permanently zero.
jrusso1020
left a comment
There was a problem hiding this comment.
Re-review — commit 0739a91df0
Both Vai's importants addressed. ✓
Verified:
- Diagnostic throttled to ~1s cadence via
DIAGNOSTIC_INTERVAL_MS = 1_000+lastDiagnosticAtgate. For a 45splayerReadyTimeout, that drops the failed-render diagnostic load from ~350 CDP roundtrips to ~35. ✓ - Fast-fail guard tightened from
diag.duration === 0to!diag.hasTimeline && diag.declaredDuration <= 0. This is the stronger "permanent zero" signal — a GSAP composition withoutdata-durationis still valid (GSAP drives duration via__timelines), so the old guard would have false-positived on it. ✓ - The comment block above the new guard explains the rationale ("GSAP sets duration synchronously before
__renderReady") — matches what I traced ininit.ts:1645-1654. ✓ - Final-timeout path (after
whileloop exits) still uses the olddiag.hasSeek && diag.duration === 0condition for the rich error message — that's fine; by that point we're past the timeout and just selecting between rich vs. generic error text. ✓
Ready from my side, pending stamp authorization.
Review by Jerrai (hyperframes specialist)
Summary
pollHfReadythat fast-fails after 10 seconds when it detects the runtime has finished but duration is permanently zeroProblem
When a composition has no GSAP timeline and no
data-durationattribute,window.__hf.durationstays at 0. The engine polled forduration > 0for the full 45-second timeout before giving up with a generic error message. Every occurrence wasted 45 seconds of wall-clock time with no actionable feedback.Reproduction
Root cause
The readiness condition
window.__hf.duration > 0has five failure modes:data-duration(CSS-only, WAAPI, Lottie compositions)DOMContentLoadednever fires in BeginFrame mode (blocking scripts)window.__playerAll five produce the same symptom: duration stays at 0 indefinitely.
Fix
After 10 seconds of polling, the engine evaluates a diagnostic expression that checks
window.__renderReady,window.__player,window.__timelines, anddata-duration. If the runtime has finished initializing (renderReady=true) but duration is still zero, it fails immediately with a diagnostic message.Test plan
bun run build)add data-duration="<seconds>")data-durationrender correctly