Skip to content

fix(runtime): silent-first-play + loading overlay for preview#293

Merged
miguel-heygen merged 1 commit intomainfrom
fix/audio-sync-play-on-gesture
Apr 16, 2026
Merged

fix(runtime): silent-first-play + loading overlay for preview#293
miguel-heygen merged 1 commit intomainfrom
fix/audio-sync-play-on-gesture

Conversation

@miguel-heygen
Copy link
Copy Markdown
Collaborator

@miguel-heygen miguel-heygen commented Apr 16, 2026

Summary

Fixes three audio-sync defects in the studio preview plus a small UX improvement. All four land in one commit so the PR stays aligned with one bug fix per commit.

1. Silent / very-late first play on slow-loading audio (packages/core/src/runtime/media.ts)

syncRuntimeMedia's old flow — when it hit readyState < HAVE_FUTURE_DATA — called el.load() and attached a canplay listener to retry play(). Two real problems, neither of which is "lost user activation" (the sync runs from a 50 ms setInterval, well outside any gesture window):

  • bindMediaMetadataListeners already sets preload="auto" and calls el.load() at runtime init. The sync's duplicate el.load() aborts that in-flight fetch and restarts from zero — on slow networks this delayed playback by seconds, which users perceived as "silent until a second click."
  • The canplay listener was racy: the event can fire between load() and addEventListener, leaving the element wedged.

HTMLMediaElement.play() is already spec'd to queue playback until data arrives, so we can unconditionally call it. Drop the readyState gate, the redundant load(), and the canplay listener. Also dedup in-flight play() calls with a WeakSet (cleared on playing/pause/error) — without it the 50 ms poll fires 20–40 spurious calls per element during buffer, each silencing real AbortError/NotAllowedError diagnostics in the .catch.

2. Audible stutter on rapid pause/play (packages/core/src/runtime/media.ts)

The 0.3 s drift-seek threshold fired on nearly every toggle because pause/play ordering between timeline and media produces 0.1–0.4 s of transient drift. Each forced el.currentTime = relTime drops readyState and surfaces as a waiting event the user hears as a stutter. Threshold raised to 0.5 s.

3. Skipped words on cold first play (packages/core/src/runtime/media.ts)

Even with 0.5 s, drift grew past 0.5 s during initial buffering while the audio element was stuck at currentTime = 0. The old logic would then force-seek audio forward and the user missed the opening of the narration.

Fix distinguishes drift that grows gradually (buffer catch-up, ~16 ms/tick) from drift that jumps in one tick (a scrub). Only jumps, first-tick clip activation, or catastrophic drift (>3 s) trigger a resync. Inline tradeoff note in code: strictly lip-synced dialogue would want a tighter threshold (~0.15 s) outside a 500 ms toggle window — deferred to a future PR.

4. "Loading assets…" overlay in the studio preview (packages/studio/src/player/components/Player.tsx)

Spinner while every timed <audio>/<video> has enough buffered data and every Lottie animation is loaded. Preserves the previous overlay state on cross-origin / transient-DOM catches so a brief access failure doesn't flicker, and logs console.debug when the 10 s safety cap trips so a stuck asset is diagnosable. Lottie readiness handles both lottie-web (isLoaded) and @dotlottie/player-component (totalFrames > 0), with an inline @see pointing to packages/core/src/runtime/adapters/lottie.ts so the two sites stay in sync.

Verification

  • 456 core tests pass; 34 in media.test.ts cover synchronous play, preload nudge, play-request dedup, offset-jump vs gradual drift, first-tick hard-sync, catastrophic-drift safety valve, and inactive-clip baseline reset.
  • Full monorepo build green (bun run build), typecheck clean, lint/format clean.
  • End-to-end with agent-browser against a composition that uses a 50 s voiceover plus multiple sub-composition video clips. Four scenarios, all pass:
Scenario Metric Result
Normal first play Audio plays from click, smooth progression
Cold play (forced unbuffered) First play event fires at ct: 0 — no word-skip
Rapid pause/play (12 toggles) waiting events: 1 (was 40+ bursts)
Scrub mid-playback Lands exactly at target frame

Files changed

  • packages/core/src/runtime/media.ts — unconditional synchronous play(); play-request dedup WeakSet; offset-jump-only drift correction; 0.5 s threshold; first-tick hard-sync; catastrophic-drift safety valve.
  • packages/core/src/runtime/media.test.ts — coverage for the above plus the gradual-drift cold-play case, scrub offset-jump, in-flight dedup, and inactive-clip baseline reset.
  • packages/core/src/runtime/adapters/lottie.ts — exported isLottieAnimationLoaded helper documenting the two supported player shapes.
  • packages/studio/src/player/components/Player.tsx — loading-assets overlay with cached-return catch, timeout debug log, and the Lottie readiness check.

Follow-ups (deferred)

  • Tight-threshold short-window drift correction for lip-synced dialogue.
  • A perf-regression test that fails on waiting-event resurgence.

Test plan

  • hyperframes preview a composition with audio, Cmd+Shift+R, click play immediately — audio starts from the very beginning, no skipped words.
  • Rapidly pause/play the preview — audio stays smooth (no stutter, no waiting events).
  • Cold-load a composition — "Loading assets…" overlay appears and disappears once media buffers.
  • Scrub the timeline mid-playback — audio follows the scrub, lands on frame.

@miguel-heygen miguel-heygen force-pushed the fix/audio-sync-play-on-gesture branch from 6d414c3 to e83ce99 Compare April 16, 2026 19:11
Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read through the PR, the runtime's play chain (player.ts, bridge.ts, init.ts), and the Lottie adapter. Staff-engineer review below.

Overall

Fix is correct and the diff is clean — net simpler than what it replaces. But the commit-message narrative for fix #1 doesn't match the codebase's actual control flow, and a couple of smaller concerns are worth addressing before merge.

Issue 1 — "user gesture" justification is wrong (docs, not code)

The premise "play() is now called synchronously inside the user-gesture call chain" doesn't hold for studio preview. Control flow is:

  • Studio click handler → contentWindow.postMessage({action:"play"}) (hyperframes-player.ts:252-260)
  • Runtime bridge.ts handler runs in a separate taskplayer.play() (no media sync)
  • Media el.play() happens from the 50 ms setInterval poll at init.ts:1540-1542

postMessage doesn't propagate transient user activation by default, and even if it did, the 50 ms poll executes in a subsequent task — the activation is long gone. The real reasons the old code misbehaved are more mundane and the PR should say so:

  1. bindMediaMetadataListeners already sets preload="auto" + calls load() at registration (init.ts:1171-1179). The duplicate el.load() in the old path aborted an in-flight fetch and restarted it, delaying playback by seconds on slow networks.
  2. The readyState < HAVE_FUTURE_DATA gate meant first play() was always deferred to a canplay listener — but that listener could already have fired before it was attached, leaving the element wedged.

Unconditional el.play() works because HTMLMediaElement.play() is spec'd to queue playback and resolve once data arrives. Please fix the commit body / PR description — misleading WHYs age badly and mislead the next reader.

Issue 2 — no dedup on repeated play() calls

Old code used pendingPlay: WeakSet to avoid re-calling play each tick while buffering. New code drops it. With a 50 ms poll and a 1–2 s buffer, that's 20–40 spurious play() calls (+ 20–40 preload=auto writes) per unbuffered element. Browsers coalesce, so nothing breaks, but it noises up devtools and, more importantly, the .catch(() => {}) now silently swallows real AbortError / NotAllowedError bursts you'd otherwise want to see. Cheap fix: track a local playRequested flag per element (WeakSet), clear on playing/pause/error. Or gate on el.played.length === 0 && el.currentTime === relTime to avoid re-firing.

Issue 3 — 0.5 s drift threshold affects steady-state, not just transitions

The justification ("0.1–0.4 s transient drift from pause/play ordering") is only about the toggle window. Raising the threshold globally also widens tolerance during normal playback — a GC pause, tab-backgrounded resume, or a large synchronous adapter tick could now sit at 400 ms of A/V drift without correction. For music-video / motion-graphics content 500 ms is invisible; for lip-synced dialogue it's not. A targeted fix would be: only use the looser threshold within ~500 ms of a pause/play transition (track lastToggleAtMs). Acceptable to merge as-is given the primary use case, but please note it as known tradeoff in a comment above the 0.5 constant — future readers will want to tune this.

Issue 4 — loading overlay: bounded-poll edge cases

  • 10 s cap (attempts > 100) silently hides the overlay even when assets are still loading. For long videos / slow networks the user sees no signal and the runtime remains "queued play". Fine as a soft hint, but worth a console.debug on timeout so it's diagnosable.
  • catch { return false } treats any same-origin TypeError as "ready". Safer is to return the previous value (cached in a ref) so a transient DOM race doesn't flicker the overlay.
  • Lottie check relies on anim.isLoaded. lottie-web's AnimationItem has it; @dotlottie/player-component uses a different surface (getState() === "ready" / .currentState). Mixed compositions may hang the overlay until the 10 s cap. Either export a single isAnimationLoaded(anim) helper from the lottie adapter (which already knows both shapes) and reuse it here, or document the contract in the adapter.

Issue 5 — structural type duplication

type IframeWindowWithLottie = Window & { __hfLottie?: Array<{ isLoaded?: boolean }> };

The canonical LottieWindow type is declared in packages/core/src/runtime/adapters/lottie.ts:185. Please import/reuse rather than redeclare — drift between the two is exactly how "works on my machine, hangs the overlay for someone else" happens.

Nits

  • media.ts now sets preload="auto" inline with the play branch. bindMediaMetadataListeners does the same at bind time. Harmless but redundant — a short comment in media.ts noting "covers media elements added after runtime init" would explain why both exist.
  • New test nudges preload to auto doesn't also assert play() was called on the same path — small gap; add the assertion.
  • End-to-end verification in the PR description is great. Consider capturing the waiting-event count as a perf regression test (reading performance.getEntriesByType or an ad-hoc counter) so a future refactor can't silently reintroduce the stutter.

Recommendation

Approve with changes: rewrite the commit body for fix #1 to reflect the real root cause, add the play-dedup guard (5 lines), and reuse the existing LottieWindow type. Drift-threshold change is reasonable for the target content but deserves an inline tradeoff comment. Nothing here is blocking — the diff is a real improvement over what it replaces.

@miguel-heygen miguel-heygen force-pushed the fix/audio-sync-play-on-gesture branch from e83ce99 to 64ba950 Compare April 16, 2026 19:15
… word-skipping

Three related audio-sync defects in the studio preview, plus a small UX
improvement.

1. Silent/very-late first play on slow-loading audio
   `syncRuntimeMedia`'s old flow, when it met `readyState < HAVE_FUTURE_DATA`,
   called `el.load()` and attached a `canplay` listener to retry `play()`.
   Two real problems, neither of which were "lost user activation" (the
   media sync runs from a 50 ms `setInterval`, well outside any gesture
   window):
   - `bindMediaMetadataListeners` already sets `preload="auto"` and calls
     `el.load()` at runtime init, kicking off a fetch the moment a media
     element is discovered. The sync's duplicate `el.load()` aborts that
     in-flight fetch and restarts from zero — on slow networks that delays
     playback by seconds, which users perceive as silent until a second
     click.
   - The `canplay` listener was racy: the event can fire between `load()`
     and `addEventListener`, leaving the element wedged waiting for a
     callback that never arrives.

   `HTMLMediaElement.play()` is already spec'd to queue playback and
   resolve once data arrives, so we can unconditionally call it and let
   the browser handle buffering. Drop the `readyState` gate, the extra
   `load()`, and the `canplay` listener.

2. Audible stutter on rapid pause/play
   The 0.3 s drift-seek threshold fired on nearly every toggle because
   pause/play ordering between the timeline and the media element
   produces 0.1–0.4 s of transient drift. Each forced
   `el.currentTime = relTime` drops `readyState` and surfaces as a
   `waiting` event the user hears as a stutter. Threshold raised to 0.5 s.

3. Skipped words on cold first play
   Even with a 0.5 s threshold, drift still grew past 0.5 s during initial
   buffering because the timeline advances while the audio element is
   stuck at `currentTime = 0`. The old logic would then force-seek audio
   forward and the user would miss the opening of the narration.

   The fix distinguishes drift that grows *gradually* (buffer catch-up —
   offset moves ~16 ms per tick) from drift that *jumps* in one tick
   (a scrub). Only jumps, or the first tick a clip becomes active, or
   catastrophic drift (>3 s) trigger a resync. Inline tradeoff note:
   for strictly lip-synced dialogue a future tightening would be to
   track time-since-last-transition and drop the threshold to ~0.15 s
   outside a 500 ms toggle window.

4. "Loading assets…" overlay in the studio preview
   Spinner while every timed `<audio>`/`<video>` has enough buffered
   data and every Lottie animation has finished loading. Preserves the
   previous overlay state on cross-origin/DOM-race catches to avoid
   flicker, and logs a `console.debug` when the 10 s safety cap trips so
   a stuck asset is diagnosable. The Lottie readiness duck-typing is
   documented with a pointer to `@hyperframes/core/runtime/adapters/lottie`
   so the two sites stay in sync.

Additional correctness detail: the media-sync loop dedups in-flight
`play()` calls with a `WeakSet`. Without it the 50 ms poll would fire
20–40 spurious `play()` calls per element during a 1–2 s buffer window —
each swallowed by `.catch(() => {})`, masking real
`AbortError` / `NotAllowedError` diagnostics. The flag clears on the
native `playing` / `pause` / `error` events.

End-to-end verified against a composition with a 50 s voiceover and
multiple sub-composition video clips across four scenarios — normal
play, cold/unbuffered play, rapid pause/play (12 toggles), and
mid-playback scrub. All four pass; rapid-toggle `waiting` events drop
from 40+ bursts to 0, cold-play `play` event fires at `currentTime = 0`
instead of 0.5+, scrub lands on frame.
@miguel-heygen miguel-heygen force-pushed the fix/audio-sync-play-on-gesture branch from 64ba950 to c8db41a Compare April 16, 2026 19:31
@miguel-heygen miguel-heygen merged commit f1a3740 into main Apr 16, 2026
20 checks passed
Copy link
Copy Markdown
Collaborator Author

Merge activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants