fix(cli): validate stops misreporting slow-loading media as unreadable#1849
Conversation
Two independent post-release feedback reports of validate warning about audio duration despite an explicit, correct data-duration slot, one of them naming a timeout explicitly. Root cause: auditClipDurations reads each <video>/<audio> element's intrinsic .duration via a single page.evaluate() snapshot taken after a flat, unconditional page-settle sleep (opts.timeout ?? 3000ms, shared with other audits). Per the HTML spec, HTMLMediaElement.duration is NaN until metadata loads. A slow-loading audio file (large narration WAV, remote source) can still be mid-fetch when that sleep elapses — el.duration is NaN at that exact instant, which the audit permanently records as "could not read the duration" even though the render pipeline (which properly awaits media readiness) handles the same file fine. Fix: race each not-yet-ready element's loadedmetadata/error event against a deadline instead of taking one fixed-time snapshot. Elements already ready resolve immediately (no added latency in the common case); only genuinely slow elements get a real second chance before the warning fires. The race/cleanup wiring lives twice by necessity — once inline inside the page.evaluate() closure (Puppeteer serializes and re-runs that closure in an isolated browser realm with no access to this module), and once as the exported, duck-typed raceMediaReady for a real, deterministic unit test via Node's built-in EventTarget (no browser or DOM library needed). The comment on raceMediaReady flags that both copies must move together.
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
Reviewed at 209c2897 (batch review, Group B CLI/renderer; layering on Magi's own vetting).
Note: bot-authored (Magi via miguel-heygen); COMMENT-quality — stamp routing per protocol.
Summary — Replaces the fixed-time snapshot of <video>/<audio> .duration with a per-element race between loadedmetadata/error and a deadline, so a slow-loading media element gets a real second chance before the audit warns it's unreadable. The race/cleanup logic is duplicated across the inline page.evaluate() closure and an exported raceMediaReady (unit-testable via Node's built-in EventTarget).
The mechanism is right. Failure path is preserved (deadline fires → onReady resolves → el.duration still NaN → existing "could not read the duration" warning still emits, so the "media IS unreadable" cohort still gets flagged correctly). Duck-typed shape (EventTarget & { duration: number }) is a nice touch — real deterministic unit tests without a headless-browser stub.
Concerns
🟡 Timeout budget doubles worst-case (validate.ts:356-362) — before this PR: page.goto → flat opts.timeout ?? 3000ms sleep → snapshot. After: page.goto → same flat 3s sleep → THEN up to another 3s per-element wait inside auditClipDurations. So the worst-case validate wallclock for a page with genuinely unreadable media has moved from ~3s to ~6s (both use opts.timeout ?? 3000). Not catastrophic — 6s validate is still fine — but opts.timeout now overloads two orthogonal concepts (page-settle vs media-probe deadline) that were previously conflated. Consider (a) a separate --media-probe-timeout knob so the flat sleep can be shortened once media-probe is deterministic, or (b) a docstring on auditClipDurations naming that the sleep + probe timeouts stack. Purely a followup, not blocking.
Nits
🟡 Two copies must stay in sync — a divergence has no test (validate.ts:79-104 vs 126-149) — the comment does call this out ("if you change one copy, change both"), and fallow-ignore-next-line code-duplication documents intent. But there's no assertion that the two closures actually behave the same on a shared input. If someone touches only the inline copy (e.g. adds canplay as a resolve trigger), the unit-tested raceMediaReady would silently pass while the production path drifts. Cheapest guard: extract the shape into a stringifiable helper (getRaceMediaReadyFnSource()) and inject it into page.evaluate via evaluate(new Function(src)(...)) — same duck-typed shape works. But this is likely overkill; the comment discipline is probably enough.
Questions
↩️ Interaction with page.on('error') listener at validate.ts top of validateInBrowser — the existing error path treats network 4xx as validation errors. If a media element's loadedmetadata never fires because the fetch itself 404'd, does the deadline-fallback path emit both the network error (from the response listener) AND the "could not read the duration" warning? Not necessarily wrong (two independent failure signals), but worth confirming that isn't a regression in test noise for the "media 404" cohort.
↩️ data-loop="true" case — the audit reads loop: el.loop || el.getAttribute("data-loop") === "true", and looped clips are exempted from the shorter-than-slot warning. But if a looped clip never loads (duration=NaN on the deadline), does the audit emit the "unreadable" warning even though loops are supposed to be lenient? Not touched by this PR, but with the probe path now giving real reads a fair chance, it's a good time to check the "still-unreadable, but looped" branch.
What I didn't verify
- Whether Puppeteer's browser-side
EventTargetonHTMLMediaElementcorrectly firesloadedmetadataeven when the element is added to the DOM before.srcis set (the exact lifecycle matters for the race — ifsrcis set later thanpage.evaluateruns,loadedmetadatamay have fired already and the listener misses it). Node EventTarget in tests can't validate this — needs a Puppeteer integration test. - Whether
data-loopclips with unknown-durations behave as intended in the deadline-fallback path (Question 2 above).
— Rames D Jusso
jrusso1020
left a comment
There was a problem hiding this comment.
APPROVE — verified CI green (0 fail / 0 pending) + no open CR at this head. Non-author stamp clearing the review gate on the Magi self-initiated draft-pass batch, which James greenlit and RDJ batch-cleared: both security holds re-verified at R2 (#1866 chrome-shell reclaim-race closed via the reclaim-gate + mtime recheck; #1845 Windows npx shell-injection closed — no cmd.exe, node <npx-cli.js> with pure argv), zero drift on the other nine, all green.
Merge via Magi's normal path (no admin-merge). Ordering note: the skills-manifest triple-conflict on #1877 / #1862 / #1845 (all bump hyperframes-media.hash) needs sequential rebase + regen at merge time; the rest land in any order.
— Rames Jusso
Root-caused from two independent post-release feedback reports of
validatewarning about audio duration despite an explicit, correctdata-durationslot — one naming a timeout explicitly.Root cause
auditClipDurationsreads each<video>/<audio>element's intrinsic.durationvia a singlepage.evaluate()snapshot, taken after a flat, unconditional page-settle sleep (opts.timeout ?? 3000ms, shared with the other validate audits). Per spec,HTMLMediaElement.durationisNaNuntil metadata loads. A slow-loading audio file (large narration WAV, remote source) can still be mid-fetch when that sleep elapses —el.durationisNaNat that exact instant, which the audit permanently records as "could not read the duration" even though the render pipeline (which properly awaits media readiness elsewhere) handles the same file fine.Fix
Race each not-yet-ready element's
loadedmetadata/errorevent against a deadline instead of taking one fixed-time snapshot. Elements already ready resolve immediately (no added latency in the common case); only genuinely slow elements get a real second chance before the warning fires.The race/cleanup wiring lives twice by necessity: once inline inside the
page.evaluate()closure (Puppeteer serializes and re-runs that closure in an isolated browser realm with no access to this module's scope, so it can't reference an outer helper), and once as the exported, duck-typedraceMediaReadyso the actual race/cleanup logic has a real, deterministic unit test via Node's built-inEventTarget— no browser or DOM library needed. A comment onraceMediaReadyflags that both copies must move together;fallow-ignore-next-line code-duplicationdocuments why the clone is intentional.Tests
4 new cases for
raceMediaReady(already-ready, resolves-on-event-before-deadline, resolves-on-error, falls-back-to-deadline). Fullpackages/clisuite (1117 tests) passes.