refactor(producer): extract extractVideosStage + add materializeSymlinks param#725
Conversation
…nks param Move the video frame extraction sub-stage out of `executeRenderJob` into `services/render/stages/extractVideosStage.ts`. The stage covers HDR color-space pre-detection for videos and images, the `extractAllVideoFrames` call, frame-lookup-table construction, video readiness skip-id collection, video metadata hints, and the auto-detect of audio tracks from video files. Hard constraints preserved verbatim: - `composition.audios` is still mutated in place to add audio entries auto-discovered from video files via ffprobe. - `perfStages.videoExtractMs` is set at the same end-of-stage point. - `materializeExtractedFramesForCompiledDir` is still called once when `extractionResult.extracted` is non-empty. - `force-sdr` mode still skips ALL ffprobe overhead. New for distributed mode (`materializeSymlinks: boolean`, default false): - Plumbs through to `materializeExtractedFramesForCompiledDir` via a new option of the same name. When `true`, the helper invokes `cpSync(recursive)` instead of `symlinkSync` so the staged frames are real files inside `compiledDir`. Symlinks don't survive S3 / GCS round-trips, so distributed `plan()` will pass `true` once it lands. Default `false` preserves the in-process renderer's symlink behavior. - New unit test covers the copy path and asserts `symlinkSync` is NOT invoked; the existing symlink test was updated with the parallel guard that `cpSync` is NOT invoked. Removes the imports the orchestrator no longer needs after the extraction: `extractAllVideoFrames`, `resolveProjectRelativeSrc`, `createFrameLookupTable`, `FrameLookupTable`, `detectTransfer`, `isHdrColorSpace`, `extractMediaMetadata`, `VideoColorSpace` (oxlint flagged each). Verified: - `bunx oxlint` + `bunx oxfmt --check` clean - `bun run --filter @hyperframes/producer typecheck` + `build` clean - `bun test packages/producer/src/services/` — 176 pass, 1 pre-existing unrelated failure - `docker run hyperframes-producer:test` against `font-variant-numeric`, `many-cuts`, `variables-prod`, `sub-composition-video` — 4/4 PASS with correlations 1.000 / 0.994 / 0.947 / 0.975 (sub-composition-video is the one that exercises video frame extraction end-to-end) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vanceingalls
left a comment
There was a problem hiding this comment.
Verdict: approve.
PR 1.4 of the stack. Pure extraction of the video-extract sub-stage plus an additive materializeSymlinks param for distributed mode.
Verified the extraction is byte-clean against renderOrchestrator.ts pre-#725:
- Probe ordering preserved: video probe → image probe →
extractAllVideoFrames→materializeExtractedFramesForCompiledDir→ frame lookup → readiness skip IDs → metadata hints →composition.audiosmutation. All inside the sameif (composition.videos.length > 0)guard. composition.audiosmutation preserved verbatim — the auto-discovered audio entries still mutate the orchestrator-owned object. (This is the existing mutation that mirrors thecfg.forceScreenshotsmell flagged on the foundation stack; not new debt.)force-sdrshort-circuit on both probe loops preserved.perfStages.videoExtractMssemantics preserved: the stage'sstage2Startcovers the full window (probe loops + extract + materialize), same as pre-extraction. Both the videos-present and videos-absent branches now collapse to a singleDate.now() - stage2Startat the end of the stage, which matches the pre-#725 behavior exactly (the oldelsebranch did the same thing).- 11 destructured return fields line up with the 11 locals the downstream HDR auto-detection block consumes — verified by name match on the sequencer side.
Tests: the new cpSync path test asserts symlinkSync is NOT invoked (and vice versa for the symlink path) — that's exactly the parity-guard pattern the foundation stack established. Good.
Important (non-blocking, carry-forward):
packages/producer/src/services/render/stages/extractVideosStage.ts:213-232— the stage still mutatescomposition.audiosin place viacomposition.audios.push(...). The stage interface declarescomposition: CompositionMetadataand the JSDoc onExtractVideosStageInputacknowledges the mutation, but for retry/replay safety this should eventually move into a result field (e.g.discoveredAudios: AudioElement[]) the sequencer merges. Same shape as thecfg.forceScreenshotdebt on the foundation stack — flagged for the chunk-rendering PR where retry-safety becomes load-bearing, not blocking here.
Nits:
extractVideosStage.ts:33-44— importsRenderJob,collectVideoMetadataHints,collectVideoReadinessSkipIds,materializeExtractedFramesForCompiledDirfrom../../renderOrchestrator.js. Same runtime cycle pattern the capture stages (#730/#731/#733) introduce; #737 only solves theupdateJobStatusend of it. Worth a tracking line in the stage's header comment so the next PR consolidatesmaterializeExtractedFramesForCompiledDir+ the twocollect*helpers intoshared.tsalongsideupdateJobStatus.materializeSymlinksdefaults tofalse. If a single composition's frames could ever be cached and re-materialized with different values, the cachedoutputDirwould collide. Not a problem today (one renderer, one mode per render), but worth noting in the planHash invariant once distributed plan-cache lands — the param is a stage input that affects on-disk layout.
Praise: the new param is gated behind a default so in-tree callers are unchanged, the new code path is covered by a real test (avoiding the "added but unused" landmine from earlier stacks), and the 8 dropped orchestrator imports (extractAllVideoFrames, resolveProjectRelativeSrc, createFrameLookupTable, FrameLookupTable, detectTransfer, isHdrColorSpace, extractMediaMetadata, VideoColorSpace) are all flagged by oxlint — the diff hygiene is clean.
— Vai
miguel-heygen
left a comment
There was a problem hiding this comment.
Clean mechanical extraction — no behavior changes, no introduced bugs. Verified imports, error handling, and cleanup invariants are preserved. LGTM. — Magi

What
Phase 1 PR 1.4 of the distributed-render refactor. Moves the video frame extraction sub-stage of
executeRenderJobintopackages/producer/src/services/render/stages/extractVideosStage.ts. The sequencer now callsrunExtractVideosStageat the same code point with identical inputs and outputs.Source range moved:
renderOrchestrator.ts:1979-2100from before this PR (the block between the "── Stage 2: Video frame extraction ──" comment and the start of "── HDR auto-detection ──").Also adds a new
materializeSymlinks: booleanparameter (defaultfalse) onextractVideosStageAND onmaterializeExtractedFramesForCompiledDir. Whentrue, the helper invokescpSync(recursive)instead ofsymlinkSyncso the staged frames are real files insidecompiledDir— required for distributedplan()output where the planDir must be self-contained across machines (symlinks don't survive S3 / GCS round-trips). Defaultfalsepreserves the in-process renderer's symlink behavior.Why
Continues the Phase 1 mechanical extraction that splits the ~2000-line
executeRenderJobinto 8 stage functions plus a thin sequencer. Phase 1 ships zero new functionality — it's purely a reviewable refactor that sets up the codebase for follow-on determinism hardening and the new distributed primitives.The
materializeSymlinksparameter is additive — there are no in-tree callers that passtrueyet, so in-process behavior is preserved exactly. The implementation is tested so it's not dead code (one of the lessons from the previous stack's reviews).How
extractVideosStage.tsexportsrunExtractVideosStage(input) → ExtractVideosStageResult. The function body is the existing video-extraction code lifted verbatim. Returns 11 outputs that the downstream HDR auto-detection block, capture stages, and HDR composite path all consume.materializeExtractedFramesForCompiledDir:cpSyncto the injectedMaterializeFileSystemtype and to the module-scopematerializeFileSystemconst.materializeSymlinks?: booleantoMaterializeExtractedFramesOptions.cpSync(..., { recursive: true })andsymlinkSync(...)based on the option.renderOrchestrator.ts):runExtractVideosStage(...)at the same code point.extractAllVideoFrames,resolveProjectRelativeSrc,createFrameLookupTable,FrameLookupTable,detectTransfer,isHdrColorSpace,extractMediaMetadata,VideoColorSpace(all flagged by oxlint after the extraction).Preserved invariants
composition.audiosis still mutated in place to add audio entries auto-discovered from video files via ffprobe.perfStages.videoExtractMsis set at the same end-of-stage point with the same value.materializeExtractedFramesForCompiledDiris called exactly once whenextractionResult.extractedis non-empty.force-sdrmode still skips ALL ffprobe overhead (both video and image color-space probes).Test plan
bunx oxlint packages/producer/src/services/render/stages/extractVideosStage.ts packages/producer/src/services/renderOrchestrator.ts packages/producer/src/services/renderOrchestrator.test.ts— clean.bunx oxfmt --checkon the same files — clean.bun run --filter @hyperframes/producer typecheck— clean.bun run --filter @hyperframes/producer build— clean.bun test packages/producer/src/services/— 176 pass, 1 pre-existing unrelated failure (writeCompiledArtifacts — rejects a maliciously crafted key) that also fails on main.materializeSymlinks: truecpSync code path; existing symlink test updated with a parallel guard thatcpSyncis NOT invoked when the option isfalse.docker run hyperframes-producer:test --sequential font-variant-numeric many-cuts variables-prod sub-composition-video— 4/4 PASS with audio correlations 1.000 / 0.994 / 0.975 / 0.947.sub-composition-videois the fixture that exercises the video-extraction path end-to-end.regressionworkflow.🤖 Generated with Claude Code