refactor(producer): extract probeStage from executeRenderJob#719
Conversation
miguel-heygen
left a comment
There was a problem hiding this comment.
This is the heaviest extraction in the stack (+409/-283) and the riskiest since the probe stage owns the browser lifecycle. Good decisions:
- FileServerHandle and CaptureSession ownership stays with the sequencer (created inside probe, returned to sequencer, cleaned up in sequencer's finally) — this means the stage can fail without leaking resources.
compositionmutation in place is the right call for this refactor phase — avoids a massive return-type refactor while preserving the downstream contract.- The re-assertion of
job.durationandjob.totalFramesat the call site to restore TypeScript narrowing is a thoughtful detail. - PSNR regression results (48dB font-variant-numeric, 0 failed many-cuts, 69dB variables-prod) confirm the extraction is behavior-preserving.
The recompileWithResolutions staying inside probe (because it depends on browser-resolved durations) is documented and correct — it can't move to compile stage without changing the execution order.
LGTM
vanceingalls
left a comment
There was a problem hiding this comment.
Verdict: Approve.
Largest extraction in the stack and the one most likely to drift, but it's clean. I diffed probeStage.ts against the executeRenderJob probe block on main (lines 2177-2417) — every statement is preserved in order, with the same control flow, the same regex patterns, the same first-10/first-5 slicing on failedRequests, and the same __timelines/gsapLoaded diagnostics shape. The PSNR/audio-correlation results in the test plan back this up.
A few specific things I verified end-to-end:
-
The in-place mutation contract on
compositionis preserved:composition.videos = compiled.videos; composition.audios = compiled.audios; composition.images = compiled.images;runs at the same code point (probeStage.ts:252-254), and the sequencer sees the reconciled view through the same reference becausecompositionis passed by reference intoProbeStageInput. -
The
lastBrowserConsolelifecycle is preserved: the sequencer'slet lastBrowserConsole: string[] = []initialization is unchanged, the probeStage returns[]whenneedsBrowserwas false, and reassignment toprobeResult.lastBrowserConsoleis a no-op in that path. -
The
if (duration <= 0)check moved inside the stage uses the localduration = composition.duration, which is the same valuejob.durationwould have had in the original (becausejob.duration = composition.durationhappens immediately above). Equivalent. -
The post-probe "failed network requests" warning is guarded by
if (probeSession)in both the old and new code — andprobeSessionis only non-null whenneedsBrowserwas true. Same gating. -
recompileWithResolutionsis correctly kept inside the probe stage (vs. promoted to a sibling) because it depends on browser-resolved durations. The PR description acknowledges this is a divergence from §2.1 of the design doc with an explicit reason — good call.
Important
-
(important)
probeStage.ts:373-376— the result type re-assertsdurationandtotalFramesas siblings tojob.duration/job.totalFramespurely to restore TS control-flow narrowing in the sequencer. The PR description notes this is intentional, but it's a maintenance hazard: future readers will not understand why a value lives in TWO places (onjobAND in the result), and a refactor could easily drop one and create a silent skew.Suggested fix (no behavior change): drop
job.duration = composition.durationandjob.totalFrames = ...from insideprobeStageentirely. Make those assignments the sequencer's job, off the typedprobeResult.duration/probeResult.totalFrames. The stage produces values; the sequencer owns thejobobject's state. Cleaner separation, no TS narrowing trick needed, and aligns with howchunkStagewill eventually have to work (a chunk worker can't mutate the orchestrator'sjob).Not blocking because behavior is preserved — but if this is easy to do in a follow-up, it'll pay back across the next 6 stages.
-
(important)
probeStage.ts:190-193— thereasonsarray is built but never used (no log line, no diagnostic, no throw context). This is preserved-as-is dead code frommain(it was already dead there), so I won't ask you to remove it in this PR — but please file a follow-up. Either usereasons.join(", ")in the existinglog.info("Probed composition duration from browser", ...)payload, or delete it. Dead code in a 370-line stage is the kind of thing that becomes "load-bearing" via cargo-culting six months from now.
Nits
- (nit)
probeStage.ts:132—BROWSER_MEDIA_EPSILON = 0.0001is a magic constant duplicated from where it used to live inrenderOrchestrator.ts. If the eventualmediaReconcileStagealso needs it, hoist torender/shared.ts(which #720 is already setting up). Easy to do in #720. - (nit)
probeStage.ts:167-180— same destructure-everything pattern ascompileStage. Consistency is fine here, butlet { compiled }mixed withconst { ... }is a small readability hiccup. Acceptable. - (nit)
probeStage.ts:386—(window as any).__timelines/(window as any).__hfcasts. Pre-existing pattern frommain, not introduced here. If you want to clean up in a follow-up, defineinterface WindowWithHfDebug extends Window { __timelines?: Record<string, unknown>; __hf?: { duration: number | null } }— not blocking. - (nit) The second commit (
drop internal PR/phase identifiers from stages doc) is good housekeeping but slightly out of scope for a refactor PR. Future you/reviewers can squash. No action needed.
Praise
- The result type carrying
compiled: CompiledComposition(potentially reassigned fromrecompileWithResolutions) rather than mutating an input field is correct — it makes the rebind visible at the call site instead of via a hidden side-effect. Good design choice in an otherwise mutation-heavy block. - The Docker regression run on
font-variant-numeric,many-cuts, andvariables-prodis exactly the kind of evidence a refactor PR of this size needs. PSNR ≈ 48 dB and audio correlation 1.000/0.994/0.975 are tight enough to prove no drift.
Stack-coherence: after #717 → #718 → #719, the stage interface is settling into a consistent shape (input object, async result object, optional resources returned to sequencer for cleanup). A future chunkStage will fit cleanly. The one thing the stack interface doesn't yet have — and will need before Phase 2 — is per-stage observability hooks (Datadog spans, log scopes). Suggest adding tracer: Tracer to the input types when the first stage that benefits lands, rather than retrofitting all 8 later. Not for this PR.
— Vai
Move the browser probe / duration discovery / recompile / media reconciliation block out of `executeRenderJob` into `services/render/stages/probeStage.ts`. No behavior change. The sequencer calls `runProbeStage` at the same code point with identical inputs and outputs. The probe stage owns the `FileServerHandle` and the `CaptureSession` it creates and returns them to the sequencer. The sequencer still tracks them in its `let fileServer` / `let probeSession` bindings and closes them in its `finally` block — the resource lifetime is unchanged. `recompileWithResolutions` lives inside this stage because it depends on browser-resolved durations even though §2.1 of the distributed plan lists recompile as a sibling phase. Preserved invariants: - `composition` is mutated in place (videos / audios / duration) so downstream stages see the reconciled view through the same reference. - `job.duration` and `job.totalFrames` end up with the same values at the same code points. The result type carries `duration: number` alongside `totalFrames: number`, and the sequencer re-asserts the assignments after the call so TypeScript's control-flow narrowing works for the rest of `executeRenderJob`. - `perfStages.browserProbeMs` and `perfStages.compileMs` are written at the same code points with the same values. - The "Composition duration is 0" diagnostic builds the same hint string from the same console-buffer regex and `__timelines` probe. - The post-probe "failed network requests" warning fires with the same regex, the same first-10/first-5 slicing, and the same `console.warn` prefix. Renderer smoke-tested inside `Dockerfile.test` against `font-variant-numeric`, `many-cuts`, and `variables-prod` — all PSNR / audio correlation baselines match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment-only cleanup. Removes "PR 1.x", "Phase 1 PR", and "Phase 3 PR 3.1" references from JSDoc blocks in `compileStage.ts`, `probeStage.ts`, `planHash.ts`, and `freezePlan.ts`. Track / PR identifiers rot quickly and belong in PR descriptions, not in source. Design-doc section citations (DISTRIBUTED-RENDERING-PLAN.md §X.Y) are kept — those reference a stable external artifact. Also tightens the `probeStage.ts` `browserProbeMs` doc string to say "near-zero when `needsBrowser` was false" instead of "0" — the Date.now() delta around the function body is sub-ms but not literally zero. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…talFrames The probe stage previously assigned `job.duration` and `job.totalFrames` inside its body AND the sequencer re-asserted them after the call to restore TS narrowing. Two writers for the same field is a maintenance hazard — a future refactor could drop one and create a silent skew. Move ownership: the stage computes `duration` and `totalFrames` and returns them; the sequencer is the sole writer onto the `RenderJob`. This also aligns with the eventual chunk-worker model where a chunk running in a separate process cannot mutate the orchestrator's `job`. No observable behavior change. `job.duration` / `job.totalFrames` end up with the same values; the zero-duration `throw` still happens inside the stage (now using the local `duration` constant) before any sequencer-side assignment. Verified by: - `bun run --filter @hyperframes/producer typecheck` clean - `bun test packages/producer/src/services/` 175 pass / 1 pre-existing unrelated failure on `main` Review feedback addressed: vanceingalls on #719. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
02f78f9 to
d168397
Compare
835c834 to
2024251
Compare
|
Thanks @vanceingalls @miguel-heygen — addressed the important item in commit
Deferred per your "not for this PR" framing:
Done in #720:
Style nits ( |
The base branch was changed.

What
Phase 1 PR 1.3 of the distributed-render refactor. Moves the browser probe / duration discovery / recompile / media reconciliation block out of `executeRenderJob` into `packages/producer/src/services/render/stages/probeStage.ts`. The sequencer now calls `runProbeStage` at the same code point with identical inputs and outputs.
Source range moved: `renderOrchestrator.ts:2145-2402` from before this stack (the `probeStart` / `needsBrowser` block plus the post-probe zero-duration diagnostic and failed-request warning).
Why
Continues the Phase 1 mechanical extraction that splits the ~2000-line `executeRenderJob` into 8 stage functions plus a thin sequencer. Phase 1 ships zero new functionality — it's purely a reviewable refactor that sets up the codebase for follow-on determinism hardening (Phase 2) and the new distributed primitives (Phase 3).
The probe stage owns the `FileServerHandle` and the `CaptureSession` it creates and returns them to the sequencer. The sequencer continues to track them in its `let fileServer` / `let probeSession` bindings and closes them in its `finally` block, so the resource lifetime is unchanged.
How
Preserved invariants
Test plan
🤖 Generated with Claude Code