refactor(producer): extract probeStage from executeRenderJob #719

Merged
jrusso1020 merged 3 commits into main from refactor/producer-stages-1.3-probe
May 11, 2026
@jrusso1020
Collaborator

Stacked on #717 (PR 1.1) → #718 (PR 1.2) → this. Rebase to `main` once 1.2 merges.

What

Phase 1 PR 1.3 of the distributed-render refactor. Moves the browser probe / duration discovery / recompile / media reconciliation block out of `executeRenderJob` into `packages/producer/src/services/render/stages/probeStage.ts`. The sequencer now calls `runProbeStage` at the same code point with identical inputs and outputs.

Source range moved: `renderOrchestrator.ts:2145-2402` from before this stack (the `probeStart` / `needsBrowser` block plus the post-probe zero-duration diagnostic and failed-request warning).

Why

Continues the Phase 1 mechanical extraction that splits the ~2000-line `executeRenderJob` into 8 stage functions plus a thin sequencer. Phase 1 ships zero new functionality — it's purely a reviewable refactor that sets up the codebase for follow-on determinism hardening (Phase 2) and the new distributed primitives (Phase 3).

The probe stage owns the `FileServerHandle` and the `CaptureSession` it creates and returns them to the sequencer. The sequencer continues to track them in its `let fileServer` / `let probeSession` bindings and closes them in its `finally` block, so the resource lifetime is unchanged.
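The ownership hand-off described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Closable`, `probeWithResources`, `sequenceWithCleanup`, and `closeLog` are hypothetical names, and the sketch is synchronous for brevity where the real stage is presumably async.

```typescript
// Hypothetical stand-in for FileServerHandle / CaptureSession; the real types are not shown in this PR.
interface Closable {
  close(): void;
}

interface ProbeResourcesSketch {
  fileServer: Closable | null;
  probeSession: Closable | null;
}

// Records the order in which resources are closed so the cleanup can be observed.
const closeLog: string[] = [];

function makeResource(label: string): Closable {
  return {
    close: () => {
      closeLog.push(label);
    },
  };
}

// The stage creates the resources and hands them back; it never closes them itself.
function probeWithResources(needsBrowser: boolean): ProbeResourcesSketch {
  if (!needsBrowser) {
    return { fileServer: null, probeSession: null };
  }
  return {
    fileServer: makeResource("fileServer"),
    probeSession: makeResource("probeSession"),
  };
}

// The sequencer keeps `let` bindings and closes whatever it holds in `finally`,
// so a failure in any later stage still releases the probe resources.
function sequenceWithCleanup(needsBrowser: boolean, failLater: boolean): void {
  let fileServer: Closable | null = null;
  let probeSession: Closable | null = null;
  try {
    const probe = probeWithResources(needsBrowser);
    fileServer = probe.fileServer;
    probeSession = probe.probeSession;
    if (failLater) {
      throw new Error("downstream stage failed");
    }
  } finally {
    probeSession?.close();
    fileServer?.close();
  }
}
```

The close order used here (session first, then file server) is an assumption of the sketch; the PR only states that both are closed in the sequencer's `finally` block.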

How

  • New `probeStage.ts` exports `runProbeStage(input) → ProbeStageResult`. The function body is the existing probe code lifted verbatim. `recompileWithResolutions` runs inside this stage because it depends on browser-resolved durations.
  • `composition` is mutated in place (videos / audios / duration) so downstream stages see the reconciled view through the same reference.
  • `job.duration` and `job.totalFrames` are set inside `runProbeStage`. The result type also carries `duration: number` and `totalFrames: number`, and the sequencer re-asserts the assignments after the call. This is needed because TypeScript's control-flow narrowing tracked the inline assignments in the old code; re-asserting at the call site restores the narrowing for the rest of `executeRenderJob`.
  • Removes the now-unused `getCompositionDuration` import from `@hyperframes/engine`, the `resolveCompositionDurations` / `recompileWithResolutions` / `discoverMediaFromBrowser` imports from `./htmlCompiler.js`, and the local `BROWSER_MEDIA_EPSILON` constant from the sequencer (oxlint flagged each as unused after the extraction).
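The narrowing point in the third bullet can be illustrated with a small sketch. The types and function names below (`RenderJobSketch`, `probeAndWriteJob`, `sequenceWithReassert`) are hypothetical, and `Math.ceil(duration * fps)` is only an assumed way of deriving a frame count, not something this PR states.

```typescript
// Hypothetical, trimmed-down job shape: both fields start out unset.
interface RenderJobSketch {
  duration?: number;
  totalFrames?: number;
}

interface ProbeStageResultSketch {
  duration: number;
  totalFrames: number;
}

// The stage writes onto the job and also returns the same values in its result.
function probeAndWriteJob(job: RenderJobSketch, seconds: number, fps: number): ProbeStageResultSketch {
  const duration = seconds;
  const totalFrames = Math.ceil(duration * fps); // assumed formula, for illustration only
  job.duration = duration;
  job.totalFrames = totalFrames;
  return { duration, totalFrames };
}

function sequenceWithReassert(job: RenderJobSketch): number {
  const probeResult = probeAndWriteJob(job, 2.5, 30);
  // At this point `job.duration` is still typed `number | undefined`:
  // the compiler does not track writes that happen inside the callee.

  // Re-asserting at the call site narrows both fields to `number` from here on.
  job.duration = probeResult.duration;
  job.totalFrames = probeResult.totalFrames;

  // Type-checks without `!` or a cast because of the two assignments above.
  const framesPerSecond: number = job.totalFrames / job.duration;
  return framesPerSecond;
}
```

Removing the two assignments in `sequenceWithReassert` makes the division fail to type-check under `strict`, which is the situation the re-assertion works around.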

Preserved invariants

  • `perfStages.browserProbeMs` and `perfStages.compileMs` are written at the same code points with the same values.
  • The "Composition duration is 0" diagnostic builds the same hint string from the same console-buffer regex and `__timelines` probe.
  • The post-probe "failed network requests" warning fires with the same regex, the same first-10 / first-5 slicing, and the same `console.warn` prefix.
  • `fileServer` / `probeSession` / `lastBrowserConsole` are still tracked by the sequencer's bindings and reach the existing `finally` cleanup.

Test plan

  • `bunx oxlint packages/producer/src/services/render/stages/ packages/producer/src/services/renderOrchestrator.ts` — clean.
  • `bunx oxfmt --check packages/producer/src/services/render/stages/ packages/producer/src/services/renderOrchestrator.ts` — clean.
  • `bun run --filter @hyperframes/producer typecheck` — clean.
  • `bun run --filter @hyperframes/producer build` — clean.
  • `bun test packages/producer/src/services/` — 173 pass, 1 pre-existing failure (`writeCompiledArtifacts — rejects a maliciously crafted key`) that also fails on main and is unrelated to this PR.
  • `docker build -t hyperframes-producer:test -f Dockerfile.test .` then `docker run --rm -v .../tests:/app/packages/producer/tests hyperframes-producer:test --sequential font-variant-numeric many-cuts variables-prod` — 3/3 pass with PSNR / audio-correlation baselines intact (font-variant-numeric: PSNR ≈ 48 dB across 100 checkpoints, audio correlation 1.000; many-cuts: 0 failed frames, audio correlation 0.994; variables-prod: PSNR ≈ 69 dB across 100 checkpoints, audio correlation 0.975).
  • Full regression matrix on CI via the `regression` workflow.

🤖 Generated with Claude Code

Collaborator Author

jrusso1020 commented May 11, 2026

miguel-heygen
miguel-heygen previously approved these changes May 11, 2026
Collaborator

@miguel-heygen miguel-heygen left a comment

This is the heaviest extraction in the stack (+409/-283) and the riskiest since the probe stage owns the browser lifecycle. Good decisions:

  • FileServerHandle and CaptureSession ownership stays with the sequencer (created inside probe, returned to sequencer, cleaned up in sequencer's finally) — this means the stage can fail without leaking resources.
  • composition mutation in place is the right call for this refactor phase — avoids a massive return-type refactor while preserving the downstream contract.
  • The re-assertion of job.duration and job.totalFrames at the call site to restore TypeScript narrowing is a thoughtful detail.
  • PSNR regression results (48dB font-variant-numeric, 0 failed many-cuts, 69dB variables-prod) confirm the extraction is behavior-preserving.

The recompileWithResolutions staying inside probe (because it depends on browser-resolved durations) is documented and correct — it can't move to compile stage without changing the execution order.

LGTM

vanceingalls
vanceingalls previously approved these changes May 11, 2026
Collaborator

@vanceingalls vanceingalls left a comment

Verdict: Approve.

Largest extraction in the stack and the one most likely to drift, but it's clean. I diffed probeStage.ts against the executeRenderJob probe block on main (lines 2177-2417) — every statement is preserved in order, with the same control flow, the same regex patterns, the same first-10/first-5 slicing on failedRequests, and the same __timelines/gsapLoaded diagnostics shape. The PSNR/audio-correlation results in the test plan back this up.

A few specific things I verified end-to-end:

  1. The in-place mutation contract on composition is preserved: composition.videos = compiled.videos; composition.audios = compiled.audios; composition.images = compiled.images; runs at the same code point (probeStage.ts:252-254), and the sequencer sees the reconciled view through the same reference because composition is passed by reference into ProbeStageInput.

  2. The lastBrowserConsole lifecycle is preserved: the sequencer's let lastBrowserConsole: string[] = [] initialization is unchanged, the probeStage returns [] when needsBrowser was false, and reassignment to probeResult.lastBrowserConsole is a no-op in that path.

  3. The if (duration <= 0) check moved inside the stage uses the local duration = composition.duration, which is the same value job.duration would have had in the original (because job.duration = composition.duration happens immediately above). Equivalent.

  4. The post-probe "failed network requests" warning is guarded by if (probeSession) in both the old and new code — and probeSession is only non-null when needsBrowser was true. Same gating.

  5. recompileWithResolutions is correctly kept inside the probe stage (vs. promoted to a sibling) because it depends on browser-resolved durations. The PR description acknowledges this is a divergence from §2.1 of the design doc with an explicit reason — good call.
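Items 1 and 3 above can be pictured with a short sketch. The shapes and the function `reconcileAndCheck` are hypothetical stand-ins; only the three media assignments, the `duration <= 0` check, and the "Composition duration is 0" wording come from this thread.

```typescript
// Hypothetical, reduced shapes; the real composition and compiled types carry more fields.
interface MediaListsSketch {
  videos: string[];
  audios: string[];
  images: string[];
}

interface CompositionSketch extends MediaListsSketch {
  duration: number;
}

// Mutates `composition` in place, then checks the local `duration` copy.
function reconcileAndCheck(
  composition: CompositionSketch,
  compiled: MediaListsSketch,
  browserDuration: number,
): number {
  composition.videos = compiled.videos;
  composition.audios = compiled.audios;
  composition.images = compiled.images;
  composition.duration = browserDuration;

  // Same value `job.duration` would have held in the pre-extraction code.
  const duration = composition.duration;
  if (duration <= 0) {
    throw new Error("Composition duration is 0");
  }
  return duration;
}

// The caller passes its own object, so it sees the reconciled view afterwards
// through the same reference, with no extra return value needed for the media lists.
const callerComposition: CompositionSketch = { videos: [], audios: [], images: [], duration: 0 };
const compiledMedia: MediaListsSketch = { videos: ["intro.mp4"], audios: ["bed.mp3"], images: [] };
const probedDuration = reconcileAndCheck(callerComposition, compiledMedia, 12.5);
```

After the call, `callerComposition.videos` is the same array object as `compiledMedia.videos`, which is what "through the same reference" amounts to in practice.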

Important

  • (important) probeStage.ts:373-376 — the result type re-asserts duration and totalFrames as siblings to job.duration / job.totalFrames purely to restore TS control-flow narrowing in the sequencer. The PR description notes this is intentional, but it's a maintenance hazard: future readers will not understand why a value lives in TWO places (on job AND in the result), and a refactor could easily drop one and create a silent skew.

    Suggested fix (no behavior change): drop job.duration = composition.duration and job.totalFrames = ... from inside probeStage entirely. Make those assignments the sequencer's job, off the typed probeResult.duration / probeResult.totalFrames. The stage produces values; the sequencer owns the job object's state. Cleaner separation, no TS narrowing trick needed, and aligns with how chunkStage will eventually have to work (a chunk worker can't mutate the orchestrator's job).

    Not blocking because behavior is preserved — but if this is easy to do in a follow-up, it'll pay back across the next 6 stages.

  • (important) probeStage.ts:190-193 — the reasons array is built but never used (no log line, no diagnostic, no throw context). This is preserved-as-is dead code from main (it was already dead there), so I won't ask you to remove it in this PR — but please file a follow-up. Either use reasons.join(", ") in the existing log.info("Probed composition duration from browser", ...) payload, or delete it. Dead code in a 370-line stage is the kind of thing that becomes "load-bearing" via cargo-culting six months from now.
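The suggested fix in the first item (stage produces values, sequencer owns the job) might look something like the sketch below. `JobSketch`, `ProbeOutputSketch`, `probeReturningValues`, and `sequenceAsSoleWriter` are hypothetical names, and the frame-count formula is an assumption made for the example.

```typescript
// Hypothetical, trimmed-down shapes for illustration.
interface JobSketch {
  duration?: number;
  totalFrames?: number;
}

interface ProbeOutputSketch {
  duration: number;
  totalFrames: number;
}

// The stage never sees the job: it computes values and returns them.
// The zero-duration check still lives here, before anything is written to the job.
function probeReturningValues(compositionDuration: number, fps: number): ProbeOutputSketch {
  const duration = compositionDuration;
  if (duration <= 0) {
    throw new Error("Composition duration is 0");
  }
  return { duration, totalFrames: Math.ceil(duration * fps) }; // assumed formula
}

// The sequencer is the only writer onto the job object.
function sequenceAsSoleWriter(job: JobSketch, compositionDuration: number, fps: number): JobSketch {
  const probeResult = probeReturningValues(compositionDuration, fps);
  job.duration = probeResult.duration;
  job.totalFrames = probeResult.totalFrames;
  return job;
}
```

Because the stage takes plain values and returns plain values, the same function could in principle run in a worker process that has no access to the orchestrator's job, which is the chunk-worker argument made above.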

Nits

  • (nit) probeStage.ts:132: BROWSER_MEDIA_EPSILON = 0.0001 is a magic constant duplicated from where it used to live in renderOrchestrator.ts. If the eventual mediaReconcileStage also needs it, hoist to render/shared.ts (which #720 is already setting up). Easy to do in #720.
  • (nit) probeStage.ts:167-180 — same destructure-everything pattern as compileStage. Consistency is fine here, but let { compiled } mixed with const { ... } is a small readability hiccup. Acceptable.
  • (nit) probeStage.ts:386: (window as any).__timelines / (window as any).__hf casts. Pre-existing pattern from main, not introduced here. If you want to clean up in a follow-up, define interface WindowWithHfDebug extends Window { __timelines?: Record<string, unknown>; __hf?: { duration: number | null } }. Not blocking.
  • (nit) The second commit (drop internal PR/phase identifiers from stages doc) is good housekeeping but slightly out of scope for a refactor PR. Future you/reviewers can squash. No action needed.
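For the `(window as any)` nit, a typed accessor along these lines is one way to do the cleanup. The field shapes are taken from the interface suggested above; the `extends Window` part is dropped here so the sketch compiles without DOM typings, and `HfDebugGlobals` and `readHfDebug` are hypothetical names.

```typescript
// Field shapes from the review suggestion, without `extends Window` so this runs outside a browser.
interface HfDebugGlobals {
  __timelines?: Record<string, unknown>;
  __hf?: { duration: number | null };
}

interface HfDebugSnapshot {
  timelineCount: number;
  hfDuration: number | null;
}

// One typed cast in one place instead of scattered `as any` casts.
// In a browser context the argument would be `window` (or `globalThis`).
function readHfDebug(scope: object): HfDebugSnapshot {
  const debug = scope as HfDebugGlobals;
  return {
    timelineCount: Object.keys(debug.__timelines ?? {}).length,
    hfDuration: debug.__hf?.duration ?? null,
  };
}
```

A misspelled property such as `debug.__timeline` is then a compile error rather than a silent `undefined`.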

Praise

  • The result type carrying compiled: CompiledComposition (potentially reassigned from recompileWithResolutions) rather than mutating an input field is correct — it makes the rebind visible at the call site instead of via a hidden side-effect. Good design choice in an otherwise mutation-heavy block.
  • The Docker regression run on font-variant-numeric, many-cuts, and variables-prod is exactly the kind of evidence a refactor PR of this size needs. PSNR ≈ 48 dB and audio correlation 1.000/0.994/0.975 are tight enough to prove no drift.

Stack-coherence: after #717 → #718 → #719, the stage interface is settling into a consistent shape (input object, async result object, optional resources returned to sequencer for cleanup). A future chunkStage will fit cleanly. The one thing the stack interface doesn't yet have, and will need before Phase 2, is per-stage observability hooks (Datadog spans, log scopes). Suggest adding tracer: Tracer to the input types when the first stage that benefits lands, rather than retrofitting all 8 later. Not for this PR.

— Vai

jrusso1020 and others added 3 commits May 11, 2026 19:01
Move the browser probe / duration discovery / recompile / media
reconciliation block out of `executeRenderJob` into
`services/render/stages/probeStage.ts`. No behavior change. The sequencer
calls `runProbeStage` at the same code point with identical inputs and
outputs.

The probe stage owns the `FileServerHandle` and the `CaptureSession` it
creates and returns them to the sequencer. The sequencer still tracks
them in its `let fileServer` / `let probeSession` bindings and closes
them in its `finally` block — the resource lifetime is unchanged.

`recompileWithResolutions` lives inside this stage because it depends on
browser-resolved durations even though §2.1 of the distributed plan
lists recompile as a sibling phase.

Preserved invariants:

- `composition` is mutated in place (videos / audios / duration) so
  downstream stages see the reconciled view through the same reference.
- `job.duration` and `job.totalFrames` end up with the same values at
  the same code points. The result type carries `duration: number`
  alongside `totalFrames: number`, and the sequencer re-asserts the
  assignments after the call so TypeScript's control-flow narrowing
  works for the rest of `executeRenderJob`.
- `perfStages.browserProbeMs` and `perfStages.compileMs` are written at
  the same code points with the same values.
- The "Composition duration is 0" diagnostic builds the same hint string
  from the same console-buffer regex and `__timelines` probe.
- The post-probe "failed network requests" warning fires with the same
  regex, the same first-10/first-5 slicing, and the same `console.warn`
  prefix.

Renderer smoke-tested inside `Dockerfile.test` against `font-variant-numeric`,
`many-cuts`, and `variables-prod` — all PSNR / audio correlation baselines
match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment-only cleanup. Removes "PR 1.x", "Phase 1 PR", and "Phase 3 PR 3.1"
references from JSDoc blocks in `compileStage.ts`, `probeStage.ts`,
`planHash.ts`, and `freezePlan.ts`. Track / PR identifiers rot quickly and
belong in PR descriptions, not in source. Design-doc section citations
(DISTRIBUTED-RENDERING-PLAN.md §X.Y) are kept — those reference a stable
external artifact.

Also tightens the `probeStage.ts` `browserProbeMs` doc string to say
"near-zero when `needsBrowser` was false" instead of "0" — the Date.now()
delta around the function body is sub-ms but not literally zero.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…talFrames

The probe stage previously assigned `job.duration` and `job.totalFrames`
inside its body AND the sequencer re-asserted them after the call to
restore TS narrowing. Two writers for the same field is a maintenance
hazard — a future refactor could drop one and create a silent skew.

Move ownership: the stage computes `duration` and `totalFrames` and
returns them; the sequencer is the sole writer onto the `RenderJob`.
This also aligns with the eventual chunk-worker model where a chunk
running in a separate process cannot mutate the orchestrator's `job`.

No observable behavior change. `job.duration` / `job.totalFrames` end
up with the same values; the zero-duration `throw` still happens
inside the stage (now using the local `duration` constant) before any
sequencer-side assignment. Verified by:

- `bun run --filter @hyperframes/producer typecheck` clean
- `bun test packages/producer/src/services/` 175 pass / 1 pre-existing
  unrelated failure on `main`

Review feedback addressed: vanceingalls on #719.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jrusso1020 jrusso1020 force-pushed the refactor/producer-stages-1.2-compile branch from 02f78f9 to d168397 Compare May 11, 2026 19:07
@jrusso1020 jrusso1020 force-pushed the refactor/producer-stages-1.3-probe branch from 835c834 to 2024251 Compare May 11, 2026 19:07
@jrusso1020
Collaborator Author

Thanks @vanceingalls @miguel-heygen — addressed the important item in commit 20242515:

  • Single writer for job.duration / job.totalFrames: dropped the assignments from inside runProbeStage. The stage now only computes duration / totalFrames and returns them; the sequencer is the sole writer onto the RenderJob. Also aligns with the eventual chunk-worker model (a chunk in a different process can't mutate the orchestrator's job).

    The TS-narrowing concern that originally motivated the duplication is solved by the typed result — probeResult.duration: number lands on job.duration and TS narrows from there. Verified typecheck clean + same 175-pass / 1-pre-existing-fail baseline + Docker smoke (font-variant-numeric / many-cuts / variables-prod) unchanged.

Deferred per your "not for this PR" framing:

  • reasons dead-code array: still preserved-as-is. Will file a separate cleanup follow-up rather than fold into this refactor.

Done in #720:

  • BROWSER_MEDIA_EPSILON hoist to render/shared.ts.

Style nits ((window as any) casts, second-commit squash): left for follow-up.

Base automatically changed from refactor/producer-stages-1.2-compile to main May 11, 2026 20:35
@jrusso1020 jrusso1020 dismissed stale reviews from vanceingalls and miguel-heygen May 11, 2026 20:35

The base branch was changed.

@jrusso1020 jrusso1020 merged commit 70d3db4 into main May 11, 2026
47 of 51 checks passed
@jrusso1020 jrusso1020 deleted the refactor/producer-stages-1.3-probe branch May 11, 2026 20:51


3 participants