feat(cli): hyperframes lambda render-batch verb#967
Conversation
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
2be3f1a to
e2a2722
Compare
0445b42 to
0b1732d
Compare
miguel-heygen
left a comment
There was a problem hiding this comment.
Well-designed batch rendering command. The key architecture decisions are all correct: JSONL format (streamable, line-number errors), single deploySite reuse across all entries, concurrency cap via runWithConcurrencyLimit, per-entry error containment (failed entries captured in manifest, not aborting the batch), and --dry-run for pre-flight validation.
Pre-validation against composition schema before any AWS calls is a nice touch. Test coverage is solid — 11 tests covering concurrency limiter (order preservation, peak-in-flight, rejection) and batch file parsing.
One suggestion (non-blocking): runWithConcurrencyLimit docstring says "Promise.allSettled semantics" but the implementation uses Promise.all which rejects on first failure. The production caller wraps in try/catch so this is unreachable, but the documented contract should match — either update the docstring or switch to allSettled behavior.
CI green. Ship it.
vanceingalls
left a comment
There was a problem hiding this comment.
render-batch CLI verb review
Clean delivery — the per-entry try/catch + index-cursor concurrency + dry-run scaffolding are the right primitives for this surface, and the JSONL parser exits with a line-numbered error rather than dumping a stack. Calling out specifics below.
Audited: packages/cli/src/commands/lambda/render-batch.ts (end-to-end), packages/cli/src/commands/lambda/render-batch.test.ts (end-to-end), packages/cli/src/commands/lambda.ts (the new switch case + flag wiring), docs/packages/cli.mdx (new section).
Trusting: the sibling render.ts integration points (reportVariableIssues, loadProjectVariableSchema, renderToLambda contract) — verified call shapes match.
Calibrated strengths
runWithConcurrencyLimitrender-batch.ts:720-738— index-cursor +Math.min(limit, inputs.length)workers is the right shape; preserves order, never overspawns, and the test onpeak === 3(render-batch.test.ts:208-221) pins the concurrency invariant honestly.- Per-entry try/catch in
startEntryrender-batch.ts:554-579correctly isolates a single SFNStartExecutionfailure into the manifest withstatus: \"failed-to-start\"instead of aborting the batch — that's the load-bearing primitive for this verb's value prop. - Schema is loaded once and reused across entries
render-batch.ts:488— explicit avoidance of the 10k-row file-read + DOMParser hot loop. Comment cites the reason. --dry-runsynthesises a placeholderSiteHandleand never importsrenderToLambdarender-batch.ts:539— keeps the lint-the-batch path AWS-free.- Documented JSONL example +
--max-concurrentsemantics indocs/packages/cli.mdx:976correctly distinguish orchestrator-side caps from AWS account-level Lambda concurrency. That distinction is the failure mode users will hit first; spelling it out in docs is right.
Findings
important — no SIGINT / cancellation handling for in-flight executions
When a user runs a 10 000-entry batch and Ctrl+C's mid-fanout, the workers' await renderToLambda(...) calls already in flight will continue, and their started SFN executions keep running on AWS (chargeable). The manifest never gets written because Promise.all is interrupted, so the caller has no record of which executions were started — leaks both money and observability.
Minimum fix: register a SIGINT handler at the top of runRenderBatch that flips a cancelled flag the worker loop checks before pulling the next index, and emits the partial manifest on shutdown (process.on('SIGINT', ...) + write to stderr or a manifest path). Bonus: surface the in-flight executionArns so the user can aws stepfunctions stop-execution them.
This is the cancellation surface called out in the project plan — worth landing now rather than as a follow-up since the dispatch primitive is the whole point of the verb.
important — strict-variables aborts on the first failing entry instead of collecting all issues
render-batch.ts:489-498 loops over every entry calling reportVariableIssues(..., { strict }), and reportVariableIssues (in utils/variables.ts) calls process.exit(1) on the first entry with issues when strict: true. So a 1000-row batch with 100 broken entries forces the user to fix-one → re-run → fix-one → re-run, 100 cycles to clean the batch. The whole pitch of --strict-variables is "abort before any AWS call" — that doesn't preclude collecting all the diagnostics first.
Fix: in the batch path, accumulate issues across all entries first, then in strict mode print the full list and exit once. (Non-strict path already prints all warnings — only strict regresses.)
important — JSDoc on runWithConcurrencyLimit is wrong about semantics
render-batch.ts:715-719 says "Promise.allSettled semantics — a worker failure is captured in the returned BatchManifestEntry … this helper never rejects." The implementation uses Promise.all (line 729) and explicitly rejects on the first worker rejection — that's pinned by the test on lines 249-256. The "never rejects" wording is only true because the caller (startEntry) wraps renderToLambda in try/catch.
Two problems with shipping this doc:
- If anyone reuses
runWithConcurrencyLimitlater assumingallSettledsemantics and skips the try/catch, the first rejection silently kills the in-flight workers (they're abandoned mid-await, the unhandled-promise-rejection from later cursor iterations can crash the process). - The behavior contract is fuzzy: "never rejects" is a property of the (helper, caller) pair, not the helper alone.
Fix: rewrite the JSDoc to say "This helper uses Promise.all and propagates the first worker rejection. Callers that need partial-failure isolation must catch inside worker (as startEntry does)."
important — no progress visibility for long-running batches
A 10 000-entry batch with --max-concurrent 50 is silent for minutes between the "Site uploaded once" line and the final manifest print. The PR description acknowledges 1000s of executions but ships no progress signal. For SREs running this in CI, the only signal that the process is alive is a ps check.
Fix: emit a one-line progress write to stderr every K completed entries (K = 25 or Math.max(1, Math.floor(inputs.length / 40))), gated off when --json is set. Doesn't pollute the stdout manifest, gives a heartbeat.
important — no upfront cost confirmation on the live path
--dry-run is great for linting, but the live path commits to N billable executions with zero friction — a typo in --batch ./users.jsonl pointing at a 50k-row file fires 50k StartExecution calls before the user can react. The PR description even calls this out ("warn before submission") but the implementation has no warning.
Fix: before the first renderToLambda call, print \"About to dispatch N renders (estimated cost: $X, --max-concurrent={M}). Press Ctrl+C within 3s to abort.\" then setTimeout(3000). Gate with --yes for CI consumers. Cost-per-render estimate is hand-wavy but order-of-magnitude is enough — "50000 renders × ~\$0.05 = ~\$2500" is the signal users want to see.
If a cost-estimator is out of scope for this PR, at least the count + --yes gate.
Nits
render-batch.ts:643— the JSDoc says "Throws viaerrorBox+process.exit(1)on the first malformed line."errorBoxwrites to stderr but does NOT throw or exit (verified inui/format.ts:22-27); only the subsequentprocess.exit(1)halts the process. The function doesn't "throw," it exits — minor wording nit.render-batch.ts:539—const { renderToLambda } = args.dryRun ? { renderToLambda: null as never } : await loadSDK();is a clever TS-friendly hack but obscures intent. A plainif (!args.dryRun) { ({ renderToLambda } = await loadSDK()); }reads more directly and theas neverlie disappears.render-batch.ts:643docstring claims "sized for the documented ~10k-entry cap" — no runtime enforcement of that cap exists. Either add a soft check (if (entries.length > 10000) console.warn(...)) or drop the claim from the doc. Surprises a user who hands the verb a 100k-row batch.- PR body says "11 unit tests" — count is correct (6 for
runWithConcurrencyLimit, 5 forparseBatchFile). For the record, no integration test exercisesstartEntrywith a mockedrenderToLambda. Acceptable since the per-entry try/catch is straightforward, but the failure-isolation invariant the verb promises has no test pin.
CI
Required checks green at HEAD e2a2722f: regression, preview-regression, Preview parity, player-perf, preview-regression (across both workflows). Optional/cancelled checks from a superseded run noted but not gating. Graphite mergeability check still IN_PROGRESS.
Verdict: COMMENT
Reasoning: Design is sound and the failure-isolation primitive is correctly built, but ship-blockers for the headline use case (10k-row batches) are missing: no cancellation, no progress, no cost gate, strict-mode UX regression vs single-render. Calling COMMENT rather than REQUEST CHANGES because none of these are correctness bugs — they're production-readiness gaps the author can address inline or in 9.5/9.6 follow-ups. Author's call which.
— Vai
0b1732d to
a4b1d1a
Compare
e2a2722 to
3d056a5
Compare
|
Addressed Vai's feedback in the latest force-push:
Skipping for follow-up (none are correctness bugs — production-readiness gaps that merit separate PRs):
|
miguel-heygen
left a comment
There was a problem hiding this comment.
Re-reviewed after force-push. All 3 addressed items verified:
-
Collect-all-issues in strict mode — Fixed.
reportVariableIssuesis called withstrict: falseper entry (never exits early), loop accumulateshadStrictIssueflag, single exit after all entries processed. A 1000-row batch with 100 broken entries now shows all 100 problems in one run. -
Promise.all JSDoc corrected — Fixed. Now accurately documents
Promise.allwith first-rejection propagation and explicitly notes caller responsibility for partial-failure isolation. The false "Promise.allSettled / never rejects" claim is gone. -
null as nevercleaned up — Fixed. Replaced withundefined+ runtime narrowing guard. No type-system lies.
The 3 deferred items (SIGINT handling, progress signal, upfront cost gate) are acknowledged as follow-up work. CI green. Ship it.
vanceingalls
left a comment
There was a problem hiding this comment.
Re-review post-force-push. Per James's per-PR response, three of the five prior production-readiness items addressed:
--strict-variablesnow collects ALL issues + single exit (covers the abort-on-first-failure feedback-loop concern).- JSDoc on
runWithConcurrencyLimitcorrected to reflectPromise.allreject-on-first semantics. null as neverhack cleaned up.
Two still open as documented follow-ups:
- No SIGINT handling for in-flight SFN executions on Ctrl+C (real chargeable leak; worth a follow-up issue).
- No progress signal on long batches (operators silent until completion).
- No upfront cost gate on the live path (typo-in-
--batchfootgun).
These are real production-readiness gaps for the 10k headline use case but acceptable as fast-followers if tracked.
Verdict: APPROVE — flipping COMMENT → APPROVE on the strength of #2/#3/#JSDoc fixes; the remaining three are operational, not correctness, and worth ticketing.
Re-review by Vai
… --strict-variables Mirror the local hyperframes render variables UX on the Lambda CLI: - --variables '<json>' inline JSON object of variable values - --variables-file <path> path to a JSON file with variable values - --strict-variables fail on type/declared-mismatch (warn by default) Resolution + validation logic is hoisted to packages/cli/src/utils/variables.ts so both surfaces share one parser. The new reportVariableIssues helper formats the warning block + handles --strict-variables exit, deduping the per-CLI issue-handling block. Variables flow into SerializableDistributedRenderConfig.variables and reach every chunk worker via the path PR 9.1 + 9.2 wired up (plan() → meta/encoder.json → renderChunk() → window.__hfVariables). Pre-validation against the composition's data-composition-variables declaration runs only when the project's index.html is on disk — --site-id pointing at a pre-uploaded site that was packaged elsewhere skips the check, matching how the local CLI treats unreadable index files. The render.ts re-exports of parseVariablesArg / resolveVariablesArg / validateVariablesAgainstProject are dropped; the matching tests move to packages/cli/src/utils/variables.test.ts where the implementations now live. Docs: docs/packages/cli.mdx adds a section on --variables / --variables-file / --strict-variables for lambda render, including the 256 KiB Step Functions execution-input cap and a pointer to the upcoming templates-on-lambda guide (PR 9.5). Phase 9 PR 9.3 of the distributed rendering plan.
New subcommand for automated template-rendering pipelines. Given a
project dir + a JSONL batch file, fans out N personalised renders by
calling renderToLambda once per batch row with per-entry variables and
outputKey:
hyperframes lambda render-batch ./my-template \
--batch ./users.jsonl \
--width 1920 --height 1080 \
--max-concurrent 10
JSONL format (one JSON object per line):
{"outputKey": "renders/alice.mp4", "variables": {"name": "Alice"}}
{"outputKey": "renders/bob.mp4", "variables": {"name": "Bob"}}
The verb deploys the site once and reuses it across renders (--site-id
skips the deploy when the project was pre-uploaded). Concurrent Step
Functions starts are capped at --max-concurrent (default 50) via a
semaphore so a 10 000-entry batch doesn't try to spawn 10 000
executions simultaneously and trip the AWS account's concurrent-
execution quota.
Per-entry results land in a manifest (one row per input line) with
executionArn + status. --json emits the manifest as machine-readable
JSON. --dry-run prints the manifest with status: "would-invoke" for
every entry without calling AWS, so callers can lint their batch file
before paying for N executions.
Variables in each batch entry pre-validate against the composition's
data-composition-variables declaration (mirroring the local
hyperframes render UX). --strict-variables aborts the run on the first
failing entry before any AWS call. The reportVariableIssues helper from
PR 9.3 is reused so the warning format matches the single-render path
exactly.
Distinction from --max-parallel-chunks: --max-concurrent caps
ORCHESTRATOR-side fan-out (how many StartExecution calls run at once);
--max-parallel-chunks caps chunks PER render. AWS account-level Lambda
concurrent-execution limits live one level up and render-batch can't
enforce those; pick --max-concurrent based on your account quota +
the reserved concurrency you provisioned via lambda deploy.
Tests cover the concurrency-cap semaphore (preserve-order,
peak-in-flight, empty-input, limit > inputs.length, propagate
rejection) and the JSONL parser (blank-line handling, malformed JSON,
missing outputKey, non-object variables).
Phase 9 PR 9.4 of the distributed rendering plan.
a4b1d1a to
39891fe
Compare
The base branch was changed.
3d056a5 to
3b3a955
Compare
miguel-heygen
left a comment
There was a problem hiding this comment.
Re-approved — same diff, rebased onto main.
vanceingalls
left a comment
There was a problem hiding this comment.
Re-stamp at the new HEAD after rebase onto main. Same content as my prior approve (4323426147); three of five prior importants addressed (collect-all-issues in strict mode, Promise.all JSDoc corrected, null-as-never cleaned up). Three production-readiness items still open as documented follow-ups (SIGINT handling, progress signal, upfront cost gate). Base now main.
Re-stamp by Vai (post-rebase)

What
Phase 9 PR 9.4 of the distributed rendering plan —
hyperframes lambda render-batch <projectDir>, the headline ergonomic for automated template-rendering pipelines on Lambda.Given a project + a JSONL batch file, fans out N personalised renders by calling
renderToLambdaonce per row with per-entryvariablesandoutputKey. Deploys the site once and reuses it across renders.Why
PRs 9.1–9.3 made variable injection work end-to-end at the SDK and single-render CLI surface. Users authoring template pipelines (personalised intros, weekly digests, marketing variants) want one CLI call that takes a manifest of (recipient, variables) pairs and dispatches all the renders. Without this verb, callers fall back to
for u in ...; do hyperframes lambda render ... done, which re-tars the project every iteration and can't cap concurrent executions.How
packages/cli/src/commands/lambda/render-batch.tsadds:deploySiteruns once at the top, then every entry reuses the resultingsiteHandle(or skips the deploy when--site-idis set).--dry-runskips the deploy too and synthesises a placeholder handle.runWithConcurrencyLimit— index-cursor + N concurrent producers picking the next entry. Defaults to 50 (configurable via--max-concurrent). Preserves input order in the output manifest.data-composition-variablesdeclaration, reusingreportVariableIssuesfrom PR 9.3 so the warning format matches the single-render path.--strict-variablesaborts before any AWS call.{ inputLine, outputKey, executionArn, status, error? }. Status isstarted,would-invoke(--dry-run), orfailed-to-start.--jsonfor the machine-readable form.--max-concurrentis orchestrator-side only — it caps how manyStartExecutioncalls run at once, not how many Lambda invocations the account can handle. AWS account-level Lambda concurrency limits live one level up andrender-batchcannot enforce them; the guide (PR 9.5) explains the relationship tolambda deploy --concurrency.Test plan
runWithConcurrencyLimit(order preservation, peak-in-flight cap,inputs.length < limit, empty input, rejection propagation),parseBatchFile(happy path, blank-line skipping, malformed JSON, missing outputKey, non-object variables).npx tsx packages/cli/src/cli.ts lambda render-batch ./proj --batch ./test.jsonl --width 320 --height 240 --dry-run --jsonproduces the expected manifest withstatus: "would-invoke"for every entry.bun run --cwd packages/cli test— all 346 CLI tests pass.bun run --cwd packages/cli typecheckclean.bunx oxlint+bunx oxfmt --checkclean.bunx fallow audit --base origin/main --fail-on-issues—✓ No issues in 21 changed files.Docs:
docs/packages/cli.mdxadds a section onlambda render-batchcovering the JSONL format,--max-concurrentsemantics,--dry-run, and--json.This is part of a stack of 5 (+1 optional) PRs (9.1–9.6); this is PR 9.4.
🤖 Generated with Claude Code