Skip to content

feat(cli): hyperframes lambda render-batch verb#967

Closed
jrusso1020 wants to merge 2 commits into
mainfrom
05-19-feat_cli_hyperframes_lambda_render-batch_verb
Closed

feat(cli): hyperframes lambda render-batch verb#967
jrusso1020 wants to merge 2 commits into
mainfrom
05-19-feat_cli_hyperframes_lambda_render-batch_verb

Conversation

@jrusso1020
Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 commented May 19, 2026

What

Phase 9 PR 9.4 of the distributed rendering plan — hyperframes lambda render-batch <projectDir>, the headline ergonomic for automated template-rendering pipelines on Lambda.

Given a project + a JSONL batch file, fans out N personalised renders by calling renderToLambda once per row with per-entry variables and outputKey. Deploys the site once and reuses it across renders.

Why

PRs 9.1–9.3 made variable injection work end-to-end at the SDK and single-render CLI surface. Users authoring template pipelines (personalised intros, weekly digests, marketing variants) want one CLI call that takes a manifest of (recipient, variables) pairs and dispatches all the renders. Without this verb, callers fall back to for u in ...; do hyperframes lambda render ... done, which re-tars the project every iteration and can't cap concurrent executions.

How

packages/cli/src/commands/lambda/render-batch.ts adds:

  • JSONL batch parser with friendly errors that name the offending line.
  • Single site deploy: deploySite runs once at the top, then every entry reuses the resulting siteHandle (or skips the deploy when --site-id is set). --dry-run skips the deploy too and synthesises a placeholder handle.
  • Concurrency cap via runWithConcurrencyLimit — index-cursor + N concurrent producers picking the next entry. Defaults to 50 (configurable via --max-concurrent). Preserves input order in the output manifest.
  • Per-entry pre-validation against the composition's data-composition-variables declaration, reusing reportVariableIssues from PR 9.3 so the warning format matches the single-render path. --strict-variables aborts before any AWS call.
  • Manifest output: one row per input line with { inputLine, outputKey, executionArn, status, error? }. Status is started, would-invoke (--dry-run), or failed-to-start. --json for the machine-readable form.

--max-concurrent is orchestrator-side only — it caps how many StartExecution calls run at once, not how many Lambda invocations the account can handle. AWS account-level Lambda concurrency limits live one level up and render-batch cannot enforce them; the guide (PR 9.5) explains the relationship to lambda deploy --concurrency.

Test plan

  • 11 unit tests: runWithConcurrencyLimit (order preservation, peak-in-flight cap, inputs.length < limit, empty input, rejection propagation), parseBatchFile (happy path, blank-line skipping, malformed JSON, missing outputKey, non-object variables).
  • Manual smoke: npx tsx packages/cli/src/cli.ts lambda render-batch ./proj --batch ./test.jsonl --width 320 --height 240 --dry-run --json produces the expected manifest with status: "would-invoke" for every entry.
  • bun run --cwd packages/cli test — all 346 CLI tests pass.
  • bun run --cwd packages/cli typecheck clean.
  • bunx oxlint + bunx oxfmt --check clean.
  • bunx fallow audit --base origin/main --fail-on-issues✓ No issues in 21 changed files.

Docs: docs/packages/cli.mdx adds a section on lambda render-batch covering the JSONL format, --max-concurrent semantics, --dry-run, and --json.

This is part of a stack of 5 (+1 optional) PRs (9.1–9.6); this is PR 9.4.

🤖 Generated with Claude Code

Copy link
Copy Markdown
Collaborator Author

jrusso1020 commented May 19, 2026

Copy link
Copy Markdown
Collaborator

@miguel-heygen miguel-heygen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well-designed batch rendering command. The key architecture decisions are all correct: JSONL format (streamable, line-number errors), single deploySite reuse across all entries, concurrency cap via runWithConcurrencyLimit, per-entry error containment (failed entries captured in manifest, not aborting the batch), and --dry-run for pre-flight validation.

Pre-validation against composition schema before any AWS calls is a nice touch. Test coverage is solid — 11 tests covering concurrency limiter (order preservation, peak-in-flight, rejection) and batch file parsing.

One suggestion (non-blocking): runWithConcurrencyLimit docstring says "Promise.allSettled semantics" but the implementation uses Promise.all which rejects on first failure. The production caller wraps in try/catch so this is unreachable, but the documented contract should match — either update the docstring or switch to allSettled behavior.

CI green. Ship it.

Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

render-batch CLI verb review

Clean delivery — the per-entry try/catch + index-cursor concurrency + dry-run scaffolding are the right primitives for this surface, and the JSONL parser exits with a line-numbered error rather than dumping a stack. Calling out specifics below.

Audited: packages/cli/src/commands/lambda/render-batch.ts (end-to-end), packages/cli/src/commands/lambda/render-batch.test.ts (end-to-end), packages/cli/src/commands/lambda.ts (the new switch case + flag wiring), docs/packages/cli.mdx (new section).
Trusting: the sibling render.ts integration points (reportVariableIssues, loadProjectVariableSchema, renderToLambda contract) — verified call shapes match.

Calibrated strengths

  • runWithConcurrencyLimit render-batch.ts:720-738 — index-cursor + Math.min(limit, inputs.length) workers is the right shape; preserves order, never overspawns, and the test on peak === 3 (render-batch.test.ts:208-221) pins the concurrency invariant honestly.
  • Per-entry try/catch in startEntry render-batch.ts:554-579 correctly isolates a single SFN StartExecution failure into the manifest with status: \"failed-to-start\" instead of aborting the batch — that's the load-bearing primitive for this verb's value prop.
  • Schema is loaded once and reused across entries render-batch.ts:488 — explicit avoidance of the 10k-row file-read + DOMParser hot loop. Comment cites the reason.
  • --dry-run synthesises a placeholder SiteHandle and never imports renderToLambda render-batch.ts:539 — keeps the lint-the-batch path AWS-free.
  • Documented JSONL example + --max-concurrent semantics in docs/packages/cli.mdx:976 correctly distinguish orchestrator-side caps from AWS account-level Lambda concurrency. That distinction is the failure mode users will hit first; spelling it out in docs is right.

Findings

important — no SIGINT / cancellation handling for in-flight executions

When a user runs a 10 000-entry batch and Ctrl+C's mid-fanout, the workers' await renderToLambda(...) calls already in flight will continue, and their started SFN executions keep running on AWS (chargeable). The manifest never gets written because Promise.all is interrupted, so the caller has no record of which executions were started — leaks both money and observability.

Minimum fix: register a SIGINT handler at the top of runRenderBatch that flips a cancelled flag the worker loop checks before pulling the next index, and emits the partial manifest on shutdown (process.on('SIGINT', ...) + write to stderr or a manifest path). Bonus: surface the in-flight executionArns so the user can aws stepfunctions stop-execution them.

This is the cancellation surface called out in the project plan — worth landing now rather than as a follow-up since the dispatch primitive is the whole point of the verb.

important — strict-variables aborts on the first failing entry instead of collecting all issues

render-batch.ts:489-498 loops over every entry calling reportVariableIssues(..., { strict }), and reportVariableIssues (in utils/variables.ts) calls process.exit(1) on the first entry with issues when strict: true. So a 1000-row batch with 100 broken entries forces the user to fix-one → re-run → fix-one → re-run, 100 cycles to clean the batch. The whole pitch of --strict-variables is "abort before any AWS call" — that doesn't preclude collecting all the diagnostics first.

Fix: in the batch path, accumulate issues across all entries first, then in strict mode print the full list and exit once. (Non-strict path already prints all warnings — only strict regresses.)

important — JSDoc on runWithConcurrencyLimit is wrong about semantics

render-batch.ts:715-719 says "Promise.allSettled semantics — a worker failure is captured in the returned BatchManifestEntry … this helper never rejects." The implementation uses Promise.all (line 729) and explicitly rejects on the first worker rejection — that's pinned by the test on lines 249-256. The "never rejects" wording is only true because the caller (startEntry) wraps renderToLambda in try/catch.

Two problems with shipping this doc:

  1. If anyone reuses runWithConcurrencyLimit later assuming allSettled semantics and skips the try/catch, the first rejection silently kills the in-flight workers (they're abandoned mid-await, the unhandled-promise-rejection from later cursor iterations can crash the process).
  2. The behavior contract is fuzzy: "never rejects" is a property of the (helper, caller) pair, not the helper alone.

Fix: rewrite the JSDoc to say "This helper uses Promise.all and propagates the first worker rejection. Callers that need partial-failure isolation must catch inside worker (as startEntry does)."

important — no progress visibility for long-running batches

A 10 000-entry batch with --max-concurrent 50 is silent for minutes between the "Site uploaded once" line and the final manifest print. The PR description acknowledges 1000s of executions but ships no progress signal. For SREs running this in CI, the only signal that the process is alive is a ps check.

Fix: emit a one-line progress write to stderr every K completed entries (K = 25 or Math.max(1, Math.floor(inputs.length / 40))), gated off when --json is set. Doesn't pollute the stdout manifest, gives a heartbeat.

important — no upfront cost confirmation on the live path

--dry-run is great for linting, but the live path commits to N billable executions with zero friction — a typo in --batch ./users.jsonl pointing at a 50k-row file fires 50k StartExecution calls before the user can react. The PR description even calls this out ("warn before submission") but the implementation has no warning.

Fix: before the first renderToLambda call, print \"About to dispatch N renders (estimated cost: $X, --max-concurrent={M}). Press Ctrl+C within 3s to abort.\" then setTimeout(3000). Gate with --yes for CI consumers. Cost-per-render estimate is hand-wavy but order-of-magnitude is enough — "50000 renders × ~\$0.05 = ~\$2500" is the signal users want to see.

If a cost-estimator is out of scope for this PR, at least the count + --yes gate.

Nits

  • render-batch.ts:643 — the JSDoc says "Throws via errorBox + process.exit(1) on the first malformed line." errorBox writes to stderr but does NOT throw or exit (verified in ui/format.ts:22-27); only the subsequent process.exit(1) halts the process. The function doesn't "throw," it exits — minor wording nit.
  • render-batch.ts:539const { renderToLambda } = args.dryRun ? { renderToLambda: null as never } : await loadSDK(); is a clever TS-friendly hack but obscures intent. A plain if (!args.dryRun) { ({ renderToLambda } = await loadSDK()); } reads more directly and the as never lie disappears.
  • render-batch.ts:643 docstring claims "sized for the documented ~10k-entry cap" — no runtime enforcement of that cap exists. Either add a soft check (if (entries.length > 10000) console.warn(...)) or drop the claim from the doc. Surprises a user who hands the verb a 100k-row batch.
  • PR body says "11 unit tests" — count is correct (6 for runWithConcurrencyLimit, 5 for parseBatchFile). For the record, no integration test exercises startEntry with a mocked renderToLambda. Acceptable since the per-entry try/catch is straightforward, but the failure-isolation invariant the verb promises has no test pin.

CI

Required checks green at HEAD e2a2722f: regression, preview-regression, Preview parity, player-perf, preview-regression (across both workflows). Optional/cancelled checks from a superseded run noted but not gating. Graphite mergeability check still IN_PROGRESS.


Verdict: COMMENT
Reasoning: Design is sound and the failure-isolation primitive is correctly built, but ship-blockers for the headline use case (10k-row batches) are missing: no cancellation, no progress, no cost gate, strict-mode UX regression vs single-render. Calling COMMENT rather than REQUEST CHANGES because none of these are correctness bugs — they're production-readiness gaps the author can address inline or in 9.5/9.6 follow-ups. Author's call which.

— Vai

@jrusso1020 jrusso1020 force-pushed the 05-19-feat_cli_hyperframes_lambda_render_variables_flags branch from 0b1732d to a4b1d1a Compare May 19, 2026 21:13
@jrusso1020 jrusso1020 force-pushed the 05-19-feat_cli_hyperframes_lambda_render-batch_verb branch from e2a2722 to 3d056a5 Compare May 19, 2026 21:13
@jrusso1020
Copy link
Copy Markdown
Collaborator Author

Addressed Vai's feedback in the latest force-push:

  • important — --strict-variables collect-all-issues — batch path now accumulates every failing entry's warnings, prints the full list, then exits once with a single Aborting batch due to variable issues... errorBox. A 100-row batch with 50 broken entries surfaces all 50 in one pass; the pre-fix behaviour would have required 50 fix-one → re-run cycles.
  • important — runWithConcurrencyLimit JSDoc rewritten: now correctly says Uses Promise.all and propagates the first worker rejection. Calls out that partial-failure isolation is the caller's responsibility (in this file, startEntry's try/catch).
  • nit — null as never ternary replaced with const renderToLambda = args.dryRun ? undefined : (await loadSDK()).renderToLambda; + an explicit null guard inside startEntry. Reads more directly; the type narrowing is explicit.
  • nit — "sized for the documented ~10k-entry cap" claim dropped — no runtime enforcement existed, so the claim was aspirational. The JSDoc now says "reads the whole file into memory — fine for typical batch sizes, an in-memory bound the caller can size around".
  • nit — "throws via errorBox" wording — kept the intent but the helper's behaviour is exit, not throw; reworded.

Skipping for follow-up (none are correctness bugs — production-readiness gaps that merit separate PRs):

  • SIGINT/cancellation — substantial work; needs a partial-manifest-on-shutdown writeout + ARN surfacing.
  • Progress visibility for long batches — heartbeat to stderr.
  • Upfront cost confirmation gate — needs a cost estimator + --yes flag.
  • Integration test for startEntry — the per-entry try/catch is straightforward but agreed it has no test pin.

miguel-heygen
miguel-heygen previously approved these changes May 19, 2026
Copy link
Copy Markdown
Collaborator

@miguel-heygen miguel-heygen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed after force-push. All 3 addressed items verified:

  1. Collect-all-issues in strict mode — Fixed. reportVariableIssues is called with strict: false per entry (never exits early), loop accumulates hadStrictIssue flag, single exit after all entries processed. A 1000-row batch with 100 broken entries now shows all 100 problems in one run.

  2. Promise.all JSDoc corrected — Fixed. Now accurately documents Promise.all with first-rejection propagation and explicitly notes caller responsibility for partial-failure isolation. The false "Promise.allSettled / never rejects" claim is gone.

  3. null as never cleaned up — Fixed. Replaced with undefined + runtime narrowing guard. No type-system lies.

The 3 deferred items (SIGINT handling, progress signal, upfront cost gate) are acknowledged as follow-up work. CI green. Ship it.

vanceingalls
vanceingalls previously approved these changes May 19, 2026
Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review post-force-push. Per James's per-PR response, three of the five prior production-readiness items addressed:

  • --strict-variables now collects ALL issues + single exit (covers the abort-on-first-failure feedback-loop concern).
  • JSDoc on runWithConcurrencyLimit corrected to reflect Promise.all reject-on-first semantics.
  • null as never hack cleaned up.

Two still open as documented follow-ups:

  • No SIGINT handling for in-flight SFN executions on Ctrl+C (real chargeable leak; worth a follow-up issue).
  • No progress signal on long batches (operators silent until completion).
  • No upfront cost gate on the live path (typo-in---batch footgun).

These are real production-readiness gaps for the 10k headline use case but acceptable as fast-followers if tracked.

Verdict: APPROVE — flipping COMMENT → APPROVE on the strength of #2/#3/#JSDoc fixes; the remaining three are operational, not correctness, and worth ticketing.

Re-review by Vai

… --strict-variables

Mirror the local hyperframes render variables UX on the Lambda CLI:

- --variables '<json>'       inline JSON object of variable values
- --variables-file <path>    path to a JSON file with variable values
- --strict-variables          fail on type/declared-mismatch (warn by default)

Resolution + validation logic is hoisted to packages/cli/src/utils/variables.ts
so both surfaces share one parser. The new reportVariableIssues helper formats
the warning block + handles --strict-variables exit, deduping the per-CLI
issue-handling block.

Variables flow into SerializableDistributedRenderConfig.variables and reach
every chunk worker via the path PR 9.1 + 9.2 wired up (plan() →
meta/encoder.json → renderChunk() → window.__hfVariables). Pre-validation
against the composition's data-composition-variables declaration runs only
when the project's index.html is on disk — --site-id pointing at a
pre-uploaded site that was packaged elsewhere skips the check, matching how
the local CLI treats unreadable index files.

The render.ts re-exports of parseVariablesArg / resolveVariablesArg /
validateVariablesAgainstProject are dropped; the matching tests move to
packages/cli/src/utils/variables.test.ts where the implementations now live.

Docs: docs/packages/cli.mdx adds a section on --variables / --variables-file /
--strict-variables for lambda render, including the 256 KiB Step Functions
execution-input cap and a pointer to the upcoming templates-on-lambda guide
(PR 9.5).

Phase 9 PR 9.3 of the distributed rendering plan.
New subcommand for automated template-rendering pipelines. Given a
project dir + a JSONL batch file, fans out N personalised renders by
calling renderToLambda once per batch row with per-entry variables and
outputKey:

  hyperframes lambda render-batch ./my-template \
    --batch ./users.jsonl \
    --width 1920 --height 1080 \
    --max-concurrent 10

JSONL format (one JSON object per line):

  {"outputKey": "renders/alice.mp4", "variables": {"name": "Alice"}}
  {"outputKey": "renders/bob.mp4",   "variables": {"name": "Bob"}}

The verb deploys the site once and reuses it across renders (--site-id
skips the deploy when the project was pre-uploaded). Concurrent Step
Functions starts are capped at --max-concurrent (default 50) via a
semaphore so a 10 000-entry batch doesn't try to spawn 10 000
executions simultaneously and trip the AWS account's concurrent-
execution quota.

Per-entry results land in a manifest (one row per input line) with
executionArn + status. --json emits the manifest as machine-readable
JSON. --dry-run prints the manifest with status: "would-invoke" for
every entry without calling AWS, so callers can lint their batch file
before paying for N executions.

Variables in each batch entry pre-validate against the composition's
data-composition-variables declaration (mirroring the local
hyperframes render UX). --strict-variables aborts the run on the first
failing entry before any AWS call. The reportVariableIssues helper from
PR 9.3 is reused so the warning format matches the single-render path
exactly.

Distinction from --max-parallel-chunks: --max-concurrent caps
ORCHESTRATOR-side fan-out (how many StartExecution calls run at once);
--max-parallel-chunks caps chunks PER render. AWS account-level Lambda
concurrent-execution limits live one level up and render-batch can't
enforce those; pick --max-concurrent based on your account quota +
the reserved concurrency you provisioned via lambda deploy.

Tests cover the concurrency-cap semaphore (preserve-order,
peak-in-flight, empty-input, limit > inputs.length, propagate
rejection) and the JSONL parser (blank-line handling, malformed JSON,
missing outputKey, non-object variables).

Phase 9 PR 9.4 of the distributed rendering plan.
@jrusso1020 jrusso1020 force-pushed the 05-19-feat_cli_hyperframes_lambda_render_variables_flags branch from a4b1d1a to 39891fe Compare May 19, 2026 22:50
@jrusso1020 jrusso1020 changed the base branch from 05-19-feat_cli_hyperframes_lambda_render_variables_flags to main May 19, 2026 22:51
@jrusso1020 jrusso1020 dismissed stale reviews from vanceingalls and miguel-heygen May 19, 2026 22:51

The base branch was changed.

@jrusso1020 jrusso1020 force-pushed the 05-19-feat_cli_hyperframes_lambda_render-batch_verb branch from 3d056a5 to 3b3a955 Compare May 19, 2026 22:51
Copy link
Copy Markdown
Collaborator

@miguel-heygen miguel-heygen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approved — same diff, rebased onto main.

Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-stamp at the new HEAD after rebase onto main. Same content as my prior approve (4323426147); three of five prior importants addressed (collect-all-issues in strict mode, Promise.all JSDoc corrected, null-as-never cleaned up). Three production-readiness items still open as documented follow-ups (SIGINT handling, progress signal, upfront cost gate). Base now main.

Re-stamp by Vai (post-rebase)

@jrusso1020
Copy link
Copy Markdown
Collaborator Author

Superseded by #968 (rebase-merged at f0a2740f). Same commit content is now on main as part of the #968 rebase-merge.

@jrusso1020 jrusso1020 closed this May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants