feat(lambda): add TypeScript SDK and CDK construct#909
Conversation
Adds the client-side surface on top of the Phase 6a Lambda handler so
adopters can drive a deployed stack from Node without writing AWS-SDK
boilerplate:
- renderToLambda(opts) starts a Step Functions execution and returns a
handle. Does NOT poll.
- getRenderProgress({ executionArn }) returns a snapshot of progress,
frames rendered, cost (Lambda GB-seconds + SFN transitions), errors,
and the final output object once Assemble completes.
- deploySite({ projectDir, bucketName }) content-addresses the project
tree, tar.gzs it, and uploads to S3 with a HeadObject short-circuit so
re-renders of the same tree skip the tar+PUT.
- validateDistributedRenderConfig throws a typed InvalidConfigError
before StartExecution, so shape errors surface synchronously.
- computeRenderCost is exposed for callers who want to format cost out
of band.
Also ships HyperframesRenderStack, an aws-cdk-lib L2 construct that
emits the same topology as examples/aws-lambda/template.yaml. Lives on
the ./cdk subpath export so SDK-only consumers don't pull aws-cdk-lib
into their runtime graph (declared as an optional peer dependency).
Tests: 24 new unit tests across the SDK plus 9 CDK synth / contract /
snapshot tests. All 83 tests in packages/aws-lambda/src pass.
Pulls shared logic out so the SDK doesn't re-invent things the handler and the producer already have: - `formatExtension` extracted to packages/aws-lambda/src/formatExtension.ts. handler.ts and renderToLambda.ts both used identical 12-line copies of this switch. - `PLAN_PROJECT_DIR_SKIP_SEGMENTS` is now exported from @hyperframes/producer/distributed. deploySite consumes it instead of its own duplicate SKIP_TOP_LEVEL set; the two lists were trivially identical and would have drifted silently. - `FakeS3` + `drainBody` factored out of the two SDK test files into src/sdk/__fixtures__/fakeS3.ts. Drops ~110 lines of test-file duplication and gives future SDK tests a one-line FakeS3 import. - S3 URI building in deploySite and renderToLambda routes through the existing `formatS3Uri` helper instead of inline `s3://...` concatenation; matches the convention already in handler.ts. Net -133 lines across the touched files. All 83 aws-lambda tests still pass; all 60 producer distributed tests still pass.
bba543d to
2dfaec7
Compare
miguel-heygen
left a comment
There was a problem hiding this comment.
Review: feat(lambda): add TypeScript SDK and CDK construct
CI is red on one test; the rest of the build (lint, typecheck, regressions, all other test packages) passes.
Failing test — root cause
HyperframesRenderStack — snapshot > emits the expected set of AWS resource types in the expected counts timed out after 5000ms.
The other two snapshot tests pass fine (55ms, 35ms). This one calls template.toJSON() and iterates Resources — which in a CDK template can be a very large object the first time it is accessed because CDK lazy-evaluates token resolution during toJSON(). The CDK synthesis itself completed (the contract tests which run the same synth() fixture all pass), so this is a performance issue with the test's assertion loop, not a correctness bug.
The fix is straightforward: add { timeout: 15_000 } (or higher) to this test case. The same synth() is already used by the two sibling tests in the file, so a file-level beforeAll that memoizes the template would also eliminate the repeated synthesis and make the suite faster overall.
This is a real issue that needs fixing before merge — CI must be green.
Architecture
Well-structured three-layer split: handler, SDK, CDK. Clean separation of concerns:
src/cdk/only needsaws-cdk-lib— correctly isolated behind a subpath export so SDK-only consumers don't pull CDK into their bundle.src/sdk/wraps@aws-sdk/client-s3+@aws-sdk/client-sfnwith thin, well-typed wrappers that have injection seams for tests.src/handler.tsis unchanged except for pulling the sharedformatExtensionout. Good factoring.formatExtensionextracted from the handler into its own module and reused by the SDK — exactly right.
AWS integration
IAM: The construct uses bucket.grantReadWrite(renderFunction) (scoped to the specific bucket, not *) and renderFunction.grantInvoke(stateMachine). CDK's default new lambda.Function execution role includes the equivalent of AWSLambdaBasicExecutionRole for CloudWatch Logs. The comment in the code ("NOT CloudWatchLogsFullAccess") is correct. This is well-scoped.
One observation: stateMachine.grantInvoke(...) is not granted to any external IAM principal. That's fine for the construct itself, but the README example doesn't mention this — adopters wiring a Lambda or API Gateway trigger will need to add that grant themselves. Worth a note in the README or a readonly stateMachineArn convenience accessor on the construct, but not blocking.
S3 operations: HeadObject + PutObject in deploySite. Error handling covers 404, NotFound, NoSuchKey, and propagates everything else — that's correct S3 SDK behavior since the v3 SDK throws typed errors with $metadata.
Step Functions: StartExecution uses JSON.stringify(input) for the input field — correct (SFN requires a string). GetExecutionHistory paginates up to 50 pages of 1000 events each — the 50-page cap is a reasonable guard against infinite loops on runaway executions.
State machine definition: The AssertChunkCount choice uses numberGreaterThan("$.Plan.ChunkCount", 0) — correct. The maxConcurrencyPath: "$.Plan.ChunkCount" will match actual chunk count, not maxParallelChunks from the config. That's a potential overage — if the user sets maxParallelChunks: 8 and the plan produces 240 chunks, all 240 fire in parallel. The SDK validates maxParallelChunks but the CDK construct doesn't thread it through to maxConcurrencyPath. The SAM template may handle this differently — worth a follow-up.
Error handling
deploySite: catches 404 correctly, propagates auth/network errors.renderToLambda: validates config before any AWS call (good), throws ifexecutionArnis absent in the response.getRenderProgress: handlesFAILED,TIMED_OUT,ABORTED,PENDING_REDRIVEstatuses. CoversExecutionFailed,ExecutionAborted,ExecutionTimedOutevents in the history scan.- CDK construct: no try/catch (not needed — CDK construct failures throw at synth time, not runtime).
One gap: getRenderProgress has a 50-page loop guard but no timeout or circuit-breaker for the SFN calls themselves. For very long-running renders this could stall a caller process. Not a blocker for the SDK surface, just worth noting.
Types
Types are clean — no any in the SDK files. The few casts are all in test helpers or fixture code (as unknown as SFNClient, as unknown as S3Client) which is the correct pattern for mock injection. FakeS3.send accepts unknown and narrows via constructor name — a bit fragile for a production mock, but fine for test fixtures.
DistributedFormat in formatExtension.ts is defined locally and may diverge from the type in events.ts. Worth confirming they stay in sync (or re-exporting from one canonical location).
HyperframesRenderStackProps.lambdaMemoryMb is typed as a union of allowed MB values — good, prevents invalid steps.
CDK construct
Well-parameterized with sensible defaults (10240 MB, 900s timeout, x86_64, sparticuz, RETAIN on delete). The construct exposes .bucket, .renderFunction, and .stateMachine for adopter wiring.
The defaultHandlerZipPath() resolves relative to import.meta.url. This will work in the source tree and in a published node_modules install if the package is published with the ZIP included. The package.json doesn't have a files field limiting what gets published — confirm the ZIP ends up in the published package or that handlerZipPath is always required in practice.
RemovalPolicy.DESTROY on the log group while defaulting to RETAIN on the S3 bucket — the asymmetry is intentional (logs are cheaper to lose) but worth a comment for the next maintainer.
Security
No hardcoded credentials, ARNs, or account IDs. Placeholder ARNs in the README (arn:aws:states:us-east-1:123:...) are clearly illustrative.
The content-addressed siteId (SHA-256 truncated to 16 hex chars) has a ~2^64 collision space — acceptable for this use case.
hashProjectDir reads files synchronously (readFileSync) which is fine for project sizes, but the comment notes this. The sort order is deterministic (lexicographic by entry.name). Looks correct.
Summary
Blocking: The snapshot test timeout needs a fix — bump the test timeout or memoize synth() across the three tests in that file.
Non-blocking observations:
maxConcurrencyPathin the Map state uses$.Plan.ChunkCountinstead of$.Config.MaxParallelChunks— may silently ignore the user's parallelism cap.- The
filesfield inpackage.jsonshould be audited to confirmdist/handler.zipgets published. DistributedFormatdefined in two places — confirm they stay in sync.- README example should note that callers need to grant their trigger principal
states:StartExecutionon the state machine.
The SDK and CDK code itself is solid — well-tested, clean types, proper injection seams, sensible defaults.
vanceingalls
left a comment
There was a problem hiding this comment.
Summary
Solid Phase 6b foundation. SDK is well-typed, validated up-front, and tested at a level above the rest of the package. CDK construct mirrors SAM resource shape and IAM scope discipline. Decent docstrings throughout.
One real CI blocker (the test timeout on main's Test job is this PR, not flake), plus a few correctness + parity items worth fixing before merge.
Calibrated strengths
validateConfig.tsis genuinely well-structured: typedInvalidConfigErrorwith a.fieldpointer, even-dim enforcement onwidth/height(yuv420p constraint), and thecrf/bitratemutual-exclusion check (packages/aws-lambda/src/sdk/validateConfig.ts:74-82) all surface bad calls at compile-equivalent speed.- IAM discipline holds the SAM template's bar:
bucket.grantReadWrite(renderFunction)is ARN-scoped (HyperframesRenderStack.ts:861), noAction: "*", noCloudWatchLogsFullAccess. The comment at :857-860 even calls out why — matches the SAM template's same callout. - Snapshot test design (
HyperframesRenderStack.snapshot.test.ts:580-600) deliberately avoidstoMatchSnapshotfor the right reason — CFN token suffixes would create false diffs. Per-type resource counts + frozen state names + frozen non-retryable error set is the right level of granularity. formatExtensionextraction into a shared module (src/formatExtension.ts) keeps SDK and handler in lockstep on the mp4 vs mov vs png-sequence key extension. Cross-consumer extension drift was a real foot-gun risk.
Findings
Blocker — CI Test job is red on this PR (not a flake)
Logs (runs/25976205112/job/76356669390):
(fail) HyperframesRenderStack — snapshot > emits the expected set of AWS resource types in the expected counts [5443.01ms]
^ this test timed out after 5000ms.
This is the first test in HyperframesRenderStack.snapshot.test.ts. Subsequent tests in the same file pass in 35-55ms; contract-test first synth runs in 70ms. The 5.4s wall time = aws-cdk-lib first-time module load + first new App() + first Template.fromStack synth. The 5s bun:test default timeout barely loses.
This will flake intermittently on every PR that touches the CDK file until fixed. Two clean options:
it("...", { timeout: 30_000 }, async () => { ... })on the first synth test (ordescribe.timeout(30_000)).- Hoist one
synth()call to abeforeAlland reuse theTemplateacross all three tests (each only inspects the synthed template, none re-synths).
I'd take the beforeAll approach — it cuts the CDK suite wall time by ~5s every run and makes the dependency-load cost visible exactly once. Same change in HyperframesRenderStack.contract.test.ts:79-81 is a separate small win.
Blocker — stateTransitions = events.length overcounts SFN cost
getRenderProgress.ts:2262:
stateTransitions: events.length,SFN Standard Workflows bill per state transition, not per history event. A single Lambda task produces (at minimum) TaskStateEntered, LambdaFunctionScheduled, LambdaFunctionStarted, LambdaFunctionSucceeded, TaskStateExited — five history events for one billable transition. With 100 chunks the cost report will show ~5× the actual SFN charge.
This understates the AWS bill misleading direction: it makes accruedSoFarUsd and displayCost look ~3-5× scarier than reality, which will drive adopters to under-fan-out renders. Filter to state-entry events (TaskStateEntered | MapStateEntered | PassStateEntered | ChoiceStateEntered) or count StateEntered family events only. The infra to do this is already in summarizeHistory — currentLambdaState is set on exactly those event types.
Important — assembleComplete parsed too narrowly
getRenderProgress.ts (around line 2210):
if (typeof obj.FramesEncoded === "number") {
if (obj.Action === "renderChunk") { ... }
if (obj.Action === "assemble") { assembleComplete = true; ... }
}assembleComplete is only set when both FramesEncoded is a number and Action === "assemble". If Assemble succeeds but the result payload happens to omit FramesEncoded (e.g. a future handler refactor), overallProgress will stay at 0.9 even though the render is done. The status === "SUCCEEDED" check in computeOverallProgress covers the final state, but the transitional moment between "Assemble lambda returned" and "SFN marks SUCCEEDED" will misreport.
Two simple fixes (either works): (a) gate assembleComplete on obj.Action === "assemble" only, set outputFile when present; (b) detect Assemble by the enclosing state name (currentLambdaState === "Assemble"), which avoids the payload-shape dependency entirely. (b) is more robust to future result-type churn.
Important — CDK ↔ SAM parity gap on state-machine IAM
The SAM template explicitly grants the state machine role:
WriteCloudwatchLogs(9logs:*actions on*, required for SFN log delivery)XRayTracing(xray:PutTraceSegments,xray:PutTelemetryRecords— required forTracing.Enabled)
The CDK construct relies on aws-cdk-lib's new sfn.StateMachine with tracingEnabled: true + logs: { destination } to auto-grant these. As of aws-cdk-lib@2.130+ CDK does auto-grant both, but the contract test never asserts this — meaning a future minor CDK version that changes the auto-grant behaviour would silently produce a state machine that can't write logs or trace.
Add one contract assertion: t.hasResourceProperties("AWS::IAM::Policy", { ... Action: contains "xray:PutTraceSegments", Resource: "*" ... }) for the state-machine role's policy. Cheap, catches the SAM-parity regression that's the whole point of the snapshot test.
Separately, the EXPECTED_RESOURCE_COUNTS map declares "AWS::IAM::Policy": 2 but the SAM template has 3 inline policies on the state-machine role (InvokeRenderFunction, WriteCloudwatchLogs, XRayTracing). CDK collapses adjacent inline policies into fewer managed policy resources, but if the count drifts up because CDK's auto-grants emit a third policy on a minor bump, the snapshot test fails with a useful message. So the 2-vs-3 isn't itself a bug — it just means the snapshot test does not assert SAM-template parity, only "what CDK emits today". Worth a comment so a future reader doesn't think parity is verified.
Important — README example has a confusing bucketName derivation
packages/aws-lambda/README.md:319-321:
const handle = await renderToLambda({
siteHandle: site,
bucketName: site.projectS3Uri.split("/")[2]!,
...
});The example reaches back into the SiteHandle.projectS3Uri string to extract the bucket name. That's exactly the boilerplate the SDK is supposed to remove. Two paths forward (either is fine):
- Carry
bucketNameonSiteHandle(it's already known atdeploySitecall time —opts.bucketNameis right there). - Make
bucketNameoptional onRenderToLambdaOptionswhensiteHandleis supplied, derive it insiderenderToLambdafrom the URI parse.
Right now the README is teaching an anti-pattern.
Important — SAM template uses RetentionInDays: 30 for the SFN log group; CDK uses RetentionDays.ONE_MONTH
These match (ONE_MONTH = 30), so the runtime behaviour is identical. But the SAM template has RetentionInDays: 30 implicit by enum value and the CDK construct uses the named enum — fine, just worth a brief comment near HyperframesRenderStack.ts:864 noting the SAM-template parity reasoning. The state-machine log group is the only resource on RemovalPolicy.DESTROY (the bucket is RETAIN) — a comment on why would prevent a future contributor from "fixing" the inconsistency.
Nit — removalPolicy mismatch potential
The SAM template has explicit DeletionPolicy: Retain + UpdateReplacePolicy: Retain on the bucket. The CDK construct uses props.bucketRemovalPolicy ?? RemovalPolicy.RETAIN. CDK's RemovalPolicy.RETAIN sets both DeletionPolicy and UpdateReplacePolicy to Retain, so this matches — fine.
Nit — validateConfig.ts allows force-sdr but the producer's distributed plan may not
validateConfig.ts:84:
const ALLOWED_HDR_MODES = ["auto", "force-sdr"] as const;Verify against @hyperframes/producer/distributed's actual accepted set. If the producer's plan stage rejects something the SDK accepts, you'll get the exact "ExecutionFailed five minutes in" pain the validator is designed to prevent.
Nit — loadFullHistory caps at 50 pages × 1000 events = 50K events
getRenderProgress.ts:2141:
for (let page = 0; page < 50; page++) { ... }Silent truncation. 50K SFN history events is roughly a 10K-chunk render — well past anything that would fit in a single execution today, but a hard cap with no log line or thrown error is the kind of bug that bites three years later. Either drop the cap (Standard Workflows have a hard 25K event limit anyway) or throw new Error("history exceeded 50K events; SFN execution would have failed first").
Nit — inferBilledMs uses wall-clock DurationMs, not actual billed duration
getRenderProgress.ts:2289:
function inferBilledMs(payload: unknown): number {
...
if (typeof obj.DurationMs === "number") return obj.DurationMs;The handler measures wall-clock and reports it as DurationMs. Lambda's actual BilledDuration is rounded up to the nearest 1ms but excludes cold-start init time, so wall-clock is a slight overestimate of bill — opposite direction from the SFN overcount above. The module doc (costAccounting.ts:7) calls this out, so it's documented; the displayCost formatting is already loud about being "best-effort." Acceptable as-is, but a one-line code comment on inferBilledMs pointing at the SFN-overcount issue (after that's fixed) would close the loop.
Verdict
Verdict: REQUEST CHANGES
Reasoning: CI is red on a real timing issue this PR introduced, and stateTransitions = events.length will visibly overstate AWS cost to the adopters this surface is meant to serve. Both are 1-2 line fixes. Everything else above is important-or-nit polish on what's otherwise a well-built foundation.
— Vai
Review by Vai
The bun:test default 5s timeout tripped the first CDK snapshot test in CI when the cold-start `Template.fromStack(stack)` synth took ~5-8s on the slowest GitHub Actions runner. Locally on a warm shell the synth measures <1s, so the failure didn't reproduce until PR #909 hit CI. Two changes: - Both CDK test files cache one synth in `beforeAll(..., 30000)` and reuse the result across every test that uses the default props. Each individual test now runs in microseconds (pure assertions against the already-synthed template), so the 5s timeout no longer applies on the hot path. - The two contract tests that exercise non-default props (reservedConcurrency, projectName) still synth fresh per-test; they get a per-test `it(..., 30000)` timeout. No behavior changes.
vanceingalls
left a comment
There was a problem hiding this comment.
Verdict reconciliation: The earlier
--commentfrom this account (id4304554278) was a fallback after the harness blocked--request-changespre-call. Reposting as--request-changesfrom foreground to set the correct review state.
Two blockers + three importants. Architecture and IAM scope hold the bar; the issues below are correctness-grade, not stylistic.
Blockers
Blocker 1 — CI snapshot test times out. HyperframesRenderStack.snapshot.test.ts first test (emits the expected set of AWS resource types…) times out at 5443ms vs the 5000ms bun:test default. First-synth pays the full aws-cdk-lib module-load cost; subsequent synths in the same file complete in 35-55ms. Fix: it(name, { timeout: 30_000 }, …) on the first test, OR hoist app.synth() to beforeAll so every test reuses the assembly. Will flake until done.
Blocker 2 — stateTransitions: events.length over-reports SFN cost. getRenderProgress.ts:2262 counts every history event as a billable transition. Step Functions only charges per state entry (each Task produces ~5-7 history events: ScheduleEvent, StartedEvent, SucceededEvent, etc.). Result: cost projections will over-report by 3-5×. Filter to historyEvents.filter(e => e.type?.endsWith("StateEntered")) (or equivalent) before counting.
Important
assembleCompletediscriminant is fragile. Gated on bothevent.previousEventDetails.parameters.Action === "assemble"ANDtypeof FramesEncoded === "number". The Action coupling makes the check brittle if a future caller renames or relocates that field. Prefer the enclosing state name (StateExitedEventDetails.name === "Assemble").- CDK↔SAM parity is asserted by aws-cdk-lib auto-grants, not by test. The CDK construct relies on
bucket.grantReadWrite(fn)and CDK's automatic X-Ray / log-delivery permissions. No assertion in the snapshot test verifies these match the SAM template's resource set. Add one explicittoMatchObjectassertion. - README example does
bucketName: site.projectS3Uri.split("/")[2]!. String-parsing the URI is exactly the kind of boilerplate the SDK should remove for users. CarrybucketNameonSiteHandleso the example becomesbucketName: site.bucketName.
Strengths
- SDK ergonomics:
InvalidConfigError.fieldpointer (specific blame), even-dimension enforcement, crf/bitrate mutex,runtimeCapenum. Abort signal is correctly not threaded (fire-and-forget invoke; nothing to cancel). - CDK IAM scope:
bucket.grantReadWrite(fn)is ARN-scoped. NoAction: "*". Trust policy correctly restricts tolambda.amazonaws.com+states.amazonaws.com. - Three-layer split (handler/SDK/CDK) is clean — each layer has a single responsibility.
Review by Vai
Three correctness + ergonomics fixes raised in Vai's review:
- getRenderProgress over-counted SFN transitions by 3-5×. Step
Functions Standard Workflows bill per state-entry, not per
history event. Each Task produces ~5-7 history events
(Scheduled / Started / Succeeded / TaskStateExited / …);
counting `events.length` reported the runaway. Switch to
counting `*StateEntered` events explicitly.
- assembleComplete + outputFile detection was coupled to the
Lambda payload's `Action` field. Move both signals onto the
enclosing state name (`StateExited.name === "Assemble"`), which
is the state-machine identity rather than the Lambda event
contract. framesRendered increment moves to the same boundary
(RenderChunk state).
- SiteHandle now carries `bucketName` directly so README + CLI
callers don't have to re-parse `projectS3Uri.split("/")[2]`.
Test updates: getRenderProgress tests wrap renderChunk/assemble
events in matching StateEntered + StateExited pairs so the new
state-name-driven dispatch is exercised end-to-end. SiteHandle
fixture in renderToLambda.test.ts gets the new bucketName field.
All 83 aws-lambda tests still pass.
miguel-heygen
left a comment
There was a problem hiding this comment.
Both blockers from the previous review are addressed:
1. CDK snapshot test timeout — Fixed. Both HyperframesRenderStack.contract.test.ts and HyperframesRenderStack.snapshot.test.ts now cache the CDK synth in beforeAll with a 30s timeout (line: beforeAll(() => { ... }, 30000)). Individual tests that synth non-default-props stacks also have per-test }, 30000) timeouts. The default 5s vs ~5-8s cold synth race is eliminated.
2. stateTransitions: events.length over-reporting SFN cost — Fixed. summarizeHistory() now counts only *StateEntered events (TaskStateEntered, MapStateEntered, PassStateEntered, ChoiceStateEntered, SucceedStateEntered, FailStateEntered, WaitStateEntered, ParallelStateEntered) via a switch statement, not all history events. This correctly models SFN Standard Workflow billing (per state entry) and eliminates the 3-5x over-report from intermediate events like LambdaFunctionScheduled/Started/Succeeded.
Regression shards still pending but unrelated to this PR's changes (CDK construct + SDK cost accounting).
vanceingalls
left a comment
There was a problem hiding this comment.
Re-review of 4f2c4b0c. Both blockers and 2/3 importants resolved cleanly; the remaining important is partial but not blocking.
Findings status
Blocker 1 — CDK snapshot test timeout: ADDRESSED. beforeAll(..., 30000) hoists synth in both HyperframesRenderStack.snapshot.test.ts and HyperframesRenderStack.contract.test.ts; default-args tests now run as µs assertions against SYNTHED / DEFAULT_TEMPLATE. The two non-default-prop contract tests (reservedConcurrency, projectName) keep a per-test 30000 timeout for their fresh synth. CI is green on this HEAD.
Blocker 2 — SFN cost over-reporting: ADDRESSED. getRenderProgress.ts:185-200 now counts only *StateEntered events toward stateTransitions, dropping the 3-5× over-count. The accompanying test wraps lambda events in matching stateEntered/stateExited pairs and the cost-projection test still passes (progress.costs.breakdown.lambdaUsd ≈ $0.001 for a 1-invocation execution).
Important — assembleComplete discriminant: ADDRESSED. Moved to stateExitedEventDetails?.name === "Assemble" (getRenderProgress.ts:230), decoupled from the inner Lambda payload's Action field. framesRendered is also state-gated via currentLambdaState === "RenderChunk" (line 222), which removes the implicit dependency on Action and adds a test (does not double-count Assemble's FramesEncoded) that exercises the boundary.
Important — SiteHandle.bucketName: ADDRESSED. SiteHandle now carries bucketName (deploySite.ts); README example simplified to bucketName: site.bucketName.
Important — CDK↔SAM parity test: PARTIAL. SFN TracingConfiguration: { Enabled: true } is asserted in HyperframesRenderStack.contract.test.ts:59, but Lambda X-Ray tracing + log-delivery IAM are still not asserted explicitly — they rely on aws-cdk-lib's auto-grants. Not blocking: the resource-count snapshot + state-machine topology assertion catch resource-shape regressions, and explicit X-Ray/log-delivery assertions are a reasonable follow-up rather than a merge gate.
Nits (non-blocking)
currentLambdaStatelifetime.currentLambdaState = ev.stateEnteredEventDetails?.name ?? currentLambdaState(line 199) overwrites on every*StateEntered. For this state machine's history shape (no nested state-entered between Task entered → Lambda lifecycle → Task exited) the carry-through is correct in practice. If future changes nest Choice/Pass states between Task entry and the Lambda lifecycle events, the gate could mis-attribute — worth a comment, not a fix.
Verdict
Verdict: APPROVE
Reasoning: Both blockers + the load-bearing importants are resolved with code that's evidence-backed by the updated tests; CI is green. The remaining partial is non-blocking and a reasonable follow-up.
Review by Vai (re-review)

What
Adds the client-side surface on top of the Phase 6a Lambda handler so adopters can drive a deployed HyperFrames stack from Node without writing AWS-SDK boilerplate, plus a
HyperframesRenderStackaws-cdk-lib L2 construct.New exports from
@hyperframes/aws-lambda:renderToLambda(opts) → RenderHandle— validates the config, optionally callsdeploySite, starts a Step Functions execution, returns the handle. Does NOT poll.getRenderProgress({ executionArn }) → RenderProgress— oneDescribeExecution+ oneGetExecutionHistory(paginated). ReportsoverallProgress,framesRendered/totalFrames,lambdasInvoked,costs.accruedSoFarUsd+displayCost,errors,fatalErrorEncountered, and the finaloutputFileonce Assemble completes.deploySite({ projectDir, bucketName }) → SiteHandle— content-addressed S3 upload atsites/<siteId>/project.tar.gzwithHeadObjectshort-circuit, so re-renders of an unchanged tree skip tar+PUT entirely.validateDistributedRenderConfig(config)— typedInvalidConfigErrorwith a.fieldpointer for every shape error (fps/format/codec/crf/bitrate/chunkSize/maxParallelChunks/runtimeCap/hdrMode/dimensions). Called insiderenderToLambdabefore any AWS call.computeRenderCost(lambdaInvocations, stateTransitions) → RenderCost— best-effort USD math using us-east-1 on-demand Lambda + SFN Standard rates.New
./cdksubpath export:HyperframesRenderStack— aws-cdk-lib L2 construct that emits the same topology asexamples/aws-lambda/template.yaml(Lambda + Step Functions standard workflow + S3 bucket with 7-day intermediates lifecycle + 3 CloudWatch alarms + scoped IAM).aws-cdk-lib+constructsare declared as optional peer dependencies so SDK-only consumers don't pull them into their runtime graph.Why
Phase 6a (PRs #878 / #882 / #903 / #907 / #908) validated the Lambda architecture on real AWS but the user-facing surface is still ~8 manual commands across
bun+sam+aws s3+aws stepfunctions. PR 6.4 is the foundational piece of the Phase 6b "≈3 commands end-to-end" target documented inDISTRIBUTED-RENDERING-PLAN.md§11 / Phase 6b. The CLI (PR 6.5) consumes this SDK; the docs (PR 6.7 / 6.8) describe it. Shipping the SDK first keeps the two surfaces in lockstep when the CLI lands.How
packages/aws-lambda/src/sdk/— SDK files (one .ts + one .test.ts per concept). All AWS calls are routed through asfn?: SFNClient/s3?: S3Clientinjection seam so the unit tests run with FakeSFN / FakeS3 stand-ins.packages/aws-lambda/src/cdk/— CDK construct + an./index.tsbarrel so the subpath export resolves cleanly.package.json—@aws-sdk/client-sfnadded as a runtime dep;aws-cdk-lib+constructsadded as optional peer + dev deps;./cdkadded toexports.Notable design calls:
renderToLambdadeliberately does NOT poll. Step Functions standard workflows can run for hours; blocking the caller's process is the wrong default../cdk(separate subpath) so SDK-only consumers don't pay the ~8 MBaws-cdk-libimport cost.DurationMsreturn field (carried in the SFN history's success payload), since SFN'sLambdaFunctionSucceededEventDetailsdoesn't exposeBilledDurationdirectly.breakdown.estimatedflags any invocation that fell back to a constant.toMatchSnapshotbecause CDK emits CFN tokens with random suffixes (asset hashes, log-group IDs) that would noise up snapshot diffs.Test plan
bun test packages/aws-lambda/src→ 83 pass, 0 failbun run --cwd packages/aws-lambda typecheckbunx oxlint packages/aws-lambda/srcbunx oxfmt --check packages/aws-lambda/src packages/aws-lambda/README.mdpackages/aws-lambda/README.mdsmoke.shstill drives the same SFN input shape today; PR 6.6 adds a--mode=lambda-localregression harness next)🤖 Generated with Claude Code