feat(lambda): add SAM template and sample events for AWS deployment #907

Merged: jrusso1020 merged 3 commits into main from 06-2-feat_lambda_add_sam_template on May 16, 2026

@jrusso1020
Collaborator

Summary

Recreated after #879 was auto-closed when its base branch (#878's branch) got deleted on merge. Same content, rebased onto main now that #878 is in.

  • examples/aws-lambda/template.yaml — Lambda function + Step Functions standard workflow (Plan → Map(N) RenderChunk → Assemble) + S3 bucket + IAM + CloudWatch alarms (Lambda Errors, SFN ExecutionsFailed, runaway invocations). sam validate --lint clean.
  • Sample event JSON for each Action.
  • Examples-directory README with deploy walkthrough.
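As a rough illustration of the error alarms listed above, the sketch below builds keyword arguments in the shape CloudWatch's PutMetricAlarm call accepts. The 5-minute window and threshold of 1 for the Lambda Errors alarm come from the commit notes in this PR; the alarm names, the comparison operator, the missing-data handling, and reusing the same window for the Step Functions alarm are assumptions and may differ from what template.yaml declares.

```python
def lambda_errors_alarm(function_name: str) -> dict:
    """PutMetricAlarm-style arguments for the Lambda Errors metric."""
    return {
        "AlarmName": f"{function_name}-errors",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 300,            # 5-minute window, per the PR notes
        "EvaluationPeriods": 1,
        "Threshold": 1,           # threshold 1, per the PR notes
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",  # assumption
        "TreatMissingData": "notBreaching",                     # assumption
    }


def sfn_failed_alarm(state_machine_arn: str) -> dict:
    """PutMetricAlarm-style arguments for Step Functions ExecutionsFailed."""
    return {
        "AlarmName": "render-executions-failed",  # hypothetical name
        "Namespace": "AWS/States",
        "MetricName": "ExecutionsFailed",
        "Dimensions": [{"Name": "StateMachineArn", "Value": state_machine_arn}],
        "Statistic": "Sum",
        "Period": 300,            # assumption: same window as the Lambda alarm
        "EvaluationPeriods": 1,
        "Threshold": 1,           # assumption
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",  # assumption
        "TreatMissingData": "notBreaching",                     # assumption
    }


if __name__ == "__main__":
    print(lambda_errors_alarm("render-function")["MetricName"])
```

The two alarms cover different failure modes: the Lambda metric surfaces individual chunk failures, while the Step Functions metric surfaces executions that failed after retries were exhausted.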

Carries over all review feedback addressed in #879 (Vai + Magi approved):

  • Dead HandlerZipUri parameter dropped, mislabeled HandlerZipKey output dropped.
  • CloudWatchLogsFullAccess narrowed to S3CrudPolicy.
  • Typed non-retryable error codes added to the Assemble state's retry policy.
  • State machine LoggingConfiguration wired.
  • MaxConcurrencyPath: $.Plan.ChunkCount so caller's config drives fan-out.
  • Lambda Tracing: Active so X-Ray spans don't terminate at the SF→Lambda boundary.
  • Cost-allocation Tags on bucket + function.
  • Defensive top-level TimeoutSeconds: 3600 on the state machine.
  • AssertChunkCount Choice gate catches Plan returning 0 chunks.
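To make the fan-out and timeout items above concrete, here is a minimal sketch of a Plan → Map → Assemble definition in Amazon States Language, built as a Python dict. It is not the template's actual definition: the function ARN is a placeholder, and the `Action` values, the `$.Plan.Chunks` items path, and the `$.ChunkS3Uris` result path are hypothetical names chosen for illustration. Only `MaxConcurrencyPath: $.Plan.ChunkCount` and `TimeoutSeconds: 3600` are taken from this PR.

```python
import json

# Placeholder ARN; the real template wires the function in through SAM.
RENDER_FUNCTION_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:RENDER_FUNCTION"


def task(function_arn: str, action: str, **extra) -> dict:
    """One Lambda Task state; `action` selects the handler role (hypothetical values)."""
    return {
        "Type": "Task",
        "Resource": function_arn,
        "Parameters": {"Action": action, "Input.$": "$"},
        **extra,
    }


def build_definition(function_arn: str) -> dict:
    return {
        "StartAt": "Plan",
        "TimeoutSeconds": 3600,  # defensive ceiling on the whole workflow
        "States": {
            "Plan": task(function_arn, "plan", ResultPath="$.Plan", Next="RenderChunks"),
            "RenderChunks": {
                "Type": "Map",
                "ItemsPath": "$.Plan.Chunks",               # hypothetical field
                "MaxConcurrencyPath": "$.Plan.ChunkCount",  # fan-out set by Plan output
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": "RenderChunk",
                    "States": {"RenderChunk": task(function_arn, "renderChunk", End=True)},
                },
                "ResultPath": "$.ChunkS3Uris",              # hypothetical field
                "Next": "Assemble",
            },
            "Assemble": task(function_arn, "assemble", End=True),
        },
    }


def dangling_targets(definition: dict) -> list:
    """Return top-level `Next` targets that do not name an existing state."""
    states = definition["States"]
    return [s["Next"] for s in states.values() if "Next" in s and s["Next"] not in states]


if __name__ == "__main__":
    definition = build_definition(RENDER_FUNCTION_ARN)
    assert dangling_targets(definition) == []
    print(json.dumps(definition, indent=2))
```

Reading the concurrency limit from the Plan output, rather than hardcoding it, is what lets the caller's chunking configuration decide how wide the Map state fans out.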

Testing

Replaces #879 (closed because GitHub auto-closes when base branch is deleted).

Phase 6.2 of the distributed rendering plan (DISTRIBUTED-RENDERING-PLAN.md
§15). Reference SAM template for deploying HyperFrames distributed
rendering on AWS — one Lambda function in three roles, choreographed by
a Step Functions standard workflow with a Map state for parallel chunk
rendering.

Resources created by the template:
  - Lambda function pointing at the Phase 6.1 ZIP
  - Step Functions state machine: Plan -> Map(N) RenderChunk -> Assemble
  - S3 bucket for plan tarballs, chunk outputs, final mp4
  - IAM role for the state machine
  - CloudWatch alarm guarding against runaway chunk invocations

Retry policy: 4 attempts, 2s initial, 2x backoff, max 60s, with the
typed non-retryable error codes from plan §9.3 explicitly opted out.
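The retry policy above can be sketched as follows. This assumes "4 attempts" maps to `MaxAttempts: 4` in Amazon States Language terms and that the opt-out is expressed as a leading retrier with `MaxAttempts: 0`; the error-code name is a hypothetical placeholder, since the actual codes live in plan §9.3 and are not listed here.

```python
# Hypothetical placeholder; the real typed codes are defined in plan §9.3.
NON_RETRYABLE = ["EXAMPLE_NON_RETRYABLE_CODE"]


def retry_delays(max_attempts=4, interval_seconds=2.0,
                 backoff_rate=2.0, max_delay_seconds=60.0):
    """Wait before each retry: interval * rate**n, capped at the max delay."""
    return [
        min(interval_seconds * backoff_rate ** n, max_delay_seconds)
        for n in range(max_attempts)
    ]


def build_retry(non_retryable):
    """Retry block: opt typed errors out first, then retry everything else."""
    return [
        # MaxAttempts 0 means these errors are never retried.
        {"ErrorEquals": list(non_retryable), "MaxAttempts": 0},
        # States.ALL must appear alone and in the last retrier.
        {
            "ErrorEquals": ["States.ALL"],
            "IntervalSeconds": 2,
            "MaxAttempts": 4,
            "BackoffRate": 2.0,
            "MaxDelaySeconds": 60,
        },
    ]


if __name__ == "__main__":
    print(retry_delays())  # [2.0, 4.0, 8.0, 16.0]
    print(build_retry(NON_RETRYABLE))
```

With these numbers the 60s cap never engages (the longest wait is 16s); it would only matter if the attempt count or backoff rate were raised.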

CodeUri points at packages/aws-lambda/dist/handler.zip; sam deploy
resolves the local path and uploads to a SAM-managed bucket on first
deploy.

Validated: sam validate --lint passes against the template.

This is PR 6.2 of the 8-PR Phase 6 stack.

- Add CloudWatch alarms for Lambda Errors metric (5min window, threshold 1)
  and Step Functions ExecutionsFailed metric. The existing runaway-
  invocations alarm catches too-many-calls but missed silent per-chunk
  failures and retry-exhaustion.
- Document VersioningConfiguration: Suspended tradeoff inline. Adopters
  treating the final mp4 as user-keepable should bump to Enabled.
- Cost-allocation Tags on RenderBucket + Lambda Globals.
- Lambda Tracing: Active so X-Ray spans don't terminate at the SF→Lambda
  boundary (the state machine already had tracing).
- State-machine top-level TimeoutSeconds: 3600 as defensive ceiling on
  the whole choreography — catches Plan-retry storms before they hit
  individual task budgets.
- AssertChunkCount Choice state: if Plan ever returns ChunkCount=0 the
  Map would silently iterate zero times and Assemble would receive an
  empty ChunkS3Uris[] producing an empty output. Fail-fast with typed
  PLAN_TOO_LARGE error instead.
- Architecture comment: explicit x86_64-only constraint from
  @sparticuz/chromium so adopters trying Graviton don't get bitten.
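The AssertChunkCount gate described above can be sketched as a Choice state plus a Fail state. The state names `FailEmptyPlan` and `RenderChunks`, the `NumericLessThanEquals` comparison, and the `Cause` text are assumptions for illustration; only the `$.Plan.ChunkCount` path and the `PLAN_TOO_LARGE` error code come from this PR. The small evaluator is a local simulation of this one rule shape, not a general Choice-state interpreter.

```python
ASSERT_CHUNK_COUNT = {
    "Type": "Choice",
    "Choices": [
        {
            "Variable": "$.Plan.ChunkCount",
            "NumericLessThanEquals": 0,   # assumed comparison
            "Next": "FailEmptyPlan",      # hypothetical state name
        }
    ],
    "Default": "RenderChunks",            # hypothetical state name
}

FAIL_EMPTY_PLAN = {
    "Type": "Fail",
    "Error": "PLAN_TOO_LARGE",            # typed error named in the PR
    "Cause": "Plan returned zero chunks", # hypothetical wording
}


def resolve(path: str, data: dict):
    """Resolve a simple dotted path such as '$.Plan.ChunkCount'."""
    value = data
    for key in path.split(".")[1:]:  # skip the leading '$'
        value = value[key]
    return value


def next_state(choice: dict, data: dict) -> str:
    """Simulate the gate: first matching rule wins, else the Default."""
    for rule in choice["Choices"]:
        if resolve(rule["Variable"], data) <= rule["NumericLessThanEquals"]:
            return rule["Next"]
    return choice["Default"]


if __name__ == "__main__":
    print(next_state(ASSERT_CHUNK_COUNT, {"Plan": {"ChunkCount": 0}}))
    print(next_state(ASSERT_CHUNK_COUNT, {"Plan": {"ChunkCount": 8}}))
```

Placing the gate between Plan and the Map state means an empty plan fails the execution immediately instead of reaching Assemble with an empty chunk list.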
Collaborator

@miguel-heygen miguel-heygen left a comment

Re-approved — same content as #879 which was already reviewed and approved. Rebased cleanly onto post-#878 main. CI green (CodeQL JS still running but non-blocking for SAM template changes).

Collaborator

@vanceingalls vanceingalls left a comment

Re-approve on the recreated #907 (formerly #879, closed when #878's base branch was deleted on merge).

Content matches the prior approved state — same 3 commits (9785cc7 SAM template, 71102ac review-feedback fix, ed6b50a docs scrub), rebased onto post-#878 main. All 7 prior #879 findings remain addressed in the same shape verified on c3d486c:

  • Blocker #1 (HandlerZipUri dead param) → resolved by deletion
  • Blocker #2 (HandlerZipKey mislabeled output) → resolved by deletion
  • Important #1 (CloudWatchLogsFullAccess overscope) → narrowed to S3CrudPolicy
  • Important #2 (Assemble typed non-retryables) → 3 codes added
  • Important #3 (log perms w/o config) → dedicated LogGroup + Level: ERROR wired
  • Important #4 (MaxConcurrency hardcoded) → now $.Plan.ChunkCount, fan-out driven by caller's maxParallelChunks
  • Important #5 (no error-rate alarm) → Lambda Errors + SFN ExecutionsFailed alarms

Two nits from prior round still open (state-machine missing Tags:, TimeoutSeconds: 3600 + RetentionInDays: 30 hardcoded) — non-blocking follow-ups.

All required CI green; WIP is the marketplace marker, not a required check.

Re-review by Vai

@jrusso1020 jrusso1020 merged commit 6664e54 into main May 16, 2026
33 checks passed
@jrusso1020 jrusso1020 deleted the 06-2-feat_lambda_add_sam_template branch May 16, 2026 22:17