feat(gcp-cloud-run): Google Cloud Run + Workflows distributed render adapter#1253
Conversation
…der adapter Adds @hyperframes/gcp-cloud-run, the GCP counterpart to @hyperframes/aws-lambda (issue #932). The OSS distributed primitives (plan, renderChunk x N, assemble) are unchanged; this package is the storage/compute/orchestration glue. Package: Cloud Run handler (one image, three actions), runs under bun; GCS transport; in-image chrome-headless-shell resolver; client SDK (renderToCloudRun, getRenderProgress, deploySite, computeRenderCost); Dockerfile; Cloud Workflows definition; Terraform module; CLI cloudrun deploy|sites|render|render-batch|progress|destroy with --output-resolution and --strict-variables; 62 unit tests + docs + live smoke script. Shared extraction (removes ~640 lines of adapter duplication): move the cloud-agnostic config validator + content-hash into producer/distributed; both adapters import them. Validated end-to-end on GCP at 37.4 dB PSNR vs baseline. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Preview deployment for your docs. Learn more about Mintlify Previews.
💡 Tip: Enable Workflows to automatically generate PRs for you. |
…build The CLI bundle (esbuild) couldn't resolve `@hyperframes/gcp-cloud-run/sdk`, failing Build/Typecheck/CLI-smoke (and the perf/windows/regression jobs that build first). Mirror the aws-lambda handling: mark the gcp adapter + its /sdk subpath external in tsup.config.ts with a source alias, and add gcp-cloud-run to the root `build` filter so its dist exists for publish + runtime. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…stall The regression test image runs `bun install --frozen-lockfile` after copying each workspace package.json individually. The CLI now depends on @hyperframes/gcp-cloud-run (workspace:*), so the frozen install fails to resolve it unless its manifest is present. Add the COPY line. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
James — honest assessment: this is substantially better than the typical "Claude went crazy" code I was bracing for. The Claude-generated-code failure modes you flagged (over-engineering, dead code, hallucinated APIs, mock-everything tests) are mostly absent. The shared extraction is real (not duplicated), the AWS-Lambda 174 tests still pass through it, the GCP API surface usage (ExecutionsClient.workflowPath + createExecution, Storage.bucket().file(), @google-cloud/storage upload semantics) all match the actual libraries, and the e2e smoke against real GCP (37.4 dB PSNR vs in-process baseline) is genuinely meaningful coverage. The test seams (HandlerDeps for the handler, ExecutionsClientLike for the SDK) inject real interfaces rather than mocking-everything.
Scope of review: I focused on server.ts (dispatch + bucket allowlist + plan-hash guard + retryable/non-retryable error map), gcsTransport.ts (storage primitives), the shared extraction (projectHash.ts + renderConfigValidation.ts) for AWS-Lambda parity, terraform/main.tf + workflow.yaml (IAM + retry topology), Dockerfile (image surface), sdk/renderToCloudRun.ts + sdk/validateConfig.ts + sdk/costAccounting.ts. I did not deeply read cli/cloudrun.ts (829 lines — CLI surface), getRenderProgress.ts (268 lines), terraform/variables.tf/outputs.tf, the 50% of tests I didn't open, the smoke script, or fakeGcs.ts. Given the 5644-line surface area, this review is "I focused on the security + correctness + cost-model surface; the CLI ergonomics and rendering-output quality need a cloud-rendering specialist's eyes for the parts I skipped."
Concerns worth fixing before merge
[concern] HYPERFRAMES_RENDER_BUCKET fail-open default in server.ts:478. The bucket-allowlist guard is your defense against request injection — if Cloud Workflows or a misconfigured client posts a URI pointing at a different bucket, the handler should reject. Currently:
const allowedBucket = process.env.HYPERFRAMES_RENDER_BUCKET?.trim();
if (!allowedBucket) return; // ← silently disables the guardThe Terraform module wires the env var at line 100-105, so the prod path is fine. But this fails open by default: a self-hosted deployment that forgets the env (or empty-strings it after trim()) loses the guard with no warning. Tighten options:
- Require explicit
"*"to opt out, anything-else-but-unset enforces. - Fail-fast at server startup: if
NODE_ENV === "production"AND the env is unset, log + exit instead of silently disabling. - At minimum log a
[handler] bucket allowlist guard disabledwarning on startup so it's visible in Cloud Logging.
[concern] Audio.aac is in BOTH the plan tarball AND uploaded separately. server.ts:249-255 does:
const audioPath = join(planDir, "audio.aac"); // path inside planDir
const hasAudio = existsSync(audioPath) && statSync(audioPath).size > 0;
// then tarDirectory(planDir, planTar) — includes audio.aac
// AND uploadFileToGcs(storage, audioPath, audioUri, ...) — separate objectThe comment claims "Audio is co-located alongside the plan so RenderChunk doesn't have to pull the whole plan tarball when audio isn't relevant to the chunk" — but audio.aac is still inside planDir, so RenderChunk's untarDirectory(planTar, planDir) writes it to disk anyway. The standalone copy is then re-downloaded by Assemble and overwrites the untar'd one. Wasted bytes on every Plan upload + every Assemble download. Either move audio.aac OUT of planDir before the tarball (e.g. write directly to work/audio.aac), or drop the standalone upload + path.
[concern] No idempotency token on createExecution. Cloud Workflows accepts an optional executionId parameter that dedupes retries — if the SDK caller's renderToCloudRun call gets a transient network error mid-request and retries, the current code creates two distinct executions. The renderId is already a UUID minted client-side; pass it as executionId so a network retry is idempotent. The diff is one line in executions.createExecution({ parent, executionId: renderId, execution: { argument: ... } }). (Confirm the Node client surface accepts it — the gRPC API supports it.)
[concern] renderChunks.concurrency_limit: ${chunkCount} can exceed Cloud Workflows' parallel-branch ceiling. Workflows standard caps parallel branches per execution (200 last I checked — verify against the Workflows quotas page). Your validator already caps MAX_PARALLEL_CHUNKS_CEILING = 256 (renderConfigValidation.ts:61), which is HIGHER than Workflows' branch limit, so a config with maxParallelChunks: 256 could pass validation and then explode the workflow execution. Either lower MAX_PARALLEL_CHUNKS_CEILING to ≤200 on the Cloud Run adapter side, OR cap the workflow's concurrency_limit to min(chunkCount, 200).
[concern] Dockerfile doesn't pin bun version. RUN curl -fsSL https://bun.sh/install | bash pulls whatever's current at image build time. The PR body specifically calls out "the producer bundle relies on a runtime require; Node ESM can't load it" — that's a brittle dependency on bun's ESM interop. A bun release that changes the interop would silently break the next image rebuild. Pin to a specific version: curl -fsSL https://bun.sh/install | bash -s "bun-v1.1.42" (or whatever you tested with).
[concern] Cloud Run image runs as root. No USER nonroot instruction. Cloud Run doesn't sandbox like Lambda does — the container has root inside. Not exploitable in isolation (the bucket-allowlist guard + Cloud Run network egress limits + IAM segregation contain the blast radius), but a defense-in-depth gap if any of those upstream guards regresses.
[concern] bucket_force_destroy = var.bucket_force_destroy with unknown default. Looking at the terraform/main.tf change but not opening variables.tf — if the default is true, terraform destroy wipes ALL stored renders (including the user's final outputs). Should default to false in any prod-shaped deployment; users opt INTO destruction.
Nits
[nit] NON_RETRYABLE_ERROR_NAMES is duplicated between server.ts:551-568 and terraform/workflow.yaml's retry predicate. The comment explicitly says "Keep this list in sync with the retry predicate in workflow.yaml" — a known-fragile contract. Could DRY by exporting the list from the producer's distributed module and (a) importing it in the handler, (b) generating the workflow YAML from a template that interpolates the list. Probably not worth doing today; flagging as a follow-up.
[nit] uploadChunkOutput uses result.outputPath.slice(result.outputPath.lastIndexOf(".")) (server.ts:343) for the file extension. If the output path has no dot (unlikely in practice), lastIndexOf(".") = -1, slice(-1) returns the last character — wrong extension. Use extname(result.outputPath) from node:path for a cleaner failure mode.
[nit] No min_instance_count in the Cloud Run service config. Default is 0 (scale-to-zero), so the first render after deploy or after idle period hits cold start (image pull + Chrome + bun + Node import graph). For a service with 1.5-2GB image size this could be 20-30s. Set min_instance_count = 1 in the Terraform if first-render latency matters to UX.
[nit] No alert on workflow execution FAILURES, only on request count. terraform/main.tf:141-165 alerts when request_count > threshold over 1h. A workflow that fails 100% of the time at low volume wouldn't trip it. Add an alert keyed off workflowexecutions.googleapis.com/execution/count filtered on state="FAILED".
[nit] runtimeCap validation accepts "cloud-run-job" and "k8s-job" (renderConfigValidation.ts:55) — neither has an implementation in this PR. Dead-code-shaped surface. Either implement them, drop them from the union, or document them as "planned" in the JSDoc so future-you doesn't grep for them and find nothing.
[nit] Dockerfile workspace-manifest list at lines 83-91 needs manual maintenance. Every new workspace package added to the monorepo needs another COPY packages/X/package.json line. COPY packages/*/package.json packages/ would be a one-shot if bun's frozen-lockfile install works against the resulting layout; worth checking.
[nit] Cost accounting is "best-effort" per the docstring, with gcsEstimate: "not-included". GCS storage + network egress for plan tarballs (which can be ~100MB) + chunk artifacts is a non-trivial component of total render cost. Worth at least documenting the estimation gap in the SDK's JSDoc so callers don't take displayCost as authoritative.
What I'd want a cloud-rendering specialist to look at
- The
getRenderProgresspolling flow (268 lines I didn't read deeply). Polling Cloud Workflows executions has cost + rate-limit edges. - The CLI command surface (
cli/cloudrun.ts, 829 lines). Claude-generated CLIs tend to have unused flags + verbose help; could be tightened. - Cloud Run vCPU/memory sizing defaults in
variables.tf. Render workload is heterogeneous — Plan is light, RenderChunk is heavy, Assemble is medium. One-size-fits-all sizing on a single Cloud Run service means RenderChunk's footprint sets the bill for Plan + Assemble too. Could split into multiple Cloud Run services, but the simplicity tradeoff is real. - Terraform module's
notification_channelsvariable — the runaway-request alert is wired to it but I didn't verify how it's set in prod. - Whether the workflow's
resultaccumulator (Plan+Chunks: []+Assemble) hits Workflows' state-size limits (1 MiB hard, 512 KiB recommended per execution) at chunkCount ~200.
What I verified that lined up cleanly
- Producer-side shared extraction (
hashProjectDir+validateDistributedRenderConfig) imports cleanly into both adapters; the AWS-LambdadeploySite.tsandvalidateConfig.tsshrunk from copies to re-exports, no behavioral drift. - The
@google-cloud/workflowsand@google-cloud/storageAPI usage in the SDK is real (not hallucinated). - Cost accounting rates ($0.000024/vCPU-sec, $0.0000025/GiB-sec, $0.0000004/request, $0.00001/Workflows-step) match GCP's us-central1 Tier 1 on-demand published rates.
- Terraform IAM: two least-privilege service accounts, render service authenticated-only (no
allUsersbinding), bucket-scopedobjectAdminfor run_sa,run.invokeronly for workflow_sa. renderIdregex validation inrenderToCloudRun.ts:119prevents path-injection into the GCS key prefix.validateWorkflowsInputSizemeasures UTF-8 bytes (not UTF-16 code units) — the right semantic for the wire boundary.MAX_ENVELOPE_DEPTH = 4inunwrapEventprevents infinite-loop on malformed input.verifyPlanHashat the handler boundary fails BEFORE paying Chrome-launch cost on a misrouted chunk.- Exhaustive
_exhaustive: neverswitch indispatchmakes a newCloudRunActionmember a compile error.
— Review by Rames D Jusso
miguel-heygen
left a comment
There was a problem hiding this comment.
Solid package — the architecture is clean, the handler/SDK/Terraform split mirrors the aws-lambda adapter faithfully, and the security scaffolding (OIDC auth, plan-hash guard, bucket allowlist, retry predicate) is thoughtfully designed. A few concrete issues below.
P1 — Bug / Security
validateEventGcsUris silently no-ops when HYPERFRAMES_RENDER_BUCKET is unset (src/server.ts L316-330)
function validateEventGcsUris(...): void {
const allowedBucket = process.env.HYPERFRAMES_RENDER_BUCKET?.trim();
if (!allowedBucket) return; // ← entire guard silently disabledThe Terraform module always injects this env var, so the Terraform path is fine. But the package README and hyperframes cloudrun deploy command both allow custom gcloud run deploy invocations. A user who deploys the image manually without setting this variable gets zero bucket validation — any GCS URI in a crafted request body would be accepted. At minimum this should emit a loud warning at server startup (logged as a structured HYPERFRAMES_RENDER_BUCKET_UNSET event so it shows up in Cloud Logging), or fail the startServer() call entirely. The current behaviour — silent pass-through — violates the principle of least-surprise for a guard described as "defense against request injection."
P2 — Correctness / Operator confusion
Wrong error code for zero-chunk plan (terraform/workflow.yaml L66-69)
- planProducedZeroChunks:
raise:
code: PLAN_TOO_LARGE
message: "Plan returned ChunkCount=0 — non-retryable producer-side invariant violation."PLAN_TOO_LARGE means the opposite of what happened (the plan produced nothing, not too much). The code field is what operators see in Cloud Logging and what you'd grep alerts for — PLAN_TOO_LARGE for a zero-chunk failure will create genuinely misleading runbooks. Should be PLAN_EMPTY or PLAN_ZERO_CHUNKS.
Cfr field not wired through the workflow (terraform/workflow.yaml assemble body)
The assemble step's body has no Cfr field, so event.Cfr === true in handleAssemble is always false — CFR encoding is silently disabled for all Cloud Run renders. The aws-lambda state machine passes this through. Either add Cfr: ${args.Cfr} to the workflow input + assemble body (and expose it in RenderToCloudRunOptions), or document explicitly that CFR is unsupported in this adapter.
Bun version unpinned in Dockerfile (Dockerfile L71)
RUN curl -fsSL https://bun.sh/install | bashThis always pulls the latest bun release. One bun regression breaks all Cloud Run image builds. Pin with BUN_VERSION=<tag> — the Dockerfile comment says bun is needed because the producer bundle uses require incompatible with Node ESM, so the version matters for correctness too, not just reproducibility.
P3 — Nits
Comment/code mismatch (src/server.ts L617)
// Boot when executed directly (the Dockerfile runs `node dist/server.js`),The Dockerfile CMD is ["bun", "dist/server.js"], not node. The comment matches the old intended entrypoint but not what's shipped. Small but will mislead someone debugging a startup failure.
concurrency_limit: ${chunkCount} allows unbounded Cloud Run fan-out (workflow.yaml L93)
The comment explains that plan() is the source of truth via maxParallelChunks, so this is by design. Worth a one-liner note in the Terraform variables that callers should set max_instances ≤ their expected maxParallelChunks budget to avoid surprise billing — the runaway-request alert is the only other backstop today.
Closes the parity gap with `lambda deploy` (which exposes --memory etc.). `cloudrun deploy` now threads --cpu, --memory, --max-instances, and --timeout into the Terraform apply; omitted flags keep the module defaults (4 vCPU / 16Gi / 100 instances / 3600s). For finer control, apply the module directly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- server.ts: bucket-allowlist guard no longer fails open silently. Unset env logs a one-time WARNING; "*" is an explicit opt-out; otherwise it enforces. - server.ts: stop double-shipping audio.aac. It already rides in the plan tarball every consumer downloads, so drop the redundant standalone upload (plan) + re-download/overwrite (assemble); assemble reads it from the untar, falling back to a supplied AudioGcsUri for compat. - server.ts: chunk extension via path.extname() instead of slice(lastIndexOf). - workflow.yaml: clamp parallel concurrency_limit to math.min(chunkCount, 20) — Cloud Workflows hard-caps concurrent iterations at 20. - Dockerfile: pin bun (bun-v1.3.9) so an interop change can't silently break the image rebuild. - terraform: add min_instances var (default 0); add a workflow-failure alert (finished_execution_count status=FAILED) alongside the request-count one. - costAccounting: document that displayCost excludes GCS storage/egress. Verified against the actual APIs: @google-cloud/workflows@4.4.0 ICreateExecutionRequest has no executionId (so the idempotency-token suggestion isn't available in this client); Workflows concurrency cap is 20; failure metric is workflows.googleapis.com/finished_execution_count (status label). 174 adapter tests pass, fallow/oxlint/oxfmt/terraform clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks for the detailed review — addressed in Concerns
Nits
All four "specialist eyes" areas (getRenderProgress polling, CLI surface, vCPU/mem sizing, multi-service split) are fair — happy to follow up on any. 174 adapter tests still pass; fallow/oxlint/oxfmt/terraform clean. |
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
Round-2 re-review. Walked the delta d1e8945b → 83292805 (9 files, +133 lines) and your point-by-point response. Clean close — every concern is either fixed cleanly or deferred with verifiable justification, and you correctly pushed back on one of my nits.
What was addressed
[concern — bucket allowlist fail-open] RESOLVED CLEAN. server.ts:474-499 now:
"*"is an explicit intentional opt-out (if (allowedBucket === "*") return;).- Unset / empty fires a one-time structured
bucket_allowlist_disabledWARNING vialogEventso the gap is visible in Cloud Logging — module-levelwarnedAllowlistDisabledflag throttles to once per instance lifecycle. - The Terraform-wired prod path enforces normally.
- New test at
server.test.ts:231-251exercises the"*"path with proper env-var save/restore.
[concern — audio.aac shipped twice] RESOLVED CLEAN. Plan no longer uploads the standalone audio object; audio.aac rides inside the plan tarball as before, the result returns AudioGcsUri: null (kept in the shape for wire compatibility), and assemble prefers the in-tarball copy with event.AudioGcsUri as a back-compat fallback for older plans. The bytes-shipped reduction is real on every render with audio.
[concern — concurrency_limit exceeds Workflows ceiling] RESOLVED, BETTER THAN MY SUGGESTION. Clamped at the workflow level to math.min(chunkCount, 20). You're right that the cap is 20 (not the 200 I guessed); I was off by a factor of 10. And clamping in the workflow rather than lowering MAX_PARALLEL_CHUNKS_CEILING is the right call — that constant is shared with the Lambda adapter where 200+ fan-out works fine. Comment correctly notes that above-20 iterations queue rather than fail, so a maxParallelChunks > 20 config still produces a correct render, just with less parallelism than requested.
[concern — Dockerfile bun not pinned] RESOLVED. curl -fsSL https://bun.sh/install | bash -s "bun-v1.3.9". Comment explicitly notes "the container runs bun dist/server.js and relies on bun's ESM require interop … a future bun release changing that behaviour shouldn't silently break the next image rebuild. Bump deliberately." ✓
[concern — bucket_force_destroy default] RESOLVED — by my error. Fair — I didn't open variables.tf. Already false.
[nit — extname() for chunk extension] RESOLVED. result.outputPath.slice(result.outputPath.lastIndexOf(".")) → extname(result.outputPath).
[nit — min_instance_count] RESOLVED. New var.min_instances with sensible default (0 = scale-to-zero, opt into 1 for warm); JSDoc explicitly calls out the ~20-30s cold start tradeoff.
[nit — no alert on workflow failures] RESOLVED. New google_monitoring_alert_policy.workflow_failures keyed off workflows.googleapis.com/finished_execution_count filtered to status="FAILED", threshold > 0 with 5-min alignment — fires on any failure even at low volume. Complements the request-count runaway alert.
[nit — Cost excludes GCS] RESOLVED. Documented in the RenderCost JSDoc per the comment thread.
Reasoned defers I accept
[concern — idempotency token on createExecution] DEFER ACCEPTED. You verified against @google-cloud/workflows@4.4.0 that ICreateExecutionRequest is {parent, execution} with no executionId field. I cross-checked the public proto: Cloud Workflows Executions v1 CreateExecutionRequest doesn't include executionId in the proto, so the client surface genuinely doesn't expose it (this differs from some other GCP APIs that take it as a query param). I was wrong to suggest the one-liner. Your mitigation — the SDK doesn't auto-retry createExecution, so a transient error surfaces to the caller rather than silently double-starting — is the right boundary.
[concern — Cloud Run runs as root] DEFER ACCEPTED. Cloud Run gen2 sandboxes the container via gVisor, so root-inside-container is contained — meaningfully different threat model from root-on-bare-metal. Refactoring needs chowning /opt/chrome + the writable tmp + re-validating the headless-shell launch, which is real work that risks the smoke-validated image. Reasonable to track as a follow-up.
[nit — NON_RETRYABLE list duplicated in YAML] DEFER ACCEPTED. Agreed.
[nit — Dockerfile manifest list / COPY --parents] DEFER ACCEPTED. BuildKit labs frontend with repo-wide blast radius — separate PR is the right shape.
My nits you correctly pushed back on
[nit — runtimeCap accepts cloud-run-job and k8s-job] WITHDRAWN. You're right, that union is the producer's existing DistributedRenderConfig.runtimeCap shared with examples/k8s-jobs (a real adopter). Not dead code. I should have grepped before calling it out.
Closing notes
The four "specialist-eyes" areas I flagged (getRenderProgress polling, CLI surface, vCPU/mem sizing, multi-service split) all still apply but are non-blocking — happy to dig into any if you want a follow-up review on a specific one. 174 adapter tests pass through the shared extraction with no regression, the live smoke is already in your back pocket, and CI on the new SHA is green. From where I sit this is ready when you are. Leaving as a comment.
— Review by Rames D Jusso
miguel-heygen
left a comment
There was a problem hiding this comment.
Good follow-up pass — three of the four findings are addressed. Two items remain.
✅ Fixed
- Bucket allowlist silent failure (P1):
validateEventGcsUrisnow logs a one-timeWARNINGwhenHYPERFRAMES_RENDER_BUCKETis unset and accepts"*"as an explicit opt-out. No longer silently fails open. Clean. - Bun unpinned (P3): Pinned to
bun-v1.3.9. Good. - Dockerfile comment (P3): Comment now correctly describes bun as the runtime.
Still open
P2 — PLAN_TOO_LARGE error code for zero chunks (workflow.yaml line 65)
- planProducedZeroChunks:
raise:
code: PLAN_TOO_LARGE
message: "Plan returned ChunkCount=0 — non-retryable producer-side invariant violation."PLAN_TOO_LARGE implies the plan exceeded a size ceiling — the opposite of what happened. A zero-chunk plan means the composition produced nothing. Runbook writers triaging a PLAN_TOO_LARGE alert will look for memory/size issues when the actual cause is an empty composition. Rename to PLAN_EMPTY or PLAN_PRODUCED_ZERO_CHUNKS.
P2 — CFR never forwarded through workflow.yaml
server.ts line 397 reads event.Cfr and passes it to primitive() for final encoding:
const result: AssembleResult = await primitive(planDir, chunkPaths, audioPath, finalOutput, {
cfr: event.Cfr === true,
});But the assemble body in workflow.yaml (lines 138–145) never includes Cfr:
body:
Action: assemble
PlanGcsUri: ${planResult.PlanGcsUri}
ChunkGcsUris: ${chunkUris}
AudioGcsUri: ${planResult.AudioGcsUri}
OutputGcsUri: ${outputGcsUri}
Format: ${planResult.Format}
# Cfr is missing — always false for Cloud Run rendersCFR is silently disabled for every Cloud Run render regardless of what the user requested. Fix: either include it in args alongside Config and thread it through, or have the plan step echo it back in planResult so the assemble step can forward it.
- workflow.yaml: rename the zero-chunk failure code PLAN_TOO_LARGE →
PLAN_PRODUCED_ZERO_CHUNKS. The old code implied a size-ceiling breach (the
opposite cause), misleading anyone triaging the alert.
- workflow.yaml: forward Config.cfr to the assemble step
(`Cfr: ${("cfr" in config) and config.cfr}`). It was read by the handler
but never sent, so exact-CFR was silently off for every Cloud Run render.
Uses the same `in`-operator guard already proven in the retryable predicate.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Both P2s addressed in P2 — misleading P2 — CFR never forwarded ✅ The assemble step now sends One caveat on verification: I couldn't run a |
|
Verification follow-up: re-ran the |
miguel-heygen
left a comment
There was a problem hiding this comment.
Both remaining issues addressed:
PLAN_TOO_LARGE→PLAN_PRODUCED_ZERO_CHUNKS✅Cfr: ${("cfr" in config) and config.cfr}forwarded in assemble ✅
LGTM.
set-version.ts (driven by release:prepare) bumps an explicit package list to the shared version on each release. gcp-cloud-run was wired into the build + publish.yml but missing here, so a release would leave it at a stale version and publish.yml would push the wrong version. Add it so the new package version-bumps + publishes in lockstep with the others. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
miguel-heygen
left a comment
There was a problem hiding this comment.
Latest commit () adds gcp-cloud-run to set-version.ts — correct fix, would have published a stale version without it. LGTM.
Description
Adds
@hyperframes/gcp-cloud-run, the Google Cloud counterpart to@hyperframes/aws-lambda, addressing #932. The OSS distributed render primitives (plan→renderChunk× N →assemble) are unchanged — this package is the storage / compute / orchestration adapter, just like the AWS one.Architecture: a single Cloud Run service (one image, three actions dispatched on the request body) fronts a Cloud Workflows definition that fans renders out across parallel chunk workers, with intermediate artifacts in GCS.
What's included
src/server.ts) — dispatches plan/renderChunk/assemble, bridging GCS around each primitive. Runs under bun (the producer bundle relies on a runtimerequire; Node ESM can't load it). Plan-hash + bucket-allowlist guards; non-retryable errors → HTTP 400, retryable → 5xx (the workflow retry predicate keys off that split).chrome-headless-shellresolver (no Lambda ZIP ceiling — Cloud Run is a container)../sdk) —renderToCloudRun,getRenderProgress,deploySite,computeRenderCost.hyperframes cloudrun deploy | sites | render | render-batch | progress | destroy, with--output-resolutionand--strict-variables(parity withhyperframes lambda render).docs/deploy/gcp-cloud-run.mdx+ cli.mdx), README, and a live smoke script.Shared extraction (also DRYs up aws-lambda): moved the cloud-agnostic config-shape validator and the content-addressing project hash into
@hyperframes/producer/distributed; both adapters now import them instead of carrying copies (−~640 lines of duplication). aws-lambda keeps only its Step Functions size cap.Known follow-ups (intentionally out of scope): mid-flight per-chunk progress (needs the Workflows step-entries API); Cloud Run Jobs / Firebase variants; the remaining structural handler-dispatch similarity between the two adapters is reported by fallow as a warning (not a gate) and would need a transport-parameterized handler to fully share — deferred to avoid rewriting the working aws-lambda handler.
Testing
tsc,oxlint,oxfmt --check,terraform validate/fmt, andfallowall clean.examples/gcp-cloud-run/scripts/smoke.sh): built + pushed the image,terraform apply, rendered themp4-h264-sdrfixture through the workflow (Plan → 4 parallel RenderChunks → Assemble), produced a correct 60-frame MP4 in GCS at 37.4 dB PSNR vs the in-process baseline, then tore the stack down. The smoke surfaced and fixed three real issues (bun runtime, a workflow for-loop variable collision, anddeletion_protectionon the Cloud Run service + Workflow).🤖 Generated with Claude Code