fix(cli): stop declarative sync hanging on edge-runtime container exit #5714
… output

`supabase db schema declarative sync` could hang indefinitely at 0% CPU after all migrations were applied to the shadow database (supabase/pg-toolbelt#312).

Root cause: the pg-delta Deno scripts run inside a one-shot Edge Runtime container and rely on the event loop draining for the worker to be destroyed and the container to exit. The catalog-export script opens a real connection pool (createManagedPool); when a keepalive handle lingers after close() resolves, the worker never exits, so the container never stops. The CLI streams that container's logs with Follow:true (DockerStreamLogs), so a worker that never exits blocks the parent `__catalog` subprocess, and the declarative-sync command that spawned it, forever. Only the error path force-closed the loop (`throw new Error("")`); the success path did not.

Fix: force-close the event loop on the success path of every pg-delta Edge Runtime script (diff, declarative-export, catalog-export, and declarative-apply), so the worker is torn down deterministically once output has been flushed. The output is written synchronously before the throw, and RunEdgeRuntimeScript already tolerates the resulting "main worker has been destroyed" exit.

Reproduced against supabase/edge-runtime:v1.74.2: a worker with a lingering handle keeps the container `running` and `docker logs -f` (the equivalent of DockerStreamLogs Follow:true) never returns; adding the force-close makes the container exit immediately with output intact.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XbxecW4DVmwgQB1YX321K3
Supabase CLI preview: `npx --yes https://pkg.pr.new/supabase/cli/supabase@adb0910c4ed3c8e65793ae2d86e6dab7a03e85e3`
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d4e2d749ed
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The migrations-catalog cache script (pgcache.TryCacheMigrationsCatalog, used by db start / db push with pg-delta caching) uses the same createManagedPool/extractCatalog/close() pattern as the other pg-delta Edge Runtime scripts and had the same missing success-path force-close, so it could hang the same way (supabase/pg-toolbelt#312).

Add the force-close and a guard test. Flagged by automated PR review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XbxecW4DVmwgQB1YX321K3
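A guard of the kind mentioned here could look roughly like the following Go sketch. It is not the test from this PR: the helper name `endsWithForceClose`, the sample script text, and the assumption that the force-close is the final statement of the script are all hypothetical. Only the `throw new Error("")` statement itself comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// forceClose is the statement the commit message describes as force-closing
// the event loop.
const forceClose = `throw new Error("")`

// endsWithForceClose is a hypothetical guard. It assumes the success path
// falls through to the end of the script, so the force-close must be the
// final statement. The real scripts may be structured differently.
func endsWithForceClose(script string) bool {
	trimmed := strings.TrimRight(strings.TrimSpace(script), ";")
	return strings.HasSuffix(trimmed, forceClose)
}

func main() {
	// Hypothetical script body: only the error path force-closes.
	hanging := `try {
  console.log(JSON.stringify(await run()));
} catch (e) {
  console.error(e);
  throw new Error("");
}`
	// Same body with the force-close appended for the success path.
	fixed := hanging + "\n" + forceClose + ";\n"

	fmt.Println("hanging:", endsWithForceClose(hanging))
	fmt.Println("fixed:", endsWithForceClose(fixed))
}
```

A plain text check like this is cheap and catches a script that loses its trailing force-close, but it cannot prove that the worker actually exits.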
@codex review
Codex Review: Didn't find any major issues. Swish!
…ts logs

The worker force-close made the edge-runtime container exit, but declarative sync still hung under podman. A user goroutine dump on the hung __catalog subprocess showed the block is in stdcopy.StdCopy reading the Follow:true Docker log stream (DockerStreamLogs): the stream never receives EOF after the container exits, so the read blocks forever at 0% CPU (supabase/pg-toolbelt#312). podman's /containers/<id>/logs?follow endpoint does not close when the container stops, unlike Docker.

Run the bounded edge-runtime scripts via DockerRunOnceWaitWithConfig, which detects completion by polling ContainerInspect (reliable on podman) and then reads the buffered logs without following. The shared DockerStreamLogs (used by functions serve for genuine live streaming) and DockerRunOnceWithConfig (used by db dump for large streaming output) are left unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XbxecW4DVmwgQB1YX321K3
Fixes `supabase db schema declarative sync --experimental` (and other pg-delta flows) hanging indefinitely at 0% CPU after the shadow-database work completes.

Closes supabase/pg-toolbelt#312
There are two independent causes, both needed to unblock the command:
1. The Edge Runtime worker never exits
The pg-delta scripts run in a one-shot Edge Runtime container and rely on the event loop draining for the worker to be destroyed and the container to stop. The catalog-export script opens a real connection pool (`createManagedPool`); when a keepalive handle stays registered after `close()` resolves, the worker never exits, so the container never stops. Only the error path force-closed the loop (`throw new Error("")`); the success path did not.

Fix: force-close the event loop on the success path of every pg-delta script once its output is flushed, so the worker is torn down deterministically:

- `templates/pgdelta_catalog_export.ts`: catalog snapshot (the path reported in #312)
- `templates/pgdelta.ts`: diff SOURCE→TARGET
- `templates/pgdelta_declarative_export.ts`: declarative file export
- `templates/pgdelta_declarative_apply.ts`: apply declarative schema
- `internal/db/pgcache/cache.go`: the `db start` / `db push` migrations-catalog cache path

The byte-for-byte embedded copies in `legacy-pgdelta.deno-templates.ts` are kept in sync.

2. Following the container log stream hangs under podman
With the worker now exiting, the container stops, but the parent still hung for podman users. A goroutine dump on the stuck `__catalog` subprocess showed the block in `stdcopy.StdCopy` reading the `Follow:true` Docker log stream (`DockerStreamLogs`): podman's `/containers/<id>/logs?follow` endpoint does not close when the container stops, so the read never gets EOF and blocks forever. Docker closes it, which is why this only reproduced on podman.

Fix: run the bounded edge-runtime scripts via a new `DockerRunOnceWaitWithConfig`, which detects completion by polling `ContainerInspect` (reliable on podman) and then reads the buffered logs without following. The shared `DockerStreamLogs` (live streaming for `functions serve`) and `DockerRunOnceWithConfig` (large streaming output for `db dump`) are intentionally left unchanged, so only the bounded pg-delta/edge-runtime path changes behaviour.

Regression coverage is added for both: guard tests asserting each script's success path force-closes, and a test pinning that the runner reads captured output and maps the inspected exit code without following the log stream.
https://claude.ai/code/session_01XbxecW4DVmwgQB1YX321K3