feat(deploy): persist DuckLake catalog on EFS instead of ephemeral EBS by smithclay · Pull Request #14 · smithclay/canardstack

smithclay · 2026-05-26T00:39:38Z

Problem

The catalog DuckDB file (canardstack.ducklake), the index over the immutable S3 Parquet data, lived on a service-managed EBS volume. Service-managed EBS forces deleteOnTermination=true (confirmed in AWS docs), so the volume is destroyed on every catalog task replacement (deploy, crash, scale). The Parquet files remain in S3, but without the catalog that indexes them they are orphaned, so any catalog restart effectively loses the dataset.

Change

Move the catalog volume to EFS (filesystem + access point + per-AZ mount targets + NFS security group), which persists across task replacement. The container mount point is unchanged (/var/lib/canardstack), so this is infra-only — no binary/image change. Drop the now-unused EbsSizeGiB parameter (the app raw-spool volume stays on managed EBS via AppEbsSizeGiB).
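As a rough illustration of the volume swap, the sketch below builds the two task-definition fragments involved, written as Python dicts that mirror the CloudFormation shape of AWS::ECS::TaskDefinition. The volume name "catalog", the helper names, and the example IDs are assumptions for illustration; the actual template in this PR may be structured differently. Only the mount path /var/lib/canardstack comes from the description above.

```python
# Mount path stated in the PR description; unchanged by the EBS -> EFS move.
CATALOG_MOUNT_PATH = "/var/lib/canardstack"


def catalog_volume(file_system_id: str, access_point_id: str, name: str = "catalog") -> dict:
    """Build an EFS-backed volume entry for a task definition.

    The volume name and both IDs are hypothetical placeholders.
    """
    return {
        "Name": name,
        "EFSVolumeConfiguration": {
            "FilesystemId": file_system_id,
            # Transit encryption is required when an access point is used.
            "TransitEncryption": "ENABLED",
            "AuthorizationConfig": {"AccessPointId": access_point_id},
        },
    }


def catalog_mount_point(name: str = "catalog") -> dict:
    """Build the container mount point; the path is the same as before."""
    return {
        "SourceVolume": name,
        "ContainerPath": CATALOG_MOUNT_PATH,
        "ReadOnly": False,
    }


if __name__ == "__main__":
    import json

    # Example IDs are made up.
    print(json.dumps(catalog_volume("fs-0123456789abcdef0", "fsap-0123456789abcdef0"), indent=2))
    print(json.dumps(catalog_mount_point(), indent=2))
```

Because only the volume definition changes and the container path stays the same, the container sees the same directory either way, which is what makes the change infra-only.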

Single-writer safety on a shared filesystem

EBS physically enforced single-writer access (single attach). EFS is shared, so single-writer is now enforced by deployment configuration instead: DesiredCount: 1 + MaximumPercent: 100 (stop the old task before starting the new one, never two tasks at once), with DuckDB's file lock as a backstop. The catalog must never scale past 1. DuckDB advises against read-write DB files on NFS; this is acceptable here only because serve-catalog is the single writer. Postgres remains the stronger durable-catalog option.
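Since the single-writer guarantee now rests on two configuration values rather than on hardware, one option is a small pre-deploy check. The sketch below is an assumption about how that could look, not something this PR ships: it takes a service definition as a plain dict in the CloudFormation AWS::ECS::Service shape and reports any setting that would allow two catalog tasks to run at once. The function name is hypothetical.

```python
def single_writer_violations(service: dict) -> list[str]:
    """Return a list of reasons the service could run more than one task.

    An empty list means the DesiredCount / MaximumPercent invariant holds.
    """
    problems = []

    desired = service.get("DesiredCount")
    if desired != 1:
        problems.append(f"DesiredCount must be 1, got {desired!r}")

    deploy = service.get("DeploymentConfiguration", {})
    # When unset, ECS defaults MaximumPercent to 200 for replica services,
    # which would let a new task start before the old one stops.
    max_percent = deploy.get("MaximumPercent", 200)
    if max_percent > 100:
        problems.append(f"MaximumPercent must be <= 100, got {max_percent!r}")

    return problems


if __name__ == "__main__":
    catalog_service = {
        "DesiredCount": 1,
        "DeploymentConfiguration": {"MaximumPercent": 100},
    }
    print(single_writer_violations(catalog_service))
```

A check like this catches configuration drift only; it does not replace DuckDB's file lock, which remains the runtime backstop.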

Verified live

Deployed on top of the v0.0.6 image (which carries the catalog S3-creds compaction fix from #13):

Catalog rolled EBS→EFS and came up healthy (clean lock handoff).
Forced CHECKPOINT → ran:true, status:ok (the operation that previously 503'd), ducklake_checkpoint_runs_total{status="ok"} 1.
Fresh ingest + seals to S3 working.
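The checkpoint verification above could be repeated after future deploys by reading the counter from the service's Prometheus metrics output. The following is a minimal sketch that parses exposition-format text already fetched from the service; how the text is fetched (endpoint, port, auth) is not stated in this PR and is left out. Only the metric name and the status label come from the description; the function name is hypothetical.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def checkpoint_ok_count(metrics_text: str) -> float:
    """Sum ducklake_checkpoint_runs_total samples whose status label is "ok"."""
    total = 0.0
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None or match.group("name") != "ducklake_checkpoint_runs_total":
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        if labels.get("status") == "ok":
            total += float(match.group("value"))
    return total


if __name__ == "__main__":
    sample = 'ducklake_checkpoint_runs_total{status="ok"} 1\n'
    print(checkpoint_ok_count(sample))
```

A nonzero result after a forced CHECKPOINT matches the healthy state reported above.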

Service-managed EBS volumes force deleteOnTermination=true, so the catalog's managed EBS volume (holding canardstack.ducklake, the index over the immutable S3 Parquet data) is destroyed on every catalog task replacement -- deploy, crash, scale -- which orphans the S3 data and effectively loses the dataset. Move the catalog volume to EFS (filesystem + access point + per-AZ mount targets + NFS security group), which persists across task replacement. The catalog container mount point is unchanged (/var/lib/canardstack), so this is infra-only -- no binary/image change. Drop the now-unused EbsSizeGiB parameter; the app raw-spool volume stays on managed EBS (AppEbsSizeGiB). NOTE: DuckDB advises against read-write database files on NFS; acceptable here only because serve-catalog is the single writer. Postgres remains the stronger durable-catalog option.

smithclay force-pushed the codex/deploy-efs-catalog branch from 1432c79 to c94e763 Compare May 26, 2026 00:40

smithclay mentioned this pull request May 26, 2026

feat(deploy): optional ALB (direct in-binary TLS default) + right-size defaults #15

Merged

smithclay merged commit 77106ed into main May 26, 2026
5 checks passed

smithclay deleted the codex/deploy-efs-catalog branch May 26, 2026 04:59

smithclay merged 1 commit into main from codex/deploy-efs-catalog
