feat(deploy): persist DuckLake catalog on EFS instead of ephemeral EBS #14
Merged
Service-managed EBS volumes force deleteOnTermination=true, so the catalog's managed EBS volume (holding canardstack.ducklake, the index over the immutable S3 Parquet data) is destroyed on every catalog task replacement -- deploy, crash, scale -- which orphans the S3 data and effectively loses the dataset. Move the catalog volume to EFS (filesystem + access point + per-AZ mount targets + NFS security group), which persists across task replacement. The catalog container mount point is unchanged (/var/lib/canardstack), so this is infra-only -- no binary/image change. Drop the now-unused EbsSizeGiB parameter; the app raw-spool volume stays on managed EBS (AppEbsSizeGiB). NOTE: DuckDB advises against read-write database files on NFS; acceptable here only because serve-catalog is the single writer. Postgres remains the stronger durable-catalog option.
Problem
The catalog DuckDB file (canardstack.ducklake), the index over the immutable S3 Parquet data, lived on a service-managed EBS volume. Service-managed EBS forces deleteOnTermination=true (confirmed in AWS docs), so the volume is destroyed on every catalog task replacement (deploy, crash, scale), orphaning the S3 data and effectively losing the dataset. Any catalog restart = data loss.
Change
Move the catalog volume to EFS (filesystem + access point + per-AZ mount targets + NFS security group), which persists across task replacement. The container mount point is unchanged (/var/lib/canardstack), so this is infra-only: no binary/image change. Drop the now-unused EbsSizeGiB parameter (the app raw-spool volume stays on managed EBS via AppEbsSizeGiB).
Single-writer safety on a shared filesystem
EBS physically enforced single-writer (single attach); EFS is shared, so single-writer is now enforced by DesiredCount: 1 + MaximumPercent: 100 (stop-old-before-start-new, never two tasks) plus DuckDB's file lock as a backstop. The catalog must never scale past 1. DuckDB advises against read-write DB files on NFS; acceptable here only because serve-catalog is the single writer. Postgres remains the stronger durable-catalog option.
Verified live
Deployed on top of the v0.0.6 image (which carries the catalog S3-creds compaction fix from #13):
CHECKPOINT → ran:true, status:ok (the operation that previously 503'd), ducklake_checkpoint_runs_total{status="ok"} 1.
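Returning to the single-writer constraint: it rests on how ECS bounds the number of running tasks during a rolling deployment. The sketch below is not code from this PR; it only illustrates that arithmetic. The function names are hypothetical, and MinimumHealthyPercent is an assumption: the PR states only DesiredCount: 1 and MaximumPercent: 100, and a value of 0 is assumed here because stop-old-before-start-new requires the scheduler to be allowed to drop to zero running tasks.

```python
from typing import List, Tuple


def rolling_update_bounds(
    desired_count: int, minimum_healthy_percent: int, maximum_percent: int
) -> Tuple[int, int]:
    """Return (min_running, max_running) tasks allowed during a rolling deployment.

    ECS rounds the lower bound up and the upper bound down.
    Integer arithmetic is used to avoid floating-point rounding surprises.
    """
    min_running = -(-desired_count * minimum_healthy_percent // 100)  # ceiling
    max_running = desired_count * maximum_percent // 100  # floor
    return min_running, max_running


def single_writer_problems(
    desired_count: int, minimum_healthy_percent: int, maximum_percent: int
) -> List[str]:
    """List reasons a service config could break the single-writer assumption."""
    problems = []
    min_running, max_running = rolling_update_bounds(
        desired_count, minimum_healthy_percent, maximum_percent
    )
    if desired_count != 1:
        problems.append("DesiredCount must be exactly 1")
    if max_running > 1:
        problems.append("deployment may run %d tasks at once" % max_running)
    if min_running >= max_running:
        # No room to stop the old task before starting the new one.
        problems.append("deployment cannot make progress")
    return problems


if __name__ == "__main__":
    # Values from the PR, plus the assumed MinimumHealthyPercent of 0.
    print(single_writer_problems(1, 0, 100))
    # A config that allows overlap: two catalog tasks could share the EFS file.
    print(single_writer_problems(1, 100, 200))
```

Under these assumptions the PR's settings produce an empty list, while a MaximumPercent of 200 would let an old and a new catalog task overlap on the shared filesystem, leaving only DuckDB's file lock as protection. A check like this could run as a pre-deploy guard so the catalog is never accidentally scaled past 1.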