feat(deploy): optional ALB (direct in-binary TLS default) + right-size defaults #15

Merged
smithclay merged 2 commits into main from codex/deploy-optional-alb
May 26, 2026

@smithclay smithclay commented May 26, 2026

Rebased on origin/main (after EFS #14 merged).

Makes the ALB optional via UseLoadBalancer (default false = direct in-binary TLS on port 4318, ephemeral self-signed cert, clients use insecure/skip-verify), dropping the ~$18/mo ALB. UseLoadBalancer=true restores the ALB (HTTP) path. Also bakes the right-sized Fargate defaults (0.5 vCPU / 1 GiB app, 0.25 vCPU / 1 GiB catalog, 512 MiB process-memory limit) into the template.

Design

  • Container port stays 4318 in both modes, so the ECS service can switch task definitions without the LoadBalancers/port mismatch that blocks an atomic update.
  • ALB mode: plaintext HTTP on 4318 behind the ALB; health check http://127.0.0.1:4318/healthz.
  • Direct mode: in-binary TLS terminator on 0.0.0.0:4318 forwarding to a plaintext backend on 127.0.0.1:4319; health check http://127.0.0.1:4319/healthz.
  • ALB resources (LB, target group, listener, LB SG) are conditional on WithAlb. App SG ingress opens 4318 to AllowedIngressCidr in direct mode, or only the ALB SG in ALB mode.
  • Listener ordering for ALB mode uses a conditional Metadata Ref (CloudFormation DependsOn can't be conditional).
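The port layout in the list above can be summarized as a small lookup. This is only a sketch: the function names and the `alb`/`direct` mode strings are illustrative assumptions, since the template itself only exposes the UseLoadBalancer parameter.

```shell
#!/usr/bin/env bash
# Sketch of the per-mode port layout from the Design list.
# Function names and mode strings are hypothetical, not part of the PR.
set -euo pipefail

# Print the URL the container health check probes in the given mode.
health_check_url() {
  case "$1" in
    alb)    echo "http://127.0.0.1:4318/healthz" ;;  # app serves plaintext HTTP on 4318
    direct) echo "http://127.0.0.1:4319/healthz" ;;  # plaintext backend behind the TLS terminator
    *)      echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Print the scheme spoken on container port 4318 in the given mode.
port_4318_scheme() {
  case "$1" in
    alb)    echo "http"  ;;  # the ALB forwards plaintext HTTP to the task
    direct) echo "https" ;;  # in-binary TLS terminator listens on 0.0.0.0:4318
    *)      echo "unknown mode: $1" >&2; return 1 ;;
  esac
}
```

The container port is 4318 in both branches; only the protocol on that port and the health-check target change between modes.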

⚠️ Migration caveat (real, hit while testing live)

This flag works cleanly for new stacks deployed with UseLoadBalancer=false from creation. Migrating an existing ALB-fronted stack to direct mode in a single CFN update fails: while CFN is applying the change, ECS auto-registers the new task to the still-attached target group, which HTTP-health-checks port 4318 — but that port is now TLS — so the task is marked unhealthy and the deployment circuit breaker rolls back.

To migrate an existing stack, do it in two steps:

  1. Detach the ALB out-of-band first: aws ecs update-service --cluster <c> --service <s> --load-balancers '[]' --force-new-deployment, wait for services-stable.
  2. Then aws cloudformation update-stack with this template (UseLoadBalancer=false); CFN sees the service already has no LB, applies the new task def, and deletes the now-unreferenced ALB resources.
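The two steps above could be scripted roughly as follows. This is a sketch under stated assumptions: the cluster, service, stack, and template names are placeholders, `--capabilities CAPABILITY_IAM` is a guess at what the template needs, and the `run` helper with its dry-run default is an illustrative safety wrapper rather than anything from this PR.

```shell
#!/usr/bin/env bash
# Two-step ALB -> direct-mode migration sketch.
# All names below are placeholders (assumptions), not values from the PR.
set -euo pipefail

CLUSTER="${CLUSTER:-my-cluster}"
SERVICE="${SERVICE:-my-service}"
STACK="${STACK:-my-stack}"
TEMPLATE="${TEMPLATE:-template.yaml}"
DRY_RUN="${DRY_RUN:-1}"  # 1 = only print the commands; 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

migrate_to_direct_mode() {
  # Step 1: detach the ALB from the service out-of-band and wait until stable.
  run aws ecs update-service --cluster "$CLUSTER" --service "$SERVICE" \
    --load-balancers '[]' --force-new-deployment
  run aws ecs wait services-stable --cluster "$CLUSTER" --services "$SERVICE"

  # Step 2: apply the template with UseLoadBalancer=false.
  # Any other parameters your stack overrides must be passed here as well
  # (for example with UsePreviousValue=true); they are omitted in this sketch.
  run aws cloudformation update-stack --stack-name "$STACK" \
    --template-body "file://$TEMPLATE" \
    --parameters ParameterKey=UseLoadBalancer,ParameterValue=false \
    --capabilities CAPABILITY_IAM
  run aws cloudformation wait stack-update-complete --stack-name "$STACK"
}

# Run only when invoked as: ./migrate.sh migrate
if [ "${1:-}" = "migrate" ]; then
  migrate_to_direct_mode
fi
```

Running it with the default `DRY_RUN=1` prints the four commands for review; setting `DRY_RUN=0` executes them in order, and `set -e` stops the script if the service never stabilizes, so the stack update is not attempted against a still-attached target group.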

Or simpler: deploy a fresh stack with UseLoadBalancer=false from the start.

Direct-mode endpoint

The public IP is ephemeral in direct mode (no stable DNS without a domain). The DirectTlsEndpoint stack output prints the AWS CLI command to find the current task's public IP; clients then hit https://<ip>:4318 with insecure/skip-verify (the API key still authenticates).
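One possible shape for that lookup is sketched below. It is not the exact command the DirectTlsEndpoint output prints; it assumes a single running task with one network interface, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: resolve the current task's public IP and build the direct-mode URL.
# Assumes one running awsvpc task per service; function names are illustrative.
set -euo pipefail

# Build the HTTPS endpoint for a given public IP (direct mode listens on 4318).
direct_endpoint_url() {
  printf 'https://%s:4318\n' "$1"
}

# Look up the public IP of the first task in the given cluster/service.
task_public_ip() {
  local cluster="$1" service="$2" task_arn eni_id
  task_arn="$(aws ecs list-tasks --cluster "$cluster" --service-name "$service" \
    --query 'taskArns[0]' --output text)"
  eni_id="$(aws ecs describe-tasks --cluster "$cluster" --tasks "$task_arn" \
    --query "tasks[0].attachments[0].details[?name=='networkInterfaceId'].value | [0]" \
    --output text)"
  aws ec2 describe-network-interfaces --network-interface-ids "$eni_id" \
    --query 'NetworkInterfaces[0].Association.PublicIp' --output text
}
```

A client could then call something like `curl --insecure "$(direct_endpoint_url "$(task_public_ip my-cluster my-service)")"`, adding whatever request path and API-key header the service expects. Because the IP changes whenever the task is replaced, the lookup has to be repeated after each deployment.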

@smithclay smithclay force-pushed the codex/deploy-optional-alb branch 2 times, most recently from 6f5adcd to ec0fa67 Compare May 26, 2026 02:04
@smithclay smithclay changed the base branch from codex/deploy-efs-catalog to main May 26, 2026 02:04
smithclay added 2 commits May 25, 2026 21:22
…size defaults

Add UseLoadBalancer (default false): the app task is exposed directly to the
internet via the in-binary TLS terminator (ephemeral self-signed cert) on
AppTlsPort (8443, non-privileged so the non-root container can bind it),
dropping the ~$18/mo ALB. Clients use HTTPS with insecure/skip-verify; the
plaintext backend stays on 127.0.0.1:4318 so the container health check is
unchanged. UseLoadBalancer=true restores the ALB (HTTP) path.

The ALB resources (LB, target group, listener, LB SG) are now conditional; the
app SG opens AppTlsPort to AllowedIngressCidr in direct mode, or only the ALB SG
in ALB mode. Listener ordering for ALB mode uses a conditional Metadata Ref
(DependsOn can't be conditional). The public IP is ephemeral in direct mode (no
stable DNS without a domain); the DirectTlsEndpoint output prints the lookup.

Also bake right-sized defaults into the template (soak/bench showed ~6% CPU):
app 0.5 vCPU/1 GiB, catalog 0.25 vCPU/1 GiB, process mem limit 512 MiB
(~$46/mo vs ~$111/mo).

Versioning was enabled but provided no real data protection here -- DuckLake's
own catalog tracks file references and the `delete_older_than` retention grace
(24h default) covers orphan recovery. The only practical effect was operational
drag on teardown: a versioned bucket's `rm --recursive` writes delete markers
rather than removing object versions, leaving the bucket non-empty and blocking
CloudFormation's bucket delete during `delete-stack`. New deploys get a
non-versioned bucket and tear down cleanly.
@smithclay smithclay force-pushed the codex/deploy-optional-alb branch from bee02ee to 1fbab18 Compare May 26, 2026 04:22
@smithclay smithclay merged commit 39b3f80 into main May 26, 2026
5 checks passed
@smithclay smithclay deleted the codex/deploy-optional-alb branch May 26, 2026 04:59