Skip to content

Fly.io deploy manifest + CI auto-deploy (#38)#76

Merged
ilmoniemi merged 4 commits into
mainfrom
feature/38
May 12, 2026
Merged

Fly.io deploy manifest + CI auto-deploy (#38)#76
ilmoniemi merged 4 commits into
mainfrom
feature/38

Conversation

@ilmoniemi
Copy link
Copy Markdown
Contributor

Summary

  • fly.toml at repo root: TCP passthrough on :80/:443 (no Fly HTTP/TLS handler — autocert stays the cert holder), volume mount at /var/lib/relay/autocert, --domain / --cert-cache as literal argv (distroless has no shell), single-machine cap via min_machines_running=1 + auto_start_machines=false + auto_stop_machines="off" + immediate deploy strategy.
  • .github/workflows/ci.yml: new deploy job, runs only on push to main, needs: [test, security, image-scan], permissions: contents: read, superfly/flyctl-actions/setup-flyctl SHA-pinned to 1.6 with a # Tracks: comment matching the Trivy / govulncheck convention.
  • docs/deploy.md: bootstrap (flyctl apps create, flyctl ips allocate-v4, flyctl volumes create, DNS, FLY_API_TOKEN secret), steady-state flow, rollback by image digest (preferred) and by release number.
  • docs/architecture.md § Hosting: decision record for Fly.io + relay-terminated TLS + single-machine-via-fly.toml, per AC relay: WS upgrade for /v1/client — accept phone connection, look up server-id, register #5 (architecture.md is the permitted equivalent of PROJECT-MEMORY, which this role cannot edit).

Issue

Closes #38. Spec: docs/specs/architecture/38-fly-deploy-manifest.md.

Notes for review

  • <REGION> and <DOMAIN> ship as loud placeholders (__REGION__, __DOMAIN__). The spec offered either real values or recognisable placeholders; without operator-supplied real values I picked the loud-placeholder path — first CI deploy fails with a clear "fill in fly.toml" signal rather than a silently-misconfigured production relay. docs/deploy.md step 1 calls this out.
  • flyctl validate fly.toml was not run — flyctl is not installed in this environment. The first CI deploy on main is the smoke test; spec § Testing strategy lists this as expected. TOML parses cleanly via tomllib.
  • max_machines is not a key in current Fly Apps v2; the single-machine cap is enforced via the four declarative knobs above, the in-binary PYRYCODE_RELAY_SINGLE_INSTANCE self-check (relay: startup self-check refuses to run as multi-instance deploy #65), and operator discipline at flyctl scale count. Documented in fly.toml's header comment.
  • Out-of-scope SHOULD-FIX from the spec's security review (update docs/threat-model.md § Deploy security to reflect Fly-account hardening alongside VPS) is not included — flagged as a follow-up, not blocking.

Testing

  • go vet ./... clean.
  • go test -race ./... passes (no Go code changed).
  • go build ./cmd/pyrycode-relay builds.
  • fly.toml TOML-parses; CI workflow YAML-parses.

Architecture compliance

Matches the spec's Design § for all four files: fly.toml shape (TCP passthrough, no Fly handlers, mount path, argv form, single-machine knobs, immediate strategy), CI deploy job (branch gate, needs: chain, permissions: contents: read, SHA-pinned action with # Tracks: comment), docs/deploy.md three-section structure, and docs/architecture.md § Hosting placement between § Single-instance constraint (v1) and § Threat model.

🤖 Generated with Claude Code

ilmoniemi and others added 3 commits May 12, 2026 21:19
Append a `deploy` job to `.github/workflows/ci.yml` and land `fly.toml`
at repo root. The job runs only on push to main, depends on test +
security + image-scan, holds `contents: read`, and invokes
`flyctl deploy --remote-only` with `FLY_API_TOKEN` from repo secrets.

`fly.toml` declares TCP-passthrough on :80 and :443 (no Fly HTTP
handlers — autocert stays the cert holder), mounts the autocert volume
at /var/lib/relay/autocert, passes --domain/--cert-cache as literal
argv (distroless has no shell), and pins the single-machine invariant
via min_machines_running=1 + auto_start_machines=false +
auto_stop_machines=off + immediate deploy strategy. `<REGION>` and
`<DOMAIN>` ship as loud placeholders that the operator fills at
bootstrap; the spec's first-deploy-fails-loud failure mode is the
desired behaviour over a silently-misconfigured production relay.

`setup-flyctl` is SHA-pinned (1.6) with a `# Tracks:` comment, matching
the Trivy (#68) and govulncheck (#41) pin convention.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`docs/deploy.md`: bootstrap (apps create / IPv4 / volume / DNS /
FLY_API_TOKEN), steady-state (PR → merge → CI deploys), and rollback
(by image digest preferred, by release number as fallback).

`docs/architecture.md` § Hosting: records the Fly.io + relay-terminated
TLS + single-machine-via-fly.toml decision so the next agent reading
the repo cold sees the call without scrolling #38. AC #5 permits
architecture.md as the equivalent of PROJECT-MEMORY, which the
architect role is not allowed to edit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ilmoniemi
Copy link
Copy Markdown
Contributor Author

Code Review: #38

Decision: PASS

Findings

  • [NIT] fly.toml:44-48, fly.toml:57-61[[services.ports]] blocks rely on Fly's default-empty handlers rather than an explicit handlers = []. The spec recommended the explicit empty list as more reviewable against a future Fly default change. The inline comments capture the intent, so this is defensible — but a one-token edit (handlers = []) would make the load-bearing "no Fly HTTP/TLS handler" invariant grep-able in the file itself, not just in the comment.
  • [NIT] fly.toml — neither [[services]] nor [[mounts]] declares processes = ["app"]. With a single process group this defaults correctly, but an explicit binding would make the wiring legible to anyone reading the manifest without having to know Fly's defaulting rules. Pure ergonomics; not load-bearing.

Security review (security-sensitive label)

The architect's spec (docs/specs/architecture/38-fly-deploy-manifest.md) contains a ## Security review section with verdict PASS — prerequisite satisfied.

Re-walking the diff with security goggles:

  • FLY_API_TOKEN exposure surface. Three structural defences hold up: branch gate (if: github.event_name == 'push' && github.ref == 'refs/heads/main') keeps fork PRs out structurally; needs: [test, security, image-scan] chains existing gates; permissions: contents: read locks the job to the workflow baseline. Token flows only via env:, never with:, never echoed. GitHub Actions auto-masks registered secrets. No findings.
  • Action pin. superfly/flyctl-actions/setup-flyctl@ed8efb33836e8b2096c7fd3ba1c8afe303ebbff1 verified to match refs/tags/1.6 upstream. # Tracks: comment present, matching the Trivy / govulncheck convention. No findings.
  • Subprocess. flyctl deploy --remote-only is a fixed command with no user-controlled interpolation. No sh -c. No findings.
  • TLS / network. TCP passthrough preserved on :80 and :443; autocert stays the cert holder; rate-limiter (relay: per-IP rate limit on /v1/server + /v1/client WS upgrades #34) continues to see the real socket peer IP. No findings.
  • File ops / crypto. Manifest declares, does not write. No crypto introduced. No findings.

The spec's [Threat model alignment] SHOULD FIX (update docs/threat-model.md § Deploy security to reflect "operator-owned Fly account") is flagged as out-of-scope for this ticket and deferred — that matches the spec's own framing, not blocking.

Summary

Spec match is exact for all four files: fly.toml shape (TCP passthrough, no Fly handlers, mount path, distroless-aware argv form, four declarative single-machine knobs, immediate strategy), CI deploy job (branch gate + needs-chain + contents: read + SHA-pinned action with # Tracks: comment), docs/deploy.md three-section structure, and docs/architecture.md § Hosting placement.

The placeholder choice (__REGION__ / __DOMAIN__) is deliberate per the PR description: first CI deploy fails loud rather than silently shipping a misconfigured production relay. docs/deploy.md step 1 calls this out clearly enough that the operator-bootstrap path is obvious. Acceptable.

Comments throughout the manifest explain the why — the "no Fly handler" invariant, the distroless-argv constraint, the single-machine-not-enforced-by-platform reality, the max_machines-isn't-a-key reality — at the right density for a thin wrapper whose every line carries a load-bearing platform interaction. Good craftsmanship.

Per-ticket file `codebase/38.md` captures implementation summary,
patterns established (action-pin convention extended to setup-flyctl;
TCP-passthrough → real peer IP for #34's rate limiter; loud placeholders
over silently-misconfigured real values), and lessons learned (Fly Apps
v2 has no `max_machines` key — verify platform docs at impl time;
distroless = no shell = no argv env expansion).

New feature doc `features/fly-deploy.md` covers the manifest shape (TCP
passthrough vs. Fly HTTP proxy, single-machine cap via four declarative
knobs, dedicated-IPv4 requirement, loud placeholders), the CI deploy
job's three-layer privilege model, and the rollback paths.

Cross-link refresh in `features/docker-image.md`: #38's host-wiring
deferrals now point at the landed fly-deploy feature doc.

INDEX entries added for the new feature doc and `docs/deploy.md`; the
System overview line gains a § Hosting pointer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ilmoniemi ilmoniemi merged commit 56b9c95 into main May 12, 2026
4 checks passed
@ilmoniemi ilmoniemi deleted the feature/38 branch May 12, 2026 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

relay: host-specific deploy manifest — blocked on hosting + TLS decisions

1 participant