feat(deploy): Dockerfile + compose stack for Hetzner CX22 #55
Merged
Conversation
Containerised deploy target designed to fit on a 2 vCPU / 4 GB RAM / 40 GB disk box. Two files define the build, two more define the runtime topology.

Dockerfile (multi-stage; sketched below):
- Builder: `rust:1-slim` with pkg-config + libssl-dev for the alloy TLS transitive dep. `cargo build --locked --release --bin charon` with BuildKit cache mounts on `/build/target` and the registry, so iterating on compose config doesn't re-download crates.
- Runtime: `debian:bookworm-slim` with `ca-certificates` + `libssl3` only. Non-root user (uid 10001) owns `/app`. Final image ≈ 150 MB.
- `EXPOSE 9091`, `ENTRYPOINT charon`, `CMD --config config/default.toml listen`.

.dockerignore keeps the build context lean (≈ 5 MB) — excludes target/, .git/, docs/, contracts outputs, and secrets.

deploy/compose/docker-compose.yml:
- `charon` service: built from the repo-root Dockerfile, runs read-only (config mounted as :ro from the host so profile rotation doesn't require a rebuild), resource limits at 1.5 CPU / 1 GB RAM.
- `alloy` service: `grafana/alloy:v1.4.3` sidecar. Scrapes `charon:9091` over the internal `charon_net` and remote_writes to Grafana Cloud. Resource limits at 0.5 CPU / 256 MB — Alloy is lean. Storage-path volume persists the WAL across restarts.
- The metrics port is deliberately NOT published to the host. Alloy reaches it by DNS name, so nothing on 0.0.0.0:9091 faces the public internet on the Hetzner box.

deploy/compose/alloy-config.alloy:
- River config: one `prometheus.scrape` for the `charon` target + one `prometheus.remote_write` for Grafana Cloud. Credentials come from `sys.env` so the file carries no secrets and is safe to commit. 15 s scrape interval matches what the Grafana dashboard is tuned for.

deploy/compose/.env.example lists every variable both services need: BNB RPC endpoints, optional signer key, and the three Grafana Cloud `GRAFANA_CLOUD_PROM_*` values. The real `.env` stays git-ignored.

README gains a "Deploy" section with the three-command quickstart.

Closes #22.
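A minimal sketch of that two-stage layout, assuming a `/build` workdir, a `useradd`-created account, and `/usr/local/bin` as the install path (none of which are confirmed by the PR text):

```dockerfile
# builder: toolchain + native deps for the alloy TLS transitive dep
FROM rust:1-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
        pkg-config libssl-dev \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /build
COPY . .
# BuildKit cache mounts keep the crate registry and target dir warm across
# builds, so iterating on compose config does not re-download crates
RUN --mount=type=cache,target=/usr/local/cargo/registry \
    --mount=type=cache,target=/build/target \
    cargo build --locked --release --bin charon \
    && cp /build/target/release/charon /usr/local/bin/charon

# runtime: CA roots + OpenSSL only, non-root
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates libssl3 \
    && rm -rf /var/lib/apt/lists/*
RUN useradd --uid 10001 --home-dir /app --create-home charon
COPY --from=builder /usr/local/bin/charon /usr/local/bin/charon
USER 10001
WORKDIR /app
EXPOSE 9091
ENTRYPOINT ["charon"]
CMD ["--config", "config/default.toml", "listen"]
```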
Both FROM lines used mutable tags (rust:1-slim, debian:bookworm-slim). A Docker Hub re-tag between rebuilds could silently swap toolchain, libc, or OpenSSL and ship a different binary under the same compose invocation — supply-chain risk on a key-holding host. Pin both to the sha256 digests current as of 2026-04-22. Inline comments flag digest bumps as dedicated commits so the base-image swap is auditable in git history. Closes #286
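Sketched below with placeholder digests — the real values are whatever was current on 2026-04-22:

```dockerfile
# Bump digests only in dedicated commits so base-image swaps stay auditable.
FROM rust:1-slim@sha256:<builder-digest> AS builder
# ...build stage unchanged...

FROM debian:bookworm-slim@sha256:<runtime-digest>
# ...runtime stage unchanged...
```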
The Dockerfile previously copied config/ into the runtime image, which duplicated what the compose bind mount overlays at startup. Two hazards: TOML values (contract addresses, RPC endpoints, any accidental secret) became visible in docker history even after the mount overrode them, and `docker run` without a mount silently launched against stale baked-in config with no visible error. Drop the COPY, add config/ to .dockerignore so future drift cannot silently re-bake the directory, and document that compose bind-mount is the only supported config-delivery path. Running the image bare now fails fast — the intended behaviour. Closes #287
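A sketch of the compose-side mount that is now the only supported delivery path — the relative paths and the `/app/config` mount point are assumptions:

```yaml
services:
  charon:
    build: ../..              # repo-root Dockerfile
    read_only: true
    volumes:
      # edit the host copy to rotate profiles; nothing is baked into the image
      - ./config:/app/config:ro
```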
Add a HEALTHCHECK that curls the Prometheus exporter on :9091. Because the listener binds only after the async startup chain finishes (WS connect, chain_id verify, first block drained), a 200 on /metrics is a reliable proxy for end-to-end readiness. Install curl in the runtime layer to make the probe self-contained. Gate the alloy sidecar on `condition: service_healthy` so the scraper does not spray connection-refused errors into the logs while charon is still establishing its BSC RPC link. The `start-period=30s` absorbs slow cold starts on Hetzner. Closes #290
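A sketch of both halves — the interval, timeout, and retry values beyond the stated 30 s start period are assumptions:

```dockerfile
HEALTHCHECK --start-period=30s --interval=15s --timeout=5s --retries=3 \
    CMD curl -fsS http://127.0.0.1:9091/metrics > /dev/null || exit 1
```

```yaml
  alloy:
    depends_on:
      charon:
        condition: service_healthy   # no scraping until /metrics answers 200
```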
Alloy's admin server previously listened on 0.0.0.0:12345 inside the compose network. No host port was published, but any container on charon_net — including a future log shipper, node exporter, or a compromised charon — could reach the rendered River config, /-/reload, and the component-graph endpoints. A /-/reload pointed at an attacker-controlled remote_write would exfiltrate every series. Bind the admin server to 127.0.0.1 inside the container so those endpoints stay reachable only from alloy itself. Ad-hoc access goes through `docker compose exec alloy`. Closes #289
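A sketch of the command override, assuming the stock image paths (`/etc/alloy/config.alloy`, `/var/lib/alloy/data`) and the Alloy 1.x `run` flag names:

```yaml
  alloy:
    image: grafana/alloy:v1.4.3
    command:
      - run
      # admin/UI server on loopback only: /-/reload and the component graph
      # are reachable solely from inside the alloy container
      - --server.http.listen-addr=127.0.0.1:12345
      - --storage.path=/var/lib/alloy/data
      - /etc/alloy/config.alloy
```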
Neither service specified a logging driver, so Docker defaulted to json-file with no rotation. Charon emits roughly one log line per 3 s BSC block and alloy logs every scrape plus every remote_write retry — unbounded growth fills the 40 GB CX22 disk within weeks, faster still during a Grafana Cloud outage. Cap each service at 5 × 50 MB. 250 MB per service retains roughly a week of history for incident forensics while leaving plenty of headroom on the small host. Closes #291
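A per-service sketch (the same block is repeated on the `alloy` service):

```yaml
  charon:
    logging:
      driver: json-file
      options:
        max-size: "50m"   # 5 × 50 MB per service, ~a week of history
        max-file: "5"
```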
The prometheus.remote_write block shipped without a wal stanza, so Alloy defaulted to unbounded WAL retention at /var/lib/alloy/data. A sustained Grafana Cloud outage, an expired token, or a network partition would buffer indefinitely and race log rotation for the 40 GB CX22 disk. Set truncate_frequency=15m, min_keepalive_time=30m, and max_keepalive_time=2h. Two hours of replay covers typical cloud blips while keeping the WAL disk footprint well inside the host budget. truncate_frequency sits under min_keepalive_time because Alloy's underlying Prometheus WAL rejects equal values at startup. Closes #293
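A sketch of the stanza in context — the endpoint URL and credential variable names are assumptions; only the three retention values come from this change:

```river
prometheus.remote_write "grafana_cloud" {
  endpoint {
    url = sys.env("GRAFANA_CLOUD_PROM_URL")

    basic_auth {
      username = sys.env("GRAFANA_CLOUD_PROM_USER")
      password = sys.env("GRAFANA_CLOUD_PROM_KEY")
    }
  }

  wal {
    truncate_frequency = "15m"  // must stay below min_keepalive_time
    min_keepalive_time = "30m"
    max_keepalive_time = "2h"   // bounds replay (and disk) after an outage
  }
}
```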
The three-command quickstart directs operators to paste the hot-wallet private key into .env, which compose then injects as a container env var — readable in plaintext by any OS user in the host docker group via `docker inspect charon`. That group is effectively root on Linux, and the original doc set had no security callout. Inline an explicit warning in .env.example next to the variable: restrict the docker group to a single deploy user, size the hot wallet as compromisable (gas + working balance only, sweep profit to cold on every callback), and earmark the migration to an external signer as the production hardening path. Raw plaintext env injection is the v0.1 concession, not the end state. Closes #288
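A sketch of what the inline warning could look like; `SIGNER_PRIVATE_KEY` is an illustrative name, not necessarily the real variable:

```bash
# SIGNER_PRIVATE_KEY (optional) — hot-wallet key, injected as a plaintext env var.
#   * Anyone in the host `docker` group can read it via `docker inspect charon`;
#     that group is effectively root, so restrict it to a single deploy user.
#   * Treat the wallet as compromisable: gas + working balance only, sweep
#     profit to cold storage on every callback.
#   * Production hardening path: migrate to an external signer.
# SIGNER_PRIVATE_KEY=
```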
Manifest-first layering does not give the advertised fast rebuild — the cache mount on /build/target is the actual mechanism. Note that Cargo.lock churn still triggers a full dependency recompile.
Docker Hub tags are mutable — pin by manifest-list sha256 so a republish cannot swap the sidecar silently. Matches the base-image pinning already in the Dockerfile.
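Sketched with a placeholder digest:

```yaml
  alloy:
    # pin the manifest-list digest so a re-published v1.4.3 cannot swap the sidecar
    image: grafana/alloy:v1.4.3@sha256:<manifest-list-digest>
```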
Previously no CI gate verified Dockerfile correctness; broken COPY paths, missing system packages, or `cargo --locked` failures were only caught at deploy time.
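A hypothetical gate along those lines — the workflow name, trigger paths, and runner are assumptions, not part of this PR:

```yaml
name: docker-build
on:
  pull_request:
    paths: ["Dockerfile", ".dockerignore", "Cargo.*", "src/**", "deploy/**"]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # fails the PR on broken COPY paths, missing system packages,
      # or a cargo --locked mismatch inside the builder stage
      - run: docker build --pull -t charon:ci .
```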
Summary
- `Dockerfile` at the repo root — `rust:1-slim` builder → `debian:bookworm-slim` runtime, ~150 MB final image, non-root uid 10001, BuildKit cache mounts so compose-loop iterations stay fast. `.dockerignore` keeps the build context to ~5 MB
- `deploy/compose/docker-compose.yml` — two services (`charon`, `alloy`) fit the CX22 resource budget (2 vCPU / 4 GB RAM). Metrics port is internal-only; alloy reaches it by DNS name
- `deploy/compose/alloy-config.alloy` — Grafana Alloy (River) scrape + remote_write to Grafana Cloud, credentials via `sys.env`
- `deploy/compose/.env.example` — every env var both services need, grouped by purpose

Why Alloy, not Prometheus + Grafana locally
Grafana Cloud free tier is the visualisation surface (PR #54). A local Prom server is redundant and eats the CX22's RAM budget; local Grafana triples the surface to tune. Alloy does the one job we actually need — scrape then remote_write — in ~256 MB.
Security posture
- `:9091` is not published to the host. Alloy reaches the exporter via compose DNS, so the metrics endpoint never faces the public internet
- `.env` stays git-ignored; `.env.example` is the checked-in template
- `ca-certificates` + `libssl3` — no shell tooling beyond what Debian-slim ships

Test plan
- `cargo build --locked` for reproducibility
- `docker compose up -d --build` on the target host (next session — local machine has no docker installed, so end-to-end validation has to happen on the CX22 itself or in a VM)
- `sys.env(...)` per Alloy 1.x; verify on first deploy that credentials resolve

Stacked PR
Base is `feat/26-grafana-dashboard` (PR #54). This is the final M4 PR. Merge order: #46 → #50 → #51 → #52 → #53 → #54 → this. Closes #22.