feat(deploy): Dockerfile + compose stack for Hetzner CX22 #55

Merged
obchain merged 12 commits into main from
feat/27-docker-compose
Apr 24, 2026
Conversation


obchain (Owner) commented Apr 22, 2026

Summary

  • Multi-stage Dockerfile at the repo root — rust:1-slim builder → debian:bookworm-slim runtime, ~150 MB final image, non-root uid 10001, BuildKit cache mounts so compose-loop iterations stay fast
  • .dockerignore keeps the build context to ~5 MB
  • deploy/compose/docker-compose.yml — two services (charon, alloy) fit the CX22 resource budget (2 vCPU / 4 GB RAM). Metrics port is internal-only; alloy reaches it by DNS name
  • deploy/compose/alloy-config.alloy — Grafana Alloy (River) scrape + remote_write to Grafana Cloud, credentials via sys.env
  • deploy/compose/.env.example — every env var both services need, grouped by purpose
  • README "Deploy" section with the three-command quickstart

Why Alloy, not Prometheus + Grafana locally

Grafana Cloud free tier is the visualisation surface (PR #54). A local Prom server is redundant and eats the CX22's RAM budget; local Grafana triples the surface to tune. Alloy does the one job we actually need — scrape then remote_write — in ~256 MB.

Security posture

  • :9091 is not published to the host. Alloy reaches the exporter via compose DNS, so the metrics endpoint never faces the public internet.
  • .env stays git-ignored; .env.example is the checked-in template.
  • Non-root uid 10001 inside the container. Runtime layer has only ca-certificates + libssl3 — no shell tooling beyond what Debian-slim ships.

Test plan

  • Cargo workspace sweep green (fmt, clippy — no code changes here that would break anything)
  • Compose YAML shape check (indentation, top-level keys)
  • Dockerfile uses cargo build --locked for reproducibility
  • docker compose up -d --build on the target host (next session — local machine has no docker installed, so end-to-end validation has to happen on the CX22 itself or in a VM)
  • Alloy River syntax — uses sys.env(...) per Alloy 1.x; verify on first deploy that credentials resolve

Stacked PR

Base is feat/26-grafana-dashboard (PR #54). This is the final M4 PR. Merge order: #46 → #50 → #51 → #52 → #53 → #54 → this.

Closes #22.

Containerised deploy target designed to fit on a 2 vCPU / 4 GB RAM /
40 GB disk box. Two files define the build, two more define the
runtime topology.

Dockerfile (multi-stage):
- Builder: `rust:1-slim` with pkg-config + libssl-dev for the alloy
  TLS transitive dep. `cargo build --locked --release --bin charon`
  with BuildKit cache mounts on `/build/target` and the registry,
  so iterating on compose config doesn't re-download crates.
- Runtime: `debian:bookworm-slim` with `ca-certificates` + `libssl3`
  only. Non-root user (uid 10001) owns `/app`. Final image ≈ 150 MB.
- `EXPOSE 9091`, `ENTRYPOINT charon`, `CMD --config config/default.toml listen`.
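Under the assumptions above (binary name, paths, and base images from the PR text), the multi-stage layout sketches roughly like this — a hedged illustration, not the merged Dockerfile:

```dockerfile
# syntax=docker/dockerfile:1
# Builder stage: TLS build deps + BuildKit cache mounts for fast iteration.
FROM rust:1-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends pkg-config libssl-dev
WORKDIR /build
COPY . .
# target/ lives inside the cache mount, so copy the binary out in the same RUN
RUN --mount=type=cache,target=/build/target \
    --mount=type=cache,target=/usr/local/cargo/registry \
    cargo build --locked --release --bin charon \
    && cp target/release/charon /charon

# Runtime stage: only TLS roots + libssl3, non-root uid 10001.
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates libssl3 \
    && rm -rf /var/lib/apt/lists/*
RUN useradd --uid 10001 --no-create-home app
COPY --from=builder /charon /app/charon
USER 10001
WORKDIR /app
EXPOSE 9091
ENTRYPOINT ["/app/charon"]
CMD ["--config", "config/default.toml", "listen"]
```

Exec-form ENTRYPOINT/CMD is used here so the CMD arguments actually append to the entrypoint; a shell-form `ENTRYPOINT charon` would silently discard CMD.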

.dockerignore keeps the build context lean (≈ 5 MB) — excludes
target/, .git/, docs/, contracts outputs, and secrets.

deploy/compose/docker-compose.yml:
- `charon` service: built from the repo-root Dockerfile, runs read-only
  (config mounted as :ro from the host so profile rotation doesn't
  require a rebuild), resource limits at 1.5 CPU / 1 GB RAM.
- `alloy` service: `grafana/alloy:v1.4.3` sidecar. Scrapes
  `charon:9091` over the internal `charon_net` and remote_writes to
  Grafana Cloud. Resource limits at 0.5 CPU / 256 MB — Alloy is
  lean. Storage-path volume persists the WAL across restarts.
- The metrics port is deliberately NOT published to the host. Alloy
  reaches it by DNS name, so nothing on 0.0.0.0:9091 faces the
  public internet on the Hetzner box.
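A hedged sketch of that two-service topology — image tag, network name, and resource limits come from the PR text; build context and mount paths are assumptions:

```yaml
services:
  charon:
    build: ../..                      # repo-root Dockerfile
    read_only: true
    volumes:
      - ./config:/app/config:ro       # profile rotation without rebuild
    networks: [charon_net]
    deploy:
      resources:
        limits:
          cpus: "1.5"
          memory: 1g
    # no ports: — :9091 stays internal to charon_net

  alloy:
    image: grafana/alloy:v1.4.3
    volumes:
      - ./alloy-config.alloy:/etc/alloy/config.alloy:ro
      - alloy-data:/var/lib/alloy/data   # WAL survives restarts
    networks: [charon_net]
    deploy:
      resources:
        limits:
          cpus: "0.5"
          memory: 256m

networks:
  charon_net:

volumes:
  alloy-data:
```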

deploy/compose/alloy-config.alloy:
- River config: one `prometheus.scrape` for the `charon` target + one
  `prometheus.remote_write` for Grafana Cloud. Credentials come from
  `sys.env` so the file carries no secrets and is safe to commit.
  15 s scrape interval matches what the Grafana dashboard is tuned
  for.
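The pipeline described above sketches roughly like this in River — component labels and the exact `GRAFANA_CLOUD_PROM_*` variable names are assumptions; the `sys.env` pattern and 15 s interval are from the PR:

```river
prometheus.scrape "charon" {
  targets         = [{ "__address__" = "charon:9091" }]
  scrape_interval = "15s"
  forward_to      = [prometheus.remote_write.grafana_cloud.receiver]
}

prometheus.remote_write "grafana_cloud" {
  endpoint {
    // credentials resolved from the container environment — no secrets in-file
    url = sys.env("GRAFANA_CLOUD_PROM_URL")

    basic_auth {
      username = sys.env("GRAFANA_CLOUD_PROM_USER")
      password = sys.env("GRAFANA_CLOUD_PROM_KEY")
    }
  }
}
```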

deploy/compose/.env.example lists every variable both services need:
BNB RPC endpoints, optional signer key, and the three Grafana Cloud
`GRAFANA_CLOUD_PROM_*` values. The real `.env` stays git-ignored.

README gains a "Deploy" section with the three-command quickstart.

Closes #22.
obchain added 7 commits April 23, 2026 16:20
Both FROM lines used mutable tags (rust:1-slim,
debian:bookworm-slim). A Docker Hub re-tag between rebuilds could
silently swap toolchain, libc, or OpenSSL and ship a different
binary under the same compose invocation — supply-chain risk on a
key-holding host.

Pin both to the sha256 digests current as of 2026-04-22. Inline
comments flag digest bumps as dedicated commits so the base-image
swap is auditable in git history.

Closes #286
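The pinning pattern looks roughly like this — the digests below are placeholders, not the real pins from the commit:

```dockerfile
# digest bumps land as dedicated commits so base-image swaps are auditable
FROM rust:1-slim@sha256:<builder-digest> AS builder
# ...
FROM debian:bookworm-slim@sha256:<runtime-digest>
```

With a digest pin the tag becomes purely cosmetic; Docker resolves the image by content hash, so a Docker Hub re-tag can no longer change what gets built.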
The Dockerfile previously copied config/ into the runtime image,
which duplicated what the compose bind mount overlays at startup.
Two hazards: TOML values (contract addresses, RPC endpoints, any
accidental secret) became visible in docker history even after the
mount overrode them, and `docker run` without a mount silently
launched against stale baked-in config with no visible error.

Drop the COPY, add config/ to .dockerignore so future drift cannot
silently re-bake the directory, and document that compose bind-mount
is the only supported config-delivery path. Running the image bare
now fails fast — the intended behaviour.

Closes #287
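The bind mount that remains the only config-delivery path sketches as (paths are assumptions from the PR text):

```yaml
services:
  charon:
    volumes:
      - ./config:/app/config:ro   # overlaid at startup; the image carries no config/
```

Running the image without this mount now fails fast instead of starting against stale baked-in TOML.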
Add a HEALTHCHECK that curls the Prometheus exporter on :9091.
Because the listener binds only after the async startup chain
finishes (WS connect, chain_id verify, first block drained), a 200
on /metrics is a reliable proxy for end-to-end readiness. Install
curl in the runtime layer to make the probe self-contained.

Gate the alloy sidecar on `condition: service_healthy` so the
scraper does not spray connection-refused errors into the logs
while charon is still establishing its BSC RPC link. The
`start-period=30s` absorbs slow cold starts on Hetzner.

Closes #290
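A hedged sketch of that probe — `--start-period=30s` is from the commit; interval, timeout, and retries are assumptions:

```dockerfile
# /metrics only answers after WS connect + chain_id verify + first block drained,
# so a 200 here is an end-to-end readiness signal, not just a liveness ping
HEALTHCHECK --interval=15s --timeout=3s --start-period=30s --retries=3 \
  CMD curl -fsS http://127.0.0.1:9091/metrics || exit 1
```

On the compose side the sidecar then gates on `depends_on: charon: condition: service_healthy`.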
Alloy's admin server previously listened on 0.0.0.0:12345 inside
the compose network. No host port was published, but any container
on charon_net — including a future log shipper, node exporter, or a
compromised charon — could reach the rendered River config,
/-/reload, and the component-graph endpoints. A /-/reload pointed
at an attacker-controlled remote_write would exfiltrate every
series.

Bind the admin server to 127.0.0.1 inside the container so those
endpoints stay reachable only from alloy itself. Ad-hoc access goes
through `docker compose exec alloy`.

Closes #289
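The rebind is a one-flag change on the sidecar's command — `--server.http.listen-addr` and `--storage.path` exist in Alloy 1.x; the surrounding arguments are assumptions:

```yaml
services:
  alloy:
    command:
      - run
      - --server.http.listen-addr=127.0.0.1:12345   # admin API loopback-only
      - --storage.path=/var/lib/alloy/data
      - /etc/alloy/config.alloy
```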
Neither service specified a logging driver, so Docker defaulted to
json-file with no rotation. Charon emits ~1 log per 3s BSC block,
alloy logs every scrape + every remote_write retry — unbounded
growth fills the 40 GB CX22 disk in a matter of weeks and faster
under a Grafana Cloud outage.

Cap each service at 5 × 50 MB. 250 MB per service retains roughly a
week of history for incident forensics while leaving plenty of
headroom on the small host.

Closes #291
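The rotation cap per service sketches as:

```yaml
services:
  charon:
    logging:
      driver: json-file
      options:
        max-size: "50m"   # rotate at 50 MB
        max-file: "5"     # keep 5 files → 250 MB ceiling per service
```

The same block (or a shared YAML anchor) applies to the alloy service.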
The prometheus.remote_write block shipped without a wal stanza, so
Alloy defaulted to unbounded WAL retention at
/var/lib/alloy/data. A sustained Grafana Cloud outage, an expired
token, or a network partition would buffer indefinitely and race
log rotation for the 40 GB CX22 disk.

Set truncate_frequency=15m, min_keepalive_time=30m, and
max_keepalive_time=2h. Two hours of replay covers typical cloud
blips while keeping the WAL disk footprint well inside the host
budget. truncate_frequency sits under min_keepalive_time because
Alloy's underlying Prometheus WAL rejects equal values at startup.

Closes #293
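The stanza sketches roughly like this — the three values are from the commit message; the block label and endpoint are assumptions:

```river
prometheus.remote_write "grafana_cloud" {
  wal {
    truncate_frequency = "15m"   // must stay below min_keepalive_time
    min_keepalive_time = "30m"
    max_keepalive_time = "2h"    // ~2h of replay bounds the disk footprint
  }

  endpoint {
    url = sys.env("GRAFANA_CLOUD_PROM_URL")
  }
}
```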
The 3-command quickstart routes operators to paste the hot-wallet
private key into .env, which compose then injects as a container
env var — readable in plaintext by any OS user in the host docker
group via `docker inspect charon`. That group is effectively root
on Linux, and the original doc set had no security callout.

Inline an explicit warning in .env.example next to the variable:
restrict the docker group to a single deploy user, size the hot
wallet as compromisable (gas + working balance only, sweep profit
to cold on every callback), and earmark the migration to an
external signer as the production hardening path. Raw plaintext
env injection is the v0.1 concession, not the end state.

Closes #288
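The inline callout might look like this in `.env.example` — the variable name is an assumption, not the real one from the PR:

```dotenv
# SECURITY: compose injects this as a plaintext container env var, readable by
# any member of the host `docker` group via `docker inspect charon` — that
# group is effectively root on Linux.
#   * restrict the docker group to a single deploy user
#   * size this wallet as compromisable: gas + working balance only,
#     sweep profit to cold storage on every callback
#   * production hardening path: migrate to an external signer
#CHARON_SIGNER_KEY=0xYOUR_HOT_WALLET_PRIVATE_KEY
```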
obchain force-pushed the feat/27-docker-compose branch from 72ded4b to 084f00e on April 23, 2026 11:00
obchain added 3 commits April 23, 2026 18:55
manifest-first layering does not give the advertised fast rebuild —
the cache mount on /build/target is the actual mechanism. note
Cargo.lock churn still triggers full dep recompile.

docker hub tags are mutable — pin by manifest-list sha256 so a
republish cannot swap the sidecar silently. matches the base-image
pinning already in dockerfile.

previously no ci gate verified dockerfile correctness; broken copy
paths, missing system packages, or cargo --locked failures were
only caught at deploy time.
obchain changed the base branch from feat/26-grafana-dashboard to main on April 24, 2026 17:32
obchain merged commit ccd8f42 into main Apr 24, 2026
2 of 3 checks passed

Successfully merging this pull request may close these issues.

[deploy] Docker Compose: charon + prometheus + grafana stack for Hetzner CX22
