TLS mTLS
FANGS's orchestrator↔runner protocol defaults to plain HTTP because the common deployment is single-host (orchestrator + runner + UI on the same loopback). For anything else, turn on TLS.
The TLS mode is selected by which flags are set on the orchestrator:

| Mode | Flags | What it gives you |
|---|---|---|
| Plain HTTP | (none) | No transport security. Loopback-only deployments. |
| HTTPS (server-only) | `-tls-cert` + `-tls-key` | Encrypts the wire. Runner verifies orchestrator identity via `-tls-ca`. Anyone with the URL can still register a rogue runner. |
| mTLS (recommended) | + `-tls-client-ca` | Above, plus every connection must present a client cert signed by the CA. Runner identity is cryptographically verified. |

`docs/scripts/gen-tls.sh` mints a CA, a server cert, and a runner client cert with openssl in ~5 seconds:

    SERVER_HOSTS="fangs.example.internal" RUNNER_ID="prod-runner-1" \
      docs/scripts/gen-tls.sh

Outputs at `tls/`:

    tls/ca.crt      ← trust this on every runner + every Prometheus scraper
    tls/ca.key      ← CA private key; move OFFLINE in production
    tls/server.crt  ← orchestrator presents this
    tls/server.key  ← orchestrator's private key
    tls/runner.crt  ← runner client cert (CN = $RUNNER_ID)
    tls/runner.key  ← runner's private key

What the script does:
- 4096-bit RSA throughout
- 365-day validity (override with `$DAYS`)
- Server cert SANs from `$SERVER_HOSTS` plus `localhost`, `127.0.0.1`, `::1`
- Server cert has `extendedKeyUsage=serverAuth`
- Runner cert has `extendedKeyUsage=clientAuth` + CN=`$RUNNER_ID`
- `.key` files chmod'd to 600
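
The SAN handling described above can be illustrated with a small shell sketch. This is not the code from `gen-tls.sh`; the function name `build_san` and the classification rule (anything containing a colon, or made up only of digits and dots in four parts, is treated as an IP literal) are assumptions made for illustration.

```shell
# Sketch: turn a space-separated host list into an openssl subjectAltName value.
# build_san is a hypothetical helper, not part of gen-tls.sh.
build_san() {
  local hosts="$1" san="" h kind
  # Append the loopback entries the generated server cert also carries.
  for h in $hosts localhost 127.0.0.1 ::1; do
    case "$h" in
      *:*)       kind="IP"  ;;  # contains a colon: treat as an IPv6 literal
      *[!0-9.]*) kind="DNS" ;;  # any character other than digit or dot: a name
      *.*.*.*)   kind="IP"  ;;  # digits and dots only, four parts: IPv4 literal
      *)         kind="DNS" ;;
    esac
    san="${san:+$san,}${kind}:${h}"
  done
  printf '%s\n' "$san"
}

build_san "fangs.internal 10.0.0.5"
```

The resulting string uses the `DNS:` and `IP:` prefixes that openssl accepts in a `subjectAltName=` extension line.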

The script reads `$SERVER_HOSTS` for the server SAN list and distinguishes IP literals from DNS names automatically:

    SERVER_HOSTS="fangs.internal 10.0.0.5" docs/scripts/gen-tls.sh

The CA's private key (`tls/ca.key`) is only needed when minting new certs. In production, move it offline immediately:

    # Mint all the runner certs you'll need
    for i in 1 2 3; do
      RUNNER_ID=runner-$i docs/scripts/gen-tls.sh
    done
    # Move the CA key offline
    sudo mv tls/ca.key /secure/offline/storage/

Operators wanting Prometheus to scrape via mTLS need a client cert too:

    RUNNER_ID=prometheus-scraper docs/scripts/gen-tls.sh

Start the orchestrator with the server cert, its key, and the client CA:

    ./bin/fangs-orchestrator \
      -addr 0.0.0.0:8443 \
      -tls-cert tls/server.crt \
      -tls-key tls/server.key \
      -tls-client-ca tls/ca.crt

Startup log:

    {"msg":"orchestrator listening","addr":"0.0.0.0:8443","scheme":"https (mTLS)"}


Start each runner with the CA bundle and its own client cert:

    sudo ./bin/fangs-runner \
      -orchestrator https://fangs.example.internal:8443 \
      -tls-ca tls/ca.crt \
      -tls-cert tls/runner.crt \
      -tls-key tls/runner.key

The runner's `agent.Client` builds an `*http.Transport` with both the CA pool (verifies the orchestrator) and the client cert (presented during the handshake). The same transport is reused by the `EventStreamer` and the long-poll one-shot client, so every RPC carries the cert.

Verify the setup with curl:

    # Plain HTTP request → handshake error
    curl http://fangs.example.internal:8443/v1/health
    # → "Client sent an HTTP request to an HTTPS server"
    # HTTPS without client cert → handshake aborted
    curl --cacert tls/ca.crt https://fangs.example.internal:8443/v1/health
    # → "tls: client didn't provide a certificate"
    # HTTPS with full mTLS → 200 OK
    curl --cacert tls/ca.crt --cert tls/runner.crt --key tls/runner.key \
      https://fangs.example.internal:8443/v1/health
    # → {"status":"ok","orchestrator_id":"fangs-orchestrator","version":"dev"}

What mTLS gives you:

- Wire encryption. Events, scan results, and heartbeats are encrypted in transit. No plaintext sensor events cross networks.
- Runner identity. Only runners with a cert signed by the configured CA can register or heartbeat. A leaked orchestrator URL is not enough.
- Orchestrator identity. Runners verify they're talking to the intended orchestrator, not a man-in-the-middle.

What mTLS does not cover:

- Sandbox-to-internet. mTLS bounds the runner↔orchestrator hop. The sandbox container still talks to whatever the package's install scripts reach for (npm registry, CDNs, attacker C2 if it's there).
- Database connections. Postgres-backed deployments need their own TLS layer (`?sslmode=verify-full` on the DSN).
- The UI dashboard. It shares the same listener, so when mTLS is on, the UI also requires a client cert. Operators using the browser-based UI typically put a reverse proxy in front and let the proxy handle browser auth (basic auth, oauth2-proxy, etc.). The proxy presents an mTLS client cert to the orchestrator; humans authenticate to the proxy.
- CLI access. The same caveat applies. The `fangs` CLI talks directly to the DB for most subcommands, not through HTTP, but `fangs scan submit` and the `package add` kickoff POST to `/v1/scans`, so those calls need a CLI-side client cert too. (v2 item: bake the CLI cert bootstrap into a config file like the orchestrator + runner have.)

The default validity is 365 days. Renew before that:

    # Bump validity to 2 years
    DAYS=730 docs/scripts/gen-tls.sh

The script regenerates everything (CA + server + runner). To rotate just a leaf cert (server or runner) without changing the CA, hand-craft it with openssl using the existing CA. The script doesn't currently support partial regeneration, but the CA + signing logic is small and copy-pastable.

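
A leaf-only re-issue against an existing CA might look like the sketch below. It is not taken from `gen-tls.sh`: the helper name `sign_leaf`, its argument order, and the `BITS` variable are hypothetical, while `DAYS` mirrors the script's own override. It assumes `ca.crt` and `ca.key` are both present in the given directory, which means temporarily bringing the CA key back from offline storage.

```shell
# Sketch: re-issue ONE leaf cert from an existing CA (hypothetical helper).
# Usage: sign_leaf <tls_dir> <file_stem> <common_name> <serverAuth|clientAuth> [san]
sign_leaf() {
  local dir="$1" name="$2" cn="$3" eku="$4" san="${5:-}"
  local days="${DAYS:-365}" bits="${BITS:-4096}" ext rc=0
  ext="$(mktemp)" || return 1

  # Extensions for the new leaf: not a CA, one EKU, optional SAN list.
  echo "basicConstraints=CA:FALSE" > "$ext"
  echo "extendedKeyUsage=${eku}" >> "$ext"
  if [ -n "$san" ]; then
    echo "subjectAltName=${san}" >> "$ext"
  fi

  # New private key (owner-readable only), a CSR, then a CA signature.
  ( umask 077; openssl genrsa -out "$dir/$name.key" "$bits" ) &&
  openssl req -new -key "$dir/$name.key" -subj "/CN=${cn}" \
    -out "$dir/$name.csr" &&
  openssl x509 -req -in "$dir/$name.csr" \
    -CA "$dir/ca.crt" -CAkey "$dir/ca.key" -CAcreateserial \
    -days "$days" -sha256 -extfile "$ext" \
    -out "$dir/$name.crt" || rc=1

  # The CSR and the extension file are only needed during signing.
  rm -f "$ext" "$dir/$name.csr"
  return "$rc"
}

# Example (overwrites tls/runner.key and tls/runner.crt):
#   sign_leaf tls runner prod-runner-1 clientAuth
```

For a server cert, pass `serverAuth` and a SAN string such as `DNS:fangs.example.internal,DNS:localhost,IP:127.0.0.1` as the fifth argument. The helper overwrites any existing key and cert with the same file stem, so copy the old pair aside first if you need a rollback path.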
After rotation:
- Replace files at their orchestrator + runner paths.
- Restart both processes. Go's `tls.LoadX509KeyPair` reads once at startup; there's no hot-reload.

For zero-downtime rotation on a single-runner setup: temporarily run two orchestrators on different ports, migrate the runner over, then shut down the old one.

- CA key offline. After issuing the certs you need, move `tls/ca.key` to a secure location (hardware token, vault, offline drive). Compromise of the CA key lets an attacker mint runner certs.
- Per-runner certs, not shared. Mint a unique cert per runner so revocation is granular. The CN is recorded in the registration metadata so you can identify which runner sent which heartbeat.
- CRL not wired. Today there's no Certificate Revocation List check. To revoke a runner cert, take it out of the runner's filesystem + reissue the orchestrator's CA bundle. A future enhancement could wire CRL or OCSP into the `tls.Config.VerifyPeerCertificate` callback.
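
Since revocation is manual today, it can help to record which cert belongs to which runner at issue time. The sketch below prints the CN, serial, and SHA-256 fingerprint of a certificate using standard `openssl x509` options; the helper name `cert_identity` and the tab-separated output layout are assumptions, not something FANGS ships.

```shell
# Sketch: print "<CN> <serial> <sha256 fingerprint>" (tab-separated) for one cert.
# cert_identity is a hypothetical helper for keeping an issuance inventory.
cert_identity() {
  local crt="$1" cn serial fp
  # RFC2253 formatting gives a compact "CN=...,O=..." subject that is easy to parse.
  cn="$(openssl x509 -in "$crt" -noout -subject -nameopt RFC2253 \
        | sed -n 's/.*CN=\([^,]*\).*/\1/p')"
  # Both commands print "label=value"; keep only the value.
  serial="$(openssl x509 -in "$crt" -noout -serial | cut -d= -f2)"
  fp="$(openssl x509 -in "$crt" -noout -fingerprint -sha256 | cut -d= -f2)"
  printf '%s\t%s\t%s\n' "$cn" "$serial" "$fp"
}

# Example: append the identity of a freshly issued runner cert to an inventory file.
#   cert_identity tls/runner.crt >> issued-certs.tsv
```

Keeping this inventory next to the runbook makes it easier to tell which file to pull from which host when a runner is decommissioned.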

| File | Purpose | Generated by |
|---|---|---|
| `ca.crt` | trust anchor | `openssl req -x509 -new ... -CN fangs-ca` |
| `ca.key` | CA private key | `openssl genrsa -out ca.key 4096` |
| `server.crt` | orchestrator's server cert | CA-signed, `extendedKeyUsage=serverAuth` |
| `server.key` | server private key | `openssl genrsa -out server.key 4096` |
| `runner.crt` | runner's client cert | CA-signed, `extendedKeyUsage=clientAuth` |
| `runner.key` | runner private key | `openssl genrsa -out runner.key 4096` |


Before deploying mTLS:

- Run `gen-tls.sh` with the production `SERVER_HOSTS` list (it must include every name or IP at which the orchestrator will be reached)
- Issue a runner cert for every runner host (`RUNNER_ID` per host)
- Issue a Prometheus scraper cert if you want metrics monitoring
- Move `tls/ca.key` offline
- Distribute `tls/server.crt` + `tls/server.key` to the orchestrator host(s); chmod 0600 on the .key
- Distribute `tls/ca.crt` + `tls/runner.crt` + `tls/runner.key` to each runner host
- Document the cert renewal cadence somewhere ops will see it (the certs silently fail after 365 days otherwise)
- Set up Prometheus scrape with `tls_config:` pointing at the scraper cert
- Pre-stage the rotation procedure so the runbook is in place before you need it
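
One way to keep the renewal cadence from being forgotten is a periodic expiry check. The sketch below relies on the standard `openssl x509 -checkend` option, which exits 0 when the cert is still valid the given number of seconds from now; the helper name `check_expiry` and the 30-day default are assumptions.

```shell
# Sketch: warn when a certificate expires within a given number of days.
# check_expiry is a hypothetical helper; 30 days is an arbitrary default.
check_expiry() {
  local crt="$1" warn_days="${2:-30}"
  # -checkend N exits 0 if the cert will still be valid N seconds from now.
  if openssl x509 -in "$crt" -noout -checkend "$((warn_days * 86400))" >/dev/null; then
    echo "ok: $crt is valid for more than ${warn_days} days"
  else
    echo "RENEW: $crt expires within ${warn_days} days ($(openssl x509 -in "$crt" -noout -enddate))"
    return 1
  fi
}

# Example: run from cron on each host and alert on a non-zero exit status.
#   for c in tls/*.crt; do check_expiry "$c" 30; done
```

Because the function returns non-zero for certs inside the warning window, it can be wired into whatever alerting already watches cron job failures.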