
TLS mTLS

cyb3rjerry edited this page May 23, 2026 · 1 revision

TLS / mTLS

FANGS's orchestrator↔runner protocol defaults to plain HTTP because the common deployment is single-host: orchestrator, runner, and UI all run on one machine and talk over loopback. For anything else, turn on TLS.

Modes

Selected by which flags are set on the orchestrator:

| Mode | Flags | What it gives you |
| --- | --- | --- |
| Plain HTTP | (none) | No transport security. Loopback-only deployments. |
| HTTPS (server-only) | -tls-cert + -tls-key | Encrypts the wire. Runner verifies orchestrator identity via -tls-ca. Anyone with the URL can still register a rogue runner. |
| mTLS (recommended) | + -tls-client-ca | Above, plus every connection must present a client cert signed by the CA. Runner identity is cryptographically verified. |

Bootstrap script

docs/scripts/gen-tls.sh mints a CA + server cert + runner client cert with openssl in ~5 seconds:

SERVER_HOSTS="fangs.example.internal" RUNNER_ID="prod-runner-1" \
  docs/scripts/gen-tls.sh

Outputs at tls/:

tls/ca.crt          ← trust this on every runner + every Prometheus scraper
tls/ca.key          ← CA private key; move OFFLINE in production
tls/server.crt      ← orchestrator presents this
tls/server.key      ← orchestrator's private key
tls/runner.crt      ← runner client cert (CN = $RUNNER_ID)
tls/runner.key      ← runner's private key

What the script does:

  • 4096-bit RSA throughout
  • 365-day validity (override with $DAYS)
  • Server cert SANs from $SERVER_HOSTS plus localhost, 127.0.0.1, ::1
  • Server cert has extendedKeyUsage=serverAuth
  • Runner cert has extendedKeyUsage=clientAuth + CN=$RUNNER_ID
  • .key files chmod'd to 600

The script reads the space-separated $SERVER_HOSTS for the server SAN list and distinguishes IP literals from DNS names automatically:

SERVER_HOSTS="fangs.internal 10.0.0.5" docs/scripts/gen-tls.sh
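
The IP-versus-DNS split matters because X.509 stores the two as different SAN types, and a client connecting by IP only matches IP SANs. The script's own shell logic is not shown here; the sketch below illustrates the same classification in Go, with `splitSANs` as a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// splitSANs separates a space-separated host list into DNS names and IP
// addresses, the two SAN kinds a server certificate distinguishes.
func splitSANs(hosts string) (dnsNames []string, ips []net.IP) {
	for _, h := range strings.Fields(hosts) {
		if ip := net.ParseIP(h); ip != nil {
			ips = append(ips, ip) // IPv4 or IPv6 literal
		} else {
			dnsNames = append(dnsNames, h) // anything else is treated as a DNS name
		}
	}
	return dnsNames, ips
}

func main() {
	dnsNames, ips := splitSANs("fangs.internal 10.0.0.5")
	fmt.Println("DNS:", dnsNames)
	fmt.Println("IP: ", ips)
}
```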

The CA's private key (tls/ca.key) is only needed when minting new certs. In production move it offline immediately:

# Mint all the runner certs you'll need
for i in 1 2 3; do
  RUNNER_ID=runner-$i docs/scripts/gen-tls.sh
done

# Move the CA key offline
sudo mv tls/ca.key /secure/offline/storage/

Operators wanting Prometheus to scrape via mTLS need a client cert too:

RUNNER_ID=prometheus-scraper docs/scripts/gen-tls.sh

Running with mTLS

Orchestrator

./bin/fangs-orchestrator \
  -addr 0.0.0.0:8443 \
  -tls-cert tls/server.crt \
  -tls-key tls/server.key \
  -tls-client-ca tls/ca.crt

Startup log:

{"msg":"orchestrator listening","addr":"0.0.0.0:8443","scheme":"https (mTLS)"}

Runner

sudo ./bin/fangs-runner \
  -orchestrator https://fangs.example.internal:8443 \
  -tls-ca tls/ca.crt \
  -tls-cert tls/runner.crt \
  -tls-key tls/runner.key

The runner's agent.Client builds an *http.Transport with both the CA pool (verifies orchestrator) AND the client cert (presented during handshake). The same transport gets reused by the EventStreamer + the long-poll one-shot client, so every RPC carries the cert.
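
A minimal sketch of such a transport, using only the standard library. This is not the actual `agent.Client` code: `newMTLSTransport` is a hypothetical name, and the `events` and `longPoll` clients merely stand in for the EventStreamer and long-poll clients mentioned above.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"os"
)

// newMTLSTransport builds a transport that trusts only the CAs in caPEM and
// presents the given client cert/key during the TLS handshake.
func newMTLSTransport(caPEM, certPEM, keyPEM []byte) (*http.Transport, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificates found in -tls-ca file")
	}
	clientCert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("load client cert: %w", err)
	}
	return &http.Transport{
		TLSClientConfig: &tls.Config{
			RootCAs:      pool,                          // verifies the orchestrator
			Certificates: []tls.Certificate{clientCert}, // presented to the orchestrator
			MinVersion:   tls.VersionTLS12,
		},
	}, nil
}

func main() {
	// Paths match the gen-tls.sh outputs; read errors surface below as
	// an empty CA file.
	ca, _ := os.ReadFile("tls/ca.crt")
	crt, _ := os.ReadFile("tls/runner.crt")
	key, _ := os.ReadFile("tls/runner.key")

	tr, err := newMTLSTransport(ca, crt, key)
	if err != nil {
		fmt.Println("transport not built:", err)
		return
	}
	// One shared transport: every client built on it presents the same cert.
	events := &http.Client{Transport: tr}
	longPoll := &http.Client{Transport: tr}
	fmt.Println("shared transport:", events.Transport == longPoll.Transport)
}
```

Sharing one `*http.Transport` also shares its connection pool, so the handshake cost is paid once per connection, not once per RPC.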

Verification

# Plain HTTP request → rejected with 400 Bad Request
curl http://fangs.example.internal:8443/v1/health
# → "Client sent an HTTP request to an HTTPS server"

# HTTPS without client cert → handshake aborted
curl --cacert tls/ca.crt https://fangs.example.internal:8443/v1/health
# → curl reports a TLS alert; the orchestrator logs "tls: client didn't provide a certificate"

# HTTPS with full mTLS → 200 OK
curl --cacert tls/ca.crt --cert tls/runner.crt --key tls/runner.key \
     https://fangs.example.internal:8443/v1/health
# → {"status":"ok","orchestrator_id":"fangs-orchestrator","version":"dev"}

What mTLS does and doesn't protect

Protects

  • Wire encryption. Events, scan results, heartbeats encrypted in transit. No plaintext sensor events crossing networks.
  • Runner identity. Only runners with a cert signed by the configured CA can register or heartbeat. A leaked orchestrator URL is not enough.
  • Orchestrator identity. Runners verify they're talking to the intended orchestrator, not a man-in-the-middle.

Doesn't protect

  • Sandbox-to-internet. mTLS covers only the runner↔orchestrator hop. The sandbox container still talks to whatever the package's install scripts reach for (npm registry, CDNs, attacker C2 if it's there).
  • Database connections. Postgres-backed deployments need their own TLS layer (?sslmode=verify-full on the DSN).
  • The UI dashboard. The UI shares the orchestrator's listener, so when mTLS is on, the UI also requires a client cert. Operators using the browser-based UI typically put a reverse proxy in front: humans authenticate to the proxy (basic auth, oauth2-proxy, etc.), and the proxy presents its own client cert to the orchestrator.
  • CLI access. The fangs CLI talks directly to the DB for most subcommands, not through HTTP, so mTLS does not cover them. The exceptions are fangs scan submit and the package add kickoff, which POST to /v1/scans and therefore need a CLI-side client cert too. (v2 item: bake the CLI cert bootstrap into a config file like the orchestrator + runner have.)

Cert rotation

The default validity is 365 days. Renew before that:

# Bump validity to 2 years
DAYS=730 docs/scripts/gen-tls.sh

The script regenerates everything (CA + server + runner). To rotate just a leaf cert (server or runner) without changing the CA, hand-craft it with openssl using the existing CA. The script doesn't currently support partial regeneration, but the CA + signing logic is small and copy-pastable.

After rotation:

  1. Replace files at their orchestrator + runner paths.
  2. Restart both processes — Go's tls.LoadX509KeyPair reads once at startup; there's no hot-reload.

For zero-downtime rotation on a single-runner setup, temporarily run two orchestrators on different ports, migrate the runner over, then shut down the old one.

CA hygiene

  • CA key offline. After issuing the certs you need, move tls/ca.key to a secure location (hardware token, vault, offline drive). Compromise of the CA key lets an attacker mint runner certs.
  • Per-runner certs, not shared. Mint a unique cert per runner so revocation is granular. The CN is recorded in the registration metadata so you can identify which runner sent which heartbeat.
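  • CRL not wired. Today there's no Certificate Revocation List check, so a cert signed by the trusted CA stays valid until it expires. To revoke a runner cert, remove it from the runner's filesystem and reissue the CA bundle the orchestrator trusts (-tls-client-ca), which means re-minting the remaining certs under a new CA. A future enhancement could wire CRL or OCSP into the tls.Config.VerifyPeerCertificate callback.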

Cert types used

| File | Purpose | Generated by |
| --- | --- | --- |
| ca.crt | trust anchor | openssl req -x509 -new ... -CN fangs-ca |
| ca.key | CA private key | openssl genrsa -out ca.key 4096 |
| server.crt | orchestrator's server cert | CA-signed, extendedKeyUsage=serverAuth |
| server.key | server private key | openssl genrsa -out server.key 4096 |
| runner.crt | runner's client cert | CA-signed, extendedKeyUsage=clientAuth |
| runner.key | runner private key | openssl genrsa -out runner.key 4096 |

Production checklist

Before deploying mTLS:

  • Run gen-tls.sh with the production SERVER_HOSTS list (must include every DNS name and IP that runners and scrapers will use to reach the orchestrator)
  • Issue a runner cert for every runner host (RUNNER_ID per host)
  • Issue a Prometheus scraper cert if you want metrics monitoring
  • Move tls/ca.key offline
  • Distribute tls/server.crt + tls/server.key to the orchestrator host(s); chmod 0600 on the .key
  • Distribute tls/ca.crt + tls/runner.crt + tls/runner.key to each runner host
  • Document the cert renewal cadence somewhere ops will see it (otherwise the certs expire after 365 days with no advance warning, and handshakes start failing)
  • Set up Prometheus scrape with tls_config: pointing at the scraper cert
  • Pre-stage the rotation procedure so the runbook is in place before you need it
