manifests nats supercluster
Multi-cluster NATS JetStream topology with gateway-meshed clusters.
This page is the operational guide for the static sample manifests committed at
`ci/manifests/nats-supercluster/`.
For the underlying Python generator (`ClusterTopology` /
`SuperclusterTopology` / `build_nats_conf` /
`build_cluster_manifests` in `noetl/core/runtime/nats_topology.py`),
see the NATS Supercluster page on the noetl/noetl wiki.
| File | Purpose |
|---|---|
| `namespace.yaml` | `nats-supercluster` Namespace. Separate from the existing `nats` namespace so both topologies can coexist. |
| `cluster-a.yaml` | ConfigMap + 3-replica StatefulSet + headless Service for cluster `a` in `us-east-1`. |
| `cluster-b.yaml` | Symmetric for cluster `b` in `us-west-2`. |
| `README.md` | Quick-start: apply + verify + regen recipe. |
Sibling: a parameterized renderer playbook at
`automation/infrastructure/nats_supercluster.yaml`
deploys one cluster member at a time with overridable
parameters. The static 2-cluster manifests in this directory are
the opinionated reference for local kind validation; the
playbook is the parameterized path for arbitrary deployments.
NATS exposes two related topologies:

- Cluster: 3+ NATS servers connected via a `cluster {}` block; they share JetStream state via Raft consensus. Single account namespace, with mutual `route` URLs between members.
- Supercluster: multiple clusters connected via NATS gateway connections (`gateway {}` block). Each cluster has its own JetStream state; gateways enable cross-cluster subject routing without shared Raft.

The static manifests here ship the supercluster shape: two 3-replica clusters with mutual gateway connections. The NATS server configuration for cluster `a`:
```
port: 4222
http_port: 8222

jetstream {
  store_dir: /data/jetstream
  domain: "tenant_default_org_default_region_us_east_1_cluster_a"
  max_memory_store: 1GB
  max_file_store: 5GB
}

cluster {
  name: "a"
  port: 6222
  routes: [
    nats-route://nats-cluster-a-0.nats-cluster-a.nats-supercluster.svc.cluster.local:6222
    nats-route://nats-cluster-a-1.nats-cluster-a.nats-supercluster.svc.cluster.local:6222
    nats-route://nats-cluster-a-2.nats-cluster-a.nats-supercluster.svc.cluster.local:6222
  ]
}

gateway {
  name: "a"
  port: 7222
  gateways: [
    { name: "b", urls: ["nats://nats-cluster-b.nats-supercluster.svc.cluster.local:7222"] }
  ]
}

accounts {
  $SYS {
    users: [
      { user: sys, password: sys }
    ]
  }
  NOETL {
    jetstream: enabled
    users: [
      { user: noetl, password: noetl }
    ]
  }
}
```

The `accounts` block is preserved verbatim from the existing
single-node `ci/manifests/nats/nats.yaml` (in the noetl/noetl repo),
so the `noetl` user, and every client and worker that currently
authenticates against it, keeps working against the supercluster
without re-issuing credentials. Per-tenant accounts are out-of-phase
follow-up work.
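The `routes` and `gateways` entries in cluster `a`'s configuration follow a fixed naming pattern: one route per StatefulSet pod through the headless Service's pod DNS, and one gateway per peer cluster through the peer's headless Service. The sketch below reproduces that pattern. The helper names (`route_urls`, `gateway_entries`, `render_cluster_and_gateway`) are hypothetical and this is not the generator's `build_nats_conf`; it only assumes the `nats-cluster-<id>` naming and the 6222/7222 ports visible in the config.

```python
# Hypothetical sketch of the naming pattern in the cluster/gateway blocks.
# Not the generator's build_nats_conf.


def route_urls(cluster_id: str, namespace: str, size: int, port: int = 6222) -> list[str]:
    """One nats-route:// URL per StatefulSet pod, via the headless Service's pod DNS."""
    svc = f"nats-cluster-{cluster_id}"
    return [
        f"nats-route://{svc}-{i}.{svc}.{namespace}.svc.cluster.local:{port}"
        for i in range(size)
    ]


def gateway_entries(cluster_id: str, all_ids: list[str], namespace: str, port: int = 7222) -> list[dict]:
    """One gateway entry per peer cluster, pointing at the peer's headless Service."""
    return [
        {"name": peer, "urls": [f"nats://nats-cluster-{peer}.{namespace}.svc.cluster.local:{port}"]}
        for peer in all_ids
        if peer != cluster_id
    ]


def render_cluster_and_gateway(cluster_id: str, all_ids: list[str], namespace: str, size: int) -> str:
    """Render the cluster {} and gateway {} blocks as NATS config text."""
    routes = "\n".join(f"    {url}" for url in route_urls(cluster_id, namespace, size))
    gateways = "\n".join(
        f'    {{ name: "{gw["name"]}", urls: ["{gw["urls"][0]}"] }}'
        for gw in gateway_entries(cluster_id, all_ids, namespace)
    )
    return (
        f'cluster {{\n  name: "{cluster_id}"\n  port: 6222\n  routes: [\n{routes}\n  ]\n}}\n'
        f'gateway {{\n  name: "{cluster_id}"\n  port: 7222\n  gateways: [\n{gateways}\n  ]\n}}\n'
    )
```

Calling `render_cluster_and_gateway("b", ["a", "b"], "nats-supercluster", 3)` should give the mirror-image blocks for cluster `b`; adding a third cluster ID to `all_ids` adds one gateway entry per peer.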
The JetStream domain is derived from `ClusterTopology.cluster_urn`:
take the URN's NATS subject form, strip the `noetl.` prefix, and
collapse `.` and `-` to `_`.
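A minimal sketch of that derivation rule, assuming a URN subject of the form `noetl.tenant.default.org.default.region.us-east-1.cluster.a`. That input is hypothetical, chosen to reproduce the domain in the config above; the real URN scheme is documented on the Resource Locator page. `jetstream_domain` is an illustrative name, not the generator's API.

```python
import re


def jetstream_domain(urn_subject: str) -> str:
    """Derive a JetStream domain from a URN's NATS subject form (sketch).

    Strips a leading "noetl." and turns "." and "-" into "_". Runs of
    separators collapse to a single "_" here; that reading of
    "collapsed" is an assumption.
    """
    prefix = "noetl."
    if urn_subject.startswith(prefix):
        urn_subject = urn_subject[len(prefix):]
    return re.sub(r"[.-]+", "_", urn_subject)
```

For the assumed subject this yields `tenant_default_org_default_region_us_east_1_cluster_a`, the domain set in cluster `a`'s `jetstream {}` block.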
Deployment is a manual, one-off cluster setup and is not bundled into
`noetl k8s deploy`. The existing single-node deployment in the
`nats` namespace stays untouched; the supercluster is a
separate, opt-in topology.
```bash
# From the noetl/ops repo root
kubectl apply -f ci/manifests/nats-supercluster/namespace.yaml
kubectl apply -f ci/manifests/nats-supercluster/cluster-a.yaml
kubectl apply -f ci/manifests/nats-supercluster/cluster-b.yaml
kubectl rollout status statefulset/nats-cluster-a -n nats-supercluster
kubectl rollout status statefulset/nats-cluster-b -n nats-supercluster
```

```bash
# Pods Running / Ready
kubectl get pods -n nats-supercluster -o wide

# Inspect routes and gateways via the monitoring port
# (port-forward blocks: run it in a separate terminal or background it with &)
kubectl port-forward -n nats-supercluster nats-cluster-a-0 18222:8222
curl -s http://localhost:18222/routez | jq '.num_routes'
curl -s http://localhost:18222/gatewayz | jq '{name, outbound: .outbound_gateways|keys, inbound: .inbound_gateways|keys}'

# Or via the nats CLI
nats server gateway list
nats stream cluster-info <stream-name>
```

In a healthy 2-cluster supercluster you should see:
- `server_name`: unique per pod (e.g. `nats-cluster-a-0`).
- `cluster.name`: matches the cluster ID (`a` or `b`).
- `cluster.urls`: N pod-DNS routes inside the same cluster.
- `gateway.outbound_gateways`: the peer cluster's name.
- `gateway.inbound_gateways`: the peer cluster's name (bidirectional once both clusters are up).
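The gateway part of these checks can be scripted against `/gatewayz`. This sketch assumes the port-forward from the verification commands is running on `localhost:18222` and that the payload carries `name`, `outbound_gateways` and `inbound_gateways` keyed by peer cluster name, the same fields the `jq` filter reads. The function names are hypothetical.

```python
import json
import urllib.request


def fetch_gatewayz(base_url: str = "http://localhost:18222") -> dict:
    """Fetch /gatewayz through the port-forward (assumed to be running)."""
    with urllib.request.urlopen(f"{base_url}/gatewayz", timeout=5) as resp:
        return json.load(resp)


def gateway_problems(gatewayz: dict, expected_name: str, expected_peers: set[str]) -> list[str]:
    """Return readable problems; an empty list means the gateway mesh looks healthy."""
    problems = []
    if gatewayz.get("name") != expected_name:
        problems.append(f"gateway name is {gatewayz.get('name')!r}, expected {expected_name!r}")
    # Both directions must list every peer cluster by name.
    for direction in ("outbound_gateways", "inbound_gateways"):
        seen = set((gatewayz.get(direction) or {}).keys())
        missing = expected_peers - seen
        if missing:
            problems.append(f"{direction} missing {sorted(missing)}")
    return problems
```

For cluster `a`, `gateway_problems(fetch_gatewayz(), "a", {"b"})` should return an empty list once both clusters are up; the inbound entry is expected to appear only after the peer cluster dials back.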
Validation in a local kind cluster confirmed all of the above
plus the URN-derived JetStream domain:
tenant_default_org_default_region_us_east_1_cluster_a.
| Knob | Default | Rule of thumb |
|---|---|---|
| `cluster_size` | 3 | JetStream Raft requires at least 3 for HA. Bump to 5 for higher fault tolerance; odd sizes only. |
| `region` / `zone` (per cluster) | `None` | Set per-cluster locality so pod labels carry placement metadata for the scheduler. |
| Gateway TLS | not configured | Production deployments should add `tls { ... }` inside the `gateway` block. Out-of-phase follow-up. |
| `max_file_store` / `max_memory_store` | 5GB / 1GB | Per-server JetStream storage limits (each pod gets its own). `max_file_store` should match the `volumeClaimTemplates` storage request. |
| PVC size | 5Gi | The `volumeClaimTemplates` size matches `max_file_store`; bump them together. |
| Per-tenant accounts | single `NOETL` | Out-of-phase. The current manifests preserve the existing `NOETL` account; per-tenant accounts wait for the catalog era. |
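The "odd sizes only" rule follows from Raft majority arithmetic: a group of n voters needs `n // 2 + 1` of them to make progress, so it survives the loss of the remainder. A quick check of the sizes discussed here:

```python
def raft_quorum(n: int) -> int:
    """Votes needed for a Raft majority among n voters."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Voters that can be lost while a majority remains."""
    return n - raft_quorum(n)


for size in (1, 3, 4, 5):
    print(f"size={size} quorum={raft_quorum(size)} tolerates={tolerated_failures(size)}")
```

This prints `size=1 quorum=1 tolerates=0`, `size=3 quorum=2 tolerates=1`, `size=4 quorum=3 tolerates=1` and `size=5 quorum=3 tolerates=2`: a 4-server cluster tolerates no more failures than a 3-server one, and `cluster_size: 1` tolerates none.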
The default `cluster_size: 3` × 2 clusters = 6 pods, each
requesting `cpu: 250m` / `memory: 512Mi` and limited to
`cpu: 1000m` / `memory: 2Gi`. So the full default supercluster
requests ~1500m CPU / 3Gi memory, and its limits allow bursting
to 6000m CPU / 12Gi memory.
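The footprint arithmetic as a small sketch, using the per-pod requests and limits quoted above (CPU in millicores, memory in MiB). `PodResources` and `supercluster_totals` are illustrative names, not part of the generator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodResources:
    cpu_m: int      # millicores
    memory_mi: int  # MiB


def supercluster_totals(cluster_size: int, clusters: int,
                        request: PodResources, limit: PodResources) -> dict:
    """Sum per-pod requests and limits across every pod in the supercluster."""
    pods = cluster_size * clusters
    return {
        "pods": pods,
        "cpu_request_m": pods * request.cpu_m,
        "memory_request_mi": pods * request.memory_mi,
        "cpu_limit_m": pods * limit.cpu_m,
        "memory_limit_mi": pods * limit.memory_mi,
    }


REQUEST = PodResources(cpu_m=250, memory_mi=512)
LIMIT = PodResources(cpu_m=1000, memory_mi=2048)
```

With the defaults, `supercluster_totals(3, 2, REQUEST, LIMIT)` gives 6 pods, 1500m / 3072Mi (3Gi) requested and 6000m / 12288Mi (12Gi) of limits; `cluster_size=1` drops it to 2 pods and 500m / 1024Mi requested.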
A stock kind cluster runs on one Kubernetes node with the podman VM's CPU budget (typically 4 vCPU on Apple Silicon defaults). The full default supercluster alone requests ~38% of that (1500m of 4000m). Stacked with the rest of NoETL (postgres + single-node nats + noetl-server / projector / outbox-publisher / 3 workers + paginated-api + KEDA operator), the node reaches ~96% of its allocatable CPU in requests. That leaves roughly 160m unrequested, less than the 250m a single supercluster pod requests, so the scheduler refuses to place additional pods, including a third supercluster replica or a KEDA-driven scale-up of the noetl-worker pool.
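Why the node stops accepting pods can be shown with a simplified, CPU-only version of the scheduler's resource-fit check: a pod fits only if the requests already placed on the node plus the new pod's request stay within allocatable CPU. The sketch assumes the node's allocatable CPU equals the VM's 4 vCPU (4000m); the real scheduler also checks memory, other resources and other constraints.

```python
def free_cpu_m(allocatable_m: int, requested_fraction: float) -> int:
    """Millicores not yet claimed by pod requests."""
    return round(allocatable_m * (1 - requested_fraction))


def pod_fits(allocatable_m: int, requested_m: int, pod_request_m: int) -> bool:
    """CPU-only fit check: existing requests plus the new request must fit."""
    return requested_m + pod_request_m <= allocatable_m


ALLOCATABLE_M = 4000  # assumption: 4 vCPU kind node, nothing reserved
requested_m = ALLOCATABLE_M - free_cpu_m(ALLOCATABLE_M, 0.96)  # ~96% requested
```

With these numbers `pod_fits(ALLOCATABLE_M, requested_m, 250)` is `False`, while the same 3840m of requests on a 6000m node leave room for the pod.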
Mitigations for local kind validation:

- Drop `cluster_size` to `1` per cluster: sufficient to validate the gateway mesh and JetStream domain derivation, but not HA. 2 pods instead of 6.
- Scale the supercluster StatefulSets to `0` while running a KEDA scale-up smoke test, then restore them. This pattern was validated in local kind and works cleanly.
- Bump the podman machine to 6+ vCPU and rebuild the kind cluster.
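The scale-to-zero mitigation is one `kubectl scale` call per StatefulSet in each direction. A minimal Python wrapper, assuming the StatefulSet names and namespace from the deploy commands and a restore target of the default 3 replicas; the function names are hypothetical, and `dry_run` only prints the commands.

```python
import subprocess

NAMESPACE = "nats-supercluster"
STATEFULSETS = ("nats-cluster-a", "nats-cluster-b")


def scale_argv(replicas: int) -> list[list[str]]:
    """kubectl invocations that set every supercluster StatefulSet to `replicas`."""
    return [
        ["kubectl", "scale", f"statefulset/{name}", f"--replicas={replicas}", "-n", NAMESPACE]
        for name in STATEFULSETS
    ]


def scale_supercluster(replicas: int, dry_run: bool = True) -> None:
    """Print (dry run) or execute the scale commands, stopping on the first failure."""
    for argv in scale_argv(replicas):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

Run `scale_supercluster(0, dry_run=False)` before the KEDA smoke test and `scale_supercluster(3, dry_run=False)` afterwards, or whatever replica count the StatefulSets had before scaling down.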
For production deployments (GKE / EKS / multi-node
clusters), the default `cluster_size: 3` works as designed:
with multiple nodes available, the scheduler usually spreads the
replicas across nodes, which is the placement JetStream Raft benefits from.
These bugs were caught during live kind validation, and the generator now handles them correctly. They are documented here so anyone hand-editing the manifests doesn't accidentally regress them:
- `server_name` is required for cluster-mode JetStream. Each pod must register under a unique server name. The generator pulls it from the downward API (`metadata.name` → `POD_NAME` → `--name $(POD_NAME)`). Without it, NATS refuses to start with `jetstream cluster requires server_name to be set`.
- Use the split `/healthz` endpoints. Plain `/healthz` reports failure during normal JetStream meta-layer recovery, which lasts long enough for liveness probes to kill pods before the cluster forms. Use:
  - `livenessProbe` → `/healthz?js-server-only=true`
  - `readinessProbe` → `/healthz?js-enabled-only=true`
  - `startupProbe` → `/healthz?js-server-only=true` with a long `failureThreshold` (60+) to give cluster formation time.
- The headless Service needs `publishNotReadyAddresses: true`. Gateway URLs resolve through the peer cluster's headless Service DNS. Default headless Services only publish Ready pods, which creates a chicken-and-egg problem between peer clusters: each waits for the other to be Ready before the gateway DNS it dials resolves. `publishNotReadyAddresses: true` breaks the cycle.
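The three fixes map onto a handful of manifest fields. This sketch expresses them as Python dicts in Kubernetes manifest shape; it is an illustration, not the generator's output. The `app` selector label, the port names and the bare `--name` argument list (a real container would also point at its config file) are assumptions.

```python
from typing import Optional

MONITOR_PORT = 8222  # http_port from the NATS config


def http_probe(path: str, failure_threshold: Optional[int] = None) -> dict:
    """HTTP probe against the NATS monitoring port."""
    probe = {"httpGet": {"path": path, "port": MONITOR_PORT}}
    if failure_threshold is not None:
        probe["failureThreshold"] = failure_threshold
    return probe


def nats_container_fragment() -> dict:
    """Container fields covering the server_name and probe fixes."""
    return {
        # Downward API: expose the pod's own name as POD_NAME.
        "env": [{"name": "POD_NAME", "valueFrom": {"fieldRef": {"fieldPath": "metadata.name"}}}],
        # Kubernetes expands $(POD_NAME) in args, giving each pod a unique server_name.
        "args": ["--name", "$(POD_NAME)"],
        "livenessProbe": http_probe("/healthz?js-server-only=true"),
        "readinessProbe": http_probe("/healthz?js-enabled-only=true"),
        # 60 failures x the default 10s periodSeconds allows ~10 minutes for cluster formation.
        "startupProbe": http_probe("/healthz?js-server-only=true", failure_threshold=60),
    }


def headless_service(cluster_id: str, namespace: str = "nats-supercluster") -> dict:
    """Headless Service that publishes pod DNS before pods are Ready."""
    name = f"nats-cluster-{cluster_id}"
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "clusterIP": "None",               # headless: per-pod DNS records
            "publishNotReadyAddresses": True,  # breaks the peer-readiness cycle
            "selector": {"app": name},         # hypothetical label
            "ports": [
                {"name": "client", "port": 4222},
                {"name": "cluster", "port": 6222},
                {"name": "gateway", "port": 7222},
                {"name": "monitor", "port": 8222},
            ],
        },
    }
```

Keeping liveness and startup on `js-server-only` means a pod is only restarted when the server process itself is unhealthy, while readiness still waits for JetStream.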
Out of scope for this round:

- No client-side rewiring. NoETL's Python publishers/subscribers and the worker ConfigMap keep pointing at `nats.nats.svc.cluster.local`. Cluster-aware client routing arrives once the catalog can pick the right cluster per request.
- No per-tenant NATS accounts. Future round.
- No cross-cluster stream mirror/source. The gateway topology enables it; nothing in this round configures a stream to mirror.
- No edits to the existing single-node `ci/manifests/nats/`. The supercluster is opt-in and lives alongside the existing deployment.
Related:

- KEDA Scaler: worker autoscaling. It currently uses the single `NOETL` account; per-tenant accounts and account-aware scalers are out-of-phase work that will key off this supercluster topology.
- NATS Supercluster (noetl/noetl wiki): Python generator API reference for `ClusterTopology` / `SuperclusterTopology` / `build_nats_conf` / `build_cluster_manifests`.
- Resource Locator (noetl/noetl wiki): URN scheme. JetStream domains derive from this.
- NATS supercluster (gateways) docs: https://docs.nats.io/running-a-nats-service/configuration/gateways
- NATS clustering: https://docs.nats.io/running-a-nats-service/configuration/clustering