Kadyapam edited this page May 23, 2026 · 2 revisions

NATS Supercluster

Multi-cluster NATS JetStream topology with gateway-meshed clusters. This page is the operational guide for the static sample manifests committed at ci/manifests/nats-supercluster/.

For the underlying Python generator (ClusterTopology / SuperclusterTopology / build_nats_conf / build_cluster_manifests in noetl/core/runtime/nats_topology.py), see the NATS Supercluster page on the noetl/noetl wiki.

What's in ci/manifests/nats-supercluster/

| File | Purpose |
| --- | --- |
| namespace.yaml | nats-supercluster Namespace. Separate from the existing nats namespace so both topologies can coexist. |
| cluster-a.yaml | ConfigMap + 3-replica StatefulSet + headless Service for cluster a in us-east-1. |
| cluster-b.yaml | Symmetric for cluster b in us-west-2. |
| README.md | Quick-start: apply + verify + regen recipe. |

Sibling: a parameterized renderer playbook at automation/infrastructure/nats_supercluster.yaml deploys one cluster member at a time with overrideable parameters. The static 2-cluster manifests in this directory are the opinionated reference for local kind validation; the playbook is the parameterized path for arbitrary deployments.

Cluster vs. supercluster

NATS exposes two related topologies:

  • Cluster — 3+ NATS servers connected via a cluster {} block; share JetStream state via Raft consensus. Single account namespace, mutual route URLs between members.
  • Supercluster — multiple clusters connected via NATS gateway connections (gateway {} block). Each cluster has its own JetStream state; gateways enable cross-cluster subject routing without shared Raft.

The static manifests here ship the supercluster shape — two 3-replica clusters with mutual gateway connections.

Generated nats.conf (cluster a)

port: 4222
http_port: 8222

jetstream {
  store_dir: /data/jetstream
  domain: "tenant_default_org_default_region_us_east_1_cluster_a"
  max_memory_store: 1GB
  max_file_store: 5GB
}

cluster {
  name: "a"
  port: 6222
  routes: [
    nats-route://nats-cluster-a-0.nats-cluster-a.nats-supercluster.svc.cluster.local:6222
    nats-route://nats-cluster-a-1.nats-cluster-a.nats-supercluster.svc.cluster.local:6222
    nats-route://nats-cluster-a-2.nats-cluster-a.nats-supercluster.svc.cluster.local:6222
  ]
}

gateway {
  name: "a"
  port: 7222
  gateways: [
    { name: "b", urls: ["nats://nats-cluster-b.nats-supercluster.svc.cluster.local:7222"] }
  ]
}

accounts {
  $SYS {
    users: [
      { user: sys, password: sys }
    ]
  }
  NOETL {
    jetstream: enabled
    users: [
      { user: noetl, password: noetl }
    ]
  }
}

The accounts block is preserved verbatim from the existing single-node ci/manifests/nats/nats.yaml (on the noetl/noetl repo) so the noetl user — and every client + worker that currently authenticates against it — keeps working against the supercluster without re-issuing credentials. Per-tenant accounts are out-of-phase follow-up work.

The JetStream domain is URN-derived from ClusterTopology.cluster_urn: the URN's NATS subject form with noetl. stripped and ./- collapsed to _.
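The derivation rule can be sketched in a few lines of Python. This is an illustration of the rule as stated above, not the generator's actual code: the function name jetstream_domain and the example subject string are assumptions (the exact subject form of cluster_urn is not shown on this page), chosen so that the result matches the domain in the generated nats.conf.

```python
import re


def jetstream_domain(subject: str, prefix: str = "noetl.") -> str:
    """Derive a JetStream domain from the NATS subject form of a cluster URN.

    Hypothetical helper: strips the leading 'noetl.' and replaces
    every '.' or '-' with '_'.
    """
    if subject.startswith(prefix):
        subject = subject[len(prefix):]
    return re.sub(r"[.-]", "_", subject)


# Assumed subject form for cluster a; the real URN layout may differ.
subject = "noetl.tenant.default.org.default.region.us-east-1.cluster.a"
print(jetstream_domain(subject))
# prints tenant_default_org_default_region_us_east_1_cluster_a
```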

Install + verify

Manual one-off cluster setup — not bundled into noetl k8s deploy. The existing single-node deployment in the nats namespace stays untouched; the supercluster is a separate, opt-in topology.

Apply

# From the noetl/ops repo root
kubectl apply -f ci/manifests/nats-supercluster/namespace.yaml
kubectl apply -f ci/manifests/nats-supercluster/cluster-a.yaml
kubectl apply -f ci/manifests/nats-supercluster/cluster-b.yaml

kubectl rollout status statefulset/nats-cluster-a -n nats-supercluster
kubectl rollout status statefulset/nats-cluster-b -n nats-supercluster

Verify

# Pods Running / Ready
kubectl get pods -n nats-supercluster -o wide

# Inspect routes and gateways via the monitoring port.
# port-forward blocks, so run it in a separate terminal (or background it) before the curl calls.
kubectl port-forward -n nats-supercluster nats-cluster-a-0 18222:8222
curl -s http://localhost:18222/routez   | jq '.num_routes'
curl -s http://localhost:18222/gatewayz | jq '{name, outbound: .outbound_gateways|keys, inbound: .inbound_gateways|keys}'

# Or via the nats CLI
nats server gateway list
nats stream cluster-info <stream-name>

In a healthy 2-cluster supercluster you should see:

  • server_name: unique per pod (e.g. nats-cluster-a-0).
  • cluster.name: matches the cluster ID (a or b).
  • cluster.urls: N pod-DNS routes inside the same cluster.
  • gateway.outbound_gateways: the peer cluster's name.
  • gateway.inbound_gateways: the peer cluster's name (bidirectional once both clusters are up).

Validation in a local kind cluster confirmed all of the above plus the URN-derived JetStream domain: tenant_default_org_default_region_us_east_1_cluster_a.
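The checklist above can also be scripted against the monitoring endpoints. The sketch below is an assumption-laden helper, not part of the repo: the names fetch and check_mesh are hypothetical, it reads only the fields already used in the jq commands (name, outbound_gateways, inbound_gateways, num_routes), and it treats cluster_size - 1 as a lower bound on num_routes because newer NATS servers may open more than one route connection per peer. The sample dictionaries are hand-written stand-ins, not captured server output.

```python
import json
from urllib.request import urlopen


def fetch(base_url: str, path: str) -> dict:
    """GET a monitoring endpoint such as /gatewayz and parse the JSON body."""
    with urlopen(base_url + path, timeout=5) as resp:
        return json.load(resp)


def check_mesh(gatewayz: dict, routez: dict, name: str, peers: set, cluster_size: int) -> list:
    """Return a list of problems; an empty list means the pod looks healthy."""
    problems = []
    if gatewayz.get("name") != name:
        problems.append(f"gateway name is {gatewayz.get('name')!r}, expected {name!r}")
    outbound = set(gatewayz.get("outbound_gateways") or {})
    inbound = set(gatewayz.get("inbound_gateways") or {})
    if outbound != peers:
        problems.append(f"outbound gateways {sorted(outbound)} != {sorted(peers)}")
    if inbound != peers:
        problems.append(f"inbound gateways {sorted(inbound)} != {sorted(peers)}")
    if routez.get("num_routes", 0) < cluster_size - 1:
        problems.append(f"only {routez.get('num_routes', 0)} routes, expected at least {cluster_size - 1}")
    return problems


# Hand-written sample shaped like the fields the jq filters read.
sample_gatewayz = {"name": "a", "outbound_gateways": {"b": {}}, "inbound_gateways": {"b": []}}
sample_routez = {"num_routes": 2}
print(check_mesh(sample_gatewayz, sample_routez, name="a", peers={"b"}, cluster_size=3))
```

With the port-forward from the Verify step running, the live equivalent would be check_mesh(fetch("http://localhost:18222", "/gatewayz"), fetch("http://localhost:18222", "/routez"), "a", {"b"}, 3).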

Tuning

| Knob | Default | Rule of thumb |
| --- | --- | --- |
| cluster_size | 3 | JetStream Raft requires 3 minimum for HA. Bump to 5 for higher fault tolerance; odd sizes only. |
| region / zone (per cluster) | None | Set per-cluster locality so pod labels carry placement metadata for the scheduler. |
| Gateway TLS | not configured | Production deployments should add tls { ... } inside the gateway block. Out-of-phase follow-up. |
| max_file_store / max_memory_store | 5GB / 1GB | Per-cluster JetStream storage. Match volumeClaimTemplates.storage. |
| PVC size | 5Gi | volumeClaimTemplates matches max_file_store; bump together. |
| Per-tenant accounts | single NOETL | Out-of-phase. The current manifests preserve the existing NOETL account; per-tenant accounts wait for the catalog era. |

Resource footprint

The default cluster_size: 3 × 2 clusters = 6 pods, each requesting cpu: 250m/memory: 512Mi and limited at cpu: 1000m/memory: 2Gi. So the full default supercluster sits at ~1500m CPU requests / 3Gi memory requests, peak 6000m CPU / 12Gi memory under burst.
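The totals follow directly from the per-pod figures. A small sketch of the arithmetic, using only the numbers quoted above (the function name footprint and its parameter names are illustrative, not part of the generator):

```python
def footprint(clusters: int = 2, cluster_size: int = 3,
              cpu_request_m: int = 250, mem_request_mi: int = 512,
              cpu_limit_m: int = 1000, mem_limit_mi: int = 2048) -> dict:
    """Sum per-pod requests and limits across every pod in the supercluster."""
    pods = clusters * cluster_size
    return {
        "pods": pods,
        "cpu_requests_m": pods * cpu_request_m,
        "mem_requests_mi": pods * mem_request_mi,
        "cpu_limits_m": pods * cpu_limit_m,
        "mem_limits_mi": pods * mem_limit_mi,
    }


totals = footprint()
print(totals)
# 6 pods: 1500m CPU / 3072Mi (3Gi) requested, 6000m CPU / 12288Mi (12Gi) at the limits
```

Calling footprint(cluster_size=5) or footprint(clusters=3) gives a quick estimate before changing the topology.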

Single-node kind warning

A stock kind cluster runs on one Kubernetes node with the podman VM's CPU budget (typically 4 vCPU on Apple Silicon defaults). The full default supercluster alone consumes ~38% of that. Stacked with the rest of NoETL (postgres + nats single-node + noetl-server / projector / outbox-publisher / 3 workers + paginated-api + KEDA operator) the node hits ~96% CPU requests, at which point the scheduler refuses to place additional pods, including a third supercluster replica or KEDA-driven scale-up of the noetl-worker pool.

Mitigations for local kind validation:

  • Drop cluster_size to 1 per cluster — sufficient to validate the gateway mesh + JetStream domain derivation, not HA. 2 pods instead of 6.
  • Scale the supercluster StatefulSets to 0 while running a KEDA scale-up smoke test, then restore them afterwards. This pattern was exercised during local kind validation and worked cleanly.
  • Bump the podman machine to 6+ vCPU and rebuild the kind cluster.

For production deployments (GKE / EKS / multi-node clusters), the default cluster_size: 3 works as designed — the node-spread that JetStream Raft benefits from happens naturally with multiple nodes available.
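To compare the mitigations, the supercluster's share of a node's CPU budget can be estimated as follows. This is a rough sketch: the helper name cpu_request_share is hypothetical, and it assumes the node's allocatable CPU equals the VM's vCPU count, which slightly overstates what the scheduler actually has available.

```python
def cpu_request_share(cluster_size: int, node_vcpu: int,
                      clusters: int = 2, cpu_request_m: int = 250) -> float:
    """Fraction of the node's CPU taken by supercluster CPU requests."""
    requested_m = clusters * cluster_size * cpu_request_m
    return requested_m / (node_vcpu * 1000)


scenarios = [
    ("default, 4 vCPU", 3, 4),
    ("cluster_size 1, 4 vCPU", 1, 4),
    ("default, 6 vCPU", 3, 6),
]
for label, size, vcpu in scenarios:
    print(f"{label}: {cpu_request_share(size, vcpu):.1%}")
# default, 4 vCPU: 37.5%
# cluster_size 1, 4 vCPU: 12.5%
# default, 6 vCPU: 25.0%
```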

Operational notes worth knowing

These are fixes for bugs caught during live kind validation; the generator now bakes them in. They are documented here so that anyone hand-editing manifests doesn't accidentally regress them:

  1. server_name is required for cluster-mode JetStream. Each pod must register under a unique server name. The generator pulls it from the downward API (metadata.name → POD_NAME → --name $(POD_NAME)). Without it, NATS refuses to start with the error "jetstream cluster requires server_name to be set".

  2. Use the split /healthz endpoints. Plain /healthz returns failure during normal JetStream meta-layer recovery, which lasts long enough that liveness probes kill pods before the cluster forms. Use:

    • livenessProbe: /healthz?js-server-only=true
    • readinessProbe: /healthz?js-enabled-only=true
    • startupProbe: /healthz?js-server-only=true with a long failureThreshold (60+) to give cluster formation time.

  3. Headless Service needs publishNotReadyAddresses: true. Gateway URLs resolve through the peer cluster's headless Service DNS. By default, headless Services publish only Ready pods, which creates a chicken-and-egg problem between peer clusters (each waits for the other to be Ready before its own gateway DNS resolves). Setting publishNotReadyAddresses: true breaks the cycle.
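For hand-edited manifests, the three points above can be turned into a small regression check. The sketch below is not part of the repo: the function names are hypothetical, and it assumes the manifests have already been parsed into dictionaries (for example with a YAML loader) and that the NATS server is the first container in the pod template. The sample arguments, including the config path, are illustrative only.

```python
EXPECTED_PROBE_QUERIES = {
    "livenessProbe": "js-server-only=true",
    "readinessProbe": "js-enabled-only=true",
    "startupProbe": "js-server-only=true",
}


def lint_statefulset(sts: dict) -> list:
    """Check server_name wiring and the split /healthz probes."""
    problems = []
    container = sts["spec"]["template"]["spec"]["containers"][0]
    env_names = {e.get("name") for e in container.get("env", [])}
    argv = container.get("command", []) + container.get("args", [])
    if "POD_NAME" not in env_names or not any("$(POD_NAME)" in a for a in argv):
        problems.append("server_name: expected POD_NAME env and --name $(POD_NAME)")
    for probe, query in EXPECTED_PROBE_QUERIES.items():
        path = container.get(probe, {}).get("httpGet", {}).get("path", "")
        if path != f"/healthz?{query}":
            problems.append(f"{probe}: expected /healthz?{query}, found {path or 'nothing'}")
    if container.get("startupProbe", {}).get("failureThreshold", 0) < 60:
        problems.append("startupProbe: failureThreshold should be 60 or more")
    return problems


def lint_headless_service(svc: dict) -> list:
    """Check that a headless Service publishes not-ready pod addresses."""
    spec = svc.get("spec", {})
    if spec.get("clusterIP") == "None" and not spec.get("publishNotReadyAddresses"):
        return ["service: headless Service needs publishNotReadyAddresses: true"]
    return []


# Illustrative manifest fragments, not copied from cluster-a.yaml.
sts = {"spec": {"template": {"spec": {"containers": [{
    "args": ["--config", "/etc/nats/nats.conf", "--name", "$(POD_NAME)"],
    "env": [{"name": "POD_NAME", "valueFrom": {"fieldRef": {"fieldPath": "metadata.name"}}}],
    "livenessProbe": {"httpGet": {"path": "/healthz?js-server-only=true", "port": 8222}},
    "readinessProbe": {"httpGet": {"path": "/healthz?js-enabled-only=true", "port": 8222}},
    "startupProbe": {"httpGet": {"path": "/healthz?js-server-only=true", "port": 8222},
                     "failureThreshold": 60},
}]}}}}
svc = {"spec": {"clusterIP": "None", "publishNotReadyAddresses": True}}

print(lint_statefulset(sts) + lint_headless_service(svc))  # prints [] when nothing has regressed
```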

What this round does NOT do

  • No client-side rewiring. NoETL's Python publishers / subscribers and the worker ConfigMap keep pointing at nats.nats.svc.cluster.local. Cluster-aware client routing arrives once the catalog can pick the right cluster per request.
  • No per-tenant NATS accounts. Future round.
  • No cross-cluster stream mirror / source. The gateway topology enables it; nothing in this round configures a stream to mirror.
  • No edits to the existing single-node ci/manifests/nats/. The supercluster is opt-in and lives alongside the existing deployment.
