feat: enable site clusters to run Nautobot Celery workers on site clusters #1908
haseebsyed12 merged 9 commits into main
skrobul left a comment:
Left a few comments inline.
Sites need to run background task processing locally to reduce cross-cluster latency and scale worker capacity independently. Workers connect back to the global PostgreSQL and Redis, so cross-cluster connections require stronger auth than passwords alone.

Adds a site-scoped ArgoCD Application that deploys only the Celery worker portion of the Nautobot Helm chart. The web server, Redis, and PostgreSQL remain on the global cluster.

All cross-cluster connections use end-to-end mTLS:
- nautobot_config.py gains conditional SSL/mTLS logic for both PostgreSQL (NAUTOBOT_DB_SSLMODE) and Redis (auto-detected from mounted CA cert)
- nautobot-worker component values disable everything except celery
- envoy-configs gateway template supports gatewayPort on TLS passthrough listeners for non-443 ports (5432, 6379)
- envoy-configs schema adds gatewayPort to the tls route type
- Deploy guide documents the full architecture, step-by-step site onboarding, certificate infrastructure, and troubleshooting
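The conditional SSL/mTLS logic described for nautobot_config.py could look roughly like the sketch below. It is not the code from this PR: only NAUTOBOT_DB_SSLMODE comes from the commit message, while the helper names, the certificate directory layout (ca.crt, tls.crt, tls.key), and the injectable `exists` parameter are assumptions made for illustration. The PostgreSQL keys are standard libpq connection parameters, and the Redis keys follow Celery's `broker_use_ssl` convention.

```python
import os
import ssl


def db_ssl_options(env, cert_dir="/etc/nautobot/mtls"):
    """Return libpq SSL options, or {} when NAUTOBOT_DB_SSLMODE is unset.

    cert_dir is a hypothetical mount path, not taken from the PR.
    """
    sslmode = env.get("NAUTOBOT_DB_SSLMODE", "")
    if not sslmode:
        # No SSL mode requested: keep the plain connection settings.
        return {}
    return {
        "sslmode": sslmode,
        "sslrootcert": os.path.join(cert_dir, "ca.crt"),
        "sslcert": os.path.join(cert_dir, "tls.crt"),
        "sslkey": os.path.join(cert_dir, "tls.key"),
    }


def redis_ssl_options(cert_dir="/etc/nautobot/mtls", exists=os.path.exists):
    """Auto-detect Redis mTLS: return SSL options only if a CA cert is mounted."""
    ca_cert = os.path.join(cert_dir, "ca.crt")
    if not exists(ca_cert):
        # No CA certificate mounted: fall back to a non-TLS connection.
        return None
    return {
        "ssl_cert_reqs": ssl.CERT_REQUIRED,
        "ssl_ca_certs": ca_cert,
        "ssl_certfile": os.path.join(cert_dir, "tls.crt"),
        "ssl_keyfile": os.path.join(cert_dir, "tls.key"),
    }
```

In a settings file, the first dictionary would typically be merged into the `OPTIONS` of the default database, and the second assigned to the Celery broker SSL setting when it is not `None`.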
… issued cert+key to sites via the external secrets provider.
Allows for a different nautobot config file to be stored in the deploy repo and supplied to Nautobot.
skrobul left a comment:
This looks almost ready; I added a few comments in another review.
        tls.crt: >-
          {{ .client_password
          | regexFind "-----BEGIN CERTIFICATE-----[\\s\\S]*?-----END CERTIFICATE-----"
          | replace "\r" "" }}
        tls.key: >-
          {{ .client_password
          | regexFind "-----BEGIN EC PRIVATE KEY-----[\\s\\S]*?-----END EC PRIVATE KEY-----"
          | replace "\r" "" }}
        ca.crt: >-
          {{ .ca_password | replace "\r" "" }}
  dataFrom:
    - extract:
        key: "<client-cert-credential-id>"
      rewrite:
        - regexp:
            source: "(.*)"
            target: "client_$1"
    - extract:
        key: "<ca-cert-credential-id>"
      rewrite:
        - regexp:
            source: "(.*)"
            target: "ca_$1"
This expects a very specific type of key and certificate, which makes it fragile.
Consider switching from a freeform text field to JSON.
For example, if you store the credentials as JSON:
{
  "tls.crt": "LS0tLS1CRUdJTiBDRVJ...",
  "tls.key": "LS0tLS1CRUdJTiBFQyB...."
}
then you can use a simpler and more robust template that does not need regular expressions:
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: nautobot-mtls-client-marektest
  namespace: nautobot
spec:
  dataFrom:
    - extract:
        key: "602777"
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: pwsafe
  target:
    creationPolicy: Owner
    deletionPolicy: Retain
    template:
      data:
        tls.crt: '{{ index (.password | fromJson) "tls.crt" | b64dec }}'
        tls.key: '{{ index (.password | fromJson) "tls.key" | b64dec }}'
      engineVersion: v2
      metadata: {}
      type: kubernetes.io/tls
Alternatively, if you really want to store those as plaintext, please consider using the PEM block filters provided by ESO instead of regexes.
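To illustrate the fragility point: the regex in the reviewed template matches only a `-----BEGIN EC PRIVATE KEY-----` block, so a key stored in PKCS#8 form (`-----BEGIN PRIVATE KEY-----`) would yield an empty tls.key. The following is a minimal Python sketch of both approaches, not part of the PR; the function names are hypothetical and any sample input should be placeholder text rather than real credentials.

```python
import base64
import json
import re

# Same pattern as the reviewed template: it matches EC keys only.
EC_KEY_RE = re.compile(
    r"-----BEGIN EC PRIVATE KEY-----[\s\S]*?-----END EC PRIVATE KEY-----"
)


def extract_key_with_regex(blob):
    """Mimic a regex lookup: first match, or an empty string when nothing matches."""
    match = EC_KEY_RE.search(blob)
    return match.group(0) if match else ""


def extract_from_json(blob):
    """Decode a JSON credential whose fields hold base64-encoded PEM text."""
    fields = json.loads(blob)
    return {
        name: base64.b64decode(fields[name]).decode("utf-8")
        for name in ("tls.crt", "tls.key")
    }
```

With the JSON layout, the key type no longer matters because each field is addressed by name instead of being located by its PEM header.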
Done. Using PEM block filters now.
Nice, thanks. If you want to eliminate the regex and simplify the dataFrom section, sourcing data from multiple credentials is possible too; you just need to assign a secretKey for each remoteRef.
It's okay to leave it as is. Your call.
Nautobot currently runs entirely on the global cluster, including its Celery workers. Sites that generate heavy background task load have no way to offload that processing closer to where the work originates, and a single global worker pool becomes a bottleneck as sites scale.
This adds a site-scoped ArgoCD Application that deploys only the Celery worker portion of the Nautobot Helm chart. The web server, Redis, and PostgreSQL are all disabled because they remain on the global cluster; site workers connect back to those shared services.
This lets operators scale worker capacity per-site independently, run queue-specific workers closer to the hardware they manage, and reduce cross-cluster task latency for site-driven automation.
… application-nautobot-worker.yaml) gated behind site.nautobot_worker.enabled
… nautobot namespace on the site cluster.