feat: enable site clusters to run Nautobot Celery workers on site clusters #1908

Merged
haseebsyed12 merged 9 commits into main from site-nautobot-worker on Apr 21, 2026

@haseebsyed12 (Contributor) commented on Apr 2, 2026

Nautobot currently runs entirely on the global cluster, including its Celery workers. Sites that generate heavy background task load have no way to offload that processing closer to where the work originates, and a single global worker pool becomes a bottleneck as sites scale.

This adds a site-scoped ArgoCD Application that deploys only the Celery worker portion of the Nautobot Helm chart. The web server, Redis, and PostgreSQL are all disabled because they remain on the global cluster; site workers connect back to those shared services.

This lets operators scale worker capacity per-site independently, run queue-specific workers closer to the hardware they manage, and reduce cross-cluster task latency for site-driven automation.

  • ArgoCD Application template (application-nautobot-worker.yaml) gated behind site.nautobot_worker.enabled
  • The template deploys only the Celery worker portion of the Nautobot Helm chart into the nautobot namespace on the site cluster.

haseebsyed12 force-pushed the site-nautobot-worker branch 9 times, most recently from f49206f to f47fced, on April 7, 2026 10:22
haseebsyed12 requested a review from a team on April 7, 2026 13:36
haseebsyed12 marked this pull request as ready for review on April 7, 2026 13:36
haseebsyed12 changed the title from "feat: enable site clusters to run Nautobot Celery workers locally" to "feat: enable site clusters to run Nautobot Celery workers on site clusters" on Apr 7, 2026
haseebsyed12 force-pushed the site-nautobot-worker branch 7 times, most recently from b5e37b0 to 16e016a, on April 14, 2026 09:36
Comment thread components/nautobot-worker/kustomization.yaml
Comment thread charts/argocd-understack/templates/application-nautobot-worker.yaml
haseebsyed12 force-pushed the site-nautobot-worker branch 8 times, most recently from b3c0d4b to 67436e4, on April 15, 2026 12:52
haseebsyed12 requested review from cardoe and skrobul on April 15, 2026 14:50
haseebsyed12 force-pushed the site-nautobot-worker branch from d4915ed to b8d2d0a on April 20, 2026 07:38
@skrobul (Collaborator) left a comment:

Left a few comments inline.

haseebsyed12 force-pushed the site-nautobot-worker branch 9 times, most recently from 7828885 to 7d71b6e, on April 20, 2026 18:02
Sites need to run background task processing locally to reduce
cross-cluster latency and scale worker capacity independently. Workers
connect back to the global PostgreSQL and Redis, so cross-cluster
connections require stronger auth than passwords alone.

Adds a site-scoped ArgoCD Application that deploys only the Celery
worker portion of the Nautobot Helm chart. The web server, Redis, and
PostgreSQL remain on the global cluster.

All cross-cluster connections use end-to-end mTLS:
- nautobot_config.py gains conditional SSL/mTLS logic for both
  PostgreSQL (NAUTOBOT_DB_SSLMODE) and Redis (auto-detected from
  mounted CA cert)
- nautobot-worker component values disable everything except celery
- envoy-configs gateway template supports gatewayPort on TLS
  passthrough listeners for non-443 ports (5432, 6379)
- envoy-configs schema adds gatewayPort to the tls route type
- Deploy guide documents the full architecture, step-by-step site
  onboarding, certificate infrastructure, and troubleshooting
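
The conditional SSL/mTLS logic described in the list above could look roughly like the following. This is a minimal sketch, not the code from this PR: only `NAUTOBOT_DB_SSLMODE` comes from the commit message, while the function names, the certificate directory, and the file names `ca.crt`, `tls.crt`, and `tls.key` are assumptions made for illustration. The dictionary keys are the standard libpq and redis-py TLS option names.

```python
import ssl
import tempfile
from pathlib import Path


def db_ssl_options(env, cert_dir):
    """Build libpq-style OPTIONS for the PostgreSQL connection.

    TLS is switched on only when NAUTOBOT_DB_SSLMODE is set, so a
    deployment that does not set it keeps its existing plain connection.
    """
    sslmode = env.get("NAUTOBOT_DB_SSLMODE", "")
    if not sslmode:
        return {}
    cert_dir = Path(cert_dir)  # hypothetical mount point for the client cert
    return {
        "sslmode": sslmode,  # for example "verify-full"
        "sslrootcert": str(cert_dir / "ca.crt"),
        "sslcert": str(cert_dir / "tls.crt"),
        "sslkey": str(cert_dir / "tls.key"),
    }


def redis_ssl_settings(cert_dir):
    """Auto-detect Redis TLS from the presence of a mounted CA certificate."""
    cert_dir = Path(cert_dir)
    ca_file = cert_dir / "ca.crt"
    if not ca_file.is_file():
        # No CA mounted: fall back to an unencrypted redis:// URL.
        return {"scheme": "redis", "ssl_kwargs": {}}
    return {
        "scheme": "rediss",
        "ssl_kwargs": {
            "ssl_cert_reqs": ssl.CERT_REQUIRED,
            "ssl_ca_certs": str(ca_file),
            "ssl_certfile": str(cert_dir / "tls.crt"),
            "ssl_keyfile": str(cert_dir / "tls.key"),
        },
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cert_dir:
        print(db_ssl_options({"NAUTOBOT_DB_SSLMODE": "verify-full"}, cert_dir))
        print(redis_ssl_settings(cert_dir)["scheme"])
```

Keeping both decisions in small pure functions means the same config file can be shared by the global cluster (no variable set, no CA mounted) and by site workers (both present) without branching on a site name.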
… issued cert+key to sites via the external secrets provider.
haseebsyed12 force-pushed the site-nautobot-worker branch from 8c34aca to 7423e9a on April 20, 2026 19:10
haseebsyed12 requested a review from skrobul on April 21, 2026 07:34
Allows for a different nautobot config file to be stored in the deploy
repo and supplied to Nautobot.
@skrobul (Collaborator) left a comment:

This looks almost ready; I added a few comments in another review.

Comment thread docs/deploy-guide/components/nautobot-worker.md Outdated
Comment on lines +299 to +321
        tls.crt: >-
          {{ .client_password
          | regexFind "-----BEGIN CERTIFICATE-----[\\s\\S]*?-----END CERTIFICATE-----"
          | replace "\r" "" }}
        tls.key: >-
          {{ .client_password
          | regexFind "-----BEGIN EC PRIVATE KEY-----[\\s\\S]*?-----END EC PRIVATE KEY-----"
          | replace "\r" "" }}
        ca.crt: >-
          {{ .ca_password | replace "\r" "" }}
  dataFrom:
  - extract:
      key: "<client-cert-credential-id>"
    rewrite:
    - regexp:
        source: "(.*)"
        target: "client_$1"
  - extract:
      key: "<ca-cert-credential-id>"
    rewrite:
    - regexp:
        source: "(.*)"
        target: "ca_$1"
Collaborator commented:

This expects a very specific type of key/certificate and can be fragile.
Consider switching from a freeform text field to JSON.

For example, if you store the credentials as JSON:

{
  "tls.crt":"LS0tLS1CRUdJTiBDRVJ...",
  "tls.key":"LS0tLS1CRUdJTiBFQyB...."
}

then you can use a simpler and more robust template that does not need regular expressions:

apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: nautobot-mtls-client-marektest
  namespace: nautobot
spec:
  dataFrom:
  - extract:
      key: "602777"
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: pwsafe
  target:
    creationPolicy: Owner
    deletionPolicy: Retain
    template:
      data:
        tls.crt: '{{ index (.password | fromJson) "tls.crt" | b64dec }}'
        tls.key: '{{ index (.password | fromJson) "tls.key" | b64dec }}'
      engineVersion: v2
      metadata: {}
      type: kubernetes.io/tls
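
To illustrate what the two template expressions above compute, here is a rough Python equivalent of `index (.password | fromJson) "tls.crt" | b64dec`. It is a sketch only: the function name `render_tls_secret` and the sample payload are hypothetical, and it assumes the credential's password field holds a JSON object with base64-encoded PEM values, as in the example.

```python
import base64
import json


def render_tls_secret(password_field):
    """Decode a JSON credential into the two keys of a kubernetes.io/tls Secret."""
    payload = json.loads(password_field)  # corresponds to `fromJson`
    return {
        # corresponds to `index ... "<key>" | b64dec`
        key: base64.b64decode(payload[key]).decode("utf-8")
        for key in ("tls.crt", "tls.key")
    }


if __name__ == "__main__":
    # Hypothetical credential content; real values would be full PEM documents.
    sample = json.dumps({
        "tls.crt": base64.b64encode(b"-----BEGIN CERTIFICATE-----\n").decode(),
        "tls.key": base64.b64encode(b"-----BEGIN EC PRIVATE KEY-----\n").decode(),
    })
    for name, value in render_tls_secret(sample).items():
        print(name, value.strip())
```

Because the values are looked up by key rather than matched by pattern, the key type (EC, RSA, PKCS#8) no longer matters to the template.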

Alternatively, if you really want to store those as plaintext, please consider using PEM block filters provided by ESO instead of regexes.
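
The fragility of the regex approach is easiest to see with a generic PEM block filter. The sketch below is a Python illustration of the idea, not ESO's implementation; `pem_blocks`, `filter_pem`, and `private_key_blocks` are hypothetical names. It splits the text into labelled blocks and selects by label, which shows that a pattern hard-coded to `EC PRIVATE KEY` finds nothing when the key is stored as a PKCS#8 `PRIVATE KEY` block.

```python
def pem_blocks(text):
    """Return a list of (label, block) pairs for every PEM block in text."""
    blocks = []
    label = None
    lines = []
    # Normalise CRLF line endings, as the original template did with `replace`.
    for raw in text.replace("\r", "").splitlines():
        line = raw.strip()
        if line.startswith("-----BEGIN ") and line.endswith("-----"):
            label = line[len("-----BEGIN "):-len("-----")]
            lines = [line]
        elif label is not None:
            lines.append(line)
            if line == f"-----END {label}-----":
                blocks.append((label, "\n".join(lines) + "\n"))
                label = None
    return blocks


def filter_pem(text, wanted_label):
    """Keep only the blocks whose label matches exactly."""
    return "".join(block for label, block in pem_blocks(text) if label == wanted_label)


def private_key_blocks(text):
    """Keep any private key block, whatever its algorithm prefix."""
    return "".join(
        block for label, block in pem_blocks(text) if label.endswith("PRIVATE KEY")
    )


if __name__ == "__main__":
    bundle = (
        "-----BEGIN CERTIFICATE-----\r\nAAAA\r\n-----END CERTIFICATE-----\r\n"
        "-----BEGIN PRIVATE KEY-----\r\nBBBB\r\n-----END PRIVATE KEY-----\r\n"
    )
    print(repr(filter_pem(bundle, "EC PRIVATE KEY")))
    print(private_key_blocks(bundle))
```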

@haseebsyed12 (Contributor, Author) commented on Apr 21, 2026:

Done. Using PEM block filters.

Collaborator commented:

Nice, thanks. If you want to eliminate the regex and simplify the dataFrom section, sourcing data from multiple credentials, like here, is also possible; you just need to assign a secretKey for each remoteRef.

It's okay to leave it as is - your call.

Comment thread docs/operator-guide/nautobot-celery-queues.md Outdated
Comment thread docs/operator-guide/nautobot-mtls-certificate-renewal.md Outdated
Comment thread docs/operator-guide/nautobot.md Outdated
Comment thread docs/operator-guide/nautobot.md
haseebsyed12 force-pushed the site-nautobot-worker branch from 7a26828 to f67b30f on April 21, 2026 13:16
haseebsyed12 requested a review from skrobul on April 21, 2026 13:17
haseebsyed12 added this pull request to the merge queue on Apr 21, 2026
Merged via the queue into main with commit 200ddf8 on Apr 21, 2026
20 checks passed
haseebsyed12 deleted the site-nautobot-worker branch on April 21, 2026 13:33