Day 2 workload management for the SAIF Platform via ArgoCD using the App-of-Apps pattern.
This repository is the source of truth for everything deployed on OpenShift clusters after the Day 1 bootstrap. ArgoCD syncs directly from this repository to manage all operators, configurations, and workloads.
Applications are organized into tiers with sync wave ordering to handle dependencies:
| Tier | Purpose | Examples |
|---|---|---|
| tier1-core | Platform prerequisites | IDMS, Sealed Secrets, LVMS, CatalogSources |
| tier2-isovalent | Cilium / eBPF stack | Cilium config, Tetragon, Hubble Timescape |
| tier3-nvidia | GPU / AI inference | GPU Operator, NFD, NIM Operator, NIM LLM |
| tier4-observability | Monitoring / export | Splunk OTEL, Intersight OTEL |
| tier5-demo | Demo applications | MNIST ML Lab, Open WebUI |
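
As an illustration of sync wave ordering, a tier's ArgoCD Application can carry the standard `argocd.argoproj.io/sync-wave` annotation, and ArgoCD applies lower waves before higher ones. The sketch below is not copied from this repository: the wave number, the `openshift-gitops` namespace, the repository URL, and the destination namespace are all assumptions made for the example.

```yaml
# Hypothetical Application for the GPU Operator in tier3-nvidia.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gpu-operator
  namespace: openshift-gitops            # assumption: namespace where ArgoCD runs
  annotations:
    # Lower waves sync first; "3" is an assumed value chosen to match the tier number.
    argocd.argoproj.io/sync-wave: "3"
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/saif-gitops.git   # hypothetical URL
    targetRevision: main                                       # assumed branch
    path: apps/gpu-operator
  destination:
    server: https://kubernetes.default.svc
    namespace: nvidia-gpu-operator       # assumed target namespace
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

With annotations like this, an Application in a lower wave (for example a tier1-core prerequisite) would be synced and healthy before ArgoCD moves on to this one.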

    clusters/ai-pod-1/kustomization.yaml   <-- Per-cluster overlay
    └── clusters/_base/tier*/              <-- Base tier definitions
        └── apps/<app-name>/               <-- Shared application manifests

Each cluster folder uses Kustomize to select which tiers and applications to deploy.
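
A cluster overlay of this kind might look like the following sketch for a cluster without GPUs. The exact tier list used by each cluster is an assumption here; the point is only that leaving a tier out of `resources` keeps its Applications off that cluster.

```yaml
# Hypothetical clusters/ai-pod-3/kustomization.yaml (base stack, no GPU).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../_base/tier1-core
  - ../_base/tier2-isovalent
  # tier3-nvidia is deliberately left out: this cluster has no GPU.
  - ../_base/tier4-observability
  # Whether tier5-demo belongs here is not stated in this README; omitted in this sketch.
```

Running `kustomize build clusters/ai-pod-3` locally is one way to preview which Applications such an overlay would render, assuming each tier directory has its own `kustomization.yaml`.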
- Create manifests in `apps/my-app/`
- Create an ArgoCD Application in `clusters/_base/tier5-demo/my-app.yaml`
- Reference it in the tier's `kustomization.yaml`
- Commit and push. ArgoCD syncs automatically.
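
The two files touched by these steps could look roughly like this. They are shown in one block separated by `---` for brevity but live in separate files. The repository URL, branch, ArgoCD namespace, and target namespace are placeholders, and a real tier `kustomization.yaml` would already list other Applications alongside the new entry.

```yaml
# File: clusters/_base/tier5-demo/my-app.yaml (hypothetical content)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: openshift-gitops            # assumption: namespace where ArgoCD runs
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/saif-gitops.git   # hypothetical URL
    targetRevision: main                                       # assumed branch
    path: apps/my-app                    # manifests created in the first step
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app                    # assumed target namespace
  syncPolicy:
    automated: {}
    syncOptions:
      - CreateNamespace=true
---
# File: clusters/_base/tier5-demo/kustomization.yaml (only the new entry is shown)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - my-app.yaml
```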

`gh workflow run gitops-sync.yaml -f cluster=ai-pod-1`

| Cluster | GPU | Stack |
|---|---|---|
| ai-pod-1 | NVIDIA L40S | Full stack (GPU + AI inference) |
| ai-pod-2 | NVIDIA L40S | Full stack (GPU + AI inference) |
| ai-pod-3 | None | Base stack (no NIM/GPU workloads) |
| ai-pod-4 | None | Base stack (no NIM/GPU workloads) |

    saif-gitops/
    ├── apps/                        # Shared application manifests
    │   ├── gpu-operator/            # NVIDIA GPU Operator
    │   ├── nim-llm/                 # LLM model via NIM
    │   ├── tetragon/                # Tetragon security policies
    │   ├── splunk-otel/             # Splunk OpenTelemetry
    │   └── ...
    ├── clusters/                    # Per-cluster configurations
    │   ├── _base/                   # Base tier definitions
    │   │   ├── tier1-core/
    │   │   ├── tier2-isovalent/
    │   │   ├── tier3-nvidia/
    │   │   ├── tier4-observability/
    │   │   └── tier5-demo/
    │   ├── ai-pod-1/                # Cluster overlays
    │   ├── ai-pod-2/
    │   ├── ai-pod-3/
    │   └── ai-pod-4/
    ├── charts/                      # Helm charts (custom + vendored)
    ├── scripts/                     # Helper scripts
    └── .github/workflows/           # CI/CD workflows

- Architecture - GitOps patterns and sync waves
- Observability Architecture - Data flow diagrams
- Customization - Adapting for your environment
- MOPs - Operational procedures
| Repository | Relationship |
|---|---|
| saif-platform | Platform orchestration |
| saif-ai-pod | Day 0/1 - bootstraps ArgoCD pointing here |
| saif-sys-admin | Produces IDMS manifests consumed by this repo |
| saif-splunk-dashboard | Dashboard specifications for Splunk Observability |
This project is licensed under the Cisco Sample Code License, Version 1.1. See LICENSE for details.