Automated test scripts for verifying Volcano GPU NUMA topology-aware scheduling.
PRs under test:
- volcano-sh/volcano#5095 — numaaware: add GPU NUMA topology awareness to scheduler
- volcano-sh/resource-exporter#12 — numatopo: add GPU NUMA topology discovery via sysfs
- volcano-sh/apis#229 — api: add GPUInfo type and GPUDetail field to NumatopoSpec
Issue: volcano-sh/volcano#4998
If you already have a Kubernetes cluster with GPU nodes, use this single script.

Prerequisites:
- Kubernetes cluster with GPU node (2+ NUMA nodes, 4+ GPUs)
- NVIDIA device plugin installed
- kubelet Topology Manager set to `best-effort` or `restricted`
- `kubectl` configured, `docker` and `go` 1.23+ installed
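You can sanity-check the tool prerequisites yourself before running anything. A minimal sketch (the `preflight` helper is hypothetical, not part of the repo; the real script's pre-flight phase also checks GPU nodes):

```shell
# Hypothetical pre-flight helper: verify that required CLI tools are on PATH.
# Pass the tools you need as arguments; returns nonzero if any are missing.
preflight() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      rc=1
    fi
  done
  return $rc
}

# Usage: preflight kubectl docker go || exit 1
```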
```bash
git clone https://github.com/pmady/gpu-numa-test.git   # or copy scripts
cd gpu-numa-test

# Full run: build → deploy → test
./test-existing-cluster.sh

# If Volcano images are already loaded:
./test-existing-cluster.sh --skip-build

# Run only the test suite:
./test-existing-cluster.sh --skip-build --skip-deploy

# Cleanup all test resources:
./test-existing-cluster.sh --cleanup
```

What the script does:

- Pre-flight checks — verifies kubectl, GPU nodes, build tools
- Topology probe — deploys a privileged pod to read GPU-to-NUMA mapping via sysfs
- Builds images — clones PR branches, builds `vc-scheduler` and `resource-exporter`
- Deploys Volcano — with the `numaaware` plugin enabled + the resource-exporter DaemonSet
- Runs test jobs — 2-GPU job (prefer single NUMA) + 4-GPU job (cross-NUMA)
- Checks scheduler logs — for NUMA scoring/hint entries
- Prints screenshot checklist — evidence to post on the PR
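The topology probe boils down to reading each GPU's NUMA affinity from sysfs, which is the same source resource-exporter PR #12 uses (in Go). A shell sketch of the idea — the `gpu_numa_map` helper is hypothetical, and the filter assumes NVIDIA devices (PCI vendor `0x10de`):

```shell
# Sketch: list NVIDIA GPUs with their NUMA node, as read from sysfs.
# Pass an alternate root (default /sys/bus/pci/devices) to run against a
# fake tree for testing.
gpu_numa_map() {
  root=${1:-/sys/bus/pci/devices}
  for dev in "$root"/*; do
    [ -f "$dev/vendor" ] || continue
    if [ "$(cat "$dev/vendor")" = "0x10de" ]; then   # NVIDIA vendor ID
      # numa_node is -1 when the platform reports no affinity
      echo "$(basename "$dev") $(cat "$dev/numa_node")"
    fi
  done
}
```

On a real node the output should agree with the NUMA affinity column of `nvidia-smi topo -m`.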
Manual verification commands:

```bash
nvidia-smi topo -m
kubectl get numatopologies -A -o yaml
kubectl get vcjob -o wide
kubectl logs <2gpu-pod>
kubectl logs <scheduler-pod> -n volcano-system | grep -i numa
```

If you don't have a GPU cluster, this creates one on GCP with spot pricing.

Prerequisites:
- GCP account with billing enabled ($300 free credit for new accounts)
- GPU quota: `NVIDIA_T4_GPUS` ≥ 4 in `us-central1`
- `gcloud` CLI installed and authenticated
```bash
# Default: 4x T4 in us-central1-a (spot pricing ~$2/hr)
./gpu-numa-test.sh

# Custom project/zone
./gpu-numa-test.sh --project my-project --zone us-east1-c

# Use A100 GPUs
./gpu-numa-test.sh --gpu-type nvidia-tesla-a100
```

| Phase | Duration | Description |
|---|---|---|
| Create VM | ~2 min | GCP spot VM: n1-standard-32 + 4× T4 |
| Install drivers | ~10 min | NVIDIA drivers + containerd + reboot |
| Setup K8s | ~5 min | kubeadm + topology manager + device plugin |
| Build Volcano | ~10 min | Build from PR branches, deploy |
| Run tests | ~5 min | 7 automated PASS/FAIL tests |
| Wait for you | — | Take screenshots, then type `go-ahead` |
| Cleanup | ~1 min | Deletes all GCP resources |
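The VM-create phase is an ordinary spot-instance request. A sketch of the command the orchestrator presumably builds (instance name and image family are assumptions; the accelerator and spot flags are standard `gcloud` options):

```shell
# Hypothetical helper: print the gcloud command for the spot GPU VM so it can
# be inspected (or eval'd). Machine shape matches the phase table above.
vm_create_cmd() {
  zone=$1
  gpu_type=$2
  echo gcloud compute instances create gpu-numa-test-vm \
    --zone="$zone" \
    --machine-type=n1-standard-32 \
    --accelerator=type="$gpu_type",count=4 \
    --provisioning-model=SPOT \
    --maintenance-policy=TERMINATE \
    --image-family=ubuntu-2204-lts \
    --image-project=ubuntu-os-cloud
}

# Usage: vm_create_cmd us-central1-a nvidia-tesla-t4
```

GPU instances require `--maintenance-policy=TERMINATE` because they cannot live-migrate.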
Estimated cost: ~$2-4 (spot), $0 with free credit
| Command | Action |
|---|---|
| `go-ahead` | Delete VM and stop billing |
| `cost` | Show elapsed time and cost estimate |
| `ssh` | Print SSH command |
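Internally, the wait phase just blocks on stdin and dispatches these commands. A sketch under assumptions (the VM name, the flat ~$2/hr rate, and the function name are made up; the real script would run the `gcloud` deletion rather than print a message):

```shell
# Hypothetical interactive wait loop for the commands in the table above.
# Takes the run's start time (epoch seconds); returns on "go-ahead".
wait_for_goahead() {
  start=$1
  rate_cents_per_hr=200          # ~$2/hr spot estimate
  while read -r cmd; do
    case $cmd in
      go-ahead)
        echo "deleting VM and stopping billing..."
        # real script: gcloud compute instances delete gpu-numa-test-vm --quiet
        return 0 ;;
      cost)
        mins=$(( ($(date +%s) - start) / 60 ))
        cents=$(( mins * rate_cents_per_hr / 60 ))
        printf 'elapsed %dm, est. $%d.%02d\n' "$mins" $((cents / 100)) $((cents % 100)) ;;
      ssh)
        echo "gcloud compute ssh gpu-numa-test-vm --zone us-central1-a" ;;
      *)
        echo "commands: go-ahead | cost | ssh" ;;
    esac
  done
}
```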
```
gpu-numa-test/
├── test-existing-cluster.sh             # For existing GPU clusters (Option A)
├── gpu-numa-test.sh                     # GCP VM orchestrator (Option B)
├── scripts/
│   ├── vm-setup.sh                      # NVIDIA + K8s install (GCP VM)
│   ├── build-volcano.sh                 # Build & deploy from PR branches
│   └── run-tests.sh                     # Standalone test suite
├── manifests/
│   ├── test-gpu-numa-job.yaml           # 2-GPU test (single NUMA preferred)
│   ├── test-gpu-cross-numa-job.yaml     # 4-GPU test (cross-NUMA)
│   ├── volcano-scheduler-config.yaml    # Scheduler config with numaaware
│   └── resource-exporter-daemonset.yaml
└── README.md
```
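For orientation, the 2-GPU test manifest presumably resembles a standard Volcano Job requesting `nvidia.com/gpu: 2`. A sketch only — the names, image, command, and annotation value here are assumptions, not the PR's actual manifest:

```yaml
# Sketch of what manifests/test-gpu-numa-job.yaml likely contains.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: test-gpu-numa-2
spec:
  schedulerName: volcano
  minAvailable: 1
  tasks:
    - replicas: 1
      name: worker
      template:
        metadata:
          annotations:
            # Volcano's NUMA-aware annotation; value assumed for this test
            volcano.sh/numa-topology-policy: best-effort
        spec:
          restartPolicy: Never
          containers:
            - name: cuda
              image: nvidia/cuda:12.4.0-base-ubuntu22.04
              command: ["nvidia-smi", "topo", "-m"]
              resources:
                limits:
                  nvidia.com/gpu: 2   # the part that exercises NUMA placement
```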
After running, results are saved to `/tmp/volcano-gpu-numa-test/results/`:
| File | Contents |
|---|---|
| `topology-probe.txt` | GPU-to-NUMA mapping from sysfs |
| `numatopology-full.yaml` | Numatopology CRD with GPU data |
| `job-2gpu.txt` | 2-GPU job output |
| `job-4gpu.txt` | 4-GPU job output |
| `scheduler-numa-logs.txt` | Scheduler NUMA scoring entries |