pmady/gpu-numa-test

GPU NUMA-Aware Scheduling — E2E Verification

Automated test scripts for verifying Volcano GPU NUMA topology-aware scheduling.

PRs under test:

Issue: volcano-sh/volcano#4998


Option A: Test on Existing GPU Cluster (Recommended)

If you already have a Kubernetes cluster with GPU nodes, use this single script.

Prerequisites

  • Kubernetes cluster with at least one GPU node (2+ NUMA nodes, 4+ GPUs)
  • NVIDIA device plugin installed
  • kubelet Topology Manager set to best-effort or restricted
  • kubectl configured; docker and Go 1.23+ installed
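The tool checks can be done by hand before the full run. A minimal pre-flight sketch (the `require` helper is hypothetical, added here for illustration; the repo's script performs its own, richer checks):

```shell
#!/bin/sh
# Verify the required CLI tools are on PATH before running the test script.
# `require` is a hypothetical helper, not part of the repo's scripts.
require() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Tolerate missing tools here so the loop reports all of them at once.
for tool in kubectl docker go; do
  require "$tool" || true
done
```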

Quick Start

git clone https://github.com/pmady/gpu-numa-test.git  # or copy scripts
cd gpu-numa-test

# Full run: build → deploy → test
./test-existing-cluster.sh

# If Volcano images are already loaded:
./test-existing-cluster.sh --skip-build

# Run only the test suite:
./test-existing-cluster.sh --skip-build --skip-deploy

# Cleanup all test resources:
./test-existing-cluster.sh --cleanup

What It Does

  1. Pre-flight checks — verifies kubectl, GPU nodes, build tools
  2. Topology probe — deploys a privileged pod to read GPU-to-NUMA mapping via sysfs
  3. Builds images — clones PR branches, builds vc-scheduler and resource-exporter
  4. Deploys Volcano — with numaaware plugin enabled + resource-exporter DaemonSet
  5. Runs test jobs — 2-GPU job (prefer single NUMA) + 4-GPU job (cross-NUMA)
  6. Checks scheduler logs — for NUMA scoring/hint entries
  7. Prints screenshot checklist — evidence to post on the PR
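The topology probe in step 2 boils down to reading `numa_node` under sysfs for each NVIDIA PCI device. A minimal sketch of that idea (assumes the standard sysfs layout; `SYSFS_ROOT` is a parameter introduced here only for illustration, not from the repo's scripts):

```shell
#!/bin/sh
# Map each NVIDIA GPU PCI device to its NUMA node via sysfs.
# SYSFS_ROOT is parameterized only so the logic can be exercised
# against a fake tree; on a real node it is /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

probe_gpu_numa() {
  for dev in "$SYSFS_ROOT"/bus/pci/devices/*; do
    [ -e "$dev/vendor" ] || continue
    # 0x10de is NVIDIA's PCI vendor ID.
    [ "$(cat "$dev/vendor")" = "0x10de" ] || continue
    echo "$(basename "$dev") numa_node=$(cat "$dev/numa_node")"
  done
}

probe_gpu_numa
```

On a real GPU node, `numa_node` values of 0 and 1 across the four GPUs confirm the 2-NUMA layout the tests rely on.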

Screenshot Checklist (for PR comment)

nvidia-smi topo -m
kubectl get numatopologies -A -o yaml
kubectl get vcjob -o wide
kubectl logs <2gpu-pod>
kubectl logs <scheduler-pod> -n volcano-system | grep -i numa

Option B: Create a GCP GPU VM from Scratch

If you don't have a GPU cluster, this creates one on GCP with spot pricing.

Prerequisites

  • GCP account with billing enabled ($300 free credit for new accounts)
  • GPU quota: NVIDIA_T4_GPUS ≥ 4 in us-central1
  • gcloud CLI installed and authenticated

Usage

# Default: 4x T4 in us-central1-a (spot pricing ~$2/hr)
./gpu-numa-test.sh

# Custom project/zone
./gpu-numa-test.sh --project my-project --zone us-east1-c

# Use A100 GPUs
./gpu-numa-test.sh --gpu-type nvidia-tesla-a100

Phases

Phase            Duration   Description
Create VM        ~2 min     GCP spot VM: n1-standard-32 + 4× T4
Install drivers  ~10 min    NVIDIA drivers + containerd + reboot
Setup K8s        ~5 min     kubeadm + topology manager + device plugin
Build Volcano    ~10 min    Build from PR branches, deploy
Run tests        ~5 min     7 automated PASS/FAIL tests
Wait for you     (manual)   Take screenshots, then type go-ahead
Cleanup          ~1 min     Deletes all GCP resources
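The "Create VM" phase corresponds roughly to a single gcloud invocation. A sketch of what that command looks like (the instance name, image family, and disk size are assumptions for illustration; the actual flags live in gpu-numa-test.sh):

```shell
#!/bin/sh
# Sketch of the spot-VM creation from the "Create VM" phase.
# Machine type and GPU count come from the table above; the instance
# name, image family, and disk size are assumed values.
PROJECT="${PROJECT:-my-project}"
ZONE="${ZONE:-us-central1-a}"

cmd="gcloud compute instances create gpu-numa-test \
  --project=$PROJECT --zone=$ZONE \
  --machine-type=n1-standard-32 \
  --accelerator=type=nvidia-tesla-t4,count=4 \
  --provisioning-model=SPOT \
  --maintenance-policy=TERMINATE \
  --image-family=ubuntu-2204-lts --image-project=ubuntu-os-cloud \
  --boot-disk-size=200GB"

# Print instead of executing, so the command can be reviewed first.
echo "$cmd"
```

Note that GPU instances require --maintenance-policy=TERMINATE, and spot provisioning is what keeps the estimated cost near $2/hr.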

Estimated cost: ~$2-4 (spot), $0 with free credit

Interactive Commands

Command    Action
go-ahead   Delete VM and stop billing
cost       Show elapsed time and cost estimate
ssh        Print SSH command

File Structure

gpu-numa-test/
├── test-existing-cluster.sh            # For existing GPU clusters (Option A)
├── gpu-numa-test.sh                    # GCP VM orchestrator (Option B)
├── scripts/
│   ├── vm-setup.sh                     # NVIDIA + K8s install (GCP VM)
│   ├── build-volcano.sh                # Build & deploy from PR branches
│   └── run-tests.sh                    # Standalone test suite
├── manifests/
│   ├── test-gpu-numa-job.yaml          # 2-GPU test (single NUMA preferred)
│   ├── test-gpu-cross-numa-job.yaml    # 4-GPU test (cross-NUMA)
│   ├── volcano-scheduler-config.yaml   # Scheduler config with numaaware
│   └── resource-exporter-daemonset.yaml
└── README.md

Results Directory

After running, results are saved to /tmp/volcano-gpu-numa-test/results/:

File                     Contents
topology-probe.txt       GPU-to-NUMA mapping from sysfs
numatopology-full.yaml   Numatopology CRD with GPU data
job-2gpu.txt             2-GPU job output
job-4gpu.txt             4-GPU job output
scheduler-numa-logs.txt  Scheduler NUMA scoring entries

About

E2E test scripts for Volcano GPU NUMA-aware scheduling (PRs #5095, #12, #229)
