Document NFD for GPU Labeling
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
ArangoGutierrez committed Jan 29, 2024
1 parent 54ab2e8 commit 31c34dc
Showing 1 changed file with 48 additions and 7 deletions: content/en/docs/tasks/manage-gpus/scheduling-gpus.md
@@ -64,7 +64,7 @@ spec:
gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
```
## Manage clusters with different types of GPUs
If different nodes in your cluster have different types of GPUs, then you
can use [Node Labels and Node Selectors](/docs/tasks/configure-pod-container/assign-pods-nodes/)
@@ -83,10 +83,51 @@ a different label key if you prefer.
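
The context above (partly collapsed in this diff) describes labelling nodes by GPU type and selecting them from Pods. A minimal sketch, assuming an illustrative `gpu.example/type` label key that is not defined by Kubernetes or by any vendor:

```yaml
# Hypothetical label key and value, added manually (for example with `kubectl label node`).
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-1
  labels:
    gpu.example/type: "example-gpu-x100"
---
apiVersion: v1
kind: Pod
metadata:
  name: needs-x100
spec:
  nodeSelector:
    gpu.example/type: "example-gpu-x100"   # only schedule onto nodes carrying this label
  containers:
    - name: example-gpu-workload
      image: "registry.example/example-gpu-workload:v1"
      resources:
        limits:
          gpu-vendor.example/example-gpu: 1
```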

## Automatic node labelling {#node-labeller}

If you're using AMD GPU devices, you can deploy
[Node Labeller](https://github.com/RadeonOpenCompute/k8s-device-plugin/tree/master/cmd/k8s-node-labeller).
Node Labeller is a {{< glossary_tooltip text="controller" term_id="controller" >}} that automatically
labels your nodes with GPU device properties.
As an administrator, you can automatically discover and label all your GPU-enabled nodes
by deploying Kubernetes [Node Feature Discovery](https://github.com/kubernetes-sigs/node-feature-discovery) (NFD).
NFD detects the hardware features that are available on each node in a Kubernetes cluster and advertises those features.
Typically, NFD adds node labels to advertise the features, but NFD can also add extended resources, annotations, and node taints.
NFD is compatible with all [supported versions](/releases/version-skew-policy/#supported-versions) of Kubernetes.
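
As an illustrative sketch (not output copied from a real cluster), a node that NFD has processed might carry labels like the following. The `feature.node.kubernetes.io` labels are added by NFD itself; the `gpu.gpu-vendor.example` labels mirror the example names used later on this page and would come from vendor or admin tooling layered on top:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-1
  labels:
    # Added by NFD: a display-class PCI device (class 0300) from vendor 10de was detected.
    feature.node.kubernetes.io/pci-0300_10de.present: "true"
    feature.node.kubernetes.io/kernel-version.major: "6"
    # Illustrative vendor labels, not produced by NFD itself.
    gpu.gpu-vendor.example/family: "steely-glint"
    gpu.gpu-vendor.example/installed-memory: "81920"
```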

Similar functionality for NVIDIA is provided by
[GPU feature discovery](https://github.com/NVIDIA/gpu-feature-discovery/blob/main/README.md).
Administrators can also use NFD to taint nodes that have specific features, so that only Pods which tolerate those taints can be scheduled onto them (a toleration sketch follows the example below).
Once the GPU nodes in your cluster are labeled, you can schedule Pods onto them by adding the following to your Pod spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-vector-add
spec:
  # You can then use Kubernetes node affinity to schedule pods on GPU nodes.
  # https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "gpu.gpu-vendor.example/installed-memory"
                operator: Gt
                values: ["40535"]
              - key: "gpu.gpu-vendor.example/family"
                operator: In
                values: # examples are GCU families not GPU families 😉
                  - arbitrary
                  - armchair-traveller
                  - just-read-the-instructions
                  - steely-glint
  restartPolicy: Never
  containers:
    - name: example-vector-add
      image: "registry.example/example-vector-add:v42"
      resources:
        limits:
          gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
  nodeSelector:
    # Use Kubernetes node selector to schedule pods on GPU nodes.
    # https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector
    gpu-vendor.example/example-gpu-present: "true"
```
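
If you also use NFD (or other tooling) to taint your GPU nodes, as mentioned above, a Pod additionally needs a matching toleration. A minimal sketch, assuming a hypothetical `gpu-vendor.example/gpu` taint with effect `NoSchedule`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-vector-add-tolerating
spec:
  # Without this toleration the Pod is repelled by nodes carrying the taint,
  # so only GPU workloads land on the tainted GPU nodes.
  tolerations:
    - key: "gpu-vendor.example/gpu"
      operator: "Exists"
      effect: "NoSchedule"
  restartPolicy: Never
  containers:
    - name: example-vector-add
      image: "registry.example/example-vector-add:v42"
      resources:
        limits:
          gpu-vendor.example/example-gpu: 1
```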
#### GPU vendor implementations
- NVIDIA [GPU feature discovery](https://github.com/NVIDIA/gpu-feature-discovery/#readme).
