From 0d7ce638f4c46afcc8d3665e950ba35c5b53724a Mon Sep 17 00:00:00 2001
From: Mrunal Patel
Date: Thu, 9 Oct 2025 12:48:01 -0700
Subject: [PATCH] Add GPU autoscaling example to CMA Prometheus trigger documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Added a new section, "GPU autoscaling with Prometheus and DCGM metrics",
that demonstrates how to use the Custom Metrics Autoscaler with NVIDIA DCGM
metrics for GPU-based autoscaling. The example shows how to configure a
ScaledObject to scale workloads based on GPU utilization by using Prometheus
queries.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude
---
 ...s-cma-autoscaling-custom-trigger-prom.adoc | 58 ++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/modules/nodes-cma-autoscaling-custom-trigger-prom.adoc b/modules/nodes-cma-autoscaling-custom-trigger-prom.adoc
index 3daaabfbc87a..21af9ca2be43 100644
--- a/modules/nodes-cma-autoscaling-custom-trigger-prom.adoc
+++ b/modules/nodes-cma-autoscaling-custom-trigger-prom.adoc
@@ -59,3 +59,61 @@ Skipping the check is not recommended.
 ====
 --
 <11> Optional: Specifies an HTTP request timeout in milliseconds for the HTTP client used by this Prometheus trigger. This value overrides any global timeout setting.
+
+[id="nodes-cma-autoscaling-custom-trigger-prom-gpu_{context}"]
+== GPU autoscaling with Prometheus and DCGM metrics
+
+You can use the Custom Metrics Autoscaler with NVIDIA Data Center GPU Manager (DCGM) metrics to scale workloads based on GPU utilization. GPU-based autoscaling is particularly useful for AI and machine learning workloads that require GPU resources. The DCGM metrics, such as `DCGM_FI_DEV_GPU_UTIL`, are exposed by the NVIDIA DCGM exporter and must be collected by the cluster monitoring stack so that the Prometheus query in the trigger returns data.
+
+.Example scaled object with a Prometheus target for GPU autoscaling
+[source,yaml,options="nowrap"]
+----
+apiVersion: keda.sh/v1alpha1
+kind: ScaledObject
+metadata:
+  name: gpu-scaledobject
+  namespace: my-namespace
+spec:
+  scaleTargetRef:
+    kind: Deployment
+    name: gpu-deployment
+  minReplicaCount: 1 <1>
+  maxReplicaCount: 5 <2>
+  triggers:
+  - type: prometheus
+    metadata:
+      serverAddress: https://thanos-querier.openshift-monitoring.svc.cluster.local:9092
+      namespace: my-namespace
+      metricName: gpu_utilization
+      threshold: '90' <3>
+      query: sum(DCGM_FI_DEV_GPU_UTIL{instance=~".+", gpu=~".+"}) <4>
+      authModes: bearer
+      authenticationRef:
+        name: keda-trigger-auth-prometheus
+----
+<1> Specifies the minimum number of replicas to maintain. Do not set this value to `0` for GPU workloads, so that the workload continues to generate GPU utilization metrics for the query.
+<2> Specifies the maximum number of replicas when scaling up.
+<3> Specifies the value that triggers scaling. In this example, the custom metrics autoscaler scales up the deployment when GPU utilization exceeds 90%.
+<4> Specifies a Prometheus query that sums GPU utilization across all GPU devices by using the NVIDIA DCGM `DCGM_FI_DEV_GPU_UTIL` metric, which reports GPU utilization as a percentage for each GPU.
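+
+The `authenticationRef` in the preceding example points to a trigger authentication that supplies the bearer token for the Prometheus query. The following example is a minimal sketch of one possible trigger authentication. It assumes a secret named `thanos-token` in the `my-namespace` namespace that contains a service account token under the `token` key and the service CA certificate under the `ca.crt` key; adjust the secret name and keys to match your environment.
+
+.Example trigger authentication for the GPU scaled object
+[source,yaml,options="nowrap"]
+----
+apiVersion: keda.sh/v1alpha1
+kind: TriggerAuthentication
+metadata:
+  name: keda-trigger-auth-prometheus # Name referenced by authenticationRef in the scaled object
+  namespace: my-namespace
+spec:
+  secretTargetRef:
+  - parameter: bearerToken # Token sent in the Authorization header of the Prometheus query
+    name: thanos-token # Assumed secret name; replace with your own secret
+    key: token
+  - parameter: ca # CA certificate used to verify the TLS connection to Thanos Querier
+    name: thanos-token
+    key: ca.crt
+----
+
+The trigger authentication must exist in the same namespace as the scaled object that references it.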