Switch GPU Util metric to DCGM_FI_PROF_GR_ENGINE_ACTIVE in NVIDIA DCGM Metrics Dashboard #341

Open
wabouhamad opened this issue Jun 11, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@wabouhamad

Is this a new feature, an improvement, or a change to existing functionality?

Improvement

Please provide a clear description of the problem this feature solves

The NVIDIA DCGM Metrics Dashboard on OpenShift 4.15 uses the DCGM_FI_DEV_GPU_UTIL metric, which shows only 0% or 100% GPU utilization. DCGM_FI_PROF_GR_ENGINE_ACTIVE is the more accurate metric, so the dashboard should report it for GPU utilization instead.

Feature Description

From a user perspective, I need to see accurate GPU utilization while running a GPU workload, not just 0% or 100%. The current counter in the NVIDIA DCGM Dashboard on OpenShift uses the older metric DCGM_FI_DEV_GPU_UTIL. The dashboard should switch to DCGM_FI_PROF_GR_ENGINE_ACTIVE for GPU utilization.

Describe your ideal solution

Switch the dashboard's GPU utilization panel to report the DCGM_FI_PROF_GR_ENGINE_ACTIVE metric.
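
As a rough illustration of the change, here is a minimal Python sketch that rewrites the query expressions in a dashboard definition. It assumes a Grafana-style JSON layout (`panels[].targets[].expr`), which may not match how the OpenShift dashboard is actually stored, and the helper names `switch_expr` and `switch_dashboard` are hypothetical. It also assumes DCGM_FI_PROF_GR_ENGINE_ACTIVE is exported as a 0 to 1 ratio, so the expression is multiplied by 100 to keep a percentage scale.

```python
import copy
import re

OLD_METRIC = "DCGM_FI_DEV_GPU_UTIL"
NEW_METRIC = "DCGM_FI_PROF_GR_ENGINE_ACTIVE"

# Match the old metric name plus an optional label selector such as {gpu="0"}.
# Simplification: label values containing "}" are not handled.
_PATTERN = re.compile(r"\b" + OLD_METRIC + r"\b(\{[^}]*\})?")


def switch_expr(expr: str) -> str:
    """Swap the metric name and scale the assumed 0-1 ratio to a percentage."""
    return _PATTERN.sub(
        lambda m: f"({NEW_METRIC}{m.group(1) or ''} * 100)", expr
    )


def switch_dashboard(dashboard: dict) -> dict:
    """Return a copy of the dashboard with every target expression rewritten.

    The panels/targets/expr layout is an assumption, not the confirmed
    structure of the NVIDIA DCGM Metrics Dashboard.
    """
    out = copy.deepcopy(dashboard)
    for panel in out.get("panels", []):
        for target in panel.get("targets", []):
            if "expr" in target:
                target["expr"] = switch_expr(target["expr"])
    return out


if __name__ == "__main__":
    example = {
        "panels": [
            {"title": "GPU Utilization",
             "targets": [{"expr": 'avg(DCGM_FI_DEV_GPU_UTIL{gpu="0"})'}]}
        ]
    }
    print(switch_dashboard(example)["panels"][0]["targets"][0]["expr"])
```

The new metric would also need to be enabled in the exporter's metrics configuration for the rewritten queries to return data.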

Additional context

No response

@wabouhamad wabouhamad added the enhancement New feature or request label Jun 11, 2024