Error with unsupported new metrics on V100 GPUs #82

Closed
hassanbabaie opened this issue Jul 20, 2022 · 4 comments

Comments

@hassanbabaie

I'm running dcgm-exporter 2.4.6-2.6.9. If I enable the following metrics:

DCGM_FI_PROF_PIPE_TENSOR_IMMA_ACTIVE
DCGM_FI_PROF_PIPE_TENSOR_HMMA_ACTIVE

The DaemonSet works and reports metrics for nodes with A100 GPUs.

For nodes with V100-16GB GPUs the DaemonSet fails with the following message:

setting up csv
/etc/dcgm-exporter/dcp-metrics-bolt.csv
done
time="2022-07-20T01:54:11Z" level=info msg="Starting dcgm-exporter"
time="2022-07-20T01:54:11Z" level=info msg="DCGM successfully initialized!"
time="2022-07-20T01:54:11Z" level=info msg="Collecting DCP Metrics"
time="2022-07-20T01:54:11Z" level=info msg="No configmap data specified, falling back to metric file /etc/dcgm-exporter/dcp-metrics-bolt.csv"
time="2022-07-20T01:54:12Z" level=fatal msg="Error watching fields: Feature not supported"
running
@nikkon-dev
Collaborator

@hassanbabaie,

The 1013 and 1014 metrics (the two fields enabled above) are not supported on pre-Ampere GPUs such as the V100.

WBR,
Nik

@hassanbabaie
Author

Thanks @nikkon-dev. However, it would be better if there were some form of graceful skip: if the metrics are not supported, log that and don't report them, rather than failing.

For example, when profiling metrics are enabled with an unsupported driver version, those metrics are simply skipped and not reported. That lets us use the same DCGM DaemonSet for all nodes.
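
To make the request concrete, here is a minimal sketch of the "log and skip" behavior. It is not dcgm-exporter's code and does not call DCGM. The helper names `select_supported` and `v100_supports` are hypothetical, and the support check is a stand-in that hard-codes the two fields from this issue as Ampere-or-newer, following the maintainer's comment above.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger("metric-filter")

# The two fields from this issue. Treating them as Ampere-or-newer only
# is an assumption based on the maintainer's comment, not a DCGM query.
AMPERE_ONLY_FIELDS = {
    "DCGM_FI_PROF_PIPE_TENSOR_IMMA_ACTIVE",
    "DCGM_FI_PROF_PIPE_TENSOR_HMMA_ACTIVE",
}


def select_supported(
    requested: Iterable[str],
    is_supported: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Split requested fields into (supported, skipped), logging each skip."""
    supported: List[str] = []
    skipped: List[str] = []
    for field in requested:
        if is_supported(field):
            supported.append(field)
        else:
            # Warn and continue instead of aborting the whole exporter.
            logger.warning("Skipping unsupported metric %s", field)
            skipped.append(field)
    return supported, skipped


def v100_supports(field: str) -> bool:
    """Hypothetical stand-in for a real per-GPU capability probe."""
    return field not in AMPERE_ONLY_FIELDS


requested = [
    "DCGM_FI_DEV_GPU_TEMP",
    "DCGM_FI_PROF_PIPE_TENSOR_IMMA_ACTIVE",
    "DCGM_FI_PROF_PIPE_TENSOR_HMMA_ACTIVE",
]
watched, skipped = select_supported(requested, v100_supports)
```

With this shape, a V100 node would still watch `DCGM_FI_DEV_GPU_TEMP` and only warn about the two tensor-pipe fields, so one metrics file could serve both V100 and A100 nodes.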

@glowkey
Collaborator

glowkey commented Jul 21, 2022

Agreed. We'll modify that behavior.

@hassanbabaie
Author

Thank you!
