Add AMD GPU collector #15515
@ilyam8 Removed configuration as discussed. Please let me know if chart IDs and contexts make sense as they are, I think they do now. I tested with multiple GPU cards and aggregated metrics seem to work correctly.
Hi Dimitri! Nice! So, I tested this on my laptop, which has an AMD 4500U CPU, with integrated GPU. The card is on this list. I've added What I've found next, is that still it wouldn't display any charts for it. The problem was that in my case So a suggestion would be that if in case something is missing, don't give up on the whole card? Also, if possible to capitalize the |
Thanks, good points. About additional GPU marketing names, I believe these can be added later, as it's a never-ending piece of work. I did add some RX 7xxx names, though, that were missing from the original list. I changed how chart updates are performed: a linked list is now used, so any metrics (and charts) that can no longer be monitored can easily be removed from the list. This seems to be working fine, but please test on your system again. About I also checked @Ferroin's concerns about introducing unnecessary GPU usage due to
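To illustrate the linked-list approach described in the comment above, here is a minimal sketch in C. It is not the code from this PR: the `struct metric` layout, the function names, and the two demo reader callbacks are all hypothetical, chosen only to show how a metric whose read fails can be unlinked without giving up on the rest of the card.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical metric node; names are illustrative, not netdata's. */
struct metric {
    const char *name;                 /* label used for the chart */
    bool (*read)(long long *value);   /* returns false when the source is unavailable */
    long long value;                  /* last value read successfully */
    struct metric *next;
};

/* Demo readers standing in for real data sources (assumptions for this sketch). */
bool demo_read_ok(long long *value) { *value = 42; return true; }
bool demo_read_unavailable(long long *value) { (void)value; return false; }

/* Prepend a metric to the list and return the new head.
 * On allocation failure the list is returned unchanged. */
struct metric *metric_add(struct metric *head, const char *name,
                          bool (*read)(long long *value)) {
    struct metric *m = calloc(1, sizeof(*m));
    if (!m)
        return head;
    m->name = name;
    m->read = read;
    m->next = head;
    return m;
}

/* Poll every metric once. A metric whose read fails is unlinked and freed,
 * while the remaining metrics stay in the list. Returns how many remain. */
size_t metrics_update(struct metric **head) {
    assert(head != NULL);
    size_t alive = 0;
    struct metric **link = head;      /* points at the pointer that refers to the current node */
    while (*link) {
        struct metric *m = *link;
        if (m->read(&m->value)) {
            alive++;
            link = &m->next;          /* keep the node, advance */
        } else {
            *link = m->next;          /* unlink the node; link itself does not move */
            free(m);
        }
    }
    return alive;
}

/* Release every node and leave the list empty. */
void metrics_free(struct metric **head) {
    while (*head) {
        struct metric *m = *head;
        *head = m->next;
        free(m);
    }
}
```

Walking the list through a pointer-to-pointer (`link`) means removal needs no special case for the head node, so a card with one unavailable metric simply keeps reporting the others.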
Seems to work fine here! Just needs more cards, we can add later!
title: 'AMD GPUs',
icon: '<i class="fas fa-microchip"></i>',
info: 'Detailed information for each AMD GPU of the system. Temperature, fan speed, voltage and power metrics can be found at the <a href="#menu_sensors">Sensors</a> section.'
},
Please do not forget to create an analogous PR to bring this description to the new dashboard as well.
Thanks, will do, just need the information of what needs updating from @netdata/cloud-fe as mentioned in my comment above.
Minor: we shouldn't update this file in netdata/netdata but in https://github.com/netdata/dashboard. If there is a new version of the old dashboard, this file will be overwritten.
I do not have the hardware to test, but considering that another teammate has already tested it, and that possible changes will come in other PRs, LGTM!
Summary
This PR adds a collector that monitors all the AMD GPU cards on the system.
Fixes #12616.
The collector provides the following metrics:
The following can be provided by Sensors, so they were not implemented:

Out of scope of this PR, but could potentially be implemented in the future (needs further research):
[...] nvtop)

Test Plan
[...] master.
[...] product_name label).

For users: How does this change affect me?
You will be able to monitor basic metrics for AMD GPUs. It does not affect any instances that do not have an AMD GPU (see the nvidia-smi collector for NVIDIA GPUs).