Ethtool collector #14674
Hi, @ghanapunq. A question: why do we need |
The stats given by the operating system are not always relevant. If a solution uses the vendor's driver directly (Mellanox's, for instance), the operating system is completely bypassed, whereas ethtool doesn't rely on the counters in /proc/net/dev. This tool is developed by Mellanox, so its metrics are relevant for Mellanox boards.
Can you please elaborate on "stats given by the operating system are not always relevant"?
For instance, here are the metrics I get on the following server:
You can see that the number of bytes measured by ethtool and the number given in /proc/net/dev are completely different. For your information, I am using the following setup:
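The comparison described above, ethtool counters versus /proc/net/dev, can be reproduced with a small script. The following is only a sketch, not the code from this PR: the helper names are hypothetical, the sample numbers are invented for illustration, and the counter names printed by `ethtool -S` are driver-specific, so the ones shown here are placeholders.

```python
import re

# Invented sample inputs for illustration only (not measurements from this thread).
SAMPLE_PROC_NET_DEV = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0:    1000      10    0    0    0     0          0         0     2000      20    0    0    0     0       0          0
"""

SAMPLE_ETHTOOL_OUTPUT = """\
NIC statistics:
     rx_packets: 10
     rx_bytes: 1000
     rx_bytes_phy: 5000000
"""


def parse_proc_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from the contents of /proc/net/dev."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, values = line.split(":", 1)
        fields = values.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def parse_ethtool_stats(text):
    """Return {counter_name: value} from the text printed by `ethtool -S <device>`."""
    stats = {}
    for line in text.splitlines():
        # Counter lines are indented and look like "     name: 123".
        match = re.match(r"\s+(\S.*?):\s+(-?\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


if __name__ == "__main__":
    print(parse_proc_net_dev(SAMPLE_PROC_NET_DEV))
    print(parse_ethtool_stats(SAMPLE_ETHTOOL_OUTPUT))
```

On a real host, the two inputs would come from reading /proc/net/dev and from running `ethtool -S` on the device in question.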
Hello @ilyam8, could you tell me what is needed to get this pull request reviewed?
Check the ‘Files Changed’ tab on the PR. All of the ‘Review’ jobs add in-line annotations on the files for any issues they flag (in this case, it mostly looks like formatting issues, plus an unused import). Alternatively, you can install flake8 locally and run it against the file; it should report the same warnings there as it does in CI.
@@ -0,0 +1,46 @@
<!--
title: "Nvidia GPU monitoring with Netdata"
custom_edit_url: https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/nvidia_smi/README.md
Suggested change:
- custom_edit_url: https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/nvidia_smi/README.md
+ custom_edit_url: https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/ethtool/README.md
@@ -0,0 +1,46 @@
<!--
title: "Nvidia GPU monitoring with Netdata"
Suggested change:
- title: "Nvidia GPU monitoring with Netdata"
+ title: "Network with ethtool monitoring with Netdata"
<!--
title: "Nvidia GPU monitoring with Netdata"
custom_edit_url: https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/nvidia_smi/README.md
sidebar_label: "Nvidia GPUs"
Suggested change:
- sidebar_label: "Nvidia GPUs"
+ sidebar_label: "ethtool"
@@ -0,0 +1,214 @@
# -*- coding: utf-8 -*-
# Description: nvidia-smi netdata python.d module
Suggested change:
- # Description: nvidia-smi netdata python.d module
+ # Description: ethtool netdata python.d module
# Original Author: Steven Noonan (tycho)
# Author: Ilya Mashchenko (ilyam8)
# User Memory Stat Author: Guido Scatena (scatenag)
Authorship information needs to be updated.
Hi, @ghanapunq. Indeed, I see that counters from Thanks for the PR, but I think we are not going to merge it:
If you are interested in this feature - please open a feature request. If you want to share your implementation, you can:
if self.dev_filter in device:
    devices_filtered.append(device)
Not exactly critical, but could we extend this to take a regex for the device filter? That would allow us to default to ^eth.*|^en[ops].*, which should limit things to actual Ethernet devices on the vast majority of systems and would mean that most users would not need to set a custom device filter.
If it’s an issue here we could always just do it in a followup PR instead.
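A regex-based device filter along the lines suggested above could look like the sketch below. This is not the code from the PR: `filter_devices` and `DEFAULT_DEVICE_FILTER` are hypothetical names, and only the default pattern is taken from the review comment.

```python
import re

# Default pattern proposed in the review comment; the constant name is hypothetical.
DEFAULT_DEVICE_FILTER = r"^eth.*|^en[ops].*"


def filter_devices(devices, pattern=DEFAULT_DEVICE_FILTER):
    """Keep only the device names that match the regular expression."""
    regex = re.compile(pattern)
    return [device for device in devices if regex.search(device)]


if __name__ == "__main__":
    names = ["lo", "eth0", "enp3s0", "ens1f0", "eno1", "wlan0", "docker0"]
    print(filter_devices(names))
```

Unlike the substring test (`self.dev_filter in device`), a pattern anchored with `^` does not match a device whose name merely contains the filter text somewhere in the middle.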
I think that link is broken. The list is here: https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md#third-party-collectors
I created a third-party collector (https://github.com/ghanapunq/netdata_ethtool_plugin), as proposed by @ilyam8. Should I add it to the list of third-party collectors, or would you prefer to do it, @ilyam8?
@ghanapunq please do |
Summary
Add an ethtool collector, primarily to retrieve the bandwidth of Mellanox boards. No existing plugin supported the Mellanox ConnectX-5 or ConnectX-6. ethtool reports the number of bytes received. From this value, the collector calculates the bandwidth in Gib/s and also gives the percentage of bandwidth used relative to the board's capability. The collector also reports the number of packets lost on reception and transmission, which can help detect a bottleneck on a server.
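The bandwidth calculation described in the summary can be sketched as follows. This is an illustration under assumptions, not the collector's actual code: the function names are hypothetical, the byte counter is assumed to be sampled twice at a known interval, and the link speed is assumed to be known in Mb/s (decimal megabits).

```python
def bandwidth_gibps(prev_bytes, curr_bytes, interval_s):
    """Bandwidth in Gib/s (2**30 bits per second) between two byte-counter samples."""
    bits = (curr_bytes - prev_bytes) * 8
    return bits / interval_s / 2**30


def utilization_percent(prev_bytes, curr_bytes, interval_s, link_speed_mbps):
    """Share of the link capacity used, assuming the link speed is given in Mb/s."""
    bits_per_second = (curr_bytes - prev_bytes) * 8 / interval_s
    return 100.0 * bits_per_second / (link_speed_mbps * 1_000_000)


if __name__ == "__main__":
    # 2**27 bytes received in one second is exactly 2**30 bits, that is 1 Gib/s.
    print(bandwidth_gibps(0, 2**27, 1.0))
    # 1.25e9 bytes in one second is 10 Gb/s, that is 10% of a 100000 Mb/s link.
    print(utilization_percent(0, 1_250_000_000, 1.0, 100_000))
```

The sketch does not handle counter resets or wrap-around, which would make the difference between two samples negative.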
Test Plan
This collector has been tested with ConnectX-5 and ConnectX-6 boards on CentOS servers. As long as ethtool is installed, the collector should report its metrics.