Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ethtool collector #14674

Closed
wants to merge 12 commits into from
Closed

Ethtool collector #14674

wants to merge 12 commits into from

Conversation

ghanapunq
Copy link
Contributor

Summary

Add an ethtool collector to retrieve especially the bandwidth of Mellanox boards. Indeed there was no existing plugin that supported the Mellanox connect-X5 or connect-X6. The ethtool tool gives the number of bytes received. Given this value the collector calculates the bandwidth in Gib/s and give also a percentage of the bandwidth used according to the capabilities of the board. Besides this collector provides also number of packet loss in reception and transmission. This could be useful to detect a bottleneck on a server.

Test Plan

This collector have been tested on a connect-X5 and connect-X6 on cent OS servers. As long as the ethtool is installed, this collector should give its metrics.

@ilyam8
Copy link
Member

ilyam8 commented Mar 6, 2023

Hi, @ghanapunq. A question: why do we need ethtool to collect network card traffic/bandwidth? Are these network cards not showing up in /proc/net/dev output?

@ghanapunq
Copy link
Contributor Author

Hi, @ghanapunq. A question: why do we need ethtool to collect network card traffic/bandwidth? Are these network cards not showing up in /proc/net/dev output?

The stats given by the operating system are not always relevant. If a solution uses directly the driver given (by Mellanox for instance), the operating system will be completely by passed. Whereas ethtool doesn't rely on the counters in /proc/net/dev. This tool is developped by Mellanox and therefore its metrics are relevant regarding Mellanox boards.

@ilyam8
Copy link
Member

ilyam8 commented Mar 7, 2023

Can you please elaborate on "stats given by the operating system are not always relevant"?

  • What stats from /proc/net/dev are not relevant and what do you mean by "not relevant"?
  • Can you show both stats from /proc/net/dev and ethtool for connect-X5/6 network adapters?

@ghanapunq
Copy link
Contributor Author

For instance on the following server here are the metrics I have

[root@poc-servers]# cat /proc/net/dev &&  ethtool -S enp66s0f0 | grep x_bytes_phy
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
enp66s0f0: 2685980250 2465952    0 1619568    0     0          0   2465952  9542140  108278    0    0    0     0       0          0
     tx_bytes_phy: 28715066441978
     rx_bytes_phy: 12292535778310

You can see that number of bytes measured by ethtool and the one given in /proc/net/dev are completely different.

For your information I am using the following setup:

[root@poc-server]# lspci -v | grep -i Mellanox
42:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]

[root@poc-server]# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)

@ghanapunq
Copy link
Contributor Author

Hello @ilyam8 , could you tell me what is needed to have a review concerning this pull request ?
And concerning the fail of the job "Review / flake8" I don't understand what is failing.

@Ferroin
Copy link
Member

Ferroin commented Mar 13, 2023

And concerning the fail of the job "Review / flake8" I don't understand what is failing.

Check the ‘Files Changed’ tab on the PR. All of the ‘Review’ jobs add in-line annotations on the files for any issues they flag (in this case, it mostly looks like formatting issues, plus an unused import).

Alternatively, you can install flake8 locally and run it against the file, it should report the same warnings there as it is in CI.

@ghanapunq ghanapunq requested a review from Ferroin as a code owner March 13, 2023 14:59
@github-actions github-actions bot added the area/build Build system (autotools and cmake). label Mar 13, 2023
@@ -0,0 +1,46 @@
<!--
title: "Nvidia GPU monitoring with Netdata"
custom_edit_url: https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/nvidia_smi/README.md
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
custom_edit_url: https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/nvidia_smi/README.md
custom_edit_url: https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/ethtool/README.md

@@ -0,0 +1,46 @@
<!--
title: "Nvidia GPU monitoring with Netdata"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
title: "Nvidia GPU monitoring with Netdata"
title: "Network with ethtool monitoring with Netdata"

<!--
title: "Nvidia GPU monitoring with Netdata"
custom_edit_url: https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/nvidia_smi/README.md
sidebar_label: "Nvidia GPUs"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sidebar_label: "Nvidia GPUs"
sidebar_label: "ethtool"

@@ -0,0 +1,214 @@
# -*- coding: utf-8 -*-
# Description: nvidia-smi netdata python.d module
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Description: nvidia-smi netdata python.d module
# Description: ethtool netdata python.d module

Comment on lines 3 to 5
# Original Author: Steven Noonan (tycho)
# Author: Ilya Mashchenko (ilyam8)
# User Memory Stat Author: Guido Scatena (scatenag)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Authorship information needs to be updated.

@ilyam8
Copy link
Member

ilyam8 commented Mar 14, 2023

Hi, @ghanapunq. Indeed, I see that counters from /proc/net/dev and ethtool -S don't match. And it looks like ethtool provides additional counters for Mellanox.

Thanks for the PR, but I think we are not going to merge it:

  • We are migrating python collectors to go.
  • We don't add new python collectors unless there are technical restrictions on their implementation in go (e.g. using CGO).

If you are interested in this feature - please open a feature request.

If you want to share your implementation, you can:

@ilyam8 ilyam8 added the wontfix label Mar 14, 2023
Comment on lines +158 to +159
if self.dev_filter in device:
devices_filtered.append(device)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly critical, but could we extend this to take a regex for the device filter? That would allow us to default to ^eth.*|^en[ops].*, which should limit things to actual Ethernet devices on a vast majority of systems and would mean that most users would not need to set a custom device filter.

If it’s an issue here we could always just do it in a followup PR instead.

@andrewm4894
Copy link
Contributor

create a repo with this collector and add it to the list of third-party collectors.

I think that link is broken - this is the list here. https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md#third-party-collectors

@ghanapunq
Copy link
Contributor Author

I created a third party collector as proposed by @ilyam8. It is available here : https://github.com/ghanapunq/netdata_ethtool_plugin

Should I add it to the list of third party collector or do you prefer to do it @ilyam8 ?

@ilyam8
Copy link
Member

ilyam8 commented Mar 16, 2023

@ghanapunq please do

@ghanapunq
Copy link
Contributor Author

@ilyam8 here is a mini PR to add the ethtool collector in the third party collectors : #14753

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/build Build system (autotools and cmake). area/collectors Everything related to data collection area/docs collectors/python.d wontfix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants