Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process sysfs/class/net/*/bonding #439

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bewing
Copy link

@bewing bewing commented Mar 21, 2022

As part of prometheus/node_exporter#1604, parse information from /sys/class/net/*/bonding*/* to provide information regarding current bonding state for bonding controllers and their devices.

I have some style/format questions that I would appreciate input on as part of the process:

  • There does not appear to be consistency on pointers vs values in many modules of this project. I notice that net_class.go uses pointers for integers, but not strings. Should I follow that, or is there a different preference from the maintainers? It would appear that using pointers would allow for nil values to indicate kernel-level support for a specific file (if the kernel doesn't have the file present, better to indicate with a nil pointer rather than a default empty string or integer?), but I'm curious what the consensus is in the project for indicating this
  • The kernel driver has many different sized ints (uint8, int64, uint64, etc) -- should I try to match what the current kernel defines these as, or just use uint64s as the worst case storage requirement?
  • The bonding driver has many files of the format <uint> <human readable value string> -- should this be implemented as constants in the module itself, or imported constants from a reference module (assuming one exists), or some other approach I haven't thought of?
  • Should I be using pointers to other NetClassIface structs to reference controllers and devices (IE, L34, L64 ?)
  • Should any special care/documentation be taken regarding fields that need CAP_NET_ADMIN to read?

@bewing bewing marked this pull request as draft March 21, 2022 22:33
@bewing bewing force-pushed the sysfs_bonding branch 6 times, most recently from 0a19628 to da42abf Compare March 25, 2022 20:37
@bewing bewing marked this pull request as ready for review March 25, 2022 20:38
@bewing bewing changed the title WIP: Process sysfs/class/net/*/bonding Process sysfs/class/net/*/bonding Mar 25, 2022
@discordianfish
Copy link
Member

There does not appear to be consistency on pointers vs values in many modules of this project. I notice that net_class.go uses pointers for integers, but not strings. Should I follow that, or is there a different preference from the maintainers? It would appear that using pointers would allow for nil values to indicate kernel-level support for a specific file (if the kernel doesn't have the file present, better to indicate with a nil pointer rather than a default empty string or integer?), but I'm curious what the consensus is in the project for indicating this

yes, we use pointers to distinguish between a null value and e.g non existing file as in your example. Other reasons would be larger structs where you want to avoid copying them when you pass them to functions.

The kernel driver has many different sized ints (uint8, int64, uint64, etc) -- should I try to match what the current kernel defines these as, or just use uint64s as the worst case storage requirement?

Yes we try to mimic the underlying sized ints

The bonding driver has many files of the format -- should this be implemented as constants in the module itself, or imported constants from a reference module (assuming one exists), or some other approach I haven't thought of?

Not sure what you mean by that.

Should I be using pointers to other NetClassIface structs to reference controllers and devices (IE, L34, L64 ?)

If you retrieved the data for them I see why not, but we probably shouldn't waste cycles on getting them if we don't need them in which case I'd just use some string? identifier.

Should any special care/documentation be taken regarding fields that need CAP_NET_ADMIN to read?

I don't think we have any convention for that.

@bewing
Copy link
Author

bewing commented Mar 30, 2022

The bonding driver has many files of the format -- should this be implemented as constants in the module itself, or imported constants from a reference module (assuming one exists), or some other approach I haven't thought of?

Not sure what you mean by that.

For example, sys/class/net/bond0/bonding/ad_select :

$ cat /sys/class/net/bond0/bonding/ad_select
stable 0

The actual underlying stored kernel value for ad_select is an integer, but bond_opt_get_val is used to resolve that integer 0 to the string "stable" via a constant lookup table bond_ad_select_tbl, and then both are present in the sysfs file. Should we return the integer, just the string, or a generic bond_options struct with both values?

@bewing bewing marked this pull request as draft April 1, 2022 15:42
@bewing
Copy link
Author

bewing commented Apr 1, 2022

I'm not able to run tests in node_exporter using this branch, so there's still more digging/dev work needed before this is mergable

Comment on lines 449 to 461
devices, err := util.SysReadFile(filepath.Join(path, "slaves"))
if err != nil {
return fmt.Errorf("unable to read %s", filepath.Join(path, "slaves"))
}
for _, device := range strings.Split(devices, " ") {
if intf, exists := (*netClass)[device]; exists {
netClassIface.BondAttrs.Devices = append(netClassIface.BondAttrs.Devices, &intf)
} else {
return fmt.Errorf("unable to find device %s", device)
}
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it turns out the errors that I was seeing trying to test node_exporter against my branch are because they have an incomplete fixture set:
https://github.com/prometheus/node_exporter/blob/b52bf958f8e2d4ed1624b8122d08af4d12da9322/collector/fixtures/sys.ttar#L1094-L1098

Member devices eth1, eth4, and eth5 are not present in the fixtures. Does it make more sense to fix the upstream fixtures to be complete (either by removing the references to the non-existing devices, or adding dummy fixtures for them), or by not resolving them to pointers here and not returning an error at all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're sure that on real systems the member would be present, which I expect, I'd say to update the fixtures.
I'd include the other interfaces if it's not inflating the fixture sizes to much, otherwise remove the references.

Update net_class.go with new structures to handle bonding driver
information

Add new true/false values to internal.util.parse.ParseBool

References:
torvalds/linux/drivers/net/bonding/bond_sysfs.c
torvalds/linux/drivers/net/bonding/bond_sysfs_slave.c
torvalds/linux/include/net/bonding.h
torvalds/linux/include/net/bond_options.h
torvalds/linux/include/net/bond_3ad.h

Signed-off-by: Brandon Ewing <brandon.ewing@warningg.com>
@discordianfish
Copy link
Member

The actual underlying stored kernel value for ad_select is an integer, but bond_opt_get_val is used to resolve that integer 0 to the string "stable" via a constant lookup table bond_ad_select_tbl, and then both are present in the sysfs file. Should we return the integer, just the string, or a generic bond_options struct with both values?

I'd say just the string but I wonder why they expose both the string and the int in procfs and if we should take that decisiopn into account here..

bewing added a commit to bewing/node_exporter that referenced this pull request Apr 11, 2022
As part of change prometheus/procfs#439, update fixtures in
node_exporter to expose all referenced interfaces in /sys/class/net

This was naively accomplished by blindly copying eth0 data to other
physical interfaces, but that does not appear to have resulted in any
failing tests.  Once we begin to actually export bonding/LACP metrics,
additional modification of fixtures may be required.

Signed-off-by: Brandon Ewing <brandon.ewing@warningg.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants