Add qdisc collector for Linux#580
Conversation
This collector gathers basic queueing discipline metrics via netlink, similarly to what `tc -s qdisc show` does.
collector/qdisc_linux.go
Outdated
|
|
||
| type qdiscStatCollector struct { | ||
| bytes typedDesc | ||
| pkts typedDesc |
There was a problem hiding this comment.
No need to use shortened names here.
collector/qdisc_linux.go
Outdated
| []string{"iface", "kind"}, nil, | ||
| ), prometheus.CounterValue}, | ||
| pkts: typedDesc{prometheus.NewDesc( | ||
| prometheus.BuildFQName(Namespace, "qdisc", "pkts"), |
There was a problem hiding this comment.
Same here, no need to shorten packets.
collector/qdisc_linux.go
Outdated
| func NewQdiscStatCollector() (Collector, error) { | ||
| return &qdiscStatCollector{ | ||
| bytes: typedDesc{prometheus.NewDesc( | ||
| prometheus.BuildFQName(Namespace, "qdisc", "bytes"), |
There was a problem hiding this comment.
If these are really counters, we can call them XXXX_total.
|
@mdlayher Hey, do we have any way to mock netlink so we can have tests and end-to-end testing here? |
|
Yep! https://godoc.org/github.com/mdlayher/netlink/nltest The tests show how to use it: https://github.com/mdlayher/netlink/blob/master/nltest/nltest_test.go. The idea is that you check the parameters of the incoming message and pass back a message with whatever data you want via this mechanism. |
mdlayher
left a comment
There was a problem hiding this comment.
Very cool, and I'm thrilled to see my netlink package being used. Left a few comments and questions regarding netlink specifics.
collector/qdisc_linux.go
Outdated
| } | ||
|
|
||
| // See if_indextoname(3) | ||
| func ifIndexToName(index uint32) string { |
There was a problem hiding this comment.
Should be able to avoid Cgo by using: https://golang.org/pkg/net/#InterfaceByIndex.
There was a problem hiding this comment.
Indeed! Done: https://github.com/ema/qdisc/blob/master/get.go#L197
collector/qdisc_linux.go
Outdated
| } | ||
|
|
||
| // See https://tools.ietf.org/html/rfc3549#section-3.1.3 | ||
| func parseMessage(msg netlink.Message) (Metric, error) { |
There was a problem hiding this comment.
Since you're handling netlink messages and the interesting cases that come with that, it might be worth making a separate package like github.com/ema/qdisc and then importing that for node_exporter. I like to promote code reusability wherever possible.
collector/qdisc_linux.go
Outdated
| Flags: netlink.HeaderFlagsRequest | netlink.HeaderFlagsDump, | ||
| Type: 38, // RTM_GETQDISC | ||
| }, | ||
| Data: []byte{0}, |
There was a problem hiding this comment.
Any particular reason why you have to pass a one byte payload? IIRC this will get padded up to 4 null bytes too. Wouldn't an empty payload work fine for a message with flags request|dump?
There was a problem hiding this comment.
With an empty payload I get no response back.
collector/qdisc_linux.go
Outdated
| __u32 tcm_info; | ||
| }; | ||
| */ | ||
| m.IfaceIdx = nlenc.Uint32(msg.Data[4:8]) |
There was a problem hiding this comment.
You'll want to do check if len(msg.Data) < 20 or else any short messages will cause a panic. The attribute parsing routines will do bounds checking for you, so no need to check beyond that.
There was a problem hiding this comment.
- netlink-specific parts moved to github.com/ema/qdisc - avoid using shortened names - counters renamed into XXX_total
|
Tests still TBD. Thanks for the quick reviews! |
grobie
left a comment
There was a problem hiding this comment.
Please vendor your library with govendor.
Also it looks like you're ignoring the error in https://github.com/ema/qdisc/blob/ba575b63dffebb892b58c35c4124efe0ccb5bbbc/get.go#L205. What's the reason?
collector/qdisc_linux.go
Outdated
| } | ||
|
|
||
| for _, msg := range msgs { | ||
| // Only report root qdisc info |
There was a problem hiding this comment.
Prometheus comment guide line is to write them in full sentences ending on a period.
collector/qdisc_linux.go
Outdated
| ), prometheus.CounterValue}, | ||
| drops: typedDesc{prometheus.NewDesc( | ||
| prometheus.BuildFQName(Namespace, "qdisc", "drops_total"), | ||
| "Number of packets sent.", |
There was a problem hiding this comment.
The following help texts are copy&pasted and need to be adjusted. Ideally help texts don't just repeat the metric name, but provide actual additional information.
collector/qdisc_linux.go
Outdated
| ch <- c.drops.mustNewConstMetric(float64(msg.Drops), msg.IfaceName, msg.Kind) | ||
| ch <- c.requeues.mustNewConstMetric(float64(msg.Requeues), msg.IfaceName, msg.Kind) | ||
| ch <- c.overlimits.mustNewConstMetric(float64(msg.Overlimits), msg.IfaceName, msg.Kind) | ||
| //fmt.Printf("%+v\n", m) |
|
@grobie: thanks for your review! The error in ema/qdisc was ignored by mistake, that's fixed now. |
|
Cool. As @mdlayher said, it would be great to add a test for this collector. Otherwise this looks good enough for me, but I don't really know qdisc. |
collector/qdisc_linux.go
Outdated
| bytes: typedDesc{prometheus.NewDesc( | ||
| prometheus.BuildFQName(Namespace, "qdisc", "bytes_total"), | ||
| "Number of bytes sent.", | ||
| []string{"iface", "kind"}, nil, |
There was a problem hiding this comment.
Looks like most network collectors use the label "device" instead of "iface". Can I get a confirmation from maintainers?
There was a problem hiding this comment.
Yes, it looks like node_network_... uses device as the label. I would suggest using that for consistency.
|
Also this would need to be mentioned in the README. |
Update github.com/ema/qdisc dependency to revision 2c7e72d, which includes unit testing.
|
I've added unit tests for qdisc and end-to-end testing to the collector. Let me know if this looks sane! :) |
|
I don't like the inclusion of test code in the collector itself. In this case, it's probably better to write tests in |
|
@grobie the idea of implementing e2e tests that way came from the wifi collector: https://github.com/prometheus/node_exporter/blob/master/collector/wifi_linux.go#L296 I'm open to alternative suggestions, but I'd like to know why you dislike the approach in this case. Thanks! |
|
@ema thanks for the link. I believe runtime code should not include any test code. It's a smell of bad interfaces if that's necessary. This statement also applies to wifi_linux, I wasn't aware of that collector. Given we already include such testing style in master, I'm fine merging this PR for now. I'll open a new issue/PR to propose a different test setup. |
|
@grobie I agree that the current setup isn't ideal, and would be happy to hear any suggestions you might have in mind. Because the collector constructor functions can't accept any arguments, it can be difficult to test them as of now. |
|
The "factories" map expects any Let's tackle that in a follow up PR. 👍 |
* Add qdisc collector for Linux This collector gathers basic queueing discipline metrics via netlink, similarly to what `tc -s qdisc show` does. * qdisc collector: nl-specific code moved, names fixed - netlink-specific parts moved to github.com/ema/qdisc - avoid using shortened names - counters renamed into XXX_total * Get rid of parseMessage error checking leftover * Add github.com/ema/qdisc to vendored packages * Update help texts and comments * Add qdisc collector to README file * qdisc collector end-to-end testing * Update qdisc dependency to latest version Update github.com/ema/qdisc dependency to revision 2c7e72d, which includes unit testing. * qdisc collector: rename "iface" label into "device"
This collector gathers basic queueing discipline metrics via netlink,
similarly to what
tc -s qdisc showdoes.