Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of duplicate ifDescr in provided config #37

Closed
knweiss opened this issue Jun 3, 2016 · 24 comments
Closed

Improve handling of duplicate ifDescr in provided config #37

knweiss opened this issue Jun 3, 2016 · 24 comments

Comments

@knweiss
Copy link

knweiss commented Jun 3, 2016

I'm testing snmp_exporter on a HP ProCurve 3500yl-48G switch with firmware K.16.01.0004 and ROM K.15.30 and noticed the following periodic warning in the Prometheus log:

time="2016-06-03T14:48:35+02:00" level=warning msg="Error on ingesting samples with different value but same timestamp" numDropped=23 source="scrape.go:470"

Turning on -log.level=debug shows the reason: There are duplicate interfaces with the name Switch loopback interface and this triggers 23 debug-level log entries:

time="2016-06-03T15:01:35+02:00" level=debug msg="Sample discarded" error="sample with repeated timestamp but different value" sample=ifAdminStatus{ifDescr="Switch loopback interface", instance="192.168.59.100", job="snmp"} => 1 @[1464958894.34] source="scrape.go:460" 
time="2016-06-03T15:01:35+02:00" level=debug msg="Sample discarded" error="sample with repeated timestamp but different value" sample=ifMtu{ifDescr="Switch loopback interface", instance="192.168.59.100", job="snmp"} => 65535 @[1464958894.34] source="scrape.go:460"
[...]

ifNumber is 54:

# snmpwalk -v2c -c public 192.168.59.100 1.3.6.1.2.1.2.1
IF-MIB::ifNumber.0 = INTEGER: 54

However, ifDescr shows 61 interface descriptions and the last eight are identical:

# snmpwalk -v2c -c public 192.168.59.100 1.3.6.1.2.1.2.2.1.2
IF-MIB::ifDescr.1 = STRING: 1
IF-MIB::ifDescr.2 = STRING: 2
[...]
IF-MIB::ifDescr.48 = STRING: 48
IF-MIB::ifDescr.49 = STRING: A1
IF-MIB::ifDescr.50 = STRING: A2
IF-MIB::ifDescr.51 = STRING: A3
IF-MIB::ifDescr.52 = STRING: A4
IF-MIB::ifDescr.578 = STRING: DEFAULT_VLAN
IF-MIB::ifDescr.4800 = STRING: Switch loopback interface
IF-MIB::ifDescr.4801 = STRING: Switch loopback interface
IF-MIB::ifDescr.4802 = STRING: Switch loopback interface
IF-MIB::ifDescr.4803 = STRING: Switch loopback interface
IF-MIB::ifDescr.4804 = STRING: Switch loopback interface
IF-MIB::ifDescr.4805 = STRING: Switch loopback interface
IF-MIB::ifDescr.4806 = STRING: Switch loopback interface
IF-MIB::ifDescr.4807 = STRING: Switch loopback interface

Is there a nice workaround to prevent these Prometheus warnings?

@brian-brazil
Copy link
Contributor

The default configuration provided presumes that ifDescr is unique for a given switch. If you remove the lookups from those parts of the config it'll just use the index instead.

@knweiss
Copy link
Author

knweiss commented Jun 3, 2016

Thanks, this works. I've cloned the "default" module as "procurve" and removed the lookups entries.

@brian-brazil
Copy link
Contributor

Thinking a bit I believe I've got a solution.

What we want is to use the ifDescr as a label as it's really useful. However it's not unique on all devices.

How does supporting something like "$ifDescr ($index)" and using it in the example config sound?
@SuperQ @RichiH

@brian-brazil brian-brazil changed the title ProCurve: Sample with different value but same timestamp Improve handling of duplicate ifDescr in provided config Jun 6, 2016
@SuperQ
Copy link
Member

SuperQ commented Jun 6, 2016

I would prefer these to be separate labels for ifDescr and index.

@brian-brazil
Copy link
Contributor

brian-brazil commented Jun 6, 2016

That's a bit user-unfriendly, as then every expression needs to deal with both those labels being present.

@RichiH
Copy link
Member

RichiH commented Jun 6, 2016

I would prefer to have them in two distinct labels, as well. The problem is that I see @brian-brazil 's point; in the generic case, you will always expect every tuple of instance and ifDescr to be unique. If, admittedly shoddy, hardware makes Prometheus break this expectation, things might go awry elsewhere.

On the other hand, having ifDescr's name behave differently than all other labels also breaks expectations.

As a compromise of sort, I would lean towards only falling back to $ifDescr ($index) if and only if there's more than one ifDescr. That's still icky, but less icky and unique both ways.

@brian-brazil
Copy link
Contributor

Basically, if we have index as a label it will be the only label as it's sufficient to uniquely describe the time series.

That's still icky, but less icky and unique both ways.

I'd presume there's hardware out there where the interface are dynamic, and thus that wouldn't work out.

@RichiH
Copy link
Member

RichiH commented Jun 6, 2016

So you mean if the reply is

IF-MIB::ifDescr.52 = STRING: A4
IF-MIB::ifDescr.578 = STRING: DEFAULT_VLAN
IF-MIB::ifDescr.4800 = STRING: Switch loopback interface

we get

foo{ifDescr="A4"}
foo{ifDescr="DEFAULT_VLAN"}
foo{ifDescr="Switch loopback interface"}

and if we receive

IF-MIB::ifDescr.52 = STRING: A4
IF-MIB::ifDescr.578 = STRING: DEFAULT_VLAN
IF-MIB::ifDescr.4800 = STRING: Switch loopback interface
IF-MIB::ifDescr.4801 = STRING: Switch loopback interface
IF-MIB::ifDescr.4802 = STRING: Switch loopback interface
IF-MIB::ifDescr.4803 = STRING: Switch loopback interface
IF-MIB::ifDescr.4804 = STRING: Switch loopback interface

we get

foo{ifDescr="A4"}
foo{ifDescr="DEFAULT_VLAN"}
foo{ifDescr="4800"}
foo{ifDescr="4801"}
foo{ifDescr="4802"}
foo{ifDescr="4803"}
foo{ifDescr="4804"}

correct?

As an alternative we might look at

foo{ifDescr="OID 4800"}

@brian-brazil
Copy link
Contributor

No, as that'd change the labels for the 4800 timeseries. It'd have to be some value that always contained both the id and the ifDescr.

@RichiH
Copy link
Member

RichiH commented Jun 6, 2016

I think what you're saying is my initial interpretation. So you mean:

foo{ifDescr="Switch loopback interface (4800)"}

which looks fine to me, provided that

foo{ifDescr="DEFAULT_VLAN"}

stays the same and does not become

foo{ifDescr="DEFAULT_VLAN (578)"}

@brian-brazil
Copy link
Contributor

No, we'd always need to do DEFAULT_VLAN (578) as another DEFAULT_VLAN might come along at any time and we need the old one to keep having the same labels.

@RichiH
Copy link
Member

RichiH commented Jun 6, 2016

That would break Cisco IOS unless you set a specific option. IOS counts physical and then virtual interfaces up during startup. Add interfaces during runtime and it counts up sequentially from there. Reboot the machine with different set up interface line cards, or virtual interfaces, and the OIDs are different.

snmp mib persist is used very often for this very reason, but you can't rely on it being set.

@cron2 this is to CC you

@RichiH
Copy link
Member

RichiH commented Jun 6, 2016

To make the above explicit, this is why network operators (of non-broken hardware) rely on ifDescr as identifier. I suspect I don't have to explain why having the label change arbitrarily is a Bad Thing in Prometheus.

@SuperQ
Copy link
Member

SuperQ commented Jun 6, 2016

It may be more "user unfriendly" but I feel I've been bitten too many times by denormalized labels and have to resort to regexps too often. This to me is more "user unfriendly".

@SuperQ
Copy link
Member

SuperQ commented Jun 6, 2016

Yes, for "non-broken hardware", users can rely on the ifDescr alone and ignore the index label.

@brian-brazil
Copy link
Contributor

By the SNMP RFCs, is a device with duplicate ifDescr's broken?

@brian-brazil
Copy link
Contributor

@RichiH Are there any disadvantages to setting snmp mib persist?

@RichiH
Copy link
Member

RichiH commented Jun 6, 2016

@brian-brazil Consensus amongst a bunch of networking people in #networker is "it's most likely forbidden by RFCs to have the same name, but we are not sure"; as there's a bunch of applicable RFCs, no one was motivated to really drill down.

That being said, strong consensus was that the HP hardware is behaving badly and in an unexpected way.

There's not downside to setting snmp mib persist, but you can't expect it to be set in all cases. Plus, if you replace hardware, that's an internal storage and not in the config you can (easily) back up, so a hardware replacement will reset the OIDs.

Thus, consensus is that it's better to design for the common case "no snmp mib persist (or non-Cisco equivalents) than for outliers like the HP hardware". Conditional, dynamic suffixes of ($OID) or conditional, dynamic labels of index (or something less generic than index) would be better.

@RichiH
Copy link
Member

RichiH commented Jun 6, 2016

And if conditional is not an option, I strongly prefer @SuperQ 's approach.

@brian-brazil
Copy link
Contributor

It sounds then that what we're currently doing is the least-worst choice.

@RichiH
Copy link
Member

RichiH commented Jun 6, 2016

A work-around for @knweiss is most likely appreciated by him, but with my networking hat on, the default should be to not modify ifDescr in any way.

@Svedrin
Copy link

Svedrin commented Jul 19, 2016

I'm running into the same issue, and "putting stuff in strings" sounds exactly like what I hated most about Nagios: Stringly-typed everything. I'd also vote for keeping the fields separately, and not modifying the ifDescr.

@brian-brazil
Copy link
Contributor

As indicated above, that doesn't work for Cisco devices out of the box.

Our problem here is that there is no standard or structure we can work off, so we have no choice but to use strings.

@brian-brazil
Copy link
Contributor

This is as good as it gets unfortunately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants