Telegraf writing metrics twice, but one is wrong? #2041
Comments
cc @phemmer |
I've just tried removing Relay altogether i.e. writing straight to the backends and the issue persists regardless. |
How are you generating these log files? Are you running Can you do an
|
Output is here. An abridged output for just the one interface I was using as an example is as follows:
Not in the config for that one device, no; that's its entire configuration. The intention is to build a resilient, load-balanced platform. I have dismantled this for the purposes of fault finding, and the setup is now pretty basic (Telegraf > InfluxDB).
It looks like this may be the case. Having taken the snmpwalk output, I can see it seems to be getting the incorrect value from the next object/branch (not sure on terminology) in the tree:
And when I isolate its output in snmpwalk:
ifName is identical. Is this plugin indexing on the value of the ifName field as opposed to the OID/ID (again, terminology)? |
The plugin internally indexes on the OID, so that is why both rows are present, and not merged together. However when reporting, unless you grab enough fields to ensure uniqueness, and use There is another feature request (#1948) to allow adding the OID index as a tag, which would also help differentiating two rows when all tags are identical. But for the time being, the only way to differentiate them would be to either add |
I get you. Whilst reading this, I noticed the following from the SNMP RFC:
The Cisco device I was looking at is doing just this. It was presenting multiple devices with the same ifName. In my specific use case, one solution is to add IF-MIB::ifType as a tag. The multiple devices which share the same name are indeed related, but are of varying interface types. I don't see any way of distinguishing the ifName values on the device itself, as it is effectively composed of more than one device behind the scenes, as we can see from SNMP. Thank you both for your efforts on this; I'll close this off. |
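For anyone hitting the same thing, here is a rough Python sketch of the collision described above. It is not Telegraf's actual code, just a model of the idea: rows are kept apart by OID index internally, but once reported they are identified only by their tags. The OID indexes (`100`, `101`), the `ifType` values, and the second counter value are made up for illustration; only the `ifName` and the first counter come from this report.

```python
from collections import defaultdict

# Hypothetical SNMP table rows keyed by OID index (illustrative values only;
# the ifName and the first counter are the ones quoted in this issue).
rows = {
    "100": {"ifName": "Vl351", "ifType": "propVirtual", "ifHCInOctets": 340772969979493},
    "101": {"ifName": "Vl351", "ifType": "l3ipvlan", "ifHCInOctets": 12345},
}


def group_by_tags(rows, tag_names):
    """Group OID indexes by the tag set that would identify each reported row."""
    series = defaultdict(list)
    for index, row in rows.items():
        key = tuple((name, row[name]) for name in sorted(tag_names))
        series[key].append(index)
    return dict(series)


def colliding(rows, tag_names):
    """Return only the tag sets shared by more than one OID index."""
    grouped = group_by_tags(rows, tag_names)
    return {key: idx for key, idx in grouped.items() if len(idx) > 1}


# With ifName as the only tag, both rows land on the same tag set.
only_name = colliding(rows, ["ifName"])
# Adding ifType as a second tag makes each row's tag set unique.
name_and_type = colliding(rows, ["ifName", "ifType"])

print(only_name)
print(name_and_type)
```

With only `ifName` as a tag, both OID indexes map to the same tag set, which matches the "written twice, once wrong" symptom. Adding `ifType` separates them, assuming the interface types really do differ on the device.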
Bug report
Relevant telegraf.conf:
System info:
Telegraf v1.1.0 (git: release-1.1.0 8ecfe13)
InfluxDB shell version: 1.0.2
InfluxDB Relay: latest
Steps to reproduce:
I have set up a system as described here:
Everything works as I expect, except that Telegraf appears to be writing a particular metric twice, once incorrectly.
It only happens in certain circumstances. Generally it affects certain switches with a particularly high counter value. In my example, we're interested in ifHCInOctets for host Edi.Core1, interface name (ifName) Vl351.
Expected behavior:
Telegraf writes a metric once, correctly. It should only write the following:
ifHCInOctets=340772969979493i
Actual behavior:
Telegraf writes a metric twice, once correctly and once with a false value:
Additional info:
These log files show the metric being written twice, in two separate writes; you can see that ifHCInOctets is duplicated:
If I continuously poll the relevant OID from the same machine that is running Telegraf, for however long, there is never an instance where the counter value deviates:
Let me know if you need any more information to help investigate this.