
vsphere input - [inputs.vsphere] Unable to find metric name for id %d. Skipping!411 #7940

Closed
jstiops opened this issue Aug 5, 2020 · 9 comments

Comments

jstiops commented Aug 5, 2020

Relevant telegraf.conf:

[[inputs.vsphere]]
  vcenters = [ "https://hostname/sdk" ]
  username = ""
  password = ""

  interval = "60s"
  insecure_skip_verify = true
  force_discover_on_init = true

  # Exclude all historical metrics
  datastore_metric_exclude = ["*"]
  cluster_metric_exclude = ["*"]
  datacenter_metric_exclude = ["*"]

# Historical stats
[[inputs.vsphere]]
  vcenters = [ "https://hostname/sdk" ]
  username = ""
  password = ""

  interval = "300s"
  insecure_skip_verify = true
  force_discover_on_init = true
  object_discovery_interval = "600s"
  host_metric_exclude = ["*"] # Exclude realtime metrics
  vm_metric_exclude = ["*"] # Exclude realtime metrics

  use_int_samples = true

System info:

Using Telegraf v1.15.2 on Windows Server 2012 R2 x64, connecting to vCenter 6.7 U3 (latest patches, 6.7.0.44000).
The vCenter contains a single cluster with 3 hosts on ESXi 6.7 EP 09 (build 13644319).

Problem:

Telegraf started producing these errors in telegraf.log on every interval of the realtime stats collection. I have no idea why this started. While troubleshooting I checked my Grafana setup and saw that graphs filtering on clustername had stopped displaying data; the clustername was no longer present in my InfluxDB measurement. When I stopped filtering on clustername, the graph showed data again. It seems the clustername stopped being sent to the InfluxDB output as a tag at some point.

Other stats still seem to get pushed to InfluxDB.

2020-08-05T07:28:00Z I! [inputs.vsphere] Unable to find metric name for id %d. Skipping!411
2020-08-05T07:28:00Z I! [inputs.vsphere] Unable to find metric name for id %d. Skipping!411
2020-08-05T07:28:00Z I! [inputs.vsphere] Unable to find metric name for id %d. Skipping!411
2020-08-05T07:28:00Z I! [inputs.vsphere] Unable to find metric name for id %d. Skipping!411
2020-08-05T07:28:00Z I! [inputs.vsphere] Unable to find metric name for id %d. Skipping!411
ssoroka (Contributor) commented Aug 5, 2020

It looks like the plugin uses the counterId to look up the metric name from the VMware vSphere API and isn't finding anything, either because the ID isn't in the list or because the names aren't being returned; I'm not really sure. You said it worked previously? Was that with a different version of Telegraf, or something else?
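
For anyone curious, the lookup in question is roughly the following sketch using govmomi, the vSphere client library the plugin is built on. This is not the plugin's actual code, and the URL and credentials are placeholders; it just shows how counter IDs map to names and what "not found" means for an ID like 411.

package main

import (
	"context"
	"fmt"
	"net/url"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/performance"
)

func main() {
	ctx := context.Background()

	// Placeholder vCenter endpoint and credentials.
	u, err := url.Parse("https://user:pass@hostname/sdk")
	if err != nil {
		panic(err)
	}

	// Final argument true skips TLS verification (like insecure_skip_verify).
	c, err := govmomi.NewClient(ctx, u, true)
	if err != nil {
		panic(err)
	}

	// Pull the full performance counter metadata from vCenter.
	perf := performance.NewManager(c.Client)
	counters, err := perf.CounterInfo(ctx)
	if err != nil {
		panic(err)
	}

	// Build the id -> "group.counter.rollup" map used for name lookups.
	names := map[int32]string{}
	for _, ci := range counters {
		names[ci.Key] = fmt.Sprintf("%s.%s.%s",
			ci.GroupInfo.GetElementDescription().Key,
			ci.NameInfo.GetElementDescription().Key,
			ci.RollupType)
	}

	// 411 is the counter id from the log lines above.
	if name, ok := names[411]; ok {
		fmt.Println("counter 411 =", name)
	} else {
		fmt.Println("counter 411 is not in the metadata - the situation the plugin logs")
	}
}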

jstiops (Author) commented Aug 5, 2020

Grafana stopped displaying the clustername after I upgraded Telegraf to the latest release, but these errors were in the log even before that, so I'm not sure the errors are related to the clustername no longer being sent to InfluxDB as a tag.
I have to say, this is the only vSphere environment with this problem out of quite a few I run Telegraf against.
Is there a way I can debug the metric that's being looked up but not found, to see what the errors are about?
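
For reference, a generic way to get more detail (assuming nothing plugin-specific exists for this) is to turn on the agent's debug logging, which should surface the vsphere plugin's debug-level messages about object discovery and metric collection, e.g.:

[agent]
  debug = true
  logfile = "telegraf.log"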

ghost commented Aug 14, 2020

I just updated telegraf and I've been having the exact same issue: the "clustername" tag is skipped. It is either not gathered from the vSphere API or it is lost in translation and not forwarded to InfluxDB.

Is there any specific information I can collect to make troubleshooting easier?

prydin (Contributor) commented Aug 19, 2020

I will take a look at this. Looks like this is a regression in 1.15. Probably related to the issue with missing storage tags.

prydin (Contributor) commented Aug 19, 2020

As for the error messages, they are typically benign. It just means that a counter that vCenter said existed when we scanned the metadata ended up not existing when we pulled the actual data.

bimw520 commented Sep 7, 2020

I tested on vSphere 6.7.45000 and CentOS 7 x64.
After downgrading to telegraf.x86_64 0:1.14.5-1 on CentOS 7, the problem is resolved.
This is a temporary workaround.
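
For anyone needing the same stop-gap on CentOS 7: assuming the 1.14.5-1 package is still available in the configured InfluxData yum repository, the downgrade should be something like:

yum downgrade telegraf-1.14.5-1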

prydin (Contributor) commented Dec 3, 2020

@ssoroka This should be fixed.

@sumptersmartt

Is this fixed in PR #8505?

bimw520 commented Dec 18, 2020

Is this fixed in PR #8505?

yes
