Error in plugin: performing bulk walk for field myfield-program: Request timeout (after 3 retries) #8271

Closed
waqaskhan137 opened this issue Oct 15, 2020 · 6 comments
Labels
area/snmp support Telegraf questions, may be directed to community site or slack

Comments

waqaskhan137 commented Oct 15, 2020

The SNMP command-line tools (snmpwalk, snmptable, and snmpbulkwalk) all work, but Telegraf's requests to the same agent time out.

Relevant telegraf.conf:

[agent]
    interval = "60s"
    debug = true
    hostname = "10.32.7.170"
    round_interval = true
    flush_interval = "10s"
    flush_jitter = "0s"
    collection_jitter = "0s"
    metric_batch_size = 1000
    metric_buffer_limit = 10000
    quiet = false
    logfile = "/var/log/telegraf/telegraf.log"
    omit_hostname = true

[[outputs.influxdb]]
    urls = ["http://influxdb:8086"]
    database = "monitoring"
    timeout = "0s"
    username = "admin"
    password = "P@ssword11"
    retention_policy = ""

[[inputs.snmp]]
  agents = ["udp://172.21.53.59:1061"]
  interval = "60s"
  timeout = "5s"
  version = 3
  sec_name = "nms-user"
  auth_protocol = "SHA"
  auth_password = "secret12"
  sec_level = "authPriv"
  priv_protocol = "AES"
  priv_password = "secret12"

  [[inputs.snmp.table]]
    name = "averageSpeedOfAnswerVdn"
    oid = "OVERSIGHT-MIB::myfield"

System info:

Telegraf 1.15.3

Docker

version: "3"
services:
  influxdb:
    container_name: nms-influxdb
    image: influxdb
    environment:
      - INFLUXDB_DB=monitoring
      - INFLUXDB_ADMIN_USER=admin
      - INFLUXDB_ADMIN_PASSWORD=Password
      -
    ports:
      - "8083:8083"
      - "8086:8086"
    volumes:
      - influxdb-data:/var/lib/influxdb
    restart: always

  grafana:
    container_name: nms-grafana
    image: grafana/grafana
    ports:
      - "3002:3000"
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning
    restart: always

  telegraf:
    container_name: nms-telegraf
    image: telegraf
    ports:
      - "162:162/udp"
    volumes:
      - ./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf
      - /var/run/docker.sock:/var/run/docker.sock
    restart: always

volumes:
  influxdb-data:

Steps to reproduce:

  1. Start Telegraf with the given configuration.
  2. Observe the logs.

Expected behavior:

It should collect the data and insert it into InfluxDB.

Actual behavior:

2020-10-14T21:51:00Z W! [inputs.snmp] Collection took longer than expected; not complete after interval of 1m0s
2020-10-14T21:51:22Z E! [inputs.snmp] Error in plugin: agent udp://172.21.53.59:1061: gathering table myfield: performing bulk walk for field myfield-program: Request timeout (after 3 retries)
2020-10-14T21:52:00Z W! [inputs.snmp] Collection took longer than expected; not complete after interval of 1m0s
2020-10-14T21:52:42Z E! [inputs.snmp] Error in plugin: agent udp://172.21.53.59:1061: gathering table averageSpeedOfAnswerVdn: performing bulk walk for field averageSpeedOfAnswerVdn-program: Request timeout (after 3 retries)
2020-10-14T21:53:00Z W! [inputs.snmp] Collection took longer than expected; not complete after interval of 1m0s

Additional info:

@waqaskhan137 waqaskhan137 added the bug unexpected problem or unintended behavior label Oct 15, 2020

reimda commented Oct 15, 2020

From the errors it looks like the snmp agent is not responding. You may want to make sure you can get the table using the snmptable command from net-snmp to confirm that the snmp agent is responding before trying to get it with telegraf.

It might also help to try again with telegraf but without docker to rule out docker as the problem.

@reimda reimda added support Telegraf questions, may be directed to community site or slack and removed bug unexpected problem or unintended behavior labels Oct 15, 2020
@waqaskhan137
Author

> From the errors it looks like the snmp agent is not responding. You may want to make sure you can get the table using the snmptable command from net-snmp to confirm that the snmp agent is responding before trying to get it with telegraf.
>
> It might also help to try again with telegraf but without docker to rule out docker as the problem.

Thank you for the reply, @reimda.

snmptable, snmpwalk, and snmpbulkwalk all work fine; the problem seems to occur only with Telegraf.
I also don't see any problem with Docker. The compose YAML file is included in the question above.

@affanshahid

I'm having the same problem. Running Telegraf in a Docker container causes: ...performing bulk walk for field ...: Request timeout (after 3 retries). If I remove Docker, however, everything works fine.

affanshahid commented Nov 30, 2020

Some more information: I can open bash inside the Telegraf container and run snmptable and snmpget against the agent just fine. The agent is running on the same machine. So snmpget works from the host machine and from inside the Telegraf container, but Telegraf itself fails with the above error.

Running Telegraf directly on the machine also works fine. I also tried adding the agent to the Telegraf container's network and using Docker's inter-container networking, and everything started working perfectly.

Edit: Running the agent on a separate machine also works fine, as does using https://github.com/qoomon/docker-host as a middleman.

@byrdchris

I had this exact issue on CentOS 7.9, outside of Docker.
Tests with Telegraf, snmpwalk, snmptable, etc. worked without issue, but every version of Telegraf from 1.14 to 1.16 hit the "[inputs.snmp] Collection took longer than expected;" warning after approximately 10-12 checks.

I adjusted timeouts, intervals, connection jitter, etc., and tested against a few agents, all with similar results.

I then tested the same configuration with SNMP v2 and have had zero failures.
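
For anyone who wants to try the same workaround, a minimal sketch of an SNMP v2c variant of the configuration from the original report might look like the following. The agent address and table definition are copied from the report. The community string "public" is a placeholder assumption that does not come from this thread, and the agent must be set up to accept v2c requests. Note that v2c has no authentication or encryption, so this may not be acceptable on every network.

```toml
[[inputs.snmp]]
  agents = ["udp://172.21.53.59:1061"]
  interval = "60s"
  timeout = "5s"

  # Assumption: use SNMP v2c instead of v3, so the sec_name, sec_level,
  # auth_* and priv_* options from the original config are dropped.
  version = 2

  # Assumption: placeholder community string; replace it with the value
  # configured on the agent.
  community = "public"

  [[inputs.snmp.table]]
    name = "averageSpeedOfAnswerVdn"
    oid = "OVERSIGHT-MIB::myfield"
```

If collection succeeds with this variant but fails with version = 3, that points at the v3 session handling rather than at basic network reachability.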

@telegraf-tiger
Contributor

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Page. Thank you!
