Timeout with inputs.snmp polling squid-cache #9286
Comments
Using the same telegraf config, but outside the docker container seems to work as expected. So it seems there is some relation with docker. |
In that case, it seems more relevant to create an issue here: https://github.com/influxdata/influxdata-docker But looking at your stack trace, do you also have the snmp_trap plugin configured? As it seems that is the one sending this panic. And what version of telegraf does the container have? |
Inside the container: I have no name!@TelegrafHost:/$ telegraf --version The stack trace is indeed from the snmp_trap plugin being unable to bind to the socket during the --test run, which makes sense because the socket is already in use by the main process. I would agree with you that it's a Docker-only thing if an snmpget inside the container had the same issue, but snmpget works fine and telegraf does not. Weird. |
Could you update the output with the results of a test without snmp_trap plugin configured? |
Without it, no panic occurs. I updated the report to just the timeout error. Should I file a new bug report for the SNMP trap plugin panic? It's not really bothering me. |
That would be nice. The panic should not occur just because the port is already used for listening; an error message would be expected in that case. About your current issue: while I still think it is a Docker-related problem (since you said it works perfectly when not running in Docker) and should be filed at the correct repository, I saw this, which might be a clue: So you are sending the SNMP request to 192.168.3.1, but the response comes from a totally different IP. That might be the reason why it is not working. Do you have an idea where this comes from? And does that also happen when running telegraf on the host instead of in a container? |
You have a good point there. The reply isn't properly NATed. Telegraf does the right thing here and rejects the packet, while snmpget does not seem to care that the reply is from a different IP address. This page more or less confirms that. The Docker container has IP 172.18.0.5/16, and its gateway to the outside world is 172.18.0.1. This is definitely a Docker issue. Sorry for wasting your time. |
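For anyone wondering how two SNMP clients can disagree about the same reply: one plausible explanation is the difference between a connected and an unconnected UDP socket. A connected socket only accepts datagrams whose source matches the address it was connected to; an unconnected one accepts replies from anywhere. The sketch below is a minimal illustration using only the Python standard library, not Telegraf's or net-snmp's actual code. The `poll` helper is hypothetical, and the "wrong" reply comes from a different loopback port rather than a different IP so that it runs on a single machine; the kernel filter covers both IP and port.

```python
import socket

LOOPBACK = "127.0.0.1"


def poll(connected: bool, timeout: float = 0.5) -> bool:
    """Send one request; report whether a reply from a mismatched source arrives.

    Hypothetical helper for illustration only.
    """
    # The "agent" socket receives the request.
    agent = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    agent.bind((LOOPBACK, 0))
    agent.settimeout(timeout)
    agent_addr = agent.getsockname()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(timeout)
    try:
        if connected:
            # Connected socket: the kernel remembers the peer address and
            # drops datagrams coming from any other source.
            client.connect(agent_addr)
            client.send(b"request")
        else:
            # Unconnected socket: replies from any source are delivered.
            client.bind((LOOPBACK, 0))
            client.sendto(b"request", agent_addr)

        _, client_addr = agent.recvfrom(1024)

        # Reply from a *different* socket, standing in for a daemon that
        # answers from another interface than the one that was polled.
        other = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        other.bind((LOOPBACK, 0))
        other.sendto(b"reply", client_addr)
        other.close()

        try:
            client.recvfrom(1024)
            return True
        except socket.timeout:
            return False
    finally:
        client.close()
        agent.close()


if __name__ == "__main__":
    print("unconnected socket got the reply:", poll(connected=False))
    print("connected socket got the reply:  ", poll(connected=True))
```

On Linux one would expect the unconnected call to see the reply and the connected call to time out. In the connected case no socket matches the incoming datagram, so the kernel typically answers it with an ICMP port unreachable, which would be consistent with the ICMP error seen in the tcpdump from the original report.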
You're welcome. I'm closing this issue now, but please keep us in the loop so we also know what the solution/fix for this issue is/was. |
Hello! I recommend posting this question in our Community Slack or Community Page, we have a lot of talented community members there who could help answer your question more quickly. |
It seems that it isn't a Docker thing either. The Squid-Cache daemon was replying to the packet with a source IP chosen from the routing table, so it chose the docker_gwbridge interface (172.18.0.1) because of the request's source IP (172.18.0.5). I fixed it by explicitly configuring the SNMP IP address in squid.conf with the snmp_incoming_address option; this IP will then also be used for snmp_outgoing_address. The underlying question is whether an SNMP reply is required to use the IP address the request was received on as its source IP address. Squid-Cache and net-snmp don't think so; Telegraf does and enforces it. This seems a larger discussion. |
Indeed, I was also wondering whether telegraf was really doing the right thing. But I could not find any trace of gosnmp actually being changed on this topic; could you point me to the exact PR where this was changed? |
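To check which source address a host would pick on its own for a given destination (the behaviour described above for a daemon that does not pin its SNMP address), a small sketch like the following may help. It relies on the fact that calling `connect` on a UDP socket sends no packet but makes the kernel select a local address from the routing table. The function name `source_ip_for` is made up for this example, and the default port 161 is only the conventional SNMP port; Squid's SNMP port may be configured differently.

```python
import socket


def source_ip_for(dest_ip: str, port: int = 161) -> str:
    """Return the local IP the kernel would use as source to reach dest_ip.

    Hypothetical helper for illustration; no datagram is actually sent.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # For UDP, connect() only records the peer and picks a route.
        s.connect((dest_ip, port))
        return s.getsockname()[0]


if __name__ == "__main__":
    # On the Squid host, passing the poller's address (172.18.0.5 in this
    # thread) would show which interface address a routing-based reply uses.
    print(source_ip_for("127.0.0.1"))
```

If the address printed on the Squid host differs from the address Telegraf polls, a client that checks the reply's source will discard the answer, which is the situation the snmp_incoming_address setting avoids.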
I think PR#277 is the one. |
@reimda I think we still need to implement gosnmp's |
Same here. snmpwalk and snmpget from the console work, but within telegraf a timeout happens even if the hosts are in the same subnet and there is no NAT. It seems to be related to SNMP version 1; version 2c works as expected. But the device I must monitor only supports SNMP version 1, so what can I do? If I do a tcpdump I see that telegraf sends the request and also gets the correct response. Telegraf also listens on the correct ports, so it must be a software-related error inside telegraf :( I could also upload the pcap file if this would be helpful. Maybe this is related to #8271 |
Maybe this is also related: gosnmp/gosnmp#47 (comment) |
@reimda WDYT? |
@reimda? Maybe @MyaLongmire can also be of help here? |
Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Page. Thank you! |
Thank you all for addressing this problem, it works now, I'm able to query data:
|
@OliverTUBAF glad to hear it's working. Did you test a new version or the pr linked above? |
With the latest official release (1.21.4) it did not work; then I tried the PR build (1.22.0), and there it worked. |
@OliverTUBAF Thank you for the clarification! This pr is getting cleaned up and will hopefully be in the next release :) |
Relevant telegraf.conf:
System info:
Running Ubuntu in telegraf:latest docker image:
Docker
Steps to reproduce:
Expected behavior:
From the CLI inside the docker container snmpget works as expected.
Actual behavior:
docker logs show timeout errors on this specific agent
(lots of these)
Additional info:
Other inputs.snmp agents work; just this specific one times out.
a tcpdump gives this when telegraf polls:
but an snmpget from the CLI does not give the ICMP error. (telegraf stops listening at the port before the answer is received?)
test run output: