Allow configuring idle incoming connections timeout #330
Yes, the relay closes connections when they are "idle".
This was indeed necessary to prevent the relay from becoming unreachable due to (Windows) clients that opened a new connection for every metric and never closed any. Totally off-topic, but if you can send over UDP, that might be an option; the Linux kernel guarantees UDP packets arrive on localhost, IIRC.
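To make the idle-disconnect behaviour concrete, here is a minimal sketch of a server that drops clients after a period of silence. It is not carbon-c-relay's actual code (the relay is written in C); the names `handle_client` and `demo` and the 0.2 second timeout are assumptions chosen for illustration.

```python
import asyncio


async def handle_client(reader, writer, received, idle_timeout):
    """Read newline-terminated metrics; drop the client after idle_timeout seconds of silence."""
    try:
        while True:
            try:
                line = await asyncio.wait_for(reader.readline(), timeout=idle_timeout)
            except asyncio.TimeoutError:
                break  # nothing arrived within the idle window: give up on this client
            if not line:
                break  # the client closed the connection itself
            received.append(line.decode("ascii").rstrip("\n"))
    finally:
        writer.close()  # release the descriptor so leaked connections cannot pile up


async def demo(idle_timeout=0.2):
    """Send one metric, then stay silent until the server disconnects us."""
    received = []

    async def on_connect(reader, writer):
        await handle_client(reader, writer, received, idle_timeout)

    server = await asyncio.start_server(on_connect, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"test.metric 1 1700000000\n")
    await writer.drain()

    # The client never reads in normal operation; here we read only to observe
    # the EOF that the server's idle timeout produces.
    eof = await asyncio.wait_for(reader.read(), timeout=5)

    writer.close()
    server.close()
    return received, eof
```

A client that keeps its connection open but sends less often than the idle window will see exactly this kind of server-side close.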
Unfortunately, we cannot use UDP. We've also tested a setup where the client writes to carbon-relay, which forwards to carbon-c-relay on localhost, but carbon-relay also had trouble with the connection:
So for the moment, we've switched to carbon-relay-ng on the monitoring hosts, relaying to carbon-c-relay on the Graphite host. Unfortunately, our test environment did not experience this error, so it seems to be triggered only under production load.
I'd be happy to better understand what's going wrong here. Is it that the connection gets closed and the client (icinga2) is not very happy about this?
FYI: carbon-relay-ng does NOT time out connections at the moment, but it may get that feature at some point: grafana/carbon-relay-ng#250 (comment). If disconnects are the problem here, it means your problem would come back.
I can definitely look into making the timeout configurable. Regardless of whether you use c-relay, I think it is a good thing to do in general. The timeout the outgoing connections use can also be specified.
For Issue #330 it is useful to disable the idle disconnection logic, for it breaks the client on the next send it does. Using the -E flag, this behaviour can be disabled now, and as such the bad interaction avoided.
Hi,
There might be 100 other reasons why a connection gets broken. IMHO the sender should try to reconnect, but it seems the Icinga2 authors disagree with me. Try lowering your /proc/sys/net/ipv4/tcp_keepalive_time; that solves 90% of all network issues...
Thanks for your answer! Tried that, but it did not help me... anyway, that is not a big deal :)
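For reference, /proc/sys/net/ipv4/tcp_keepalive_time changes the default for every socket on the host. On Linux the same setting can be applied to a single socket instead. The sketch below shows one way to do that; the helper name `enable_keepalive` and the default values are illustrative assumptions. Note that keepalive probes carry no application data and never reach the receiving application, so this thread does not establish whether they have any effect on the relay's idle logic.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive for one socket (hypothetical helper, illustrative defaults).

    idle:     seconds of silence before the first probe (per-socket tcp_keepalive_time)
    interval: seconds between unanswered probes
    count:    unanswered probes before the kernel declares the connection dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained options are platform specific, so guard each one.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock
```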
Our setup:
The client application and relay1 are on the same host, as are relay2 and the carbon cache.
The client application monitors remote servers and generates on average 200M metrics per 24h. We noticed that the client frequently receives a "Connection refused" error while talking to relay1:
The carbon-c-relay process is running with the following configuration:
We sniffed the communication between the client and the relay and noticed that the relay closes the connection and, directly afterwards, the client tries to send data on the already closed TCP session, which gets only a [RST, ACK] reply from the network stack. We are also discussing this with the Icinga2 team in this issue.
From the discussion in issue #118 we gathered that there is an idle timeout in carbon-c-relay to safeguard the relay against possible connection leaks by clients.
Is it likely that the client doesn't handle the idle timeout correctly and this leads to our problem as described in the Icinga2 issue? And would it be possible to make the idle timeout configurable?
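As an illustration of what handling the idle timeout on the client side could look like, here is a minimal sketch of a sender that checks for a closed peer before each write and reconnects once if needed. This is not Icinga2's implementation; the class name `GraphiteSender` and its methods are hypothetical. It relies on the assumption that the relay never sends data back, so a readable socket means the peer closed the connection or an error occurred.

```python
import select
import socket


class GraphiteSender:
    """Hypothetical plaintext-protocol sender that survives server-side disconnects."""

    def __init__(self, host, port, timeout=5.0):
        self.addr = (host, port)
        self.timeout = timeout
        self.sock = None
        self.connects = 0  # number of connections opened so far

    def _peer_closed(self):
        # Assumption: the relay sends nothing, so "readable" means EOF or error.
        readable, _, _ = select.select([self.sock], [], [], 0)
        if not readable:
            return False
        try:
            return self.sock.recv(1, socket.MSG_PEEK) == b""
        except OSError:
            return True

    def _drop(self):
        if self.sock is not None:
            self.sock.close()
            self.sock = None

    def connect(self):
        """Open a connection if none exists or the existing one was closed by the peer."""
        if self.sock is not None and self._peer_closed():
            self._drop()
        if self.sock is None:
            self.sock = socket.create_connection(self.addr, timeout=self.timeout)
            self.connects += 1

    def send(self, path, value, timestamp):
        line = f"{path} {value} {timestamp}\n".encode("ascii")
        for attempt in (1, 2):
            try:
                self.connect()
                self.sock.sendall(line)
                return
            except OSError:
                self._drop()
                if attempt == 2:
                    raise  # second failure: let the caller decide what to do

    def close(self):
        self._drop()
```

A small race remains: if the relay closes the connection between the check and the write, the write can still succeed locally, the line is lost, and the error only surfaces on a later send. The check narrows that window but does not eliminate it.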