3.4.4 -> 3.5.1 Timeout #68
Comments
3.5 uses timers for heartbeats. I suspect there's an interesting side-effect to that that we haven't foreseen. |
Can you please do a wire capture with Wireshark or our Tracer tool (writes to stdout)? Use a small heartbeat timeout value, e.g. 5 seconds. |
Tracer doesn't seem to want to cooperate:
I did find the following in the logs:
|
I could reproduce it but it's sporadic. Investigating. |
With this change the .NET client uses the same approach as the Java one.

Previously, in 3.5.x, we used two timers that re-scheduled themselves and used auto-reset events to be notified about connection activity on the I/O loop. This turned out to have some issues:

 * It was clever but fairly complicated
 * Picking the wrong heartbeat interval value led to sporadic failures due to race conditions [on the wire]
 * It didn't play well with socket timeouts

Our new strategy is more straightforward: the heartbeat writer uses a periodic timer that always sends a frame, regardless of recent activity, at half the timeout interval. Given that heartbeat timeouts are often in minutes, this is not unreasonable, and it is possibly the most straightforward implementation possible. It also doesn't suffer from race conditions on the wire: because a heartbeat frame is sent every half interval, slight inaccuracy in .NET runtime scheduling of timers won't get in our way with reasonable (> 1 second) heartbeat intervals.

The heartbeat "reader" now relies on the socket receive timeout, just like in the Java client. This is more straightforward, and there is no dissonance between I/O timeouts and the one we use for incoming heartbeats.

Various cases where a SocketException could indicate a problem other than a timeout (usually early in the connection lifecycle) are accounted for.

References #68.
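The strategy described in the commit message can be sketched roughly as follows. This is a minimal illustration in Java, not the client's actual code: the class `HeartbeatSketch` and the methods `writerPeriodMillis`, `startWriter` and `configureReader` are hypothetical names, and setting the socket read timeout equal to the heartbeat timeout is an assumption (the thread does not state the exact value the clients use).

```java
import java.net.Socket;
import java.net.SocketException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and values are illustrative assumptions.
class HeartbeatSketch {

    // Writer period: half of the negotiated heartbeat timeout.
    static long writerPeriodMillis(int heartbeatTimeoutSeconds) {
        return heartbeatTimeoutSeconds * 1000L / 2;
    }

    // Starts a periodic timer that always runs the given "send a heartbeat
    // frame" action, regardless of recent connection activity.
    static ScheduledExecutorService startWriter(int heartbeatTimeoutSeconds,
                                                Runnable sendHeartbeatFrame) {
        ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "heartbeat-writer");
                t.setDaemon(true); // do not keep the JVM alive for heartbeats
                return t;
            });
        long period = writerPeriodMillis(heartbeatTimeoutSeconds);
        timer.scheduleAtFixedRate(sendHeartbeatFrame, period, period,
                                  TimeUnit.MILLISECONDS);
        return timer;
    }

    // Reader side: rely on the socket receive timeout. A blocking read that
    // sees no data for this long throws java.net.SocketTimeoutException,
    // which the I/O loop can treat as a missed heartbeat.
    // Assumption: the read timeout equals the heartbeat timeout.
    static void configureReader(Socket socket, int heartbeatTimeoutSeconds)
            throws SocketException {
        socket.setSoTimeout(heartbeatTimeoutSeconds * 1000);
    }
}
```

With a 60 second timeout the writer fires every 30 seconds, so a timer that runs somewhat late still leaves a wide margin before the peer's timeout expires. With a timeout of 1 or 2 seconds the same scheduling error is a much larger share of the interval, which matches the remark about "reasonable (> 1 second)" values.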
Can someone please build the client from branch rabbitmq-dotnet-client-68 and give it a try? I believe it fixes the issue (there were a couple of issues, in fact). I'll leave a test running for 6-8 hours.
I'll give it a try tomorrow. Thanks for your help. |
I now get a different error, although it may be related: The |
That message means that the TCP connection was closed by the server: what's in the RabbitMQ log? Please always consult the RabbitMQ log when investigating a heartbeat issue. Any activity counts as a heartbeat, including acks. Values < 5 seconds are generally unnecessary and may result in false positives on slow networks, although I've added tests with values from 2 to 6.
I've just pushed something that should improve the situation for very low heartbeats (1-3 seconds). Please pull, rebuild and give it another try. |
The branch I'm pushing to is rabbitmq-dotnet-client-68. |
I was still able to reproduce timeouts after a longer period of time. Back to the drawing board. |
I just tried the update. From the log:
|
What you see is RabbitMQ closing the TCP connection due to a detected heartbeat timeout, after which the client fails a write.
With the most recent precision fix I've been running 500 connections with intervals from 1 to 7 seconds for about an hour now. @btecu we are ready for another round :) |
It works perfectly! Thanks. |
My tests have been running for a few hours and I haven't seen a single heartbeat timeout either. Will merge soon.
Rework heartbeat handling, fixes #68
I change the milestone to 3.6.0 because it was merged to |
So it won't make it into 3.5.2? |
Sorry, disregard my comment, my working copy of the stable branch was out-of-date. It will be in 3.5.2. |
It should be in stable already. MK
|
@michaelklishin Thanks for the help! |
I have a small wrapper using this library on .NET 4.5.
I've updated from 3.4.4 to 3.5.1, and now the connection gets closed and it crashes whenever I try to acknowledge a message:
AMQP close-reason, initiated by Library, code=0, text="End of stream"
I've found that this only happens when processing the message takes longer than the current RequestHeartbeat. If I increase it, Ack() works fine; otherwise the connection is closed and it throws.

Has anything changed related to this, or am I using RequestHeartbeat in the wrong way?