What to do on Error: Unexpected close #151
Comments
Please post RabbitMQ log files and a script that you use to reproduce.
It is difficult to recreate an unexpected close of the socket without the script blowing up, but here is the best I have got on a Friday evening:
If we call this script (sorry for the coffeescript) then we get:
Then we call this script:
we get:
So the core issue is that 100 messages are sitting in a buffer somewhere. I have written auto-reconnect code that can handle the Unexpected close and reopen the AMQP connection, but without being able to verify that a message was not received by Rabbit, I cannot try to resend it. The core problem is that when I call the …
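For reference, the reconnect side of this looks roughly like the sketch below. The `backoffMs` and `connectWithRetry` names are my own, not part of amqplib's API; only the capped-backoff helper is concrete, the loop itself is left as comments since it needs a live broker:

```javascript
// Capped exponential backoff: attempt 0 -> 500 ms, 1 -> 1 s, ... up to 30 s.
// Hypothetical helper, not part of amqplib.
function backoffMs(attempt, baseMs = 500, capMs = 30000) {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Sketch of a reconnect loop (assumes `amqp = require('amqplib')`):
// function connectWithRetry(amqp, url, attempt = 0) {
//   return amqp.connect(url).catch(function () {
//     return new Promise(function (resolve) {
//       setTimeout(resolve, backoffMs(attempt));
//     }).then(function () { return connectWithRetry(amqp, url, attempt + 1); });
//   });
// }
```

Reconnecting is the easy half; as described above, knowing which messages to resend after reconnecting is the hard half.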
What is in the RabbitMQ logs? Given that it happens when or around the time …
I ran the scripts again and got the same results; the trace shows only about 800 messages being received before the connection ends. Is there a way in amqp.node to be notified that a message was not sent?
The RabbitMQ log:
The RabbitMQ trace (this, 803 times):
Absolutely: just like regular I/O, messages are buffered, and if they can't be written to the connection, they have to be thrown away. You can use a …
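To make the buffering behaviour concrete: in amqplib's callback-style API, `sendToQueue` and `publish` return a boolean like `stream.write`, and the channel emits `'drain'` when its buffer empties. A backpressure-aware writer could look like this sketch (`publishAll` is a hypothetical helper, not part of the library):

```javascript
// Publish a batch of messages, pausing whenever the channel's write buffer
// fills (sendToQueue returns false) and resuming on the 'drain' event.
// `publishAll` is an illustrative helper, not amqplib API.
function publishAll(channel, queue, messages, done) {
  let i = 0;
  function write() {
    while (i < messages.length) {
      const ok = channel.sendToQueue(queue, Buffer.from(messages[i]));
      i++;
      if (!ok) {
        channel.once('drain', write); // resume once the buffer empties
        return;
      }
    }
    done();
  }
  write();
}
```

This respects backpressure, but as noted above it still cannot tell you whether a buffered message actually reached the broker before a close.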
Is there a way to get feedback on which messages make it to the queue, or which ones get thrown away, so that my application code can retry posting them?
Confirmation channels have a similar issue to the above: if the connection dies at just the right time, the confirmation of delivery is lost, and if the message is retried it is sent twice. Given I am writing an "at-most-once" kind of system, I was hoping for a cheap client approach to finding most of the failed messages.
In principle the client library could tell you when it writes a message to the socket buffer, but that's the best it can do, I think. There's no guarantee that written to the socket means received by the broker (or even transmitted to the broker), of course. But you want to know which messages were definitely not sent. How would you expect (or imagine) this to be represented in the API?
Could the … ? Then the only problem occurs when a message is written to the socket but never makes it to Rabbit. This would hopefully reduce the number of unsent messages.
I have a similar, or maybe the same, problem: if the channel is closed (because of a connection problem), then publish does not invoke the callbacks for unacknowledged messages.
@chiarishow: the callback on a confirmation channel may or may not fire when the publish happens close in time to a connection problem. At that point you either resend the message, if you are a "send-at-least-once" system, or don't, if you are a "send-at-most-once" system. This issue is about the library telling the client when it knows a message was NOT sent, instead of telling the client only when it knows a message WAS sent.
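One cheap client-side approach is bookkeeping around the confirm callbacks: track every message until its callback fires, and treat whatever is still tracked when the connection dies as "fate unknown". A minimal sketch (`ConfirmTracker` is a made-up name, not amqplib API):

```javascript
// Track published messages until their confirm callback fires. Anything
// still pending when the channel/connection closes has an unknown fate:
// it may or may not have reached the broker. Hypothetical helper class.
class ConfirmTracker {
  constructor() {
    this.pending = new Map(); // id -> payload, in publish order
  }
  sent(id, payload) {
    this.pending.set(id, payload);
  }
  confirmed(id) {
    this.pending.delete(id); // broker acked (or nacked) this message
  }
  // Called from the 'close'/'error' handler: ids with no ack either way.
  unresolved() {
    return Array.from(this.pending.keys());
  }
}
```

For an at-most-once system you would simply drop the unresolved set; for at-least-once you would republish it after reconnecting and accept possible duplicates.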
Yeah, it would be very useful to know if a message was not sent.
I put up a probably horrible solution for my problem: I modified the `C.toClosed` function of `channel.js` so that the callbacks of unacknowledged messages are fired when the channel is closed.

```js
C.toClosed = function(capturedStack) {
  this._rejectPending();
  invalidateSend(this, 'Channel closed', capturedStack);
  this.accept = invalidOp('Channel closed', capturedStack);
  this.connection.releaseChannel(this.ch);
  // Fail any outstanding confirm callbacks instead of silently dropping them
  this.unconfirmed.forEach(function sendFail(callbackUnconfirmed) {
    if (callbackUnconfirmed !== null) {
      callbackUnconfirmed(new Error("Channel closed, no ack received"));
    }
  });
  this.emit('close');
};
```

Of course, this is not a perfect solution because, as grahamjenson said, the connection can die while the producer is waiting for the ack (but that's a rare case). I don't know if there's a better way to know that a message was not sent.
@chiarishow That is not too bad, although usually you would be doing your own bookkeeping, and would therefore know that if the callback isn't called, the message wasn't acknowledged. There is still the case of messages that could not have been acked because they were never transmitted to RabbitMQ.
It could, but that's third in line for a … Maybe such a callback could be a publish option; a bit icky, but not without some precedent, and I struggle to think of where else to put it other than in another argument.
@squaremo Well, you are right, but usually callbacks are always called. So in order to understand whether a message was acknowledged or not, I must listen to the channel "close" event, and this is really tricky... Thank you anyway.
@squaremo Why not use the callback given to the publish method for more than just confirm channels? E.g. the err parameter could be used to distinguish between a nack and a failure to send. So when you call … This would not break the current API, and it gives the option to listen for when messages fail.
I totally agree with @grahamjenson's solution.
@grahamjenson @chiarishow Yes, I think it is a simple and good idea. Just to run this thought by you: it may be surprising to current users if they start getting what look like nacked messages but are in fact untransmitted messages. Nacks are supposed to be quite rare and indicate a nasty failure, so some people may take drastic measures when they receive them. So: instead of using the error type to distinguish between the failure cases, what about an extra argument to the callback?
A normal channel might take the same callback, with the expectation that nack will never be given a value. Alternatively, maybe the third argument could be "sent"? (It's a bit of a shame they end up in the "wrong order". Oh well.)
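To make the proposed shape concrete, here is a sketch of how a client might classify outcomes under that hypothetical `(err, sent)` callback; this is not the amqplib callback signature at the time of this thread:

```javascript
// Classify a publish outcome under the proposed (err, sent) callback shape.
// Hypothetical API sketch: amqplib's confirm callback did not carry a
// `sent` flag when this was discussed.
function classifyResult(err, sent) {
  if (!err) return 'acked';
  // Untransmitted messages are safe to retry; a real nack signals a
  // broker-side failure and may deserve more drastic handling.
  return sent ? 'nacked' : 'untransmitted';
}
```

Keeping "untransmitted" distinct from "nacked" matters precisely because, as noted above, nacks are rare and some applications react to them drastically.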
Well, it's a good solution too!
If it doesn't break the existing API and solves the problem I am happy :) |
I think I have a similar issue, but in my case I am reading from a message queue rather than publishing to it.
Reading and understanding this whole thread will take a while, but I will try; otherwise any insights/comments would be useful. This is the first time I have seen this error with this code. I am using …
Help me!

```
events.js:182
Error: Unexpected close
```
Same for me, please help! Node v6.11 and amqplib v0.5.1.
@imran-uk @vietlv @appableDev This went away after I adjusted the HAProxy timeout config; refer to …
There is no universal advice on what to do when a TCP connection to RabbitMQ or an intermediary is closed. The best piece of advice is: first understand why it happened, as best you can. @squaremo There are no specific improvements suggested in this thread and the underlying reasons are plenty. I think this should be closed as a question.
Actually, I'm going to post an edited and expanded copy of that response, since many never follow any links or read all comments, unless it's a wall of text you cannot possibly miss.

### What Do I Do When a Client Connection to RabbitMQ is Interrupted?

This thread seems to be a honeypot for all socket operation (in particular read) failures, and many …

### Missed (Client) Heartbeats

The first common reason is missed heartbeats detected by RabbitMQ. When this happens, RabbitMQ will add a log entry about it and then close the connection, per specification requirements.

With clients where I/O operations are not concurrent with consumer operations (I believe this is the case here), if consumer operations take longer than the heartbeat timeout, RabbitMQ will detect missed client heartbeats. Disabling heartbeats may …

### An Intermediary Closes "Inactive" TCP Connections

The second common reason: the TCP connection is closed by an intermediary (e.g. a proxy or load balancer). If you see … in the logs, it means that the client's TCP connection was closed before the AMQP 0-9-1 (this client's) connection was. Sometimes this is harmless and means that apps do not close connections before they terminate. Not very nice, but it has no functional downsides. Such log entries could also indicate a failing client application process (irrelevant in this thread) or, quite commonly, a proxy closing the connection. Proxies and load balancers have TCP connection inactivity timeouts (mentioned in the Heartbeats guide). They often range from 30s to 5m.

### Other Connection Lifecycle Log Entries

The entries below are not necessarily related to failed socket writes, but it's worth explaining them. If you see just the following in RabbitMQ logs (without any mention of heartbeats, unexpectedly closed TCP connections, connection errors, or timed-out socket writes on RabbitMQ's end), it means that a client connection was cleanly and successfully closed, and it was the application that initiated it. This entry:

… means that RabbitMQ attempted to write to a socket but the operation timed out. If you see this …

### Other Possible Reasons? TCP Connections Can Fail

In most other cases, a failed socket write is just that: a failed socket write. Network connections can fail or degrade. Neither this client nor RabbitMQ can avoid that. This is exactly why messaging protocols such as AMQP 0-9-1, STOMP, and MQTT have introduced heartbeats, which in this client don't serve their purpose very well.

### An Alternative with Unfortunate Defaults: TCP Keepalives

TCP keepalives can be used as an alternative (and were meant to be, if it wasn't for Linux defaults that were great in the 1990s but not any more).

### Connection Recovery

Automatic connection recovery is a feature several RabbitMQ clients have supported for years, e.g. the Java client and Bunny. This is considered to be a solved problem by Team RabbitMQ, …

Hopefully this explains what may be going on here and why this issue cannot be once and for all addressed by this client, although a functioning concurrent heartbeat implementation would help a lot, as it does in other clients.
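As a practical note on heartbeats: amqplib lets you set the heartbeat interval (in seconds) via a query parameter on the connection URL, e.g. `amqp.connect('amqp://localhost?heartbeat=30')`. A small helper to append it (the helper name is mine; only the `heartbeat` query parameter is amqplib's):

```javascript
// Append a heartbeat interval (seconds) to an AMQP connection URL.
// amqplib reads the `heartbeat` query parameter from the URL; the helper
// itself is just illustrative string handling, not library API.
function withHeartbeat(url, seconds) {
  const sep = url.includes('?') ? '&' : '?';
  return url + sep + 'heartbeat=' + seconds;
}
```

Pick a value comfortably below any proxy or load balancer idle timeout, so dead peers are detected before the intermediary closes the connection.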
Same issue for me; all of a sudden it's causing server restarts, although I have guarded with `.on('error', ...)` checks in all places, as per the documentation. NOTE: Is it right to throw the err in the 'error' event handler? In the documentation I can see it is always just logged and not thrown back to the caller... is this what I am doing wrong?
Please help!
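On the "is it right to throw?" question: if an `'error'` event has no listener, Node's EventEmitter throws it and crashes the process, and re-throwing from inside the handler has the same effect. Logging (and then reconnecting) keeps the process alive. A minimal sketch (`guard` is a made-up helper name):

```javascript
// Attach an 'error' listener so the emitter (an amqplib connection or
// channel, both of which are EventEmitters) cannot crash the process via
// an unhandled 'error' event. Hypothetical helper, not library API.
function guard(emitter, label, log = console.error) {
  emitter.on('error', function (err) {
    log(label, err.message); // log and carry on; do not re-throw here
  });
  return emitter;
}
```

Usage would be along the lines of `guard(connection, 'connection')` and `guard(channel, 'channel')` right after creating each, with reconnection logic triggered from the `'close'` event rather than by throwing.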
### TL;DR

Q1. What to do on unexpected close?
Q2. How to know which messages were sent and which were not?
A bug occurs when an "Error: Unexpected close" happens: the error is only reported to the `on('error', ...)` handler. The messages are added before the error occurs, so there is no way to handle it while writing the messages.
I think this happens because the stream/socket buffer is not empty, and when the stream dies the messages are lost with no feedback to the user.
Is there any way to find out what messages did not make it onto the queue, so that once reconnected they can be re-published?