Queue#reset leaks consumer tags #40
Comments
Ack. We will apply this to the 0.7 branch soon.
Here you go 3d7f4d4, thanks for the patch. It shall be in the 0.7.2 release.
Michael's comment: "It is and it has to be handled carefully: #reset may be used by connection failure handlers and channel/connection-level error handlers. There is no guarantee that the AMQP connection or even the TCP connection is still open."
Michael, I'd presume that this should be solved in those error handlers (reconnect and THEN reset). Any reason why it shouldn't be done this way?
Re: Michael's comment: yeah, lately I've seen some circumstances where this can cause other problems. They arise from #unsubscribe sending an AMQP Basic.Cancel frame when it's not expected. For example, if you haven't yet subscribed (Declare + Consume), this causes the server to close the channel (and the next publish attempt will raise an AMQP::ChannelClosedError), which effectively means loss of all useful function in the most common ruby-amqp usage cases, sometimes without your knowing it. From the POV of a user of the ruby-amqp library, they'd have to catch this exception, re-initiate the connection, and then re-attach their subscribe block to the queue. Or maybe they could get away with re-allocating a primary channel, if no other state is screwed up, but that seems unlikely and impractical for users anyway.

This problem can be obscured when the connection is dropped after the Cancel is initiated but before it completes. I'm not certain offhand about the specific conditions, but I recall it looked like the frame(s) had buffered on the client side and then went out after the connection was re-established (but before my async connection_status handler was invoked and re-subscribed to the queue), thus causing the server to close the channel. In the cases where my async connection_status handler won the race, I could still be left in a broken state because my client thought it was subscribed but the server didn't (thus no Delivers).

The first case is, IMHO, a consequence of not properly tracking state: Queue#unsubscribe shouldn't send an errant frame to the server if the queue isn't currently subscribed. The latter is probably just an artifact of not clearing buffers when reconnecting.
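To illustrate the state-tracking point: the guard could be as simple as refusing to emit Basic.Cancel unless a consumer tag is actually held. This is a simplified stand-in model, not the real ruby-amqp classes or frame encoding:

```ruby
# Minimal model of the guard described above. FakeChannel/FakeQueue
# and the frame tuples are illustrative stand-ins for the ruby-amqp
# internals.
class FakeChannel
  attr_reader :frames_sent

  def initialize
    @frames_sent = []
  end

  def send_frame(frame)
    @frames_sent << frame
  end
end

class FakeQueue
  def initialize(channel)
    @channel = channel
    @consumer_tag = nil
  end

  def subscribe
    @consumer_tag = "consumer-#{object_id}"
    @channel.send_frame([:basic_consume, @consumer_tag])
  end

  # Only send Basic.Cancel if we actually hold a consumer tag.
  # An unguarded Cancel for an unknown tag is exactly the errant
  # frame that makes the server close the channel.
  def unsubscribe
    return unless @consumer_tag
    @channel.send_frame([:basic_cancel, @consumer_tag])
    @consumer_tag = nil
  end
end
```

With this guard, calling #unsubscribe before ever subscribing (or calling it twice) sends nothing, instead of provoking a channel-level exception.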
Though, I'll admit this has sometimes had a positive consequence: when the server drops a client because of a missed heartbeat during an over-delayed ACK response, the logical Publish response to the Deliver/block invocation is still sent after the connection is re-established. Still, this is more often a surprise, and I'd favor behaviour that's consistent and stable over behaviour that's sometimes good (resending logical responses) and sometimes bad (wedging the reactor, or my code sitting on top of it).
This mainly applies when reconnections occur without application restarts. It manifests for us in our monitoring systems, which expect N consumers of a particular queue to be present in RabbitMQ; N starts growing as transient network errors happen (due to heartbeats being enabled).
The solution is to call Queue#unsubscribe and Queue#cancelled inside Queue#reset.
Small patch here: cloudcrowd@417700e
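The shape of that fix can be sketched as follows. This is not the actual patch (see the commit above), just a simplified model of the bookkeeping, with illustrative class and counter names:

```ruby
# Sketch of the fix: Queue#reset cancels the server-side consumer
# and forgets the local consumer tag, so repeated reconnect/reset
# cycles don't accumulate consumers on the broker. SketchQueue is
# a stand-in, not the real ruby-amqp Queue.
class SketchQueue
  attr_reader :consumer_tag, :cancel_frames_sent

  def initialize
    @consumer_tag = nil
    @cancel_frames_sent = 0
  end

  def subscribe
    @consumer_tag ||= "consumer-#{rand(1_000_000)}"
  end

  # Sends Basic.Cancel for the current consumer, if any
  # (modelled here as a counter instead of a wire frame).
  def unsubscribe
    return unless @consumer_tag
    @cancel_frames_sent += 1
  end

  # Forgets the consumer tag locally, so a later subscribe
  # allocates a fresh one instead of leaking the old tag.
  def cancelled
    @consumer_tag = nil
  end

  # The patched reset: unsubscribe, then clear local state.
  def reset
    unsubscribe
    cancelled
  end
end
```

After a reset, re-subscribing allocates a fresh tag rather than leaving a stale consumer registered on the broker.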