
Queue#reset leaks consumer tags #40

Closed
jpr5 opened this issue Mar 22, 2011 · 5 comments

@jpr5 (Contributor) commented Mar 22, 2011

This mainly applies when reconnections occur without an application restart. It manifests for us in our monitoring systems, which expect N consumers of a particular queue to be present in RabbitMQ: N starts growing as transient network errors happen (due to heartbeat being enabled).

The solution is to call Queue#unsubscribe and Queue#cancelled inside Queue#reset.

Small patch here: cloudcrowd@417700e
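
For illustration, a minimal sketch of the idea (the linked commit is the authoritative version; the instance variables and the re-initialize step are assumptions about the 0.7-era MQ::Queue internals):

```ruby
# A sketch, not the actual patch: cancel the old consumer before
# re-declaring, so the broker doesn't accumulate stale consumer tags.
class MQ
  class Queue
    def reset
      unsubscribe                     # ask the broker to drop the old consumer tag
      cancelled                       # clear local consumer-tag bookkeeping
      initialize(@mq, @name, @opts)   # re-declare from a clean slate (assumed internals)
    end
  end
end
```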

@michaelklishin (Member)

Ack. We will apply this to the 0.7 branch soon.

@ghost assigned michaelklishin Apr 17, 2011
@botanicus (Contributor)

Here you go: 3d7f4d4, thanks for the patch. It shall be in the 0.7.2 release.

@botanicus (Contributor)

Michael's comment: "It is and it has to be handled carefully: #reset may be used by connection failure handlers and channel/connection-level error handlers. There is no guarantee that AMQP connection or even TCP connection is still open."

@botanicus (Contributor)

Michael, I'd presume that this should be solved in these error handlers (reconnect and THEN reset). Any reason why it shouldn't be done this way?
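
(A sketch of that ordering, assuming a connection_status-style callback like the one jpr5 describes below; the callback name, the :disconnected symbol, and #reconnect are assumptions about the 0.7-era client, not its confirmed API.)

```ruby
AMQP.start do
  mq    = MQ.new
  queue = mq.queue("work")
  queue.subscribe { |header, payload| handle(header, payload) }

  AMQP.conn.connection_status do |status|
    next unless status == :disconnected
    AMQP.conn.reconnect   # 1. bring the TCP/AMQP connection back first
    queue.reset           # 2. only then reset: Basic.Cancel has a live socket
    queue.subscribe { |header, payload| handle(header, payload) }
  end
end
```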

@jpr5 (Contributor, Author) commented Apr 18, 2011

Re: Michael's comment - yeah, lately I've seen some circumstances where this can cause other problems.

Problems arise from #unsubscribe sending an AMQP Basic::Cancel frame when it isn't expected. For example, if you haven't yet subscribed (Declare + Consume), the cancel causes the server to close the channel (and the next publish attempt will raise an AMQP::ChannelClosedError), which effectively means loss of all useful function in the most common ruby-amqp usage cases, sometimes without the user knowing it. From the POV of a user of the ruby-amqp library, they'd have to catch this exception, re-initiate the connection, and then re-attach their subscribe block to the queue. Or maybe they could get away with re-allocating a primary Channel, if no other state is screwed up... but that seems unlikely, and impractical for users anyway.

This problem can get obscured when the connection is dropped after the Cancel etc. is initiated but before it completes. I'm not certain offhand about the specific conditions, but I recall that the frame(s) appeared to have buffered on the client side and then went out after the connection was re-established (but before my async connection_status handler was invoked and re-subscribed to the queue), thus causing the server to close the channel. In the cases where my connection_status handler won the race, I could still be left in a broken state, because my client thought it was subscribed but the server didn't agree (thus no Delivers).

The first case is, IMHO, a consequence of not properly tracking state -- Queue#unsubscribe shouldn't send an (errant) frame to the server if the queue isn't currently subscribed.
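
(A sketch of such a state guard; everything other than #unsubscribe itself is an assumption about the internals.)

```ruby
class MQ
  class Queue
    def unsubscribe(opts = {}, &blk)
      # Only emit Basic.Cancel when a prior subscribe actually allocated
      # a consumer tag; otherwise there is nothing for the broker to cancel.
      return unless @consumer_tag
      # ... existing logic: send Basic.Cancel for @consumer_tag,
      #     then clear it when Cancel-Ok arrives ...
    end
  end
end
```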

The latter circumstance is probably just an artifact of not clearing buffers when re-connecting. Though I'll admit this has sometimes had a positive consequence: when the server drops a client because of a missed Heartbeat during an over-delayed ACK response, the logical Publish response to the Deliver/block invocation is still sent after the connection is re-established. Still, this is more often a surprise, and I'd favor behaviour that's consistent and stable over behaviour that's sometimes good (resending logical responses) and sometimes bad (wedging the reactor, and my code sitting on top of the reactor).
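
(A sketch of the buffer-clearing behaviour argued for above; @send_buffer and the hook name are hypothetical, the point being that frames queued against a dead connection get discarded rather than flushed after reconnect.)

```ruby
module AMQP
  class Client
    # Hypothetical hook, called before re-establishing the connection.
    def clear_stale_frames
      # Drop frames buffered while the socket was down, so a stale
      # Basic.Cancel or Publish never fires against the new connection.
      @send_buffer.clear if @send_buffer
    end
  end
end
```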
