
Queue#reset leaks consumer tags #40

Closed
jpr5 opened this issue Mar 22, 2011 · 5 comments

@jpr5 (Contributor) commented Mar 22, 2011

This mainly applies when reconnections occur without an application restart. It manifests for us in our monitoring systems, which expect N consumers of a particular queue to be present in RabbitMQ: N starts growing as transient network errors happen (due to heartbeat being enabled).

The solution is to call Queue#unsubscribe and Queue#cancelled inside Queue#reset.

Small patch here: cloudcrowd@417700e
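
For illustration, a minimal sketch of the idea (the linked commit is the authoritative version; the instance variables and the re-initialize step are assumptions about the 0.7-era MQ::Queue internals):

```ruby
# A sketch, not the actual patch: cancel the old consumer before
# re-declaring, so the broker doesn't accumulate stale consumer tags.
class MQ
  class Queue
    def reset
      unsubscribe                     # ask the broker to drop the old consumer tag
      cancelled                       # clear local consumer-tag bookkeeping
      initialize(@mq, @name, @opts)   # re-declare from a clean slate (assumed internals)
    end
  end
end
```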

@michaelklishin (Member)

Ack. We will apply this to the 0.7 branch soon.

@ghost assigned michaelklishin Apr 17, 2011
@botanicus (Contributor)

Here you go: 3d7f4d4, thanks for the patch. It shall be in the 0.7.2 release.

@botanicus (Contributor)

Michael's comment: "It is and it has to be handled carefully: #reset may be used by connection failure handlers and channel/connection-level error handlers. There is no guarantee that AMQP connection or even TCP connection is still open."

@botanicus (Contributor)

Michael, I'd presume that this should be solved in these error handlers (reconnect and THEN reset). Any reason why it shouldn't be done this way?
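
(A sketch of that ordering, assuming a connection_status-style callback like the one jpr5 describes below; the callback name, the :disconnected symbol, and #reconnect are assumptions about the 0.7-era client, not its confirmed API.)

```ruby
AMQP.start do
  mq    = MQ.new
  queue = mq.queue("work")
  queue.subscribe { |header, payload| handle(header, payload) }

  AMQP.conn.connection_status do |status|
    next unless status == :disconnected
    AMQP.conn.reconnect   # 1. bring the TCP/AMQP connection back first
    queue.reset           # 2. only then reset: Basic.Cancel has a live socket
    queue.subscribe { |header, payload| handle(header, payload) }
  end
end
```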

@jpr5 (Contributor, Author) commented Apr 18, 2011

Re: Michael's comment - yeah, lately I've seen some circumstances where this can cause other problems.

Problems arise from #unsubscribe sending an AMQP Basic::Cancel frame when it isn't expected. For example, if you haven't yet subscribed (Declare + Consume), the cancel causes the server to close the channel (and the next publish attempt will raise an AMQP::ChannelClosedError), which effectively means loss of all useful function in the most common ruby-amqp usage cases, sometimes without the user knowing it. From the POV of a user of the ruby-amqp library, they'd have to catch this exception, re-initiate the connection, and then re-attach their subscribe block to the queue. Or maybe they could get away with re-allocating a primary Channel, if no other state is screwed up... but that seems unlikely, and impractical for users anyway.

This problem can get obscured when the connection is dropped after the Cancel etc. is initiated but before it completes. I'm not certain offhand about the specific conditions, but I recall that the frame(s) appeared to have buffered on the client side and then went out after the connection was re-established (but before my async connection_status handler was invoked and re-subscribed to the queue), thus causing the server to close the channel. In the cases where my connection_status handler won the race, I could still be left in a broken state, because my client thought it was subscribed but the server didn't agree (thus no Delivers).

The first case is, IMHO, a consequence of not properly tracking state -- Queue#unsubscribe shouldn't send an (errant) frame to the server if the queue isn't currently subscribed.
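
(A sketch of such a state guard; everything other than #unsubscribe itself is an assumption about the internals.)

```ruby
class MQ
  class Queue
    def unsubscribe(opts = {}, &blk)
      # Only emit Basic.Cancel when a prior subscribe actually allocated
      # a consumer tag; otherwise there is nothing for the broker to cancel.
      return unless @consumer_tag
      # ... existing logic: send Basic.Cancel for @consumer_tag,
      #     then clear it when Cancel-Ok arrives ...
    end
  end
end
```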

The latter circumstance is probably just an artifact of not clearing buffers when re-connecting. Though I'll admit this has sometimes had a positive consequence: when the server drops a client because of a missed Heartbeat during an over-delayed ACK response, the logical Publish response to the Deliver/block invocation is still sent after the connection is re-established. Still, this is more often a surprise, and I'd favor behaviour that's consistent and stable over behaviour that's sometimes good (resending logical responses) and sometimes bad (wedging the reactor, and my code sitting on top of the reactor).
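
(A sketch of the buffer-clearing behaviour argued for above; @send_buffer and the hook name are hypothetical, the point being that frames queued against a dead connection get discarded rather than flushed after reconnect.)

```ruby
module AMQP
  class Client
    # Hypothetical hook, called before re-establishing the connection.
    def clear_stale_frames
      # Drop frames buffered while the socket was down, so a stale
      # Basic.Cancel or Publish never fires against the new connection.
      @send_buffer.clear if @send_buffer
    end
  end
end
```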
