@kjnilsson kjnilsson commented Sep 30, 2022

This fixes a bug where cancelling a consumer and then consuming again with the same consumer tag, whilst messages from the previous "incarnation" of the consumer are still in flight, would cause the channel to crash.

The reason for this was that the subsequent delivery would have an unexpected (for a new consumer) message id, which would cause the channel to run the code path it uses when a delivery is lost between the queue and the channel. Lost deliveries are extremely rare given that most queue deliveries are done from the local node.

Triggering this code path exposed another bug: fetching messages has not worked since 3.10, when messages stopped being kept in memory.

This PR fixes both bugs.
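
To illustrate the first bug, here is a minimal Python sketch of the message id bookkeeping described above. It is a toy model, not the rabbit_fifo_client code: the `ConsumerTracker` class, its method names and the `simulate` helper are all hypothetical, and it assumes that message ids for a consumer tag start at 0 and that the queue continues the old sequence when a consumer is re-registered under the same tag.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ConsumerTracker:
    """Toy client-side view of one consumer tag (hypothetical names)."""
    next_expected: int = 0
    gaps: List[Tuple[int, int]] = field(default_factory=list)

    def on_checkout_reply(self, next_msg_id: Optional[int]) -> None:
        # A reply without a next msg id (modelling the old reply format)
        # leaves the expected id at its initial value.
        if next_msg_id is not None:
            self.next_expected = next_msg_id

    def on_delivery(self, msg_id: int) -> None:
        if msg_id > self.next_expected:
            # Looks like deliveries were lost: record the id range that
            # the client would try to fetch from the queue.
            self.gaps.append((self.next_expected, msg_id - 1))
        self.next_expected = max(self.next_expected, msg_id + 1)


def simulate(reply_next_msg_id: Optional[int]) -> List[Tuple[int, int]]:
    # Ids 0..2 went to the cancelled incarnation and are still in flight,
    # so the first delivery to the new incarnation carries id 3.
    tracker = ConsumerTracker()  # fresh client state after cancel + consume
    tracker.on_checkout_reply(reply_next_msg_id)
    tracker.on_delivery(3)
    return tracker.gaps


if __name__ == "__main__":
    print(simulate(None))  # reply carries no next msg id
    print(simulate(3))     # reply carries the next msg id
```

In this model, a checkout reply without a next msg id makes the first delivery look like a gap of ids 0 to 2, which corresponds to the "missing" message fetch; a reply that carries the next msg id produces no gap.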

NB: it may seem like we are modifying the rabbit_fifo:apply/3 function without incrementing the machine version. We are, but we only modify the Reply of the checkout command, not the actual state of the queue itself, so this does not invalidate determinism. The current code never relies on the return value of this operation, so it is safe to change, and we also handle the old reply format.

Fixes #5927

@kjnilsson kjnilsson changed the title issue #5927 Issue #5927 Sep 30, 2022
@kjnilsson kjnilsson changed the title Issue #5927 Fix channel crash when cancelling then consuming using the same consumer tag and channel Oct 3, 2022
By returning the next msg id for a merged consumer the rabbit_fifo_client
can set its next expected msg_id accordingly and avoid triggering
a fetch of "missing" messages from the queue.
Since QQ v2 we never keep any messages in memory, so we need
to read them from the log. The only way to do this is by using an
aux command.

Execute get_checked_out query on local members if possible

This reduces the chance of crashing a QQ member during a live upgrade
where the follower does not have the appropriate code in handle_aux.
@kjnilsson kjnilsson marked this pull request as ready for review October 3, 2022 15:11
@kjnilsson kjnilsson requested a review from ansd October 3, 2022 15:11
@michaelklishin michaelklishin merged commit b31f23c into main Oct 4, 2022
@michaelklishin michaelklishin deleted the gh_5927 branch October 4, 2022 07:20
michaelklishin added a commit that referenced this pull request Oct 4, 2022
Fix channel crash when cancelling then consuming using the same consumer tag and channel (backport #5944)
michaelklishin added a commit that referenced this pull request Oct 4, 2022
Fix channel crash when cancelling then consuming using the same consumer tag and channel (backport #5944) (backport #5985)

Successfully merging this pull request may close these issues.

Quorum queues: Consumer cancel followed by consume using the same consumer tag can cause channel crash