At-least-once Quorum Queue dead-lettering #3100

kjnilsson · 2021-06-11T09:06:38Z

As explained in the dead-lettering Safety section, moving messages between queues using DLX has at-most-once delivery guarantees and thus cannot be considered safe. This violates the safety guarantees provided by quorum queues, as messages can be lost if anything goes wrong during the move.

Moving messages between queues using the shovel can be safe when both publisher confirms and consumer acknowledgements are used (these are the defaults).

Given the above observations, we could model dead lettering inside quorum queues as a dedicated (QQ-internal) queue containing any discarded messages. This "discard queue" is consumed by a dedicated "discard consumer" that only receives messages that have been discarded. The consumer (which is a separate Erlang process) works a bit like a special shovel that consumes discarded messages and re-routes them according to the DLX configuration for the queue. Once the process has received all publisher confirms for a given message, it acks the consumed discard message, thus ensuring that the message isn't removed until it has been safely delivered to the DLX target queue(s).
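To make the confirm-then-ack rule concrete, here is a minimal sketch in Python rather than Erlang. It is only a toy model of the idea above, not the RabbitMQ implementation: the class `DiscardConsumer`, its methods, and the queue names in the usage note are all hypothetical, and routing edge cases (such as a message with no DLX targets) are left out.

```python
from dataclasses import dataclass, field


@dataclass
class DiscardConsumer:
    """Toy model of the "discard consumer" (all names are hypothetical)."""

    # msg_id -> set of target queues whose publisher confirm is still outstanding
    pending: dict = field(default_factory=dict)
    # msg_ids that have been acked back to the source queue's discard queue
    acked: list = field(default_factory=list)

    def on_discarded(self, msg_id, targets):
        """A discarded message was re-published to every DLX target queue."""
        self.pending[msg_id] = set(targets)

    def on_confirm(self, msg_id, target):
        """A publisher confirm arrived from one target queue."""
        outstanding = self.pending.get(msg_id)
        if outstanding is None:
            return  # unknown or already acked; late duplicates are ignored
        outstanding.discard(target)
        if not outstanding:
            # Every target confirmed: only now may the source queue drop the message.
            del self.pending[msg_id]
            self.acked.append(msg_id)
```

For example, after `on_discarded(1, ["dlq_a", "dlq_b"])` and a single confirm from `dlq_a`, message 1 stays pending; it is acked only once `dlq_b` confirms as well.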

Even with this approach, it is possible that a message doesn't receive all the confirms needed to ack and remove it from the source queue. For quorum queues this could cause excessive log growth, as the source queue will need to retain the discarded message until it has been acked. To handle this case, the forwarding process would still need to ack the message after a given time and/or number of retried deliveries. To ensure the message isn't completely lost, we could introduce a "trash can": a node-local stream where we write all messages that cannot be delivered to DLX target queues within some time frame.
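The give-up path could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the proposed implementation: `PendingForward`, `settle_expired`, and the `max_age_s` / `max_attempts` thresholds are hypothetical names, and a plain list stands in for the node-local "trash can" stream.

```python
from dataclasses import dataclass


@dataclass
class PendingForward:
    """A discarded message still waiting for confirms (hypothetical record)."""
    msg_id: int
    first_attempt_at: float  # seconds, same clock as `now` below
    attempts: int = 1


def settle_expired(pending, now, max_age_s, max_attempts, trash_can):
    """Give up on forwards that are too old or were retried too often.

    Each expired message is written to `trash_can` first, then removed from
    `pending`. Returns the msg_ids that should now be acked on the source
    queue so its log can be truncated.
    """
    to_ack = []
    for entry in list(pending.values()):  # copy: we mutate `pending` below
        too_old = now - entry.first_attempt_at >= max_age_s
        too_many = entry.attempts >= max_attempts
        if too_old or too_many:
            trash_can.append(entry.msg_id)  # keep a copy before acking
            del pending[entry.msg_id]
            to_ack.append(entry.msg_id)
    return to_ack
```

Writing to the trash can before acking keeps the ordering that matters here: the source queue only forgets a message once a copy exists somewhere else.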

To ensure availability of the consuming process, we can spawn it as a companion process to the QQ leader, thus ensuring that it is always available when there is a leader to process commands. If we don't want to add another process for each quorum queue, we could later pool these processes, but this may not be necessary as long as they do not set too large a prefetch and they hibernate when idle.

edbyford · 2021-11-04T16:32:28Z

Duplicate of rabbitmq/data-plane#1. Keeping due to context in the tickets.

edbyford · 2022-02-15T14:35:45Z

Jepsen tests have not given us enough confidence that messages are reliably forwarded from the source QQ to the target QQ. It requires some workarounds as it stands.

Similar issues occur in some edge scenarios (on deletion and recreation of target queues).

kjnilsson · 2022-03-24T16:38:29Z

This was done in #3121

kjnilsson mentioned this issue Sep 20, 2021

QQ: introduce new machine version (2) #3121

Merged

8 tasks

edbyford assigned ansd Nov 4, 2021

kjnilsson changed the title ~~Safer Quorum Queue dead-lettering~~ At-least-once Quorum Queue dead-lettering Nov 11, 2021

kjnilsson closed this as completed Mar 24, 2022

michaelklishin added this to the 3.10.0 milestone Mar 24, 2022

mkuratczyk mentioned this issue Oct 21, 2022

Dead lettering en masse can overload the DLQ #5312

Closed
