At-least-once Quorum Queue dead-lettering #3100

Closed
Tracked by #3121
kjnilsson opened this issue Jun 11, 2021 · 3 comments

@kjnilsson
Contributor

As explained in the dead-lettering Safety section, moving messages between queues using DLX has at-most-once delivery guarantees and thus cannot be considered safe. This violates the safety guarantees provided by quorum queues, as messages can get lost if anything goes wrong during the move.

Moving messages between queues using the shovel can be safe when both publisher and consumer acknowledgements are used, as they are by default.

Given the above observations, we could model dead-lettering inside quorum queues as a dedicated (QQ-internal) queue containing any discarded messages. This "discard queue" is consumed by a dedicated "discard consumer" that only receives messages that have been discarded. The consumer (a separate Erlang process) works a bit like a special shovel: it consumes discarded messages and re-routes them according to the DLX configuration for the queue. Once the process has received all publisher confirms for a given message, it acks the consumed discard message, thus ensuring that the message isn't removed until it has been safely delivered to the DLX target queue(s).
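The confirm-tracking rule described here can be sketched in a few lines. This is only an illustrative model in Python, not the Erlang implementation: the class name `DiscardConsumer`, its methods, and the queue names in the usage note are all hypothetical, and "acking" is represented by appending to a list.

```python
from dataclasses import dataclass, field


@dataclass
class DiscardConsumer:
    """Hypothetical model: tracks outstanding publisher confirms per message."""

    # msg_id -> set of DLX target queue names still awaiting a confirm
    pending: dict = field(default_factory=dict)
    # msg_ids acked back to the source queue, in ack order
    acked: list = field(default_factory=list)

    def forward(self, msg_id, targets):
        """Record that msg_id was published to every routed DLX target."""
        self.pending[msg_id] = set(targets)

    def on_confirm(self, msg_id, target):
        """Handle one publisher confirm; ack only when none remain.

        Returns True if this confirm caused the discarded message to be acked.
        """
        waiting = self.pending.get(msg_id)
        if waiting is None:
            return False  # unknown or already acked: ignore duplicate confirms
        waiting.discard(target)
        if waiting:
            return False  # still waiting on at least one target queue
        del self.pending[msg_id]
        self.acked.append(msg_id)  # stands in for acking the discard queue
        return True
```

For example, after `forward(1, ["dlq.a", "dlq.b"])`, a confirm from `dlq.a` alone leaves message 1 in the source queue; only the second confirm from `dlq.b` triggers the ack.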

Even with this approach, it is possible that a message never receives all the confirms needed to ack and remove it from the source queue. For quorum queues this could cause excessive log growth, as the source queue needs to retain the discarded message until it has been acked. To handle this case, the forwarding process would still need to ack the message after a given time and/or number of retried deliveries. To ensure the message isn't completely lost, we could introduce a "trash can": a node-local stream where we write all messages that cannot be delivered to DLX target queues within some time frame.
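A minimal sketch of that fallback, again as a Python model rather than the real implementation. The limits `max_age_s` and `max_attempts`, their default values, and the names `PendingDiscard`, `next_action` and `sweep` are assumptions made for illustration; the trash can is modelled as a plain list instead of a stream.

```python
from dataclasses import dataclass


@dataclass
class PendingDiscard:
    """Hypothetical record of a discarded message awaiting confirms."""

    msg_id: int
    first_attempt_at: float  # seconds, from any monotonic clock
    attempts: int = 1


def next_action(entry, now, max_age_s=60.0, max_attempts=5):
    """Decide what to do with a discarded message that is still unconfirmed.

    Returns "retry" to republish to the DLX targets, or "trash" to write the
    message to the node-local trash stream and then ack it on the source queue.
    """
    too_old = now - entry.first_attempt_at >= max_age_s
    too_many = entry.attempts >= max_attempts
    return "trash" if (too_old or too_many) else "retry"


def sweep(pending, now, trash, **limits):
    """Apply next_action to every entry; return the entries still pending."""
    remaining = []
    for entry in pending:
        if next_action(entry, now, **limits) == "trash":
            trash.append(entry.msg_id)  # stands in for a stream append
        else:
            entry.attempts += 1  # one more delivery attempt is about to be made
            remaining.append(entry)
    return remaining
```

Bounding both age and attempt count is what keeps the source queue's log from growing without limit; the trash can trades guaranteed routing for guaranteed retention.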

To ensure availability of the consuming process, we can spawn it as a companion process to the QQ leader, thus ensuring that it is always available when there is a leader to process commands. If necessary, we could later pool these processes if we don't want to add another one for each quorum queue, but that may not be needed as long as they do not set too large a prefetch and hibernate when idle.
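The lifecycle rule (one companion worker, alive only while the local member is leader) might look roughly like the following. This is a Python sketch under stated assumptions: `DlxWorkerSupervisor`, the `start_worker`/`stop_worker` callbacks and the role strings are hypothetical names, not part of RabbitMQ or Ra.

```python
class DlxWorkerSupervisor:
    """Hypothetical model: keeps at most one worker, only while leader."""

    def __init__(self, start_worker, stop_worker):
        self._start = start_worker  # callable returning a worker handle
        self._stop = stop_worker    # callable taking a worker handle
        self.worker = None

    def on_role_change(self, role):
        """React to the local queue member changing role."""
        if role == "leader" and self.worker is None:
            # Became leader: spawn the companion discard consumer.
            self.worker = self._start()
        elif role != "leader" and self.worker is not None:
            # Lost leadership: the new leader's node will run its own worker.
            self._stop(self.worker)
            self.worker = None
```

Repeated "leader" notifications are idempotent here, so a re-election that keeps the same leader does not spawn a second worker.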

@edbyford
edbyford commented Nov 4, 2021

Duplicate of rabbitmq/data-plane#1. Keeping due to context in the tickets.

@kjnilsson changed the title from "Safer Quorum Queue dead-lettering" to "At-least-once Quorum Queue dead-lettering" on Nov 11, 2021
@edbyford

Jepsen tests have not yet given us enough confidence that messages are always forwarded from the source QQ to the target QQ. As it stands, this requires some workarounds.

Similar issues occur in some edge scenarios, such as deletion and recreation of target queues.

@kjnilsson
Contributor Author

This was done in #3121
