Expiring queues should dead-letter messages when a dead-letter-exchange is defined #29
Conversation
…o dead-letter messages in an expiring queue when a dead-letter-exchange was provided
Found an edge case in this code: when using a DLX that points to the expiring queue, the dead-letter feature sees this as a cycle and does not re-publish the messages. I tried calling rabbit_bindings:remove_from_destination to remove the bindings before dead-lettering the messages, but it appears to still detect the cycle. All this to say, this PR needs more work.
…remove bindings from both the expiring queue (destination) and the DLX exchange, if they exist, to prevent the consistent-hash exchange from routing DL'd messages back to the expiring queue.
This update cleans up the way the original code was written, in addition to resolving the undesired behavior when a consistent-hash exchange is used as the DLX. I'm interested to hear whether this seems like a feasible approach. @simonmacmullen expressed concerns about this feature in general when used with large queues that have messages on disk. I'd imagine the performance would be poor, but I could cause similar behavior today by setting a DLX and message TTL on a queue, throwing messages at it until they hit disk, and then letting them start expiring.
Any word on this pull request? I could also use this functionality. It's very hard to have expiring queues without this. I'm trying to build a system that routes to queues that are transient, and I want to use a combination of an alternate exchange and dead lettering to ensure that messages that don't get consumed end up in a fallback queue. If I unbind each transient queue before its consumer goes away, I can make sure the messages expire and get dead-lettered before the queue goes away, and once the queue is unbound the messages flow through the alternate exchange. There's a big hole if the consumer dies suddenly, though: if a queue expires while messages are flowing into it, those messages are lost. If there's a workaround I'd be open to it, but this PR seems like a fix for a dangerous bug.
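The fallback topology described in that comment can be sketched as plain data. This is only an illustration of the design, not code from the PR; every exchange and queue name here is hypothetical, and the only RabbitMQ-specific pieces are the standard `alternate-exchange` exchange argument and a fanout exchange.

```python
# Sketch of the commenter's fallback design: messages that become
# unroutable on "work" (e.g. after a transient queue is unbound)
# spill over to the alternate exchange "work.alt", which a fallback
# queue drains. All names are hypothetical.

def fallback_topology():
    return {
        "exchanges": {
            "work": {
                "type": "direct",
                # Standard RabbitMQ alternate-exchange argument.
                "arguments": {"alternate-exchange": "work.alt"},
            },
            "work.alt": {"type": "fanout", "arguments": {}},
        },
        "queues": {
            # Bound to the fanout, so the binding key is ignored.
            "work.fallback": {"bindings": [("work.alt", "")]},
        },
    }

topology = fallback_topology()
```

The gap the commenter points out remains: the alternate exchange only catches messages that were never routed, so messages already sitting in a queue when it expires are still lost without the dead-lettering behavior this PR proposes.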
There is still no consensus about whether this feature is worth including, given the edge cases (e.g. huge queues with most messages on disk).
@michaelklishin - just following up: I realize the edge case could cause performance issues, but the same edge case already exists for any queue that is backlogged with a dead-letter exchange and a long enough message TTL. In other words: there are already other ways one can create performance problems if one doesn't take the proper precautions when designing their topology. Would the Rabbit team consider including cautionary language around this feature as part of the documentation? I've built several systems around Rabbit as the messaging backbone, and there has not been a single case where message loss was preferable to temporary performance degradation. The easy way to prevent the performance issue with this feature is to ensure your queues aren't being overrun. With a consistent-hash exchange (or exchange-queue pairs for each service instance), it's rather simple to solve this by spinning up an additional service instance. Without this feature, we're left with developing an entire subsystem to monitor queues and consumers that waits for a queue's consumer to drop, unbinds the queue from everything, directs an existing consumer to pick up the slack, and then monitors the queue so it can be removed when empty. This is actually quite problematic and has the following undesirable consequences:
Distributed mutual exclusion is a very difficult problem to solve. I believe that with the inclusion of this feature (or a selective-consumer feature), Rabbit would be uniquely positioned to remove or at least greatly reduce the role of a distributed locking mechanism. Without this capability, we're left with complex workarounds and a new set of problems to address.
I am positive about including this feature. Let's wait for other team members' opinions.
I agree this seems a useful feature. It may even be what users expect when using TTL on a queue. |
I don't think anyone is debating whether dlx-on-queue-delete is a useful feature in the abstract. It totally is! My concerns with the design of this pull request are:
Yes, theoretically this issue exists already with messages getting DLXed by other means. But it's quite hard to hit; this patch makes it very easy. There's a good chance deleting any large queue with a DLX set would trigger it. So this really needs addressing.
I also have concerns with the implementation:
http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2014-January/033354.html Now, @rade was not too concerned with offering solutions there beyond "it's complicated". But I would suggest that doing this in the termination phase (after …) … You still have the issue with "dark queues" he mentioned, but there is now infrastructure that could let you fix that too; we can show …
This could be extended to support …
@simonmacmullen - I can try to address your concerns, but I won't pretend that I understand Rabbit well enough to handle it. It's been over a year since I've even looked at this code; I was actually pleasantly surprised to see the PR hadn't been closed. If someone on the core team wanted to take on adding the feature properly, addressing your concerns, we could certainly close this out. Otherwise, do you have recommendations on how I could start working towards addressing your concerns and get some team feedback? Would you prefer I just continue making commits to this branch and request review? Thanks for being open to discussion and for the guidance provided so far.
I'll take the liberty to close this. Not because it's a feature we don't want, but because the implementation doesn't cover some edge cases we are quite worried about.
Present Behavior
Even when a dead-letter-exchange is defined for a queue that has a queue TTL set (x-expires), the messages are lost when the queue expires.
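For concreteness, here is a minimal sketch of the configuration in question. It is plain Python building the optional-arguments map a client (pika, for example) would pass to `queue_declare`; the exchange name is illustrative, while `x-expires` and `x-dead-letter-exchange` are the standard RabbitMQ queue arguments.

```python
# Sketch of the queue arguments under discussion. With the pika
# client these would be passed as the `arguments` parameter of
# channel.queue_declare(); "fallback-dlx" is an illustrative name.

def expiring_queue_args(expires_ms, dead_letter_exchange):
    """Optional arguments for a queue that is deleted after
    `expires_ms` ms of disuse (x-expires) and that, under the
    behavior this PR proposes, would dead-letter its remaining
    messages to `dead_letter_exchange` when it expires."""
    return {
        "x-expires": expires_ms,
        "x-dead-letter-exchange": dead_letter_exchange,
    }

args = expiring_queue_args(30_000, "fallback-dlx")
# Today (per "Present Behavior" above), when the queue expires its
# messages are dropped despite x-dead-letter-exchange being set.
```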
Use Case
One approach to providing isolation guarantees in addition to horizontal scalability is to allow every service instance to have its own queue. Unfortunately, if the connection between the service instance and its queue is severed (or the instance crashes), messages are 'orphaned' in the queue with no elegant way to detect the issue and recover them. (Per-message TTL is very problematic because it depends on average message processing time and queue depth.)
Test Case
In a system that needs to provide some level of isolation guarantees (say, by using a consistent-hash exchange that hashes on correlation ID), each new service instance would define its own queue bound to the consistent-hash exchange, with a TTL and with the consistent-hash exchange as the dead-letter-exchange.
Messages are published to the consistent-hash exchange and split amongst the queues. One queue's consumer is terminated, and after the TTL expires, the queue is removed and all of its messages are republished to the consistent-hash exchange, effectively redistributing them to the remaining queues.
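The test-case topology above can be sketched as a small helper. This is pure data construction, not broker code; the exchange and queue names are hypothetical, and the one plugin-specific assumption is that the consistent-hash exchange plugin interprets a queue's binding key as a numeric weight.

```python
# Hypothetical helper describing the test-case topology: each
# service instance gets its own expiring queue bound to a
# consistent-hash exchange, with that same exchange set as the
# queue's DLX so that, under this PR, an expiring queue's messages
# would be redistributed to the surviving queues.

HASH_EXCHANGE = "work.hash"  # would be declared type "x-consistent-hash"

def instance_queue(instance_id, expires_ms=30_000, weight=1):
    name = f"work.{instance_id}"
    arguments = {
        "x-expires": expires_ms,
        "x-dead-letter-exchange": HASH_EXCHANGE,
    }
    # The consistent-hash plugin reads the binding key as a weight,
    # hence str(weight) as the routing key for the binding.
    binding = (HASH_EXCHANGE, name, str(weight))
    return name, arguments, binding

name, arguments, binding = instance_queue("a1")
```

Note the cycle the first comment in this thread ran into: because the DLX points back at the exchange the queue is bound to, RabbitMQ's dead-letter cycle detection can suppress the re-publish, which is part of why this PR needed more work.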
Request For Comments
Please let me know if this behavior would be undesirable. It fits our use case perfectly, but if there's a different set of configurations we could use to get the same benefits, we'd like to know about them.
If there are problems with the code as it's written, I'd be happy to make any changes necessary for this PR to be suitable for further consideration.
If my implementation is so problematic that it would be easier to just ignore it and have a core committer add the feature properly, I totally understand. I tried it partly for the learning experience and partly out of a desire to potentially contribute to an excellent piece of software I've been using every day for a long time.
Workflow
It looks like GitHub is not the normal avenue for submitting changes but I'm far less familiar with Hg and wasn't sure how to (or if I could) create a ticket in your system for a feature in order to match the branch to the ticket number.
It says a lot about RabbitMQ's team and implementation that I could pull something like this off in a few hours of source diving. Sorry if I've made a big mess; as I mentioned, this would be a huge win for us going forward if we could get this behavior implemented.