How to handle duplicated messages? #111
I think you're right. Redis with HyperLogLog should be great in this case. I have a lock/counting service I've been meaning to open source for ages (half a year now); it would be ideal for this, and you would access it via a RESTful interface. It would also not be Sneakers-specific, so you could use it out-of-band with other processes in your architecture.
I'm not familiar with the HyperLogLog algorithm yet, but after some googling I agree that it can help a lot.
Note that HyperLogLog (and Bloom filters) are approximations. Any strategy you choose will require having IDs on all messages: is this the case for Sneakers today?
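To make the ID requirement concrete, here is a minimal sketch of exact deduplication keyed on message IDs. It is a plain in-memory stand-in (the `Deduplicator` class and `derive_id` helper are hypothetical names, not part of Sneakers); in production the `Set` would be replaced by Redis — `SADD` for exact membership, or `PFADD` for HyperLogLog's approximate membership:

```ruby
require "set"
require "digest"

# Remembers message IDs it has already seen.
# In-memory stand-in for a shared store such as Redis.
class Deduplicator
  def initialize
    @seen = Set.new
  end

  # Returns true the first time an ID is seen, false for duplicates.
  # Set#add? returns nil when the element was already present.
  def first_time?(message_id)
    !@seen.add?(message_id).nil?
  end
end

# If messages carry no explicit ID, one (imperfect) fallback is to
# hash the payload itself — identical payloads then count as duplicates.
def derive_id(payload)
  Digest::SHA256.hexdigest(payload)
end

dedup = Deduplicator.new
dedup.first_time?(derive_id("hello")) # => true
dedup.first_time?(derive_id("hello")) # => false (duplicate)
```

The main caveat is the one raised above: without a stable per-message ID (or a payload that can serve as one), there is nothing to deduplicate on.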
Another thought: the only thing we need is to ensure that a message is unique for some period of time.
Could you use the `redelivered` flag? If you were to extend Sneakers with some functionality supplied by Redis, I could also see adding support for running workers in some sort of isolation, similar to Resque's lonely_job (https://github.com/wallace/resque-lonely_job). In fact some ideas, such as the lock key, might be useful even for the duplicate-message case.

I find both of these use cases come up quite a lot, but more and more I'm leaning towards solving these outside of the queuing system, as each of these techniques winds up causing bottlenecks and other performance-related issues at the queue level.
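The "unique for some period of time" idea above can be sketched as a time-windowed claim. This is a pure-Ruby stand-in (the `UniquenessWindow` class name is made up for illustration) for what Redis gives you in one atomic command, `SET key 1 NX EX <ttl>` — via the redis gem that would be `redis.set(key, 1, nx: true, ex: ttl)`:

```ruby
# Time-windowed uniqueness check: an ID may be claimed at most once
# per TTL window. In production this state would live in Redis so
# all workers share it.
class UniquenessWindow
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @expires_at = {} # message_id => expiry time
  end

  # Returns true if the ID has not been claimed within the TTL window
  # (and records the claim); false for an in-window duplicate.
  # `now` is injectable to make the behaviour testable.
  def claim(message_id, now = Time.now)
    @expires_at.delete_if { |_, t| t <= now } # drop expired entries
    return false if @expires_at.key?(message_id)

    @expires_at[message_id] = now + @ttl
    true
  end
end

window = UniquenessWindow.new(1800) # 30-minute window
window.claim("msg-1") # => true  (first time)
window.claim("msg-1") # => false (duplicate within the window)
```

The Redis `NX` variant has the advantage of being atomic across processes, which an in-process hash is not.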
I totally agree that this functionality should be implemented somewhere outside of Sneakers. Oh, here is another feature request: callbacks. It would be great to have the ability to run some code before and after a job is processed. Is this feature a case for Sneakers?
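One way to get before/after hooks today, without changes to Sneakers itself, is to wrap `work` with `Module#prepend`. This is only a sketch — `WorkCallbacks`, `before_work`, and `after_work` are hypothetical names, and the example uses a plain class rather than a real Sneakers worker:

```ruby
# Hypothetical before/after hooks around a worker's work method.
# A worker opts in by prepending this module and defining
# before_work and/or after_work.
module WorkCallbacks
  def work(msg)
    before_work(msg) if respond_to?(:before_work)
    result = super
    after_work(msg, result) if respond_to?(:after_work)
    result
  end
end

# Stand-in worker that records the call order for demonstration.
class LoggingWorker
  prepend WorkCallbacks

  attr_reader :events

  def initialize
    @events = []
  end

  def work(msg)
    @events << [:work, msg]
    :ack
  end

  def before_work(msg)
    @events << [:before, msg]
  end

  def after_work(msg, result)
    @events << [:after, msg, result]
  end
end

w = LoggingWorker.new
w.work("hello")
w.events # => [[:before, "hello"], [:work, "hello"], [:after, "hello", :ack]]
```

Because `prepend` puts the module ahead of the class in the ancestor chain, `super` inside the module reaches the worker's own `work`, so the hooks wrap it cleanly.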
Yup, @michaelklishin is correct - HLL is an approximation. It will suit cases where you don't want to perform a redundant job - because you don't want to "pay" for it - and where performing a redundant job approximately 0.18% of the time can be tolerated. Callbacks and lonely_job are interesting - for a long while now I've been wanting to also provide generic base worker types.
That would be great.
Here is my temporary workaround with the forever-alone gem:

```ruby
class MyWorker
  include Sneakers::Worker
  from_queue :email

  def work(msg)
    # Reject the message if it was already seen within the last 30 minutes.
    ForeverAlone.new(msg, 30.minutes).ensure
    MyJobProcessor.new(msg).perform
    ack!
  rescue ForeverAlone::MessageIsNotUnique
    reject!
  end
end
```
As said in the RabbitMQ documentation:

There are some scenarios where consumers cannot handle messages in an idempotent way (email delivery, file operations, etc.). The `redelivered` flag is not a silver bullet, I think.

In order to perform deduplication, the consumer should keep a list of recent messages and reject duplicated ones. All workers of the consumer should have access to this list. I think the best way is to use Redis for this purpose. Ideally, there should be some option for Sneakers to enable checking messages for uniqueness before processing them.

What do you think?