More atomic poller operations #843
I recently had an issue where my application lost a very large number of jobs. The problem was with future scheduled jobs being popped from the scheduled queue but never pushed onto one of the worker queues. Here are the steps that happened:
The end result was that most of the jobs were irretrievably lost. This had also happened two weeks earlier during some other redis maintenance.
This pull request changes the logic in Poller to pop messages one at a time from the retry and schedule queues and immediately push them to the appropriate worker queue. The new logic is:
I couldn't figure out a way to do small batches, since the zremrangebyscore method doesn't take a limit. I think popping them one at a time would do the most to reduce race conditions. The best way to solve the issue would be an atomic pop-and-push Lua script, but that would tie sidekiq to redis 2.6 (maybe a feature for 3.0).
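To make the one-at-a-time idea concrete, here is a minimal sketch of such a loop. It is not the code from this pull request: `enqueue_due_jobs` and `FakeRedis` are hypothetical names, and `FakeRedis` is an in-memory stand-in that only mimics the few commands used (`zadd`, `zrangebyscore` with `limit:`, `zrem`, `lpush`, `lrange`, `zcard`) so the example runs without a server. The `queue:<name>` key layout and the `queue` field in the job payload are assumptions as well.

```ruby
require "json"

# Hypothetical in-memory stand-in for the handful of Redis commands used below.
# It exists only so the sketch runs without a server; it is not part of sidekiq.
class FakeRedis
  def initialize
    @zsets = Hash.new { |h, k| h[k] = {} } # key => { member => score }
    @lists = Hash.new { |h, k| h[k] = [] } # key => [values]
  end

  def zadd(key, score, member)
    @zsets[key][member] = score.to_f
  end

  # Members with score <= max, lowest score first. The min bound is ignored
  # here (treated as -inf). limit is [offset, count].
  def zrangebyscore(key, _min, max, limit: nil)
    due = @zsets[key].select { |_member, score| score <= max.to_f }
    members = due.sort_by { |member, score| [score, member] }.map(&:first)
    limit ? (members[limit[0], limit[1]] || []) : members
  end

  # True only if the member was present and has now been removed.
  def zrem(key, member)
    !@zsets[key].delete(member).nil?
  end

  def lpush(key, value)
    @lists[key].unshift(value)
  end

  def lrange(key, start, stop)
    @lists[key][start..stop] || []
  end

  def zcard(key)
    @zsets[key].size
  end
end

# Move every job whose score is <= now from sorted_set to its worker queue,
# one job per iteration. Returns the number of jobs this caller moved.
def enqueue_due_jobs(conn, sorted_set, now = Time.now.to_f)
  moved = 0
  # Peek at the single earliest due job instead of fetching the whole range.
  while (message = conn.zrangebyscore(sorted_set, "-inf", now, limit: [0, 1]).first)
    # zrem acts as the claim: if it returns false, another poller already
    # removed this job, so skip it and look for the next one.
    next unless conn.zrem(sorted_set, message)

    job = JSON.parse(message)
    conn.lpush("queue:#{job['queue']}", message) # assumed key layout
    moved += 1
  end
  moved
end
```

With this shape, at most one job is ever out of the sorted set and not yet on a worker queue, so a crash or connection failure between the `zrem` and the `lpush` can lose a single job rather than the whole batch. Closing that last gap is what the atomic Lua script mentioned above would be for.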