Binding recovery retry should recover all bindings on the recovered queue #667
Conversation
// we want recovery to fail when recovering the 2nd binding
// give the 2nd recorded binding a bad queue name so it fails
final RecordedBinding binding2 = ((AutorecoveringConnection) connection).getRecordedBindings().get(1);
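For context, a minimal sketch of how a test can force that failure, assuming RecordedBinding exposes a fluent destination(...) setter as in the client's recovery package (the PR's actual test may mutate the binding differently):

import com.rabbitmq.client.Connection;
import com.rabbitmq.client.impl.recovery.AutorecoveringConnection;
import com.rabbitmq.client.impl.recovery.RecordedBinding;

class BindingRecoveryFailureSketch {
    // Point the 2nd recorded binding at a queue that does not exist, so the broker
    // answers its queue.bind with a channel-level 404 during topology recovery,
    // which is what triggers the binding-recovery retry path under test.
    static void breakSecondRecordedBinding(Connection connection) {
        AutorecoveringConnection autorecovering = (AutorecoveringConnection) connection;
        RecordedBinding binding2 = autorecovering.getRecordedBindings().get(1);
        // Assumed fluent setter: give the binding a bogus destination queue name.
        binding2.destination("no-such-queue-" + System.nanoTime());
    }
}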
This test is brutal :D
So for every one of the N bindings, we will now recover all N bindings, meaning N^2 recovery operations.
Removed the e2e binding logic as it doesn't apply in this use case.
This retry logic only kicks in when a binding recovery has failed with a 404 error because the queue was deleted, and it only recovers bindings for the specific queue that was deleted, not for all queues. I have now tweaked the logic to recover only the queue bindings that were recovered before the failed binding.
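For illustration, a minimal sketch of what that tweak amounts to, assuming RecordedBinding exposes getDestination() and recover() as in the client's recovery package (the method and names below are mine, not the PR's):

import com.rabbitmq.client.impl.recovery.AutorecoveringConnection;
import com.rabbitmq.client.impl.recovery.RecordedBinding;

class EarlierBindingRecoverySketch {
    // After the retry handler has re-declared the deleted queue, walk the recorded
    // bindings and re-create every binding on that queue that was recorded before
    // the one whose recovery failed; those earlier bindings were recovered against
    // the old queue and were lost when the broker deleted it.
    static void recoverEarlierBindingsOnQueue(AutorecoveringConnection connection,
                                              RecordedBinding failedBinding) throws Exception {
        for (RecordedBinding binding : connection.getRecordedBindings()) {
            if (binding == failedBinding) {
                break; // bindings recorded after this point go through the normal retry path
            }
            if (binding.getDestination().equals(failedBinding.getDestination())) {
                binding.recover(); // re-issue queue.bind for this earlier binding
            }
        }
    }
}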
Yes, this makes more sense. @acogoluegnes any objections from you?
Thanks!
Binding recovery retry should recover all bindings on the recovered queue (cherry picked from commit 4cd1e64)
Forward-ported to
Proposed Changes
When a binding recovery fails and retry logic kicks in, the code should recreate all of the bindings on the destination and not just the binding that failed.
For a queue with a large number of bindings, it's possible that some of the first binding recoveries succeeded, but the queue then got deleted and a later binding recovery failed. In that scenario, we need to make sure all the earlier bindings that were already recovered are re-created as well.
Background: We hit this scenario on a cluster that had a large number of queues and bindings. The queue in question had an expiry policy set on it that caused the queue to be deleted before all the binding recoveries had finished. The retry logic did its job and re-created the queue and the failed bindings, but we were silently missing the first set of bindings that had originally succeeded. We have changed the policy to increase the expiry time so this scenario shouldn't happen again, but I thought it best to implement this fix as well.
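For context, a minimal reproduction of that background scenario, using a per-queue x-expires argument rather than the operator policy described above (queue name and timeout are hypothetical):

import com.rabbitmq.client.Channel;
import java.util.HashMap;
import java.util.Map;

class ExpiringQueueSketch {
    // Declare a durable queue that the broker deletes after 10 s without use.
    // If topology recovery takes longer than that to re-create the queue's many
    // bindings, the queue can disappear mid-recovery and later queue.bind calls
    // fail with a 404, which is the situation this PR addresses.
    static void declareExpiringQueue(Channel channel) throws Exception {
        Map<String, Object> args = new HashMap<>();
        args.put("x-expires", 10_000); // milliseconds
        channel.queueDeclare("example.expiring.q", true, false, false, args);
    }
}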
Types of Changes
Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask on the mailing list. We're here to help! This is simply a reminder of what we are going to look for before merging your code.
CONTRIBUTING.md document
Further Comments
I believe these tests should work. However, I am on Windows and had issues getting the Host.rabbitmqctl() pieces to work, so when running locally I commented those pieces out and manually dropped the connections and queues, and the tests passed that way.