Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binding recovery retry should recover all bindings on the recovered queue #667

Merged
merged 4 commits into from
Feb 15, 2021
Merged

Conversation

vikinghawk
Copy link
Contributor

@vikinghawk vikinghawk commented Feb 13, 2021

Proposed Changes

When a binding recovery fails and retry logic kicks in, the code should recreate all of the bindings on the destination and not just the binding that failed.

For a queue with a large amount of bindings, its possible that some of the first binding recoveries succeeded, but then the queue got deleted and a later binding failed. In that scenario, we need to make sure all the earlier bindings that were already recovered are created as well.

Background: We hit this scenario on a cluster that had a large amount of queues and bindings. The queue in question had an expiry policy set on it that caused the queue to get deleted before all the binding recoveries had finished. The retry logic did its job and recreated the queue and the failed bindings. But we were silently missing the 1st set of bindings that had originally passed. We made a change to the policy to increase the expiry time so this scenario shouldn't happen again. But I thought it best to implement this fix as well.

Types of Changes

  • [ x] Bugfix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation (correction or otherwise)
  • Cosmetics (whitespace, appearance)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask on the
mailing list. We're here to help! This is simply a reminder of what we are
going to look for before merging your code.

  • [x ] I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

Further Comments

I believe these tests should work... However, I am on windows and had issues getting the Host.rabbitmqctl() pieces to work. So when running locally i commented those pieces out and manually dropped the connections and queues and the tests were passing that way.

@vikinghawk vikinghawk changed the title Binding recovering retry should recover all bindings on the recovered queue Binding recovery retry should recover all bindings on the recovered queue Feb 13, 2021

// we want recovery to fail when recovering the 2nd binding
// give the 2nd recorded binding a bad queue name so it fails
final RecordedBinding binding2 = ((AutorecoveringConnection)connection).getRecordedBindings().get(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is brutal :D

@michaelklishin
Copy link
Member

So for every binding out of N, we will now recover N bindings, meaning N^2 queue.bind operations will be performed for every queue? This sounds really excessive.

Removed e2e binding logic as it doesn't apply in this usecase
@vikinghawk
Copy link
Contributor Author

This retry logic only kicks in when a binding recovery has failed due to a 404 error from the queue being deleted. And it will only recover bindings for the specific queue that was deleted, not all queues.
But yes, the logic as I had it was potentially recovering more than it needed to and could cause later bindings to be recovered twice.

I have now tweaked the logic to only recover the queue bindings that were recovered before the failed binding.

@michaelklishin
Copy link
Member

Yes, this makes more sense. @acogoluegnes any objections from you?

@acogoluegnes acogoluegnes merged commit 4cd1e64 into rabbitmq:5.x.x-stable Feb 15, 2021
@acogoluegnes
Copy link
Contributor

Thanks!

michaelklishin pushed a commit that referenced this pull request Feb 15, 2021
Binding recovery retry should recover all bindings on the recovered queue

(cherry picked from commit 4cd1e64)
@michaelklishin
Copy link
Member

Forward-ported to master.

acogoluegnes added a commit that referenced this pull request Feb 15, 2021
References #667
acogoluegnes added a commit that referenced this pull request Feb 15, 2021
References #667

(cherry picked from commit a759d59)
@acogoluegnes acogoluegnes added this to the 5.11.0 milestone Feb 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants