Binding recovery retry should recover all bindings on the recovered queue #667

vikinghawk · 2021-02-13T04:48:55Z

Proposed Changes

When a binding recovery fails and retry logic kicks in, the code should recreate all of the bindings on the destination and not just the binding that failed.

For a queue with a large amount of bindings, its possible that some of the first binding recoveries succeeded, but then the queue got deleted and a later binding failed. In that scenario, we need to make sure all the earlier bindings that were already recovered are created as well.

Background: We hit this scenario on a cluster that had a large amount of queues and bindings. The queue in question had an expiry policy set on it that caused the queue to get deleted before all the binding recoveries had finished. The retry logic did its job and recreated the queue and the failed bindings. But we were silently missing the 1st set of bindings that had originally passed. We made a change to the policy to increase the expiry time so this scenario shouldn't happen again. But I thought it best to implement this fix as well.

Types of Changes

[ x] Bugfix (non-breaking change which fixes issue #NNNN)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation (correction or otherwise)
Cosmetics (whitespace, appearance)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask on the
mailing list. We're here to help! This is simply a reminder of what we are
going to look for before merging your code.

[x ] I have read the CONTRIBUTING.md document
I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
All tests pass locally with my changes
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if appropriate)
Any dependent changes have been merged and published in related repositories

Further Comments

I believe these tests should work... However, I am on windows and had issues getting the Host.rabbitmqctl() pieces to work. So when running locally i commented those pieces out and manually dropped the connections and queues and the tests were passing that way.

michaelklishin · 2021-02-13T08:36:04Z

src/test/java/com/rabbitmq/client/test/functional/TopologyRecoveryRetry.java

+
+        // we want recovery to fail when recovering the 2nd binding
+        // give the 2nd recorded binding a bad queue name so it fails
+        final RecordedBinding binding2 = ((AutorecoveringConnection)connection).getRecordedBindings().get(1);


This test is brutal :D

michaelklishin · 2021-02-13T08:48:37Z

So for every binding out of N, we will now recover N bindings, meaning N^2 queue.bind operations will be performed for every queue? This sounds really excessive.

Removed e2e binding logic as it doesn't apply in this usecase

vikinghawk · 2021-02-14T03:16:01Z

This retry logic only kicks in when a binding recovery has failed due to a 404 error from the queue being deleted. And it will only recover bindings for the specific queue that was deleted, not all queues.
But yes, the logic as I had it was potentially recovering more than it needed to and could cause later bindings to be recovered twice.

I have now tweaked the logic to only recover the queue bindings that were recovered before the failed binding.

michaelklishin · 2021-02-14T06:30:35Z

Yes, this makes more sense. @acogoluegnes any objections from you?

acogoluegnes · 2021-02-15T15:58:31Z

Thanks!

Binding recovery retry should recover all bindings on the recovered queue (cherry picked from commit 4cd1e64)

michaelklishin · 2021-02-15T16:23:01Z

Forward-ported to master.

References #667

References #667 (cherry picked from commit a759d59)

vikinghawk added 2 commits February 12, 2021 22:28

binding failure retry should recover all queue bindings

b1782f0

only drop on 1st retry

6aea5b2

vikinghawk changed the title ~~Binding recovering retry should recover all bindings on the recovered queue~~ Binding recovery retry should recover all bindings on the recovered queue Feb 13, 2021

michaelklishin reviewed Feb 13, 2021

View reviewed changes

vikinghawk added 2 commits February 13, 2021 21:08

short circuit the queue binding recovery.

ae5972a

Removed e2e binding logic as it doesn't apply in this usecase

remove unused import

30447a7

acogoluegnes approved these changes Feb 15, 2021

View reviewed changes

acogoluegnes merged commit 4cd1e64 into rabbitmq:5.x.x-stable Feb 15, 2021

michaelklishin pushed a commit that referenced this pull request Feb 15, 2021

Merge pull request #667 from vikinghawk/5.x.x-stable

c8cd63d

Binding recovery retry should recover all bindings on the recovered queue (cherry picked from commit 4cd1e64)

acogoluegnes added a commit that referenced this pull request Feb 15, 2021

Polish test

a759d59

References #667

acogoluegnes added a commit that referenced this pull request Feb 15, 2021

Polish test

92a4257

References #667 (cherry picked from commit a759d59)

acogoluegnes added effort-low usability labels Feb 15, 2021

acogoluegnes added this to the 5.11.0 milestone Feb 15, 2021

sharyanrtr mentioned this pull request Jan 25, 2022

DEVELOPERS-116: Upgrade RabbitMQ Client From 5.9.0 To 5.14.1 RentTheRunway/conduit#66

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Binding recovery retry should recover all bindings on the recovered queue #667

Binding recovery retry should recover all bindings on the recovered queue #667

vikinghawk commented Feb 13, 2021 •

edited

michaelklishin Feb 13, 2021

michaelklishin commented Feb 13, 2021

vikinghawk commented Feb 14, 2021

michaelklishin commented Feb 14, 2021

acogoluegnes commented Feb 15, 2021

michaelklishin commented Feb 15, 2021

Binding recovery retry should recover all bindings on the recovered queue #667

Binding recovery retry should recover all bindings on the recovered queue #667

Conversation

vikinghawk commented Feb 13, 2021 • edited

Proposed Changes

Types of Changes

Checklist

Further Comments

michaelklishin Feb 13, 2021

Choose a reason for hiding this comment

michaelklishin commented Feb 13, 2021

vikinghawk commented Feb 14, 2021

michaelklishin commented Feb 14, 2021

acogoluegnes commented Feb 15, 2021

michaelklishin commented Feb 15, 2021

vikinghawk commented Feb 13, 2021 •

edited