Add (optional) retry logic to topology recovery #387

acogoluegnes · 2018-08-03T15:20:07Z

Topology recovery could benefit from retry logic on failed operations in some specific cases, e.g. a binding that fails because its auto-delete queue has been deleted during topology recovery.

The retry logic would be:

- optional
- configurable to some extent: retry condition (which error, which recoverable entity), retry action, number of retries, backoff policy
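The four knobs listed above (retry condition, retry action, number of retries, backoff policy) could fit together roughly as in the following sketch. It is only an illustration: `RetrySketch`, `Operation`, `Backoff` and `recover` are hypothetical names made up for this example and are not the amqp-client API.

```java
import java.util.function.BiPredicate;

// Hypothetical sketch, not the amqp-client API.
final class RetrySketch {

    /** A recovery or retry step for one recoverable entity. */
    @FunctionalInterface
    interface Operation<E> {
        void run(E entity) throws Exception;
    }

    /** Backoff policy: delay in milliseconds before the given retry (1-based). */
    @FunctionalInterface
    interface Backoff {
        long delayMillis(int retry);
    }

    /**
     * Runs the initial recovery, then retries with retryAction while the
     * condition accepts the (entity, error) pair and retries remain.
     * Returns the number of retries that were needed (0 if none).
     */
    static <E> int recover(E entity,
                           Operation<E> recovery,
                           BiPredicate<E, Exception> retryCondition,
                           Operation<E> retryAction,
                           int maxRetries,
                           Backoff backoff) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                // Attempt 0 is the normal recovery; later attempts run the retry action.
                (attempt == 0 ? recovery : retryAction).run(entity);
                return attempt;
            } catch (Exception e) {
                // Give up when retries are exhausted or the error is not retryable.
                if (attempt >= maxRetries || !retryCondition.test(entity, e)) {
                    throw e;
                }
            }
            Thread.sleep(backoff.delayMillis(attempt + 1));
        }
    }
}
```

Passing `maxRetries = 0` makes the sketch behave as if there were no retry at all, which matches the idea that retrying is opt-in.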

There's no topology recovery retry by default. The default implementation is composable: not all recoverable entities have to retry, and a retry operation doesn't have to be limited to recovering the corresponding entity; it can also run other operations, like recovering the corresponding channel. Fixes #387

Retry twice in topology recovery retry, instead of once by default. References #387

There's no topology recovery retry by default. The default implementation is composable: not all recoverable entities have to retry, and a retry operation doesn't have to be limited to recovering the corresponding entity; it can also run other operations, like recovering the corresponding channel. Fixes #387 (cherry picked from commit 34e33ea)

Retry twice in topology recovery retry, instead of once by default. References #387 (cherry picked from commit 9176062)

There's no topology recovery retry by default. The default implementation is composable: not all recoverable entities have to retry, and a retry operation doesn't have to be limited to recovering the corresponding entity; it can also run other operations, like recovering the corresponding channel. Fixes #387 (cherry picked from commit 34e33ea) Conflicts: src/main/java/com/rabbitmq/client/ConnectionFactory.java src/main/java/com/rabbitmq/client/impl/ConnectionParams.java src/main/java/com/rabbitmq/client/impl/recovery/AutorecoveringConnection.java src/test/java/com/rabbitmq/client/test/TestUtils.java src/test/java/com/rabbitmq/client/test/functional/TopologyRecoveryFiltering.java

References #387 (cherry picked from commit 9711406)

References #387

vikinghawk · 2018-08-10T20:28:27Z

I like the approach here and gave the latest milestone a try. It looks like the default of 2 retries was enough to fix the recovery errors from my previous tests.

Some feedback:

- It would be nice to have visibility of the entity and exception in the logs when a failure has occurred and it's going to retry.
  - Log either in DefaultRecoveryHandler or TopologyRecoveryRetryLogic predicates?
  - Or maybe delegate to the connection's ExceptionHandler and let users decide what to log?
- When the queue-not-found error happens during consumer recovery, it needs to recover all the bindings on the newly recovered queue as well.
- It's not possible for users to write their own DefaultRetryHandler.RetryOperation implementations without using reflection to get at the package-private recoverBinding/recoverQueue/etc. methods on AutorecoveringConnection.
- It would be nice to have a helper method (perhaps on TopologyRecoveryRetryLogic) that creates common scenarios such as the queue not found [1] for users.
  - So all users would have to do is something like connectionFactory.setTopologyRecoveryRetryHandler(TopologyRecoveryRetryLogic.retryOnQueueNotFound());
  - This would make it easier and less error-prone to consume.
  - It also makes it easier to keep code portable between the 4.x and 5.x releases, as it would hide the compile-time differences such as RetryCondition in 4.x vs. the functional predicates used in 5.x. We have apps that need to run on Java 1.7 but others that run on 1.8 and use the Micrometer metrics. So currently I compile my code once for 1.7 with the 4.x version of amqp-client but then use 5.x at runtime when running on Java 8. The 4.x and 5.x versions have been compatible for everything I am using so far.
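To make the requests above more concrete, here is a minimal sketch of what a pre-configured "queue not found" retry could do for a consumer: check that the error is an AMQP 404 (NOT_FOUND), log the entity and the exception, then recover the channel, the queue, its bindings and finally the consumer. Every type and method in it (`QueueNotFoundRetrySketch`, `ChannelClosedException`, `RecordedConsumer`, `retryPlan`) is a hypothetical stand-in written for this illustration, not the library's classes, and the steps are returned as plain strings rather than executed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch, not the amqp-client API.
final class QueueNotFoundRetrySketch {

    /** AMQP 0-9-1 reply code for a resource that does not exist. */
    static final int NOT_FOUND = 404;

    /** Stand-in for a channel-level error carrying an AMQP reply code. */
    static final class ChannelClosedException extends Exception {
        final int replyCode;

        ChannelClosedException(int replyCode, String message) {
            super(message);
            this.replyCode = replyCode;
        }
    }

    /** Stand-in for a recorded consumer: its queue and the bindings on that queue. */
    static final class RecordedConsumer {
        final String queue;
        final List<String> bindings;

        RecordedConsumer(String queue, List<String> bindings) {
            this.queue = queue;
            this.bindings = bindings;
        }
    }

    /** Retry condition: only retry when the channel was closed with NOT_FOUND. */
    static final BiPredicate<RecordedConsumer, Exception> QUEUE_NOT_FOUND =
        (consumer, e) -> e instanceof ChannelClosedException
            && ((ChannelClosedException) e).replyCode == NOT_FOUND;

    /**
     * Returns the ordered retry steps for a consumer whose queue is gone and
     * appends one log line naming the entity and the error. Returns an empty
     * plan (and logs nothing) when the error is not retryable.
     */
    static List<String> retryPlan(RecordedConsumer consumer, Exception error, List<String> log) {
        List<String> steps = new ArrayList<>();
        if (!QUEUE_NOT_FOUND.test(consumer, error)) {
            return steps;
        }
        log.add("Retrying recovery of consumer on queue '" + consumer.queue
            + "' after: " + error.getMessage());
        // A channel-level error closes the channel, so it has to be reopened first.
        steps.add("recoverChannel");
        steps.add("recoverQueue:" + consumer.queue);
        for (String binding : consumer.bindings) {
            steps.add("recoverBinding:" + binding);
        }
        steps.add("recoverConsumer:" + consumer.queue);
        return steps;
    }
}
```

Bundling the condition and the steps behind one entry point is what would let application code stay identical across 4.x and 5.x, since callers never touch the condition or operation types directly.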

[1] https://github.com/rabbitmq/rabbitmq-java-client/blob/master/src/test/java/com/rabbitmq/client/test/functional/TopologyRecoveryRetry.java#L61

Add log in default retry handler, add operation to recover all the bindings of a queue (useful when the recovery of a consumer fails because the queue isn't found), make AutorecoveringConnection#recoverConsumer and AutorecoveringConnection#recoverQueue public as they contain useful logic that some client code should be able to use, and declare a pre-configured retry handler for the deleted queue case. References #387

Add log in default retry handler, add operation to recover all the bindings of a queue (useful when the recovery of a consumer fails because the queue isn't found), make AutorecoveringConnection#recoverConsumer and AutorecoveringConnection#recoverQueue public as they contain useful logic that some client code should be able to use, and declare a pre-configured retry handler for the deleted queue case. References #387 (cherry picked from commit 2b8d257)

Add log in default retry handler, add operation to recover all the bindings of a queue (useful when the recovery of a consumer fails because the queue isn't found), make AutorecoveringConnection#recoverConsumer and AutorecoveringConnection#recoverQueue public as they contain useful logic that some client code should be able to use, and declare a pre-configured retry handler for the deleted queue case. References #387 (cherry picked from commit 2b8d257) Conflicts: src/main/java/com/rabbitmq/client/impl/recovery/DefaultRetryHandler.java src/main/java/com/rabbitmq/client/impl/recovery/TopologyRecoveryRetryLogic.java

acogoluegnes · 2018-08-13T09:23:31Z

@vikinghawk Thanks for the feedback. I pushed a commit to address your remarks. Snapshots for 5.x and 4.x are available.

I made only recoverConsumer and recoverQueue public, as they contain logic that is worth re-using. For the other entities, calling RecordedEntity#recover should be enough.
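As a rough illustration of why reusable recovery steps are handy for custom retry operations, the sketch below chains small steps with an `andThen` combinator, so a consumer retry can be assembled as channel, then queue, then consumer. `ComposableRetrySketch`, `Step` and `consumerRetry` are hypothetical names for this example only; the steps merely record their name in a trace instead of calling the real recovery methods.

```java
import java.util.List;

// Hypothetical sketch, not the amqp-client API.
final class ComposableRetrySketch {

    /** A retry step; andThen chains a follow-up that runs only if this step succeeds. */
    @FunctionalInterface
    interface Step<C> {
        void run(C context) throws Exception;

        default Step<C> andThen(Step<C> next) {
            return context -> {
                run(context);       // if this throws, next is never reached
                next.run(context);
            };
        }
    }

    /** Builds a channel -> queue -> consumer retry; each step records itself in the trace. */
    static Step<List<String>> consumerRetry() {
        Step<List<String>> recoverChannel = trace -> trace.add("channel");
        Step<List<String>> recoverQueue = trace -> trace.add("queue");
        Step<List<String>> recoverConsumer = trace -> trace.add("consumer");
        return recoverChannel.andThen(recoverQueue).andThen(recoverConsumer);
    }
}
```

In client code, the bodies of such steps would be where public methods like recoverQueue and recoverConsumer get called, with RecordedEntity#recover covering the remaining entity types.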

vikinghawk · 2018-08-13T15:42:15Z

+1, this looks good. I'll try to test again today and let you know.

acogoluegnes self-assigned this Aug 3, 2018

acogoluegnes added effort-low enhancement labels Aug 3, 2018

acogoluegnes added this to the 4.8.0 milestone Aug 3, 2018

acogoluegnes mentioned this issue Aug 7, 2018

Add optional retry logic to topology recovery #388

Merged

michaelklishin closed this as completed in #388 Aug 9, 2018

acogoluegnes added a commit that referenced this issue Aug 10, 2018

Retry twice in topology recovery retry

9176062

Instead of 1 by default. References #387

acogoluegnes added a commit that referenced this issue Aug 10, 2018

Retry twice in topology recovery retry

2a78454

Instead of 1 by default. References #387 (cherry picked from commit 9176062)

acogoluegnes added a commit that referenced this issue Aug 10, 2018

Make retry condition more tolerant with wildcards

f8b1552

References #387 (cherry picked from commit 9711406)

acogoluegnes added a commit that referenced this issue Aug 10, 2018

Remove unnecessary continue

c1d39d7

References #387

acogoluegnes added a commit that referenced this issue Aug 10, 2018

Make retry condition more tolerant with wildcards

9711406

References #387

acogoluegnes mentioned this issue Dec 2, 2019

Exception during recovery causes recovery failure rabbitmq/rabbitmq-dotnet-client#658

Closed

tporeba mentioned this issue Sep 13, 2022

using TopologyRecoveryRetryHandler to workaround the problem tporeba/rabbitAutorecoveryReproductor#1

Open
