Implementation of the Exponential BackOff Retry algorithm that increases the backoff period for each retry attempt in case of error

mulesoft-catalyst/exponential-backoff-retry-policy-mule-4

Exponential Backoff Retry Custom Policy

This is an implementation of the Exponential Backoff Retry algorithm, which increases the backoff period after each failed attempt when the application returns an error. By applying it to your API or proxy, you can:

  • Define an initial interval to wait before the first retry
  • Define a multiplier, used to increment the intervals between retries
  • Define a maximum interval, after which no increment on the interval will occur
  • Define a maximum elapsed time as a breaking point to stop retrying
  • Define expressions to validate the application's response: either an expected-response expression for when no error is raised, or specific error expressions
  • Perform a dynamic retry strategy, abstracting from the complexity of retrying at application level
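
As a rough illustration of what the list above describes, here is a generic exponential backoff loop in Python. It is a sketch, not the policy's Mule flow: the function name `call_with_backoff`, the `is_error` predicate (standing in for the response-validation expressions), the use of seconds instead of milliseconds, and the rule for giving up once `max_elapsed_time` would be exceeded are all assumptions made for this example.

```python
import time


def call_with_backoff(call, is_error, initial_interval, multiplier,
                      max_interval, max_elapsed_time,
                      sleep=time.sleep, clock=time.monotonic):
    """Call `call()` until `is_error(response)` is False or time runs out.

    All time values are in seconds (an assumption of this sketch).
    """
    start = clock()
    wait = initial_interval
    while True:
        response = call()
        if not is_error(response):
            return response  # the validation expression accepted the response
        # Assumption: give up when the next wait would pass the elapsed-time limit.
        if clock() - start + wait > max_elapsed_time:
            return response  # hand back the last failing response
        sleep(wait)
        # Grow the interval, but never beyond the configured maximum.
        wait = min(wait * multiplier, max_interval)
```

The `sleep` and `clock` parameters are injectable so that the loop can be exercised without actually waiting, for example by passing a fake clock that a fake `sleep` advances.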

Why?

The idea behind the Exponential Backoff Retry algorithm is to make our services more resilient by using a backoff that is not static when an error occurs. Under a persistent error scenario, we avoid overloading the impacted system or service and instead degrade gracefully, increasing the waiting time after each failed retry.

How?

The logic uses the values injected during the configuration of the policy to calculate the following variables:

  • loopSize: the number of iterations to perform (used by the for-each). It is calculated once, at the beginning, as ceil(default_max_elapsed_time / 1000 / default_multiplier), i.e. the quotient rounded up
  • waitTime: this value indicates the time to wait. It is updated as follows:
    • At the beginning it is empty, because you have to try at least once to find out if there is an error
    • If there is an error the first time, it takes the value of default_initial_interval
    • If there are successive errors, it takes the value of waitTime * default_multiplier

Other variables are calculated during the process, but they are not specific to the algorithm and are used internally for flow control. This policy has no dependencies beyond the http-policy and mule-core components.
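
The two formulas above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not the policy's DataWeave; the function names `loop_size` and `next_wait_time` are made up for the example, and `None` stands in for the "empty" initial waitTime.

```python
import math


def loop_size(max_elapsed_time_ms, multiplier):
    """loopSize = ceil(default_max_elapsed_time / 1000 / default_multiplier)."""
    return math.ceil(max_elapsed_time_ms / 1000 / multiplier)


def next_wait_time(current_wait_ms, initial_interval_ms, multiplier):
    """waitTime update: empty -> initial interval -> previous * multiplier."""
    if current_wait_ms is None:  # first error: nothing has been waited yet
        return initial_interval_ms
    return current_wait_ms * multiplier


if __name__ == "__main__":
    print(loop_size(10000, 1.5))   # 7
    print(loop_size(600000, 1.5))  # 400
    wait = None
    for _ in range(3):
        wait = next_wait_time(wait, 2000, 1.5)
        print(wait)                # 2000, then 3000.0, then 4500.0
```

The two loop sizes printed here match the "Loop Size" values reported in the Benchmark section for the same parameter values.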

Usage

After publishing to Exchange, follow these steps to apply the policy to an existing managed API (or proxy):

  • Log into Anypoint Platform
  • Enter API Manager
  • Click on the API version for the application you want to apply the policy to
  • Click on Policies
  • Click on Apply New Policy
  • Filter by the 'Custom' category and select 'Exponential Backoff Retry'. Click the 'Configure Policy' button
  • Set the policy's parameters:

| Parameter | Purpose |
| --- | --- |
| default_initial_interval | Indicates the initial wait time (when the first error happens) |
| default_max_elapsed_time | Indicates the maximum wait time. After this time, no retry will be performed. Similar to a timeout. |
| default_max_interval | Indicates the upper bound of the wait interval. Once this value is reached, the next retry will not be multiplied by the default_multiplier |
| default_multiplier | Indicates the increment value between repetitions. Between attempts, the calculation of the current wait time is given by previous interval value * multiplier |
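
To see how these parameters interact, the following Python sketch lists the wait before each retry. The function name `wait_schedule` is hypothetical, and the sketch assumes that the interval is clamped to default_max_interval once it would exceed it; the policy's actual behaviour at the boundary may differ. default_max_elapsed_time is deliberately not modelled here.

```python
def wait_schedule(initial_interval, max_interval, multiplier, retries):
    """Return the wait (in ms) before each of the first `retries` retries."""
    waits = []
    # Assumption: an initial interval above the maximum is clamped as well.
    wait = min(initial_interval, max_interval)
    for _ in range(retries):
        waits.append(wait)
        # Assumption: growth is clamped at max_interval, not merely stopped.
        wait = min(wait * multiplier, max_interval)
    return waits


if __name__ == "__main__":
    # initial 2000 ms, max interval 3500 ms, multiplier 1.5
    print(wait_schedule(2000, 3500, 1.5, 4))  # [2000, 3000.0, 3500, 3500]
```

With a larger default_max_interval the same call keeps multiplying for longer, which is why a lower upper bound tends to shorten the total time spent waiting.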

Development

The following commands are used during development:

| Task | Command |
| --- | --- |
| Package policy | mvn clean install |
| Publish to Exchange (make sure to update the pom.xml file with your org ID) | mvn deploy |

Limitations

This policy makes use of the for-each module. Since there is no break condition to exit the loop, the policy iterates through every element of the array (see the "How?" section). It is therefore advisable to apply this policy with small interval and elapsed-time values, so that the loop stays fast to iterate (see the Benchmark section below). The current logic could be replaced with custom code (Java module, script execution), but that is left to the user. Keep in mind that one of the main motivations for this policy is to avoid having too many dependencies.
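
The cost of a loop without a break can be pictured with the Python sketch below. It assumes that a flow-control flag turns the remaining iterations into no-ops once an attempt succeeds; that flag, and the name `run_without_break`, are assumptions for illustration and are not taken from the policy's source.

```python
def run_without_break(attempt, loop_size):
    """Iterate exactly `loop_size` times, skipping work after a success.

    Returns (succeeded, calls_made, iterations_run).
    """
    succeeded = False
    calls = 0
    iterations = 0
    for _ in range(loop_size):
        iterations += 1
        if succeeded:
            continue  # no early exit: the loop still visits every element
        calls += 1
        succeeded = attempt()
    return succeeded, calls, iterations


if __name__ == "__main__":
    outcomes = iter([False, False, True])  # third attempt succeeds
    print(run_without_break(lambda: next(outcomes), 7))  # (True, 3, 7)
```

Only three calls reach the backend, yet all seven iterations run. With a loop size of 400 the leftover iterations are cheap individually but still add up, which is the motivation for keeping the elapsed-time value small.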

Benchmark

The policy was tested with a Mule application that randomly returns either an error or OK, based on a number generated with the random() DataWeave function (https://docs.mulesoft.com/mule-runtime/4.3/dw-core-functions-random). If the number is odd, the application returns an error (MULE:EXPRESSION, "org.mule.weave.v2.runtime.core.exception.DivisionByZeroException: Division by zero"); otherwise it responds successfully. The purpose of this test was to emulate a transient error, e.g. a network failure or server overload, and to check the policy's effect on error rate and overall performance. The results follow.
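
A comparable test backend can be imitated in Python as follows. This is a stand-in, not the Mule application used for the benchmark: the names `flaky_backend` and `error_rate` are hypothetical, the error type is a plain `RuntimeError`, and drawing a random integer is an assumption about how the odd/even check was obtained from random().

```python
import random


def flaky_backend(draw):
    """Fail when the drawn integer is odd, succeed otherwise."""
    number = draw()
    if number % 2 == 1:
        raise RuntimeError("Division by zero")  # mimics the emulated error
    return "ok"


def error_rate(draw, requests):
    """Fraction of `requests` calls to flaky_backend that raised an error."""
    errors = 0
    for _ in range(requests):
        try:
            flaky_backend(draw)
        except RuntimeError:
            errors += 1
    return errors / requests


if __name__ == "__main__":
    rng = random.Random()
    # Without retries, roughly half of the calls are expected to fail.
    print(error_rate(lambda: rng.randint(0, 999), 200))
```

Without any retry, a backend like this fails about half the time, which is in line with the error rate reported for Test #0.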

Application config

| Deployment Model | Runtime Version | Worker Size | Workers |
| --- | --- | --- | --- |
| CloudHub (US East - Ohio) | 4.3.0 | 0.1 | 1 |

Test #0 (no policy applied)

Test suite configuration

| # of requests | # of threads | # of total concurrent requests |
| --- | --- | --- |
| 100 | 2 | 200 |

Results

| Avg | Median | Min | Max | Error Rate | Throughput |
| --- | --- | --- | --- | --- | --- |
| 398 ms | 378 ms | 349 ms | 1410 ms | 46.50% | 4.9/sec |

Test #1

Policy configuration

| default_initial_interval | default_max_elapsed_time | default_max_interval | default_multiplier |
| --- | --- | --- | --- |
| 2000 ms | 10000 ms | 10000 ms | 1.5 |

Test suite configuration

| # of requests | # of threads | # of total concurrent requests |
| --- | --- | --- |
| 100 | 2 | 200 |

Results

| Loop Size | Avg | Median | Min | Max | Error Rate | Throughput |
| --- | --- | --- | --- | --- | --- | --- |
| 7 | 7813 ms | 3391 ms | 369 ms | 41822 ms | 0.00% | 15.1/min |

Test #2

Policy configuration

| default_initial_interval | default_max_elapsed_time | default_max_interval | default_multiplier |
| --- | --- | --- | --- |
| 2000 ms | 10000 ms | 5000 ms | 1.5 |

Test suite configuration

| # of requests | # of threads | # of total concurrent requests |
| --- | --- | --- |
| 100 | 2 | 200 |

Results

| Loop Size | Avg | Median | Min | Max | Error Rate | Throughput |
| --- | --- | --- | --- | --- | --- | --- |
| 7 | 4584 ms | 2416 ms | 369 ms | 39133 ms | 0.00% | 25.4/min |

Test #3

Policy configuration

| default_initial_interval | default_max_elapsed_time | default_max_interval | default_multiplier |
| --- | --- | --- | --- |
| 2000 ms | 10000 ms | 3500 ms | 1.5 |

Test suite configuration

| # of requests | # of threads | # of total concurrent requests |
| --- | --- | --- |
| 100 | 2 | 200 |

Results

| Loop Size | Avg | Median | Min | Max | Error Rate | Throughput |
| --- | --- | --- | --- | --- | --- | --- |
| 7 | 2019 ms | 1120 ms | 367 ms | 20428 ms | 0.00% | 55.4/min |

Test #4

Policy configuration

| default_initial_interval | default_max_elapsed_time | default_max_interval | default_multiplier |
| --- | --- | --- | --- |
| 2000 ms | 600000 ms | 3500 ms | 1.5 |

Test suite configuration

| # of requests | # of threads | # of total concurrent requests |
| --- | --- | --- |
| 100 | 2 | 200 |

Results

| Loop Size | Avg | Median | Min | Max | Error Rate | Throughput |
| --- | --- | --- | --- | --- | --- | --- |
| 400 | 3579 ms | 1572 ms | 365 ms | 37890 ms | 0.00% | 32.6/min |

Test #5

Policy configuration

| default_initial_interval | default_max_elapsed_time | default_max_interval | default_multiplier |
| --- | --- | --- | --- |
| 2000 ms | 600000 ms | 120000 ms | 1.5 |

Test suite configuration

| # of requests | # of threads | # of total concurrent requests |
| --- | --- | --- |
| 500 | 1 | 500 |

Results

| Loop Size | Avg | Median | Min | Max | Error Rate | Throughput | 95th percentile |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 400 | 2772 ms | 1385 ms | 361 ms | 299453 ms | 0.20% | 21.6/min | 5615 |

Test #6

Policy configuration

| default_initial_interval | default_max_elapsed_time | default_max_interval | default_multiplier |
| --- | --- | --- | --- |
| 2000 ms | 600000 ms | 60000 ms | 1.5 |

Test suite configuration

| # of requests | # of threads | # of total concurrent requests |
| --- | --- | --- |
| 500 | 1 | 500 |

Results

| Loop Size | Avg | Median | Min | Max | Error Rate | Throughput | 95th percentile |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 400 | 1953 ms | 479 ms | 365 ms | 49785 ms | 0.00% | 30.7/min | 5220 |

Contribution

Want to contribute? Great!

  • For public contributions - Just fork the repo, make your updates and open a pull request!
  • For internal contributions - Use a simplified feature workflow following these steps:
    • Clone the repo
    • Create a feature branch using the naming convention feature/name-of-the-feature
    • Once it's ready, push your changes
    • Open a pull request for a review
