Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rate limiting retries #1565

Closed
jattisha opened this issue Oct 13, 2021 · 6 comments · Fixed by #2067
Closed

Rate limiting retries #1565

jattisha opened this issue Oct 13, 2021 · 6 comments · Fixed by #2067
Milestone

Comments

@jattisha
Copy link
Contributor

Resilience4j version: 1.7.0

Java version: 11

Hello!

We are migrating to Resilience4J from an in-house resilience library built for our company. One limitation that we've found in R4J that we had in our old library was the ability to rate-limit retry calls (e.g. only retry a dependency at a rate no greater than 1 TPS) to prevent retry-storms for our dependency.

I tried doing this in Resilience4J by doing something like this:

RateLimiter rateLimiter; // Rate limiter configured for some TPS, let's say 1 TPS
RetryConfig retryConfig = RetryConfig.custom()
        .maxAttempts(2)
        .intervalFunction((numRetries) -> {
            if (rateLimiter.acquirePermission()) {
                return IntervalFunction.ofExponentialBackoff(5L, 2).apply(numRetries);
            } else {
                return 0L;
            }
        })
        .ignoreExceptions(InvalidInputException.class)
        .build();

and I honestly think this mostly worked, but I think it's out of convenience and not out of design, and I want to get confirmation that I'm not doing something illegal that would be broken later. This code heavily relies on this line of code here. The fact that the interval function returns 0 after hitting the rate limit means that we eventually completeExceptionally, which is what we want to do (stop retrying and just propagate the exception up at this point, since we've reached our limit). This does throw a bit of a wrinkle in some metrics that we built around this, but that's fine and we can assume that this is a possibility. Tl;dr: is this a kosher solution that would be endorsed by R4J?

@RobWin
Copy link
Member

RobWin commented Oct 15, 2021

Hi,
you want to have a different rate limit for retry attempts than for the first call, correct?

That's an interesting challenge.

Of course you can do anything you want in the interval function which calculates the wait duration between retry attempts. That means your solution is not how it was designed, but should work in general. But I agree that it might wrinkle your metrics.

I think I would create two RateLimiter instances and decorate the same endpoint twice and use retry/ratelimiter only in the fallback method.

@jattisha
Copy link
Contributor Author

That's definitely another option as well! This method is definitely more convenient for us, but do you think if we went with this route that it's reasonable to return 0 from an interval function and force the retry to complete exceptionally? I am a bit worried that 0 is treated more like an "undefined" behavior that may change later on, although it definitely seems reasonable to return 0 and just say "don't retry!".

@RobWin
Copy link
Member

RobWin commented Oct 25, 2021

Actually 0 means that there should be no wait duration between retry attempts.
It should not force a retry attempts to complete exceptionally. Unfortunately the code seems to be inconsistent :(

In Reactor RetryOperator it is as follows:

long waitDurationMillis = retryContext.onError(throwable);

if (waitDurationMillis == -1) {
    return Mono.error(throwable);
}

In RxJava RetryTransformer:

long waitDurationMillis = retryContext.onError(throwable);

if (waitDurationMillis == -1) {
    return Observable.error(throwable);
}

or see RetryImpl

 private long handleThrowable(Throwable throwable) {
    if (!exceptionPredicate.test(throwable)) {
        failedWithoutRetryCounter.increment();
        publishRetryEvent(() -> new RetryOnIgnoredErrorEvent(getName(), throwable));
        return -1;
    }
    return handleOnError(throwable);
}

I think you've discovered a bug.
-1 should signal to abort retry attempts.

It's the first time that someone wants to use the intervalFunction to abort retries.

@RobWin RobWin added the bug label Oct 25, 2021
@jattisha
Copy link
Contributor Author

jattisha commented Oct 25, 2021

I think it's a pretty good way to abort retries, at least for our use case. I think you're right that it's actually a bug. From my testing as well, I think if you return a non-positive value for the interval function, it completes exceptionally but never ends up emitting the RetryOnErrorEvent, the only events that get emitted are the RetryOnRetryEvents up until we get a non-positive value, and I believe there's also a RetryOnRetryEvent emitted with an interval of whatever the wait duration is. In that case, I believe it should not emit a RetryOnRetryEvent since there was no retry that occurred (we aborted) and also emit a RetryOnErrorEvent when we do finally complete exceptionally.

@RobWin
Copy link
Member

RobWin commented Oct 25, 2021

Would you like to create a Bugfix?

@jattisha
Copy link
Contributor Author

jattisha commented Dec 4, 2023

Cut a pull request for this issue!

#2067

RobWin pushed a commit that referenced this issue Dec 6, 2023
…han 0. Emit correct events based on result of IF. (#2067)
@RobWin RobWin added this to the 2.1.1 milestone Dec 6, 2023
@RobWin RobWin modified the milestones: 2.1.1, 2.2.0 Dec 17, 2023
RobWin added a commit that referenced this issue Jan 5, 2024
* Issue #201 duplication removed from PredicateCreator clients (#1921)

* Move toString() out of the builder class (#1913)

Co-authored-by: Mike Dias <mdias@atlassian.com>

* docs: remove deprecated http client (#1930)

Co-authored-by: hongbin-deali <hongbin.kim@deali.net>

* configuration for native image added (#1883)

Co-authored-by: Andreas Mautsch <amautsch>

* Bump actions/checkout from 3.2.0 to 3.4.0 (#1911)

Bumps [actions/checkout](https://github.com/actions/checkout) from 3.2.0 to 3.4.0.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v3.2.0...v3.4.0)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump actions/cache from 3.0.11 to 3.3.1 (#1907)

Bumps [actions/cache](https://github.com/actions/cache) from 3.0.11 to 3.3.1.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](actions/cache@v3.0.11...v3.3.1)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Issue #1875: add copyright information to jar (#1876)

* add copyright information to jar

* add license file to jar

---------

Co-authored-by: TWI <audi-connect@msg.group>

* Add missing validation of config fields (#1931)

* Add missing validation of config fields

Document existing assumptions

* Fix range in exception message

* Mentioned removed deprecated properties in v2.0.0 changelog (#1954)

Co-authored-by: Łukasz Nowak <lukasz.nowak@idemia.com>

* Adding unchecked() method to CheckedConsumer (#1981)

Co-authored-by: Muhammad Sohail <muhammad.sohail@autodesk.com>

* FallbackMethod supports AOP (#1965)

* Add AOP in FallbackMethod

* Add test

---------

Co-authored-by: 임수현 <soohyun.lim2@cj.net>

* Prepare release 2.1.0

* Prepare release 2.1.0

* Updated version to 2.2.0-SNAPSHOT

* Issue #1761: Async retry doesn't emit event bugfix (#1986)

Co-authored-by: Hrbacek, David <david.hrbacek@firma.seznam.cz>

* Issue #1600: Metric for total number of invoked calls from retry (#1895)

* Issue #1600: Metric for total number of invoked calls from retry

* Issue #1600: Metric for total number of invoked calls from retry. Update

* Issue #1600: Metric for total number of invoked calls from retry. Update

* fix Feign fallback from lambda (#1999)

* fix Feign fallback from lambda

* Provide test cases for pr#1999

* Micrometer Timer decorator (#1989)

* Timer reactive support (#2009)

* Timer spring support (#2020)

* Discussion #1962: Added apache commons configuration based registries bootstraping (#1991)

* Removing stale retry configurations from configuration map #2037 (#2039)

* Bump org.jetbrains.kotlin.jvm from 1.7.22 to 1.9.0 (#1990)

Bumps [org.jetbrains.kotlin.jvm](https://github.com/JetBrains/kotlin) from 1.7.22 to 1.9.0.
- [Release notes](https://github.com/JetBrains/kotlin/releases)
- [Changelog](https://github.com/JetBrains/kotlin/blob/master/ChangeLog.md)
- [Commits](JetBrains/kotlin@v1.7.22...v1.9.0)

---
updated-dependencies:
- dependency-name: org.jetbrains.kotlin.jvm
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fixed Issue #2040: Micronaut BulkheadInterceptor always uses the "default" configuration in micronaut application

* Support Micronaut 4 (#1951)

* FallbackMethod supports AOP (#2058)

* Add AOP in FallbackMethod

* Add test

* Fixed the fallback method bug

---------

Co-authored-by: 임수현 <soohyun.lim2@cj.net>

* Support class name using SpEL expression at Circuitbreaker annotation name field. (#2053)

* Issue #1565: Do not retry if IntervalFunction returns interval less than 0. Emit correct events based on result of IF. (#2067)

* Fixed time-based tests

* Updated version to 2.2.0

* Do not interrupt future if running (#2072)

* Avoid to add  duplicate consumer (#2074)

* Updated version to 2.3.0-SNAPSHOT

* Improved EventProcessorTest

* Fix bug in Retry AsyncContext's onResult consumeResultBaeforeRetryAttempt (#2035)

* Adaptive bulkhead 2023.1.4 (#1873)

* Super early draft of adaptive bulkhead.
!!! NOT TESTED !!!

* Small proof of concept test.

* Code cleanup. Builder for BulkheadAdaptationConfig

* Small code clean up. Metrics implementation. Additional TODOs

* Additional TODOs

* Small renaming

* first round of adaptive bulkhead updates

* first round of adaptive bulkhead updates

* first round of adaptive bulkhead updates

* second round of adaptive bulkhead updates

* Sonar fixes

* adaptive bulkhead refactoring to match last discussion

* adaptive bulkhead refactoring to match last discussion

* adaptive bulkhead refactoring to match last discussion

* adaptive bulkhead refactoring to match last discussion

* adaptive bulkhead refactoring to match last discussion

* adaptive bulkhead refactoring to match last discussion

* Sonar fixes

* reduce code duplication

* add exception predicate unit testing

* first round of updates after introducing sliding window with AIMD limiter

* Sonar fixes

* Sonar fixes

* first round of updates for making adaptive policy independent from bulkhead

* first round of applying the review comments

* first round of applying the review comments

* second round of applying the review comments

* Third round of applying the review comments

* fix javadoc

* sonar fixes

* review comments round 4

* code cleanup

* review comments round 5

* review comments round 5

* review comments round 5

* #review comments

* Sonar fixes

* increase test coverage

* first round of review comments apply

* first round of review comments apply

* fix compile issue

* tune the adaptive bulkhead test

* New Draft

* Adaptive Bulkhead Draft

* Adaptive Bulkhead 2 Draft (#1306)

* Remove AIMD leftovers (#1397)

* Issue#651: Support to exclude MetricsAutoConfiguration

* Propagate clock into CircuitBreakerMetrics to allow mocked time-based tests. (#844)

* Issue #596: The Spring Boot fallback method is not invoked when a BulkheadFullException occurs #847

The ThreadPoolBulkhead is does not return a CompletionStage when a task could not be submitted, because the Bulkhead is full. The ThreadPoolBulkhead throws a BulkheadFullException instead. Analog to the ThreadPoolExecutor which throws the RejectedExecutionException. The BulkheadFullException is not handled correctly by the BulkheadAspect.
The BulkheadAspect should convert the BulkheadFullException into a exceptionally completed future so that the FallbackDecorator works as expected.

* Added timelimiter support for resilience4j-ratpack (#865)

* Sonar critical issues solved: (#876)

* Remove usage of generic wildcard type.
* Define a constant instead of duplicating this literal
* Refactor this method to reduce its Cognitive Complexity
* Add a nested comment explaining why this method is empty
* Class names should not shadow interfaces or superclasses
Warnings removed.

* Updated Gradle, Migrated from Bintray to Sonatype, migrated to GitHub Actions

Co-authored-by: Robert Winkler <rwinkler@telekom.de>

* Fixed sonar smells

* Small proof of concept test.

* first round of adaptive bulkhead updates

* second round of adaptive bulkhead updates

* Sonar fixes

* first round of applying the review comments

* Sonar fixes

* first round of review comments apply

* tune the adaptive bulkhead test

* New Draft

* Adaptive Bulkhead 2 Draft (#1306)

* Remove AIMD leftovers (#1397)

* updated to master

---------

Co-authored-by: bstorozhuk <storozhuk.b.m@gmail.com>
Co-authored-by: Mahmoud Romeh <mahmoud.romeh@collibra.com>
Co-authored-by: Robert Winkler <rwinkler@telekom.de>
Co-authored-by: KrnSaurabh <39181662+KrnSaurabh@users.noreply.github.com>
Co-authored-by: Robert Winkler <robwin@t-online.de>
Co-authored-by: Dan Maas <daniel.maas@target.com>

* Issue #201 bulkhead events corrected

* Issue #201 BulkheadOnLimitIncreasedEvent, BulkheadOnLimitDecreasedEvent merged into one event

* Issue #201 adaptive bulkhead config and time fixed

* Adaptive bulkhead 4.1 (#1947)

* Issue #201 EventPublisher hierarchy fixed

* Issue #201 onResult added to AdaptiveBulkhead

* Issue #201 states extracted to own files

* Issue #201 AdaptiveBulkhead states simplified (#1995)

* Issue #201 IncreaseInterval for SlowStartState added (#2001)

* Issue #201 timeUnit replaced by currentTimestamp (#2007)

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Mike Dias <mike.rodrigues.dias@gmail.com>
Co-authored-by: Mike Dias <mdias@atlassian.com>
Co-authored-by: Hongbin Kim <fusis1@naver.com>
Co-authored-by: hongbin-deali <hongbin.kim@deali.net>
Co-authored-by: Andreas Mautsch <amautsch@gmx.de>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: tobi5775 <50146675+tobi5775@users.noreply.github.com>
Co-authored-by: TWI <audi-connect@msg.group>
Co-authored-by: Karol Nowak <nowkarol+github@gmail.com>
Co-authored-by: Łukasz Nowak <38426907+nluk@users.noreply.github.com>
Co-authored-by: Łukasz Nowak <lukasz.nowak@idemia.com>
Co-authored-by: Muhammad Sohail <mhsohail56@gmail.com>
Co-authored-by: Muhammad Sohail <muhammad.sohail@autodesk.com>
Co-authored-by: SOOHYUN-LIM <sh.lim7682@gmail.com>
Co-authored-by: 임수현 <soohyun.lim2@cj.net>
Co-authored-by: Robert Winkler <rwinkler@telekom.de>
Co-authored-by: David Hrbacek <35535320+b923@users.noreply.github.com>
Co-authored-by: Hrbacek, David <david.hrbacek@firma.seznam.cz>
Co-authored-by: Oleksandr L <laviua@users.noreply.github.com>
Co-authored-by: Kerwin Bryant <kerwin612@qq.com>
Co-authored-by: Mariusz Kopylec <mariusz.kopylec@o2.pl>
Co-authored-by: Mariusz Kopylec <mariusz.kopylec@allegro.pl>
Co-authored-by: Deepak Kumar <deep.rnj@gmail.com>
Co-authored-by: tanuja5 <1997.tanuja@gmail.com>
Co-authored-by: Graeme Rocher <graeme.rocher@gmail.com>
Co-authored-by: seokgoon28 <148044191+seokgoon28@users.noreply.github.com>
Co-authored-by: jattisha <jovanattisha@yahoo.com>
Co-authored-by: Hartigan <hartigans@live.com>
Co-authored-by: shijun_deng <dengshijun1992@qq.com>
Co-authored-by: mbio <qnwlqnwlxm@naver.com>
Co-authored-by: bstorozhuk <storozhuk.b.m@gmail.com>
Co-authored-by: Mahmoud Romeh <mahmoud.romeh@collibra.com>
Co-authored-by: KrnSaurabh <39181662+KrnSaurabh@users.noreply.github.com>
Co-authored-by: Robert Winkler <robwin@t-online.de>
Co-authored-by: Dan Maas <daniel.maas@target.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants