Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic delay #111

Merged
merged 8 commits into from Apr 5, 2018
Merged

Dynamic delay #111

merged 8 commits into from Apr 5, 2018

Conversation

Tembrel
Copy link
Contributor

@Tembrel Tembrel commented Sep 30, 2017

This PR is in response to #110. It adds a delay function property to RetryPolicy that is used, if set, to compute the next delay from the previous result or exception.

There are some awkward bits here:

  • I used net.jodah.failsafe.util.Duration in signature of the delay function, though
    I really wanted to use java.time.Duration.
  • Combining delay factor other than 1 with delay function would be meaningless, but
    I've done nothing to make them mutually exclusive.
  • The one included test is pretty crude, passing if the actual delay is within a window of the
    requested delay.

Wrap checked exceptions as unchecked in applying
delay function.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 84.035% when pulling 4f3b60d on Tembrel:dynamic-delay into 031f362 on jhalterman:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 84.298% when pulling 05180e1 on Tembrel:dynamic-delay into 031f362 on jhalterman:master.

if (dynamicDelay != null && dynamicDelay.toNanos() >= 0)
delayNanos = dynamicDelay.toNanos();
} catch (Exception ex) {
if (ex instanceof RuntimeException)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use a separate catch clause to get rid of the instanceof check:

} catch (RuntimeException e) {
    throw e;
} catch (Exception e) {
    throw new RuntimeExeption("..", e);
}

* Returns the function that determines the next delay given
* a failed attempt with the given {@link Throwable}.
*/
public CheckedBiFunction<Object, Throwable, Duration> getDelayFunction() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered to give this function its own interface? E.g.

@FunctionalInterface
public interface DelayFunction {
    @Nullable
    Duration calculateDelay(@Nullable Object result, @Nullable Throwable exception);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea here that DelayFunction is more friendly for Java 6/7 users who can't use lambdas than writing CheckedBiFunction<Object, Throwable, Duration>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's a good reason, but also: I can't see a compelling need for a delay function to be able to throw a checked exception. I only initially used CheckedBiFunction because there was no BiFunction in net.jodah.failsafe.function. When @whiskeysierra pointed out that rolling a separate @FunctionalInterface would give us an unchecked signature, along with the benefits of a more specific method name and a place to put documentation, I embraced that right away.

@Tembrel
Copy link
Contributor Author

Tembrel commented Oct 1, 2017

All good comments. Waiting to hear from others whether this PR is really worth it. If so, will integrate these suggestions.

Removed test of throwing checked exception from delay function, since
delay functions no longer throw checked exceptions.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 84.317% when pulling 5fc8b8a on Tembrel:dynamic-delay into 031f362 on jhalterman:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 84.317% when pulling 5377904 on Tembrel:dynamic-delay into 031f362 on jhalterman:master.

@Tembrel
Copy link
Contributor Author

Tembrel commented Oct 3, 2017

@whiskeysierra - all your suggestions incorporated (although the second one obviated the need for the first).

@duergner
Copy link

duergner commented Oct 4, 2017

Would love to have that PR merged. I see the most obvious use case for doing "correct" retry based on rate limiting response headers.

@whiskeysierra
Copy link

whiskeysierra commented Oct 12, 2017 via email

@jhalterman
Copy link
Member

jhalterman commented Oct 12, 2017

This is good stuff. I like the use case and the simple solution.

I used net.jodah.failsafe.util.Duration in signature of the delay function, though
I really wanted to use java.time.Duration

I know :) One of these days we can sever all pre-Java 8 compatibility (at which point we can replace some of the anonymous classes with lambdas), but we have to stay friendly for our pre-Java 8 users.

A few other comments:

  • Can (/should) we just call it RetryPolicy.withDelay? I'm a fan of short names and the parameters indicate that the user must compute the delay here.
  • Do we want to be strict and throw IllegalStateException if a delayFunction is already configured and a user calls withDelay or withBackoff, and visa versa? Otherwise we should Javadoc which is used if a delayFunction is configured along with a delay value.
  • How should we handle a computeDelay result of <= 0?
  • Since we're handing execution over to users, how should we handle potential exceptions in computeDelay (that happen for whatever reason)?
  • Is there any use case where a DelayFunction might also need to accept an ExecutionContext (to base a delay on the number of executions so far, for example)?
  • Right now, Failsafe is used in Java 6/7 applications. The way this works is that 6/7 users simply shouldn't use any of the APIs that would attempt to load a Java 8 class such as the .future stuff, otherwise everything else just works. I am wondering if the inclusion of the @FunctionalInterface annotation would cause any problems for 6/7 users.

@whiskeysierra
Copy link

whiskeysierra commented Oct 12, 2017 via email

Duration dynamicDelay = delayFunction.calculateDelay(result, failure);
if (dynamicDelay != null && dynamicDelay.toNanos() >= 0)
delayNanos = dynamicDelay.toNanos();
}
if (delayNanos == -1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to allow the backoff and delayFunction features to be combined? I'm thinking no - we just let the delayFunction do all of the work, in which case I think this could be an else if.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

@jhalterman
Copy link
Member

jhalterman commented Oct 12, 2017

@whiskeysierra I just left a comment wondering about that. Are you thinking if DelayFunction had a distinguished return value such as <= 0 we'd fallback to the configured delay value (which could include a backoff)?

It might be awkward for a delayFunction's result to be altered by a backoff adjustment - the two could wind up battling each other (backoff adjustment lowers the waitTime, delayFunction might increase it again) which has me wondering - is there a use case for delay + delayFunction?

@whiskeysierra
Copy link

@jhalterman My primary use case is the Retry-After header. We may get it as part of a response in some case and we may not get it for the rest. That leaves me with the problem how I'm calculating a delay in the second case. If I could easily fallback to exponential backoffs (or fixed delays) as configured and provided by failsafe directly, then I wouldn't have to worry about that case. I wouldn't alter the result, but define a special one (as you suggested, e.g. -1) to signal a fallback.

@Tembrel
Copy link
Contributor Author

Tembrel commented Oct 12, 2017

@whiskeysierra @jhalterman - If I'm following correctly, it sounds like everyone is in favor of having dynamicDelay < 0 mean "fall through to regular delay/backoff behavior", i.e., as if no delay function had been provided. And it's hard to imagine a use for a calculated delay with the delay factor applied.

@whiskeysierra
Copy link

And it's hard to imagine a use for a calculated delay with the delay factor applied.

I agree. If the delay function calculates something, I believe it should be taken as is.

@Tembrel
Copy link
Contributor Author

Tembrel commented Oct 12, 2017

Can (/should) we just call it RetryPolicy.withDelay? I'm a fan of short names and the parameters indicate that the user must compute the delay here.

Funny, I was just talking of Josh Bloch (of Effective Java fame) yesterday, and he reminded me not to
be too clever with overloading. Short names are great, and maybe there's no danger in this particular
overload, but if the longer name would avoid confusion for even one user, it's worth the extra 8 chars.

Do we want to be strict and throw IllegalStateException if a delayFunction is already configured and a user calls withDelay or withBackoff, and visa versa? Otherwise we should Javadoc which is used if a delayFunction is configured along with a delay value.

I like your suggestion of using negative delay function return values as a signal to revert to regular delay/backoff.

How should we handle a computeDelay result of <= 0?

(You mean < 0, right? It's reasonable to return 0 as a delay.)

Answered by previous: Let negative computed delays mean "revert to regular delay/backoff".

To implement this, we would need a separate variable to hold the final delay value to use for this retry, so we don't trash the delayNanos field with earlier computeDelay results; we need that so that if/when we revert to delay/backoff logic we're not interfering with the sequence of backoff values.

Since we're handing execution over to users, how should we handle potential exceptions in computeDelay (that happen for whatever reason)?

Elsewhere I said that I can't find a compelling reason for computeDelay to throw checked exceptions,
so I think we're just talking about unchecked exceptions. Furthermore, since the programmer is free to handle any expected exceptions in the implementation of computeDelay, we're really only talking about unexpected exceptions/errors.

I think the right behavior is to let unexpected exceptions break the whole Failsafe computation by
propagating unhandled out of complete(obj, ex, checkArgs).

For example, if my attempt to read the Retry-After header throws an OutOfMemoryException, I wouldn't want the fact that memory is exhausted to get swept under the rug. It's not part of the computation I'm running under Failsafe, it's part of the surrounding machinery. When something unexpected happens in that machinery, I want to hear about it fast.

Is there any use case where a DelayFunction might also need to accept an ExecutionContext (to base a delay on the number of executions so far, for example)?

Oh, yes, that does make sense. Another argument to computeDelay?

(If so, that's more grist for the custom functional interface mill.)

Right now, Failsafe is used in Java 6/7 applications. The way this works is that 6/7 users simply shouldn't use any of the APIs that would attempt to load a Java 8 class such as the .future stuff, otherwise everything else just works. I am wondering if the inclusion of the @FunctionalInterface annotation would cause any problems for 6/7 users.

I would have thought it wouldn't be a problem, but I notice that @FunctionalInterface has RUNTIME retention.

I'll find out.

@whiskeysierra
Copy link

whiskeysierra commented Oct 12, 2017

I would have thought it wouldn't be a problem, but I notice that @FunctionalInterface has RUNTIME retention.

I believe that the JVM will just silently ignore annotations that are not on the classpath. We should be fine here.

@Tembrel
Copy link
Contributor Author

Tembrel commented Oct 12, 2017

@whiskeysierra - I think you're right. Here's some supporting evidence: https://stackoverflow.com/a/3567969

@jhalterman
Copy link
Member

jhalterman commented Oct 12, 2017

Just realized, DelayFunction's result needs to use a type parameter in order to avoid a cast when using in a lambda:

withDelayFunction((HttpRequest req, Throwable failure) -> ...)

Short names are great, and maybe there's no danger in this particular
overload, but if the longer name would avoid confusion for even one user, it's worth the extra 8 chars.

Yea, this one is borderline, but we already overload other parts of the API with functional things (retryOn, abortOn), so I'd prefer the overload here and just call it withDelay.

And it's hard to imagine a use for a calculated delay with the delay factor applied.

Cool - let's document < 0 as a distinguished return value that falls back to the withDelay configured value, else no delay. Since backoffs may not play nice with a DelayFunction, let's disallow them by throwing IllegalStateException in .withDelay or .withBackoff if the other is already configured. Backoff adjustments should not be made to DelayFunction provided values.

I think the right behavior is to let unexpected exceptions break the whole Failsafe computation by propagating unhandled out of complete(obj, ex, checkArgs).
For example, if my attempt to read the Retry-After header throws an OutOfMemoryException, I wouldn't want the fact that memory is exhausted to get swept under the rug. It's not part of the computation I'm running under Failsafe, it's part of the surrounding machinery. When something unexpected happens in that machinery, I want to hear about it fast.

Good point re: fail fast. We could still fail on errors such as OutOfMemoryError, just ignore exceptions. My argument for ignoring is that I wouldn't want something minor like a retry policy delay computation to ruin a Failsafe execution. Counter-argument would be as you say, what if it's not minor :) Not sure...

Either way, I think we can add a throws Exception to the DelayFunction's method clause just to be more friendly to code that may throw a checked exception.

I think the right behavior is to let unexpected exceptions break the whole Failsafe computation by propagating unhandled

If we do propagate instead of ignore, we'll want to test that the propagation works for async executions as well.

Another argument to computeDelay

Yea - we should include ExecutionContext because the attempt count could for people to compute their own backoff. We do actually have a type for this, ContextualResultListener, but probably makes sense to define a new one for computing a delay.

@Tembrel
Copy link
Contributor Author

Tembrel commented Oct 15, 2017

Either way, I think we can add a throws Exception to the DelayFunction's method clause just to be more friendly to code that may throw a checked exception.

The whole point of checked exceptions is to make it hard for the programmer to forget about handling a predictable exceptional condition. If we added throws Exception to computeDelay we'd be asking for trouble. Better to force them to think about the checked exceptions thrown by the code they use to implement computeDelay. Anyone who really wants to avoid thinking can catch Exception and wrap as unchecked, but at least then the code will have an obvious bad smell.

@Tembrel
Copy link
Contributor Author

Tembrel commented Oct 29, 2017

On the subject of overloading withDelay: We have to provide an accessor that returns the currently configured DelayFunction. We can't overload the existing getDelay, so this accessor has to be called getDelayFunction. But if we have withDelay(long, TimeUnit) and withDelay(DelayFunction) that makes for an awkward non-parallel with getDelay and getDelayFunction. (It's already a little strange that withDelay takes a long and a TimeUnit, but getDelay returns a Duration, but that one is understandable.)

@Tembrel
Copy link
Contributor Author

Tembrel commented Oct 30, 2017

@jhalterman wrote:

Just realized, DelayFunction's result needs to use a type parameter in order to avoid a cast when using in a lambda:

withDelayFunction((HttpRequest req, Throwable failure) -> ...)

It's not so easy to do this, because the type of the delay function field on the RetryPolicy has to be something like DelayFunction<?>. RetryPolicies don't know what contexts the delay function will be called in.

That means that AbstractExecution.complete(Object result, Throwable failure, boolean checkArgs) has no way of knowing whether it can legitimately (i.e., in a type-safe way) call its retry policy's delay function on the result argument.

The best we can do is provide a <T>-parameterized overloading of withDelayFunction (or withDelay -- still waiting for clarity on this) that takes an additional Class<T> result type argument along with a DelayFunction<T> argument and stores it in a field of the RetryPolicy. Then we can check in AbstractExecution.complete whether the provided result is either null or an instance of the provided result type.

Similarly with the failure type: We'd need another overloading to include a restriction on what kinds of failures the delay function is interested in.

I'm going to check in modifications to include these overloadings, for now without changing the method names from withDelayFunction to withDelay. I've also included the logic that makes backoff delays and dynamic delays mutually exclusive, and provided tests for all of this.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.6%) to 84.769% when pulling a4d519d on Tembrel:dynamic-delay into 031f362 on jhalterman:master.

add overload with delay function and failure type.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.6%) to 84.796% when pulling 1833068 on Tembrel:dynamic-delay into 031f362 on jhalterman:master.

@Tembrel
Copy link
Contributor Author

Tembrel commented Oct 30, 2017

I did the method renaming that @jhalterman asked for, realizing that the get... methods are already not parallell with the with... methods.

I don't think there's a need for variants that just take a result type or a failure type, since the typical use will have to mention the expected types explicitly anyway in the lambda, e.g.,

RetryPolicy retryPolicy = new RetryPolicy()
    .withDelay((SpecificResult result, SpecificException failure, ExecutionContext context) -> ...,
        SpecificResult.class, SpecificException.class)
    ...

So I removed those, leaving just

RetryPolicy withDelay(DelayFunction<?, ? extends Throwable> delayFunction)
// and
<R, F extends Throwable> RetryPolicy withDelay(
    DelayFunction<R, F> delayFunction, Class<R> resultType, Class<F> failureType)

@coveralls
Copy link

Coverage Status

Coverage increased (+0.6%) to 84.769% when pulling 57073c2 on Tembrel:dynamic-delay into 031f362 on jhalterman:master.

* {@code failureType} is null
* @throws IllegalStateException if backoff delays have already been set
*/
public <R, F extends Throwable> RetryPolicy withDelay(DelayFunction<R, F> delayFunction,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming back to this PR after too-long (apologies) and I like it overall except for this method signature, which seems a bit too prescriptive for certain cases and awkward for others, ex:

rp.withDelay(delayFn, Object.class, TooManyRequestsException.class);

Foremost, I'm wondering about the use case for this, where we'd care about the result and exception types together. Most likely we only care about one or the other. So with that in mind, along with consistency with the rest of the API, how about:

rp.withDelayOn(delayFunction, TooManyRequestsException.class); // delay on specific failure

// or

rp.withDelayWhen(delayFunction, 500); // delay on specific result

...where the use of the "On" and "When" wording indicates that the delay is conditional. Of course all of these are just convenience methods since withDelay(DelayFunction) can always be used by itself.

Thoughts?

As part of this, we also may consider whether to expose the delay result and failure type or if we should just add a canApplyDelayFunction to RetryPolicy which, similar to canRetry and canAbort would evaluate the conditionals internally for some result/failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you went ahead and merged. I haven't had a chance to respond to your last comment, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. I wasn't sure if you were tied up (which I can sympathize with) so I decided to just merge the PR as is and make the tweaks I had in mind above afterwards. Feel free to share your thoughts on that whenever you can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All seems reasonable, and I see you've already started the renaming in #128.

+1 for canApplyDelayFn, if it's straightforward to implement.

Copy link
Member

@jhalterman jhalterman Apr 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. #128 was just me testing something against Travis, but the commits for this are already in master. I also added the ability to fallback to static or backoff delays if the DelayFunction returns a negative value. Supporting both seemed more consistent.

@jhalterman jhalterman merged commit e5d274d into failsafe-lib:master Apr 5, 2018
@Tembrel Tembrel deleted the dynamic-delay branch April 6, 2018 01:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants