retry: new package #245

Open
wants to merge 1 commit into
from

Conversation

Projects
None yet
4 participants
Owner

rogpeppe commented Oct 6, 2016

The plan is that this should eventually replace utils.Attempt
and github.com/juju/retry.

It provides functionality for stopping and clock mocking and for pluggable
strategies.

It is (arguably) quite a bit simpler to use than github.com/juju/retry and
the usage remains basically compatible with utils.Attempt, so
only minor code changes will be needed to change the existing
code base to use it.

As discussed in IRC, we already have 4-5 ways of doing retries in Juju and we're meant to be consolidating to github.com/juju/retry.

We're going to put this on the tech review board agenda for discussion.

rogpeppe added a commit to rogpeppe/juju that referenced this pull request Oct 6, 2016

all: refactor juju/retry to new retry package
NOT FOR SUBMISSION

This demonstrates how the new retry package proposed in
juju/utils#245 can be used to
replace existing use of the juju/retry package.

I'll be missing the tech board meeting, hence I'm reviewing here.

+1 in principle, if the HasNext blocking issue can be resolved (which I think it can, based on our discussions)

This does need buy-in, because we don't want 3 different implementations. I'd like us to revisit why juju/retry was introduced in the first place: what issues were we working around in utils.Attempt?

retry/retry.go
+// stop while waiting, the attempt will be aborted.
+func Start(strategy Strategy, clk clock.Clock, stop <-chan struct{}) *Attempt {
+ if clk == nil {
+ clk = clock.WallClock
@axw

axw Oct 7, 2016

Member

I'm somewhere between -0 and -1 on a default here, because I think it'll discourage people from parameterising the clock in calling code. OTOH, a panic might not be ideal either, and an error result will be ugly.

@rogpeppe

rogpeppe Oct 7, 2016

Owner

I'd like this package to be useful for one-off programs too and requiring every such use
to have an explicit dependency on juju/utils/clock seems a bit wrong to me.
If people are using a mock clock, they'll pass it in here. If they're not, I don't think
typing "clock.WallClock" is going to change matters much.

retry/retry.go
+ return a.count
+}
+
+// HasNext waits until it is time to perform the next attempt
@axw

axw Oct 7, 2016

Member

I can't bring myself to think this is OK. It's not intuitive for HasNext to block, and I expect this will lead to code taking longer than necessary.

@rogpeppe

rogpeppe Oct 7, 2016

Owner

Fixed as discussed online, by renaming to More and changing the Next semantics somewhat.

+ "time"
+)
+
+type strategyFunc func(now time.Time) Timer
@axw

axw Oct 7, 2016

Member

I love the composable strategies.

I don't like the API on this package. it's very confusing to me, and not at all intuitive.

retry/exp.go
+}
+
+// Start is short for Start(r, clk, stop).
+func (r Exponential) Start(clk clock.Clock, stop <-chan struct{}) *Attempt {
@natefinch

natefinch Oct 7, 2016

Contributor

ew. If we can avoid the default case being (nil, nil), that would be really nice. Why not make the clock part of the struct? Surely no one is going to need to change out the clock when reusing an attempt strategy.

Then you could just have Start and StartWithCancel or something.

@rogpeppe

rogpeppe Oct 10, 2016

Owner

It's important that the clock isn't part of the struct because the strategies are intended to be static
and suitable for putting in global variables. To start an attempt, we bind the static strategy together
with the required runtime values, which allows us to implement the tactics implied by the strategy.

So I think that it's important to pass the clock in, because current guidance is to use a mockable clock
whereever possible. I don't think that passing a single argument is a huge burden.

Given that very little code currently uses cancellable attempts, we could indeed have
two methods, for each variant. That seems like a reasonable idea. I don't think it's worth
doing that here, because an Exponential is almost never worth using on its own without
composing with LimitTime or LimitCount because otherwise it'll go on forever.

So I've added a top level StartWithCancel function, taken the stop argument out of retry.Start, removed the stop argument to Regular.Start (you can always use retry.StartWithCancel) and removed Exponential.Start.

retry/regular.go
+type Regular struct {
+ Total time.Duration // total duration of attempt.
+ Delay time.Duration // interval between each try in the burst.
+ Min int // minimum number of retries; overrides Total
@natefinch

natefinch Oct 7, 2016

Contributor

I find it weird to have a minimum number of retries, and not a maximum. I get it, in combination with the time limit, but still, I'd also expect there to be a way to say "just try 3 times"... I guess min works for that, if Total is zero.... but that's really counterintuitive... and doesn't give you the way to say "run for a max of Total time or max retries, whichever comes first"

Also, "overrides Total" is not exactly correct Override, to me, means "totally ignore" ... but what actually happens is, you run for a min number of retries or for Total time, whichever comes later.

@rogpeppe

rogpeppe Oct 10, 2016

Owner

I find it weird to have a minimum number of retries, and not a maximum.

This is a reasonable point. FWIW the Regular API is exactly the old AttemptStrategy API already used in dozens of places in Juju, so that we can easily mechanically replace the existing uses with this package. The comments are the same too (see commit afe327a7f7 in the goamz.v2 repo).

But it occurs to me that in practice you're almost never going to want to use Min and Total together. A quick grep of the source confirms that they are almost never used together. There is however one case in gopkg.in/amz.v3/s3/s3.go:

    defaultAttempts = aws.AttemptStrategy{
        Min:   5,
        Total: 5 * time.Second,
        Delay: 200 * time.Millisecond,
    }

and it seems to me that this is actually a reasonable use case - we'll keep trying for 5 seconds, but if individual attempts are taking a very long time (for example because of network delays) then we will be sure to make at least 5 attempts.

and doesn't give you the way to say "run for a max of Total time or max retries, whichever comes first"

You can already do that with LimitCount. For example:

retry.LimitCount(maxRetries, retry.Regular{
    Total: totalTime,
    Delay: time.Second,
})

So on balance I'm somewhat inclined to leave Regular.Min as is, because the behaviour it chooses can't be obtained any other way. Have you got any specific suggestions for rephrasing the comments?

@natefinch

natefinch Oct 11, 2016

Contributor

the Regular API is exactly the old AttemptStrategy API

But this is a brand new package, so we should take the opportunity to make things less confusing.

You can already do that with LimitCount. For example:

But that's a completely different mechanism than the minimum, which is why it's confusing. It's not consistent.... so, for example, when someone looks at the fields of AttemptStrategy, they see minimum, but no maximum and think there's no way to do a maximum. And if there's retry.LimitCount, why not retry.MinCount?

@rogpeppe

rogpeppe Oct 12, 2016

Owner

But that's a completely different mechanism than the minimum, which is why it's confusing.

Yes, it's a different mechanism because you can't do the min count thing by strategy composition (what timing would the extra events get?). We could use MinCount too, but I honestly don't think that name is confusing as it's in the context of the Regular struct type and documented.

I've updated the field documentation for Regular to use a more conventional comment style
and to mention LimitCount.

retry/retry.go
+
+// Strategy is implemented by types that represent a retry strategy.
+type Strategy interface {
+ NewTimer(now time.Time) Timer
@natefinch

natefinch Oct 7, 2016

Contributor

This type needs a much bigger explanation of how someone would implement a Strategy. I have no idea what NewTimer is supposed to return.

@rogpeppe

rogpeppe Oct 10, 2016

Owner

In general, people won't need to implement strategies themselves - the existing strategy types are intended to be sufficient for almost all purposes. We could even unexport the NewTimer method, but I think that's going a bit far. I have toyed with the idea of using "Tactics" instead of "Timer" for the interface type returned by the Strategy method, but I think that's probably too cute and not that helpful.

I've adjusted the comments to make things a little clearer, hopefully.

retry/retry.go
+ // Note that NextSleep is called once after
+ // each iteration has completed, assuming the
+ // retry loop is continuing.
+ NextSleep(now time.Time) (time.Duration, bool)
@natefinch

natefinch Oct 7, 2016

Contributor

When would you ever call this with anything other than time.Now()? Isn't this why we have a fake clock implementation?

@rogpeppe

rogpeppe Oct 10, 2016

Owner

When would you ever call this with anything other than time.Now()?

You wouldn't. But I think it's better to pass in a more restricted value when possible. I don't want to pass the fake clock implementation to NewTimer, because that would imply that it could sleep. The Timer interface is intended to be a non-blocking one that always returns its result immediately - the Attempt type is the one responsible for waiting.

retry/retry.go
+}
+
+// More reports whether the next call to Next will sleep again to wait
+// for another attempt. It does not itself block.
@natefinch

natefinch Oct 7, 2016

Contributor

This comment is really confusing. The firs thing it says is that it'll report if Next will sleep... .but that's an implementation detail. What it reports is whether or not the attempt will retry. I think that also indicates the name is not very good. I don't know what More() means in the course of a an Attempt. If it was called WillRetry, it would be very clear.

@rogpeppe

rogpeppe Oct 10, 2016

Owner

Unfortunately it doesn't necessarily report whether Next will retry, because if the attempt has been explicitly stopped then Next may return false even if More has returned true. That difference is why I renamed the method from HasNext (the semantics are otherwise identical) after some discussion with axw.

I've rewritten the comment to try and make it clearer.

I don't like the API on this package. it's very confusing to me, and not at all intuitive.

Thank you for the useful feedback. Although you say that you don't like the API on this package, most of your comments seem to be about fairly cosmetic issues (one exception is the semantics of the Min field, but that's a pre-existing API).

I've made some changes in response to your comments, but I'd like to know your opinion of the API as whole, in particular with relation to that of github.com/juju/retry. Here's a PR has contrasts the two approaches by changing existing uses of juju/retry with this one: https://github.com/juju/juju/pull/6396/files

retry/exp.go
+}
+
+// Start is short for Start(r, clk, stop).
+func (r Exponential) Start(clk clock.Clock, stop <-chan struct{}) *Attempt {
@natefinch

natefinch Oct 7, 2016

Contributor

ew. If we can avoid the default case being (nil, nil), that would be really nice. Why not make the clock part of the struct? Surely no one is going to need to change out the clock when reusing an attempt strategy.

Then you could just have Start and StartWithCancel or something.

@rogpeppe

rogpeppe Oct 10, 2016

Owner

It's important that the clock isn't part of the struct because the strategies are intended to be static
and suitable for putting in global variables. To start an attempt, we bind the static strategy together
with the required runtime values, which allows us to implement the tactics implied by the strategy.

So I think that it's important to pass the clock in, because current guidance is to use a mockable clock
whereever possible. I don't think that passing a single argument is a huge burden.

Given that very little code currently uses cancellable attempts, we could indeed have
two methods, for each variant. That seems like a reasonable idea. I don't think it's worth
doing that here, because an Exponential is almost never worth using on its own without
composing with LimitTime or LimitCount because otherwise it'll go on forever.

So I've added a top level StartWithCancel function, taken the stop argument out of retry.Start, removed the stop argument to Regular.Start (you can always use retry.StartWithCancel) and removed Exponential.Start.

retry/regular.go
+type Regular struct {
+ Total time.Duration // total duration of attempt.
+ Delay time.Duration // interval between each try in the burst.
+ Min int // minimum number of retries; overrides Total
@natefinch

natefinch Oct 7, 2016

Contributor

I find it weird to have a minimum number of retries, and not a maximum. I get it, in combination with the time limit, but still, I'd also expect there to be a way to say "just try 3 times"... I guess min works for that, if Total is zero.... but that's really counterintuitive... and doesn't give you the way to say "run for a max of Total time or max retries, whichever comes first"

Also, "overrides Total" is not exactly correct Override, to me, means "totally ignore" ... but what actually happens is, you run for a min number of retries or for Total time, whichever comes later.

@rogpeppe

rogpeppe Oct 10, 2016

Owner

I find it weird to have a minimum number of retries, and not a maximum.

This is a reasonable point. FWIW the Regular API is exactly the old AttemptStrategy API already used in dozens of places in Juju, so that we can easily mechanically replace the existing uses with this package. The comments are the same too (see commit afe327a7f7 in the goamz.v2 repo).

But it occurs to me that in practice you're almost never going to want to use Min and Total together. A quick grep of the source confirms that they are almost never used together. There is however one case in gopkg.in/amz.v3/s3/s3.go:

    defaultAttempts = aws.AttemptStrategy{
        Min:   5,
        Total: 5 * time.Second,
        Delay: 200 * time.Millisecond,
    }

and it seems to me that this is actually a reasonable use case - we'll keep trying for 5 seconds, but if individual attempts are taking a very long time (for example because of network delays) then we will be sure to make at least 5 attempts.

and doesn't give you the way to say "run for a max of Total time or max retries, whichever comes first"

You can already do that with LimitCount. For example:

retry.LimitCount(maxRetries, retry.Regular{
    Total: totalTime,
    Delay: time.Second,
})

So on balance I'm somewhat inclined to leave Regular.Min as is, because the behaviour it chooses can't be obtained any other way. Have you got any specific suggestions for rephrasing the comments?

@natefinch

natefinch Oct 11, 2016

Contributor

the Regular API is exactly the old AttemptStrategy API

But this is a brand new package, so we should take the opportunity to make things less confusing.

You can already do that with LimitCount. For example:

But that's a completely different mechanism than the minimum, which is why it's confusing. It's not consistent.... so, for example, when someone looks at the fields of AttemptStrategy, they see minimum, but no maximum and think there's no way to do a maximum. And if there's retry.LimitCount, why not retry.MinCount?

@rogpeppe

rogpeppe Oct 12, 2016

Owner

But that's a completely different mechanism than the minimum, which is why it's confusing.

Yes, it's a different mechanism because you can't do the min count thing by strategy composition (what timing would the extra events get?). We could use MinCount too, but I honestly don't think that name is confusing as it's in the context of the Regular struct type and documented.

I've updated the field documentation for Regular to use a more conventional comment style
and to mention LimitCount.

retry/retry.go
+
+// Strategy is implemented by types that represent a retry strategy.
+type Strategy interface {
+ NewTimer(now time.Time) Timer
@natefinch

natefinch Oct 7, 2016

Contributor

This type needs a much bigger explanation of how someone would implement a Strategy. I have no idea what NewTimer is supposed to return.

@rogpeppe

rogpeppe Oct 10, 2016

Owner

In general, people won't need to implement strategies themselves - the existing strategy types are intended to be sufficient for almost all purposes. We could even unexport the NewTimer method, but I think that's going a bit far. I have toyed with the idea of using "Tactics" instead of "Timer" for the interface type returned by the Strategy method, but I think that's probably too cute and not that helpful.

I've adjusted the comments to make things a little clearer, hopefully.

retry/retry.go
+ // Note that NextSleep is called once after
+ // each iteration has completed, assuming the
+ // retry loop is continuing.
+ NextSleep(now time.Time) (time.Duration, bool)
@natefinch

natefinch Oct 7, 2016

Contributor

When would you ever call this with anything other than time.Now()? Isn't this why we have a fake clock implementation?

@rogpeppe

rogpeppe Oct 10, 2016

Owner

When would you ever call this with anything other than time.Now()?

You wouldn't. But I think it's better to pass in a more restricted value when possible. I don't want to pass the fake clock implementation to NewTimer, because that would imply that it could sleep. The Timer interface is intended to be a non-blocking one that always returns its result immediately - the Attempt type is the one responsible for waiting.

retry/retry.go
+}
+
+// More reports whether the next call to Next will sleep again to wait
+// for another attempt. It does not itself block.
@natefinch

natefinch Oct 7, 2016

Contributor

This comment is really confusing. The firs thing it says is that it'll report if Next will sleep... .but that's an implementation detail. What it reports is whether or not the attempt will retry. I think that also indicates the name is not very good. I don't know what More() means in the course of a an Attempt. If it was called WillRetry, it would be very clear.

@rogpeppe

rogpeppe Oct 10, 2016

Owner

Unfortunately it doesn't necessarily report whether Next will retry, because if the attempt has been explicitly stopped then Next may return false even if More has returned true. That difference is why I renamed the method from HasNext (the semantics are otherwise identical) after some discussion with axw.

I've rewritten the comment to try and make it clearer.

retry: new package
The plan is that this should eventually replace utils.Attempt
and github.com/juju/try.

It provides functionality for stopping and clock mocking and for pluggable
strategies.

It is (arguably) quite a bit simpler to use than github.com/juju/try and
the usage remains basically compatible with utils.Attempt, so
only minor code changes will be needed to change the existing
code base to use it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment