Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ChallengeProviderTimeout type to acme package #148

Merged
merged 1 commit into from
Mar 17, 2016
Merged

Add ChallengeProviderTimeout type to acme package #148

merged 1 commit into from
Mar 17, 2016

Conversation

xi2
Copy link
Contributor

@xi2 xi2 commented Mar 8, 2016

There was what I perceived as a problem with my Gandi ChallengeProvider in #133. Because of Gandi's long DNS record propagation time (in my tests, up to 25 minutes), I had to alter the timeout the acme package used when checking for record propagation. However, this affected all the DNS providers, the remainder of which just require the existing 30 second timeout. Forcing them all to have a 40 minute timeout for the sake of Gandi seemed... wrong?

This PR adds a new ChallengeProviderTimeout type to the acme package which allows for implementing DNS providers that require an unsually long timeout when checking for record propagation. In fact, although it would only be necessary for DNS providers like Gandi at present, it need not be solely for DNS providers. There could be in the future some yet not-invented challenge type, for which the same problem occured.

The Gandi provider in #133 has been altered to use this new interface, so all the remaining DNS providers will now use the 30 second timeout again as normal.

Thoughts?

EDIT: This PR now sits on top of #144, due to new WaitFor semantics.

@beevik
Copy link
Contributor

beevik commented Mar 9, 2016

Out of curiosity, is there any reason this couldn't have been made a command-line option (like --dns-timeout)? Making the timeout configurable on the command line would be a more general-purpose solution. That way if one of the existing providers happens to be propagating more slowly than configured in the code, the user can address the issue without having to modify the code.

@xi2
Copy link
Contributor Author

xi2 commented Mar 10, 2016

A command line flag could be useful. It depends how many people (other than me and you) are having problems I suppose. If this gets accepted, I'd be happy to add a separate PR for one (I've just seen #149).

However a command line flag alone does not solve the use case of using the library without the CLI. Also, the way this PR works, the knowledge of what a good value for the timeout is is put into the provider itself (by it implementing the Timeout method). And each provider probably knows best what it's own timeout should be.

@xi2
Copy link
Contributor Author

xi2 commented Mar 10, 2016

An alternative way would be to return an extra value via the Present method, altering its signature. But that forces all providers to think about a timeout, when it may not be relevant for them (eg http-01) or they just need the default value (most of the DNS providers).

@xenolf
Copy link
Member

xenolf commented Mar 11, 2016

Wouldn't it make more sense to just push the timeout into the provider entirely and providing a default via a const?

@xi2
Copy link
Contributor Author

xi2 commented Mar 11, 2016

Hi xenolf. Forgive me, but I'm not 100% certain I understand your idea, so I'll state what I think you mean.

At the moment the value of the timeout is needed by the waitFor in dnsChallenge.Solve, which with this PR it finds by calling the provider's Timeout method.

When you say push the timeout into the provider, do you mean that the Present method of each provider must call waitFor itself? i.e. each provider's Present must contain the following

    waitFor(timeout, 30, func() (bool, error) {
        return preCheckDNS(fqdn, value)
    })

where a default timeout value can be found from a constant?

If you meant something else, I apologise, but could you elaborate? If you did mean that then of course that would work, but I passed on this strategy initially since it meant modifying the providers. However, if you think that's better then no problems. We can do that :) In some ways I do think it is better not to introduce extra exported API if we can help it.

@janeczku
Copy link
Contributor

Do we still need this to deal with higher than average propagation latencies for specific providers now that acme.WaitFor is exported? The Gandi provider can just invoke the wait routine within its DNSProvider.Present() implementation before returning, right?

@xi2
Copy link
Contributor Author

xi2 commented Mar 11, 2016

Hi @janeczku. I am happy to abandon this PR and just call acme.WaitFor from Present as you suggest. However in order to do so, as you can see from the code snippit above, I'll need preCheckDNS to be exported (or checkDNSPropagation), since I don't want to duplicate all that DNS code in the provider. Presently in #144, only WaitFor is exported which isn't quite enough. If we export preCheckDNS (and probably it's type preCheckDNSFunc too) I'm good as far as I can see. Is this the way forward @xenolf?

@janeczku
Copy link
Contributor

However in order to do so, as you can see from the code snippit above, I'll need preCheckDNS to be exported

Right... totally missed that!

@xenolf
Copy link
Member

xenolf commented Mar 12, 2016

I'm not sure if we should push the functionality of calling preCheckDNS down into the providers... I'd rather leave that in the library code to ensure it get's called.

I like the idea more of just giving the providers the ability to customize the timer.

})
switch provider := s.provider.(type) {
case ChallengeProviderTimeout:
err = waitFor(int(provider.Timeout().Seconds()), 30, func() (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err = waitFor(int(provider.Timeout().Seconds()), 30, func()
I don't think we really want a 30 seconds interval. 😄

@xi2
Copy link
Contributor Author

xi2 commented Mar 12, 2016

@xenolf, so will this patch suffice then (once it's fixed for #144)? Or is there a better way?

Hi @janeczku. I'm not sure I understand your point about the 30 second interval. Bear in mind that you get the default 2 second interval/30 second timout unless you specify the optional Timeout() method to signal an extended timeout. Only then do you get the 30 second interval. Given that people are going to be specifying timeouts in the range of around 2 minutes -> 1 hour, I thought that 30 seconds was reasonable. What would you suggest as a better value?

@xi2
Copy link
Contributor Author

xi2 commented Mar 12, 2016

Given that the semantics of WaitFor have changed in #144, I have just rebased this PR to sit on top of #144.

@janeczku
Copy link
Contributor

I'm not sure I understand your point about the 30 second interval. Bear in mind that you get the default 2 second interval/30 second timout unless you specify the optional Timeout() method to signal an extended timeout. Only then do you get the 30 second interval.

This exactly.

Given that people are going to be specifying timeouts in the range of around 2 minutes -> 1 hour, I thought that 30 seconds was reasonable. What would you suggest as a better value?

1 hour? Good lord, i must have overlooked the PR for the mail order DNS provider... 😆

But seriously now: A 30 seconds interval will significantly slow down things for those providers that are in the more typical range of propagation delay between maybe 2 to 120 seconds. For example, CloudFlare usually propagates within 2-3 seconds but can spike up to 60-80 seconds. Hence i would specify the ChallengeProviderTimeout for that provider as 90 seconds.
With the proposed 30 seconds interval a certificate requests that would now complete in maybe ~6 seconds with the default 2 seconds interval would suddenly take ~6+30 seconds.

Now, seeing that we are adding an additional interface anyway, we might as well make it so:

type ChallengeProviderTimeout interface {
    ChallengeProvider
    Timeout() time.Duration
    Interval() time.Duration
}

Otherwise i strongly suggest to set the interval to not higher than 5 seconds.
What do you think?

@xi2
Copy link
Contributor Author

xi2 commented Mar 12, 2016

@janeczku, the way this PR was intended to be used, I was not expecting any of the existing providers currently in master to specify Timeout (just as they don't at present). It is purely for Gandi, Namecheap and other future providers who are, as you say, "mail order DNS" ;) Of course a 30 second interval will slow down Cloudflare. Hence Cloudflare should not (just as now) specify Timeout method, and so gets the fast timeout=60/interval=2. That is the default case in the type switch.

30 second interval is appropriate, however, for when timeout is in the range 60 seconds -> 1 hour though? No?

Are you somehow wanting to give a Timeout method for Cloudflare? If so, why? I thought it worked just fine with the current default of timeout=60/interval=2?

If cloudflare or other can spike to 80 seconds, I suggest the default timeout is changed to 80 seconds. This PR was only intended for the real slow coaches, for which the default timeout was inappropriate. I don't think I made myself too clear about its intended use though, for which I apologise. I'm still learning this github stuff ;)

@xi2
Copy link
Contributor Author

xi2 commented Mar 12, 2016

If you think that all providers should specify Timeout then we can just do away with ChallengeProviderTimeout altogether and move Timeout into ChallengeProvider. But then you have the case that HTTP and TLS ChallengeProviders have to define Timeout methods when timeout has no real meaning for them since everything is done by lego.

@janeczku
Copy link
Contributor

If cloudflare or other can spike to 80 seconds, I suggest the default timeout is changed to 80 seconds.

What if another provider needs 90 seconds? There is no one sane default wait time that fits all.
That's why the custom timeout interface should not be pre-tuned to fit a specific set of providers (like those that propagate within 2 mins to 1 hour per your example). It should be general enough to yield acceptable results regardless of which provider implements it.
If thats not the case than it does not belong in the main library, but in the provider package.

@xi2
Copy link
Contributor Author

xi2 commented Mar 12, 2016

Ok. I understand your point I think. You object to a hard line where interval suddenly changes from 2 seconds to 30 seconds, since there will always be some value for timeout you could choose (e.g. 90 seconds when the default is 80) where interval will then be inappropriate. If I've understood you right, I accept that as a valid criticism.

In that case how about this? We change WaitFor to do away with the interval parameter altogether so its signature is then just this:

WaitFor(timeout time.Duration, f func() (bool, error)) error

Internally we then make it set interval itself based on timeout. How about something like

interval := timeout / 60

where timeout has a minimum value of 60 seconds. What do you think?

@xi2
Copy link
Contributor Author

xi2 commented Mar 12, 2016

If that's too simplistic, we could make WaitFor do a more complicated backoff, but the simple solution above would address your points?

@xi2
Copy link
Contributor Author

xi2 commented Mar 12, 2016

I felt like doing a bit of math, so here's a more complicated idea for automatic setting of interval based on timeout. Set successive values of interval in each loop iteration in WaitFor based on the terms of the geometric progression 2, 2r, 2r^2, 2r^3, ..... for a value of r=1.016111. Why this value? It was so that after 1 hour the interval is approximately 60 seconds. You then have the following nice properties:

Total time (seconds) check # next interval (seconds)
0 1 2
60.97 26 2.98
300.85 78 6.85
3611.57 214 60.19

So you can see that for a 60 second timeout, interval stays below 3 seconds, for 5 minutes it's under 7 seconds, but for a timeout of 1 hour it raises to (at the end) 60 seconds, but requires only 214 checks, which I don't think is unreasonable for 1 hour.

Anyone like this idea?

@xi2
Copy link
Contributor Author

xi2 commented Mar 12, 2016

I have implemented the geometric progression idea to have a look at. I can always discard it if people don't like it. It's an additional commit on this PR.

To verify it actually does what I say it does, run the following program which measures the elapsed time and interval time. It will print the lines of the table above (including the missing lines :) )

func main() {
    initialTime := time.Now()
    var lastTime time.Time
    var lastSince time.Duration
    n := 1
    acme.WaitFor(3700*time.Second, 2*time.Second, func() (bool, error) {
        if n-1 > 0 {
            fmt.Println(lastSince.Seconds(), n-1, time.Now().Sub(lastTime))
        }
        lastTime = time.Now()
        lastSince = time.Since(initialTime)
        n++
        return false, nil
    })
}

Note to @janeczku: Please see that following this commit all calls to WaitFor have an initialInterval of 2 seconds. No more 30 seconds! :) :)

@janeczku
Copy link
Contributor

@xi2 I am all in changing the wait function to using an incremental sleep based on retry count.

I would re-evaluate the way you implement this though. You don't need the math package to get an exponential increment. Hint: sleep := 2000 + (10 << uint(retryCount))
But better yet, just use a plain simple linear function from 1 to 30 seconds.

@xi2
Copy link
Contributor Author

xi2 commented Mar 13, 2016

@janeczku. Great that you agree in principle to changing waitFor in this way.

I have modified the commit so that no sum of the progression is made, and there is no use of the math package any more. Just a simple

interval *= ratio

each loop interation. So the numbers in the table above remain the same (assuming negligible time in the poll function).

If however we were to use a linear function as you suggest, I don't think the numbers work quite as nicely for lower timeout values? e.g. if we used sleep := 2 + d*retryCount and we choose a value of d so that after one hour the sleep is 60 seconds (say d=0.495453) we would get the following

Total time (seconds) check # next interval (seconds)
0 1 2
64.65 14 8.44
309.74 33 17.85
3656.11 119 60.54

So the the interval goes up too quickly for lower timeout users? What do you think? Are things better now there is no use of math package?

@janeczku
Copy link
Contributor

@xi2 Here is an example of calculating exponential delay using bitwise shift operator.
This will result in small interval times within the first 30 seconds and a maximum interval of 35 seconds after 1.5 minutes.
http://play.golang.org/p/EYHKgi0F7i

@xi2
Copy link
Contributor Author

xi2 commented Mar 14, 2016

@janeczku. Thank you for your example on play.golang.org. I ran the program and observed what it did. While I agree fully with your previous point about removing unnecessary use of the math package, which I did of course, I don't see the benefit of increasing the interval via a bit shift operator vs using the existing approach of a float multiply. The code is not performance sensitive, and I think the multiply code is slightly more readable.

However, I really want this PR to get to whatever state @xenolf wants it to be in, so it can be merged. If @xenolf wants to use the exact algorithm from http://play.golang.org/p/EYHKgi0F7i I will change it to that without delay. If he wants something else, I will change it to that also. Basically, whatever he wants I will do! :) Thanks again for your many good suggestions @janeczku.

What shall it be @xenolf?

@xenolf
Copy link
Member

xenolf commented Mar 15, 2016

I really do not have a strong opinion on this. I think it's fine as is but I'd introduce a maximum interval so it doesn't grow indefinitely.

@xi2
Copy link
Contributor Author

xi2 commented Mar 15, 2016

Maximum interval set to max(initialInterval,60*time.Second). This caps the increasing interval at 60 seconds, but allows for intentionally setting a large interval such as 5 minutes.

EDIT: sorry there was a bug in this - now fixed

@xi2
Copy link
Contributor Author

xi2 commented Mar 15, 2016

It would seem to me that the ChallengeProviderTimeout definition really belongs in provider.go, alongside that of ChallengeProvider, so I have moved it there.

@beevik
Copy link
Contributor

beevik commented Mar 15, 2016

I like the addition of exponential backoff in the WaitFor function, but it feels like the growth rate is too slow. With namecheap, where it can take as long as 30 minutes for a change to propagate, the lego client ends up spamming DNS requests every 3 to 5 seconds for the first 5 minutes or so. A simple doubling algorithm, as follows, would be sufficient:

func WaitFor(timeout, interval time.Duration, f func() (bool, error)) error {
    var lastErr string
    maxInterval := 60 * time.Second
    timeup := time.After(timeout)

    for {
        select {
        case <-timeup:
            return fmt.Errorf("Time limit exceeded. Last error: %s", lastErr)
        default:
        }

        stop, err := f()
        if stop {
            return nil
        }
        if err != nil {
            lastErr = err.Error()
        }

        if interval > maxInterval {
            interval = maxInterval
        }

        time.Sleep(interval)
        interval = interval * 2
    }
}

For DNS providers that provide rapid updates, the algorithm will easily get you a response within 30 seconds (and 4 DNS requests). Whereas with a slow provider, you'll quickly be requesting once per minute, which is where you'd want to be anyway.

@xi2
Copy link
Contributor Author

xi2 commented Mar 16, 2016

Thanks for your feedback, @beevik. I notice that the code you have posted is more or less exactly what is in the PR, but with a higher multiplier each loop iteration. So it seems that the only issue is the correct and appropriate multiplier to use.

With a multiplier of 2.0 as you suggest, the interval does raise very quickly. By my reckoning it hits 30 seconds after total waiting time of just 30 seconds. And this being too long was @janeczku's original complaint that motivated changing WaitFor in the first place.

Since @janeczku also had a higher multiplier in his suggested code, and you have suggested the current multiplier is spamming, I have raised it. It now hits 60 seconds after 20 minutes (instead of 60 minutes) and stays there. I hope that is a more appropriate value and you no longer feel it is spamming. The revised numbers are:

Total time (seconds) Poll# Next interval (seconds)
0 1 2
64.75 21 5.11
301.93 46 16.49
1235.25 74 60.00
>1200 >74 60.00

Ultimately though I think there is going to be no value that pleases everyone. This has proved far more contentious than I had envisaged! So it may be down to @xenolf to make an executive decision. I personally do not care at all what the multiplier is (as long is it isn't spamming), since Gandi takes 20 minutes anyway. And I have it automated and running in what amounts to a cron job, so I'm never actually waiting for it either. I just don't want it to timeout! :)

@xenolf
Copy link
Member

xenolf commented Mar 16, 2016

The only solution which would maybe cater to "everyone" is probably to allow customisation of the initial interval. This way we can adjust the interval to a high number for providers we know take longer to propagate.

@xi2
Copy link
Contributor Author

xi2 commented Mar 16, 2016

Right you are @xenolf . This was actually @janeczku's suggestion way back in this thread:

type ChallengeProviderTimeout interface {
    ChallengeProvider
    Timeout() time.Duration
    Interval() time.Duration
}

which I didn't action at the time, but maybe is the best idea (sorry @janeczku!). Before I change things again, is there any need for interval growth at all if we are all specifying the interval in the provider? e.g. I could just specify Gandi as timeout=40*time.Minute/interval=60*time.Second and be done with it. Interval growth would seem an unnecessary complication?

EDIT: It may be better to keep things to one method. i.e.

type ChallengeProviderTimeout interface {
    ChallengeProvider
    Timeout() (timeout, interval time.Duration)
}

This would prevent the programming error of specifying Timeout(), but not Interval(), and then having the type switch fail unexpectedly at runtime.

@xi2
Copy link
Contributor Author

xi2 commented Mar 16, 2016

I have gone ahead and implemented this. While I was at it I have squashed the commits, since they had gone round in circles trying things, and arrived almost back at the beginning.

As I said, I have removed the interval growth, since it now seems unnecessary, but if I have overlooked a reason to keep it then please say and it can come back from the dead :)

// waiting for an ACME challenge to be satisfied. It can be used to
// implement a DNS ChallengeProvider which requires a long timeout
// when checking for DNS record progagation. If an implementor of a
// ChallengeProvider also provides a Timeout method, then the return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the first two sentences here should be merged as they are basically saying the same thing.

@xenolf
Copy link
Member

xenolf commented Mar 16, 2016

Looking good in my opinion. Maybe we could mention the default values for the timeout / interval on the type to get it into godoc.

@xi2
Copy link
Contributor Author

xi2 commented Mar 16, 2016

Thanks @xenolf! I've made the comment changes you suggest. Please read over the comments again.

// of the Timeout method will be used when appropriate by the acme
// package. The interval value is the time between checks.
//
// The default values used by acme for timeout and interval are 60
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use "acme" here as this timeout does not have anything to do with ACME to avoid confusion. You could just write "The default values for timeout and interval are ...".

@xi2
Copy link
Contributor Author

xi2 commented Mar 16, 2016

I meant "package acme", but you are right that it is confusing the way it is written. I have now used your exact suggested wording. Thanks.

This type allows for implementing DNS ChallengeProviders that require
an unsually long timeout when checking for record propagation.
xenolf added a commit that referenced this pull request Mar 17, 2016
Add ChallengeProviderTimeout type to acme package
@xenolf xenolf merged commit faed8af into go-acme:master Mar 17, 2016
@ldez ldez added this to the v0.3 milestone Dec 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants