-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ChallengeProviderTimeout type to acme package #148
Conversation
Out of curiosity, is there any reason this couldn't have been made a command-line option (like --dns-timeout)? Making the timeout configurable on the command line would be a more general-purpose solution. That way if one of the existing providers happens to be propagating more slowly than configured in the code, the user can address the issue without having to modify the code. |
A command line flag could be useful. It depends how many people (other than me and you) are having problems I suppose. If this gets accepted, I'd be happy to add a separate PR for one (I've just seen #149). However a command line flag alone does not solve the use case of using the library without the CLI. Also, the way this PR works, the knowledge of what a good value for the timeout is is put into the provider itself (by it implementing the |
An alternative way would be to return an extra value via the |
Wouldn't it make more sense to just push the timeout into the provider entirely and providing a default via a |
Hi xenolf. Forgive me, but I'm not 100% certain I understand your idea, so I'll state what I think you mean. At the moment the value of the timeout is needed by the When you say push the timeout into the provider, do you mean that the waitFor(timeout, 30, func() (bool, error) {
return preCheckDNS(fqdn, value)
}) where a default If you meant something else, I apologise, but could you elaborate? If you did mean that then of course that would work, but I passed on this strategy initially since it meant modifying the providers. However, if you think that's better then no problems. We can do that :) In some ways I do think it is better not to introduce extra exported API if we can help it. |
Do we still need this to deal with higher than average propagation latencies for specific providers now that |
Hi @janeczku. I am happy to abandon this PR and just call |
Right... totally missed that! |
I'm not sure if we should push the functionality of calling I like the idea more of just giving the providers the ability to customize the timer. |
}) | ||
switch provider := s.provider.(type) { | ||
case ChallengeProviderTimeout: | ||
err = waitFor(int(provider.Timeout().Seconds()), 30, func() (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err = waitFor(int(provider.Timeout().Seconds()), 30, func()
I don't think we really want a 30 seconds interval. 😄
@xenolf, so will this patch suffice then (once it's fixed for #144)? Or is there a better way? Hi @janeczku. I'm not sure I understand your point about the 30 second interval. Bear in mind that you get the default 2 second interval/30 second timout unless you specify the optional |
This exactly.
1 hour? Good lord, i must have overlooked the PR for the mail order DNS provider... 😆 But seriously now: A 30 seconds interval will significantly slow down things for those providers that are in the more typical range of propagation delay between maybe 2 to 120 seconds. For example, CloudFlare usually propagates within 2-3 seconds but can spike up to 60-80 seconds. Hence i would specify the ChallengeProviderTimeout for that provider as 90 seconds. Now, seeing that we are adding an additional interface anyway, we might as well make it so: type ChallengeProviderTimeout interface {
ChallengeProvider
Timeout() time.Duration
Interval() time.Duration
} Otherwise i strongly suggest to set the interval to not higher than 5 seconds. |
@janeczku, the way this PR was intended to be used, I was not expecting any of the existing providers currently in master to specify 30 second interval is appropriate, however, for when timeout is in the range 60 seconds -> 1 hour though? No? Are you somehow wanting to give a If cloudflare or other can spike to 80 seconds, I suggest the default timeout is changed to 80 seconds. This PR was only intended for the real slow coaches, for which the default timeout was inappropriate. I don't think I made myself too clear about its intended use though, for which I apologise. I'm still learning this github stuff ;) |
If you think that all providers should specify |
What if another provider needs 90 seconds? There is no one sane default wait time that fits all. |
Ok. I understand your point I think. You object to a hard line where interval suddenly changes from 2 seconds to 30 seconds, since there will always be some value for timeout you could choose (e.g. 90 seconds when the default is 80) where interval will then be inappropriate. If I've understood you right, I accept that as a valid criticism. In that case how about this? We change WaitFor(timeout time.Duration, f func() (bool, error)) error Internally we then make it set interval itself based on timeout. How about something like interval := timeout / 60 where |
If that's too simplistic, we could make |
I felt like doing a bit of math, so here's a more complicated idea for automatic setting of interval based on timeout. Set successive values of interval in each loop iteration in
So you can see that for a 60 second timeout, interval stays below 3 seconds, for 5 minutes it's under 7 seconds, but for a timeout of 1 hour it raises to (at the end) 60 seconds, but requires only 214 checks, which I don't think is unreasonable for 1 hour. Anyone like this idea? |
I have implemented the geometric progression idea to have a look at. I can always discard it if people don't like it. It's an additional commit on this PR. To verify it actually does what I say it does, run the following program which measures the elapsed time and interval time. It will print the lines of the table above (including the missing lines :) ) func main() {
initialTime := time.Now()
var lastTime time.Time
var lastSince time.Duration
n := 1
acme.WaitFor(3700*time.Second, 2*time.Second, func() (bool, error) {
if n-1 > 0 {
fmt.Println(lastSince.Seconds(), n-1, time.Now().Sub(lastTime))
}
lastTime = time.Now()
lastSince = time.Since(initialTime)
n++
return false, nil
})
} Note to @janeczku: Please see that following this commit all calls to |
@xi2 I am all in changing the wait function to using an incremental sleep based on retry count. I would re-evaluate the way you implement this though. You don't need the math package to get an exponential increment. Hint: |
@janeczku. Great that you agree in principle to changing I have modified the commit so that no sum of the progression is made, and there is no use of the interval *= ratio each loop interation. So the numbers in the table above remain the same (assuming negligible time in the poll function). If however we were to use a linear function as you suggest, I don't think the numbers work quite as nicely for lower timeout values? e.g. if we used
So the the interval goes up too quickly for lower timeout users? What do you think? Are things better now there is no use of |
@xi2 Here is an example of calculating exponential delay using bitwise shift operator. |
@janeczku. Thank you for your example on play.golang.org. I ran the program and observed what it did. While I agree fully with your previous point about removing unnecessary use of the However, I really want this PR to get to whatever state @xenolf wants it to be in, so it can be merged. If @xenolf wants to use the exact algorithm from http://play.golang.org/p/EYHKgi0F7i I will change it to that without delay. If he wants something else, I will change it to that also. Basically, whatever he wants I will do! :) Thanks again for your many good suggestions @janeczku. What shall it be @xenolf? |
I really do not have a strong opinion on this. I think it's fine as is but I'd introduce a maximum interval so it doesn't grow indefinitely. |
Maximum interval set to EDIT: sorry there was a bug in this - now fixed |
It would seem to me that the |
I like the addition of exponential backoff in the WaitFor function, but it feels like the growth rate is too slow. With namecheap, where it can take as long as 30 minutes for a change to propagate, the lego client ends up spamming DNS requests every 3 to 5 seconds for the first 5 minutes or so. A simple doubling algorithm, as follows, would be sufficient: func WaitFor(timeout, interval time.Duration, f func() (bool, error)) error {
var lastErr string
maxInterval := 60 * time.Second
timeup := time.After(timeout)
for {
select {
case <-timeup:
return fmt.Errorf("Time limit exceeded. Last error: %s", lastErr)
default:
}
stop, err := f()
if stop {
return nil
}
if err != nil {
lastErr = err.Error()
}
if interval > maxInterval {
interval = maxInterval
}
time.Sleep(interval)
interval = interval * 2
}
} For DNS providers that provide rapid updates, the algorithm will easily get you a response within 30 seconds (and 4 DNS requests). Whereas with a slow provider, you'll quickly be requesting once per minute, which is where you'd want to be anyway. |
Thanks for your feedback, @beevik. I notice that the code you have posted is more or less exactly what is in the PR, but with a higher multiplier each loop iteration. So it seems that the only issue is the correct and appropriate multiplier to use. With a multiplier of 2.0 as you suggest, the interval does raise very quickly. By my reckoning it hits 30 seconds after total waiting time of just 30 seconds. And this being too long was @janeczku's original complaint that motivated changing Since @janeczku also had a higher multiplier in his suggested code, and you have suggested the current multiplier is spamming, I have raised it. It now hits 60 seconds after 20 minutes (instead of 60 minutes) and stays there. I hope that is a more appropriate value and you no longer feel it is spamming. The revised numbers are:
Ultimately though I think there is going to be no value that pleases everyone. This has proved far more contentious than I had envisaged! So it may be down to @xenolf to make an executive decision. I personally do not care at all what the multiplier is (as long is it isn't spamming), since Gandi takes 20 minutes anyway. And I have it automated and running in what amounts to a cron job, so I'm never actually waiting for it either. I just don't want it to timeout! :) |
The only solution which would maybe cater to "everyone" is probably to allow customisation of the initial interval. This way we can adjust the interval to a high number for providers we know take longer to propagate. |
Right you are @xenolf . This was actually @janeczku's suggestion way back in this thread: type ChallengeProviderTimeout interface {
ChallengeProvider
Timeout() time.Duration
Interval() time.Duration
} which I didn't action at the time, but maybe is the best idea (sorry @janeczku!). Before I change things again, is there any need for interval growth at all if we are all specifying the interval in the provider? e.g. I could just specify Gandi as EDIT: It may be better to keep things to one method. i.e. type ChallengeProviderTimeout interface {
ChallengeProvider
Timeout() (timeout, interval time.Duration)
} This would prevent the programming error of specifying |
I have gone ahead and implemented this. While I was at it I have squashed the commits, since they had gone round in circles trying things, and arrived almost back at the beginning. As I said, I have removed the interval growth, since it now seems unnecessary, but if I have overlooked a reason to keep it then please say and it can come back from the dead :) |
// waiting for an ACME challenge to be satisfied. It can be used to | ||
// implement a DNS ChallengeProvider which requires a long timeout | ||
// when checking for DNS record progagation. If an implementor of a | ||
// ChallengeProvider also provides a Timeout method, then the return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the first two sentences here should be merged as they are basically saying the same thing.
Looking good in my opinion. Maybe we could mention the default values for the timeout / interval on the type to get it into godoc. |
Thanks @xenolf! I've made the comment changes you suggest. Please read over the comments again. |
// of the Timeout method will be used when appropriate by the acme | ||
// package. The interval value is the time between checks. | ||
// | ||
// The default values used by acme for timeout and interval are 60 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't use "acme" here as this timeout does not have anything to do with ACME to avoid confusion. You could just write "The default values for timeout and interval are ...".
I meant "package acme", but you are right that it is confusing the way it is written. I have now used your exact suggested wording. Thanks. |
This type allows for implementing DNS ChallengeProviders that require an unsually long timeout when checking for record propagation.
Add ChallengeProviderTimeout type to acme package
There was what I perceived as a problem with my Gandi
ChallengeProvider
in #133. Because of Gandi's long DNS record propagation time (in my tests, up to 25 minutes), I had to alter the timeout theacme
package used when checking for record propagation. However, this affected all the DNS providers, the remainder of which just require the existing 30 second timeout. Forcing them all to have a 40 minute timeout for the sake of Gandi seemed... wrong?This PR adds a new
ChallengeProviderTimeout
type to the acme package which allows for implementing DNS providers that require an unsually long timeout when checking for record propagation. In fact, although it would only be necessary for DNS providers like Gandi at present, it need not be solely for DNS providers. There could be in the future some yet not-invented challenge type, for which the same problem occured.The Gandi provider in #133 has been altered to use this new interface, so all the remaining DNS providers will now use the 30 second timeout again as normal.
Thoughts?
EDIT: This PR now sits on top of #144, due to new
WaitFor
semantics.