Cloudflare propagation times out continuously #167

Closed
JorritSalverda opened this issue Mar 23, 2016 · 23 comments

Comments

@JorritSalverda
Contributor

Most days it works fine, but today the configured 30-second timeout isn't sufficient. I get the following error over and over:

Time limit exceeded. Last error: NS nora.ns.cloudflare.com. did not return the expected TXT record

Either bumping it up to a much higher value (it does multiple checks within that timeout, right?) or making it configurable via a command-line parameter would help cater for these occasional high propagation times.

@xenolf
Member

xenolf commented Mar 23, 2016

Hello there! Thanks for reporting this. I expected that we'd have to tweak these numbers a bit. Do you have an idea as to how much higher we should set it?

it does multiple checks within that timeout, right?

Yes, it does.
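
For readers wondering what "multiple checks within that timeout" looks like, here is a minimal sketch of a poll-until-deadline loop. It is not lego's actual code: `waitFor`, its parameters, and the fake check in `main` are hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitFor is a hypothetical helper: it calls check repeatedly, sleeping
// interval between attempts, until check reports success or timeout elapses.
// The last error seen is included in the timeout error.
func waitFor(timeout, interval time.Duration, check func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	var lastErr error
	for {
		done, err := check()
		if done {
			return nil
		}
		if err != nil {
			lastErr = err
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("time limit exceeded, last error: %v", lastErr)
		}
		time.Sleep(interval)
	}
}

func main() {
	attempts := 0
	// Stand-in for a real DNS lookup: the record "appears" on the third check.
	err := waitFor(200*time.Millisecond, 10*time.Millisecond, func() (bool, error) {
		attempts++
		if attempts < 3 {
			return false, errors.New("NS did not return the expected TXT record")
		}
		return true, nil
	})
	fmt.Println("attempts:", attempts, "error:", err)
}
```

With a loop like this, raising the timeout only costs extra time when propagation is actually slow, since a successful check returns immediately.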

@JorritSalverda
Contributor Author

Not sure, but in the timeout PR - #148 - @janeczku mentions that it can go all the way up to 80 seconds:

For example, CloudFlare usually propagates within 2-3 seconds but can spike up to 60-80 seconds. Hence i would specify the ChallengeProviderTimeout for that provider as 90 seconds.

@xenolf
Member

xenolf commented Mar 23, 2016

Closed via #168

@xenolf xenolf closed this as completed Mar 23, 2016
@JorritSalverda
Contributor Author

I actually changed the wrong timeout. The 30 seconds I updated wasn't the propagation timeout, but the timeout for an individual HTTP call.

The fix should have been either to raise the global 60-second timeout (see 8aa797f from PR #148) or to add a Timeout func to the Cloudflare provider, as is done in https://github.com/xenolf/lego/blob/master/providers/dns/namecheap/namecheap.go (line 79 at this moment).

What would be your preferred approach? Revert #168 and then add the timeout function in a new PR?

@JorritSalverda
Contributor Author

I'm checking with Cloudflare whether this is abnormal and something to fix on their end, or whether there's a higher maximum propagation time they're committed to delivering, which can then be put into the lego client. I'll post it here once I know more.

xenolf added a commit that referenced this issue Mar 23, 2016
@xenolf
Member

xenolf commented Mar 23, 2016

I actually changed the wrong timeout.

And I didn't properly look at it. 👎 I reverted the change and pushed the proper fix with a Timeout function.
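
For anyone following along, the "Timeout function" approach can be sketched roughly as below. This is an illustration under assumptions, not the code that was pushed: the interface and type names are hypothetical, the 60 and 120 second values are the ones mentioned in this thread, and the 2 second polling interval is invented for the example.

```go
package main

import (
	"fmt"
	"time"
)

// Provider is a stand-in for a DNS challenge provider.
type Provider interface {
	Present(domain, token, keyAuth string) error
	CleanUp(domain, token, keyAuth string) error
}

// TimeoutProvider is an optional interface (hypothetical name). A provider
// implements it when it needs a propagation timeout other than the default.
type TimeoutProvider interface {
	Timeout() (timeout, interval time.Duration)
}

const (
	defaultTimeout  = 60 * time.Second // global default mentioned in the thread
	defaultInterval = 2 * time.Second  // assumed value, for illustration only
)

// cloudflareProvider overrides the default because propagation can spike.
type cloudflareProvider struct{}

func (cloudflareProvider) Present(domain, token, keyAuth string) error { return nil }
func (cloudflareProvider) CleanUp(domain, token, keyAuth string) error { return nil }
func (cloudflareProvider) Timeout() (timeout, interval time.Duration) {
	return 120 * time.Second, defaultInterval
}

// basicProvider does not implement Timeout and so gets the defaults.
type basicProvider struct{}

func (basicProvider) Present(domain, token, keyAuth string) error { return nil }
func (basicProvider) CleanUp(domain, token, keyAuth string) error { return nil }

// timeoutFor returns the provider's own timeout when it has one,
// otherwise the global default.
func timeoutFor(p Provider) (timeout, interval time.Duration) {
	if tp, ok := p.(TimeoutProvider); ok {
		return tp.Timeout()
	}
	return defaultTimeout, defaultInterval
}

func main() {
	cf, _ := timeoutFor(cloudflareProvider{})
	basic, _ := timeoutFor(basicProvider{})
	fmt.Println("cloudflare:", cf, "basic:", basic)
}
```

The type assertion keeps the override opt-in, so providers that propagate quickly need no changes.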

@xenolf
Member

xenolf commented Mar 23, 2016

I'm checking with Cloudflare whether this is abnormal and something to fix on their end, or whether there's a higher maximum propagation time they're committed to delivering, which can then be put into the lego client. I'll post it here once I know more.

Thank you! 😃

@janeczku
Contributor

Thanks guys! 😄 👍

@JorritSalverda
Contributor Author

Thanks. I'll test this latest version, but 120 seconds might not be enough. According to Cloudflare it ranges up to 10 minutes at this moment, something they're looking into and trying to get fixed. Not sure if you have to code your way around issues on their side though.

A command line configurable timeout might still make sense though, so the user of lego can decide for themselves what they find acceptable. Defaulting to what's currently set for individual providers of course, so people aren't surprised that the slower ones are so slow.
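
A rough sketch of what I mean, using only the standard `flag` package. The flag name `dns-timeout` and both helper functions are made up for illustration and are not existing lego options.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// parseTimeout reads a hypothetical -dns-timeout flag from args.
// A value of 0 means "no override given".
func parseTimeout(args []string) (time.Duration, error) {
	fs := flag.NewFlagSet("lego-sketch", flag.ContinueOnError)
	override := fs.Duration("dns-timeout", 0, "propagation timeout override (0 = provider default)")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *override, nil
}

// effectiveTimeout prefers the user's override and falls back to the
// provider's own default when no override was given.
func effectiveTimeout(override, providerDefault time.Duration) time.Duration {
	if override > 0 {
		return override
	}
	return providerDefault
}

func main() {
	// Simulated command line; a real client would pass os.Args[1:].
	override, err := parseTimeout([]string{"-dns-timeout=10m"})
	if err != nil {
		fmt.Println("bad flags:", err)
		return
	}
	fmt.Println("using timeout:", effectiveTimeout(override, 120*time.Second))
}
```

Leaving the flag at 0 keeps today's per-provider behaviour, so nobody is surprised unless they opt in.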

@xenolf
Member

xenolf commented Mar 24, 2016

Thanks for this information @JorritSalverda. Let me know how this latest version works out for you. I'm currently thinking of a way to customize the values of Timeout functions from the outside in a sane way.

@janeczku
Contributor

@JorritSalverda
Contributor Author

Yes, all is working fine again. Ran lego tons of times successfully today.

@lenovouser
Contributor

@xenolf I had this problem a few days ago too, before you fixed it, which is why I had to force-quit lego: it would have taken ages to finish (50+ certificates). Now I get the error

2016/03/29 18:22:17 [domain.tld] Could not obtain certificates
        acme: Error 429 - urn:acme:error:rateLimited - Error creating new authz :: Too many currently pending authorizations.

which is described here too:

Is there any way I can fix this by e.g. deleting or re-using the pending authorizations?

@xenolf
Member

xenolf commented Mar 29, 2016

@lenovouser Authz deletion was only added recently to the ACME spec (ietf-wg-acme/acme#98). I'm not sure if boulder already implements it. What I'm unsure about is how you ended up with that many pending authz. Did you abort the client multiple times?

@lenovouser
Contributor

@xenolf two times, yes. First time because I waited like 30 minutes and thought "maybe something is wrong with the internet connection lego is getting". I quit it, restarted my root and tried a 2nd time. Then I realized it has to be something deeper in lego and found this issue. This was like 4 days ago and I thought maybe the pending authorizations would go away at some point but they don't seem to.

I have also asked the same question in the Let's Encrypt community.

@janeczku
Contributor

Two times? How many domains did you pass? 150?

@xenolf
Member

xenolf commented Mar 29, 2016

@lenovouser I'm not sure your problem is related to the one discussed in this issue. This issue caused lego to bail before the DNS record was propagated because of an insufficient timeout value. Your issue sounds like lego hung entirely and was not doing anything at all which should not happen as the timeouts should expire.

@lenovouser
Contributor

@xenolf I am not sure if lego hung entirely or if it just took longer and longer to set the DNS records. At some point it didn't do anything for about 14 minutes, which is why I force-quit it. I just counted and it is exactly 80 subdomains spread over 3 domains, all on CloudFlare. And yes, this doesn't really have to do with the issue here, but I thought it would be the best place to post as this was the reason I had to kill lego. I can create a new issue though if you want.

@xenolf
Member

xenolf commented Mar 29, 2016

@lenovouser The reason I think it's not related to this issue is that lego should have bailed after 60 seconds of not being able to determine the DNS record (120 seconds in latest master) and after 30 seconds in case of an HTTP hang while talking to CF. There should be no way for it to hang for 14 minutes.
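
To make the two separate limits concrete, here is a small sketch. The constant and function names are hypothetical; only the 30 and 120 second figures come from this thread. The per-request limit uses the standard `http.Client.Timeout` field, while the propagation limit would bound the polling loop as a whole.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

const (
	// httpCallTimeout bounds a single API request to the DNS provider.
	httpCallTimeout = 30 * time.Second
	// propagationLimit bounds the whole wait for the TXT record to appear.
	propagationLimit = 120 * time.Second
)

// newAPIClient returns an HTTP client that gives up on any single
// request after httpCallTimeout.
func newAPIClient() *http.Client {
	return &http.Client{Timeout: httpCallTimeout}
}

func main() {
	client := newAPIClient()
	fmt.Println("per-request limit:", client.Timeout)
	fmt.Println("propagation limit:", propagationLimit)
}
```

Because both limits are finite, a run that sits idle for many minutes points at something other than these two timeouts.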

@lenovouser
Contributor

@xenolf yeah, it's probably not related. This happened 7+ days ago, though, which means it was before you fixed some stuff with CloudFlare in af94ecc and some commits before that one. I just thought I'd wait a few days before asking how to fix this here.

@lenovouser
Contributor

@xenolf I am unsure whether I should create a new issue or if this is not fixable at all now?

@xenolf
Member

xenolf commented Mar 29, 2016

@lenovouser Well, if it's only about the deletion of authz objects, then you are out of luck at the moment. The client does not save the authz objects it creates to resume aborted operations and boulder does not yet implement the authz link in registrations from 6.1.1 of the spec.
Once this gets implemented in boulder, we can add support for detecting already valid / pending authz objects for a certain identifier and re-using them.

@lenovouser
Contributor

@xenolf okay, I'll hope someone in the LE forum has an idea on how to fix this. Thanks for your help!

blueskyleader01 added a commit to blueskyleader01/lego-public that referenced this issue Feb 8, 2024
Updated timeout for cloudflare dns challenge record propagation to fix issue as described in go-acme/lego#167