export resolver.obtainError? #793
Comments
This also means that when I'm mocking |
If you have an issue related to I made the choice and spend time on v2 to not export |
It's not clear to me if I run into the 5- In some ways, I don't really care; doubling the limit to 10 might make it work for more cases, but I suspect that there will always be some scenario where there's no suitable number, and lego really has no choice but to fail. That's fine, but my code has to be able to deal with that situation, and I have to be able to write tests for my code that can exercise its handling of it. Other than doing a bunch of string processing, I don't know how to do either with lego as it is now. |
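One way to exercise that failure handling in tests, without depending on any lego internals, might be to hide the call behind a small interface owned by the calling code. The sketch below is only an illustration under that assumption: obtainer, fakeObtainer, errTransient and obtainWithRetry are all hypothetical names, and the Obtain signature shown here is deliberately simplified and is not lego's.

```go
package main

import (
	"errors"
	"fmt"
)

// obtainer is a hypothetical interface declared in the caller's own code.
// A thin adapter around the real client would implement it in production.
type obtainer interface {
	Obtain(domains []string) error
}

// errTransient is a hypothetical sentinel marking a failure worth retrying.
var errTransient = errors.New("transient failure")

// fakeObtainer fails a fixed number of times, then succeeds.
type fakeObtainer struct {
	failures int // how many calls should fail
	calls    int // how many calls were made
}

func (f *fakeObtainer) Obtain(domains []string) error {
	f.calls++
	if f.calls <= f.failures {
		return fmt.Errorf("obtaining %v: %w", domains, errTransient)
	}
	return nil
}

// obtainWithRetry retries only while the error is marked transient.
func obtainWithRetry(o obtainer, domains []string, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = o.Obtain(domains)
		if err == nil || !errors.Is(err, errTransient) {
			return err
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	fake := &fakeObtainer{failures: 2}
	err := obtainWithRetry(fake, []string{"example.com"}, 5)
	fmt.Println("error:", err, "calls:", fake.calls)
}
```

This only covers the testing half of the problem. Deciding which real failures count as transient still needs access to the underlying errors, which is what this issue asks for.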
I need more context, because |
What context are you looking for? I can give you the logs from that run. Is there anything else? |
Do you have many SANs? Do you make concurrent calls? etc. I need to understand how the issue appears for you. |
This happened with two concurrent calls to I hope to drive the concurrency up a lot (to anywhere between 50 and 500, probably), but have been testing with small batches so that the logs are easier to read. I'm running this from a VM inside Google Cloud. It doesn't have a permanent external IP address, at least for IPv4; I'm not sure whether it's ever connecting to LE via IPv6, and if so, what that looks like. I do have the patch from #783 applied, and am using it to institute a two-minute delay prior to the pre-check, and that seems to be working for me as I expect. For the order that did work, the time from "preparing to solve" to "validated our request" was 2m18.5s for the wildcard, and 2m15s for the other, spanning a total of 2m19s. The "Wait for apply change" and "Wait for propagation" messages aren't tagged with their respective domains, but there aren't many of them; only two of the latter (suggesting that it didn't actually have to do any waiting), and six of the former. There are actually six nonce errors in the logs; the first one is for a challenge URI for the order that ended up succeeding. About .15s later, there are five nonce errors in a row for an authorization URI from the failed order. They're spaced by .12s, .31s, .19s, and .24s, respectively (the "too many retry" error comes .2s later). |
That was what I thought. I know there is a concurrency problem that produces I wanted to correct that for v2, I de-prioritized because I had no issue on the subject. |
We have a service that uses LetsEncrypt to provide certificates for many different user apps. Some of our users misconfigure their DNS and so we are sometimes not able to provide them certificates. We used a different Go ACME library (github.com/hlandau/acme/acmeapi) which does not support ACMEv2 (though the author may be fixing that right now?). Porting to Lego has overall been a great experience that let us delete the majority of our code. However, our old implementation did parse the specific errors (which here you have as acme.ProblemDetails.Type) returned from LetsEncrypt for monitoring purposes. Some error types we wanted to get alerted on; others were just user misconfigurations and we wanted to display them in our dashboard to users but not alert us. We can almost implement that with Lego, except that we can't break into the obtainError returned by Obtain to get the nested ProblemDetails. |
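A minimal sketch of the kind of classification described above, assuming the per-domain errors were reachable from outside the library. Nothing here is lego's API: problemDetails and domainErrors are local stand-ins for acme.ProblemDetails and the unexported obtainError, and the split between "dashboard" and "alert" types is an arbitrary example. The type suffixes themselves (dns, connection, caa, unauthorized, serverInternal) come from the ACME error namespace in RFC 8555.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// problemDetails is a local stand-in for an ACME problem document.
// It is NOT lego's acme.ProblemDetails.
type problemDetails struct {
	Type   string
	Detail string
}

func (p *problemDetails) Error() string {
	return fmt.Sprintf("acme: %s: %s", p.Type, p.Detail)
}

// domainErrors is a hypothetical exported per-domain error collection.
type domainErrors map[string]error

func (e domainErrors) Error() string {
	return fmt.Sprintf("%d domain(s) failed", len(e))
}

const acmeErrPrefix = "urn:ietf:params:acme:error:"

// userFault lists problem types treated here as user misconfiguration.
// The selection is an assumption made for illustration only.
var userFault = map[string]bool{
	"dns": true, "connection": true, "caa": true, "unauthorized": true,
}

// classify returns "dashboard" for user-side problems and "alert" otherwise.
func classify(err error) string {
	var p *problemDetails
	if errors.As(err, &p) && userFault[strings.TrimPrefix(p.Type, acmeErrPrefix)] {
		return "dashboard"
	}
	return "alert"
}

// classifyAll classifies each domain's error, if err carries a domainErrors.
func classifyAll(err error) map[string]string {
	out := map[string]string{}
	var de domainErrors
	if errors.As(err, &de) {
		for domain, e := range de {
			out[domain] = classify(e)
		}
	}
	return out
}

func main() {
	err := domainErrors{
		"a.example": &problemDetails{Type: acmeErrPrefix + "dns", Detail: "NXDOMAIN"},
		"b.example": &problemDetails{Type: acmeErrPrefix + "serverInternal"},
	}
	fmt.Println(classifyAll(fmt.Errorf("obtain failed: %w", err)))
}
```

Using errors.As rather than a direct type assertion means the classification keeps working if the error is wrapped on its way up to the monitoring code.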
@glasser Just curious, what is the service you're using? (I ask because several of us in the industry are currently drafting up a document about best practices for ACME clients, especially those which operate at scale.) |
I want to log original errors received from let's encrypt but since |
For me, the current |
Any updates on this issue? Or maybe any recommendations for a workaround? |
any updates? |
sorry it takes more time than I was thinking, but now the underlying errors related to |
I got a badNonce during Obtain(). Based on the logs, it looks like it failed before it got confirmation of validation: I see the five badNonce errors after "Checking DNS record propagation", but not the "The server validated our request" message; if I GET the auth URL, it tells me that the status is valid.

Obtain() returns an error that, according to %T, is of type resolver.obtainError. If there's a way to "cast" this to the underlying map[string]error, I can't figure it out, which means I can't get at any of the underlying errors and either report them, store them, act on any data they may contain, etc. In this case, I think I'd be able to assert that the error value in the map is actually an acme.NonceError and retry more (though I don't think I have access to either the URL I'd need or the routines to operate on it).

The only workaround I can think of is to parse the error string. At least that's fairly nicely structured, but it's still kinda gross.

Can resolver.obtainError be made public? That should be non-breaking, yes? Are there other errors that might be useful, as well?

I guess what I'm really looking for is a way for lego to tell me that although it failed, the operation is retryable (the failure isn't terminal), and to give me a way to retry it from whatever point it failed. This would take care of #771, too, but is probably a huge change.
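As an alternative to parsing the error string, reflection may be able to get at the map, on the assumption stated above that the unexported type's underlying type really is map[string]error. The sketch below is a workaround idea, not something lego provides: obtainError is redeclared locally only so the example runs on its own, and perDomainErrors and errBadNonce are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// obtainError mimics the unexported type described in this issue.
// In lego it lives in another package, so callers cannot name it.
type obtainError map[string]error

func (e obtainError) Error() string {
	return fmt.Sprintf("%d domain(s) failed", len(e))
}

// errBadNonce is a placeholder for whatever error the map actually holds.
var errBadNonce = errors.New("badNonce")

// perDomainErrors converts err to a plain map[string]error when its dynamic
// type has that underlying type, without needing to name the type itself.
func perDomainErrors(err error) (map[string]error, bool) {
	if err == nil {
		return nil, false
	}
	target := reflect.TypeOf(map[string]error(nil))
	v := reflect.ValueOf(err)
	if v.Kind() != reflect.Map || !v.Type().ConvertibleTo(target) {
		return nil, false
	}
	m, ok := v.Convert(target).Interface().(map[string]error)
	return m, ok
}

func main() {
	var err error = obtainError{"example.com": errBadNonce}
	if m, ok := perDomainErrors(err); ok {
		for domain, e := range m {
			fmt.Printf("%s: %v (is badNonce: %t)\n", domain, e, errors.Is(e, errBadNonce))
		}
	}
}
```

This would stop working silently if the type's representation ever changed, so it is at best a stopgap until the error is exposed properly.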