acme.sh fails with "Sign error, wrong status" when a2c ca_server.get_cert() fails with error: The NETBIOS connection with the remote host timed out. #154

Open
okorsky opened this issue Apr 15, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@okorsky

okorsky commented Apr 15, 2024

When using acme.sh with a2c and mswcce_ca_handler.py, I see some strange behavior.

All VMs (a2c, acme.sh, acme-dns, MS CA and domain controller) are in one network and can reach each other directly. In this environment they are Azure VMs, but I also reproduced the same behavior with VMs running on VMware Workstation.

  • a2c finishes the validation and tries to enroll for a certificate from the MS CA using mswcce_ca_handler
  • the MS CA issues the certificate, but a2c reports a NETBIOS connection error when it tries to retrieve it
  • acme.sh fails with the error "Sign error, wrong status"
  • acme.sh retries with a new request and everything goes smoothly

Any idea what the root cause could be? Is it possible to retry fetching the certificate from the MS CA after the NETBIOS failure?

Logs from both acme.sh and a2c are attached here.
a2clog_amcesh.txt
acmeshlog.txt

@grindsa grindsa added the bug Something isn't working label Apr 18, 2024
@grindsa
Owner

grindsa commented Apr 18, 2024

I see this error from time to time in my lab as well. In the past I thought it was related to my setup (I am accessing the MS CA via an SSH tunnel), but that does not seem to be the case.

I will look into it over the coming days. However, are you saying that the same setup works fine with other ACME clients? Is it a permanent issue when using acme.sh?

@okorsky
Author

okorsky commented Apr 29, 2024

I normally test with acme.sh, so I have mostly seen this error with acme.sh.

However, there was one instance where it also happened with cert-manager.

@okorsky
Author

okorsky commented May 8, 2024

The error now appears more and more often, with both cert-manager and acme.sh.

Any ideas yet?

@webprofusion-chrisc
Contributor

It's unlikely that the choice of ACME client would affect a server-side NETBIOS connection. The error in question is raised by the impacket library: https://github.com/fortra/impacket/blob/master/impacket/nmb.py#L285 However, that wording is the library's default message for this exception type, so the point where it occurs can vary.

Beware of cached name resolution if your target machine's IP address changes.

@okorsky
Author

okorsky commented May 8, 2024

@webprofusion-chrisc I agree; I was just answering the earlier question about whether the error occurs with different clients.

I'm not sure cached name resolution is the issue here; the target machines (both the DC and the ADCS CA) have static IP addresses.

Also it's worth noting:

  • On the ADCS CA, the certificate is issued (so the request does reach the CA).
  • Right after the error, when the client retries (within seconds), it gets a certificate successfully.

I was thinking of a temporary workaround: inspect the error in the exception handler at

    except Exception as err_:
        cert_raw = None
        self.logger.error("ca_server.get_cert() failed with error: %s", err_)

and, if the error is "The NETBIOS connection with the remote host timed out", simply retry cert_raw = convert_byte_to_string(request.get_cert(convert_string_to_byte(csr))) once before failing.
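A minimal sketch of this message-based retry, assuming the certificate fetch can be wrapped in a zero-argument callable. The names `is_netbios_timeout`, `fetch_with_single_retry` and `NETBIOS_TIMEOUT_TEXT` are hypothetical and not part of a2c or impacket. The match is case-insensitive, so it works whether the text appears as in the issue title ("The NETBIOS connection ...") or in lowercase. Matching on message text is fragile, since the wording is impacket's default for that exception type.

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Hypothetical constant: substring of the error text reported in this issue,
# compared case-insensitively
NETBIOS_TIMEOUT_TEXT = "the netbios connection with the remote host timed out"


def is_netbios_timeout(err: BaseException) -> bool:
    """Return True if the exception text looks like the NETBIOS timeout."""
    return NETBIOS_TIMEOUT_TEXT in str(err).lower()


def fetch_with_single_retry(fetch: Callable[[], T], logger: logging.Logger) -> Optional[T]:
    """Call fetch(); on a NETBIOS timeout, retry exactly once.

    Any other error, or a failure on the retry, is logged and None is
    returned, mirroring the cert_raw = None fallback in the handler excerpt.
    """
    try:
        return fetch()
    except Exception as err_:
        if not is_netbios_timeout(err_):
            logger.error("ca_server.get_cert() failed with error: %s", err_)
            return None
        logger.warning("NETBIOS timeout, retrying once: %s", err_)
    try:
        return fetch()
    except Exception as err_:
        logger.error("ca_server.get_cert() retry failed with error: %s", err_)
        return None
```

In the handler, `fetch` could be something like `lambda: convert_byte_to_string(request.get_cert(convert_string_to_byte(csr)))`, reusing the existing request object.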

@okorsky
Author

okorsky commented May 13, 2024

The workaround didn't work after all.

I tried catching the error, sleeping for 2 seconds, and then building the request again with request = self.request_create(), but the retry failed with the same error.
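Since a single retry on a freshly built request after 2 seconds still failed, one possible next experiment is several attempts with a growing delay. A minimal sketch, assuming a hypothetical `make_request` factory (standing in for self.request_create()) and a hypothetical `get_cert` callable; the attempt count and delays are illustrative, not values tested against an MS CA.

```python
import time
from typing import Callable, Optional, TypeVar

R = TypeVar("R")  # request object type
C = TypeVar("C")  # certificate type


def get_cert_with_backoff(
    make_request: Callable[[], R],
    get_cert: Callable[[R], C],
    attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[C]:
    """Rebuild the request and retry get_cert with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts
    and returns None if every attempt raises.
    """
    for attempt in range(attempts):
        request = make_request()  # fresh request/connection on every attempt
        try:
            return get_cert(request)
        except Exception:
            if attempt == attempts - 1:
                return None  # give up after the last attempt
            sleep(base_delay * (2 ** attempt))
    return None
```

The `sleep` parameter is injectable so the delay schedule can be checked without actually waiting.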

@grindsa
Owner

grindsa commented May 17, 2024

Hi,

Sorry for not commenting earlier, but I was quite busy the last few weeks.

I agree that the error is most likely not related to the ACME client. The reason for asking is that I am looking for a reliable way to replicate the issue.

Let me give it another try over the weekend.

/G.

@grindsa
Owner

grindsa commented May 19, 2024

Hi,

Sorry, I am still not able to replicate the issue. However, it is worth trying whether increasing the timeout of the DCE connection helps you overcome the issue. The default is 5 seconds; a higher value may work better in your environment.

I updated the handler and introduced a timeout option in acme_srv.cfg to make the timeout configurable.

[CAhandler]
...
timeout: 20
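For illustration, a sketch of how such an option could be read with Python's standard configparser, falling back to the 5-second default when `timeout` is absent or not an integer. The function name `load_timeout` is hypothetical, and the updated handler may parse the option differently.

```python
import configparser


def load_timeout(cfg_text: str, default: int = 5) -> int:
    """Read [CAhandler] timeout from acme_srv.cfg-style text.

    Falls back to `default` when the section or option is missing
    or when the value is not an integer.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    try:
        # fallback covers a missing section or option
        return parser.getint("CAhandler", "timeout", fallback=default)
    except ValueError:
        return default  # e.g. "timeout: abc"
```

With the snippet above, `load_timeout("[CAhandler]\ntimeout: 20\n")` yields 20; configparser accepts both `:` and `=` as delimiters.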

Please give it a try with the updated handler and check whether things get better.
