SERVFAIL looking up A for subdomain.example.com (but worked before) #1411

Closed
djc opened this issue Jan 24, 2016 · 29 comments

@djc

djc commented Jan 24, 2016

In trying to create a certificate for a new subdomain I wanted to set up, I ran into this problem:

DNS problem: SERVFAIL looking up A for mysql.xavamedia.nl

There is no exact A record for mysql.xavamedia.nl; but it should be resolvable using a wildcard DNS entry for *.xavamedia.nl. Previously (December 5), this worked correctly for dirkjan.ochtman.nl, which is the same kind of setup, with the same name servers and the same actual host. However, when trying to create a new certificate for dirkjan.ochtman.nl, it fails the same way as mysql.xavamedia.nl.

On the other hand, I was able to have LE certify ochtman.nl and xavamedia.nl today. I also just successfully requested a certificate for enrai.xavamedia.nl, which does have an explicit A record in the DNS setup for xavamedia.nl. (FWIW, I was using acme-tiny, not the official client.)

Slight update: I just created an explicit A record for mysql.xavamedia.nl, and was able to successfully create a certificate for it, so I confirmed that there is a difference for me.

Is this an intentional change, or is there a regression (maybe having to do with the DNS validation feature that was deployed recently)? To me, the previous behavior made more sense.
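
For readers unfamiliar with how a wildcard record is expected to answer a query that has no exact match, here is a minimal Python sketch of the lookup rule. It is an illustration only: the zone contents and the 192.0.2.x addresses are made up, `ancestors` and `lookup_a` are hypothetical helpers, and real name servers implement the full wildcard rules (RFC 4592) with more cases than shown.

```python
def ancestors(name):
    """Yield name and each parent domain, e.g. a.b.nl -> a.b.nl, b.nl, nl."""
    labels = name.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:])


def lookup_a(zone, qname):
    """Return A records for qname, falling back to a wildcard if one applies.

    zone maps owner names (lowercase, no trailing dot) to lists of addresses.
    """
    qname = qname.lower().rstrip(".")
    if qname in zone:
        return zone[qname]  # an exact match always wins over a wildcard

    # Every owner name and all of its parents "exist" in the zone tree.
    existing = {anc for owner in zone for anc in ancestors(owner)}
    if qname in existing:
        return []  # the name exists but has no A record: no wildcard applies

    # Closest encloser: the longest existing ancestor of the query name.
    for candidate in ancestors(qname):
        if candidate in existing:
            # Only a wildcard directly below the closest encloser can answer.
            return zone.get("*." + candidate, [])
    return []


# Made-up zone mirroring the setup described above.
zone = {
    "xavamedia.nl": ["192.0.2.10"],
    "*.xavamedia.nl": ["192.0.2.10"],
    "enrai.xavamedia.nl": ["192.0.2.20"],
}
print(lookup_a(zone, "mysql.xavamedia.nl"))
```

Under this rule a query for mysql.xavamedia.nl is answered from the wildcard, while a name below enrai.xavamedia.nl is not, because the explicit record becomes the closest encloser.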

@rolandshoemaker
Contributor

Nothing has changed recently in our DNS resolution methods. Looking at how my local Unbound resolver responds to a query for your domain, it seems like your DNS server is returning a malformed response.

unbound[25976:3] info: resolving mysql.xavamedia.nl. A IN
unbound[25976:3] info: response for mysql.xavamedia.nl. A IN
unbound[25976:3] info: reply from <xavamedia.nl.> 217.194.122.34#53
unbound[25976:3] info: query response was ANSWER
unbound[25976:3] info: Validate: message contains bad rrsets

@djc
Author

djc commented Jan 25, 2016

I checked the DNS setup with about five online tools, and the only problem they detect is that some TTL values are lower than recommended. Does letsencrypt.org run Unbound as well? I tried to read the Unbound source to see if I could make sense of what triggers that validation error; it seems to find something wrong in detect_wrongly_truncated (from validator.c).

Still, I don't understand what it considers wrong while all the other tools seem to work okay.

@djc
Author

djc commented Feb 20, 2016

Okay, I'm convinced now that the problem is on my side somehow.

@djc djc closed this as completed Feb 20, 2016
@bigeagle

Hi @djc,
I'm having the same issue (#1308). How did you solve this problem?
However, I can correctly resolve my domain (ldap.nics.cc) using unbound.

Cheers,

@djc
Author

djc commented Feb 24, 2016

I haven't solved it yet, but here's more context from my problem:

http://serverfault.com/questions/755803/how-are-dns-timeouts-supposed-to-work/755918

@bigeagle

@djc I've solved my issue by making my self-made DNS server case-insensitive. #1308
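
For anyone who lands here with a home-grown DNS server: DNS names compare case-insensitively, and some resolvers deliberately mix the case of the query name (often called the "0x20" technique), so a case-sensitive server can fail on queries that look fine in manual tests. A minimal Python sketch of the idea follows. `randomize_case`, `normalize`, `find_records` and the record data are hypothetical and are not taken from any of the servers discussed in this thread.

```python
import random


def randomize_case(name, rng):
    """Pick upper or lower case for each character at random, 0x20-style."""
    return "".join(rng.choice((ch.lower(), ch.upper())) for ch in name)


def normalize(name):
    """Canonical form used for lookups: lowercase, no trailing dot."""
    return name.rstrip(".").lower()


def find_records(records, qname):
    """Look up qname while ignoring ASCII case, as DNS requires."""
    return records.get(normalize(qname), [])


records = {"ldap.nics.cc": ["192.0.2.53"]}  # placeholder address
mixed = randomize_case("ldap.nics.cc", random.Random(0))
print(mixed, find_records(records, mixed))
```

A real server should also echo the question name back with the case the resolver sent, since a resolver using this technique checks that the response matches.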

@djc
Author

djc commented Feb 27, 2016

So I've tried to debug this more and now I think this is a Boulder problem, again. I've seen that some of the answers can be of the NODATA variety, which might point to an intermittent problem in my DNS hosting setup, but that should not result in the consistent SERVFAIL I get from Let's Encrypt.

@djc djc reopened this Feb 27, 2016
@riking
Contributor

riking commented Mar 4, 2016

The DNS server we are using also converts REFUSED into SERVFAIL. Possibly that's what you're seeing?

Timeouts are also an issuance blocker.

@djc
Author

djc commented Mar 4, 2016

It seems unlikely. I tried to compile and execute the bdns code to see if I could figure out what was going wrong, but haven't been able to so far. I'm still inclined to think the processing of wildcard domains is (at least part of) the problem.

@bigeagle

bigeagle commented Mar 5, 2016

maybe you can print out query logs

@djc
Author

djc commented Mar 5, 2016

BTW, yesterday one of the domains where I was having this problem had its previous LE certificate (from 2015-12-05) expire. Since I used HSTS, I could not do http-01 validation for it, so I ended up doing dns-01 validation. This worked fine. So:

  • It definitely seems like this issue is a regression since 2015-12-05.
  • DNS validation works, which likely excludes at least some types of problems.
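
For context on why dns-01 can succeed where the A lookup fails: as I understand the ACME specification, dns-01 queries a TXT record at `_acme-challenge.<domain>` rather than an A record for the name itself, and the expected value is the unpadded base64url encoding of the SHA-256 digest of the key authorization. A minimal Python sketch, using a placeholder key authorization and a hypothetical helper name:

```python
import base64
import hashlib


def dns01_txt_record(domain, key_authorization):
    """Return (record name, TXT value) for an ACME dns-01 challenge."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # base64url without the trailing "=" padding
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "_acme-challenge." + domain, value


# Placeholder key authorization of the form "<token>.<account key thumbprint>".
name, value = dns01_txt_record("example.com", "example-token.example-thumbprint")
print(name, "TXT", value)
```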

@jsha
Contributor

jsha commented Mar 5, 2016

Since I used HSTS, I could not do http-01 validation for it

FYI, http-01 works fine even with HSTS and http -> https redirects. The Boulder server does not record HSTS state, and it does follow redirects.

@djc
Author

djc commented Mar 5, 2016

That makes sense, but I bet it doesn't work when the https is on an (LE) expired cert.

@riking
Contributor

riking commented Mar 7, 2016

@djc: Nope, still works. Because what is being authenticated is control of the IP, TLS validation is ignored.

@deonthomasgy

I'm having the same problem with deonthomas.com. It seems OpenDNS and other DNS servers have my record, but Google does not. You can test by running:
nslookup -debug deonthomas.com 8.8.4.4 (not found)
nslookup -debug deonthomas.com 208.67.222.222 (found)
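
The same comparison can be scripted. The sketch below uses only the Python standard library to send a single A query over UDP and report the response code, so that SERVFAIL, NXDOMAIN and NOERROR can be told apart per resolver. It assumes UDP port 53 is reachable, ignores truncation and retries, and all helper names are hypothetical.

```python
import socket
import struct

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_a_query(name, query_id=0x1234):
    """Build a DNS query packet asking for the A record of name."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def parse_rcode(response):
    """Return the response code name from a raw DNS response."""
    (flags,) = struct.unpack("!H", response[2:4])
    code = flags & 0x000F  # RCODE is the low four bits of the flags field
    return RCODES.get(code, "RCODE%d" % code)


def query_rcode(server, name, timeout=5.0):
    """Send one A query to server over UDP and return the response code name."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_a_query(name), (server, 53))
        response, _ = sock.recvfrom(512)
    return parse_rcode(response)
```

Calling `query_rcode("8.8.4.4", "deonthomas.com")` and `query_rcode("208.67.222.222", "deonthomas.com")` would mirror the two nslookup commands above.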

@jsha
Contributor

jsha commented May 12, 2016

I'm going to close out this issue because there are a lot of possible causes of SERVFAIL, and they're generally not related to each other.

@princeamd: Your problem sounds like it might be bad DNSSEC records, since Google and Let's Encrypt validate DNSSEC, but OpenDNS may not. If that doesn't solve your problem, could you post on https://community.letsencrypt.org for more help? Thanks!

@jsha jsha closed this as completed May 12, 2016
@djc
Author

djc commented May 13, 2016

@jsha as far as I know my original issue still stands; renewing my certificates in April, I had the same problem again.

@rolandshoemaker
Contributor

@djc testing locally with Unbound, I no longer get errors for your mysql.xavamedia.nl domain. Is this specific domain still failing for you, or are there others?

@djc
Author

djc commented May 13, 2016

@rolandshoemaker I've now added an explicit CNAME record for mysql.xavamedia.nl because of this issue. Maybe you can try test.xavamedia.nl?

@rolandshoemaker
Contributor

test.xavamedia.nl also seems to be behaving properly.

@djc
Author

djc commented May 13, 2016

djc@enrai lecertman $ date
Fri May 13 10:52:14 CEST 2016
djc@enrai lecertman $ python lecertman.py xavamedia.cfg 
Verifying test.xavamedia.nl domain... Traceback (most recent call last):
  File "lecertman.py", line 263, in <module>
    main(args[0], opts)
  File "lecertman.py", line 255, in main
    get_certificate(ca, account_key, cert_metadata, cert_name, domains)
  File "lecertman.py", line 185, in get_certificate
    raise ValueError(msg % (domain, status))
ValueError: test.xavamedia.nl challenge did not pass: {u'status': u'invalid', u'validationRecord': [{u'url': u'http://test.xavamedia.nl/.well-known/acme-challenge/uAW3x_gTiwrb4b_dAB-cFl9R0GHR9s_kAPANAp2U9Uk', u'hostname': u'test.xavamedia.nl', u'addressUsed': u'', u'port': u'80', u'addressesResolved': None}], u'keyAuthorization': u'uAW3x_gTiwrb4b_dAB-cFl9R0GHR9s_kAPANAp2U9Uk.9fYUZ2hpX7v7Oz_NL6GN-gGOxU4lOJNXbbLuba8wiNM', u'uri': u'https://acme-v01.api.letsencrypt.org/acme/challenge/4DSXpfWCPd3dp0Yaqy_82BM6Wl_V1K4N7FyBU0abOMw/85350887', u'token': u'uAW3x_gTiwrb4b_dAB-cFl9R0GHR9s_kAPANAp2U9Uk', u'error': {u'type': u'urn:acme:error:connection', u'detail': u'DNS problem: SERVFAIL looking up A for test.xavamedia.nl'}, u'type': u'http-01'}
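
The useful part of that traceback is the `error` object buried in the challenge. A small helper along these lines could make a client print just that. It is a sketch based only on the fields visible in the output above (`status`, `type`, `error.type`, `error.detail`), not on lecertman or any ACME library, and `summarize_challenge` is a hypothetical name.

```python
def summarize_challenge(challenge):
    """Return a one-line summary of an ACME challenge object (a dict)."""
    error = challenge.get("error") or {}
    return "%s challenge %s: %s (%s)" % (
        challenge.get("type", "unknown"),
        challenge.get("status", "unknown"),
        error.get("detail", "no detail"),
        error.get("type", "no error type"),
    )


# Trimmed copy of the challenge shown above.
challenge = {
    "status": "invalid",
    "type": "http-01",
    "error": {
        "type": "urn:acme:error:connection",
        "detail": "DNS problem: SERVFAIL looking up A for test.xavamedia.nl",
    },
}
print(summarize_challenge(challenge))
```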

@rolandshoemaker
Contributor

It seems that I'm using a different version of Unbound locally from the one we use in production. I'm going to recompile and try again.

@bortzmeyer

I find no problem with the domain xavamedia.nl, either by hand or with DNSViz. RIPE Atlas probes (many of them use a validating resolver) can resolve it. (There are only two things to notice, both legal: their name servers sometimes return the RRSIG before the SOA, which is legal but unusual, and the domain uses wildcards, always a risky thing with DNSSEC.)

Therefore, I tend to say that Let's Encrypt is wrong.

@djc
Author

djc commented Jul 24, 2016

There's no DNSSEC going on here, though, right?

@Techwolf12

I am currently having the same issue using BIND as the DNS server. Everything resolves fine from multiple points, the subdomain is explicitly defined, and DNSSEC has been turned off as a test; it fails either way.

@cpu
Contributor

cpu commented Jan 3, 2017

@Techwolf12 If you're having an issue with DNS resolution please open a new issue that includes the domain name(s) in question.

@Techwolf12

@cpu After testing with a lower TTL, it seems to be an issue with DNSSEC and Let's Encrypt.

@cpu
Contributor

cpu commented Jan 3, 2017

@Techwolf12 Interesting - happy to help troubleshoot that in a separate issue as well. As far as I'm aware our resolver should support DNSSEC without any known issues.

@jsha
Contributor

jsha commented Jan 3, 2017

Hi all! This issue thread has been popular for people to post a variety of DNS resolution problems with Let's Encrypt, not necessarily related. I'm guessing that's because it ranks high on Google for its error message.

Most types of problems that result in "SERVFAIL looking up A for ..." are not related to the original post here, and most of them require individual debugging. If you're having this problem, please visit https://community.letsencrypt.org/ and post a new topic describing your setup and including your real domain name. We have a pretty active community there that will help diagnose the problem.

Thanks!

@letsencrypt letsencrypt locked and limited conversation to collaborators Jan 3, 2017