Troubles with DNS resolution #18164

Closed
kud opened this issue Jul 31, 2019 · 11 comments

Comments

@kud

kud commented Jul 31, 2019

Hello,

I'm sorry if I'm not clear; this isn't a subject I know well.

One of our users is having trouble reaching your servers.

What he told me:

cdn.jsdelivr.net answers NXDOMAIN when we activate QNAME minimisation (the server answers NXDOMAIN for ENT (Empty Non-Terminal))

The test:

% dig @ns1.flexbalancer.net. A a7e454.flexbalancer.net

; <<>> DiG 9.10.3-P4-Debian <<>> @ns1.flexbalancer.net. A a7e454.flexbalancer.net
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 627
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 512
;; QUESTION SECTION:
;a7e454.flexbalancer.net. IN A

;; Query time: 20 msec
;; SERVER: 2400:cb00:2049:1::a29f:18a3#53(2400:cb00:2049:1::a29f:18a3)
;; WHEN: Wed Jul 31 12:16:34 CEST 2019
;; MSG SIZE  rcvd: 52

% dig @ns1.flexbalancer.net. A jsdelivr.a7e454.flexbalancer.net

; <<>> DiG 9.10.3-P4-Debian <<>> @ns1.flexbalancer.net. A jsdelivr.a7e454.flexbalancer.net
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17070
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 512
;; QUESTION SECTION:
;jsdelivr.a7e454.flexbalancer.net. IN A

;; ANSWER SECTION:
jsdelivr.a7e454.flexbalancer.net. 10 IN	CNAME dualstack.f3.shared.global.fastly.net.

;; Query time: 20 msec
;; SERVER: 2400:cb00:2049:1::a29f:18a3#53(2400:cb00:2049:1::a29f:18a3)
;; WHEN: Wed Jul 31 12:17:37 CEST 2019
;; MSG SIZE  rcvd: 109

Apparently this is not the expected behaviour.

Could you try to fix this? If you need more information, do not hesitate to ask me. :)

@jimaek
Member

jimaek commented Jul 31, 2019

May I ask where the user with the issue is located?

@kud
Author

kud commented Jul 31, 2019

France!

@bortzmeyer

> May I ask where the user with the issue is located?

This is irrelevant since you can see by yourself (using the dig queries mentioned) that it happens everywhere. (I tested in Japan and in the USA.)

@jimaek
Member

jimaek commented Jul 31, 2019

Can you give me some more information? How exactly are they enabling qname minimization? I want to replicate the issue

@bortzmeyer

> Can you give me some more information? How exactly are they enabling qname minimization? I want to replicate the issue

Configuration of my resolver (Unbound):

server:
    # Send minimum amount of information to upstream servers to enhance
    # privacy. Only sends minimum required labels of the QNAME and sets
    # QTYPE to NS when possible.
    # See RFC 7816 "DNS Query Name Minimisation to Improve Privacy" for
    # details.
    qname-minimisation: yes
    # If you use this, you also need the above
    qname-minimisation-strict: yes

@bortzmeyer

To add some explanations (warning: long and may be useless):

First, the symptom: cdn.jsdelivr.net does not exist:

% dig A cdn.jsdelivr.net 
...
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 8702
                                       ^^^^^^^^
                                       No such domain

Second, why does it work for some people? To explain it, we have to dig (pun intended) deeper:

cdn.jsdelivr.net is an alias to jsdelivr.a7e454.flexbalancer.net:

% dig A cdn.jsdelivr.net        
...
;; ANSWER SECTION:
cdn.jsdelivr.net.	60 IN CNAME jsdelivr.a7e454.flexbalancer.net.

The authoritative name servers for flexbalancer.net are:

% dig +short NS flexbalancer.net
ns1.flexbalancer.net.
ns2.flexbalancer.net.

Both reply correctly when queried about jsdelivr.a7e454.flexbalancer.net but not when queried about a7e454.flexbalancer.net:

% dig @ns2.flexbalancer.net A a7e454.flexbalancer.net
...
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 21877
                                       ^^^^^^^^

That's the crux of the issue: they should not reply NXDOMAIN since the domain exists. jsdelivr wrongly aliases to a name which is served by broken name servers.

a7e454.flexbalancer.net exists because a subdomain, jsdelivr.a7e454.flexbalancer.net, exists, even if it has no resource records itself. This is what is called an ENT (Empty Non-Terminal domain name). Replying NXDOMAIN for an ENT is a bug that was present in some CDNs some time ago (Akamai had it at one time), but that I thought had almost disappeared (among other issues, it is incompatible with DNSSEC).
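
To make the ENT rule concrete, here is a minimal Python sketch of how a correct authoritative server would decide between NOERROR and NXDOMAIN. The zone contents (`ZONE_RECORDS`) and the function names are hypothetical, chosen only to mirror the names in this thread; they say nothing about how flexbalancer.net's servers are actually implemented. Names are assumed to be lowercase and fully qualified.

```python
# Hypothetical zone contents: only the names that own at least one record.
ZONE_RECORDS = {
    "flexbalancer.net.": ["NS", "SOA"],
    "jsdelivr.a7e454.flexbalancer.net.": ["CNAME"],
}


def name_exists(qname, zone=ZONE_RECORDS):
    """A name exists if it owns records or if any name below it does."""
    suffix = "." + qname
    return any(owner == qname or owner.endswith(suffix) for owner in zone)


def is_empty_non_terminal(qname, zone=ZONE_RECORDS):
    """An ENT owns no records itself but has at least one descendant that does."""
    return qname not in zone and name_exists(qname, zone)


def expected_rcode(qname, zone=ZONE_RECORDS):
    """Return the rcode a correct server should send for this name.

    An existing name with no matching record gets NOERROR with an empty
    answer section (NODATA), never NXDOMAIN.
    """
    return "NOERROR" if name_exists(qname, zone) else "NXDOMAIN"


print(is_empty_non_terminal("a7e454.flexbalancer.net."))
print(expected_rcode("a7e454.flexbalancer.net."))
print(expected_rcode("nothere.flexbalancer.net."))
```

Under these assumptions, a7e454.flexbalancer.net is classified as an ENT and gets NOERROR, while a name with nothing at or below it gets NXDOMAIN.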

So, flexbalancer.net is clearly wrong, and jsdelivr should have them fix that, or should move to another name. But why does it work for some people?

This is because the traditional way of doing DNS resolution was to send the entire name to every authoritative name server queried. So, flexbalancer.net's servers receive the full name jsdelivr.a7e454.flexbalancer.net, for which they work. But the traditional way is bad for privacy (for instance, the root name servers, and the Verisign .net servers see the entire query), leading to QNAME minimization, described in RFC 7816. QNAME minimization, following a general principle of privacy, sends only the minimum data required to accomplish a task. So, the root name servers will only receive a query for net, the Verisign name servers only a query for flexbalancer.net and so on. Consequence: flexbalancer.net's name servers will first receive a query for a7e454.flexbalancer.net, revealing the bug.
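
As a rough illustration of the paragraph above, this Python sketch lists the names a minimising resolver would ask about, shortest first. It is a simplification: it assumes one label is added per step, whereas a real resolver skips steps for zone cuts it already knows and may use a different QTYPE for the intermediate queries. The function name `minimised_qnames` is made up for this example.

```python
def minimised_qnames(full_name):
    """List the query names used for full_name, shortest first.

    Simplified model: one extra label is revealed at each step.
    """
    labels = full_name.rstrip(".").split(".")
    return [".".join(labels[i:]) + "." for i in range(len(labels) - 1, -1, -1)]


for qname in minimised_qnames("jsdelivr.a7e454.flexbalancer.net"):
    print(qname)
# net.
# flexbalancer.net.
# a7e454.flexbalancer.net.
# jsdelivr.a7e454.flexbalancer.net.
```

The third name is the ENT: it is the first query that flexbalancer.net's own servers have to answer, and the one they got wrong.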

Not all DNS resolvers use QNAME minimization yet. Also, some of those that do use it run it in a lax mode, where the resolver retries with the full name, to work around broken name servers like yours. (The qname-minimisation-strict: yes option above disables this lax mode.) So, not everyone will see the problem, although the number of affected users will probably grow over time.
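
The difference between the strict and lax modes can be sketched with a toy simulation in Python. Everything here is an assumption made for illustration: `broken_server` only mimics the bug described in this thread, `resolve` is not how Unbound is implemented, and the walk is hard-coded to start at a two-label zone apex (flexbalancer.net).

```python
# Names that own records on the simulated (hypothetical) authoritative server.
RECORD_OWNERS = {"flexbalancer.net.", "jsdelivr.a7e454.flexbalancer.net."}


def broken_server(qname):
    """Mimic the bug: NXDOMAIN for any name without records, ENTs included."""
    return "NOERROR" if qname in RECORD_OWNERS else "NXDOMAIN"


def resolve(full_name, strict):
    """Query the server one label at a time, starting at the zone apex."""
    fqdn = full_name.rstrip(".") + "."
    labels = fqdn.rstrip(".").split(".")
    # Assumption: the apex has two labels, so start there and walk down.
    for i in range(len(labels) - 2, -1, -1):
        qname = ".".join(labels[i:]) + "."
        if broken_server(qname) == "NXDOMAIN":
            if strict:
                # Strict mode trusts the NXDOMAIN and gives up.
                return "NXDOMAIN"
            # Lax mode retries once with the full name.
            return broken_server(fqdn)
    return "NOERROR"


print(resolve("jsdelivr.a7e454.flexbalancer.net", strict=True))   # NXDOMAIN
print(resolve("jsdelivr.a7e454.flexbalancer.net", strict=False))  # NOERROR
```

With the same buggy server, the strict resolver fails on the ENT while the lax one recovers, which matches why only some users saw the problem.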

@jimaek
Member

jimaek commented Jul 31, 2019

Thank you for the detailed reply, it was really helpful.
This is something that https://perfops.net/ needs to fix, but it should happen soon, and as a result jsDelivr will work correctly as well!

@jimaek
Member

jimaek commented Aug 3, 2019

The change was deployed. Could you please try again?

@bortzmeyer

Works for me, thanks.

@kud
Author

kud commented Aug 3, 2019

It's great that it was fixed so quickly. Thank you @bortzmeyer for the report and thank you @jimaek for the fix. :)

@jimaek
Member

jimaek commented Aug 3, 2019

Great! Thanks for reporting this. If you have any other feedback, please let me know.

@jimaek jimaek closed this as completed Aug 3, 2019