Error when trying to resolve www.netflix.com #144

Closed
omgold opened this issue Jan 23, 2023 · 17 comments

Comments

@omgold

omgold commented Jan 23, 2023

For the past few days, resolving www.netflix.com through dns-over-https has been failing for an unknown reason. It doesn't seem to be an upstream problem, and all other domains I tried work as expected.

I'm running version 2.3.2 on Arch Linux.

Upstream is configured like this:

bootstrap = [
    # CloudFlare's resolver, bad ECS, good DNSSEC
    "1.1.1.1:53",
    "1.0.0.1:53",
]

When using the host command, I get this:

> host www.netflix.com
;; Got bad packet: unexpected end of input
512 bytes
00 1a 83 80 00 01 00 06 00 00 00 01 03 77 77 77          .............www
07 6e 65 74 66 6c 69 78 03 63 6f 6d 00 00 01 00          .netflix.com....
01 03 77 77 77 07 6e 65 74 66 6c 69 78 03 63 6f          ..www.netflix.co
6d 00 00 05 00 01 00 00 01 2b 00 18 03 77 77 77          m........+...www
06 64 72 61 64 69 73 07 6e 65 74 66 6c 69 78 03          .dradis.netflix.
63 6f 6d 00 03 77 77 77 06 64 72 61 64 69 73 07          com..www.dradis.
6e 65 74 66 6c 69 78 03 63 6f 6d 00 00 05 00 01          netflix.com.....
00 00 00 3b 00 2b 03 77 77 77 09 65 75 2d 77 65          ...;.+.www.eu-we
73 74 2d 31 08 69 6e 74 65 72 6e 61 6c 06 64 72          st-1.internal.dr
61 64 69 73 07 6e 65 74 66 6c 69 78 03 63 6f 6d          adis.netflix.com
00 03 77 77 77 09 65 75 2d 77 65 73 74 2d 31 08          ..www.eu-west-1.
69 6e 74 65 72 6e 61 6c 06 64 72 61 64 69 73 07          internal.dradis.
6e 65 74 66 6c 69 78 03 63 6f 6d 00 00 05 00 01          netflix.com.....
00 00 00 3b 00 4a 2c 61 70 69 70 72 6f 78 79 2d          ...;.J,apiproxy-
77 65 62 73 69 74 65 2d 6e 6c 62 2d 70 72 6f 64          website-nlb-prod
2d 33 2d 61 63 31 31 30 66 36 61 65 34 37 32 62          -3-ac110f6ae472b
38 35 61 03 65 6c 62 09 65 75 2d 77 65 73 74 2d          85a.elb.eu-west-
31 09 61 6d 61 7a 6f 6e 61 77 73 03 63 6f 6d 00          1.amazonaws.com.
2c 61 70 69 70 72 6f 78 79 2d 77 65 62 73 69 74          ,apiproxy-websit
65 2d 6e 6c 62 2d 70 72 6f 64 2d 33 2d 61 63 31          e-nlb-prod-3-ac1
31 30 66 36 61 65 34 37 32 62 38 35 61 03 65 6c          10f6ae472b85a.el
62 09 65 75 2d 77 65 73 74 2d 31 09 61 6d 61 7a          b.eu-west-1.amaz
6f 6e 61 77 73 03 63 6f 6d 00 00 01 00 01 00 00          onaws.com.......
00 3b 00 04 36 4a 49 1f 2c 61 70 69 70 72 6f 78          .;..6JI.,apiprox
79 2d 77 65 62 73 69 74 65 2d 6e 6c 62 2d 70 72          y-website-nlb-pr
6f 64 2d 33 2d 61 63 31 31 30 66 36 61 65 34 37          od-3-ac110f6ae47
32 62 38 35 61 03 65 6c 62 09 65 75 2d 77 65 73          2b85a.elb.eu-wes
74 2d 31 09 61 6d 61 7a 6f 6e 61 77 73 03 63 6f          t-1.amazonaws.co
6d 00 00 01 00 01 00 00 00 3b 00 04 03 fb 32 95          m........;....2.
2c 61 70 69 70 72 6f 78 79 2d 77 65 62 73 69 74          ,apiproxy-websit
65 2d 6e 6c 62 2d 70 72 6f 64 2d 33 2d 61 63 31          e-nlb-prod-3-ac1
31 30 66 36 61 65 34 37 32 62 38 35 61 03 65 6c          10f6ae472b85a.el
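
For readers following along, the first 12 bytes of the dump above are the fixed DNS header, and decoding them already hints at the problem: the TC (truncated) flag is set and six answer records are announced, yet the 512-byte datagram ends partway through the sixth record's name. The Go sketch below only decodes that header; `Header` and `ParseHeader` are hypothetical names chosen for this illustration and are not part of dns-over-https.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Header holds a few fields of the fixed 12-byte DNS header (RFC 1035, section 4.1.1).
// Hypothetical type for illustration only.
type Header struct {
	ID        uint16
	Response  bool  // QR bit
	Truncated bool  // TC bit
	RCode     uint8 // low 4 bits of the flags word
	QDCount   uint16
	ANCount   uint16
	NSCount   uint16
	ARCount   uint16
}

// ParseHeader decodes the header from the start of a raw DNS message.
func ParseHeader(msg []byte) (Header, error) {
	if len(msg) < 12 {
		return Header{}, errors.New("message shorter than the 12-byte DNS header")
	}
	be := binary.BigEndian
	flags := be.Uint16(msg[2:4])
	return Header{
		ID:        be.Uint16(msg[0:2]),
		Response:  flags&0x8000 != 0,
		Truncated: flags&0x0200 != 0,
		RCode:     uint8(flags & 0x000F),
		QDCount:   be.Uint16(msg[4:6]),
		ANCount:   be.Uint16(msg[6:8]),
		NSCount:   be.Uint16(msg[8:10]),
		ARCount:   be.Uint16(msg[10:12]),
	}, nil
}

func main() {
	// First 12 bytes of the hex dump in this issue.
	dump := []byte{0x00, 0x1a, 0x83, 0x80, 0x00, 0x01, 0x00, 0x06, 0x00, 0x00, 0x00, 0x01}
	h, err := ParseHeader(dump)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Prints: TC=true ANCOUNT=6 ARCOUNT=1
	fmt.Printf("TC=%v ANCOUNT=%d ARCOUNT=%d\n", h.Truncated, h.ANCount, h.ARCount)
}
```

A client that parses all announced records before looking at the TC flag will run off the end of such a datagram.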

But everything seems fine when asking upstream directly:

> host www.netflix.com 1.1.1.1
Using domain server:
Name: 1.1.1.1
Address: 1.1.1.1#53
Aliases: 

www.netflix.com is an alias for www.dradis.netflix.com.
www.dradis.netflix.com is an alias for www.eu-west-1.internal.dradis.netflix.com.
www.eu-west-1.internal.dradis.netflix.com is an alias for apiproxy-website-nlb-prod-2-b4de62b516adfbbf.elb.eu-west-1.amazonaws.com.
apiproxy-website-nlb-prod-2-b4de62b516adfbbf.elb.eu-west-1.amazonaws.com has address 18.200.8.190
apiproxy-website-nlb-prod-2-b4de62b516adfbbf.elb.eu-west-1.amazonaws.com has address 54.155.246.232
apiproxy-website-nlb-prod-2-b4de62b516adfbbf.elb.eu-west-1.amazonaws.com has address 54.73.148.110
apiproxy-website-nlb-prod-2-b4de62b516adfbbf.elb.eu-west-1.amazonaws.com has IPv6 address 2a05:d018:76c:b683:e1fe:9fbf:c403:57f1
apiproxy-website-nlb-prod-2-b4de62b516adfbbf.elb.eu-west-1.amazonaws.com has IPv6 address 2a05:d018:76c:b684:b233:ac1f:be1f:7
apiproxy-website-nlb-prod-2-b4de62b516adfbbf.elb.eu-west-1.amazonaws.com has IPv6 address 2a05:d018:76c:b685:c898:aa3a:42c7:9d21

The only unusual thing I see about Netflix is the rather long list of results. I could imagine there is a message size limit in dns-over-https that is exceeded because of that.

@m13253
Owner

m13253 commented Jan 23, 2023

I remember writing logic to detect long or truncated packets. Maybe it doesn't work for some reason…

@maxbraeutigam

maxbraeutigam commented Jan 24, 2023

Hi @m13253, I can confirm the bug with the Arch Linux package community/dns-over-https 2.3.2-1:

> host asana-user-private-us-east-1.s3.amazonaws.com
Host asana-user-private-us-east-1.s3.amazonaws.com not found: 2(SERVFAIL)
> host asana-user-private-us-east-1.s3.amazonaws.com 1.1.1.1
Using domain server:
Name: 1.1.1.1
Address: 1.1.1.1#53
Aliases: 

asana-user-private-us-east-1.s3.amazonaws.com is an alias for s3-1-w.amazonaws.com.
s3-1-w.amazonaws.com is an alias for s3-w.us-east-1.amazonaws.com.
s3-w.us-east-1.amazonaws.com has address 3.5.21.101
s3-w.us-east-1.amazonaws.com has address 52.216.110.35
s3-w.us-east-1.amazonaws.com has address 3.5.11.228
s3-w.us-east-1.amazonaws.com has address 52.217.133.193
s3-w.us-east-1.amazonaws.com has address 52.217.84.52
s3-w.us-east-1.amazonaws.com has address 52.217.134.57
s3-w.us-east-1.amazonaws.com has address 54.231.197.217
s3-w.us-east-1.amazonaws.com has address 3.5.10.23

@maxbraeutigam

Same error on latest commit 70fc857

@m13253
Owner

m13253 commented Jan 24, 2023

Thanks for the reports. Will spend some time investigating it.

@satishweb
Collaborator

Do we need a new release for this bug fix?

@GreenYun

I used dig with dns-over-https and it returned the answers correctly, while host complained about bad packets (same as in the issue).

I think something is causing the UDP packet to be cut off on the connection between host and doh-client.

@m13253
Owner

m13253 commented Jan 25, 2023

@satishweb wrote: "Do we need a new release for this bug fix?"

If it gets fixed, we definitely need to bump the version number.
I am currently trying to reproduce this bug along with @GreenYun.

My guess is that host doesn't support large UDP packets, while doh-client mistakenly assumes that it does (probably because of a mistake in parsing the EDNS data).
Meanwhile, most modern DNS resolvers do support large UDP packets. For host, you can use -T to force TCP temporarily.

@m13253
Owner

m13253 commented Jan 25, 2023

I think this logic is correct… Not sure why it doesn’t work.

if !isTCP && len(buf) > int(req.udpSize) {
    fullReply.Truncated = true
    buf, err = fullReply.Pack()
    if err != nil {
        log.Printf("re-packing error with upstream %s: %v\n", req.currentUpstream, err)
        return
    }
    buf = buf[:req.udpSize]
}
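
One possible reading of the snippet above: slicing the packed buffer at `req.udpSize` cuts at an arbitrary byte, so the datagram can end in the middle of a record while the header still announces the full record counts. A boundary-aware alternative is to keep only as many whole records as fit. The Go sketch below shows just that counting step. `FitRecords` is a hypothetical helper, this is not necessarily what the fix commit does, and the record sizes were counted by hand from the dump in the issue description (the sixth, cut-off record is assumed to be another 88-byte A record).

```go
package main

import "fmt"

// FitRecords returns how many leading records fit completely into a datagram
// of at most limit bytes. fixedLen is the size of the header plus question
// section; recordLens are the encoded sizes of the records in order.
// Hypothetical helper for illustration only.
func FitRecords(fixedLen int, recordLens []int, limit int) int {
	used := fixedLen
	for i, n := range recordLens {
		if used+n > limit {
			return i // record i would cross the limit, so stop before it
		}
		used += n
	}
	return len(recordLens)
}

func main() {
	// 12-byte header + 21-byte question for www.netflix.com.
	const fixedLen = 33
	// Three CNAME records followed by A records, sizes in bytes.
	records := []int{51, 77, 127, 88, 88, 88}

	// Prints: 5
	fmt.Println(FitRecords(fixedLen, records, 512))
}
```

A sender using this approach would set the TC flag, write the reduced count into the header, and pack only the records that fit, so that even a client that parses before checking TC sees a well-formed message.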

m13253 added a commit that referenced this issue Jan 25, 2023
This should fix issue #144.
@m13253
Owner

m13253 commented Jan 25, 2023

Please test the newer version fdc1b81 and let me know if it fixes the problem.

@GreenYun

It is correct that host does not send an OPT record to announce a maximum UDP payload size, while dig does. That means host only accepts responses of up to 512 bytes over UDP.

The source code of host shows that it parses the datagram first and only then checks the TC bit to redo the lookup via TCP. Parsing the truncated datagram may fail with a malformed-data error, as the following shows:

https://github.com/isc-projects/bind9/blob/d8f98cec4857babd9250d2270f6432e100eebf51/bin/dig/dighost.c#L4165-L4176

After the processing, host checks the TC bit and requeues a further lookup:

https://github.com/isc-projects/bind9/blob/d8f98cec4857babd9250d2270f6432e100eebf51/bin/dig/dighost.c#L4272-L4285
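
To make the OPT point concrete, here is a minimal Go sketch of how a server could work out the UDP payload size a client is willing to accept: 512 bytes when the query carries no OPT record, otherwise the value stored in the OPT record's CLASS field (RFC 6891). `AdvertisedUDPSize` and `skipName` are hypothetical helpers written for this illustration using only the standard library; they are not the code used by doh-client or host.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errShort = errors.New("dns message is too short")

// skipName returns the offset just past the domain name starting at off.
func skipName(msg []byte, off int) (int, error) {
	for {
		if off >= len(msg) {
			return 0, errShort
		}
		l := int(msg[off])
		switch {
		case l == 0: // root label terminates the name
			return off + 1, nil
		case l&0xC0 == 0xC0: // a 2-byte compression pointer also terminates it
			if off+2 > len(msg) {
				return 0, errShort
			}
			return off + 2, nil
		case l&0xC0 != 0:
			return 0, errors.New("unsupported label type")
		default:
			off += 1 + l
		}
	}
}

// AdvertisedUDPSize returns the largest UDP response the sender of query
// says it accepts. Hypothetical helper for illustration only.
func AdvertisedUDPSize(query []byte) (uint16, error) {
	if len(query) < 12 {
		return 0, errShort
	}
	be := binary.BigEndian
	qd := int(be.Uint16(query[4:6]))
	rrs := int(be.Uint16(query[6:8])) + int(be.Uint16(query[8:10])) + int(be.Uint16(query[10:12]))
	off := 12
	var err error
	for i := 0; i < qd; i++ {
		if off, err = skipName(query, off); err != nil {
			return 0, err
		}
		off += 4 // QTYPE + QCLASS
	}
	if off > len(query) {
		return 0, errShort
	}
	for i := 0; i < rrs; i++ {
		if off, err = skipName(query, off); err != nil {
			return 0, err
		}
		if off+10 > len(query) {
			return 0, errShort
		}
		typ := be.Uint16(query[off : off+2])
		class := be.Uint16(query[off+2 : off+4])
		rdlen := int(be.Uint16(query[off+8 : off+10]))
		if typ == 41 { // OPT: CLASS carries the requestor's UDP payload size
			if class < 512 {
				return 512, nil // RFC 6891: smaller values are treated as 512
			}
			return class, nil
		}
		off += 10 + rdlen
	}
	return 512, nil // no OPT record: classic 512-byte limit
}

func main() {
	// Query for www.netflix.com, type A, without EDNS (as host sends it).
	plain := []byte{0x00, 0x1a, 0x01, 0x00, 0, 1, 0, 0, 0, 0, 0, 0,
		3, 'w', 'w', 'w', 7, 'n', 'e', 't', 'f', 'l', 'i', 'x', 3, 'c', 'o', 'm', 0,
		0, 1, 0, 1}
	// Same query with one OPT record announcing 4096 bytes.
	edns := append(append([]byte{}, plain...), 0, 0, 41, 0x10, 0x00, 0, 0, 0, 0, 0, 0)
	edns[11] = 1 // ARCOUNT = 1

	a, _ := AdvertisedUDPSize(plain)
	b, _ := AdvertisedUDPSize(edns)
	fmt.Println(a, b) // prints: 512 4096
}
```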

@maxbraeutigam

Hi @m13253 - At first glance it looks good. Thank you very much for the quick fix, it's highly appreciated since I love the doh client. Tomorrow I am going to test it at home, where I first stumbled across the error, but at work I have an almost identical setup.

Hi @omgold - can you confirm this?

Hi @GreenYun - thanks for checking host. In my case it is not about host itself: I was not able to open Netflix and some other sites in Firefox or Chromium.

@omgold
Author

omgold commented Jan 25, 2023

Yes, the fix works for me as well.

@GreenYun

Checking host helped us understand how the TC bit works and find the problem. We had misunderstood something before.

m13253 closed this as completed Jan 26, 2023
@m13253
Owner

m13253 commented Jan 26, 2023

@satishweb wrote: "Do we need a new release for this bug fix?"

I published the v2.3.3 release to include this fix.
Since it fixes a bug, I want to get it to downstream users sooner rather than later.

@satishweb
Collaborator

I will generate a new container image tonight.

@satishweb
Collaborator

v2.3.3 container image released. Local tests passed.

@m13253
Owner

m13253 commented Jan 27, 2023

Big thanks!
