Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

systemd-resolved sometimes does not produce IPv4 addresses with Cloudflare DNS #17745

Open
Lastique opened this issue Nov 27, 2020 · 23 comments
Open
Labels

Comments

@Lastique
Copy link

Lastique commented Nov 27, 2020

Sometimes (often immediately after boot), systemd-resolved does not produce IPv4 addresses for www.youtube.com, only IPv6 addresses. This happens when systemd-resolved is configured to use Cloudflare DNS (1.1.1.1, 1.0.0.1) in the config file and no link-specific DNS is set. As a result, applications (e.g. ping, traceroute) cannot connect with error "Network is unreachable" because IPv6 is not operational on this machine.

The problem is not stable, as after some delay systemd-resolved may start returning IPv4 addresses along with IPv6. But at other points it may stop returning IPv4 addresses again. I did not notice this problem with hosts other than www.youtube.com, although obviously I cannot know if other problematic hosts exist.

Here is a console log of one of such occurrence:

$ ping www.youtube.com
ping: connect: Network is unreachable

$ resolvectl query www.youtube.com
www.youtube.com: 2a00:1450:4010:c08::5b        -- link: wlp4s0
                 2a00:1450:4010:c08::88        -- link: wlp4s0
                 2a00:1450:4010:c08::be        -- link: wlp4s0
                 2a00:1450:4010:c08::5d        -- link: wlp4s0
                 (youtube-ui.l.google.com)

-- Information acquired via protocol DNS in 1.0ms.
-- Data is authenticated: no

$ dig www.youtube.com

; <<>> DiG 9.16.6-Ubuntu <<>> www.youtube.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28671
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;www.youtube.com.               IN      A

;; ANSWER SECTION:
www.youtube.com.        7178    IN      CNAME   youtube-ui.l.google.com.
youtube-ui.l.google.com. 23     IN      CNAME   wide-youtube.l.google.com.
wide-youtube.l.google.com. 23   IN      A       64.233.165.198

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Fri Nov 27 03:09:48 MSK 2020
;; MSG SIZE  rcvd: 121

$ dig +all @1.1.1.1 -q www.youtube.com

; <<>> DiG 9.16.6-Ubuntu <<>> +all @1.1.1.1 -q www.youtube.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54530
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;www.youtube.com.               IN      A

;; ANSWER SECTION:
www.youtube.com.        86229   IN      CNAME   youtube-ui.l.google.com.
youtube-ui.l.google.com. 129    IN      CNAME   wide-youtube.l.google.com.
wide-youtube.l.google.com. 129  IN      A       64.233.162.198

;; Query time: 15 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Fri Nov 27 03:22:37 MSK 2020
;; MSG SIZE  rcvd: 121

$ resolvectl query www.youtube.com
www.youtube.com: 2a00:1450:4010:c08::5b        -- link: wlp4s0
                 2a00:1450:4010:c08::88        -- link: wlp4s0
                 2a00:1450:4010:c08::be        -- link: wlp4s0
                 2a00:1450:4010:c08::5d        -- link: wlp4s0
                 (youtube-ui.l.google.com)

-- Information acquired via protocol DNS in 1.1ms.
-- Data is authenticated: no

The following is the resolve output when IPv4 addresses are returned:

$ resolvectl query www.youtube.com
www.youtube.com: 2a00:1450:4010:c08::5d        -- link: wlp4s0
                 2a00:1450:4010:c08::88        -- link: wlp4s0
                 2a00:1450:4010:c08::be        -- link: wlp4s0
                 2a00:1450:4010:c08::5b        -- link: wlp4s0
                 173.194.221.190               -- link: wlp4s0
                 173.194.73.91                 -- link: wlp4s0
                 64.233.164.93                 -- link: wlp4s0
                 173.194.73.190                -- link: wlp4s0
                 209.85.233.190                -- link: wlp4s0
                 64.233.165.93                 -- link: wlp4s0
                 64.233.165.190                -- link: wlp4s0
                 64.233.165.91                 -- link: wlp4s0
                 173.194.222.93                -- link: wlp4s0
                 64.233.164.190                -- link: wlp4s0
                 64.233.165.136                -- link: wlp4s0
                 64.233.164.136                -- link: wlp4s0
                 209.85.233.91                 -- link: wlp4s0
                 173.194.221.93                -- link: wlp4s0
                 64.233.162.136                -- link: wlp4s0
                 64.233.162.190                -- link: wlp4s0
                 (youtube-ui.l.google.com)

-- Information acquired via protocol DNS in 15.3ms.
-- Data is authenticated: no

resolvectl status output:

$ resolvectl 
Global
       LLMNR setting: no     
MulticastDNS setting: no     
  DNSOverTLS setting: no     
      DNSSEC setting: no     
    DNSSEC supported: no     
  Current DNS Server: 1.1.1.1
         DNS Servers: 1.1.1.1
Fallback DNS Servers: 1.0.0.1

Link 2 (eth0)
      Current Scopes: none
DefaultRoute setting: no  
       LLMNR setting: yes 
MulticastDNS setting: no  
  DNSOverTLS setting: no  
      DNSSEC setting: no  
    DNSSEC supported: no  

Link 3 (wlp4s0)
      Current Scopes: none
DefaultRoute setting: no  
       LLMNR setting: yes 
MulticastDNS setting: no  
  DNSOverTLS setting: no  
      DNSSEC setting: no  
    DNSSEC supported: no  

There are no errors in the journal, except:

Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.

Although the above messages don't seem to correlate with the problem occurrence.

I do not see this behavior e.g. with Google DNS. Although it doesn't return as many addresses, it always produces IPv4 and IPv6.

systemd version the issue has been seen with

246

Used distribution

Kubuntu 20.10

Linux kernel version used (uname -a)

5.8.0-29-lowlatency

CPU architecture issue was seen on

x86-64

Expected behaviour you didn't see

systemd-resolved should produce IPv4 addresses.

Unexpected behaviour you saw

systemd-resolved sometimes does not produce IPv4 addresses.

Steps to reproduce the problem

Set systemd-resolved config:

[Resolve]
DNS=1.1.1.1
FallbackDNS=1.0.0.1

In NetworkManager, configure the network connection to not have DNS servers. E.g. set configuration mode to Automatic (only address) and leave DNS address empty.

Reboot.

After booting, immediately issue ping www.youtube.com and resolvectl query www.youtube.com. Usually, this will preproduce the problem, but it may not work every time. In a dozen seconds the issue may seem to resolve (i.e. IPv4 addresses would appear), but the addresses may disappear again at a later time. I cannot tell when the addresses disappear or what causes that.

@poettering
Copy link
Member

Can you turn on debug logging in resolved please, and reproduce the issue and paste the generated log output somewhere? For enabling debug logging do "systemctl edit systemd-resolved", then type:

[Service]
Environment=SYSTEMD_LOG_LEVEL=debug

then issue systemctl restart systemd-resolved, to retrigger it. Then issue a lookup where the issue happens

@poettering poettering added resolve RFE 🎁 Request for Enhancement, i.e. a feature request and removed RFE 🎁 Request for Enhancement, i.e. a feature request labels Dec 2, 2020
@Lastique
Copy link
Author

Lastique commented Dec 2, 2020

I tested it for about half an hour, and I cannot reproduce this now. resolved always returns IPv4 addresses. However, it sometimes doesn't return IPv6 addresses now. I'm attaching the log from one such case.

resolved-no-ipv6.txt

Lack of IPv6 addresses is not a problem for me, but maybe this log will be helpful still.

I will keep using Cloudflare DNS for now, in case the problem reproduces.

@Lastique
Copy link
Author

Lastique commented Dec 2, 2020

I managed to reproduce the problem again, here are the logs:

ping www.youtube.com: resolved-ping-no-ipv4.txt

resolvectl query www.youtube.com: resolved-query-no-ipv4.txt

In order to trigger the issue I had to browse different web sites for about 15 minutes before trying to resolve www.youtube.com.

@Lastique
Copy link
Author

Lastique commented Dec 7, 2020

I think, I've also observed this problem with Google DNS (8.8.8.8, 8.8.4.4) at least once: ru.archive.ubuntu.com resolved to an IPv6 address, which also resulted in a "Network is unreachable" error. Normally, this host name resolves to an IPv4 address. I can also see resolvectl returning only IPv4 address for this host and no IPv6 sometimes. I did not have a chance to debug this, and it didn't happen after that. However, it looks like this problem may not be specific to Cloudflare DNS.

PS: In this case, Google DNS was configured as DoT. With Cloudflare DNS the issue reproduces both in basic DNS and DoT modes the same way, so DoT is not a factor.

@Lastique
Copy link
Author

Lastique commented Dec 7, 2020

I should add that this started happening after upgrade from Kubuntu 20.04 (systemd 245.4) to 20.10 (systemd 246.6).

@poettering
Copy link
Member

Is this still reproducible? All logs you attached actually show successful resolutions, i.e. nothing in those logs suggests any look-up failed

@Lastique
Copy link
Author

Is this still reproducible?

Currently, I have Google DNS configured, and with it the issue reproduces rarely (but still does occasionally). Nothing has changed on my end, no systemd package updates.

All logs you attached actually show successful resolutions, i.e. nothing in those logs suggests any look-up failed

How can the resolution fail in the calling application then?

@samm-git
Copy link

@poettering we have a case looking very similar - flatcar/Flatcar#374

@poettering
Copy link
Member

@samm-git not a useful bug report for us, doesn't even mention what resolved version that is...

@samm-git
Copy link

@poettering i reported it here - #19118. Please let me know if anything else needed.

@keszybz
Copy link
Member

keszybz commented Mar 26, 2021

OK, I think this is just use-after-free.

#19118 (comment)

Mar 26 06:42:05 ip-172-31-63-37.ec2.internal systemd-resolved[788]: Transaction 59199 for <collector.sysdigcloud.com IN A> on scope dns on eth0/* now complete with <success> from network (unsigned).
Mar 26 06:42:05 ip-172-31-63-37.ec2.internal systemd-resolved[788]: Freeing transaction 38985.
Mar 26 06:42:05 ip-172-31-63-37.ec2.internal systemd-resolved[788]: varlink-12: Sending message: {"error":"io.systemd.System","parameters":{"errno":22}}

#17384 (comment)

Okt 19 07:32:20 nb1 systemd-resolved[13845]: Transaction 19260 for <www.heise.de IN A> on scope dns on wlp0s20f3/* now complete with <success> from network (unsigned).
Okt 19 07:32:20 nb1 systemd-resolved[13845]: Freeing transaction 37307.
Okt 19 07:32:20 nb1 systemd-resolved[13845]: Sent message type=error sender=n/a destination=:1.285 path=n/a interface=n/a member=n/a cookie=22 reply_cookie=2 signature=s error-name=org.freedesktop.DBus.Error.InvalidArgs error-message=Lookup failed due to system error: Invalid argument
Okt 19 07:32:20 nb1 systemd-resolved[13845]: Sent message type=method_call sender=n/a destination=org.freedesktop.DBus path=/org/freedesktop/DBus interface=org.freedesktop.DBus member=RemoveMatch cookie=23 reply_cookie=0 signature=s error-name=n/a error-message=n/a

#9690 (comment)

Aug 11 16:35:02 myhostname systemd-resolved[117723]: Transaction 51585 for <target.subd.sub.d.tld IN AAAA> on scope dns on enp0s31f6/* now complete with <success> from network (unsigned).
Aug 11 16:35:02 myhostname systemd-resolved[117723]: Freeing transaction 56429.
Aug 11 16:35:02 myhostname systemd-resolved[117723]: Sent message type=error sender=n/a destination=:1.2716 path=n/a interface=n/a member=n/a cookie=205 reply_cookie=2 signature=s error-name=org.freedesktop.DBus.Error.InvalidArgs error-message=Lookup failed due to system error: Invalid argument
Aug 11 16:35:02 myhostname systemd-resolved[117723]: Sent message type=method_call sender=n/a destination=org.freedesktop.DBus path=/org/freedesktop/DBus interface=org.freedesktop.DBus member=RemoveMatch cookie=206 reply_cookie=0 signature=s error-name=n/a error-message=n/a

Looking at the code, in bus_method_resolve_hostname_complete() we hit the check q->state != DNS_TRANSACTION_SUCCESS and reply_query_state() generates the error. But right before that, in dns_transaction_complete, we set state=DNS_TRANSACTION_SUCCESS, generating the message "…now complete with ".

Despite changes in the local transport protocol, the scenario seems the same.

The big question now is: is this reproducible with a) main branch, b) v247.6, c) v246.13.
In #19118 (comment) @samm-git says that the issue reproduces in b). So it would seem that the bug is still there.

@samm-git
Copy link

@keszybz thank you for looking into that. I will try to get master running to validate. I think that systemd-resolved replacement should be just enough, no need to change everything

@keszybz
Copy link
Member

keszybz commented Mar 28, 2021

@keszybz thank you for looking into that. I will try to get master running to validate. I think that systemd-resolved replacement should be just enough, no need to change everything

It's fine to run systemd-resolved from the build directory.

@samm-git
Copy link

@keszybz i found that master is broken on build with some python-related errors. Problem with flatcar is that this is immutable OS, so its not that easy to hack the things. May be i will try to reproduce that with some more traditional (fedora, debian?) distro so it would be easier to hack the things and play with builds.

@keszybz
Copy link
Member

keszybz commented Mar 29, 2021

@amaltsev
Copy link

amaltsev commented Jan 20, 2022

I still see this problem with the current Fedora 35 / systemd 249 (v249.9-1.fc35). Same diagnostics in the log and same observed behavior, occasional failures in resolution from apps point of view.

Jan 20 11:02:22 xxx systemd-resolved[1314]: Regular transaction 25998 for <pics.paypal.com IN A> on scope dns on eno1/* now complete with <success> from network (unsigned; non-confidential).
Jan 20 11:02:22 xxx systemd-resolved[1314]: Received dns UDP packet of size 201, ifindex=2, ttl=0, fragsize=0
Jan 20 11:02:22 xxx systemd-resolved[1314]: Freeing transaction 25998.
Jan 20 11:02:22 xxx systemd-resolved[1314]: varlink-20: Sending message: {"error":"io.systemd.System","parameters":{"errno":22}}

This request to pics.paypal.com actually failed from the browser point of view.

@bionade24
Copy link

I permanently do have this issue with systemd 250.3 on Arch Linux. Interestingly, it even occurs when I have my Router set as DNS server, as it runs dnsmasq querying 1.1.1.1

Why does this issue then only occur on machines with resolved?

@yuwata
Copy link
Member

yuwata commented Jan 29, 2022

Not sure, but already fixed by #22132? Could you test the current git HEAD?

@Lastique
Copy link
Author

The problem was reproducible without DoT too.

@samm-git
Copy link

@yuwata i was not using dot, so unlikely

@PilzAdam
Copy link

PilzAdam commented Feb 5, 2022

@keszybz

The big question now is: is this reproducible with a) main branch, b) v247.6, c) v246.13.

I can reproduce it in current Arch Linux (systemd 250 (250.3-2-arch)).

OK, I think this is just use-after-free.

Valgrind would report this then, wouldn't it? In #19118 (comment) I reproduced the bug while running systemd-resolved in valgrind and get no error messages from valgrind.

Is there anything I can do to hunt down this bug? I can semi-reliably reproduce it (takes a few tries each time).

@bionade24
Copy link

For anyone wanting to work on this, here a tip to narrow it down: Even the DNS queries are proxied through dnsmasq, the bug still appears. So DNSmasq relays the problematic answers.

@cstamas
Copy link

cstamas commented Feb 12, 2024

Still happening with Fedora 39 WS:
https://gist.github.com/cstamas/c241ab5303aa05522aac2ff757cb27ce
https://gist.github.com/cstamas/53c85e2d9d8b94eaaab8aae0ded33bbb

Host in question is repo.maven.apache.org

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

9 participants