Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

systemd-resolved sends icmp port unreachable to the slowest upstreams #17421

Closed
wdoekes opened this issue Oct 22, 2020 · 0 comments
Closed

systemd-resolved sends icmp port unreachable to the slowest upstreams #17421

wdoekes opened this issue Oct 22, 2020 · 0 comments
Labels

Comments

@wdoekes
Copy link

wdoekes commented Oct 22, 2020

systemd version the issue has been seen with

245.4-4ubuntu3.2

Used distribution

ubuntu/focal

Linux kernel version used (uname -a)

Linux walter-desktop 5.4.0-48-generic #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

CPU architecture issue was seen on

amd64

Expected behaviour you didn't see

  • systemd-resolved sends queries to nameservers (UDP to port upstream:53)
  • answers are read (UDP from upstream:53)

Unexpected behaviour you saw

  • systemd-resolved sends queries to nameservers (UDP to port upstream:53)
  • answers are ignored (UDP from upstream:53)
  • kernel sends ICMP port unreachable

Actually, I was tcpdumping a different issue when I noticed that icmp port unreachable messages were frequently sent to nameservers. First I thought a firewall was misconfigured somewhere, but it turned out to be caused by systemd-resolved on a ubuntu/focal machine.

Steps to reproduce the problem

Not sure; over here I see systemd-resolved sometimes send a query to one nameserver, and sometimes to several ones. When it sends to several, the Freeing transaction nnn code is run before recvmsg() has been called on the socket -- or rather: close() has been called before the dns response has even arrived.

Example tcpdump

(addresses and names have been rewritten below)

16:27:43.653358 IP 192.168.2.3.52937 > 192.168.1.1.53: 4099+ [1au] A? example.nl. (39)
16:27:43.653788 IP 192.168.2.3.52875 > 8.8.8.8.53: 29616+ [1au] A? example.nl. (39)
16:27:43.654747 IP 192.168.1.1.53 > 192.168.2.3.52937: 4099 1/0/1 A 1.2.3.4 (55)
16:27:43.655448 IP 8.8.8.8.53 > 192.168.2.3.52875: 29616 1/0/1 A 1.2.3.4 (55)
16:27:43.655464 IP 192.168.2.3 > 8.8.8.8: ICMP 192.168.2.3 udp port 52875 unreachable, length 91

That is:

  • one query to 192.168.1.1
  • one query to 8.8.8.8
  • answer from 192.168.1.1
  • answer from 8.8.8.8
  • one additional icmp-port-unreachable to 8.8.8.8

I don't mind two queries and getting the fast results. But I do mind the unnecessary port-unreachable.

Corresponding strace

/* receive request */
16:27:43.652036 recvfrom(12, NULL, 0, MSG_PEEK|MSG_TRUNC, NULL, NULL) = 39     
16:27:43.652181 recvmsg(12, {msg_name={sa_family=AF_INET, sin_port=htons(52259), sin_addr=inet_addr("127.0.0.1")}, 
  msg_namelen=128->16, msg_iov=[{iov_base="\352\231\1\0\0\1\0\0\0\0\0\1\7example\2nl\0\0\1\0\1\0\0)\4\260\0\0\0\0\0\0", iov_len=3928}],
  msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("lo"),
  ipi_spec_dst=inet_addr("127.0.0.53"), ipi_addr=inet_addr("127.0.0.53")}},
  {cmsg_len=20, cmsg_level=SOL_IP, cmsg_type=IP_TTL, cmsg_data=[64]}], msg_controllen=56, msg_flags=0}, 0) = 39

/* send request to upstream 192.168.1.1 */
16:27:43.653068 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 15
16:27:43.653112 setsockopt(15, SOL_IP, IP_UNICAST_IF, [687865856], 4) = 0
16:27:43.653159 setsockopt(15, SOL_IP, IP_RECVERR, [1], 4) = 0
16:27:43.653199 setsockopt(15, SOL_IP, IP_PKTINFO, [1], 4) = 0
16:27:43.653229 connect(15, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.1.1")}, 16) = 0
16:27:43.653256 epoll_ctl(4, EPOLL_CTL_ADD, 15, {EPOLLIN, {u32=4027372288, u64=93849062789888}}) = 0
16:27:43.653337 write(15, "\20\3\1\0\0\1\0\0\0\0\0\1\7example\2nl\0\0\1\0\1\0\0)\2\0\0\0\0\0\0\0", 39) = 39

/* send request to upstream 8.8.8.8 */
16:27:43.653571 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 16
16:27:43.653598 setsockopt(16, SOL_IP, IP_UNICAST_IF, [50331648], 4) = 0
16:27:43.653624 setsockopt(16, SOL_IP, IP_RECVERR, [1], 4) = 0
16:27:43.653648 setsockopt(16, SOL_IP, IP_PKTINFO, [1], 4) = 0
16:27:43.653672 connect(16, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0
16:27:43.653701 epoll_ctl(4, EPOLL_CTL_ADD, 16, {EPOLLIN, {u32=4027370752, u64=93849062788352}}) = 0
16:27:43.653772 write(16, "s\260\1\0\0\1\0\0\0\0\0\1\7example\2nl\0\0\1\0\1\0\0)\2\0\0\0\0\0\0\0", 39) = 39
16:27:43.653912 epoll_wait(4, [{EPOLLIN, {u32=4027372288, u64=93849062789888}}], 19, -1) = 1

/* get response from 192.168.1.1 */
16:27:43.654782 recvfrom(15, NULL, 0, MSG_PEEK|MSG_TRUNC, NULL, NULL) = 55
16:27:43.654821 recvmsg(15, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.1.1")},
  msg_namelen=128->16, msg_iov=[{iov_base="\20\3\201\200\0\1\0\1\0\0\0\1\7example\2nl\0\0\1\0\1\300\f\0\1\0\1\0\0\1\r\0\4\331\25\305\202\0\0)\20\0\0\0\0\0\0\0", iov_len=3928}],
  msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("vpn-iface"),
  ipi_spec_dst=inet_addr("192.168.2.3"), ipi_addr=inet_addr("192.168.2.3")}}], msg_controllen=32, msg_flags=0}, 0) = 55 
16:27:43.655120 epoll_ctl(4, EPOLL_CTL_DEL, 15, NULL) = 0
16:27:43.655146 close(15)               = 0

/* simply kill the socket where responses from 8.8.8.8 are supposed to go */
16:27:43.655219 epoll_ctl(4, EPOLL_CTL_DEL, 16, NULL) = 0
16:27:43.655247 close(16)               = 0

/* send answer */
16:27:43.655331 sendmsg(12, {msg_name={sa_family=AF_INET, sin_port=htons(52259), sin_addr=inet_addr("127.0.0.1")},
  msg_namelen=16, msg_iov=[{iov_base="\352\231\201\200\0\1\0\1\0\0\0\1\7example\2nl\0\0\1\0\1\300\f\0\1\0\1\0\0\1\r\0\4\331\25\305\202\0\0)\377\326\0\0\0\0\0\0", iov_len=55}],
  msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("lo"),
  ipi_spec_dst=inet_addr("127.0.0.53"), ipi_addr=inet_addr("0.0.0.0")}}], msg_controllen=28, msg_flags=0}, 0) = 55

Corresponding debug logs

16:27:43.652356: Got DNS stub UDP query packet for id 39402
16:27:43.652434: Looking up RR for example.nl IN A.
16:27:43.652869: Cache miss for example.nl IN A
16:27:43.652930: Transaction 784 for <example.nl IN A> scope dns on vpn-iface/*.
16:27:43.652994: Using feature level UDP+EDNS0 for transaction 784.
16:27:43.653046: Using DNS server 192.168.1.1 for transaction 784. 
16:27:43.653314: Sending query packet with id 784.
16:27:43.653433: Cache miss for example.nl IN A
16:27:43.653478: Transaction 45171 for <example.nl IN A> scope dns on eno1/*.
16:27:43.653519: Using feature level UDP+EDNS0 for transaction 45171.
16:27:43.653556: Using DNS server 8.8.8.8 for transaction 45171.
16:27:43.653755: Sending query packet with id 45171.
16:27:43.653837: Processing query...
16:27:43.654904: Processing incoming packet on transaction 784 (rcode=SUCCESS).
16:27:43.655057: Added positive unauthenticated cache entry for example.nl IN A 269s on vpn-iface/INET/192.168.1.1 
16:27:43.655103: Transaction 784 for <example.nl IN A> on scope dns on vpn-iface/* now complete with <success> from network (unsigned). 
16:27:43.655200: Freeing transaction 45171.
16:27:43.655312: Sending response packet with id 39402 on interface 1/AF_INET.
16:27:43.655418: Freeing transaction 784.

systemd-resolved --status

Global
       LLMNR setting: no
MulticastDNS setting: no
  DNSOverTLS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
          DNSSEC NTA: 10.in-addr.arpa
                      16.172.in-addr.arpa
...

Link 41 (vpn-iface)
      Current Scopes: DNS
...
  Current DNS Server: 192.168.1.1
         DNS Servers: 192.168.1.1
          DNS Domain: irrelevant.net
                      ~168.192.in-addr.arpa
                      ~.
...
Link 3 (eno1)
      Current Scopes: DNS
...
  Current DNS Server: 8.8.8.8
         DNS Servers: 8.8.8.8
          DNS Domain: ~.

Thoughts

So, I'm sorry that I'm not using the latest version, but it should be close enough. I searched for this issue, and I didn't find anything related.

I assume dns_transaction_free is called on the 2nd transaction as soon as the response to the first has arrived and is validated. Is it possible to make that one wait a few seconds for a definitive answer, reading the data, discarding it, and then closing the socket? Right now it appears to cause noise on the internet.

Cheers,
Walter Doekes
OSSO B.V.

poettering added a commit to poettering/systemd that referenced this issue Nov 6, 2020
…them to a socket "graveyard"

The "socket graveyard" shall contain sockets we have sent a question out
of, but not received a reply. If we'd close thus sockets immediately
when we are not interested anymore, we'd trigger ICMP port unreachable
messages once we after all *do* get a reply. Let's avoid that, by
leaving the fds open for a bit longer, until a timeout is reached or a
reply datagram received.

Fixes: systemd#17421
poettering added a commit to poettering/systemd that referenced this issue Nov 6, 2020
…them to a socket "graveyard"

The "socket graveyard" shall contain sockets we have sent a question out
of, but not received a reply. If we'd close thus sockets immediately
when we are not interested anymore, we'd trigger ICMP port unreachable
messages once we after all *do* get a reply. Let's avoid that, by
leaving the fds open for a bit longer, until a timeout is reached or a
reply datagram received.

Fixes: systemd#17421
poettering added a commit to poettering/systemd that referenced this issue Nov 9, 2020
…them to a socket "graveyard"

The "socket graveyard" shall contain sockets we have sent a question out
of, but not received a reply. If we'd close thus sockets immediately
when we are not interested anymore, we'd trigger ICMP port unreachable
messages once we after all *do* get a reply. Let's avoid that, by
leaving the fds open for a bit longer, until a timeout is reached or a
reply datagram received.

Fixes: systemd#17421
poettering added a commit to poettering/systemd that referenced this issue Nov 11, 2020
…them to a socket "graveyard"

The "socket graveyard" shall contain sockets we have sent a question out
of, but not received a reply. If we'd close thus sockets immediately
when we are not interested anymore, we'd trigger ICMP port unreachable
messages once we after all *do* get a reply. Let's avoid that, by
leaving the fds open for a bit longer, until a timeout is reached or a
reply datagram received.

Fixes: systemd#17421
poettering added a commit to poettering/systemd that referenced this issue Nov 11, 2020
…them to a socket "graveyard"

The "socket graveyard" shall contain sockets we have sent a question out
of, but not received a reply. If we'd close thus sockets immediately
when we are not interested anymore, we'd trigger ICMP port unreachable
messages once we after all *do* get a reply. Let's avoid that, by
leaving the fds open for a bit longer, until a timeout is reached or a
reply datagram received.

Fixes: systemd#17421
poettering added a commit to poettering/systemd that referenced this issue Nov 12, 2020
…them to a socket "graveyard"

The "socket graveyard" shall contain sockets we have sent a question out
of, but not received a reply. If we'd close thus sockets immediately
when we are not interested anymore, we'd trigger ICMP port unreachable
messages once we after all *do* get a reply. Let's avoid that, by
leaving the fds open for a bit longer, until a timeout is reached or a
reply datagram received.

Fixes: systemd#17421
poettering added a commit to poettering/systemd that referenced this issue Nov 17, 2020
…them to a socket "graveyard"

The "socket graveyard" shall contain sockets we have sent a question out
of, but not received a reply. If we'd close thus sockets immediately
when we are not interested anymore, we'd trigger ICMP port unreachable
messages once we after all *do* get a reply. Let's avoid that, by
leaving the fds open for a bit longer, until a timeout is reached or a
reply datagram received.

Fixes: systemd#17421
poettering added a commit to poettering/systemd that referenced this issue Nov 18, 2020
…them to a socket "graveyard"

The "socket graveyard" shall contain sockets we have sent a question out
of, but not received a reply. If we'd close thus sockets immediately
when we are not interested anymore, we'd trigger ICMP port unreachable
messages once we after all *do* get a reply. Let's avoid that, by
leaving the fds open for a bit longer, until a timeout is reached or a
reply datagram received.

Fixes: systemd#17421
poettering added a commit to poettering/systemd that referenced this issue Nov 19, 2020
…them to a socket "graveyard"

The "socket graveyard" shall contain sockets we have sent a question out
of, but not received a reply. If we'd close thus sockets immediately
when we are not interested anymore, we'd trigger ICMP port unreachable
messages once we after all *do* get a reply. Let's avoid that, by
leaving the fds open for a bit longer, until a timeout is reached or a
reply datagram received.

Fixes: systemd#17421
poettering added a commit to poettering/systemd that referenced this issue Nov 20, 2020
…them to a socket "graveyard"

The "socket graveyard" shall contain sockets we have sent a question out
of, but not received a reply. If we'd close thus sockets immediately
when we are not interested anymore, we'd trigger ICMP port unreachable
messages once we after all *do* get a reply. Let's avoid that, by
leaving the fds open for a bit longer, until a timeout is reached or a
reply datagram received.

Fixes: systemd#17421
poettering added a commit to poettering/systemd that referenced this issue Dec 2, 2020
…them to a socket "graveyard"

The "socket graveyard" shall contain sockets we have sent a question out
of, but not received a reply. If we'd close thus sockets immediately
when we are not interested anymore, we'd trigger ICMP port unreachable
messages once we after all *do* get a reply. Let's avoid that, by
leaving the fds open for a bit longer, until a timeout is reached or a
reply datagram received.

Fixes: systemd#17421
poettering added a commit to poettering/systemd that referenced this issue Dec 3, 2020
…them to a socket "graveyard"

The "socket graveyard" shall contain sockets we have sent a question out
of, but not received a reply. If we'd close thus sockets immediately
when we are not interested anymore, we'd trigger ICMP port unreachable
messages once we after all *do* get a reply. Let's avoid that, by
leaving the fds open for a bit longer, until a timeout is reached or a
reply datagram received.

Fixes: systemd#17421
poettering added a commit to poettering/systemd that referenced this issue Dec 4, 2020
…them to a socket "graveyard"

The "socket graveyard" shall contain sockets we have sent a question out
of, but not received a reply. If we'd close thus sockets immediately
when we are not interested anymore, we'd trigger ICMP port unreachable
messages once we after all *do* get a reply. Let's avoid that, by
leaving the fds open for a bit longer, until a timeout is reached or a
reply datagram received.

Fixes: systemd#17421
poettering added a commit to poettering/systemd that referenced this issue Dec 7, 2020
…them to a socket "graveyard"

The "socket graveyard" shall contain sockets we have sent a question out
of, but not received a reply. If we'd close thus sockets immediately
when we are not interested anymore, we'd trigger ICMP port unreachable
messages once we after all *do* get a reply. Let's avoid that, by
leaving the fds open for a bit longer, until a timeout is reached or a
reply datagram received.

Fixes: systemd#17421
poettering added a commit to poettering/systemd that referenced this issue Feb 15, 2021
…them to a socket "graveyard"

The "socket graveyard" shall contain sockets we have sent a question out
of, but not received a reply. If we'd close thus sockets immediately
when we are not interested anymore, we'd trigger ICMP port unreachable
messages once we after all *do* get a reply. Let's avoid that, by
leaving the fds open for a bit longer, until a timeout is reached or a
reply datagram received.

Fixes: systemd#17421
rpurdie pushed a commit to rpurdie/systemd that referenced this issue Feb 16, 2021
…them to a socket "graveyard"

The "socket graveyard" shall contain sockets we have sent a question out
of, but not received a reply. If we'd close thus sockets immediately
when we are not interested anymore, we'd trigger ICMP port unreachable
messages once we after all *do* get a reply. Let's avoid that, by
leaving the fds open for a bit longer, until a timeout is reached or a
reply datagram received.

Fixes: systemd#17421
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging a pull request may close this issue.

2 participants