Unbound: external IPv6 DNS Servers are sometimes passed to client but should not #2822

Closed
borisneubert opened this issue Oct 18, 2018 · 19 comments
Labels: bug (Production bug)

@borisneubert

This is the relevant configuration of the OPNsense box:

OPNsense 18.7.4-amd64
DHCPv6 is off
DNSmasq DNS is off
Unbound DNS is on, in Forwarding Mode, local zone type= transparent
Settings | General | DNS servers: set to the Google IPv4 and IPv6 DNS servers

My Ubuntu 18.04 workstation is set to static IPv4. IPv6 is autoconfigured (the OPNsense box's internal network "home" tracks WAN).

During the last months I have seen the following behavior on my workstation:

  • In normal operation, the IPv4 and IPv6 addresses of the opnsense box are passed as DNS servers to the workstation.
  • From time to time, the IPv6 addresses of the Google DNS servers are passed to the workstation as well, so lookups can bypass the host overrides for the internal network configured in Unbound DNS. This happens spontaneously: the workstation boots up in the desired state and later transitions into the undesired state on its own.

When the issue is present, systemd-resolve --status on the workstation gives (shortened):

Global
          DNS Domain: home.mydomain.de
          DNSSEC NTA: 10.in-addr.arpa
                      16.172.in-addr.arpa
                      168.192.in-addr.arpa
                      17.172.in-addr.arpa
                      ...
                      31.172.in-addr.arpa
                      corp
                      d.f.ip6.arpa
                      home
                      internal
                      intranet
                      lan
                      local
                      private
                      test

Link 2 (enp0s25)
      Current Scopes: DNS
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
         DNS Servers: 192.168.31.1         <---- opnsense box
                      2001:4860:4860::8888   <----  Google DNS server
                      2001:4860:4860::8844  <----  Google DNS server
                      2a02:...:fe5d:4ca1     <---- opnsense box
          DNS Domain: home.mydomain.de
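
A minimal sketch for spotting this state automatically, assuming the status text has the same layout as the output above. The function names and the sample addresses (including the 2001:db8::1 stand-in for the firewall's IPv6 address) are made up for illustration:

```python
import ipaddress
import re

# A "Key: value" line. IPv6 address lines do not match because the colon
# inside an address is never followed by whitespace.
KEY_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*):\s+(.*)$")


def dns_servers_from_status(status_text):
    """Collect every address listed under a 'DNS Servers:' key."""
    servers = []
    in_dns_block = False
    for line in status_text.splitlines():
        match = KEY_LINE.match(line)
        if match:
            in_dns_block = match.group(1).strip() == "DNS Servers"
            value = match.group(2)
        else:
            value = line  # possible continuation line of the previous key
        tokens = value.split()
        if not tokens:
            in_dns_block = False  # a blank line ends the block
            continue
        if not in_dns_block:
            continue
        try:
            servers.append(str(ipaddress.ip_address(tokens[0])))
        except ValueError:
            pass  # annotations or elided addresses are skipped
    return servers


def unexpected_dns_servers(status_text, allowed):
    """Return the listed DNS servers that are not in the allowed set."""
    allowed_set = {str(ipaddress.ip_address(a)) for a in allowed}
    return [s for s in dns_servers_from_status(status_text)
            if s not in allowed_set]


# Hypothetical sample; 2001:db8::1 stands in for the firewall's address.
SAMPLE = """\
Link 2 (enp0s25)
      Current Scopes: DNS
         DNS Servers: 192.168.31.1
                      2001:4860:4860::8888
                      2001:4860:4860::8844
                      2001:db8::1
          DNS Domain: home.mydomain.de
"""

print(unexpected_dns_servers(SAMPLE, ["192.168.31.1", "2001:db8::1"]))
```

With the firewall's two addresses allowed, anything else in the list (here the two Google resolvers) is reported as unexpected.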

Here is a log extract from when this issue occurred: Unbound stopped working at 12:35:28 (last log entry) for no apparent reason and does not come back to life. system.log shows the following:

Sep 15 12:35:23 opnsense syslogd: sendto: Network is down
Sep 15 12:35:23 opnsense kernel: igb1: Watchdog timeout -- resetting
Sep 15 12:35:23 opnsense syslogd: sendto: Network is down
Sep 15 12:35:23 opnsense kernel: igb1: Queue(218489344) tdh = -1, hw tdt = -1
Sep 15 12:35:23 opnsense syslogd: sendto: Network is down
Sep 15 12:35:23 opnsense kernel: igb1: TX(218489344) desc avail = 0,Next TX to Clean = 0
Sep 15 12:35:23 opnsense syslogd: sendto: Network is down
Sep 15 12:35:23 opnsense kernel: igb1: link state changed to DOWN
Sep 15 12:35:23 opnsense syslogd: sendto: Network is down
Sep 15 12:35:23 opnsense syslogd: sendto: Network is down
Sep 15 12:35:23 opnsense syslogd: sendto: Network is down
Sep 15 12:35:23 opnsense syslogd: sendto: Network is down
Sep 15 12:35:23 opnsense opnsense: /usr/local/etc/rc.linkup: DEVD Ethernet detached event for lan
Sep 15 12:35:28 opnsense kernel: igb1: link state changed to UP
Sep 15 12:35:28 opnsense opnsense: /usr/local/etc/rc.linkup: DEVD Ethernet attached event for lan
Sep 15 12:35:28 opnsense opnsense: /usr/local/etc/rc.linkup: HOTPLUG: Configuring interface lan
Sep 15 12:35:28 opnsense opnsense: /usr/local/etc/rc.linkup: ROUTING: entering configure using 'lan'
Sep 15 12:35:28 opnsense opnsense: /usr/local/etc/rc.linkup: ROUTING: IPv6 default gateway set to wan
Sep 15 12:35:28 opnsense opnsense: /usr/local/etc/rc.linkup: ROUTING: IPv4 default gateway set to wan
Sep 15 12:35:28 opnsense opnsense: /usr/local/etc/rc.linkup: ROUTING: skipping IPv4 default route
Sep 15 12:35:28 opnsense opnsense: /usr/local/etc/rc.linkup: ROUTING: skipping IPv6 default route
Sep 15 12:35:28 opnsense opnsense: /usr/local/etc/rc.newwanipv6: IP renewal is starting on 'igb0'
Sep 15 12:35:28 opnsense opnsense: /usr/local/etc/rc.newwanipv6: On (IP address: ) (interface: wan[wan]) (real interface: igb0).
Sep 15 12:35:28 opnsense opnsense: /usr/local/etc/rc.newwanipv6: Failed to detect IP for wan[wan]
Sep 15 12:35:29 opnsense opnsense: /usr/local/etc/rc.newwanipv6: IP renewal is starting on 'igb0'
Sep 15 12:35:29 opnsense opnsense: /usr/local/etc/rc.newwanipv6: On (IP address: 2a02:908:2543:9fe0:225:90ff:fe5d:4ca0) (interface: wan[wan]) (real interface: igb0).
Sep 15 12:35:30 opnsense opnsense: /usr/local/etc/rc.newwanipv6: ROUTING: entering configure using 'wan'
Sep 15 12:35:30 opnsense opnsense: /usr/local/etc/rc.newwanipv6: ROUTING: IPv6 default gateway set to wan
Sep 15 12:35:30 opnsense opnsense: /usr/local/etc/rc.newwanipv6: ROUTING: IPv4 default gateway set to wan
Sep 15 12:35:30 opnsense opnsense: /usr/local/etc/rc.newwanipv6: ROUTING: setting IPv4 default route to 192.168.178.1
Sep 15 12:35:30 opnsense opnsense: /usr/local/etc/rc.newwanipv6: ROUTING: keeping current default gateway '192.168.178.1'
Sep 15 12:35:30 opnsense opnsense: /usr/local/etc/rc.newwanipv6: ROUTING: setting IPv6 default route to fe80::ca0e:14ff:fec8:d750
Sep 15 12:35:30 opnsense opnsense: /usr/local/etc/rc.newwanipv6: ROUTING: keeping current default gateway 'fe80::ca0e:14ff:fec8:d750%igb0'
Sep 15 12:35:30 opnsense opnsense: /usr/local/etc/rc.newwanipv6: ROUTING: keeping current default gateway '192.168.178.1'
Sep 15 12:35:30 opnsense opnsense: /usr/local/etc/rc.newwanipv6: ROUTING: keeping current default gateway 'fe80::ca0e:14ff:fec8:d750%igb0'
Sep 15 12:35:31 opnsense opnsense: /usr/local/etc/rc.newwanipv6: Resyncing OpenVPN instances for interface wan.
Sep 15 12:35:31 opnsense kernel: ovpnc1: link state changed to DOWN
Sep 15 12:35:31 opnsense opnsense: /usr/local/etc/rc.filter_configure: ROUTING: keeping current default gateway '192.168.178.1'
Sep 15 12:35:31 opnsense opnsense: /usr/local/etc/rc.filter_configure: ROUTING: keeping current default gateway 'fe80::ca0e:14ff:fec8:d750%igb0'
Sep 15 12:35:33 opnsense opnsense: /usr/local/etc/rc.filter_configure: ROUTING: keeping current default gateway '192.168.178.1'
Sep 15 12:35:33 opnsense opnsense: /usr/local/etc/rc.filter_configure: ROUTING: keeping current default gateway 'fe80::ca0e:14ff:fec8:d750%igb0'
Sep 15 12:35:34 opnsense opnsense: /usr/local/etc/rc.linkup: The command '/usr/local/sbin/unbound -c '/var/unbound/unbound.conf'' returned exit code '1', the output was '[1537007734] unbound[54994:0] error: can't bind socket: Can't assign requested address for 192.168.38.254 [1537007734] unbound[54994:0] fatal error: could not open ports'
Sep 15 12:35:35 opnsense opnsense: /usr/local/etc/rc.newwanipv6: IP renewal is starting on 'igb0'
Sep 15 12:35:35 opnsense opnsense: /usr/local/etc/rc.newwanipv6: On (IP address: 2a02:908:2543:9fe0:225:90ff:fe5d:4ca0) (interface: wan[wan]) (real interface: igb0).
Sep 15 12:35:36 opnsense opnsense: /usr/local/etc/rc.newwanipv6: ROUTING: entering configure using 'wan'
Sep 15 12:35:36 opnsense opnsense: /usr/local/etc/rc.newwanipv6: ROUTING: IPv6 default gateway set to wan
Sep 15 12:35:36 opnsense opnsense: /usr/local/etc/rc.newwanipv6: ROUTING: IPv4 default gateway set to wan
Sep 15 12:35:36 opnsense opnsense: /usr/local/etc/rc.newwanipv6: ROUTING: setting IPv4 default route to 192.168.178.1
Sep 15 12:35:36 opnsense opnsense: /usr/local/etc/rc.newwanipv6: ROUTING: keeping current default gateway '192.168.178.1'
Sep 15 12:35:36 opnsense opnsense: /usr/local/etc/rc.newwanipv6: ROUTING: setting IPv6 default route to fe80::ca0e:14ff:fec8:d750
Sep 15 12:35:36 opnsense opnsense: /usr/local/etc/rc.newwanipv6: ROUTING: keeping current default gateway 'fe80::ca0e:14ff:fec8:d750%igb0'
Sep 15 12:35:36 opnsense opnsense: /usr/local/etc/rc.newwanipv6: ROUTING: keeping current default gateway '192.168.178.1'
Sep 15 12:35:36 opnsense opnsense: /usr/local/etc/rc.newwanipv6: ROUTING: keeping current default gateway 'fe80::ca0e:14ff:fec8:d750%igb0'
Sep 15 12:35:37 opnsense opnsense: /usr/local/etc/rc.newwanipv6: Resyncing OpenVPN instances for interface wan.
Sep 15 12:35:43 opnsense opnsense: /usr/local/etc/rc.filter_configure: ROUTING: keeping current default gateway '192.168.178.1'
Sep 15 12:35:43 opnsense opnsense: /usr/local/etc/rc.filter_configure: ROUTING: keeping current default gateway 'fe80::ca0e:14ff:fec8:d750%igb0'
Sep 15 12:35:54 opnsense kernel: ovpnc1: link state changed to UP
Sep 15 12:35:55 opnsense opnsense: /usr/local/etc/rc.newwanip: IP renewal is starting on 'ovpnc1'
Sep 15 12:35:55 opnsense opnsense: /usr/local/etc/rc.newwanip: Interface '' is disabled or empty, nothing to do.
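
The extract above includes a failed Unbound start at 12:35:34. A rough sketch for pulling such entries out of a longer system.log, assuming the "The command '...' returned exit code '...'" line format seen here stays the same; the regular expression and the function name are illustrative, not part of OPNsense:

```python
import re

# Matches the rc script lines that report a command's exit code and output.
FAILED_COMMAND = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+\S+:\s+"
    r"(?P<script>\S+):\s+The command '(?P<command>.*)' returned exit code "
    r"'(?P<code>\d+)', the output was '(?P<output>.*)'$"
)


def failed_commands(log_lines):
    """Return one dict per log line that reports a non-zero exit code."""
    failures = []
    for line in log_lines:
        match = FAILED_COMMAND.match(line.strip())
        if match and int(match.group("code")) != 0:
            entry = match.groupdict()
            entry["code"] = int(entry["code"])
            failures.append(entry)
    return failures


# Two lines taken from the extract above.
LOG = [
    "Sep 15 12:35:28 opnsense kernel: igb1: link state changed to UP",
    "Sep 15 12:35:34 opnsense opnsense: /usr/local/etc/rc.linkup: "
    "The command '/usr/local/sbin/unbound -c '/var/unbound/unbound.conf'' "
    "returned exit code '1', the output was '[1537007734] unbound[54994:0] "
    "error: can't bind socket: Can't assign requested address for "
    "192.168.38.254 [1537007734] unbound[54994:0] fatal error: "
    "could not open ports'",
]

for failure in failed_commands(LOG):
    print(failure["timestamp"], failure["script"], failure["output"])
```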

I suspect that this behavior occurs when the external IP address of the OPNsense box changes (DS-Lite Unitymedia cable connection).

Restarting Unbound seems to solve the issue, but sometimes it is also necessary to force the workstation to clear its list of nameservers and request them again from the OPNsense box.

@fichtner fichtner added the support Community support label Oct 20, 2018
@fichtner
Member

Under System: Settings: General there was a bug with the DNS checkbox options. I suspect you are seeing the same bug, and simply saving the page again would fix it. It was in 18.7.4.

If not, the issue is a bit weird: it should never publish external servers, whether Unbound is running or not, since it only checks the enabled flag. But it could be a side effect of IPv6 not having fully reloaded.

@fichtner
Member

Found it...

https://github.com/opnsense/core/blob/master/src/etc/inc/services.inc#L338

DHCPv6 probably has a similar defect.

The question is: should we avoid pushing servers in this case altogether?

@fichtner fichtner self-assigned this Oct 20, 2018
@fichtner fichtner added bug Production bug and removed support Community support labels Oct 20, 2018
@fichtner fichtner added this to the 19.1 milestone Oct 20, 2018
@fichtner
Member

So, we need to do a few things it seems:

  1. do not push DNS servers to clients when no IPv6 has been found
  2. avoid starting RADVD / DHCPv6 when no IPv6 has been found
  3. log if 1. happens when 2. does not work
  4. reload the services when the actual IPv6 is available.
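
As a rough sketch of how these four points fit together (illustrative Python, not the actual PHP in services.inc; the function name and action strings are invented for this example):

```python
from typing import List, Optional


def plan_ipv6_service_actions(interface_ipv6: Optional[str],
                              services_running: bool) -> List[str]:
    """Map the current state to the actions from the list above."""
    actions = []
    if not interface_ipv6:
        # 1. never hand out DNS servers without an IPv6 address
        actions.append("withhold_dns_servers")
        if services_running:
            # 3. the start could not be avoided, so at least log it
            actions.append("log_warning")
        else:
            # 2. do not start RADVD / DHCPv6 at all
            actions.append("skip_service_start")
        # 4. remember to come back once the address shows up
        actions.append("reload_when_ipv6_available")
        return actions
    # An address is known: (re)start and advertise the box itself.
    actions.append("reload_services" if services_running else "start_services")
    actions.append("advertise_interface_address")
    return actions


print(plan_ipv6_service_actions(None, services_running=False))
print(plan_ipv6_service_actions("2001:db8::1", services_running=True))
```

The function is stateless on purpose: the caller would invoke it again whenever the interface address changes, which is what makes point 4 possible.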

@fichtner
Member

While fc3ec19 addresses the issue, there's no easy way to test this on 18.7.5 due to a few (potentially) conflicting changes, but I'll provide a clean backport later... :)

@borisneubert
Author

> https://github.com/opnsense/core/blob/master/src/etc/inc/services.inc#L338

So when configuring radvd for a certain interface, it checks whether a DNS server is enabled (Unbound or DNSMasq) for the interface. If so, the IP address of the interface is pushed, else the list of external DNS servers. All for IPv6 only, though.

> The question is: should we avoid pushing servers in this case altogether?

In general, this sounds like the reasonable thing to do. But why does it follow the else branch?

In my case, Unbound is enabled and DNSMasq is not. Does enabled mean running or configured to run? In my case it could also be that Unbound stops from time to time (crash?). Still unclear why the program flow enters services_radvd_configure() at all from time to time.

Another observation (may be related or not): radvd spits out

Oct 21 17:31:14 opnsense radvd[25435]: sendmsg: Permission denied

about 5 to 10 times per minute.

fichtner added a commit that referenced this issue Oct 21, 2018
@fichtner
Member

I'm not sure what you are saying. The code line does what you described as faulty, but also explains it completely: when no IPv6 has been found, it skips to the configured servers -- likely not to break DNS resolution for clients.

The issues are that:

a) we shouldn't serve clients if we don't have an IP address yet
b) it forgets to reload the service later to be able to push the correct IP address

The code is wrong in assuming it should continue to the next else.
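
To make the fall-through concrete, here is a simplified Python sketch of the behavior as described in this thread, next to a guarded variant following point a). It is not the real PHP from services.inc; the function and parameter names are assumptions for illustration:

```python
from typing import List, Optional


def select_dns_with_fallthrough(dns_service_enabled: bool,
                                interface_ipv6: Optional[str],
                                configured_servers: List[str]) -> List[str]:
    """Reported behavior: a missing address drops into the else branch."""
    if dns_service_enabled and interface_ipv6:
        return [interface_ipv6]
    # Also reached when the DNS service IS enabled but the interface
    # has no IPv6 address yet, so external servers leak to clients.
    return list(configured_servers)


def select_dns_guarded(dns_service_enabled: bool,
                       interface_ipv6: Optional[str],
                       configured_servers: List[str]) -> List[str]:
    """Guarded variant: an enabled service never falls back to external servers."""
    if dns_service_enabled:
        return [interface_ipv6] if interface_ipv6 else []
    return list(configured_servers)


external = ["2001:4860:4860::8888", "2001:4860:4860::8844"]
print(select_dns_with_fallthrough(True, None, external))  # external servers leak
print(select_dns_guarded(True, None, external))           # nothing is advertised
```

The guarded variant only covers point a); without the reload from point b), clients would receive no DNS server at all until the service is restarted.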

@borisneubert
Author

Sorry, your comments and my comment crossed each other.

I understand what you explained. No further comment.

@fichtner
Member

Ah ok, sorry for the overlap :)

@fichtner
Member

Can you try this patch 749f7f7 for 18.7.5?

# opnsense-patch 749f7f7

@borisneubert
Author

Patch applied. Will not be able to report back before next week, though.

@fichtner
Member

No problem. Thanks in advance.

@borisneubert
Author

The issue did not appear during the last two days. I am tempted to say that the problem is solved. I suggest closing the issue. I will reopen it if the issue reappears.

@fichtner
Member

@borisneubert thanks for the feedback. I have a minor issue at one office deployment during reboot, but I'll look at it next week and leave this open for now.

@borisneubert
Author

Today, the issue occurred again.

I had to reboot the FritzBox that provides the internet connection. After the FritzBox came back and provided a new IPv6 prefix to the OPNsense box, the latter again propagated the IPv6 addresses of the Google DNS servers to my workstation.

@fichtner
Member

fichtner commented Nov 6, 2018

There are a number of changes on master now; the test patch was a silly early proposal. I circled back to preventing the DHCPv6 start when the IPv6 address is not there, which makes the code clearer. I am still worried about RADVD leaking the servers too; it is a little harder to pull off there. To be continued...

@fichtner
Member

fichtner commented Nov 7, 2018

I stand corrected. There's a commit in tomorrow's 18.7.7 that is more to the point of your described issue than the previous test commit mentioned:

86be9ca#diff-23f22aca2e953811c28d5b034d367737R1251

What it may not do is reload, so while the servers are not leaked, it will give out no servers whatsoever. As said initially, the issue is tricky and I still don't understand the past code that introduced this issue back in 2012 for no apparent reason.

@fichtner
Member

@borisneubert how's 18.7.7 treating you then? :)

@borisneubert
Author

Sweet.

No leaking of the external DNS servers configured on the OPNsense box in everyday work so far. Just now, I tried to provoke the issue by restarting the upstream cable modem (Fritzbox) and observing the output of systemd-resolve --status on my Ubuntu workstation every five seconds: I did not see any DNS server IP addresses other than those of the OPNsense box.

To complete the test, I restarted the cable modem again and watched the output of radvdump: when the OPNsense box lost IPv6 connectivity, it stopped announcing the prefix and DNS server. Some time later, after the Fritzbox had come up again and received a new IPv6 prefix, it resumed announcing, with the new IPv6 address of the OPNsense box as the DNS server. No leaking of the external DNS servers configured on the OPNsense box here either.

Looks good to me.

@fichtner
Member

Perfect, will close then. Thanks for the report and testing! 👍
