Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unbound doesn't know about DHCP lease that expires then appears again #4714

Closed
2 tasks done
cpw opened this issue Feb 17, 2021 · 5 comments · Fixed by #4727
Closed
2 tasks done

Unbound doesn't know about DHCP lease that expires then appears again #4714

cpw opened this issue Feb 17, 2021 · 5 comments · Fixed by #4727
Assignees
Labels
bug Production bug
Milestone

Comments

@cpw
Copy link

cpw commented Feb 17, 2021

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug

I have a laptop I shut off nightly. When I power it on again next day, it doesn't appear in DNS. To date, I have restarted unbound after this happens (it happens about 75% of the time), assuming something had crashed (unbound was until recently very crashy), and the laptop immediately resolves again.

I installed a monit monitor on the dhcp watcher, because that seemed to be the most likely culprit, but it hasn't crashed and seems to be working fine. In fact, everything seems to be correct except unbound itself - which knows not about the DHCP lease.

I'm guessing that the dhcp watcher somehow removed the lease from unbound, and unbound refuses to accept the new lease?

To Reproduce

Steps to reproduce the behavior:

  1. Not sure. Probably remove a DHCP lease, let it expire (how?) then try and re-acquire the same lease again?

Expected behavior

DHCP and DNS should work reliably together, no matter the status of the in-and-out of the individual DHCP requests.

Describe alternatives you considered

I'm tempted to abandon DHCPD and unbound and go to dnsmasq, but I know that's not the direction of the project, so not sure.

Relevant log files

From /var/unbound/dhcpleases.conf

local-data-ptr: "10.0.0.103 cweeks-lap.*.*"
local-data: "cweeks-lap.*.* IN A 10.0.0.103"

So it looks like the dhcpleases knows about the cweeks-lap entry.

From clog /var/log/dhcpd.log | fgrep cweeks-lap

Feb 16 18:51:29 wall dhcpd[87272]: DHCPREQUEST for 10.0.0.103 from 50:7b:9d:d7:f6:8a (cweeks-lap) via lagg0_vlan20 
Feb 16 18:51:29 wall dhcpd[87272]: DHCPACK on 10.0.0.103 to 50:7b:9d:d7:f6:8a (cweeks-lap) via lagg0_vlan20 
Feb 17 08:58:46 wall dhcpd[87272]: DHCPACK on 10.0.0.103 to 50:7b:9d:d7:f6:8a (cweeks-lap) via lagg0_vlan20 

We can see that the request is made when the laptop starts.

From /var/dhcpd/var/db/dhcpd.leases

lease 10.0.0.103 {
  starts 3 2021/02/17 13:58:46;
  ends 3 2021/02/17 15:58:46;
  cltt 3 2021/02/17 13:58:46;
  binding state active;
  next binding state free;
  rewind binding state free;
  hardware ethernet 50:7b:9d:d7:f6:8a;
  uid "\000cweeks-lap";
  client-hostname "cweeks-lap";
}

There was a second "dormant" entry in here earlier, but it has since been removed by the DHCP process I guess.

lease 10.0.0.103 {
  starts 2 2021/02/16 23:51:29;
  ends 3 2021/02/17 01:51:29;
  tstp 3 2021/02/17 01:51:29;
  cltt 2 2021/02/16 23:51:29;
  binding state free;
  hardware ethernet 50:7b:9d:d7:f6:8a;
  uid "\000cweeks-lap";
}

The second entry above was present in the DHCP leases file above the first entry for sometime.

unbound-control -c /var/unbound/unbound.conf list_local_data | fgrep cweeks-lap. returns nothing (and the DNS fails to resolve)

NOTE:

If I run the command from #4597

unbound-control -c /var/unbound/unbound.conf local_data "cweeks-lap.*.* IN A 10.0.0.103"
and
unbound-control -c /var/unbound/unbound.conf local_data "103.0.0.10.in-addr.arpa. IN PTR cweeks-lap.*.*"

then I can now resolve the machine. Is the watcher somehow ignoring the updated lease because it was present, then it wasn't and now it is again?

Environment

Software version used and hardware type if relevant, e.g.:

OPNsense 21.1.1-amd64
FreeBSD 12.1-RELEASE-p13-HBSD
LibreSSL 3.2.3

Hardware is a little intel box:
Intel(R) Core(TM) i3-4030U CPU @ 1.90GHz (4 cores)

@AdSchellevis AdSchellevis added the support Community support label Feb 17, 2021
@gwjo
Copy link
Contributor

gwjo commented Feb 20, 2021

There is a logic bug in

def run_watcher(target_filename, domain):

Consider the following scenario:

  • When a DHCP client first registers, the watcher does two things:
    1. Updates Unbound dynamically with new DNS entry for this endpoint
    2. Writes this entry to /var/unbound/dhcpleases.conf so it is persistent if Unbound restarts
      RESULT: endpoint has DNS entry
  • Now restart Unbound
    1. Unbound reads the cached entries in /var/unbound/dhcpleases.conf
    2. The watcher reads these and stores them in unbound_local_data
      RESULT: endpoint has DNS entry
  • Endpoint refreshes it's DHCP lease
    1. watcher adds this entry to cached_leases
    2. Does not update Unbound, because entry is listed in unbound_local_data
      RESULT: endpoint has DNS entry
  • If endpoint now doesn't renew it's DHCP and times out
    1. watcher removes entry from cached_leases
    2. It also updates Unbound dynamically
    3. Critically it does not remove the entry from unbound_local_data
      RESULT: No DNS entry for endpoint
  • If endpoint later comes back and gets the same IP address from the DHCP server
    1. watcher adds entry into cached_leases
    2. But when it checks unbound_local_data the entry exists, so it does not update Unbound
      RESULT: No DNS entry for endpoint

Not updating unbound_local_data when a DHCP client times out, means that Unbound isn't updated later when the DHCP endpoint comes back later.

@AdSchellevis
Copy link
Member

@gwjo I'm not sure I agree, the only delete on cached_leases I could find was this:

unbound_control(['local_data_remove', cached_leases[address]['client-hostname']])
del cached_leases[address]
dhcpd_changed = True

after which dhcpd_changed flushes the running config to /var/unbound/dhcpleases.conf in the next row. If there's an issue we're missing here, please point to the line in question.

gwjo added a commit to gwjo/opnsense-core that referenced this issue Feb 20, 2021
Cleanup local data cache when a DHCP endpoint expires, so that it is
kept in sync with dynamic changes.  This ensures that if an expired DHCP
endpoint returns and is assigned the same IP address the local cache is
correct and doesn't block the entry being dynamically re-added to Unbound.

Also don't cache the blacklist entries, which aren't needed to manage
the DHCP DNS entries. There can easily be 1M+ blacklist entries, so
ignoring these improves startup speed and reduces memory footprint

Fixes opnsense#4714
@gwjo
Copy link
Contributor

gwjo commented Feb 20, 2021

@AdSchellevis, that is the "problematic" delete.

This removes the entry from cached_leases but does not update unbound_local_data, so when the DHCP endpoint comes back, the following check fails and the entry is never dynamically re-added to Unbound.

if not unbound_local_data.is_equal(address, fqdn):
for tmp_fqdn in unbound_local_data.all_fqdns(address, fqdn):
syslog.syslog(syslog.LOG_NOTICE, 'dhcpd entry changed %s @ %s.' % (tmp_fqdn, address))
unbound_control(['local_data_remove', tmp_fqdn])
unbound_local_data.cleanup(address, fqdn)
unbound_control(['local_data', address, 'PTR', fqdn])
unbound_control(['local_data', fqdn, 'IN A', address])
unbound_local_data.add_address(address, fqdn)

You are correct that it still gets written to /var/unbound/dhcpleases.conf but this doesn't get read unless Unbound restarts or something triggers a config reload, like the blacklist cronjob running. During normal running the DHCP entries are managed only via unbound-control

@cpw
Copy link
Author

cpw commented Feb 20, 2021

Unbound restarts is how I've handled the DHCP not reappearing historically, yes. It seems you might have found the problem 👍

@AdSchellevis
Copy link
Member

@gwjo got it, thanks for the PR!

AdSchellevis pushed a commit that referenced this issue Feb 21, 2021
Cleanup local data cache when a DHCP endpoint expires, so that it is
kept in sync with dynamic changes.  This ensures that if an expired DHCP
endpoint returns and is assigned the same IP address the local cache is
correct and doesn't block the entry being dynamically re-added to Unbound.

Also don't cache the blacklist entries, which aren't needed to manage
the DHCP DNS entries. There can easily be 1M+ blacklist entries, so
ignoring these improves startup speed and reduces memory footprint

Fixes #4714
@fichtner fichtner added bug Production bug and removed support Community support labels Feb 22, 2021
@fichtner fichtner added this to the 21.7 milestone Feb 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Production bug
Development

Successfully merging a pull request may close this issue.

4 participants