
NSD doesn't refresh zones after extended downtime #25

Closed
aabdnn opened this issue Jul 18, 2019 · 4 comments
@aabdnn

aabdnn commented Jul 18, 2019

I have a test server where I had last run NSD in March 2019; then I had shut it down. Today I installed a new version of NSD and started it. It has 3 slave zones configured. The log showed this:

[2019-07-18 10:02:56.198] nsd[1187]: notice: nsd starting (NSD 4.2.1)
[2019-07-18 10:02:56.765] nsd[1188]: info: zone . read with success
[2019-07-18 10:02:56.768] nsd[1188]: info: zone arpa. read with success
[2019-07-18 10:02:56.768] nsd[1188]: info: zone root-servers.net. read with success
[2019-07-18 10:02:56.768] nsd[1188]: notice: nsd started (NSD 4.2.1), pid 1187
[2019-07-18 10:06:10.608] nsd[1188]: warning: signal received, shutting down...

Notice that NSD did not refresh any of the zones, even though they are vastly out of date. This is caused by the timers in xfrd.state (shown below). I think the issue is that NSD isn't comparing the timestamps in the state file against the current system time, so it doesn't realise that the refresh timers are long past and that it should refresh these zones immediately. Even at exit, it saves the stale refresh timers again, so they won't be updated if I start it again. I think the value of next_timeout should take the current system time into account.

NSDXFRD2
# This file is written on exit by nsd xfr daemon.
# This file contains slave zone information:
#       * timeouts (when was zone data acquired)
#       * state (OK, refreshing, expired)
#       * which master transfer to attempt next
# The file is read on start (but not on reload) by nsd xfr daemon.
# You can edit; but do not change statement order
# and no fancy stuff (like quoted "strings").
#
# If you remove a zone entry, it will be refreshed.
# This can be useful for an expired zone; it revives
# the zone temporarily, from refresh-expiry time.
# If you delete the file all slave zones are updated.
#
# Note: if you edit this file while nsd is running,
#       it will be overwritten on exit by nsd.

filetime: 1563444370    # Thu Jul 18 10:06:10 2019

# The number of zone entries in this file
numzones: 3

zone:   name: .
        state: 0 # OK
        master: 0
        next_master: -1
        round_num: -1
        next_timeout: 1707      # = 28m 27s
        backoff: 0
        soa_nsd_acquired: 1553543862    # was 114d 14h 8m 28s ago
        soa_nsd: 6 1 86400 1792 a.root-servers.net. nstld.verisign-grs.com. 2019032501 1800 900 604800 86400
        # refresh = 30m retry = 15m expire = 7d minimum = 1d
        soa_disk_acquired: 1563444176   # was 3m 14s ago
        soa_disk: 6 1 86400 1792 a.root-servers.net. nstld.verisign-grs.com. 2019032501 1800 900 604800 86400
        # refresh = 30m retry = 15m expire = 7d minimum = 1d
        soa_notify_acquired: 0

zone:   name: arpa.
        state: 0 # OK
        master: 0
        next_master: -1
        round_num: -1
        next_timeout: 1572      # = 26m 12s
        backoff: 0
        soa_nsd_acquired: 1553543862    # was 114d 14h 8m 28s ago
        soa_nsd: 6 1 86400 1792 a.root-servers.net. nstld.verisign-grs.com. 2019032501 1800 900 604800 86400
        # refresh = 30m retry = 15m expire = 7d minimum = 1d
        soa_disk_acquired: 1563444176   # was 3m 14s ago
        soa_disk: 6 1 86400 1792 a.root-servers.net. nstld.verisign-grs.com. 2019032501 1800 900 604800 86400
        # refresh = 30m retry = 15m expire = 7d minimum = 1d
        soa_notify_acquired: 0

zone:   name: root-servers.net.
        state: 0 # OK
        master: 0
        next_master: -1
        round_num: -1
        next_timeout: 12555     # = 3h 29m 15s
        backoff: 0
        soa_nsd_acquired: 1553543861    # was 114d 14h 8m 29s ago
        soa_nsd: 6 1 3600000 1792 a.root-servers.net. nstld.verisign-grs.com. 2019031301 14400 7200 1209600 3600000
        # refresh = 4h retry = 2h expire = 14d minimum = 41d 16h
        soa_disk_acquired: 1563444176   # was 3m 14s ago
        soa_disk: 6 1 3600000 1792 a.root-servers.net. nstld.verisign-grs.com. 2019031301 14400 7200 1209600 3600000
        # refresh = 4h retry = 2h expire = 14d minimum = 41d 16h
        soa_notify_acquired: 0

@wcawijngaards wcawijngaards self-assigned this Jul 18, 2019
@wcawijngaards
Member

Hi Anand,
Thanks for the report! I added logic so that when NSD reads the xfrd state file, it checks whether the timeout is in the past. If so, it attempts to refetch the zone contents. Because that probably affects all of the zones in the file, it spreads the load over a couple of seconds with a random(10) second delay. That works in a test setup for me.
Best regards, Wouter

@aabdnn
Author

aabdnn commented Jul 18, 2019

Hi Wouter! Thanks for the fix. I have actually been aware of this issue for ages, but kept forgetting to open a report. Today, when I built 4.2.1 for testing and noticed it again, I decided to open the report before I forgot again. I will try to rebuild with this patch and report back, or just test when 4.2.2 comes out.

@aabdnn
Author

aabdnn commented Jul 22, 2019

Hi Wouter. I tried this today, and it works as you described. NSD starts, notices that the zones are way out of date, and schedules a refresh for them with a random delay of up to 10 seconds. I guess this is fine, but in reality there is probably no need for the delay; it would be fine if NSD just refreshed the zones immediately. If you start NSD without any zone files or xfrd.state, it XFRs in all the slave zones immediately. It doesn't make much sense to add a random 10s delay only for this specific case of extended downtime. For consistency, if you're going to add a random delay, it should apply in all cases, to avoid flooding a master server.

@wcawijngaards
Member

Hi Anand. Yes, that is right; I fixed it in commit 784600e, where the zone is fetched immediately, and if that fetch needs retries, the already existing retry logic spreads the load. This is also how it works when NSD is started without files. So with this fix all the zones get fetched immediately.
