Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible regression of Unbound since 18.7.6 #2894

Closed
soder10 opened this issue Nov 10, 2018 · 37 comments
Closed

Possible regression of Unbound since 18.7.6 #2894

soder10 opened this issue Nov 10, 2018 · 37 comments
Labels
support Community support

Comments

@soder10
Copy link

soder10 commented Nov 10, 2018

Hello devs,

since upgraded to 18.7.6 that introduced unbound 1.8.1, I face "frequent" DNS resolver service shutdowns. Checking the Unbound log I dont see any reason why the service suddenly stopped itself, rendering all name resolution stop.

All I could find in the unbound log is this line:

Nov 10 10:35:53 | unbound: [67269:0] fatal error: Could not read config file: /unbound.conf. Maybe try unbound -dd, it stays on the commandline to see more errors, or unbound-checkconf

When this is observed, and I manually try to start the service again, it starts successfully, and seems to be working for a couple of days, then stops suddently again.

I tried set the logging level2 to get more logs, but in this case everye single name resolution attempt is alao logged, and it will be difficult to catch the event that has anything to do with the stop/crash.

Where is the unbound log stored in the underlying system, so I could use some clever grep to extra t meaningful entries?

@fichtner fichtner added the upstream Third party issue label Nov 10, 2018
@fichtner
Copy link
Member

It looks like an issue with Unbound 1.8. 1.7 was reported to work fine. Our integration didn't change in this regard. You could verify this theory by reverting to 1.7:

# opnsense-revert -r 18.7.5 unbound

Cheers,
Franco

@soder10
Copy link
Author

soder10 commented Nov 10, 2018

Thanks, with 1.7 I didnt face such issue. I will try to revert back to that older unbound version. After rollback should I take care of anything regarding this package is no longer in sync with what is nornally shipped with 18.7.6+ ?

@fichtner
Copy link
Member

Reload Unbound manually to load the previous version. Code-wise both are compatible.

On next OPNsense update, however, it will try to install 1.8 again so you can lock the package from System: Firmware: Packages to prevent such an update. It could be that the update will stall in this case. Unbound is intertwined in some other packages... but nothing bad happens as it just won't update in this case.

This is just for confirmation. I do believe it needs to be reported to upstream developers that something is wrong with 1.8.

@soder10
Copy link
Author

soder10 commented Nov 12, 2018

Is there anything to check in the logs in case the next time it stops (was it a graceful stop by some opnsense service, or it died ungracefully by its own) ? Or this is more related to Unbound devs?

Ps. The unbound log is located at /var/log/resolver.log in case someone wants to read it from CLI (took me some time to figure out this fact)

@soder10
Copy link
Author

soder10 commented Nov 14, 2018

It happened again a couple of minutes ago. I added some DHCP static mapping, applied the change. This seemed trigger the restart of Unbound as well. I could see in the log, that the last line was this same error, as I reported the last time:

Nov 14 12:29:03 unbound: [58191:0] fatal error: Could not read config file: /unbound.conf. Maybe try unbound -dd, it stays on the commandline to see more errors, or unbound-checkconf
Nov 14 12:29:03 unbound: [58191:0] notice: Restart of unbound 1.8.1.

Seems this fatal error blocked Unbound from successfully starting. Strange thing, if I press the service start from GUI, it starts without issues.

@soder10
Copy link
Author

soder10 commented Nov 14, 2018

Yep, its reproducible every single time I apply config.

@EugenMayer
Copy link
Contributor

@fichtner IMHO i would expect this to be a timing issue during the generation of the unbound configuration.

Since it happens from time to time, it should be expected to the DHCP hostname auto-registering in ubound which causes that. Maybe in 1.8. they changed the timings / blocking mechanism during the reload or they just "hard bind" to the inode of the configuration, while we "replace" the file with a new one during the generation of the configuration - and the new unbound version is more sensible to that.

I am pretty sure this is not an upstream bug or at least if, it would be a "freebsd" related upstream bug and that just sounds rather unlikely.

I am currently waiting for arm32/arm64 alpine to get fixed on docker.hub so i can package 1.8 so i could try to reproduce this on https://github.com/EugenMayer/docker-unbound .. but i have to wait. I could though try this with amd64 since i already packaged eugenmayer/unbound:1.8. Maybe we can isolate the issue better with a docker-image, especially for reproduction, be it a upstream issue or a opsense issue.

@soder10
Copy link
Author

soder10 commented Nov 24, 2018

Guys, while you are investigating this issue:
can you recommend me how to schedule "CRON - unbound service restart" every day at midnight and sometime during the day? Users are complaining to throw away this router and bring the former Asus soho box back, because the internet is practically dead until I manually start the unbound service from GUI. I hope scheduled service restart multiple times per day can keep the service alive without this sudden stop triggered.

I checked System\Settings\Cron \ Add new job --> but I can only find 8-10 predefined jobs, none of them are custom service restart. Dont want to mess with anything non-GUI way unless someone can give me 2-3 lines of clear instruction.
Thanks guys!

@soder10
Copy link
Author

soder10 commented Nov 24, 2018

Forgot to say: I was afraid of rolling back to unbound 1.7, fearing what may break when the next opnsense upgrade tries to reinstall it / tries to skip that out-of-sync package. So yes, I am still on this buggy 1.8.1, and if you need any log to capture the stop-reason, let me know.

@fichtner
Copy link
Member

the command to restart unbound is "/usr/local/sbin/pluginctl dns", just add it to /etc/crontab (the next firmware update will overwrite crontab, but if you're looking for a solution until unbound is updated that's probably it)

@EugenMayer
Copy link
Contributor

EugenMayer commented Nov 24, 2018

Rather use monit to monitor and restart the service if port 53 is dead? Monit is part of the plugins, so this should be easy and far better then the cron based restart, since it instantly self heals

you can also just test-resolve with monit but be aware, exit code of nslookup / dig are nonsense, use grep to make sense of it

@soder10
Copy link
Author

soder10 commented Nov 24, 2018

@EugenMayer : thanks for the hint, monit should be the real solution, and would prefer that definitely. Unfortunately I dont have the spare time to become expert of the nuances of monit, meanwhile the next time unbound stops, angry users would throw the firewall to the trash. So would like something very basic and very reliable to restart that damn unbound.

@EugenMayer
Copy link
Contributor

Google is your friend :)

check process dnsmasq with pidfile /var/run/unbound.pid
  start program = "/usr/local/sbin/pluginctl dns start" with timeout 60 seconds
  stop program = "/usr/local/sbin/pluginctl dns stop"
  if failed
    host 127.0.0.1
    port 53 use type udp
    protocol dns
    with timeout 60 seconds
  then restart
  if 3 restarts within 5 cycles then timeout

Dont make your life harder :)

@soder10
Copy link
Author

soder10 commented Dec 1, 2018

Small update: it seems it is triggered on Saturday, around 10:36 am. That is the 3rd Saturday morning when it happens, always around this 10:36 am time.

@soder10
Copy link
Author

soder10 commented Dec 8, 2018

Just as I expected: happened today again, at exactly 1 week later.
I checked what was happening: I get a PPPOE disconnect event (most probably my ISP drops my connection after 1 week uptime, dont know for sure who initiates that)

//the last line before the current issue is from dec1, when the previous disconnect-reconnect event happened, so betwen the 2 events there is exactly 1 week silence //
Dec 8 10:37:07 FW01 ppp: [wan_link0] LCP: rec'd Terminate Request #2 (Opened)
Dec 8 10:37:07 FW01 ppp: [wan_link0] LCP: state change Opened --> Stopping
Dec 8 10:37:07 FW01 ppp: [wan_link0] Link: Leave bundle "wan"
Dec 8 10:37:07 FW01 ppp: [wan] Bundle: Status update: up 0 links, total bandwidth 9600 bps
Dec 8 10:37:07 FW01 ppp: [wan] IPCP: Close event
Dec 8 10:37:07 FW01 ppp: [wan] IPCP: state change Opened --> Closing
Dec 8 10:37:07 FW01 ppp: [wan] IPCP: SendTerminateReq #16
Dec 8 10:37:07 FW01 ppp: [wan] IPCP: LayerDown
Dec 8 10:37:08 FW01 ppp: [wan] IPV6CP: Close event
Dec 8 10:37:08 FW01 ppp: [wan] IPV6CP: state change Opened --> Closing
Dec 8 10:37:08 FW01 ppp: [wan] IPV6CP: SendTerminateReq #8
Dec 8 10:37:08 FW01 ppp: [wan] IPV6CP: LayerDown
Dec 8 10:37:09 FW01 ppp: [wan] IFACE: Down event
Dec 8 10:37:09 FW01 ppp: [wan] IFACE: Rename interface pppoe0 to pppoe0
Dec 8 10:37:09 FW01 ppp: [wan] IPCP: Down event
Dec 8 10:37:09 FW01 ppp: [wan] IPCP: LayerFinish
Dec 8 10:37:09 FW01 ppp: [wan] IPCP: state change Closing --> Initial
Dec 8 10:37:09 FW01 ppp: [wan] IPV6CP: Down event
Dec 8 10:37:09 FW01 ppp: [wan] IPV6CP: LayerFinish
Dec 8 10:37:09 FW01 ppp: [wan] Bundle: No NCPs left. Closing links...
Dec 8 10:37:09 FW01 ppp: [wan] IPV6CP: state change Closing --> Initial
Dec 8 10:37:09 FW01 ppp: [wan_link0] LCP: SendTerminateAck #16
Dec 8 10:37:09 FW01 ppp: [wan_link0] LCP: LayerDown
Dec 8 10:37:10 FW01 ppp: [wan_link0] PPPoE: connection closed
Dec 8 10:37:10 FW01 ppp: [wan_link0] Link: DOWN event
Dec 8 10:37:10 FW01 ppp: [wan_link0] LCP: Down event
Dec 8 10:37:10 FW01 ppp: [wan_link0] LCP: state change Stopping --> Starting
Dec 8 10:37:10 FW01 ppp: [wan_link0] Link: reconnection attempt 1 in 2 seconds
// then the reconnect finishes successfully //

During this exact time, unbound is stopped, and fails to restart correctly, with the same error message posted in my initial post.

@fichtner
Copy link
Member

fichtner commented Dec 8, 2018

You probably set explicit Unbound Interfaces?

@soder10
Copy link
Author

soder10 commented Dec 8, 2018

You are correct, I mentioned recently to protect DNS by not even listening on WAN. But after you said it may silently break things, I reverted that config back, so it listens on ALL since a couple of weeks.

@fichtner
Copy link
Member

fichtner commented Dec 8, 2018

Hmmm, maybe your reconnect comes back with the same IP? it might be that it doesn't know that it must rerun configuration when the IP is back

@soder10
Copy link
Author

soder10 commented Dec 8, 2018

Nope, it gets a completely different IP after the succesful reconnect

Dec 1 10:36:54 FW01 ppp: [wan] IPCP: LayerUp
Dec 1 10:36:54 FW01 ppp: [wan] 77.234.86.41 -> 10.0.0.1
Dec 8 10:37:07 FW01 ppp: [wan_link0] LCP: rec'd Terminate Request #2 (Opened)
Dec 8 10:37:07 FW01 ppp: [wan_link0] LCP: state change Opened --> Stopping
Dec 8 10:37:07 FW01 ppp: [wan_link0] Link: Leave bundle "wan"
Dec 8 10:37:07 FW01 ppp: [wan] Bundle: Status update: up 0 links, total bandwidth 9600 bps
Dec 8 10:37:07 FW01 ppp: [wan] IPCP: Close event
Dec 8 10:37:07 FW01 ppp: [wan] IPCP: state change Opened --> Closing
Dec 8 10:37:07 FW01 ppp: [wan] IPCP: SendTerminateReq #16
Dec 8 10:37:07 FW01 ppp: [wan] IPCP: LayerDown
Dec 8 10:37:08 FW01 ppp: [wan] IPV6CP: Close event
Dec 8 10:37:08 FW01 ppp: [wan] IPV6CP: state change Opened --> Closing
Dec 8 10:37:08 FW01 ppp: [wan] IPV6CP: SendTerminateReq #8
Dec 8 10:37:08 FW01 ppp: [wan] IPV6CP: LayerDown
Dec 8 10:37:09 FW01 ppp: [wan] IFACE: Down event
Dec 8 10:37:09 FW01 ppp: [wan] IFACE: Rename interface pppoe0 to pppoe0
Dec 8 10:37:09 FW01 ppp: [wan] IPCP: Down event
Dec 8 10:37:09 FW01 ppp: [wan] IPCP: LayerFinish
Dec 8 10:37:09 FW01 ppp: [wan] IPCP: state change Closing --> Initial
Dec 8 10:37:09 FW01 ppp: [wan] IPV6CP: Down event
Dec 8 10:37:09 FW01 ppp: [wan] IPV6CP: LayerFinish
Dec 8 10:37:09 FW01 ppp: [wan] Bundle: No NCPs left. Closing links...
Dec 8 10:37:09 FW01 ppp: [wan] IPV6CP: state change Closing --> Initial
Dec 8 10:37:09 FW01 ppp: [wan_link0] LCP: SendTerminateAck #16
Dec 8 10:37:09 FW01 ppp: [wan_link0] LCP: LayerDown
Dec 8 10:37:10 FW01 ppp: [wan_link0] PPPoE: connection closed
Dec 8 10:37:10 FW01 ppp: [wan_link0] Link: DOWN event
Dec 8 10:37:10 FW01 ppp: [wan_link0] LCP: Down event
Dec 8 10:37:10 FW01 ppp: [wan_link0] LCP: state change Stopping --> Starting
Dec 8 10:37:10 FW01 ppp: [wan_link0] Link: reconnection attempt 1 in 2 seconds
Dec 8 10:37:12 FW01 ppp: [wan_link0] Link: reconnection attempt 1
Dec 8 10:37:12 FW01 ppp: [wan_link0] PPPoE: Connecting to ''
Dec 8 10:37:12 FW01 ppp: PPPoE: rec'd ACNAME "DIGI"
Dec 8 10:37:12 FW01 ppp: [wan_link0] PPPoE: connection successful
Dec 8 10:37:12 FW01 ppp: [wan_link0] Link: UP event
Dec 8 10:37:12 FW01 ppp: [wan_link0] LCP: Up event
Dec 8 10:37:12 FW01 ppp: [wan_link0] LCP: state change Starting --> Req-Sent
Dec 8 10:37:12 FW01 ppp: [wan_link0] LCP: SendConfigReq #17
Dec 8 10:37:12 FW01 ppp: [wan_link0] PROTOCOMP
Dec 8 10:37:12 FW01 ppp: [wan_link0] MRU 1492
Dec 8 10:37:12 FW01 ppp: [wan_link0] MAGICNUM 0x741a782c
Dec 8 10:37:12 FW01 ppp: [wan_link0] LCP: rec'd Configure Request #1 (Req-Sent)
Dec 8 10:37:12 FW01 ppp: [wan_link0] MRU 1492
Dec 8 10:37:12 FW01 ppp: [wan_link0] AUTHPROTO PAP
Dec 8 10:37:12 FW01 ppp: [wan_link0] MAGICNUM 0xc83917d5
Dec 8 10:37:12 FW01 ppp: [wan_link0] LCP: SendConfigAck #1
Dec 8 10:37:12 FW01 ppp: [wan_link0] MRU 1492
Dec 8 10:37:12 FW01 ppp: [wan_link0] AUTHPROTO PAP
Dec 8 10:37:12 FW01 ppp: [wan_link0] MAGICNUM 0xc83917d5
Dec 8 10:37:12 FW01 ppp: [wan_link0] LCP: state change Req-Sent --> Ack-Sent
Dec 8 10:37:14 FW01 ppp: [wan_link0] LCP: SendConfigReq #18
Dec 8 10:37:14 FW01 ppp: [wan_link0] PROTOCOMP
Dec 8 10:37:14 FW01 ppp: [wan_link0] MRU 1492
Dec 8 10:37:14 FW01 ppp: [wan_link0] MAGICNUM 0x741a782c
Dec 8 10:37:14 FW01 ppp: [wan_link0] LCP: rec'd Configure Reject #18 (Ack-Sent)
Dec 8 10:37:14 FW01 ppp: [wan_link0] PROTOCOMP
Dec 8 10:37:14 FW01 ppp: [wan_link0] LCP: SendConfigReq #19
Dec 8 10:37:14 FW01 ppp: [wan_link0] MRU 1492
Dec 8 10:37:14 FW01 ppp: [wan_link0] MAGICNUM 0x741a782c
Dec 8 10:37:14 FW01 ppp: [wan_link0] LCP: rec'd Configure Ack #19 (Ack-Sent)
Dec 8 10:37:14 FW01 ppp: [wan_link0] MRU 1492
Dec 8 10:37:14 FW01 ppp: [wan_link0] MAGICNUM 0x741a782c
Dec 8 10:37:14 FW01 ppp: [wan_link0] LCP: state change Ack-Sent --> Opened
Dec 8 10:37:14 FW01 ppp: [wan_link0] LCP: auth: peer wants PAP, I want nothing
Dec 8 10:37:14 FW01 ppp: [wan_link0] PAP: using authname "pasztorattila-3p9"
Dec 8 10:37:14 FW01 ppp: [wan_link0] PAP: sending REQUEST #1 len: 31
Dec 8 10:37:14 FW01 ppp: [wan_link0] LCP: LayerUp
Dec 8 10:37:14 FW01 ppp: [wan_link0] PAP: rec'd ACK #1 len: 5
Dec 8 10:37:14 FW01 ppp: [wan_link0] LCP: authorization successful
Dec 8 10:37:14 FW01 ppp: [wan_link0] Link: Matched action 'bundle "wan" ""'
Dec 8 10:37:14 FW01 ppp: [wan_link0] Link: Join bundle "wan"
Dec 8 10:37:14 FW01 ppp: [wan] Bundle: Status update: up 1 link, total bandwidth 64000 bps
Dec 8 10:37:14 FW01 ppp: [wan] IPCP: Open event
Dec 8 10:37:14 FW01 ppp: [wan] IPCP: state change Initial --> Starting
Dec 8 10:37:14 FW01 ppp: [wan] IPCP: LayerStart
Dec 8 10:37:14 FW01 ppp: [wan] IPV6CP: Open event
Dec 8 10:37:14 FW01 ppp: [wan] IPV6CP: state change Initial --> Starting
Dec 8 10:37:14 FW01 ppp: [wan] IPV6CP: LayerStart
Dec 8 10:37:14 FW01 ppp: [wan] IPCP: Up event
Dec 8 10:37:14 FW01 ppp: [wan] IPCP: state change Starting --> Req-Sent
Dec 8 10:37:14 FW01 ppp: [wan] IPCP: SendConfigReq #17
Dec 8 10:37:14 FW01 ppp: [wan] IPADDR 0.0.0.0
Dec 8 10:37:14 FW01 ppp: [wan] COMPPROTO VJCOMP, 16 comp. channels, no comp-cid
Dec 8 10:37:14 FW01 ppp: [wan] IPV6CP: Up event
Dec 8 10:37:14 FW01 ppp: [wan] IPV6CP: state change Starting --> Req-Sent
Dec 8 10:37:14 FW01 ppp: [wan] IPV6CP: SendConfigReq #9
Dec 8 10:37:14 FW01 ppp: [wan] IPCP: rec'd Configure Request #1 (Req-Sent)
Dec 8 10:37:14 FW01 ppp: [wan] IPADDR 10.0.0.1
Dec 8 10:37:14 FW01 ppp: [wan] 10.0.0.1 is OK
Dec 8 10:37:14 FW01 ppp: [wan] IPCP: SendConfigAck #1
Dec 8 10:37:14 FW01 ppp: [wan] IPADDR 10.0.0.1
Dec 8 10:37:14 FW01 ppp: [wan] IPCP: state change Req-Sent --> Ack-Sent
Dec 8 10:37:14 FW01 ppp: [wan] IPV6CP: rec'd Configure Request #1 (Req-Sent)
Dec 8 10:37:14 FW01 ppp: [wan] IPV6CP: SendConfigAck #1
Dec 8 10:37:14 FW01 ppp: [wan] IPV6CP: state change Req-Sent --> Ack-Sent
Dec 8 10:37:14 FW01 ppp: [wan] IPCP: rec'd Configure Reject #17 (Ack-Sent)
Dec 8 10:37:14 FW01 ppp: [wan] COMPPROTO VJCOMP, 16 comp. channels, no comp-cid
Dec 8 10:37:14 FW01 ppp: [wan] IPCP: SendConfigReq #18
Dec 8 10:37:14 FW01 ppp: [wan] IPADDR 0.0.0.0
Dec 8 10:37:14 FW01 ppp: [wan] IPV6CP: rec'd Configure Ack #9 (Ack-Sent)
Dec 8 10:37:14 FW01 ppp: [wan] IPV6CP: state change Ack-Sent --> Opened
Dec 8 10:37:14 FW01 ppp: [wan] IPV6CP: LayerUp
Dec 8 10:37:14 FW01 ppp: [wan] 70c0:1440:dab2:c427 -> d8ef:f77e:50f3:7078
Dec 8 10:37:15 FW01 ppp: [wan] IFACE: Up event
Dec 8 10:37:15 FW01 ppp: [wan] IFACE: Rename interface ng0 to pppoe0
Dec 8 10:37:15 FW01 ppp: [wan] IPCP: rec'd Configure Nak #18 (Ack-Sent)
Dec 8 10:37:15 FW01 ppp: [wan] IPADDR 178.164.188.118
Dec 8 10:37:15 FW01 ppp: [wan] 178.164.188.118 is OK
Dec 8 10:37:15 FW01 ppp: [wan] IPCP: SendConfigReq #19
Dec 8 10:37:15 FW01 ppp: [wan] IPADDR 178.164.188.118
Dec 8 10:37:15 FW01 ppp: [wan] IPCP: rec'd Configure Ack #19 (Ack-Sent)
Dec 8 10:37:15 FW01 ppp: [wan] IPADDR 178.164.188.118
Dec 8 10:37:15 FW01 ppp: [wan] IPCP: state change Ack-Sent --> Opened
Dec 8 10:37:15 FW01 ppp: [wan] IPCP: LayerUp
Dec 8 10:37:15 FW01 ppp: [wan] 178.164.188.118 -> 10.0.0.1
root@FW01:~ # clog /var/log/ppps.log

@fichtner
Copy link
Member

fichtner commented Dec 8, 2018

Ok, I'm testing Unbound 1.8.2 just now, but here's another patch that will help recover from a crashed unbound in your particular case (listen "all"): 3d8fd00

# opnsense-patch 3d8fd0088af

@soder10
Copy link
Author

soder10 commented Dec 8, 2018

Thanks, I applied the patch (seems succesfully). Did a forced pppoe disconnect -> unbound stopped and could not start with the same error message.
But after approx 1 minute, it tried another attempt to start, and finally it started.

@fichtner
Copy link
Member

fichtner commented Dec 8, 2018

TBH, if it says "Could not read config file: /unbound.conf" it really mixes up the chroot for no apparent reason. The file is always there but it may be missing that "/var/unbound" is actually "./" but we changed nothing in this regard as far as I'm aware.

@soder10
Copy link
Author

soder10 commented Dec 8, 2018

I see what you mean: this patch is a workaround, the real issue is why does unbound search for the config file at the wrong location?

@fichtner
Copy link
Member

fichtner commented Dec 8, 2018

mwexecf('/usr/local/sbin/unbound -c %s', '/var/unbound/unbound.conf');

That's the only spot we ever start Unbound and as you can see it has a full path to the file. The chroot directive is inside unbound.conf, so how Unbound can say "./unbound.conf" is not found while adhering to the information inside the file is beyond me. :D

chroot: /var/unbound

@soder10
Copy link
Author

soder10 commented Dec 8, 2018

The only speciality I have is the following in the Custom options section:

include: /usr/local/etc/unbound/unbound_advertisement_serverlist_2018.11.28.txt
extended-statistics: yes
( I have a suspicion the 1st line is causibg the problem)

I did a quick test and removed these 2 lines, saved the change, applied it, then disconnect-reconnect ppoe -> seems unbound started correctly this time

@fichtner
Copy link
Member

fichtner commented Dec 8, 2018

you should place this file in /var/unbound/ with chown unbound:unbound to be sure...

@soder10
Copy link
Author

soder10 commented Dec 8, 2018

The only problem my /var is on tmpfs, so the next reboot wipes it. Yet another syshook script?

@fichtner
Copy link
Member

fichtner commented Dec 8, 2018

symlink might work. it's your setup :)

@fichtner fichtner added support Community support and removed upstream Third party issue labels Dec 8, 2018
@soder10
Copy link
Author

soder10 commented Dec 8, 2018

Nope, symlink didnt work.

custom options:
include: /var/unbound/unbound_advertisement_serverlist.txt
extended-statistics: yes

ls /var/unbound
lrwxr-xr-x 1 unbound unbound 70 Dec 8 12:04 unbound_advertisement_serverlist.txt -> /usr/local/etc/unbound/unbound_advertisement_serverlist_2018.11.28.txt

ls /usr/local/etc/unbound/
-rw-r--r-- 1 unbound unbound 239583 Nov 28 14:15 unbound_advertisement_serverlist_2018.11.28.txt

Still the same error, and after 1 min, the automated retry starts it successfully.

If I copied the file directly under /var/unbound, it works though. So because of /var is a tmpfs, I need to find a way to copy the file from permanent storage to /var right after tmpfs is mounted, but before unbound is being started (otherwise unbound fails with missing include file, and stops). So only way to do this, is syshook script, am I right?
But which stage (early / start?) and which unallocated sequence number preempts unbound service start?

fichtner added a commit that referenced this issue Dec 16, 2018
@fichtner
Copy link
Member

The right time is whenever you update the file you push it to /var/unbound and then run "pluginctl dns".

I'm sorry, there's nothing else we can try here in an open source scope. The discussed workaround goes into 18.7.10.

@soder10
Copy link
Author

soder10 commented Jan 2, 2019

@fichtner: can you tell me, at which stage of the boot process /syshook script run do you generate the unbound.conf file that gets copied to /var/unbound? I would need to hack into that step to add the extra file copy from persistent filesystem to /var/unbound tmpfs, and the unbound service will find that copied file when it is getting started. At least thats my understanding.

@fichtner
Copy link
Member

fichtner commented Jan 2, 2019

use "start" syshook, /var/unbound will be populated and running

copy the file

run "pluginctl dns" to restart unbound.

@soder10
Copy link
Author

soder10 commented Jan 2, 2019

Cool, I will try to implement this and hope it will work :)

@fichtner
Copy link
Member

fichtner commented Jan 2, 2019

it should. if not let me know :)

@soder10
Copy link
Author

soder10 commented Jan 3, 2019

I have to wait couple of days (most probably until the weekend) for this. When is the 18.7.10 planned, so I could merge the 2 action?

@fichtner
Copy link
Member

fichtner commented Jan 3, 2019

Preliminary target for 18.7.10 is Tuesday (Jan 8)

@soder10
Copy link
Author

soder10 commented Jan 4, 2019

Thanks, will wait for it then, before going for custom scripting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
support Community support
Development

No branches or pull requests

3 participants