Unbound DNSBL download unable to resolve DNS at boot, and previously downloaded BL file does not load. Removing 85-dnsbl helps. #6523

Closed
Chaskel opened this issue Apr 24, 2023 · 11 comments

@Chaskel

Chaskel commented Apr 24, 2023

Describe the bug

  • Rebooting OPNsense at either of the two locations where I run OPNsense 23.1.5_4-amd64 produces the following log entries every time, and as a result clients start seeing ads in their web browsers:

Notice unbound blocklist: https://adaway.org/hosts.txt (exclude: 0 block: 0)
Notice unbound blocklist download: 0 total lines downloaded for https://adaway.org/hosts.txt
Error unbound blocklist download : unable to download file from https://adaway.org/hosts.txt (error : HTTPSConnectionPool(host='adaway.org', port=443): Max retries exceeded with url: /hosts.txt (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x8027cf640>: Failed to establish a new connection: [Errno 8] Name does not resolve')))

NOTE: After startup of OPNsense, OPNsense diagnostic tools and client systems do not show any DNS problems. It appears only Unbound DNSBL has problems during boot time.

Manually restarting the Unbound service (e.g. with the restart service button on the Blocklist page) does not appear to trigger a download of the list (no messages like those listed above show up in the log).

If I disable the blocklist and apply, then enable the blocklist and apply, the download appears to be triggered:

Notice unbound blocklist parsing done in 0.58 seconds (7355 records)
Notice unbound blocklist: https://adaway.org/hosts.txt (exclude: 2 block: 7355)
Notice unbound blocklist download: 11782 total lines downloaded for https://adaway.org/hosts.txt
Notice unbound blocklist download : exclude domains matching ^(?![a-zA-Z_\d]).*|.*localhost$
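
For readers wondering what the exclude pattern in the last log line actually filters, here is a minimal sketch. The pattern is copied verbatim from the log; how OPNsense applies it is not shown in this thread, so matching from the start of the string with `re.match` is an assumption, and `is_excluded` is a hypothetical helper name.

```python
import re

# Pattern copied verbatim from the log line above.
EXCLUDE = re.compile(r'^(?![a-zA-Z_\d]).*|.*localhost$')


def is_excluded(domain: str) -> bool:
    """Return True when the pattern matches.

    Assumption: the pattern is matched from the start of the string.
    """
    return EXCLUDE.match(domain) is not None


# First alternative: anything that does not start with a letter, digit or
# underscore (including the empty string). Second alternative: anything
# that ends in "localhost".
for candidate in ("adserver.example", "localhost", "ip6-localhost", "-bad.example", ""):
    print(repr(candidate), is_excluded(candidate))
```

Under that assumption, ordinary host names pass through, while entries such as `localhost` are excluded, which would be consistent with the small "exclude: 2" count in the log.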

NOTE: Even though the data seems to be retrieved, it appears it is not active until I then restart the service* (e.g. restart service button on Blocklist page).

*It also seems that I need to repeat the disable/enable steps and the service restart an additional time before everything fully works. I am not sure whether one repeat is always enough, but going through the entire process only once usually does not get everything working.

To Reproduce

Steps to reproduce the behavior:

  1. Configure DNS-related items:
    Services->Unbound DNS->Blocklist - "AdAway List" selected and all other fields empty.
    Services->Unbound DNS->DNS over TLS - 2 IPv4 and 2 IPv6 servers defined. All 4 using port 853. (1.1.1.1 / 1.0.0.1 / 2606:4700:4700::1111 / 2606:4700:4700::1001)
    Services->Unbound DNS->General - DNSSEC support enabled.
    System->Settings->General - No DNS servers manually defined.
    System->Settings->General - Allow DNS server list to be overridden by DHCP/PPP on WAN is enabled.
  2. Reboot OPNsense
  3. Check Unbound log to see if it was able to successfully download the BL file.

Expected behavior

  • DNSBL file should download at boot without name resolution issues.
  • Manual start/restart of DNSBL service should trigger download of DNSBL file.

Describe alternatives you considered

  • Per discussions with AdSchellevis on issue 6514, it would appear removing /usr/local/etc/rc.syshook.d/start/85-dnsbl prevents the failed log entries (but I suspect that would mean that a new DNSBL file will not get downloaded and processed until a cron job to update is executed).
  • Tried renaming file to 90-unbounddnsbl but that did not help.

Additional context

Thoughts:

  • For bootup item (DNS resolution error), perhaps a service dependency needs to be made if the DNSBL download process is launching before DNS resolution services are fully up and running (if that is what is actually happening).
  • For the manual DNSBL service startup item, perhaps there are additional processes that could be restarted behind the scenes as part of manual service start/restart to trigger getting the URL to process the data.
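
The first thought above (waiting until name resolution works before starting the download) could be sketched roughly as follows. This is not how OPNsense does it; `wait_for_dns` and all of its parameters are hypothetical, and the resolver, sleep and clock are injectable only so the loop can be exercised without a network.

```python
import socket
import time


def wait_for_dns(host, timeout=60.0, interval=2.0,
                 resolve=socket.getaddrinfo, sleep=time.sleep,
                 clock=time.monotonic):
    """Poll until `host` resolves or `timeout` seconds have passed.

    Returns True as soon as resolution succeeds, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        try:
            resolve(host, 443)
            return True
        except OSError:  # socket.gaierror is a subclass of OSError
            if clock() + interval > deadline:
                return False
            sleep(interval)


# Hypothetical usage in a boot-time hook:
#   if wait_for_dns("adaway.org"):
#       ...start the blocklist download...
```

The timeout keeps the boot flow from blocking indefinitely when there is genuinely no connectivity.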

Environment

OPNsense 23.1.5_4-amd64 / FreeBSD 13.1-RELEASE-p7 / OpenSSL 1.1.1t 7 Feb 2023
VNOPN Micro Firewall Appliance with 4 Intel 2.5GbE Intel i225 NIC Ports
Intel N3700 Quad Core, Support AES-NI, 8GB DDR3

@AdSchellevis AdSchellevis self-assigned this Apr 25, 2023
@AdSchellevis AdSchellevis added the cleanup Low impact changes label Apr 25, 2023
@kulikov-a
Member

Hi,
one more thought: the 'requests' library doesn't retry by default. Adding a retry plan (like kulikov-a@c669765, ref. https://forum.opnsense.org/index.php?topic=32327.0) may help if this is only a matter of a brief timing overlap. But it may also make the blocklist download task run for a very long time if there is a real lack of connectivity.
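
The referenced commit is not reproduced in this thread, so the following is only a library-agnostic sketch of the retry idea, not the actual change. `fetch_with_retry` and the `fetch` callable are hypothetical; the attempt count is capped so the worst-case extra wait stays bounded, which addresses the concern about very long task runs.

```python
import time


def fetch_with_retry(fetch, uri, attempts=3, backoff=2.0, sleep=time.sleep):
    """Call fetch(uri) up to `attempts` times.

    Sleeps backoff * 2**n seconds after the n-th failed attempt
    (2s, then 4s with the defaults) and re-raises the last error
    once all attempts are used up.
    """
    last_error = None
    for n in range(attempts):
        try:
            return fetch(uri)
        except OSError as exc:  # adjust to the exception type of the real download call
            last_error = exc
            if n < attempts - 1:
                sleep(backoff * (2 ** n))
    raise last_error
```

With the defaults, a download that never succeeds adds at most 6 seconds of sleeping on top of the connection attempts themselves.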

@AdSchellevis
Member

@kulikov-a To be honest, I doubt the download should be in the boot flow at all, which in my opinion is the main cause of this issue. I'll need to discuss this internally; scheduling a download after a delay could also be an option.

@fichtner
Member

The previous design intentionally kept the pre-reboot setting in the staging area, also because of volatile /var MFS which is no longer present. Not sure when it was lost.

@fichtner fichtner added this to the 23.7 milestone Apr 30, 2023
@kulikov-a
Member

@AdSchellevis Hi!)

I'm doubting the download should be in the boot flow

Fair enough. But there is another thought: technically, this situation (problems with name resolution or the connection) can occur at any time, not only at boot. Is it generally correct to overwrite the cache file with empty data in such cases? Or would it be better to track _uri_reader exceptions/errors and leave the list untouched in some cases?

@AdSchellevis
Member

@kulikov-a well, it would likely be better to keep the previous situation when no files can be downloaded.
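
A minimal sketch of "keep the previous situation when nothing could be downloaded": only replace the blocklist file when the new result is non-empty, and do the replacement atomically. `update_blocklist` and its arguments are hypothetical and do not reflect the actual OPNsense code.

```python
import os
import tempfile


def update_blocklist(target_path, domains):
    """Replace target_path with `domains`, unless `domains` is empty.

    Returns True when the file was replaced, False when the previous
    file was left untouched.
    """
    domains = list(domains)
    if not domains:
        return False  # nothing downloaded: keep the previous list
    directory = os.path.dirname(os.path.abspath(target_path))
    # Write to a temporary file in the same directory, then swap it in,
    # so readers never see a half-written list.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("\n".join(domains) + "\n")
        os.replace(tmp_path, target_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return True
```

Whether an empty result should always be treated as a failure is a design choice; a user who deliberately deselects every list would need a separate path.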

@kulikov-a
Member

kulikov-a commented Apr 30, 2023

@AdSchellevis Got it, thanks! (So reusing expired cache files can be useful when there are network problems, but not right after a reboot, since the whole cache is gone by then. No universal solution comes to mind except moving the cache out of /tmp.)
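
The idea of reusing an expired cache file when the network is unavailable could look roughly like this. It is a sketch under assumptions, not the project's reader: `read_blocklist`, `download` and `cache_dir` are hypothetical names, the 72000-second TTL is taken from the "20 hours" comment in the source, and the file's mtime is used for the age check. Passing a `cache_dir` outside /tmp is what would let the fallback survive a reboot.

```python
import hashlib
import os
import time


def read_blocklist(uri, download, cache_dir, cache_ttl=72000, now=time.time):
    """Return (lines, source) for a blocklist URI.

    source is 'cache' (fresh cache hit), 'download' (fetched and cached),
    'stale-cache' (download failed, expired cache reused) or 'none'.
    """
    path = os.path.join(cache_dir, hashlib.md5(uri.encode()).hexdigest())
    has_cache = os.path.exists(path) and os.path.getsize(path) > 0
    if has_cache and now() - os.stat(path).st_mtime < cache_ttl:
        with open(path) as handle:
            return handle.read().splitlines(), "cache"
    try:
        lines = list(download(uri))
    except OSError:
        lines = []  # treat a failed download like an empty one
    if lines:
        os.makedirs(cache_dir, exist_ok=True)
        with open(path, "w") as handle:
            handle.write("\n".join(lines) + "\n")
        return lines, "download"
    if has_cache:
        # Expired, but better than replacing the list with nothing.
        with open(path) as handle:
            return handle.read().splitlines(), "stale-cache"
    return [], "none"
```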

@fichtner
Member

The cache is restored unless the user doesn’t want it. 😉

@kulikov-a
Member

not unbound cache - block-list content cache ;)

def _blocklist_reader(self, uri):
    """
    Decides whether a blocklist can be read from a cached file or
    needs to be downloaded. Yields (unformatted) domains either way
    """
    total_lines = 0
    from_cache = False
    h = hashlib.md5(uri.encode()).hexdigest()
    cache_loc = '/tmp/bl_cache/'
    if os.path.exists(cache_loc):
        filep = cache_loc + h
        if os.path.exists(filep) and os.path.getsize(filep) > 0:
            fstat = os.stat(filep).st_ctime
            if (time.time() - fstat) < self.cache_ttl: # 20 hours, a bit under the recommended cron time
                from_cache = True
                for line in open(filep):
                    total_lines += 1
                    yield line

@fichtner
Member

Ok, /tmp is obviously cleared on boot.

@AdSchellevis
Member

We discussed this internally. At the moment, the best option seems to be to remove the syshook that causes the download to run on boot, as it is only relevant after a reinstall or configuration import.

The downside might be that after an import, the user will need to download manually, but that's the case for most components at the moment and we don't have a hook for that.

I'll remove the file and close this in the next commit.

fichtner pushed a commit that referenced this issue May 8, 2023
…closes #6523

(cherry picked from commit 99438a8)
(cherry picked from commit 5852897)
@Chaskel
Author

Chaskel commented May 8, 2023

Thank you for the update. In case the following idea is of use: it may help to mention on the DNSBL configuration page that a cron job needs to be created manually. While that is covered in the online documentation, when I originally configured the options (using just the GUI as my initial guide), it was not immediately clear to me that this step was needed. While researching the problem described in this GitHub issue, it seemed to me that it wasn't always clear to others either.

I suspect this could apply to other items that may require a cron job, but thought I would mention it in case it helps.

Thank you again.
