
unbound blocklist download failures: error : HTTPSConnectionPool: Read timed out. #7371

Closed
CallMeR opened this issue Apr 10, 2024 · 5 comments

CallMeR commented Apr 10, 2024:


Describe the bug

  1. Unbound frequently logs blocklist download failures, even though the same files download fine in a browser on Windows.

  2. Unbound imports the contents of incompletely downloaded files into the DNSBL.

These list files may be relatively large, or they may be served from behind a CDN.

Either way, Unbound's timeout and retry strategy is somewhat aggressive (a timeout of only 5 seconds), and it still uses incompletely downloaded files, resulting in an incomplete list.
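For comparison, a fetch with a longer read timeout and automatic retries is straightforward with requests; the sketch below is illustrative only (the values and retry policy are assumptions, not OPNsense's actual settings):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Illustrative only: retry transient server errors with backoff instead of
# giving up after a single short timeout.
session = requests.Session()
retries = Retry(total=3, backoff_factor=2, status_forcelist=[502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))

# timeout=(connect, read); the read timeout is what trips on slow CDN responses
resp = session.get('https://anti-ad.net/domains.txt', stream=True, timeout=(10, 60))
resp.raise_for_status()
```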


To Reproduce

Steps to reproduce the behavior:

  1. Add the following custom list links in the Unbound DNSBL:
https://malware-filter.gitlab.io/malware-filter/urlhaus-filter-hosts.txt
https://anti-ad.net/domains.txt
https://neodev.team/lite_host
  2. Click the "Apply" button so Unbound starts downloading the target files.


Expected behavior

  1. Follow CDN redirects associated with the list download link.

  2. Wait for the file download to complete.

  3. Adjust the timeout and retry strategy.

  4. If the complete list still cannot be downloaded, abandon the current list update instead of importing an incomplete one (a rough sketch of this follows below).
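A minimal sketch of point 4 (names and values are assumptions, not project code): download into a temporary file and only replace the previous copy once the download has completed in full.

```python
import os
import tempfile
import requests

def fetch_list(uri: str, cache_path: str, timeout: float = 30.0) -> None:
    """Hypothetical helper: replace cache_path only after a complete download."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(cache_path) or '.')
    try:
        with os.fdopen(fd, 'wb') as outf:
            resp = requests.get(uri, stream=True, timeout=timeout)
            resp.raise_for_status()
            for chunk in resp.iter_content(chunk_size=65536):
                outf.write(chunk)
        os.replace(tmp, cache_path)  # atomic: a failed run leaves the old list intact
    except Exception:
        if os.path.exists(tmp):
            os.unlink(tmp)  # abandon the partial download
        raise
```

With this shape, a timeout mid-download raises an exception and the partial file is discarded, so the previously downloaded list keeps being served.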

Environment


OPNsense 24.1.5_3-amd64
FreeBSD 13.2-RELEASE-p11
OpenSSL 3.0.13

@CallMeR CallMeR changed the title Easy to encounter download failures when downloading custom DNSBL in Unbound unbound blocklist download : error reading file from https://neodev.team/lite_host (error : HTTPSConnectionPool(host='neodev.team', port=443): Read timed out.) Apr 12, 2024
@CallMeR CallMeR changed the title unbound blocklist download : error reading file from https://neodev.team/lite_host (error : HTTPSConnectionPool(host='neodev.team', port=443): Read timed out.) unbound blocklist download failures: error : HTTPSConnectionPool: Read timed out. Apr 12, 2024
AdSchellevis (Member) commented:
Changing the timeout is relatively easy (see the sketch below).

I don't think we should try to modify the streaming behavior of the script. When, for example, 80% of a file can be downloaded, it's usually better to process that than to ignore the file in full; these sets don't really represent a consistent state that is only valid when processed whole.
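As a rough illustration (the body below is a hypothetical sketch, not the actual patch), raising the timeout amounts to a single parameter on the streaming request:

```python
import requests

def _uri_reader(uri, timeout=30):
    # Hypothetical sketch of a streaming reader: the timeout argument is the
    # "relatively easy" knob mentioned above; everything else stays the same.
    req = requests.get(uri, stream=True, timeout=timeout)
    req.raise_for_status()
    for line in req.iter_lines():
        yield line.decode('utf-8', errors='replace').rstrip()
```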

CallMeR (Author) commented Apr 16, 2024:

> it's usually better to process it than ignore it in full

It does make sense in most scenarios, but if 90% of the list was downloaded the first time and only 10% this time, that 10% of valid data would overwrite the initial 90% of valid data.
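Purely as an illustration of this concern (not something the project does), a guard could compare sizes before letting a fresh download replace the cached copy:

```python
def should_replace(new_lines: int, cached_lines: int, min_ratio: float = 0.5) -> bool:
    # Hypothetical guard: refuse to overwrite the cache when the fresh
    # download is drastically smaller than the copy we already have.
    return cached_lines == 0 or new_lines >= cached_lines * min_ratio
```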

AdSchellevis (Member) commented:
True, I missed the caching part in the code here:

```python
if not from_cache:
    os.makedirs(cache_loc, exist_ok=True)
    # the cache file is written while streaming, so a download that times
    # out midway leaves a truncated copy behind for later runs to reuse
    with open(cache_loc + h, 'w') as outf:
        for line in self._uri_reader(uri):
            outf.write(line + '\n')
            total_lines += 1
            yield line
```

CallMeR (Author) commented Apr 16, 2024:

I just went to check my DNSBL list again, and it seems I've run into an even more serious download problem this time: my list is down to just over 1,000 entries.

I am currently updating the DNSBL every 8 hours through a scheduled task, but that doesn't seem to have solved the problem.

According to the code you quoted earlier, the cache TTL is set to 20 hours. That means I may have inadvertently lost almost a day's worth of data, causing leaks because those entries were not blocked.


@AdSchellevis AdSchellevis self-assigned this Apr 16, 2024
@AdSchellevis AdSchellevis added the cleanup Low impact changes label Apr 16, 2024
@AdSchellevis AdSchellevis added this to the 24.7 milestone Apr 16, 2024
AdSchellevis (Member) commented:
this 597b65a should improve the situation; it also increases the timeout a bit, but the main change is about process ordering (use the cached copy when the download fails).
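In outline, that reordering might look like the following sketch; download() and the file handling are hypothetical stand-ins, not the literal commit:

```python
def read_blocklist(uri, cache_path, timeout=30):
    # Sketch of "use cached when failed": try a fresh download first and only
    # persist it on success; on failure, fall back to the cached copy instead
    # of keeping a partial download around.
    try:
        lines = list(download(uri, timeout=timeout))  # hypothetical helper
    except Exception:
        with open(cache_path) as f:  # fall back to the last good copy
            lines = [line.rstrip('\n') for line in f]
    else:
        with open(cache_path, 'w') as f:  # commit only after full success
            f.writelines(line + '\n' for line in lines)
    return lines
```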
