Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unbound/logger.py crash #7049

Closed
2 tasks done
zt-luo opened this issue Dec 3, 2023 · 12 comments
Closed
2 tasks done

unbound/logger.py crash #7049

zt-luo opened this issue Dec 3, 2023 · 12 comments
Assignees
Labels
bug Production bug

Comments

@zt-luo
Copy link

zt-luo commented Dec 3, 2023

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug

A clear and concise description of what the bug is, including last known working version (if any).

Tip: to validate your setup was working with the previous version, use opnsense-revert (https://docs.opnsense.org/manual/opnsense_tools.html#opnsense-revert)

To Reproduce

Steps to reproduce the behavior:

  1. stop Unbound DNS service
  2. start Unbound DNS service
  3. about 4 hours after, unbound/logger.py start crash

Expected behavior

no crash

Describe alternatives you considered

no

Screenshots

no

Relevant log files

System log:
2023-12-03T16:47:46 | Notice | kernel | <6>pid 82433 (python3.9), jid 0, uid 0: exited on signal 6

Services: Unbound DNS: Log File

2023-12-03T16:48:12 Informational unbound [78837:0] info: dnsbl_module: no logging backend found.
2023-12-03T16:48:12 Informational unbound [78837:0] info: dnsbl_module: attempting to open pipe
2023-12-03T16:48:01 Informational unbound [78837:1] info: dnsbl_module: no logging backend found.
2023-12-03T16:48:01 Informational unbound [78837:1] info: dnsbl_module: attempting to open pipe
2023-12-03T16:47:46 Informational unbound [78837:6] info: dnsbl_module: no logging backend found.
2023-12-03T16:47:46 Informational unbound [78837:6] info: dnsbl_module: attempting to open pipe
2023-12-03T16:47:46 Informational unbound [78837:5] info: dnsbl_module: Logging backend closed connection. Closing pipe and continuing.

then i found unbound/logger.py crashing...
root@opn:~ # ps -A | grep "/usr/local/opnsense/scripts/unbound/logger.py"
8192 0 S+ 0:00.00 grep /usr/local/opnsense/scripts/unbound/logger.py
root@opn:~ # ps -A | grep "/usr/local/opnsense/scripts/unbound/logger.py"
9754 0 S+ 0:00.00 grep /usr/local/opnsense/scripts/unbound/logger.py
root@opn:~ # ps -A | grep "/usr/local/opnsense/scripts/unbound/logger.py"
11230 0 S+ 0:00.00 grep /usr/local/opnsense/scripts/unbound/logger.py
root@opn:~ # ps -A | grep "/usr/local/opnsense/scripts/unbound/logger.py"
12236 0 S+ 0:00.00 grep /usr/local/opnsense/scripts/unbound/logger.py
root@opn:~ # ps -A | grep "/usr/local/opnsense/scripts/unbound/logger.py"
14652 0 S+ 0:00.00 grep /usr/local/opnsense/scripts/unbound/logger.py
root@opn:~ # ps -A | grep "/usr/local/opnsense/scripts/unbound/logger.py"
16047 0 S+ 0:00.00 grep /usr/local/opnsense/scripts/unbound/logger.py
root@opn:~ # ps -A | grep "/usr/local/opnsense/scripts/unbound/logger.py"
17997 0 S+ 0:00.00 grep /usr/local/opnsense/scripts/unbound/logger.py
root@opn:~ # ps -A | grep "/usr/local/opnsense/scripts/unbound/logger.py"
18315 0 S+ 0:00.01 grep /usr/local/opnsense/scripts/unbound/logger.py
root@opn:~ # ps -A | grep "/usr/local/opnsense/scripts/unbound/logger.py"
20902 0 S+ 0:00.00 grep /usr/local/opnsense/scripts/unbound/logger.py
root@opn:~ # ps -A | grep "/usr/local/opnsense/scripts/unbound/logger.py"
21931 0 S+ 0:00.00 grep /usr/local/opnsense/scripts/unbound/logger.py
root@opn:~ # ps -A | grep "/usr/local/opnsense/scripts/unbound/logger.py"
22734 0 S+ 0:00.01 grep /usr/local/opnsense/scripts/unbound/logger.py
root@opn:~ # ps -A | grep "/usr/local/opnsense/scripts/unbound/logger.py"
23895 0 S+ 0:00.00 grep /usr/local/opnsense/scripts/unbound/logger.py
root@opn:~ # ps -A | grep "/usr/local/opnsense/scripts/unbound/logger.py"
25625 0 S+ 0:00.00 grep /usr/local/opnsense/scripts/unbound/logger.py
root@opn:~ # ps -A | grep "/usr/local/opnsense/scripts/unbound/logger.py"
25998 0 S+ 0:00.00 grep /usr/local/opnsense/scripts/unbound/logger.py
root@opn:~ # ps -A | grep "/usr/local/opnsense/scripts/unbound/logger.py"
27812 0 S+ 0:00.00 grep /usr/local/opnsense/scripts/unbound/logger.py
root@opn:~ # ps -A | grep "/usr/local/opnsense/scripts/unbound/logger.py"
28992 0 S+ 0:00.00 grep /usr/local/opnsense/scripts/unbound/logger.py

Additional context

no

Environment

Software version used and hardware type if relevant, e.g.:

OPNsense 23.7.9 (amd64).
Intel(R) Core(TM) i3-N305 (8 cores, 8 threads)

@AdSchellevis AdSchellevis added the support Community support label Dec 3, 2023
@AdSchellevis
Copy link
Member

intel microcode issues? the N families seem to be famous for strange behavior

@Karo7
Copy link

Karo7 commented Dec 10, 2023

I've the same issue with Celeron N5095
and we've this issue before on mutiple versions

@AdSchellevis
Copy link
Member

any idea when the issue started?

If logger.py is not running and crashes quite rapidly, you could try to run the logger in a shell to see if it produces any readable errors using the following command:

/usr/local/opnsense/scripts/unbound/logger.py

@zt-luo
Copy link
Author

zt-luo commented Dec 12, 2023

Screenshot_2023-12-12-12-50-28-506_com termux-edit

just abort

@zt-luo
Copy link
Author

zt-luo commented Dec 12, 2023

more information:
unbound/logger.py will never crash until click Reporting->unbound dns

@AdSchellevis AdSchellevis added bug Production bug and removed support Community support labels Dec 12, 2023
@AdSchellevis AdSchellevis self-assigned this Dec 12, 2023
@AdSchellevis
Copy link
Member

no clue what happened there, could be an issue with duckdb, it doesn't seem to produce a python exception which points to something more low level

@AdSchellevis
Copy link
Member

This might be caused by a broken database file, can you try to install 29377ce and export/import the database using the commands below?

opnsense-patch 29377cecc4db6b073567ab8abdb62e158a92bdce
/usr/local/etc/rc.syshook.d/upgrade/20-unbound-duckdb.py && /usr/local/opnsense/scripts/unbound/restore_db.py && pluginctl -s unbound restart

This likely also shrinks the db file (/var/unbound/data/unbound.duckdb ) significantly as well.

fichtner pushed a commit that referenced this issue Dec 18, 2023
…air. (ref #7049)

(cherry picked from commit 29377ce)
(cherry picked from commit c847fb5)
@zt-luo
Copy link
Author

zt-luo commented Dec 23, 2023

This might be caused by a broken database file, can you try to install 29377ce and export/import the database using the commands below?

opnsense-patch 29377cecc4db6b073567ab8abdb62e158a92bdce
/usr/local/etc/rc.syshook.d/upgrade/20-unbound-duckdb.py && /usr/local/opnsense/scripts/unbound/restore_db.py && pluginctl -s unbound restart

This likely also shrinks the db file (/var/unbound/data/unbound.duckdb ) significantly as well.

caused by a broken database file.
restore_db.py won't work.
after delete unbound.duckdb, unbound works back to normal.

@AdSchellevis
Copy link
Member

close issue then?

@Kornelius777
Copy link

Thank you very much!
I just ran into the same issue. (OPNsense 23.7.10_1-amd64 with unbound 1.19.0)
Deleting /var/unbound/data/unbound.duckdb solved it for me.
Kind regards!

@tvr256
Copy link

tvr256 commented Jan 4, 2024

I've also been seeing this since December. Deleting the database is a workaround, but ideally we should solve the root cause of the database corruption. It seems strange that everyone has started seeing it recently, has anything changed in the logging code?

@AdSchellevis
Copy link
Member

Instead of dropping the database, one could try to repair it as suggested earlier (#7049 (comment)). If a repair helps, that's something we might be able to automate at some point in time. Otherwise we can wait for the next duckdb version and see if that improves the situation..... Unfortunately we can only do that on major upgrades at the moment due to duckdb files not being binary compatible....

fichtner pushed a commit that referenced this issue Jan 15, 2024
PR: #7049

(cherry picked from commit 13408c8)
(cherry picked from commit 587375a)
fichtner pushed a commit that referenced this issue Mar 12, 2024
…nd always dump before system shutdown to be able to ship duckdb library upgrades on minor releases in the future.

In an attempt to keep the database clean and hopefully error prone, the logger will export/import now every ~24 hours. As these operations are rather quick it should help to minimize the footprint as well (duckdb files keep growing after deletes).

closes #7049
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Production bug
Development

No branches or pull requests

5 participants