FTL Crash issues? Read this thread first! #705

Closed
3 tasks done
WiredLife opened this issue Mar 3, 2020 · 89 comments

@WiredLife

WiredLife commented Mar 3, 2020

In raising this issue, I confirm the following (please check boxes, eg [X]) Failure to fill the template will close your issue:

  • I have read and understood the contributors guide.
  • The issue I am reporting can be replicated
  • The issue I am reporting isn't a duplicate

How familiar are you with the codebase?:

1


[BUG | ISSUE] Expected Behaviour:

[BUG | ISSUE] Actual Behaviour:
Pi-hole on my 2 completely different systems crashed at nearly the same time.

[BUG | ISSUE] Steps to reproduce:

Log file output [if available]
RPi4 Log:

[2020-03-04 00:18:39.277 4096] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2020-03-04 00:18:39.277 4096] ---------------------------->  FTL crashed!  <----------------------------
[2020-03-04 00:18:39.277 4096] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2020-03-04 00:18:39.277 4096] Please report a bug at https://github.com/pi-hole/FTL/issues
[2020-03-04 00:18:39.278 4096] and include in your report already the following details:
[2020-03-04 00:18:39.278 4096] FTL has been running for 106087 seconds
[2020-03-04 00:18:39.278 4096] FTL branch: master
[2020-03-04 00:18:39.278 4096] FTL version: v4.3.1
[2020-03-04 00:18:39.278 4096] FTL commit: b60d63f
[2020-03-04 00:18:39.278 4096] FTL date: 2019-05-25 21:37:26 +0200
[2020-03-04 00:18:39.278 4096] FTL user: started as pihole, ended as pihole
[2020-03-04 00:18:39.278 4096] Received signal: Segmentation fault
[2020-03-04 00:18:39.278 4096]      at address: 0
[2020-03-04 00:18:39.278 4096]      with code: SEGV_MAPERR (Address not mapped to object)
[2020-03-04 00:18:39.279 4096] Backtrace:
[2020-03-04 00:18:39.279 4096] B[0000]: /usr/bin/pihole-FTL(+0x1a25c) [0x47125c]
[2020-03-04 00:18:39.279 4096] B[0001]: /lib/arm-linux-gnueabihf/libc.so.6(__default_rt_sa_restorer+0) [0xb6d4c130]
[2020-03-04 00:18:39.279 4096] B[0002]: /usr/bin/pihole-FTL(+0x32798) [0x489798]
[2020-03-04 00:18:39.279 4096] B[0003]: /usr/bin/pihole-FTL(receive_query+0x5d1) [0x48a4ce]
[2020-03-04 00:18:39.279 4096] B[0004]: /usr/bin/pihole-FTL(+0x40ed6) [0x497ed6]
[2020-03-04 00:18:39.279 4096] B[0005]: /usr/bin/pihole-FTL(main_dnsmasq+0xa3f) [0x49913c]
[2020-03-04 00:18:39.279 4096] B[0006]: /usr/bin/pihole-FTL(main+0x87) [0x46fe18]
[2020-03-04 00:18:39.279 4096] B[0007]: /lib/arm-linux-gnueabihf/libc.so.6(__libc_start_main+0x10c) [0xb6d36718]
[2020-03-04 00:18:39.279 4096] Thank you for helping us to improve our FTL engine!
[2020-03-04 00:18:39.279 4096] FTL terminated!

PC Log:

[2020-03-04 00:29:13.585 22510] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2020-03-04 00:29:13.585 22510] ---------------------------->  FTL crashed!  <----------------------------
[2020-03-04 00:29:13.585 22510] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2020-03-04 00:29:13.585 22510] Please report a bug at https://github.com/pi-hole/FTL/issues
[2020-03-04 00:29:13.585 22510] and include in your report already the following details:
[2020-03-04 00:29:13.585 22510] FTL has been running for 104188 seconds
[2020-03-04 00:29:13.585 22510] FTL branch: master
[2020-03-04 00:29:13.585 22510] FTL version: v4.3.1
[2020-03-04 00:29:13.585 22510] FTL commit: b60d63f
[2020-03-04 00:29:13.585 22510] FTL date: 2019-05-25 21:37:26 +0200
[2020-03-04 00:29:13.585 22510] FTL user: started as pihole, ended as pihole
[2020-03-04 00:29:13.585 22510] Received signal: Segmentation fault
[2020-03-04 00:29:13.585 22510]      at address: 0
[2020-03-04 00:29:13.585 22510]      with code: SEGV_MAPERR (Address not mapped to object)
[2020-03-04 00:29:13.586 22510] Backtrace:
[2020-03-04 00:29:13.586 22510] B[0000]: /usr/bin/pihole-FTL(+0x255e5) [0x55d1fce755e5]
[2020-03-04 00:29:13.586 22510] B[0001]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890) [0x7fd2fa55c890]
[2020-03-04 00:29:13.586 22510] B[0002]: /usr/bin/pihole-FTL(+0x47a9a) [0x55d1fce97a9a]
[2020-03-04 00:29:13.586 22510] B[0003]: /usr/bin/pihole-FTL(receive_query+0x905) [0x55d1fce98e05]
[2020-03-04 00:29:13.586 22510] B[0004]: /usr/bin/pihole-FTL(+0x5db5b) [0x55d1fceadb5b]
[2020-03-04 00:29:13.586 22510] B[0005]: /usr/bin/pihole-FTL(main_dnsmasq+0xfdc) [0x55d1fceaf67c]
[2020-03-04 00:29:13.586 22510] B[0006]: /usr/bin/pihole-FTL(main+0xbc) [0x55d1fce73acc]
[2020-03-04 00:29:13.586 22510] B[0007]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7fd2fa17ab97]
[2020-03-04 00:29:13.586 22510] B[0008]: /usr/bin/pihole-FTL(_start+0x2a) [0x55d1fce73bfa]
[2020-03-04 00:29:13.586 22510] Thank you for helping us to improve our FTL engine!
[2020-03-04 00:29:13.586 22510] FTL terminated!

Device specifics

Hardware Type: RPi4 4GB and a PC
OS: newest Raspbian on RPi4 and Ubuntu Server on PC

This template was created based on the work of udemy-dl.
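
For triaging reports like the two logs above, a minimal Python sketch can pull the key fields out of an FTL crash report. It is not part of Pi-hole: the function name `parse_crash_report` and the returned keys are made up for illustration, and the `[timestamp pid] key: value` layout is assumed from the logs in this thread rather than taken from FTL's source.

```python
import re

# Matches lines like "[2020-03-04 00:18:39.278 4096] FTL version: v4.3.1".
# The layout is assumed from the logs in this thread, not from FTL's source.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<body>.*)$")


def parse_crash_report(text):
    """Return a dict of selected fields found in an FTL crash report."""
    # Log key -> name used in the returned dict (names are hypothetical).
    wanted = {
        "FTL version": "version",
        "FTL commit": "commit",
        "Received signal": "signal",
        "at address": "address",
        "with code": "code",
    }
    fields = {}
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue
        body = match.group("body").strip()
        key, sep, value = body.partition(":")
        if sep and key.strip() in wanted:
            fields[wanted[key.strip()]] = value.strip()
    return fields


sample = """\
[2020-03-04 00:29:13.585 22510] FTL version: v4.3.1
[2020-03-04 00:29:13.585 22510] FTL commit: b60d63f
[2020-03-04 00:29:13.585 22510] Received signal: Segmentation fault
[2020-03-04 00:29:13.585 22510]      at address: 0
[2020-03-04 00:29:13.585 22510]      with code: SEGV_MAPERR (Address not mapped to object)
"""
report = parse_crash_report(sample)
print(report)
```

Comparing the resulting dicts from two machines makes it quick to see that both ran the same version and commit and hit the same signal at address 0.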

@rtgibbons

rtgibbons commented Mar 4, 2020

Just had my crash at 2020-03-03 16:40:31.791 CST

Working to get GDB but nothing is cooperating right now

@WiredLife
Author

WiredLife commented Mar 4, 2020

GNU gdb (Raspbian 8.2.1-2) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 666
[New LWP 667]
[New LWP 668]
[New LWP 669]
[New LWP 670]
[New LWP 671]
[New LWP 672]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
__GI___poll (timeout=-1, nfds=6, fds=0xb1fab0) at ../sysdeps/unix/sysv/linux/poll.c:29
29      ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) handle SIGHUP nostop SIGPIPE nostop
Signal        Stop      Print   Pass to program Description
SIGHUP        No        Yes     Yes             Hangup
SIGPIPE       No        Yes     Yes             Broken pipe
(gdb) continue
Continuing.

Thread 1 "pihole-FTL" received signal SIGSEGV, Segmentation fault.
0x004ab798 in forward_query (udpfd=4, udpaddr=0x2, udpaddr@entry=0xbedf99d8, dst_addr=0xbedf99d8, dst_addr@entry=0xbedf9a08, dst_iface=3202324856, dst_iface@entry=2, header=<optimized out>, header@entry=0xb1be48, plen=47,
    plen@entry=30, now=<optimized out>, now@entry=4096, forward=0x1b854d0, ad_reqd=ad_reqd@entry=0, do_bit=0, do_bit@entry=11648584) at dnsmasq/forward.c:313
313     dnsmasq/forward.c: No such file or directory.
(gdb) backtrace
#0  0x004ab798 in forward_query (udpfd=4, udpaddr=0x2, udpaddr@entry=0xbedf99d8, dst_addr=0xbedf99d8, dst_addr@entry=0xbedf9a08, dst_iface=3202324856, dst_iface@entry=2, header=<optimized out>, header@entry=0xb1be48, plen=47,
    plen@entry=30, now=<optimized out>, now@entry=4096, forward=0x1b854d0, ad_reqd=ad_reqd@entry=0, do_bit=0, do_bit@entry=11648584) at dnsmasq/forward.c:313
#1  0x004ac4ce in receive_query (listen=listen@entry=0xb532c0, now=4096, now@entry=1583279886) at dnsmasq/forward.c:1641
#2  0x004b9ed6 in check_dns_listeners (now=now@entry=1583279886) at dnsmasq/dnsmasq.c:1657
#3  0x004bb13c in main_dnsmasq (argc=<optimized out>, argv=<optimized out>) at dnsmasq/dnsmasq.c:1108
#4  0x00491e18 in main (argc=<optimized out>, argv=<optimized out>) at main.c:71
(gdb) continue
Continuing.

Thread 1 "pihole-FTL" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0xb6daf230 in __GI_abort () at abort.c:79
#2  0x004932ba in SIGSEGV_handler (sig=<optimized out>, si=<optimized out>, unused=<optimized out>) at signals.c:66
#3  <signal handler called>
#4  0x004ab798 in forward_query (udpfd=4, udpaddr=0x2, udpaddr@entry=0xbedf99d8, dst_addr=0xbedf99d8, dst_addr@entry=0xbedf9a08, dst_iface=3202324856, dst_iface@entry=2, header=<optimized out>, header@entry=0xb1be48, plen=47,
    plen@entry=30, now=<optimized out>, now@entry=4096, forward=0x1b854d0, ad_reqd=ad_reqd@entry=0, do_bit=0, do_bit@entry=11648584) at dnsmasq/forward.c:313
#5  0x004ac4ce in receive_query (listen=listen@entry=0xb532c0, now=4096, now@entry=1583279886) at dnsmasq/forward.c:1641
#6  0x004b9ed6 in check_dns_listeners (now=now@entry=1583279886) at dnsmasq/dnsmasq.c:1657
#7  0x004bb13c in main_dnsmasq (argc=<optimized out>, argv=<optimized out>) at dnsmasq/dnsmasq.c:1108
#8  0x00491e18 in main (argc=<optimized out>, argv=<optimized out>) at main.c:71
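
The unnamed frame `B[0002]` (`+0x32798`) in the RPi4 crash log and frame #0 in this gdb session (`0x004ab798`) plausibly point at the same instruction. A rough way to check, sketched in Python below, is to subtract the module offset from each runtime address and see whether both results are page-aligned. The 4 KiB page size and the reuse of `+0x32798` for the gdb frame are assumptions, and the helper name is made up.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB pages


def load_base(runtime_address, module_offset):
    """Return the address the binary would have been loaded at."""
    return runtime_address - module_offset


# (runtime address, offset inside pihole-FTL). The offset +0x32798 comes from
# frame B[0002] of the RPi4 crash log and is assumed to apply to gdb's frame #0.
frames = {
    "crash log B[0002]": (0x489798, 0x32798),
    "gdb frame #0": (0x004AB798, 0x32798),
}

for label, (address, offset) in frames.items():
    base = load_base(address, offset)
    aligned = base % PAGE_SIZE == 0
    print(f"{label}: base={base:#x} page-aligned={aligned}")
```

Both bases come out page-aligned, which is consistent with `+0x32798` being the `forward_query` crash site at `dnsmasq/forward.c:313`. The bases differ between the two runs, as would be expected for a position-independent binary with address randomization enabled.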

@dschaper
Member

dschaper commented Mar 4, 2020

Just had my crash at FTL date: 2019-05-25 21:37:26 +0200

That's the date the binary was compiled.

@WiredLife
Author

WiredLife commented Mar 4, 2020

Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 1000
[New LWP 1001]
[New LWP 1002]
[New LWP 1003]
[New LWP 1004]
[New LWP 1005]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fa1aa8a8bf9 in __GI___poll (fds=0x55844a51a500, nfds=4, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29      ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) handle SIGHUP nostop SIGPIPE nostop
Signal        Stop      Print   Pass to program Description
SIGHUP        No        Yes     Yes             Hangup
SIGPIPE       No        Yes     Yes             Broken pipe
(gdb) continue
Continuing.

Thread 1 "pihole-FTL" received signal SIGSEGV, Segmentation fault.
0x000055844839fa9a in forward_query (udpfd=4, udpaddr=udpaddr@entry=0x7ffd19b78510, dst_addr=dst_addr@entry=0x7ffd19b784f0, dst_iface=dst_iface@entry=2, header=header@entry=0x55844a514e40, plen=44, plen@entry=43, now=1583280653,
    forward=0x55844a51c310, ad_reqd=0, do_bit=0) at dnsmasq/forward.c:313
313     dnsmasq/forward.c: No such file or directory.
(gdb) backtrace
#0  0x000055844839fa9a in forward_query (udpfd=4, udpaddr=udpaddr@entry=0x7ffd19b78510, dst_addr=dst_addr@entry=0x7ffd19b784f0, dst_iface=dst_iface@entry=2, header=header@entry=0x55844a514e40, plen=44, plen@entry=43, now=1583280653,
    forward=0x55844a51c310, ad_reqd=0, do_bit=0) at dnsmasq/forward.c:313
#1  0x00005584483a0e05 in receive_query (listen=listen@entry=0x55844a52a720, now=now@entry=1583280653) at dnsmasq/forward.c:1641
#2  0x00005584483b5b5b in check_dns_listeners (now=now@entry=1583280653) at dnsmasq/dnsmasq.c:1657
#3  0x00005584483b767c in main_dnsmasq (argc=<optimized out>, argv=<optimized out>) at dnsmasq/dnsmasq.c:1108
#4  0x000055844837bacc in main (argc=1, argv=0x7ffd19b78908) at main.c:71
(gdb) continue
Continuing.

Thread 1 "pihole-FTL" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fa1aa7d4801 in __GI_abort () at abort.c:79
#2  0x000055844837d675 in SIGSEGV_handler (sig=<optimized out>, si=<optimized out>, unused=<optimized out>) at signals.c:66
#3  <signal handler called>
#4  0x000055844839fa9a in forward_query (udpfd=4, udpaddr=udpaddr@entry=0x7ffd19b78510, dst_addr=dst_addr@entry=0x7ffd19b784f0, dst_iface=dst_iface@entry=2, header=header@entry=0x55844a514e40, plen=44, plen@entry=43, now=1583280653,
    forward=0x55844a51c310, ad_reqd=0, do_bit=0) at dnsmasq/forward.c:313
#5  0x00005584483a0e05 in receive_query (listen=listen@entry=0x55844a52a720, now=now@entry=1583280653) at dnsmasq/forward.c:1641
#6  0x00005584483b5b5b in check_dns_listeners (now=now@entry=1583280653) at dnsmasq/dnsmasq.c:1657
#7  0x00005584483b767c in main_dnsmasq (argc=<optimized out>, argv=<optimized out>) at dnsmasq/dnsmasq.c:1108
#8  0x000055844837bacc in main (argc=1, argv=0x7ffd19b78908) at main.c:71
(gdb) continue
Continuing.
Couldn't get registers: No such process.
Couldn't get registers: No such process.
(gdb) [Thread 0x7fa1a7ed7700 (LWP 1005) exited]
[Thread 0x7fa1a86d8700 (LWP 1004) exited]
[Thread 0x7fa1a8ed9700 (LWP 1003) exited]
[Thread 0x7fa1a96da700 (LWP 1002) exited]
[Thread 0x7fa1a9edb700 (LWP 1001) exited]

Program terminated with signal SIGABRT, Aborted.
The program no longer exists.

@antila

antila commented Mar 4, 2020

I also have the same issue. dnsmasq/forward.c:313
It started an hour ago or so, and it crashes FTL after the first lookup completes.

Thread 1 "pihole-FTL" received signal SIGSEGV, Segmentation fault.
0x004ef798 in forward_query (udpfd=4, udpaddr=0x2, udpaddr@entry=0x7e8219f0, dst_addr=0x7e8219f0, dst_addr@entry=0x7e821a20, dst_iface=2122455440, dst_iface@entry=2,
    header=<optimized out>, header@entry=0xa86470, plen=42, plen@entry=31, now=<optimized out>, now@entry=4096, forward=0xa8d1a0, ad_reqd=ad_reqd@entry=0, do_bit=0,
    do_bit@entry=11035760) at dnsmasq/forward.c:313
313     dnsmasq/forward.c: No such file or directory.
(gdb) p forward
$1 = (struct frec *) 0xa8d1a0
(gdb) p forward[0]
$2 = {source = {sa = {sa_family = 2, sa_data = "\nw\300\250\001\001\000\000\000\000\000\000\000"}, in = {sin_family = 2, sin_port = 30474, sin_addr = {s_addr = 16885952},
      sin_zero = "\000\000\000\000\000\000\000"}, in6 = {sin6_family = 2, sin6_port = 30474, sin6_flowinfo = 16885952, sin6_addr = {__in6_u = {
          __u6_addr8 = "\000\000\000\000\000\000\000\000\060Ө\000\000\000\000", __u6_addr16 = {0, 0, 0, 0, 54064, 168, 0, 0}, __u6_addr32 = {0, 0, 11064112, 0}}},
      sin6_scope_id = 0}}, dest = {addr = {addr4 = {s_addr = 2265032896}, addr6 = {__in6_u = {__u6_addr8 = "\300\250\001\207LӨ\000\002\000\000\000\000\000\000", __u6_addr16 = {
            43200, 34561, 54092, 168, 2, 0, 0, 0}, __u6_addr32 = {2265032896, 11064140, 2, 0}}}, log = {keytag = 43200, algo = 34561, digest = 54092}, rcode = {rcode = 2265032896},
      dnssec = {class = 43200, type = 34561}}}, sentto = 0x0, rfd4 = 0x0, rfd6 = 0x0, iface = 2, orig_id = 9638, new_id = 49468, log_id = 144, fd = 4, forwardall = 1, flags = 256,
  time = 1583280319, hash = {0x23ff40cb <error: Cannot access memory at address 0x23ff40cb>, 0x6e77a7d <error: Cannot access memory at address 0x6e77a7d>,
    0x605b5f4b <error: Cannot access memory at address 0x605b5f4b>, 0xede55c47 <error: Cannot access memory at address 0xede55c47>,
    0x5b3a2cd1 <error: Cannot access memory at address 0x5b3a2cd1>, 0x0 <repeats 15 times>}, class = 1, work_counter = 49, stash = 0x0, stash_len = 42, dependent = 0x0,
  blocking_query = 0x0, next = 0xa8d0d8}
(gdb) p forward->sentto
$3 = (struct server *) 0x0
(gdb) p forward->sentto[0]
Cannot access memory at address 0x0
(gdb) p forward->sentto->addr
Cannot access memory at address 0x0
(gdb) p forward->sentto->addr.sa
Cannot access memory at address 0x0
(gdb) p forward->sentto->addr.sa.sa_family
Cannot access memory at address 0x0
[2020-03-04 00:09:29.869 5079] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2020-03-04 00:09:29.869 5079] ---------------------------->  FTL crashed!  <----------------------------
[2020-03-04 00:09:29.869 5079] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2020-03-04 00:09:29.870 5079] Please report a bug at https://github.com/pi-hole/FTL/issues
[2020-03-04 00:09:29.870 5079] and include in your report already the following details:
[2020-03-04 00:09:29.870 5079] FTL has been running for 275 seconds
[2020-03-04 00:09:29.870 5079] FTL branch: master
[2020-03-04 00:09:29.870 5079] FTL version: v4.3.1
[2020-03-04 00:09:29.870 5079] FTL commit: b60d63f
[2020-03-04 00:09:29.870 5079] FTL date: 2019-05-25 21:37:26 +0200
[2020-03-04 00:09:29.871 5079] FTL user: started as pihole, ended as pihole
[2020-03-04 00:09:29.871 5079] Received signal: Segmentation fault
[2020-03-04 00:09:29.873 5079]      at address: 0
[2020-03-04 00:09:29.874 5079]      with code: SEGV_MAPERR (Address not mapped to object)
[2020-03-04 00:09:29.917 5079] Backtrace:
[2020-03-04 00:09:29.917 5079] B[0000]: /usr/bin/pihole-FTL(+0x1a25c) [0x4d725c]
[2020-03-04 00:09:29.918 5079] B[0001]: /lib/arm-linux-gnueabihf/libc.so.6(__default_rt_sa_restorer+0) [0x76dec6c0]
[2020-03-04 00:09:29.918 5079] B[0002]: /usr/bin/pihole-FTL(+0x32798) [0x4ef798]
[2020-03-04 00:09:29.918 5079] B[0003]: /usr/bin/pihole-FTL(receive_query+0x5d1) [0x4f04ce]
[2020-03-04 00:09:29.918 5079] B[0004]: /usr/bin/pihole-FTL(+0x40ed6) [0x4fded6]
[2020-03-04 00:09:29.918 5079] B[0005]: /usr/bin/pihole-FTL(main_dnsmasq+0xa3f) [0x4ff13c]
[2020-03-04 00:09:29.918 5079] B[0006]: /usr/bin/pihole-FTL(main+0x87) [0x4d5e18]
[2020-03-04 00:09:29.919 5079] B[0007]: /lib/arm-linux-gnueabihf/libc.so.6(__libc_start_main+0x114) [0x76dd6678]
[2020-03-04 00:09:29.919 5079] Thank you for helping us to improve our FTL engine!
[2020-03-04 00:09:29.920 5079] FTL terminated!
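
To make the `forward[0]` dump above easier to read, here is a small Python sketch that decodes the raw integers gdb prints for `sin_addr.s_addr`, `sin_port`, and `dest.addr.addr4.s_addr`. It assumes a little-endian host (as on the Raspberry Pi in this report), so the network-order bytes show up byte-swapped in gdb's integers. The helper names are made up for illustration.

```python
import socket
import struct


def decode_s_addr(value):
    """Convert an s_addr integer, as gdb prints it on a little-endian host, to dotted-quad form."""
    return socket.inet_ntoa(struct.pack("<I", value))


def decode_sin_port(value):
    """Convert a sin_port integer, as gdb prints it on a little-endian host, to a host-order port."""
    return struct.unpack(">H", struct.pack("<H", value))[0]


# Values copied from the gdb dump of forward[0] above.
print(decode_s_addr(16885952))    # source.in.sin_addr -> 192.168.1.1
print(decode_sin_port(30474))     # source.in.sin_port -> 2679
print(decode_s_addr(2265032896))  # dest.addr.addr4    -> 192.168.1.135
```

The addresses agree with the `sa_data` and `__u6_addr8` byte strings in the same dump (`\300\250\001\001` and `\300\250\001\207`). The field that matters for the crash is `sentto = 0x0`: every attempt to read through it fails with `Cannot access memory at address 0x0`, matching the segfault at address 0.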

@rtgibbons

via https://docs.pi-hole.net/ftldns/debugging/

Thread 1 "pihole-FTL" received signal SIGSEGV, Segmentation fault.
0x000055f95a111a9a in forward_query (udpfd=4, udpaddr=udpaddr@entry=0x7ffe6502a020, dst_addr=dst_addr@entry=0x7ffe6502a000, dst_iface=dst_iface@entry=0, header=header@entry=0x55f95b57d2f0, plen=39, 
    plen@entry=33, now=1583281752, forward=0x55f95b582220, ad_reqd=0, do_bit=0) at dnsmasq/forward.c:313
313     dnsmasq/forward.c: No such file or directory.
(gdb) backtrace
#0  0x000055f95a111a9a in forward_query (udpfd=4, udpaddr=udpaddr@entry=0x7ffe6502a020, dst_addr=dst_addr@entry=0x7ffe6502a000, dst_iface=dst_iface@entry=0, header=header@entry=0x55f95b57d2f0, plen=39, 
    plen@entry=33, now=1583281752, forward=0x55f95b582220, ad_reqd=0, do_bit=0) at dnsmasq/forward.c:313
#1  0x000055f95a112e05 in receive_query (listen=listen@entry=0x55f95b588030, now=now@entry=1583281752) at dnsmasq/forward.c:1641
#2  0x000055f95a127b5b in check_dns_listeners (now=now@entry=1583281752) at dnsmasq/dnsmasq.c:1657
#3  0x000055f95a12967c in main_dnsmasq (argc=<optimized out>, argv=<optimized out>) at dnsmasq/dnsmasq.c:1108
#4  0x000055f95a0edacc in main (argc=1, argv=0x7ffe6502a418) at main.c:71

@nullify005

I was facing this as well this morning, and disabling DNSSEC in the options appears to get FTL back to stable running (for the last 30 minutes at least).

@sylveon

sylveon commented Mar 4, 2020

FTL also started crashing here an hour or two ago, after no changes to the setup in months.

@WiredLife
Author

WiredLife commented Mar 4, 2020

Yep, I can confirm it must be something with DNSSEC.
If I open http://en.conn.internet.nl/connection/ it crashes constantly.
When I disable DNSSEC, it runs.

@sylveon

sylveon commented Mar 4, 2020

Disabling DNSSEC also fixed the constant crashing

@dschaper
Member

dschaper commented Mar 4, 2020

Is anyone using CloudFlare as the upstream service? And is DNSSEC enabled or disabled?

@BPitts2

BPitts2 commented Mar 4, 2020

Ah ha! Yes, I am using Cloudflare as upstream with DNSSEC.

@WiredLife
Author

Yep, using Cloudflare through stubby because of DoT, with DNSSEC enabled.

@dschaper
Member

dschaper commented Mar 4, 2020

OP on reddit mentions Cloudflare:
https://www.reddit.com/r/pihole/comments/fd44dc/ftl_crashing/

@sylveon

sylveon commented Mar 4, 2020

I'm using stubby to forward queries to Cloudflare using DNS over TLS w/ DNSSEC

@kevindelaney

Exact same issue for me - using cloudflared. Disabling DNSSEC has me back up for now.

@WiredLife
Author

WiredLife commented Mar 4, 2020

I will try changing the server in stubby and report back whether it runs with another provider :D

@jdrch

jdrch commented Mar 4, 2020

Is anyone using CloudFlare as the upstream service? And is DNSSEC enabled or disabled?

CloudFlare + DNSSEC via dnscrypt-proxy.

@WiredLife
Author

WiredLife commented Mar 4, 2020

With the Google DNS servers it runs fine.

@sylveon sylveon mentioned this issue Mar 4, 2020
@jdrch

jdrch commented Mar 4, 2020

Created a CloudFlare Support ticket: 1842464 (not sure if that's publicly accessible, but feel free to reference it)

@jdrch

jdrch commented Mar 4, 2020

CloudFlare Community thread created.

@WiredLife
Author

WiredLife commented Mar 4, 2020

maybe they're trying to exploit some vulnerabilities? 😄

@dschaper
Member

dschaper commented Mar 4, 2020

Meanwhile https://docs.pi-hole.net/guides/unbound/ is something to consider. Unbound as a local upstream and no longer depending on a service for upstreams.

@pralor-bot

This issue has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/lost-connection-to-api/29062/2

@jdrch

jdrch commented Mar 4, 2020

Which other DNS services besides CloudFlare support DNSCrypt and DNSSEC?

Meanwhile https://docs.pi-hole.net/guides/unbound/ is something to consider. Unbound as a local upstream and no longer depending on a service for upstreams.

I'm having the issue using dnscrypt-proxy, which I believe fulfills the same local upstream DNS functionality as unbound.

@MattLParker

OP here, from https://www.reddit.com/r/pihole/comments/fd44dc/ftl_crashing/
I can finally start doing some real digging on root cause as well.

@sylveon

sylveon commented Mar 4, 2020

@jdrch No. dnscrypt-proxy is a forwarding DNS resolver: it just forwards the request to 1.1.1.1. Meanwhile, unbound is a recursive DNS server: it provides the same service as 1.1.1.1 itself, meaning that it contacts the root DNS nameservers for the TLD nameserver, which it asks for the domain nameserver, and so on until it gets the actual IP.

DNS over TLS and DNS over HTTPS are not used between recursive and authoritative DNS servers; only DNSSEC is used (if available), so encryption-wise you get the same result as contacting 1.1.1.1. In fact, the RFC for DoT only mentions authoritative twice.
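
The difference between forwarding and recursing can be sketched with a toy model. The Python below is only an illustration: the delegation table, the server labels, and the address 192.0.2.10 are invented, and no real DNS traffic is sent. A recursive resolver walks referrals from the root down, while a forwarder hands the whole question to a single upstream.

```python
# Toy delegation data (all hypothetical): each "server" either refers the
# resolver to a more specific server or returns the final answer.
SERVERS = {
    "root": {"referral": {"net.": "tld-net"}},
    "tld-net": {"referral": {"example.net.": "ns-example"}},
    "ns-example": {"answer": {"www.example.net.": "192.0.2.10"}},
}


def recursive_resolve(name, start="root"):
    """Follow referrals from the root down, as a recursive resolver does."""
    server = start
    path = [server]
    while True:
        data = SERVERS[server]
        if name in data.get("answer", {}):
            return data["answer"][name], path
        # Pick a referral whose zone is the name itself or a parent of it.
        next_server = None
        for zone, target in data.get("referral", {}).items():
            if name == zone or name.endswith("." + zone):
                next_server = target
        if next_server is None:
            raise LookupError(f"no delegation for {name} at {server}")
        server = next_server
        path.append(server)


def forward_resolve(name, upstream=recursive_resolve):
    """A forwarder does no walking itself; it relays the query to one upstream."""
    address, _ = upstream(name)
    return address


address, path = recursive_resolve("www.example.net.")
print(address, path)
print(forward_resolve("www.example.net."))
```

In this model, dnscrypt-proxy or stubby plays the role of `forward_resolve`, and unbound (like 1.1.1.1 itself) plays the role of `recursive_resolve`.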

@jdrch

jdrch commented Mar 4, 2020

@sylveon TIL, thanks. The performance hit does seem to be a significant drawback, though, which is bad for gaming ping times, among other things.

@vavrusa

vavrusa commented Mar 5, 2020

@madpsy thanks, that's super helpful! I'll see if I can reproduce and find a workaround until the fixed dnsmasq package is released.

@vavrusa

vavrusa commented Mar 5, 2020

@madpsy any chance you could try now and see if you still experience crashes?

@madpsy

madpsy commented Mar 5, 2020

@vavrusa Looking good!

@TC1977

TC1977 commented Mar 5, 2020

@DL6ER I just reviewed this issue after getting tagged by you, and checked out my setup again.

Shortly after I closed #645, I actually went back to my entire original config that was causing the crashes - DNSSEC, dnscrypt-proxy, using Cloudflare alternating with ventricle.us and even doh-crypto-sx (the originally problematic server) on occasion, and have had absolutely no problems over the last couple of days. The one thing that's different is that I switched to a new ISP, which doesn't have IPv6 - so I removed ::1 from the list of custom DNS servers in the Pi-hole settings.

  • pihole -d gives no issues (other than IPv6 not working),
  • journalctl -u dnscrypt-proxy shows no recent problems,
  • grep "RESPONSE_ERROR" /var/log/dnscrypt-proxy/query.log gives no response errors,
  • Pi-hole logs over the last couple of days don't show any unusual drop in queries.

So perhaps the error is now being triggered by Cloudflare in some different way (IPv6?), that isn't affecting me. Anyway, hope the work you did way back when is paying off for people now.

@jedisct1

jedisct1 commented Mar 5, 2020

The common thing between cloudflare, ventricle.us and doh-crypto-sx is that they support padding.

@jedisct1

jedisct1 commented Mar 5, 2020

Also, not only do they support padding, they also return padded responses for queries over DoH even if the query wasn't padded (some people may qualify that behavior as "not right", but it can only improve security).

@vavrusa

vavrusa commented Mar 5, 2020

@jedisct1 I assumed it was padding as well, as that's the only difference, but couldn't reproduce it. It looks like dnsmasq crashes upon receiving a REFUSED response (I've managed to reproduce that). I did some digging, and some portion of frequent DNSKEY queries could have been throttled as part of abuse traffic in some PoPs for the last few days, particularly if it's coming from shared prefixes. I've added an exception, so this shouldn't be happening anymore, so it'd be great if more people could confirm.
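
For readers unfamiliar with the response code mentioned here: REFUSED is RCODE 5 in the DNS header (RFC 1035), stored in the low four bits of the 16-bit flags field. The Python sketch below builds a bare 12-byte header and reads the RCODE back. The helper names are illustrative, and this is not dnsmasq's code.

```python
import struct

RCODE_REFUSED = 5  # RFC 1035, section 4.1.1


def build_header(query_id, rcode):
    """Build a minimal 12-byte DNS response header with QR=1 and the given RCODE."""
    flags = 0x8000 | (rcode & 0x000F)  # QR bit set, RCODE in the low 4 bits
    # ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (all big-endian 16-bit)
    return struct.pack(">HHHHHH", query_id, flags, 0, 0, 0, 0)


def rcode_of(packet):
    """Return the RCODE of a DNS message, or raise if the header is truncated."""
    if len(packet) < 12:
        raise ValueError("DNS header must be at least 12 bytes")
    (flags,) = struct.unpack(">H", packet[2:4])
    return flags & 0x000F


reply = build_header(query_id=0x1234, rcode=RCODE_REFUSED)
print(rcode_of(reply) == RCODE_REFUSED)
```

A capture of upstream replies could be filtered with a check like `rcode_of(...) == RCODE_REFUSED` to see whether crashes line up with refused DNSKEY queries.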

@MattLParker

@vavrusa Sorry for the late reply; my box that was running 4 needed to be rebuilt. It's up and looks stable so far.

@madpsy

madpsy commented Mar 6, 2020

@vavrusa It has been stable for me too since you made the change on Cloudflare's side. Thanks.

@ntomka

ntomka commented Mar 6, 2020

I can confirm it as well. I re-enabled DNSSEC about 4 hours ago and haven't had any issues since.

@jdrch

jdrch commented Mar 6, 2020

great if more people could confirm.

FTL v4.3.1 + dnscrypt-proxy 2.0.39 + Cloudflare upstream here.

@vavrusa I re-enabled DNSSEC in Pi-hole this morning and it ran just fine for ~2h before I left for work. I'll consider the bug fixed if it lasts 24h without a crash (though I don't see why it shouldn't.) Thanks so much!

@cheesedasher

Re-enabled DNSSEC about 2 hours ago. My 21 clients are happy again. Thank you all.

@darkameba

@madpsy any chance you could try now and see if you still experience crashes?

Seems to be working again. Thanks!

@jdrch

jdrch commented Mar 7, 2020

Everything's been working now for over 24 hours, so I'd say this issue is resolved.

@derekslenk

Everything's been working now for over 24 hours, so I'd say this issue is resolved.

Is this back on the main 5.0 branch or the feature branch noted above?

@jdrch

jdrch commented Mar 7, 2020 via email

@DL6ER
Member

DL6ER commented Mar 8, 2020

Thanks for all your input. This bug seems to have been fixed in two independent ways:

  1. Everyone on our update/dnsmasq branch got the fix on our side
  2. Everyone else got the issue fixed externally (Cloudflare fixed their issue)

We'll absorb our fix into the main code as soon as dnsmasq v2.81 is released. This will hopefully not take long; they have already set up a release candidate. It fails to compile on FreeBSD (a platform we don't support), but a patch has already been worked out and submitted, so we can expect a second release candidate soon.

@pralor-bot

This issue has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/ftl-crash/29699/8

@DL6ER DL6ER closed this as completed Apr 2, 2020
@DL6ER DL6ER unpinned this issue Apr 7, 2020
@biship

biship commented May 15, 2020

Still occurring. Just had 2 pi-holes crash within 15 mins of each other. v5. Where exactly is the fix?

@jdrch

jdrch commented May 15, 2020

@biship Still working on v5 over here with dnscrypt-proxy and CloudFlare upstream. What's your upstream?

FWIW the fix was made on CloudFlare's side over 2 months ago.

@biship

biship commented May 15, 2020

1.1.1.2
1.0.0.2

@biship

biship commented May 15, 2020

I'm not using dnscrypt-proxy, whatever that is.

@jdrch

jdrch commented May 15, 2020

OK so we both use CloudFlare. Still up and running here ...

@biship

biship commented May 15, 2020

It's happened twice in the last week.
If it becomes regular, I'll have to turn on debug logs and open a new issue.

@cmjordan42

I just experienced this issue when enabling DNSSEC on an otherwise working setup. After some brief debugging, it seems that FTL does not gracefully transition between certain upstream DNS configurations. Leaving DNSSEC enabled and restarting my Docker container seems to have handled it. I also tested switching between some other combinations of DNS configurations, and it really struggles. It seems like FTL just shuts down and doesn't bother trying to come back up...

[2022-03-23 00:23:33.042 871M] Shutting down...
[2022-03-23 00:23:33.305 871M] Finished final database update (stored 55 queries)
[2022-03-23 00:23:33.305 871M] Waiting for threads to join
[2022-03-23 00:23:33.305 871M] Thread telnet-IPv4 (0) is idle, terminating it.
[2022-03-23 00:23:33.305 871M] Thread telnet-socket (2) is idle, terminating it.
[2022-03-23 00:23:33.305 871M] Thread database (3) is idle, terminating it.
[2022-03-23 00:23:33.305 871M] Thread housekeeper (4) is idle, terminating it.
[2022-03-23 00:23:33.305 871M] Thread DNS client (5) is idle, terminating it.
[2022-03-23 00:23:33.305 871M] All threads joined
[2022-03-23 00:23:33.306 871M] ########## FTL terminated after 6m 24s (code 0)! ##########

And nothing thereafter, but saving again (even with the same configuration with which it just failed to start) will start FTL again.
