Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FTL crashing while DHCP server is enabled #718

Closed
3 tasks done
igalita opened this issue Mar 30, 2020 · 65 comments · Fixed by #723
Closed
3 tasks done

FTL crashing while DHCP server is enabled #718

igalita opened this issue Mar 30, 2020 · 65 comments · Fixed by #723
Assignees
Labels
Milestone

Comments

@igalita
Copy link

igalita commented Mar 30, 2020

In raising this issue, I confirm the following (please check boxes, eg [X]) Failure to fill the template will close your issue:

  • I have read and understood the contributors guide.
  • The issue I am reporting can be replicated
  • The issue I am reporting isn't a duplicate

How familiar are you with the codebase?:

7


[ISSUE] Expected Behaviour:

After update to vDev-b6364d0, FTL should not crash while DHCP server is enabled.

[BUG] Actual Behaviour:

FTL became unstable after update to vDev-b6364d0. It crashes after 10-30 seconds.

[BUG | ISSUE] Steps to reproduce:

Enable DHCP server. FTL should crash soon after that.

Log file output [if available]

https://tricorder.pi-hole.net/dli98t4why

(gdb) handle SIGHUP nostop SIGPIPE nostop
Signal        Stop      Print   Pass to program Description
SIGHUP        No        Yes     Yes             Hangup
SIGPIPE       No        Yes     Yes             Broken pipe
(gdb) continue
Continuing.
[Detaching after fork from child process 5222]
[New Thread 0x739ff460 (LWP 5242)]
[Thread 0x739ff460 (LWP 5242) exited]
[New Thread 0x739ff460 (LWP 5243)]
[Thread 0x739ff460 (LWP 5243) exited]
(gdb) [Detaching after fork from child process 5244]

Thread 5 "database" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x75369460 (LWP 5191)]
__strchrnul (s=0x2 <error: Cannot access memory at address 0x2>, c_in=37) at strchrnul.c:50
50      strchrnul.c: No such file or directory.
(gdb) backtrace
#0  __strchrnul (s=0x2 <error: Cannot access memory at address 0x2>, c_in=37) at strchrnul.c:50
#1  0x76d75174 in __find_specmb (format=0x2 <error: Cannot access memory at address 0x2>) at printf-parse.h:108
#2  _IO_vfprintf_internal (s=s@entry=0x75368ab8, format=format@entry=0x2 <error: Cannot access memory at address 0x2>, ap=..., ap@entry=...) at vfprintf.c:1315
#3  0x76e1d024 in __GI___vasprintf_chk (result_ptr=result_ptr@entry=0x75368c60, flags=flags@entry=1, format=0x2 <error: Cannot access memory at address 0x2>, format@entry=0x0, args=..., args@entry=...) at vasprintf_chk.c:66
#4  0x76e1cf30 in __asprintf_chk (result_ptr=result_ptr@entry=0x75368c60, flags=flags@entry=1, format=0x2 <error: Cannot access memory at address 0x2>) at asprintf_chk.c:32
#5  0x00448a76 in asprintf (__fmt=0x532f5c "SELECT id FROM network WHERE hwaddr = '%s';", __ptr=0x75368c60) at /usr/arm-linux-gnueabihf/include/bits/stdio2.h:178
#6  parse_neighbor_cache () at src/database/network-table.c:325
#7  0x0044b5e8 in DB_thread (val=<optimized out>) at src/database/database-thread.c:64
#8  0x76e88494 in start_thread (arg=0x75369460) at pthread_create.c:486
#9  0x76e0b578 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Device specifics

Hardware Type: rPi 3B+
OS: Raspbian Buster

This template was created based on the work of udemy-dl.

@DL6ER
Copy link
Member

DL6ER commented Mar 30, 2020

I'm running FTL (with the exact same commit) as DHCP server locally as well and am not able to reproduce this issue. Could you run the following debug instructions when you observe the crash in the debugger? This should help us narrowing down the issue.

p linebuffer
p num
p ip
p iface
p hwaddr

Thanks a lot!

@DL6ER DL6ER added the Bug label Mar 30, 2020
@DL6ER DL6ER added this to the v5.0 milestone Mar 30, 2020
@DL6ER DL6ER self-assigned this Mar 30, 2020
@TylerVigario
Copy link

TylerVigario commented Mar 30, 2020

Same symptoms as above as well as another who posted today on Reddit.

This is in pihole-FTL.log

[2020-03-30 15:52:00.376 14894] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2020-03-30 15:52:00.377 14894] ---------------------------->  FTL crashed!  <----------------------------
[2020-03-30 15:52:00.377 14894] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2020-03-30 15:52:00.378 14894] Please report a bug at https://github.com/pi-hole/FTL/issues
[2020-03-30 15:52:00.378 14894] and include in your report already the following details:
[2020-03-30 15:52:00.379 14894] FTL has been running for 20 seconds
[2020-03-30 15:52:00.379 14894] FTL branch: release/v5.0
[2020-03-30 15:52:00.380 14894] FTL version: vDev-b6364d0
[2020-03-30 15:52:00.380 14894] FTL commit: b6364d0
[2020-03-30 15:52:00.381 14894] FTL date: 2020-03-29 23:23:02 +0200
[2020-03-30 15:52:00.381 14894] FTL user: started as pihole, ended as pihole
[2020-03-30 15:52:00.382 14894] Compiled for x86_64 (compiled on CI) using gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
[2020-03-30 15:52:00.382 14894] Received signal: Segmentation fault
[2020-03-30 15:52:00.382 14894]      at address: 0x200000001
[2020-03-30 15:52:00.383 14894]      with code: SEGV_MAPERR (Address not mapped to object)
[2020-03-30 15:52:00.384 14894] Backtrace:
[2020-03-30 15:52:00.384 14894] B[0000]: 0x5635dd526189, /usr/bin/pihole-FTL(+0x31189) [0x5635dd526189]
[2020-03-30 15:52:00.385 14894] B[0001]: 0x7f82ed462dc0, /lib64/libpthread.so.0(+0x12dc0) [0x7f82ed462dc0]
[2020-03-30 15:52:00.385 14894] B[0002]: 0x7f82ed19543a, /lib64/libc.so.6(__vasprintf_chk+0x10a) [0x7f82ed19543a]
[2020-03-30 15:52:00.386 14894] B[0003]: 0x7f82ed195313, /lib64/libc.so.6(__asprintf_chk+0x93) [0x7f82ed195313]
[2020-03-30 15:52:00.386 14894] B[0004]: 0x5635dd51bc78, /usr/bin/pihole-FTL(parse_neighbor_cache+0x248) [0x5635dd51bc78]
[2020-03-30 15:52:00.387 14894] B[0005]: 0x5635dd51f87d, /usr/bin/pihole-FTL(DB_thread+0x10d) [0x5635dd51f87d]
[2020-03-30 15:52:00.387 14894] B[0006]: 0x7f82ed4582de, /lib64/libpthread.so.0(+0x82de) [0x7f82ed4582de]
[2020-03-30 15:52:00.387 14894] B[0007]: 0x7f82ed184133, /lib64/libc.so.6(clone+0x43) [0x7f82ed184133]
[2020-03-30 15:52:00.388 14894] ------ Listing content of directory /dev/shm ------
[2020-03-30 15:52:00.388 14894] File Mode User:Group  Filesize Filename
[2020-03-30 15:52:00.389 14894] rwxrwxrwx root:root 300 .
[2020-03-30 15:52:00.390 14894] rwxr-xr-x root:root 4K ..
[2020-03-30 15:52:00.390 14894] rw------- pihole:pihole 4K FTL-per-client-regex
[2020-03-30 15:52:00.390 14894] rw------- pihole:pihole 4K FTL-dns-cache
[2020-03-30 15:52:00.391 14894] rw------- pihole:pihole 12K FTL-overTime
[2020-03-30 15:52:00.391 14894] rw------- pihole:pihole 5M FTL-queries
[2020-03-30 15:52:00.392 14894] rw------- pihole:pihole 4K FTL-upstreams
[2020-03-30 15:52:00.392 14894] rw------- pihole:pihole 340K FTL-clients
[2020-03-30 15:52:00.393 14894] rw------- pihole:pihole 98K FTL-domains
[2020-03-30 15:52:00.393 14894] rw------- pihole:pihole 106K FTL-strings
[2020-03-30 15:52:00.394 14894] rw------- pihole:pihole 12 FTL-settings
[2020-03-30 15:52:00.394 14894] rw------- pihole:pihole 124 FTL-counters
[2020-03-30 15:52:00.395 14894] rw------- pihole:pihole 48 FTL-lock
[2020-03-30 15:52:00.395 14894] rw-r--r-- plex:plex 32 sem.F8EEB26D-AB23-4061-8250-130AB5285A4A
[2020-03-30 15:52:00.396 14894] rw-r--r-- plex:plex 32 sem.9334581e-7251-4ef7-a8ec-5bfe8e89ff68
[2020-03-30 15:52:00.396 14894] ---------------------------------------------------
[2020-03-30 15:52:00.397 14894] Thank you for helping us to improve our FTL engine!
[2020-03-30 15:52:00.397 14894] FTL terminated!

@unbekannt3
Copy link

Same issue with a Pi4 but not running a DHCP server on it

[2020-03-31 02:42:00.180 991] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2020-03-31 02:42:00.180 991] ---------------------------->  FTL crashed!  <----------------------------
[2020-03-31 02:42:00.180 991] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2020-03-31 02:42:00.180 991] Please report a bug at https://github.com/pi-hole/FTL/issues
[2020-03-31 02:42:00.180 991] and include in your report already the following details:
[2020-03-31 02:42:00.180 991] FTL has been running for 25 seconds
[2020-03-31 02:42:00.180 991] FTL branch: release/v5.0
[2020-03-31 02:42:00.180 991] FTL version: vDev-b6364d0
[2020-03-31 02:42:00.180 991] FTL commit: b6364d0
[2020-03-31 02:42:00.180 991] FTL date: 2020-03-29 23:23:02 +0200
[2020-03-31 02:42:00.180 991] FTL user: started as pihole, ended as pihole
[2020-03-31 02:42:00.180 991] Compiled for armhf (compiled on CI) using arm-linux-gnueabihf-gcc (Debian 6.3.0-18) 6.3.0 20170516
[2020-03-31 02:42:00.181 991] Received signal: Segmentation fault
[2020-03-31 02:42:00.181 991]      at address: 0xb6cea9ec
[2020-03-31 02:42:00.181 991]      with code: SEGV_MAPERR (Address not mapped to object)
[2020-03-31 02:42:00.181 991] Backtrace:
[2020-03-31 02:42:00.181 991] B[0000]: 0x51c618, /usr/bin/pihole-FTL(+0x23618) [0x51c618]
[2020-03-31 02:42:00.181 991] B[0001]: 0xb6d4d130, /lib/arm-linux-gnueabihf/libc.so.6(__default_rt_sa_restorer+0) [0xb6d4d130]
[2020-03-31 02:42:00.181 991] B[0002]: 0x521b50, /usr/bin/pihole-FTL(resolveClients+0xb7) [0x521b50]
[2020-03-31 02:42:00.181 991] B[0003]: 0x521d7e, /usr/bin/pihole-FTL(DNSclient_thread+0x85) [0x521d7e]
[2020-03-31 02:42:00.181 991] ------ Listing content of directory /dev/shm ------
[2020-03-31 02:42:00.182 991] File Mode User:Group  Filesize Filename
[2020-03-31 02:42:00.182 991] rwxrwxrwx root:root 260 .
[2020-03-31 02:42:00.182 991] rwxr-xr-x root:root 4K ..
[2020-03-31 02:42:00.182 991] rw------- pihole:pihole 4K FTL-per-client-regex
[2020-03-31 02:42:00.182 991] rw------- pihole:pihole 4K FTL-dns-cache
[2020-03-31 02:42:00.183 991] rw------- pihole:pihole 29K FTL-overTime
[2020-03-31 02:42:00.183 991] rw------- pihole:pihole 18M FTL-queries
[2020-03-31 02:42:00.183 991] rw------- pihole:pihole 20K FTL-upstreams
[2020-03-31 02:42:00.183 991] rw------- pihole:pihole 41K FTL-clients
[2020-03-31 02:42:00.183 991] rw------- pihole:pihole 66K FTL-domains
[2020-03-31 02:42:00.184 991] rw------- pihole:pihole 45K FTL-strings
[2020-03-31 02:42:00.184 991] rw------- pihole:pihole 12 FTL-settings
[2020-03-31 02:42:00.184 991] rw------- pihole:pihole 124 FTL-counters
[2020-03-31 02:42:00.184 991] rw------- pihole:pihole 28 FTL-lock
[2020-03-31 02:42:00.184 991] ---------------------------------------------------
[2020-03-31 02:42:00.184 991] Thank you for helping us to improve our FTL engine!
[2020-03-31 02:42:00.184 991] FTL terminated!

@DL6ER
Copy link
Member

DL6ER commented Mar 31, 2020

I'm not yet convinced @unbekannt3DE 's bug is the same as in the other two backtraces (which are very likely the exact same thing), however, it's possible that they have the same origin somewhere else. And I wasn't expecting the DHCP server to be the reason, so that matches with @unbekannt3DE's observation.

Could you two also run FTL in the debugger? See https://docs.pi-hole.net/ftldns/debugging/ for more details.

@unbekannt3
Copy link

I restarted my Pi 4 several times tonight and then restarted the FTL service manually about 3 hours ago. It hasn't crashed again since.

@igalita
Copy link
Author

igalita commented Mar 31, 2020

@DL6ER - Sorry for the late reply.

Here is the output you requested - I hope I did it right :)

[New LWP 20695]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
__GI___poll (timeout=-1, nfds=9, fds=0x1e0c1d8) at ../sysdeps/unix/sysv/linux/poll.c:29
29      ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) handle SIGHUP nostop SIGPIPE nostop
Signal        Stop      Print   Pass to program Description
SIGHUP        No        Yes     Yes             Hangup
SIGPIPE       No        Yes     Yes             Broken pipe
(gdb) continue
Continuing.
[New Thread 0x72dff460 (LWP 20716)]
[Thread 0x72dff460 (LWP 20716) exited]
[New Thread 0x72dff460 (LWP 20717)]
[Thread 0x72dff460 (LWP 20717) exited]
[New Thread 0x72dff460 (LWP 20718)]
[Thread 0x72dff460 (LWP 20718) exited]
[New Thread 0x72dff460 (LWP 20719)]
[Thread 0x72dff460 (LWP 20719) exited]
[New Thread 0x72dff460 (LWP 20720)]
[Thread 0x72dff460 (LWP 20720) exited]
[New Thread 0x72dff460 (LWP 20721)]
[Thread 0x72dff460 (LWP 20721) exited]
[New Thread 0x72dff460 (LWP 20722)]
[Thread 0x72dff460 (LWP 20722) exited]
[New Thread 0x72dff460 (LWP 20725)]
[Thread 0x72dff460 (LWP 20725) exited]
[New Thread 0x72dff460 (LWP 20726)]
[Thread 0x72dff460 (LWP 20726) exited]
[New Thread 0x72dff460 (LWP 20727)]
[Thread 0x72dff460 (LWP 20727) exited]
[New Thread 0x72dff460 (LWP 20728)]
[Thread 0x72dff460 (LWP 20728) exited]
[New Thread 0x72dff460 (LWP 20729)]
[Thread 0x72dff460 (LWP 20729) exited]
[New Thread 0x72dff460 (LWP 20730)]
[Thread 0x72dff460 (LWP 20730) exited]
[New Thread 0x72dff460 (LWP 20731)]
[Thread 0x72dff460 (LWP 20731) exited]
[New Thread 0x72dff460 (LWP 20732)]
[Thread 0x72dff460 (LWP 20732) exited]
[New Thread 0x72dff460 (LWP 20733)]
[Thread 0x72dff460 (LWP 20733) exited]
[New Thread 0x72dff460 (LWP 20734)]
[Thread 0x72dff460 (LWP 20734) exited]
[New Thread 0x72dff460 (LWP 20735)]
[Detaching after fork from child process 20736]

Thread 5 "database" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x75201460 (LWP 20693)]
__strchrnul (s=0x1 <error: Cannot access memory at address 0x1>, c_in=37) at strchrnul.c:50
50      strchrnul.c: No such file or directory.
(gdb) p linebuffer
$1 = 0x0
(gdb) p num
$2 = num
(gdb) p ip
No symbol "ip" in current context.
(gdb) p iface
No symbol "iface" in current context.
(gdb) p hwaddr
No symbol "hwaddr" in current context.
(gdb)

Edit:

Also, maybe this helps. These strange MAC's were always there, but maybe with the last FTL update something changed ...

1

@DL6ER
Copy link
Member

DL6ER commented Mar 31, 2020

@ionutgalita No worries. Unfortunately, the crash happened in another place this time so the instructions I gave you were useless. Could you try this again maybe once or twice?
And please always include a backtrace output as well.

@pralor-bot
Copy link

This issue has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/ftl-crash-after-update-v4-3-1/29920/28

@igalita
Copy link
Author

igalita commented Mar 31, 2020

@DL6ER
I updated FTL to version vDev-81c4eac and everything seems to work now.

I will report again tomorrow to confirm.

@TylerVigario
Copy link

Seems like I got to the same place as above

[root@tylervigario ~]# pihole restartdns
  [✓] Restarting DNS server
[root@tylervigario ~]# sudo gdb -p $(pidof pihole-FTL)
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-6.el8
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 13040
[New LWP 13041]
[New LWP 13042]
[New LWP 13043]
[New LWP 13044]
[New LWP 13045]
[New LWP 13046]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

warning: Loadable section ".note.gnu.property" outside of ELF segments
0x00007f8d809a1211 in poll () from /lib64/libc.so.6
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-72.el8_1.1.x86_64 sssd-client-2.2.0-19.el8.x86_64
(gdb) handle SIGHUP nostop SIGPIPE nostop
Signal        Stop      Print   Pass to program Description
SIGHUP        No        Yes     Yes             Hangup
SIGPIPE       No        Yes     Yes             Broken pipe
(gdb) continue
Continuing.
[Detaching after fork from child process 13069]

Thread 5 "database" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f8d7dd4c700 (LWP 13044)]
0x00007f8d809bd43a in __vasprintf_chk () from /lib64/libc.so.6
(gdb) continue
Continuing.
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments

Thread 5 "database" received signal SIGABRT, Aborted.
0x00007f8d808e78df in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: yum debuginfo-install libblkid-2.32.1-17.el8.x86_64 libgcc-8.3.1-4.5.el8.x86_64 libmount-2.32.1-17.el8.x86_64 libselinux-2.9-2.1.el8.x86_64 libuuid-2.32.1-17.el8.x86_64 pcre2-10.32-1.el8.x86_64 systemd-libs-239-18.el8_1.4.x86_64
(gdb) backtrace
#0  0x00007f8d808e78df in raise () from /lib64/libc.so.6
#1  0x00007f8d808d1cf5 in abort () from /lib64/libc.so.6
#2  0x000056264d42f7f6 in SIGSEGV_handler (sig=<optimized out>, si=<optimized out>, unused=<optimized out>)
    at src/signals.c:80
#3  <signal handler called>
#4  0x00007f8d809bd43a in __vasprintf_chk () from /lib64/libc.so.6
#5  0x00007f8d809bd313 in __asprintf_chk () from /lib64/libc.so.6
#6  0x000056264d424c10 in asprintf (__fmt=0x56264d5816c8 "SELECT id FROM network WHERE hwaddr = '%s';",
    __ptr=0x7f8d00000001) at /usr/include/x86_64-linux-gnu/bits/stdio2.h:178
#7  parse_neighbor_cache () at src/database/network-table.c:382
#8  0x000056264d428e03 in DB_thread (val=<optimized out>) at src/database/database-thread.c:68
#9  0x00007f8d80c802de in start_thread () from /lib64/libpthread.so.0
#10 0x00007f8d809ac133 in clone () from /lib64/libc.so.6
(gdb) p linebuffer
$1 = 0x0
(gdb) p num
No symbol "num" in current context.
(gdb) p ip
No symbol "ip" in current context.
(gdb) p iface
No symbol "iface" in current context.
(gdb) p hwaddr
No symbol "hwaddr" in current context.
(gdb)

@igalita
Copy link
Author

igalita commented Apr 1, 2020

@MeekLogic

It happend to me also, run pihole -r (repair) and everything should be fixed.

@TylerVigario
Copy link

@MeekLogic

It happend to me also, run pihole -r (repair) and everything should be fixed.

Unfortunately, this did not fix the issue for me.

FTL still crashes like crazy and I feel like I am not getting the desired output from the debugger.

@igalita
Copy link
Author

igalita commented Apr 1, 2020

@MeekLogic

Sorry to hear that. Worked for me (at least for the last ~9 hours)

I am on release/5.0 branch (81c4eac).

In case you use DHCP, try flushing Network table (Settings > Flush network table), restart FTP from cli sudo service pihole-FTL restartand after run again sudo pihole -r.

This is exactly what I did last night and now it seems to work.

@TylerVigario
Copy link

@ionutgalita Thanks for the help but I'm pretty sure @DL6ER just needs the right debug output from one of us. He's stated the "core" teams' inability to produce it "in-house", so that means it lands on one of us to work with them.

@igalita
Copy link
Author

igalita commented Apr 1, 2020

@DL6ER

Yes, I know. Watching closely to see if my installation crashes again.

@DL6ER
Copy link
Member

DL6ER commented Apr 1, 2020

Ah, yeah, sorry. We coded FTL asynchronously multi-threaded so it can do multiple tasks at the same time independently. This is to ensure highest performance and low delays for DNS resolution.
Debugging such applications is generally more tricky.

When you see an output like

Thread 5 "database" received signal ...

this indicates in which thread the issue occurred. Thurs does not mean that gdb will also analyze this thread. Extract the number from above's message (5) and switch the context using:

thread 5

Then run the p ... commands again. Debugging multi-threaded code can become much more complex, but let's that we won't need this for now.

@pralor-bot
Copy link

This issue has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/ftl-crash-after-update-v4-3-1/29920/47

@FizzGiGG
Copy link

FizzGiGG commented Apr 1, 2020

FWIW I do not run, nor have I ever, DHCP Server. My crashing issue started last night with the update to vDev-81c4eac, I believe.

https://tricorder.pi-hole.net/zo2xrt03ql

@TylerVigario
Copy link

@DL6ER Got it, thanks for that information.

If you need further information just let me know. I'll do my best to be thorough and prompt.

Thread 5 "database" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f426655c700 (LWP 1781)]
0x00007f42691cd43a in __vasprintf_chk () from /lib64/libc.so.6
(gdb) thread 5
[Switching to thread 5 (Thread 0x7f426655c700 (LWP 1781))]
#0  0x00007f42691cd43a in __vasprintf_chk () from /lib64/libc.so.6
(gdb) backtrace
#0  0x00007f42691cd43a in __vasprintf_chk () from /lib64/libc.so.6
#1  0x00007f42691cd313 in __asprintf_chk () from /lib64/libc.so.6
#2  0x0000564f91ca0c10 in asprintf (__fmt=0x564f91dfd6c8 "SELECT id FROM network WHERE hwaddr = '%s';",
    __ptr=0x7f4200000001) at /usr/include/x86_64-linux-gnu/bits/stdio2.h:178
#3  parse_neighbor_cache () at src/database/network-table.c:382
#4  0x0000564f91ca4e03 in DB_thread (val=<optimized out>) at src/database/database-thread.c:68
#5  0x00007f42694902de in start_thread () from /lib64/libpthread.so.0
#6  0x00007f42691bc133 in clone () from /lib64/libc.so.6
(gdb) p linebuffer
$1 = 0x0
(gdb) p num
No symbol "num" in current context.
(gdb) p ip
No symbol "ip" in current context.
(gdb) p iface
No symbol "iface" in current context.
(gdb) p hwaddr
No symbol "hwaddr" in current context.
(gdb) continue
Continuing.
[New Thread 0x7f425df34700 (LWP 7260)]

Thread 5 "database" received signal SIGABRT, Aborted.
0x00007f42690f78df in raise () from /lib64/libc.so.6
(gdb) thread 5
[Switching to thread 5 (Thread 0x7f426655c700 (LWP 1781))]
#0  0x00007f42690f78df in raise () from /lib64/libc.so.6
(gdb) backtrace
#0  0x00007f42690f78df in raise () from /lib64/libc.so.6
#1  0x00007f42690e1cf5 in abort () from /lib64/libc.so.6
#2  0x0000564f91cab7f6 in SIGSEGV_handler (sig=<optimized out>, si=<optimized out>, unused=<optimized out>)
    at src/signals.c:80
#3  <signal handler called>
#4  0x00007f42691cd43a in __vasprintf_chk () from /lib64/libc.so.6
#5  0x00007f42691cd313 in __asprintf_chk () from /lib64/libc.so.6
#6  0x0000564f91ca0c10 in asprintf (__fmt=0x564f91dfd6c8 "SELECT id FROM network WHERE hwaddr = '%s';",
    __ptr=0x7f4200000001) at /usr/include/x86_64-linux-gnu/bits/stdio2.h:178
#7  parse_neighbor_cache () at src/database/network-table.c:382
#8  0x0000564f91ca4e03 in DB_thread (val=<optimized out>) at src/database/database-thread.c:68
#9  0x00007f42694902de in start_thread () from /lib64/libpthread.so.0
#10 0x00007f42691bc133 in clone () from /lib64/libc.so.6
(gdb) p linebuffer
$2 = 0x0
(gdb) p num
No symbol "num" in current context.
(gdb) p ip
No symbol "ip" in current context.
(gdb) p iface
No symbol "iface" in current context.
(gdb) p hwaddr
No symbol "hwaddr" in current context.
(gdb) continue
Continuing.
Couldn't get registers: No such process.
Couldn't get registers: No such process.
(gdb) [Thread 0x7f425df34700 (LWP 7260) exited]
[Thread 0x7f426555a700 (LWP 1783) exited]
[Thread 0x7f4265d5b700 (LWP 1782) exited]
[Thread 0x7f426655c700 (LWP 1781) exited]
[Thread 0x7f4266d5d700 (LWP 1780) exited]
[Thread 0x7f4267d5f700 (LWP 1778) exited]
[Thread 0x7f4269acd740 (LWP 1777) exited]

Program terminated with signal SIGABRT, Aborted.
The program no longer exists.

@DL6ER
Copy link
Member

DL6ER commented Apr 1, 2020

Thanks, what you did was almost correct, just don't run a continue in between, because FTL will, well, try to continue and debugging doesn't work because FTL starts to do something else.

@igalita
Copy link
Author

igalita commented Apr 1, 2020

@DL6ER

I hope is correct now :)

Thread 5 "database" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7513c460 (LWP 17761)]
__strchrnul (s=0x1 <error: Cannot access memory at address 0x1>, c_in=37)
    at strchrnul.c:50
50      strchrnul.c: No such file or directory.
(gdb) thread 5
[Switching to thread 5 (Thread 0x7513c460 (LWP 17761))]
#0  __strchrnul (s=0x1 <error: Cannot access memory at address 0x1>, c_in=37)
    at strchrnul.c:50
50      in strchrnul.c
(gdb) backtrace
#0  __strchrnul (s=0x1 <error: Cannot access memory at address 0x1>, c_in=37)
    at strchrnul.c:50
#1  0x76d44174 in __find_specmb (
    format=0x1 <error: Cannot access memory at address 0x1>)
    at printf-parse.h:108
#2  _IO_vfprintf_internal (s=s@entry=0x7513ba68,
    format=format@entry=0x1 <error: Cannot access memory at address 0x1>,
    ap=..., ap@entry=...) at vfprintf.c:1315
#3  0x76dec024 in __GI___vasprintf_chk (
    result_ptr=result_ptr@entry=0x7513bc60, flags=flags@entry=1,
    format=0x1 <error: Cannot access memory at address 0x1>, format@entry=0x0,
    args=..., args@entry=...) at vasprintf_chk.c:66
#4  0x76debf30 in __asprintf_chk (result_ptr=result_ptr@entry=0x7513bc60,
    flags=flags@entry=1,
    format=0x1 <error: Cannot access memory at address 0x1>)
    at asprintf_chk.c:32
#5  0x004edb2a in asprintf (
    __fmt=0x5d85d4 "SELECT id FROM network WHERE hwaddr = '%s';",
    __ptr=0x7513bc60) at /usr/arm-linux-gnueabihf/include/bits/stdio2.h:178
#6  parse_neighbor_cache () at src/database/network-table.c:382
#7  0x004f0ae0 in DB_thread (val=<optimized out>)
    at src/database/database-thread.c:68
#8  0x76e57494 in start_thread (arg=0x7513c460) at pthread_create.c:486
--Type <RET> for more, q to quit, c to continue without paging--c
#9  0x76dda578 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) p num
$1 = num
(gdb) p ip
No symbol "ip" in current context.
(gdb) p iface
No symbol "iface" in current context.
(gdb) p hwaddr
No symbol "hwaddr" in current context.
(gdb) p linebuffer
$2 = 0x0
(gdb)

@DL6ER
Copy link
Member

DL6ER commented Apr 1, 2020

Hmm, this was done correctly, but it still doesn't show the information I wanted to see, but, well, this is going to be a bit complicated, could you try:

pihole checkout ftl fix/neighcrash

?

This will also print some more debugging output.

edit Renamed branch. Just in case someone else tries this as well.

@igalita
Copy link
Author

igalita commented Apr 1, 2020

@DL6ER

pi@pi:~ $ pihole checkout ftl fix/neigh_crash
  Please note that changing branches severely alters your Pi-hole subsystems
  Features that work on the master branch, may not on a development branch
  This feature is NOT supported unless a Pi-hole developer explicitly asks!
  Have you read and understood this? [y/N] y

  [✗] Requested branch "fix/neigh_crash" is not available
  [i] Available branches for FTL are:
      - FTLDNS
      - bughaunting/overTime
      - development
      - feature/deb-and-rpm
      - fix/dnssec-retry-crash
      - fix/issue_template
      - fix/msatter_crazy_IPv6
      - fix/neigh_crash
      - master
      - master-find-missing-reference
      - new/GeoIP
      - new/all_clients_network_table_no-auto
      - new/http
      - release/v5.0
      - revert-689-ltaub
      - tweak/external_blocked_IPs
      - tweak/log_upstream_errors
      - tweak/remove-unused-FTL-var
      - update/dnsmasq

@DL6ER
Copy link
Member

DL6ER commented Apr 1, 2020

You were likely too fast for our automated build system, please try again.

@igalita
Copy link
Author

igalita commented Apr 1, 2020

@DL6ER

I am .. same result until now

@yubiuser
Copy link
Member

yubiuser commented Apr 1, 2020

I'm not affected by the bug, but I can checkout the branch

nanopi@nanopi:~$ pihole checkout ftl fix/neigh_crash
  Please note that changing branches severely alters your Pi-hole subsystems
  Features that work on the master branch, may not on a development branch
  This feature is NOT supported unless a Pi-hole developer explicitly asks!
  Have you read and understood this? [y/N] y

  [✓] Branch fix/neigh_crash exists
  [✓] Downloading and Installing FTL
  [✓] Restarting pihole-FTL service...
  [✓] Enabling pihole-FTL service to start on reboot...

@DL6ER
Copy link
Member

DL6ER commented Apr 1, 2020

@ionutgalita No need to ping me every time ;-)

[✗] Requested branch "fix/neigh_crash" is not available

but below there is:

  • fix/neigh_crash

Can you check if maybe the underscore was copied as some strange special character? This happens with github from time to time, maybe try

pihole checkout ftl fix/neighcrash

(without underscore), which I just created.

@yubiuser I renamed the branch, the other one will not be maintained, just to mention.

@TylerVigario
Copy link

TylerVigario commented Apr 1, 2020

Unfortunately, I'm not too sure about that.

Memory usage hovers around 2.1GB (8GB total) even when running Pihole multiple times.

If you have any more precise methods I'd be more than happy to do that.

@DL6ER
Copy link
Member

DL6ER commented Apr 1, 2020

For those of you still seeing the crash, could you try something else as well?
We still need to isolate where the code leading to the crash was introduced. I ran pihole-FTL through a full memory leak check and nothing was found, so a leak seems rather unlikely (however, it may still be triggerable only under very specific conditions).

pihole checkout ftl 71e8498

pihole checkout ftl f9476dd

pihole checkout ftl 7759a76

pihole checkout ftl ec0579a

The first is before #708 and #711 were merged, the second is with only #708, the third is with only #711, the fourth one is with both of them.

If one of them fails to check out, they are still being built. I pushed all four at the same time, this will keep our binary building jobs busy for some time. In this case, please try again later.

@TylerVigario
Copy link

TylerVigario commented Apr 1, 2020

Did in reverse order

71e8498 did not immediately crash

f9476dd did not immediately crash

7759a76 did not immediately crash

ec0579a

Thread 5 "database" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ff583cdc700 (LWP 1276)]
0x00007ff58694d43a in __vasprintf_chk () from /lib64/libc.so.6
(gdb) backtrace
#0  0x00007ff58694d43a in __vasprintf_chk () from /lib64/libc.so.6
#1  0x00007ff58694d313 in __asprintf_chk () from /lib64/libc.so.6
#2  0x000056488c500c38 in asprintf (__fmt=0x56488c65cd98 "SELECT id FROM network WHERE hwaddr = '%s';",
    __ptr=0x7ff500000001) at /usr/include/x86_64-linux-gnu/bits/stdio2.h:178
#3  parse_neighbor_cache () at src/database/network-table.c:325
#4  0x000056488c50483d in DB_thread (val=<optimized out>) at src/database/database-thread.c:64
#5  0x00007ff586c102de in start_thread () from /lib64/libpthread.so.0
#6  0x00007ff58693c133 in clone () from /lib64/libc.so.6

@DL6ER
Copy link
Member

DL6ER commented Apr 1, 2020

Thanks. Another two then:

pihole checkout ftl 149f656
pihole checkout ftl c3147cc

(the two commits between 7759a76 and ec0579a)

You should try the 149f656 first, as it is the first one that the CI will build binaries for (will be available earlier).

@TylerVigario
Copy link

TylerVigario commented Apr 1, 2020

149f656 did not immediately crash

c3147cc crashed immediately

@DL6ER
Copy link
Member

DL6ER commented Apr 1, 2020

This is a really tough bug. The only difference in c3147cc is the introduction of the SQLite3 extension for CIDR filtering. Please test, once again,

pihole checkout ftl fix/neighcrash

I added an explicit initialization of the SQLite3 library here. We never used to do this, however, it's the only thing I could see right now.

@TylerVigario
Copy link

The same results, unfortunately.

@DL6ER
Copy link
Member

DL6ER commented Apr 1, 2020

Okay, thanks for your assistance, another try...

This time, I added a new config option:

USE_SQLITE3_EXTENSION

defaulting to true.

The expected behavior is that not setting this (or setting it to true) in /etc/pihole/pihole-FTL.conf crashes while FTL runs stable when setting USE_SQLITE3_EXTENSION=false in said file.

Can you confirm this? If so, then we know that it is the registration of the SQLite3 extension we wrote recently. It's completely unclear to me, so far, how this can crash for some users but not for the majority. And also why it crashes FTL at some very different location in the code.

@TylerVigario
Copy link

Both setups crashed the same.

I updated to latest fix/neighcrash

And then tried again with USE_SQLITE3_EXTENSION=false in /etc/pihole/pihole-FTL.conf

@DL6ER
Copy link
Member

DL6ER commented Apr 1, 2020

This is bizarre, could you test again with 149f656 if this really doesn't crash?

@TylerVigario
Copy link

This is bizarre, could you test again with 149f656 if this really doesn't crash?

My apologies, I was just able to get 149f656 to crash as well.

I'll test previous versions more thoroughly.

@DL6ER
Copy link
Member

DL6ER commented Apr 1, 2020

Well, so, at least we know now that it is not the SQLite3 extension and I reviewed this code now three times very carefully and added some minor optimizations ;-)

Onto the next one...

@DL6ER
Copy link
Member

DL6ER commented Apr 1, 2020

It's past midnight here, maybe you can find out with some certainty which of the commits above do crash and which does not. I also pushed another change to fix/neighcrash, but I don't really expect this to be stable all of a sudden.

Thanks for all your assistance, I do really appreciate it!

@TylerVigario
Copy link

Will do and likewise thank you for all your contributions. I'll follow up later on which commit started the mess.

@TylerVigario
Copy link

71e8498 did not experience a crash

f9476dd did not experience a crash

7759a76 crashes

149f656 crashes

c3147cc crashes

ec0579a crashes

fix/neighcrash crashes

@DL6ER
Copy link
Member

DL6ER commented Apr 2, 2020

Visualizing this: Red = crashed, Green = Did not experience a crash. I assume f9476dd would have experienced a crash at some point, too. It wouldn't make sense to have this one isolated commit working while the ones before and behind don't work.

Screenshot_2020-04-02 pi-hole FTL

@DL6ER
Copy link
Member

DL6ER commented Apr 2, 2020

@ionutgalita I looked at the difference (quite a lot) between 71e8498 and 7759a76. It's a lot.

Please try again with fix/neighcrash (it will take quite some time until this is ready), I fixed a small memory glitch which might actually have caused something like this.

However, please try only in roughly one hour from now (I'll already be at work then), as the CI is currently also building what you can find below (fun testing if my solution here doesn't work).

You can check whether you have the most recent version by issuing

pihole-FTL -v

which should return

v4.3.1-630-g ---> 203057c <---

(arrows added by me).

Some new to try *only if the new fix/neighcrash doesn't work):

pihole checkout ftl 0b8915a
pihole checkout ftl 33b638c
pihole checkout ftl 0443a5a
pihole checkout ftl 99bd5e7
pihole checkout ftl 302dbae
pihole checkout ftl 623f079

@bigpcjunky
Copy link

So FTL Version vDev (fix/neighcrash, vDev-203057c) crashes if I have DHCP on and IPv6 (SLAAC + RA) on. If I turn off IPv6 then save and restart system then it seems to run fine. If I turn !Pv6 back on save and restart it crashes again.

Thread 5 "database" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x753b6460 (LWP 5402)]
0x00500d7a in parse_neighbor_cache () at src/database/network-table.c:442
442 src/database/network-table.c: No such file or directory.
(gdb) backtrace
#0 0x00500d7a in parse_neighbor_cache () at src/database/network-table.c:442
#1 0x00503bec in DB_thread (val=) at src/database/database-thread.c:68
#2 0x76e52494 in start_thread (arg=0x753b6460) at pthread_create.c:486
#3 0x76dd5578 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73
from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) p linebuffer
$1 = 0x72e34888 "2601:645:301:3960:8113:4ddb:e93f:113d dev eth0 lladdr 80:c5:f2:9e:49:55 REACHABLE\n"
(gdb) p num
$2 =
(gdb) p ip
$3 = "2601:645:301:3960:8113:4ddb:e93f:113d\000\000\000h\371\341r\314\340\326v ];u@\227b\000\000\020\000\000\005\000\000@\227b\000\000\345\234T\000\345\234T\210\177b\000\000\000\000\000\300\005\340rH];u\000\000\000\000\300", <incomplete sequence \340> (gdb) p iface $4 = "eth0\000\016\000\000\001\000\000\000hi\361v\260kb\000\000\000\000\000\370\251^\000\320];u\220\016\000\000\345\000P\000\000\000\000\000\330];u\210\221b\000\001\000\000\000\250^\000hi\361v\260kb\000&\226\205^\002\000\000\000&\226\205^\224B\000\000\n\000\000\000\260kb\000\000\235b\000\260kb"
(gdb) p hwaddr
$5 = "80:c5:f2:9e:49:55\000\361v@\227b\000\000\345\234Tl\221b\000\300\005\340r\000\000\000\000\000\000\000\000\200];u\210\177b\000\001\000\000\000@\227b\000\210\177b\000\005\377W\000@\227b\000\000\000\000\000\b\247^\000\000\000\000\000\270];ul\221b\000\000\000\000\000\370\251^"
(gdb)

and

Thread 5 "database" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x75492460 (LWP 2376)]
0x00487bea in parse_neighbor_cache () at src/database/network-table.c:389
389 src/database/network-table.c: No such file or directory.
(gdb) backtrace
#0 0x00487bea in parse_neighbor_cache () at src/database/network-table.c:389
#1 0x0048abec in DB_thread (val=) at src/database/database-thread.c:68
#2 0x76f2b494 in start_thread (arg=0x75492460) at pthread_create.c:486
#3 0x76eae578 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73
from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) p linebuffer
$1 = 0x72f34888 "2601:645:301:3960:8ab1:11ff:fef1:1ff3 dev eth0 INCOMPLETE\n"
(gdb) p num
$2 = 2
(gdb) p ip
$3 = "2601:645:301:3960:8ab1:11ff:fef1:1ff3\000\000\000h\371\361r\314p\344v \035Iu@\a[\000\000\020\000\000U\000\000@\a[\000\000\302p\273\000\302p\273\210\357Z\000\000\000\000\000\300\005\360rH\035Iu\000\000\000\000\300", <incomplete sequence \360> (gdb) p iface $4 = "eth0\000\005\000\000\001\000\000\000h\371\376v\260\333Z\000\000\000\000\000\370\031W\000\320\035Iu\305\005\000\000\345pH\000\370\031W\000\330\035Iu\210\001[\000\001\000\000\000\030W\000h\371\376v\260\333Z\000ɉ\205^\027\000\000\000\241\211\205^\311\071\000\000h\000\000\000\260\333Z\000\000\r[\000\260\333Z"
(gdb) p hwaddr
$5 = "b0:fc:0d:4a:af:39\000\376v@\a[\000\000\302p\273l\001[\000\300\005\360r\000\000\000\000\000\000\000\000\200\035Iu\210\357Z\000\001\000\000\000@\a[\000\210\357Z\000\005oP\000@\a[\000\000\000\000\000\b\027W\000\000\000\000\000\270\035Iul\001[\000\000\000\000\000\370\031W"
(gdb)

@DL6ER
Copy link
Member

DL6ER commented Apr 2, 2020

@bigpcjunky Thanks, this was really useful! Please update fix/neighcrash once again. I'm fairly sure we have it now.

@pralor-bot
Copy link

This issue has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/ftl-crash-after-update-v4-3-1/29920/67

@TylerVigario
Copy link

TylerVigario commented Apr 2, 2020

I assume f9476dd would have experienced a crash at some point, too...

Unfortunately I'm not home at the moment but that's the build I left running and I just checked and it's still up.

@DL6ER
Copy link
Member

DL6ER commented Apr 2, 2020

@MeekLogic fix/neighcrash is now updated and should hopefully be stable as well. At least the crash reported by @bigpcjunky should be fixed in there.

@richie510
Copy link

I'm a little late to the party, but I had a raspberry pi 4 and a pi zero crashing for this problem. The latest fix/neighcrash fixed the problem for me on both of my systems.

@igalita
Copy link
Author

igalita commented Apr 2, 2020

@DL6ER

I also confirm that everything is stable now.

@DL6ER
Copy link
Member

DL6ER commented Apr 2, 2020

The fix has been merged into the regular beta code. Please checkout the release/v5.0 branch.
Thank you very much for your assistance!

@DL6ER DL6ER linked a pull request Apr 2, 2020 that will close this issue
5 tasks
@pralor-bot
Copy link

This issue has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/some-hostnames-do-not-resolve/29710/60

@Impedanz
Copy link

Got the same Problem 2024, brand new install in docker, after Enable DHCP the FTL crash and Pihole Shows "Lost API Connection"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants