FTL crashing while DHCP server is enabled #718
Comments
I'm running FTL (with the exact same commit) as DHCP server locally as well and am not able to reproduce this issue. Could you run the following debug instructions when you observe the crash in the debugger? This should help us narrow down the issue.
Thanks a lot! |
Same symptoms as above as well as another who posted today on Reddit. This is in
|
Same issue with a Pi4 but not running a DHCP server on it
|
I'm not yet convinced @unbekannt3DE's bug is the same as in the other two backtraces (which are very likely the exact same thing). However, it's possible that they have the same origin somewhere else. I also wasn't expecting the DHCP server to be the reason, so that matches @unbekannt3DE's observation. Could you two also run |
I restarted my Pi 4 several times tonight and then restarted the FTL service manually about 3 hours ago. It hasn't crashed again since. |
@DL6ER - Sorry for the late reply. Here is the output you requested - I hope I did it right :)
[New LWP 20695]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
__GI___poll (timeout=-1, nfds=9, fds=0x1e0c1d8) at ../sysdeps/unix/sysv/linux/poll.c:29
29 ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) handle SIGHUP nostop SIGPIPE nostop
Signal Stop Print Pass to program Description
SIGHUP No Yes Yes Hangup
SIGPIPE No Yes Yes Broken pipe
(gdb) continue
Continuing.
[New Thread 0x72dff460 (LWP 20716)]
[Thread 0x72dff460 (LWP 20716) exited]
[New Thread 0x72dff460 (LWP 20717)]
[Thread 0x72dff460 (LWP 20717) exited]
[New Thread 0x72dff460 (LWP 20718)]
[Thread 0x72dff460 (LWP 20718) exited]
[New Thread 0x72dff460 (LWP 20719)]
[Thread 0x72dff460 (LWP 20719) exited]
[New Thread 0x72dff460 (LWP 20720)]
[Thread 0x72dff460 (LWP 20720) exited]
[New Thread 0x72dff460 (LWP 20721)]
[Thread 0x72dff460 (LWP 20721) exited]
[New Thread 0x72dff460 (LWP 20722)]
[Thread 0x72dff460 (LWP 20722) exited]
[New Thread 0x72dff460 (LWP 20725)]
[Thread 0x72dff460 (LWP 20725) exited]
[New Thread 0x72dff460 (LWP 20726)]
[Thread 0x72dff460 (LWP 20726) exited]
[New Thread 0x72dff460 (LWP 20727)]
[Thread 0x72dff460 (LWP 20727) exited]
[New Thread 0x72dff460 (LWP 20728)]
[Thread 0x72dff460 (LWP 20728) exited]
[New Thread 0x72dff460 (LWP 20729)]
[Thread 0x72dff460 (LWP 20729) exited]
[New Thread 0x72dff460 (LWP 20730)]
[Thread 0x72dff460 (LWP 20730) exited]
[New Thread 0x72dff460 (LWP 20731)]
[Thread 0x72dff460 (LWP 20731) exited]
[New Thread 0x72dff460 (LWP 20732)]
[Thread 0x72dff460 (LWP 20732) exited]
[New Thread 0x72dff460 (LWP 20733)]
[Thread 0x72dff460 (LWP 20733) exited]
[New Thread 0x72dff460 (LWP 20734)]
[Thread 0x72dff460 (LWP 20734) exited]
[New Thread 0x72dff460 (LWP 20735)]
[Detaching after fork from child process 20736]
Thread 5 "database" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x75201460 (LWP 20693)]
__strchrnul (s=0x1 <error: Cannot access memory at address 0x1>, c_in=37) at strchrnul.c:50
50 strchrnul.c: No such file or directory.
(gdb) p linebuffer
$1 = 0x0
(gdb) p num
$2 = num
(gdb) p ip
No symbol "ip" in current context.
(gdb) p iface
No symbol "iface" in current context.
(gdb) p hwaddr
No symbol "hwaddr" in current context.
(gdb)
Edit: Also, maybe this helps. These strange MACs were always there, but maybe with the last FTL update something changed ... |
@ionutgalita No worries. Unfortunately, the crash happened in another place this time so the instructions I gave you were useless. Could you try this again maybe once or twice? |
This issue has been mentioned on Pi-hole Userspace. There might be relevant details there: https://discourse.pi-hole.net/t/ftl-crash-after-update-v4-3-1/29920/28 |
@DL6ER I will report again tomorrow to confirm. |
Seems like I got to the same place as above
|
@MeekLogic It happened to me also, run |
Unfortunately, this did not fix the issue for me. FTL still crashes like crazy and I feel like I am not getting the desired output from the debugger. |
@MeekLogic Sorry to hear that. It worked for me (at least for the last ~9 hours). I am on In case you use DHCP, try flushing the network table (Settings > Flush network table), then restart FTL from the CLI. This is exactly what I did last night and now it seems to work. |
@ionutgalita Thanks for the help but I'm pretty sure @DL6ER just needs the right debug output from one of us. He's stated the "core" team's inability to produce it "in-house", so that means it lands on one of us to work with them. |
Yes, I know. Watching closely to see if my installation crashes again. |
Ah, yeah, sorry. We coded FTL asynchronously multi-threaded so it can do multiple tasks at the same time independently. This is to ensure highest performance and low delays for DNS resolution. When you see an output like
this indicates in which thread the issue occurred. This does not mean that
Then run the |
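As a side note on where a label such as `Thread 5 "database"` comes from: on Linux a thread can give itself a name, and gdb prints that name next to the thread number. The following is only a minimal sketch of that mechanism, assuming a Linux system; the helper names `set_thread_name` and `thread_is_named` are hypothetical and this is not a copy of FTL's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/prctl.h>

/* Linux limits thread names to 16 bytes, including the terminating NUL. */
#define THREAD_NAME_LEN 16

/* Hypothetical helper: label the calling thread so a debugger can display it. */
static int set_thread_name(const char *name)
{
    assert(name != NULL);
    return prctl(PR_SET_NAME, name, 0, 0, 0);
}

/* Hypothetical helper: check whether the calling thread carries the given label. */
static bool thread_is_named(const char *expected)
{
    char current[THREAD_NAME_LEN] = { 0 };

    if (prctl(PR_GET_NAME, current, 0, 0, 0) != 0)
        return false;
    return strcmp(current, expected) == 0;
}
```

A worker thread would call `set_thread_name("database")` once at the start of its thread function. In gdb, `info threads` then lists the name for each thread, which makes it easier to pick the right one before asking for a backtrace.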
This issue has been mentioned on Pi-hole Userspace. There might be relevant details there: https://discourse.pi-hole.net/t/ftl-crash-after-update-v4-3-1/29920/47 |
FWIW I do not run, nor have I ever, DHCP Server. My crashing issue started last night with the update to vDev-81c4eac, I believe. |
@DL6ER Got it, thanks for that information. If you need further information just let me know. I'll do my best to be thorough and prompt.
|
Thanks, what you did was almost correct, just don't run a |
I hope it is correct now :)
Thread 5 "database" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7513c460 (LWP 17761)]
__strchrnul (s=0x1 <error: Cannot access memory at address 0x1>, c_in=37)
at strchrnul.c:50
50 strchrnul.c: No such file or directory.
(gdb) thread 5
[Switching to thread 5 (Thread 0x7513c460 (LWP 17761))]
#0 __strchrnul (s=0x1 <error: Cannot access memory at address 0x1>, c_in=37)
at strchrnul.c:50
50 in strchrnul.c
(gdb) backtrace
#0 __strchrnul (s=0x1 <error: Cannot access memory at address 0x1>, c_in=37)
at strchrnul.c:50
#1 0x76d44174 in __find_specmb (
format=0x1 <error: Cannot access memory at address 0x1>)
at printf-parse.h:108
#2 _IO_vfprintf_internal (s=s@entry=0x7513ba68,
format=format@entry=0x1 <error: Cannot access memory at address 0x1>,
ap=..., ap@entry=...) at vfprintf.c:1315
#3 0x76dec024 in __GI___vasprintf_chk (
result_ptr=result_ptr@entry=0x7513bc60, flags=flags@entry=1,
format=0x1 <error: Cannot access memory at address 0x1>, format@entry=0x0,
args=..., args@entry=...) at vasprintf_chk.c:66
#4 0x76debf30 in __asprintf_chk (result_ptr=result_ptr@entry=0x7513bc60,
flags=flags@entry=1,
format=0x1 <error: Cannot access memory at address 0x1>)
at asprintf_chk.c:32
#5 0x004edb2a in asprintf (
__fmt=0x5d85d4 "SELECT id FROM network WHERE hwaddr = '%s';",
__ptr=0x7513bc60) at /usr/arm-linux-gnueabihf/include/bits/stdio2.h:178
#6 parse_neighbor_cache () at src/database/network-table.c:382
#7 0x004f0ae0 in DB_thread (val=<optimized out>)
at src/database/database-thread.c:68
#8 0x76e57494 in start_thread (arg=0x7513c460) at pthread_create.c:486
--Type <RET> for more, q to quit, c to continue without paging--c
#9 0x76dda578 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) p num
$1 = num
(gdb) p ip
No symbol "ip" in current context.
(gdb) p iface
No symbol "iface" in current context.
(gdb) p hwaddr
No symbol "hwaddr" in current context.
(gdb) p linebuffer
$2 = 0x0
(gdb) |
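For readers following the backtrace: frames #5 and #6 show the crash happening while `parse_neighbor_cache()` formats the query `SELECT id FROM network WHERE hwaddr = '%s';` through `asprintf`. The backtrace alone does not tell us which input was bad, so the following is only a hedged sketch of how such a query string could be built defensively. The helper `build_hwaddr_query` is hypothetical; it is neither FTL's code nor the fix that was eventually merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not FTL's actual code: format the lookup query seen in
 * the backtrace into a caller-provided buffer, refusing unsafe inputs. */
static bool build_hwaddr_query(char *out, size_t outlen, const char *hwaddr)
{
    assert(out != NULL && outlen > 0);
    out[0] = '\0';

    /* Reject a missing address or one that could break out of the SQL literal. */
    if (hwaddr == NULL || strchr(hwaddr, '\'') != NULL)
        return false;

    int n = snprintf(out, outlen,
                     "SELECT id FROM network WHERE hwaddr = '%s';", hwaddr);

    /* snprintf returns the length it wanted to write, so n >= outlen means
     * the query was truncated and must not be used. */
    if (n < 0 || (size_t)n >= outlen) {
        out[0] = '\0';
        return false;
    }
    return true;
}
```

The caller would only run the query when the helper returns `true`. An alternative that avoids string formatting altogether is a prepared statement with the address bound as a parameter (SQLite offers `sqlite3_bind_text` for this).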
Hmm, this was done correctly, but it still doesn't show the information I wanted to see. This is going to be a bit complicated; could you try:
? This will also print some more debugging output. Edit: Renamed the branch, just in case someone else tries this as well. |
pi@pi:~ $ pihole checkout ftl fix/neigh_crash
Please note that changing branches severely alters your Pi-hole subsystems
Features that work on the master branch, may not on a development branch
This feature is NOT supported unless a Pi-hole developer explicitly asks!
Have you read and understood this? [y/N] y
[✗] Requested branch "fix/neigh_crash" is not available
[i] Available branches for FTL are:
- FTLDNS
- bughaunting/overTime
- development
- feature/deb-and-rpm
- fix/dnssec-retry-crash
- fix/issue_template
- fix/msatter_crazy_IPv6
- fix/neigh_crash
- master
- master-find-missing-reference
- new/GeoIP
- new/all_clients_network_table_no-auto
- new/http
- release/v5.0
- revert-689-ltaub
- tweak/external_blocked_IPs
- tweak/log_upstream_errors
- tweak/remove-unused-FTL-var
- update/dnsmasq |
You were likely too fast for our automated build system, please try again. |
I am. Same result so far. |
I'm not affected by the bug, but I can checkout the branch
|
@ionutgalita No need to ping me every time ;-)
but below there is:
Can you check if maybe the underscore was copied as some strange special character? This happens with github from time to time, maybe try
(without underscore), which I just created. @yubiuser I renamed the branch, the other one will not be maintained, just to mention. |
Unfortunately, I'm not too sure about that. Memory usage hovers around 2.1GB (8GB total) even when running Pihole multiple times. If you have any more precise methods I'd be more than happy to do that. |
For those of you still seeing the crash, could you try something else as well?
The first is before #708 and #711 were merged, the second is with only #708, the third is with only #711, the fourth one is with both of them. If one of them fails to check out, they are still being built. I pushed all four at the same time, this will keep our binary building jobs busy for some time. In this case, please try again later. |
Did in reverse order:
71e8498 did not immediately crash
f9476dd did not immediately crash
7759a76 did not immediately crash
|
This is a really tough bug. The only difference in c3147cc is the introduction of the SQLite3 extension for CIDR filtering. Please test, once again,
I added an explicit initialization of the SQLite3 library here. We never used to do this, however, it's the only thing I could see right now. |
The same results, unfortunately. |
Okay, thanks for your assistance, another try... This time, I added a new config option:
defaulting to The expected behavior is that not setting this (or setting it to Can you confirm this? If so, then we know that it is the registration of the SQLite3 extension we wrote recently. It's completely unclear to me, so far, how this can crash for some users but not for the majority. And also why it crashes FTL at some very different location in the code. |
Both setups crashed the same. I updated to latest And then tried again with |
This is bizarre, could you test again with 149f656 if this really doesn't crash? |
Well, so, at least we know now that it is not the SQLite3 extension and I reviewed this code now three times very carefully and added some minor optimizations ;-) Onto the next one... |
It's past midnight here, maybe you can find out with some certainty which of the commits above crash and which do not. I also pushed another change to Thanks for all your assistance, I do really appreciate it! |
Will do and likewise thank you for all your contributions. I'll follow up later on which commit started the mess. |
Visualizing this: Red = crashed, Green = Did not experience a crash. I assume f9476dd would have experienced a crash at some point, too. It wouldn't make sense to have this one isolated commit working while the ones before and behind don't work. |
@ionutgalita I looked at the difference (quite a lot) between 71e8498 and 7759a76. It's a lot. Please try again with However, please try only in roughly one hour from now (I'll already be at work then), as the CI is currently also building what you can find below (fun testing if my solution here doesn't work). You can check whether you have the most recent version by issuing
which should return
(arrows added by me). Some new to try *only if the new
|
So FTL version vDev (fix/neighcrash, vDev-203057c) crashes if I have DHCP on and IPv6 (SLAAC + RA) on. If I turn off IPv6, then save and restart the system, it seems to run fine. If I turn IPv6 back on, save, and restart, it crashes again. Thread 5 "database" received signal SIGSEGV, Segmentation fault. and Thread 5 "database" received signal SIGSEGV, Segmentation fault. |
@bigpcjunky Thanks, this was really useful! Please update |
This issue has been mentioned on Pi-hole Userspace. There might be relevant details there: https://discourse.pi-hole.net/t/ftl-crash-after-update-v4-3-1/29920/67 |
Unfortunately I'm not home at the moment but that's the build I left running and I just checked and it's still up. |
@MeekLogic |
I'm a little late to the party, but I had a raspberry pi 4 and a pi zero crashing for this problem. The latest fix/neighcrash fixed the problem for me on both of my systems. |
I also confirm that everything is stable now. |
The fix has been merged into the regular beta code. Please checkout the |
This issue has been mentioned on Pi-hole Userspace. There might be relevant details there: https://discourse.pi-hole.net/t/some-hostnames-do-not-resolve/29710/60 |
Got the same problem in 2024 on a brand new install in Docker: after enabling DHCP, FTL crashes and Pi-hole shows "Lost API Connection" |
In raising this issue, I confirm the following (please check boxes, eg [X]) Failure to fill the template will close your issue:
How familiar are you with the codebase?:
7
[ISSUE] Expected Behaviour:
After update to vDev-b6364d0, FTL should not crash while DHCP server is enabled.
[BUG] Actual Behaviour:
FTL became unstable after update to vDev-b6364d0. It crashes after 10-30 seconds.
[BUG | ISSUE] Steps to reproduce:
Enable DHCP server. FTL should crash soon after that.
Log file output [if available]
https://tricorder.pi-hole.net/dli98t4why
Device specifics
Hardware Type: rPi 3B+
OS: Raspbian Buster
This template was created based on the work of udemy-dl.