Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FTL crashes instantly after start #50

Closed
3 tasks done
xorrizon opened this issue May 14, 2017 · 37 comments
Closed
3 tasks done

FTL crashes instantly after start #50

xorrizon opened this issue May 14, 2017 · 37 comments

Comments

@xorrizon
Copy link

xorrizon commented May 14, 2017

In raising this issue, I confirm the following (please check boxes, eg [X]) Failure to fill the template will close your issue:

  • I have read and understood the contributors guide.
  • The issue I am reporting can be replicated
  • The issue I am reporting isn't a duplicate

How familiar are you with the codebase?:

1


[BUG | ISSUE] Expected Behaviour:
pihole-FTL should start and not crash

[BUG | ISSUE] Actual Behaviour:
pihole-FTL either crashes instantly after start or after a few hours

[BUG | ISSUE] Steps to reproduce:

  • Start pihole-FTL, crashes almost instantly at the moment

Log file output [if available]

[2017-05-14 08:51:51.958] ########## FTL started! ##########
[2017-05-14 08:51:51.959] FTL branch: (no branch)
[2017-05-14 08:51:51.959] FTL hash: v2.7
[2017-05-14 08:51:51.959] FTL date: 2017-05-12 17:49:20 +0200
[2017-05-14 08:51:51.959] FTL user: pihole
[2017-05-14 08:51:51.959] Notice: Found no readable FTL config file
[2017-05-14 08:51:51.959]         Using default settings
[2017-05-14 08:51:51.959] Starting config file parsing
[2017-05-14 08:51:51.959]    SOCKET_LISTENING: only local
[2017-05-14 08:51:51.959]    TIMEFRAME: Rolling 24h
[2017-05-14 08:51:51.959]    QUERY_DISPLAY: Show queries
[2017-05-14 08:51:51.959] Finished config file parsing
[2017-05-14 08:51:51.963] Found no other running pihole-FTL process
[2017-05-14 08:51:51.964] PID of FTL process: 32302
[2017-05-14 08:51:52.130] Gravity list entries: 106239
[2017-05-14 08:51:52.130] No blacklist present
[2017-05-14 08:51:52.130] Notice: Increasing wildcards struct size from 0 to 1 (92.00 B)
[2017-05-14 08:51:52.130] Wildcard blocking list entries: 1
[2017-05-14 08:51:52.130] Starting initial log file parsing
[2017-05-14 08:51:52.131] Reading from /var/log/pihole.log.1 (rw-r--r--)
[2017-05-14 08:51:52.131] Notice: Increasing forwarded struct size from 0 to 4 (209.00 B)
[2017-05-14 08:51:52.137] New forward server: 139.59.208.246 ns1.he.de.dns.opennic.glue (0/4)
[2017-05-14 08:51:52.137] Notice: Increasing overTime struct size from 0 to 100 (4.25 KB)
[2017-05-14 08:51:52.141] New forward server: 62.113.203.55 secondary.server.edv-froehlich.de (1/4)
[2017-05-14 08:51:52.197] Notice: Increasing queries struct size from 0 to 10000 (324.41 KB)
[2017-05-14 08:51:52.198] Notice: Increasing domains struct size from 0 to 1000 (356.41 KB)
[2017-05-14 08:51:52.198] Notice: Increasing clients struct size from 0 to 10 (356.66 KB)
[2017-05-14 08:51:52.202] New client: 192.168.11.140 (0/10)
[2017-05-14 08:51:52.210] New client: 192.168.11.124 (1/10)
[2017-05-14 08:51:52.216] New client: 192.168.11.129 (2/10)
[2017-05-14 08:51:52.221] New client: 192.168.11.134 (3/10)
[2017-05-14 08:51:52.227] New client: 192.168.11.131 (4/10)
[2017-05-14 08:51:52.237] New client: 192.168.11.127 (5/10)
[2017-05-14 08:51:52.256] New client: 192.168.11.1 (6/10)
[2017-05-14 08:51:52.378] New client: 127.0.0.1 localhost (7/10)
[2017-05-14 08:51:52.705] New client: 192.168.11.100 (8/10)
[2017-05-14 08:51:53.099] New client: 192.168.11.113 (9/10)
[2017-05-14 08:51:53.099] Notice: Increasing clients struct size from 10 to 20 (360.95 KB)
[2017-05-14 08:51:53.110] New client: 192.168.11.149 (10/20)
[2017-05-14 08:51:53.122] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2017-05-14 08:51:53.122] ---------------------------->  FTL crashed!  <----------------------------
[2017-05-14 08:51:53.122] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2017-05-14 08:51:53.122] > Please report a bug at https://github.com/pi-hole/FTL/issues
[2017-05-14 08:51:53.122] > and include in your report already the following details:
[2017-05-14 08:51:53.122] >
[2017-05-14 08:51:53.122] > Received signal: Segmentation fault
[2017-05-14 08:51:53.122]        at address: 367155572728
[2017-05-14 08:51:53.122]         with code: SEGV_MAPERR (Address not mapped to object)
[2017-05-14 08:51:53.122] > Memory usage (structs): 356676
[2017-05-14 08:51:53.123] > Memory usage (dynamic): 5386
[2017-05-14 08:51:53.123] >
[2017-05-14 08:51:53.123] > Thank you for helping us to improve our FTL engine!
[2017-05-14 08:51:53.123] FTL terminated!

gdb backtrace

[2017-05-14 08:53:24.769] New client:  (11/20)

Program received signal SIGSEGV, Segmentation fault.
process_pihole_log (file=1) at parser.c:440
440	parser.c: No such file or directory.
(gdb) bt
#0  process_pihole_log (file=1) at parser.c:440
#1  0x0000000000000002 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Device specifics

Hardware Type: Pine64
Arch: aarch64
OS: Ubuntu 16.04.2
Kernel: 3.10.104-2-pine64-longsleep
Pi-hole Version: v3.0.1
FTL Version: v2.7

This template was created based on the work of udemy-dl.

@DL6ER
Copy link
Member

DL6ER commented May 14, 2017

Thanks for reporting this issue!

It seems the error appears here: https://github.com/pi-hole/FTL/blob/master/parser.c#L440

Could you please do a

print hostname

in gdb?

@xorrizon
Copy link
Author

Thanks for the quick reply!

Program received signal SIGSEGV, Segmentation fault.
process_pihole_log (file=1) at parser.c:440
440	parser.c: No such file or directory.
(gdb) print hostname
$1 = 0x5568669058 "gravity.list "

@DL6ER
Copy link
Member

DL6ER commented May 14, 2017

Okay, looks like the line number that comes from the backtrace is not on-the-spot. Please do also a

print client
print readbuffer

@xorrizon
Copy link
Author

(gdb) print hostname
$1 = 0x5577987058 "gravity.list "
(gdb) print client
$2 = 0x55779aa850 "192.168.11.149"
(gdb) print clientID
$3 = 8
(gdb) print memory
$4 = {wildcarddomains = 13, domainnames = 3029, clientips = 119, clientnames = 8, forwardedips = 29, forwardednames = 61, forwarddata = 38, 
  querytypedata = 152}
(gdb) print clients[clientID]
$5 = {magic = 87 'W', count = 1, ip = 0x55779aa940 "", name = 0x55779aa960 ""}
(gdb) print readbuffer
$6 = 0x7fb7ff3710 ""
(gdb) print clients[7]
$7 = {magic = 87 'W', count = 100, ip = 0x55779aa1b0 "192.168.11.149", name = 0x55779aa1d0 ""}
(gdb) print clients[6]
$8 = {magic = 87 'W', count = 8, ip = 0x55779a9ef0 "192.168.11.113", name = 0x55779a9f10 ""}

Since clients[clientID].ip is empty, the crash should occur somewhere before line 438....

@DL6ER
Copy link
Member

DL6ER commented May 14, 2017

Since clients[clientID].ip is empty, the crash should occur somewhere before line 438....

Yeah, I was assuming client may be an invalid string, but what you posted seems perfectly fine (both the address and the content. On the other hand, it currently says

(gdb) print client
$2 = 0x55779aa850 "192.168.11.149"

whereas you also have

(gdb) print clients[7]
$7 = {magic = 87 'W', count = 100, ip = 0x55779aa1b0 "192.168.11.149", name = 0x55779aa1d0 ""}

i.e. this client is already known (ID 7) and there should be no new entry created for it!

I'll have to think more about that ... what is the content of counters ?

@xorrizon
Copy link
Author

(gdb) print counters
$16 = {queries = 400, invalidqueries = 0, blocked = 27, wildcardblocked = 1, cached = 116, unknown = 0, forwarded = 2, clients = 8, domains = 150, 
  queries_MAX = 10000, forwarded_MAX = 4, clients_MAX = 10, domains_MAX = 1000, overTime_MAX = 100, gravity = 106239, overTime = 19, IPv4 = 288, 
  IPv6 = 113, PTR = 0, SRV = 0, wildcarddomains = 1, forwardedqueries = 256}

@DL6ER
Copy link
Member

DL6ER commented May 14, 2017

Hmm, I'm not sure where the issue is happening now, since

https://github.com/pi-hole/FTL/blob/master/parser.c#L433

seems to have worked, but

https://github.com/pi-hole/FTL/blob/master/parser.c#L435
and
https://github.com/pi-hole/FTL/blob/master/parser.c#L439

also seem to have worked, since there are valid addresses (so there is some memory allocated), but there is nothing copied into.

Maybe you can dig a bit further around and find out where it crashes exactly.
Use stepping and set a breakpoint somewhere before the crashing location in the code.

@xorrizon
Copy link
Author

Ohhhhkay..... I'm officially confused...

I tried to catch the line where clients[clientID].count is set to one to get the offset from gdb to the actual source file. Looks to be pretty accurately the last line before the one shown.
Afterwards I checked each variables value after each step. Here it gets weird.
When memory.clientips gets increased, it is only increased by one. Which could only happen if strlen(client) is 0. Which according to print client it is not....
Also memory.clientnames never gets increased, which narrows the line of the segfault to 437 to 440

[2017-05-14 10:42:09.749] New client:  (10/20)

Breakpoint 1, process_pihole_log (file=1) at parser.c:435
435	in parser.c
(gdb) print client
$24 = 0x5561526630 "192.168.11.149"
(gdb) print clients[clientID]
$25 = {magic = 87 'W', count = 0, ip = 0x0, name = 0x0}
(gdb) next
433	in parser.c
(gdb) print clients[clientID]
$26 = {magic = 87 'W', count = 0, ip = 0x0, name = 0x0}
(gdb) next
435	in parser.c
(gdb) print clients[clientID]
$27 = {magic = 87 'W', count = 1, ip = 0x0, name = 0x0}
(gdb) print client
$28 = 0x5561526630 "192.168.11.149"
(gdb) print memory.clienttips
There is no member named clienttips.
(gdb) print memory.clientips
$29 = 143
(gdb) next
436	in parser.c
(gdb) print memory.clientips
$30 = 143
(gdb) next
435	in parser.c
(gdb) next
436	in parser.c
(gdb) print memory.clientips
$31 = 143
(gdb) next
437	in parser.c
(gdb) print memory.clientips
$32 = 143
(gdb) next
436	in parser.c
(gdb) next
437	in parser.c
(gdb) print memory.clientips
$33 = 143
(gdb) print clients[clientID]
$34 = {magic = 87 'W', count = 1, ip = 0x5561526720 "", name = 0x0}
(gdb) next
436	in parser.c
(gdb) next
437	in parser.c
(gdb) print memory.clientips
$35 = 143
(gdb) next
436	in parser.c
(gdb) next
437	in parser.c
(gdb) print memory.clientips
$36 = 144
(gdb) next
439	in parser.c
(gdb) print memory.clientips
$37 = 144
(gdb) print memory.clientips
$38 = 144
(gdb) print clients[clientID]
$39 = {magic = 87 'W', count = 1, ip = 0x5561526720 "", name = 0x0}
(gdb) print memory.clientnames
$40 = 19
(gdb) next
440	in parser.c
(gdb) print memory.clientnames
$41 = 19
(gdb) print clients[clientID]
$42 = {magic = 87 'W', count = 1, ip = 0x5561526720 "", name = 0x0}
(gdb) print memory.clientips
$43 = 144
(gdb) next
439	in parser.c
(gdb) next
440	in parser.c
(gdb) print memory.clientips
$44 = 144
(gdb) print clients[clientID]
$45 = {magic = 87 'W', count = 1, ip = 0x5561526720 "", name = 0x5561526740 ""}
(gdb) next
441	in parser.c
(gdb) print memory.clientnames
$46 = 19
(gdb) next
440	in parser.c
(gdb) print memory.clientnames
$47 = 19
(gdb) next

Program received signal SIGSEGV, Segmentation fault.
process_pihole_log (file=1) at parser.c:440
440	in parser.c
(gdb) print memory.clientnames
$48 = 19
(gdb) print clients[clientID]
$49 = {magic = 87 'W', count = 1, ip = 0x5561526720 "", name = 0x5561526740 ""}
(gdb) print memory.clientips
$50 = 144

@DL6ER
Copy link
Member

DL6ER commented May 14, 2017

I tried to catch the line where clients[clientID].count is set to one to get the offset from gdb to the actual source file.

Try

(gdb) list parser.c:440
435                                     clients[clientID].ip = calloc(strlen(client)+1,sizeof(char));
436                                     memory.clientips += (strlen(client) + 1) * sizeof(char);
437                                     strcpy(clients[clientID].ip, client);
438                                     // Store client hostname
439                                     clients[clientID].name = calloc(strlen(hostname)+1,sizeof(char));
440                                     memory.clientnames += (strlen(hostname) + 1) * sizeof(char);
441                                     strcpy(clients[clientID].name,hostname);
442                                     free(hostname);
443                                     // Increase counter by one
444                                     counters.clients++;

(this includes the result at my end, but my code is usually not on master, so it might be different)

I'll now look at the part you posted in more detail.

@DL6ER
Copy link
Member

DL6ER commented May 14, 2017

Yeah, we see an evolution like

$26 = {magic = 87 'W', count = 0, ip = 0x0, name = 0x0}
$27 = {magic = 87 'W', count = 1, ip = 0x0, name = 0x0}
$42 = {magic = 87 'W', count = 1, ip = 0x5561526720 "", name = 0x0}
$45 = {magic = 87 'W', count = 1, ip = 0x5561526720 "", name = 0x5561526740 ""}

so everything until the calloc seems to have worked and was successful. Strangely, the last calloc required an strlen(hostname) which seems to be exactly the same thing that crashes in the line thereafter.

@xorrizon
Copy link
Author

So, it kept bugging me where the hostname "gravity.list " came from. So I searched the FTL source code for occurrences. Turns out, gravity.list has something to do with blacklists, and indeed, I think my problems started after I added "facebook.com" to the blacklist (as wildcard).

I tried deleting it (currently my only entry in the blacklist) and suddenly no crash anymore. Adding it again and restarting ftl causes the crash again. I tried a few variations and it only happens if added as wildcard instead as exact. Adding any other url as wildcard to the blacklist also causes the segfault.

I guess even before the actual crash, there has to be some kind of memory problem/buffer overflow/whatever. For now as a workaround I guess I just keep the blacklist empty for now.

Let me know if I should try anything else to help you debug it ;-)

@DL6ER
Copy link
Member

DL6ER commented May 14, 2017

Uff, I have facebook.com (and others) in the wildcard blocker for ages, so there does not seem to be a "general" issue with that. Unfortunately, I'm still unable to reproduce this :-(

@DL6ER
Copy link
Member

DL6ER commented May 14, 2017

@xorrizon try running FTL in a memory checker like valgrind

@xorrizon
Copy link
Author

I run valgrind pihole-FTL debug test with and without a blacklist entry. Without it exited without any errors (leaks or otherwise). With facebook.com added to the wildcard blacklist I got a ton:

==3128== Invalid free() / delete / delete[] / realloc()
==3128==    at 0x4845E54: free (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==3128==  Address 0xe is not stack'd, malloc'd or (recently) free'd
==3128== 
==3128== Invalid read of size 8
==3128==    at 0x10DDD4: findDomain (parser.c:815)
==3128==    by 0x10DDD4: process_pihole_log (parser.c:386)
==3128==  Address 0xfff0001a8 is on thread 1's stack
==3128==  24 bytes below stack pointer
==3128== 
[2017-05-14 15:18:31.690] New domain: www.facebook.com (269/1000)
==3128== Invalid read of size 8
==3128==    at 0x10DE84: findClient (parser.c:836)
==3128==    by 0x10DE84: process_pihole_log (parser.c:415)
==3128==  Address 0xfff0001a8 is on thread 1's stack
==3128==  24 bytes below stack pointer
==3128== 
==3128== Conditional jump or move depends on uninitialised value(s)
==3128==    at 0x484AE98: bcmp (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==3128== 
==3128== Conditional jump or move depends on uninitialised value(s)
==3128==    at 0x4E2CA8C: _nss_files_gethostbyaddr_r (files-hosts.c:110)
==3128==    by 0x4961D1F: gethostbyaddr_r@@GLIBC_2.17 (getXXbyYY_r.c:266)
==3128==    by 0x4961AF3: gethostbyaddr (getXXbyYY.c:116)
==3128==    by 0x10D25F: resolveHostname (parser.c:641)
==3128==    by 0x10E1C7: process_pihole_log (parser.c:422)
==3128==    by 0x126FFF: ???
==3128== 
==3128== Use of uninitialised value of size 8
==3128==    at 0x48CD33C: _itoa_word (_itoa.c:179)
==3128==    by 0x48D0353: vfprintf (vfprintf.c:1631)
==3128==    by 0x48EC87B: vsprintf (iovsprintf.c:42)
==3128==    by 0x48D63EF: sprintf (sprintf.c:32)
==3128==    by 0x4E4BA93: _nss_dns_gethostbyaddr2_r (dns-host.c:463)
==3128==    by 0x4E4BE23: _nss_dns_gethostbyaddr_r (dns-host.c:555)
==3128==    by 0x4961D1F: gethostbyaddr_r@@GLIBC_2.17 (getXXbyYY_r.c:266)
==3128==    by 0x4961AF3: gethostbyaddr (getXXbyYY.c:116)
==3128==    by 0x10D25F: resolveHostname (parser.c:641)
==3128==    by 0x10E1C7: process_pihole_log (parser.c:422)
==3128==    by 0x126FFF: ???
==3128== 
==3128== Conditional jump or move depends on uninitialised value(s)
==3128==    at 0x48CD344: _itoa_word (_itoa.c:179)
==3128==    by 0x48D0353: vfprintf (vfprintf.c:1631)
==3128==    by 0x48EC87B: vsprintf (iovsprintf.c:42)
==3128==    by 0x48D63EF: sprintf (sprintf.c:32)
==3128==    by 0x4E4BA93: _nss_dns_gethostbyaddr2_r (dns-host.c:463)
==3128==    by 0x4E4BE23: _nss_dns_gethostbyaddr_r (dns-host.c:555)
==3128==    by 0x4961D1F: gethostbyaddr_r@@GLIBC_2.17 (getXXbyYY_r.c:266)
==3128==    by 0x4961AF3: gethostbyaddr (getXXbyYY.c:116)
==3128==    by 0x10D25F: resolveHostname (parser.c:641)
==3128==    by 0x10E1C7: process_pihole_log (parser.c:422)
==3128==    by 0x126FFF: ???
==3128== 
==3128== Conditional jump or move depends on uninitialised value(s)
==3128==    at 0x48D03AC: vfprintf (vfprintf.c:1631)
==3128==    by 0x48EC87B: vsprintf (iovsprintf.c:42)
==3128==    by 0x48D63EF: sprintf (sprintf.c:32)
==3128==    by 0x4E4BA93: _nss_dns_gethostbyaddr2_r (dns-host.c:463)
==3128==    by 0x4E4BE23: _nss_dns_gethostbyaddr_r (dns-host.c:555)
==3128==    by 0x4961D1F: gethostbyaddr_r@@GLIBC_2.17 (getXXbyYY_r.c:266)
==3128==    by 0x4961AF3: gethostbyaddr (getXXbyYY.c:116)
==3128==    by 0x10D25F: resolveHostname (parser.c:641)
==3128==    by 0x10E1C7: process_pihole_log (parser.c:422)
==3128==    by 0x126FFF: ???
==3128== 
==3128== Conditional jump or move depends on uninitialised value(s)
==3128==    at 0x48CFFF4: vfprintf (vfprintf.c:1631)
==3128==    by 0x48EC87B: vsprintf (iovsprintf.c:42)
==3128==    by 0x48D63EF: sprintf (sprintf.c:32)
==3128==    by 0x4E4BA93: _nss_dns_gethostbyaddr2_r (dns-host.c:463)
==3128==    by 0x4E4BE23: _nss_dns_gethostbyaddr_r (dns-host.c:555)
==3128==    by 0x4961D1F: gethostbyaddr_r@@GLIBC_2.17 (getXXbyYY_r.c:266)
==3128==    by 0x4961AF3: gethostbyaddr (getXXbyYY.c:116)
==3128==    by 0x10D25F: resolveHostname (parser.c:641)
==3128==    by 0x10E1C7: process_pihole_log (parser.c:422)
==3128==    by 0x126FFF: ???
==3128== 
==3128== Conditional jump or move depends on uninitialised value(s)
==3128==    at 0x48D0064: vfprintf (vfprintf.c:1631)
==3128==    by 0x48EC87B: vsprintf (iovsprintf.c:42)
==3128==    by 0x48D63EF: sprintf (sprintf.c:32)
==3128==    by 0x4E4BA93: _nss_dns_gethostbyaddr2_r (dns-host.c:463)
==3128==    by 0x4E4BE23: _nss_dns_gethostbyaddr_r (dns-host.c:555)
==3128==    by 0x4961D1F: gethostbyaddr_r@@GLIBC_2.17 (getXXbyYY_r.c:266)
==3128==    by 0x4961AF3: gethostbyaddr (getXXbyYY.c:116)
==3128==    by 0x10D25F: resolveHostname (parser.c:641)
==3128==    by 0x10E1C7: process_pihole_log (parser.c:422)
==3128==    by 0x126FFF: ???
==3128== 
[2017-05-14 15:18:32.166] New client:  (10/20)
==3128== Invalid write of size 8
==3128==    at 0x10E230: process_pihole_log (parser.c:435)
==3128==  Address 0xfff000160 is on thread 1's stack
==3128==  96 bytes below stack pointer
==3128== 
==3128== Invalid read of size 8
==3128==    at 0x10E244: process_pihole_log (parser.c:435)
==3128==  Address 0xfff000160 is on thread 1's stack
==3128==  96 bytes below stack pointer
==3128== 
==3128== Invalid write of size 8
==3128==    at 0x10E254: process_pihole_log (parser.c:436)
==3128==  Address 0xfff000160 is on thread 1's stack
==3128==  96 bytes below stack pointer
==3128== 
==3128== Invalid write of size 8
==3128==    at 0x10E258: process_pihole_log (parser.c:436)
==3128==  Address 0xfff000168 is on thread 1's stack
==3128==  88 bytes below stack pointer
==3128== 
==3128== Invalid read of size 8
==3128==    at 0x10E264: strcpy (string3.h:104)
==3128==    by 0x10E264: process_pihole_log (parser.c:437)
==3128==  Address 0xfff000168 is on thread 1's stack
==3128==  88 bytes below stack pointer
==3128== 
==3128== Invalid write of size 8
==3128==    at 0x10E28C: strcpy (string3.h:104)
==3128==    by 0x10E28C: process_pihole_log (parser.c:437)
==3128==  Address 0xfff000168 is on thread 1's stack
==3128==  88 bytes below stack pointer
==3128== 
==3128== Invalid read of size 8
==3128==    at 0x10E2A8: process_pihole_log (parser.c:439)
==3128==  Address 0xfff000160 is on thread 1's stack
==3128==  96 bytes below stack pointer
==3128== 
==3128== Invalid write of size 8
==3128==    at 0x10E2B8: process_pihole_log (parser.c:440)
==3128==  Address 0xfff000160 is on thread 1's stack
==3128==  96 bytes below stack pointer
==3128== 
==3128== Invalid read of size 8
==3128==    at 0x10E2C0: process_pihole_log (parser.c:440)
==3128==  Address 0xfff000168 is on thread 1's stack
==3128==  88 bytes below stack pointer
==3128== 
==3128== Invalid read of size 8
==3128==    at 0x10E2C4: strcpy (string3.h:104)
==3128==    by 0x10E2C4: process_pihole_log (parser.c:441)
==3128==  Address 0xfff000160 is on thread 1's stack
==3128==  96 bytes below stack pointer
==3128== 
==3128== Invalid read of size 4
==3128==    at 0x10E2CC: process_pihole_log (parser.c:440)
==3128==  Address 0x119 is not stack'd, malloc'd or (recently) free'd
==3128== 
[2017-05-14 15:18:32.227] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2017-05-14 15:18:32.229] ---------------------------->  FTL crashed!  <----------------------------
[2017-05-14 15:18:32.231] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2017-05-14 15:18:32.232] > Please report a bug at https://github.com/pi-hole/FTL/issues
[2017-05-14 15:18:32.234] > and include in your report already the following details:
[2017-05-14 15:18:32.236] >
[2017-05-14 15:18:32.259] > Received signal: Segmentation fault
[2017-05-14 15:18:32.261]        at address: 281
[2017-05-14 15:18:32.265]         with code: SEGV_MAPERR (Address not mapped to object)
[2017-05-14 15:18:32.269] > Memory usage (structs): 356676
[2017-05-14 15:18:32.271] > Memory usage (dynamic): 6667
[2017-05-14 15:18:32.273] >
[2017-05-14 15:18:32.274] > Thank you for helping us to improve our FTL engine!
[2017-05-14 15:18:32.276] FTL terminated!
==3128== 
==3128== Process terminating with default action of signal 6 (SIGABRT)
==3128==    at 0x48BC528: raise (raise.c:54)
==3128==    by 0x48BD9DF: abort (abort.c:89)
==3128==    by 0x10F5FF: SIGSEGV_handler (signals.c:61)
==3128==    by 0x38055563: ??? (in /usr/lib/valgrind/memcheck-arm64-linux)
==3128== 
==3128== HEAP SUMMARY:
==3128==     in use at exit: 364,655 bytes in 455 blocks
==3128==   total heap usage: 84,952 allocs, 84,498 frees, 3,492,792 bytes allocated
==3128== 
==3128== LEAK SUMMARY:
==3128==    definitely lost: 0 bytes in 0 blocks
==3128==    indirectly lost: 0 bytes in 0 blocks
==3128==      possibly lost: 0 bytes in 0 blocks
==3128==    still reachable: 364,655 bytes in 455 blocks
==3128==         suppressed: 0 bytes in 0 blocks
==3128== Rerun with --leak-check=full to see details of leaked memory
==3128== 
==3128== For counts of detected and suppressed errors, rerun with: -v
==3128== Use --track-origins=yes to see where uninitialised values come from
==3128== ERROR SUMMARY: 42 errors from 21 contexts (suppressed: 0 from 0)
Aborted


@DL6ER
Copy link
Member

DL6ER commented May 14, 2017

Okay, that might be an aarch64 or Pine64 related issue, then. Looking into this right now, but I don't seem to see any errors in the code that could be related to what your valgrind says...

Could you send me the full output? Via email to my username@pi-hole.net is also fine

@xorrizon
Copy link
Author

I'll just post it here. After removing a bunch of unrelated "New domain" lines it's not that much bigger than the last one:

==11015== Memcheck, a memory error detector
==11015== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==11015== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==11015== Command: pihole-FTL debug test
==11015== 
[2017-05-14 16:04:31.091] ########## FTL started! ##########
[2017-05-14 16:04:32.169] FTL branch: (no branch)
[2017-05-14 16:04:32.170] FTL hash: v2.7
[2017-05-14 16:04:32.171] FTL date: 2017-05-12 17:49:20 +0200
[2017-05-14 16:04:32.172] FTL user: ubuntu
[2017-05-14 16:04:32.176] Notice: Found no readable FTL config file
[2017-05-14 16:04:32.177]         Using default settings
[2017-05-14 16:04:32.178] Starting config file parsing
[2017-05-14 16:04:32.180]    SOCKET_LISTENING: only local
[2017-05-14 16:04:32.182]    TIMEFRAME: Rolling 24h
[2017-05-14 16:04:32.184]    QUERY_DISPLAY: Show queries
[2017-05-14 16:04:32.185] Finished config file parsing
[2017-05-14 16:04:32.189] PID of FTL process: 11015
[2017-05-14 16:04:35.530] Gravity list entries: 106239
[2017-05-14 16:04:35.532] No blacklist present
[2017-05-14 16:04:35.566] Notice: Increasing wildcards struct size from 0 to 1 (92.00 B)
[2017-05-14 16:04:35.624]         at query time: 
[2017-05-14 16:04:35.632] Wildcard blocking list entries: 1
[2017-05-14 16:04:35.633] Starting initial log file parsing
[2017-05-14 16:04:35.640] Reading from /var/log/pihole.log.1 (rw-r--r--)
[2017-05-14 16:04:35.715] Notice: Increasing forwarded struct size from 0 to 4 (209.00 B)
[2017-05-14 16:04:35.722]         at query time: May 13 06:27:10
[2017-05-14 16:04:35.995] New forward server: 139.59.208.246 ns1.he.de.dns.opennic.glue (0/4)
[2017-05-14 16:04:36.003] Notice: Increasing overTime struct size from 0 to 100 (4.25 KB)
[2017-05-14 16:04:36.007]         at query time: May 13 06:27:10
[2017-05-14 16:04:36.030] New forward server: 62.113.203.55 secondary.server.edv-froehlich.de (1/4)
[2017-05-14 16:04:48.228] Notice: Increasing queries struct size from 0 to 10000 (324.89 KB)
[2017-05-14 16:04:48.234]         at query time: May 13 16:01:37
[2017-05-14 16:04:48.249] Notice: Increasing domains struct size from 0 to 1000 (356.89 KB)
[2017-05-14 16:04:48.250]         at query time: May 13 16:01:37
[2017-05-14 16:04:48.252] New domain: hermes2.jabber.org (0/1000)
[2017-05-14 16:04:48.256] Notice: Increasing clients struct size from 0 to 10 (357.15 KB)
[2017-05-14 16:04:48.257]         at query time: May 13 16:01:37
[2017-05-14 16:04:48.271] New client: 192.168.11.140 (0/10)


[2017-05-14 16:04:48.310] New client: 192.168.11.134 (1/10)

[2017-05-14 16:04:48.334] New client: 192.168.11.1 (2/10)

[2017-05-14 16:04:48.345] New client: 192.168.11.124 (3/10)

[2017-05-14 16:04:48.407] New client: 192.168.11.127 (4/10)

[2017-05-14 16:04:48.621] New client: 192.168.11.129 (5/10)

[2017-05-14 16:04:48.688] New client: 192.168.11.131 (6/10)
[2017-05-14 16:04:48.702] New client: 192.168.11.113 (7/10)

[2017-05-14 16:04:50.142] New client: 192.168.11.149 (8/10)

[2017-05-14 16:04:50.405] New domain: platform.t==11015== Invalid free() / delete / delete[] / realloc()
==11015==    at 0x4845E54: free (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==11015==  Address 0xe is not stack'd, malloc'd or (recently) free'd
==11015== 
==11015== Invalid read of size 8
==11015==    at 0x10DDD4: findDomain (parser.c:815)
==11015==    by 0x10DDD4: process_pihole_log (parser.c:386)
==11015==  Address 0xfff0001a8 is on thread 1's stack
==11015==  24 bytes below stack pointer
==11015== 
==11015== Invalid read of size 8
==11015==    at 0x10DE84: findClient (parser.c:836)
==11015==    by 0x10DE84: process_pihole_log (parser.c:415)
==11015==  Address 0xfff0001a8 is on thread 1's stack
==11015==  24 bytes below stack pointer
==11015== 
==11015== Conditional jump or move depends on uninitialised value(s)
==11015==    at 0x484AE98: bcmp (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==11015== 
==11015== Conditional jump or move depends on uninitialised value(s)
==11015==    at 0x4E2CA8C: _nss_files_gethostbyaddr_r (files-hosts.c:110)
==11015==    by 0x4961D1F: gethostbyaddr_r@@GLIBC_2.17 (getXXbyYY_r.c:266)
==11015==    by 0x4961AF3: gethostbyaddr (getXXbyYY.c:116)
==11015==    by 0x10D25F: resolveHostname (parser.c:641)
==11015==    by 0x10E1C7: process_pihole_log (parser.c:422)
==11015==    by 0x126FFF: ???
==11015== 
==11015== Use of uninitialised value of size 8
==11015==    at 0x48CD33C: _itoa_word (_itoa.c:179)
==11015==    by 0x48D0353: vfprintf (vfprintf.c:1631)
==11015==    by 0x48EC87B: vsprintf (iovsprintf.c:42)
==11015==    by 0x48D63EF: sprintf (sprintf.c:32)
==11015==    by 0x4E4BA93: _nss_dns_gethostbyaddr2_r (dns-host.c:463)
==11015==    by 0x4E4BE23: _nss_dns_gethostbyaddr_r (dns-host.c:555)
==11015==    by 0x4961D1F: gethostbyaddr_r@@GLIBC_2.17 (getXXbyYY_r.c:266)
==11015==    by 0x4961AF3: gethostbyaddr (getXXbyYY.c:116)
==11015==    by 0x10D25F: resolveHostname (parser.c:641)
==11015==    by 0x10E1C7: process_pihole_log (parser.c:422)
==11015==    by 0x126FFF: ???
==11015== 
==11015== Conditional jump or move depends on uninitialised value(s)
==11015==    at 0x48CD344: _itoa_word (_itoa.c:179)
==11015==    by 0x48D0353: vfprintf (vfprintf.c:1631)
==11015==    by 0x48EC87B: vsprintf (iovsprintf.c:42)
==11015==    by 0x48D63EF: sprintf (sprintf.c:32)
==11015==    by 0x4E4BA93: _nss_dns_gethostbyaddr2_r (dns-host.c:463)
==11015==    by 0x4E4BE23: _nss_dns_gethostbyaddr_r (dns-host.c:555)
==11015==    by 0x4961D1F: gethostbyaddr_r@@GLIBC_2.17 (getXXbyYY_r.c:266)
==11015==    by 0x4961AF3: gethostbyaddr (getXXbyYY.c:116)
==11015==    by 0x10D25F: resolveHostname (parser.c:641)
==11015==    by 0x10E1C7: process_pihole_log (parser.c:422)
==11015==    by 0x126FFF: ???
==11015== 
==11015== Conditional jump or move depends on uninitialised value(s)
==11015==    at 0x48D03AC: vfprintf (vfprintf.c:1631)
==11015==    by 0x48EC87B: vsprintf (iovsprintf.c:42)
==11015==    by 0x48D63EF: sprintf (sprintf.c:32)
==11015==    by 0x4E4BA93: _nss_dns_gethostbyaddr2_r (dns-host.c:463)
==11015==    by 0x4E4BE23: _nss_dns_gethostbyaddr_r (dns-host.c:555)
==11015==    by 0x4961D1F: gethostbyaddr_r@@GLIBC_2.17 (getXXbyYY_r.c:266)
==11015==    by 0x4961AF3: gethostbyaddr (getXXbyYY.c:116)
==11015==    by 0x10D25F: resolveHostname (parser.c:641)
==11015==    by 0x10E1C7: process_pihole_log (parser.c:422)
==11015==    by 0x126FFF: ???
==11015== 
==11015== Conditional jump or move depends on uninitialised value(s)
==11015==    at 0x48CFFF4: vfprintf (vfprintf.c:1631)
==11015==    by 0x48EC87B: vsprintf (iovsprintf.c:42)
==11015==    by 0x48D63EF: sprintf (sprintf.c:32)
==11015==    by 0x4E4BA93: _nss_dns_gethostbyaddr2_r (dns-host.c:463)
==11015==    by 0x4E4BE23: _nss_dns_gethostbyaddr_r (dns-host.c:555)
==11015==    by 0x4961D1F: gethostbyaddr_r@@GLIBC_2.17 (getXXbyYY_r.c:266)
==11015==    by 0x4961AF3: gethostbyaddr (getXXbyYY.c:116)
==11015==    by 0x10D25F: resolveHostname (parser.c:641)
==11015==    by 0x10E1C7: process_pihole_log (parser.c:422)
==11015==    by 0x126FFF: ???
==11015== 
==11015== Conditional jump or move depends on uninitialised value(s)
==11015==    at 0x48D0064: vfprintf (vfprintf.c:1631)
==11015==    by 0x48EC87B: vsprintf (iovsprintf.c:42)
==11015==    by 0x48D63EF: sprintf (sprintf.c:32)
==11015==    by 0x4E4BA93: _nss_dns_gethostbyaddr2_r (dns-host.c:463)
==11015==    by 0x4E4BE23: _nss_dns_gethostbyaddr_r (dns-host.c:555)
==11015==    by 0x4961D1F: gethostbyaddr_r@@GLIBC_2.17 (getXXbyYY_r.c:266)
==11015==    by 0x4961AF3: gethostbyaddr (getXXbyYY.c:116)
==11015==    by 0x10D25F: resolveHostname (parser.c:641)
==11015==    by 0x10E1C7: process_pihole_log (parser.c:422)
==11015==    by 0x126FFF: ???
==11015== 
==11015== Invalid write of size 8
==11015==    at 0x10E230: process_pihole_log (parser.c:435)
==11015==  Address 0xfff000160 is on thread 1's stack
==11015==  96 bytes below stack pointer
==11015== 
==11015== Invalid read of size 8
==11015==    at 0x10E244: process_pihole_log (parser.c:435)
==11015==  Address 0xfff000160 is on thread 1's stack
==11015==  96 bytes below stack pointer
==11015== 
==11015== Invalid write of size 8
==11015==    at 0x10E254: process_pihole_log (parser.c:436)
==11015==  Address 0xfff000160 is on thread 1's stack
==11015==  96 bytes below stack pointer
==11015== 
==11015== Invalid write of size 8
==11015==    at 0x10E258: process_pihole_log (parser.c:436)
==11015==  Address 0xfff000168 is on thread 1's stack
==11015==  88 bytes below stack pointer
==11015== 
==11015== Invalid read of size 8
==11015==    at 0x10E264: strcpy (string3.h:104)
==11015==    by 0x10E264: process_pihole_log (parser.c:437)
==11015==  Address 0xfff000168 is on thread 1's stack
==11015==  88 bytes below stack pointer
==11015== 
==11015== Invalid write of size 8
==11015==    at 0x10E28C: strcpy (string3.h:104)
==11015==    by 0x10E28C: process_pihole_log (parser.c:437)
==11015==  Address 0xfff000168 is on thread 1's stack
==11015==  88 bytes below stack pointer
==11015== 
==11015== Invalid read of size 8
==11015==    at 0x10E2A8: process_pihole_log (parser.c:439)
==11015==  Address 0xfff000160 is on thread 1's stack
==11015==  96 bytes below stack pointer
==11015== 
==11015== Invalid write of size 8
==11015==    at 0x10E2B8: process_pihole_log (parser.c:440)
==11015==  Address 0xfff000160 is on thread 1's stack
==11015==  96 bytes below stack pointer
==11015== 
==11015== Invalid read of size 8
==11015==    at 0x10E2C0: process_pihole_log (parser.c:440)
==11015==  Address 0xfff000168 is on thread 1's stack
==11015==  88 bytes below stack pointer
==11015== 
==11015== Invalid read of size 8
==11015==    at 0x10E2C4: strcpy (string3.h:104)
==11015==    by 0x10E2C4: process_pihole_log (parser.c:441)
==11015==  Address 0xfff000160 is on thread 1's stack
==11015==  96 bytes below stack pointer
==11015== 
==11015== Invalid read of size 4
==11015==    at 0x10E2CC: process_pihole_log (parser.c:440)
==11015==  Address 0xda is not stack'd, malloc'd or (recently) free'd
==11015== 
witter.com (197/1000)
[2017-05-14 16:04:50.410] New domain: platform-eb.twitter.com (198/1000)
[2017-05-14 16:04:50.451] New domain: www.facebook.com (206/1000)
[2017-05-14 16:04:50.928] New client:  (9/10)
[2017-05-14 16:04:50.966] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2017-05-14 16:04:50.967] ---------------------------->  FTL crashed!  <----------------------------
[2017-05-14 16:04:50.968] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2017-05-14 16:04:50.969] > Please report a bug at https://github.com/pi-hole/FTL/issues
[2017-05-14 16:04:50.970] > and include in your report already the following details:
[2017-05-14 16:04:50.971] >
[2017-05-14 16:04:50.994] > Received signal: Segmentation fault
[2017-05-14 16:04:50.995]        at address: 218
[2017-05-14 16:04:50.998]         with code: SEGV_MAPERR (Address not mapped to object)
[2017-05-14 16:04:51.001] > Memory usage (structs): 356436
[2017-05-14 16:04:51.003] > Memory usage (dynamic): 5309
[2017-05-14 16:04:51.004] >
[2017-05-14 16:04:51.005] > Thank you for helping us to improve our FTL engine!
[2017-05-14 16:04:51.005] FTL terminated!
==11015== 
==11015== Process terminating with default action of signal 6 (SIGABRT)
==11015==    at 0x48BC528: raise (raise.c:54)
==11015==    by 0x48BD9DF: abort (abort.c:89)
==11015==    by 0x10F5FF: SIGSEGV_handler (signals.c:61)
==11015==    by 0x38055563: ??? (in /usr/lib/valgrind/memcheck-arm64-linux)
==11015== 
==11015== HEAP SUMMARY:
==11015==     in use at exit: 363,057 bytes in 390 blocks
==11015==   total heap usage: 81,797 allocs, 81,408 frees, 3,066,265 bytes allocated
==11015== 
==11015== LEAK SUMMARY:
==11015==    definitely lost: 0 bytes in 0 blocks
==11015==    indirectly lost: 0 bytes in 0 blocks
==11015==      possibly lost: 0 bytes in 0 blocks
==11015==    still reachable: 363,057 bytes in 390 blocks
==11015==         suppressed: 0 bytes in 0 blocks
==11015== Rerun with --leak-check=full to see details of leaked memory
==11015== 
==11015== For counts of detected and suppressed errors, rerun with: -v
==11015== Use --track-origins=yes to see where uninitialised values come from
==11015== ERROR SUMMARY: 42 errors from 21 contexts (suppressed: 0 from 0)

@DL6ER
Copy link
Member

DL6ER commented May 14, 2017

I still don't understand what is going on...

==11015== Invalid read of size 8
==11015== at 0x10DDD4: findDomain (parser.c:815)
==11015== by 0x10DDD4: process_pihole_log (parser.c:386)
==11015== Address 0xfff0001a8 is on thread 1's stack
==11015== 24 bytes below stack pointer

findDomain() has a (pretty sophisticated) detection mechanism if it tries to access memory in regions where it should not do this. Same for findClient().

It might be worth trying re-running with valgrind --max-stackframe=10000000 pihole-FTL debug.

However, on the way of digging through the code if found a potential vulnerability where a specifically malformed log line in dnsmasq's log could lead to an illegal instruction to calloc(). I added a test and FTL will in the future skip those lines. It may or not be that this is related to this issue. So I'm a big cautious with presenting it as a fix.

@DL6ER
Copy link
Member

DL6ER commented May 14, 2017

However that works, it seems like line 440 is still what is causing FTL to fail.

==11015== Invalid read of size 4
==11015== at 0x10E2CC: process_pihole_log (parser.c:440)
==11015== Address 0xda is not stack'd, malloc'd or (recently) free'd

because FTL identifies the problem at location 218:

[2017-05-14 16:04:50.994] > Received signal: Segmentation fault
[2017-05-14 16:04:50.995] at address: 218
[2017-05-14 16:04:50.998] with code: SEGV_MAPERR (Address not mapped to object)

and we have 0xda == 218...

@DL6ER
Copy link
Member

DL6ER commented May 15, 2017

You can try again with FTL v2.7.1. Watch out for new WARN messages.

@xorrizon
Copy link
Author

Thanks for the hard work!
However it seems the aarch64 download is missing from the release page causing a 404 when running pihole -up

@DL6ER
Copy link
Member

DL6ER commented May 15, 2017

Thanks for pointing it out! It seems like the CI hangs for the aarch64 binary. Let me try to kick it manually...

screenshot at 2017-05-15 10-54-25

screenshot at 2017-05-15 10-53-05

@DL6ER
Copy link
Member

DL6ER commented May 15, 2017

@xorrizon The aarch64 binary is there.

@xorrizon
Copy link
Author

Here you go. Just removed the new domain lines to keep the logs a bit shorter.
Also adding --max-stackframe=10000000 to valgrind didn't change anything.

GDB output:

[2017-05-15 08:59:09.291] ########## FTL started! ##########
[2017-05-15 08:59:09.292] FTL branch: (no branch)
[2017-05-15 08:59:09.292] FTL hash: v2.7.1
[2017-05-15 08:59:09.292] FTL date: 2017-05-15 09:37:16 +0200
[2017-05-15 08:59:09.292] FTL user: ubuntu
[2017-05-15 08:59:09.292] Notice: Found no readable FTL config file
[2017-05-15 08:59:09.292]         Using default settings
[2017-05-15 08:59:09.292] Starting config file parsing
[2017-05-15 08:59:09.292]    SOCKET_LISTENING: only local
[2017-05-15 08:59:09.292]    TIMEFRAME: Rolling 24h
[2017-05-15 08:59:09.292]    QUERY_DISPLAY: Show queries
[2017-05-15 08:59:09.292] Finished config file parsing
[2017-05-15 08:59:09.293] PID of FTL process: 17608
[2017-05-15 08:59:09.458] Gravity list entries: 106239
[2017-05-15 08:59:09.458] No blacklist present
[2017-05-15 08:59:09.458] Notice: Increasing wildcards struct size from 0 to 1 (92.00 B)
[2017-05-15 08:59:09.458]         at query time: 
[2017-05-15 08:59:09.458] Wildcard blocking list entries: 1
[2017-05-15 08:59:09.458] Starting initial log file parsing
[2017-05-15 08:59:09.458] Reading from /var/log/pihole.log.1 (rw-r--r--)
[2017-05-15 08:59:09.459] Notice: Increasing forwarded struct size from 0 to 4 (209.00 B)
[2017-05-15 08:59:09.459]         at query time: May 14 00:00:42
[2017-05-15 08:59:09.526] New forward server: 139.59.208.246 ns1.he.de.dns.opennic.glue (0/4)
[2017-05-15 08:59:09.527] Notice: Increasing overTime struct size from 0 to 100 (4.25 KB)
[2017-05-15 08:59:09.527]         at query time: May 14 00:00:42
[2017-05-15 08:59:09.565] New forward server: 62.113.203.55 secondary.server.edv-froehlich.de (1/4)
[2017-05-15 08:59:09.719] Notice: Increasing queries struct size from 0 to 10000 (324.79 KB)
[2017-05-15 08:59:09.720]         at query time: May 14 08:01:04
[2017-05-15 08:59:09.720] Notice: Increasing domains struct size from 0 to 1000 (356.79 KB)
[2017-05-15 08:59:09.720]         at query time: May 14 08:01:04
[2017-05-15 08:59:09.720] New domain: lh3.googleusercontent.com (0/1000)
[2017-05-15 08:59:09.721] Notice: Increasing clients struct size from 0 to 10 (357.06 KB)
[2017-05-15 08:59:09.721]         at query time: May 14 08:01:04
[2017-05-15 08:59:09.726] New client: 192.168.11.140 (0/10)

[2017-05-15 08:59:09.731] New client: 192.168.11.149 (1/10)

[2017-05-15 08:59:09.737] New client: 192.168.11.113 (2/10)

2017-05-15 08:59:09.742] New client: 192.168.11.131 (3/10)
[2017-05-15 08:59:09.754] New client: 192.168.11.129 (4/10)

[2017-05-15 08:59:09.765] New client: 192.168.11.1 (7/10)

[2017-05-15 08:59:09.774] New client: 192.168.11.127 (8/10)

[2017-05-15 08:59:09.784] New client: 192.168.11.124 (9/10)

[2017-05-15 08:59:09.785] Notice: Increasing clients struct size from 10 to 20 (358.58 KB)
[2017-05-15 08:59:09.785]         at query time: May 14 08:13:23



*** Error in `/usr/bin/pihole-FTL': double free or corruption (out): 0x0000007fb7ff3710 ***

Program received signal SIGABRT, Aborted.
0x0000007fb7e90528 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x0000007fb7e90528 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x0000007fb7e919e0 in __GI_abort () at abort.c:89
#2  0x0000007fb7ec78c4 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7fb7f783d0 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/posix/libc_fatal.c:175
#3  0x0000007fb7ecd2ec in malloc_printerr (action=<optimized out>, str=0x7fb7f784d8 "double free or corruption (out)", ptr=<optimized out>, 
    ar_ptr=<optimized out>) at malloc.c:5004
#4  0x0000007fb7ece020 in _int_free (av=0x7fb7fa09a0 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3865
#5  0x000000556e2c553c in process_pihole_log (file=1) at parser.c:395
#6  0x0000000000000039 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Valgrind Output

==18637== Memcheck, a memory error detector
==18637== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==18637== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==18637== Command: pihole-FTL debug test
==18637== 
[2017-05-15 09:01:33.752] ########## FTL started! ##########
[2017-05-15 09:01:33.835] FTL branch: (no branch)
[2017-05-15 09:01:33.836] FTL hash: v2.7.1
[2017-05-15 09:01:33.838] FTL date: 2017-05-15 09:37:16 +0200
[2017-05-15 09:01:33.839] FTL user: ubuntu
[2017-05-15 09:01:33.843] Notice: Found no readable FTL config file
[2017-05-15 09:01:33.845]         Using default settings
[2017-05-15 09:01:33.846] Starting config file parsing
[2017-05-15 09:01:33.849]    SOCKET_LISTENING: only local
[2017-05-15 09:01:33.851]    TIMEFRAME: Rolling 24h
[2017-05-15 09:01:33.853]    QUERY_DISPLAY: Show queries
[2017-05-15 09:01:33.854] Finished config file parsing
[2017-05-15 09:01:33.859] PID of FTL process: 18637
[2017-05-15 09:01:37.202] Gravity list entries: 106239
[2017-05-15 09:01:37.203] No blacklist present
[2017-05-15 09:01:37.238] Notice: Increasing wildcards struct size from 0 to 1 (92.00 B)
[2017-05-15 09:01:37.297]         at query time: 
[2017-05-15 09:01:37.304] Wildcard blocking list entries: 1
[2017-05-15 09:01:37.306] Starting initial log file parsing
[2017-05-15 09:01:37.314] Reading from /var/log/pihole.log.1 (rw-r--r--)
[2017-05-15 09:01:37.390] Notice: Increasing forwarded struct size from 0 to 4 (209.00 B)
[2017-05-15 09:01:37.396]         at query time: May 14 00:00:42
[2017-05-15 09:01:37.697] New forward server: 139.59.208.246 ns1.he.de.dns.opennic.glue (0/4)
[2017-05-15 09:01:37.706] Notice: Increasing overTime struct size from 0 to 100 (4.25 KB)
[2017-05-15 09:01:37.710]         at query time: May 14 00:00:42
[2017-05-15 09:01:37.767] New forward server: 62.113.203.55 secondary.server.edv-froehlich.de (1/4)
[2017-05-15 09:01:40.483] Notice: Increasing queries struct size from 0 to 10000 (324.85 KB)
[2017-05-15 09:01:40.490]         at query time: May 14 09:01:22
[2017-05-15 09:01:40.505] Notice: Increasing domains struct size from 0 to 1000 (356.85 KB)
[2017-05-15 09:01:40.507]         at query time: May 14 09:01:22

[2017-05-15 09:01:40.517] Notice: Increasing clients struct size from 0 to 10 (357.11 KB)
[2017-05-15 09:01:40.519]         at query time: May 14 09:01:22
[2017-05-15 09:01:40.535] New client: 192.168.11.140 (0/10)

[2017-05-15 09:01:40.563] New client: 192.168.11.131 (1/10)

[2017-05-15 09:01:40.576] New client: 192.168.11.129 (2/10)

[2017-05-15 09:01:40.656] New client: 192.168.11.113 (3/10)

[2017-05-15 09:01:40.695] New client: 192.168.11.134 (4/10)

[2017-05-15 09:01:40.708] New client: 192.168.11.1 (5/10)

[2017-05-15 09:01:40.843] New client: 192.168.11.124 (6/10)

==18637== Invalid free() / delete / delete[] / realloc()
==18637==    at 0x4845E54: free (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==18637==  Address 0x12cdb1 is not stack'd, malloc'd or (recently) free'd
==18637== 
==18637== Invalid read of size 1
==18637==    at 0x10D960: findDomain (parser.c:836)
==18637==    by 0x10E54F: process_pihole_log (parser.c:402)
==18637==  Address 0x100000001 is not stack'd, malloc'd or (recently) free'd
==18637== 
[2017-05-15 09:01:41.361] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2017-05-15 09:01:41.363] ---------------------------->  FTL crashed!  <----------------------------
[2017-05-15 09:01:41.364] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2017-05-15 09:01:41.365] > Please report a bug at https://github.com/pi-hole/FTL/issues
[2017-05-15 09:01:41.366] > and include in your report already the following details:
[2017-05-15 09:01:41.368] >
[2017-05-15 09:01:41.391] > Received signal: Segmentation fault
[2017-05-15 09:01:41.393]        at address: 4294967297
[2017-05-15 09:01:41.396]         with code: SEGV_MAPERR (Address not mapped to object)
[2017-05-15 09:01:41.399] > Memory usage (structs): 356436
[2017-05-15 09:01:41.401] > Memory usage (dynamic): 3348
[2017-05-15 09:01:41.402] >
[2017-05-15 09:01:41.404] > Thank you for helping us to improve our FTL engine!
[2017-05-15 09:01:41.405] FTL terminated!
==18637== 
==18637== Process terminating with default action of signal 6 (SIGABRT)
==18637==    at 0x48BC528: raise (raise.c:54)
==18637==    by 0x48BD9DF: abort (abort.c:89)
==18637==    by 0x10F687: SIGSEGV_handler (signals.c:61)
==18637==    by 0x38055563: ??? (in /usr/lib/valgrind/memcheck-arm64-linux)
==18637== 
==18637== HEAP SUMMARY:
==18637==     in use at exit: 361,010 bytes in 271 blocks
==18637==   total heap usage: 17,257 allocs, 16,987 frees, 1,579,036 bytes allocated
==18637== 
==18637== LEAK SUMMARY:
==18637==    definitely lost: 0 bytes in 0 blocks
==18637==    indirectly lost: 0 bytes in 0 blocks
==18637==      possibly lost: 0 bytes in 0 blocks
==18637==    still reachable: 361,010 bytes in 271 blocks
==18637==         suppressed: 0 bytes in 0 blocks
==18637== Rerun with --leak-check=full to see details of leaked memory
==18637== 
==18637== For counts of detected and suppressed errors, rerun with: -v
==18637== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Aborted

@DL6ER
Copy link
Member

DL6ER commented May 15, 2017

Thanks again, for your input and your patience!

==18637==    at 0x10D960: findDomain (parser.c:836)
==18637==    by 0x10E54F: process_pihole_log (parser.c:402)
==18637==  Address 0x100000001 is not stack'd, malloc'd or (recently) free'd
*** Error in `/usr/bin/pihole-FTL': double free or corruption (out): 0x0000007fb7ff3710 ***
#5  0x000000556e2c553c in process_pihole_log (file=1) at parser.c:395

Those two seem to be connected, although I don't see why freeing readbuffer2 could fail, since it is only used by getline() which allocates as much memory as it needs by itself.


==18637== Invalid read of size 1
==18637==    at 0x10D960: findDomain (parser.c:836)
==18637==    by 0x10E54F: process_pihole_log (parser.c:402)
==18637==  Address 0x100000001 is not stack'd, malloc'd or (recently) free'd

I'd like to know where the address 0x100000001 is coming from.

Could you do a

print readbuffer
print readbuffer2

?


It might have happened that a broken log line lead to a string that was not NULL-terminated. #54 fixes this.

@xorrizon
Copy link
Author

Here you go:

Program received signal SIGSEGV, Segmentation fault.
__GI___libc_free (mem=0x100000000) at malloc.c:2949
2949	malloc.c: No such file or directory.
(gdb) bt
#0  __GI___libc_free (mem=0x100000000) at malloc.c:2949
#1  0x000000555bc6e53c in process_pihole_log (file=1) at parser.c:395
#2  0x0000007ffffff370 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) frame 1
#1  0x000000555bc6e53c in process_pihole_log (file=1) at parser.c:395
395	parser.c: No such file or directory.
(gdb) p readbuffer
$1 = 0x7fb7ff3710 ""
(gdb) p readbuffer2 
$2 = 0x100000000 <error: Cannot access memory at address 0x100000000>

@DL6ER
Copy link
Member

DL6ER commented May 15, 2017

What I don't understand here is that you see

(gdb) p readbuffer
$1 = 0x7fb7ff3710 ""

is an empty string. This part of the code should only ever be accessed, when this buffer contains ]: query[A: https://github.com/pi-hole/FTL/blob/master/parser.c#L190

@xorrizon
Copy link
Author

Only explanation I can think of is that there is a buffer overflow or something similar somewhere which overwrites these....

@DL6ER
Copy link
Member

DL6ER commented May 15, 2017

I concur, but this is exactly what should be discovered by valgrind.

@DL6ER
Copy link
Member

DL6ER commented May 15, 2017

Well, the good thing is that the more I look at the code, the more I find and harden even the most unlikely situations. Please try again with FTL v2.7.2 which will be uploaded in a few minutes, although it does not contain any specific fixes for the issue you are seeing.

If nothing helps, I will look into buying such a device but I'd really like to avoid that being necessary.

@xorrizon
Copy link
Author

Are you sure? I always thought valgrind doesn't catch overflows if you write into buffers you allocated yourself...

Also, I looked through the code for string copys and such, and I think there might be a problem in https://github.com/pi-hole/FTL/blob/master/parser.c#L674
Since strlen doesn't include the terminating zero, the size of part and partbuffer is to small.

@DL6ER
Copy link
Member

DL6ER commented May 15, 2017

Since strlen doesn't include the terminating zero, the size of part and partbuffer is to small.

Good catch! This is something valgrind should definitely have detected (writing outside of allocated memory). But I'll surely fix it. I released v2.7.2 just seconds ago, so that is not included yet.

@xorrizon
Copy link
Author

Well thats the point, maybe the memory after these variables actually is allocated to another local variable which gets overwritten. Which in turn might completely screw up the program flow.

@DL6ER
Copy link
Member

DL6ER commented May 15, 2017

Crafting up another point release. We are surely keeping the build integration system busy this morning.

maybe the memory after these variables actually is allocated to another local variable which gets overwritten.

Yeah, I tend to forget that this cannot be detected by valgrind...

@DL6ER
Copy link
Member

DL6ER commented May 15, 2017

@xorrizon Please update to FTL v2.7.3 for yet another try

@xorrizon
Copy link
Author

SUCCESS!
Crash is goooone and no errors in valgrind!

@DL6ER
Copy link
Member

DL6ER commented May 15, 2017

Thanks for helping to improve FTL!

@DL6ER DL6ER closed this as completed May 15, 2017
@xorrizon
Copy link
Author

And thank you for the quick work and for being part of an awesome project!

DL6ER pushed a commit that referenced this issue Oct 14, 2019
Write DNS settings and implement everything that is needed to do that
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants