Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restart FTL #12

Closed
Tntdruid opened this issue Mar 19, 2017 · 68 comments
Closed

Restart FTL #12

Tntdruid opened this issue Mar 19, 2017 · 68 comments

Comments

@Tntdruid
Copy link

There should be away to restart it via web ui like you can whit DNS, FTL is always offline after like 12h for me.

@Tntdruid
Copy link
Author

So far FLT goes offline after 8h uptime

@AzureMarker
Copy link
Contributor

Check the log to see if you can figure out why it's crashing.

@Tntdruid
Copy link
Author

Nothing about the crash in the pihole-FTL.log

@technicalpyro
Copy link
Contributor

also a debug log may be useful ... i have been running FTL since the start and havent seen this issue myself

@technicalpyro
Copy link
Contributor

technicalpyro commented Mar 20, 2017

I appear to have grown this issue as well

Debug e3ym4uxkya

EDIT manual start results in the following

pi@raspberrypi:/usr/bin $ ./pihole-FTL
FATAL: Opening of FTL log (/var/log/pihole-FTL.log) failed!
       Make sure it exists

and yes that log file is present maybe a permissions issue ?

@dschaper
Copy link
Member

Do you have the .PID or .PORT files?

@technicalpyro
Copy link
Contributor

ran pihole -r on my pi seems to have fixed will report back tonight as to whether it returns... on my VPS yes in /var/run the pid and port files are present however FTL doesnt appear to be running as my graphs are unresponsive

@dschaper
Copy link
Member

Thanks, there's a little debate that we are having with the init daemon. If we want one that will do a lot of the work to automatically restart on failure, then that means SystemD. But that also causes problems for old SysV inits and Docker... So trying to find the best solution to handling dead daemons. (Would be nice if it would just clean up it's PID when it dies, but that's not quite as easy to do.)

@dschaper
Copy link
Member

What OS and release versions are you both using?

@Tntdruid
Copy link
Author

raspbian jessie lite

@dschaper
Copy link
Member

Can you post the contents of your pihole-FTL.log, either here or on Pastebin?

[2017-03-20 21:17:12.691]  -> Blocked DNS queries: 0
[2017-03-20 21:17:12.691]  -> Unknown DNS queries: 0
[2017-03-20 21:17:12.691]  -> Unique domains: 13
[2017-03-20 21:17:12.691]  -> Unique clients: 1
[2017-03-20 21:17:12.691] Successfully accessed setupVars.conf
[2017-03-21 00:00:01.562] NOTICE: pihole.log has been flushed
[2017-03-21 00:00:01.562]   Resetting internal data structure
[2017-03-21 00:00:01.936] Gravity list entries: 104442
[2017-03-21 00:00:01.936] No blacklist present
[2017-03-21 00:00:01.936] No wildcard blocking list present

@dschaper
Copy link
Member

And @technicalpyro instead of calling the daemon directly, try sudo service pihole-FTL restart. All of the log file and PID/PORT assignments are handled by /etc/init.d/pihole-FTL now instead of directly being called by the daemon directly.

@technicalpyro
Copy link
Contributor

I am using raspbian Jessie on my pi and Debian x64 Jessie on the vps

If it dies again I didn't check this morning I will attempt the restart using the recommended command

@DL6ER
Copy link
Member

DL6ER commented Mar 21, 2017

I don't like the idea of being able to restart FTL from the web interface, since it should not crash to begin with! So you discovered a situation which we have never seen.

If there is nothing in the log, FTL has either been killed by the system (less likely) or it crashed (more likely). We have tested FTL with more than 10 mio. random DNS queries on Raspberry-like hardware, so it is quite unlikely that you ran out of available memory. However, please make sure that you have the most recent version using pihole -up.

I can provide detailed steps on how you can help me debugging FTL if you are willing to do this and if you are able to reproduce the crash - we have to start FTL in a debugger, so you have to be able to get it crash "reliably" for this to work.

@technicalpyro
Copy link
Contributor

mine has gone off again i am running back into work but post instructions ere and i will be more than happy to assist in the debug

@DL6ER
Copy link
Member

DL6ER commented Mar 21, 2017

As you will see in the following, we have to do some preparations first to ensure you have a proper debugging environment. Once you are used to it, you can skip most of the steps and debugging is actually quite easy and gives you insights into how software works.

  1. Install screen and gdb (probably sudo apt-get install screen gdb)
  2. Start a screen session (it will allow you to come back even if the SSH connection died)
    2.1. If you don't know about screen, then read about it (you will love it!)
    2.2. Continue afterwards :-)
  3. Start a screen session using screen
  4. Ensure that pihole-FTL is terminated (e.g. sudo killall pihole-FTL)
  5. Arrange file permissions to be able to start FTL as your current user (the following assumes you are logged in as user pi).
    5.1. sudo touch /var/log/pihole-FTL.log /run/pihole-FTL.pid /run/pihole-FTL.port
    5.2. sudo chown pi:pi /var/log/pihole-FTL.log /run/pihole-FTL.pid /run/pihole-FTL.port
    5.3. sudo chmod 0644 /var/log/pihole-FTL.log /run/pihole-FTL.pid /run/pihole-FTL.port
  6. Start pihole-FTL in the debugger: gdb pihole-FTL
  7. Type run debug to start FTL (you should see some lines of text and FTL should start successfully).
  8. You can now close the terminal (Ctrl+A and then D to detach) and come back later using (screen -r) when it has crashed
  9. If it has crashed, copy&past the terminal output, and
  10. type also backtrace and post the output

Hopefully, this gives enough information so we can do proper debugging.

@technicalpyro
Copy link
Contributor

technicalpyro commented Mar 21, 2017

i remember screen from ages ago when i ran a shell hosting company for irc bots ... ah the good old days. All commands completed successfully and i will report back when i have results

Thanks for the thorough explanation i love learning so this is awesome

UPDATE 0800 CST (UTC -6) no change yet but im off to work so will check again in approx 10 hours

@DL6ER
Copy link
Member

DL6ER commented Mar 23, 2017

Addendum: Yesterday evening I have identified a potential problem in the garbage collection thread. I will have a look at this as soon as I get home today.

@technicalpyro
Copy link
Contributor

i will update and see if it returns

@technicalpyro
Copy link
Contributor

installed update the forward destinations over time graph seems to be broken i am running the debug again so we will see if anything funky pops up there

console view of the web admin page has the following error

Uncaught TypeError: Cannot convert undefined or null to object
    at Function.keys (<anonymous>)
    at objectToArray (index.js:17)
    at Object.success (index.js:116)
    at j (jquery.min.js:2)
    at Object.fireWith [as resolveWith] (jquery.min.js:2)
    at x (jquery.min.js:4)
    at XMLHttpRequest.<anonymous> (jquery.min.js:4)

@DL6ER
Copy link
Member

DL6ER commented Mar 24, 2017

the forward destinations over time graph seems to be broken

We have seen a similar error before (not sure about the exact line numbers). This was on a system that was behaving oddly overall. After restarting the particular Pi, the problem was nor reproducible any longer.

Could you tell me if there is any output if you do

http://pi.hole/admin/api.php?overTimeDataForwards&getForwardDestinationNames

You don't have to post the whole thing here if there is meaningful output for both objects, like:

{
"over_time":
 {"1490306700":[13,40,13,13],
  "1490307300":[16,48,16,16],
  "1490307900":[18,100,18,44],
  "1490308500":[19,180,37,34],
  ...},
"forward_destinations":
 {"google-public-dns-b.google.com|2001:4860:4860::8844":2194,
  "google-public-dns-a.google.com|2001:4860:4860::8888":2077,
  ...}
}

Note that you have to be logged in to the Pi-hole dashboard with the browser you use for trying (if you have a password set).

@technicalpyro
Copy link
Contributor

technicalpyro commented Mar 25, 2017

{"over_time":null,"forward_destinations":[]}

will not reboot the device until all troubleshooting on this is done ... would rather it work for everyone than for just me

EDIT potentially worth noting the same is occurring with the same response from the API link on my VPS

@DL6ER
Copy link
Member

DL6ER commented Mar 25, 2017

Yeah, we definitely have to look for why that happens for you but we haven't seen it. Since you are still running FTL inside the debugger, things are much more easy now.

Stop FTL be pressing Ctrl+C, you will end up inside the debugger console, like:

^C
Program received signal SIGINT, Interrupt.
(gdb)

Please provide the output of the following commands:

  1. print counters
  2. print overTime[0]
  3. print overTime[counters.overTime-2]
  4. print overTime[counters.overTime-1]
  5. print overTime[counters.overTime-1].forwarddata[0]
  6. print overTime[counters.overTime-1].forwarddata[1]
  7. print forwarded[0]
  8. print forwarded[counters.forwarded-1]

You can resume FTL inside debugger afterwards using the command continue. Could you also try to restart FTL on the other device (your VPS) to see if that would solve the problem?

@technicalpyro
Copy link
Contributor

(gdb) print counters
$1 = {queries = 791, blocked = 201, wildcardblocked = 0, cached = 132, unknown = 0,
  forwarded = 2, clients = 4, domains = 151, queries_MAX = 10000, forwarded_MAX = 4,
  clients_MAX = 20, domains_MAX = 1000, gravity = 1116111, overtime = 54,
  IPv4 = 725, IPv6 = 66, PTR = 2, SRV = 0}
(gdb) print overTime[0]
$2 = {total = 11, blocked = 5}
(gdb) print overTime[counters.overTime-2]
There is no member named overTime.
(gdb) print overTime[counters.overTime-1].forwarddata[0]
There is no member named overTime.
(gdb) print overTime[counters.overTime-1].forwarddata[1]
There is no member named overTime.
(gdb) print forwarded[0]
$3 = {count = 452, ip = 0x54ac6a98 "208.67.220.220",
  name = 0x54b059b8 "resolver2.opendns.com"}
(gdb) print forwarded[counters.forwarded-1]
$4 = {count = 275, ip = 0x54b059f0 "208.67.222.222",
  name = 0x54b05a08 "resolver1.opendns.com"}

@technicalpyro
Copy link
Contributor

VPS restart seemed to fix it
http://imgur.com/gallery/7Srsk

@DL6ER
Copy link
Member

DL6ER commented Mar 25, 2017

Ah, okak, that perfectly explains anything:

There is no member named overTime.

The version of FTL you are running is too old and does not know about this data, so there is also nothing that could be returned through the API.

You have:

(gdb) print overTime[0]
$2 = {total = 11, blocked = 5}

In contrast, I have (up-to-date version of FTL):

(gdb) print overTime[0]
$2 = {timestamp = 1490310300, total = 0, blocked = 0, forwardnum = 4, forwarddata = 0x54acd928}

Note that an update cannot update the running process of FTL. Even if you have

pi@raspberrypi:~ $ pihole-FTL version
v1.6

an already running instance of FTL can be much much older.

@technicalpyro
Copy link
Contributor

technicalpyro commented Mar 25, 2017

OK thought i had killed everything befgore my last update but appears i didn't. Running Pihole -up and pihole -r to ensure im up to date here
did an update but thinking because i am no longer compiling locally files may be missing as
i get a return of no source file version.h

Definitely agree its a version issue now i am working out how to get my pi to do as i ask and run the latest

.  pi@raspberrypi:~ $ pihole-FTL version
v0.3.1

looks like its time for a fresh install

@DL6ER
Copy link
Member

DL6ER commented Mar 25, 2017

Maybe you have a version of pihole-FTL somewhere where it shouldn't be (and hence the updater will not update this one). Try

which pihole-FTL

and delete the file manually (sudo rm ...). You may have to do this several times in your case in case you have multiple versions of pihole-FTL around. If this is the case, please provide all locations which found. Run pihole -r afterwards to re-download pihole-FTL in the latest version.

@technicalpyro
Copy link
Contributor

pi@raspberrypi:~ $ which pihole-FTL
/usr/local/bin/pihole-FTL

i did find one in the older working dir from the start
/etc/.pihole-FTL/ may have been causing the conflict

i ended up completely reinstalling after running

pihole uninstall 
sudo rm -rf /etc/.pihole/
sudo rm -rf /var/www/html/admin/

with everything we have been doing it doesnt surprise me i managed to get a conflict

@DL6ER
Copy link
Member

DL6ER commented Mar 25, 2017

/usr/local/bin/pihole-FTL is then causing the problem as the updater is placing the updated executable in /usr/bin/pihole-FTL and the system will first look into the /usr/local/...` directories and will use the executable it finds there

Once you remove /usr/local/bin/pihole-FTL everything should be sorted out. Not sure what you did with your system now...

@technicalpyro
Copy link
Contributor

technicalpyro commented Apr 8, 2017

Please attempt the screen and see if the issue persists. I know it may or may not but that was the solution when this started happening to me although mine was never on a schedule so perhaps the nightly flush has something to do with it is that a cron job you added or just the default log end of day turnover ?

@WaLLy3K
Copy link

WaLLy3K commented Apr 8, 2017

Please attempt the screen and see if the issue persists.

Yep, I've got another session going and we'll see how that goes.

perhaps the nightly flush has something to do with it

As it's a feature of DietPi, it's the primary reason why I think this issue is occurring.

@technicalpyro
Copy link
Contributor

Ahhh ok I've never used dietpi

@DL6ER
Copy link
Member

DL6ER commented Apr 8, 2017

I never had a dead screen session despite that I use at least ten of them on various machines every day. Reading what is written on various webpages it seems like that this might happen if the session is forcefully disconnected, although - again - I have never seen that even when my connection (e.g. airport WiFi) was very bad and failed regularly.

Maybe you should try to detach explicitly this time (Ctrl+A and then D).

@WaLLy3K
Copy link

WaLLy3K commented Apr 8, 2017

Screen worked as expected (I detached explicitly every time, but who knows what silly thing I did to break it), and FTL doesn't appear to have crashed - but dashboard shows no data except for 4 queries today, and my count of domains being blocked.

http://pi.hole/admin/api.php?overTimeDataForwards&getForwardDestinationNames merely shows [] despite being logged in.

print counters
$1 = {queries = 4, invalidqueries = 0, blocked = 0, wildcardblocked = 0,
  cached = 4, unknown = 0, forwarded = 2, clients = 8, domains = 1,
  queries_MAX = 10000, forwarded_MAX = 0, clients_MAX = 0, domains_MAX = 1000,
  overTime_MAX = 100, gravity = 1656290, overTime = 1, IPv4 = 2, IPv6 = 2,
  PTR = 0, SRV = 0, wildcarddomains = 12, forwardedqueries = 0}

print overTime[0]
$2 = {timestamp = 1491660300, total = 4, blocked = 0, forwardnum = 0,
  forwarddata = 0x0, querytypedata = 0x753083f8}
  
print overTime[counters.overTime-2]
$3 = {timestamp = 0, total = 1668246575, blocked = 1769237601,
  forwardnum = 25965, forwarddata = 0x0, querytypedata = 0x96d}

print overTime[counters.overTime-1]
$4 = {timestamp = 1491660300, total = 4, blocked = 0, forwardnum = 0,
  forwarddata = 0x0, querytypedata = 0x753083f8}

print overTime[counters.overTime-1].forwarddata[0]
Cannot access memory at address 0x0

print overTime[counters.overTime-1].forwarddata[1]
Cannot access memory at address 0x4

print forwarded[0]
$5 = {count = 0, ip = 0x54abf068 "192.231.203.3",
  name = 0x54ac4450 "resolv2.internode.on.net"}

print forwarded[counters.forwarded-1]
$6 = {count = 0, ip = 0x54ac4470 "192.231.203.132",
  name = 0x54ac4488 "resolv.internode.on.net"}
  
backtrace
#0  0x76ea0964 in select () at ../sysdeps/unix/syscall-template.S:81
#1  0x54aadb40 in sleepms (milliseconds=<optimized out>) at daemon.c:149
#2  0x54aac4ea in main (argc=<optimized out>, argv=<optimized out>)
    at main.c:93

It's 12.30am, so back to sleep for me. Hopefully this helps you in the meantime!

@DL6ER
Copy link
Member

DL6ER commented Apr 8, 2017

I have Sat, 4.30pm, so let's take a look ;-)

What I see in your debugging information makes only little sense... Your observation is correct: FTL did only analyze four queries.

Please add the result of

ls -lh /var/log/pihole*

and upload the recent part of the FTL log to our Tricorder (you will get a token):

tail -n100 /var/log/pihole-FTL.log | nc tricorder.pi-hole.net 9999

@WaLLy3K
Copy link

WaLLy3K commented Apr 8, 2017

Phone buzzed, and I take ages to fall back to sleep, so let's do this:

FTL log is empty - I've never seen anything in it tbh, and it's likely that DietPi is flushing it hourly.

-rwxrwxr-x 1 root    root        0 Mar 14 23:00 /var/log/pihole_debug.log
-rwxrwxr-x 1 w3k     w3k         1 Apr  9 00:00 /var/log/pihole-FTL.log
-rwxrwxr-x 1 dnsmasq www-data 142K Apr  9 00:47 /var/log/pihole.log

@DL6ER
Copy link
Member

DL6ER commented Apr 8, 2017

That is really odd ... FTL does not start if it doesn't have full write access to its own log file. I agree that it is likely that DietPi somehow messes around with the file.

Anyhow, since you are running FTL in debug mode inside of gdb, you can see all messages there as well - but they might be buried unter the [New Thread ...] and [Thread ... exited] messages of gdb

@DL6ER
Copy link
Member

DL6ER commented Apr 8, 2017

I also see that there is no pihole.log.1. Does that sound right?

FTL v1.10.3 should be able to catch SIGUSR1 which leads to an instantaneous re-processing of the pihole.log.* files. Use

sudo killall -s SIGUSR1 pihole-FTL

If FTL is still running in gdb, you will have to enter continue.

@WaLLy3K
Copy link

WaLLy3K commented Apr 8, 2017

That's right, it doesn't get rotated because the primary Pi-hole log is cleared at midnight.

I've ran that command and continued within gdb, taking a screenshot of the output because I'm on my phone. Let me know if that output would be useful - but the dashboard still shows the same stats.

I could use the "full" logging option (rsyslog and logrotate) until a fix is applied by Fourdee, but I'll wait to see what you reckon the best course of action is.

@DL6ER
Copy link
Member

DL6ER commented Apr 8, 2017

Oh ... I didn't know about that:

DietPi will automatically clear log files in /var/log every hour (to free memory used by DietPi-Ramlog)

I'm not sure what happens to FTL's log file handle when that happens. Seems like there is a high chance that things can and will break here ... it seems like I'll have to take a look into re-opening the log file before writing.

Nevertheless, I cannot see how even the Pi-hole dashboard in master could work at all (how can it show more than an hour whenpihole.log is also flushed every hour?). FTL would be affected in the same way.
Please correct me if I'm mistaken.

And: Put your phone into "do-not-disturb-me" mode :-)

EDIT Switching to full is likely to solve all your problems at once, but I rather want to make sure that FTL will also work with the borked log system (cleared every hour/day/whatever)

@WaLLy3K
Copy link

WaLLy3K commented Apr 8, 2017

I'm certain the Pi-hole log is an exception to the hour rule. @Fourdee, could you confirm please?

@Fourdee
Copy link

Fourdee commented Apr 8, 2017

@DL6ER

FTL does not start if it doesn't have full write access to its own log file. I agree that it is likely that DietPi somehow messes around with the file.

DietPi-RAMlog preserves existing file permissions in /var/log:
https://github.com/Fourdee/DietPi/blob/master/dietpi/dietpi-ramlog#L93-L94

So unless permissions have been changed manually, its untouched by DietPi.

I'm not sure what happens to FTL's log file handle when that happens. Seems like there is a high chance that things can and will break here ... it seems like I'll have to take a look into re-opening the log file before writing.
but I rather want to make sure that FTL will also work with the borked log system (cleared every hour/day/whatever)

The system is not borked. Closing the file handle once write is completed, then reopening it when needed again would be the correct method.

open_FTL_log();
//write log data
close_FTL_log();

C++ implementation example, of opening a file in write mode, then closing it after completion:
https://github.com/Fourdee/Tr-Zero/blob/master/Tr-Zero_Sourcecode/Game_Replay.cpp#L93-L129

@WaLLy3K

I'm certain the Pi-hole log is an exception to the hour rule.

/var/log/pihole.log is, ftl log is not, i'll get this supported in DietPi.

@DL6ER
Copy link
Member

DL6ER commented Apr 8, 2017

@Fourdee I wanted to summon you only later, when I improved on the log writing, but you found your way here anyway - maybe even better :-)

@technicalpyro
Copy link
Contributor

technicalpyro commented Apr 8, 2017

@Tntdruid seeing as we have happened upon this second similar issue are you by any chance using DietPi?

EDIT just re-read the top portion of this one nevermind hope this info gives you so more places to look and resolve your issue

@Fourdee
Copy link

Fourdee commented Apr 8, 2017

@DL6ER

but you found your way here anyway

@Fourdee will do that 😄 (#12 (comment))

@DL6ER
Copy link
Member

DL6ER commented Apr 8, 2017

Fair enough, so it was @WaLLy3K that summoned you

@technicalpyro
Copy link
Contributor

would this affect the rolling log feature as well ?

@DL6ER
Copy link
Member

DL6ER commented Apr 8, 2017

Rolling log will not work without pihole.log.1, so there is nothing we would have to change here. Users have to be aware of limitations and without logrotate they cannot have it.

@pi-hole/dashboard We have to keep that in mind when bug reports about not seeing the rolling window are coming in...

@DL6ER
Copy link
Member

DL6ER commented Apr 9, 2017

FTL v2.0 has a (quite) fundamentally changed way of interacting with the log. This should stay completely unnoticed for users and should now seamlessly integrate with DietPi - FTL should now be capable of handling even situations in where the whole file system pihole.log is on disappears and re-appears

@nin9s
Copy link

nin9s commented Apr 10, 2017

moved to #17

@WaLLy3K
Copy link

WaLLy3K commented Apr 11, 2017

Looks like FTL has been running for two days now, so I'd say my issue is sorted! :)

ps -p "$(cat /run/pihole-FTL.pid)" -o etime= 2>/dev/null
2-00:30:57

@DL6ER
Copy link
Member

DL6ER commented Apr 12, 2017

@WaLLy3K although I'm not in favor of magically self-fixing errors ... They are just unpredictable

@DL6ER
Copy link
Member

DL6ER commented Apr 18, 2017

Closing due to inactivity. Please open a new issue when you encounter new/the same issue, since this one is already quite long and chaotic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants