Failed with result 'signal' , Restart=on-failure ? #52

Closed
AKA-THE-WIZ opened this issue Mar 30, 2019 · 7 comments

Comments

@AKA-THE-WIZ

I've been getting crashes of aqualinkd for the few weeks I've had it installed. It's running on a Pi Zero W with Raspbian Stretch Lite. Not too big of a deal, but I have to restart the service or reboot the Pi, which I can't do from the spa ;)

The system logs around each crash look like this:

...
Mar 28 18:33:54 raspberrypi systemd[1]: Starting Cleanup of Temporary Directories...
Mar 28 18:33:54 raspberrypi systemd[1]: Started Cleanup of Temporary Directories.
Mar 28 19:54:10 raspberrypi systemd[1]: aqualinkd.service: Main process exited, code=killed, status=11/SEGV
Mar 28 19:54:10 raspberrypi systemd[1]: aqualinkd.service: Unit entered failed state.
Mar 28 19:54:10 raspberrypi systemd[1]: aqualinkd.service: Failed with result 'signal'.

I've searched all my logs and can't find any bad checksums or bad packets like those in issue #30.

Do I need to enable DEBUG_SERIAL to find checksum errors?

I noticed that Restart=on-failure is commented out in aqualinkd.service. Is there a reason for this? Will I run into problems if I enable Restart=on-failure?

@AKA-THE-WIZ AKA-THE-WIZ changed the title signal Failed with result 'signal' and Restart=on-failure Mar 30, 2019
@AKA-THE-WIZ AKA-THE-WIZ changed the title Failed with result 'signal' and Restart=on-failure Failed with result 'signal' , Restart=on-failure ? Mar 30, 2019
@sfeakes
Owner

sfeakes commented Mar 30, 2019

You can uncomment Restart=on-failure, but that shouldn't be necessary (it's a band-aid). It looks like you may be running out of disk space, and that's what's causing the crash. I say that because of the system messages just before it.

Have you tried 'df' after the crash, just to see if anything is full?
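
For anyone who does want the band-aid while the crash is investigated, here is a minimal sketch of a systemd drop-in that turns on automatic restarts without editing the shipped unit file. The drop-in path and the RestartSec value are assumptions for illustration and do not come from the aqualinkd repository. Restart= and RestartSec= are standard systemd service options.

```ini
# Assumed location: /etc/systemd/system/aqualinkd.service.d/override.conf
[Service]
# Restart only after an unclean exit, e.g. a non-zero exit code
# or a fatal signal such as SIGSEGV.
Restart=on-failure
# Delay between restart attempts; 5 seconds is an illustrative value.
RestartSec=5
```

After creating the file, running "sudo systemctl daemon-reload" followed by "sudo systemctl restart aqualinkd" should pick up the change.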

@AKA-THE-WIZ
Author

Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/root        7562636 1074344   6148128  15% /
devtmpfs          217612       0    217612   0% /dev
tmpfs             221932       0    221932   0% /dev/shm
tmpfs             221932    3112    218820   2% /run
tmpfs               5120       4      5116   1% /run/lock
tmpfs             221932       0    221932   0% /sys/fs/cgroup
/dev/mmcblk0p1     44220   22541     21680  51% /boot
tmpfs              44384       0     44384   0% /run/user/1000

It doesn't look like it's running out of space to me, but this is basically the extent of my Linux knowledge.

It crashed last night while I was using it, but it restarted itself so quickly I didn't even notice until I checked the log this morning:

Mar 29 22:41:32 raspberrypi systemd[1]: Starting Daily apt download activities...
Mar 29 22:41:36 raspberrypi systemd[1]: Started Daily apt download activities.
Mar 29 22:41:36 raspberrypi systemd[1]: apt-daily.timer: Adding 3h 57min 18.588413s random time.
Mar 29 22:41:36 raspberrypi systemd[1]: apt-daily.timer: Adding 50min 7.886310s random time.
Mar 29 23:17:01 raspberrypi CRON[2127]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Mar 29 23:51:32 raspberrypi systemd[1]: aqualinkd.service: Main process exited, code=killed, status=11/SEGV
Mar 29 23:51:32 raspberrypi systemd[1]: aqualinkd.service: Unit entered failed state.
Mar 29 23:51:32 raspberrypi systemd[1]: aqualinkd.service: Failed with result 'signal'.
Mar 29 23:51:34 raspberrypi systemd[1]: aqualinkd.service: Service hold-off time over, scheduling restart.
Mar 29 23:51:34 raspberrypi systemd[1]: Stopped Aqualink RS daemon.
Mar 29 23:51:34 raspberrypi systemd[1]: Starting Aqualink RS daemon...
Mar 29 23:51:34 raspberrypi aqualinkd: Aqualink Daemon v1.2.5
Mar 29 23:51:34 raspberrypi aqualinkd: Config log_level         = 7
Mar 29 23:51:34 raspberrypi systemd[1]: Started Aqualink RS daemon.
Mar 29 23:51:34 raspberrypi aqualinkd: Config socket_port       = 80
Mar 29 23:51:34 raspberrypi aqualinkd: Config serial_port       = /dev/ttyUSB0
Mar 29 23:51:34 raspberrypi aqualinkd: Config web_directory     = /var/www/aqualinkd/
Mar 29 23:51:34 raspberrypi aqualinkd: Config device_id         = 0x0a
Mar 29 23:51:34 raspberrypi aqualinkd: Config override frz prot = NO
Mar 29 23:51:34 raspberrypi aqualinkd: Config mqtt_server       = (null)
Mar 29 23:51:34 raspberrypi aqualinkd: Config mqtt_dz_sub_topic = (null)
Mar 29 23:51:34 raspberrypi aqualinkd: Config mqtt_dz_pub_topic = (null)
Mar 29 23:51:34 raspberrypi aqualinkd: Config mqtt_aq_topic     = (null)
Mar 29 23:51:34 raspberrypi aqualinkd: Config mqtt_user         = (null)
Mar 29 23:51:34 raspberrypi aqualinkd: Config mqtt_passwd       = (null)
Mar 29 23:51:34 raspberrypi aqualinkd: Config mqtt_ID           = aqualinkd_b827eb0698
Mar 29 23:51:34 raspberrypi aqualinkd: Config idx water temp    = 0
Mar 29 23:51:34 raspberrypi aqualinkd: Config idx pool temp     = 0
Mar 29 23:51:34 raspberrypi aqualinkd: Config idx spa temp      = 0
Mar 29 23:51:34 raspberrypi aqualinkd: Config idx SWG Percent   = 0
Mar 29 23:51:34 raspberrypi aqualinkd: Config idx SWG PPM       = 0
Mar 29 23:51:34 raspberrypi aqualinkd: Config PDA Mode          = NO
Mar 29 23:51:34 raspberrypi aqualinkd: Config deamonize         = YES
Mar 29 23:51:34 raspberrypi aqualinkd: Config log_file          = (null)
Mar 29 23:51:34 raspberrypi aqualinkd: Config light_pgm_mode    = 0.00
Mar 29 23:51:34 raspberrypi aqualinkd: Config BTN Filter_Pump   = label Filter Pump     | PDAlabel FILTER PUMP     | dzidx 0
Mar 29 23:51:34 raspberrypi aqualinkd: Config BTN Spa_Mode      = label Spa Mode        | PDAlabel SPA             | dzidx 0
Mar 29 23:51:34 raspberrypi aqualinkd: Config BTN Aux_1         = label Low Speed       | PDAlabel AUX1            | dzidx 0
Mar 29 23:51:34 raspberrypi aqualinkd: Config BTN Aux_2         = label Yard Light      | PDAlabel AUX2            | dzidx 0
Mar 29 23:51:34 raspberrypi aqualinkd: Config BTN Aux_3         = label Pool Lights     | PDAlabel AUX3            | dzidx 0
Mar 29 23:51:34 raspberrypi aqualinkd: Config BTN Aux_4         = label NONE            | PDAlabel AUX4            | dzidx 0
Mar 29 23:51:34 raspberrypi aqualinkd: Config BTN Aux_5         = label NONE            | PDAlabel AUX5            | dzidx 0
Mar 29 23:51:34 raspberrypi aqualinkd: Config BTN Aux_6         = label NONE            | PDAlabel AUX6            | dzidx 0
Mar 29 23:51:34 raspberrypi aqualinkd: Config BTN Aux_7         = label NONE            | PDAlabel AUX7            | dzidx 0
Mar 29 23:51:34 raspberrypi aqualinkd: Config BTN Pool_Heater   = label Pool Heater     | PDAlabel POOL HEAT       | dzidx 0
Mar 29 23:51:34 raspberrypi aqualinkd: Config BTN Spa_Heater    = label Spa Heater      | PDAlabel SPA HEAT        | dzidx 0
Mar 29 23:51:34 raspberrypi aqualinkd: Config BTN Solar_Heater  = label Solar Heater    | PDAlabel EXTRA AUX       | dzidx 0
Mar 29 23:51:34 raspberrypi aqualinkd: Starting web server on port 80
Mar 29 23:51:34 raspberrypi aqualinkd: Starting MQTT client to (null)
Mar 29 23:51:34 raspberrypi aqualinkd: Listening to Aqualink RS8 on serial port: /dev/ttyUSB0
Mar 29 23:51:37 raspberrypi aqualinkd: Getting control panel information
Mar 29 23:51:44 raspberrypi aqualinkd: Control Panel 8156 REV MM
Mar 29 23:51:57 raspberrypi aqualinkd: Getting freeze protection setpoints
Mar 29 23:52:05 raspberrypi aqualinkd: Getting pool & spa heat setpoints from aqualink

Just as before, nothing is logged for 20 minutes or more before it fails. Having it restart on its own fixes the practical usability, but you are right about it being a band-aid.

@sfeakes
Owner

sfeakes commented Mar 31, 2019

It doesn't look like disk space. Can you set logging to debug, and post the information from the log file just before it crashes? Hopefully that will give me a better understanding of why.

@ballle98
Contributor

ballle98 commented Apr 4, 2019

You can follow these instructions to enable core files: https://pve.proxmox.com/wiki/Enable_Core_Dump_systemd

Rebuild the code with debug symbols (you don't have to install the debug version):
make DBG="-g -O0"

When it crashes, run "sudo -i" to become root, then run gdb and dump the backtrace with the bt command:
cd /var/lib/coredumps/
ls
gdb /home/pi/git/AqualinkD/release/aqualinkd core-aqualinkd-sig11-user0-group0-pid386-time1554347133
bt

Note that this is an example where I forced it to crash, not a real issue:

root@raspberrypi:/var/lib/coredumps# gdb /home/pi/git/AqualinkD/release/aqualinkd core-aqualinkd-sig11-user0-group0-pid386-time1554347133
GNU gdb (Raspbian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/pi/git/AqualinkD/release/aqualinkd...done.

warning: exec file is newer than core file.
[New LWP 386]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
bt
Core was generated by `/usr/local/bin/aqualinkd -c /etc/aqualinkd.conf'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0xb6f7ad64 in nanosleep () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0xb6f7ad64 in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x000168c4 in delay (howLong=10) at utils.c:457
#2  0x0001599c in main_loop () at aqualinkd.c:1301
#3  0x0001666c in daemonise (pidFile=0xbea30b9c "/run/aqualinkd.pid",
    main_function=0x153f4 <main_loop>) at utils.c:402
#4  0x00014c94 in main (argc=3, argv=0xbea30e04) at aqualinkd.c:1006
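
As a complement to the system-wide setup in the linked instructions, a process can also raise its own core-file size limit at startup. The sketch below is a different technique from the one described above and is not part of aqualinkd. The function name enable_core_dumps is hypothetical. It only raises the soft limit up to the existing hard limit, so it has no effect if the hard limit is 0, and where the core file is written still depends on the kernel's core_pattern setting.

```c
#include <sys/resource.h>

/* Hypothetical helper, not from the aqualinkd sources.
 * Raises the soft core-file size limit to the current hard limit.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
int enable_core_dumps(void) {
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        return -1;
    }
    rl.rlim_cur = rl.rlim_max;  /* soft limit may be raised up to the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling it once near the top of main(), before daemonising, would be the natural place under these assumptions.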

@ballle98
Contributor

ballle98 commented Apr 4, 2019

BTW, it says it failed with signal SEGV (segmentation violation), which usually means it tried to access memory it isn't allowed to touch, most often by dereferencing a pointer that was NULL, uninitialized, or otherwise invalid.
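
To make the link between an invalid pointer and the "status=11/SEGV" line in the systemd log concrete, here is a small standalone sketch. It is not aqualinkd code, and the function name is hypothetical. A child process writes through a NULL pointer, and the parent reads back the signal that killed it, much as systemd does for a service's main process.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper: fork a child that writes through a NULL
 * pointer and return the number of the signal that terminated it,
 * or -1 if fork/waitpid failed or the child was not killed by a signal. */
int crash_child_and_get_signal(void) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* volatile keeps the compiler from optimising the store away */
        volatile int *p = NULL;
        *p = 42;      /* invalid write: the kernel delivers SIGSEGV */
        _exit(0);     /* not reached */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

On Linux the returned value is expected to equal SIGSEGV, which is signal number 11, matching the number in the log line.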

@ballle98
Contributor

ballle98 commented Apr 9, 2019

I'm running a different version of aqualinkd, but I see crashes every couple of days. I captured a core, and there is a callback function pointer with an invalid value. Perhaps this is fixed in a newer version of mongoose, or a data structure that is not initialized is being passed in.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x353a3230 in ?? ()
(gdb) bt
#0  0x353a3230 in ?? ()
#1  0x000306c4 in mg_mgr_handle_ctl_sock (mgr=0xbed00b18) at mongoose.c:3763
#2  0x00030dc0 in mg_socket_if_poll (iface=0x1a4230, timeout_ms=0)
    at mongoose.c:3902
#3  0x0002d644 in mg_mgr_poll (m=0xbed00b18, timeout_ms=0) at mongoose.c:2424
#4  0x00015934 in main_loop () at aqualinkd.c:1286
#5  0x0001666c in daemonise (pidFile=0xbed00b9c "/run/aqualinkd.pid",
    main_function=0x153f4 <main_loop>) at utils.c:402
#6  0x00014c94 in main (argc=3, argv=0xbed00e04) at aqualinkd.c:1006

(gdb) up
#1  0x000306c4 in mg_mgr_handle_ctl_sock (mgr=0xbed00b18) at mongoose.c:3763
3763          ctl_msg.callback(nc, MG_EV_POLL,
(gdb) list
3758      DBG(("read %d from ctl socket", len));
3759      (void) dummy; /* https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25509 */
3760      if (len >= (int) sizeof(ctl_msg.callback) && ctl_msg.callback != NULL) {
3761        struct mg_connection *nc;
3762        for (nc = mg_next(mgr, NULL); nc != NULL; nc = mg_next(mgr, nc)) {
3763          ctl_msg.callback(nc, MG_EV_POLL,
3764                           ctl_msg.message MG_UD_ARG(nc->user_data));
3765        }
3766      }
3767    }
(gdb) p ctl_msg.callback
$1 = (mg_event_handler_t) 0x353a3232
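
One quick check when a pointer holds a value like this is to read its bytes as text, since every byte of 0x353a3232 falls in the printable ASCII range. The sketch below assumes a little-endian target, which matches the arm-linux-gnueabihf gdb shown earlier in the thread. The helper name le32_to_ascii is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the four bytes of `value`, lowest byte first
 * (the order they sit in memory on a little-endian CPU), into `out` as
 * text. Non-printable bytes are shown as '.'. `out` must hold 5 chars. */
void le32_to_ascii(uint32_t value, char out[5]) {
    for (size_t i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)((value >> (8 * i)) & 0xffu);
        out[i] = isprint(b) ? (char)b : '.';
    }
    out[4] = '\0';
}
```

For 0x353a3232 this gives the characters "22:5". That might hint that string data was written over ctl_msg, but this is only a guess and nothing in the thread confirms it.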

@AKA-THE-WIZ
Author

So I have been monitoring my logs, and it looks like I haven't had any crashes for a few weeks.

I did a complete reinstall on a freshly flashed card a while back after I screwed up some network settings. So maybe that fixed it somehow, even though I had tried that before like twice and the problem persisted.

I am going to close this issue for now, but I will post again if anything comes up. Thanks for your help.
