Definitive guide on how to setup upsmon to shutdown computer #2444

Closed
dacarson opened this issue May 14, 2024 · 8 comments
Labels: documentation; Linux (Some issues are specific to Linux as a platform); question; Shutdowns and overrides and battery level triggers (Issues and PRs about system shutdown, especially if battery charge/runtime remaining is involved); systemd

Comments

@dacarson

Hi All,
I have been having great fun writing a full NUT driver for a UPSPlus/EP-0136. This device is a HAT for Raspberry Pi, and communicates via i2c. My fork is here. I will see if I can submit it at some point.

The reason I am filing this issue, though, is that I cannot work out how to get upsmon to reliably shut down the RPi when the UPS reaches LB (low battery).

I followed the NUT instructions in section 6.3, "Configuring automatic shutdowns for low battery events".

I also searched for some of the errors I am seeing and found this thread on upsmon PID issues, though I am unable to work out (a) whether this is the problem or (b) how to work around it. I do sometimes see the upsmon.pid file appear, and it is owned by root:root, unlike all the other PID files (see below). But it doesn't appear all the time.

Right now, it doesn't seem to want to become 'active'.

nut.conf:

# ...Example text ...

MODE=netserver

upsd.conf (nothing configured, everything default)

# ...Example text ...

ups.conf

# ...Example text ...
maxretry = 3

[hotwater]
    driver = usbhid-ups
    port = auto
    desc = "Hotwater_UPS"

[pi]
    driver = upsplus
    port = /dev/i2c-1
    desc = "RaspberryPi_UPSPlus"

upsd.users

[upsmon]
	password  = pass
	upsmon master
# or
#	upsmon slave
[admin]
	password = mypass
	actions = SET
	instcmds = ALL

upsmon.conf

# ...Example text ...
RUN_AS_USER nut
MONITOR pi@localhost 1 upsmon pass master
MINSUPPLIES 1
SHUTDOWNCMD "/sbin/shutdown -h +0"
POLLFREQ 5
POLLFREQALERT 5
HOSTSYNC 15
DEADTIME 15
POWERDOWNFLAG /etc/killpower
NOTIFYMSG ONLINE        "UPS %s on line power"
NOTIFYMSG ONBATT        "UPS %s on battery"
NOTIFYMSG LOWBATT       "UPS %s battery is low"
NOTIFYMSG FSD           "UPS %s: forced shutdown in progress"
NOTIFYMSG COMMOK        "Communications with UPS %s established"
NOTIFYMSG COMMBAD       "Communications with UPS %s lost"
NOTIFYMSG SHUTDOWN      "Auto logout and shutdown proceeding"
NOTIFYMSG REPLBATT      "UPS %s battery needs to be replaced"
NOTIFYMSG NOCOMM        "UPS %s is unavailable"
NOTIFYMSG NOPARENT      "upsmon parent process died - shutdown impossible"
NOTIFYFLAG ONLINE       SYSLOG+WALL
NOTIFYFLAG ONBATT       SYSLOG+WALL
NOTIFYFLAG LOWBATT      SYSLOG+WALL
NOTIFYFLAG FSD  SYSLOG+WALL
NOTIFYFLAG COMMOK       SYSLOG+WALL
NOTIFYFLAG COMMBAD      SYSLOG+WALL
NOTIFYFLAG SHUTDOWN     SYSLOG+WALL
NOTIFYFLAG REPLBATT     SYSLOG+WALL
NOTIFYFLAG NOCOMM       SYSLOG+WALL
NOTIFYFLAG NOPARENT     SYSLOG+WALL
RBWARNTIME 43200
NOCOMMWARNTIME 300
FINALDELAY 5

nut-server status:

$ sudo systemctl status nut-server
* nut-server.service - Network UPS Tools - power devices information server
     Loaded: loaded (/lib/systemd/system/nut-server.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2024-05-14 09:26:28 PDT; 49min ago
    Process: 642 ExecStart=/sbin/upsd (code=exited, status=0/SUCCESS)
   Main PID: 654 (upsd)
      Tasks: 1 (limit: 3952)
        CPU: 1min 44.533s
     CGroup: /system.slice/nut-server.service
             `-654 /lib/nut/upsd

May 14 09:35:20 missionpi upsd[654]: Set variable: admin@::1 set driver.debug on pi to 3
May 14 09:47:48 missionpi upsd[654]: Data for UPS [pi] is stale - check driver
May 14 09:47:56 missionpi upsd[654]: UPS [pi] data is no longer stale
May 14 10:15:43 missionpi upsd[654]: Set variable: admin@::1 set driver.debug on pi to 1
May 14 10:15:43 missionpi upsd[654]: Data for UPS [pi] is stale - check driver
May 14 10:15:43 missionpi upsd[654]: UPS [pi] data is no longer stale
May 14 10:15:52 missionpi upsd[654]: Can't connect to UPS [hotwater] (usbhid-ups-hotwater): No such file or directory
May 14 10:15:52 missionpi upsd[654]: Can't connect to UPS [pi] (upsplus-pi): No such file or directory
May 14 10:15:54 missionpi upsd[654]: Connected to UPS [hotwater]: usbhid-ups-hotwater
May 14 10:15:54 missionpi upsd[654]: Connected to UPS [pi]: upsplus-pi

nut-driver status:

$ sudo systemctl status nut-driver
* nut-driver.service - Network UPS Tools - power device driver controller
     Loaded: loaded (/lib/systemd/system/nut-driver.service; static)
     Active: active (running) since Tue 2024-05-14 10:21:50 PDT; 9s ago
    Process: 4035 ExecStart=/sbin/upsdrvctl start (code=exited, status=0/SUCCESS)
      Tasks: 2 (limit: 3952)
        CPU: 528ms
     CGroup: /system.slice/nut-driver.service
             |-4037 /lib/nut/usbhid-ups -a hotwater
             `-4039 /lib/nut/upsplus -a pi

May 14 10:21:50 missionpi upsdrvctl[4036]: Using subdriver: CyberPower HID 0.4
May 14 10:21:50 missionpi upsdrvctl[4036]: Network UPS Tools - Generic HID driver 0.41 (2.7.4)
May 14 10:21:50 missionpi upsdrvctl[4036]: USB communication driver 0.33
May 14 10:21:50 missionpi usbhid-ups[4037]: Startup successful
May 14 10:21:50 missionpi upsdrvctl[4038]: Network UPS Tools - UPSPlus driver 1.0 (2.8.2-128-g6fdabe059)
May 14 10:21:50 missionpi upsdrvctl[4035]: Network UPS Tools - UPS driver controller 2.7.4
May 14 10:21:50 missionpi upsplus[4039]: Startup successful
May 14 10:21:50 missionpi systemd[1]: Started Network UPS Tools - power device driver controller.
May 14 10:21:50 missionpi upsplus[4039]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
May 14 10:21:51 missionpi upsplus[4039]: sock_connect: enabling asynchronous mode (auto)

nut-monitor status:

$ sudo systemctl status nut-monitor
* nut-monitor.service - Network UPS Tools - power device monitor and shutdown controller
     Loaded: loaded (/lib/systemd/system/nut-monitor.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Tue 2024-05-14 10:20:04 PDT; 2min 54s ago
    Process: 3952 ExecStart=/sbin/upsmon (code=exited, status=0/SUCCESS)
   Main PID: 3952 (code=exited, status=0/SUCCESS)
        CPU: 9ms

May 14 10:20:04 missionpi systemd[1]: Started Network UPS Tools - power device monitor and shutdown controller.
May 14 10:20:04 missionpi upsmon[3952]: fopen /run/nut/upsmon.pid: No such file or directory
May 14 10:20:04 missionpi upsmon[3952]: UPS: pi@localhost (master) (power value 1)
May 14 10:20:04 missionpi upsmon[3952]: Using power down flag file /etc/killpower
May 14 10:20:04 missionpi upsmon[3952]: '/etc/killpower' exists, but we can't read from it: No such file or directory
May 14 10:20:04 missionpi upsmon[3952]: POWERDOWNFLAG (/etc/killpower) does not containthe upsmon magic string - disabling!
May 14 10:20:04 missionpi upsmon[3953]: Startup successful
May 14 10:20:04 missionpi systemd[1]: nut-monitor.service: Succeeded.

PID files:

$ sudo ls -al /run/nut
total 12
drwxrwx---  2 root nut  140 May 14 10:15 .
drwxr-xr-x 33 root root 940 May 14 09:27 ..
-rw-r--r--  1 nut  nut    4 May 14 09:26 upsd.pid
srw-rw----  1 nut  nut    0 May 14 10:15 upsplus-pi
-rw-r--r--  1 nut  nut    5 May 14 10:15 upsplus-pi.pid
srw-rw----  1 nut  nut    0 May 14 10:15 usbhid-ups-hotwater
-rw-r--r--  1 nut  nut    5 May 14 10:15 usbhid-ups-hotwater.pid

upsc pi:

$ upsc pi 
Init SSL without certificate database
battery.capacity: 3.500000
battery.charge: 54
battery.charge.low: 80
battery.charger.status: discharging
battery.current: -2.229
battery.packs: 2
battery.runtime: 6374
battery.temperature: 38
battery.type: Li-ion
battery.voltage: 3.900
battery.voltage.high: 4.200
battery.voltage.low: 3.360
battery.voltage.nominal: 4.200
device.mfr: UPSPlus HAT
device.model: EP-0136
device.serial: 00298017-42575312-20323332
device.type: ups
device.uptime: 3296
driver.debug: 0
driver.flag.allow_killpower: 0
driver.name: upsplus
driver.parameter.pollinterval: 2
driver.parameter.port: /dev/i2c-1
driver.parameter.synchronous: auto
driver.state: updateinfo
driver.version: 2.8.2-128-g6fdabe059
driver.version.internal: 1.0
input.voltage: 0.0
output.current: 1.088
output.current.nominal: 4.5
output.voltage: 4.948
output.voltage.nominal: 5.0
ups.firmware: 10
ups.load: 26.179
ups.mfr: UPSPlus HAT
ups.model: EP-0136
ups.power: 5.890
ups.power.nominal: 22.500000
ups.realpower: 5.890
ups.realpower.nominal: 22.500000
ups.start.auto: yes
ups.status: OB LB DISCHRG
ups.timer.reboot: 0
ups.timer.shutdown: 0
ups.type: ups
@jimklimov
Member

Hi, sounds great!

One thing that caught my attention first is the single nut-driver.service - I don't think NUT currently ships one (2.7.4 did). Instead, there would be a nut-driver@.service template and instances generated by NDE (https://github.com/networkupstools/nut/wiki/nut%E2%80%90driver%E2%80%90enumerator-(NDE)) separately for each driver - so a failure of one does not cause restart of everyone (and their run-time dependencies can differ, e.g. an SNMP UPS driver needs networking while USB/serial/i2c does not and can start ASAP).

I see that your driver is based on recent NUT upstream codebase - it can be helpful to update the rest of the running services to use your build, if only to get the more advanced debugging and other features. I hope there are no fatal incompatibilities (ABI or protocol wise) between older/newer drivers and servers and clients - those would be unfortunate and unplanned - but better safe than sorry in this regard too. Especially with older upsd and newer drivers - they could use a newer iteration of Unix socket protocol that the server would not accept and at best ignore; this coupling is treated as more intimate and might be less "protected" than the networked protocol intended to talk to unspecified third-party clients.

I am not immediately sure what to make of this part:

May 14 10:20:04 missionpi upsmon[3952]: Using power down flag file /etc/killpower
May 14 10:20:04 missionpi upsmon[3952]: '/etc/killpower' exists, but we can't read from it: No such file or directory
May 14 10:20:04 missionpi upsmon[3952]: POWERDOWNFLAG (/etc/killpower) does not containthe upsmon magic string - disabling!

It seems like the file exists but at the same time "No such file or directory". I wonder if there is a confusing filesystem object (e.g. a symlink pointing nowhere, so there is a directory entry but indeed nothing to read from) or some bug in 2.7.4 about this?

Also not sure about the "disabling" part. It would make sense for upsmon -K (a separate program call checking late in shutdown that the daemonized copy of upsmon saved the killpower file upon an FSD) to ignore an invalid file; however the daemon startup should have at best removed the file and marched on (IIRC). So I've re-read the code, and the message with "disabling" (and a missing space) is in fact from clear_pdflag() which should have removed it but could not in this case - so the daemon treats the flag as not configured (assumes the filename is occupied by something unrelated to NUT so we should not corrupt/remove that file) and probably leads to your other issues with shutting down.
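To narrow down which of these cases applies, a small shell check along the following lines could help. It is only a sketch: `pdflag_state` is a hypothetical helper name, and it does not look for upsmon's magic string, only at what kind of filesystem object sits at the configured path.

```shell
#!/bin/sh
# Hypothetical helper: classify whatever sits at a POWERDOWNFLAG path.
# Prints one of: dangling-symlink, missing, empty, non-empty
pdflag_state() {
    path="$1"
    if [ -L "$path" ] && [ ! -e "$path" ]; then
        # A directory entry exists, but the symlink target does not
        echo "dangling-symlink"
    elif [ ! -e "$path" ]; then
        echo "missing"
    elif [ ! -s "$path" ]; then
        # Present but zero bytes long, so it cannot hold any magic string
        echo "empty"
    else
        echo "non-empty"
    fi
}
```

For example, `pdflag_state /etc/killpower` would print `empty` for a zero-length file that was created with `touch`.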

Normally, upsmon is started by root and depending on settings it either stays root, or by default splits into two daemons (a bit under root and most of work happening under nut or nobody). The root-owned part is responsible for touch-file creation in case of FSD and calling the SHUTDOWNCMD, and exits (and/or is killed by OS shutdown processing that goes on a killing spree for remaining processes). For systems with late shutdown handling, either with old-style init scripts managing the whole OS life cycle until power-off, or with systemd shutdown hook support, there is a chance to run custom code after that killing spree. The nutshutdown script can be used (or adapted in non-Linux OSes) to check if the killpower flag exists, and run the driver program to tell the UPS to cut the power (if it supports such operation). Typically such script then sleeps for an hour and reboots, to work around UPSes that would not power off when the wall power has returned at the wrong time (during the shutdown), so your servers would not stay halted indefinitely.

@jimklimov
Member

By the way, the thread you've mentioned leads to some other investigations and write-ups about this, notably https://github.com/networkupstools/nut/wiki/Technicalities:-Work-with-PID-and-state-file-paths (noting that e.g. for upsmon the PID file is that of child process, if they split ways and are not a monoprocess; if the child is killed the parent should exit too).

@jimklimov jimklimov added question documentation systemd Linux Some issues are specific to Linux as a platform Shutdowns and overrides and battery level triggers Issues and PRs about system shutdown, especially if battery charge/runtime remaining is involved labels May 14, 2024
jimklimov added a commit that referenced this issue May 14, 2024
…a non-NUT killpower file [#2444]

Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
@dacarson
Author

dacarson commented May 14, 2024

> Hi, sounds great!

Thanks for your prompt and detailed response.

> One thing that caught my attention first is the single nut-driver.service - I don't think NUT currently ships one (2.7.4 did). Instead, there would be a nut-driver@.service template and instances generated by NDE (https://github.com/networkupstools/nut/wiki/nut%E2%80%90driver%E2%80%90enumerator-(NDE)) separately for each driver - so a failure of one does not cause restart of everyone (and their run-time dependencies can differ, e.g. an SNMP UPS driver needs networking while USB/serial/i2c does not and can start ASAP).

I have seen the newer nut-driver@.service services. It makes sense to separate them. It looks like I do have an older version installed:

$ sudo apt show nut
Package: nut
Version: 2.7.4-13
Priority: optional
Section: metapackages
Maintainer: Laurent Bigonville <bigon@debian.org>
Installed-Size: 276 kB
Depends: nut-client, nut-server
Homepage: https://networkupstools.org/
Tag: admin::monitoring, hardware::power, hardware::power:ups,
 interface::daemon, network::server, role::program, scope::utility
Download-Size: 247 kB
APT-Manual-Installed: yes
APT-Sources: http://deb.debian.org/debian bullseye/main arm64 Packages
Description: network UPS tools - metapackage
 Network UPS Tools (NUT) is a client/server monitoring system that
 allows computers to share uninterruptible power supply (UPS) and
 power distribution unit (PDU) hardware. Clients access the hardware
 through the server, and are notified whenever the power status
 changes.
 .
 This package is a metapackage that installs both nut-server and nut-client,
 in most cases it is sufficient for a basic UPS monitoring system.

Though it says that it is the newest version:

$ sudo apt-get -s install nut
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
nut is already the newest version (2.7.4-13).
0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.

> I see that your driver is based on recent NUT upstream codebase - it can be helpful to update the rest of the running services to use your build, if only to get the more advanced debugging and other features. I hope there are no fatal incompatibilities (ABI or protocol wise) between older/newer drivers and servers and clients - those would be unfortunate and unplanned - but better safe than sorry in this regard too.

Until this point, my newer driver has been running fine. I monitor the data from it on another RPi running grafana. I do occasionally see this from nut-driver service:

May 14 10:21:50 missionpi upsplus[4039]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
May 14 10:21:51 missionpi upsplus[4039]: sock_connect: enabling asynchronous mode (auto)
May 14 10:24:45 missionpi upsplus[4039]: WARNING: send_to_all: write 32 bytes to socket 6 failed (ret=-1), disconnecting: Broken pipe
May 14 10:24:45 missionpi upsplus[4039]: sock_connect: enabling asynchronous mode (auto)

I've tried to understand this issue, but it seems others have just ignored it.

> Especially with older upsd and newer drivers - they could use a newer iteration of Unix socket protocol that the server would not accept and at best ignore; this coupling is treated as more intimate and might be less "protected" than the networked protocol intended to talk to unspecified third-party clients.

Are there instructions somewhere for where I can find a newer release?

Presently I do have a fork of main for development, as I do hope to submit my driver back.

> I am not immediately sure what to make of this part:

> May 14 10:20:04 missionpi upsmon[3952]: Using power down flag file /etc/killpower
> May 14 10:20:04 missionpi upsmon[3952]: '/etc/killpower' exists, but we can't read from it: No such file or directory
> May 14 10:20:04 missionpi upsmon[3952]: POWERDOWNFLAG (/etc/killpower) does not containthe upsmon magic string - disabling!

> It seems like the file exists but at the same time "No such file or directory". I wonder if there is a confusing filesystem object (e.g. a symlink pointing nowhere, so there is a directory entry but indeed nothing to read from) or some bug in 2.7.4 about this?

I am confused by this too. The file is there. I just assumed that something (magic string?) needs to be inside it:

$ ls -al /etc/killpower 
-rw-r--r-- 1 root root 0 Sep 28  2023 /etc/killpower

> Also not sure about the "disabling" part. It would make sense for upsmon -K (a separate program call checking late in shutdown that the daemonized copy of upsmon saved the killpower file upon an FSD) to ignore an invalid file; however the daemon startup should have at best removed the file and marched on (IIRC).

I haven't tried testing with upsmon -K, though I have been using upsdrvctl -t shutdown, and then the driver command line it prints, /lib/nut/upsplus -a pi -k, to make sure that the UPS does actually shut down (which it does :-) ):

$ sudo upsdrvctl -t shutdown 
Network UPS Tools - UPS driver controller 2.7.4
*** Testing mode: not calling exec/kill
   0.000000	
If you're not a NUT core developer, chances are that you're told to enable debugging
to see why a driver isn't working for you. We're sorry for the confusion, but this is
the 'upsdrvctl' wrapper, not the driver you're interested in.

Below you'll find one or more lines starting with 'exec:' followed by an absolute
path to the driver binary and some command line option. This is what the driver
starts and you need to copy and paste that line and append the debug flags to that
line (less the 'exec:' prefix).

   0.000666	Shutdown UPS: hotwater
   0.000785	exec:  /lib/nut/usbhid-ups -a hotwater -k
   0.000872	Shutdown UPS: pi
   0.000929	exec:  /lib/nut/upsplus -a pi -k

> So I've re-read the code, and the message with "disabling" (and a missing space) is in fact from clear_pdflag() which should have removed it but could not in this case - so the daemon treats the flag as not configured (assumes the filename is occupied by something unrelated to NUT so we should not corrupt/remove that file) and probably leads to your other issues with shutting down.

Ah! It sounds like the file shouldn't be there. So I could try just deleting the killpower file and then see what happens...

> Normally, upsmon is started by root and depending on settings it either stays root, or by default splits into two daemons (a bit under root and most of work happening under nut or nobody). The root-owned part is responsible for touch-file creation in case of FSD and calling the SHUTDOWNCMD, and exits (and/or is killed by OS shutdown processing that goes on a killing spree for remaining processes). For systems with late shutdown handling, either with old-style init scripts managing the whole OS life cycle until power-off, or with systemd shutdown hook support, there is a chance to run custom code after that killing spree.

This did magically happen once, but I don't know why and haven't been able to see it happen again. Here is a sample from syslog of the one time it worked:

May 14 09:06:26 missionpi upsmon[34621]: Communications with UPS pi@localhost lost
May 14 09:06:31 missionpi upsmon[34621]: Communications with UPS pi@localhost established
May 14 09:06:31 missionpi upsmon[34621]: UPS pi@localhost battery is low
May 14 09:06:31 missionpi upsd[34541]: Client upsmon@::1 set FSD on UPS [pi]
May 14 09:06:31 missionpi upsmon[34621]: Executing automatic power-fail shutdown
May 14 09:06:31 missionpi upsmon[34621]: Auto logout and shutdown proceeding
May 14 09:06:36 missionpi systemd[1]: nut-monitor.service: Succeeded.
May 14 09:06:36 missionpi systemd[1]: unattended-upgrades.service: Succeeded.
May 14 09:06:36 missionpi systemd[1]: Stopping Session 1 of user pi.
May 14 09:06:36 missionpi systemd[1]: Stopping Session 3 of user pi.
May 14 09:06:36 missionpi systemd[1]: Stopping Session 4 of user pi.
May 14 09:06:36 missionpi systemd[1]: Removed slice system-modprobe.slice.
...
May 14 09:06:36 missionpi systemd[1]: Stopping Make remote CUPS printers available locally...
May 14 09:06:36 missionpi systemd[1]: Stopping dphys-swapfile - set up, mount/unmount, and delete a swap file...
May 14 09:06:36 missionpi systemd[1]: Stopping Getty on tty1...
May 14 09:06:36 missionpi systemd[1]: glamor-test.service: Succeeded.
May 14 09:06:36 missionpi systemd[1]: Stopped Check for glamor.
May 14 09:06:36 missionpi systemd[1]: gldriver-test.service: Succeeded.
May 14 09:06:36 missionpi systemd[1]: Stopped Check for v3d driver.

> The nutshutdown script can be used (or adapted in non-Linux OSes) to check if the killpower flag exists, and run the driver program to tell the UPS to cut the power (if it supports such operation).

It does support such an operation. When I run /lib/nut/upsplus -a pi -k, the UPS shutdown timer starts (via upsdrv_shutdown()), and then the UPS shuts down. However, the one time the upsmon shutdown magically worked, the UPS didn't cut power. I need to debug that.

> Typically such script then sleeps for an hour and reboots, to work around UPSes that would not power off when the wall power has returned at the wrong time (during the shutdown), so your servers would not stay halted indefinitely.

I'll have to look into this more as I believe my UPS has an issue here.

@dacarson
Author

Thanks for your help! The RPi is shutting down reliably now!

I had two issues:
(a) The /etc/killpower file: I needed to remove it. I am not sure how it got there in the first place.
(b) While reading through the PID issue, I saw a comment suggesting changing the service type from Forking to Simple. I tried that out, but it didn't work. When I changed it back to Forking, I mistyped it as Folking.

Fixing these two items fixed it. I guess it happened magically before I messed with (b), and maybe the file in (a) wasn't there at that point.
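A misspelled service type is easy to miss. As far as I know, systemd ignores a Type= value it cannot parse and falls back to its default, which would fit the "inactive (dead) ... Succeeded" status seen when the forking upsmon parent exits. The sketch below is one way to catch such a typo; `invalid_service_types` is a hypothetical helper, and the list of accepted values is my reading of the systemd.service documentation rather than anything NUT provides.

```shell
#!/bin/sh
# Hypothetical helper: print every Type= value in a unit file that is not
# one of the service types documented for systemd (values are case-sensitive).
invalid_service_types() {
    awk -F= '{
        key = $1; value = $2
        # Tolerate blanks around the "=" sign
        gsub(/[ \t]/, "", key); gsub(/[ \t]/, "", value)
        if (key == "Type" && value !~ /^(simple|exec|forking|oneshot|dbus|notify|notify-reload|idle)$/)
            print value
    }' "$1"
}
```

Running it against the edited unit file prints nothing when every Type= line is valid. `systemd-analyze verify nut-monitor.service` should report the same kind of problem on systems where it is available.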

However, I don't see the shutdown sent to the UPS as specified in step 8 of the shutdown flow:
8. init then runs your shutdown script. This checks for the POWERDOWNFLAG, finds it, and tells the UPS driver(s) to power off the load by sending commands to the connected UPS device(s) they manage.

Where can I find what the "commands" are that are sent to the UPS device?

With debug=1 my driver logs each of the values that are retrieved. I would expect to see the Shutdown timer set to a value if it was told to shutdown. (For my driver, 0 means that the timer isn't running). These are the last entries in syslog before the shutdown:

May 14 15:24:58 missionpi upsplus[4039]: [D1] Battery Charge Level: 73%
May 14 15:24:58 missionpi bluetoothd[858]: Stopping SDP server
May 14 15:24:58 missionpi upsplus[4039]: [D1] INA219 Battery Voltage: 4.000V
May 14 15:24:58 missionpi bluetoothd[858]: Exit
May 14 15:24:58 missionpi upsplus[4039]: [D1] INA219 Battery Power: 12.532W
May 14 15:24:58 missionpi upsplus[4039]: [D1] Battery Voltage High: 4.200V
May 14 15:24:58 missionpi upsplus[4039]: [D1] Battery runtime: 6165s
May 14 15:24:58 missionpi upsd[4158]: mainloop: Interrupted system call
May 14 15:24:58 missionpi upsd[4158]: Signal 15: exiting
May 14 15:24:58 missionpi systemd[1]: rpi-eeprom-update.service: Succeeded.
May 14 15:24:58 missionpi upsplus[4039]: [D1] INA219 Battery Current: -3.152A
May 14 15:24:58 missionpi upsplus[4039]: WARNING: send_to_all: write 33 bytes to socket 6 failed (ret=-1), disconnecting: Broken pipe
May 14 15:24:58 missionpi upsplus[4039]: [D1] INA219 Output Voltage: 4.928V
May 14 15:24:58 missionpi upsplus[4039]: [D1] INA219 Output Power: 7.414W
May 14 15:24:58 missionpi upsplus[4039]: [D1] UPS Load: 32.952%
May 14 15:24:58 missionpi upsplus[4039]: [D1] INA219 Output Current: 1.593A
May 14 15:24:58 missionpi upsplus[4039]: [D1] MicroUSB Input Voltage: 62.465V
May 14 15:24:58 missionpi upsplus[4039]: [D1] Device uptime: 21196s
May 14 15:24:58 missionpi upsplus[4039]: [D1] Shutdown Timer: 0s
May 14 15:24:58 missionpi systemd[1]: Stopped Check for Raspberry Pi EEPROM updates.
May 14 15:24:58 missionpi upsplus[4039]: [D1] Reboot Timer: 0s
May 14 15:24:58 missionpi upsplus[4039]: [D1] Auto restart on external power: yes
May 14 15:24:58 missionpi upsplus[4039]: [D1] Power status: normal
May 14 15:24:58 missionpi upsplus[4039]: [D1] Battery Voltage Low: 3.360V
May 14 15:24:58 missionpi upsplus[4039]: [D1] Battery Voltage High: 4.200V
May 14 15:24:58 missionpi upsplus[4039]: [D1] Battery Status: Low
May 14 15:24:58 missionpi upsplus[4039]: [D1] Battery Status: Resting
May 14 15:24:58 missionpi upsplus[4039]: [D1] Battery Voltage High: 4.200V

@jimklimov
Member

jimklimov commented May 15, 2024

For installing the current NUT codebase (e.g. your fork) over packaged versions (release/update cadence is the distro thing, can't help much here) with as much re-use of their build configuration as is deemed reasonable, check this wiki doc: https://github.com/networkupstools/nut/wiki/Building-NUT-for-in%E2%80%90place-upgrades-or-non%E2%80%90disruptive-tests

This should in particular deliver a /usr/lib/systemd/system-shutdown/nutshutdown which systemd should run late during power-off. That's the "step 8" :)

Your /etc/killpower file maybe was touch'ed for manual experiments? I made that mistake some time ago, but don't remember its reads claiming "No such file". Maybe if fgets() involved in that magic token parsing encounters an EOF right away (empty file), it claims that there is no file? By permissions it would have been at least readable. :\

Note that it can be prudent to use a non-default location under /run or /dev/shm rather than /etc to avoid hitting the flash chips with this not very important data (should disappear after reboot and a new start of upsmon anyway).
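To see where an existing configuration keeps the flag, something like the following sketch could be used. Both helper names are hypothetical, the parsing only handles a simple path without spaces, and the two prefixes checked are just the ones named above.

```shell
#!/bin/sh
# Hypothetical helper: print the POWERDOWNFLAG path from an upsmon.conf-style
# file. Commented-out lines are skipped because their first field is not the keyword.
pdflag_path() {
    awk '$1 == "POWERDOWNFLAG" { gsub(/"/, "", $2); print $2; exit }' "$1"
}

# Hypothetical helper: succeed if the path is under /run or /dev/shm.
pdflag_is_volatile() {
    case "$1" in
        /run/*|/dev/shm/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `pdflag_is_volatile "$(pdflag_path /etc/nut/upsmon.conf)" || echo "flag file is on persistent storage"`. A setting such as `POWERDOWNFLAG /run/nut/killpower` would pass the check; that exact path is only an illustration.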

For issue "b", besides changing the service type there should also have been a change of the daemon's behavior (running daemons "foregrounded" as far as NUT is concerned, e.g. by enabling debug with -D, which has worked since forever, or with new versions by using an explicit -F option).
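As a rough illustration of pairing the two changes, the sketch below writes a systemd drop-in that sets Type=simple and starts upsmon in the foreground. `write_foreground_dropin` and the file name `foreground.conf` are hypothetical, /sbin/upsmon is the path from the status output earlier in this thread, and the packaged unit may need other adjustments that are not covered here.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that runs upsmon in the foreground.
#   $1: drop-in directory, e.g. /etc/systemd/system/nut-monitor.service.d
#   $2: foreground flag, -F on newer NUT or -D on older releases (default -F)
write_foreground_dropin() {
    dir="$1"
    flag="${2:--F}"
    mkdir -p "$dir" || return 1
    cat > "$dir/foreground.conf" <<EOF
[Service]
Type=simple
# An empty ExecStart= clears the packaged command before it is replaced
ExecStart=
ExecStart=/sbin/upsmon $flag
EOF
}
```

After writing the drop-in as root, `systemctl daemon-reload` followed by `systemctl restart nut-monitor` would apply it.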

@dacarson
Author

> For installing the current NUT codebase (e.g. your fork) over packaged versions (release/update cadence is the distro thing, can't help much here) with as much re-use of their build configuration as is deemed reasonable, check this wiki doc: https://github.com/networkupstools/nut/wiki/Building-NUT-for-in%E2%80%90place-upgrades-or-non%E2%80%90disruptive-tests

This is very useful, thank you. Wish I had found that earlier. I had to reverse engineer the existing NUT binaries to work out what I needed to put on the configure line to build a driver that would work in place. For my platform, and for my driver, I am using:
$ ./configure --with-linux_i2c --with-statepath=/run/nut --sysconfdir=/etc/nut --with-user=nut --with-group=nut --with-pidpath=/run/nut

I created the man page for my driver, but I was never able to get the documentation building. With the link above, which specifies the packages, I hope to be able to build everything end to end.

> This should in particular deliver a /usr/lib/systemd/system-shutdown/nutshutdown which systemd should run late during power-off. That's the "step 8" 😊

I do have a /etc/init.d/nut-server that actually does have a poweroff step, though when I search syslog I don't see any of the log messages that it should produce.

/etc/init.d/nut-server poweroff step
  poweroff)
    wait_delay=`sed -ne 's#^ *POWEROFF_WAIT= *\(.*\)$#\1#p' /etc/nut/nut.conf`
    # UPS poweroff action is actually done here.
    # But nut-monitor (Ie nut-client) does the check and call nut-server if needed!
    # This action MUST NOT be called directly, and thus is not exposed in 'Usage'
    case "$MODE" in
      standalone|netserver)
        log_daemon_msg "Shutting down the UPS ..."
        if $upsdrvctl shutdown ; then
          # FIXME (needed?): sleep 5
          log_progress_msg "Waiting for UPS to cut the power"
          log_end_msg 0
        else
          log_progress_msg "Shutdown failed."
          log_progress_msg "Waiting for UPS batteries to run down"
          log_end_msg 0
        fi
        if [ -n "$wait_delay" ] ; then
          log_daemon_msg " (will reboot after $wait_delay) ..."
          sleep "$wait_delay"
          invoke-rc.d reboot stop
        fi
        ;;
      none|netclient|*)
        # nothing to do
        ;;
    esac
    ;;

> Your /etc/killpower file maybe was touch'ed for manual experiments? I made that mistake some time ago, but don't remember its reads claiming "No such file". Maybe if fgets() involved in that magic token parsing encounters an EOF right away (empty file), it claims that there is no file? By permissions it would have been at least readable. :\

I could have created it for manual experiments, following an example I found online. Now that I have removed it, I wonder if it is being created correctly. If it wasn't created, that would explain why the RPi is shutting down but not the UPS.

> Note that it can be prudent to use a non-default location under /run or /dev/shm rather than /etc to avoid hitting the flash chips with this not very important data (should disappear after reboot and a new start of upsmon anyway).

> For issue "b", besides changing the service type there should also have been a change of the daemon's behavior (running daemons "foregrounded" as far as NUT is concerned, e.g. by enabling debug with -D, which has worked since forever, or with new versions by using an explicit -F option).

I did add the -D option when I switched the type to Simple. When I switched back I removed the -D, but misspelled Forking.

My next step is to try replacing my NUT deployment with a current version. There seems to be a lot to do to accomplish this: I need to install all the needed components and, confusingly, make sense of the DEB scripts and how they are meant to be used.

Feel free to close this issue now. If, and when, I get the time to work through replacing my NUT deployment and I run into issues, I will file a new issue.

Thanks for your help.

@jimklimov
Member

jimklimov commented May 15, 2024

For Debian packaging, there's generally debuild (IIRC) to run and wrap the operation. It has been a while since I touched that, but there is a backlogged issue to pick up "reference" packaging scripts from the 42ITy NUT fork (FTY branch here) and clarify what to do with them, primarily to help users roll their own non-distro packages and have a sort of file-based dialog to suggest what distros can do with NUT integration (and feed back their ideas via PRs to us).

> I do have a /etc/init.d/nut-server that actually does have a poweroff step, though when I search syslog I don't see any of the log messages that it should produce.

The posted implementation seems to take care of telling the UPS to power off, sleeping and rebooting the server (if still powered); however it does not seem to ask if the FSD flag was raised in the first place. Also, somebody (who probably checks upsmon -K for the flag) should call this script with this argument during your shutdown, and have some way for this sleep before a reboot to be exempt from the killing spree for all remaining processes (if your OS does that) - e.g. the "systemd-shutdown" hooks achieve just that.
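The missing flag check can be sketched as a small late-shutdown hook like the one below. This is a minimal illustration, not the script NUT ships: `late_ups_poweroff` is a hypothetical name, the two program paths are the ones seen in the status output earlier in this thread, and the sleep-then-reboot fallback is left out.

```shell
#!/bin/sh
# Paths as seen in this thread; override them when testing.
UPSMON="${UPSMON:-/sbin/upsmon}"
UPSDRVCTL="${UPSDRVCTL:-/sbin/upsdrvctl}"

# Hypothetical hook body: only tell the drivers to cut power when upsmon
# confirms that it wrote the POWERDOWNFLAG file during this shutdown.
late_ups_poweroff() {
    # "upsmon -K" exits successfully only if the flag file exists and
    # contains upsmon's magic string
    if "$UPSMON" -K >/dev/null 2>&1; then
        "$UPSDRVCTL" shutdown
    else
        # Ordinary shutdown or reboot: leave the UPS alone
        return 0
    fi
}
```

A script placed in a directory such as /usr/lib/systemd/system-shutdown/ would call `late_ups_poweroff` as its last step, so that it runs after the remaining processes have been killed.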

Also not sure if invoke-rc.d is right for an OS with systemd, although it can be correct for other frameworks (upstart IIRC?) and might be a portability alias in systemd itself? :-\

@dacarson
Author

FWIW I was able to get to the newer version of NUT by moving from Debian 11 bullseye to Debian 12 bookworm. Now:

$ sudo apt show nut
Package: nut
Version: 2.8.0-7
...
