upsmon child process PID stored in upsmon.pid #123

Open
bigon opened this issue Apr 22, 2014 · 34 comments
Labels
service/daemon start/stop General subject for starting and stopping NUT daemons (drivers, server, monitor); also BG/FG/Debug systemd

Comments

@bigon
Contributor

bigon commented Apr 22, 2014

Hello,

When using systemd, it complains about the PID stored in the .pid file:

nut-monitor.service: Supervising process XXXX which is not our child. We'll most likely not notice when it exits.

And indeed when looking in upsmon.pid, the PID stored there is the one from the grand-child (unprivileged process) of the process started by init. Shouldn't this be the PID of the direct forked process instead?

@aquette
Member

aquette commented Apr 23, 2014

2014-04-22 22:39 GMT+02:00 Laurent Bigonville notifications@github.com:

> Hello,

Hi Laurent

> When using systemd, it complains about the PID stored in the .pid file:
>
> nut-monitor.service: Supervising process XXXX which is not our child. We'll most likely not notice when it exits.
>
> And indeed when looking in upsmon.pid, the PID stored there is the one
> from the grand-child of the process started by init. Shouldn't this be the
> PID of the direct forked process instead?

look closer and read this FAQ entry:
http://www.networkupstools.org/docs/FAQ.html#_why_are_there_two_copies_of_upsmon_running

that said, is there an "override" mechanism in systemd to avoid this
unnecessary msg?

cheers,
Arnaud

Engineering Linux/Unix Expert - Opensource Solutions Lead - Eaton -
http://opensource.eaton.com
NUT (Network UPS Tools) Project Leader - http://www.networkupstools.org
Debian Developer - http://www.debian.org

Free Software Developer - http://arnaud.quette.fr

Conseiller Municipal - Saint Bernard du Touvet

@bigon
Contributor Author

bigon commented Apr 23, 2014

I'm not sure there is an override.

Is it really a problem to store the pid of the process running as root instead of the unprivileged one?

@bigon
Contributor Author

bigon commented Apr 23, 2014

Apparently the unpriv process complains and continues to run if the privileged process is killed

avr 23 20:31:05 fornost upsmon[9846]: upsmon parent process died - shutdown impossible
avr 23 20:31:05 fornost upsmon[9846]: Parent died - shutdown impossible

@clepple
Member

clepple commented Apr 24, 2014

Stepping back, what is systemd trying to accomplish by watching the PID? If the intent is to restart upsmon if it is killed, then the right thing might be to use the -D flag to keep the parent process from going into the background. Then systemd can monitor it directly, and the PID file is still available to use for sending SIGHUP to the child to reread the configuration file (per the limitations in the upsmon man page).
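To illustrate the second half of that point, here is a minimal sketch of how a PID file can be used to ask the running process to reread its configuration. It is not NUT's own code: the function name `send_reload` is hypothetical, and it only assumes that the PID file holds one decimal PID and that SIGHUP triggers a reload, as described above.

```python
import os
import signal


def send_reload(pid_file):
    """Read a PID from pid_file and send that process SIGHUP.

    Hypothetical helper for illustration only; returns the PID signalled.
    """
    with open(pid_file) as fh:
        pid = int(fh.read().strip())
    # SIGHUP is the conventional "reread your configuration" signal
    os.kill(pid, signal.SIGHUP)
    return pid
```

On a real system the packaged tooling (`upsmon -c reload`, as seen in the ExecReload= line later in this thread) is the way to do this; the sketch only shows why the PID file has to name the process that handles the signal.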

@bigon
Contributor Author

bigon commented Apr 24, 2014

Oh indeed, we could prevent it from going into the background; this is even advised by systemd developers.

About reloading, we probably need to add the ExecReload= in the systemd service too then

@iva2k

iva2k commented Mar 1, 2021

Looking at it from the surface, it sounds like there should be two different .pid files: one for upsmon run as root and one for upsmon run as the nut/unprivileged user. If that's the case, it is a design bug. Should systemd take care of the PID of upsmon run as root, while the PID of the forked unprivileged upsmon run as nut is taken care of by the forking code?

Is unprivileged pid overwriting root pid and that breaks systemd and / or upsmon operation?

Even if that's not the case, the error messages in syslog do not look comforting and give no confidence that UPS shutdown will work correctly in all circumstances. Sounds like a critical issue. Can someone from NUT team chime in and triage this?

Here's a capture of sudo service nut-client status and ps from Ubuntu 18LTS, latest NUT package:

SERVICES STATUS : CLIENT =======================================================
● nut-monitor.service - Network UPS Tools - power device monitor and shutdown controller
   Loaded: loaded (/lib/systemd/system/nut-monitor.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2021-03-01 09:41:38 PST; 7s ago
  Process: 18311 ExecStart=/sbin/upsmon (code=exited, status=0/SUCCESS)
 Main PID: 18313 (upsmon)
    Tasks: 2 (limit: 4915)
   CGroup: /system.slice/nut-monitor.service
           ├─18312 /lib/nut/upsmon
           └─18313 /lib/nut/upsmon

Mar 01 09:41:38 fs1.a.com systemd[1]: Starting Network UPS Tools - power device monitor and shutdown controller...
Mar 01 09:41:38 fs1.a.com upsmon[18311]: fopen /var/run/nut/upsmon.pid: No such file or directory
Mar 01 09:41:38 fs1.a.com upsmon[18311]: UPS: ups2@localhost (master) (power value 1)
Mar 01 09:41:38 fs1.a.com upsmon[18311]: Using power down flag file /etc/killpower
Mar 01 09:41:38 fs1.a.com upsmon[18312]: Startup successful
Mar 01 09:41:38 fs1.a.com systemd[1]: nut-monitor.service: Can't open PID file /var/run/nut/upsmon.pid (yet?) after start: No such file or direc
Mar 01 09:41:38 fs1.a.com systemd[1]: nut-monitor.service: Supervising process 18313 which is not our child. We'll most likely not notice when i
Mar 01 09:41:38 fs1.a.com systemd[1]: Started Network UPS Tools - power device monitor and shutdown controller.

RUNNING PROCESSES ==============================================================
USER       PID %CPU %MEM    VSZ   RSS TT       STAT  STARTED     TIME COMMAND         GROUP      GID
nut      18279  0.0  0.0  21996   456 ?        Ss   09:41:36 00:00:00 usbhid-ups      nut        130
nut      18281  0.0  0.0  38136   364 ?        Ss   09:41:36 00:00:00 upsd            nut        130
root     18312  0.0  0.0  35836  2764 ?        Ss   09:41:37 00:00:00 upsmon          root         0
nut      18313  0.0  0.0  50000  3936 ?        S    09:41:37 00:00:00 upsmon          nut        130

PID FILES ======================================================================
total 16
-rw-r--r-- 1 nut  nut  6 Mar  1 09:41 upsd.pid
-rw-r--r-- 1 root root 6 Mar  1 09:41 upsmon.pid
srw-rw---- 1 nut  nut  0 Mar  1 09:41 usbhid-ups-ups2
-rw-r--r-- 1 nut  nut  6 Mar  1 09:41 usbhid-ups-ups2.pid

@gwaitsi

gwaitsi commented Mar 13, 2021

The below message re fopen also appears on freebsd variations i.e. freenas/truenas (although nut appears to work and shutdown while all related pids are created)

fopen /var/run/nut/upsmon.pid: No such file or directory
fopen /var/db/nut/upsd.pid No such file or directory

@electrofloat

So... what is the solution to this issue?

@RJHsiao

RJHsiao commented May 13, 2021

Hi there,
I get the same message on my Ubuntu 20.04 LTS server and have found no solution.
Is somebody working on it? Or does a solution exist that we could find by searching for some keyword I don't know?

@jimklimov
Member

jimklimov commented May 13, 2021 via email

@jimklimov jimklimov added the service/daemon start/stop General subject for starting and stopping NUT daemons (drivers, server, monitor); also BG/FG/Debug label Nov 14, 2021
@jimklimov
Member

jimklimov commented Feb 11, 2022

PR #683 (and #349 before it, now part of it) introduces a separation of debugging options vs. foreground/background running behavior, and in particular redefines the daemons under systemd units to run in foreground. Hopefully that change would alleviate this issue. Testing is welcome ;)

@jimklimov
Member

Playing around with the daemons and service units, for issues/PRs linked above, found an interesting behavior here:

When nut-monitor.service is initially started (newly as a foregrounded process without the extra forking for systemd, but still with forking to privileged/unprivileged pair), the "Main PID" for systemd is that of the (root-owned) parent process:

* nut-monitor.service - Network UPS Tools - power device monitor and shutdown controller
   Loaded: loaded (/lib/systemd/system/nut-monitor.service; disabled)
   Active: active (running) since Wed 2022-02-16 14:23:56 UTC; 3s ago
 Main PID: 24963 (upsmon)
   CGroup: /system.slice/nut-monitor.service
           ├─24963 /usr/local/ups/sbin/upsmon -F
           └─24964 /usr/local/ups/sbin/upsmon -F

Feb 16 14:23:56 mirabox systemd[1]: Starting Network UPS Tools - power device monitor and shutdown controller...
Feb 16 14:23:56 mirabox systemd[1]: Started Network UPS Tools - power device monitor and shutdown controller.
Feb 16 14:23:56 mirabox nut-monitor[24963]: fopen /var/run/upsmon.pid: No such file or directory
Feb 16 14:23:56 mirabox nut-monitor[24963]: Could not find PID file to see if previous upsmon instance is already running!
Feb 16 14:23:56 mirabox nut-monitor[24963]: UPS: nutdev1 (primary) (power value 1)
Feb 16 14:23:56 mirabox nut-monitor[24963]: Using power down flag file /etc/killpower

This only partially matches the other info: while "24963" is indeed the root parent, the recorded PIDFile value is that of the child:

# ps -ef | grep  upsmon
root     24963     1  0 14:23 ?        00:00:00 /usr/local/ups/sbin/upsmon -F
nobody   24964 24963  0 14:23 ?        00:00:00 /usr/local/ups/sbin/upsmon -F

# cat /run/upsmon.pid
24964

Systemd notices that after e.g. reloading the service unit:

# journalctl -flu nut-monitor &
# systemctl reload  nut-monitor
Feb 16 14:28:02 mirabox systemd[1]: Reloading Network UPS Tools - power device monitor and shutdown controller.
Feb 16 14:28:03 mirabox nut-monitor[24963]: Reloading configuration
Feb 16 14:28:03 mirabox nut-monitor[24974]: Network UPS Tools upsmon 2.7.4-4685-gc025b7e
root@mirabox:/home/bios/nut# Feb 16 14:28:03 mirabox systemd[1]: nut-monitor.service: Supervising process 24964 which is not our child. We'll most likely not notice when it exits.
Feb 16 14:28:03 mirabox systemd[1]: Reloaded Network UPS Tools - power device monitor and shutdown controller.

and then the reported "Main PID" matches it instead:

* nut-monitor.service - Network UPS Tools - power device monitor and shutdown controller
   Loaded: loaded (/lib/systemd/system/nut-monitor.service; disabled)
   Active: active (running) since Wed 2022-02-16 14:23:56 UTC; 4min 49s ago
  Process: 24974 ExecReload=/usr/local/ups/sbin/upsmon -c reload (code=exited, status=0/SUCCESS)
 Main PID: 24964 (upsmon)
   CGroup: /system.slice/nut-monitor.service
           ├─24963 /usr/local/ups/sbin/upsmon -F
           └─24964 /usr/local/ups/sbin/upsmon -F

Feb 16 14:23:56 mirabox systemd[1]: Started Network UPS Tools - power device monitor and shutdown controller.
Feb 16 14:23:56 mirabox nut-monitor[24963]: fopen /var/run/upsmon.pid: No such file or directory
Feb 16 14:23:56 mirabox nut-monitor[24963]: Could not find PID file to see if previous upsmon instance is already running!
Feb 16 14:23:56 mirabox nut-monitor[24963]: UPS: nutdev1 (primary) (power value 1)
Feb 16 14:23:56 mirabox nut-monitor[24963]: Using power down flag file /etc/killpower
Feb 16 14:28:02 mirabox systemd[1]: Reloading Network UPS Tools - power device monitor and shutdown controller.
Feb 16 14:28:03 mirabox nut-monitor[24963]: Reloading configuration
Feb 16 14:28:03 mirabox nut-monitor[24974]: Network UPS Tools upsmon 2.7.4-4685-gc025b7e
Feb 16 14:28:03 mirabox systemd[1]: nut-monitor.service: Supervising process 24964 which is not our child. We'll most likely not notice when it exits.
Feb 16 14:28:03 mirabox systemd[1]: Reloaded Network UPS Tools - power device monitor and shutdown controller.

@jimklimov
Member

One more aspect discussed above, about inability to open PID files like this:

fopen /var/run/upsmon.pid: No such file or directory
fopen /var/state/nut/upsd.pid No such file or directory

per investigation (and fixes) done during work on PR #1300 these are probably benign: these two daemons check if their earlier copy is already running, by looking at a PID file (if exists) and signaling the reported PID number. In case of first start after reboot (or clean restart of a service), these files do not exist and the fact is reported. With #1300 the reasons why such probing failed (no PID file, unparsable PID file, some error signalling a process) should now be logged in a less confusing manner, e.g. as seen above:

Feb 16 14:23:56 mirabox nut-monitor[24963]: fopen /var/run/upsmon.pid: No such file or directory
Feb 16 14:23:56 mirabox nut-monitor[24963]: Could not find PID file to see if previous upsmon instance is already running!
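The probe described in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from PR #1300: the function name `probe_pid_file` and the returned strings are hypothetical, and the sketch assumes a POSIX system where signal 0 checks for a process without disturbing it.

```python
import os


def probe_pid_file(path):
    """Classify a PID file: missing, unparsable, stale, or a live process.

    Hypothetical helper; the outcome names are illustrative only.
    """
    try:
        with open(path) as fh:
            text = fh.read().strip()
    except FileNotFoundError:
        # Expected on first start after a reboot or a clean service restart
        return "no PID file"
    try:
        pid = int(text)
    except ValueError:
        return "unparsable PID file"
    if pid <= 0:
        return "unparsable PID file"
    try:
        # Signal 0 performs the existence and permission checks only
        os.kill(pid, 0)
    except ProcessLookupError:
        return "stale PID file"
    except PermissionError:
        return "running (owned by another user)"
    return "running"
```

The "no PID file" branch corresponds to the benign `fopen ... No such file or directory` message quoted above: the daemon looked for an earlier copy of itself and found nothing to signal.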

@jimklimov
Member

On a related note, actual drivers wrapped into systemd/SMF unit instances (with nut-driver-enumerator now part of NUT) could also benefit from not forking when started via upsdrvctl. According to comments in the latter, it generally uses forkexec() because it may start many drivers in parallel. Someone might explore adding an option to forgo that fork when starting a single device, but this has not been addressed so far AFAIK.

@Grandma-Betty

Came here looking for a solution for this issue. Will it get fixed in the next nut release? Is there a workaround?
Here's what I get:

Feb 21 21:24:23 ubuntuserver systemd[1]: Starting Network UPS Tools - power device monitor and shutdown controller...
Feb 21 21:24:23 ubuntuserver upsmon[6887]: fopen /run/nut/upsmon.pid: No such file or directory
Feb 21 21:24:23 ubuntuserver upsmon[6887]: Using power down flag file /etc/killpower
Feb 21 21:24:23 ubuntuserver upsmon[6887]: UPS: ups@192.168.30.5 (slave) (power value 1)
Feb 21 21:24:23 ubuntuserver upsmon[6896]: Startup successful
Feb 21 21:24:23 ubuntuserver systemd[1]: nut-monitor.service: Can't open PID file /run/nut/upsmon.pid (yet?) after start: Operation not permitted
Feb 21 21:24:23 ubuntuserver systemd[1]: nut-monitor.service: Supervising process 6898 which is not our child. We'll most likely not notice when it exits.
Feb 21 21:24:23 ubuntuserver systemd[1]: Started Network UPS Tools - power device monitor and shutdown controller.

@dan

dan commented Feb 22, 2022

I have the same issue.

@jimklimov
Member

Not sure - there are some more pressing priorities at the moment, at least on my side.

Thinking of the last week's investigation however, I wonder if the systemd unit "PIDFile=..." is needed here. Without it I suppose systemd would just track the parent (root) process. Anyhow it can not do much about the unprivileged child going AWOL, except restarting the parent to get them both alive again. Thinking of it more, maybe that was why PIDFile got there in the first place (to detect untimely demise of a child).

@brianbloom

I think this is tripping up my shutdown scripts as well as I get the same log messages with PID problems. Or maybe I don't understand the shutdown workflow well enough. I have some bandwidth to help with testing if someone can advise what I should do.

@jimklimov
Member

Can you please check if service definitions in current NUT handle this better? At least, daemons should now run in foreground mode so one fork less.

@brianbloom

brianbloom commented Apr 18, 2022

@jimklimov (assuming that is addressed to me) I am running an apt-installed version of 2.7.4. Does "current NUT" mean one of the 2.8.0 releases?

@ioogithub

Came here looking for a solution for this issue. Will it get fixed in the next nut release? Is there a workaround? Here's what I get:

Feb 21 21:24:23 ubuntuserver systemd[1]: Starting Network UPS Tools - power device monitor and shutdown controller...
Feb 21 21:24:23 ubuntuserver upsmon[6887]: fopen /run/nut/upsmon.pid: No such file or directory
Feb 21 21:24:23 ubuntuserver upsmon[6887]: Using power down flag file /etc/killpower
Feb 21 21:24:23 ubuntuserver upsmon[6887]: UPS: ups@192.168.30.5 (slave) (power value 1)
Feb 21 21:24:23 ubuntuserver upsmon[6896]: Startup successful
Feb 21 21:24:23 ubuntuserver systemd[1]: nut-monitor.service: Can't open PID file /run/nut/upsmon.pid (yet?) after start: Operation not permitted
Feb 21 21:24:23 ubuntuserver systemd[1]: nut-monitor.service: Supervising process 6898 which is not our child. We'll most likely not notice when it exits.
Feb 21 21:24:23 ubuntuserver systemd[1]: Started Network UPS Tools - power device monitor and shutdown controller.

I am a new user and really struggling to get my shutdown script working; everything seems like it should work, but it simply does not. This is the only error I can see. Can a dev or experienced user comment on whether this could be causing a problem with shutdown sequences, or is this issue unrelated?

I am following this tutorial: https://forums.unraid.net/topic/93341-tutorial-networked-nut-for-cyberpower-ups/ and everything works up to the upssched.conf part.

@jimklimov
Member

jimklimov commented Aug 30, 2022

@hawtkey: Depending on daemon, they are used in NUT generally to verify if another copy is running, or to send signals to it via command-line (e.g. commands to reload, FSD, etc), or to kill off older sibling to start a new one. Systemd is a relatively new kid on the block and not ubiquitous across OSes, so some tradeoffs still gotta get designed.

@recklessnl

Having the exact same issue as well.

Is there no workaround in the meantime?

@jimklimov
Member

Run daemon foreground?

@Grandma-Betty

@jimklimov Could you link an example on how to do that?
I'm curious why the official Ubuntu repositories are waiting so long to move forward; they're still on nut package 2.7.4. Maybe an update would fix some of these issues we are all having with ESXi hosts, which could be related to the outdated libusb libraries.

@recklessnl

recklessnl commented Sep 6, 2022

I upgraded to the latest version of NUT today on my Debian 11 system and I'm still having the same issue. @jimklimov what's the easiest way to run the daemon foreground?

Sep 06 17:18:39 proxmox systemd[1]: Starting Network UPS Tools - power device monitor and shutdown controller...
Sep 06 17:18:39 proxmox upsmon[9410]: fopen /run/nut/upsmon.pid: No such file or directory
Sep 06 17:18:39 proxmox upsmon[9410]: UPS: ups1@localhost (master) (power value 1)
Sep 06 17:18:39 proxmox upsmon[9410]: Using power down flag file /etc/killpower
Sep 06 17:18:39 proxmox systemd[1]: nut-monitor.service: Can't open PID file /run/nut/upsmon.pid (yet?) after start: Operation not permitted
Sep 06 17:18:39 proxmox upsmon[9411]: Startup successful
Sep 06 17:18:39 proxmox systemd[1]: nut-monitor.service: Supervising process 9413 which is not our child. We'll most likely not notice when it exits.
Sep 06 17:18:39 proxmox systemd[1]: Started Network UPS Tools - power device monitor and shutdown controller.
Sep 06 17:18:39 proxmox upsmon[9413]: Init SSL without certificate database

@jimklimov
Member

Can't really speak for distributions' cadence - that's outside the scope of NUT as an upstream project. From what I gather, @bigon worked on proposing an updated package recipe for "experimental" distro; and from there it would eventually trickle by backports into stable/LTS distros if nobody complains of regressions.

Actually there are a few issue fixes after the NUT v2.8.0 release, and some issues still outstanding (e.g. certain, but not all, CPS-like devices that talk rubbish on the USB HID protocol were understood before and are not now that we check it more strictly). So maybe it would be an eventual NUT v2.8.1+ that would hit the stable distros.

@jimklimov
Member

As for running the daemon differently, it depends.

Assuming that you still have NUT v2.7.4 wrapped by systemd, you can either hackily change the unit definition in place (systemctl status nut-monitor should show the path to the file to edit; such edits would be lost when the package is upgraded), or add a "drop-in" file which the systemd daemon would merge in memory over the packaged definition. Either way, change the unit type from "forking" to the default ("simple") and add the command-line option to the ExecStart=.../upsmon line.

  • With NUT v2.7.4 one way to keep the daemon foregrounded (so one fork less) is to request debugging (at least one -D option).

  • With NUT v2.8.0 you can add debug_min to configuration (so not hack systemd units at all), or use the new -F option for foregrounding.

Unit definitions in NUT v2.8.0+ sources should actually include this. Then it is up to the distro what unit definitions they package - from NUT or inherited from their own older package recipe revisions.
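As a hedged illustration of the drop-in approach described in this comment, a file along these lines might work. The drop-in file name `foreground.conf` is hypothetical, and the `/sbin/upsmon` path is taken from the Ubuntu status output earlier in this thread; check `systemctl status nut-monitor` for the path your package actually uses. This is a sketch, not the unit definition shipped with NUT.

```ini
# /etc/systemd/system/nut-monitor.service.d/foreground.conf
# (hypothetical file name; any *.conf in this directory is merged)
[Service]
# Track the process systemd starts directly instead of waiting for a fork
Type=simple
# An empty ExecStart= clears the packaged command before a new one is set
ExecStart=
# NUT v2.7.4: at least one -D keeps upsmon in the foreground
ExecStart=/sbin/upsmon -D
# NUT v2.8.0 and later: use the dedicated foreground option instead
#ExecStart=/sbin/upsmon -F
```

After creating the file, `systemctl daemon-reload` followed by `systemctl restart nut-monitor` should make systemd pick up the merged definition.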

@jimklimov
Member

jimklimov commented Sep 8, 2022

Also, a shout-out to all who post "I have same issue": please, do detail which NUT version/build you have - this is an area where fixes are iterated, so no NUT is made equal ;)

And also, just to help me wrap my head around this: what "issue" do each of you have?

  • Systemd reporting how it dislikes a PID file is IMHO more of a symptom than diagnosis/breakage.

    • If there are no further practical problems, I am inclined to treat this further as "just" a (mis-)diagnostics noise until proven otherwise.
  • If systemd can or can't track the multi-process upsmon - that may be a problem, and maybe one more with the particular service management framework than NUT.

    • Maybe some architectural changes on NUT side (with more or clearer unit definitions and/or systemd notify interaction for example) are possible, for the two to cooperate seamlessly. See also #1590 ("On systemd aware OSes, optionally integrate with sd_notify()").
    • For practical purposes, I guess systemd should track the parent (root-owned) upsmon process. If the child (unprivileged) process is used at all, then if it dies off it's up to the parent to respawn it (or die also - so both are respawned by systemd). The PID file is primarily used to know whom to send signals to via upsmon -c cmd and points to unprivileged daemon (or privileged if separation is not used in certain setups).
    • Not sure if systemd needs to do (benefits from) monitoring the PID file at all. Maybe dropping the line from unit definition is an acceptable solution?
    • Alternately we might always keep a file like upsmon-parent.pid (whether it forks or not) for the systemd and the likes to monitor. @bigon can that be useful?
    • Always gotta keep in mind that "in the field" the solution for end-users may rely on getting distributions to ship NUT provided systemd unit definitions (or a variant thereof). If distros continue to ship something they conjured up over the years, all bets are off on our side :-}
  • If your power-event shutdowns don't work, that is a practical issue, but it probably has to do with the setup of upsmon itself: maybe permissions, or the ability to send the last-moment power-off command to the UPS (e.g. nutshutdown, which may be used for integration with systemd-shutdown; are the needed filesystems with configs still mounted?), etc. It is probably not related to systemd tracking (or not) the multi-process upsmon PIDs.

    • One thing that comes to mind: maybe a systemd-driven OS shutdown tries to stop nut-monitor.service, gets lost in PIDs, and waits for 90s to kill everything? Anyhow, it should not preclude stopping of unrelated services...

@recklessnl

recklessnl commented Sep 8, 2022

Thanks for the detailed response Jim! In my case, NUT with my UPS has worked for years without any hiccups, but for the last few weeks it's not been reliable anymore, with data going stale, and connections being refused. Here's my config, and keep in mind this used to work flawlessly for years with my Cyberpower UPS.

If I reboot the server, it works for like a day. All the attributes show up, and the pid errors discussed in this thread all show up right away, but it will work and communicate correctly with my UPS. But the next day, it starts failing with either connection failed or data stale, without fail.

I'm using NUT version 2.8.0-2, from the Debian Unstable repo. Currently running latest version of Proxmox, which is basically Debian 11.

ups.conf:

[ups1]
  driver = usbhid-ups
  port = auto
  desc = "Cyberpower UPS Server"
  pollinterval = 15

upsmon.conf:

POLLFREQ 8
DEADTIME 25

MONITOR ups1@localhost 1 upsmonitor my_password master
RUN_AS_USER nut
POWERDOWNFLAG /etc/killpower
SHUTDOWNCMD "/usr/sbin/shutdown -h now"
#NOTIFYCMD /etc/nut/notify
NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
NOTIFYFLAG LOWBATT SYSLOG+WALL+EXEC
NOTIFYFLAG ONLINE SYSLOG+WALL+EXEC
NOTIFYFLAG COMMBAD SYSLOG+WALL+EXEC
NOTIFYFLAG COMMOK SYSLOG+WALL+EXEC
NOTIFYFLAG REPLBATT SYSLOG+WALL+EXEC
NOTIFYFLAG NOCOMM SYSLOG+EXEC
NOTIFYFLAG FSD SYSLOG+EXEC
NOTIFYFLAG SHUTDOWN SYSLOG+EXEC

After a while, upsmon service shows Data Stale:


Sep 08 16:15:30 proxmox upsmon[9413]: Poll UPS [ups1@localhost] failed - Data stale
Sep 08 16:15:38 proxmox upsmon[9413]: Poll UPS [ups1@localhost] failed - Data stale
Sep 08 16:15:46 proxmox upsmon[9413]: Poll UPS [ups1@localhost] failed - Data stale

service nut-driver status shows:

Sep 07 22:41:42 proxmox usbhid-ups[9405]: libusb_get_report: error sending control message: Cannot send after transport endpoint shutdown
Sep 07 22:41:42 proxmox usbhid-ups[9405]: libusb_get_report: error sending control message: No such device
Sep 07 22:42:42 proxmox usbhid-ups[9405]: libusb_get_report: error sending control message: Cannot send after transport endpoint shutdown
Sep 07 22:42:42 proxmox usbhid-ups[9405]: libusb_get_report: error sending control message: No such device
Sep 07 22:42:57 proxmox usbhid-ups[9405]: libusb_get_report: error sending control message: Cannot send after transport endpoint shutdown
Sep 07 22:42:57 proxmox usbhid-ups[9405]: libusb_get_report: error sending control message: No such device
Sep 07 22:44:42 proxmox usbhid-ups[9405]: libusb_get_report: error sending control message: Cannot send after transport endpoint shutdown
Sep 07 22:44:42 proxmox usbhid-ups[9405]: libusb_get_report: error sending control message: No such device
Sep 07 22:44:57 proxmox usbhid-ups[9405]: libusb_get_report: error sending control message: Cannot send after transport endpoint shutdown
Sep 07 22:44:57 proxmox usbhid-ups[9405]: libusb_get_report: error sending control message: No such device

@jimklimov , where should I put the debug_min and to what value should I set it to prevent this from happening? I'd love to get NUT working again.

@jimklimov
Member

jimklimov commented Sep 8, 2022

Thanks for the details, though the particular issue here is likely not about upsmon pid.

libusb_get_report: error sending control message: No such device

this looks sinister... and there are many reports of CPSes getting reconnected (dmesg may confirm) - at which point AFAIK usually kernel grabs the "newly discovered" device and per udev rules should relinquish access to NUT user in OS.

Recent fixes included usbhid-ups ability to reconnect on the fly (hopefully getting permissions for the device back), with further fix in that area made approx. last week. So it may quite be that a custom build of current master would help.

As for debug_min - please see docs (man pages, config file settings) for the daemon in question. But that's about NUT debug setting (via config files instead of hacking init scripts), not about hardware connectivity flip-flops.

@recklessnl

@jimklimov thanks for the reply.

Recent fixes included usbhid-ups ability to reconnect on the fly (hopefully getting permissions for the device back), with further fix in that area made approx. last week. So it may quite be that a custom build of current master would help.

Will this be upstreamed to the Debian unstable repo soon? Would be easier than maintaining a custom build. Still having issues with this.

@jimklimov
Member

jimklimov commented Sep 19, 2022

AFAIK recipes were proposed, search NUT issues from this summer for "debian" or "ubuntu". What happens next is up to distros...

Notably -- not sure which service definitions they would use eventually (NUT's or their old ones)...

@jimklimov
Member

jimklimov commented Nov 25, 2022

Note: recently had to dive into the code to see what code writes PID files into which locations; this is analyzed in #1712 (comment)

UPDATE: ...and summarized the area in https://github.com/networkupstools/nut/wiki/Technicalities:-Work-with-PID-and-state-file-paths
