upsmon child process PID stored in upsmon.pid #123
2014-04-22 22:39 GMT+02:00 Laurent Bigonville notifications@github.com:
That said, is there an "override" mechanism in systemd to avoid this? Cheers, Engineering Linux/Unix Expert - Opensource Solutions Lead - Eaton - Free Software Developer - http://arnaud.quette.fr - Municipal Councillor - Saint Bernard du Touvet |
I'm not sure there is an override. Is it really a problem to store the pid of the process running as root instead of the unprivileged one? |
Apparently the unpriv process complains and continues to run if the privileged process is killed
|
Stepping back, what is systemd trying to accomplish by watching the PID? If the intent is to restart upsmon if it is killed, then the right thing might be to use the -D flag to keep the parent process from going into the background. Then systemd can monitor it directly, and the PID file is still available to use for sending SIGHUP to the child to reread the configuration file (per the limitations in the upsmon man page). |
Oh indeed, we could prevent it from going into the background; this is even advised by the systemd developers. About reloading, we probably need to add ExecReload= to the systemd service too, then |
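The two suggestions above (keep upsmon in the foreground with -D, and add ExecReload=) could be combined in a systemd drop-in, roughly as sketched below. This is only an illustration under assumptions that do not come from the thread: the unit name nut-monitor.service, the binary path /sbin/upsmon, the drop-in file name foreground.conf, and the helper name write_upsmon_dropin are all placeholders to adjust for your distribution. Note that -D also raises the debug level, so logs get noisier.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that keeps upsmon in the foreground.
# Assumptions (not from the thread): unit name nut-monitor.service,
# binary path /sbin/upsmon, drop-in file name foreground.conf.
write_upsmon_dropin() {
    dropin_dir=${1:-/etc/systemd/system/nut-monitor.service.d}
    upsmon_bin=${2:-/sbin/upsmon}

    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/foreground.conf" <<EOF
[Service]
# The process systemd starts stays in the foreground, so track it directly.
Type=simple
# An empty assignment clears the PIDFile= inherited from the packaged unit.
PIDFile=
# ExecStart= must be cleared before it can be replaced in a drop-in.
ExecStart=
ExecStart=$upsmon_bin -D
# Ask the running upsmon to reread its configuration.
ExecReload=$upsmon_bin -c reload
EOF
}
```

After calling write_upsmon_dropin as root, `systemctl daemon-reload` followed by `systemctl restart nut-monitor.service` would pick up the change. Whether `upsmon -c reload` finds the running copy in this mode depends on upsmon still writing its PID file, as the earlier comment suggests; verify that on your build before relying on it.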
From the surface looking at it - sounds like there should be two different .pid files for Is unprivileged pid overwriting root pid and that breaks Even if that's not the case, the error messages in syslog do not look comforting and give no confidence that UPS shutdown will work correctly in all circumstances. Sounds like a critical issue. Can someone from NUT team chime in and triage this? Here's a capture of
|
The fopen message below also appears on FreeBSD variants, i.e. FreeNAS/TrueNAS (although NUT appears to work and shut down, and all related pids are created): fopen /var/run/nut/upsmon.pid: No such file or directory |
So... what is the solution to this issue? |
Hi there, |
I keep meaning to take a closer look at this, but it keeps drowning in
priorities :(
On Thu, May 13, 2021, 19:46 RJ Hsiao wrote:
Hi there,
I get the same message on my Ubuntu 20.04 LTS server, and have found no solution.
Is somebody working on it? Or does a solution exist that we could find by googling
keywords I don't know?
|
…dsignal(), like in upsd [for networkupstools#1299, networkupstools#123]
Playing around with the daemons and service units, for issues/PRs linked above, found an interesting behavior here: When
This only partially matches the other info: while "24963" is indeed the root parent, the recorded PIDFile value is that of the child:
Systemd notices that after e.g. reloading the service unit:
and then the reported "Main PID" matches it instead:
|
One more aspect discussed above, about inability to open PID files like this:
per investigation (and fixes) done during work on PR #1300, these are probably benign: these two daemons check whether an earlier copy is already running by looking at a PID file (if it exists) and signaling the PID number recorded there. On the first start after a reboot (or a clean restart of a service), these files do not exist, and that fact is reported. With #1300, the reasons why such probing failed (no PID file, unparsable PID file, some error signaling a process) should now be logged in a less confusing manner, e.g. as seen above:
|
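To make the three probing outcomes named above concrete (no PID file, unparsable PID file, error signaling the process), here is a minimal shell sketch of the same kind of check. It is not NUT's code: the function name probe_pidfile and the result strings are made up for illustration, and the real daemons do this in C with their own log messages.

```shell
#!/bin/sh
# Hypothetical helper (not part of NUT): report why probing a PID file
# succeeded or failed, roughly mirroring the cases listed above.
probe_pidfile() {
    pidfile=$1

    # Case 1: no PID file, e.g. first start after a reboot.
    if [ ! -f "$pidfile" ]; then
        echo "no-pidfile"
        return 1
    fi

    # Read the first line; tolerate a missing trailing newline.
    pid=
    IFS= read -r pid < "$pidfile" || true

    # Case 2: contents are empty, not a plain decimal number, or zero.
    case $pid in
        ''|*[!0-9]*|0)
            echo "unparsable-pidfile"
            return 2
            ;;
    esac

    # Case 3: signal 0 delivers nothing; it only checks that the PID
    # exists and that we are allowed to signal it.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running:$pid"
        return 0
    fi
    echo "cannot-signal:$pid"
    return 3
}
```

For example, `probe_pidfile /var/run/nut/upsmon.pid` right after a reboot would be expected to print no-pidfile, which corresponds to the benign "No such file or directory" message quoted earlier in the thread.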
On a related note, actual drivers wrapped into systemd/SMF unit instances (with nut-driver-enumerator now part of NUT) could also benefit from not-forking when started via upsdrvctl. According to comments in the latter, it generally uses |
Came here looking for a solution for this issue. Will it get fixed in the next nut release? Is there a workaround?
|
I have the same issue. |
Not sure - there are some more pressing priorities at the moment, at least on my side. Thinking of last week's investigation, however, I wonder if the systemd unit "PIDFile=..." is needed here. Without it, I suppose systemd would just track the parent (root) process. Anyhow, it cannot do much about the unprivileged child going AWOL, except restarting the parent to get them both alive again. Thinking about it more, maybe that was why PIDFile got there in the first place (to detect the untimely demise of a child). |
I think this is tripping up my shutdown scripts as well as I get the same log messages with PID problems. Or maybe I don't understand the shutdown workflow well enough. I have some bandwidth to help with testing if someone can advise what I should do. |
Can you please check if service definitions in current NUT handle this better? At least, daemons should now run in foreground mode so one fork less. |
@jimklimov (assuming that is addressed to me) I am running an apt-installed version of 2.7.4. Does "current NUT" mean one of the 2.8.0 releases? |
I am a new user and really struggling to get my shutdown script working; everything seems like it should work, but it simply does not. This is the only error I can see. Can a dev or experienced user comment on whether this could be causing a problem with shutdown sequences, or is this issue unrelated? I am following this tutorial: https://forums.unraid.net/topic/93341-tutorial-networked-nut-for-cyberpower-ups/ and everything works up to the upssched.conf part. |
@hawtkey: Depending on daemon, they are used in NUT generally to verify if another copy is running, or to send signals to it via command-line (e.g. commands to reload, FSD, etc), or to kill off older sibling to start a new one. Systemd is a relatively new kid on the block and not ubiquitous across OSes, so some tradeoffs still gotta get designed. |
Having the exact same issue as well. Is there no workaround in the meantime? |
Run daemon foreground? |
@jimklimov Could you link an example on how to do that? |
I upgraded to the latest version of NUT today on my Debian 11 system and I'm still having the same issue. @jimklimov what's the easiest way to run the daemon foreground?
|
Can't really speak for distributions' cadence - that's outside the scope of NUT as an upstream project. From what I gather, @bigon worked on proposing an updated package recipe for the "experimental" distro; from there it would eventually trickle via backports into stable/LTS distros if nobody complains of regressions. Actually, there are a few issue fixes after the NUT v2.8.0 release, and some issues still outstanding (e.g. certain, but not all, CPS-like devices that talk rubbish on the USB HID protocol were understood before and are not now that we check it more strictly). So maybe it would be an eventual NUT v2.8.1+ that would hit the stable distros. |
As for running the daemon differently, it depends. Assuming that you still have NUT v2.7.4 wrapped by systemd, you can either hackily change the unit definition in-place (
Unit definitions in NUT v2.8.0+ sources should actually include this. Then it is up to the distro what unit definitions they package - from NUT or inherited from their own older package recipe revisions. |
Also, a shout-out to all who post "I have same issue": please, do detail which NUT version/build you have - this is an area where fixes are iterated, so no NUT is made equal ;) And also, just to help me wrap my head around this: what "issue" do each of you have?
|
Thanks for the detailed response Jim! In my case, NUT with my UPS has worked for years without any hiccups, but for the last few weeks it's not been reliable anymore, with data going stale, and connections being refused. Here's my config, and keep in mind this used to work flawlessly for years with my Cyberpower UPS. If I reboot the server, it works for like a day. All the attributes show up, and the pid errors discussed in this thread all show up right away, but it will work and communicate correctly with my UPS. But the next day, it starts failing with either connection failed or data stale, without fail. I'm using NUT version 2.8.0-2, from the Debian Unstable repo. Currently running latest version of Proxmox, which is basically Debian 11. ups.conf:
upsmon.conf:
After a while,
@jimklimov , where should I put the |
Thanks for the details, though the particular issue here is likely not about upsmon pid.
this looks sinister... and there are many reports of CPSes getting reconnected (dmesg may confirm) - at which point, AFAIK, the kernel usually grabs the "newly discovered" device and, per udev rules, should relinquish access to the NUT user in the OS. Recent fixes included the ability of usbhid-ups to reconnect on the fly (hopefully getting permissions for the device back), with a further fix in that area made approx. last week. So it may well be that a custom build of current master would help. As for |
@jimklimov thanks for the reply.
Will this be upstreamed to the Debian unstable repo soon? Would be easier than maintaining a custom build. Still having issues with this. |
AFAIK recipes were proposed, search NUT issues from this summer for "debian" or "ubuntu". What happens next is up to distros... Notably -- not sure which service definitions they would use eventually (NUT's or their old ones)... |
Note: I recently had to dive into the code to see which code writes PID files into which locations; this is analyzed in #1712 (comment). UPDATE: ...and I summarized the area in https://github.com/networkupstools/nut/wiki/Technicalities:-Work-with-PID-and-state-file-paths |
Hello,
When using systemd, it complains about the PID stored in the .pid file:
And indeed when looking in upsmon.pid, the PID stored there is the one from the grand-child (unprivileged process) of the process started by init. Shouldn't this be the PID of the direct forked process instead?