New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(#1757704) journald restart #72
(#1757704) journald restart #72
Conversation
Ugh, this is really weird. With this PR (or even with only the first commit from this PR) systemd goes I'm outta here on all services just after finishing bootup and just unwatches all PIDs associated with them, which results in service daemons still running, but systemd claiming they're not.
Example of zombie daemons:
Also, this all results in systemd claiming that no services are running:
I have no idea what's going on, but I suspect there's some unfortunate constellation of backported patches already in RHEL 7.8 which interferes with this patchset, as it works as expected in RHEL 7.4 (where this patchset is present as well). |
Gosh, I'm actually blind, the issue is pretty obvious, see the code review... |
src/core/unit.c
Outdated
if (state != UNIT_INACTIVE) | ||
return false; | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This MUST remain false
, because the output values of this function were inverted by commit 1716821#diff-a89a9b6f80aada989d298b4c2c3a9d64L285 which is NOT in RHEL 7.4, that's why here the patch says true
(3aaaa27#diff-a89a9b6f80aada989d298b4c2c3a9d64R308).
Also, see the comment at the beginning of this function - we want to keep units which are not in the UNIT_INACTIVE
state:
/* Checks whether the unit is ready to be unloaded for garbage collection.
* Returns true when the unit may be collected, and false if there's some
* reason to keep it loaded.
*
* References from other units are *not* checked here. Instead, this is done
* in unit_gc_sweep(), but using markers to properly collect dependency loops.
*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a sloppy handling of diff files on my side. Thanks for catching this!
…ed during service restart When preparing for a restart we quickly go through the DEAD/INACTIVE service state before entering AUTO_RESTART. When doing this, we need to make sure we don't destroy the FD store. Previously this was done by checking the failure state of the unit, and keeping the FD store around when the unit failed, under the assumption that the restart logic will then get into action. This is not entirely correct howver, as there might be failure states that will no result in restarts. With this commit we slightly alter the logic: a ref counter for the fd store is added, that is increased right before we handle the restart logic, and decreased again right-after. This should ensure that the fdstore lives exactly as long as it needs. Follow-up for f0bfbfa. (cherry picked from commit 7eb2a8a) (cherry picked from commit e2636bd) Resolves: #1757704
See systemd/systemd#2236 (cherry picked from commit 3889613) (cherry picked from commit 4dc893c) Resolves: #1757704
f39284d
to
ad0bd9f
Compare
v2. |
The LGTM fail is unrelated:
|
Yeah, the underlying system on LGTM's side got most likely updated, and the analysis fails even on the master branch. I think the potential fix for this issue is systemd/systemd@6154d33 which might be a good idea to backport later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! The... LGTM fail is being fixed in #73, so just ignore it in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
No description provided.