Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(#1757704) journald restart #72

Merged

Conversation

jsynacek
Copy link
Contributor

@jsynacek jsynacek commented Feb 4, 2020

No description provided.

@jsynacek jsynacek added the pr/needs-review Formerly needs-review label Feb 4, 2020
@systemd-rhel-bot systemd-rhel-bot added the pr/needs-ci Formerly needs-ci label Feb 4, 2020
@mrc0mmand
Copy link
Member

mrc0mmand commented Feb 5, 2020

Ugh, this is really weird. With this PR (or even with only the first commit from this PR) systemd goes I'm outta here on all services just after finishing bootup and just unwatches all PIDs associated with them, which results in service daemons still running, but systemd claiming they're not.

Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=169 reply_cookie=0 error=n/a
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Startup finished in 3.980s (kernel) + 10.562s (initrd) + 9.657s (userspace) = 24.201s.
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=StartupFinished cookie=170 reply_cookie=0 error=n/a
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: kdump.service: cgroup is empty
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd-logind[960]: Got message type=signal sender=:1.1 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=169 reply_cookie=0 error=
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting kdump.service
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting multi-user.target
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting systemd-readahead-done.timer
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting systemd-readahead-collect.service
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting systemd-binfmt.service
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting sysinit.target
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting systemd-tmpfiles-setup-dev.service
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting dracut-pre-trigger.service
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting initrd.target
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting initrd-cleanup.service
...
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting plymouth-read-write.service
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting boot.mount
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting sys-kernel-config.mount
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting systemd-journal-catalog-update.service
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting plymouth-quit.service
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting dracut-pre-pivot.service
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting dev-hugepages.mount
...
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=171 reply_cookie=0 error=n/a
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=172 reply_cookie=0 error=n/a
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd-logind[960]: Got message type=signal sender=:1.1 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=171 reply_cookie=0 error
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=173 reply_cookie=0 error=n/a
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd-logind[960]: Got message type=signal sender=:1.1 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=172 reply_cookie=0 error
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=174 reply_cookie=0 error=n/a
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=175 reply_cookie=0 error=n/a
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd-logind[960]: Got message type=signal sender=:1.1 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=173 reply_cookie=0 error
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=176 reply_cookie=0 error=n/a
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd-logind[960]: Got message type=signal sender=:1.1 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=174 reply_cookie=0 error
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=177 reply_cookie=0 error=n/a
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=178 reply_cookie=0 error=n/a
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd-logind[960]: Got message type=signal sender=:1.1 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=175 reply_cookie=0 error
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd-logind[960]: Got message type=signal sender=:1.1 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=176 reply_cookie=0 error
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=179 reply_cookie=0 error=n/a
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=180 reply_cookie=0 error=n/a
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Unwatching 981.
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd-logind[960]: Got message type=signal sender=:1.1 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=177 reply_cookie=0 error
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=181 reply_cookie=0 error=n/a
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Unwatching 1390.
...
eb 06 00:23:38 n51.pufty.ci.centos.org systemd-logind[960]: Got message type=signal sender=:1.1 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=272 reply_cookie=0 error
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd-logind[960]: Got message type=signal sender=:1.1 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=273 reply_cookie=0 error
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd-logind[960]: Got message type=signal sender=:1.1 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=274 reply_cookie=0 error
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd-logind[960]: Got message type=signal sender=:1.1 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=275 reply_cookie=0 error
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=276 reply_cookie=0 error=n/a
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=277 reply_cookie=0 error=n/a
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=278 reply_cookie=0 error=n/a
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Unwatching 682.
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Releasing resources.
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd-logind[960]: Got message type=signal sender=:1.1 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=276 reply_cookie=0 error
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Releasing all stored fds
...

Example of zombie daemons:

[root@n51 ~]# systemctl status dbus
● dbus.service - D-Bus System Message Bus
   Loaded: loaded (/usr/lib/systemd/system/dbus.service; static; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:dbus-daemon(1)

Feb 06 00:23:33 n51.pufty.ci.centos.org systemd[1]: Started D-Bus System Message Bus.
Feb 06 00:23:33 n51.pufty.ci.centos.org systemd[966]: Executing: /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
Feb 06 00:23:33 n51.pufty.ci.centos.org dbus[966]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service'
Feb 06 00:23:33 n51.pufty.ci.centos.org dbus[966]: [system] Successfully activated service 'org.freedesktop.hostname1'
Feb 06 00:23:33 n51.pufty.ci.centos.org dbus[966]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Feb 06 00:23:33 n51.pufty.ci.centos.org dbus[966]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Feb 06 00:23:35 n51.pufty.ci.centos.org dbus[966]: [system] Activating via systemd: service name='org.freedesktop.resolve1' unit='dbus-org.freedesktop.resolve1.service'
Feb 06 00:23:36 n51.pufty.ci.centos.org dbus[966]: [system] Successfully activated service 'org.freedesktop.resolve1'
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting dbus.service
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Unwatching 966.

[root@n51 ~]# ps aux | grep [d]bus
dbus       966  0.0  0.0  56100  2500 ?        Ss   18:26   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation

[root@n51 ~]# systemctl status chronyd
● chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:chronyd(8)
           man:chrony.conf(5)

Feb 06 00:23:33 n51.pufty.ci.centos.org systemd[1]: Permission denied while opening PID file or potentially unsafe symlink chain, will now retry with relaxed checks: /var/run/chrony/chronyd.pid
Feb 06 00:23:33 n51.pufty.ci.centos.org systemd[1]: chronyd.service changed start-post -> running
Feb 06 00:23:33 n51.pufty.ci.centos.org systemd[1]: Job chronyd.service/start finished, result=done
Feb 06 00:23:33 n51.pufty.ci.centos.org systemd[1]: Started NTP client/server.
Feb 06 00:23:33 n51.pufty.ci.centos.org systemd[1]: Watching 958 through watch_pids1.
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Collecting chronyd.service
Feb 06 00:23:38 n51.pufty.ci.centos.org systemd[1]: Unwatching 958.
Feb 06 00:23:39 n51.pufty.ci.centos.org chronyd[958]: Selected source 72.30.35.88
Feb 06 00:23:39 n51.pufty.ci.centos.org chronyd[958]: System clock wrong by -21401.358914 seconds, adjustment started
Feb 05 18:26:58 n51.pufty.ci.centos.org chronyd[958]: System clock was stepped by -21401.358914 seconds

[root@n51 ~]# ps aux | grep [c]hronyd
chrony     958  0.0  0.0 115668  1708 ?        S    18:26   0:00 /usr/sbin/chronyd

Also, this all results in systemd claiming that no services are running:

[root@n51 ~]# systemctl list-units
0 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

I have no idea what's going on, but I suspect there's some unfortunate constellation of backported patches already in RHEL 7.8 which interferes with this patchset, as it works as expected in RHEL 7.4 (where this patchset is present as well).

@mrc0mmand
Copy link
Member

Gosh, I'm actually blind, the issue is pretty obvious, see the code review...

src/core/unit.c Outdated
if (state != UNIT_INACTIVE)
return false;
return true;
Copy link
Member

@mrc0mmand mrc0mmand Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This MUST remain false, because the output values of this function were inverted by commit 1716821#diff-a89a9b6f80aada989d298b4c2c3a9d64L285 which is NOT in RHEL 7.4, that's why here the patch says true (3aaaa27#diff-a89a9b6f80aada989d298b4c2c3a9d64R308).

Also, see the comment at the beginning of this function - we want to keep units which are not in the UNIT_INACTIVE state:

        /* Checks whether the unit is ready to be unloaded for garbage collection.
         * Returns true when the unit may be collected, and false if there's some
         * reason to keep it loaded.
         *
         * References from other units are *not* checked here. Instead, this is done
         * in unit_gc_sweep(), but using markers to properly collect dependency loops.
         */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a sloppy handling of diff files on my side. Thanks for catching this!

@systemd-rhel-bot systemd-rhel-bot added needs-itr and removed pr/needs-review Formerly needs-review needs-itr labels Feb 5, 2020
keszybz and others added 5 commits February 6, 2020 09:59
We would close all the stored fds in service_release_resources(), which of
course broke the whole concept of storing fds over service restart.

Fixes #4408.

(cherry picked from commit f0bfbfa)
(cherry picked from commit 4c27143)
Resolves: #1757704
…ed during service restart

When preparing for a restart we quickly go through the DEAD/INACTIVE
service state before entering AUTO_RESTART. When doing this, we need to
make sure we don't destroy the FD store. Previously this was done by
checking the failure state of the unit, and keeping the FD store around
when the unit failed, under the assumption that the restart logic will
then get into action.

This is not entirely correct howver, as there might be failure states
that will no result in restarts.

With this commit we slightly alter the logic: a ref counter for the fd
store is added, that is increased right before we handle the restart
logic, and decreased again right-after.

This should ensure that the fdstore lives exactly as long as it needs.

Follow-up for f0bfbfa.

(cherry picked from commit 7eb2a8a)
(cherry picked from commit e2636bd)
Resolves: #1757704
(cherry picked from commit 1c36b4a)
(cherry picked from commit e974b99)
Resolves: #1757704
See systemd/systemd#2236

(cherry picked from commit 3889613)
(cherry picked from commit 4dc893c)
Resolves: #1757704
This test fails before previous commit, and passes with it.

(cherry picked from commit bff653e)
(cherry picked from commit ee8f69a)
Resolves: #1757704
@jsynacek
Copy link
Contributor Author

jsynacek commented Feb 6, 2020

v2.

@systemd-rhel-bot systemd-rhel-bot removed the pr/needs-ci Formerly needs-ci label Feb 6, 2020
@jsynacek
Copy link
Contributor Author

jsynacek commented Feb 6, 2020

The LGTM fail is unrelated:

...
./src/shared/nss-util.h:93:28: error: ‘RES_USE_INET6’ undeclared
...

@mrc0mmand
Copy link
Member

The LGTM fail is unrelated:

...
./src/shared/nss-util.h:93:28: error: ‘RES_USE_INET6’ undeclared
...

Yeah, the underlying system on LGTM's side got most likely updated, and the analysis fails even on the master branch. I think the potential fix for this issue is systemd/systemd@6154d33 which might be a good idea to backport later.

Copy link
Member

@mrc0mmand mrc0mmand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! The... LGTM fail is being fixed in #73, so just ignore it in this PR.

@systemd-rhel-bot systemd-rhel-bot merged commit 65fb1fb into redhat-plumbers:master Feb 6, 2020
Copy link
Member

@dtardon dtardon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants