shutdown[1]: Failed to wait for process: Protocol error #8155

Closed
Potomac opened this Issue Feb 10, 2018 · 53 comments

Comments

Potomac commented Feb 10, 2018

Submission type

  • Bug report

systemd version the issue has been seen with

237.0-2

Used distribution

archlinux 64 bits

In case of bug report: Expected behaviour you didn't see

No error message should appear when the PC shuts down.

In case of bug report: Unexpected behaviour you saw

I see the message :
shutdown[1]: Failed to wait for process: Protocol error

In case of bug report: Steps to reproduce the problem

  • use Arch Linux, 64-bit
  • install the latest Arch Linux systemd package (237.0-2)
  • during shutdown you will see the message:
    shutdown[1]: Failed to wait for process: Protocol error

see the attached file ( screen capture )

Potomac commented Feb 10, 2018

process_error

BBBob commented Feb 11, 2018

I'm hitting the same error now!!

jigmpatel commented Feb 11, 2018

I have the same issue. Will adding the "shutdown" hook to mkinitcpio.conf help? I am on the 4.15.2 kernel.

rbisewski commented Feb 11, 2018

Encountered the same thing today. Running an up-to-date 64-bit Arch Linux with Linux 4.15.2 and systemd 237.0-2 installed.

The workaround for now appears to be to edit /etc/mkinitcpio.conf and look for the following line:

HOOKS=(base udev autodetect modconf block filesystems keyboard fsck)

Add the shutdown hook like so:

HOOKS=(base udev autodetect modconf block filesystems keyboard fsck shutdown)

Afterwards, regenerate the initramfs as follows:

mkinitcpio -p linux

After a reboot and a second shutdown, the problem seems to go away. The developers (of either Arch Linux or systemd) might want to check whether this is a regression or intended behavior.

Member

poettering commented Feb 12, 2018

shutdown[1]: Failed to wait for process: Protocol error

Hmm, there appears to be something wrong with wait_for_terminate_with_timeout(), but I am puzzled by what might cause this. @kyle-walker any idea?

This suggests that the mount() child process dies some abnormal death, e.g. SIGSEGV or such... But I don't see how that could ever happen...

majster95 commented Feb 12, 2018

On the Arch Linux forum, user loqs posted a solution. He suggests masking mkinitcpio-generate-shutdown-ramfs.service, which triggers the error.
This works fine for me.

https://bbs.archlinux.org/viewtopic.php?pid=1767560#p1767560

loqs commented Feb 12, 2018

Perhaps @falconindy or @eli-schwartz can provide insight into the order of events mkinitcpio-generate-shutdown-ramfs.service triggers. I can only link to the service file which also contains mkinitcpio.
https://git.archlinux.org/mkinitcpio.git/tree/systemd/mkinitcpio-generate-shutdown-ramfs.service and https://git.archlinux.org/mkinitcpio.git/tree/install/sd-shutdown

rbisewski commented Feb 19, 2018

Thanks to all of the posters above for their ideas and helpful links :)

I've looked into this a bit more, and I think a short-term solution is to adjust a single line in mkinitcpio-generate-shutdown-ramfs.service, which only exists in Arch Linux.

If any of the Arch Linux devs have time, I wouldn't mind if they took a quick glance at my suggestion. In any event, I believe I have a better potential fix for current Arch users.

As mentioned by the poster above, the service file currently looks like this:

[Unit]
Description=Generate shutdown-ramfs
DefaultDependencies=no
Before=shutdown.target
ConditionFileIsExecutable=!/run/initramfs/shutdown

[Service]
Type=oneshot
# /tmp could be umounted at this point
# use /run as temporary directory
Environment=TMPDIR=/run
ExecStart=/usr/bin/mkinitcpio -A sd-shutdown -k none -c /dev/null -d /run/initramfs

Changing sd-shutdown to shutdown on the last line results in a service file that looks like the following:

[Unit]
Description=Generate shutdown-ramfs
DefaultDependencies=no
Before=shutdown.target
ConditionFileIsExecutable=!/run/initramfs/shutdown

[Service]
Type=oneshot
# /tmp could be umounted at this point
# use /run as temporary directory
Environment=TMPDIR=/run
ExecStart=/usr/bin/mkinitcpio -A shutdown -k none -c /dev/null -d /run/initramfs

As far as I can tell, the error goes away. This also avoids both of the workarounds mentioned here and on the Arch forums (e.g. masking the service or adding initramfs hooks) and allows the service to function as before.

Contributor

falconindy commented Feb 19, 2018

That's not a fix, it avoids the entire point of the hook.

rbisewski commented Feb 19, 2018

I was afraid of that; thanks for your quick reply though.

Is the shutdown hook I used above the busybox alternative shipped as part of mkinitcpio, or something else?

Hmmm... I was hoping for a quick solution, but it seems I'll have to look into it a bit more to get the systemd hook working again.

Contributor

falconindy commented Feb 19, 2018

This suggests that the mount() child process dies some abnormal death

@poettering: Presumably this is the code you're referring to:

                if (waitid(P_PID, pid, &status, WEXITED|WNOHANG) == 0) {
                        if (status.si_pid == pid) {
                                /* This is the correct child.*/
                                if (status.si_code == CLD_EXITED)
                                        return (status.si_status == 0) ? 0 : -EPROTO;
                                else
                                        return -EPROTO;
                        }
                }

Doesn't it also potentially imply that the process simply exited non-zero? That's far less interesting... Maybe the case of non-zero exit shouldn't be overloaded as EPROTO? Perhaps EOWNERDEAD.

I'm not sure I understand the crux of this bug report -- is this just about an error on the console, or is shutdown/reboot actually impacted functionally?
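
For readers following the waitid() discussion, here is a minimal C sketch of how a non-zero exit could be told apart from death by signal. It is not systemd's code: the enum, wait_and_classify() and spawn_child() are hypothetical names introduced only for illustration, and unlike the excerpt quoted above it blocks until the child is gone (no WNOHANG, no timeout).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result codes for illustration; not systemd's errno mapping. */
enum child_result {
    CHILD_OK,               /* exited normally with status 0 */
    CHILD_EXITED_NONZERO,   /* exited normally, but with a non-zero status */
    CHILD_KILLED_BY_SIGNAL, /* terminated by a signal (CLD_KILLED or CLD_DUMPED) */
    CHILD_WAIT_FAILED       /* waitid() itself failed */
};

/* Block until the child terminates, then report how it ended. */
enum child_result wait_and_classify(pid_t pid) {
    siginfo_t status;

    for (;;) {
        if (waitid(P_PID, (id_t) pid, &status, WEXITED) == 0)
            break;
        if (errno != EINTR)
            return CHILD_WAIT_FAILED;
    }

    if (status.si_code == CLD_EXITED)
        return status.si_status == 0 ? CHILD_OK : CHILD_EXITED_NONZERO;

    /* With WEXITED only, the remaining codes mean the child was killed. */
    return CHILD_KILLED_BY_SIGNAL;
}

/* Hypothetical helper: fork a child that raises `sig` (if non-zero),
 * otherwise exits with `exit_code`. Returns the child's PID, or -1. */
pid_t spawn_child(int exit_code, int sig) {
    pid_t pid = fork();

    if (pid == 0) {
        if (sig != 0)
            raise(sig);
        _exit(exit_code);
    }
    return pid;
}
```

Calling wait_and_classify(spawn_child(3, 0)) and wait_and_classify(spawn_child(0, SIGKILL)) gives two different results, whereas the quoted excerpt reports both as -EPROTO.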

patrick-ausderau commented Feb 21, 2018

@falconindy My machine does not power off (it stays on, showing that error message). I have to hard shut it down. So yes, it is impacted functionally.

poettering added a commit to poettering/systemd that referenced this issue Feb 21, 2018

umount: beef up logging when umount/remount child processes fail
Let's extend what we log if umount/remount doesn't work correctly as we
expect.

See systemd#8155

poettering added a commit to poettering/systemd that referenced this issue Feb 21, 2018

tree-wide: reopen log when we need to log in FORK_CLOSE_ALL_FDS children
On a number of occasions we use FORK_CLOSE_ALL_FDS when forking off a
child, since we don't want to pass fds to the processes spawned (either
because we later want to execve() some other process there, or because
our child might hang around for longer than expected, in which case it
shouldn't keep our fd pinned). This also closes any logging fds, and
thus means logging is turned off in the child. If we want to do proper
logging, explicitly reopen the logs hence in the child at the right
time.

This is particularly crucial in the umount/remount children we fork off
the shutdown binary, as otherwise the children can't log, which is
why systemd#8155 is harder to debug than necessary: the log messages we
generate about failing mount() system calls aren't actually visible on
screen, as they are done in the child processes where the log fds are
closed.
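
The effect described in this commit message can be reproduced in isolation. The following C sketch is an illustration only, not the systemd change itself: demo_child_logging() is a hypothetical name, the log target is an ordinary file chosen by the caller, and closing the single inherited descriptor stands in for what a close-all-fds fork does.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if the child first failed to log through the closed fd
 * (EBADF) and then logged successfully after reopening the log itself. */
int demo_child_logging(const char *log_path) {
    static const char msg[] = "child: mount() failed\n";
    const size_t len = sizeof msg - 1;

    int log_fd = open(log_path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (log_fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(log_fd);
        return -1;
    }

    if (pid == 0) {
        /* Stand-in for closing all fds in the freshly forked child. */
        close(log_fd);

        /* Logging through the inherited descriptor is now impossible. */
        ssize_t r = write(log_fd, msg, len);
        int lost = (r < 0 && errno == EBADF);

        /* Explicitly reopening the log in the child makes it work again. */
        int fd = open(log_path, O_WRONLY | O_APPEND);
        int ok = (fd >= 0 && write(fd, msg, len) == (ssize_t) len);

        _exit(lost && ok ? 0 : 1);
    }

    close(log_fd);

    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

The message only reaches the file after the child reopens it, which is the situation the commit message describes for the umount/remount children.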

Member

poettering commented Feb 21, 2018

So I prepped #8246 now which should bring back proper logging when umount()/remount() is failing during shutdown, which is pretty likely what is happening here. Would be good if someone with this problem could apply that PR, and reproduce the issue.

Most likely the arch initrd script is somehow causing some mount point to remain busy. No idea why though, but with that PR we should at least know for sure.

Contributor

falconindy commented Feb 21, 2018

As I've mentioned in other bug reports, the "sd-shutdown" hook merely installs systemd-shutdown as /run/initramfs/shutdown, and executes that on pivot. Arch's home brewed shell script which is included with the "shutdown" hook doesn't appear to have this problem.

Contributor

mus65 commented Feb 23, 2018

I bisected this to commit 4c253ed .

With current git (which includes #8246) I get the following on shutdown:
20180223_192659

philmmanjaro commented Feb 25, 2018

Hmm, "introduce new safe_fork() helper and port everything over": this makes me wonder why I so often get "fork: Resource temporarily unavailable" on a Ryzen CPU when compiling a new kernel with all cores. It may or may not be related ...

Member

vcaputo commented Feb 25, 2018

@philmmanjaro It's possible your Ryzen just has enough cores that your parallel make is exhausting the number of processes allotted to your user via rlimit. Run ulimit -a to see your limits; ulimit -u is the user processes one. Generally it can be configured in /etc/security/limits.conf or /etc/security/limits.d on PAM-enabled systems.

Note you generally don't want to set it to unlimited otherwise you become vulnerable to trivial fork bombs.
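
The same limit that ulimit -u prints can also be read programmatically. Below is a small C sketch using getrlimit() with RLIMIT_NPROC; nproc_limits() and jobs_fit() are hypothetical helper names, and the caller is assumed to supply the number of processes the user is already running.

```c
#include <stdbool.h>
#include <sys/resource.h>

/* Read the soft and hard per-user process limits (what `ulimit -u` shows).
 * Returns 0 on success, -1 if getrlimit() fails. */
int nproc_limits(rlim_t *soft, rlim_t *hard) {
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) != 0)
        return -1;

    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Would `jobs` additional processes fit under the soft limit, given that
 * `running` processes already count against it? */
bool jobs_fit(rlim_t soft, rlim_t running, rlim_t jobs) {
    if (soft == RLIM_INFINITY)
        return true;
    if (running > soft)
        return false;
    return jobs <= soft - running;
}
```

If jobs_fit() returns false for the intended make -j value, fork() failing with "Resource temporarily unavailable" (EAGAIN) is the expected outcome.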

Member

poettering commented Feb 26, 2018

With current git (which includes #8246) I get the following on shutdown:

Hmm, so it appears something keeps those mounts busy...

Member

poettering commented Feb 26, 2018

As I've mentioned in other bug reports, the "sd-shutdown" hook merely installs systemd-shutdown as /run/initramfs/shutdown, and executes that on pivot. Arch's home brewed shell script which is included with the "shutdown" hook doesn't appear to have this problem.

Hmm, so you are saying on Arch PID 1 transitions twice during shutdown, once from the regular pid1 to systemd-shutdown and then a second time to systemd-shutdown which however is a copy of the first, but copied to /run/initramfs/shutdown? And it's the second one that generates these warnings?

Member

poettering commented Feb 26, 2018

Hmm, "introduce new safe_fork() helper and port everything over": this makes me wonder why I so often get "fork: Resource temporarily unavailable" on a Ryzen CPU when compiling a new kernel with all cores. It may or may not be related ...

Yes, @vcaputo is right, what you are seeing looks very much unrelated, just some kind of fork limit reached, either RLIMIT_NPROC or maybe the TasksMax= value of your session or so.

ghost commented Feb 26, 2018

I have the same issue during shutdown.
Inconvenience -> awful errors on my clear setup 😊

Contributor

medhefgo commented Feb 27, 2018

Hmm, so you are saying on Arch PID 1 transitions twice during shutdown, once from the regular pid1 to systemd-shutdown and then a second time to systemd-shutdown which however is a copy of the first, but copied to /run/initramfs/shutdown? And it's the second one that generates these warnings?

You know, it amuses me that every time arch initrd comes up, you are confused/fascinated about how it uses systemd-shutdown instead of some home-brew script.

Meanwhile, I am wondering why any initrd (including dracut as far as I can see) that uses systemd to handle early boot shouldn't also use systemd-shutdown for late shutdown. Imho, you should even gently nudge them to do just that...

Member

poettering commented Feb 28, 2018

Meanwhile, I am wondering why any initrd (including dracut as far as I can see) that uses systemd to handle early boot shouldn't also use systemd-shutdown for late shutdown. Imho, you should even gently nudge them to do just that...

You have a point, @haraldh say something!

Member

poettering commented Feb 28, 2018

I bisected this to commit 4c253ed .

With current git (which includes #8246) I get the following on shutdown:

So, hmm, the interesting question is now what keeps /oldroot busy? Any chance anyone could beef up the initrd script with a "fuser -v -m /oldroot"?

frostschutz commented Feb 28, 2018

I added it to initcpio's shutdown hook (/lib/initcpio/shutdown) so that it runs before the umount -R /oldroot, which is successful, so /oldroot is not busy other than various submounts. So this output is probably not useful?

https://gist.github.com/frostschutz/0d613c1dd2bdd551c557cbde971a6887

Is there a better place to add it? (particularly when not using the shutdown hook as that is when the message occurs)

frostschutz commented Mar 1, 2018

I did both shutdown and sd-shutdown (both results in the gist above, about the same though.)

I also tried to strace it just for kicks, but systemd-shutdown kills strace first thing. ;)

Contributor

medhefgo commented Mar 1, 2018

Did you stuff a fork+exec of fuser into shutdown.c? Because anything else is most likely useless.

Member

haraldh commented Mar 4, 2018

Meanwhile, I am wondering why any initrd (including dracut as far as I can see) that uses systemd to handle early boot shouldn't also use systemd-shutdown for late shutdown. Imho, you should even gently nudge them to do just that...

Well, dracut disassembles more than systemd-shutdown at shutdown

Contributor

medhefgo commented Mar 4, 2018

Well, dracut disassembles more than systemd-shutdown at shutdown

You mean this?
https://github.com/dracutdevs/dracut/blob/master/modules.d/99shutdown/shutdown.sh

I don't quite see what dracut is doing that sd-shutdown isn't. And if there were, it most likely should be added to sd-shutdown, no?

Member

haraldh commented Mar 5, 2018

dracut modules can install a shutdown hook to be executed here:
https://github.com/dracutdevs/dracut/blob/master/modules.d/99shutdown/shutdown.sh#L92

Member

poettering commented Mar 5, 2018

@haraldh i think the key point here is that dracut's shutdown logic should maybe invoke systemd-shutdown as last step again...

hussamT commented Mar 6, 2018

Is this something that affects functionality in any way or is it harmless? I'd like to upgrade from 236.81 to 238 today and this is a work machine so I would like to not face regressions.
Thank you.

Contributor

medhefgo commented Mar 6, 2018

I finally found the culprit: https://github.com/systemd/systemd/blame/master/src/core/umount.c#L518

The continue was added with the umount-in-fork changes. It's trying to handle the error case for remounting the mount point read-only, but this is just a precautionary measure, so any errors were originally ignored. The current code simply skips the unmount attempt entirely if it cannot remount read-only (which, apparently, these API file systems don't support?).

Removing the continue solves the immediate issue, but adds other ugliness to output. Since right now every umount failure is logged, we get a few umount errors, even though they will be cleaned up later by sd-shutdown in initramfs.
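
The control-flow problem described above can be shown with a toy model. This C sketch is not the code in umount.c: the mount table, fake_remount_ro(), fake_umount() and the "/oldroot/sys" entry are made up for illustration, and which file systems actually refuse a read-only remount is an assumption here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy mount table entry; all fields are hypothetical. */
struct mount_entry {
    const char *path;
    bool remount_ro_works; /* assumed: some file systems refuse a read-only remount */
    bool mounted;
};

/* Stand-ins for the real remount and unmount operations. */
static int fake_remount_ro(const struct mount_entry *m) {
    return m->remount_ro_works ? 0 : -1;
}

static void fake_umount(struct mount_entry *m) {
    m->mounted = false;
}

/* Try to unmount everything and return how many entries stay mounted.
 * With skip_on_remount_failure set, a failed remount skips the unmount,
 * which models the behavior described in the comment above. */
size_t umount_all(struct mount_entry *mounts, size_t n, bool skip_on_remount_failure) {
    size_t still_mounted = 0;

    for (size_t i = 0; i < n; i++) {
        if (fake_remount_ro(&mounts[i]) < 0 && skip_on_remount_failure)
            continue; /* the unmount below is never attempted */
        fake_umount(&mounts[i]);
    }

    for (size_t i = 0; i < n; i++)
        if (mounts[i].mounted)
            still_mounted++;

    return still_mounted;
}

/* Run the model on a two-entry table and report what is left mounted. */
size_t demo_still_mounted(bool skip_on_remount_failure) {
    struct mount_entry mounts[] = {
        { "/oldroot/sys", false, true },
        { "/oldroot",     true,  true },
    };

    return umount_all(mounts, sizeof mounts / sizeof mounts[0], skip_on_remount_failure);
}
```

In this model, skipping leaves one entry mounted, while treating the remount as a best-effort precaution leaves none.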

@poettering I'm not quite sure how to approach this, but I would say we should only log umount issues at debug level. But if we are running in the initrd, or if there is no initrd that we will later switch to, we ought to log them as errors instead.
Also, slightly related: should we add a check to skip remounting API file systems, like we do for network file systems, as a precaution for the future?

@haraldh well, to be quite frank, calling some shutdown hooks isn't exactly rocket science to add to sd-shutdown. And the world could be rid of yet another evil shell script :D

@hussamT The root filesystem is currently left mounted, but in read-only mode. So data loss should not be occurring.

hussamT commented Mar 6, 2018

Contributor

medhefgo commented Mar 7, 2018

@haraldh Actually, while reading logs carefully I just now noticed that we actually already have shutdown hooks:

execute_directories(dirs, DEFAULT_TIMEOUT_USEC, NULL, NULL, arguments);

tonylambiris commented Mar 7, 2018

Same issue on all the various hardware I run Arch on; adding shutdown to /etc/mkinitcpio.conf appeared to remedy the timeouts.

thyeun commented Mar 8, 2018

A question about adding shutdown to /etc/mkinitcpio.conf: will it create another issue or not?

philmmanjaro added a commit to manjaro/packages-core that referenced this issue Mar 14, 2018

Modifications
- linux318: v3.18.99
- linux44: v4.4.121
- linux49: v4.9.87
- linux414: v4.14.26
- linux415: v4.15.9
- linux416: v4.16-rc5
- systemd: work on systemd/systemd#8155

philmmanjaro added a commit to manjaro/packages-multilib that referenced this issue Mar 14, 2018

Modifications
- nvidia: bump to 390.42
- systemd: work on systemd/systemd#8155

kyawthusoe45 commented Jun 4, 2018

This issue is still present in systemd 238.133. How is this closed?

Contributor

medhefgo commented Jun 4, 2018

Obviously, because it was merged after 238.
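
Since the fix was merged after 238, the first upstream release expected to contain it is 239. A rough heuristic for checking a version string is sketched below. The helper name has_shutdown_fix is hypothetical, and a distro build of 238 may or may not carry a backport, so a failing check proves nothing on its own.

```shell
# has_shutdown_fix VERSION
# Succeeds when the leading number of VERSION is at least 239.
has_shutdown_fix() {
  major=${1%%[!0-9]*}
  [ "${major:-0}" -ge 239 ]
}

# Example use on a live system (the first line of `systemctl --version` is
# expected to look like "systemd 239 ..."):
#   has_shutdown_fix "$(systemctl --version | awk 'NR==1 {print $2}')"
```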

thyeun commented Jun 4, 2018

=.= Because they don't have a proper solution, so closing the issue is the "best solution". It happened again on 238.133, the latest version, which I just upgraded to today.

Member

poettering commented Jun 4, 2018

@kyawthusoe3142 ask your distro to backport #8429

MaxXor commented Jun 4, 2018

@poettering Arch Linux systemd 238.133 includes this PR and the messages still show up on reboot/halt (it was built using f58e62c).

thyeun commented Jun 4, 2018

@MaxXor, I think what @poettering is saying is that with every new upgrade, each distro has to redo the backport from scratch. Correct me if I'm wrong.

So, is this the way of doing things? =.=

hussamT commented Jun 4, 2018

That depends on whether https://github.com/systemd/systemd-stable/commits/v238-stable contains #8429 or not. (Edit: original message was badly written in a hurry)
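
One way to check this, sketched under the assumption that backports on the stable branch were made with `git cherry-pick -x`, which appends a "(cherry picked from commit ...)" line to the commit message. A plain ancestry check would not help here, because a cherry-picked commit gets a new hash. The helper name mentions_cherry_pick is hypothetical.

```shell
# mentions_cherry_pick SHA
# Reads `git log` output on stdin and succeeds if some commit message
# carries the "(cherry picked from commit SHA" line that
# `git cherry-pick -x` appends.
mentions_cherry_pick() {
  grep -qF "(cherry picked from commit $1"
}

# Example, run inside a clone of systemd-stable; <sha> is a placeholder for
# one of the upstream commits of the pull request:
#   git log origin/v238-stable | mentions_cherry_pick <sha>
```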

Contributor

medhefgo commented Jun 4, 2018

@MaxXor Please get your facts right. As far as I can see, only the memleak patch from my PR was backported.

MaxXor commented Jun 4, 2018

@medhefgo Sorry, I didn't notice it was on systemd-stable repo...

philmmanjaro commented Jun 5, 2018

@kyawthusoe3142 Please check the PKGBUILD from Manjaro. All the needed backport patches are added there on top of 238.133-1:

_backports=(
  # core: sd-shutdown improvements (#8429) (#8155)
  '0494cae03d762eaf2fb7217ee7d70f615dcb5183'
  '1d62d22d9432d5c4a637002c9a29b20d52f25d9a'
  '3bc341bee9fc7dfb41a131246b6fb0afd6ff4407'
  '8645ffd12b3cc7b0292acd9e1d691c4fab4cf409'
  'e783b4902f387640bba12496936d01e967545c3c'
  '456b2199f6ef0378da007e71347657bcf83ae465'
  # Use libmount in systemd-shutdown, add tests (#8452) 
  '95b862b0540ac24999fdfbd670e8744bb626729a'
  '6fa392bf911ef17caf4c13e839236c8edd11bfaf'
  'a6dcd22976a10d7733de91aca240427c5def1bc9'
  '1fd8edb53aa5894e9b8cbec87376ecce660d3087'
  '71ae04c40081d11cc412d731d59c4a25e6bc5f07'
  # shutdown: Don't limit unmount attempts prematurely (#8469)
  'ac9cea5ba30acbf17fd431a4a4092c4dbee23593'
  # nspawn: wait for network namespace creation before interface setup (#8633)
  '7511655807e90aa33ea7b71991401a79ec36bb41'
)
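
A list like this is typically applied by cherry-picking each commit onto the checked-out source tree. The sketch below assumes that is how the array is consumed here. The helper apply_backports and the DRY_RUN switch are hypothetical and not taken from the Manjaro PKGBUILD.

```shell
# apply_backports COMMIT...
# With DRY_RUN=1 (the default here) the commands are only printed;
# with DRY_RUN=0 each commit is cherry-picked without committing.
apply_backports() {
  for c in "$@"; do
    if [ "${DRY_RUN:-1}" -eq 1 ]; then
      echo "git cherry-pick -n $c"
    else
      git cherry-pick -n "$c"
    fi
  done
}

# In a PKGBUILD this would typically be called from prepare(), inside the
# checked-out source tree:
#   apply_backports "${_backports[@]}"
```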

thefzsalam commented Jun 6, 2018

It happens to me too on a fresh install of Arch Linux

thefzsalam commented Jun 6, 2018

Occasionally, it gets stuck at
Reboot: power down, when I run shutdown -h now

GopherJ commented Jul 30, 2018

Does anyone know how to update to the version with this fix? This bug nearly caused the death of my computer... When I left work I didn't see this error, and it prevented my PC from shutting down.
The computer's temperature eventually rose to 100 degrees.....

philmmanjaro commented Jul 31, 2018

@GopherJ: at Manjaro we use _commit='de7436b02badc82200dc127ff190b8155769b8e7' for pkgver=239.0 and the following backports:

_backports=(
  # resolve
  86b112a315464604f4b40222d8bbd912432d640c
  a5042ec4d7840f79d49688f07bf9bae7203ac50e
  fa6a69d7837f1d5fcd0ba279b51a41a26badaf03
  6da95857c19202120af76871c91a47a0f23aed8d
  b02a7e1aeadda724976290528fb864f99f1e396b
  5a01b3f35d7b6182c78b6973db8d99bdabd4f9c3
  a661dc36f68b5ebb1247a503533f8067ff8c0432
  f43580f17d9977ea330deacc8931982e41a49abf
  cc7d50a5714bc810af51b0c55be12b4f55acc089
  052a85d18859faeb38b01c9bbec560afe226e2a4
)

PKGBUILD can be found here.
