
systemd: mount units fail with "Mount process finished, but there is no mount." #10872

Closed
bboozzoo opened this issue Nov 21, 2018 · 75 comments · Fixed by #14234
Labels
bug 🐛 (Programming errors, that need preferential fixing), mount, pid1
Comments

@bboozzoo
Contributor

bboozzoo commented Nov 21, 2018

systemd version the issue has been seen with

239, 238, 237

Used distribution

Arch
Ubuntu 18.10, 18.04
Fedora 28 & 29

Expected behaviour you didn't see

mounts work

Unexpected behaviour you saw

some mount units fail with

kernel: EXT4-fs (loop1): mounted filesystem with ordered data mode. Opts: (null)
systemd[1]: tmp-mounttest-mount-15.mount: Mount process finished, but there is no mount.
systemd[1]: tmp-mounttest-mount-15.mount: Failed with result 'protocol'.
systemd[1]: Failed to mount mount unit for test 15.

Steps to reproduce the problem
Grab the reproducer script: https://gist.github.com/bboozzoo/d4b142229b1915ef7cc0cf8593599ad9/828d716e484a39da11987b2dc38da86434d1f89f

We have been tracking this problem in snapd for a while (https://forum.snapcraft.io/t/unexplained-mount-failure-protocol-error-what-we-know-so-far/5682); it started appearing around April/May 2018. It reproduces randomly in CI runs on distros with recent(-ish) systemd while installing snaps. In short, snapd generates a mount unit for the snap, drops it under /etc/systemd/system, calls systemctl daemon-reload, and later systemctl start <...>.mount. The last step randomly fails, with the journal message shown above.
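The sequence can be sketched roughly like this (the unit name, paths, and filesystem options here are illustrative, not taken from the actual snapd code; a real snap mount unit uses squashfs and different paths):

```shell
# Sketch of what snapd does, step by step. Requires root and a running
# systemd, so this is a sketch only, not something to run as-is.
cat > /etc/systemd/system/tmp-mounttest-mount-15.mount <<'EOF'
[Unit]
Description=mount unit for test 15

[Mount]
What=/tmp/mounttest/disk-15.img
Where=/tmp/mounttest/mount/15
Type=ext4
EOF

systemctl daemon-reload                         # pick up the new unit
systemctl start tmp-mounttest-mount-15.mount    # this step randomly fails
```

Note that the unit file name must be the systemd-escaped form of the Where= path, which is why /tmp/mounttest/mount/15 becomes tmp-mounttest-mount-15.mount.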

The reproducer script was used to explore some ideas about how it fails. So far, the only variant that triggers the problem is daemon-reload interleaved with start/stop of mount units. On my machine it fails reliably within 1-2 loop iterations. Other variants that were explored but failed to reproduce it: loading the mount units up front and then doing start/stop, using mount directly, and calling systemd-mount (this one failed in a peculiar, but unrelated, way).

I also tried Fedora 28/29 cloud images and the Ubuntu 18.10/18.04 cloud images, with similar results.

Edit: added Ubuntu 18.04

@poettering
Member

If you replace the mount binary with a shell script that invokes the original mount binary and afterwards checks /proc/self/mountinfo to see whether the mount is actually established, what do you see?

@poettering poettering added the needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer label Nov 26, 2018
@bboozzoo
Contributor Author

Sorry for the late feedback. This is what I get:

Nov 29 13:50:36 localhost systemd[1]: Reloading.
Nov 29 13:50:36 localhost systemd[1]: Mounting mount unit for test 5...
Nov 29 13:50:36 localhost kernel: EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
Nov 29 13:50:36 localhost mount[4082]: 299 90 7:0 / /tmp/mounttest/mount/5 rw,nodev,relatime shared:160 - ext4 /dev/loop0 rw,seclabel
Nov 29 13:50:36 localhost systemd[1]: Mounted mount unit for test 5.
Nov 29 13:50:36 localhost systemd[1]: Reloading.
Nov 29 13:50:36 localhost systemd[1]: Mounting mount unit for test 10...
Nov 29 13:50:36 localhost systemd[1]: Reloading.
Nov 29 13:50:36 localhost kernel: EXT4-fs (loop1): mounted filesystem with ordered data mode. Opts: (null)
Nov 29 13:50:37 localhost mount[4114]: 308 90 7:1 / /tmp/mounttest/mount/10 rw,nodev,relatime shared:165 - ext4 /dev/loop1 rw,seclabel
Nov 29 13:50:37 localhost systemd[1]: tmp-mounttest-mount-10.mount: Mount process finished, but there is no mount.
Nov 29 13:50:37 localhost systemd[1]: tmp-mounttest-mount-10.mount: Failed with result 'protocol'.
Nov 29 13:50:37 localhost systemd[1]: Failed to mount mount unit for test 10.
Nov 29 13:50:37 localhost systemd[1]: Reloading.
Nov 29 13:50:37 localhost systemd[1]: Reloading.
Nov 29 13:50:37 localhost systemd[1]: Mounting mount unit for test 16...
Nov 29 13:50:37 localhost systemd[1]: Mounting mount unit for test 4...
Nov 29 13:50:37 localhost systemd[1]: Mounting mount unit for test 15...
Nov 29 13:50:37 localhost kernel: EXT4-fs (loop2): mounted filesystem with ordered data mode. Opts: (null)
Nov 29 13:50:37 localhost kernel: EXT4-fs (loop3): mounted filesystem with ordered data mode. Opts: (null)
Nov 29 13:50:37 localhost kernel: EXT4-fs (loop4): mounted filesystem with ordered data mode. Opts: (null)
Nov 29 13:50:37 localhost mount[4185]: 317 90 7:2 / /tmp/mounttest/mount/16 rw,nodev,relatime shared:170 - ext4 /dev/loop2 rw,seclabel
Nov 29 13:50:37 localhost systemd[1]: Mounted mount unit for test 16.
Nov 29 13:50:37 localhost mount[4186]: 318 90 7:3 / /tmp/mounttest/mount/4 rw,nodev,relatime shared:175 - ext4 /dev/loop3 rw,seclabel
Nov 29 13:50:37 localhost systemd[1]: Mounted mount unit for test 4.
Nov 29 13:50:37 localhost mount[4189]: 319 90 7:4 / /tmp/mounttest/mount/15 rw,nodev,relatime shared:180 - ext4 /dev/loop4 rw,seclabel
Nov 29 13:50:37 localhost systemd[1]: Mounted mount unit for test 15.
Nov 29 13:50:37 localhost systemd[1]: Reloading.
Nov 29 13:50:37 localhost systemd[1]: Reloading.

For the record, this is the wrapper I used:

#!/bin/bash
# Wrapper around the real mount binary: after mounting, print the
# matching /proc/self/mountinfo line (if any), so it ends up in the journal.
marg=
for arg in "$@"; do
    # Pick out the mount point argument for one of the test mounts.
    if [[ "$arg" = /tmp/mounttest/mount* ]]; then
        marg=$arg
        break
    fi
done
/usr/bin/mount.orig "$@"
status=$?
if [[ -n "$marg" ]]; then
    grep "$marg" /proc/self/mountinfo
fi
exit $status

@poettering
Member

I am pretty sure #10980 will fix this one too. Any chance you can give it a whirl?

@bboozzoo
Contributor Author

bboozzoo commented Dec 3, 2018

Sure, I'll try with the PR and report back.

@bboozzoo
Contributor Author

bboozzoo commented Dec 3, 2018

Running with revision ca7d7db from #10980:

Dec 03 15:38:55 archlinux kernel: EXT4-fs (loop6): mounted filesystem with ordered data mode. Opts: (null)
Dec 03 15:38:55 archlinux mount[3664]: 421 48 7:6 / /tmp/mounttest/mount/12 rw,nodev,relatime shared:242 - ext4 /dev/loop6 rw
Dec 03 15:38:55 archlinux systemd[1]: -.slice: Failed to set 'cpu.cfs_period_us' attribute on '/' to '100000': Invalid argument
Dec 03 15:38:55 archlinux systemd[1]: -.slice: Failed to set 'cpu.cfs_quota_us' attribute on '/' to '-1': Invalid argument
Dec 03 15:38:55 archlinux systemd[1]: -.slice: Failed to set 'memory.limit_in_bytes' attribute on '/' to '-1': Invalid argument
Dec 03 15:38:55 archlinux systemd[1]: tmp-mounttest-mount-12.mount: Mount process finished, but there is no mount.
Dec 03 15:38:55 archlinux systemd[1]: tmp-mounttest-mount-12.mount: Failed with result 'protocol'.
Dec 03 15:38:55 archlinux systemd[1]: Failed to mount mount unit for test 12.
Dec 03 15:38:55 archlinux systemd[1]: Reloading.

FWIW, I noticed that the script was also hitting another sanity check, which runs in the same process after systemctl start but before systemctl stop:

verify() {
    local i
    i="$1"

    if ! grep -q "$PREFIX/mount/$i " /proc/self/mountinfo; then
        echo "disk-$i missing"
        return 1
    fi
   ...
}

The log says:

Dec 03 15:38:58 archlinux mount[3989]: 384 48 7:1 / /tmp/mounttest/mount/7 rw,nodev,relatime shared:232 - ext4 /dev/loop1 rw
Dec 03 15:38:58 archlinux systemd[1]: Mounted mount unit for test 7.
Dec 03 15:38:58 archlinux kernel: EXT4-fs (loop2): mounted filesystem with ordered data mode. Opts: (null)
Dec 03 15:38:58 archlinux mount[3994]: 411 48 7:2 / /tmp/mounttest/mount/10 rw,nodev,relatime shared:247 - ext4 /dev/loop2 rw
Dec 03 15:38:58 archlinux systemd[1]: Unmounting mount unit for test 10...
Dec 03 15:38:58 archlinux systemd[1]: Unmounting mount unit for test 7...
Dec 03 15:38:58 archlinux systemd[271]: tmp-mounttest-mount-10.mount: Succeeded.
Dec 03 15:38:58 archlinux systemd[271]: tmp-mounttest-mount-7.mount: Succeeded.
Dec 03 15:38:58 archlinux systemd[1]: tmp-mounttest-mount-10.mount: Succeeded.
Dec 03 15:38:58 archlinux systemd[1]: Unmounted mount unit for test 10.
Dec 03 15:38:58 archlinux systemd[1]: tmp-mounttest-mount-7.mount: Succeeded.
Dec 03 15:38:58 archlinux systemd[1]: Unmounted mount unit for test 7.

mvo5 added a commit to mvo5/snappy that referenced this issue Dec 12, 2018
This works around the systemd bug
systemd/systemd#10872 by ensuring that
there is only a single operation that manipulates mount units
at the same time.
@bboozzoo
Contributor Author

We were checking some potential workarounds in snapd. We've simplified the reproducer script to do just one mount, but many reloads in parallel and we were able to reproduce the problem, though it took much longer.

Now that #10980 is merged, would you like me to try with the latest master?
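The simplified stress pattern amounts to something like the following sketch (the unit name and iteration count are illustrative, not copied from the actual reproducer; it needs root and a running systemd):

```shell
# One mount unit, many concurrent daemon-reloads: the minimal shape of
# the race. Run only on a disposable test machine.
for i in $(seq 1 50); do
    systemctl daemon-reload &
done
systemctl start tmp-mounttest-mount-1.mount
wait
# If the race was hit, the unit ends up failed with result 'protocol'.
systemctl is-failed tmp-mounttest-mount-1.mount && echo "hit the race"
```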

mvo5 added a commit to mvo5/snappy that referenced this issue Jan 7, 2019
This is an RFC PR to see if the "mount protocol error" reported in
systemd/systemd#10872 can be worked around by serializing the mount unit
adding/removal. Proposing to get full spread runs.

This is similar to snapcore#6243 but it goes further by ensuring a single daemon
reload on the systemd go package level. Note that there is still a
chance that the protocol error happens if something else (like dpkg or
the user) runs "systemd daemon-reload" while we write a mount unit.
But the risk should be hugely smaller.
mvo5 added a commit to mvo5/snappy that referenced this issue Jan 31, 2019
This is an RFC PR to see if the "mount protocol error" reported in
systemd/systemd#10872 can be worked around by serializing the mount unit
adding/removal. Proposing to get full spread runs.

This is similar to snapcore#6243 but it goes further by ensuring a single daemon
reload on the systemd go package level. Note that there is still a
chance that the protocol error happens if something else (like dpkg or
the user) runs "systemd daemon-reload" while we write a mount unit.
But the risk should be hugely smaller.
@myrkr
Contributor

myrkr commented Mar 5, 2019

I have still seen this problem when systemctl daemon-reload (after creating some new service unit) runs in parallel with the execution of mount units, with the error message:

2019-03-04 14:00:52.640 init.scope: var-lib-machines-grade2.mount: Mount process finished, but there is no mount.
2019-03-04 14:00:52.640 init.scope: var-lib-machines-grade2.mount: Failed with result 'protocol'.

in version v241.

We work around this problem by trusting the successful execution of the mount command:

diff --git a/src/core/mount.c b/src/core/mount.c
index c31cad6b52..a8327fe355 100644
--- a/src/core/mount.c
+++ b/src/core/mount.c
@@ -1347,9 +1347,11 @@ static void mount_sigchld_event(Unit *u, pid_t pid, int code, int status) {
                         /* Either /bin/mount has an unexpected definition of success,
                          * or someone raced us and we lost. */
                         log_unit_warning(UNIT(m), "Mount process finished, but there is no mount.");
-                        f = MOUNT_FAILURE_PROTOCOL;
+                        log_unit_warning(UNIT(m), "But assuming success anyway");
+                        mount_enter_mounted(m, f);
+                } else {
+                        mount_enter_dead(m, f);
                 }
-                mount_enter_dead(m, f);
                 break;
 
         case MOUNT_MOUNTING_DONE:

Unfortunately, I have not yet been able to isolate a reproducer for this problem; it just fails from time to time when creating new systemd units (and reloading) and performing mounts as a dependency for starting other service units.

@frispete

I do see this issue with NFS mounts on openSUSE Tumbleweed, which carries 241.

$ systemctl --failed
  UNIT           LOAD   ACTIVE SUB    DESCRIPTION
● home-nfs.mount loaded failed failed /home/nfs
● video.mount    loaded failed failed /video
● work.mount     loaded failed failed /work
$ grep nfs /etc/fstab
server:/home    /home/nfs  nfs   nfsvers=4,x-systemd.requires=/home,x-systemd.requires=nfs-mountd.service  0  0
server:/work    /work      nfs   nfsvers=4,x-systemd.requires=nfs-mountd.service  0  0
server:/video   /video     nfs   nfsvers=4,x-systemd.requires=nfs-mountd.service  0  0
syslog:
2019-04-14T15:29:22.925116+02:00 xrated systemd[1]: Started System Security Services Daemon.
2019-04-14T15:29:22.925227+02:00 xrated systemd[1]: home-nfs.mount: Mount process finished, but there is no mount.
2019-04-14T15:29:22.925289+02:00 xrated systemd[1]: home-nfs.mount: Failed with result 'protocol'.
2019-04-14T15:29:22.925610+02:00 xrated systemd[1]: Failed to mount /home/nfs.
2019-04-14T15:29:22.925818+02:00 xrated systemd[1]: Dependency failed for Remote File Systems.
2019-04-14T15:29:22.925866+02:00 xrated systemd[1]: remote-fs.target: Job remote-fs.target/start failed with result 'dependency'.
2019-04-14T15:29:22.925984+02:00 xrated systemd[1]: work.mount: Mount process finished, but there is no mount.
2019-04-14T15:29:22.926029+02:00 xrated systemd[1]: work.mount: Failed with result 'protocol'.
2019-04-14T15:29:22.926246+02:00 xrated systemd[1]: Failed to mount /work.
2019-04-14T15:29:22.926495+02:00 xrated systemd[1]: video.mount: Mount process finished, but there is no mount.
2019-04-14T15:29:22.926548+02:00 xrated systemd[1]: video.mount: Failed with result 'protocol'.
2019-04-14T15:29:22.926780+02:00 xrated systemd[1]: Failed to mount /video.
2019-04-14T15:29:22.927205+02:00 xrated systemd[1]: rpc-statd-notify.service: Succeeded.
2019-04-14T15:29:22.927493+02:00 xrated systemd[1]: Started Notify NFS peers of a restart.
2019-04-14T15:29:22.928432+02:00 xrated systemd[1]: Reached target User and Group Name Lookups.
2019-04-14T15:29:22.929360+02:00 xrated systemd[1]: Starting Permit User Sessions...
2019-04-14T15:29:22.930346+02:00 xrated systemd[1]: Starting Login Service...
2019-04-14T15:29:22.938858+02:00 xrated systemd[1]: Reloading.

@frispete

Forgot to mention: the mounts work fine nevertheless.

@hcderaad

hcderaad commented May 7, 2019

Can confirm the same issue on recent openSUSE Tumbleweed releases with version:
Name    : systemd
Version : 241-2.1
Arch    : x86_64
Vendor  : openSUSE

In this case it is an encrypted XFS partition. The mount fails on boot, yet is fully usable, exactly as @frispete reports. Any updates, or perhaps test cases to run?

@msekletar msekletar added this to the v243 milestone Jun 26, 2019
msekletar added a commit to msekletar/systemd that referenced this issue Jul 1, 2019
If we get a SIGCHLD we enable and eventually dispatch
sigchld_event_source where we actually reap the process. We received
SIGCHLD for the specific PID so wait for that process first.

Motivation to do this is to prevent problem due to our state machine for
mount units relying on the fact that we always dispatch mountinfo
notifications before dispatching sigchld handler for the
mount. Previously, this was racy because we might have called
manager_dispatch_sigchld() for completely unrelated process but we would
actually reap the mount process which completed in the meantime. sigchld
handler for the mount unit would then fail the mount unit because we
haven't dispatched mountinfo notification yet.

event| mount         kernel              PID 1
------------------------------------------------------------------------
1    |                                   forks off mount as PID x
------------------------------------------------------------------------
2    |                                   receives SIGCHLD for PID y
------------------------------------------------------------------------
3    |                                   enables sigchld_event_source
------------------------------------------------------------------------
4    |                                   dispatches sigchld_event_source
------------------------------------------------------------------------
5    | mount()       mountinfo_notif
------------------------------------------------------------------------
6    | exit()
------------------------------------------------------------------------
7    |                                   calls waitid() with P_ALL
------------------------------------------------------------------------
8    |                                   calls sigchld_handler for mount
------------------------------------------------------------------------
9    |                                   fails the mount unit since
     |                                   mountinfo_notif wasn't
     |                                   processed yet
------------------------------------------------------------------------

Fixes systemd#10872
msekletar added a commit to msekletar/systemd that referenced this issue Jul 3, 2019
If we get a SIGCHLD we enable and eventually dispatch
sigchld_event_source where we actually reap the process. We received
SIGCHLD for the specific PID so wait for that process first.

Motivation to do this is to prevent problem due to our state machine for
mount units relying on the fact that we always dispatch mountinfo
notifications before dispatching sigchld handler for the
mount. Previously, this was racy because we might have called
manager_dispatch_sigchld() for completely unrelated process but we would
actually reap the mount process which completed in the meantime. sigchld
handler for the mount unit would then fail the mount unit because we
haven't dispatched mountinfo notification yet.

event| mount         kernel              PID 1
------------------------------------------------------------------------
1    |                                   forks off mount as PID x
------------------------------------------------------------------------
2    |                                   receives SIGCHLD for PID y
------------------------------------------------------------------------
3    |                                   enables sigchld_event_source
------------------------------------------------------------------------
4    |                                   dispatches sigchld_event_source
------------------------------------------------------------------------
5    | mount()       mountinfo_notif
------------------------------------------------------------------------
6    | exit()
------------------------------------------------------------------------
7    |                                   calls waitid() with P_ALL
------------------------------------------------------------------------
8    |                                   calls sigchld_handler for mount
------------------------------------------------------------------------
9    |                                   fails the mount unit since
     |                                   mountinfo_notif wasn't
     |                                   processed yet
------------------------------------------------------------------------

Fixes systemd#10872
poettering added a commit to poettering/systemd that referenced this issue Jul 17, 2019
(The interesting bits about the what and why are in a comment in the
patch, please have a look there instead of looking here in the commit
msg).

Fixes: systemd#10872
@poettering
Copy link
Member

I prepped a proposal to fix this in #13097, ptal!

@syyhao1994
Contributor

syyhao1994 commented Oct 18, 2020

It fails with "disk-nn missing".

I encountered the same problem, so is this problem still unresolved?

What kernel version are you using?

4.19.
And I found that when I run the reproducer https://gist.github.com/nomuranec/359a4495900f26a1befa4380634b130f
on aarch64 with systemd 239 or 243, the error is "Mount process finished, but there is no mount.",
but on x86_64 the error is "disk-X missing" with variant 0.
After applying the patches mentioned above, the "Mount process finished, but there is no mount." error on aarch64 no longer occurs, but I still get the "disk-X missing" error.
I also ran this reproducer with systemd v246; the "disk-X missing" problem still exists.

@syyhao1994
Contributor

syyhao1994 commented Oct 18, 2020

"disk-nn missing"

So maybe the "disk-nn missing" phenomenon is caused by the reproducer itself rather than being a real problem?
@nomuranec

@raven-au

"disk-nn missing"

So maybe the "disk-nn missing" phenomenon is caused by the reproducer itself rather than being a real problem?
@nomuranec

It's my understanding that this is caused by a locking problem in the kernel when reading the proc mounts table.
There was also a temporary workaround added to libmount (in util-linux).

The kernel fix went into v5.7 or v5.8; I'm not sure what version of util-linux has the workaround.

So if your libmount doesn't have the workaround, or your distribution kernel (or the kernel you are building) doesn't have the fix, you will see the problem.

@karelzak
Contributor

The kernel fix went into v5.7 or v5.8, I'm not sure what version of util-linux has the workaround.

util-linux v2.35 and v2.36 (the current upstream, not yet released, is without the workaround)

@syyhao1994
Contributor

"disk-nn missing"

So maybe the "disk-nn missing" phenomenon is caused by the reproducer itself rather than being a real problem?
@nomuranec

It's my understanding that this is caused by a locking problem in the kernel when reading the proc mounts table.
There was also a temporary workaround added to libmount (in util-linux).

The kernel fix went into v5.7 or v5.8; I'm not sure what version of util-linux has the workaround.

So if your libmount doesn't have the workaround, or your distribution kernel (or the kernel you are building) doesn't have the fix, you will see the problem.

I tried the workaround patch in util-linux, util-linux/util-linux@e4925f5.
But as we can see by running the reproducer, the "disk-nn missing" error is still there.

@karelzak
Contributor

@syyhao1994 there are more patches related to this topic. You also need ee551c909f95437fd9fcd162f398c069d0ce9720.

@syyhao1994
Contributor

@syyhao1994 there are more patches related to this topic. You also need ee551c909f95437fd9fcd162f398c069d0ce9720.

You mean the problem of "disk-xx missing"?

@nomuranec
Contributor

@syyhao1994 Have you tried variant 5 (--variant-static-lodev) of the test program? Variant 5 is a simplified version of variant 0. The mappings of lodev and disk-xx.img are static in variant 5 during the test, instead of dynamic as in variant 0. IMO this issue is complicated by involving not only mount but also lodev mapping changes.

@syyhao1994
Contributor

@karelzak thanks, that's a good idea. Here is a modified reproducer: https://gist.github.com/nomuranec/359a4495900f26a1befa4380634b130f

I added variant 5 (--variant-static-lodev), which does the same thing as variant 0 except that the relationship between the image file and the loopback device doesn't change while running. That is:

  • variant 0 specifies the path to the image file via "What="; the mount/umount command automatically attaches/detaches a loopback device to the file.
  • variant 5 attaches a loopback device for each image file before starting the loops and specifies the device node via "What=".

With variant 0, the "disk-nn missing" phenomenon still reproduces reliably within 40 loops with today's head e1d32d6.
With variant 5, it doesn't, even when I increase $LOOPS to 400.

@karelzak As nomuranec said, the problem still exists with variant 0.
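The difference between the two variants is essentially just the What= line; roughly (the paths, loop device number, and unit fragments below are illustrative, not copied from the reproducer):

```shell
# Variant 0 style: What= names the image file; mount(8) attaches and
# detaches a loop device on every iteration, so the mapping keeps changing.
#
#   [Mount]
#   What=/tmp/mounttest/disk-12.img
#   Where=/tmp/mounttest/mount/12
#   Type=ext4

# Variant 5 style: attach the loop device once, up front, and let What=
# name the stable device node instead.
losetup /dev/loop12 /tmp/mounttest/disk-12.img
#
#   [Mount]
#   What=/dev/loop12
#   Where=/tmp/mounttest/mount/12
#   Type=ext4
```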

@syyhao1994
Contributor

@syyhao1994 Have you tried variant 5 (--variant-static-lodev) of the test program? Variant 5 is a simplified version of variant 0. The mappings of lodev and disk-xx.img are static in variant 5 during the test, instead of dynamic as in variant 0. IMO this issue is complicated by involving not only mount but also lodev mapping changes.

I tried many times; the problem still exists with variant 5, but it is very hard to reproduce (I have tried maybe thousands of times).
With variant 0 it is very easy to reproduce.
I am quite confused by this phenomenon.

@nomuranec
Contributor

Because variant 0 has high-frequency lodev mapping changes in the mix, it could trigger problems other than the /proc/self/mountinfo race. If you are chasing a mount-related problem, variant 5 has a tighter focus on that. If variant 0 matches your use case, IMHO forking a new issue with the specific systemd/kernel/util-linux versions avoids confusion...

@syyhao1994
Contributor

Because variant 0 has high-frequency lodev mapping changes in the mix, it could trigger problems other than the /proc/self/mountinfo race. If you are chasing a mount-related problem, variant 5 has a tighter focus on that. If variant 0 matches your use case, IMHO forking a new issue with the specific systemd/kernel/util-linux versions avoids confusion...

OK, so maybe it can't be solved in systemd. Thank you a lot!

@karelzak
Contributor

Frankly, I do not see a reason to use loopdev to test mountinfo issues. It only increases complexity, and the kernel loopdev driver is pretty problematic when used in parallel. It seems better to minimize complexity and use, for example, tmpfs to test mountinfo.

See for example my version of the script which I have used to implement the workaround: http://people.redhat.com/kzak/rep-tmpfs.sh.

Anyway, the mountinfo read() issue should be fixed in the kernel (since 5.8).
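A minimal tmpfs-based variant could look like the following sketch (kzak's actual script is at the URL above; the unit name, paths, and loop count here are illustrative, and it requires root plus a running systemd):

```shell
# Stress start/stop of a tmpfs mount unit against concurrent
# daemon-reloads, with no loop devices involved at all.
mkdir -p /tmp/mounttest/tmpfs
cat > /etc/systemd/system/tmp-mounttest-tmpfs.mount <<'EOF'
[Mount]
What=tmpfs
Where=/tmp/mounttest/tmpfs
Type=tmpfs
EOF

for i in $(seq 1 100); do
    systemctl daemon-reload &
    systemctl start tmp-mounttest-tmpfs.mount
    systemctl stop tmp-mounttest-tmpfs.mount
    wait
done
```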

bluca pushed a commit to bluca/systemd that referenced this issue Nov 13, 2020
…"just_mounted"

When starting a mount unit, systemd invokes mount command and moves the
unit's internal state to "mounting".  Then it watches for updates of
/proc/self/mountinfo.  When the expected mount entry newly appears in
mountinfo, the unit internal state is changed to "mounting-done".
Finally, when systemd finds the mount command has finished, it checks
whether the unit internal state is "mounting-done" and changes the state
to "mounted".
If the state was not "mounting-done" in the last step even though the mount
command finished successfully, the unit is marked as "failed" with the
following log messages:
  Mount process finished, but there is no mount.
  Failed with result 'protocol'.

If daemon-reload is done in parallel with starting a mount unit, it is
possible that things happen in the following order and result in the above failure.
  1. the mount unit state changes to "mounting"
  2. daemon-reload saves the unit state
  3. kernel completes the mount and /proc/self/mountinfo is updated
  4. daemon-reload restores the saved unit state, that is "mounting"
  5. systemd notices the mount command has finished but the unit state
     is still "mounting" though it should be "mounting-done"

mount_setup_existing_unit() should take into account that MOUNT_MOUNTING
is transitional state and set MOUNT_PROC_JUST_MOUNTED flag if the unit
comes from /proc/self/mountinfo so that mount_process_proc_self_mountinfo()
later can make state transition from "mounting" to "mounting-done".

Fixes: systemd#10872
(cherry picked from commit 1d086a6)
@poettering poettering modified the milestones: v248, v249 Feb 23, 2021
@poettering poettering modified the milestones: v249, v250 Jun 1, 2021
@poettering poettering modified the milestones: v250, v251 Nov 4, 2021
@yuwata yuwata modified the milestones: v251, v252 Jan 24, 2022
@yuwata yuwata added the bug 🐛 Programming errors, that need preferential fixing label Jan 24, 2022
@poettering poettering modified the milestones: v252, v253 Sep 2, 2022
@yuwata
Member

yuwata commented Dec 6, 2022

Hopefully fixed by #23893.
