Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

systemd: masked active network mount unit breaks the state of other mount units #10874

Closed
vtapia opened this issue Nov 21, 2018 · 8 comments
Closed
Assignees
Milestone

Comments

@vtapia
Copy link
Contributor

vtapia commented Nov 21, 2018

systemd version the issue has been seen with

239 and 237

Used distribution

Ubuntu 18.10 and 18.04

Expected behaviour you didn't see

A masked mount unit should not affect the behavior of the rest of the units

Unexpected behaviour you saw

Mounting manually a network share (NFS) and masking it breaks the state of other units, and their dependencies.

Steps to reproduce the problem

# Mount the NFS share
root@iscsi-cosmic:/home/ubuntu# mount | grep media
root@iscsi-cosmic:/home/ubuntu# systemctl list-units | grep media
root@iscsi-cosmic:/home/ubuntu# mount 10.230.56.72:/tmp/mnt /media
root@iscsi-cosmic:/home/ubuntu# systemctl list-units | grep media
media.mount                                                                              loaded active mounted   /media                                                                           

# Mount and umount a test mount to confirm everything works
root@iscsi-cosmic:/home/ubuntu# cat /etc/fstab
LABEL=cloudimg-rootfs   /        ext4   defaults        0 0
tmpfs /test tmpfs nosuid,nodev 0 0
root@iscsi-cosmic:/home/ubuntu# systemctl start test.mount
root@iscsi-cosmic:/home/ubuntu# systemctl stop test.mount

# Mask the share (media.mount)
root@iscsi-cosmic:/home/ubuntu# systemctl mask media.mount
Created symlink /etc/systemd/system/media.mount → /dev/null.

# Start test.mount (fails with Result: Protocol)
root@iscsi-cosmic:/home/ubuntu# systemctl start test.mount
Job for test.mount failed.
See "systemctl status test.mount" and "journalctl -xe" for details.
root@iscsi-cosmic:/home/ubuntu# systemctl status test.mount
● test.mount - /test
   Loaded: loaded (/etc/fstab; generated)
   Active: failed (Result: protocol) since Wed 2018-11-21 15:53:29 UTC; 8s ago
    Where: /test
     What: tmpfs
     Docs: man:fstab(5)
           man:systemd-fstab-generator(8)

Nov 21 15:53:29 iscsi-cosmic systemd[1]: Mounting /test...
Nov 21 15:53:29 iscsi-cosmic systemd[1]: test.mount: Mount process finished, but there is no mount.
Nov 21 15:53:29 iscsi-cosmic systemd[1]: test.mount: Failed with result 'protocol'.
Nov 21 15:53:29 iscsi-cosmic systemd[1]: Failed to mount /test.

# /test is mounted regardless
root@iscsi-cosmic:/home/ubuntu# mount | grep /test
tmpfs on /test type tmpfs (rw,nosuid,nodev,relatime)

# Unmask and start test.mount properly
root@iscsi-cosmic:/home/ubuntu# umount /test
root@iscsi-cosmic:/home/ubuntu# systemctl unmask media.mount
Removed /etc/systemd/system/media.mount.
root@iscsi-cosmic:/home/ubuntu# mount | grep /test
root@iscsi-cosmic:/home/ubuntu# systemctl start test.mount
root@iscsi-cosmic:/home/ubuntu# mount | grep /test
tmpfs on /test type tmpfs (rw,nosuid,nodev,relatime)

# Mask media.mount again
root@iscsi-cosmic:/home/ubuntu# systemctl mask media.mount
Created symlink /etc/systemd/system/media.mount → /dev/null.

# Try to stop test.mount
root@iscsi-cosmic:/home/ubuntu# systemctl stop test.mount
Job for test.mount failed.
See "systemctl status test.mount" and "journalctl -xe" for details.
root@iscsi-cosmic:/home/ubuntu# systemctl status test.mount
● test.mount - /test
   Loaded: loaded (/etc/fstab; generated)
   Active: active (mounted) (Result: exit-code) since Wed 2018-11-21 15:54:39 UTC; 6s ago
    Where: /test
     What: tmpfs
     Docs: man:fstab(5)
           man:systemd-fstab-generator(8)
    Tasks: 0 (limit: 2317)
   Memory: 144.0K
   CGroup: /system.slice/test.mount

Nov 21 15:54:21 iscsi-cosmic systemd[1]: Mounting /test...
Nov 21 15:54:21 iscsi-cosmic systemd[1]: Mounted /test.
Nov 21 15:54:39 iscsi-cosmic systemd[1]: Unmounting /test...
Nov 21 15:54:39 iscsi-cosmic umount[5975]: umount: /test: not mounted.
Nov 21 15:54:39 iscsi-cosmic systemd[1]: test.mount: Mount process exited, code=exited status=32
Nov 21 15:54:39 iscsi-cosmic systemd[1]: Failed unmounting /test.

# /test is no longer mounted
root@iscsi-cosmic:/home/ubuntu# mount | grep /test
@arvidjaar
Copy link
Contributor

Probably related to #10872

@poettering
Copy link
Member

As with #10872, if you replace the mount binary by a shell script, do you see the mount being created properly?

@vtapia
Copy link
Contributor Author

vtapia commented Nov 27, 2018

The mount wrapper:

root@iscsi-cosmic:/home/ubuntu# cat mount_wrapper 
#!/bin/bash

/bin/mount.original $@
if [ -z $# ]; then
        exit
fi
grep $2 /proc/self/mountinfo

This is the output of a mount unit with the NFS unit unmasked:

root@iscsi-cosmic:/home/ubuntu# systemctl start test.mount
root@iscsi-cosmic:/home/ubuntu# journalctl -xu test.mount
...
-- Unit test.mount has begun starting up.
Nov 27 09:51:55 iscsi-cosmic mount[18433]: 91 28 0:55 / /test rw,nosuid,nodev,relatime shared:71 - tmpfs tmpfs rw                                                                                 
Nov 27 09:51:55 iscsi-cosmic systemd[1]: Mounted /test.
-- Subject: Unit test.mount has finished start-up
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit test.mount has finished starting up.
--
-- The start-up result is RESULT.
root@iscsi-cosmic:/home/ubuntu# systemctl stop test.mount

With the NFS unit masked:

root@iscsi-cosmic:/home/ubuntu# systemctl mask media.mount
Created symlink /etc/systemd/system/media.mount → /dev/null.
root@iscsi-cosmic:/home/ubuntu# systemctl start test.mount
Job for test.mount failed.
See "systemctl status test.mount" and "journalctl -xe" for details.
root@iscsi-cosmic:/home/ubuntu# systemctl status test.mount
● test.mount - /test
   Loaded: loaded (/etc/fstab; generated)
   Active: failed (Result: protocol) since Tue 2018-11-27 09:53:00 UTC; 6s ago
    Where: /test
     What: tmpfs
     Docs: man:fstab(5)
           man:systemd-fstab-generator(8)

Nov 27 09:53:00 iscsi-cosmic systemd[1]: Mounting /test...
Nov 27 09:53:00 iscsi-cosmic mount[19513]: 91 28 0:55 / /test rw,nosuid,nodev,relatime shared:71 - tmpfs tmpfs rw
Nov 27 09:53:00 iscsi-cosmic systemd[1]: test.mount: Mount process finished, but there is no mount.
Nov 27 09:53:00 iscsi-cosmic systemd[1]: test.mount: Failed with result 'protocol'.
Nov 27 09:53:00 iscsi-cosmic systemd[1]: Failed to mount /test.

In both cases the entry is present in /proc/self/mountinfo

@vtapia
Copy link
Contributor Author

vtapia commented Nov 27, 2018

I think I've found the problem in systemd's code. In src/core/mount.c, with the following call stack:

mount_dispatch_io()
  mount_load_proc_self_mountinfo()
    mount_setup_unit()
      mount_setup_existing_unit()

At mount_setup_existing_unit():

       .... 
        if (!mount_is_extrinsic(MOUNT(u)) && mount_is_network(p)) {
                /* _netdev option may have shown up late, or on a
                 * remount. Add remote-fs dependencies, even though
                 * local-fs ones may already be there.
                 *
                 * Note: due to a current limitation (we don't track
                 * in the dependency "Set*" objects who created a
                 * dependency), we can only add deps, never lose them,
                 * until the next full daemon-reload. */
                unit_add_dependency_by_name(u, UNIT_BEFORE, SPECIAL_REMOTE_FS_TARGET, NULL, true, UNIT_DEPENDENCY_MOUNTINFO_IMPLICIT);
                load_extras = true;
        }

load_extras is set to true in this case because, with a normal NFS mount, mount_is_network() is true and mount_is_extrinsic() is false. Then:

        if (load_extras)
                return mount_add_extras(MOUNT(u));

Where mount_add_extras() calls:

        r = unit_set_default_slice(u);
        if (r < 0)
                return r;

Then unit_set_default_slice() calls unit_set_slice(), that evaluates the state of the unit:

        if (unit_active_state(u) != UNIT_INACTIVE)
                return -EBUSY;

In our scenario, the NFS mount point is UNIT_ACTIVE, returning a -EBUSY (-16) up to mount_setup_existing_unit(), where this is run:

...
                r = mount_setup_existing_unit(u, what, where, options, fstype, &flags);

        if (r < 0)
                goto fail;
...
fail:
        log_warning_errno(r, "Failed to set up mount unit: %m");
        return r;

Back to mount_setup_unit and mount_load_proc_self_mountinfo() where the return codes are aggregated:

        for (;;) {
                ...
                k = mount_setup_unit(m, d, p, options, fstype, set_flags);
                if (r == 0 && k < 0)
                        r = k;
        }

        return r;

And finally, in mount_dispatch_io(), the whole mount process is cancelled for all units:

        r = mount_load_proc_self_mountinfo(m, true);
        if (r < 0) {
                /* Reset flags, just in case, for later calls */
                LIST_FOREACH(units_by_type, u, m->units_by_type[UNIT_MOUNT]) {
                        Mount *mount = MOUNT(u);

                        mount->is_mounted = mount->just_mounted = mount->just_changed = false;
                }

                return 0;
        }

I'm not yet familiar with the codebase, would it be sensible to add "u->load_state == UNIT_LOADED" to the "if (!mount_is_extrinsic(MOUNT(u)) && mount_is_network(p))" check that sets load_extras to true?

@poettering poettering added this to the v240 milestone Nov 27, 2018
@poettering
Copy link
Member

Thanks for tracking this down, much appreciated!

@poettering poettering self-assigned this Nov 28, 2018
poettering added a commit to poettering/systemd that referenced this issue Nov 28, 2018
If we can't process a specific line in /proc/self/mountinfo we should
log about it (which we do), but this should not affect other lines, nor
further processing of mount units. Let's keep these failures local.

Fixes: systemd#10874
@poettering
Copy link
Member

Fix waiting in #10980

@keszybz
Copy link
Member

keszybz commented Dec 6, 2018

@vtapia if you could test #10980 (in particular c165888), that'd be appreciated.

@vtapia
Copy link
Contributor Author

vtapia commented Dec 13, 2018

Just wanted to confirm that the suggested commit fixes the issue, thanks for the fix!

(sorry for the delay, I've been out some time)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants