Force-detach of XFS volume not noticed by systemd #3468

Closed
mlafeldt opened this issue Jun 8, 2016 · 11 comments

@mlafeldt

commented Jun 8, 2016

Submission type

  • Bug report
  • Request for enhancement (RFE)

systemd version the issue has been seen with

systemctl --version
systemd 225
-PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT -GNUTLS -ACL +XZ -LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN

Used distribution

CoreOS stable 1010.5.0

Expected behavior you didn't see

When I force-detach an XFS volume, I expect systemd to notice that the corresponding mount is gone and report it accordingly so that other units can rely on the volume being available.

Unexpected behavior you saw

We're using these systemd units to mount an AWS EBS volume with an XFS filesystem:

# var-lib-docker.mount
[Unit]
Description=Mount ephemeral to /var/lib/docker
Requires=format-ephemeral.service
After=format-ephemeral.service
Before=docker.service
[Mount]
What=/dev/xvdb
Where=/var/lib/docker
Type=xfs

# format-ephemeral.service
[Unit]
Description=Format the EBS volume
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/wipefs -f /dev/xvdb
ExecStart=/usr/sbin/mkfs.xfs /dev/xvdb
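
A dependent unit (docker.service in this setup) would typically be tied to the mount with a drop-in along these lines; this is only a sketch of how such a dependency is usually expressed, not a file from this deployment:

# /etc/systemd/system/docker.service.d/10-var-lib-docker.conf (hypothetical drop-in)
[Unit]
# Stop docker.service if systemd ever marks the mount unit as gone
BindsTo=var-lib-docker.mount
After=var-lib-docker.mount

Such a dependency only helps, of course, if systemd actually notices that the mount is gone, which is the point of this report.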

The mount works as expected:

ip-10-8-4-150 core # mount | grep docker
/dev/xvdb on /var/lib/docker type xfs (rw,relatime,seclabel,attr2,inode64,noquota)

When force-detaching the EBS volume via the AWS console, systemd will not notice that the mount is no longer valid:

ip-10-8-4-150 core # touch /var/lib/docker
touch: setting times of '/var/lib/docker': Input/output error
ip-10-8-4-150 core # systemctl status var-lib-docker.mount
● var-lib-docker.mount - Mount ephemeral to /var/lib/docker
   Loaded: loaded (/etc/systemd/system/var-lib-docker.mount; static; vendor preset: disabled)
   Active: active (mounted) since Tue 2016-06-07 20:00:15 UTC; 15h ago
    Where: /var/lib/docker
     What: /dev/xvdb
  Process: 867 ExecMount=/bin/mount /dev/xvdb /var/lib/docker -t xfs (code=exited, status=0/SUCCESS)
   Memory: 0B
      CPU: 7ms

Jun 07 20:00:15 ip-10-8-4-150.eu-west-1.compute.internal systemd[1]: Mounting Mount ephemeral to /var/lib/docker...
Jun 07 20:00:15 ip-10-8-4-150.eu-west-1.compute.internal systemd[1]: Mounted Mount ephemeral to /var/lib/docker.

journalctl -k reports that the XFS filesystem was shut down, which appears to be the correct behavior of XFS when an I/O error occurs:

Jun 08 11:45:05 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 108573088
Jun 08 11:45:05 ip-10-8-4-150.eu-west-1.compute.internal kernel: Buffer I/O error on dev xvdb, logical block 13571636, lost async page write
Jun 08 11:45:05 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 171967232
Jun 08 11:45:05 ip-10-8-4-150.eu-west-1.compute.internal kernel: Buffer I/O error on dev xvdb, logical block 21495904, lost async page write
Jun 08 11:45:06 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 171967240
Jun 08 11:45:06 ip-10-8-4-150.eu-west-1.compute.internal kernel: Buffer I/O error on dev xvdb, logical block 21495905, lost async page write
Jun 08 11:45:08 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 212907568
Jun 08 11:45:08 ip-10-8-4-150.eu-west-1.compute.internal kernel: Buffer I/O error on dev xvdb, logical block 26613446, lost async page write
Jun 08 11:45:10 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 171967248
Jun 08 11:45:10 ip-10-8-4-150.eu-west-1.compute.internal kernel: Buffer I/O error on dev xvdb, logical block 21495906, lost async page write
Jun 08 11:45:11 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 171967312
Jun 08 11:45:11 ip-10-8-4-150.eu-west-1.compute.internal kernel: Buffer I/O error on dev xvdb, logical block 21495914, lost async page write
Jun 08 11:45:13 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 171976824
Jun 08 11:45:13 ip-10-8-4-150.eu-west-1.compute.internal kernel: Buffer I/O error on dev xvdb, logical block 21497103, lost async page write
Jun 08 11:45:15 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 171967320
Jun 08 11:45:15 ip-10-8-4-150.eu-west-1.compute.internal kernel: Buffer I/O error on dev xvdb, logical block 21495915, lost async page write
Jun 08 11:45:15 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 171967328
Jun 08 11:45:15 ip-10-8-4-150.eu-west-1.compute.internal kernel: Buffer I/O error on dev xvdb, logical block 21495916, lost async page write
Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 209717663
Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 209717599
Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: XFS (xvdb): metadata I/O error: block 0xc80099f ("xlog_iodone") error 5 numblks 64
Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 46224
Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 2651536
Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 104857602
Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 104857664
Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 104975736
Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 104977896
Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 107371136
Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: blk_update_request: I/O error, dev xvdb, sector 209715202
Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: XFS (xvdb): xfs_do_force_shutdown(0x2) called from line 1203 of file ../source/fs/xfs/xfs_log.c.  Return address = 0xffffffffc05981e0
Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: XFS (xvdb): Log I/O Error Detected.  Shutting down filesystem
Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: XFS (xvdb): Please umount the filesystem and rectify the problem(s)
Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: XFS (xvdb): metadata I/O error: block 0xc80095f ("xlog_iodone") error 5 numblks 64
Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: XFS (xvdb): xfs_do_force_shutdown(0x2) called from line 1203 of file ../source/fs/xfs/xfs_log.c.  Return address = 0xffffffffc05981e0
Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: XFS (xvdb): xfs_log_force: error -5 returned.

What's weird is that systemctl list-units reports the device as active and plugged...

ip-10-8-4-150 core # systemctl list-units | grep xvdb
  sys-devices-vbd\x2d51728-block-xvdb.device                                    loaded active plugged   /sys/devices/vbd-51728/block/xvdb

While the status command says that it's dead:

ip-10-8-4-150 core # systemctl status sys-devices-vbd\x2d51728-block-xvdb.device
● sys-devices-vbdx2d51728-block-xvdb.device
   Loaded: loaded
   Active: inactive (dead)

It's also worth noting that systemd works fine when using umount instead of force-detaching.

Steps to reproduce the problem

  • Format an EBS volume with XFS and mount it using the unit files above
  • Force-detach the volume via AWS console
@poettering

Member

commented Jun 8, 2016

Not sure what an XFS forced shutdown is. But do note that systemd considers mounts active as long as they show up in /proc/self/mountinfo. Does yours still show up there? If so, and you think it should not, please file a kernel bug.

@mlafeldt

Author

commented Jun 8, 2016

Yes, it's still there:

ip-10-8-3-84 core # grep docker /proc/self/mountinfo
123 98 202:16 / /var/lib/docker rw,relatime shared:74 - xfs /dev/xvdb rw,seclabel,attr2,inode64,noquota

Regarding the XFS shutdown, https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/ch06s09.html says:

When XFS encounters a metadata error that is not recoverable it will shut down the file system and return an EFSCORRUPTED error. The system logs will contain details of the error encountered and will recommend running xfs_repair if necessary.

Can you explain why, as I reported, list-units reports a different result than status?

ip-10-8-4-150 core # systemctl list-units | grep xvdb
  sys-devices-vbd\x2d51728-block-xvdb.device                                    loaded active plugged   /sys/devices/vbd-51728/block/xvdb

While the status command says that it's dead:

ip-10-8-4-150 core # systemctl status sys-devices-vbd\x2d51728-block-xvdb.device
● sys-devices-vbdx2d51728-block-xvdb.device
   Loaded: loaded
   Active: inactive (dead)
@evverx

Member

commented Jun 8, 2016

@mlafeldt, try:

systemctl status 'sys-devices-vbd\x2d51728-block-xvdb.device'
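
The quotes matter here: without them the shell consumes the backslash, so systemctl is asked about a different unit name (sys-devices-vbdx2d51728-block-xvdb.device) for which no device exists. The escaped name can also be generated instead of typed by hand; a small sketch, using the sysfs path from this report:

systemd-escape --path /sys/devices/vbd-51728/block/xvdb
# prints: sys-devices-vbd\x2d51728-block-xvdb
systemctl status "$(systemd-escape --path /sys/devices/vbd-51728/block/xvdb).device"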
@mlafeldt

Author

commented Jun 8, 2016

Ah. Thanks!

ip-10-8-3-84 core # systemctl status sys-devices-vbd\x2d51728-block-xvdb.device
● sys-devices-vbdx2d51728-block-xvdb.device
   Loaded: loaded
   Active: inactive (dead)
ip-10-8-3-84 core # systemctl status 'sys-devices-vbd\x2d51728-block-xvdb.device'
● sys-devices-vbd\x2d51728-block-xvdb.device - /sys/devices/vbd-51728/block/xvdb
   Loaded: loaded
   Active: active (plugged) since Wed 2016-06-08 11:38:12 UTC; 1h 9min ago
   Device: /sys/devices/vbd-51728/block/xvdb

But then shouldn't the status command report that the unquoted name doesn't exist? Given this:

ip-10-8-3-84 core # systemctl list-units | grep xvdb
  sys-devices-vbd\x2d51728-block-xvdb.device                                    loaded active plugged   /sys/devices/vbd-51728/block/xvdb
ip-10-8-3-84 core #
@poettering

Member

commented Jun 8, 2016

Well, systemd just reflects the state the kernel reports, for both mounts and devices. Hence if you see the mount in /proc/self/mountinfo, then the mount is still there. Similarly, if the block device is still in /sys/class/block/, then the block device is still there...

I suspect there's simply a misunderstanding about how the kernel works here...
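
Both checks can be reproduced by hand; a rough sketch of what the two unit states correspond to, using the paths from this report:

# mount unit: stays "active (mounted)" as long as the mount point is listed here
grep /var/lib/docker /proc/self/mountinfo
# device unit: stays "active (plugged)" as long as the kernel/udev still expose the device
ls /sys/class/block/ | grep xvdb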

@mlafeldt

Author

commented Jun 8, 2016

OK, so you think that the Linux XFS driver should propagate the error instead of telling the user to "umount the filesystem"?

Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: XFS (xvdb): Log I/O Error Detected.  Shutting down filesystem
Jun 08 11:45:16 ip-10-8-4-150.eu-west-1.compute.internal kernel: XFS (xvdb): Please umount the filesystem and rectify the problem(s)
@poettering

Member

commented Jun 8, 2016

@mlafeldt Well, I'll leave that for the XFS devs to figure out. systemd really does the right thing here, and simply reflects the kernel's exported state. If the file system should be unmounted or the device should vanish, then that's something the kernel folks should figure out.

@poettering

Member

commented Jun 8, 2016

Anyway, closing this, as this appears to be a misunderstanding about how the kernel handles this, and if there's something to fix, then it would be in the kernel, not in systemd.

@mlafeldt

Author

commented Jun 8, 2016

I see.

Re #3468 (comment): There seems to be some special logic that reports .device units as always loaded.

ip-10-8-3-84 core # systemctl status 'xxx.device'
● xxx.device
   Loaded: loaded
   Active: inactive (dead)
@mlafeldt

Author

commented Jun 8, 2016

FWIW, this is indeed the intended behavior of XFS: coreos/bugs#1396 (comment)

@poettering

Member

commented Jun 8, 2016

Re #3468 (comment): There seems to be some special logic that reports .device units as always loaded.

Yeah, "loaded" just means that a unit of this name makes theoretic sense, it doesn't mean that the device is actually there. That's what "Active:" shows you.
