
pod is deleted but rbd is not unmapped #54214

Closed
likakuli opened this issue Oct 19, 2017 · 16 comments

Comments

@likakuli (Contributor) commented Oct 19, 2017

Is this a BUG REPORT or FEATURE REQUEST?:


/kind bug

What happened:
The pod was deleted successfully, but there is still a lock on the Ceph image. I have two logs, one from a case where the lock was not removed and one where it was removed, and there is only a small difference between them.

The log from the case where the lock was not removed:

Oct 16 17:49:06 gs-server-4081 kubelet: I1016 17:49:06.859629   12668 reconciler.go:242] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/rbd/4037ef87-b257-11e7-9ddb-00155d283404-mongo-node-2" (spec.Name: "mongo-node-2") pod "4037ef87-b257-11e7-9ddb-00155d283404" (UID: "4037ef87-b257-11e7-9ddb-00155d283404")
Oct 16 17:49:07 gs-server-4081 kernel: rbd: loaded
Oct 16 17:49:08 gs-server-4081 kubelet: I1016 17:49:08.060636   12668 rbd_util.go:141] lock list output ""
Oct 16 17:49:09 gs-server-4081 kernel: rbd0: unknown partition table
Oct 16 17:49:09 gs-server-4081 kernel: rbd: rbd0: added with size 0x1900000000
Oct 16 17:49:13 gs-server-4081 kernel: rbd0: unknown partition table
Oct 16 17:49:14 gs-server-4081 kernel: EXT4-fs (rbd0): mounted filesystem with ordered data mode. Opts: (null)
Oct 16 17:49:14 gs-server-4081 kernel: EXT4-fs (rbd0): re-mounted. Opts: (null)
Oct 16 17:49:14 gs-server-4081 kubelet: I1016 17:49:14.160393   12668 operation_generator.go:597] MountVolume.SetUp succeeded for volume "kubernetes.io/rbd/4037ef87-b257-11e7-9ddb-00155d283404-mongo-node-2" (spec.Name: "mongo-node-2") pod "4037ef87-b257-11e7-9ddb-00155d283404" (UID: "4037ef87-b257-11e7-9ddb-00155d283404").
Oct 18 15:20:54 gs-server-4081 kubelet: I1018 15:20:54.941430   10754 reconciler.go:242] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/rbd/4037ef87-b257-11e7-9ddb-00155d283404-mongo-node-2" (spec.Name: "mongo-node-2") pod "4037ef87-b257-11e7-9ddb-00155d283404" (UID: "4037ef87-b257-11e7-9ddb-00155d283404")
Oct 18 15:20:55 gs-server-4081 kernel: rbd: loaded
Oct 18 15:20:56 gs-server-4081 kubelet: I1018 15:20:56.427300   10754 rbd_util.go:141] lock list output "There is 1 exclusive lock on this image.\nLocker         ID                                Address                \nclient.1227132 kubelet_lock_magic_gs-server-4081 10.203.40.80:0/1064444 \n"
Oct 18 15:20:56 gs-server-4081 kernel: rbd0: unknown partition table
Oct 18 15:20:56 gs-server-4081 kernel: rbd: rbd0: added with size 0x1900000000
Oct 18 15:21:13 gs-server-4081 kernel: rbd0: unknown partition table
Oct 18 15:21:14 gs-server-4081 kernel: EXT4-fs (rbd0): mounted filesystem with ordered data mode. Opts: (null)
Oct 18 15:21:14 gs-server-4081 kernel: EXT4-fs (rbd0): re-mounted. Opts: (null)
Oct 18 15:21:14 gs-server-4081 kubelet: I1018 15:21:14.680620   10754 operation_generator.go:597] MountVolume.SetUp succeeded for volume "kubernetes.io/rbd/4037ef87-b257-11e7-9ddb-00155d283404-mongo-node-2" (spec.Name: "mongo-node-2") pod "4037ef87-b257-11e7-9ddb-00155d283404" (UID: "4037ef87-b257-11e7-9ddb-00155d283404").
Oct 18 15:21:14 gs-server-4081 kubelet: I1018 15:21:14.701955   10754 reconciler.go:201] UnmountVolume operation started for volume "kubernetes.io/rbd/4037ef87-b257-11e7-9ddb-00155d283404-mongo-node-2" (spec.Name: "mongo-node-2") from pod "4037ef87-b257-11e7-9ddb-00155d283404" (UID: "4037ef87-b257-11e7-9ddb-00155d283404").
Oct 18 15:21:15 gs-server-4081 kubelet: E1018 15:21:15.058692   10754 disk_manager.go:113] failed to detach disk from /var/lib/kubelet/plugins/kubernetes.io/rbd/rbd/k8s-image-mongo-node-2
Oct 18 15:21:15 gs-server-4081 kubelet: E1018 15:21:15.058752   10754 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/rbd/4037ef87-b257-11e7-9ddb-00155d283404-mongo-node-2\" (\"4037ef87-b257-11e7-9ddb-00155d283404\")" failed. No retries permitted until 2017-10-18 15:21:15.558727726 +0800 CST (durationBeforeRetry 500ms). Error: UnmountVolume.TearDown failed for volume "kubernetes.io/rbd/4037ef87-b257-11e7-9ddb-00155d283404-mongo-node-2" (volume.spec.Name: "mongo-node-2") pod "4037ef87-b257-11e7-9ddb-00155d283404" (UID: "4037ef87-b257-11e7-9ddb-00155d283404") with: rbd: failed to unmap device /dev/rbd0:Error: exit status 16
Oct 18 15:21:15 gs-server-4081 kubelet: I1018 15:21:15.603635   10754 reconciler.go:201] UnmountVolume operation started for volume "kubernetes.io/rbd/4037ef87-b257-11e7-9ddb-00155d283404-mongo-node-2" (spec.Name: "mongo-node-2") from pod "4037ef87-b257-11e7-9ddb-00155d283404" (UID: "4037ef87-b257-11e7-9ddb-00155d283404").
Oct 18 15:21:15 gs-server-4081 kubelet: I1018 15:21:15.603764   10754 operation_generator.go:672] UnmountVolume.TearDown succeeded for volume "kubernetes.io/rbd/4037ef87-b257-11e7-9ddb-00155d283404-mongo-node-2" (OuterVolumeSpecName: "mongo-node-2") pod "4037ef87-b257-11e7-9ddb-00155d283404" (UID: "4037ef87-b257-11e7-9ddb-00155d283404"). InnerVolumeSpecName "mongo-node-2". PluginName "kubernetes.io/rbd", VolumeGidValue ""
Oct 18 15:21:15 gs-server-4081 kubelet: I1018 15:21:15.703821   10754 reconciler.go:363] Detached volume "kubernetes.io/rbd/4037ef87-b257-11e7-9ddb-00155d283404-mongo-node-2" (spec.Name: "mongo-node-2") devicePath: ""

And the log from the case where the lock was removed:

Oct 16 17:59:35 gs-server-4086 kubelet: I1016 17:59:35.198168   53479 rbd_util.go:141] lock list output ""
Oct 16 17:59:36 gs-server-4086 kernel: rbd0: unknown partition table
Oct 16 17:59:36 gs-server-4086 kernel: rbd: rbd0: added with size 0x1900000000
Oct 16 17:59:40 gs-server-4086 kernel: rbd0: unknown partition table
Oct 16 17:59:41 gs-server-4086 kernel: EXT4-fs (rbd0): mounted filesystem with ordered data mode. Opts: (null)
Oct 16 17:59:41 gs-server-4086 kubelet: I1016 17:59:41.264486   53479 operation_generator.go:597] MountVolume.SetUp succeeded for volume "kubernetes.io/rbd/40254c2e-b257-11e7-9ddb-00155d283404-mongo-node-1" (spec.Name: "mongo-node-1") pod "40254c2e-b257-11e7-9ddb-00155d283404" (UID: "40254c2e-b257-11e7-9ddb-00155d283404").
Oct 16 17:59:41 gs-server-4086 kernel: EXT4-fs (rbd0): re-mounted. Opts: (null)
Oct 16 17:59:41 gs-server-4086 kubelet: I1016 17:59:41.348630   53479 operation_generator.go:597] MountVolume.SetUp succeeded for volume "kubernetes.io/rbd/40254c2e-b257-11e7-9ddb-00155d283404-mongo-node-1" (spec.Name: "mongo-node-1") pod "40254c2e-b257-11e7-9ddb-00155d283404" (UID: "40254c2e-b257-11e7-9ddb-00155d283404").
Oct 16 18:01:26 gs-server-4086 kubelet: I1016 18:01:26.650647   53479 reconciler.go:201] UnmountVolume operation started for volume "kubernetes.io/rbd/40254c2e-b257-11e7-9ddb-00155d283404-mongo-node-1" (spec.Name: "mongo-node-1") from pod "40254c2e-b257-11e7-9ddb-00155d283404" (UID: "40254c2e-b257-11e7-9ddb-00155d283404").
---------------------------------------------
Oct 16 18:01:27 gs-server-4086 kubelet: I1016 18:01:27.590104   53479 rbd_util.go:141] lock list output "There is 1 exclusive lock on this image.\nLocker         ID                                Address                \nclient.1227411 kubelet_lock_magic_gs-server-4086 10.203.40.89:0/1016309 \n"
Oct 16 18:01:28 gs-server-4086 kubelet: I1016 18:01:28.978137   53479 rbd_util.go:339] rbd: successfully unmap device /dev/rbd0 
---------------------------------------------
Oct 16 18:01:28 gs-server-4086 kubelet: I1016 18:01:28.978223   53479 operation_generator.go:672] UnmountVolume.TearDown succeeded for volume "kubernetes.io/rbd/40254c2e-b257-11e7-9ddb-00155d283404-mongo-node-1" (OuterVolumeSpecName: "mongo-node-1") pod "40254c2e-b257-11e7-9ddb-00155d283404" (UID: "40254c2e-b257-11e7-9ddb-00155d283404"). InnerVolumeSpecName "mongo-node-1". PluginName "kubernetes.io/rbd", VolumeGidValue ""
Oct 16 18:01:29 gs-server-4086 kubelet: I1016 18:01:29.060518   53479 reconciler.go:363] Detached volume "kubernetes.io/rbd/40254c2e-b257-11e7-9ddb-00155d283404-mongo-node-1" (spec.Name: "mongo-node-1") devicePath: ""
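
For reference, the stale lock shown in the first log can be inspected and removed manually with the rbd CLI. A minimal sketch, assuming the image from the plugin mount path above lives in the default rbd pool (the lock ID and locker are copied from the lock list output in the log):

# list current locks on the image (pool name is an assumption; the image name comes from the plugin path in the first log)
rbd lock list rbd/k8s-image-mongo-node-2

# remove the stale lock: arguments are the image, the lock ID, and the locker, as printed by "lock list"
rbd lock remove rbd/k8s-image-mongo-node-2 kubelet_lock_magic_gs-server-4081 client.1227132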

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.6.7
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): centos7
  • Kernel (e.g. uname -a): 3.10.0-327.el7.x86_64
  • Install tools:
  • Others:
@likakuli (Contributor, Author) commented Oct 19, 2017

/sig storage

@rootfs (Member) commented Oct 20, 2017

rbd: failed to unmap device /dev/rbd0:Error: exit status 16

Error 16 is EBUSY.

similar to #54263
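
A quick way to confirm what is still holding the device when unmap fails (a diagnostic sketch; /dev/rbd0 is the device from the log above):

# re-run the unmap by hand: exit status 16 (EBUSY) means something still uses the device
rbd unmap /dev/rbd0; echo $?

# list the processes that still have the filesystem on /dev/rbd0 in use
fuser -vm /dev/rbd0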

@rootfs (Member) commented Oct 20, 2017

@likakuli were you running kubelet with cadvisor on?

@rootfs (Member) commented Oct 20, 2017

/assign

@likakuli (Contributor, Author) commented Oct 25, 2017

@rootfs Yes, every kubelet runs with cAdvisor, and I know what exit code 16 represents, but I still don't know the root cause. Is this a bug? How can I avoid this situation? Thank you.

@rootfs (Member) commented Oct 25, 2017

It is probably related to google/cadvisor#771

@weiwei04 (Contributor) commented Nov 2, 2017

We have seen another case that can cause rbd: failed to unmap device /dev/rbd0: Error: exit status 16:

  1. Every kubelet runs alongside node_exporter, and node_exporter is deployed by a Kubernetes DaemonSet with -v /:/rootfs:ro (mounting the kubelet host's root filesystem into the node_exporter container).

  2. The rbd device is mounted before node_exporter starts. Because Kubernetes 1.6.7 does not support mount propagation, when the rbd plugin unmounts /dev/rbd0 in the host namespace, the mount entry is still left over in the node_exporter container's namespace, which can cause exit status 16 (see the sketch below).
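
To confirm this case, compare the host's mount table with the mount table of the node_exporter process (a sketch; <node_exporter_pid> is a placeholder for the node_exporter PID on the host):

grep rbd0 /proc/self/mountinfo                   # host namespace: the entry is gone after the plugin unmounts it
grep rbd0 /proc/<node_exporter_pid>/mountinfo    # container namespace: a stale entry here keeps the device busy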

Hope this helps.

@rootfs (Member) commented Nov 13, 2017

@weiwei04 thanks, does the problem persist if node_exporter stops collecting filesystem metrics?

@fejta-bot commented Mar 12, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@fejta-bot commented Apr 11, 2018

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@fejta-bot commented May 11, 2018

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@rawkode commented Jul 9, 2018

I'm experiencing this issue. From what I can read above, if the rbd mount happens before node_exporter starts, will we always hit exit status 16?

@navidnabavi commented Sep 29, 2018

We're hitting this issue as well. The volume is also mounted inside the cAdvisor container, so the /dev/rbdX device is still in use, but we haven't found a fix yet.

@miaoshixuan commented Jan 24, 2019

The rbd volume gets stuck when DetachDisk fails, because on every retry Unmount just returns "not mounted" (the mount point was already unmounted on the previous attempt), so DetachDisk is never reached again:

func (detacher *rbdDetacher) UnmountDevice(deviceMountPath string) error {
    if pathExists, pathErr := volutil.PathExists(deviceMountPath); pathErr != nil {
        return fmt.Errorf("Error checking if path exists: %v", pathErr)
    } else if !pathExists {
        klog.Warningf("Warning: Unmount skipped because path does not exist: %v", deviceMountPath)
        return nil
    }
    devicePath, _, err := mount.GetDeviceNameFromMount(detacher.mounter, deviceMountPath)
    if err != nil {
        return err
    }
    // Unmount the device from the device mount point.
    klog.V(4).Infof("rbd: unmouting device mountpoint %s", deviceMountPath)
    if err = detacher.mounter.Unmount(deviceMountPath); err != nil {
        return err
    }
    klog.V(3).Infof("rbd: successfully umount device mountpath %s", deviceMountPath)
    klog.V(4).Infof("rbd: detaching device %s", devicePath)
    err = detacher.manager.DetachDisk(detacher.plugin, deviceMountPath, devicePath)
    if err != nil {
        return err
    }
    klog.V(3).Infof("rbd: successfully detach device %s", devicePath)
    err = os.Remove(deviceMountPath)
    if err != nil {
        return err
    }
    klog.V(3).Infof("rbd: successfully remove device mount point %s", deviceMountPath)
    return nil
}

@jeremyxu2010 (Contributor) commented Jul 9, 2019

We're hitting this issue as well. The volume is also mounted inside the cAdvisor container, so the /dev/rbdX device is still in use, but we haven't found a fix yet.

My solution:

Give the container a HostToContainer volumeMount:

        volumeMounts:
        ...
        - mountPath: /host
          mountPropagation: HostToContainer
          name: hostrootfs
          readOnly: true

Then create a script that is executed every minute:

/usr/local/bin/unmap_not_used_rbd.sh

#!/bin/bash
# rbd unmap fails with EBUSY (exit status 16) for devices that are still mounted,
# so this sweep only cleans up leftover mappings; devices in active use are left alone.
for v in $(find /dev -name 'rbd*'); do
   /usr/bin/rbd unmap $v
done
chmod +x /usr/local/bin/unmap_not_used_rbd.sh
echo '* * * * * /usr/local/bin/unmap_not_used_rbd.sh >/dev/null 2>&1' >> /var/spool/cron/root