
CSI block volume refactor to fix target path #68635

Merged · 4 commits · Nov 16, 2018

Conversation


@mkimuram mkimuram commented Sep 13, 2018

What this PR does / why we need it:
Fix for adding block volume support to CSI RBD driver

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #68424

Special notes for your reviewer:
/sig storage

This PR fixes the following three issues:
[Issues]

  1. The symlinkPath used in MapDevice doesn't match the one that makeBlockVolumes expects.
  2. (Withdrawn) No need to create a file in either targetBlockFilePath or globalMapPathBlockFile. This is not actually an issue: the CSI spec requires the CO to create an empty file to bind-mount the device on, so the file should be created.
  3. EvalHostSymlinks fails for CSI drivers that don't implement NodeStageVolume.

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 13, 2018
@mkimuram
Contributor Author

This PR was tested with the CSI RBD driver and CSI provisioner below. Please also take a look.

[csi rbd driver]
mkimuram/ceph-csi@e84bc99

[csi provisioner]
kubernetes-csi/external-provisioner@da09b7c

@mkimuram
Contributor Author

@wongma7 @vladimirvivien

@neolit123
Member

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Sep 14, 2018

@vladimirvivien vladimirvivien left a comment


Overall I think this is getting closer. Let me think about the issue of delegating the creation of the block file to the driver. While this works, it means that driver pods will be forced to have privileged access to the node.

specName := m.specName
glog.V(4).Infof(log("blockMapper.GetPodDeviceMapPath [path=%s; name=%s]", path, specName))
return path, specName
fileName := "file"
Member:

Returning the value "file" seems extremely specific and may not be useful for all drivers. Returning the volumeID may be needed during operations such as TearDown.

Contributor Author:

@vladimirvivien

Thank you for your comment.
The volumeID seems to be included in the path as pods/{podUid}/volumeDevices/kubernetes.io~csi/{volumeID}/dev, so I think just naming the file "file" should be fine, because the volumeID can be resolved from the path. However, there is no disadvantage to using the volumeID, so I'm willing to change it; it might be safer.
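The point about recovering the volumeID from the path can be sketched in Go. This is a hypothetical helper, not a real kubelet function; it only assumes the path layout quoted above (…/kubernetes.io~csi/{volumeID}/dev):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// volumeIDFromPath recovers the volumeID from a per-pod device path of the
// form pods/{podUID}/volumeDevices/kubernetes.io~csi/{volumeID}/dev.
// Hypothetical helper for illustration only.
func volumeIDFromPath(path string) string {
	// The volumeID is the name of the parent directory of "dev".
	return filepath.Base(filepath.Dir(path))
}

func main() {
	p := "pods/91a17d03/volumeDevices/kubernetes.io~csi/pvc-8db97c58/dev"
	fmt.Println(volumeIDFromPath(p)) // prints "pvc-8db97c58"
}
```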

@@ -139,18 +139,6 @@ func (m *csiBlockMapper) SetUpDevice() (string, error) {
}
glog.V(4).Info(log("blockMapper.SetupDevice created global device map path successfully [%s]", globalMapPath))

// create block device file
Member:

Delegating the creation of the block device to the driver may be a good approach. However, before this is removed, I think it should be clarified/documented that this step will be delegated to the driver.

Contributor Author:

I will add comments to SetUpDevice and MapDevice before calling NodeStageVolume and NodePublishVolume.

@@ -201,8 +189,8 @@ func (m *csiBlockMapper) MapDevice(devicePath, globalMapPath, volumeMapPath, vol
defer cancel()

globalMapPathBlockFile := devicePath
dir, _ := m.GetPodDeviceMapPath()
targetBlockFilePath := filepath.Join(dir, "file")
dir, file := m.GetPodDeviceMapPath()
Member:

Again, returning "file" will not work properly for drivers that need the volumeID during TearDown.

Contributor Author:

I will rename file to volumeID when GetPodDeviceMapPath is changed to return volumeID.

@vladimirvivien vladimirvivien (Member) left a comment

This portion needs careful attention so as not to affect other drivers.

@@ -887,24 +888,26 @@ func (og *operationGenerator) GenerateMapVolumeFunc(
return volumeToMount.GenerateError("MapVolume failed", fmt.Errorf("Device path of the volume is empty"))
}

// When kubelet is containerized, devicePath may be a symlink at a place unavailable to
// kubelet, so evaluate it on the host and expect that it links to a device in /dev,
// Map device to global and pod device map path
Member:

Changes here will affect how all block volume devices work including in-tree drivers. Any changes here should be tested to make sure other drivers are working as before.

@mkimuram mkimuram (Contributor Author) commented Sep 17, 2018

As for in-tree drivers, I added e2e tests for block volume, so they will be tested there. (The block volume feature missed going beta in v1.12, though, so it won't be tested automatically by CI.)
In addition, this won't be tested in combination with a containerized kubelet, so we may need to find a way to test in a containerized-kubelet environment. Let me think about how to test it.

As for CSI drivers, I'm working on applying the same test suites that in-tree drivers use to CSI drivers, here, which will get all CSI drivers tested for block volume. (Though there are no block-volume-capable CSI drivers there yet.)

@mkimuram

mkimuram commented Sep 20, 2018

@vladimirvivien

Returning the volumeID may be needed during operations such as tearDown.

I looked through the teardown logic for block volume again, as you advised. I came to the conclusion that it would be better to avoid delegating the creation of the symlinks at the paths provided by GetPodDeviceMapPath and GetGlobalMapPath to each CSI driver.
Instead, the CSI driver should symlink or bind-mount its actual device to other paths, stagingPath and publishPath, in SetUpDevice through NodeStageVolume and NodePublishVolume; csiBlockMapper should then create the symlinks at the paths provided by GetPodDeviceMapPath and GetGlobalMapPath in MapDevice, as existing in-tree drivers do.
That way, we don't need to touch much of the existing block volume code.

I implemented the above in another commit on a separate branch. PTAL

A summary of the conceptual difference is below.
(I hope it helps you review the code.)

[Previous Implementation]

  • Map:
  mapVolumeFunc
    SetUpDevice
      NodeStageVolume   (Attach device to node and create symlink to the path provided by GetGlobalMapPath)
    MapDevice
      NodePublishVolume (Create symlink to the path provided by GetPodDeviceMapPath)
  • Unmap:
  unmapVolumeFunc
    og.blkUtil.UnmapDevice (Delete symlink from the path provided by GetPodDeviceMapPath)
    og.blkUtil.UnmapDevice (Delete symlink from the path provided by GetGlobalMapPath)
  unmapDeviceFunc
    TearDownDevice
      NodeUnstageVolume
      NodeUnpublishVolume  (Detach device from node)

[New Implementation]

  • Map:
  mapVolumeFunc
    SetUpDevice
      NodeStageVolume       (Attach device to node and create symlink to the stagingPath)
      NodePublishVolume     (Create symlink to the publishPath)
    MapDevice
      ioutil.MapBlockVolume (Create symlinks to the paths provided by GetGlobalMapPath and GetPodDeviceMapPath)
  • Unmap:
  unmapVolumeFunc
    og.blkUtil.UnmapDevice (Delete symlink from the path provided by GetPodDeviceMapPath)
    og.blkUtil.UnmapDevice (Delete symlink from the path provided by GetGlobalMapPath)
  unmapDeviceFunc
    TearDownDevice
      NodeUnpublishVolume  (Delete symlink from publishPath)
      NodeUnstageVolume    (Delete symlink from stagingPath and detach device from node)

@vladimirvivien

@mkimuram I am reviewing your proposed changes today. I will post feedback here and other PR where appropriate.

@vladimirvivien

@mkimuram
Ok, overall the second approach looks clean and workable. The code organization looks great. I do have a couple of things I want to review.

Other than that, I think this may be workable.

Good progress.

@mkimuram

@vladimirvivien

Thank you for your comment.

Ok, overall the second approach looks clean and workable. The code organization looks great. I do have a couple of things I want to review.

I will fix based on your comment and make PR.

The following change from volumeToDetach.VolumeName -> volumeToDetach.VolumeSpec.Name(): if these are not equivalent and equal, it should not be changed. mkimuram/kubernetes@1e59179#diff-450e811a4953f760ff1594ede8b2037eR1063

Actually, as for the changes in operation_generator.go, I found another two issues below while testing with my RBD prototype code, and these are the fixes for them.

  1. The first argument to NewBlockVolumeUnmapper is inconsistent between GenerateUnmapVolumeFunc and GenerateUnmapDeviceFunc; as a result, loadVolumeData in NewBlockVolumeUnmapper fails.

--> As specName seems to be expected there, I fixed it accordingly.

  2. TearDownDevice fails because the device is in use, which is caused by the descriptor lock.

--> As the descriptor lock seems to need to be released before TearDownDevice, I fixed it accordingly.

Anyway, we need to check and test these carefully to make sure they don't affect any existing drivers.
(Without them, the changes would be only in csi_block.go, and it would be easier to decide to apply them.)

Retrieve the Attachment object in SetUpDevice and pass the attachment to methods stageVolumeForBlock and publishVolumeForBlock. That way the attachment lookup does not occur twice (once in each method call). https://github.com/mkimuram/kubernetes/blob/1e591792edd41726c26b5355a1888becaa746136/pkg/volume/csi/csi_block.go#L272

I will fix it accordingly. Thank you for pointing it out.
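The suggested single-lookup refactor might look roughly like this. All names here (getAttachment, setUpDevice, the attachment struct) are stand-ins, not the real csi_block.go code; the counter only demonstrates that the attachment is fetched once and shared by both calls:

```go
package main

import "fmt"

// attachment stands in for the VolumeAttachment object.
type attachment struct{ name string }

var lookups int

// getAttachment stands in for an API-server GET of the VolumeAttachment.
func getAttachment(volID string) *attachment {
	lookups++
	return &attachment{name: "csi-" + volID}
}

func stageVolumeForBlock(a *attachment) string { return "/staging/" + a.name }

func publishVolumeForBlock(a *attachment, stagingPath string) string {
	return "/publish/" + a.name
}

// setUpDevice fetches the attachment once and passes it down, so the
// lookup does not happen in each helper.
func setUpDevice(volID string) string {
	a := getAttachment(volID) // single lookup
	staging := stageVolumeForBlock(a)
	return publishVolumeForBlock(a, staging)
}

func main() {
	fmt.Println(setUpDevice("vol1"), lookups)
}
```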

@vladimirvivien

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 23, 2018
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 24, 2018
@mkimuram

@vladimirvivien

I've updated the commit to the second version and added a unit test for it. PTAL

Please also confirm the log below.

  1. Console log
# kubectl create -f sample/pod-block.yaml
# kubectl describe pod csirbd-block-demo-pod

<snip>

Events:                                                                                                         
  Type    Reason                  Age   From                     Message                                        
  ----    ------                  ----  ----                     -------                                        
  Normal  Scheduled               25s   default-scheduler        Successfully assigned default/csirbd-block-demo-pod to 127.0.0.1
  Normal  SuccessfulAttachVolume  25s   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-8db97c58-c00a-11e8-b6ef-525400b854f0"
  Normal  SuccessfulMountVolume   9s    kubelet, 127.0.0.1       MapVolume.MapDevice succeeded for volume "pvc-8db97c58-c00a-11e8-b6ef-525400b854f0" globalMapPath "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0/dev"
  Normal  SuccessfulMountVolume   9s    kubelet, 127.0.0.1       MapVolume.MapDevice succeeded for volume "pvc-8db97c58-c00a-11e8-b6ef-525400b854f0" volumeMapPath "/var/lib/kubelet/pods/91a17d03-c00a-11e8-b6ef-525400b854f0/volumeDevices/kubernetes.io~csi"
  Normal  Pulling                 7s    kubelet, 127.0.0.1       pulling image "nginx"
  Normal  Pulled                  4s    kubelet, 127.0.0.1       Successfully pulled image "nginx"
  Normal  Created                 4s    kubelet, 127.0.0.1       Created container
  Normal  Started                 4s    kubelet, 127.0.0.1       Started container
# kubectl delete -f sample/pod-block.yaml
  2. Check that the mapped device is accessible inside the pod
# kubectl exec -it csirbd-block-demo-pod bash 
root@csirbd-block-demo-pod:/# test -b /mnt/block
root@csirbd-block-demo-pod:/# echo $?
0
root@csirbd-block-demo-pod:/# test -d /mnt/block                                                                
root@csirbd-block-demo-pod:/# echo $?                                                                           
1
# exit
exit
  3. Path (Before pod deletion)
    3.1 symlink to the path provided by GetGlobalMapPath
# ls -lR /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0/
/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0/:
total 8
drwxr-x---. 2 root root 4096 Sep 24 15:00 data
drwxr-x---. 2 root root 4096 Sep 24 15:00 dev

/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0/data:
total 4
-rw-r--r--. 1 root root 257 Sep 24 15:00 vol_data.json

/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0/dev:
total 0
lrwxrwxrwx. 1 root root 9 Sep 24 15:00 91a17d03-c00a-11e8-b6ef-525400b854f0 -> /dev/rbd0

3.2 symlink to the path provided by getPublishPath

# ls -lR /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/                
/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/:
total 0
lrwxrwxrwx. 1 root root 9 Sep 24 15:00 pvc-8db97c58-c00a-11e8-b6ef-525400b854f0 -> /dev/rbd0

3.3 symlink to the path provided by GetPodDeviceMapPath

# ls -lR /var/lib/kubelet/pods/91a17d03-c00a-11e8-b6ef-525400b854f0/volumeDevices/kubernetes.io~csi/
/var/lib/kubelet/pods/91a17d03-c00a-11e8-b6ef-525400b854f0/volumeDevices/kubernetes.io~csi/:
total 0
lrwxrwxrwx. 1 root root 9 Sep 24 15:00 pvc-8db97c58-c00a-11e8-b6ef-525400b854f0 -> /dev/rbd0
  4. Path (After pod deletion)
    4.1 symlink to the path provided by GetGlobalMapPath
# ls -lR /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0/
/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0/:
total 4
drwxr-x---. 2 root root 4096 Sep 24 15:00 data

/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0/data:
total 4
-rw-r--r--. 1 root root 257 Sep 24 15:00 vol_data.json

4.2 symlink to the path provided by getPublishPath

# ls -lR /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/  
/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/:
total 0

4.3 symlink to the path provided by GetPodDeviceMapPath

# ls -lR /var/lib/kubelet/pods/91a17d03-c00a-11e8-b6ef-525400b854f0/
ls: cannot access '/var/lib/kubelet/pods/91a17d03-c00a-11e8-b6ef-525400b854f0/': No such file or directory
  5. Log:
    5.1 Map:
# less /tmp/kubelet.log 

<snip>

I0924 15:00:41.926647  196280 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-8db97c58-c00a-11e8-b6ef-525400b854f0" (UniqueName: "kubernetes.io/csi/csi-rbdplugin^csi-rbd-8dd52e4a-c00a-11e8-b3eb-525400b854f0") pod "csirbd-block-demo-pod" (UID: "91a17d03-c00a-11e8-b6ef-525400b854f0")
I0924 15:00:41.931329  196280 operation_generator.go:1206] Controller attach succeeded for volume "pvc-8db97c58-c00a-11e8-b6ef-525400b854f0" (UniqueName: "kubernetes.io/csi/csi-rbdplugin^csi-rbd-8dd52e4a-c00a-11e8-b3eb-525400b854f0") pod "csirbd-block-demo-pod" (UID: "91a17d03-c00a-11e8-b6ef-525400b854f0") device path: "csi-665e3df0874c62c78bffd36eea417afd2731d4af771ad7ad6a1fe364f5c6a119"
I0924 15:00:42.027941  196280 reconciler.go:252] operationExecutor.MountVolume started for volume "pvc-8db97c58-c00a-11e8-b6ef-525400b854f0" (UniqueName: "kubernetes.io/csi/csi-rbdplugin^csi-rbd-8dd52e4a-c00a-11e8-b3eb-525400b854f0") pod "csirbd-block-demo-pod" (UID: "91a17d03-c00a-11e8-b6ef-525400b854f0")
I0924 15:00:42.028176  196280 operation_generator.go:862] MapVolume.WaitForAttach entering for volume "pvc-8db97c58-c00a-11e8-b6ef-525400b854f0" (UniqueName: "kubernetes.io/csi/csi-rbdplugin^csi-rbd-8dd52e4a-c00a-11e8-b3eb-525400b854f0") pod "csirbd-block-demo-pod" (UID: "91a17d03-c00a-11e8-b6ef-525400b854f0") DevicePath "csi-665e3df0874c62c78bffd36eea417afd2731d4af771ad7ad6a1fe364f5c6a119"
I0924 15:00:42.032745  196280 operation_generator.go:871] MapVolume.WaitForAttach succeeded for volume "pvc-8db97c58-c00a-11e8-b6ef-525400b854f0" (UniqueName: "kubernetes.io/csi/csi-rbdplugin^csi-rbd-8dd52e4a-c00a-11e8-b3eb-525400b854f0") pod "csirbd-block-demo-pod" (UID: "91a17d03-c00a-11e8-b6ef-525400b854f0") DevicePath "csi-665e3df0874c62c78bffd36eea417afd2731d4af771ad7ad6a1fe364f5c6a119"
I0924 15:00:42.037884  196280 csi_block.go:103] kubernetes.io/csi: blockMapper.stageVolumeForBlock STAGE_UNSTAGE_VOLUME capability not set. Skipping MountDevice...
I0924 15:00:42.451335  196280 prober.go:118] Liveness probe for "kube-dns-596fbb8fbd-pxzct_kube-system(4fe395a1-c00a-11e8-b6ef-525400b854f0):sidecar" succeeded
I0924 15:00:42.576578  196280 operation_generator.go:935] MapVolume.MapDevice succeeded for volume "pvc-8db97c58-c00a-11e8-b6ef-525400b854f0" (UniqueName: "kubernetes.io/csi/csi-rbdplugin^csi-rbd-8dd52e4a-c00a-11e8-b3eb-525400b854f0") pod "csirbd-block-demo-pod" (UID: "91a17d03-c00a-11e8-b6ef-525400b854f0") volumeMapPath "/var/lib/kubelet/pods/91a17d03-c00a-11e8-b6ef-525400b854f0/volumeDevices/kubernetes.io~csi"
I0924 15:00:42.576694  196280 server.go:460] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"csirbd-block-demo-pod", UID:"91a17d03-c00a-11e8-b6ef-525400b854f0", APIVersion:"v1", ResourceVersion:"440", FieldPath:""}): type: 'Normal' reason: 'SuccessfulMountVolume' MapVolume.MapDevice succeeded for volume "pvc-8db97c58-c00a-11e8-b6ef-525400b854f0" volumeMapPath "/var/lib/kubelet/pods/91a17d03-c00a-11e8-b6ef-525400b854f0/volumeDevices/kubernetes.io~csi"
I0924 15:00:42.576752  196280 server.go:460] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"csirbd-block-demo-pod", UID:"91a17d03-c00a-11e8-b6ef-525400b854f0", APIVersion:"v1", ResourceVersion:"440", FieldPath:""}): type: 'Normal' reason: 'SuccessfulMountVolume' MapVolume.MapDevice succeeded for volume "pvc-8db97c58-c00a-11e8-b6ef-525400b854f0" globalMapPath "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0/dev"
I0924 15:00:42.759437  196280 volume_manager.go:383] All volumes are attached and mounted for pod "csirbd-block-demo-pod_default(91a17d03-c00a-11e8-b6ef-525400b854f0)"

5.2 Unmap:

# less /tmp/kubelet.log 

<snip>

I0924 15:05:13.753726  196280 reconciler.go:181] operationExecutor.UnmountVolume started for volume "mypvc" (UniqueName: "kubernetes.io/csi/csi-rbdplugin^csi-rbd-8dd52e4a-c00a-11e8-b3eb-525400b854f0") pod "91a17d03-c00a-11e8-b6ef-525400b854f0" (UID: "91a17d03-c00a-11e8-b6ef-525400b854f0")
I0924 15:05:13.754407  196280 operation_generator.go:1010] UnmapVolume succeeded for volume "kubernetes.io/csi/csi-rbdplugin^csi-rbd-8dd52e4a-c00a-11e8-b3eb-525400b854f0" (OuterVolumeSpecName: "mypvc") pod "91a17d03-c00a-11e8-b6ef-525400b854f0" (UID: "91a17d03-c00a-11e8-b6ef-525400b854f0"). InnerVolumeSpecName "pvc-8db97c58-c00a-11e8-b6ef-525400b854f0". PluginName "kubernetes.io/csi", VolumeGidValue ""
E0924 15:05:13.754553  196280 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0/dev/91a17d03-c00a-11e8-b6ef-525400b854f0 when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0/dev/91a17d03-c00a-11e8-b6ef-525400b854f0": REMOVE
I0924 15:05:13.768872  196280 operation_generator.go:698] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/91a17d03-c00a-11e8-b6ef-525400b854f0-default-token-ltpvh" (OuterVolumeSpecName: "default-token-ltpvh") pod "91a17d03-c00a-11e8-b6ef-525400b854f0" (UID: "91a17d03-c00a-11e8-b6ef-525400b854f0"). InnerVolumeSpecName "default-token-ltpvh". PluginName "kubernetes.io/secret", VolumeGidValue ""
I0924 15:05:13.854151  196280 reconciler.go:301] Volume detached for volume "default-token-ltpvh" (UniqueName: "kubernetes.io/secret/91a17d03-c00a-11e8-b6ef-525400b854f0-default-token-ltpvh") on node "127.0.0.1" DevicePath ""
I0924 15:05:13.854425  196280 reconciler.go:294] operationExecutor.UnmountDevice started for volume "pvc-8db97c58-c00a-11e8-b6ef-525400b854f0" (UniqueName: "kubernetes.io/csi/csi-rbdplugin^csi-rbd-8dd52e4a-c00a-11e8-b3eb-525400b854f0") on node "127.0.0.1" 
E0924 15:05:13.893282  196280 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0 when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0": REMOVE
E0924 15:05:14.117642  196280 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0 when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0": REMOVE
E0924 15:05:14.117806  196280 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0_deleting when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0_deleting": REMOVE
I0924 15:05:14.118470  196280 operation_generator.go:1435] The path isn't device path or doesn't exist. Skip checking device path: /dev/rbd0
I0924 15:05:14.118582  196280 operation_generator.go:1132] UnmapDevice succeeded for volume "pvc-8db97c58-c00a-11e8-b6ef-525400b854f0" (UniqueName: "kubernetes.io/csi/csi-rbdplugin^csi-rbd-8dd52e4a-c00a-11e8-b3eb-525400b854f0") on node "127.0.0.1" 
E0924 15:05:14.118793  196280 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0/dev when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0/dev": REMOVE
E0924 15:05:14.118874  196280 plugin_watcher.go:120] error could not find plugin for deleted file /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0/dev when handling delete event: "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8db97c58-c00a-11e8-b6ef-525400b854f0/dev": REMOVE
I0924 15:05:14.155345  196280 reconciler.go:301] Volume detached for volume "pvc-8db97c58-c00a-11e8-b6ef-525400b854f0" (UniqueName: "kubernetes.io/csi/csi-rbdplugin^csi-rbd-8dd52e4a-c00a-11e8-b3eb-525400b854f0") on node "127.0.0.1" DevicePath ""

@mkimuram

/retest

@mkimuram

@vladimirvivien

As for the change in operation_generator.go, I think the following two points need to be confirmed:

  • (a) Whether the first argument to NewBlockVolumeUnmapper in GenerateUnmapDeviceFunc can be changed safely,
  • (b) Whether the descriptor lock can be released safely before calling TearDownDevice and RemoveMapPath

As a result of my investigation, it should be safe for both (a) and (b). (Also, I ran the e2e block tests for iscsi and rbd, and they passed.) Let me explain the reasoning.

There are two types of plugins:

  • (1) plugins that do nothing in TearDownDevice
    • cinder
    • local
    • gce
    • aws
    • azure
  • (2) plugins that do something in TearDownDevice
    • iscsi
    • rbd
    • fc

And TearDownDevice is only called from GenerateUnmapDeviceFunc in operation_generator.go.
Also, TearDownDevice is the only unmapper method that is called in GenerateUnmapDeviceFunc.

Therefore:

  • For (1):
    • The change in how the unmapper is created in GenerateUnmapDeviceFunc won't affect anything, because these plugins do nothing with the unmapper created there. ... (a) is OK for (1)
    • The change in the order of deleting the descriptor lock and calling TearDownDevice won't affect them, because they do nothing in TearDownDevice. ... (b) is OK for (1)
  • For (2):
    • None of these plugins use the volumeName passed directly from the unmapper; instead they use mapPath to decide which device to detach. ... (a) is OK for (2)
      iscsi
      rbd
      fc
    • All of these plugins release the descriptor lock inside TearDownDevice, so it is safe to release it outside,
      in GenerateUnmapDeviceFunc. ... (b) is OK for (2)
      iscsi
      rbd: TearDownDevice, DetachBlockDisk
      fc: TearDownDevice, DetachBlockFCDisk

For (b), we could keep the descriptor lock handling as-is in GenerateUnmapDeviceFunc and implement the same behavior as (2) in csi_block.go; however, I couldn't find a good reason to do so, except keeping operation_generator.go as close to as-is as possible. If it would be better to keep it as-is, I will change the code for (b) accordingly.
What do you think?
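The proposed reordering in GenerateUnmapDeviceFunc can be illustrated with a toy step recorder. The step names are illustrative, not the real operation_generator API; the point is only the order (descriptor lock released first, so the device is no longer "in use" when TearDownDevice runs):

```go
package main

import "fmt"

var steps []string

// The three functions below stand in for the real unmap-device steps.
func removeLoopDevice() { steps = append(steps, "release-descriptor-lock") }
func tearDownDevice()   { steps = append(steps, "teardown-device") }
func removeMapPath()    { steps = append(steps, "remove-map-path") }

// unmapDeviceFunc sketches the fixed ordering: release the descriptor
// lock (loopback device) before TearDownDevice, so TearDownDevice does
// not fail with "device in use".
func unmapDeviceFunc() {
	removeLoopDevice() // must come first
	tearDownDevice()
	removeMapPath()
}

func main() {
	unmapDeviceFunc()
	fmt.Println(steps)
}
```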

// Call NodePublishVolume
publishPath, err := m.publishVolumeForBlock(csiSource, attachment, stagingPath)
if err != nil {
return "", err
Contributor:

should the volume be unstaged here?

Contributor Author:

@rootfs

Thank you for your comment.

IIUC, the reconciler will keep calling MountVolume to make the actual state of the world match the desired state, even when MountVolume fails in SetUpDevice. So even if we call NodeUnstageVolume here to unstage the volume that failed to publish, further MountVolume attempts would stage it again. (Then MountVolume may fail to publish again.) In addition, if NodeUnstageVolume for the Nth MountVolume is called after NodeStageVolume but before NodePublishVolume for the N+1th MountVolume, NodePublishVolume would be called for an unstaged volume, which violates the CSI spec. Therefore, just calling NodeUnstageVolume here won't roll back the state properly.

However, this raises the question of whether it is safe to keep the volume staged after a failure to publish. At a glance it seems safe from a data-corruption viewpoint, since we don't directly touch the staged device or the staging path.
The worst-case scenarios I can see are:

  • If the PVC is deleted before the publish-failed pod is deleted, the device might be deleted while it is attached to a node,
  • If the PVC is used by another pod scheduled on a different node before the publish-failed pod is deleted, the device might be attached to multiple nodes.

@vladimirvivien

Is my understanding correct? And are there any issues you can think of with not unstaging the volume there?

@rootfs

rootfs commented Sep 26, 2018

Sorry, I'm late to the review. I haven't tested it yet, but I believe it makes sense to release the loopback device before detaching the underlying device.

I see this PR does two things: a CSI block refactoring and the operation generator reordering. What about breaking them up into separate PRs?

@bswartz bswartz (Contributor) left a comment

In the 11/14 CSI meeting we agreed that staging paths should always be directories, even for blocks. This PR clarifies the wording: container-storage-interface/spec#335
To conform to the spec, this PR should be updated to create a directory for the staging path.

// create block device file
blockFile, err := os.OpenFile(globalMapPathBlockFile, os.O_CREATE|os.O_RDWR, 0750)
// create an empty file on staging path where block device is bind mounted to
stagingPathFile, err := os.OpenFile(stagingPath, os.O_CREATE|os.O_RDWR, 0750)
Contributor:

Based on the new wording in the CSI spec, this should be a directory always.

@mkimuram

@bswartz

Changed stagingPath to a directory and also changed the comments. PTAL

@bswartz

bswartz commented Nov 15, 2018

@mkimuram We finally settled on another spec change which is that the CO should not create the 0-byte file during NodePublishVolume() either. It will be the SP's responsibility to create and delete the file at target_path. The CO must create the parent directory of target_path.

@mkimuram

mkimuram commented Nov 15, 2018

@bswartz

We finally settled on another spec change which is that the CO should not create the 0-byte file during NodePublishVolume() either. It will be the SP's responsibility to create and delete the file at target_path. The CO must create the parent directory of target_path.

Updated. PTAL

I've updated the CSI RBD driver to create/delete the target path, and PV deletion started to fail again.
However, this is a driver-side issue, not a k8s-side issue, so I'm sharing the current version.
I will look into the driver issue tomorrow.

The updated version above works well now. (edit: on Nov 15, 2018)

@bswartz

bswartz commented Nov 15, 2018

I updated my test driver to match the spec, and ran it against this patch, and it worked great. That's all I have time for tonight. If there's time tomorrow I'll take a look at the code changes.

@bswartz

bswartz commented Nov 15, 2018

/lgtm

@k8s-ci-robot

@bswartz: changing LGTM is restricted to assignees, and only kubernetes/kubernetes repo collaborators may be assigned issues.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mkimuram
Contributor Author

@bswartz

Thank you for your review.
I also fixed the rbd driver and it now works well. (I will share it soon.)

@bswartz @saad-ali @vladimirvivien

One last thing that I would like to confirm is that the target_paths below are secure and safe enough with the current implementation. I think this point was discussed in the CSI meeting and was left up to the CO.

  • target_path for staging: plugins/kubernetes.io/csi/volumeDevices/staging/{volumeID}
  • target_path for publish: plugins/kubernetes.io/csi/volumeDevices/publish/{volumeID}

I assume the SP should already have permission on their parent directory, so from a security point of view there will be no difference even if we choose a deeper path. From a safety point of view, should we separate them per CSI driver (to protect drivers from each other, at least), or should we create one more directory per volume, like {volumeID}/{volumeID}?

Also, the current implementation assumes that the driver mounts that directory with "Bidirectional" mount propagation, so we still have a chance to choose a completely different directory from plugins/kubernetes.io/csi/volumeDevices/, if needed.

@saad-ali
Member

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 15, 2018
@saad-ali
Member

saad-ali commented Nov 15, 2018

/approve cancel

target_path for publish: plugins/kubernetes.io/csi/volumeDevices/publish/{volumeID}

staging_target_path can be volume-specific, but target_path for publish has to be pod-specific.

Why not align this with what mount does?

@k8s-ci-robot k8s-ci-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 15, 2018
}

// getPublishPath returns a publish path for a file (on the node) that should be used on NodePublishVolume/NodeUnpublishVolume
// Example: plugins/kubernetes.io/csi/volumeDevices/publish/{volumeID}
Member

If multiple containers use this, it will collide.

Contributor Author

@saad-ali

Thank you for your comment.

Is my understanding right that you are suggesting we should use spec.PersistentVolume.Name instead of kstrings.EscapeQualifiedNameForDisk(m.specName), as is done in makeDeviceMountPath for filesystem volumes?

Then, it will be like as below:

  • block: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/{pvname}
    /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/{pvname}
  • filesystem: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/{pvname}/globalmount

m.specName is originally used there because the in-tree block volume code uses it for GlobalMapPath and PodDeviceMapPath.

  • GlobalMapPath : /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/{volumeID}/dev
  • PodDeviceMapPath: /var/lib/kubelet/pods/{podUid}/volumeDevices/kubernetes.io~csi/{volumeID}

However, as explained in #68635 (comment), for block volumes we chose not to let the CSI driver handle GlobalMapPath and PodDeviceMapPath directly; instead, the CSI driver makes the block volume visible at another path (target_path), and then k8s (ioutil.MapBlockVolume) makes target_path visible from GlobalMapPath and PodDeviceMapPath. So, there is no restriction on using m.specName for target_path and staging_target_path.

@bswartz @vladimirvivien
Any concern?

Contributor Author

@saad-ali

I checked the log and code again and found out that the block code already uses the same ID as the filesystem code. I mean, {pvname} should be the same as {volumeID}, like pvc-a618a282-e929-11e8-bb66-525400b854f0, so it should not collide.

Is it ok to just update the comment to make them look the same?

Member

Would spec.PersistentVolume.Name collide if the PVC is used by multiple pods? Ideally the target_path should be unique per pod (have the pod ID in it), right? Otherwise, if 2 pods use the same volume on the same node, each pod will have its volume setup code running in parallel.

I'm basing this on my understanding of mount, so it may well be different for block. If so I'd like to understand why?

@mkimuram
Contributor Author

@saad-ali @bswartz @vladimirvivien

Fixed the comment. PTAL

@mkimuram
Contributor Author

/retest

@AishSundar
Contributor

@saad-ali @bswartz @vladimirvivien is this fix good to go?

@saad-ali
Member

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mkimuram, saad-ali, vladimirvivien

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 16, 2018
@saad-ali
Member

/lgtm

Based on verification from @bswartz

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 16, 2018
@k8s-ci-robot k8s-ci-robot merged commit cde4c9e into kubernetes:master Nov 16, 2018
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Successfully merging this pull request may close these issues.

Issues for supporting block volume with CSI RBD driver
10 participants