WaitForAttach failed for azure disk: parsing "/dev/disk/azure/scsi1/lun1": invalid syntax #62540

Closed
andyzhangx opened this issue Apr 13, 2018 · 3 comments · Fixed by #62612
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@andyzhangx
Member

andyzhangx commented Apr 13, 2018

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug


What happened:

Events:
  Type     Reason                  Age                 From                               Message
  ----     ------                  ----                ----                               -------
  Warning  FailedScheduling        25m (x26 over 31m)  default-scheduler                  0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 1 node(s) were not ready, 1 node(s) were out of disk space, 1 node(s) were unschedulable.
  Normal   SuccessfulMountVolume   23m                 kubelet, k8s-agentpool-66825246-0  MountVolume.SetUp succeeded for volume "default-token-cxk4v"
  Normal   SuccessfulAttachVolume  23m                 attachdetach-controller            AttachVolume.Attach succeeded for volume "pvc-f1562ecb-3e5f-11e8-ab6b-000d3af9f967"
  Warning  FailedMount             1m (x11 over 22m)   kubelet, k8s-agentpool-66825246-0  MountVolume.WaitForAttach failed for volume "pvc-f1562ecb-3e5f-11e8-ab6b-000d3af9f967" : azureDisk - Wait for attach expect device path as a lun number, instead got: /dev/disk/azure/scsi1/lun1 (strconv.Atoi: parsing "/dev/disk/azure/scsi1/lun1": invalid syntax)
  Warning  FailedMount             1m (x10 over 21m)   kubelet, k8s-agentpool-66825246-0  Unable to mount volumes for pod "deployment-azuredisk1-68849b4cbf-k9blx_default(6e72ed25-3eb7-11e8-ab6b-000d3af9f967)": timeout expired waiting for volumes to attach or mount for pod "default"/"deployment-azuredisk1-68849b4cbf-k9blx". list of unmounted volumes=[azuredisk]. list of unattached volumes=[azuredisk default-token-cxk4v]

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
This regression comes from the following code change, introduced in v1.10.0:

markDeviceMountedErr := actualStateOfWorld.MarkDeviceAsMounted(
volumeToMount.VolumeName, devicePath, deviceMountPath)

A candidate fix is here (it only works on Linux, so it is not a good solution):
andyzhangx@e287390

This issue is caused by PR: #58177

Environment:

  • Kubernetes version (use kubectl version): v1.10
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

/sig azure
/assign

@andyzhangx
Member Author

andyzhangx commented Apr 15, 2018

Update: the fix above does not work on Windows; I am still investigating a better solution.
This issue is caused by devicePath being updated after WaitForAttach. The related code is here:

markDeviceMountedErr := actualStateOfWorld.MarkDeviceAsMounted(
volumeToMount.VolumeName, devicePath, deviceMountPath)

However, devicePath should not be changed from a LUN to the real device path in azure_dd, because:

  • AttachDisk happens on the master; after AttachDisk, devicePath is initially a LUN (Logical Unit Number).
  • WaitForAttach happens on the agent node:
    • on a Linux node, devicePath could be /dev/disk/azure/scsi1/lunx;
    • on a Windows node, devicePath would be a disk number, so when WaitForAttach is called again, it cannot tell whether the value is a LUN or a disk number (this is the key point).
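
To make the failure mode concrete, here is a minimal sketch of what happens when devicePath is parsed as a LUN on every call. The helper name `lunFromDevicePath` is hypothetical and only imitates the pre-fix assumption; it is not the actual azure_dd code.

```go
package main

import (
	"fmt"
	"strconv"
)

// lunFromDevicePath is a hypothetical helper that assumes devicePath
// always holds a bare LUN, as it does right after AttachDisk.
func lunFromDevicePath(devicePath string) (int, error) {
	lun, err := strconv.Atoi(devicePath)
	if err != nil {
		return -1, fmt.Errorf("expect device path as a lun number, instead got: %s (%v)", devicePath, err)
	}
	return lun, nil
}

func main() {
	inputs := []string{
		"1",                          // first call: LUN set by AttachDisk
		"/dev/disk/azure/scsi1/lun1", // remount on Linux: real device path
		"2",                          // remount on Windows: a disk number, not a LUN
	}
	for _, devicePath := range inputs {
		lun, err := lunFromDevicePath(devicePath)
		fmt.Printf("devicePath=%q lun=%d err=%v\n", devicePath, lun, err)
	}
}
```

The Linux case fails loudly, which is the error in the events above. The Windows case is arguably worse: the disk number parses without error, so it would be silently treated as a LUN.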

@jingxu97
Contributor

@andyzhangx thanks for reporting this problem. Could you submit a PR with your fix so that we can review it? Thanks!

@andyzhangx
Member Author

andyzhangx commented Apr 16, 2018

@jingxu97 I finally realized, by referring to the gce_pd code, that devicePath should no longer be used in the WaitForAttach func. PTAL #62612. The aws_ebs code should have the same issue, so I filed an aws_ebs issue here: #62613

k8s-github-robot pushed a commit that referenced this issue Apr 17, 2018
Automatic merge from submit-queue (batch tested with PRs 62676, 62612). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

fix WaitForAttach failure issue for azure disk

**What this PR does / why we need it**:
Since v1.10, `devicePath` is updated because of the following code change:
https://github.com/kubernetes/kubernetes/blob/568afb4ecca99bc3b54fddc20927b3713369d357/pkg/volume/util/operationexecutor/operation_generator.go#L517-L518

So in v1.10.0, MountVolume.WaitForAttach fails when an azure disk is remounted, with error logs like the following:
```
MountVolume.WaitForAttach failed for volume "pvc-f1562ecb-3e5f-11e8-ab6b-000d3af9f967" : azureDisk - Wait for attach expect device path as a lun number, instead got: /dev/disk/azure/scsi1/lun1 (strconv.Atoi: parsing "/dev/disk/azure/scsi1/lun1": invalid syntax)
  Warning  FailedMount             1m (x10 over 21m)   kubelet, k8s-agentpool-66825246-0  Unable to mount volumes for pod  
```

This PR no longer uses `devicePath`, since it can change. Instead, it uses `diskController.GetDiskLun(diskName, volumeSource.DataDiskURI, nodeName)` to get the disk LUN. This ARM API call costs about 0.12s.
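
The idea can be sketched roughly as below. This is an illustration under assumptions, not the code in this PR: the `lunGetter` interface, the `fakeController` type, the `waitForAttachLun` helper, and the disk and node names are all hypothetical, and the real `GetDiskLun` calls Azure and may have a different signature.

```go
package main

import (
	"errors"
	"fmt"
)

// lunGetter is a hypothetical stand-in for the disk controller.
type lunGetter interface {
	GetDiskLun(diskName, diskURI, nodeName string) (int32, error)
}

// fakeController answers from an in-memory map keyed by disk URI,
// in place of the real ARM API call.
type fakeController struct {
	luns map[string]int32
}

func (f fakeController) GetDiskLun(diskName, diskURI, nodeName string) (int32, error) {
	lun, ok := f.luns[diskURI]
	if !ok {
		return -1, errors.New("disk is not attached: " + diskName)
	}
	return lun, nil
}

// waitForAttachLun resolves the LUN from the controller and ignores
// devicePath, which may hold a LUN, a Linux path, or a Windows disk number.
func waitForAttachLun(c lunGetter, diskName, diskURI, nodeName, devicePath string) (int32, error) {
	_ = devicePath // deliberately unused
	return c.GetDiskLun(diskName, diskURI, nodeName)
}

func main() {
	c := fakeController{luns: map[string]int32{"example-disk-uri": 1}}
	for _, devicePath := range []string{"1", "/dev/disk/azure/scsi1/lun1", "2"} {
		lun, err := waitForAttachLun(c, "example-disk", "example-disk-uri", "example-node", devicePath)
		fmt.Printf("devicePath=%q lun=%d err=%v\n", devicePath, lun, err)
	}
}
```

Because the LUN comes from the controller on every call, the result is the same no matter what `devicePath` was overwritten with, at the cost of one extra lookup per WaitForAttach.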

GCE disks do not have this issue because `devicePath` is not used in the [WaitForAttach func](https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/gce_pd/attacher.go#L133). The AWS disk plugin, however, also uses `devicePath` in its [WaitForAttach func](https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/aws_ebs/attacher.go#L145), so I think there is a potential issue for aws_ebs.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #62540

**Special notes for your reviewer**:
This should be cherry-picked to v1.10.

**Release note**:

```
fix WaitForAttach failure issue for azure disk
```
/assign @feiskyer 
/sig azure

FYI @khenidak