fix azure disk not available issue when device name changed #57549

Merged
merged 2 commits into from Jan 4, 2018

Conversation

@andyzhangx
Member

andyzhangx commented Dec 22, 2017

What this PR does / why we need it:
There is a possibility that the device name (/dev/sd*) will change when attaching/detaching a data disk on an Azure VM; see Troubleshoot Linux VM device name change. We did hit this issue, see customer case.
This PR uses /dev/disk/by-id instead of /dev/sd* for the azure disk; /dev/disk/by-id does not change even when the device name does.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #57444

Special notes for your reviewer:
In a customer case, the customer was unable to use an azure disk in a StatefulSet because /dev/sd* changed after a detach/attach.
We are using /dev/sd* (code is here) to "mount --bind" the k8s path, while /dev/sd* can change when the VM attaches/detaches data disks; see Troubleshoot Linux VM device name change.
I have also checked the related AWS and GCE code; they use /dev/disk/by-id/ rather than /dev/sd*, see the aws code and gce code.

Release note:

fix azure disk not available when device name changed

/sig azure
/assign @rootfs
@karataliu @brendandburns @khenidak

@andyzhangx


Member

andyzhangx commented Dec 22, 2017

/test pull-kubernetes-unit

@andyzhangx


Member

andyzhangx commented Dec 23, 2017

/test pull-kubernetes-unit

@andyzhangx


Member

andyzhangx commented Jan 2, 2018

/assign @brendanburns

@k8s-ci-robot


Contributor

k8s-ci-robot commented Jan 2, 2018

@andyzhangx: GitHub didn't allow me to assign the following users: brendanburns.

Note that only kubernetes members can be assigned.

In response to this:

/assign @brendanburns

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@andyzhangx


Member

andyzhangx commented Jan 2, 2018

@rootfs @brendandburns @khenidak
I have verified this fix again, it works well.
With this PR, it uses /dev/disk/by-id, which is keyed by the serial number of the azure disk. So if we delete a pod of a StatefulSet, the pod with the same azure disk (same serial number) will be added back by the StatefulSet, and there is no device name (/dev/sd*) change.

Without this PR, I tried the same scenario: since we use /dev/sd* to reference a disk, the device name changes (a new device name is created because the original name is still occupied in the OS) if we delete a pod with an azure disk mounted in a StatefulSet. I think that's the reason why AWS and GCE use /dev/disk/by-id instead of /dev/sd*. The customer case is here.

azureuser@k8s-agentpool1-10841272-0:~$ ll /dev/disk/by-id
lrwxrwxrwx 1 root root   9 Jan  2 08:29 scsi-14d53465420202020580428cd5e2218f31feb4deda1af027b -> ../../sde
lrwxrwxrwx 1 root root   9 Jan  2 08:29 scsi-14d53465420202020702e8b5b2e0e32af229dd8eea5486d79 -> ../../sdg
lrwxrwxrwx 1 root root   9 Jan  2 08:31 scsi-14d53465420202020d7a0ae87b4b231f2351bb763568b712f -> ../../sdh
lrwxrwxrwx 1 root root   9 Jan  2 02:45 scsi-14d53465420202020e4afa64c69f28d4e9c12bc1fe9d188bd -> ../../sda
lrwxrwxrwx 1 root root  10 Jan  2 02:45 scsi-14d53465420202020e4afa64c69f28d4e9c12bc1fe9d188bd-part1 -> ../../sda1
lrwxrwxrwx 1 root root   9 Jan  2 08:24 scsi-14d53465420202020f9410353d32e20362b96ff4ed83c2b7b -> ../../sdd
lrwxrwxrwx 1 root root   9 Jan  2 08:13 scsi-14d53465420202020ffa9a4b1ebb67ab78cbbb5fcc357628b -> ../../sdf
lrwxrwxrwx 1 root root   9 Jan  2 02:45 scsi-360022480ad3d7f2adb609b28ef8381fd -> ../../sdb
lrwxrwxrwx 1 root root  10 Jan  2 02:45 scsi-360022480ad3d7f2adb609b28ef8381fd-part1 -> ../../sdb1
lrwxrwxrwx 1 root root   9 Jan  2 02:45 wwn-0x60022480ad3d7f2adb609b28ef8381fd -> ../../sdb
lrwxrwxrwx 1 root root  10 Jan  2 02:45 wwn-0x60022480ad3d7f2adb609b28ef8381fd-part1 -> ../../sdb1
@andyzhangx


Member

andyzhangx commented Jan 2, 2018

Also cc @karataliu since he hit this issue before. The probability of hitting this issue is small for a single k8s cluster, but with large-scale usage of azure disks, this issue will certainly happen.

@rootfs


Member

rootfs commented Jan 2, 2018

@andyzhangx the azure troubleshooting guide doesn't suggest using /dev/disk/by-id; it uses the UUID instead :)

@khenidak


Contributor

khenidak commented Jan 2, 2018

Based on discussion here #57444 (comment) and this doc, it seems that udev rules are the most reliable. Any specific reason we are still falling back to crawling /dev/disk/by-id?

/dev/disk/by-id looks effective in case of boot/ephemeral disks but not in case of data disks.

@andyzhangx


Member

andyzhangx commented Jan 3, 2018

@rootfs Good catch. At first I wanted to use /dev/disk/by-uuid, but during debugging I found that a newly attached disk does not show up in /dev/disk/by-uuid immediately (you need to sleep for a while). I then tried /dev/disk/by-id, which works well; /dev/disk/by-id uses the disk serial number, which is quite similar to a UUID.
@khenidak I pushed a new commit. It uses /dev/disk/azure/scsi1/lun* as the first choice; in some cases (e.g. on CoreOS), /dev/disk/azure/ is not populated, so it falls back to /dev/disk/by-id/ as the second choice. If neither path is found, it uses /dev/sd* as the last resort.
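That three-tier lookup order can be sketched as a small pure function; findDiskPath and the injected exists predicate are hypothetical names for illustration, not the PR's actual code (which walks the filesystem directly):

```go
package main

import "fmt"

// findDiskPath sketches the lookup order: prefer the udev-rule path
// /dev/disk/azure/scsi1/lun<n>, fall back to the serial-number link
// under /dev/disk/by-id/, and only use the unstable /dev/sd* name last.
// exists abstracts the filesystem check so the logic is testable.
func findDiskPath(lun int, byIDLink, sdPath string, exists func(string) bool) string {
	if p := fmt.Sprintf("/dev/disk/azure/scsi1/lun%d", lun); exists(p) {
		return p
	}
	if p := "/dev/disk/by-id/" + byIDLink; exists(p) {
		return p
	}
	return sdPath // last resort: may change on attach/detach or reboot
}

func main() {
	// Simulate a CoreOS-style node where /dev/disk/azure is not populated.
	present := map[string]bool{
		"/dev/disk/by-id/scsi-14d53465420202020580428cd5e2218f31feb4deda1af027b": true,
	}
	exists := func(p string) bool { return present[p] }
	fmt.Println(findDiskPath(0, "scsi-14d53465420202020580428cd5e2218f31feb4deda1af027b", "/dev/sdc", exists))
	// prints the /dev/disk/by-id path, since lun0 is absent
}
```

Ordering the checks this way means a missing udev rule degrades gracefully instead of failing the mount outright.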

@andyzhangx


Member

andyzhangx commented Jan 3, 2018

BTW, this PR is only part of the fix. Consider the following scenario:

  1. an agent node has a few data disks attached
  2. after the agent node restarts, the device names (/dev/sd*) of the data disks change, and the original k8s directory links (see below) become invalid.
/dev/sdl       ext4          53G   55M   50G   1% /var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/b4080534610
/dev/sdl       ext4          53G   55M   50G   1% /var/lib/kubelet/pods/785439eb-ef94-11e7-b5f0-0017fa006d97/volumes/kubernetes.io~azure-disk/pvc-7852f8bc-ef94-11e7-b5f0-0017fa006d97
/dev/sdk       ext4          53G   55M   50G   1% /var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/b3607579573

I am working on another PR to fix this: check whether the directory links are still valid when kubelet starts up.
I found that Azure China reproduces this issue very easily; it's a good debugging environment for me.

-glog.V(12).Infof("azure disk - validating disk %q with sys disk %q", dev[0].Name(), diskName)
-if string(dev[0].Name()) == diskName {
+glog.V(12).Infof("azureDisk - validating disk %q with sys disk %q", devName, diskName)
+if string(devName) == diskName {


@karataliu

karataliu Jan 3, 2018

Contributor

devName is already a string?


@andyzhangx

andyzhangx Jan 3, 2018

Member

removed

@rootfs


Member

rootfs commented Jan 3, 2018

/test pull-kubernetes-cross

@rootfs


Member

rootfs commented Jan 3, 2018

@andyzhangx are any upcoming fixes going into this one?

@khenidak


Contributor

khenidak commented Jan 3, 2018

@andyzhangx Thanks

I am thinking maybe we don't need special handling for CoreOS if we can get the Azure udev rules into the CoreOS Azure images. @colemickens is that possible? I am on Slack the entire day today if you want to talk more.

@andyzhangx


Member

andyzhangx commented Jan 4, 2018

/test pull-kubernetes-cross

@andyzhangx


Member

andyzhangx commented Jan 4, 2018

@khenidak As far as I know, other PaaS offerings, e.g. Cloud Foundry, also do special handling for the case where /dev/disk/azure is not populated, so I would like to keep that handling in the code; this PR will always check /dev/disk/azure/scsi1 as the first choice.
@rootfs that is all for this PR; I would like to send a separate PR to fix the rest.

@k8s-ci-robot


Contributor

k8s-ci-robot commented Jan 4, 2018

@andyzhangx: The following test failed, say /retest to rerun them all:

Test name: pull-kubernetes-cross
Commit: a4786fc (link)
Rerun command: /test pull-kubernetes-cross

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@andyzhangx


Member

andyzhangx commented Jan 4, 2018

pull-kubernetes-cross is broken due to a config issue; this PR only touches Linux.

@rootfs


Member

rootfs commented Jan 4, 2018

@andyzhangx I want to confirm with you: are virtual disk serial numbers unique and immutable?

@andyzhangx


Member

andyzhangx commented Jan 4, 2018

@rootfs yes, they are unique and immutable

@rootfs


Member

rootfs commented Jan 4, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Jan 4, 2018

@k8s-ci-robot


Contributor

k8s-ci-robot commented Jan 4, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx, rootfs

Associated issue: #57444

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@andyzhangx


Member

andyzhangx commented Jan 4, 2018

thanks for the review. I will cherry-pick this PR to all supported versions, from v1.7 to v1.9, and even v1.6 if possible, along with another associated PR.

@k8s-merge-robot


Contributor

k8s-merge-robot commented Jan 4, 2018

Automatic merge from submit-queue (batch tested with PRs 56382, 57549). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-merge-robot k8s-merge-robot merged commit d8680a3 into kubernetes:master Jan 4, 2018

12 of 13 checks passed

pull-kubernetes-cross: Job failed.
Submit Queue: Queued to run github e2e tests a second time.
cla/linuxfoundation: andyzhangx authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-gke-gci: Skipped
pull-kubernetes-e2e-kops-aws: Job succeeded.
pull-kubernetes-kubemark-e2e-gce: Job succeeded.
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-unit: Job succeeded.
pull-kubernetes-verify: Job succeeded.

k8s-merge-robot added a commit that referenced this pull request Jan 6, 2018

Merge pull request #57887 from andyzhangx/automated-cherry-pick-of-#57549-upstream-release-1.8

Automatic merge from submit-queue.

Automated cherry pick of #57549

Cherry pick of #57549 on release-1.8.

#57549: use /dev/disk/by-id instead of /dev/sd* for azure disk

k8s-merge-robot added a commit that referenced this pull request Jan 9, 2018

Merge pull request #57886 from andyzhangx/automated-cherry-pick-of-#57549-upstream-release-1.9

Automatic merge from submit-queue.

Automated cherry pick of #57549

Cherry pick of #57549 on release-1.9.

#57549: use /dev/disk/by-id instead of /dev/sd* for azure disk

roberthbailey pushed a commit to roberthbailey/kubernetes that referenced this pull request Jan 10, 2018

Merge pull request kubernetes#57953 from andyzhangx/azuredisk-remount-fix

Automatic merge from submit-queue (batch tested with PRs 57733, 57613, 57953). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

fix device name change issue for azure disk: add remount logic

**What this PR does / why we need it**:
fix device name change issue for azure disk: add remount logic

According to [Troubleshoot Linux VM device name change](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/troubleshoot-device-names-problems), there is a possibility of a device name change, so when kubelet is restarted, we need to check whether the following two paths are still valid:
1. `/var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/m358246426`: in MountDevice func
2. `/var/lib/kubelet/pods/950f2eb8-d4e7-11e7-bc95-000d3a041274/volumes/kubernetes.io~azure-disk/pvc-67e4e319-d4e7-11e7-bc95-000d3a041274`: in SetUpAt func

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes#57952

**Special notes for your reviewer**:
 this is a corresponding fix to kubernetes#57549. kubernetes#57549 uses '/dev/disk/by-id', and this PR checks whether the mountPath is still valid when kubelet restarts (e.g. after a VM reboot, since the device name may change); if not, it remounts. Remember that '/dev/disk/by-id' is always valid.

**Release note**:

```
fix device name change issue for azure disk: add remount logic
```

k8s-merge-robot added a commit that referenced this pull request Jan 11, 2018

Merge pull request #57892 from andyzhangx/automated-cherry-pick-of-#57549-upstream-release-1.7

Automatic merge from submit-queue.

Automated cherry pick of #57549: use /dev/disk/by-id instead of /dev/sd* for azure disk

Cherry pick of #57549 on release-1.7.

#57549: use /dev/disk/by-id instead of /dev/sd* for azure disk
@peskybp


peskybp commented Jan 19, 2018

Looking through the commit history, it appears that the cherry-picks landed back into 1.8.7. Currently, AKS offers upgrades only to 1.8.2. Is there any expected timeline you can give that would see this fix into an AKS offered version of Kubernetes?

Aside from "rolling our own", is there any other option on the AKS side that would let us get these provisioning changes into our clusters?

@andyzhangx


Member

andyzhangx commented Jan 19, 2018

@peskybp AKS uses the standard upstream k8s binaries, which means the only option is to wait for 1.8.7 support on AKS. Did you hit this issue recently?

@peskybp


peskybp commented Jan 19, 2018

I did hit the issue, using an ACS-based Kubernetes cluster running 1.7.7. We are already in the process of moving over to AKS-based clusters, so I was simply hoping there would be some escalated effort to bring 1.8.7 to AKS, given that the azure-disk provisioner is the default StorageClass.

Not sure if you are looking for any more debug info at this time, but I still have the affected pod active and should be able to get you some more information, so just let me know.
