fix azure disk attachment error on Linux #70002

Merged
merged 1 commit into from
Oct 24, 2018

Conversation

andyzhangx
Member

@andyzhangx andyzhangx commented Oct 19, 2018

What type of PR is this?

/kind bug

What this PR does / why we need it:
This PR fixes a drawback introduced by PR #62612.
The getDiskController call below reads from vmCache, which may lead to the error shown after it. Because the vmCache refresh period is 1 min, MountVolume.WaitForAttach may take 1 min, which is too long.

diskController, err := getDiskController(a.plugin.host)

MountVolume.WaitForAttach failed for volume "pvc-12b458f4-c23f-11e8-8d27-46799c22b7c6" : Cannot find Lun for disk kubernetes-dynamic-pvc-12b458f4-c23f-11e8-8d27-46799c22b7c6

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #69262

Special notes for your reviewer:
This PR adds a new func getDiskLUN that gets the disk LUN number from a deviceInfo. deviceInfo can be a LUN number or a device path, e.g. /dev/disk/azure/scsi1/lun2.
On Windows, deviceInfo can be either a LUN number or a disk number. Since both are plain numbers, we cannot tell which one it is, so we keep the original code logic there.
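
As a rough sketch of the Linux-side idea, the function below extracts a LUN from a deviceInfo that is either a bare number or a device path ending in `/lunN`. It is not the code from this PR: the name `parseDiskLUN`, the `/lun` suffix search, and the error messages are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseDiskLUN is a hypothetical stand-in for getDiskLUN, written for illustration.
// It accepts either a bare LUN number ("5") or a device path such as
// /dev/disk/azure/scsi1/lun2 and returns the LUN as an int32.
func parseDiskLUN(deviceInfo string) (int32, error) {
	diskLUN := deviceInfo
	// For a device path, take whatever follows the last "/lun".
	if idx := strings.LastIndex(deviceInfo, "/lun"); idx >= 0 {
		diskLUN = deviceInfo[idx+len("/lun"):]
	}
	// strconv.Atoi rejects anything that is not a number, e.g. "/dev/sdc" or "".
	lun, err := strconv.Atoi(diskLUN)
	if err != nil {
		return -1, fmt.Errorf("cannot parse LUN from deviceInfo %q: %v", deviceInfo, err)
	}
	if lun < 0 {
		return -1, fmt.Errorf("negative LUN %d in deviceInfo %q", lun, deviceInfo)
	}
	return int32(lun), nil
}
```

Calling `parseDiskLUN("/dev/disk/azure/scsi1/lun2")` yields 2 and `parseDiskLUN("5")` yields 5, while a path without a LUN suffix such as `/dev/sdc` returns an error.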

Does this PR introduce a user-facing change?:

NONE

Release note:

fix azure disk attachment error on Linux

/sig azure
/assign @feiskyer @khenidak
cc @brendandburns

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. sig/azure cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 19, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. sig/storage Categorizes an issue or PR as relevant to SIG Storage. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 19, 2018
}
glog.V(5).Infof("azureDisk - WaitForAttach: GetDiskLun succeeded, got lun(%v)", lun)

Member

Could we add a V(2) log here with the device and LUN number? It would help with troubleshooting.

Member Author

already addressed in #70012

func getDiskLUN(deviceInfo string) (int32, error) {
	var diskLUN string
	if len(deviceInfo) <= 2 {
		diskLUN = deviceInfo
Member

Check whether it is a number or not?

Member

Is there any possibility of supporting lun number >= 100 in the future?

Member Author

That is checked further down:

lun, err := strconv.Atoi(diskLUN)

The Azure platform only supports up to 64 disks.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 19, 2018
use new getDiskLUN func in linux
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 22, 2018
@andyzhangx
Member Author

ping @feiskyer

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 23, 2018
Member

@feiskyer feiskyer left a comment

/lgtm

@andyzhangx
Member Author

/test pull-kubernetes-verify

@antoineco
Contributor

In spite of some warnings during Pod relocation, I can confirm all volumes eventually get re-attached successfully after 2-3 minutes (Kubernetes 1.10, custom build including this patch).

Related Pod events:

  Warning  FailedAttachVolume      4m               attachdetach-controller                        Multi-Attach error for volume "pvc-1234-5678" Volume is already exclusively attached to one node and can't be attached to another
  Warning  FailedAttachVolume      4m (x2 over 4m)  attachdetach-controller                        AttachVolume.Attach failed for volume "pvc-1234-5678" : Attach volume "k8s-mstr-myclustersub2-d1-kub--pvc-1234-5678" to instance "/subscriptions/abcd-efgh/resourceGroups/myclustersub2-d1/providers/Microsoft.Compute/virtualMachineScaleSets/k8s-mycluster-33570458-vmss/virtualMachines/7" failed with compute.VirtualMachineScaleSetVMsClient#Update: Failure sending request: StatusCode=409 -- Original Error: failed request: autorest/azure: Service returned an error. Status=<nil> Code="ConflictingUserInput" Message="Disk '/subscriptions/abcd-efgh/resourceGroups/myclustersub2-d1/providers/Microsoft.Compute/disks/k8s-mstr-myclustersub2-d1-kub--pvc-1234-5678' cannot be attached as the disk is already owned by VM '/subscriptions/abcd-efgh/resourceGroups/myclustersub2-d1/providers/Microsoft.Compute/virtualMachineScaleSets/k8s-mycluster-33570458-vmss/virtualMachines/k8s-mycluster-33570458-vmss_9'."
  Warning  FailedMount             3m (x7 over 4m)  kubelet, k8s-mycluster-33570458-vmss000007  MountVolume.WaitForAttach failed for volume "pvc-1234-5678" : azureDisk - WaitForAttach failed within timeout node (k8s-mycluster-33570458-vmss000007) diskId:(k8s-mstr-myclustersub2-d1-kub--pvc-1234-5678) lun:(1)
  Normal   SuccessfulAttachVolume  13m (x3 over 15m)  attachdetach-controller                        AttachVolume.Attach succeeded for volume "pvc-1d577a03-d28c-11e8-9585-000d3ad0b3ea"

k8s-ci-robot added a commit that referenced this pull request Oct 28, 2018
…0002-upstream-release-1.11

Automated cherry pick of #70002: improve azure disk attachment perf on Linux
k8s-ci-robot added a commit that referenced this pull request Nov 1, 2018
…0002-upstream-release-1.12

Automated cherry pick of #70002: improve azure disk attachment perf on Linux
k8s-ci-robot added a commit that referenced this pull request Nov 5, 2018
…0002-upstream-release-1.10

Automated cherry pick of #70002: improve azure disk attachment perf on Linux
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GetAzureDiskLun sometimes costs 10 min
5 participants