Add AzureDisk support for vmss nodes #59716

Merged
merged 11 commits into kubernetes:master from feiskyer:vmss-disk on Feb 14, 2018

Conversation

@feiskyer
Member

feiskyer commented Feb 11, 2018

What this PR does / why we need it:

This PR adds AzureDisk support for vmss nodes. Changes include

  • Upgrade the vmss API to 2017-12-01
  • Upgrade the vmss clients to the new API version
  • Abstract AzureDisk operations for vmss and vmas
  • Add AzureDisk support for vmss
  • Fix unit tests and fake clients

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #43287

Special notes for your reviewer:

Depends on #59652 (the first two commits are from #59652).

Release note:

Add AzureDisk support for vmss nodes
@khenidak

Contributor

khenidak commented Feb 11, 2018

/assign @khenidak

@andyzhangx

Some functions, like GetNextDiskLun, duplicate too much code from the original vmas code.
Functions like AttachDisk and DetachDisk should depend on #59693, which removes the duplicated code.

func (ss *scaleSet) AttachDisk(isManagedDisk bool, diskName, diskURI string, nodeName types.NodeName, lun int32, cachingMode compute.CachingTypes) error {
	ssName, instanceID, vm, err := ss.getVmssVM(string(nodeName))
	if err != nil {
		if err == ErrorNotVmssInstance {

@andyzhangx

andyzhangx Feb 11, 2018

Member

This is weird: in the code logic it uses vmss.AttachDisk, while the error says it should not use vmss.AttachDisk and should fall back to vmas.AttachDisk? Is there a real case where this issue occurs?

@feiskyer

feiskyer Feb 11, 2018

Member

If the master nodes are managed by availability sets and pods are running on those master nodes, then there will be such a problem.

@khenidak

khenidak Feb 12, 2018

Contributor

This is fine. It will also allow us to support users who have mixed clusters (avsets + scale sets). I do, however, strongly believe that we should use controller common as the abstraction layer.
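
For illustration, a minimal sketch of the fallback pattern under discussion (the availabilitySet field and the surrounding control flow are assumptions for this sketch, not the PR's verbatim code):

// Hypothetical sketch: the scale-set AttachDisk delegates to the
// availability-set (vmas) implementation when the node turns out not to
// be a vmss instance, e.g. a master node managed by an availability set.
func (ss *scaleSet) AttachDisk(isManagedDisk bool, diskName, diskURI string, nodeName types.NodeName, lun int32, cachingMode compute.CachingTypes) error {
	ssName, instanceID, vm, err := ss.getVmssVM(string(nodeName))
	if err == ErrorNotVmssInstance {
		// Fall back to the vmas code path instead of failing outright.
		return ss.availabilitySet.AttachDisk(isManagedDisk, diskName, diskURI, nodeName, lun, cachingMode) // availabilitySet field is assumed
	}
	if err != nil {
		return err
	}
	// ... attach the disk to the scale set VM identified by (ssName, instanceID, vm) ...
	_, _, _ = ssName, instanceID, vm
	return nil
}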

// GetDiskLun finds the lun on the host that the vhd is attached to, given a vhd's diskName and diskURI
func (ss *scaleSet) GetDiskLun(diskName, diskURI string, nodeName types.NodeName) (int32, error) {
	_, _, vm, err := ss.getVmssVM(string(nodeName))

@andyzhangx

andyzhangx Feb 11, 2018

Member

Only this line differs from the original GetNextDiskLun; I would encourage combining these two GetNextDiskLun implementations.

func (ss *scaleSet) DetachDiskByName(diskName, diskURI string, nodeName types.NodeName) error {
	ssName, instanceID, vm, err := ss.getVmssVM(string(nodeName))
	if err != nil {
		if err == ErrorNotVmssInstance {

@andyzhangx

andyzhangx Feb 11, 2018

Member

same as above

// GetNextDiskLun searches all vhd attachments on the host and finds an unused lun
// returns -1 if all luns are used
func (ss *scaleSet) GetNextDiskLun(nodeName types.NodeName) (int32, error) {
	_, _, vm, err := ss.getVmssVM(string(nodeName))

@andyzhangx

andyzhangx Feb 11, 2018

Member

same as above

		attached[diskName] = false
	}
	_, _, vm, err := ss.getVmssVM(string(nodeName))

@andyzhangx

andyzhangx Feb 11, 2018

Member

same as above

@feiskyer

Member

feiskyer commented Feb 11, 2018

> Functions like AttachDisk and DetachDisk should depend on #59693, which removes the duplicated code.

Please note that VirtualMachineScaleSetVM and VirtualMachine are different data structures (not interfaces). Although the disk operations follow the same logic as the original VirtualMachine code, some duplication is still required for now.

Update: VirtualMachineScaleSetVM and VirtualMachine are using different API versions now. They should use the same API version in the future, and we can merge the functions together then.
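
To make the abstraction question concrete, here is a minimal sketch of the shared contract both backends could satisfy (the interface name and exact method set are illustrative, not the PR's actual surface):

// Illustrative only: a common disk-operations contract implemented by both
// the scale-set (vmss) and availability-set (vmas) backends, so callers in
// controller common need no duplication.
type diskOperations interface {
	AttachDisk(isManagedDisk bool, diskName, diskURI string, nodeName types.NodeName, lun int32, cachingMode compute.CachingTypes) error
	DetachDiskByName(diskName, diskURI string, nodeName types.NodeName) error
	GetDiskLun(diskName, diskURI string, nodeName types.NodeName) (int32, error)
	GetNextDiskLun(nodeName types.NodeName) (int32, error)
}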

@andyzhangx

Member

andyzhangx commented Feb 11, 2018

@rootfs this commit 0ca2690 removes the instance-ID truncating code:

	if ind := strings.LastIndex(instanceid, "/"); ind >= 0 {
		instanceid = instanceid[(ind + 1):]
	}

Do you know why the instance ID was truncated, e.g. from /subscriptions/4be8920b-2978-43d7-ab14-04d8549c1d00/resourceGroups/andy-k8s192/providers/Microsoft.Compute/virtualMachines/k8s-agentpool-87187153-0 to k8s-agentpool-87187153-0? Was there any consideration behind that at the time?

I have checked the code, and it looks OK without truncating the instance ID, but I still want to confirm with you, thanks.
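
As a standalone illustration of what the removed snippet did, using the example ID from the comment above:

package main

import (
	"fmt"
	"strings"
)

func main() {
	instanceid := "/subscriptions/4be8920b-2978-43d7-ab14-04d8549c1d00/resourceGroups/andy-k8s192/providers/Microsoft.Compute/virtualMachines/k8s-agentpool-87187153-0"
	// Keep only the last path segment of the full Azure resource ID.
	if ind := strings.LastIndex(instanceid, "/"); ind >= 0 {
		instanceid = instanceid[(ind + 1):]
	}
	fmt.Println(instanceid) // prints: k8s-agentpool-87187153-0
}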

@khenidak

Contributor

khenidak commented Feb 12, 2018

re #59716 (comment)

Controller Common is the abstraction, and this is where we should decide if it's vmss or avset. Let us not default to VMSS now.

if resp != nil {
	// HTTP 4xx or 5xx suggests we should retry
	if 399 < resp.StatusCode && resp.StatusCode < 600 {

@khenidak

khenidak Feb 12, 2018

Contributor

That is not correct: status codes such as 403 are terminal in all cases. They will occur if the service principal has expired, or the principal/MSI/EMSI doesn't have proper permissions.

@feiskyer

feiskyer Feb 12, 2018

Member

If the principal has expired, we should surely retry the API call. Isn't this expected?

@rootfs

rootfs Feb 13, 2018

Member

If retrying will exhaust the API quota quicker, then we need to be more frugal.

@feiskyer

feiskyer Feb 14, 2018

Member

The logic here is the same as shouldRetryAPIRequest, just with a different param (http.Response instead of autorest.Response). It doesn't change the existing retry logic.
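
For context, a sketch of the stricter policy the reviewers are suggesting (this is not what the PR implements; the PR keeps the blanket 4xx/5xx retry): treat authorization failures as terminal so retries do not burn API quota.

import "net/http"

// shouldRetry is an illustrative alternative to the blanket 4xx/5xx check:
// 401/403 indicate an expired or under-privileged principal/MSI/EMSI and
// are terminal, so retrying them only consumes API quota.
func shouldRetry(resp *http.Response) bool {
	if resp == nil {
		return false
	}
	if resp.StatusCode == http.StatusUnauthorized || resp.StatusCode == http.StatusForbidden {
		return false
	}
	return resp.StatusCode >= 400 && resp.StatusCode < 600
}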

// AttachDisk attaches a vhd to vm
// the vhd must exist, can be identified by diskName, diskURI, and lun.
func (ss *scaleSet) AttachDisk(isManagedDisk bool, diskName, diskURI string, nodeName types.NodeName, lun int32, cachingMode compute.CachingTypes) error {

@khenidak

khenidak Feb 12, 2018

Contributor

We have to find a way to tell whether a VM is part of an availability set or a scale set. We cannot try, fail, and then retry. This information is either part of the VM, the config, or a node label.

@feiskyer

feiskyer Feb 12, 2018

Member

This is solved by availabilitySetNodesCache, which holds a list of VMs not managed by vmss.
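
A minimal sketch of how such a cache can be consulted up front (the field name and plain-map cache are assumptions for illustration; the actual PR uses a timed cache):

// Hypothetical: availabilitySetNodes caches the names of nodes that are
// NOT managed by vmss, turning the decision into a lookup rather than a
// try-fail-retry cycle.
func (ss *scaleSet) isAvailabilitySetNode(nodeName string) bool {
	_, found := ss.availabilitySetNodes[nodeName] // assumed map[string]struct{} field
	return found
}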

		computerName := strings.ToLower(*vm.OsProfile.ComputerName)
		localCache[computerName] = ssName
	}

@khenidak

khenidak Feb 12, 2018

Contributor

break?

@feiskyer

feiskyer Feb 12, 2018

Member

Why break? This for loop is meant to hold all VMs.
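
A sketch of the loop's intent (names are illustrative, not the PR's exact code): every scale set is visited so the cache maps each computer name to its scale set, which is why an early break would be wrong.

// Illustrative cache refresh: collect all VMs across all scale sets.
localCache := make(map[string]string) // computerName -> scale set name
for ssName, vms := range vmsByScaleSet { // hypothetical listing result
	for _, vm := range vms {
		if vm.OsProfile == nil || vm.OsProfile.ComputerName == nil {
			continue
		}
		computerName := strings.ToLower(*vm.OsProfile.ComputerName)
		localCache[computerName] = ssName
	}
	// No break: the loop must run to completion so the cache holds all VMs.
}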

@feiskyer

Member

feiskyer commented Feb 12, 2018

> Controller Common is the abstraction, and this is where we should decide if it's vmss or avset. Let us not default to VMSS now.

If vmType is set to vmss, we default to vmss because most nodes are expected to be running on vmss in that case.

Updated comments in #59716 (comment). We could merge them together in the future, once vmss and vm use the same compute API version, but currently they should still be separated.

Opened #59736 to track this issue.

@k8s-ci-robot k8s-ci-robot added size/XL and removed size/XXL labels Feb 12, 2018

@feiskyer

Member

feiskyer commented Feb 12, 2018

vmss cache PR has been merged. Made another rebase.

@k8s-ci-robot k8s-ci-robot added size/XXL and removed size/XL labels Feb 13, 2018

@feiskyer

Member

feiskyer commented Feb 13, 2018

> Controller Common is the abstraction, and this is where we should decide if it's vmss or avset. Let us not default to VMSS now.

Talked offline with @khenidak: vmType checking is better as part of controllerCommon.

@khenidak Added a new commit to address this issue. PTAL

@khenidak

Contributor

khenidak commented Feb 13, 2018

/LGTM Let's merge :-)

@brendandburns

Contributor

brendandburns commented Feb 13, 2018

/approve

// vmType is Virtual Machine Scale Set (vmss).
ss, ok := c.cloud.vmSet.(*scaleSet)
if !ok {
	return fmt.Errorf("error of converting vmSet (%q) to scaleSet", c.cloud.vmSet)

@rootfs

rootfs Feb 13, 2018

Member

also dump VMType for diagnostics.
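
A sketch of the suggested diagnostic (accessing a VMType field here is an assumption for illustration):

ss, ok := c.cloud.vmSet.(*scaleSet)
if !ok {
	return fmt.Errorf("error of converting vmSet (%q) to scaleSet with vmType %q", c.cloud.vmSet, c.cloud.VMType) // VMType access is assumed
}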

}
// AttachDisk attaches a vhd to vm. The vhd must exist, can be identified by diskName, diskURI, and lun.
func (c *controllerCommon) AttachDisk(isManagedDisk bool, diskName, diskURI string, nodeName types.NodeName, lun int32, cachingMode compute.CachingTypes) error {

@rootfs

rootfs Feb 13, 2018

Member

add comments on how this works. TBH, this flow looks quite cryptic to me

@feiskyer

feiskyer Feb 14, 2018

Member

ack, will do

@feiskyer

feiskyer Feb 14, 2018

Member

Added in the new commit

"time"
"github.com/Azure/azure-sdk-for-go/arm/compute"
"github.com/Azure/azure-sdk-for-go/arm/disk"
"github.com/Azure/azure-sdk-for-go/arm/network"
"github.com/Azure/azure-sdk-for-go/arm/storage"
computepreview "github.com/Azure/azure-sdk-for-go/services/compute/mgmt/2017-12-01/compute"

@rootfs

rootfs Feb 13, 2018

Member

Any reason why it was renamed? This rename causes lots of diffs.

@feiskyer

feiskyer Feb 14, 2018

Member

There is already an existing package named compute: "github.com/Azure/azure-sdk-for-go/arm/compute".
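
In other words, without the alias two packages would both be imported as compute; a minimal illustration of the disambiguation:

import (
	"github.com/Azure/azure-sdk-for-go/arm/compute" // legacy API
	computepreview "github.com/Azure/azure-sdk-for-go/services/compute/mgmt/2017-12-01/compute" // 2017-12-01 API
)

var (
	vmasVM compute.VirtualMachine                  // availability-set VM type
	vmssVM computepreview.VirtualMachineScaleSetVM // scale-set VM type
)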

@feiskyer

Member

feiskyer commented Feb 14, 2018

@rootfs @khenidak Thanks for reviewing. Addressed comments. PTAL

@khenidak

Contributor

khenidak commented Feb 14, 2018

Let's clear the tests and merge. Thanks a lot for this (and the rest of the VMSS work); it's been a long time coming!

@brendandburns

Contributor

brendandburns commented Feb 14, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Feb 14, 2018

@k8s-ci-robot

Contributor

k8s-ci-robot commented Feb 14, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: brendandburns, feiskyer, khenidak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these OWNERS Files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-merge-robot

Contributor

k8s-merge-robot commented Feb 14, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-merge-robot

Contributor

k8s-merge-robot commented Feb 14, 2018

Automatic merge from submit-queue (batch tested with PRs 59489, 59716). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-merge-robot k8s-merge-robot merged commit d89e641 into kubernetes:master Feb 14, 2018

10 of 13 checks passed

Submit Queue: Required GitHub CI test is not green: pull-kubernetes-bazel-test
pull-kubernetes-e2e-gce: Job triggered.
pull-kubernetes-verify: Job triggered.
cla/linuxfoundation: feiskyer authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-cross: Skipped
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-gke: Skipped
pull-kubernetes-e2e-kops-aws: Job succeeded.
pull-kubernetes-kubemark-e2e-gce: Job succeeded.
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-unit: Job succeeded.

@feiskyer feiskyer deleted the feiskyer:vmss-disk branch Feb 14, 2018
