Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure PD (Managed/Blob) #46360

Merged
merged 1 commit into from Jul 14, 2017
Merged

Azure PD (Managed/Blob) #46360

merged 1 commit into from Jul 14, 2017

Conversation

@khenidak
Copy link
Contributor

khenidak commented May 24, 2017

This is exactly the same code as this PR. It has a clean set of generated items. We created a separate PR to accelerate the accept/merge the PR

CC @colemickens
CC @brendandburns

What this PR does / why we need it:

  1. Adds K8S support for Azure Managed Disks.
  2. Adds support for dedicated blob disks (1:1 to storage account) in addition to shared blob disks (n:1 to storage account).
  3. Automatically manages the underlying storage accounts. New storage accounts are created at 50% utilization. Max is 100 disks, 60 disks per storage account.
  4. Addresses the current issues with Blob Disks:
    ..* Significantly faster attach process. Disks are now usually available for pods on nodes under 30 sec if formatted, under a min if not formatted.
    ..* Adds support to move disks between nodes.
    ..* Adds consistent attach/detach behavior, checks if the disk is leased/attached on a different node before attempting to attach to target nodes.
    ..* Fixes a random hang behavior on Azure VMs during mount/format (for both blob + managed disks).
    ..* Fixes a potential conflict by avoiding the use of disk names for mount paths. The new plugin uses hashed disk uri for mount path.

The existing AzureDisk is used as is. Additional "kind" property was added allowing the user to decide if the pd will be shared, dedicated or managed (Azure Managed Disks are used).

Due to the change in mounting paths, existing PDs need to be recreated as PV or PVCs on the new plugin.

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented May 24, 2017

Hi @khenidak. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link
Member

roberthbailey left a comment

I've only reviewed the code in the cloudprovider directory. The majority of the changes are in the volume/azure_dd package, so I'll defer to @ brendandburns and @rootfs to review and approve those changes.

@@ -23,6 +23,10 @@ import (
azs "github.com/Azure/azure-sdk-for-go/storage"
)

const (

This comment has been minimized.

Copy link
@roberthbailey

roberthbailey May 24, 2017

Member

Please put this onto a single line instead of using a block since there is only one constant.

@roberthbailey

This comment has been minimized.

Copy link
Member

roberthbailey commented May 24, 2017

/approve

@rootfs

This comment has been minimized.

Copy link
Member

rootfs commented May 24, 2017

@k8s-bot ok to test

@rootfs

This comment has been minimized.

Copy link
Member

rootfs commented May 24, 2017

@khenidak This is a large refactoring, hard to review.

Please reuse existing code and interface and not shuffle/copy them around. If there existing interface not working for you, address the limitation in the interface.

@thockin

This comment has been minimized.

Copy link
Member

thockin commented May 24, 2017

/approve

This is not a code review

@khenidak

This comment has been minimized.

Copy link
Contributor Author

khenidak commented May 31, 2017

@rootfs

  1. Moved the disk controllers to Azure Cloud as requested.
  2. re REST vs SDK - We will move to the SDK once we test against the specific requirements for plugin
  3. Added few unit tests where possible.
return "", err
}

func formatIfNotFormatted(disk string, fstype string) {

This comment has been minimized.

Copy link
@rootfs

rootfs May 31, 2017

Member

reuse method provided in mounter

This comment has been minimized.

Copy link
@khenidak

khenidak Jun 1, 2017

Author Contributor

discussed above.

This comment has been minimized.

Copy link
@rootfs

rootfs Jun 1, 2017

Member

IF that's the case, it should be addressed in pkg/util/mount.

As mentioned, this format hang issue was root caused here

Submit an issue if you see other symptoms or root cause.

This comment has been minimized.

Copy link
@brendandburns

brendandburns Jun 1, 2017

Contributor

I'd really rather not fix this problem in this PR. We can TODO it, but this PR has a bunch of fixes I'd like to see merged. Can we address the refactor/merge into pkg/util/mount in a separate PR?

@@ -0,0 +1,1036 @@
/*
Copyright 2016 The Kubernetes Authors.

This comment has been minimized.

Copy link
@rootfs

rootfs May 31, 2017

Member

2017 and for rest of the new files

This comment has been minimized.

Copy link
@khenidak

khenidak Jun 1, 2017

Author Contributor

actually it is 2016 in all the files i have checked. Wouldn't that break Bazel if changed?

This comment has been minimized.

Copy link
@brendandburns

brendandburns Jun 1, 2017

Contributor

No, 2017 for all new files should work.

case "skuname":
storageAccountType = strings.ToLower(v)
case "location":
return nil, fmt.Errorf("AzureDisk - location parameter is not supported anymore in PVC, use PV to use named storage accounts in different locations")

This comment has been minimized.

Copy link
@rootfs

rootfs May 31, 2017

Member

This is a regression. We use location to create a disk at the same Azure region with compute instance.

This comment has been minimized.

Copy link
@khenidak

khenidak Jun 1, 2017

Author Contributor

Users does not need to provide location, location is picked up from the cloud provider and disks are created in the same resource group. The current implementation does validate location = current (AFAIK) hence is subject to situation where users create disks in different location than the cluster. PVCs and PVs provisioned already will have no impact with this change. only new PVC/PVs

This comment has been minimized.

Copy link
@rootfs

rootfs Jun 21, 2017

Member

Address this.

case "location":
return nil, fmt.Errorf("AzureDisk - location parameter is not supported anymore in PVC, use PV to use named storage accounts in different locations")
case "storageaccount":
return nil, fmt.Errorf("AzureDisk - storage parameter is not suppoerted anymore in PVC, use PV to use named storage account")

This comment has been minimized.

Copy link
@rootfs

rootfs May 31, 2017

Member

Why this is not supported?

This comment has been minimized.

Copy link
@khenidak

khenidak Jun 1, 2017

Author Contributor

because the user has 2 options when doing blob disk, either PVC (dedicated or shared) using a set of storage accounts managed by the plugin or PV in this case the user provides diskuri which includes storage account

This comment has been minimized.

Copy link
@rootfs

rootfs Jun 21, 2017

Member

Use case for storage account is for a multi tenant setup, providing separate storage account for each tenant.

for k, v := range p.options.Parameters {
switch strings.ToLower(k) {
case "skuname":
storageAccountType = strings.ToLower(v)

This comment has been minimized.

Copy link
@rootfs

rootfs May 31, 2017

Member

storageAccountType is assigned twice for skuname and storageaccounttype, what does it really mean?

This comment has been minimized.

Copy link
@khenidak

khenidak Jun 1, 2017

Author Contributor

because sku is "arm" label not end user label, all Azure documentation and portal refer to it "storage account type". Only when you interact with arm templates you deal with sku. Hence, This should use "storage account type" to avoid confusion, but since there is no risk (as described below) in leaving sku parameter it is left for this version until we remove it from all samples then retire it

This comment has been minimized.

Copy link
@rootfs

rootfs Jun 21, 2017

Member

This is not what I read from https://docs.microsoft.com/en-us/rest/api/storagerp/storageaccounts
"Note that in older versions, sku name was called accountType."

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Jun 22, 2017

Member

skuname and storageAccountType has same meaning here, it's an alias:
Note that in older versions, sku name was called accountType.
type SkuName { Standard_LRS, Standard_GRS, Standard_RAGRS, Standard_ZRS, Premium_LRS }

return "", err
}

func formatIfNotFormatted(disk string, fstype string) {

This comment has been minimized.

Copy link
@rootfs

rootfs May 31, 2017

Member

this is mostly identical (except the order of command) with /pkg/util/mount but bypass mounter and directly call exec.New to obtain a runner. This is problematic as we are refactoring mounters.

If there is an issue with default mounter, refactor at /pkg/util/mount, but don't bypass mounter interface.

This comment has been minimized.

Copy link
@khenidak

khenidak Jun 1, 2017

Author Contributor

as discussed, there is an issue that hangs the format/mount sequence as its is in K8s (it takes 4 mins+ to execute, which is basically a hang). It waits on IO (process in D sate) on Azure. Until we identify exactly what is wrong (either in k8s) & until a fix is applied and taste (either on Azure/k8s or both) we will have to go with this approach since this approach consistently works with predictable format/mount time.

This comment has been minimized.

Copy link
@rootfs

rootfs Jun 21, 2017

Member

This issue is now discussed in #47158

@@ -0,0 +1,179 @@
/*

This comment has been minimized.

Copy link
@rootfs

rootfs May 31, 2017

Member

follow the file name convention: the file that implement the plugin is named with the plugin's name.

This comment has been minimized.

Copy link
@khenidak

khenidak Jun 1, 2017

Author Contributor

the file is named azure_dd.go, as for mount and attacher i didn't find a consistent naming concision across the plugins. I can prefix all files with azure_dd if that will help.

This comment has been minimized.

Copy link
@rootfs

rootfs Jun 21, 2017

Member

filename prefixed with azure_ for log parsing.
For new files, copyright year stay with 2017

@@ -1,145 +0,0 @@
/*

This comment has been minimized.

Copy link
@rootfs

rootfs May 31, 2017

Member

keep it as we are going to refactor disk utilities among plugins.

This comment has been minimized.

Copy link
@khenidak

khenidak Jun 1, 2017

Author Contributor

this has moved into the common file of the disk controllers. Since this only needed by Blob Disk + Managed Disk (and will always be, since any block device based disks will always use AzureDiskVolumeSource type with the addition of Kind parameter) also they have nothing to do with "vhd" per-se

var _ volume.Unmounter = &azureDiskUnmounter{}
var _ volume.Mounter = &azureDiskMounter{}

func (m *azureDiskMounter) GetAttributes() volume.Attributes {

This comment has been minimized.

Copy link
@rootfs

rootfs May 31, 2017

Member

Missed Managed

This comment has been minimized.

Copy link
@khenidak

khenidak Jun 1, 2017

Author Contributor

added

}

// we didn't fail
//TODO: do we need to do this? the bind-mount performs ro mount

This comment has been minimized.

Copy link
@rootfs

rootfs May 31, 2017

Member

this is for ownership change.

This comment has been minimized.

Copy link
@khenidak

khenidak Jun 1, 2017

Author Contributor

removed the comment

@rootfs

This comment has been minimized.

Copy link
Member

rootfs commented May 31, 2017

@khenidak there are many regressions when you remove existing azure_dd.go and decide to withdraw support for certain provsioning parameters.

In existing azure_dd.go, you'll find newly added functions such as SupportsMountOption and SupportsBulkVolumeVerification. Even in functions you did copy from there, the function signature changes.

As we are approaching code freeze, extensive refactoring must be carefully evaluated.

@rootfs

This comment has been minimized.

Copy link
Member

rootfs commented May 31, 2017

@khenidak

This comment has been minimized.

Copy link
Contributor Author

khenidak commented Jun 1, 2017

Support bulk verification and support mount options are there in this PR for a while now including passing the mount options to the mounter during mount. as for signature change if you are referring to referring to UnixxxxID type defs then these have also been included.

I went back and verified that this PR includes all the "bulk" changes that happened in volume. including one related to skipping TearDownAt call if the path does not exist (just updated the PR with it).

"path"
"regexp"
"strconv"
STRINGS "strings"

This comment has been minimized.

Copy link
@rootfs

rootfs Jun 1, 2017

Member

Upper case words have their meanings in golang. Use other alias e.g. libstrings

This comment has been minimized.

Copy link
@andyzhangx
return "", err
}

func formatIfNotFormatted(disk string, fstype string) {

This comment has been minimized.

Copy link
@rootfs

rootfs Jun 1, 2017

Member

IF that's the case, it should be addressed in pkg/util/mount.

As mentioned, this format hang issue was root caused here

Submit an issue if you see other symptoms or root cause.

return attached, lun, err
}

fragment := vmData.(map[string]interface{})

This comment has been minimized.

Copy link
@rootfs

rootfs Jun 1, 2017

Member

This is not safe, it will panic on unexpected vm stream. Run unit test below and you'll get a panic.

 	vm := []byte("foo")
	if err := json.Unmarshal(vm, &vmData); err != nil {
               return 
	}
        // panic happens!
  	fragment := vmData.(map[string]interface{})

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Jun 19, 2017

Member

thanks for the check, I have used below code to avoid such conversion issue:
fragment, ok := vmData.(map[string]interface{})
if !ok {
return attached, lun, fmt.Errorf("convert vmData to map error")
}
Also do the checks for the proceeding code.

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Jun 19, 2017

Member

I have movoed all the error handling code of json format in func: ExtractVMData, ExtractDiskData in azure_util.go

diskIdTemplate string = "/subscriptions/%s/resourcegroups/%s/providers/microsoft.compute/disks/%s"
defaultDataDiskCount int = 16 // which will allow you to work with most medium size VMs (if not found in map)

vhdBlobUriTemplate = "https://%s.blob.core.windows.net/%s/%s"

This comment has been minimized.

Copy link
@rootfs

rootfs Jun 1, 2017

Member

This is not accurate. You have to choose among core.windows.net, core.usgovcloudapi.net, etc....

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Jun 16, 2017

Member

thanks for the hint, I have used az.Environment.StorageEndpointSuffix


const (
aadTokenEndPointPath string = "%s/oauth2/token/"
apiversion string = "2016-04-30-preview"

This comment has been minimized.

Copy link
@rootfs

rootfs Jun 1, 2017

Member

This apiversion is not supported in every azure region
Azure/azure-xplat-cli#3513

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Jun 16, 2017

Member

For ARM, The supported api-versions are '2016-04-30-preview, 2017-03-30' in global azure. So for other regions, like azure china, just wait for the api version update or do customerization if want to use it ASAP.

return cachingMode, nil
}

type ioHandler interface {

This comment has been minimized.

Copy link
@rootfs

rootfs Jun 1, 2017

Member

keep vhd_util.go file and its unit test.

@rootfs

This comment has been minimized.

Copy link
Member

rootfs commented Jun 1, 2017

@khenidak address these concerns and ping me or other members from @kubernetes/sig-storage-pr-reviews when you are done

Make CI test pass.

None of the CI tests have passed yet. Run hach/verify-all.sh

Unit test new functions

Especially those in raw HTTP, there are many uncaptured errors.

Refactoring:

  • make sure existing methods in azure_dd.go are not lost
  • Address mount utility related in issues, if any, in pkg/util/mount
  • Keep vhd_util.go and its unit test

Compatibility

  • We need to support location parameter in Provision(). Provisioner doesn't always run in the same region as compute instance,
  • Don't withdraw support for parameters without a proper notice or documentation.

Raw HTTP:

  • If you continue raw HTTP, handle errors and malformed json.
  • Make sure the API version works for all azure regions that are currently supported in existing code.
  • Don't use hardcoded azure endpoints in raw HTTP, they don't work in every azure region.
    I am not an expert on Azure sdk, cc @ahmetb @colemickens

Address review comments from @brendandburns in #41950

cc @jdumars

return err
}

func (c *blobDiskController) diskHasNoLease(diskUri string) (bool, error) {

This comment has been minimized.

Copy link
@rootfs

rootfs Jun 1, 2017

Member

cc @colemickens

As advised by Joseph Romeo by email: "Avoid looking at VHD blob leases to decide a disk is attached or detached. This can lead you to unexpected failures or worse-case get into a bad state."

cachingMode := string(*volumeSource.CachingMode)
managed := (*volumeSource.Kind == v1.AzureManagedDisk)
if managed {
lun, err = a.plugin.diskController.AttachManagedDisk(string(nodeName), volumeSource.DataDiskURI, cachingMode)

This comment has been minimized.

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Jun 16, 2017

Member

hi, we use nodeName to get the vm object, see in AttachManagedDisk func:
vm, err := c.common.getArmVM(nodeName)

}

// finds an empty based on VM size and current attached disks
func findEmptyLun(vmSize string, dataDisks []interface{}) (int, error) {

This comment has been minimized.

Copy link
@rootfs

rootfs Jun 1, 2017

Member

How to deal with race condition?

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Jun 16, 2017

Member

I have used lock in this func

kind = v1.AzureDataDiskKind(v)
case "cachingmode":
cachingMode = v1.AzureDataDiskCachingMode(v)
case "fstype":

This comment has been minimized.

Copy link
@rootfs
@saad-ali

This comment has been minimized.

Copy link
Member

saad-ali commented Jun 1, 2017

@rootfs @khenidak FYI, code freeze for 1.7 is at 6 PM PT today (about 1 hour away). For this to get into 1.7.0, it must be LGTM'd before then.

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Jul 14, 2017

@khenidak: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-cross 1ad55b5 link @k8s-bot pull-kubernetes-cross test this

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@karataliu

This comment has been minimized.

Copy link
Contributor

karataliu commented Jul 14, 2017

/retest

@karataliu

This comment has been minimized.

Copy link
Contributor

karataliu commented Jul 14, 2017

@brendandburns Thanks for reviewing. I've got the test fix commit karataliu@3ed5502 verified, and it has been squashed into the final single commit 677e593 here.

@brendandburns

This comment has been minimized.

Copy link
Contributor

brendandburns commented Jul 14, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 14, 2017
@brendandburns

This comment has been minimized.

Copy link
Contributor

brendandburns commented Jul 14, 2017

/approve no-issue

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

k8s-github-robot commented Jul 14, 2017

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: brendandburns, khenidak, roberthbailey, thockin

Associated issue requirement bypassed by: brendandburns

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:
  • OWNERS [brendandburns,thockin]

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

k8s-github-robot commented Jul 14, 2017

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 9e97b52 into kubernetes:master Jul 14, 2017
9 of 10 checks passed
9 of 10 checks passed
Submit Queue Required Github CI test is not green: pull-kubernetes-verify
Details
cla/linuxfoundation andyzhangx authorized
Details
pull-kubernetes-bazel Job succeeded.
Details
pull-kubernetes-e2e-gce-etcd3 Jenkins job succeeded.
Details
pull-kubernetes-e2e-kops-aws Jenkins job succeeded.
Details
pull-kubernetes-federation-e2e-gce Jenkins job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce Jenkins job succeeded.
Details
pull-kubernetes-node-e2e Jenkins job succeeded.
Details
pull-kubernetes-unit Jenkins job succeeded.
Details
pull-kubernetes-verify Jenkins job succeeded.
Details
@k8s-cherrypick-bot

This comment has been minimized.

Copy link

k8s-cherrypick-bot commented Jul 17, 2017

Removing label cherrypick-candidate because no release milestone was set. This is an invalid state and thus this PR is not being considered for cherry-pick to any release branch. Please add an appropriate release milestone and then re-add the label.

@wojtek-t wojtek-t added this to the v1.7 milestone Jul 18, 2017
@brendandburns brendandburns modified the milestones: v1.8, v1.7 Jul 18, 2017
brendandburns added a commit to seanknox/kubernetes that referenced this pull request Jul 18, 2017
k8s-github-robot pushed a commit that referenced this pull request Jul 19, 2017
…60-upstream-release-1.7

Automatic merge from submit-queue

Automated cherry pick of #46360 upstream release 1.7

Cherry pick of #46360 on release-1.7.

#46360: Azure PD (Managed/Blob)
@andyzhangx

This comment has been minimized.

Copy link
Member

andyzhangx commented Jul 26, 2017

Hi @nmakhotkin , the code is already in 1.7.2, let me know if you have any question.

@nmakhotkin

This comment has been minimized.

Copy link

nmakhotkin commented Jul 26, 2017

@andyzhangx thanks! I'll try it.

@k8s-cherrypick-bot

This comment has been minimized.

Copy link

k8s-cherrypick-bot commented Jul 31, 2017

Commit found in the "release-1.8" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

@jdumars

This comment has been minimized.

Copy link
Member

jdumars commented Aug 15, 2017

Release Note:

Adds Kubernetes support for Azure Managed Disks, support for dedicated blob disks (1:1 to storage account) in addition to shared blob disks (n:1 to storage account), and automatically manages the underlying storage accounts. New storage accounts are created at 50% utilization. Max is 100 disks, 60 disks per storage account. Due to the change in mounting paths, existing PDs need to be recreated as either PV or PVCs with the new plugin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.