Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provision vsphere volume as per zone #72731

Merged

Conversation

skarthiksrinivas
Copy link
Contributor

@skarthiksrinivas skarthiksrinivas commented Jan 9, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
Currently vsphere cloud provider (VCP) insists on provisioning a volume only on a globally shared datastore. Hence, in a zoned environment, even in presence of a shared datastore within a specific zone, the volume provisioning can fail if that datastore is not shared across all the zones hosting kubernetes nodes. This change fixes this issue by considering the zone information provided in allowedTopologies for selection of the datastore. If allowedTopologies is not provided, the current behaviour is retained as-is.
This PR addresses one part of issue #67703. The other part to attach zone labels to the created volumes is here - #72687
Which issue(s) this PR fixes:
Fixes #

Does this PR introduce a user-facing change?:
Yes

This change ensures that volumes get provisioned based on the zone information provided in allowedTopologies.

Storage class spec:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fastpolicy1
provisioner: kubernetes.io/vsphere-volume
parameters:
    diskformat: zeroedthick
    storagePolicyName: vSAN Default Storage Policy
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - zone1

PV creation Logs:
I0109 11:17:52.321372       1 vsphere.go:1147] Starting to create a vSphere volume with volumeOptions: &{CapacityKB:1048576 Tags:map[kubernetes.io/created-for/pvc/namespace:default kubernetes.io/created-for/pvc/name:pvcsc-1-policy kubernetes.io/created-for/pv/name:pvc-34650c12-1400-11e9-aef4-005056804cc9] Name:kubernetes-dynamic-pvc-34650c12-1400-11e9-aef4-005056804cc9 DiskFormat:zeroedthick Datastore: VSANStorageProfileData: StoragePolicyName:vSAN Default Storage Policy StoragePolicyID: SCSIControllerType: Zone:[zone1]}
...
I0109 11:17:59.430113       1 vsphere.go:1334] The canonical volume path for the newly created vSphere volume is "[vsanDatastore] 98db185c-6683-d8c7-bc55-0200435ec5da/kubernetes-dynamic-pvc-34650c12-1400-11e9-aef4-005056804cc9.vmdk"

Ran regression tests (no zone) and they passed.

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jan 9, 2019
@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 9, 2019
@k8s-ci-robot
Copy link
Contributor

Hi @skarthiksrinivas. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 9, 2019
@skarthiksrinivas
Copy link
Contributor Author

@frapposelli
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 9, 2019
@frapposelli
Copy link
Member

/sig vmware

@k8s-ci-robot k8s-ci-robot added the area/provider/vmware Issues or PRs related to vmware provider label Jan 9, 2019
@@ -104,6 +104,7 @@ func (util *VsphereDiskUtil) CreateVolume(v *vsphereVolumeProvisioner) (volSpec
Name: name,
}

volumeOptions.Zone = selectedZone
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I understand this correctly, this feature does not uses NodeAffinity field of PVs and hence does not uses topology aware provisioning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Your understanding is correct. The scope of this fix is limited to honouring the allowedTopologies zones during volume provisioning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I don't specify allowedTopology in my default storageClass and my nodes are spread across zones then it will basically result in PVCs which can not be used by pods. Previously - we blocked/errored out on volume provisioning altogether if datastore being used is not shared with all VMs (#72497). How does this interact with that bug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue in #72497 is that volume provisioning will fail if there is no shared datastore available across all kubernetes node VMs. Now, with this change, by specifying allowedTopologies in SC, the volume provisioning can happen as long as there is a shared datastore available for the nodes within the zone.
There is no change in behaviour when allowedTopology is not specified. It will continue to work in the same way today, i.e by looking for shared datastore across all nodes and succeed or fail as the case is.

nm.zoneInfoLock.Lock()
nm.zoneInfoMap[nodeName] = zone
nm.zoneInfoLock.Unlock()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use defer pattern?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Done.


nm.zoneInfoLock.Lock()
delete(nm.zoneInfoMap, node.ObjectMeta.Name)
nm.zoneInfoLock.Unlock()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets use defer if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. Fixed the code to use defer pattern for all access to zoneInfoLock.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jan 10, 2019
@skarthiksrinivas
Copy link
Contributor Author

Addressed comments.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 13, 2019
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Feb 13, 2019

@skarthiksrinivas: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-e2e-kops-aws a7f8f66e9dbcc397251ad7c40f963c9735c628c1 link /test pull-kubernetes-e2e-kops-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@skarthiksrinivas
Copy link
Contributor Author

/retest

@SandeepPissay
Copy link
Contributor

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: frapposelli, SandeepPissay, skarthiksrinivas

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 15, 2019
@vladimirvivien
Copy link
Member

@SandeepPissay will this be cherry picked to earlier k8s version ?

@frapposelli
Copy link
Member

@vladimirvivien I believe this will not be backported.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 16, 2019
@rhockenbury
Copy link

Does anyone know if there are plans to implement theWaitForFirstConsumer option for the volume binding mode for vsphere?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 17, 2019
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 18, 2019
@frapposelli
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. kind/bug Categorizes issue or PR as related to a bug. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Feb 18, 2019
@k8s-ci-robot k8s-ci-robot merged commit 701f914 into kubernetes:master Feb 18, 2019
@frapposelli
Copy link
Member

🎉

return "", err
}

if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be ahead of the if block starting at line 1260 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That's correct. Thanks for pointing out. The current sequence makes this check a no-op. Fixed it. Have created a PR for this - #74263


for _, host := range hosts {
var hostSystemMo mo.HostSystem
host.Properties(ctx, host.Reference(), []string{"datastore"}, &hostSystemMo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the effect of this call (I don't see assignment) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last parameter in the this method call is an OUT parameter which will be set and that's how hostSystemMo gets assigned. However, this method returns error which is not processed in this step. I have fixed that now in the PR mentioned above.

@skarthiksrinivas
Copy link
Contributor Author

Does anyone know if there are plans to implement theWaitForFirstConsumer option for the volume binding mode for vsphere?

Yes. We do have that task in the pipeline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/vmware Issues or PRs related to vmware provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants