
Volume topology aware dynamic provisioning: basic plumbing #63232

Merged
merged 3 commits into kubernetes:master from lichuqiang:provision_plumbing on May 25, 2018

Conversation

@lichuqiang (Member) commented Apr 27, 2018

What this PR does / why we need it:

Split PR #63193 for better review
part 1: basic scheduler and controller plumbing

Next: #63233

Which issue(s) this PR fixes
Feature: kubernetes/enhancements#561
Design: kubernetes/community#2168

Special notes for your reviewer:
/sig storage
/sig scheduling
/assign @msau42 @jsafrane @saad-ali @bsalamat

Release note:

```release-note
Basic plumbing for volume topology aware dynamic provisioning
```
@msau42 (Member) left a comment

So far I have reviewed the scheduler_binder.go code. In general, this needs a lot more unit tests.

if err != nil {
return false, false, err
}

if !utilfeature.DefaultFeatureGate.Enabled(features.DynamicProvisioningScheduling) {
return

@msau42 (Member) commented Apr 27, 2018

Can you reverse this check, and put checkVolumeProvisions inside here? It took me a while to realize and verify that it's returning the correct values in this case.
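
For illustration, a minimal sketch of the reversal being suggested, reusing the names from the diff above; the surrounding return values and the claimsToProvision variable are assumptions, not the PR's actual code:

```go
// Sketch: run the provisioning check only when the feature gate is enabled,
// so the early-return path and its implicit return values go away.
if utilfeature.DefaultFeatureGate.Enabled(features.DynamicProvisioningScheduling) {
	provisionSatisfied, err := b.checkVolumeProvisions(pod, claimsToProvision, node)
	if err != nil {
		return false, false, err
	}
	unboundVolumesSatisfied = provisionSatisfied
}
```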

if className == "" {
return false, fmt.Errorf("no class for claim %q", getPVCName(claim))
}

@msau42 (Member) commented Apr 27, 2018

This should also check if "kubernetes.io/no-provisioner" is set

@msau42 (Member) commented Apr 27, 2018

Should we check if dynamic provisioning is in progress on this node? (ie selected node annotation)

@lichuqiang (Author, Member) commented Apr 28, 2018

> This should also check if "kubernetes.io/no-provisioner" is set

I hadn't noticed that before. So a PVC is not expected to get provisioned if the annotation is set?

> Should we check if dynamic provisioning is in progress on this node? (ie selected node annotation)

PVCs with the selected-node annotation should already have been filtered out in shouldDelayBinding (called by getPodVolumes).

@msau42 (Member) commented Apr 28, 2018

If the provisioner name in SC is set to "no-provisioner", then dynamic provisioning is not supported, and we should fail the predicate.

I think the predicate should also check for the selected node annotation, so that the scheduler does not choose a different node while provisioning is still in progress.
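
For illustration, a sketch of the check being asked for, assuming the claim's StorageClass has already been fetched into `class`; the constant value matches the one added later in this PR:

```go
// Sketch: a class whose provisioner is "kubernetes.io/no-provisioner" (or is
// unset) cannot dynamically provision, so the predicate should fail for it.
if class.Provisioner == "" || class.Provisioner == "kubernetes.io/no-provisioner" {
	return false, fmt.Errorf("storage class %q of claim %q does not support dynamic provisioning",
		className, getPVCName(claim))
}
```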

@lichuqiang (Author, Member) commented Apr 28, 2018

> If the provisioner name in SC is set to "no-provisioner", then dynamic provisioning is not supported, and we should fail the predicate.

Got it.

> I think the predicate should also check for the selected node annotation, so that the scheduler does not choose a different node while provisioning is still in progress.

PVCs with the selected-node annotation will be filtered out in getPodVolumes as unboundClaimsImmediate, so the predicate fails right at the beginning and that kind of claim is never passed into the function here.
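
For context, a sketch of the annotation check under discussion; the control flow is an assumption for illustration, not the code that landed:

```go
// Sketch: if provisioning has already been triggered for this claim, only the
// node recorded in annSelectedNode should pass the predicate.
if selectedNode, ok := claim.Annotations[annSelectedNode]; ok && selectedNode != node.Name {
	return false, nil
}
```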


// The claims from method args can be pointing to watcher cache. We must not
// modify these, therefore create a copy.
// TODO: move to assume phase once we find way to get topology info in AssumePodVolumes()

@msau42 (Member) commented Apr 27, 2018

Can you explain what the issue is? AssumePodVolumes does pass in the node name.

@lichuqiang (Author, Member) commented Apr 28, 2018

It's mainly for the "provisionedTopology" annotation: to compute its value we need to fetch the StorageClass and the node, and I don't want to repeat that in the assume phase.

At first I implemented the plumbing, allowedTopology, and capacity-report parts all at once, and the comment here was not updated after the PRs were split.

If we eventually decide not to bring the capacity part in, I'll consider moving this into the assume phase.

@lichuqiang (Author, Member) commented Apr 28, 2018

> in general, this needs a lot more unit tests.

Yeah, in #63193 I have a separate commit that updates the scheduler binder unit tests (after the allowedTopology and capacity work goes in). Since that PR has been split, I'll update the test cases here to cover the existing code.

t.Errorf("Test %q failed: GetPVC %q returned error: %v", name, pvcKey, err)
continue
}
if pvc.Annotations[annSelectedNode] == "" {

@msau42 (Member) commented May 7, 2018

Should this also validate the node?

pvs: []*v1.PersistentVolume{pvNode2},
expectedUnbound: false,
expectedBound: true,
podPVCs: []*v1.PersistentVolumeClaim{unboundPVC},

@msau42 (Member) commented May 7, 2018

Instead of modifying existing tests, can you add brand new test cases for the provisioning case, and keep the existing tests as the "no provisioning" case. Then you can also combine the new dynamic provisioning checks into the same test function.

Also, a test case for when dynamic provisioning feature gate is disabled would be good too to make sure it works like the "no provisioning" case.
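
A sketch of how the suggested split might look as table-driven cases; the field names mirror the existing test table above, while `provisionedPVC` and the `expectedProvisions` field are hypothetical names used only for illustration:

```go
// Sketch: keep the existing entries as "no provisioning" cases and add new
// entries for dynamic provisioning and for the disabled feature gate.
scenarios := map[string]struct {
	podPVCs            []*v1.PersistentVolumeClaim
	pvs                []*v1.PersistentVolume
	expectedUnbound    bool
	expectedBound      bool
	expectedProvisions []*v1.PersistentVolumeClaim // hypothetical field for the new checks
}{
	"unbound-pvc-delayed-provisioning": {
		podPVCs:            []*v1.PersistentVolumeClaim{provisionedPVC},
		expectedUnbound:    true,
		expectedBound:      true,
		expectedProvisions: []*v1.PersistentVolumeClaim{provisionedPVC},
	},
	"unbound-pvc-feature-gate-disabled": {
		podPVCs:         []*v1.PersistentVolumeClaim{provisionedPVC},
		expectedUnbound: false,
		expectedBound:   true,
	},
}
```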

// Scheduler signal to the PV controller to start dynamic
// provisioning by setting the "annSelectedNode" annotation
// in the PVC
if _, ok := claim.Annotations[annSelectedNode]; ok {

@msau42 (Member) commented May 7, 2018

TODO at L312 can be removed

@@ -1423,6 +1440,11 @@ func (ctrl *PersistentVolumeController) provisionClaimOperation(claim *v1.Persis
volume, err = provisioner.Provision()

@msau42 (Member) commented May 7, 2018

Add a TODO here to modify the Provision() API to pass in topology info

// Provisioning not triggered by the scheduler, skip
return
}
annotations := claim.Annotations

@msau42 (Member) commented May 7, 2018

I think before modifying claim, we need to make a new copy of it.

@lichuqiang (Author, Member) commented May 8, 2018

You are right.
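
A minimal sketch of the copy-before-modify pattern being agreed on here; the surrounding helper and its error handling are simplified assumptions:

```go
// Sketch: never mutate the informer-cache object; work on a deep copy and
// send that copy to the API server.
claimClone := claim.DeepCopy()
delete(claimClone.Annotations, annSelectedNode)
if _, err := ctrl.kubeClient.CoreV1().PersistentVolumeClaims(claimClone.Namespace).Update(claimClone); err != nil {
	// let the next sync retry
	return
}
```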

delete(annotations, annSelectedNode)
claim.Annotations = annotations
// Try to update the PVC object several times
for i := 0; i < ctrl.createProvisionedPVRetryCount; i++ {

@msau42 (Member) commented May 7, 2018

Does this need to be retried? Other PVC updates don't retry. Also, I think storeClaimUpdate needs to be called after API update.

@lichuqiang (Author, Member) commented May 8, 2018

> Does this need to be retried? Other PVC updates don't retry.

The difference is just whether we retry immediately or at the next sync; both are okay with me.

> I think storeClaimUpdate needs to be called after API update.

Oh, I remember adding it before; I must have lost it during an update.
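
A sketch of the ordering being discussed, using the controller's existing storeClaimUpdate helper; error handling is simplified and the log call is illustrative:

```go
// Sketch: after the API update succeeds, record the returned object in the
// controller's local claim cache so the next sync does not see a stale copy.
newClaim, err := ctrl.kubeClient.CoreV1().PersistentVolumeClaims(claim.Namespace).Update(claim)
if err != nil {
	return // let the next sync retry instead of looping here
}
if _, err := ctrl.storeClaimUpdate(newClaim); err != nil {
	glog.V(4).Infof("updating claim cache for %s/%s failed: %v", newClaim.Namespace, newClaim.Name, err)
}
```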

utilfeature.DefaultFeatureGate.Set("DynamicProvisioningScheduling=true")
defer utilfeature.DefaultFeatureGate.Set("DynamicProvisioningScheduling=false")

name = "dynamicProvisioningScheduling-feature-enabled"

@msau42 (Member) commented May 7, 2018

There should be test cases for when the annotation is and is not set while the feature is enabled.

// Sort all the claims by increasing size request to get the smallest fits
sort.Sort(byPVCSize(claimsToBind))

chosenPVs := map[string]*v1.PersistentVolume{}

foundMatches = true
preBoundClaims := []*bindingInfo{}

@msau42 (Member) commented May 7, 2018

A better name might be "matchedClaims"

return
}

func (b *volumeBinder) checkVolumeProvisions(pod *v1.Pod, claimsToProvision []*v1.PersistentVolumeClaim, node *v1.Node) (provisionSatisfied bool, err error) {

@msau42 (Member) commented May 7, 2018

Can you add a comment to this function?
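
For illustration, the requested comment might read something like this sketch (not the wording that eventually landed):

```go
// checkVolumeProvisions checks whether the pod's unbound claims can be
// satisfied by dynamic provisioning on the given node. It returns true only
// if every claim's StorageClass names a provisioner that supports dynamic
// provisioning; topology-aware filtering is left to a follow-up PR.
```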


// A decision includes bindingInfo and provisioned PVCs of the node
type nodeDecision struct {
bindings []*bindingInfo

@msau42 (Member) commented May 7, 2018

Would it be simpler to modify bindingInfo to also store provisioning information, instead of adding a new provisionings field?

@lichuqiang (Author, Member) commented May 8, 2018

For now, a bindingInfo stores a PVC and a PV to represent a binding relationship. So in what way would you want to expand it? Maybe add a bool field to indicate whether the PVC needs provisioning? Or treat a PVC in a bindingInfo as needing provisioning if its PV is not specified?

I'm not sure it's a good idea to put everything into the binding data. Binding and provisioning are two independent phases, so it would be clearer to store and process their data separately. In fact, I even considered renaming PodBindingCache to reflect that the cache stores data for both bindings and provisionings, not just binding info.
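
A sketch of the separation being described, with the provisionings field kept next to, but distinct from, the binding data; the element type of provisionings is an assumption:

```go
// Sketch: per-node decision stored in the pod binding cache; bindings of
// existing PVs and claims that still need dynamic provisioning are kept in
// separate fields rather than overloaded onto bindingInfo.
type nodeDecision struct {
	bindings      []*bindingInfo              // matched PVC/PV pairs to bind
	provisionings []*v1.PersistentVolumeClaim // claims the provisioner must create PVs for
}
```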

@lichuqiang lichuqiang force-pushed the lichuqiang:provision_plumbing branch from c4331fb to e987fd7 May 8, 2018

@k8s-ci-robot k8s-ci-robot added size/XXL and removed size/XL labels May 8, 2018

@lichuqiang lichuqiang force-pushed the lichuqiang:provision_plumbing branch 3 times, most recently from b186c1a to 0c50ffd May 8, 2018

@msau42 (Member) left a comment

This is looking good! Just one minor comment. Can you also squash the new commits back into the original 3?

expectedUnbound: true,
expectedBound: true,
},
"immediate-unbound-pvc": {

@msau42 (Member) commented May 8, 2018

Can you also add a test case with immediate-bound and delayed-provision

@lichuqiang (Author, Member) commented May 9, 2018

Done

@lichuqiang lichuqiang force-pushed the lichuqiang:provision_plumbing branch 2 times, most recently from 5aef91c to 9f54b77 May 9, 2018

// Other places of failure has nothing to do with DynamicProvisioningScheduling,
// so just let controller retry in the next sync. We'll only call func
// triggerReprovisioning here when the underlying provisioning actually failed.
ctrl.triggerReprovisioning(claim)

@msau42 (Member) commented May 9, 2018

A better name here might be "rescheduleProvisioning"

@msau42 (Member) commented May 9, 2018

It would also be good (in a separate PR) to have an integration test for this interaction between the scheduler, PV controller, and provisioner.

@lichuqiang (Author, Member) commented May 10, 2018

Yeah, I meant to add the test after the base PR is merged.


// If the provisioner name in a storage class is set to "kubernetes.io/no-provisioner",
// then dynamic provisioning is not supported by the storage.
const NotSupportedProvisioner = "kubernetes.io/no-provisioner"

@msau42 (Member) commented May 9, 2018

Does this need to be public? It is not used outside of this package

@lichuqiang (Author, Member) commented May 10, 2018

Not really. Since it's a general agreement, I made it public just in case it gets used in other places.

However, if it really is used elsewhere in the future (say, by storage plugins), the PV controller would no longer be the proper place to define it. So I agree to make it internal for now.
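
A sketch of the unexported form being agreed on here:

```go
// notSupportedProvisioner is the provisioner name a StorageClass can use to
// declare that its storage does not support dynamic provisioning.
const notSupportedProvisioner = "kubernetes.io/no-provisioner"
```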

@lichuqiang lichuqiang force-pushed the lichuqiang:provision_plumbing branch from 9f54b77 to 183abfb May 10, 2018

@msau42 (Member) commented May 10, 2018

/lgtm

@lichuqiang lichuqiang force-pushed the lichuqiang:provision_plumbing branch from c9fe6c8 to 91d7982 May 23, 2018

@liggitt (Member) commented May 23, 2018

volume-scheduler role change LGTM

}

// Key = nodeName
// Value = array of bindingInfo
type nodeBindings map[string][]*bindingInfo
// Value = bindings & provisioned PVSs of the node

@jsafrane (Member) commented May 24, 2018

typo: PVSs -> PVCs

@lichuqiang (Author, Member) commented May 24, 2018

done

// Update claims objects to trigger volume provisioning. Let the PV controller take care of the rest
// PV controller is expect to signal back by removing related annotations if actual provisioning fails
for i, claim := range claimsToProvision {
if _, err := b.ctrl.kubeClient.CoreV1().PersistentVolumeClaims(claim.Namespace).Update(claim); err != nil {

@jsafrane (Member) commented May 24, 2018

Since you update the PVC, should you add it to the PVC cache? Is it possible that revertAssumedPVCs() called by a later iteration reverts to a version that's already old while the API server has a newer one?

@lichuqiang (Author, Member) commented May 24, 2018

> Since you update PVC, should you add it to PVC cache?

We should have already added it to the assume cache in the earlier assume phase.

> Is it possible that revertAssumedPVCs() called by later iteration reverts to a version that's already old and API server has a new one?

I guess that will not happen: in the Restore func of the assume cache, we don't try to add anything new, we only set its latestObj back to its apiObj.
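
For context, a rough sketch of the Restore behavior being described; the cache internals here are paraphrased for illustration, not quoted from assume_cache.go:

```go
// Sketch: Restore never inserts a new entry; it only resets the speculative
// latestObj back to the last object observed from the API server, so a later
// revert cannot move the cache behind what the API server already returned.
func (c *assumeCache) Restore(objName string) {
	c.mutex.Lock()
	defer c.mutex.Unlock()

	if objInfo, err := c.getObjInfo(objName); err == nil {
		objInfo.latestObj = objInfo.apiObj
	}
}
```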

@msau42 (Member) commented May 24, 2018

RevertAssumedPVCs called later only reverts PVCs that haven't been API-updated yet.

@msau42 (Member) commented May 24, 2018

The "provision-api-update-failed" unit test checks that 1st PVC shows update in cache, but 2nd PVC that failed API update got reverted in cache

@jsafrane (Member) commented May 24, 2018

Almost LGTM sans one typo.

@lichuqiang lichuqiang force-pushed the lichuqiang:provision_plumbing branch from 91d7982 to 446f365 May 24, 2018

@jsafrane (Member) commented May 24, 2018

controller changes LGTM

@msau42 (Member) commented May 24, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label May 24, 2018

@lichuqiang (Author, Member) commented May 25, 2018

@liggitt can you give a lgtm for the auth change?

@liggitt (Member) commented May 25, 2018

/approve
RBAC role change

@k8s-ci-robot (Contributor) commented May 25, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lichuqiang, liggitt, msau42

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@lichuqiang (Author, Member) commented May 25, 2018

/retest

@k8s-ci-robot (Contributor) commented May 25, 2018

@lichuqiang: The following test failed, say /retest to rerun them all:

Test name: pull-kubernetes-kubemark-e2e-gce-big
Commit: 446f365
Rerun command: /test pull-kubernetes-kubemark-e2e-gce-big

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot (Contributor) commented May 25, 2018

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit a8cf18c into kubernetes:master May 25, 2018

15 of 18 checks passed

pull-kubernetes-kubemark-e2e-gce-big: Job failed.
Submit Queue: Required GitHub CI test is not green: pull-kubernetes-e2e-gce
pull-kubernetes-e2e-gce-100-performance: Job triggered.
cla/linuxfoundation: lichuqiang authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-cross: Skipped
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-gke: Skipped
pull-kubernetes-e2e-kops-aws: Job succeeded.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce: Job succeeded.
pull-kubernetes-local-e2e: Skipped
pull-kubernetes-local-e2e-containerized: Skipped
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.

li-ang pushed a commit to li-ang/kubernetes that referenced this pull request Jun 5, 2018

Kubernetes Submit Queue
Merge pull request kubernetes#63233 from lichuqiang/provision_api
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

API change for volume topology aware dynamic provisioning

**What this PR does / why we need it**:

Split PR kubernetes#63193 for better review
part 2: API change

Previous: kubernetes#63232
Next: kubernetes#63193

**Which issue(s) this PR fixes** 
Feature: kubernetes/enhancements#561
Design: kubernetes/community#2168

**Special notes for your reviewer**:
/sig storage
/sig scheduling
/assign @msau42 @jsafrane @thockin 


**Release note**:

```release-note
API change for volume topology aware dynamic provisioning
```

k8s-publishing-bot added a commit to kubernetes/api that referenced this pull request Jun 5, 2018

Merge pull request #63233 from lichuqiang/provision_api
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

API change for volume topology aware dynamic provisioning

**What this PR does / why we need it**:

Split PR kubernetes/kubernetes#63193 for better review
part 2: API change

Previous: kubernetes/kubernetes#63232
Next: kubernetes/kubernetes#63193

**Which issue(s) this PR fixes**
Feature: kubernetes/enhancements#561
Design: kubernetes/community#2168

**Special notes for your reviewer**:
/sig storage
/sig scheduling
/assign @msau42 @jsafrane @thockin

**Release note**:

```release-note
API change for volume topology aware dynamic provisioning
```

Kubernetes-commit: 77d996b27883bed9d8b9015b608f9a0f0c60fd23

dims pushed a commit to dims/kubernetes that referenced this pull request Jun 5, 2018

Kubernetes Submit Queue
Merge pull request kubernetes#63193 from lichuqiang/provision_0425
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Volume topology aware dynamic provisioning: work based on new API

**What this PR does / why we need it**:

The PR has been split to 3 parts:

Part1: kubernetes#63232 for basic scheduler and PV controller plumbing
Part2: kubernetes#63233 for API change

and the PR itself includes work based on the API change:

- Dynamic provisioning allowed topologies scheduler work
- Update provisioning interface to be aware of selected node and topology

**Which issue(s) this PR fixes** 
Feature: kubernetes/enhancements#561
Design: kubernetes/community#2168

**Special notes for your reviewer**:
/sig storage
/sig scheduling
/assign @msau42 @jsafrane @saad-ali @bsalamat

@kubernetes/sig-storage-pr-reviews
@kubernetes/sig-scheduling-pr-reviews

**Release note**:

```release-note
Volume topology aware dynamic provisioning
```

k8s-github-robot pushed a commit that referenced this pull request Jul 9, 2018

Kubernetes Submit Queue
Merge pull request #64226 from ddebroy/ddebroy-affinity1
Automatic merge from submit-queue (batch tested with PRs 64226, 65880). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Populate NodeAffinity on top of labels for cloud based PersistentVolumes

**What this PR does / why we need it**:

This PR populates the NodeAffinity field (on top of the existing labels) for PVs backed by cloud providers like EC2 EBS and GCE PD.

**Special notes for your reviewer**:
Related to #63232

Sample `describe pv` output for EBS with node affinity field populated:
```
kubectl describe pv pv0001
Name:              pv0001
Labels:            failure-domain.beta.kubernetes.io/region=us-west-2
                   failure-domain.beta.kubernetes.io/zone=us-west-2a
Annotations:       <none>
Finalizers:        [kubernetes.io/pv-protection]
StorageClass:      
Status:            Available
Claim:             
Reclaim Policy:    Retain
Access Modes:      RWO
Capacity:          5Gi
Node Affinity:     
  Required Terms:  
    Term 0:        failure-domain.beta.kubernetes.io/zone in [us-west-2a]
                   failure-domain.beta.kubernetes.io/region in [us-west-2]
Message:           
Source:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   vol-00cf03a068c62cbe6
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
Events:         <none>
```

/sig storage
/assign @msau42

**Release note**:
```NONE```

k8s-github-robot pushed a commit that referenced this pull request Aug 1, 2018

Kubernetes Submit Queue
Merge pull request #65730 from ddebroy/ebs-affinity1
Automatic merge from submit-queue (batch tested with PRs 65730, 66615, 66684, 66519, 66510). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Add DynamicProvisioningScheduling support for EBS

**What this PR does / why we need it**:

This PR adds support for the DynamicProvisioningScheduling feature in EBS. With this in place, if VolumeBindingMode: WaitForFirstConsumer is specified in a EBS storageclass and DynamicProvisioningScheduling is enabled, EBS provisioner will use the selected node's LabelZoneFailureDomain as the zone to provision the EBS volume in.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:
Related to #63232

Sample `describe pv` output with NodeAffinity populated:

```
~$ kubectl describe pv pvc-f9d2138b-7e3e-11e8-a4ea-064124617820
Name:              pvc-f9d2138b-7e3e-11e8-a4ea-064124617820
Labels:            failure-domain.beta.kubernetes.io/region=us-west-2
                   failure-domain.beta.kubernetes.io/zone=us-west-2a
Annotations:       kubernetes.io/createdby=aws-ebs-dynamic-provisioner
                   pv.kubernetes.io/bound-by-controller=yes
                   pv.kubernetes.io/provisioned-by=kubernetes.io/aws-ebs
Finalizers:        [kubernetes.io/pv-protection]
StorageClass:      slow3
Status:            Bound
Claim:             default/pvc3
Reclaim Policy:    Delete
Access Modes:      RWO
Capacity:          6Gi
Node Affinity:     
  Required Terms:  
    Term 0:        failure-domain.beta.kubernetes.io/zone in [us-west-2a]
                   failure-domain.beta.kubernetes.io/region in [us-west-2]
Message:           
Source:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   aws://us-west-2a/vol-0fc1cdae7d10860f6
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
Events:         <none>
```

**Release note**:
```release-note
none
```

/sig storage
/assign @msau42 @jsafrane