Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for external dynamic provisioners #30285

Merged

Conversation

@jsafrane
Copy link
Member

commented Aug 9, 2016

@kubernetes/sig-storage


This change is Reviewable

@googlebot googlebot added the cla: yes label Aug 9, 2016


## Design

This design represents the minimally viable changes required to provision based on storage classe configuration. Additional incremental features may be added as a separte effort.

We propose that:

1. For the base impelementation storage class and volume selectors are mutually exclusive.
1. For the base implementation storage class and volume selectors are mutually exclusive for now.

2. An api object will be incubated in extensions/v1beta1 named `storage` to hold the a `StorageClass`
API resource. Each StorageClass object contains parameters required by the provisioner to provision volumes of that class. These parameters are opaque to the user.

This comment has been minimized.

Copy link
@markturansky

markturansky Aug 9, 2016

Member

opaque to the user or to the system? The user (admin?) would know what 'gp2' is but the system wouldn't.

This comment has been minimized.

Copy link
@childsb

childsb Aug 9, 2016

Member

user as in the user creating the PVC.

0. Kubernetes administator can configure name of a default StorageClass. This
StorageClass instance is then used when user requests a dynamically
provisioned volume, but does not specify a StorageClass. In other words,
`claim.Spec.Class == ""`

This comment has been minimized.

Copy link
@markturansky

markturansky Aug 9, 2016

Member

Should this potentially be expanded to something like Params?

Assume that StorageClass is the correct place to configure whether or not a volume is encrypted. We need a way to pass an encryption key to the provisioner.

A new SecretType can be added to manage these keys for the user. The PVC would therefore need both claim.Spec.Class and claim.Spec.Secret as params to the provisioner.

All GCE volumes are encrypted by default, so an empty Secret means GCE creates its own keypair. AWS is optionally encrypted, so an empty key might mean "no encryption" or it might mean "generate a keypair".

This comment has been minimized.

Copy link
@childsb

childsb Aug 10, 2016

Member

the use case for secret is along the lines of control API credentials used during the creation of the asset (cloud credentials, admin RBD secret, login to fibre channel provisioning API etc..) a secret as a parameter to provisioner used to encrypt the volume or similar would result in every volume created by that provisioner using the same secret.

a better use case is the provisioner creating a unique secret for every provisioned asset.

This comment has been minimized.

Copy link
@markturansky

markturansky Aug 10, 2016

Member

In my suggestion above, I was saying each PVC would supply a Secret so that each volume is encrypted with a specific key the user controls.

For OpenShift Online, it would be desirable for each user to supply the keys they want to use and we'd use that key to encrypt the volume. Storing the keys in a Secret makes sense, though we need to work out how that key gets into AWS/GCE (keys are referenced by name in the EBS API).

This comment has been minimized.

Copy link
@jsafrane

jsafrane Aug 10, 2016

Author Member

Original docs/proposals/volume-provisioning.md specifies that the provisioned PV can be configured by PVC labels. Concrete implementation is not written yet, but I expect something like
claim.Labels['kubernetes.io/aws-ebs/zone"] = "us-east-1d". The provisioner then evaluates both claim.Class and claim.Labels. It would work the same for hypothetical encryption key ID or anything that the provisioner implements.

This comment has been minimized.

Copy link
@markturansky

markturansky Aug 17, 2016

Member

I talked with our Ops team and they said a key per namespace was not necessary. End user supplied keys was too much.

Instead, 1 key per StorageClass where admins manually create the keys is sufficient.

@bgrant0607 bgrant0607 assigned saad-ali and matchstick and unassigned bgrant0607 Aug 9, 2016

@bgrant0607

This comment has been minimized.

Copy link
Member

commented Aug 9, 2016

as internal provisioners do and user gets the same experience.

* The provisioner must implement also a deleter. It must be able to delete
storage assets it created.

This comment has been minimized.

Copy link
@childsb

This comment has been minimized.

Copy link
@jsafrane

jsafrane Aug 10, 2016

Author Member

as proposed below, the same plugin / external provisioner would be used to delete volumes.

We could add a separate annotation that provisioner could use to select another deleter, but do we need it?

storage assets it created.

The external provisioner runs in a separate process which watches claims. For
every claim creation or update, it implements these steps:

This comment has been minimized.

Copy link
@markturansky

markturansky Aug 10, 2016

Member

I can be the guinea pig to implement/test this, as we already have one downstream in Online.

shown that it is impossible to anticipate every aspect and manner of
provisioning that administrators will want to perform. The proposed design
should not prevent future work to allow out-of-tree provisioners.
project.

## Design

This design represents the minimally viable changes required to provision based on storage classe configuration. Additional incremental features may be added as a separte effort.

This comment has been minimized.

Copy link
@quolix

quolix Aug 11, 2016

Typo: classe

@k8s-bot

This comment has been minimized.

Copy link

commented Aug 30, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message will repeat several times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.


1. If the create operation for the `api.PersistentVolume` fails, it is
retried
5. If the deleter is not known to Kubernetes, it does not do anything.

This comment has been minimized.

Copy link
@wongma7

wongma7 Sep 9, 2016

Contributor

I thought the new annotation was the only change needed to implement this proposal, this is another one since right now kubernetes sets PV to failed if it can't find a deleter.

Just noting because I know I said annotation's the only change a couple times

This comment has been minimized.

Copy link
@jsafrane

jsafrane Sep 13, 2016

Author Member

@wongma7, good chatch. I filled pr #32565 not to report any error when external deleter is requested.

@jsafrane jsafrane force-pushed the jsafrane:external-provisioner-proposal branch from cb44799 to 5082776 Sep 19, 2016

@jsafrane jsafrane force-pushed the jsafrane:external-provisioner-proposal branch 3 times, most recently from d82402d to 96c643e Sep 19, 2016

@jsafrane

This comment has been minimized.

Copy link
Member Author

commented Sep 19, 2016

  • rebased
  • reformulated the protocol between external provisioners and Kubernetes to be very strict
  • added an example PVC+PV
@k8s-bot

This comment has been minimized.

Copy link

commented Sep 19, 2016

GCE e2e build/test passed for commit d82402d.

`api.PersistentVolume`, fills its `Class` attribute with `claim.Spec.Class`
and makes it already bound to the claim

1. If the create operation for the `api.PersistentVolume` fails, it is

This comment has been minimized.

Copy link
@rootfs

rootfs Sep 22, 2016

Member

is the numbering here and blow correct?

This comment has been minimized.

Copy link
@jsafrane

jsafrane Sep 26, 2016

Author Member

markdown expands it to different levels, i.e. this "1." is rendered as "a.", see "rich diff" at https://github.com/kubernetes/kubernetes/pull/30285/files?short_path=3cb6e02#diff-3cb6e02c12e0f637a83317ce28eb2c8c

Please report any problems with wrong cross-references, I might have missed some.. And yes, it's hard to review from looking at the .md file, better review rendered markdown or rich diff.


7. If `Provision` returns an error, the controller generates an event on the

This comment has been minimized.

Copy link
@rootfs

rootfs Sep 22, 2016

Member

should the provisioner be blacklisted if it keeps trashing?

This comment has been minimized.

Copy link
@jsafrane

jsafrane Sep 26, 2016

Author Member

I would leave such optimization to future when we see they're needed. The first implementation would IMO work well without it.

@jsafrane

This comment has been minimized.

Copy link
Member Author

commented Sep 27, 2016

@thockin ping?

@jsafrane jsafrane force-pushed the jsafrane:external-provisioner-proposal branch from 5813ff2 to 979b890 Oct 4, 2016

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Oct 4, 2016

Jenkins unit/integration failed for commit 979b890. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

Kubernetes was not blocked in any way during the provisioning and could
either bound the claim to another PV that was created by user or even the
claim may have been deleted by the user. In both cases, Kubernetes will mark
the PV to be delete using the protocol below.

This comment has been minimized.

Copy link
@cdragga

cdragga Oct 14, 2016

There are a couple of other race conditions here, I think, as hinted at by the paragraph below. Specifically,

  1. The claim can be changed to point to a storage class that uses a different provisioner.
  2. The claim can be resized to request more storage than it originally did.
  3. The claim can be resized to request less storage than it originally did.

Item 3 is a non-issue (the claim will bind to the PVC), but items 1 and 2 will likely cause problems. I'm unsure what the desired semantics for item 1 should be, as there'll be nothing to stop the binding from occurring, even though the user no longer wants to use the external provisioner. In the case of item 2, the binding won't happen, and the incorrectly-sized volume should probably be deleted and the binding retried.

This comment has been minimized.

Copy link
@jsafrane

jsafrane Oct 19, 2016

Author Member
  1. this is indeed possible in rare cases, however if two provisioners create a PV for one PVC, the PV controller will bind the PVC to one of them and it will mark the "loosing" one for deletion.

2-3) we do not support PVC resizing (yet), PVC are be immutable after creation.


* `pv.Spec.PersistentVolumeReclaimPolicy` SHOULD be set to `Delete` unless
user manually configures other reclaim policy.

This comment has been minimized.

Copy link
@cdragga

cdragga Oct 14, 2016

Should there be an explicit mechanism for doing this that all external provisioners should support, like an annotation in the PVC? Should this also apply to internal provisioners?

This comment has been minimized.

Copy link
@jsafrane

jsafrane Oct 19, 2016

Author Member

We're not there yet, maybe in the future.

`pv.Status.Phase == Released && pv.Annotations['pv.kubernetes.io/provisioned-by'] == <deleter name>`,
the deleter:

* It MUST delete the storage asset.

This comment has been minimized.

Copy link
@cdragga

cdragga Oct 14, 2016

This seems to contradict line 260, which implies that other reclaim options should be possible. In cases where the policy is set to "Retain", a useful approach would be to retain the storage asset until the PV itself was deleted, rather than the PVC. "Recycle" should probably have similar semantics to its current behavior, although the recycling process could be left to the plugin itself, possibly.

This comment has been minimized.

Copy link
@jsafrane

jsafrane Oct 19, 2016

Author Member

good point, the delete should check reclaim policy first. I'll add a note about it.

@jsafrane jsafrane force-pushed the jsafrane:external-provisioner-proposal branch from 979b890 to 6bdf5a8 Oct 19, 2016

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Oct 19, 2016

Jenkins GCI GKE smoke e2e failed for commit 6bdf5a8. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

deleter SHOULD retry to delete the volume periodically.

* The deleter SHOULD NOT use any information from StorageClass instance
referenced by the PV.

This comment has been minimized.

Copy link
@humblec

humblec Oct 21, 2016

Contributor

@jsafrane are you referring deleter of the external provisioner here ? if yes, we may require the storage class parameters at time of deleter() invocation. Isnt it ?

This comment has been minimized.

Copy link
@jsafrane

jsafrane Oct 25, 2016

Author Member

I'd like to move away from StorageClass to be present at the time of deletion. The only reason we need it there is for internal provisioners and their handling of secrets. External provisioners can have secrets configured in a different way, e.g. as Secret instance attached to the pod where the provisioner runs. So, let's suggest that the class may be missing at the time when a deleter is invoked. I'll add some text about that.

This comment has been minimized.

Copy link
@humblec

humblec Oct 26, 2016

Contributor

@jsafrane thanks, but then, if there are more than one storage class, the external provisioner need to keep track of the parameters of each SC internally and fetch and use it whenever a delete request comes in. iic on this assumption, it will be difficult to maintain the pointers to SC and PVs when deleter is invoked in the external provisioners . Isnt it ?

This comment has been minimized.

Copy link
@jsafrane

jsafrane Oct 26, 2016

Author Member

the provisioner still can pass parameters to deleter via PV annotations. Just don't pass pointers to secret this way.

This comment has been minimized.

Copy link
@humblec

humblec Oct 27, 2016

Contributor

@jsafrane thanks for the clarification ! that make sense :)

claim and goes back to step 1., i.e. it will retry provisioning periodically
The external provisioner runs in a separate process which watches claims, be
it an external storage appliance, a daemon or a Kubernetes pod. For every
claim creation or update, it implements these steps:

This comment has been minimized.

Copy link
@humblec

humblec Oct 21, 2016

Contributor

@jsafrane if its running as a pod, do we need to make sure the pod has certain privileges to serve the claims ? I mean, the access to the controller or the api server or something of that sort?

This comment has been minimized.

Copy link
@childsb

childsb Oct 24, 2016

Member

The pod's privileges are up to the storage admin & external provisioner author and shouldn't need accounting for in the binding controller.

This comment has been minimized.

Copy link
@humblec

humblec Oct 27, 2016

Contributor

Thanks @childsb .

@childsb childsb self-assigned this Oct 25, 2016

@childsb
Copy link
Member

left a comment

We should probably call out some lessons learned on security -

  • Do not store security sensitive properties on the PVC.

What happens when a PV is released belonging to a storageClass thats been deleted?


## Design

This design represents the minimally viable changes required to provision based on storage class configuration. Additional incremental features may be added as a separate effort.

We propose that:

1. For the base impelementation storage class and volume selectors are mutually exclusive.
1. For the base implementation storage class and volume selectors are mutually exclusive for now.

This comment has been minimized.

Copy link
@childsb

childsb Oct 25, 2016

Member

I think we could relax or undefine the behavior for label selectors and storage classes for out of tree provisioners. The only thing that needs to be enforced is that PVC.size < PV.size, PV.storageClass = PVC.storageClass and PVC.selector matches PV.labels.

None of the in-tree provisioners support labels+storageClass at this time, but there's no real way we can enforce out-of-tree behavior for storageClass and label selectors for provisioners.

Possible update: "For the in-tree storage provisioners, a PVC with labelSelectors is ignored for provisioning. For out of tree provisioners, the PV created by the provisioner must be labeled such that it matches the PVC selector."

This comment has been minimized.

Copy link
@jsafrane

jsafrane Oct 25, 2016

Author Member

done, thanks for a nice sentence

@jsafrane

This comment has been minimized.

Copy link
Member Author

commented Oct 25, 2016

Do not store security sensitive properties on the PVC.

Added "Security considerations" paragraph.

@jsafrane jsafrane force-pushed the jsafrane:external-provisioner-proposal branch from 6bdf5a8 to f30834e Oct 26, 2016

to achieve, it SHOULD report an error and it MUST NOT provision a volume.
All errors found during parsing or provisioning SHOULD be send as events
on the claim and the provisioner SHOULD retry periodically with step i.

This comment has been minimized.

Copy link
@kangarlou

kangarlou Oct 26, 2016

Contributor

The specification says "for in-tree provisioners a PVC with labelSelectors is ignored for provisioning" and "claim.Spec.Selector and claim.Spec.Class are mutually exclusive. User can either match existing volumes with Selector XOR match existing volumes with Class and get dynamic provisioning by using Class." However, external provisioners "MUST parse arguments in the StorageClass and claim.Spec.Selector and provisions appropriate storage asset that matches both the parameters and the selector." Why is there a distinction between in-tree and external provisioners? Can external provisioners ignore PVCs with labelSelectors?

This comment has been minimized.

Copy link
@jsafrane

jsafrane Oct 27, 2016

Author Member

You are right, the text is different for internal and external provisioners, however the result is the same: if there is a selector in a PVC, the provisioner must provision something that matches both the class and the selector. In-tree provisioners are stupid and don't parse selectors right now (this will change in future, 1.6?) and therefore refuse to provision anything with any selector. I would advise external provisoners do the same - either refuse to provision a claim with any selector or explicitly select few labels that are allowed in a PVC selector and document those as additional way how to configure a volume. Therefore "MUST parse claim.Spec.Selector" may be just if pvc.Spec.Selector != nil { return Error("can't parse selector!") }

I'll try to capture this idea in the proposal...

This comment has been minimized.

This comment has been minimized.

Copy link
@kangarlou

kangarlou Oct 27, 2016

Contributor

Thanks for the clarification! Is using selector as a means to passing arguments to provisioners an interim capability (say to support dynamic provisioning for releases before 1.4 where StorageClass abstraction didn't exist) or a capability planned for the long term?

This comment has been minimized.

Copy link
@jsafrane

jsafrane Oct 27, 2016

Author Member

For internal provisioners (GCE/AWS/Cinder) it's a feature we'd like introduce eventually, maybe in 1.6 (patches welcome!), external provisioner can go wild and parse selectors as they want, Kubernetes will not prevent them in doing so in any release.

@jsafrane jsafrane force-pushed the jsafrane:external-provisioner-proposal branch from f30834e to 9519753 Oct 27, 2016

@childsb

This comment has been minimized.

Copy link
Member

commented Oct 31, 2016

As mentioned in last weeks storage SIG i'm marking this for merge as reviewed by the community. Thanks everyone.

@childsb childsb added the lgtm label Oct 31, 2016

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

commented Oct 31, 2016

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 3e3ff44 into kubernetes:master Oct 31, 2016

12 checks passed

Jenkins GCE Node e2e Build succeeded.
Details
Jenkins GCE e2e Build succeeded.
Details
Jenkins GCE etcd3 e2e Build succeeded.
Details
Jenkins GCI GCE e2e Build succeeded.
Details
Jenkins GCI GKE smoke e2e Build succeeded.
Details
Jenkins GKE smoke e2e Build succeeded.
Details
Jenkins Kubemark GCE e2e Build succeeded.
Details
Jenkins unit/integration Build succeeded.
Details
Jenkins verification Build succeeded.
Details
Submit Queue Queued to run github e2e tests a second time.
Details
cla/google All necessary CLAs are signed
cla/linuxfoundation jsafrane authorized
Details
@jsafrane

This comment has been minimized.

Copy link
Member Author

commented Nov 1, 2016

implementation of the proposal: #35957

@thockin

This comment has been minimized.

Copy link
Member

commented Nov 7, 2016

LGTM

xingzhou pushed a commit to xingzhou/kubernetes that referenced this pull request Dec 15, 2016
Kubernetes Submit Queue
Merge pull request kubernetes#30285 from jsafrane/external-provisione…
…r-proposal

Automatic merge from submit-queue

Proposal for external dynamic provisioners

@kubernetes/sig-storage
k8s-github-robot pushed a commit that referenced this pull request Feb 2, 2017
Kubernetes Submit Queue
Merge pull request #40308 from wongma7/pv-rbac
Automatic merge from submit-queue (batch tested with PRs 40574, 40806, 40308, 40771, 39440)

Add bootstrap cluster role for external pv provisioners

The set of permissions an external provisioner #30285 running as a pod will need. Technically in order to dynamically provision PVs one doesn't need to "update" PVCs or "watch" events but the controller https://github.com/kubernetes-incubator/nfs-provisioner/tree/master/controller we are recommending people use does those things to: set lock annotations on PVCs and watch `ProvisioningSucceeded`/`ProvisioningFailed` events.

Some external provisioners may need additional permissions, for example nfs-provisioner requires "get" access to Services and Endpoints when run "statefully." I think in that case we would recommend creating a new ClusterRole specific to that provisioner, using this as a base?

(This was to be a part of my redo/fix of the external e2e test #39545 but I'm submitting it as a separate PR for now due to some issues I had with running nfs-provisioner on gce.)

@kubernetes/sig-auth-misc ?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.