PersistentVolume Config & Provisioning proposal #17056

Closed · wants to merge 5 commits into base: master

Conversation

markturansky (Member) commented Nov 10, 2015

This proposal ties together several pieces of the PersistentVolumes feature that are currently missing or sub-optimal.

See provisioning.md for details.

Summary of proposed changes:

  1. Structured config data -- file config? extended API?
  2. Allow many recyclers and provisioners (we have 1 today) -- #13338
  3. Use selectors for everything (#14908)
  4. Refactor provisioning code from volumes plugin code (#14217)
  5. executable driver model for recyclers and provisioners

Tasks:

  1. Add pvc.Spec.PersistentVolumeSelector (see the sketch after this list)
  2. Expose more volume attributes (e.g., EBS volumeType). A map of arguments was added to config and plugins; varying attribute types can be set there and used by the plugin.
  3. Add config data and initialize controller w/ config
  4. Controller consolidation: PRs #14537 and #16432 (issue #15632)
  5. Make controllers work with selectors per the proposal.
  6. Refactor current binding/recycling behavior into plugins and use in default config
  7. Bonus: executable drivers -- not required immediately
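
A minimal sketch of Task 1 from the user's side, assuming the proposed pvc.Spec.PersistentVolumeSelector surfaces in the API as `persistentVolumeSelector` (illustrative only, not a final API):

```json
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": { "name": "myclaim" },
  "spec": {
    "accessModes": ["ReadWriteOnce"],
    "resources": { "requests": { "storage": "10Gi" } },
    "persistentVolumeSelector": {
      "matchLabels": { "storage-class": "gold" }
    }
  }
}
```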

@thockin @saad-ali @kubernetes/rh-storage

A tech design session to discuss this would be helpful. Many of the tasks will be parallelizable given enough review bandwidth.

In docs/design/provisioning.md:
Binders contain two selectors against which they match claims for binding. One selector matches claims and the other matches volumes. Both must match in order to bind, in addition to other matching semantics (e.g., capacity). Claims can remain unbound indefinitely.
Provisioners create resources on demand for claims matching their selectors. PersistentVolumeClaims with matching labels will be dynamically provisioned by a plugin named in the config. Volume attributes are copied from a template. A specific security context can be assigned to the provisioner. Volumes created by the provisioner automatically bind to the claim for which they were provisioned.

deads2k (Contributor), Nov 11, 2015

I think the system will be easier to understand if provisioners simply add the PV and allow a Binder to handle the binding step.

markturansky (Member), Nov 11, 2015

That's exactly what happens now. I will make this text clearer.

The provisioner creates the PV w/ a ClaimRef to the Claim that was provisioned. The ClaimRef on the PV automatically binds to the PVC.
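
For illustration, a minimal sketch of such a provisioner-created PV carrying a claimRef back to the claim it was provisioned for (the volume source and names here are hypothetical):

```json
{
  "kind": "PersistentVolume",
  "apiVersion": "v1",
  "metadata": { "name": "pv-dyn-0001" },
  "spec": {
    "capacity": { "storage": "10Gi" },
    "accessModes": ["ReadWriteOnce"],
    "gcePersistentDisk": { "pdName": "kube-dyn-0001", "fsType": "ext4" },
    "claimRef": { "namespace": "default", "name": "myclaim" }
  }
}
```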

deads2k (Contributor), Nov 11, 2015

> The provisioner creates the PV w/ a ClaimRef to the Claim that was provisioned. The ClaimRef on the PV automatically binds to the PVC.

I don't know that I'm pro-claimRef. Having a provisioner create things seems great, but specifically tagging them feels more questionable. If its only job is to create PVs and the binder can decide what to do with the PV, it seems like a cleaner separation of concerns.

markturansky (Member), Nov 11, 2015

Making a ClaimRef is the transaction that forms a bind. If a provisioner is using a claim from which it creates a PV, why not also add the ClaimRef on the very transaction that makes the PV?

It succeeds and is bound, or it fails and there is no PV.

The binding order part of the doc explains the precedence when binding. Volumes first match their ClaimRef.

In docs/design/provisioning.md:
Administrators map binders, provisioners, and recyclers to specific types of persistent claims and volumes by creating a PersistentStorageConfig file. The config file is set as a CLI flag.
Binders contain two selectors against which they match claims for binding. One selector matches claims and the other matches volumes. Both must match in order to bind, in addition to other matching semantics (e.g., capacity). Claims can remain unbound indefinitely.

deads2k (Contributor), Nov 11, 2015

Will a Binder look for a Provisioner to kick if no matching claims are available? Also, will a binder update the status of a PVC to indicate that it looked for a match, but couldn't find one? That could be used to indirectly kick the Provisioner and it would easily allow for adding a provisioner after the fact.

markturansky (Member), Nov 11, 2015

I'd love your help reviewing that logic in the controller when I implement this functionality :) Your feedback on the last one was very helpful.

smarterclayton (Contributor), Nov 19, 2015

It should be possible to define a default provisioner and a default binder. To me this proposal is roughly analogous to schedulers - there must be a default (even if the default is do nothing, or bind anything).

markturansky (Member), Nov 19, 2015

The current binding behavior would be the default. I will make that into a plugin and add it to the default configuration. Same with the recycler.

markturansky (Member), Nov 19, 2015

Am I wrong to think of the binder and provisioner as the same? They both work in reaction to a claim's selector.

One implementation might create a PersistentVolume object pre-populated with a ClaimRef. Another implementation might look for PVs without a ClaimRef and check for a match. Either way, the return value is a PersistentVolume w/ ClaimRef bound to the claim (the return value can also be nil).

A different process reacts to a newly created PV with phase Provisioning and creates the resource in the infrastructure. After fulfillment in the infrastructure, the volume is Available, and the current claim sync period would bring the claim to the Bound phase, ready for use.

OOTB behavior is the existing "binder" as a plugin, but I don't think I need to introduce a BinderPlugin when the provisioner can do whatever it wants to find/create a PV for a PVC.

In docs/design/provisioning.md:
// Driver is the name of the plugin to use for this Recycle or the name of the driver located beneath
// /usr/libexec/kubernetes/controllermanager-plugins/provisioning/exec
// A Recycler must specify a driver/plugin name.
Driver string

deads2k (Contributor), Nov 11, 2015

We're currently using the RecyclerPodTemplate right? If so, does this have to be in this particular pull? It seems like it would add something to argue about without providing a lot of power.

markturansky (Member), Nov 11, 2015

You're referencing the driver in this comment? If so, I agree and yes, just the pod template would suffice for today's functionality.

In docs/design/provisioning.md:
// A Recycler must specify a driver/plugin name.
Driver string
// Args is a map of key/value pairs passed to the driver (but not plugin) as arguments
Args *map[string]string

deads2k (Contributor), Nov 11, 2015

If you're going to have a Driver and this is part of its config, I'd create a struct to hold and clearly relate these two pieces. That would also provide an easier way to indicate mutual exclusivity with RecyclerPodTemplate.

markturansky (Member), Nov 11, 2015

This is probably correct, but for the short term I'm thinking of dropping the executable driver model for the first pass, so long as we don't preclude it in the future.

The current recycler just needs a pod, so a template w/ labels to match specific volumes makes sense.

The provisioners need to work for Cloud APIs, and they do today. These, too, can be configured with what is proposed.

I think an executable driver is v2 of this feature.

In docs/design/provisioning.md:
Args *map[string]string
}
type ProvisionerConfig struct {

deads2k (Contributor), Nov 11, 2015

Would it be easier for people to produce and debug PersistentVolumeProvisioner plugins if they were simply pods that were called similar to recyclers? It seems like people may be more comfortable writing bash. That would allow a company to create their own custom procedure layered on top and it would allow easy debugging.

markturansky (Member), Nov 11, 2015

Not sure what you mean about "custom procedure" on top, unless you're referring to the driver model. I think the driver model is not strictly needed. It's a good enhancement, but existing functionality can be implemented without it.

When/if we do the driver model, the pattern is clearly laid out in Networking in a way that makes it sane and safe. That pattern is being replicated in FlexVolume. It's the right way to go if we need this feature now, which I don't think we strictly do.

rootfs (Member) commented Nov 11, 2015

A custom recycler may choose a snapshot to recycle the volume: once the claim is bound, the volume is snapshotted; during recycle, the snapshot is restored. In this case, both the PV claim and the recycler have to be stateful: the recycler must know the name of the snapshot that was created during the claim.

In docs/design/provisioning.md:
Provisioning and Recycling both follow a plugin model where the plugin accepts generic arguments. A future enhancement can scan for drivers by name and consume the same arguments (following a similar pattern in [Networking](../../pkg/kubelet/network/exec/exec.go)).
The plugin model is similar to volume plugins, where each has a unique name and implements various interfaces. Arguments are passed as string key/value pairs to a plugin in `Init(host, args)`, which will be called exactly once before any plugin operation is invoked. Many provisioners can be configured using the same plugin with different arguments, such as "volumeType" for a Cloud API.
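
To make "same plugin, different arguments" concrete, a hedged sketch of the data sections of two provisioner configs that both name one plugin (the plugin name and parameter keys are illustrative, mirroring the ConfigMap example later in this thread):

```json
[
  { "plugin-name": "kubernetes.io/aws-ebs", "parameters": "{\"volumeType\": \"gp2\"}" },
  { "plugin-name": "kubernetes.io/aws-ebs", "parameters": "{\"volumeType\": \"io1\", \"encrypted\": \"true\"}" }
]
```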

smarterclayton (Contributor), Nov 19, 2015

I'm not yet sold on the model proposed here. Volumes suffer from a serious problem today - they aren't extensible except via checking code into Kube. A provisioner and a binder are controllers - the details of how they bind are really their own domain. An admin configures their applicability - that problem is solely the domain of administrators.

I'm going to describe an alternate model that I think is superior:

  1. We treat binders and provisioners as sets of controllers
  2. The implementation of those controllers is shared, and may be in the Kube source tree or another
  3. It must be easy for someone to glue together their own provisioner or binder with or without writing Go code
  4. When someone wants to write a new provisioner, they simply watch for PVCs based on some criteria
  5. If they are going to provision for the PVC, they annotate the PVC first (to take ownership), then create a PV, then satisfy the PVC
  6. There should be a generic provisioner controller that can shell out to bash. The arguments to bash are details about the PVC. The arguments to the controller are a set of PVC labels to watch. The bash command is responsible for provisioning a PV and then returning 0 and the name of the new PV, or 1 and an error.
  7. There can be in-tree provisioners that replace the call out to bash with "call out to Go code compiled with me" via the interfaces described here.

The model above defines how anyone (not just a Kube programmer) can easily extend the logic. An admin can separate those controllers by their label matchers.

Kubernetes is intended to be a compositional system - whenever we add new abstractions, we have to clear a pretty high bar that the abstraction is easily composable. I think the proposal here does not go far enough to focus on that composability and extensibility.

markturansky (Member), Nov 19, 2015

I believe the driver model in the networking plugins allows that, by calling out safely to a binary at a known location. That's your bash model.

Plugins require compilation into Kube, of course. If admins configure no plugins, PVCs are pending indefinitely and require outside help.

This leads me to think we need examples of !Go languages watching the apiserver and responding in kind. In the end, my suggested plugins running in a controller are also just working against the API. All controllers are the same that way.

markturansky (Member), Nov 19, 2015

Also, to your point about volume extensibility generally, there's FlexVolume (#13840) in the works to allow the same driver model to exec safely on a node to set up $FOO volume that's not compiled into Kube.

In docs/design/provisioning.md:
`PersistentVolumeRecycler` is a plugin that reclaims discarded resources.
### Goals

pmorie (Member), Dec 1, 2015

I suggest making this the first part of the TLDR section, since it's the actual TLDR content. The list of types is not the kind of thing I expect to read after reading TLDR.

In docs/design/provisioning.md:
// Args is a map of key/value pairs passed to the driver (but not plugin) as arguments
Args *map[string]string
// SecurityContextName is the name of the security context to use when performing recycling operations
SecurityContextName string

pmorie (Member), Dec 1, 2015

@markturansky

I don't understand what this represents, since security contexts are not resources and cannot be looked up by name.

pmorie (Member) commented Dec 1, 2015

I think the actual proposal would be a lot easier to review if the types were not part of this PR. IMO there's probably a fair amount of discussion left for this, and my experience has been that it's significantly easier for everyone involved if you don't try to carry API changes for an evolving API while a proposal is being discussed.

markturansky (Member) commented Dec 1, 2015

I agree, Paul. I included types in the beginning only to know that what I was writing in the proposal actually made sense and worked. I'll remove them because it is clutter.

In docs/design/provisioning.md:
## New Kinds:
`PersistentVolumeProvisioner` -- is labeled by admins and requested by users via pvc.Spec.PersistentVolumeProvisionerSelector. Some provisioners fulfill user requests by creating volumes on demand while others may seeks to match claims against existing, unbound volumes.

ncdc (Member), Dec 8, 2015

s/may seeks/may seek/

In docs/design/provisioning.md:
`PersistentVolumeProvisioner` -- is labeled by admins and requested by users via pvc.Spec.PersistentVolumeProvisionerSelector. Some provisioners fulfill user requests by creating volumes on demand while others may seeks to match claims against existing, unbound volumes.
`PersistentVolumeRecycler` -- contains pvr.Spec.PersistentVolumeSelector and recycles volumes matching the selector. PVs are labeled by admins to match.

ncdc (Member), Dec 8, 2015

s/pvr/pvc/ ?

markturansky (Member) commented Dec 9, 2015

This requires a rewrite now with the latest from the Storage SIG.

  1. I'll replace the entire config section with the impending ConfigMap feature.
  2. I'll clarify the plugin approach: some built-ins, a built-in for exec on master, and a built-in for "run and watch this pod until completion".

thockin (Member) commented Jan 18, 2016

What's the status on this PR?

markturansky (Member) commented Jan 18, 2016

I gutted the markdown doc last night and will finish the section on plugins this morning.

It contains the simplest requirements I could come up with: use ConfigMap, implement Selector, describe the plugin approach. Everything else (such as a driver model) is future work and no longer in this doc.

Also, I think we can drop StorageClass in favor of using PVSelector for everything. PVSelector matches a labeled config (which makes PVs) or matches a labeled PV directly. We've got a good use case that shows specifying one or the other is a leaky abstraction.

markturansky (Member) commented Jan 18, 2016

@saad-ali @thockin New version online. PTAL? It is smaller and contains only the immediate next steps while leaving room for future enhancements (like a driver/flex model for extension).

The discrete number of next steps is small. This is a pretty straightforward proposal now.

In docs/design/provisioning.md:
`--pv-provisioner-namespace=foobar`
### Example Configuration

childsb (Member), Jan 18, 2016

Where will the configuration live (master/which subdir)? Can you show this as JSON or YAML?

markturansky (Member), Jan 18, 2016

ConfigMaps are API objects, so they will live in etcd.

markturansky (Member), Jan 18, 2016

The example is a few lines down. It's in Go, but the data is the same. We'll have working examples for each provisioner, of course.

wattsteve (Contributor), Jan 19, 2016

I think that to concretely convey what this proposal will deliver from a user-experience standpoint, it would be really helpful to see a YAML example of how this would be configured by an admin and a YAML example of how a VolumeSelector would be used by a user.

In docs/design/provisioning.md:
An additional benefit of using a selector for everything is the ability to deprecate PersistentVolumeAccessModes. These fields on PV are descriptors only. They have no functional value other than to describe the capabilities of the volume. AccessModes on a PVC represent only a way to select certain volumes. Admins can use labels to describe a volume's capabilities (e.g., ReadWriteMany) and users would select volumes accordingly.

jeffvance (Contributor), Jan 19, 2016

I like the idea of deprecating the current selection-but-non-enforcement behavior regarding accessModes.

markturansky (Member), Jan 21, 2016

I removed this bit entirely. Saad plans on using the access modes for enforcement in the attach/detach controller.

In docs/design/provisioning.md:
The binder attempts to fulfill a claim in the following order:
* If `pvc.Spec.PersistentVolumeSelector` is non-nil and matches a provisioner's labels, create a PV using the provisioner

jeffvance (Contributor), Jan 19, 2016

Did you consider searching for existing (static) PVs that match the selector before searching for a matching provisioner? One benefit of this order could be a reduction in proliferation of dynamic storage which is potentially expensive and/or wasteful.

markturansky (Member), Jan 19, 2016

It could go either way and I have no preference.

On one hand, we could check first just in case there's a matching volume and reduce inventory. On the other hand, we could dynamically create an exact match (capacity-wise) for the claim.

erinboyd, Jan 19, 2016

@jeffvance I agree that we should search the pool first and this is the proper behavior.

In docs/design/provisioning.md:
This document proposes a model for the configuration and management of dynamically provisioned persistent volumes in Kubernetes. Familiarity [Persistent Volumes](../user-guide/persistent-volumes/) is assumed.
Administrators use ConfigMap in a namespace they control to configure many provisioners. Users request volumes via a selector and/or storage class.

wattsteve (Contributor), Jan 19, 2016

Can you clarify the use of the namespace? I'm trying to clarify whether you are suggesting we need to define a classification system (which names map to which provisioners) for every single namespace in kubernetes. I had expected that the admin defines it once and then it is configured for the entire cluster.

jsafrane (Member), Jan 19, 2016

+1, I thought the config maps will be global (similar to PV), not namespaced.

markturansky (Member), Jan 19, 2016

ConfigMap is namespaced. While it started as a means to deliver configuration to applications, it is convenient enough to use for the system, too. There will be a Kube "system" namespace. We're proposing a separate namespace exclusive for provisioners.

In docs/design/provisioning.md:
Plugin params can vary by whatever attributes are exposed by a storage provider. An AWS EBS volume, for example, can be general purpose SSD (gp2), provisioned IOPS (io1) for speed and throughput, or magnetic (standard) for low cost. Additionally, EBS volumes can optionally be encrypted. Configuring each volume type with optional encryption would require 6 distinct provisioners. Administrators can mix and match any attributes exposed by the provisioning plugin to create distinct storage classes.
## Controller behavior

wattsteve (Contributor), Jan 19, 2016

My understanding is that you're proposing one dynamic storage provisioner per cloud provider. Each of these provisioners would support parameters that allow the user to be explicit about what type of storage they want to provision from the respective cloud provider. Administrators can then configure these parameters for each volumeselector specified in the classification system for the cluster. For example:

"Gold": EBS_Provisioner (gp_ssd)
"Silver": EBS_Provisioner (iops_ssd)
"Bronze": EBS_Provisioner (magnetic)

erinboyd, Jan 19, 2016

I think we need to consider the user experience for both the ADMIN and the DEVELOPER when defining the storage and consuming it. By using the selector rather than the class (where class is unique per the API), we open up the possibility of duplicate storage descriptions, making it difficult for the DEVELOPER to determine the storage they are consuming. If we assume a mixed-use environment with both statically created and dynamically provisioned storage, the path could provide the opposite of what the DEVELOPER is expecting based on the selector. Since the selector will just select the first match, if there are duplicates, how will the user ever be able to claim against the second storage?

It also puts a lot of responsibility on the admin to deeply understand the system and makes it hard for the scaling use case where there are 100-1000 PVs created. Today we won't provide a good interface or tooling to assist the ADMIN in seeing the topology of the current pool, and with the selector not using any validation, I just don't see how we can have consistent paths for each operation.

Having both fields is confusing. Consider this on-premise installation of Kube:
> Admin Joe configures "Gold" to dynamically provision Ceph volumes. Joe also creates "Silver" volumes by manually provisioning a large number of NFS exports on expensive hardware (SSDs, highly available, stringent backup policy). Lastly, Joe creates "Bronze" NFS exports on older hardware with slower discs.

wattsteve (Contributor), Jan 19, 2016

+1. We shouldn't force users to have to understand the difference. All they should care about is what VolumeSelector names (Gold, Silver, Bronze, etc.) they have access to.

markturansky (Member), Jan 19, 2016

The requirement for one path is noted and a good one to adhere to.

@derekwaynecarr is doing quota, and it might be that a single field on PVC is easier for that effort, as opposed to parsing arbitrary labels.

It might be that we need to create a Provisioner plugin that's not part of any volume plugin. Its behavior is to pull from the pool of available volumes. Essentially, we would package the existing controller's behavior into a plugin and make manually provisioned storage explicit (use this plugin for this class) as opposed to implicit (no configmap found w/ that label set).

In docs/design/provisioning.md:
4. CephProvisioner -- creates a PV specifically for a claim and then creates the CephVolume in the infrastructure.
# Tasks

wattsteve (Contributor), Jan 19, 2016

I think we need to add a section about mounting. Provisioners are things that dynamically create storage assets. Volume plugins are things that we've historically referred to as something that mounts storage assets to containers within pods. If someone has a volume plugin that has no provisioner for it yet (such as the GlusterFS or NFS volume plugin), how does that person create a provisioner and tie the existing volume plugin to it, so that when a claim invoking the provisioner is submitted and a PV is created and bound to the claim, the system knows which volume plugin to invoke for that type of PV when the claim is later used in a Pod? None of how this works is clear in this proposal.

childsb (Member) commented Jan 19, 2016

Some design points I'd like to see in the proposal:

  • What's the provisioning workflow and locking? We discussed this at the face-to-face and I remember it being complex (... create PV, attach PV to claim, create volume, tie volume to PV?)
  • Where does each step execute (master or kubelet)?
  • What does an example provisioner look like? Is it an image, a script?

childsb (Member) commented Jan 25, 2016

I'd like to see the "template before provision" part removed from the provisioner API. The framework could handle this bootstrap process cleanly and not offload it to the provisioner API. I understand that the PV name must be generated and bound some way, but requiring the provisioner to create a template prior to provisioning is cumbersome.

childsb (Member) commented Jan 25, 2016

EDIT: This use case has been addressed in the discussion. We will remove the template as a provisioner API requirement.

Adding another use case:

  1. As a Kubernetes administrator I have multiple existing storage types that I consider storage-class basic. Some are NFS and some are GlusterFS. I want to label these as storage-class:basic and let users request storage-class:basic from the pool.

This scenario would not work if the provisioner is required to supply a partial template prior to the actual provision step.
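
For illustration, labeling an existing, manually created NFS export as storage-class basic might look like this (the server, path, and sizes are hypothetical):

```json
{
  "kind": "PersistentVolume",
  "apiVersion": "v1",
  "metadata": {
    "name": "nfs-pv-01",
    "labels": { "storage-class": "basic" }
  },
  "spec": {
    "capacity": { "storage": "5Gi" },
    "accessModes": ["ReadWriteMany"],
    "nfs": { "server": "10.0.0.5", "path": "/exports/pv01" }
  }
}
```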

markturansky (Member) commented Jan 25, 2016

@childsb The flex plugin approach you are looking into can return whatever template you like. The template is a necessary step. You can make the template as specific or as generic as you want in your proposal. The community as a whole can decide what approach to take in the proposal.

swagiaal (Contributor) commented Jan 26, 2016

> As a Kubernetes administrator I have multiple existing storage types that I consider storage-class basic. Some are NFS and some are GlusterFS. I want to label these as storage-class:basic and let users request storage-class:basic from the pool.

I don't quite follow: are these existing, or will they be provisioned? If existing, then the admin can just label them with the label storage-class:basic. If they will be provisioned, which one will the provisioner create? It cannot create both. It will have to be either a Gluster provisioner or an NFS provisioner, in which case the template volume type has been decided at coding time.

childsb (Member) commented Jan 26, 2016

@swagiaal Existing volumes.

> It will have to be either a Gluster provisioner or an NFS provisioner, in which case the template volume type has been decided at coding time.

I'm proposing we remove the template from the design for this reason. I've been looking through the code and I'm pretty sure it can be removed. Let's set up a call to discuss.

In docs/design/provisioning.md:
"data": {
"plugin-name": "kubernetes.io/gce-pd",
"parameters": "{'volumeType':'ssd', 'zone':'us-east'}" // OPAQUE JSON MAP OF PARAMS AS DEFINED BY THE PLUGIN
}

markturansky (Member), Feb 2, 2016

Add "description" for human readability.

eparis (Member) commented Feb 2, 2016

In the sig call today there were discussions as people tried to help me understand these objects. I'm still not settled....

It is a given that every single thing in the PersistentVolumeSelector and everything in Requests.Resources must be satisfied by the eventually bound PV. So static provisioning is of no interesting consequence.

I have yet to understand the idea of matching PersistentVolumeSelector to ConfigMap (w/ "type": "provisioner-config"): which fields on the PVC are required to be labels on the ConfigMap, and which items are considered to be arguments to the ConfigMap.

This proposal today says that there must be a ConfigMap which matches ALL of the fields in the PVSelector.

  • Things in Requests.Resources can generally be thought of as 'arguments' to the provisioner.
  • The PVSelector can generally be thought of as 'which' provisioner.

Thus I must have a combinatorial explosion of DPs/configMaps if I expect there to be multiple labels in the PVSelector.

On the call @saad-ali floated the idea that the match between PVSelector and ConfigMap labels need not be complete. Instead the ConfigMap could use some of those PVSelector fields as arguments to the DP, potentially overwriting values in the parameters. I'm hesitant about this idea...

We don't overwrite 'parameters' with Requests.Resources (at least I don't see that listed in the proposal), so I'm not 100% sure how that works. But I think this would mean we are talking about making the parameters a template. And every time I see we need a 'template' I go running for the hills....

markturansky (Member) commented Feb 9, 2016

I added "description" to the configmap and left a placeholder for "parameterization" of plugins to avoid the combinatorial increase in the number of provisioners by attribute count. I don't think we are blocked on that.

As for selectors, I think it stands that selectors on the claim must match the labels on the volume (first), then match labels on a provisioner that can provide a volume. In both cases the $N selectors on pvc.Spec.Selector will be found in the $V labels on a volume. The PV may have more labels. Similarly, a provisioner may have more labels, such as "kubernetes.io/type":"provisioner". This follows Pod's NodeSelector.

eparis (Member) commented Feb 10, 2016

I'm OK with this PR as it stands. I think we need to put serious thought into 'labels on selectors as parameters.' It's OK if the dynamic provisioner/configmap has 10 labels and the selector only 3. My problem comes when we start thinking about having 10 labels on the PVSelector and only 3 on the DP. And maybe, somehow, magically, 7 of those become parameters to the DP or something....

Since this PR doesn't talk about that, and requires exact matching, I think it is good as it stands...

In docs/design/provisioning.md:
# Abstract
This document proposes a model for the configuration and management of dynamically provisioned persistent volumes in Kubernetes. Familiarity [Persistent Volumes](../user-guide/persistent-volumes/) is assumed.

saad-ali (Member), Feb 16, 2016

...Familiarity with...

Review thread on docs/design/provisioning.md (outdated):
example usage:
kubectl --namespace=system-namespace get configmap -l type=provisioner-config

@saad-ali (Member), Feb 16, 2016

Shouldn't namespace be foobar not system-namespace


Review thread on docs/design/provisioning.md (outdated):
CLI flags:
--pv-provisioner-namespace=foobar
--pv-provisioner-label="type=provisioner-config

@saad-ali (Member), Feb 16, 2016

Missing closing quote


"labels": {
"type": "provisioner-config",
"storage-class": "gold",
"zone": "us-east"

@saad-ali (Member), Feb 16, 2016

Why do storage-class and zone need to be labels on the config map? The only match that will happen is on the config map name, which should be the storage class name.


@markturansky (Member), Feb 16, 2016

We're not matching by configMap.Name. We're matching labels in the claim's PVSelector with configMap.Labels. This allows us to have "gold" mean one thing, instead of requiring the user to add "gold-east" and "gold-west" to their selector. Using separate labels ("storage-class"="gold" and "zone"="east") allows that.


"persistentVolumeSelector":{
"matchLabels":{
"storage-class":"gold",
"zone":"us-east",

@saad-ali (Member), Feb 16, 2016

This does not follow the example above. This should demonstrate the case where the cluster admin decided to bake the zone into his storage classes.

So zone should not be specified, and the storage-class should be gold-east.

The example below with selectors should show the counter example where the cluster admin did not bake the zone into the storage classes, and it is configurable via label.


@markturansky (Member), Feb 16, 2016

Both are possible with the current model.

The admin could use "storage-class": "gold-east" and have that param map create a "gold" volume in the east zone.

The admin could also use "storage-class":"gold" and "zone":"east" as separate labels and the user would put both in their selector. The param map on the config is still set by the admin to be gold and east, respectively.

If we parameterize those config's params maps, the separate labels (for "zone") allow us to do that.

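A sketch of the two admin styles side by side; the ConfigMap names and `params` keys are invented for illustration, and the `data` layout follows the opaque-JSON-map convention described in this proposal:

```yaml
# Style 1: zone baked into the class. Users select with a single label.
kind: ConfigMap
apiVersion: v1
metadata:
  name: gold-east
  labels:
    type: provisioner-config
    storage-class: gold-east
data:
  provisioner: kubernetes.io/aws-ebs
  params: '{"volumeType": "io1", "zone": "us-east-1a"}'
---
# Style 2: zone kept as a separate label. Users put both "storage-class"
# and "zone" terms in their selector; the admin still fixes the params.
kind: ConfigMap
apiVersion: v1
metadata:
  name: gold-us-east
  labels:
    type: provisioner-config
    storage-class: gold
    zone: us-east
data:
  provisioner: kubernetes.io/aws-ebs
  params: '{"volumeType": "io1", "zone": "us-east-1a"}'
```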

@eparis (Member), Feb 16, 2016

@saad-ali I know on the SIG call 2 weeks ago we talked a little bit about using the LabelSelector as a form of parameterization but I'm highly skeptical of that concept. I don't think we do anything like it elsewhere in kube. I kinda feel like 'well known' parameters to the DP should live in the API of the PVC itself (like size is a resource/request) and DP specific parameters should live maybe as annotations.

If we do decide to use the LabelSelector as a mechanism for parameterization of the ConfigMap/DP, I think that should be a follow-on PR to loosen the 'must match everything' nature of this proposed Selector, and should be given special thought in and of itself...


The same claim above (gold+east) can match on a PersistentVolume that is labelled similarly.
```
apiVersion: v1

@saad-ali (Member), Feb 16, 2016

We should add in the ability for a cluster admin to decide what labels get passed through to the provisioner. This can be done by adding a whitelist in the config map of labels to pass through to the provisioner.

Therefore if an admin, for example, wants to retain control of "zone" he can exclude that from the whitelisted labels. If a user passes it in we can either fail the request or just silently not pass it through to the provisioner. In either case this is nice because the cluster admin can be guaranteed control: even if a particular provisioner exposes powerful options A, B, and C, an end user can't use them unless the cluster admin explicitly enables them.

We can discuss this further in the Storage SIG meeting in the morning.

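One possible shape for that whitelist, sketched only to make the idea concrete; the `overridableLabels` key is invented here and does not appear in the proposal:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: gold
  labels:
    type: provisioner-config
    storage-class: gold
data:
  provisioner: kubernetes.io/aws-ebs
  params: '{"volumeType": "gp2", "zone": "us-east-1a", "encrypted": "false"}'
  # Invented field: selector labels an end user is allowed to pass through
  # to the plugin, overriding the matching keys in params. An empty or
  # absent list would mean no user overrides at all (the safe default).
  overridableLabels: "zone"
```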

@markturansky (Member), Feb 16, 2016

I understand. It seems easy to accommodate.

What about "zone" and relying on the end user for the correct value "us-east-c1"?

Override is disallowed by default (empty whitelist). It's up to the admin to accept responsibility? Do we need any validation now on inputs?

How much of this is "must have" versus "good next enhancement"? I don't think we're blocked on adding this.

I am open to it if it doesn't increase scope too much.


@eparis (Member), Feb 16, 2016

This is (to my knowledge) the first introduction of security policy and controls in kube... I think such a whitelist flies in the face of 'simple and obvious first.' There is no distinction between 'admin' and 'user' when we are talking about this stuff. I can just change the ConfigMap as a 'user.'

So really it's about keeping them from being able to shoot themselves in the foot? It seems like a future topic (and not necessarily a bad one) but not something we need to get moving (especially if the LabelSelector must match all labels, as I think this currently reads, in which case all labels must be on the whitelist)


@markturansky (Member), Feb 17, 2016

Per our SIG discussion, the admin has the guaranteed fine-grained control you mention as a requirement, to the point of tedium. Parameterization and other tools to manage ConfigMaps were agreed to be non-blocking for the current proposal.


The example above contains `"params": "${OPAQUE-JSON-MAP}"`, which is a string representing a JSON map of parameters. Each provisionable volume plugin will document and publish the parameters it accepts.
Plugin params can vary by whatever attributes are exposed by a storage provider. An AWS EBS volume, for example, can be general-purpose SSD (gp2), provisioned IOPS (io1) for speed and throughput, or magnetic (standard) for low cost. Additionally, EBS volumes can optionally be encrypted. Configuring each volume type with optional encryption would require 6 distinct provisioners. Administrators can mix and match any attributes exposed by the provisioning plugin to create distinct storage classes.

@saad-ali (Member), Feb 16, 2016

Should add in that plugins will be responsible for handling labels as well, and should fail(?) if they are not able to handle all the labels passed in. And that these labels should be propagated to the provisioned PV.


@markturansky (Member), Feb 17, 2016

Added.
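To make the combinatorial point in the excerpt above concrete, here are two of the six EBS classes (three volume types, each with or without encryption) expressed as per-class `params` maps rather than six hard-coded provisioners; the parameter keys follow EBS terminology, but the exact names any plugin accepts are up to that plugin:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: fast-encrypted
  labels:
    type: provisioner-config
    storage-class: fast-encrypted
data:
  provisioner: kubernetes.io/aws-ebs
  params: '{"volumeType": "io1", "iopsPerGB": "10", "encrypted": "true"}'
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: cheap
  labels:
    type: provisioner-config
    storage-class: cheap
data:
  provisioner: kubernetes.io/aws-ebs
  params: '{"volumeType": "standard", "encrypted": "false"}'
```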

@markturansky (Member) commented Feb 16, 2016

Just pushed another revision of the document. Some of the feedback above is implemented. More detail added.


@k8s-teamcity-mesosphere commented Feb 16, 2016

TeamCity OSS :: Kubernetes Mesos :: 4 - Smoke Tests Build 15996 outcome was SUCCESS
Summary: Tests passed: 1, ignored: 225 Build time: 00:08:48


@k8s-bot commented Feb 17, 2016

Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist")

If this message is too spammy, please complain to ixdy.


@kangarlou (Contributor) commented Feb 17, 2016

After yesterday's call, I had a chance to go over the design document and the discussion in this thread. It appears that there are two sets of parameters for dynamic provisioning:

  1. Parameters to select a dynamic provisioner: These parameters are defined in pvc.Spec.PersistentVolumeSelector and need to match with the labels defined in a provisioner’s ConfigMap.
  2. Arguments for the dynamic provisioner plugin:
    • Generic arguments (e.g., volume size).
    • Plugin-specific arguments as defined by the parameters field in ConfigMap.

The current design is flexible enough to support the following two use cases:

  1. Provider-agnostic provisioning: An abstract parameter in PersistentVolumeClaim (e.g., storage-class) gets mapped to a specific configuration on the storage backend by the provisioner.
  2. Provider-aware provisioning: By using provider-specific labels (say by using provisioner-name label) in ConfigMap and pvc.Spec.PersistentVolumeSelector, one can force selecting a certain provisioner to take advantage of the full capabilities of that backend.

I believe the above capabilities are sufficient for many use cases. However, as pointed out by @eparis, they lead to a “combinatorial explosion of DPs/configMaps” for user-supplied, numeric parameters. For instance, if one needs to set the IOPS limit, snapshot frequency, number of replicas, etc. for a volume, we end up with a large number of ConfigMaps/provisioners.

I would put IOPS in the same category as volume size, as it's a generic parameter that denotes a required bound, so pvc.Spec.Resources may be the right place to set IOPS. However, I’m less clear about how we want to handle plugin-specific arguments like snapshot frequency, number of replicas, etc. Perhaps there needs to be a field in PersistentVolumeClaim to set such parameters for provider-aware provisioning. The parameters field in ConfigMap seems most suitable for setting the default configuration as defined by the storage admin, but not for setting user-supplied parameters.

The whitelist discussion seems to stem from the fact that not all labels are equal. Some labels denote a requirement and need to match across PVs, PVCs, and ConfigMaps, while others are desired and can potentially be ignored by the provisioner. Some labels may denote an absolute value (e.g., number of replicas, snapshot frequency) while some are limits (e.g., capacity, IOPS). In the absence of a mechanism to properly support user-supplied parameters, some labels may just be arguments to the provisioner, so matching across those labels may not make sense. We probably need to formalize different types of labels and the label matching criteria.

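A claim sketch separating the two sets @kangarlou describes: selector terms (set 1) route the claim to a provisioner ConfigMap, while generic, bounded values (set 2) live under resources. `persistentVolumeSelector` is the proposed field, and the values are illustrative:

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: db-data
spec:
  # Set 1: selects which provisioner ConfigMap may satisfy the claim.
  persistentVolumeSelector:
    matchLabels:
      storage-class: gold
  accessModes: ["ReadWriteOnce"]
  # Set 2: generic bounded request. Plugin-specific knobs (snapshot
  # frequency, replica count, ...) have no agreed home yet; today they
  # would sit in the ConfigMap's params map, chosen by the admin.
  resources:
    requests:
      storage: 10Gi
```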

@markturansky (Member) commented Feb 17, 2016

@kangarlou I think you eloquently highlighted the mismatch between some things that are labels and other things that are part of the request. Capacity is part of the request and other numerical values should be as well. Expressing a range of numerical values in ConfigMap would be bad.

@eparis and I have spoken a little bit about this mismatch. We can give prudent advice in our docs to keep these labels simple if we have a good way to enhance what the request for the resource looks like.


@markturansky (Member) commented Apr 19, 2016

@jsafrane @childsb is this PR still needed?


@livelace commented Apr 20, 2016

Labels are needed. The ability to select by labels is needed.


@k8s-bot commented May 16, 2016

GCE e2e build/test failed for commit e32cdf2.

Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.


k8s-merge-robot added a commit that referenced this pull request May 23, 2016

Merge pull request #25413 from pmorie/storage-proposal
Automatic merge from submit-queue

Proposal: persistent volume selector

Partially replaces #17056.  Another proposal will follow dealing with dynamic provisioning on top of storage classes.

@kubernetes/sig-storage
@eparis (Member) commented Jun 6, 2016

I'm going to close this PR, hopefully all discussion and thought have been taken into account in #26908


@eparis eparis closed this Jun 6, 2016

k8s-merge-robot added a commit that referenced this pull request Jul 14, 2016

Merge pull request #26908 from pmorie/pv-dynprov
Automatic merge from submit-queue

dynamic provisioning proposal

Proposal for dynamic provisioning using storage classes; supercedes #17056

@kubernetes/sig-storage

xingzhou pushed a commit to xingzhou/kubernetes that referenced this pull request Dec 15, 2016

Merge pull request #25413 from pmorie/storage-proposal
Automatic merge from submit-queue

Proposal: persistent volume selector

Partially replaces #17056.  Another proposal will follow dealing with dynamic provisioning on top of storage classes.

@kubernetes/sig-storage

xingzhou pushed a commit to xingzhou/kubernetes that referenced this pull request Dec 15, 2016

Merge pull request #26908 from pmorie/pv-dynprov
Automatic merge from submit-queue

dynamic provisioning proposal

Proposal for dynamic provisioning using storage classes; supercedes #17056

@kubernetes/sig-storage