Migrate PV binding from PV.Spec.ClaimRef to Binding subresource #7438
Looking through pods, it appears the binding is stored on pod.Spec.Host inside a GuaranteedUpdate helper. Data-wise, this is 1:1 with pv.Spec.ClaimRef. Is the introduction of BindingREST for pods meant to narrow who can call Bind? That would be similar to the status subresource, which prevents users, for example, from setting status. If my reading is correct, then this is not a breaking API change; it's an improvement in the client for PVs and an improvement in the transaction (the update helper). @smarterclayton Does that sound correct to you? If it does, I think this lessens the immediacy of this change, even as it remains a good and necessary next step for PV.
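For readers unfamiliar with the update helper mentioned here, a minimal sketch of the optimistic read-modify-write retry it implies follows. This is an illustration only, with simplified, hypothetical types and function names; it is not the actual Kubernetes registry code.

```go
package sketch

import "errors"

// Simplified stand-in types for illustration; not the real API objects.
type ObjectReference struct{ Namespace, Name string }

type PersistentVolume struct {
	ResourceVersion string           // what a real store would compare to detect conflicts
	ClaimRef        *ObjectReference // stands in for pv.Spec.ClaimRef
}

var errConflict = errors.New("resource version conflict")

// guaranteedUpdate mimics the read-modify-write retry loop: fetch the
// current object, apply the mutation, write it back, and retry if another
// writer won the race.
func guaranteedUpdate(
	get func() (*PersistentVolume, error),
	put func(*PersistentVolume) error,
	mutate func(*PersistentVolume),
) error {
	for attempt := 0; attempt < 5; attempt++ {
		pv, err := get()
		if err != nil {
			return err
		}
		mutate(pv)
		err = put(pv)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errConflict) {
			return err
		}
	}
	return errors.New("giving up after repeated conflicts")
}

// bindClaim records the claim on the volume inside the retry loop; binding
// is just such a single-object mutation.
func bindClaim(get func() (*PersistentVolume, error), put func(*PersistentVolume) error, claim ObjectReference) error {
	return guaranteedUpdate(get, put, func(pv *PersistentVolume) {
		pv.ClaimRef = &claim
	})
}
```

The point of the sketch is only that the bind, data-wise, is a single compare-and-swap-style update on one object, whichever object carries the reference.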
@thockin's argument is also that I have the bind on the wrong object in the paradigm. The only difference is that Node:Pod is 1:N while PV:Claim is 1:1. Querying for volumes with non-nil ClaimRefs == querying for Claims with non-nil VolumeNames. I think it's a wash, honestly. An array of pods on Node would be ugly. A single ref on either PV or PVC is equally clean, IMHO. There's an argument to be made to limit exposure and change the ref to claim.Status.VolumeName. This is an easy change.
If pod:node as claim:pv, then this is not right. The node is consumed by […] In pod-space, you assign the pod to the host. You assign the claim to the […] Now, there's a difference in that we don't offer an API to traverse from a […] As we move toward this for networking, you'll see something like a […]
I don't buy that. We allow users to choose a node. It's not the common case […]
There is no reason the pod:node / claim:pv bind has to work differently. Perhaps this nuance wasn't discussed during API review because it's 1:1, whereas 1:N for pods and nodes is a much bigger difference. I think the nuance you're defining is correct. It would be achieved exactly the same way as the current bind, just with the objects flipped, because it's 1:1. It is not an insignificant amount of rework. The proposed API changes are: […]
Other improvements: […]
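As a rough illustration of the flip under discussion (simplified stand-in types, not the real API objects or the exact proposal):

```go
package sketch

// Simplified stand-in types; not the actual Kubernetes API.
type ObjectReference struct{ Namespace, Name string }

// Today: the volume records which claim it is bound to.
type PersistentVolumeSpec struct {
	ClaimRef *ObjectReference // pv.Spec.ClaimRef
}

// Proposed: the claim records which volume it is bound to, mirroring the
// way pod.Spec.Host records a pod's node. No namespace is needed because
// volumes are cluster-scoped.
type PersistentVolumeClaimSpec struct {
	VolumeName string // claim.Spec.VolumeName
}
```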
It's not just API changes, of course, but the binder as well. The volume plugin is easier.
I am confused. This SOUNDS like what I wanted - which you were […] To be clear: the Binding is POSTed to $PVClaim/binding, referencing […]
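To make the $PVClaim/binding idea concrete, here is a hedged sketch of what such a request payload might carry, by analogy with the object the scheduler POSTs to assign a pod to a node; the path and field name below are assumptions, not a settled API.

```go
package sketch

// Binding is a hypothetical payload POSTed to the claim's binding
// subresource, e.g.
//   POST /namespaces/{ns}/persistentvolumeclaims/{claim}/binding
// naming the volume the claim should be bound to.
type Binding struct {
	// VolumeName identifies the PersistentVolume the claim should be bound
	// to. Volumes are cluster-scoped, so a bare name is enough.
	VolumeName string `json:"volumeName"`
}
```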
@thockin Yes, these are the API changes you are wanting. I understand the semantic difference between the two ways of binding. Personally, I don't think having the bind on PV is wrong. Having it on Claim is also not wrong. To me, they are the same because this is 1:1, which is very different from 1:N for node/pod. Moving it is a matter of "purity", I think, because they are technically the same. I have no problem making these changes, though. I cringe at the rework, but this won't be the first time I've had to rework something (nor the last).
I think the most important part for the short term is getting the *Ref […]
Agreed that the API changes are required first and the other improvements can happen subsequently.
I've been thinking about the impact this change will have on the binder. I think it'll be simpler, actually. I retract my previous comments about the two being technically the same.
Sorry I've been MIA on this. One concern I have with flipping the reference direction: We need a protocol for removing or recycling a PV once a PVC on it has been deleted. We don't want to immediately make it available for re-binding. Where would that information be recorded? Status isn't sufficient, since status must be observable from somewhere else. With pods, this information is in the container runtime (and on the filesystem). If we always wanted to delete the PV object upon removal of the PVC, we could record it in an existence dependency, as I proposed in #1535.
@bgrant0607 I see what you mean. With PV.Spec.ClaimRef, we know the volume was once bound. If that claim does not exist, we know the volume has been released. We can build Status from that. Conversely, if we move the bind to claim.Spec.VolumeName and the claim is deleted, we can't rebuild PV's Status should that data be lost for some reason.
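A minimal sketch of that observability argument: with the bind on the volume, the phase can be derived from two facts that are always observable. The phase names and signature below are illustrative assumptions, not the real binder code.

```go
package sketch

// derivePhase rebuilds a volume's phase from whether pv.Spec.ClaimRef is
// set and whether the referenced claim still exists.
func derivePhase(hasClaimRef, claimExists bool) string {
	switch {
	case !hasClaimRef:
		return "Available" // never bound
	case claimExists:
		return "Bound" // bound, and the claim is still present
	default:
		return "Released" // claim deleted; awaiting reclamation/recycling
	}
}
```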
If @bgrant0607 can convince @thockin that the current bind is correct, then the PV framework can be ungated. It is already merged with the bind on pv.Spec.ClaimRef.
Can we reach closure tomorrow on a direction? I would argue that claims and PVs should bond in both directions. The claim is released, and reclamation happens with the previous claim marked no longer in use. The PV is the entity that determines whether a claim is official - claims shouldn't be able to spuriously associate themselves with random PVs by guessing global names (I'm not allowed to create a claim bound to your PV unless the PV points to me).
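A sketch of the two-way check described here, under assumed simplified types: the bind only counts when the volume also points back at the claim, so a claim can't capture someone else's PV just by naming it.

```go
package sketch

type claimKey struct{ Namespace, Name string }

// bindConfirmed reports whether a claim/volume pair is genuinely bound in
// both directions: the claim names the volume, and the volume names the
// claim. A claim that merely guesses a PV's global name fails the second
// half of the check.
func bindConfirmed(claim claimKey, claimVolumeName, pvName string, pvClaimRef *claimKey) bool {
	if claimVolumeName != pvName {
		return false // the claim does not point at this volume
	}
	return pvClaimRef != nil && *pvClaimRef == claim // the volume points back at this claim
}
```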
@smarterclayton It currently works nearly exactly like you describe, except that the PVC is deleted instead of being flagged unused. The PV retains ClaimRef as the binding reference and the absence of the PVC indicates the volume was released, advancing pv.Status.Phase. In another thread @bgrant0607 (and I think @thockin) were keen to make sure ClaimRef was not deleted by the binder on release but that another process would handle reclamation and remove ClaimRef. I'm currently working on the recycler now.
This didn't get mentioned during the hangout. :-( The API churn right now is impossible to keep up with. Expect latency to be high on any API changes. We may just need to stack-rank them and resolve them in order. I'd be happy with bi-directional. We're going to need something similar in order to support sole tenancy of users' pods on nodes. That's somehow going to need to be represented on the node once a pod requiring sole tenancy is bound to it. So, bind on PVC and it flows to PV (if allowed).
A PV should have a way to "fail" a PVC that's not allowed, similar to how kubelet now can reject pods.
So we're clear: is the existing implementation correct, then? The single bind from which I can build all other objects and status is pv.Spec.ClaimRef, from which I derive pvc.Status.VolumeRef. I'm working through the recycler implementation, and the bind as described above is a natural fit for the volume's lifecycle. If this is correct, I just need to remove the gate and this feature will be enabled by default.
If we want the bind on pvc.Spec.VolumeName, then please review #7529. Either way, please let me know how to proceed and I'm happy to follow up. |
I may be on the wrong side of this battle, but I'll try anyway.
Technically, you can rediscover status by scanning all PVClaims and seeing […]
It's now trivial for a node to ask "what pods are assigned to me". It's […] If we care that it be easy to go from PV to any claim on that PV, we should […] Alternately, we could write to Spec on both PV and PVClaim, and by analogy […] I don't mean to jerk you around, Mark. We should make a call and just live with it.
@thockin I'm ready to put this to bed. A hangout Tuesday to discuss works for me. How do we best schedule it? I think some of you have more demands on your time than I do.
Except for a PVClaim that is deleted. volume.Status.ClaimRef might still be set on the PV, but that status is not observable by itself and cannot be rebuilt because there would be no corresponding claim. The volume wouldn't know it requires recycling. Perhaps this is why @smarterclayton suggested keeping the PVClaim around until after the PV is recycled. That requires another Phase for PVClaim (such as Released, with the subsequent delete of the claim happening after the volume is recycled). But adding extra phases to achieve the same effect seems like a violation of Occam's Razor.
Of course. Not sure how I let that escape. Shame. PV/PVC is […]
@thockin does that mean the current implementation w/ bind on PV.Spec.ClaimRef is correct? If so, I can submit a PR to remove the gate.
Spoke with Brian; he agreed with my feeling. Binding should be posted to […] Workable? Or did I miss something?
We will eventually need to support sole tenancy for pods. I want to follow the same model in that case: the pod has /binding, as in the non-exclusive case, and the exclusive reservation is then confirmed/recorded by writing the spec of the node. An async process can determine when the exclusive reservation is valid to remove.
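Mapped onto claims and volumes, that pattern might look like the two-step flow sketched below: post the bind request to the claim's binding subresource, then have the volume side confirm it by recording the reservation in the volume's spec. Both function names are hypothetical stand-ins, not real API calls.

```go
package sketch

// bind posts the request to the claim's binding subresource, then confirms
// the reservation on the volume, mirroring pod /binding plus a write to the
// node's (here, volume's) spec.
func bind(
	postClaimBinding func(claim, volume string) error, // POST .../persistentvolumeclaims/{claim}/binding
	confirmOnVolume func(volume, claim string) error, // record the claim in volume.Spec
	claim, volume string,
) error {
	if err := postClaimBinding(claim, volume); err != nil {
		return err
	}
	return confirmOnVolume(volume, claim)
}
```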
Brian, I think what we have here is compatible with that goal.
@bgrant0607 this is a breaking API change.
@thockin @smarterclayton
From #6105: […]
and: […]
Per @thockin's questions above...
I think the PersistentVolume should own the binding. It is 1:1 with Claims and it is more natural to query for volumes and see what's bound than to traverse all namespaces to see if a claim exists for a given volume.
Claim.Status.VolumeRef (potentially changed to VolumeName, since Namespace is unneeded) can stay where it is. That status is reconstructable by observing the bound volume.
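As a small illustration of that reconstructability claim, using throwaway map-based stand-ins rather than the real objects:

```go
package sketch

// rebuildClaimStatus repopulates each claim's observed volume name given
// only the volumes' claim references, showing that claim-side status can be
// derived entirely from the volume-side bind.
// pvClaimRefs maps a PV name to "namespace/claimName" ("" if unbound).
func rebuildClaimStatus(pvClaimRefs map[string]string) map[string]string {
	claimToVolume := make(map[string]string) // "namespace/claimName" -> PV name
	for pvName, claim := range pvClaimRefs {
		if claim != "" {
			claimToVolume[claim] = pvName
		}
	}
	return claimToVolume
}
```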
Allowing users to ask for a specific PV breaks the separation between them. We can steer users towards some volumes (through admission control that restricts some types/sizes of volumes to certain namespaces) but still not give them the exact volume.
I can add the Binding subresource similar to pod, but I need some help understanding how to use it effectively. It looks like Pod's use of Binding is still in flux (see https://github.com/GoogleCloudPlatform/kubernetes/blob/master/plugin/pkg/scheduler/factory/factory.go#L294).
Technical advice and guidance is required :)