WIP: initial provisioning rollback KEP #1703

Closed
wants to merge 1 commit

Conversation


@pohly (Contributor) commented Apr 20, 2020

Currently, pods can get stuck when they use multiple volumes with late binding. The case with a single volume is handled by rescheduling the pod, but that approach can fail when some volumes have already been provisioned and provisioning of the remaining ones then fails.

The proposal is to roll back the provisioning in that case to unblock rescheduling.
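For illustration, a minimal sketch of the scenario under discussion; all names, the storage class, and the driver below are made up, and both claims use late binding (WaitForFirstConsumer). If `data-a` gets provisioned for some node but `data-b` then cannot be provisioned there, the pod cannot simply be rescheduled elsewhere without undoing the first volume.

```yaml
# Hypothetical setup: one pod, two late-binding volumes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: late-binding-example          # made-up name
provisioner: csi.example.com          # made-up CSI driver
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-a
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: late-binding-example
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-b
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: late-binding-example
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    volumeMounts:
    - { name: a, mountPath: /data-a }
    - { name: b, mountPath: /data-b }
  volumes:
  - name: a
    persistentVolumeClaim: { claimName: data-a }
  - name: b
    persistentVolumeClaim: { claimName: data-b }
```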

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 20, 2020
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pohly
To complete the pull request process, please assign msau42
You can assign the PR to them by writing /assign @msau42 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/storage Categorizes an issue or PR as relevant to SIG Storage. labels Apr 20, 2020
picking some other node, the library can treat unused volumes as "not
provisioned yet" and set a new selected node for them.

The external-provisioner then needs to de-provision those PVCs which are:
Member:

How do we signal to external-provisioner to delete these PVs? Normally it's signaled when a user deletes a PVC object, but I don't think we can delete the PVC object here.

Author (@pohly):

When the tentatively provisioned volume is no longer accessible from the currently selected node.
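A sketch of how that situation looks in API objects, with made-up names, driver, and topology key: the tentatively provisioned PV is only reachable from one zone, while after rescheduling the claim's selected-node annotation points at a node outside that zone, so the volume no longer meets the requirements and would have to be de-provisioned.

```yaml
# Hypothetical illustration (driver, names, and topology key are made up).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-0a1b2c3d
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteOnce"]
  csi:
    driver: csi.example.com
    volumeHandle: vol-0a1b2c3d
  nodeAffinity:                        # accessible topology of the volume
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.example.com/zone
          operator: In
          values: ["zone-a"]           # only reachable from zone-a
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-b
  annotations:
    # written during late binding; after rescheduling it points at a node
    # in a different zone than the one the volume was provisioned for
    volume.kubernetes.io/selected-node: worker-zone-b-1
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: late-binding-example
  resources:
    requests:
      storage: 100Gi
```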


## Proposal

The proposal is to mark provisioned PVCs as "unused" via a label until
Member:

Who sets this label to "used"? How do we ensure there aren't races between the time we check the label and when we go to delete the PV?

Author (@pohly):

Perhaps we can solve this entirely in the external-provisioner: in addition to the "selected node" label that it already gets, it also needs to know which pod(s) triggered the provisioning so that it can group several PVCs together.

Then we need an intermediate phase in which external-provisioner creates PVs but does not yet bind them to the PVCs. That's the "unused" state. Once all volumes have been created, external-provisioner finishes the binding.

But it will still be tricky to update several objects together atomically 😢

Author (@pohly):

Scratch that: there is more than one external-provisioner involved, so this is the wrong place.

A better place is the volume scheduling code. That's where kube-scheduler checks whether a pod can move ahead; we can block that until all volumes have been provisioned. Before giving its okay, that code can also remove the "unused" flag.


The proposal is to mark provisioned PVCs as "unused" via a label until
a pod actually starts to use them. Another label stores the node that
the PVC was provisioned for.
Member:

We already have "selected-node" on the PVC.

Author (@pohly):

Yes, but that's the label that describes the desired state, and when we reschedule, that will change.

My intention was to add another label "provisioned-node" and then compare that against "selected-node" to detect when a volume no longer meets the requirements. But that may be too strict. A better solution would be to use the accessible topology of the volume to determine whether it needs to be recreated: if the currently selected node has no access to the volume, then (and only then) do we delete it and try again.
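A minimal sketch of that comparison; the "provisioned-node" label key below is purely hypothetical and not an existing API, while the selected-node annotation is the one already written for late binding:

```yaml
# Hypothetical: provisioned for node-1, but the desired state now says node-2.
# A mismatch like this (or, better, a failed accessible-topology check as
# described above) would trigger rollback of the volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-b
  annotations:
    volume.kubernetes.io/selected-node: node-2               # desired state after rescheduling
  labels:
    provisioning-rollback.example.com/provisioned-node: node-1   # hypothetical new label
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: late-binding-example
  resources:
    requests:
      storage: 100Gi
```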

- Only CSI drivers using an up-to-date external-provisioner will be
supported.

## Proposal
Member:

I would prefer having some stronger guarantee on provisioning and rolling back a group of volumes atomically to avoid races that could result in deletion of a volume in use.

So something that can prevent a Pod from using a volume before all the volumes are successfully provisioned. Maybe something like taints can help us here.

Author (@pohly):

> I would prefer having some stronger guarantee on provisioning and rolling back a group of volumes atomically

As multiple different storage systems are involved, it will have to be Kubernetes which treats the set of volumes as something that is kept unused until all volumes are ready.

> avoid races that could result in deletion of a volume in use

I don't think the current proposal would have led to deleting a volume that is in use, because once it is used, it won't be deleted anymore. But another pod starting to use one of the volumes is a potential issue, because that can then prevent rescheduling of the original pod that triggered the provisioning.

> So something that can prevent a Pod from using a volume before all the volumes are successfully provisioned. Maybe something like taints can help us here.

Yes, something like that or another label.
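To make the "keep the group unused until everything is provisioned" idea a bit more concrete, here is a sketch with entirely hypothetical label keys; whichever component knows that all volumes of the pod are ready (for example the volume scheduling code mentioned earlier) would be the one to clear the marker:

```yaml
# Hypothetical marker scheme, not an existing API: every PVC that belongs to
# the same pod carries the same group id plus an "unused" marker. No pod would
# be allowed to use the volume while the marker is present, and the marker is
# cleared only once the whole group has been provisioned.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-a
  labels:
    provisioning-rollback.example.com/group: app-7f3c    # hypothetical group id
    provisioning-rollback.example.com/unused: "true"     # hypothetical marker
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: late-binding-example
  resources:
    requests:
      storage: 100Gi
```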

@cofyc (Member), May 28, 2020:

> I would prefer having some stronger guarantee on provisioning and rolling back a group of volumes atomically
>
> As multiple different storage systems are involved, it will have to be Kubernetes which treats the set of volumes as something that is kept unused until all volumes are ready.

As for static binding, we need to reserve multiple volumes for later use too (kubernetes/kubernetes#73263). It's a special case in which the volumes are already provisioned and don't need provisioning rollback.

@saad-ali (Member)

Meeting Notes

  • As soon as the scheduler makes a decision, the AvailableCapacity per StorageClass reported in the StoragePool object will be incorrect; how do you account for that?
    • The scheduler will just retry if scheduling fails.
  • Even if the scheduler retries, won't it be stuck if the first volume provisioning decision it makes is inefficient?
    • TODO - Yes, this is still an issue, the scheduler would be stuck -- need to think of a solution to this.
  • Strawman Proposal:
    1. Reserve space by actually creating the volume.
    2. Don't make a volume available for use until provisioning of all volumes is complete.
    3. Delete and recreate a previously created volume, if needed, for more efficient scheduling.
  • How common is it to have multiple volumes from different storage systems?
    • Pretty common for Local Volumes -- disks from different storage classes (e.g. SSD or HDD)
  • Challenges?
    • Swapping PVCs may be easy, but rollback may be challenging.
  • Next steps:
    • Ensure we have a complete flow we are fairly confident will work.

@cofyc (Member) commented May 28, 2020

Do we try to roll back the PV/PVC binding, or bind multiple PV/PVC objects together? That would solve this problem too. If possible, we can solve the static binding and dynamic provisioning problems together.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 26, 2020
@pohly commented Aug 26, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 26, 2020
@kikisdeliveryservice (Member) left a comment:

I understand this is a WIP but wanted to note it is missing a kep.yaml

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 26, 2020
@pohly commented Jan 2, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 2, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 2, 2021
@pohly commented Apr 2, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 2, 2021

## Alternatives

Instead of allocating volumes, space could be merely reserved in


In Bare-metal CSI we decided to go with the reservation approach.

@k8s-triage-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 28, 2021
@pohly commented Aug 2, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 2, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 31, 2021
@pohly commented Nov 2, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 2, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 31, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 2, 2022
@pohly commented Mar 2, 2022

/remove-lifecycle stale

@alculquicondor (Member)

Is this being actively reviewed? How does it play with GA graduation?

@pohly commented Mar 4, 2022

This is a problem for all volumes with "wait for first consumer" provisioning mode, whether they use storage capacity tracking or not.

Interest in solving this seems to be low; therefore, this KEP has not progressed beyond the initial "we could solve it like this" phase. That's probably because it doesn't really affect many users.

@mishoyama

> This is a problem for all volumes with "wait for first consumer" provisioning mode, whether they use storage capacity tracking or not.
>
> Interest in solving this seems to be low; therefore, this KEP has not progressed beyond the initial "we could solve it like this" phase. That's probably because it doesn't really affect many users.

I'm very interested in having this issue addressed.

@pohly commented Mar 5, 2022

@mishoyama: Is that because you have pods with multiple volumes and these volumes are dynamically provisioned with "wait for first consumer"? Can you describe your use case a bit? For example, how long are pods running?

@mishoyama

> @mishoyama: Is that because you have pods with multiple volumes and these volumes are dynamically provisioned with "wait for first consumer"? Can you describe your use case a bit? For example, how long are pods running?

@pohly yes, we have multiple volumes per pod with "wait for first consumer" binding mode. This is a storage system, and pod lifetime is not limited.

@pohly commented Mar 14, 2022

What's the advantage of having multiple volumes per pod instead of just one? Do they come from different storage providers?

Are you perhaps interested in working on this? The SIG Storage community meeting would be a good place to start discussing who could work on this and when.

@mishoyama

> What's the advantage of having multiple volumes per pod instead of just one? Do they come from different storage providers?
>
> Are you perhaps interested in working on this? The SIG Storage community meeting would be a good place to start discussing who could work on this and when.

We might have up to 100 disks attached to the physical node. Handling IO requests in a single pod allows us to manage the related resources more effectively.

At the moment we use a scheduler extender to solve the storage capacity problem, but I'm still interested in working on this.

@pohly commented Apr 7, 2022

/close

I don't have time to work on this anytime soon. If someone wants to pick it up, please reach out to SIG Storage.

@k8s-ci-robot

@pohly: Closed this PR.

In response to this:

> /close
>
> I don't have time to work on this anytime soon. If someone wants to pick it up, please reach out to SIG Storage.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
10 participants