Stale BDC gets created if PVC is deleted before it is bound to PV. #2989

Closed
kmova opened this issue Apr 6, 2020 · 16 comments

Comments

@kmova
Member

kmova commented Apr 6, 2020

Local PV (device) provisioner leaves a stale BDC in the following case:

  • On a PVC request, the provisioner creates a BDC.
  • If the system doesn't have a matching BD, the provisioner keeps retrying the PVC request, and the NDM operator keeps retrying the BDC.
  • If the user now deletes the PVC, the Local PV (device) provisioner never gets a trigger to process a delete, because no PV was ever created.

As a result, the stale BDC stays in the system and NDM keeps trying to process it.
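The failure mode above is essentially a missing reconciliation step: a BDC is stale once the PVC that caused it no longer exists. The Go sketch below finds such BDCs in plain in-memory data. It assumes, based only on the example claim name `bdc-pvc-4c887f9e-0a02-4cac-abf9-9f485ba8c39e` in this issue, that claims are named `bdc-pvc-<PVC UID>`; `staleBDCs` and its inputs are hypothetical and not part of the OpenEBS code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// bdcPrefix is an ASSUMPTION inferred from the example BDC name in this issue:
// the claim for a PVC with UID <uid> appears to be named "bdc-pvc-<uid>".
const bdcPrefix = "bdc-pvc-"

// staleBDCs returns, sorted, the names of BDCs whose owning PVC (identified by
// the UID embedded in the BDC name) is not in livePVCUIDs. Names that do not
// follow the assumed pattern are skipped.
func staleBDCs(bdcNames []string, livePVCUIDs map[string]bool) []string {
	var stale []string
	for _, name := range bdcNames {
		if !strings.HasPrefix(name, bdcPrefix) {
			continue // assumed not to be created by the Local PV device provisioner
		}
		uid := strings.TrimPrefix(name, bdcPrefix)
		if !livePVCUIDs[uid] {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	bdcs := []string{
		"bdc-pvc-4c887f9e-0a02-4cac-abf9-9f485ba8c39e", // its PVC was deleted
		"bdc-pvc-1111",     // its PVC still exists
		"some-other-claim", // ignored: does not match the assumed pattern
	}
	live := map[string]bool{"1111": true}
	fmt.Println(staleBDCs(bdcs, live))
}
```

A real fix would obtain the BDC and PVC lists from the API server rather than from literals; the sketch only shows the comparison.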

To recover, the BDC has to be edited manually to remove its finalizer (local.openebs.io/finalizer, visible in the output below) and then deleted.

Name:         bdc-pvc-4c887f9e-0a02-4cac-abf9-9f485ba8c39e
Namespace:    openebs
Labels:       <none>
Annotations:  <none>
API Version:  openebs.io/v1alpha1
Kind:         BlockDeviceClaim
Metadata:
  Creation Timestamp:  2020-04-06T13:34:34Z
  Finalizers:
    local.openebs.io/finalizer
  Generation:        2
  Resource Version:  104120
  Self Link:         /apis/openebs.io/v1alpha1/namespaces/openebs/blockdeviceclaims/bdc-pvc-4c887f9e-0a02-4cac-abf9-9f485ba8c39e
  UID:               7f2c0d11-3206-4f8a-8382-28da5897666c
Spec:
  Block Device Node Attributes:
    Host Name:  gke-kmova-helm-default-pool-61dc2adc-1ppx
  Device Claim Details:
    Block Volume Mode:  Block
  Device Type:          
  Host Name:            
  Resources:
    Requests:
      Storage:  2Gi
Status:
  Phase:  Pending
Events:   <none>
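The manual recovery works because Kubernetes only removes an object once a deletion has been requested and its finalizer list is empty; either order of "edit" and "delete" ends with the object removed. The Go sketch below models just that rule. The `claim` type and helpers are hypothetical simplifications, not the OpenEBS or client-go types; only the finalizer string is taken from the output above.

```go
package main

import "fmt"

// localFinalizer is the finalizer shown on the stuck BDC above.
const localFinalizer = "local.openebs.io/finalizer"

// claim is a HYPOTHETICAL, stripped-down stand-in for a BlockDeviceClaim.
type claim struct {
	Name       string
	Finalizers []string
	Deleting   bool // models a non-nil deletionTimestamp
}

// removeFinalizer returns a copy of fins without any occurrence of f.
func removeFinalizer(fins []string, f string) []string {
	out := make([]string, 0, len(fins))
	for _, x := range fins {
		if x != f {
			out = append(out, x)
		}
	}
	return out
}

// gone reports whether the object would actually be removed: a deletion was
// requested and no finalizers remain.
func gone(c claim) bool {
	return c.Deleting && len(c.Finalizers) == 0
}

func main() {
	c := claim{
		Name:       "bdc-pvc-4c887f9e-0a02-4cac-abf9-9f485ba8c39e",
		Finalizers: []string{localFinalizer},
	}
	c.Deleting = true    // a plain delete request...
	fmt.Println(gone(c)) // false: ...is blocked by the finalizer
	c.Finalizers = removeFinalizer(c.Finalizers, localFinalizer) // the manual edit
	fmt.Println(gone(c)) // true
}
```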
@github-actions

github-actions bot commented Jul 6, 2020

Issues go stale after 90d of inactivity.

@Icedroid

Icedroid commented Aug 5, 2020

Has this bug been fixed?

@kmova kmova reopened this Aug 5, 2020
@kmova
Member Author

kmova commented Aug 5, 2020

Not yet, but I think I know how to get this fixed now. Looking for some help with contributions.

@supra08

supra08 commented Sep 3, 2020

Hi @kmova, is this still open? I would like to give it a try.

@kmova
Member Author

kmova commented Sep 3, 2020

Thanks @supra08. Yes, this is still open. You can join our contributor channel #openebs-dev on the Kubernetes Slack, and we can help you out with more details.

@supra08

supra08 commented Sep 3, 2020

Sure, thanks!

@maxisam

maxisam commented Dec 2, 2020

Well, I removed the finalizer and the BDC is gone now, but the BD is still in Claimed status.

@kmova kmova added the Community Community Reported Issue label Dec 2, 2020
@kmova
Member Author

kmova commented Dec 2, 2020

@maxisam - Is that BD associated with the BDC that was just deleted? If so, as a workaround, you could delete the BD itself and restart the NDM pod on that node; the BD will then be recreated.

Also, can you describe the steps that led to this state?

@maxisam

maxisam commented Dec 2, 2020

The whole thing came from a misunderstanding. I thought one BD could serve more than one BDC, so I created multiple pods requesting storage. That is how things started.

I realized most of the PVs and PVCs couldn't bind properly, so I decided to remove them, and that part worked.

Both the bound and unbound ones were removed. I removed the deployment as well.

After that, the BDCs were still there, and one of them was bound to a BD.

I removed the finalizers from the BDCs and all of them were gone, but the BDs were still there, and one of them was Claimed.
Uninstalling the openebs chart made no difference to the BDs.

Finally, I just deleted the BDs, and most of them are gone now, but the one that was claimed is still there.

I then removed its finalizer, and the last BD is gone.

@kmova
Member Author

kmova commented Dec 2, 2020

As part of this fix, the following case also needs to be solved:

  • Bind a BD to a BDC
  • Delete the BDC after manually removing its finalizers
  • Check whether the BD still points to the deleted BDC
  • Delete the BD and make sure it actually gets deleted
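The third check in this test case amounts to finding claimed BDs whose claim reference names a BDC that no longer exists. The Go sketch below expresses that check over plain data. The `blockDevice` type, its `State`/`ClaimRef` fields and the `"Claimed"` string are hypothetical simplifications chosen for illustration, not the NDM API types.

```go
package main

import (
	"fmt"
	"sort"
)

// blockDevice is a HYPOTHETICAL, minimal model of a BD: its name, claim state,
// and the name of the BDC it believes it is bound to.
type blockDevice struct {
	Name     string
	State    string // assumed values such as "Claimed" or "Unclaimed"
	ClaimRef string // BDC name; empty if unclaimed
}

// orphanedBDs returns, sorted, the claimed BDs whose ClaimRef names a BDC that
// is not in liveBDCs, i.e. what is left behind after a BDC's finalizer is
// removed by hand.
func orphanedBDs(bds []blockDevice, liveBDCs map[string]bool) []string {
	var out []string
	for _, bd := range bds {
		if bd.State == "Claimed" && bd.ClaimRef != "" && !liveBDCs[bd.ClaimRef] {
			out = append(out, bd.Name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	bds := []blockDevice{
		{Name: "bd-a", State: "Claimed", ClaimRef: "bdc-deleted"},
		{Name: "bd-b", State: "Claimed", ClaimRef: "bdc-live"},
		{Name: "bd-c", State: "Unclaimed"},
	}
	fmt.Println(orphanedBDs(bds, map[string]bool{"bdc-live": true}))
}
```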

@github-actions

github-actions bot commented Mar 3, 2021

Issues go stale after 90d of inactivity.

@niladrih niladrih self-assigned this Apr 20, 2021
@kmova kmova moved this from To Do to In Progress in Dynamic Local PV Apr 20, 2021
@kmova kmova reopened this Apr 20, 2021
@kmova kmova added this to Pre-commits and Designs - Due: Mar 31 2021 in 2.9 Release Tracker - Due May 15th. Apr 20, 2021
@kmova kmova moved this from Near term goals to In progress in Dynamic Local PV Apr 20, 2021
@kmova kmova moved this from Pre-commits and Designs - Due: Mar 31 2021 to To do in 2.9 Release Tracker - Due May 15th. Apr 27, 2021
@mynktl mynktl moved this from To do to Pre-commits and Designs - Due: May 31 2021 in 2.10 Release Tracker - Due June 15th. May 18, 2021
@Nivedita-coder

Hi @kmova, is this still open? I would like to work on this issue.

@niladrih
Member

niladrih commented May 21, 2021

@kmova -- Proposing a solution to this:

  1. Error while trying to find a BD bound to a BDC (link)
  2. Delete BDC
  3. Return the error up to the Provision method.
  4. Control loop/sync handler invokes Provision again (from external-provisioner)
  5. Retry with a new BDC

@niladrih
Member

Alternatively, please also consider this solution:

  1. Error while trying to find a BD bound to a BDC (link)
  2. Run a PVC watcher sidecar container
  3. Return the error up to the Provision method once the sidecar's probe reports a healthy status.
  4. Watch for PVC delete
  5. Delete BDC on PVC delete event
  6. Exit sidecar
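Steps 4 to 6 of this alternative describe a short-lived watcher: consume PVC events until the matching delete arrives, remove the BDC, then exit. The Go sketch below models that loop with a channel standing in for a Kubernetes watch. `pvcEvent`, `bdcStore` and `watchAndCleanup` are hypothetical, and the `bdc-pvc-<PVC UID>` naming is an assumption taken from the example claim name in this issue.

```go
package main

import (
	"fmt"
	"sync"
)

// pvcEvent is a HYPOTHETICAL, simplified watch event.
type pvcEvent struct {
	Type string // e.g. "MODIFIED", "DELETED"
	UID  string
}

// bdcStore is a minimal thread-safe set of BDC names.
type bdcStore struct {
	mu    sync.Mutex
	names map[string]bool
}

func (s *bdcStore) delete(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.names, name)
}

func (s *bdcStore) has(name string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.names[name]
}

// watchAndCleanup waits for the DELETED event of the given PVC (step 4),
// deletes its BDC (step 5) and returns true, i.e. the watcher exits (step 6).
// It returns false if the event stream closes first.
func watchAndCleanup(events <-chan pvcEvent, pvcUID string, store *bdcStore) bool {
	for ev := range events {
		if ev.Type == "DELETED" && ev.UID == pvcUID {
			store.delete("bdc-pvc-" + pvcUID) // assumed naming scheme
			return true
		}
	}
	return false
}

func main() {
	store := &bdcStore{names: map[string]bool{"bdc-pvc-abc": true}}
	events := make(chan pvcEvent)

	var wg sync.WaitGroup
	var cleaned bool
	wg.Add(1)
	go func() {
		defer wg.Done()
		cleaned = watchAndCleanup(events, "abc", store)
	}()

	events <- pvcEvent{Type: "MODIFIED", UID: "abc"} // ignored
	events <- pvcEvent{Type: "DELETED", UID: "abc"}  // triggers cleanup
	close(events)
	wg.Wait()
	fmt.Println(cleaned, store.has("bdc-pvc-abc"))
}
```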

@niladrih
Member

Proposal (as discussed with @kmova):

  1. Error while trying to find a BD bound to a BDC (link)
  2. Increment a counter of the number of retries
  3. Check the BDC age
  4. Continue to retry up to DefaultFailedProvisionThreshold times.
  5. Check the number of retries and the BDC age. If either reaches its threshold, delete the BDC.
  6. The PV provisioning request is dropped from the workqueue.

@mynktl mynktl moved this from Pre-commits and Designs - Due: May 31 2021 to Pushed to Next release due to WIP in 2.10 Release Tracker - Due June 15th. Jun 1, 2021
@github-actions

Issues go stale after 90d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.

@github-actions github-actions bot closed this as completed Feb 3, 2022
2.11 Release Tracker - Due July 15th. automation moved this from To do to Done Feb 3, 2022