-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIx storagecluster deletion is stuck forever #1819
FIx storagecluster deletion is stuck forever #1819
Conversation
Currently during deletion of a storagecluster, the storageclasses are being deleted before the cephcluster. The PVs are not getting deleted as they need the storageclasses for that. And the PVs in turn are blocking the deletion of PVCs, Cephblockpool/cephfilesystem and finally the cephcluster. This PR moves the deletions of storageclasses to after the cephcluster deletion. Signed-off-by: Malay Kumar Parida <mparida@redhat.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: malayparida2000 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold I'd prefer we add a check for each storageclass deletion to check for existence of PV objects of respective driver and error out with the pv/pvc name so user is aware what to delete. |
Is there any particular reason why we can't alter the deletion order? /cc @travisn , Can you also please take a look |
There can be instances when PVC creation starts after cephcluster deletion step is completed. This will leave stale resources of PVC/PV pair with no backend cephcluster and corresponding SC when storageclass deletion step is completed. |
When the CephCluster CR is deleted, Rook will check for the existence of the PVs, so it won't continue with the deletion until those PVs are cleaned up. I believe the new ordering will work as we need it to. |
Simplest is the following.
The error thrown by ocs-op will very clearly state why deletion is blocked and what the user has to do. |
But if we delete the StorageClass before the cluster is deleted, users won't be able to create new PVCs, and that can prevent the race condition you mention. If we wait to delete them until after, PVCs may or may not fail to delete, and they may be creating while the cluster is deleting. Or is there something else going on as well? |
I think the comment in the BZ here explains the needed approach. I don't actually think there is an ordering problem, the ordering looks correct before this PR. The problem must be in properly waiting during deletion before continuing. As in that comment, we need to first delete the blockpool, objectstore, and filesystem CRs, and then wait for them to be deleted. Rook will not allow them to be deleted until all their PVs and buckets are deleted. Then the OCS operator can continue with the deletion of the storage classes and CephCluster CR, which should no longer have any PVs to wait on. As long as the block/object/file CRs are first deleted, we don't need to worry about a race condition of PVs being created before the storage class and CephCluster CR is deleted. If someone tries to create a PVC, the PV creation will fail because the block/object/file CRs are deleted. |
@malayparida2000: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Resolves-https://bugzilla.redhat.com/show_bug.cgi?id=2060897
Currently, during deletion of a storagecluster, the storageclasses are being deleted before the cephcluster. The PVs are not getting deleted as they need the storageclasses for that. And the PVs in turn are blocking the deletion of PVCs, Cephblockpool/cephfilesystem and finally the cephcluster. This PR moves the deletions of storageclasses to after the cephcluster deletion.
Signed-off-by: Malay Kumar Parida mparida@redhat.com