Kubernetes deleted volume unexpectedly while leaving PV/PVC state unchanged #44372
cc @kubernetes/sig-storage-bugs
Can you please post complete controller-manager and apiserver logs between 23:02:08.631235 (when the volume is created) and 23:04:54.685169 (when it is deleted)?
@jsafrane sorry, we have torn down the cluster. But I tried to paste everything related to that PV/PVC/volume; the rest is irrelevant. For the API server, the same goroutine panic happened 4–5 times, and the one I pasted was the last one. For the controller manager, that's really everything I have, as I copied and pasted the lines one by one here, and anything in between was related to other controllers. Anything suspicious but missing?
I want to see the reason why a PV is deleted. There is a very slim possibility that the API server crashed after it stored a new PV object but before sending a response to the controller confirming that the object was written. The controller will then try to delete the volume, and it won't try to delete the PV object from Kubernetes because, from the controller's point of view, it was never successfully written. There should be an "Error creating provisioned PV object for claim %s: %v. Deleting the volume" message in the controller-manager logs at level 3, and the same error should also be visible in the PVC's events.
@jsafrane thanks for the detailed explanation. It might be related to our setup. By the way, is this slim possibility fixed or improved in a later version? Any pull request or bug I can refer to?
I created patch #44719, which fixes my suspicion from my previous comment, but it's hard to guess whether it really fixes the bug you are reporting, because I don't know if it's the same bug. Please collect the logs next time.
/sig storage |
Having a similar issue. Not sure if it is triggered by an apiserver panic, but is causing my PVs to get deleted, causing PVC status of 'Lost'. Most are prevented since they're in use, but unused ones are deleted about once every two weeks.
@Calpicow, this looks like a different bug. The log snippet shows that a PVC was deleted by something, and Kubernetes tries to delete the PV as a consequence. It does not show a PVC getting Lost because something has deleted a PV. Please create a new issue and post there a longer controller-manager log snippet covering the point where a PVC gets Lost, plus a few minutes before it.
Before I create an issue... the PVC is definitely not getting deleted. Below is the result when the PV disappears:
Automatic merge from submit-queue (batch tested with PRs 44719, 48454)

Fix handling of API server errors when saving provisioned PVs.

When the API server crashes *after* saving a provisioned PV but before sending 200 OK, the controller tries to save the PV again. In this case it gets an AlreadyExists error, which should be interpreted as success, not as an error. In particular, the volume that corresponds to the PV should not be deleted in the underlying storage.

Fixes #44372

```release-note
NONE
```

@kubernetes/sig-storage-pr-reviews
Is this a request for help? NO
What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):
Kubernetes volume deleted unexpectedly. Kubernetes volume deleted after apiserver panic
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT
Kubernetes version (use `kubectl version`):
Environment:
- Kernel (e.g. `uname -a`): Linux ip-10-128-0-9 4.4.41-k8s #1 SMP Mon Jan 9 15:34:39 UTC 2017 x86_64 GNU/Linux
What happened:
The Kubernetes controller manager deleted an AWS EBS volume unintentionally (no one issued a DELETE API call) while leaving the PV/PVC state unchanged. Before the volume was deleted, the API server had a couple of goroutines crash due to timeouts.
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Cannot reproduce it reliably.
Anything else we need to know:
Here are some details:
PV status (After volume is deleted)
PVC status (after volume is deleted):
kube-controller-manager log about this volume (as well as pv, pvc)
kube-apiserver panics