Flaky test: DeleteVolume fails due to device busy #96

Closed
msau42 opened this issue Sep 7, 2019 · 7 comments
Labels
kind/failing-test: Categorizes issue or PR as related to a consistently or frequently failing test.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@msau42
Collaborator

msau42 commented Sep 7, 2019

https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/kubernetes-csi_csi-driver-host-path/95/pull-kubernetes-csi-csi-driver-host-path-1-13-on-kubernetes-1-13/1170365877112016899

Test fails:

PersistentVolume pvc-b5383c0a-d189-11e9-a013-0242ac110002 still exists within 5m0s

Provisioner logs:

I0907 16:09:21.681142       1 controller.go:192] GRPC error: rpc error: code = Internal desc = failed to delete volume b5755daf-d189-11e9-93ab-2ab9e465132f: unlinkat /csi-data-dir/b5755daf-d189-11e9-93ab-2ab9e465132f: device or resource busy
E0907 16:09:21.681197       1 controller.go:1120] delete "pvc-b5383c0a-d189-11e9-a013-0242ac110002": volume deletion failed: rpc error: code = Internal desc = failed to delete volume b5755daf-d189-11e9-93ab-2ab9e465132f: unlinkat /csi-data-dir/b5755daf-d189-11e9-93ab-2ab9e465132f: device or resource busy

Hostpath driver logs show stage, publish, and unstage being called, but not unpublish. Unfortunately, we don't have the k8s system logs to see what's going on in kubelet.

@msau42
Collaborator Author

msau42 commented Sep 7, 2019

/kind failing-test

k8s-ci-robot added the kind/failing-test label Sep 7, 2019

@msau42
Collaborator Author

msau42 commented Sep 7, 2019

@kubernetes-csi/csi-misc

@msau42
Collaborator Author

msau42 commented Sep 7, 2019

@msau42
Collaborator Author

msau42 commented Sep 7, 2019

OK, I think it's because the 1.13 logic checked whether the directory was a mount point before calling unpublish: https://github.com/kubernetes/kubernetes/blob/release-1.13/pkg/volume/csi/csi_mounter.go#L299

I think that check won't work for hostpath, since a hostpath volume is just a bind mount of a directory on the root fs. We removed that logic in 1.14 when implementing ephemeral volumes.

I don't really understand why this isn't failing all the time though (instead of just being flaky).
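
To illustrate why a mount-point check can miss a hostpath volume, here is a minimal Python sketch of the common device-number heuristic, which compares a directory's device with its parent's. This is an assumption about the kind of check involved, not the Kubernetes 1.13 code; `likely_not_mount_point`, `tear_down`, and the `unpublish` callback are hypothetical names. A bind mount of a directory on the same filesystem can keep the parent's device number, so the heuristic reports "not a mount point" and unpublish is skipped.

```python
import os
import tempfile


def likely_not_mount_point(path: str) -> bool:
    """Heuristic: True when `path` is on the same device as its parent.

    A bind mount whose source lives on the same filesystem as the
    target's parent has the same device number, so this returns True
    for it even though it is mounted.
    """
    path = os.path.abspath(path)
    parent = os.path.dirname(path)
    return os.stat(path).st_dev == os.stat(parent).st_dev


def tear_down(path: str, unpublish) -> bool:
    """Hypothetical teardown that skips unpublish when the check says "not mounted".

    Returns True if unpublish was called, False if it was skipped.
    """
    if likely_not_mount_point(path):
        return False  # unpublish skipped, the volume stays in use
    unpublish(path)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "target")
        os.mkdir(target)
        calls = []
        called = tear_down(target, calls.append)
        print("unpublish called:", called)
```

A plain subdirectory stands in for the bind-mounted target here, since creating a real bind mount needs root. It shares its parent's device number, so the sketch skips unpublish for it.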

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Dec 6, 2019

@msau42
Collaborator Author

msau42 commented Dec 6, 2019

/close
We no longer support 1.13, and the 1.14 tests are looking stable.

@k8s-ci-robot
Contributor

@msau42: Closing this issue.

In response to this:

/close
We no longer support 1.13, and the 1.14 tests are looking stable.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
