Flawed test: Pod Disks detach in a disrupted environment [Slow] [Disruptive] when node is deleted #85972

Open
msau42 opened this issue Dec 5, 2019 · 6 comments
Labels
kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@msau42
Member

msau42 commented Dec 5, 2019

Which jobs are failing:

Which test(s) are failing:
Pod Disks detach in a disrupted environment [Slow] [Disruptive] when node is deleted

Since when has it been failing:
12/4

Testgrid link:
https://k8s-testgrid.appspot.com/sig-storage-kubernetes#gce-serial

Reason for failure:

Anything else we need to know:

@msau42 msau42 added the kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. label Dec 5, 2019
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Dec 5, 2019
@msau42
Member Author

msau42 commented Dec 5, 2019

@kubernetes/sig-storage-test-failures

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 5, 2019
@msau42
Member Author

msau42 commented Dec 5, 2019

It seems very odd that 9eda997#diff-2cebda6448c20d8dff9dc037fb8c0577 would cause the test to fail, but the timing seems to line up very closely.

@msau42
Member Author

msau42 commented Dec 5, 2019

While trying to fix the failure, I also noticed that the test itself is not right. If the node gets recreated fast enough, the Pod may not get evicted and would remain scheduled to the same node, so the disk would also stay attached to that node. The test, however, checks that the disk does get detached from the node, which may never happen. That check still passes, though, because the test doesn't actually fail on the error.

Also, I think there's a broader issue: these tests are very provider/platform-specific. I would like to see if we can rewrite them to be more platform-agnostic and to test higher-level functionality rather than attach/detach of disks. For example, a test case for a pod being rescheduled to another node can be written in a platform-agnostic way and still exercise attach/detach functionality. We already have some tests like that, and we should see if any of the tests in pd.go can be removed.

@msau42
Member Author

msau42 commented Dec 5, 2019

For the short term, I'm just going to disable this test. It will require a bit of reworking to fix it properly.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 4, 2020
@msau42
Member Author

msau42 commented Apr 3, 2020

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 3, 2020