PD multizone tests are flaky #72378
@kubernetes/sig-storage-test-failures
The runs started flaking after #70862 merged. cc @pohly @verult

Here's one failing run: Pod pod-subpath-test-gcepd-dynamicpv-pfsm failed to schedule because of conflicting node selectors:
We shouldn't need to set the node selector for PD tests using PVs.
The PR broke the subpath test because a modification to the config struct embedded in the driver info (the node selector that the gcepd driver sets) now persists across tests: kubernetes/test/e2e/storage/drivers/in_tree.go, lines 1250 to 1255 in a9c7dfb.

A quick fix would be to reset that field in CreateDriver. The long-term fix is the change discussed in #72288.

My preference is to do the quick fix now, then either merge or close PRs in this order:
What I'd like to avoid is having to rebase PR #70992 on top of the long-term solution for issue #72288; that would cause a lot of code conflicts.
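To make the failure mode concrete, here is a minimal Go sketch of how a config field written by one test leaks into the next when a single driver instance is shared, and how the quick fix (resetting the field in CreateDriver) stops the leak. All type and function names below (TestConfig, DriverInfo, gcePDDriver, CreateInlineVolume, runTests) are hypothetical simplifications, not the real e2e framework API. The assumption that the selector is set when creating an inline volume, and the zone value, are illustrative only.

```go
package main

import "fmt"

// zoneLabel is the zone label key; used here only as example data.
const zoneLabel = "failure-domain.beta.kubernetes.io/zone"

// TestConfig is a hypothetical, simplified stand-in for the per-test
// settings embedded in a driver's static info.
type TestConfig struct {
	ClientNodeSelector map[string]string
}

// DriverInfo is a hypothetical stand-in for the static driver info.
type DriverInfo struct {
	Name   string
	Config TestConfig
}

// gcePDDriver is a hypothetical driver. One instance is shared by every
// test, so anything written into info outlives the test that wrote it.
type gcePDDriver struct {
	info  DriverInfo
	reset bool // toggles the quick fix for this demo
}

// CreateDriver runs at the start of each test. With the quick fix it
// clears the field that an earlier test may have set.
func (d *gcePDDriver) CreateDriver() {
	if d.reset {
		d.info.Config.ClientNodeSelector = nil
	}
}

// CreateInlineVolume (hypothetical) pins client pods to the zone in which
// the PD was created by mutating the shared config.
func (d *gcePDDriver) CreateInlineVolume(zone string) {
	d.info.Config.ClientNodeSelector = map[string]string{zoneLabel: zone}
}

// runTests runs an inline-volume test followed by a dynamic-PV test and
// returns the node selector the second test would put on its pod.
func runTests(reset bool) map[string]string {
	d := &gcePDDriver{info: DriverInfo{Name: "gcepd"}, reset: reset}

	// Test 1: inline volume, sets a zone selector.
	d.CreateDriver()
	d.CreateInlineVolume("us-central1-b")

	// Test 2: dynamically provisioned PV. It should not carry a node
	// selector of its own, but without the reset it inherits test 1's.
	d.CreateDriver()
	return d.info.Config.ClientNodeSelector
}

func main() {
	fmt.Println("without reset:", runTests(false))
	fmt.Println("with reset:   ", runTests(true))
}
```

Resetting a field in CreateDriver only covers the fields a driver remembers to clear; any field it forgets can still leak between tests.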
/assign

PR is here: #72410
PR kubernetes#70862 made each driver responsible for resetting its config, but as it turned out, one place was missed in that PR: the in-tree gcepd driver sets a node selector. Not resetting it caused other tests to fail randomly, depending on test execution order.

Now the test suite resets the config by taking a copy after setting up the driver and restoring that copy before each test.

Long term, the intention is to separate the entire test config from the static driver info (kubernetes#72288), but for now resetting the config is the fastest way to fix the test flake.

Fixes: kubernetes#72378
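The snapshot-and-restore approach described in the commit message can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the code from #72410: Suite, SetupDriver, BeforeEach, clone, runTwoTests and the example.com/pool label are all hypothetical. The commit message only says the suite "takes a copy"; this sketch assumes a deep copy of the selector map, because plain struct assignment in Go copies only the map reference, so an in-place edit by one test would otherwise leak into the snapshot.

```go
package main

import "fmt"

// zoneLabel is the zone label key; used here only as example data.
const zoneLabel = "failure-domain.beta.kubernetes.io/zone"

// TestConfig is a hypothetical, simplified per-test config.
type TestConfig struct {
	Prefix             string
	ClientNodeSelector map[string]string
}

// clone returns a copy of c whose selector map is not shared with c, so
// editing the map in one copy cannot change the other.
func (c TestConfig) clone() TestConfig {
	out := c // copies the scalar fields; the map is still shared here
	if c.ClientNodeSelector != nil {
		out.ClientNodeSelector = make(map[string]string, len(c.ClientNodeSelector))
		for k, v := range c.ClientNodeSelector {
			out.ClientNodeSelector[k] = v
		}
	}
	return out
}

// Driver is a hypothetical driver whose static info embeds a TestConfig.
type Driver struct {
	Config TestConfig
}

// Suite is a hypothetical runner that shares one driver across tests.
type Suite struct {
	driver   *Driver
	snapshot TestConfig
}

// SetupDriver configures the driver once, then snapshots its config.
func (s *Suite) SetupDriver(d *Driver) {
	d.Config.Prefix = "gcepd"
	d.Config.ClientNodeSelector = map[string]string{"example.com/pool": "default"}
	s.driver = d
	s.snapshot = d.Config.clone()
}

// BeforeEach restores the snapshot so no test inherits another's edits.
func (s *Suite) BeforeEach() {
	s.driver.Config = s.snapshot.clone()
}

// runTwoTests simulates two tests on one shared driver and returns the
// config that the second test starts with.
func runTwoTests() TestConfig {
	s := &Suite{}
	s.SetupDriver(&Driver{})

	// Test 1 pins its pods to a zone by editing the selector map in place.
	s.BeforeEach()
	s.driver.Config.ClientNodeSelector[zoneLabel] = "us-central1-b"

	// Test 2 starts again from the snapshot taken after setup.
	s.BeforeEach()
	return s.driver.Config
}

func main() {
	cfg := runTwoTests()
	fmt.Println(cfg.Prefix, cfg.ClientNodeSelector)
}
```

Cloning on both snapshot and restore is deliberate: if BeforeEach handed out the snapshot's own map, the first test to edit it in place would corrupt the snapshot for every later test.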
Which jobs are failing:
gce/gke multizone/regional jobs
Which test(s) are failing:
Only PD tests
Since when has it been failing:
Around 12/21 with the 16:49 run
Testgrid link:
https://k8s-testgrid.appspot.com/sig-storage#gce-multizone
Reason for failure:
TBD
Anything else we need to know:
/kind failing-test