Run unit tests 2 instead of 3 times via bazel #94699
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: spiffxp
/sig testing
7 CPUs to run unit tests feels excessive. If we are not actually caching unit test results from unaffected packages and are running all tests 3 times on every PR push, then reducing to 2 runs is reasonable.
/lgtm
unhold at will
cached-ness is easily confirmed in the log output. a line like:
means we ran it. it will say something like
/milestone v1.20
It's pretty easy to see when #93605 landed. Makes me wonder if the retry_flake_attempts setting was added before the move to RBE
(Pulling this out of slack) and yes it turns out if we run multiple times, we can't cache, unless we want to accept potentially caching failures https://docs.bazel.build/versions/master/user-manual.html#flag--cache_test_results
#59283 added ... but it was in build/root/Makefile as bazel flags before that. So, yeah, it was in the first PR that added .bazelrc on 2017-01-20 (#40231). Definitely before we tried running bazel via RBE
What type of PR is this?
/kind cleanup
/kind flake
What this PR does / why we need it:
Followup to #93605, which changed from retrying up to 3 times to avoid flakes, to requiring a pass 3 times.
In order to migrate the pull-kubernetes-bazel-test job to a community-owned cluster with dedicated resources, we need to stop using RBE. When we did so, we started seeing many more flakes. (ref: kubernetes/test-infra#19070 (comment))
RBE fans out to a large pool of compute, whereas now we're running in a single pod on a single node.
It may be that using 2 runs instead of 3 will relieve some resource contention, giving us an appropriate compromise between not ignoring flakes but not leaning all the way into causing them. (ref: kubernetes/test-infra#19070 (comment))
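To make the tradeoff between the two policies concrete, here is a rough back-of-the-envelope sketch. It assumes each run of a test flakes independently with a fixed probability `p_flake`, which is a simplification: it ignores the resource contention discussed above, which may itself raise the flake rate as the number of runs grows. The function names and the example flake rate are hypothetical and not taken from the job's configuration.

```python
def pass_probability_retry(p_flake: float, attempts: int) -> float:
    """Chance a test ends up green when it may be retried up to `attempts` times.

    The test is reported as failed only if every attempt flakes.
    """
    return 1.0 - p_flake ** attempts


def pass_probability_require_all(p_flake: float, runs: int) -> float:
    """Chance a test ends up green when all `runs` runs must pass."""
    return (1.0 - p_flake) ** runs


if __name__ == "__main__":
    p = 0.01  # hypothetical per-run flake rate, for illustration only
    for n in (2, 3):
        print(
            f"runs={n}: retry policy {pass_probability_retry(p, n):.6f}, "
            f"require-all policy {pass_probability_require_all(p, n):.6f}"
        )
```

Under these assumptions, a test with a 1% per-run flake rate passes a require-all policy about 98.0% of the time with 2 runs and about 97.0% with 3, whereas the old retry policy would report it green almost always. Going from 3 to 2 runs therefore still surfaces flakes while lowering the chance that any single flaky test blocks a PR.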
Which issue(s) this PR fixes:
n/a
Special notes for your reviewer:
/hold
I'd like to see if the behavior of pull-kubernetes-bazel-test improves now that we've given it 7 CPUs instead of 4. I'm teeing this up as a further mitigation if we decide that's not enough, or if 7 CPUs is excessive resource consumption.
/cc @liggitt @BenTheElder