Flaking Test: e2e-gci-gce-scalability #69473
This test has been flaky since 9/24. That's a bit of a problem for the 1.13 release, because this is the fastest-running of the e2e scalability tests, and we count on checking it first for scalability fixes.
The problem appears to be that the cluster of Google VMs sometimes doesn't deploy:
Can someone from SIG-scalability take a look at this? Thanks!
... in fact, for the last 15 hours, it's been failing exactly every other run.
When I look at the end of the last successful run and the start of the next run, there's a gap of less than 3 minutes. Could it just be that the test runs are too close together, and gcloud hasn't dropped all the instances yet?
So I took a look into that internally, and sometimes clusters don't start due to lack of quota.
So it's a configuration issue.
The question is what has changed recently. My hypothesis is that we are choosing projects for those suites somewhat randomly, and not all of them have enough quota.
@krzyzacy - do you remember any recent changes here off the top of your head?
The problematic thing is quota for:
For some unknown reason, this quota is not even present in the other 4 projects.
So given that 4 out of 8 projects have broken quota, that explains roughly 50% failure rate.
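For anyone who wants to sanity-check that arithmetic, here is a minimal sketch. It assumes each run picks one of the 8 projects uniformly at random and fails whenever the chosen project lacks the quota; the thread only says projects are chosen "somewhat randomly", so the uniform choice is an assumption. The function names are hypothetical and are not part of any test-infra tooling.

```python
import random
from fractions import Fraction


def expected_failure_rate(total_projects: int, broken_projects: int) -> Fraction:
    """Chance that a uniformly chosen project is one without the quota."""
    return Fraction(broken_projects, total_projects)


def simulate_failure_rate(total_projects: int, broken_projects: int,
                          runs: int, seed: int = 0) -> float:
    """Estimate the failure rate by simulating random project picks.

    Projects 0..broken_projects-1 are treated as the ones missing the quota
    (an arbitrary labelling used only for this illustration).
    """
    rng = random.Random(seed)
    failures = sum(
        1 for _ in range(runs)
        if rng.randrange(total_projects) < broken_projects
    )
    return failures / runs


if __name__ == "__main__":
    # 4 of the 8 projects are missing the quota, per the comment above.
    print(expected_failure_rate(8, 4))          # 1/2
    print(simulate_failure_rate(8, 4, 10_000))  # close to 0.5
```

Note that a purely random pick would give roughly half the runs failing on average, not a strict pass/fail alternation, so the exact every-other-run pattern reported earlier may point to a less random selection order.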