[k8s.io] Load capacity [Feature:Performance] should be able to handle 30 pods per node {Kubernetes e2e suite} #33839
Comments
We should ignore failures for enormous-cluster; those were experiments. As for kubemark-scale, those issues were etcd-related. We are already working on migrating to etcd3.
Failed: [k8s.io] Load capacity [Feature:Performance] should be able to handle 30 pods per node {Kubernetes e2e suite}
All failures from kubemark-scale and gce-enormous-cluster are expected, so I'm removing them all from here.
So there are 2 failures from gce-scalability that we should take a look at.
Regarding the second failure: I took a look at it and here are some findings.
This shows where the error happened: kubernetes/test/e2e/framework/util.go line 2793 in a2bf827,
because there is no other return between the first log (kubernetes/test/e2e/framework/util.go line 2774 in a2bf827)
and the second one (kubernetes/test/e2e/framework/util.go line 2796 in a2bf827).
That said, I have no idea why the log is:
it should be something like:
Any thoughts on it, @gmarek? The second part is also super suspicious, because WaitPollImmediate can return
This failure is extremely strange.
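For context, here is a minimal sketch of the pattern those three util.go lines bracket, written against current client-go; the helper name newPodStore and the exact intervals are illustrative, not the framework's actual code. The reflector is started, the first log point is reached, and a poll (the WaitPollImmediate mentioned above; wait.PollImmediate in client-go terms) guards initialization, with no other return before the failure log point:

```go
package e2esketch

import (
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// newPodStore is an illustrative stand-in for the framework helper whose log
// lines are referenced above: it starts a reflector that LISTs and then
// WATCHes pods, and blocks until the backing store has been populated.
func newPodStore(c kubernetes.Interface, ns string) (cache.Store, chan struct{}, error) {
	lw := cache.NewListWatchFromClient(c.CoreV1().RESTClient(), "pods", ns, fields.Everything())
	store := cache.NewStore(cache.MetaNamespaceKeyFunc)
	stopCh := make(chan struct{})
	go cache.NewReflector(lw, &v1.Pod{}, store, 0).Run(stopCh)

	// The first log line (~util.go:2774 in a2bf827) would be emitted here.
	err := wait.PollImmediate(2*time.Second, 1*time.Minute, func() (bool, error) {
		// Treat the store as initialized once the reflector has delivered pods.
		return len(store.List()) > 0, nil
	})
	if err != nil {
		// With no other return in between, a timeout lands directly on the
		// failure log (~util.go:2793).
		close(stopCh)
		return nil, nil, fmt.Errorf("pod store failed to initialize: %v", err)
	}
	return store, stopCh, nil
}
```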
Actually, please ignore my message above. I misread the logs.
And there is no such log for 121 RC:
That said, what happened here is that we hit this error:
I looked into the apiserver logs, and they confirm that theory.
It shows that the podStoreForRC should have been created around 11:53:14. However, from the apiserver logs (11:53:14 local is 18:53:14 there):
What is more:
Looking deeper into the logs, the watch that finished at 19:04:24 (and thus started at 18:55:08) was the one corresponding to our podStoreRC. So the problem is that podStoreRC wasn't initialized correctly, because the WATCH request was sent 2 minutes after it was supposed to be sent.
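To make the arithmetic explicit, a tiny self-contained check (the date is arbitrary; only the times of day matter, and the 1-minute timeout is the one mentioned in the later comment):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Timestamps from the apiserver-log analysis above.
	storeCreated := time.Date(2016, 11, 9, 18, 53, 14, 0, time.UTC) // podStoreForRC created
	watchIssued := time.Date(2016, 11, 9, 18, 55, 8, 0, time.UTC)   // WATCH reached the apiserver
	initTimeout := 1 * time.Minute                                  // podStore initialization timeout

	lag := watchIssued.Sub(storeCreated)
	fmt.Printf("WATCH lag: %v, init timeout: %v\n", lag, initTimeout)
	// Prints: WATCH lag: 1m54s, init timeout: 1m0s
	// The WATCH arrived only after the initialization poll had already timed
	// out, so the store never saw the RC's pods and the test failed.
}
```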
For the first test: I filed #36502 for this problem.
So the only remaining question in this issue is why the "WATCH" request from the reflector wasn't issued in this run (as described in detail in https://github.com/kubernetes/kubernetes/issues/33839#issuecomment-259431136).
Actually, it's not about issuing the "WATCH" itself. We know that the LIST request from the reflector finished successfully:
at 18:53:18. From the reflector code:
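A toy model of the reflector's ListAndWatch flow (illustrative only; the real code is cache.Reflector in k8s.io/client-go and differs in detail) marks the two "places" referred to below: the first right after the LIST returns, the second where the WATCH is issued:

```go
package main

import "fmt"

// listAndWatch is a toy model of client-go's Reflector.ListAndWatch, just to
// mark the two "places" discussed in this comment.
func listAndWatch(list func() []string, watch func() <-chan string, store map[string]bool) {
	// First place: the LIST request returns (18:53:18 in this run) and the
	// store is filled with the snapshot.
	for _, pod := range list() {
		store[pod] = true
	}
	// Any delay of this goroutine here (e.g. an overloaded test machine)
	// postpones the WATCH without surfacing an error anywhere.

	// Second place: the WATCH request is issued (18:55:08 in this run); only
	// from now on do newly created pods reach the store.
	for pod := range watch() {
		store[pod] = true
	}
}

func main() {
	store := map[string]bool{}
	list := func() []string { return []string{"rc-pod-0"} }
	watch := func() <-chan string {
		ch := make(chan string, 1)
		ch <- "rc-pod-1"
		close(ch)
		return ch
	}
	listAndWatch(list, watch, store)
	fmt.Println(len(store), "pods in store") // prints: 2 pods in store
}
```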
That means we didn't get to the second place for the next ~1 minute (which is the timeout for initializing the podStore). So the only reason for that could have been an overloaded machine where the test was running.
I also checked whether this could have been due to QPS limits, and it wasn't, because there weren't many requests from the e2e test. So, as the only possible AI (action item), I will just increase the timeout for waiting for the podStore to be initialized.
Automatic merge from submit-queue: Increase initialization timeout for podStore. Fix #33839 - see #33839 (comment) for more details.
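The shape of that fix, expressed against the illustrative newPodStore sketch from earlier (the interval and the new 5-minute value are assumptions; the merged PR's actual numbers may differ):

```go
package e2esketch

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/tools/cache"
)

// waitForPodStore is the initialization wait from the earlier sketch.
// Previously the timeout was ~1 minute, which raced a WATCH issued late on
// an overloaded machine; the fix is simply to wait longer. The 5-minute
// value here is an assumption, not the exact number from the merged PR.
func waitForPodStore(store cache.Store) error {
	return wait.PollImmediate(2*time.Second, 5*time.Minute, func() (bool, error) {
		return len(store.List()) > 0, nil
	})
}
```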
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-kubemark-gce-scale/1451/
Failed: [k8s.io] Load capacity [Feature:Performance] should be able to handle 30 pods per node {Kubernetes e2e suite}
Previous issues for this test: #26544 #26938 #27595 #30146 #30469 #31374 #31427 #31433 #31589 #31981 #32257 #33711