Node E2E Test timeout #63240

Closed
Random-Liu opened this issue Apr 27, 2018 · 6 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@Random-Liu
Member

Random-Liu commented Apr 27, 2018

The node e2e test now times out consistently.

https://k8s-testgrid.appspot.com/sig-node-kubelet#kubelet&graph-metrics=test-duration-minutes
The test duration graph clearly shows that the node e2e test duration went from ~45m to >1h after the following 2 PRs were merged:

Since #63142 is only a README change, I believe it is #62913.

Because the test times out, we don't know how long it would actually take, or whether some component is simply stuck.

The node e2e test does use the namespace controller, so I believe #62913 makes the namespace controller much slower, or possibly makes it get stuck at times.

This seems to be a significant regression to me. @deads2k @liggitt

/cc @kubernetes/sig-node-bugs @kubernetes/sig-api-machinery-bugs

@Random-Liu Random-Liu added the kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. label Apr 27, 2018
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. kind/bug Categorizes issue or PR as related to a bug. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Apr 27, 2018
@Random-Liu
Member Author

@deads2k Is this expected?

@liggitt
Member

liggitt commented Apr 27, 2018

Since the namespace controller now uses a single rest client rather than constructing lots of separate ones, a commensurate increase in the allowed QPS should be made.

@liggitt
Member

liggitt commented Apr 27, 2018

opened #63251

@liggitt
Member

liggitt commented Apr 27, 2018

Also, I just noticed that node-e2e was still using the default QPS of 5 (burst 10), which would have made it way slower than the namespace controller as it normally runs.

k8s-github-robot pushed a commit that referenced this issue Apr 27, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Bump QPS on namespace controller

#62913 switched from using a client pool, where each groupVersionResource got its own rest client, to a single client.

This increases the QPS to account for increased requests using a single rest client rate limiter.

Fixes #63240

```release-note
NONE
```
@Random-Liu
Member Author

> Since the namespace controller is using a single rest client rather than constructing lots of separate ones now, a commensurate increase in the allowed QPS should be made

Makes sense.

@liggitt Thanks for fixing this! I'll keep an eye on the test dashboard.

@Random-Liu
Member Author

I believe this fixes the node e2e test. Thanks! @liggitt
