Using wait.Until doesn't work for long durations #31345
Comments
So to summarize, the main problem in my opinion is that: Adding @smarterclayton - who added wait.Until IIRC @kubernetes/sig-scalability |
We should be renewing the watch. However that's not enough - you really need Get then Watch. So there has to be something that abstracts watcher creation (a watch wrapper). |
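A rough sketch of the "Get then Watch, and renew the watch" idea from the comment above. Everything here is hypothetical and stdlib-only: `Event`, `Source`, `UntilWithRenewal`, and the `flakySource` test double are illustration names, not Kubernetes APIs. The point is only to show the shape a watch wrapper could take: read the current state first, then watch from that resource version, and re-open the watch whenever the server closes it.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// Event is a hypothetical, simplified stand-in for a watch event.
type Event struct {
	ResourceVersion int
	Ready           bool
}

// Source is a hypothetical abstraction over an API client.
type Source interface {
	// Get returns the current state of the object.
	Get(ctx context.Context) (Event, error)
	// Watch streams changes newer than resourceVersion. The returned
	// channel may be closed by the server at any time.
	Watch(ctx context.Context, resourceVersion int) (<-chan Event, error)
}

// ErrGaveUp is returned when ctx ends before the condition holds.
var ErrGaveUp = errors.New("context ended before the condition was met")

// UntilWithRenewal does a Get, then a Watch, and re-opens the watch from the
// last seen resource version whenever the channel is closed.
func UntilWithRenewal(ctx context.Context, src Source, cond func(Event) bool) (Event, error) {
	cur, err := src.Get(ctx)
	if err != nil {
		return Event{}, err
	}
	if cond(cur) {
		return cur, nil // already satisfied, no watch needed
	}
	rv := cur.ResourceVersion
	for {
		if ctx.Err() != nil {
			return Event{}, ErrGaveUp
		}
		ch, err := src.Watch(ctx, rv)
		if err != nil {
			return Event{}, err
		}
	stream:
		for {
			select {
			case <-ctx.Done():
				return Event{}, ErrGaveUp
			case ev, ok := <-ch:
				if !ok {
					break stream // watch closed: renew it instead of failing
				}
				rv = ev.ResourceVersion
				if cond(ev) {
					return ev, nil
				}
			}
		}
	}
}

// flakySource is a test double: every watch delivers one event and then
// closes, imitating a server that keeps dropping the connection.
type flakySource struct{ readyAt, watches int }

func (f *flakySource) Get(ctx context.Context) (Event, error) {
	return Event{ResourceVersion: 1}, nil
}

func (f *flakySource) Watch(ctx context.Context, rv int) (<-chan Event, error) {
	f.watches++
	ch := make(chan Event, 1)
	next := rv + 1
	ch <- Event{ResourceVersion: next, Ready: next >= f.readyAt}
	close(ch)
	return ch, nil
}

// runExample waits for readiness against the flaky source and reports how
// many watches had to be opened along the way.
func runExample() (Event, int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	src := &flakySource{readyAt: 4}
	ev, err := UntilWithRenewal(ctx, src, func(e Event) bool { return e.Ready })
	return ev, src.watches, err
}
```

In this sketch the overall deadline comes only from the caller's context, so a dropped connection costs one extra `Watch` call rather than ending the wait.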
Yeah - I agree that this is not trivial change. It should be somewhat similar to what we are doing in reflector. I guess it's not 1.4 change though. |
Agreed that anyone who switched to watch.Until may have done so prematurely. |
Automatic merge from submit-queue Don't set timeouts in clients in tests We are not setting timeouts in production - we shouldn't do it in tests then... Addresses point 2. of #31345
This is causing quite a lot of test flakes. Can we replace |
We should maybe just turn the client timeout way up. It's kind of arbitrary and I opened another issue about removing it. |
Filed #41112 |
Automatic merge from submit-queue (batch tested with PRs 41112, 41201, 41058, 40650, 40926) e2e test flakes: remove some uses of watch.Until in e2e tests `watch.Until` is somewhat broken and is causing quite a lot of test flakes. See #39879 (comment) and #31345 for more context. @wojtek-t @yujuhong @Kargakis
Do we want to do something with this soon-ish? |
It seems like we want a client aware watcher abstraction that is basically a lightweight, low cost informer. |
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with an /lifecycle frozen comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/lifecycle frozen |
Automatic merge from submit-queue (batch tested with PRs 60470, 59149, 56075, 60280, 60504). If you want to cherry-pick this change to another branch, please follow the instructions here (https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md). Make Scale() for RC poll-based until #31345 is fixed. Fixes #56064 in the short term, until issue #31345 is fixed. We should eventually move RS, job, deployment, etc. all to watch-based (#56071). /cc @wojtek-t - SGTY? ```release-note NONE ```
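For readers unfamiliar with the poll-based workaround mentioned in the PR above, here is a minimal stdlib-only sketch of the general pattern. It is not the code from that PR: `pollUntil`, `ErrPollTimeout`, `demoScale`, and `demoNever` are hypothetical names, and the "replica count" is simulated with a counter instead of an API call.

```go
package main

import (
	"errors"
	"time"
)

// ErrPollTimeout is returned when the condition never became true in time.
var ErrPollTimeout = errors.New("timed out waiting for the condition")

// pollUntil calls cond every interval until it returns true, returns an
// error, or the timeout would be exceeded by the next sleep.
func pollUntil(interval, timeout time.Duration, cond func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := cond()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if time.Now().Add(interval).After(deadline) {
			return ErrPollTimeout
		}
		time.Sleep(interval)
	}
}

// demoScale simulates waiting for an RC to reach the target replica count:
// each poll pretends that one more replica has become ready.
func demoScale(target int) (polls int, err error) {
	observed := 0
	err = pollUntil(time.Millisecond, time.Second, func() (bool, error) {
		polls++
		observed++
		return observed >= target, nil
	})
	return polls, err
}

// demoNever polls a condition that never holds, so it ends in a timeout.
func demoNever() error {
	return pollUntil(time.Millisecond, 20*time.Millisecond, func() (bool, error) {
		return false, nil
	})
}
```

Each poll is an independent short request, so no single connection has to stay open for the whole wait. The cost is extra load and latency compared with a watch.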
/priority backlog |
Fix is ready here: #50102 |
/remove-lifecycle frozen |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale #50102 (once figured out) should fix this issue. |
Hi @wojtek-t, my big deployment with Helm is failing with the error below. I am starting the helm installation with helm history dev-so -o yaml
|
Automatic merge from submit-queue (batch tested with PRs 41112, 41201, 41058, 40650, 40926) e2e test flakes: remove some uses of watch.Until in e2e tests `watch.Until` is somewhat broken and is causing quite a lot of test flakes. See kubernetes/kubernetes#39879 (comment) and kubernetes/kubernetes#31345 for more context. @wojtek-t @yujuhong @Kargakis Kubernetes-commit: 558c37aee3ae62356bb16068af9973e5489aa86a
Density test is failing without GC on large clusters with the following error:
However, this is currently "working as implemented".
The problem is that in this case we are using the "DeleteRCAndPods" method underneath to delete the RC, which in turn uses "ReplicationControllerReaper", which ends up calling this:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubectl/scale.go#L199
However, when the watch is closed, "watch.Until" returns "ErrWaitTimeout":
https://github.com/kubernetes/kubernetes/blob/master/pkg/watch/until.go#L63
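To make the distinction concrete, here is a simplified sketch of a wait loop that reports a closed event channel separately from a real timeout. This is not the code in pkg/watch/until.go: `until`, `ErrTimeout`, `ErrWatchClosed`, and the two demo helpers are hypothetical names, and events are plain strings.

```go
package main

import (
	"errors"
	"time"
)

var (
	// ErrTimeout means the caller's deadline really expired.
	ErrTimeout = errors.New("timed out waiting for the condition")
	// ErrWatchClosed means the event stream ended early; the caller can
	// decide to re-establish it rather than treat it as a timeout.
	ErrWatchClosed = errors.New("watch closed before the condition was met")
)

// until consumes events until cond is true, the timeout fires, or the
// channel is closed, and returns a different error for the last two cases.
func until(timeout time.Duration, events <-chan string, cond func(string) bool) error {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	for {
		select {
		case ev, ok := <-events:
			if !ok {
				return ErrWatchClosed
			}
			if cond(ev) {
				return nil
			}
		case <-timer.C:
			return ErrTimeout
		}
	}
}

// demoClosed waits on a channel that is already closed, with a long timeout.
func demoClosed() error {
	ch := make(chan string)
	close(ch)
	return until(time.Hour, ch, func(string) bool { return false })
}

// demoTimeout waits on a channel that never delivers anything.
func demoTimeout() error {
	ch := make(chan string)
	return until(10*time.Millisecond, ch, func(string) bool { return false })
}
```

With separate errors, a caller that asked to wait for an hour can tell "the connection dropped after five minutes" apart from "an hour passed", and can retry only in the first case.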
The test is failing after exactly 5 minutes of waiting for the RC deletion, because of the Timeout set on http.Client:
https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/util.go#L1749
I think that there are two issues here:
We may have similar problems in other places of the code.
[For this one I'm going to send a PR that will stop setting timeouts for http.Client in tests.]
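A small stdlib-only illustration of why a client-wide timeout and long-lived watches do not mix: `http.Client.Timeout` covers the whole exchange, including reading the response body, so a streaming response is cut off once the timeout elapses even if the server is still sending. The local test server and the helper names (`readStream`, `withTimeout`, `withoutTimeout`) are made up for this sketch and are not taken from the e2e framework.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// readStream starts a local server that sends one line and then keeps the
// response open for streamFor (roughly like a watch), and reads the whole
// body with a client that uses the given timeout (0 means no timeout).
func readStream(clientTimeout, streamFor time.Duration) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "event-1")
		if f, ok := w.(http.Flusher); ok {
			f.Flush() // push the first event to the client right away
		}
		select {
		case <-time.After(streamFor): // stream ends normally
		case <-r.Context().Done(): // client went away
		}
	}))
	defer srv.Close()

	client := &http.Client{Timeout: clientTimeout}
	resp, err := client.Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// withTimeout reads a 2s stream with a 100ms client timeout.
func withTimeout() error {
	_, err := readStream(100*time.Millisecond, 2*time.Second)
	return err
}

// withoutTimeout reads a 200ms stream with no client timeout.
func withoutTimeout() (string, error) {
	return readStream(0, 200*time.Millisecond)
}
```

In this sketch `withTimeout` fails while the server is still mid-stream, and `withoutTimeout` reads the stream to its natural end. Bounding a wait with a per-request context is the usual alternative to a client-wide timeout.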