watcher error after 20m #12
Comments
It is expected. The server will time out the watch connection, see the code. Though I'm not sure why it's 20 minutes for you: it times out after 30-60 minutes if you didn't change the default configs of your server, because the default timeout falls in that range.
I guess you have a loop around the select? I think you need to check whether the channel is still open:

```go
case ev, ok := <-watcher.ResultChan():
	if !ok {
		// the channel was closed: break out of the loop here
		// (e.g. with a labeled break) and renew the watch
	}
```
You can take a look at this for loop in reflector.go as a reference on how to renew the watch.
Thanks @caesarxuchao. I'm giving that a try, and I'm 99.999% certain that my problems are solved. Somehow it simply did not occur to me that the channel might be closed at that point; I actually thought it would panic otherwise. With regard to the 20m: I'm testing/developing with minikube, which I believe uses localkube. I definitely have not changed any default values; I don't even know how to do that.
I'm going to close this issue as it is not a bug but expected behaviour. Additionally, just to point it out again for people running into the same error/misbehaviour: you have to test whether the channel is closed. So the solution is very simple, as @caesarxuchao pointed out:
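A minimal sketch of that pattern (assuming `clientset` is an initialized client-go clientset and using the current `Watch` signature; the event handling is illustrative):

```go
for {
	watcher, err := clientset.CoreV1().Pods("").Watch(context.TODO(), metav1.ListOptions{})
	if err != nil {
		return err
	}
	for {
		ev, ok := <-watcher.ResultChan()
		if !ok {
			// The server closed the watch (e.g. after its timeout):
			// break out and create a new watcher on the next outer iteration.
			break
		}
		// Process ev.Type and ev.Object here.
		_ = ev
	}
}
```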
Note that after you recreate the watch, you will receive a lot of ADDED messages for all the pods/objects that match your watch. Thanks again for the help, @caesarxuchao!
Resolves: GoogleContainerTools#531
See also: kubernetes/client-go#12

There is an issue in which the Pods watcher gets into an infinite tight loop and begins consuming 100% of the CPU. This happens after skaffold dev has been running for a while (~30 mins), and once it starts, it doesn't stop.

The issue was narrowed down by @ajbouh to the event polling loop in `logs.go`, which was not checking whether the `ResultChan()` is closed. Kubernetes actually closes the connection after a timeout (the default is in the range of 30-60 mins according to the related issue linked above). In this case, the intended solution is to start the watcher again.

This refactors the polling into two nested loops: one to start (and restart) the Pods watcher itself, and another to receive and process the events from the watcher. If the `ResultChan()` is closed, the entire watcher loop is restarted and log tailing continues.

There is a subtle difference in error handling as a result of this change. Previously, any error returned from `client.Pods("").Watch()` would be immediately returned from the `Watch()` func in `logs.go`. This is no longer possible, since the watcher is initialized in the goroutine started by that func. As such, in the case the watcher cannot be initialized, we simply log the error and stop tailing logs. Open to suggestions as to a better way to handle this error; retrying in a tight loop seems potentially problematic in the error scenario.
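A rough sketch of that two-loop shape (not the actual skaffold code; `client`, `ctx`, the `logrus` logging, and the event handling are stand-ins, and the `Watch` call uses a newer client-go signature):

```go
go func() {
watchLoop:
	for {
		// Outer loop: (re)create the pods watcher.
		watcher, err := client.CoreV1().Pods("").Watch(context.TODO(), metav1.ListOptions{})
		if err != nil {
			// The watcher could not be initialized: log the error and stop
			// tailing logs rather than retrying in a tight loop.
			logrus.Errorf("initializing pod watcher: %s", err)
			return
		}

		// Inner loop: receive and process events until the channel is closed.
		for {
			select {
			case <-ctx.Done():
				watcher.Stop()
				return
			case evt, ok := <-watcher.ResultChan():
				if !ok {
					// The server timed out the watch; restart it.
					continue watchLoop
				}
				// Tail logs for evt.Object here (illustrative).
				_ = evt
			}
		}
	}
}()
```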
You could override the timeout in
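If that refers to the per-request watch timeout, one hedged possibility: `metav1.ListOptions` has a `TimeoutSeconds` field that bounds how long the server keeps the watch open before closing it (the value below is just an example, and the client-go API at the time of this issue looked slightly different):

```go
timeout := int64(600) // seconds, example value
watcher, err := clientset.CoreV1().Pods("").Watch(context.TODO(), metav1.ListOptions{
	TimeoutSeconds: &timeout,
})
```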
When I create a watch on pods, e.g. as in the sketch further below, I have the trouble that after exactly 20 minutes the `ResultChan()` receives infinite messages, but they are all empty. Trying to print the returned `ev` from the channel shows that it is an empty/uninitialized map: `map[Type: Object:<nil>]`. Is this expected behaviour? Am I supposed to renew the watch after some time?
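For reference, a minimal sketch of creating such a pod watch with the current client-go API (the kubeconfig path is illustrative, and the API at the time of this issue differed slightly):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Illustrative: load a kubeconfig from a fixed path.
	config, err := clientcmd.BuildConfigFromFlags("", "/home/me/.kube/config")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Watch pods in all namespaces.
	watcher, err := clientset.CoreV1().Pods("").Watch(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	defer watcher.Stop()

	for ev := range watcher.ResultChan() {
		fmt.Printf("event: %s\n", ev.Type)
	}
	// The loop exits when the server times out the watch and closes the channel.
}
```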