
watcher error after 20m #12

Closed

mheese opened this issue Oct 12, 2016 · 5 comments

@mheese

mheese commented Oct 12, 2016

When I create a watch on pods e.g. like this:

import (
...
  "k8s.io/client-go/1.4/kubernetes"
  "k8s.io/client-go/1.4/pkg/api"
  apiv1 "k8s.io/client-go/1.4/pkg/api/v1"
  "k8s.io/client-go/1.4/pkg/labels"
  "k8s.io/client-go/1.4/pkg/selection"
  "k8s.io/client-go/1.4/pkg/util/sets"
  "k8s.io/client-go/1.4/pkg/watch"
  ...
)
...
podsClient := clientset.Core().Pods("default")
watcher, err := podsClient.Watch(api.ListOptions{...})
...
select {
...
  case ev := <-watcher.ResultChan():
  ...
}

The trouble is that after exactly 20 minutes, ResultChan() receives an endless stream of messages, but they are all empty. Printing the ev returned from the channel shows that it is an empty/uninitialized value: map[Type: Object:<nil>].

Is this expected behaviour? Am I supposed to renew the watch after some time?

@caesarxuchao
Member

It is expected. The server will time out the watch connection, see the code. Though I'm not sure why it's 20 mins: it should time out after 30-60 mins if you haven't changed the default configs of your server, because the default MinRequestTimeout is 30 mins.
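(For reference, and as an assumption about a stock apiserver rather than something verified in this thread: that default corresponds to the kube-apiserver flag below, and watch requests are given a random timeout between one and two times that value, which is where the 30-60 min range comes from.)

kube-apiserver ... --min-request-timeout=1800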

Printing the ev returned from the channel shows that it is an empty/uninitialized value: map[Type: Object:<nil>]

I guess you have a loop around the select? I think you need to check whether watcher.ResultChan() has been closed:

 case ev, ok := <-watcher.ResultChan():
    if !ok {
       // break the loop
    }
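
Note that a plain break inside a select only exits the select, so a labeled break (or a return) is needed to leave the surrounding for loop. A minimal sketch using the watcher from the snippet above (the label name and the event handling are illustrative only):

watchLoop:
	for {
		select {
		case ev, ok := <-watcher.ResultChan():
			if !ok {
				// Channel closed: the server ended the watch.
				break watchLoop
			}
			_ = ev // handle the event here, e.g. switch on ev.Type
		}
	}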

@caesarxuchao
Member

You can take a look at this for loop in reflector.go as a reference on how to renew the watch.
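
Roughly, the renewal pattern looks like this (a sketch only, loosely modelled on that loop; handleEvent is a hypothetical placeholder and backoff/error handling is omitted):

for {
	watcher, err := podsClient.Watch(api.ListOptions{})
	if err != nil {
		// Could not (re)create the watch: log and back off, or give up.
		return err // assumes this runs inside a func that returns an error
	}
	for ev := range watcher.ResultChan() {
		handleEvent(ev) // hypothetical handler, not a client-go function
	}
	// ResultChan() was closed (e.g. the server timed out the watch);
	// loop around and create a new watch.
}

Ranging over ResultChan() exits automatically when the channel is closed, which keeps the outer renewal loop simple.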

@mheese
Author

mheese commented Oct 13, 2016

Thanks @caesarxuchao, I'm giving that a try. I'm 99.999% certain that this solves my problem. Somehow it simply did not occur to me that the channel might be closed at that point; I actually thought it would panic in that case.

With regard to the 20m: I'm testing/developing with minikube, which uses localkube I believe? I definitely have not changed any default values; I don't even know how to do that.

@mheese
Author

mheese commented Oct 13, 2016

I'm going to close this issue as it is not a bug, but expected behaviour.

Additionally, just to point it out again for anyone running into the same error/misbehaviour: you have to test whether the channel has been closed.

So the solution is very simple, as @caesarxuchao pointed out:

case ev, ok := <-watcher.ResultChan():
    if !ok {
       // break the loop
    }

Note that after you recreate the watch, you will receive a lot of ADDED events for all the pods/objects that match your watch.
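
If replaying those ADDED events is a problem, one option (a sketch only, assuming you track the last ResourceVersion you have seen; reflector.go does this more robustly) is to resume from that version when re-creating the watch:

lastRV := ""
for {
	watcher, err := podsClient.Watch(api.ListOptions{ResourceVersion: lastRV})
	if err != nil {
		return err // or log and back off before retrying
	}
	for ev := range watcher.ResultChan() {
		if pod, ok := ev.Object.(*apiv1.Pod); ok {
			lastRV = pod.ResourceVersion
		}
		// handle ev ...
	}
}

If the saved version has expired on the server side, the new watch will report an error and a fresh list+watch is needed, which is essentially what reflector.go handles for you.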

Thanks again for the help @caesarxuchao!

mheese closed this as completed Oct 13, 2016
d11wtq added a commit to d11wtq/skaffold that referenced this issue Jun 19, 2018
Resolves: GoogleContainerTools#531
See also: kubernetes/client-go#12

There is an issue in which the Pods watcher gets into an infinite tight
loop and begins consuming 100% of the CPU. This happens after skaffold
dev has been running for a while (~30 mins), and once it starts, it
doesn't stop.

The issue was narrowed down by @ajbouh to the event polling loop in
`logs.go`, which was not checking whether the `ResultChan()` had been closed.
Kubernetes actually closes the connection after a timeout (the default is in
the range of 30-60 mins according to the related issue linked above).
In this case, the intended solution is to start the watcher again.

This refactors the polling into two nested loops. One to start (and
restart) the Pods watcher itself and another to receive and process the
events from the watcher. If the `ResultChan()` is closed, the entire
watcher loop is restarted and log tailing continues.

There is a subtle difference in error handling as a result of this
change. Previously, any error returned from `client.Pods("").Watch()`
would be immediately returned from the `Watch()` func in `logs.go`. This
is no longer possible since the watcher is initialized in the goroutine
started by that func. As such, in the case where the watcher cannot be
initialized, we simply log the error and stop tailing logs. Open to
suggestions as to a better way to handle this error; retrying in a
tight loop seems potentially problematic in the error scenario.
@scaszoo

scaszoo commented Oct 9, 2019

You could override the timeout in ListOptions:

timeout := int64(60 * 60 * 24) // 24 hours
watcher, err := client.CoreV1().Pods("default").Watch(metav1.ListOptions{
	TimeoutSeconds: &timeout,
})
