ReflectorRunnable retries at a very high frequency while trying to reconnect informer #1030

avmohan · 2020-07-02T05:17:08Z

When a shared informer is created on a cluster, ReflectorRunnable keeps retrying list-watch at a very high frequency and filling up the logs with stack traces. Any clues on how to go about fixing this? Is it ok to just suppress this logger?

Check the timestamps - it seems to be retrying too fast.

E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {}

Exception stack trace for the error lines:

io.kubernetes.client.openapi.ApiException: 
	at io.kubernetes.client.openapi.ApiClient.handleResponse(ApiClient.java:979)
	at io.kubernetes.client.openapi.ApiClient.execute(ApiClient.java:895)
	at io.kubernetes.client.informer.SharedInformerFactory$1.list(SharedInformerFactory.java:141)
	at io.kubernetes.client.informer.cache.ReflectorRunnable.run(ReflectorRunnable.java:45)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

The text was updated successfully, but these errors were encountered:

yue9944882 · 2020-07-02T06:40:53Z

this is weird.. by default, the reflector retries list-watching w/ a constant 1 second backoff.. but from your pasted logs, it looks like looping w/o any backing-off.. what version are you using?

java/util/src/main/java/io/kubernetes/client/informer/cache/Controller.java

Lines 107 to 108 in 8f5aabe

    
           reflectExecutor.scheduleWithFixedDelay( 
        
               reflector::run, 0L, DEFAULT_PERIOD, TimeUnit.MILLISECONDS);

additionally customized exception handling for lister-watcher is supported by #1026. you're free to develop your own backoff policy in the handler. thanks.

avmohan · 2020-07-02T08:23:39Z

I'm on version 8.0.2

avmohan · 2020-07-03T03:49:40Z

Based on the linked code, it does seems like it should retry at 1 second interval only. And I'm on the latest version too. Weird.
The mentioned pr seems to be not released. Will use the back off once it gets out.

fejta-bot · 2020-10-01T04:11:59Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2020-10-31T04:54:38Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot · 2020-11-30T05:39:16Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot · 2020-11-30T05:39:18Z

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

avmohan changed the title ~~ReflectorRunnable logging ApiException at a high frequency while trying to reconnect informer~~ ReflectorRunnable retries at a very high frequency while trying to reconnect informer Jul 2, 2020

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 1, 2020

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 31, 2020

k8s-ci-robot closed this as completed Nov 30, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ReflectorRunnable retries at a very high frequency while trying to reconnect informer #1030

ReflectorRunnable retries at a very high frequency while trying to reconnect informer #1030

avmohan commented Jul 2, 2020

yue9944882 commented Jul 2, 2020

avmohan commented Jul 2, 2020

avmohan commented Jul 3, 2020

fejta-bot commented Oct 1, 2020

fejta-bot commented Oct 31, 2020

fejta-bot commented Nov 30, 2020

k8s-ci-robot commented Nov 30, 2020

ReflectorRunnable retries at a very high frequency while trying to reconnect informer #1030

ReflectorRunnable retries at a very high frequency while trying to reconnect informer #1030

Comments

avmohan commented Jul 2, 2020

yue9944882 commented Jul 2, 2020

avmohan commented Jul 2, 2020

avmohan commented Jul 3, 2020

fejta-bot commented Oct 1, 2020

fejta-bot commented Oct 31, 2020

fejta-bot commented Nov 30, 2020

k8s-ci-robot commented Nov 30, 2020