Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReflectorRunnable retries at a very high frequency while trying to reconnect informer #1030

Closed
avmohan opened this issue Jul 2, 2020 · 7 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@avmohan
Copy link

avmohan commented Jul 2, 2020

When a shared informer is created on a cluster, ReflectorRunnable keeps retrying list-watch at a very high frequency and filling up the logs with stack traces. Any clues on how to go about fixing this? Is it ok to just suppress this logger?

Check the timestamps - it seems to be retrying too fast.

E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 
I 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Start listing and watching... 
E 2020-07-02T05:10:12.053364412Z class io.kubernetes.client.openapi.models.V1DaemonSet#Failed to list-watch: {} 

Exception stack trace for the error lines:

io.kubernetes.client.openapi.ApiException: 
	at io.kubernetes.client.openapi.ApiClient.handleResponse(ApiClient.java:979)
	at io.kubernetes.client.openapi.ApiClient.execute(ApiClient.java:895)
	at io.kubernetes.client.informer.SharedInformerFactory$1.list(SharedInformerFactory.java:141)
	at io.kubernetes.client.informer.cache.ReflectorRunnable.run(ReflectorRunnable.java:45)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
@avmohan avmohan changed the title ReflectorRunnable logging ApiException at a high frequency while trying to reconnect informer ReflectorRunnable retries at a very high frequency while trying to reconnect informer Jul 2, 2020
@yue9944882
Copy link
Member

this is weird.. by default, the reflector retries list-watching w/ a constant 1 second backoff.. but from your pasted logs, it looks like looping w/o any backing-off.. what version are you using?

reflectExecutor.scheduleWithFixedDelay(
reflector::run, 0L, DEFAULT_PERIOD, TimeUnit.MILLISECONDS);

additionally customized exception handling for lister-watcher is supported by #1026. you're free to develop your own backoff policy in the handler. thanks.

@avmohan
Copy link
Author

avmohan commented Jul 2, 2020

I'm on version 8.0.2

@avmohan
Copy link
Author

avmohan commented Jul 3, 2020

Based on the linked code, it does seems like it should retry at 1 second interval only. And I'm on the latest version too. Weird.
The mentioned pr seems to be not released. Will use the back off once it gets out.

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 1, 2020
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 31, 2020
@fejta-bot
Copy link

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Copy link
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
Projects
None yet
Development

No branches or pull requests

4 participants