
watch.Watch().stream(custom_resource,...) generator cannot recover itself after broken connection recovery with k8s api. #869

Closed
rfum opened this issue Jul 9, 2019 · 7 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments


rfum commented Jul 9, 2019

With client version 8.0.0, the library logs the following warnings continuously when the API server goes offline. They are written to stdout or stderr (no exception is raised), so I cannot handle the error from my code.

2019-07-05 11:22:20,296 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /apis/foo.com/v1alpha1/namespaces/kube-public/fooobject
2019-07-05 11:22:20,296 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /apis/foo.com/v1alpha1/namespaces/kube-public/fooobject
2019-07-05 11:22:20,302 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:852)'),)': /apis/foo.com/v1alpha1/namespaces/kube-public/fooobject
2019-07-05 11:22:20,302 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:852)'),)': /apis/foo.com/v1alpha1/namespaces/kube-public/fooobject

After the k8s API server recovers, the stream() Python generator stops watching for changes on the API. To keep my code working I currently restart the Python interpreter process manually, which is the only thing that gets it out of this trap.
Is this a known issue? If so, is there a workaround?
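
For reference, a minimal sketch of how the watch is set up (the CRD group/version/namespace/plural here are placeholders for the real resource):

```python
from kubernetes import client, config, watch

config.load_kube_config()
api = client.CustomObjectsApi()

w = watch.Watch()
# placeholders: substitute the real CRD coordinates
for event in w.stream(api.list_namespaced_custom_object,
                      group="foo.com", version="v1alpha1",
                      namespace="kube-public", plural="fooobject"):
    print(event["type"], event["object"]["metadata"]["name"])
```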

Thanks,
Furkan.

@rfum rfum changed the title watch.Watch().stream(custom_resource,...) generator cannot recover itself after connection recovery with k8s api. watch.Watch().stream(custom_resource,...) generator cannot recover itself after broken connection recovery with k8s api. Jul 9, 2019

roycaihw commented Jul 9, 2019

This is by design: the watch by itself doesn't handle connectivity drops. #868 is tracking informer support for the Python client, which would address multiple use cases, including this one.

They are written to stdout or stderr (no exception is raised), so I cannot handle the error from my code

I think the error happens here: https://github.com/kubernetes-client/python-base/blob/95858d5ce843e98d65a327f3099cd0b6cdf119fc/watch/watch.py#L142.
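
Until informers are available, a possible stopgap (not the library's own recovery mechanism, just a sketch) is to wrap the watch in a retry loop and resume from the last seen resourceVersion. The CRD coordinates are placeholders, and handling of an expired resourceVersion (410 Gone) is omitted:

```python
import time

import urllib3
from kubernetes import client, config, watch

config.load_kube_config()  # or config.load_incluster_config() inside a pod
api = client.CustomObjectsApi()

resource_version = ""
while True:
    w = watch.Watch()
    try:
        # placeholders: substitute the real CRD coordinates
        for event in w.stream(api.list_namespaced_custom_object,
                              group="foo.com", version="v1alpha1",
                              namespace="kube-public", plural="fooobject",
                              resource_version=resource_version):
            obj = event["object"]
            resource_version = obj["metadata"]["resourceVersion"]
            print(event["type"], obj["metadata"]["name"])  # real handling goes here
    except (urllib3.exceptions.ProtocolError,
            urllib3.exceptions.MaxRetryError):
        # the connection to the API server dropped; back off and start a new watch
        time.sleep(5)
```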


rfum commented Jul 10, 2019

Thanks for the response.
I also have a log line from the underlying urllib3 library; it may be relevant to the exception-handling problem as well.

[Thread-23] [WARNING] [connectionpool.py:urlopen:665] [2019-07-09 13:35:58,389] Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:852)'),)': /apis/foo.com/v1alpha1/namespaces/kube-public/foo/foo-bar 

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 8, 2019

rfum commented Nov 1, 2019

Any updates on this?

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 1, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
