Watch channel does not get closed ever #755
What version of the API server are you running against? There were known issues related to this that were fixed in 1.15.
We are running 1.14.8-gke.33.
The issue was fixed in kubernetes/kubernetes#78029 in 1.15, and the associated bugs are linked from there.
Thanks for pointing this out @LiGgit, 💯
I checked the release notes, and from here it seems that it was released with
I'm not sure. On a fresh 1.14.8 cluster, starting a watch with timeoutSeconds set closes the connection and exits the watch as I would expect.
Hm, should we reopen this issue then? Any other hints as to what it might be?
The client-side code closes the channel as soon as the server closes the connection, so this issue likely doesn't belong in this repo. If you want to open an issue against https://github.com/kubernetes/kubernetes/issues/, you can, though it would need to be reproduced against a maintained OSS branch (currently 1.15+). More information about what you are doing/seeing would be helpful as well:
The resync period is set to 12 hours, and the timeout is not set explicitly; it is defaulted.
Definitions are re-applied by the controller but not necessarily changed, and not in a regular fashion. Custom resource instances are updated regularly while the watch is open. We see the log lines below regularly for a custom resource, but they stop after some time, when I believe the watch is hung. Ref to the actual implementation.
Ideally, the client side should implement the timeout itself rather than depending on the server to close the channel. This has already been done recently in client-go but is not released yet. We should adopt that change as soon as it is released to insulate ourselves from such issues in the future.
The CR-specific issues were resolved in 1.15 (and backported to 1.12.9, 1.13.7, and 1.14.3), but there are more systemic issues that can occur at the client transport level if the underlying TCP connection is disrupted but not disconnected (like #374).
Won't using a client-side timeout (now that it has been made possible) be the right way to reliably refresh buggy watch connections?
Problem Description
The watch appears to stop receiving events for certain custom resources. This leads to stale cache reads, and the watch is not re-established.
What is expected?
We expect the following client log message to be seen after every timeout interval (~10 minutes):
I0226 11:09:10.181906 1 reflector.go:405] sample-controller/pkg/client/informers/externalversions/factory.go:117: Watch close - *v1alpha1.Flunder total 0 items received
From basic investigation, it seems watchHandler could technically loop forever if ResultChan is hung or never closed from the server side. This could happen if the watch is closed by the API server and the client isn't aware of it, though I'm not sure how or why. There does not seem to be a timeout on the client side. If the server closes the connection but the client misses it, could the watch hang forever? (See also: https://stackoverflow.com/questions/51399407/watch-in-k8s-golang-api-watches-and-get-events-but-after-sometime-doesnt-get-an) Any help will be really appreciated, thanks in advance.