New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop Watching when there is encoding error #84693
Conversation
/assign @liggitt |
/test pull-kubernetes-kubemark-e2e-gce-big |
/lgtm Tests seem broken for unrelated reason. Is there a fix if you rebase? /retest |
/priority important-soon |
Good call @liggitt, I'm not sure. Maybe. We should understand the contract better; I can't dig in at the moment. It'd be best if we could defer the call and be certain it'll be called.... |
@liggitt
|
ah, you're right. this seems like it should do the same in this method (but can be a follow up if we want to keep this minimal for back-porting) |
@@ -285,10 +285,12 @@ func (s *WatchServer) HandleWS(ws *websocket.Conn) { | |||
buf := &bytes.Buffer{} | |||
streamBuf := &bytes.Buffer{} | |||
ch := s.Watching.ResultChan() | |||
|
|||
defer s.Watching.Stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it OK to double-stop this thing? (e.g. for line 298 it is likely already stopped)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, that's exactly the path exercised by line 204/226 above in the normal case of the watch channel closing
Nifty. Let's get the merge train started and we can verify in parallel. |
/lgtm |
picks opened to 1.14, 1.15, 1.16 |
…3-upstream-release-1.16 Automated cherry pick of #84693: Stop Watching when there is encoding error
…3-upstream-release-1.15 Automated cherry pick of #84693: Stop Watching when there is encoding error
…3-upstream-release-1.14 Automated cherry pick of #84693: Stop Watching when there is encoding error
Could you provide some more detail logs about this bug, as I found even the code has some defect, the Because the kubernetes/staging/src/k8s.io/apiserver/pkg/endpoints/handlers/get.go Lines 252 to 262 in 81af5ba
kubernetes/staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher.go Lines 1360 to 1375 in c97baa3
|
@answer1991 Thanks for the investigation; when was the watcher made context aware? I think some versions still in support don't have that change? And this is definitely easy to cherry-pick. I think we need to have metrics, one for "running watches" and one for "in-flight requests", so we can compare and definitively see that watches are not being leaked. @tedyu still looking for a test :) |
@lavalamp |
From v1.10, |
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
In WatchServer#HandleWS, if s.EmbeddedEncoder.Encode() fails, s.Watching should be stopped.
This would make encoding error handling consistent with that of s.Encoder.Encode().
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: