cache: add error handling to informers #87329
Welcome @nicks! |
Hi @nicks. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @deads2k |
13f9ad7 to 57b93a8
@@ -327,7 +342,7 @@ func (r *Reflector) ListAndWatch(stopCh <-chan struct{}) error {
    case err == io.ErrUnexpectedEOF:
        klog.V(1).Infof("%s: Watch for %v closed with unexpected EOF: %v", r.name, r.expectedTypeName, err)
    default:
        utilruntime.HandleError(fmt.Errorf("%s: Failed to watch %v: %v", r.name, r.expectedTypeName, err))
        errToReturn = fmt.Errorf("Failed to watch %v: %v", r.expectedTypeName, err)
Why not return the error right away?
i wanted to preserve the current behavior where we don't return on connection refused (i.e., line 351)
@@ -318,6 +332,7 @@ func (r *Reflector) ListAndWatch(stopCh <-chan struct{}) error {

    w, err := r.listerWatcher.Watch(options)
    if err != nil {
What about removing all the specific error handling here to a DefaultDropWatchHandler that is registered by default?
Because I'd expect people would like to handle EOF and ConnectionRefused error (if only for monitoring purposes).
The most important thing for us is to have some mechanism to bubble up to user "hey, your auth tokens expired, and we can't talk to kubernetes anymore".
I'm deeply worried that this PR is going to turn into a quagmire where we can't add any error-handling mechanism at all until we've had a long discussion on which errors bubble up and which do not.
I think that's a good discussion to have. But I don't feel like I'm well-equipped to facilitate that discussion, or to weigh competing needs (e.g., people who want the informer to retry EOF internally vs those who do not)
Understood, but I'm just suggesting to keep the existing error handling but move it to a DefaultDropWatchHandler. That way a user can customize it without changing the existing default behaviour.
hmmm...i spent some time playing around with this, and couldn't come up with a solution that fit well. This error handling depends on unexported fields and functions of the package (r.expectedTypeName, isExpiredError), and wasn't sure how much we should really be exposing.
Did you have a particular API in mind?
@nicks A DefaultDropWatchHandler could use these fields, right? Something like this:
func DefaultDropWatchHandler(err error, doneCh chan<- struct{}) {
    // Sketch only: r is assumed to be the enclosing *Reflector, in scope here
    // (for example, captured by a closure).
    switch {
    case isExpiredError(err):
        r.setIsLastSyncResourceVersionExpired(true)
        klog.V(4).Infof("%s: watch of %v closed with: %v", r.name, r.expectedTypeName, err)
    case err == io.EOF:
        // watch closed normally
    case err == io.ErrUnexpectedEOF:
        klog.V(1).Infof("%s: Watch for %v closed with unexpected EOF: %v", r.name, r.expectedTypeName, err)
    default:
        utilruntime.HandleError(fmt.Errorf("%s: Failed to watch %v: %v", r.name, r.expectedTypeName, err))
    }
    if !utilnet.IsConnectionRefused(err) {
        // Signal that the watch is done; doneCh must be send-capable for this to compile.
        doneCh <- struct{}{}
    }
}
ok, i pushed a new branch to demonstrate what that would look like, what do you think?
/assign @jpbetz |
57b93a8 to 16b822d
16b822d to d302563
d302563 to 2c6a01a
type DropWatchHandler func(err error, doneCh <-chan struct{})

func createDefaultDropWatchHandler(r *Reflector) DropWatchHandler {
I'd change the DropWatchHandler type to func(r *Reflector, err error, doneCh <-chan struct{}) and then define func DefaultDropWatchHandler(...). Seems more idiomatic to me, and having access to the Reflector in custom handlers seems useful anyway.
done!
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: lavalamp, nicks The full list of commands accepted by this bot can be found here. The pull request process is described here
When creating an informer, this adds a way to add custom error handling, so that Kubernetes tooling can properly surface the errors to the end user. Fixes kubernetes/client-go#155
8890889 to 435b40a
/retest |
@nicks: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.
/ok-to-test |
/test pull-kubernetes-verify @lavalamp I 100% believe you that verify is not flaky, but is there a guide anywhere on how to read its output? The logs both seem to be saying "all tests passed" and the job exited with failure. |
/retest oh, nm, it passed!! |
I think it actually is flaky right now. Oops! The one time I don't check...
Is there anything left to do with this PR? Tide says "Must be in milestone v1.18", but I'm not sure if that's valid anymore now that v1.18.0 is out. |
/lgtm |
/retest |
/retest Review the full test history for this PR. Silence the bot with an /lgtm cancel or /hold comment for consistent failures.
@nicks not trying to necro the PR but considering almost all but two members of cache.Reflector are unexported: |
WaitForCacheSync waits forever and never returns if a persistent error occurs. There also appears to be no way in the current version to surface informer problems to the caller, as stated in kubernetes/client-go#155. Error handling for informers was added recently in kubernetes/kubernetes#87329, which is only available in master for the moment.
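One way to picture how an error handler unblocks a stuck wait: the handler closes the stop channel, and the wait returns false instead of hanging. The sketch below is self-contained and uses hypothetical names (`waitForSync`, `onWatchError`); it models the idea with a simple polling loop and does not call client-go.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// waitForSync is a hypothetical stand-in for a cache-sync wait: it polls
// synced() until it reports true, or returns false once stop is closed.
func waitForSync(stop <-chan struct{}, synced func() bool) bool {
	ticker := time.NewTicker(5 * time.Millisecond)
	defer ticker.Stop()
	for {
		if synced() {
			return true
		}
		select {
		case <-stop:
			return false
		case <-ticker.C:
		}
	}
}

func main() {
	stop := make(chan struct{})
	var once sync.Once

	// onWatchError plays the role of a custom error handler: the first
	// persistent error cancels the wait by closing the stop channel.
	onWatchError := func(err error) {
		once.Do(func() { close(stop) })
	}

	// Simulate a watch that fails shortly after start-up.
	go func() {
		time.Sleep(20 * time.Millisecond)
		onWatchError(errors.New("unauthorized"))
	}()

	// The cache never syncs, so only the handler can end the wait.
	ok := waitForSync(stop, func() bool { return false })
	fmt.Println("synced:", ok)
}
```

The `sync.Once` guard matters because a handler may be invoked repeatedly, and closing an already-closed channel panics.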
When creating an informer, this adds a way to add custom error handling
or backoff logic, so that Kubernetes tooling can properly surface
the errors to the terminal.
Fixes kubernetes/client-go#155
/kind feature