Bug 1701041: Adding timeout on dependent watch establishment #1638
@shawn-hurley: This pull request references a valid Bugzilla bug. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
@djzager can you take a look as well?
djzager
left a comment
LGTM I wish I could wrap my head around why the watch would hang.
* no longer hang ansible execution because we are unable to create a watch
* exposing an error message with hint on potential cause of error
|
@shawn-hurley: The following tests failed, say
@shawn-hurley can you rebase this or do you need someone else to pick it up?
Hi @shawn-hurley,
Could you please rebase it on master?
Also, could you please add a CHANGELOG entry so that we are able to move forward with it?
The CI shows the following errors, which appear to be related to this change:
pkg/ansible/proxy/cache_response.go:231:20: undefined: cacheEscacheEstablishmentTimeout
pkg/ansible/proxy/cache_response.go:255:20: undefined: cacheEstacacheEstablishmentTimeout
)

// This is the default timeout to wait for the cache to respond
// TODO: Eventually this should be configurable
- // TODO: Eventually this should be configurable
+ // todo(shawn-hurley): Eventually this should be configurable
errChan := make(chan error, 1)
go func() {
	err := c.informerCache.Get(context.Background(), obj, un)
	errChan <- err
}()

select {
case watchErr := <-errChan:
	if watchErr != nil {
		// break here in case resource doesn't exist in cache but exists on APIserver
		// This is very unlikely but provides user with expected 404
		log.Info(fmt.Sprintf("cache miss: %v err-%v", k, watchErr))
		return nil, watchErr
	}
case <-time.After(cacheEstacacheEstablishmentTimeout):
	return nil, fmt.Errorf("timeout establishing watch, commonly permissions of the controller are not sufficent")
}

Rather than using a channel, a goroutine, and a timer, a more idiomatic approach would be to use context.WithTimeout(context.Background(), cacheEstablishmentTimeout)
And then check the error from the Get() call to see if it is a context.DeadlineExceeded error.
Ditto for the informerCache.List() call above.
func addWatch(c controller.Controller, s source.Source, eh handler.EventHandler, predicates ...predicate.Predicate) error {
	errChan := make(chan error, 1)
	go func() {
		err := c.Watch(s, eh, predicates...)
We should work upstream to get a controller.WatchWithContext() function so that we don't need to resort to this.
The problem here is that it is not possible to cancel the c.Watch() call, so if we reach the deadline, the c.Watch call will just stay running in the background forever. This is effectively a resource leak.
Closing this one in favour of #2264
**Description**
Add a timeout to the watch feature of the Ansible-based operator proxy, so that the reconcile no longer appears stuck and hangs when the operator lacks the permissions to List and Watch the resources.

**Motivation for the change:**
- #1638
- https://bugzilla.redhat.com/show_bug.cgi?id=1701041

**Note**
Also solved by kubernetes-sigs/controller-runtime#663.