Switch WaitForCertificate to informers to avoid broken watches #73030
@tnozicka: GitHub didn't allow me to request PR reviews from the following users: tedyu. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.
Updated from 1c8a8dd to 7b7406c.
@@ -417,30 +416,6 @@ func TestRotateCertCreateCSRError(t *testing.T) {
```go
	}
}

func TestRotateCertWaitingForResultError(t *testing.T) {
```
Is there a different watch error we could (or should) trigger to exercise this case, or is timeout the only error that can occur now?
I think informers just keep running in parallel no matter what happens; they re-establish their watches on their own.
Wouldn't setting a short timeout let us exercise this case?
I'll have to fix the simple fake added two months ago here https://github.com/kubernetes/kubernetes/blame/fde87329cbfbd08c6cdf3b6b8dd354ee8e10a858/cmd/kubelet/app/server_bootstrap_test.go#L263, since its strict path matching won't work otherwise.
Requires #73080 first.
Updated from 7b7406c to 557b82c.
Updated from 557b82c to 693eb08.
Rebased to pick up #73080.
/retest
LGTM, modulo some minor comments.
@@ -254,7 +255,11 @@ func (s *csrSimulator) ServeHTTP(w http.ResponseWriter, req *http.Request) {
```go
	defer s.lock.Unlock()
	t := s.t

	t.Logf("Request %s %s %s", req.Method, req.URL, req.UserAgent())
	// filter out timeouts as csrSimulator doesn't support them
	req.URL.RawQuery = regexp.MustCompile("&timeout=[0-9smh]*").ReplaceAllLiteralString(req.URL.RawQuery, "")
```
The `*` at the end means we might also filter out some other parameters.
Maybe that's fine, but if so, the comment needs updating.
The `*` applies only to the character set it follows, so it shouldn't filter out other params: https://play.golang.org/p/eOMx58K6U2x
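A standalone check of this point with Go's `regexp` package, run against a few made-up query strings (the sample inputs are assumptions, not requests captured from the test). It also shows the pattern's limits: it needs a leading `&`, so a `timeout` that comes first in the query is left in place, and `timeoutSeconds` does not match because `=` must directly follow `timeout`.

```go
package main

import (
	"fmt"
	"regexp"
)

// timeoutRE is the pattern used in csrSimulator.ServeHTTP.
// The trailing * quantifies only the character class [0-9smh].
var timeoutRE = regexp.MustCompile("&timeout=[0-9smh]*")

// stripTimeout removes "&timeout=<duration>" from a raw query string.
func stripTimeout(rawQuery string) string {
	return timeoutRE.ReplaceAllLiteralString(rawQuery, "")
}

func main() {
	// Parameters after the timeout survive: the match stops at the next '&'.
	fmt.Println(stripTimeout("watch=true&timeout=5m30s&resourceVersion=10")) // watch=true&resourceVersion=10
	// A leading timeout has no '&' before it, so it is not removed.
	fmt.Println(stripTimeout("timeout=5m&watch=true")) // timeout=5m&watch=true
	// timeoutSeconds does not match because "=" must follow "timeout".
	fmt.Println(stripTimeout("watch=true&timeoutSeconds=300")) // watch=true&timeoutSeconds=300
}
```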
Suggestion:
```go
q := req.URL.Query()
q.Del("timeout")
q.Del("timeoutSeconds")
req.URL.RawQuery = q.Encode()
```
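The same suggestion as a runnable sketch, wrapped in a hypothetical helper `dropTimeoutParams` (the name is illustrative, not from the PR). It uses `url.ParseQuery` so a malformed query surfaces as an error, whereas `req.URL.Query()` silently drops pairs it cannot parse. One side effect worth knowing: `url.Values.Encode` sorts keys, so the rewritten query can come out in a different order than the request sent it, which matters if the fake compares raw query strings.

```go
package main

import (
	"fmt"
	"net/url"
)

// dropTimeoutParams returns rawQuery without the timeout and timeoutSeconds
// parameters, wherever they appear. Hypothetical helper for illustration.
func dropTimeoutParams(rawQuery string) (string, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", err
	}
	q.Del("timeout")
	q.Del("timeoutSeconds")
	// Encode re-serializes the remaining keys in sorted order.
	return q.Encode(), nil
}

func main() {
	out, err := dropTimeoutParams("watch=true&timeout=5m0s&timeoutSeconds=300&resourceVersion=10")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // resourceVersion=10&watch=true
}
```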
Updated from 693eb08 to 6924610.
```go
}); err != nil {
	// Wait for the certificate to be signed. This interface and internal timeout
	// are a remnant of the old design using a raw watch wrapped with backoff.
	timeout := 15 * time.Minute
```
I hope that seems like a reasonable timeout; before, it was 1m plus the backoff.
This whole func is wrapped in a backoff here, and in PollInfinite elsewhere, which accounts for failures. It would be nice to plumb the timeout through, but that's out of scope for this PR.
Previously, the backoff was a private package var we manipulated in tests... can we do the same with the timeout to restore TestRotateCertWaitingForResultError?
(Previously, the backoff took 4 * 1m + ~8 minutes + watch durations, so 15 minutes approximates the current delay and seems OK.)
/retest
Updated from 6924610 to b51cb90.
@liggitt Thanks, comments addressed.
Updated from b51cb90 to 29ba8b2.
/retest
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: liggitt, tnozicka
/retest
@tnozicka: A test failed.
What type of PR is this?
/kind bug
What this PR does / why we need it:
ListWatchUntil can't handle closed watches and is being replaced by UntilWithSync based on informers.
Special notes for your reviewer:
Split from #50102
Replaces #73027 since this was already done in #50102 and we don't need to write it all over again.
Requires:
Release note:
/cc @wojtek-t @liggitt @tedyu