Make ExponentialFailureRateLimiter slightly slower and cap the backof… #32082
Conversation
LGTM
@gmarek - please fix errors (test failures and gofmt)
@k8s-bot gke test this, issue: #IGNORE (Jenkins issue)
Power of two sounds good, but I still think that something that fails enough should end up with a multi-minute backoff. How about 10 minutes max, with powers of two, that would be 16-ish, right?
@deads2k - did you read this: #27503 (comment)
With powers of two, that's a lot of failures. At 100 seconds max, you'll be close to the limit before you reattempt anyway, right?
@gmarek - I talked with @deads2k offline, and we basically agreed that as a first step we would like to just change powers of 10 to powers of 2 and leave the higher max.
LGTM
If this won't fix the problem we should probably increase the default timeout for removing namespace (which is currently 5 minutes). But this change should hopefully be enough.
@@ -38,7 +38,7 @@ type RateLimiter interface {
// both overall and per-item rate limitting. The overall is a token bucket and the per-item is exponential
func DefaultControllerRateLimiter() RateLimiter {
	return NewMaxOfRateLimiter(
		DefaultItemBasedRateLimiter(),
		NewItemExponentialFailureRateLimiter(5*time.Millisecond, 1000*time.Second),
1000 seconds?!
This didn't change - it was 1000s also before this PR.
It seems that @deads2k has a strong opinion that this value should be high.
I'm fine
> It seems that @deads2k has a strong opinion that this value should be high.
@smarterclayton does too, since we want to have controllers retry on external conditions that are eventually fixed, but not loop quickly on them. An image import that fails 20 times for instance. Maybe it comes back up. May as well try infrequently, especially since we're talking about removing resync-ing.
Yeah - I wanted to make it 100, but there was strong opposition :)
GCE e2e build/test passed for commit 0b8aeaf.
Automatic merge from submit-queue
Fix #27503
cc @deads2k @derekwaynecarr @ncdc @wojtek-t
For the context of this change see: #27503 (comment)
This change is