Make ExponentialFailureRateLimiter slightly slower and cap the backoff #32082

Merged
merged 1 commit into kubernetes:master on Sep 6, 2016

Conversation

@gmarek (Contributor) commented Sep 5, 2016

Fix #27503

cc @deads2k @derekwaynecarr @ncdc @wojtek-t

For the context of this change see: #27503 (comment)



@gmarek gmarek added the release-note-none Denotes a PR that doesn't merit a release note. label Sep 5, 2016
@gmarek gmarek added this to the v1.4 milestone Sep 5, 2016
@wojtek-t (Member) commented Sep 5, 2016

LGTM

@k8s-github-robot k8s-github-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Sep 5, 2016
@wojtek-t (Member) commented Sep 5, 2016

@gmarek - please fix errors (test failures and gofmt)

@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 5, 2016
@wojtek-t (Member) commented Sep 6, 2016

@k8s-bot gke test this, issue: #IGNORE (Jenkins issue)

@deads2k (Contributor) commented Sep 6, 2016

Power of two sounds good, but I still think that something that fails enough should end up with a multi-minute backoff. How about 10 minutes max, with powers of two, that would be 16-ish, right?

@wojtek-t (Member) commented Sep 6, 2016

@deads2k - did you read this: #27503 (comment)
If you think that a 10-minute max is something we really need, it seems we need a namespace deletion timeout higher than 5 minutes by default.

@deads2k (Contributor) commented Sep 6, 2016

@deads2k - did you read this: #27503 (comment)
If you think that a 10-minute max is something we really need, it seems we need a namespace deletion timeout higher than 5 minutes by default.

With powers of two, that's a lot of failures. At 100 seconds max, you'll be close to the limit before you reattempt anyway, right?

@wojtek-t (Member) commented Sep 6, 2016

@gmarek - I talked with @deads2k offline, and we basically agreed that as a first step we would like to just change powers of 10 to powers of 2 and leave the higher max.
After some discussion we both think that it should be enough - if not, we can do some other changes.
Can you please update the PR according to that?

@derekwaynecarr derekwaynecarr added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Sep 6, 2016
@wojtek-t (Member) commented Sep 6, 2016

LGTM

@wojtek-t (Member) commented Sep 6, 2016

If this won't fix the problem we should probably increase the default timeout for removing namespace (which is currently 5 minutes). But this change should hopefully be enough.

@@ -38,7 +38,7 @@ type RateLimiter interface {
// both overall and per-item rate limitting. The overall is a token bucket and the per-item is exponential
func DefaultControllerRateLimiter() RateLimiter {
	return NewMaxOfRateLimiter(
		DefaultItemBasedRateLimiter(),
		NewItemExponentialFailureRateLimiter(5*time.Millisecond, 1000*time.Second),
Contributor

1000 seconds?!

Member

This didn't change - it was 1000s also before this PR.

Member

It seems that @deads2k has a strong opinion that this value should be high.

Contributor

I'm fine

Contributor

It seems that @deads2k has a strong opinion that this value should be high.

@smarterclayton does too, since we want controllers to retry on external conditions that are eventually fixed, but not loop quickly on them. Take an image import that fails 20 times, for instance: maybe the remote side comes back up eventually, so we may as well retry infrequently, especially since we're talking about removing resync-ing.

Contributor (author)

Yeah - I wanted to make it 100, but there was strong opposition :)

@gmarek (Contributor, author) commented Sep 6, 2016

@k8s-bot test this issue: #27462, #26822

@k8s-bot commented Sep 6, 2016

GCE e2e build/test passed for commit 0b8aeaf.

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit c9fde2b into kubernetes:master Sep 6, 2016