[kube-proxy] Fix for non-blocking updates during min-sync-period #37726
Conversation
// Lastly, the sync timer will be set to ensure we catch the last update.
if proxier.throttle.TryAccept() == false {
duration := time.Since(proxier.lastSync)
if duration < proxier.minSyncPeriod {
say you get one notification, and it starts syncing proxy rules taking 10s
in that 10s you get 5 other notifications
they all get saved somewhere, but try accept fails all 5 times
now we need to wait for IPTableSyncPeriod before the 5 notifications take effect?
can we just drop the no-op updates before applying any rate limits?
You only have to wait min-sync-period, not the full sync period.
can we just drop the no-op updates before applying any rate limits?
Potentially yes, there is a separate patch for that #36006
@timothysc The #36006 patch checks for no-op after any rate limits would be applied though, so that won't help this issue.
doesn't really matter when you think about it.
@knobunc iptables is a major concern, and is our current lowest ceiling for scale, because it consumes such a large amount of CPU time on every node as the rulesets grow. At larger scales it can easily consume an entire core.
@jeremyeder thanks. @dcbw has looked at the iptables performance problems.
The RHBZ that corresponds to this PR is: https://bugzilla.redhat.com/show_bug.cgi?id=1387149
@saad-ali I'd propose this as a cherry-pick for 1.5; it's a performance issue for large scale clusters.
It is pretty late. We are less than 1 week from launch. Is this critical? If it does get in, what is the risk to the 1.5 release?
If we get a repro for the soft lockup, I'd like to try it with this rate limit. It's the only known issue that results in more than the required CPU for frequent (once a second) syncs.
@saad-ali Given the change is opt-in behavior now, I see little risk. The benefit is we will see reduced load on the nodes, with modest effects to SLOs.
Spoke with @bprashanth offline. @timothysc Is everything in this PR 100% flag gated, and is the default disabled? If so, it is fine for 1.5.
@k8s-bot gce etcd3 e2e test this
@saad-ali tests are all green, but someone needs to review.
I'm fine with this if you can do the due diligence to confirm what @saad-ali asked (if the command line flag is unset or set to 0, this has no net change other than the replacement of a ticker with a timer+reset).
While this is desirable overall, I'm more comfortable switching the default rate limit flag in 1.5.X, with more testing.
@@ -788,11 +792,25 @@ func (proxier *Proxier) execConntrackTool(parameters ...string) error {
// assumes proxier.mu is held
func (proxier *Proxier) syncProxyRules() {
if proxier.throttle != nil {
can you confirm that proxier.throttle is nil if no ratelimits are set via command line?
Yes, the default atm is to do nothing; proxier.throttle is nil without the input option.
@@ -169,6 +169,8 @@ type Proxier struct {
haveReceivedServiceUpdate bool // true once we've seen an OnServiceUpdate event
haveReceivedEndpointsUpdate bool // true once we've seen an OnEndpointsUpdate event
throttle flowcontrol.RateLimiter
timer *time.Timer
lastSync time.Time
I feel like the way you've structured this makes it hard to reason about and unit test. We need something that can:
- use a fake clock
- takes time.Duration uniformly
- no-ops trivially if no rate limits are specified
i.e.:
type syncThrottle struct {
    timer    *time.Timer
    lastSync time.Time
    throttle flowcontrol.RateLimiter
    clock    util.Clock
}
// canSync returns true if we are below the allowed rate limits.
// Every time it returns false it also tries to reduce the frequency
// of periodic sync events returned by nextSync.
func (s *syncThrottle) canSync() bool {
    if s.throttle == nil {
        return true
    }
    // rest of the function
}
// setLastSync records the last sync timestamp
func (s *syncThrottle) setLastSync(t time.Time) {}
// setNextSync resets the timer to when we expect the next sync notification
func (s *syncThrottle) setNextSync(d time.Duration) {}
// nextSync blocks until it's time for the next periodic sync
func (s *syncThrottle) nextSync() {}
func newSyncThrottle() *syncThrottle {}
...
func (proxier *Proxier) Sync() {
    proxier.syncProxyRules()
    proxier.throttle.setNextSync(proxier.syncPeriod)
}
...
func (proxier *Proxier) syncProxyRules() {
    if !proxier.throttle.canSync() {
        return
    }
    defer func() { proxier.throttle.setLastSync(time.Now()) }()
    // sync
}
but this is too close to the release to ask for a restructure.
We could leave this out of 1.5 entirely, or we could take a minimal patch for a 1.5 cherrypick followed by this sort of cleanup AND TESTING. @timothysc what do you think?
If we would prefer, I could clean it up and aim for 1.5.1 vs. holding it up? I don't disagree with @bprashanth 's suggestion. This isn't really a feature addition at this point.
k, so I looked through this several times, and I see what you're saying, but I would move it out as a cleanup item.
imho the proxy will likely be due for some overhaul in the mid-to-near term due to the endpoints spamming. Even with the filter changes this is still a 1:N broadcast, and as we add a number of other HA components it will become C*N. So a refactor has to come.
I think avoiding endpoint spam should be pretty easy without an extensive refactor. The recommended code structure is 100% limited to this PR, not the surrounding code, so we don't make things any harder to debug.
my concern is that testing without a fake clock is ~impossible. Testing this could be possible if we captured all of the timer logic into a simple struct-and-function-pointer sort of design. If we merge as-is, I'd like a commitment to fix it please :)
This code is not well tested enough, let's not make it worse when the fix is O(easy)
@@ -408,14 +410,16 @@ func (proxier *Proxier) Sync() {
proxier.mu.Lock()
defer proxier.mu.Unlock()
proxier.syncProxyRules()
proxier.timer.Stop()
Is SyncLoop the only caller of this function? If not, are we guaranteed that the timer will be initialized before all calls?
Only the SyncLoop
comment on this? Or maybe rename to sync
or syncOnce
for {
<-t.C
<-proxier.timer.C
if someone refactors away the Reset() embedded somewhere in the call path, this timer will hang. Coupling like that should always be encapsulated in a struct so some eager newbie doesn't shoot themselves in the foot. Please add a comment explaining where we reset it.
Sounds reasonable, I'll add a comment.
@thockin PTAL.
@bprashanth could you walk over and poke @thockin for me ;-)
@thockin re-ping.
}

if minSyncPeriod != 0 {
syncsPerSecond := float32(time.Second) / float32(minSyncPeriod)
Is minSyncPeriod allowed to be greater than time.Second? eg will it work OK if syncsPerSecond is 0.5? It looks like a float32 all the way through to juju/ratelimit, but would be nice to confirm. Maybe newSyncThrottle could get some godoc to state any limitations on its arguments?
minSyncPeriod is a duration, and it divides time.Second, so fractional rates like 0.5 work all the way down the stack. I can add godoc stating the expected behavior above.
@smarterclayton unblocking wand request, I know @thockin is busy, but this is pretty important to us. @jeremyeder has slides and graphs showing how bad it is.
// it's not exported.
type syncThrottle struct {
rl flowcontrol.RateLimiter
timer *time.Timer
comments on these would be nice
func (s *syncThrottle) allowSync() bool {
if s.rl != nil {
if s.rl.TryAccept() == false {
duration := s.timeEllapsedSinceLastSync()
"Elapsed" only has one "l"
Fixed.
if s.rl != nil {
if s.rl.TryAccept() == false {
duration := s.timeEllapsedSinceLastSync()
if duration < s.minSyncPeriod {
if TryAccept() == false, and duration >= s.minSyncPeriod, you'll hit the true case, but it's not at all clear to me that this is correct. I am not intimately familiar with the RateLimiter.
Can you convince me, by way of comments, that this is all correct?
Redux'd to remove my latent paranoia.
// utility wrapper to handle time based sync updates
// it's specifically meant for the iptables proxy so
// it's not exported.
type syncThrottle struct {
does this thing need a mutex? It's not obviously safe in the face of concurrent access, and it's not documented as being safe for some non-obvious reason
Updated.
"k8s.io/kubernetes/pkg/util/flowcontrol"
)

// utility wrapper to handle time based sync updates
This comment isn't super helpful. Since this problem warrants a whole new type, I'd really like a thorough explanation here.
rl flowcontrol.RateLimiter
timer *time.Timer
lastSync time.Time
minSyncPeriod time.Duration
"minSyncPeriod" is a weird name, and it makes reading code hard. It's not a minimum sync period, it's more like a hysteresis or rest, I think?
Same name as before, just shuffled; technically correct.
Jenkins GKE smoke e2e failed for commit cc45ab9. cc @timothysc
I am looking right now, I am still not 100% sure I follow the intent. I have patched it into a client, and I have an idea to simplify. Back in a little while with result.
I stared and I stared and I stared, and I finally figured out what felt wrong. You can not Reset() a Timer while a read is pending. At the very least you have to Stop() it first, and the docs are unclear on whether that is actually enough. This makes it a little complicated. While staring at it I kept jumping back and forth, and I felt like we could encapsulate it better: https://gist.github.com/thockin/1c05beb4075025798e9d242e082e4852. I did a lot of manual testing, but did not write a test yet.
That statement isn't true: https://golang.org/pkg/time/#Timer.Reset. Can you find a test condition in the code that fails a test case? b/c I wrote the wrapper explicitly on your request for testability, so it seems perfectly reasonable to write a test case where the wrapper fails. At this point I don't even care what it looks like anymore, so long as the behavior is correct and there is sound evidence and a test behind the choice. FWIW, it is very easy to beat the heck out of this by changing the timeout intervals behind the endpoint annotation updates, setting minsync to say 2s, and just letting it run with log level=4.
I patched your change into a model so I could exercise it harder, and I managed to make it stop syncing at all. It's possible I screwed up the model. I'll stare at it again. Maybe we can get this in, and then if I want to encapsulate better/differently I can follow up with that on my own time.
From the go docs: "This should not be done concurrent to other receives from the Timer's channel." It is unclear whether "this" is Reset() or the draining procedure.
I stop and reset the timer here: https://gist.github.com/thockin/1c05beb4075025798e9d242e082e4852#file-periodic_runner_2_timers-go-L70 After it's been pulled off the channel.
Other than this bit, I am OK with it.
if s.rl.TryAccept() == false {
duration := s.timeElapsedSinceLastSync()
glog.V(4).Infof("Attempting to synch too often. Duration: %v, min period: %v", duration, s.minSyncPeriod)
s.timer.Reset(s.minSyncPeriod - duration)
This is the Reset that concerns me. I think you at least need to Stop() first, and even that is not clear since the docs say not to do Reset() concurrent to receives.
Imagine goroutine 1 is in SyncLoop() receiving on s.timer.C, and goroutine 2 calls Sync(), which calls syncProxyRules(), which calls allowSync(), which calls Reset().
From looking at the Timer core code, it looks like you can call Stop() here and then Reset(), and it's possible the timer will fire in between, but that's OK. I'm more concerned with the potential for corruption or other badness if you reset while the timer is not stopped. The docs just aren't clear on this.
I emailed with go folks, and they said that if you call Stop() before Reset() it is "safe" with the exception that you might still take a delivery in the interim. That's fine for us, I think.
if proxier.throttle != nil {
proxier.throttle.Accept()
if !proxier.throttle.allowSync() {
return
Hold up. syncProxyRules() is called by OnEndpointsUpdate and OnServiceUpdate, both of which call something AFTER the sync. They expect that the sync actually sync'ed, and if it didn't (was deferred) this is going to do the wrong thing.
I think we need a larger change than this, I am sad to say. We need to defer the events, not just the sync..
I am sorry this review is so painful - this code is very async and overly delicate.
@timothysc PR needs rebase
closing in favor of #40868, I'll let @danwinship drive this through.
I am on a short vacation, but I have a branch here with your changes, and I am refactoring all the On*Update functions to fit (adding tests to make the work sane).
What this PR does / why we need it:
Addresses the issue that was seen in #36281
Fixes: #33693
Related: #36332
Related: #35334
/cc @thockin @bprashanth @jeremyeder
Special notes for your reviewer:
Follow-on PR from #35334
Release note: