
[kube-proxy] Fix for non-blocking updates during min-sync-period #37726

Closed
timothysc wants to merge 1 commit

Conversation

@timothysc (Member) commented Nov 30, 2016

What this PR does / why we need it:
Addresses the issue that was seen in #36281

Fixes: #33693
Related: #36332
Related: #35334

/cc @thockin @bprashanth @jeremyeder

Special notes for your reviewer:
Follow on PR from #35334

Release note:

NONE

@timothysc timothysc added area/kube-proxy release-note-none Denotes a PR that doesn't merit a release note. labels Nov 30, 2016
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 30, 2016

@k8s-github-robot k8s-github-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Nov 30, 2016
// Lastly, the sync timer will be set to ensure we catch the last update.
if proxier.throttle.TryAccept() == false {
duration := time.Since(proxier.lastSync)
if duration < proxier.minSyncPeriod {
Contributor

say you get one notification, and it starts syncing proxy rules, taking 10s
in that 10s you get 5 other notifications
they all get saved somewhere, but TryAccept fails all 5 times

now we need to wait for IPTableSyncPeriod before the 5 notifications take effect?

can we just drop the no-op updates before applying any rate limits?

Member Author

You have to wait the min-sync-period, not the full sync period.

> can we just drop the no-op updates before applying any rate limits?

Potentially yes; there is a separate patch for that: #36006

Member

@timothysc The #36006 patch checks for no-op after any rate limits would be applied though, so that won't help this issue.

Member Author

doesn't really matter when you think about it.

@jeremyeder

@knobunc iptables is a major concern, and is our current lowest ceiling for scale, because it consumes such a large amount of CPU time on every node as the rulesets grow. At larger scales it can easily consume an entire core.

@knobunc (Contributor) commented Dec 1, 2016

@jeremyeder thanks. @dcbw has looked at the iptables performance problems.

@jeremyeder

The RHBZ that corresponds to this PR is: https://bugzilla.redhat.com/show_bug.cgi?id=1387149

@timothysc (Member Author) commented Dec 1, 2016

@saad-ali I'd propose this as a cherry-pick for 1.5, it's a performance issue for large scale clusters.

@timothysc timothysc added this to the v1.5 milestone Dec 1, 2016
@timothysc timothysc force-pushed the proxy_min_sync_fix branch 2 times, most recently from 6a01357 to 9ae62e3 on December 1, 2016 22:10
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 1, 2016
@saad-ali (Member) commented Dec 2, 2016

> @saad-ali I'd propose this as a cherry-pick for 1.5, it's a performance issue for large scale clusters.

It is pretty late. We are less than 1 week from launch. Is this critical? If it does get in, what is the risk to the 1.5 release?

CC @bprashanth @thockin

@bprashanth (Contributor)

If we get a repro for the soft lockup, I'd like to try it with this rate limit. It's the only known issue that results in more than the required CPU for frequent (once-a-second) syncs.

@timothysc (Member Author)

@saad-ali Given the change is an opt-in behavior now, I see little risk.

The benefit is that we will see reduced load on the nodes, with modest effects on SLOs.

@saad-ali (Member) commented Dec 2, 2016

Spoke with @bprashanth offline. @timothysc Is everything in this PR 100% flag gated, and is the default disabled? If so, it is fine for 1.5.

@dims (Member) commented Dec 2, 2016

@k8s-bot gce etcd3 e2e test this

@timothysc (Member Author)

@k8s-bot gce etcd3 e2e test this

@timothysc timothysc added cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. and removed cherrypick-candidate labels Dec 2, 2016
@timothysc (Member Author)

@saad-ali tests are all green, but someone needs to review.

@bprashanth (Contributor) left a comment

I'm fine with this if you can do the due diligence to confirm what @saad-ali asked (if the command line flag is unset or set to 0, this has no net change other than the replacement of a ticker with a timer+reset).

While this is desirable overall, I'm more comfortable switching the default rate limit flag in 1.5.X, with more testing.
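
(For context, a minimal sketch of the ticker-to-timer replacement being described; the variable names here are illustrative, not the PR's exact code:)

// Before: a fixed-period ticker drives the sync loop.
t := time.NewTicker(proxier.syncPeriod)
defer t.Stop()
for {
    <-t.C
    proxier.Sync()
}

// After: a timer drives the loop, so other code paths can push the next
// sync out (or pull it in) by stopping and resetting the timer.
proxier.timer = time.NewTimer(proxier.syncPeriod)
for {
    <-proxier.timer.C
    proxier.Sync() // Sync() stops and resets proxier.timer
}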

@@ -788,11 +792,25 @@ func (proxier *Proxier) execConntrackTool(parameters ...string) error {
// assumes proxier.mu is held
func (proxier *Proxier) syncProxyRules() {
if proxier.throttle != nil {
Contributor

can you confirm that proxier.throttle is nil if no ratelimits are set via command line?

Member Author

Yes, the default atm is to do nothing: proxier.throttle is nil without the input option.

@@ -169,6 +169,8 @@ type Proxier struct {
haveReceivedServiceUpdate bool // true once we've seen an OnServiceUpdate event
haveReceivedEndpointsUpdate bool // true once we've seen an OnEndpointsUpdate event
throttle flowcontrol.RateLimiter
timer *time.Timer
lastSync time.Time
Contributor

I feel like the way you've structured this makes it hard to reason about and unit test. We need something that can:

  1. use a fake clock
  2. take time.Duration uniformly
  3. no-op trivially if no rate limits are specified

i.e.:

type syncThrottler struct {
  throttle flowcontrol.RateLimiter
  timer    *time.Timer
  lastSync time.Time
  clock    clock.Clock
}

// canSync returns true if we are below the allowed rate limits.
// every time it returns false it also tries to reduce the frequency
// of periodic sync events returned by nextSync
func (s *syncThrottler) canSync() bool {
  if s.throttle == nil {
    return true
  }
  // rest of the function
}

// setLastSync records the last sync timestamp
func (s *syncThrottler) setLastSync(t time.Time) {}

// setNextSync resets the timer to when we expect the next sync notification
func (s *syncThrottler) setNextSync(d time.Duration) {}

// nextSync hangs till it's time for the next periodic sync
func (s *syncThrottler) nextSync() {}

func newSyncThrottler() *syncThrottler {}

...
func (proxier *Proxier) Sync() {
  proxier.syncProxyRules()
  proxier.syncThrottle.setNextSync(proxier.syncPeriod)
}

...
func (proxier *Proxier) syncProxyRules() {
  if !proxier.syncThrottle.canSync() {
    return
  }

  defer proxier.syncThrottle.setLastSync(time.Now())

  // sync
}

but this is too close to the release to ask for a restructure.

Member

We could leave this out of 1.5 entirely, or we could take a minimal patch for a 1.5 cherrypick followed by this sort of cleanup AND TESTING. @timothysc what do you think?

Member Author

If we'd prefer, I could clean it up and aim for 1.5.1 rather than holding this up. I don't disagree with @bprashanth's suggestion. This isn't really a feature addition at this point.

Member Author

k, so I looked through this several times, and I see what you're saying, but I would move it out as a cleanup item.

imho the proxy will likely be due for an overhaul in the near-to-mid term due to the endpoints spamming. Even with the filter changes this is still a 1:N broadcast, and as we add a number of other HA components that becomes C*N. So a refactor has to come.

Contributor

I think avoiding endpoint spam should be pretty easy without an extensive refactor. The recommended code structure is 100% limited to this PR, not the surrounding code, so we don't make things any harder to debug.

Member

my concern is that testing without a fake clock is ~impossible. Testing this could be possible if we captured all of the timer logic into a simple struct-and-function-pointer sort of design. If we merge as-is, I'd like a commitment to fix it please :)

This code is not well tested enough; let's not make it worse when the fix is O(easy)
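
(For illustration, a minimal sketch of the kind of fake-clock test being asked for, assuming the throttle were refactored to take an injected clock; the newSyncThrottler signature and its clock parameter are hypothetical, not this PR's code:)

import (
    "testing"
    "time"

    "k8s.io/kubernetes/pkg/util/clock"
)

func TestAllowSyncThrottles(t *testing.T) {
    fakeClock := clock.NewFakeClock(time.Now())
    // hypothetical constructor: minSyncPeriod of 2s, injected fake clock
    st := newSyncThrottler(2*time.Second, fakeClock)

    if !st.allowSync() {
        t.Fatal("first sync should be allowed")
    }
    if st.allowSync() {
        t.Fatal("immediate second sync should be throttled")
    }

    // advance fake time past minSyncPeriod without sleeping
    fakeClock.Step(3 * time.Second)
    if !st.allowSync() {
        t.Fatal("sync after minSyncPeriod should be allowed")
    }
}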

@@ -408,14 +410,16 @@ func (proxier *Proxier) Sync() {
proxier.mu.Lock()
defer proxier.mu.Unlock()
proxier.syncProxyRules()
proxier.timer.Stop()
Contributor

Is the SyncLoop the only caller of this function? If not, are we guaranteed that the timer will be initialized before all calls?

Member Author

Only the SyncLoop.

Member

comment on this? Or maybe rename to sync or syncOnce

for {
<-t.C
<-proxier.timer.C
Contributor

if someone refactors the Reset()s embedded somewhere in the call path, this timer will hang. Coupling like that should always be encapsulated in a struct, so some eager newbie doesn't shoot themselves in the foot. Please add a comment explaining where we reset it.

Member Author

Sounds reasonable, I'll add a comment.

@timothysc (Member Author)

@thockin PTAL.

@timothysc (Member Author)

@bprashanth could you walk over and poke @thockin for me ;-)

@timothysc timothysc added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jan 16, 2017
@timothysc (Member Author)

@thockin re-ping.

}

if minSyncPeriod != 0 {
syncsPerSecond := float32(time.Second) / float32(minSyncPeriod)
Member

Is minSyncPeriod allowed to be greater than time.Second? E.g. will it work OK if syncsPerSecond is 0.5? It looks like a float32 all the way through to juju/ratelimit, but it would be nice to confirm. Maybe newSyncThrottle could get some godoc stating any limitations on its arguments?

Member Author

minSyncPeriod is a duration, and time.Second is divided by it, so the rate can go to 0.5 and below. I can add suppositions on the behavior above.
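
(Worked example of the arithmetic above: with minSyncPeriod = 2 * time.Second, syncsPerSecond = float32(time.Second) / float32(2 * time.Second) = 0.5, i.e. the limiter refills one token every two seconds. Fractional rates stay representable since the value is a float all the way down.)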

@timothysc (Member Author)

@smarterclayton unblocking wand request, I know @thockin is busy, but this is pretty important to us. @jeremyeder has slides and graphs showing how bad it is.

@k8s-github-robot

[APPROVALNOTIFIER] Needs approval from an approver in each of these OWNERS Files:

We suggest the following people:
cc @thockin
You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

// it's not exported.
type syncThrottle struct {
rl flowcontrol.RateLimiter
timer *time.Timer
Member

comments on these would be nice

func (s *syncThrottle) allowSync() bool {
if s.rl != nil {
if s.rl.TryAccept() == false {
duration := s.timeEllapsedSinceLastSync()
Member

"Elapsed" only has one "l"

Member Author

Fixed.


if s.rl != nil {
if s.rl.TryAccept() == false {
duration := s.timeEllapsedSinceLastSync()
if duration < s.minSyncPeriod {
Member

if TryAccept() == false, and duration >= s.minSyncPeriod, you'll hit the true case, but it's not at all clear to me that this is correct. I am not intimately familiar with the RateLimiter.

Can you convince me, by way of comments, that this is all correct?

Member Author

Redux'd to remove my latent paranoia.
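
(To make the thread easier to follow, a commented sketch of the semantics implied by the hunks quoted above; the comments and the final fall-through are my reading of the diff, not annotations from the PR itself:)

// allowSync reports whether a sync should proceed now.
func (s *syncThrottle) allowSync() bool {
    if s.rl == nil {
        return true // no rate limit configured: always sync
    }
    if s.rl.TryAccept() {
        return true // a token was available: sync now
    }
    // Bucket is empty. If we are still within minSyncPeriod of the last
    // sync, defer: reset the timer so the deferred update is retried.
    duration := s.timeElapsedSinceLastSync()
    if duration < s.minSyncPeriod {
        s.timer.Reset(s.minSyncPeriod - duration)
        return false
    }
    // Enough wall time has passed since the last sync; allow it even
    // though TryAccept failed (this is the case questioned above).
    return true
}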

// utility wrapper to handle time based sync updates
// it's specifically meant for the iptables proxy so
// it's not exported.
type syncThrottle struct {
Member

does this thing need a mutex? It's not obviously safe in the face of concurrent access, and it's not documented as being safe for some non-obvious reason

Member Author

Updated.

"k8s.io/kubernetes/pkg/util/flowcontrol"
)

// utility wrapper to handle time based sync updates
Member

This comment isn't super helpful. Since this problem warrants a whole new type, I'd really like a thorough explanation here.

rl flowcontrol.RateLimiter
timer *time.Timer
lastSync time.Time
minSyncPeriod time.Duration
Member

"minSyncPeriod" is a weird name, and it makes reading code hard. It's not a minimum sync period, it's more like a hysteresis or rest, I think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same name as before, just shuffled. Technically correct.

@k8s-ci-robot (Contributor)

Jenkins GKE smoke e2e failed for commit cc45ab9. Full PR test history.

cc @timothysc, your PR dashboard

The magic incantation to run this job again is @k8s-bot cvm gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.


@thockin (Member) commented Jan 25, 2017

I am looking right now; I am still not 100% sure I follow the intent. I have patched it into a client, and I have an idea to simplify. Back in a little while with results.

@thockin (Member) commented Jan 25, 2017

I stared and I stared and I stared and I finally figured out what felt wrong.

You can not Reset() a Timer while a read is pending. At the very least you have to Stop() it, and the docs are unclear if that is actually enough. This makes it a little complicated. While staring at it I kept jumping back and forth, and I felt like we could encapsulate it better.

https://gist.github.com/thockin/1c05beb4075025798e9d242e082e4852

I did a lot of manual testing, but did not write a test yet.

@timothysc (Member Author)

> You can not Reset() a Timer while a read is pending. At the very least you have to Stop() it, and the docs are unclear if that is actually enough

That statement isn't true; see https://golang.org/pkg/time/#Timer.Reset.
The docs state that you can't "reuse" a Timer without calling Stop and draining the channel. That is, to use the Timer again after it has been signaled, it must be drained and Stop called. I might be missing something, but given the code paths I don't see how this isn't true.
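
(For reference, the reuse idiom documented for time.Timer in the standard library is roughly the following; this is the stdlib's documented pattern, not code from this PR:)

// Stop the timer; if Stop reports that the timer already fired, drain
// the stale value so the next receive sees only the post-Reset tick.
// Per the docs, this must not be done concurrently with other receives
// from t.C.
if !t.Stop() {
    <-t.C
}
t.Reset(d)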

Can you find a test condition in the code that fails a test case? b/c I wrote the wrapper explicitly on your request for testability, so it seems perfectly reasonable to write a test case where the wrapper fails.

At this point I don't even care what it looks like anymore so long as the behavior is correct, and there is sound evidence and test behind the choice.

FWIW - It is very easy to beat the heck out of this by changing the timeout intervals behind the endpoint annotation updates and setting minsync to say 2s and just letting it run with log level=4.

@thockin (Member) commented Jan 25, 2017 via email

@thockin (Member) commented Jan 25, 2017

From the go docs: "This should not be done concurrent to other receives from the Timer's channel." It is unclear whether "this" is Reset() or the draining procedure.

@timothysc (Member Author)

I stop and reset the timer here: https://gist.github.com/thockin/1c05beb4075025798e9d242e082e4852#file-periodic_runner_2_timers-go-L70

After it's been pulled off the channel.

@thockin (Member) commented Jan 25, 2017

Other than this bit, I am OK with it.

if s.rl.TryAccept() == false {
duration := s.timeElapsedSinceLastSync()
glog.V(4).Infof("Attempting to synch too often. Duration: %v, min period: %v", duration, s.minSyncPeriod)
s.timer.Reset(s.minSyncPeriod - duration)
Member

This is the Reset that concerns me - I think you at least need to Stop() first, and even that is not clear, since the docs say not to do Reset() concurrent to receives.

Imagine goroutine 1 is in SyncLoop() receiving on s.timer.C, and goroutine 2 calls Sync(), which calls syncProxyRules(), which calls allowSync(), which calls Reset(). From looking at the Timer core code, it looks like you can call Stop() here and then Reset(), and it's possible the timer will fire in between, but that's OK. I'm more concerned with the potential for corruption or other badness if you Reset while the timer is not stopped. The docs just aren't clear on this.

Member

I emailed with go folks, and they said that if you call Stop() before Reset() it is "safe" with the exception that you might still take a delivery in the interim. That's fine for us, I think.
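
(A sketch of the agreed-upon pattern at this call site; note that it deliberately does not drain the channel, since SyncLoop may be receiving on it concurrently and a stray delivery was deemed acceptable:)

// Stop first, then Reset. Stop's return value is intentionally ignored:
// if the timer already fired, SyncLoop will consume the stray tick,
// which is harmless here, and draining timer.C ourselves could race
// with SyncLoop's receive.
s.timer.Stop()
s.timer.Reset(s.minSyncPeriod - duration)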

if proxier.throttle != nil {
proxier.throttle.Accept()
if !proxier.throttle.allowSync() {
return
Copy link
Member

@thockin thockin Jan 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hold up. syncProxyRules() is called by OnEndpointsUpdate and OnServiceUpdate, both of which call something AFTER the sync. They expect that the sync actually synced, and if it didn't (it was deferred), this is going to do the wrong thing.

I think we need a larger change than this, I am sad to say. We need to defer the events, not just the sync.

I am sorry this review is so painful - this code is very async and overly delicate.
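
(To illustrate the direction being suggested, a purely hypothetical sketch of deferring the work rather than silently dropping it; the pendingUpdate field and its replay are invented for illustration and are not code from this PR:)

// Hypothetical: when throttled, remember that state changed so the
// periodic loop replays the sync later, instead of losing the update.
func (proxier *Proxier) syncProxyRules() {
    if proxier.throttle != nil && !proxier.throttle.allowSync() {
        proxier.pendingUpdate = true // hypothetical field, guarded by proxier.mu
        return
    }
    proxier.pendingUpdate = false
    // ... actually sync iptables rules ...
}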

@k8s-github-robot

@timothysc PR needs rebase

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 2, 2017
@timothysc (Member Author)

closing in favor of #40868; I'll let @danwinship drive this through.

@timothysc timothysc closed this Feb 2, 2017
@thockin (Member) commented Feb 2, 2017 via email

thockin added a commit to thockin/kubernetes that referenced this pull request Feb 6, 2017
Labels
area/kube-proxy
cherry-pick-approved - Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
needs-rebase - Indicates a PR cannot be merged because it has merge conflicts with HEAD.
priority/important-soon - Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
release-note-none - Denotes a PR that doesn't merit a release note.
sig/scalability - Categorizes an issue or PR as relevant to SIG Scalability.
size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.