runtime: smooth cons/mark with a moving average and use actual trigger

This change modifies the pacer in two ways: * It replaces the PI controller used as a smoothing function with a simple two-cycle moving average. * It makes the pacer use the actual GC trigger point for cons/mark (and other) calculations instead of the precomputed one. The second part of this change was attempted in the Go 1.19 release cycle, but could not be done because although it resulted in a better-behaved pacer, it exploited the PI controller's sensitivity to history in a way that was ultimately unfavorable for most applications. This sensitivity is complex to reason about, and forces us into choices that don't really make sense (like using the precomputed trigger over the actual one -- that's really a bug fix). The net effect of this change is intended to be: * The pacer computes a more stable estimate of the actual cons/mark ratio, making it easier to understand. * The pacer is much more accurate at hitting the heap goal, so the GC respects GOGC much more often and reliably. * The pacer forces a more stable rate of GC assists in the steady-state overall. See https://perf.golang.org/search?q=upload:20221106.10 for benchmark results and #53892 for complete context. The benchmarks that regress in memory use appear to be not worth worrying about. In all cases, it appears that the GC was triggering super early, resulting in a lack of adherence to rule of GOGC. The fogleman benchmarks just have a single final GC that triggers early and also happens to be the peak heap size. The tile38 WithinCircle benchmark only has 4 GC cycles in the benchmarked region, so it's very sensitive to pacing. In this case, the old smoothing function is getting lucky by starting way too early, avoiding assists. Meanwhile the 2-cycle moving average is more accurate on the heap goal, but the 1st and 2nd cycle after the initialization phase are operating on a cons/mark from the initialization phase which is much less active, resulting in a cons/mark that's too low, causing the GC to start too late, increasing assists, and therefore latency (but only transiently, at this phase change). I really do think the PI controller is just getting lucky here with a particular history, because I've definitely observed it oscillating wildly in response to a phase change. This change also moves the PI controller out of mgcpacer.go, because it's no longer used there. It now lives in mgcscavenge.go, where it's actually used. Fixes #53892. Change-Id: I3f875a2e40f31f381920f91d8b090556b17a2b16 Reviewed-on: https://go-review.googlesource.com/c/go/+/417558 Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
golang · Jul 20, 2023 · 1073818 · 1073818
1 parent 7213c36
commit 1073818
Show file tree

Hide file tree

Showing 4 changed files with 125 additions and 188 deletions.
diff --git a/src/runtime/mgcpacer.go b/src/runtime/mgcpacer.go
@@ -136,12 +136,10 @@ type gcControllerState struct {
 	// Updated at the end of each GC cycle, in endCycle.
 	consMark float64
 
-	// consMarkController holds the state for the mark-cons ratio
-	// estimation over time.
-	//
-	// Its purpose is to smooth out noisiness in the computation of
-	// consMark; see consMark for details.
-	consMarkController piController
+	// lastConsMark is the computed cons/mark value for the previous GC
+	// cycle. Note that this is *not* the last value of cons/mark, but the
+	// actual computed value. See endCycle for details.
+	lastConsMark float64
 
 	// gcPercentHeapGoal is the goal heapLive for when next GC ends derived
 	// from gcPercent.
@@ -372,28 +370,6 @@ type gcControllerState struct {
 func (c *gcControllerState) init(gcPercent int32, memoryLimit int64) {
 	c.heapMinimum = defaultHeapMinimum
 	c.triggered = ^uint64(0)
-
-	c.consMarkController = piController{
-		// Tuned first via the Ziegler-Nichols process in simulation,
-		// then the integral time was manually tuned against real-world
-		// applications to deal with noisiness in the measured cons/mark
-		// ratio.
-		kp: 0.9,
-		ti: 4.0,
-
-		// Set a high reset time in GC cycles.
-		// This is inversely proportional to the rate at which we
-		// accumulate error from clipping. By making this very high
-		// we make the accumulation slow. In general, clipping is
-		// OK in our situation, hence the choice.
-		//
-		// Tune this if we get unintended effects from clipping for
-		// a long time.
-		tt:  1000,
-		min: -1000,
-		max: 1000,
-	}
-
 	c.setGCPercent(gcPercent)
 	c.setMemoryLimit(memoryLimit)
 	c.commit(true) // No sweep phase in the first GC cycle.
@@ -416,26 +392,7 @@ func (c *gcControllerState) startCycle(markStartTime int64, procs int, trigger g
 	c.fractionalMarkTime.Store(0)
 	c.idleMarkTime.Store(0)
 	c.markStartTime = markStartTime
-
-	// TODO(mknyszek): This is supposed to be the actual trigger point for the heap, but
-	// causes regressions in memory use. The cause is that the PI controller used to smooth
-	// the cons/mark ratio measurements tends to flail when using the less accurate precomputed
-	// trigger for the cons/mark calculation, and this results in the controller being more
-	// conservative about steady-states it tries to find in the future.
-	//
-	// This conservatism is transient, but these transient states tend to matter for short-lived
-	// programs, especially because the PI controller is overdamped, partially because it is
-	// configured with a relatively large time constant.
-	//
-	// Ultimately, I think this is just two mistakes piled on one another: the choice of a swingy
-	// smoothing function that recalls a fairly long history (due to its overdamped time constant)
-	// coupled with an inaccurate cons/mark calculation. It just so happens this works better
-	// today, and it makes it harder to change things in the future.
-	//
-	// This is described in #53738. Fix this for #53892 by changing back to the actual trigger
-	// point and simplifying the smoothing function.
-	heapTrigger, heapGoal := c.trigger()
-	c.triggered = heapTrigger
+	c.triggered = c.heapLive.Load()
 
 	// Compute the background mark utilization goal. In general,
 	// this may not come out exactly. We round the number of
@@ -498,6 +455,7 @@ func (c *gcControllerState) startCycle(markStartTime int64, procs int, trigger g
 	c.revise()
 
 	if debug.gcpacertrace > 0 {
+		heapGoal := c.heapGoal()
 		assistRatio := c.assistWorkPerByte.Load()
 		print("pacer: assist ratio=", assistRatio,
 			" (scan ", gcController.heapScan.Load()>>20, " MB in ",
@@ -700,31 +658,12 @@ func (c *gcControllerState) endCycle(now int64, procs int, userForced bool) {
 	currentConsMark := (float64(c.heapLive.Load()-c.triggered) * (utilization + idleUtilization)) /
 		(float64(scanWork) * (1 - utilization))
 
-	// Update cons/mark controller. The time period for this is 1 GC cycle.
-	//
-	// This use of a PI controller might seem strange. So, here's an explanation:
-	//
-	// currentConsMark represents the consMark we *should've* had to be perfectly
-	// on-target for this cycle. Given that we assume the next GC will be like this
-	// one in the steady-state, it stands to reason that we should just pick that
-	// as our next consMark. In practice, however, currentConsMark is too noisy:
-	// we're going to be wildly off-target in each GC cycle if we do that.
-	//
-	// What we do instead is make a long-term assumption: there is some steady-state
-	// consMark value, but it's obscured by noise. By constantly shooting for this
-	// noisy-but-perfect consMark value, the controller will bounce around a bit,
-	// but its average behavior, in aggregate, should be less noisy and closer to
-	// the true long-term consMark value, provided its tuned to be slightly overdamped.
-	var ok bool
+	// Update our cons/mark estimate. This is the raw value above, but averaged over 2 GC cycles
+	// because it tends to be jittery, even in the steady-state. The smoothing helps the GC to
+	// maintain much more stable cycle-by-cycle behavior.
 	oldConsMark := c.consMark
-	c.consMark, ok = c.consMarkController.next(c.consMark, currentConsMark, 1.0)
-	if !ok {
-		// The error spiraled out of control. This is incredibly unlikely seeing
-		// as this controller is essentially just a smoothing function, but it might
-		// mean that something went very wrong with how currentConsMark was calculated.
-		// Just reset consMark and keep going.
-		c.consMark = 0
-	}
+	c.consMark = (currentConsMark + c.lastConsMark) / 2
+	c.lastConsMark = currentConsMark
 
 	if debug.gcpacertrace > 0 {
 		printlock()
@@ -733,9 +672,6 @@ func (c *gcControllerState) endCycle(now int64, procs int, userForced bool) {
 		print(c.heapScanWork.Load(), "+", c.stackScanWork.Load(), "+", c.globalsScanWork.Load(), " B work (", c.lastHeapScan+c.lastStackScan.Load()+c.globalsScan.Load(), " B exp.) ")
 		live := c.heapLive.Load()
 		print("in ", c.triggered, " B -> ", live, " B (∆goal ", int64(live)-int64(c.lastHeapGoal), ", cons/mark ", oldConsMark, ")")
-		if !ok {
-			print("[controller reset]")
-		}
 		println()
 		printunlock()
 	}
@@ -1379,74 +1315,6 @@ func readGOMEMLIMIT() int64 {
 	return n
 }
 
-type piController struct {
-	kp float64 // Proportional constant.
-	ti float64 // Integral time constant.
-	tt float64 // Reset time.
-
-	min, max float64 // Output boundaries.
-
-	// PI controller state.
-
-	errIntegral float64 // Integral of the error from t=0 to now.
-
-	// Error flags.
-	errOverflow   bool // Set if errIntegral ever overflowed.
-	inputOverflow bool // Set if an operation with the input overflowed.
-}
-
-// next provides a new sample to the controller.
-//
-// input is the sample, setpoint is the desired point, and period is how much
-// time (in whatever unit makes the most sense) has passed since the last sample.
-//
-// Returns a new value for the variable it's controlling, and whether the operation
-// completed successfully. One reason this might fail is if error has been growing
-// in an unbounded manner, to the point of overflow.
-//
-// In the specific case of an error overflow occurs, the errOverflow field will be
-// set and the rest of the controller's internal state will be fully reset.
-func (c *piController) next(input, setpoint, period float64) (float64, bool) {
-	// Compute the raw output value.
-	prop := c.kp * (setpoint - input)
-	rawOutput := prop + c.errIntegral
-
-	// Clamp rawOutput into output.
-	output := rawOutput
-	if isInf(output) || isNaN(output) {
-		// The input had a large enough magnitude that either it was already
-		// overflowed, or some operation with it overflowed.
-		// Set a flag and reset. That's the safest thing to do.
-		c.reset()
-		c.inputOverflow = true
-		return c.min, false
-	}
-	if output < c.min {
-		output = c.min
-	} else if output > c.max {
-		output = c.max
-	}
-
-	// Update the controller's state.
-	if c.ti != 0 && c.tt != 0 {
-		c.errIntegral += (c.kp*period/c.ti)*(setpoint-input) + (period/c.tt)*(output-rawOutput)
-		if isInf(c.errIntegral) || isNaN(c.errIntegral) {
-			// So much error has accumulated that we managed to overflow.
-			// The assumptions around the controller have likely broken down.
-			// Set a flag and reset. That's the safest thing to do.
-			c.reset()
-			c.errOverflow = true
-			return c.min, false
-		}
-	}
-	return output, true
-}
-
-// reset resets the controller state, except for controller error flags.
-func (c *piController) reset() {
-	c.errIntegral = 0
-}
-
 // addIdleMarkWorker attempts to add a new idle mark worker.
 //
 // If this returns true, the caller must become an idle mark worker unless

diff --git a/src/runtime/mgcpacer_test.go b/src/runtime/mgcpacer_test.go
@@ -1019,51 +1019,6 @@ func (f float64Stream) limit(min, max float64) float64Stream {
 	}
 }
 
-func FuzzPIController(f *testing.F) {
-	isNormal := func(x float64) bool {
-		return !math.IsInf(x, 0) && !math.IsNaN(x)
-	}
-	isPositive := func(x float64) bool {
-		return isNormal(x) && x > 0
-	}
-	// Seed with constants from controllers in the runtime.
-	// It's not critical that we keep these in sync, they're just
-	// reasonable seed inputs.
-	f.Add(0.3375, 3.2e6, 1e9, 0.001, 1000.0, 0.01)
-	f.Add(0.9, 4.0, 1000.0, -1000.0, 1000.0, 0.84)
-	f.Fuzz(func(t *testing.T, kp, ti, tt, min, max, setPoint float64) {
-		// Ignore uninteresting invalid parameters. These parameters
-		// are constant, so in practice surprising values will be documented
-		// or will be other otherwise immediately visible.
-		//
-		// We just want to make sure that given a non-Inf, non-NaN input,
-		// we always get a non-Inf, non-NaN output.
-		if !isPositive(kp) || !isPositive(ti) || !isPositive(tt) {
-			return
-		}
-		if !isNormal(min) || !isNormal(max) || min > max {
-			return
-		}
-		// Use a random source, but make it deterministic.
-		rs := rand.New(rand.NewSource(800))
-		randFloat64 := func() float64 {
-			return math.Float64frombits(rs.Uint64())
-		}
-		p := NewPIController(kp, ti, tt, min, max)
-		state := float64(0)
-		for i := 0; i < 100; i++ {
-			input := randFloat64()
-			// Ignore the "ok" parameter. We're just trying to break it.
-			// state is intentionally completely uncorrelated with the input.
-			var ok bool
-			state, ok = p.Next(input, setPoint, 1.0)
-			if !isNormal(state) {
-				t.Fatalf("got NaN or Inf result from controller: %f %v", state, ok)
-			}
-		}
-	})
-}
-
 func TestIdleMarkWorkerCount(t *testing.T) {
 	const workers = 10
 	c := NewGCController(100, math.MaxInt64)

diff --git a/src/runtime/mgcscavenge.go b/src/runtime/mgcscavenge.go
@@ -1114,3 +1114,71 @@ func (s *scavengeIndex) mark(base, limit uintptr) {
 func (s *scavengeIndex) clear(ci chunkIdx) {
 	s.chunks[ci/8].And(^uint8(1 << (ci % 8)))
 }
+
+type piController struct {
+	kp float64 // Proportional constant.
+	ti float64 // Integral time constant.
+	tt float64 // Reset time.
+
+	min, max float64 // Output boundaries.
+
+	// PI controller state.
+
+	errIntegral float64 // Integral of the error from t=0 to now.
+
+	// Error flags.
+	errOverflow   bool // Set if errIntegral ever overflowed.
+	inputOverflow bool // Set if an operation with the input overflowed.
+}
+
+// next provides a new sample to the controller.
+//
+// input is the sample, setpoint is the desired point, and period is how much
+// time (in whatever unit makes the most sense) has passed since the last sample.
+//
+// Returns a new value for the variable it's controlling, and whether the operation
+// completed successfully. One reason this might fail is if error has been growing
+// in an unbounded manner, to the point of overflow.
+//
+// In the specific case of an error overflow occurs, the errOverflow field will be
+// set and the rest of the controller's internal state will be fully reset.
+func (c *piController) next(input, setpoint, period float64) (float64, bool) {
+	// Compute the raw output value.
+	prop := c.kp * (setpoint - input)
+	rawOutput := prop + c.errIntegral
+
+	// Clamp rawOutput into output.
+	output := rawOutput
+	if isInf(output) || isNaN(output) {
+		// The input had a large enough magnitude that either it was already
+		// overflowed, or some operation with it overflowed.
+		// Set a flag and reset. That's the safest thing to do.
+		c.reset()
+		c.inputOverflow = true
+		return c.min, false
+	}
+	if output < c.min {
+		output = c.min
+	} else if output > c.max {
+		output = c.max
+	}
+
+	// Update the controller's state.
+	if c.ti != 0 && c.tt != 0 {
+		c.errIntegral += (c.kp*period/c.ti)*(setpoint-input) + (period/c.tt)*(output-rawOutput)
+		if isInf(c.errIntegral) || isNaN(c.errIntegral) {
+			// So much error has accumulated that we managed to overflow.
+			// The assumptions around the controller have likely broken down.
+			// Set a flag and reset. That's the safest thing to do.
+			c.reset()
+			c.errOverflow = true
+			return c.min, false
+		}
+	}
+	return output, true
+}
+
+// reset resets the controller state, except for controller error flags.
+func (c *piController) reset() {
+	c.errIntegral = 0
+}
diff --git a/src/runtime/mgcscavenge_test.go b/src/runtime/mgcscavenge_test.go
@@ -7,6 +7,7 @@ package runtime_test
 import (
 	"fmt"
 	"internal/goos"
+	"math"
 	"math/rand"
 	. "runtime"
 	"runtime/internal/atomic"
@@ -707,3 +708,48 @@ func TestScavengeIndex(t *testing.T) {
 		find(0, 0)
 	})
 }
+
+func FuzzPIController(f *testing.F) {
+	isNormal := func(x float64) bool {
+		return !math.IsInf(x, 0) && !math.IsNaN(x)
+	}
+	isPositive := func(x float64) bool {
+		return isNormal(x) && x > 0
+	}
+	// Seed with constants from controllers in the runtime.
+	// It's not critical that we keep these in sync, they're just
+	// reasonable seed inputs.
+	f.Add(0.3375, 3.2e6, 1e9, 0.001, 1000.0, 0.01)
+	f.Add(0.9, 4.0, 1000.0, -1000.0, 1000.0, 0.84)
+	f.Fuzz(func(t *testing.T, kp, ti, tt, min, max, setPoint float64) {
+		// Ignore uninteresting invalid parameters. These parameters
+		// are constant, so in practice surprising values will be documented
+		// or will be other otherwise immediately visible.
+		//
+		// We just want to make sure that given a non-Inf, non-NaN input,
+		// we always get a non-Inf, non-NaN output.
+		if !isPositive(kp) || !isPositive(ti) || !isPositive(tt) {
+			return
+		}
+		if !isNormal(min) || !isNormal(max) || min > max {
+			return
+		}
+		// Use a random source, but make it deterministic.
+		rs := rand.New(rand.NewSource(800))
+		randFloat64 := func() float64 {
+			return math.Float64frombits(rs.Uint64())
+		}
+		p := NewPIController(kp, ti, tt, min, max)
+		state := float64(0)
+		for i := 0; i < 100; i++ {
+			input := randFloat64()
+			// Ignore the "ok" parameter. We're just trying to break it.
+			// state is intentionally completely uncorrelated with the input.
+			var ok bool
+			state, ok = p.Next(input, setPoint, 1.0)
+			if !isNormal(state) {
+				t.Fatalf("got NaN or Inf result from controller: %f %v", state, ok)
+			}
+		}
+	})
+}