runtime: smoother growth rate transition yields great results #64877

Proposal Details

Proposal

In the growslice function, smoothly decrease the growth factor of newcap from 2x to 1.125x.
This is the benchmarked code change.
Edit: fixed a bug in the formula.

	// Growth-rate descent range.
	// Use a slope that slowly decreases the growth factor from 2x down to 1.125x,
	// in steps of 1/16. The linear formula would reach 1x at 1<<26, but the
	// multiplier is clamped to 1.125x from 1<<24 onward.
	const transitionStart = 1 << 10
	const transitionEnd = 1 << 24
	if oldCap < transitionStart {
		newcap = doublecap
	} else {
		var growthRateMul int
		if oldCap <= transitionEnd {
			growthRateMul = 27 - bits.Len64(uint64(oldCap)) // i.e. 26 - floor(log2(oldCap))
		} else {
			growthRateMul = 2
		}
		// Check 0 < newcap to detect overflow
		// and prevent an infinite loop.
		for 0 < newcap && newcap < newLen {
			newcap += growthRateMul * (newcap >> 4)
		}
	}
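For illustration, here is a minimal standalone sketch of the same formula (not runtime code; the growthFactor helper and the table-printing main are mine), showing the effective growth factor at each power-of-two capacity:

package main

import (
	"fmt"
	"math/bits"
)

// growthFactor mirrors the proposed formula above: the per-step capacity
// multiplier, quantized to steps of 1/16 and clamped to 1.125x above 1<<24.
func growthFactor(oldCap int) float64 {
	const transitionStart = 1 << 10
	const transitionEnd = 1 << 24
	if oldCap < transitionStart {
		return 2.0
	}
	mul := 2
	if oldCap <= transitionEnd {
		mul = 27 - bits.Len64(uint64(oldCap)) // same formula as in the proposal above
	}
	return 1 + float64(mul)/16
}

func main() {
	for shift := 9; shift <= 26; shift++ {
		fmt.Printf("cap=1<<%-2d  factor=%.4f\n", shift, growthFactor(1<<shift))
	}
}

The factor stays at 2x through 1<<10, drops by 1/16 per doubling, and bottoms out at 1.125x from 1<<24 onward.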

Benchmark

bash-3.2$ /usr/local/go/bin/go version
go version go1.21.5 darwin/arm64
bash-3.2$  GOROOT=/usr/local/go  /usr/local/go/bin/go test go-playground/pkg -bench .
goos: darwin
goarch: arm64
pkg: go-playground/pkg
BenchmarkGrowSlice-8   	       1	9178279667 ns/op	78178716512 B/op	   38481 allocs/op
PASS
ok  	go-playground/pkg	9.701s
bash-3.2$  GOROOT=~/git/go GOPATH=~/go ~/git/go/bin/go test go-playground/pkg -bench .
goos: darwin
goarch: arm64
pkg: go-playground/pkg
BenchmarkGrowSlice-8   	       1	8682857792 ns/op	72003354976 B/op	   25826 allocs/op
PASS
ok  	go-playground/pkg	8.951s

So roughly 1.5x fewer allocations, 1.08x less memory, and 1.05x less time.
The results are consistent, with small variability, and hold across different capacity ranges.

This is the benchmark code -

// Package and imports assumed to match the benchmark run above (go-playground/pkg).
package pkg

import (
	"runtime/debug"
	"testing"
)

// From runtime/sizeclasses.go:
// We only test allocations above it, since below it the cost of an allocation is too small to matter.
const _MaxSmallSize = 32768

func BenchmarkGrowSlice(b *testing.B) {
	baseSlice := make([]byte, _MaxSmallSize)
	debug.SetMemoryLimit(1 << 32)
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Test the average across a range of target sizes.
		for k := _MaxSmallSize * 10; k < 1<<24; k += 10001 {
			debug.FreeOSMemory()

			testSlice := baseSlice
			for j := 0; j < k; j += 10 {
				// Unroll the loop to reduce dependency on the loop time.
				testSlice = append(testSlice, 1)
				testSlice = append(testSlice, 1)
				testSlice = append(testSlice, 1)
				testSlice = append(testSlice, 1)
				testSlice = append(testSlice, 1)
				testSlice = append(testSlice, 1)
				testSlice = append(testSlice, 1)
				testSlice = append(testSlice, 1)
				testSlice = append(testSlice, 1)
				testSlice = append(testSlice, 1)
			}
		}
	}
}

Background

Up until go1.17, if the capacity was below 1024 the growth factor was 2x, and above 1024 it was 1.25x.
In go1.18 this was smoothed somewhat: https://go-review.googlesource.com/c/go/+/347917
I was not able to find documentation of the rationale and reasoning behind that change.

Rationale

The rationale behind growing the capacity exponentially is to reduce the number of reallocations:
it makes the number of reallocations grow with amortized complexity O(log(n)), where n is the length.

But this has a trade-off: the greater the exponent (growth factor), the more redundant capacity is allocated and never used.
This slack is proportional to the size of the last reallocation, hence O(exp(log(n))) ~ O(n).
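
As a back-of-the-envelope illustration of this trade-off (a toy simulation, not runtime code; the simulateGrowth helper, the fixed factors, and the target size are mine), counting reallocations and final slack when a capacity of 1 grows to a target by a constant factor:

package main

import "fmt"

// simulateGrowth grows a capacity from 1 to at least target by a constant
// factor, counting reallocations and the final slack (allocated but unused
// capacity). Reallocations grow like log(target); slack grows like a constant
// fraction of target, i.e. O(n).
func simulateGrowth(target int, factor float64) (reallocs, slack int) {
	c := 1
	for c < target {
		next := int(float64(c) * factor)
		if next <= c {
			next = c + 1 // guard against rounding stalls for small factors
		}
		c = next
		reallocs++
	}
	return reallocs, c - target
}

func main() {
	const target = 1 << 24
	for _, f := range []float64{2, 1.5, 1.25, 1.125} {
		r, s := simulateGrowth(target, f)
		fmt.Printf("factor=%.3f  reallocs=%3d  slack=%d\n", f, r, s)
	}
}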

To mitigate this, go reduced the growth rate of large slices where memory starts to matter.

This change is too abrupt, both before the go1.18 fix and after, causing two issues:

  1. For small slices the growth is too small, so too many reallocations are needed.
  2. For big slices the growth rate is too big, so too much memory is used.

The proposal is to gradually decrease the growth factor from 2x to 1.125x as the capacity grows from a typical size of 1<<10 up to 1<<24, beyond which the growth factor is kept at 1.125x.

To keep using integers, I quantized the growth factor to steps of 1/16.
Keeping a high growth rate at the start turned out to matter a lot; see "Keep it high" below.
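
For example, with this quantization, at oldCap = 1<<16 the multiplier is 27 - 17 = 10, giving a growth factor of 1 + 10/16 = 1.625x; at oldCap = 1<<20 it is 27 - 21 = 6, i.e. 1.375x.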

Since the change is to the exponent, it leads to substantial performance differences.

Further investigation

The golden ratio

Some claim that a growth rate below the golden ratio (~1.618) is optimal for memory reuse:
https://stackoverflow.com/questions/1100311/what-is-the-ideal-growth-rate-for-a-dynamically-allocated-array
So it might be beneficial to quickly decrease to something below it, and only then start the gradual decrease.

Decrease rate

The growth factor decreases roughly linearly with the number of reallocations, while the capacities at which it steps down are spaced exponentially.
To achieve this, the step-downs are keyed to the powers of two of the capacity, even though the growth rate itself slows along the way.
There might be a better strategy, but it may not be worth the complexity. A trace of the proposed schedule is sketched below.
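
To make the decrease rate concrete, here is a rough trace of the proposed schedule (a sketch under the assumption of a starting capacity of 1<<10 and growth driven purely by the formula above; not runtime code), printing the factor applied at each reallocation:

package main

import (
	"fmt"
	"math/bits"
)

// Trace the proposed schedule from a capacity of 1<<10 up to 1<<24.
// The multiplier is keyed to log2(cap), so it drops by one 1/16 step each time
// the capacity crosses a power of two, i.e. roughly linearly in the number of
// reallocations while the capacity itself grows exponentially.
func main() {
	c := 1 << 10
	for step := 1; c < 1<<24; step++ {
		mul := 27 - bits.Len64(uint64(c))
		if mul < 2 {
			mul = 2 // clamp, as in the proposal (not reached below 1<<24)
		}
		c += mul * (c >> 4)
		fmt.Printf("step %2d: factor=%.4f  cap=%d\n", step, 1+float64(mul)/16, c)
	}
}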

Keep it high

Keeping the growth rate high at the beginning gave tremendously better results, but I don't know what the exact optimal form is.
