runtime: smoother growth rate transition yields great results #64877
Comments
I know I wrote all of the explanation and everything, but actually, just keeping the growth rate at x2 has even better results, so why not? Why lower it in the first place? Keeping it at x2 was better in every parameter than using the golden ratio for any capacity above 1024, so something here just wants a high growth rate.
Golden ratio benchmark:
bash-3.2$ GOROOT=~/git/go GOPATH=~/go ~/git/go/bin/go test go-playground/pkg -bench .
goos: darwin
goarch: arm64
pkg: go-playground/pkg
BenchmarkGrowSlice-8 1 8139819000 ns/op 48032196000 B/op 18695 allocs/op
PASS
ok go-playground/pkg 8.511s
Keeping it at x2 benchmark:
bash-3.2$ GOROOT=~/git/go GOPATH=~/go ~/git/go/bin/go test go-playground/pkg -bench .
goos: darwin
goarch: arm64
pkg: go-playground/pkg
BenchmarkGrowSlice-8 1 7253913416 ns/op 37604270040 B/op 13737 allocs/op
PASS
ok go-playground/pkg 7.465s
It seems the allocated bytes decrease more slowly for a bigger growth rate. This could be because, with a higher growth rate, more of the appended values fit into the previously grown slice; so the complexity analysis I wrote has to be amended somehow.
Documentation is in CL 347917 and the linked discussions (the groups discussion linked to in the CL description, and the graphs linked to in the last comment). So if I understand correctly, your change is just keeping the growth factor a bit larger for a bit longer?
Sounds like fewer total reallocations, which of course leads to less total memory and work. We'd get an even better effect, measured this way, by just always doing 2x reallocations. The trick here is that "total memory" means total allocated memory. It does not reflect live but unused memory, which will rise as we make fewer, larger reallocations and thus leave more tail waste. How do we know what that effect will be?
I don't think this needs to be in the proposal process, as it doesn't involve any API. We can just discuss it on this issue and decide.
Added some benchmarks from the package:
package pkg
import (
"runtime/debug"
"runtime/metrics"
"testing"
)
func GetMetricsForLog() any {
// Get descriptions for all supported metrics.
m := []metrics.Sample{
{Name: "/gc/heap/frees:bytes"},
{Name: "/gc/heap/allocs:bytes"},
{Name: "/gc/heap/live:bytes"},
{Name: "/cpu/classes/gc/total:cpu-seconds"},
}
metrics.Read(m)
return map[string]any{
"gc_heap_frees": m[0].Value.Uint64(),
"gc_heap_allocs": m[1].Value.Uint64(),
"gc_heap_live": m[2].Value.Uint64(),
"cpu_gc_total": m[3].Value.Float64(),
}
}
// From runtime/sizeclasses.go:
// We only test allocations above it, since anything below it has a small allocation price.
const _MaxSmallSize = 32768
func BenchmarkGrowSlice(b *testing.B) {
baseSlice := make([]byte, _MaxSmallSize)
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
allSlices := make([][]byte, 0, 1000)
// Test the average across a range of target sizes.
for k := _MaxSmallSize * 10; k < 1<<24; k += 10001 {
debug.FreeOSMemory()
testSlice := baseSlice
for j := 0; j < k; j += 10 {
// Unroll the loop to reduce dependency on the loop time.
testSlice = append(testSlice, 1)
testSlice = append(testSlice, 1)
testSlice = append(testSlice, 1)
testSlice = append(testSlice, 1)
testSlice = append(testSlice, 1)
testSlice = append(testSlice, 1)
testSlice = append(testSlice, 1)
testSlice = append(testSlice, 1)
testSlice = append(testSlice, 1)
testSlice = append(testSlice, 1)
}
// keep a pointer to prevent garbage collection
allSlices = append(allSlices, testSlice)
}
}
b.StopTimer()
debug.FreeOSMemory()
b.Log(GetMetricsForLog())
}
bash-3.2$ GOROOT=/usr/local/go /usr/local/go/bin/go test go-playground/pkg -bench .
goos: darwin
goarch: arm64
pkg: go-playground/pkg
BenchmarkGrowSlice-8 1 13316692167 ns/op 78178604488 B/op 36868 allocs/op
--- BENCH: BenchmarkGrowSlice-8
growSlice_test.go:59: map[cpu_gc_total:1.841684158 gc_heap_allocs:78178862360 gc_heap_frees:78178704800 gc_heap_live:157872]
PASS
ok go-playground/pkg 13.951s
bash-3.2$ GOROOT=~/git/go GOPATH=~/go ~/git/go/bin/go test go-playground/pkg -bench .
goos: darwin
goarch: arm64
pkg: go-playground/pkg
BenchmarkGrowSlice-8 1 12797637958 ns/op 72003326144 B/op 24958 allocs/op
--- BENCH: BenchmarkGrowSlice-8
growSlice_test.go:59: map[cpu_gc_total:1.736739948 gc_heap_allocs:72003585408 gc_heap_frees:72003421600 gc_heap_live:164096]
PASS
ok go-playground/pkg 13.300s
So it does seem to use a bit more space that is not freed, but not much of it.
Oh, I missed that groups discussion, oops. It also explains why the phi reasoning would not work as of now: other slices may point to the previously allocated slice, so it can't be reused.
In general, more aggressive slice growth seems to give better time and total memory consumption at the cost of a larger live heap, so we'd have to assume both the range and distribution of typical slice sizes, and the desired memory-limit-to-performance ratio.
From your benchmark results, it looks like a ~4% CPU improvement at the cost of a ~4% live memory increase. The benchmark doesn't have much live data itself, so it doesn't measure the effect that a larger live set has. A larger live set implies a higher GC scanning cost (for slices whose element types contain pointers, anyway) and possibly more frequent GCs.
Proposal Details
Proposal
In the growslice function, smoothly decrease the growth rate of newCap from x2 to x1.125.
This is the benchmarked code change -
Edit: fixed a bug in the formula
Benchmark
So about x1.5 fewer allocations, x1.08 less memory, and x1.05 less time.
The results are consistent with small variability, and also work with different capacity ranges.
This is the benchmark code -
Background
Up until go1.17, the growth rate was x2 if the capacity was below 1024, and x1.25 if it was above.
In go1.18 this was smoothed somewhat: https://go-review.googlesource.com/c/go/+/347917
I was not able to find documentation of the rationale and reasoning behind that.
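The two regimes can be sketched roughly as follows. This is a simplified model of my own, not the runtime's exact code: the real growslice also rounds the result up to an allocator size class, and the go1.18 threshold in the runtime source is 256 elements rather than 1024.

```go
package main

import "fmt"

// growPre118 sketches the pre-go1.18 rule: double the capacity below
// 1024 elements, otherwise grow by 25% until the needed capacity is
// reached. The jump from x2 to x1.25 at the threshold is abrupt.
func growPre118(oldCap, needed int) int {
	if oldCap < 1024 {
		return 2 * oldCap
	}
	newCap := oldCap
	for newCap < needed {
		newCap += newCap / 4
	}
	return newCap
}

// grow118 sketches the smoothed rule from CL 347917: double below a
// threshold of 256 elements, then transition gradually from x2 toward
// x1.25 as the capacity grows, via newCap += (newCap + 3*threshold)/4.
func grow118(oldCap, needed int) int {
	const threshold = 256
	if oldCap < threshold {
		return 2 * oldCap
	}
	newCap := oldCap
	for newCap < needed {
		newCap += (newCap + 3*threshold) / 4
	}
	return newCap
}

func main() {
	for _, c := range []int{128, 256, 1024, 1 << 20} {
		fmt.Printf("cap %8d: pre-1.18 -> %8d, 1.18 -> %8d\n",
			c, growPre118(c, c+1), grow118(c, c+1))
	}
}
```

Note that at the 256-element threshold grow118 still yields exactly x2 (256 -> 512), which is what makes the transition smooth rather than a step.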
Rationale
The rationale behind growing the capacity exponentially is to reduce the number of required reallocations.
It makes the number of reallocations grow with an amortized complexity of O(log(n)), where n is the length. But this has a trade-off: the greater the exponent (growth factor), the more redundant capacity is allocated and never used.
That waste grows in proportion to the last reallocation size, hence O(exp(log(n))) ~ O(n).
To mitigate this, Go reduced the growth rate for large slices, where memory starts to matter.
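This trade-off can be illustrated with a toy simulation of my own (it ignores size classes and runtime rounding): the reallocation count falls as the growth factor rises, while the final unused tail capacity tends to grow with it.

```go
package main

import "fmt"

// simulateAppends counts reallocations and the final unused (tail)
// capacity when growing a slice one element at a time up to n elements
// with a fixed growth factor. This is a toy model: the real growslice
// rounds capacities up to allocator size classes.
func simulateAppends(n int, factor float64) (reallocs, tailWaste int) {
	capNow := 1
	for length := 1; length <= n; length++ {
		if length > capNow {
			// +1 guarantees progress even for factors near 1.
			capNow = int(float64(capNow)*factor) + 1
			reallocs++
		}
	}
	return reallocs, capNow - n
}

func main() {
	for _, f := range []float64{1.25, 1.5, 2.0} {
		r, w := simulateAppends(1_000_000, f)
		fmt.Printf("factor %.2f: %d reallocs, %d unused slots\n", f, r, w)
	}
}
```

The reallocation count behaves like log(n)/log(factor), while the tail waste is bounded by (factor-1)*n, matching the O(log(n)) vs O(n) trade-off described above.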
This change is too abrupt, both before the go1.18 fix and after, causing two issues -
The proposal is to gradually decrease the growth factor from x2 to x1.125, across typical sizes from 1<<10 up to 1<<24, above which the growth rate is kept at x1.125. To keep using integers, I quantized it to steps of 1/16.
Keeping a high growth rate matters: since it's a difference in the exponent, it leads to substantial performance differences.
Further investigation
The golden ratio
Some claim that having a growth rate below the golden ratio (~1.61) is optimal for memory reuse:
https://stackoverflow.com/questions/1100311/what-is-the-ideal-growth-rate-for-a-dynamically-allocated-array
So it might be beneficial to quickly decrease to something below it, and only then start the gradual decrease.
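The intuition behind the golden-ratio claim can be simulated with a short sketch of my own. After each reallocation the old buffer is freed; a contiguous allocator can place a later buffer in that reclaimed region only once the accumulated freed space reaches the size of the next request. That eventually happens precisely when g*(g-1) < 1, i.e. when g is below the golden ratio.

```go
package main

import "fmt"

// canEventuallyReuse simulates repeated reallocation with growth
// factor g and reports the first step at which the sum of all
// previously freed buffers is at least as large as the next buffer,
// i.e. when a contiguous allocator could reuse the reclaimed space.
// Returns -1 if that never happens within maxSteps.
func canEventuallyReuse(g float64, maxSteps int) int {
	size := 1.0  // current buffer size
	freed := 0.0 // total size of buffers freed so far
	for step := 0; step < maxSteps; step++ {
		next := size * g
		if freed >= next {
			return step
		}
		freed += size // the old buffer is freed after the copy
		size = next
	}
	return -1
}

func main() {
	for _, g := range []float64{1.25, 1.5, 1.618, 2.0} {
		fmt.Printf("g=%.3f: reuse possible at step %d\n",
			g, canEventuallyReuse(g, 100))
	}
}
```

With g=2 the freed space (2^k - 1) always trails the next request (2^(k+1)), so reuse never becomes possible; smaller factors catch up within a few steps. As the comment above notes, though, this reasoning breaks down when other slices still point at the old backing array, since then it cannot be freed at all.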
Decrease rate
The growth factor decreases linearly with the number of allocations, which means it decreases only logarithmically with the growing capacity.
To do this I spread the decrease across the powers of 2 of the capacity, even though the actual growth slows below doubling.
There might be a better strategy, but it might not be worth the complexity.
Keep it high
Keeping the growth rate high at the beginning gave tremendously better results, but I don't know the exact form it should take.