Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: detect and optimize slice insertion idiom append(sa, append(sb, sc...)...) #31592

Open
go101 opened this issue Apr 21, 2019 · 4 comments

Comments

Projects
None yet
5 participants
@go101
Copy link

commented Apr 21, 2019

What version of Go are you using (go version)?

go version go1.12.2 linux/amd64

Does this issue reproduce with the latest release?

yes

What did you do?

package main

import (
	"testing"
)

type T = int
var sx = []T{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
var sy = []T{111, 222, 333, 444}
var index = 6

func SliceInsertion_OneLine(base, inserted []T, i int) []T {
	return append(base[:i], append(inserted, base[i:]...)...)
}

func SliceInsertion_Verbose(base, inserted []T, i int) []T {
	if cap(base)-len(base) >= len(inserted) {
		s := base[:len(base)+len(inserted)]
		copy(s[i+len(inserted):], s[i:])
		copy(s[i:], inserted)
		return s
	} else {
		s := make([]T, 0, len(inserted)+len(base))
		s = append(s, base[:i]...)
		s = append(s, inserted...)
		s = append(s, base[i:]...)
		return s
	}
}

var s1 []T
func Benchmark_SliceInsertion_OneLine(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s1 = SliceInsertion_OneLine(sx, sy, index)
	}
}

var s2 []T
func Benchmark_SliceInsertion_Verbose(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s2 = SliceInsertion_Verbose(sx, sy, index)
	}
}

What did you expect to see?

Small performance difference.

What did you see instead?

$ go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: app/t
Benchmark_SliceInsertion_OneLine-4   	 5000000	       311 ns/op	     368 B/op	       2 allocs/op
Benchmark_SliceInsertion_Verbose-4   	10000000	       146 ns/op	     160 B/op	       1 allocs/op
PASS
ok  	app/t	3.535s
@go101

This comment has been minimized.

Copy link
Author

commented Apr 21, 2019

I would say this optimization is more needed than this one, #21266, made in Go SDK 1.11, for it is used more popularly, and in fact, before Go 1.11, there is already a slice-extending method which is even more efficient than the optimization made in Go SDK 1.11. Evidence:

package main

import (
	"testing"
)

type T = int
var sx = []T{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}

func SliceGrow_OneLine(base []T, newCapacity int) []T {
	return append(base, make([]T, newCapacity-cap(base))...)
}

func SliceGrow_VerboseCopy(base []T, newCapacity int) []T {
	m := make([]T, newCapacity)
	copy(m, base)
	return m
}

var sa []T
func Benchmark_SliceGrow_OneLine(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sa = SliceGrow_OneLine(sx, 100)
	}
}

var sc []T
func Benchmark_SliceGrow_VerboseCopy(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sc = SliceGrow_VerboseCopy(sx, 100)
	}
}

Benchmark result:

Benchmark_SliceGrow_OneLine-4       	 3000000	       426 ns/op	     896 B/op	       1 allocs/op
Benchmark_SliceGrow_VerboseCopy-4   	 5000000	       350 ns/op	     896 B/op	       1 allocs/op
@julieqiu

This comment has been minimized.

Copy link

commented May 28, 2019

@josharian

This comment has been minimized.

Copy link
Contributor

commented May 28, 2019

cc @martisch

There are a lot of variants of this. E.g. splice, introduced in a go-diff commit, offered very substantial performance improvements.

I'm genuinely not sure how many of these we want to recognize in the compiler vs, say, writing a blog post illustrating how to write these efficiently or wait for generics and implement them in the standard library.

@martisch

This comment has been minimized.

Copy link
Member

commented May 28, 2019

I see at least 3 patterns here:

  • append(slice[:index],` append(elements, slice[index:]...)...)
  • append(slice[:index],` append(elements, slice[index+len(elements):]...)...)
  • append(slice[:index],` append(elements, slice[index+amount:]...)...)

I think the first 2 should be easiest to implement and subjectively also the most used.
If we have them recognized we could look at the how difficult and how much more complexity the third case adds.

If others dont think the first 2 are a no-go I can have go at it :) (but if someone else is eager to work on it dont hold off on it)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.