
cmd/compile: detect and optimize slice insertion idiom append(sa, append(sb, sc...)...) #31592

Open
go101 opened this issue Apr 21, 2019 · 10 comments
Labels
NeedsInvestigation Performance

Comments

@go101 go101 commented Apr 21, 2019

What version of Go are you using (go version)?

go version go1.12.2 linux/amd64

Does this issue reproduce with the latest release?

yes

What did you do?

package main

import (
	"testing"
)

type T = int
var sx = []T{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
var sy = []T{111, 222, 333, 444}
var index = 6

func SliceInsertion_OneLine(base, inserted []T, i int) []T {
	return append(base[:i], append(inserted, base[i:]...)...)
}

func SliceInsertion_Verbose(base, inserted []T, i int) []T {
	if cap(base)-len(base) >= len(inserted) {
		s := base[:len(base)+len(inserted)]
		copy(s[i+len(inserted):], s[i:])
		copy(s[i:], inserted)
		return s
	} else {
		s := make([]T, 0, len(inserted)+len(base))
		s = append(s, base[:i]...)
		s = append(s, inserted...)
		s = append(s, base[i:]...)
		return s
	}
}

var s1 []T
func Benchmark_SliceInsertion_OneLine(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s1 = SliceInsertion_OneLine(sx, sy, index)
	}
}

var s2 []T
func Benchmark_SliceInsertion_Verbose(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s2 = SliceInsertion_Verbose(sx, sy, index)
	}
}

What did you expect to see?

Small performance difference.

What did you see instead?

$ go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: app/t
Benchmark_SliceInsertion_OneLine-4   	 5000000	       311 ns/op	     368 B/op	       2 allocs/op
Benchmark_SliceInsertion_Verbose-4   	10000000	       146 ns/op	     160 B/op	       1 allocs/op
PASS
ok  	app/t	3.535s
@go101 go101 commented Apr 21, 2019

I would say this optimization is needed more than #21266 (made in Go 1.11), because this idiom is used far more widely. In fact, even before Go 1.11 there was already a slice-extending idiom that is more efficient than the one that optimization targets. Evidence:

package main

import (
	"testing"
)

type T = int
var sx = []T{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}

func SliceGrow_OneLine(base []T, newCapacity int) []T {
	return append(base, make([]T, newCapacity-cap(base))...)
}

func SliceGrow_VerboseCopy(base []T, newCapacity int) []T {
	m := make([]T, newCapacity)
	copy(m, base)
	return m
}

var sa []T
func Benchmark_SliceGrow_OneLine(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sa = SliceGrow_OneLine(sx, 100)
	}
}

var sc []T
func Benchmark_SliceGrow_VerboseCopy(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sc = SliceGrow_VerboseCopy(sx, 100)
	}
}

Benchmark result:

Benchmark_SliceGrow_OneLine-4       	 3000000	       426 ns/op	     896 B/op	       1 allocs/op
Benchmark_SliceGrow_VerboseCopy-4   	 5000000	       350 ns/op	     896 B/op	       1 allocs/op

@julieqiu julieqiu added the NeedsInvestigation label May 28, 2019
@julieqiu julieqiu commented May 28, 2019

/cc @randall77 @griesemer

@josharian josharian commented May 28, 2019

cc @martisch

There are a lot of variants of this. E.g. splice, introduced in a go-diff commit, offered very substantial performance improvements.

I'm genuinely not sure how many of these we want to recognize in the compiler vs., say, writing a blog post illustrating how to write these efficiently, or waiting for generics and implementing them in the standard library.

@martisch martisch commented May 28, 2019

I see at least 3 patterns here:

  • append(slice[:index], append(elements, slice[index:]...)...)
  • append(slice[:index], append(elements, slice[index+len(elements):]...)...)
  • append(slice[:index], append(elements, slice[index+amount:]...)...)

I think the first two should be the easiest to implement and, subjectively, also the most used.
Once those are recognized, we can look at how difficult the third case is and how much more complexity it adds.

If others don't think the first two are a no-go, I can have a go at it :) (but if someone else is eager to work on it, don't hold off on my account)

@ear7h ear7h commented Sep 11, 2019

I'd like to try this, but I think it could be generalized, as the original comment mentioned.

For nested appends, one would need at most one call to growslice (but more calls to memmove). append(s1, append(s2, append(s3, append(...)...)...)...) could be transformed by walkappend into the following:

init {
	// continue walking the node tree and place the args
	// in temporaries s1, s2, s3, ...
	newcap := len(s1) + len(s2) + len(s3) + ...
	if newcap > cap(s1) {
	    s1 = growslice(s1, newcap)
	}

	s1 = s1[:newcap]
	idx := 0

	copy(s1[idx:], s2)
	idx += len(s2)
	if cap(s2)-len(s2) >= len(s3) {
	    copy(s2[len(s2):cap(s2)], s3)
	}

	copy(s1[idx:], s3)
	idx += len(s3)
	if cap(s3)-len(s3) >= len(s4) {
	    copy(s3[len(s3):cap(s3)], s4)
	}

	...
}

This only allocates for the resulting slice and mimics the behavior of actually calling append, which writes between len and cap iff there's enough room. Since the writes are conditional, the number of copies is reduced (along with allocations). The same logic could also handle single values and make efficiently, by checking types and ops when generating the additions and copies; in the make case, the copy is replaced with a memclr.

I should mention that I think I understand most of what's going on in the slice generation except for this bit:

// General case, with no function calls left as arguments.
// Leave for gen, except that instrumentation requires old form.
if !instrumenting || compiling_runtime {
	return n
}

@martisch martisch commented Sep 12, 2019

Careful with newcap := len(s1) + len(s2) + len(s3) + ...: this can overflow and wrap around, making the resulting capacity too small. The additions need to be done one by one, with overflow checked each time. If a problem is encountered along the way, the appends that succeeded up to that point still need to be applied, so there is no difference in semantics. There can also be logic to prove the calculation cannot overflow, e.g. for an existing length (or less) plus a constant-size insert such as append(slice[:index], append(elements, slice[index:]...)...).

I would still suggest solving the simple case first, to better understand the complexities involved; then we can see whether the generalization is worth even more complexity while still being used in production code.

@quasilyte quasilyte commented Dec 6, 2021

To provide some statistics (source: gocorpus with all repositories checked):

  • append($b[:$i], append($_, $b[$i:]...)...) - 9 hits
  • append($b[:$i], append([]$_{$_}, $b[$i:]...)...) - 5 hits
  • $b = append($b, $_); copy($b[$i+1:], $b[$i:]); $b[$i] = $_ - 12 hits (1 match per ~866,636 SLOC)
1-2. output=append(output,0);copy(output[i+1:],output[i:]);output[i]=n (2 matches)
3-4. impDecl.Specs=append(impDecl.Specs,nil);copy(impDecl.Specs[insertAt+1:],impDecl.Specs[insertAt:]);impDecl.Specs[insertAt]=newImport (2 matches)
5. coords=append(coords,geom.Coord{});copy(coords[index+1:],coords[index:]);coords[index]=point.Coords()
6. l.data=append(l.data,nil);copy(l.data[i+1:],l.data[i:]);l.data[i]=s
7. t.inUseKeys=append(t.inUseKeys,writtenKeySpan{});copy(t.inUseKeys[start+1:],t.inUseKeys[start:]);t.inUseKeys[start]=writtenKeySpan{key:span,writer:w,txn:txn}
8. arr=append(arr,nil);copy(arr[idx+1:],arr[idx:]);arr[idx]=val
9. sb.endpoints=append(sb.endpoints,nil);copy(sb.endpoints[i+1:],sb.endpoints[i:]);sb.endpoints[i]=ep
10. s.Calls=append(s.Calls,Call{});copy(s.Calls[index+1:],s.Calls[index:]);s.Calls[index]=Call{Stack:gr.Stack}
11. archive.Files=append(archive.Files,txtar.File{});copy(archive.Files[gosumIdx+1:],archive.Files[gosumIdx:]);archive.Files[gosumIdx]=txtar.File{Name:"go.sum",Data:newGosumData}
12. es=append(es,muxEntry{});copy(es[i+1:],es[i:]);es[i]=e

For comparison, here are some results on the same corpus:

  • append($_, $_) - 37008 hits
  • append($_, $_, $_) - 723 hits

Which is understandable: two-argument append is far more common than any other form, but this tells us by how much (~51 times).

(The search patterns are written in gogrep syntax, if you're curious.)


Just out of curiosity, does slices.Insert perform the required operation with performance comparable to what we could achieve by teaching the compiler to make the suggested rewrite?

@dotaheor dotaheor commented Dec 7, 2021

Just out of curiosity, does slices.Insert perform the required operation with performance comparable to what we could achieve by teaching the compiler to make the suggested rewrite?

No, it doesn't. Generics don't help here: the make call still zeroes the elements in the generic code.

func Insert[S ~[]E, E any](s S, i int, v ...E) S {
	tot := len(s) + len(v)
	if tot <= cap(s) {
		s2 := s[:tot]
		copy(s2[i+len(v):], s[i:])
		copy(s2[i:], v)
		return s2
	}
	s2 := make(S, tot)
	copy(s2, s[:i])
	copy(s2[i:], v)
	copy(s2[i+len(v):], s[i:])
	return s2
}

That's why I proposed to add merge builtin function (#23905).

On the other hand, the generic code does provide a chance to simplify the potential compiler optimization.

@quasilyte quasilyte commented Dec 7, 2021

Excessive zeroing is a known issue; I think it's separate low-hanging fruit on its own.
I'm working on it by adding a memclrelim pass.
Right now it doesn't handle make/copy/etc., but there are plans to extend it (see the TODO notes; there are actually more ideas).
https://go-review.googlesource.com/c/go/+/367496

@dotaheor dotaheor commented Dec 7, 2021

Right now it doesn't handle make/copy/etc

Mostly true, except for the case handled by a Go 1.15 optimization:

var s = make([]T, n)
copy(s, x)
copy(s[len(x):], y)

In the above code, the elements within s[:len(x)] are not zeroed.

Due to real-world complexities, the optimization has several limitations:

  • the make call must take exactly two arguments,
  • x must be an identifier,
  • the elements within s[len(x):] are still zeroed even when the previous conditions are satisfied.

That means, currently, this variant of the generic code could be a little faster:

func Insert[S ~[]E, E any](s S, i int, v ...E) S {
	tot := len(s) + len(v)
	if tot <= cap(s) {
		s2 := s[:tot]
		copy(s2[i+len(v):], s[i:])
		copy(s2[i:], v)
		return s2
	}
	x := s[:i]
	s2 := make(S, tot)
	copy(s2, x)
	copy(s2[i:], v)
	copy(s2[i+len(v):], s[i:])
	return s2
}
