runtime: memmove sometimes faster than memclrNoHeapPointers #23306

Open
alandonovan opened this Issue Jan 2, 2018 · 4 comments

alandonovan (Contributor) commented Jan 2, 2018

Memory allocation using make([]int, K) is surprisingly slow compared to append(nil, ...), even though append does strictly more work, such as copying.

$ cat a_test.go
package main

import "testing"

const K = 1e6
var escape []int

func BenchmarkMake(b *testing.B) {
	for i := 0; i < b.N; i++ {
		escape = make([]int, K)
	}
}

var empty [K]int

func BenchmarkAppend(b *testing.B) {
	for i := 0; i < b.N; i++ {
		escape = append([]int(nil), empty[:]...)
	}
}

$ go version
go version devel +6317adeed7 Tue Jan 2 13:39:20 2018 +0000 linux/amd64

$ go test -bench=. a_test.go
BenchmarkAppend-12    	    1000	   1208800 ns/op
BenchmarkMake-12      	    1000	   1473106 ns/op

While reporting this issue, I initially used an older runtime (from December 18) in which the effect was much stronger, a 10x-20x slowdown, but that seems to have been fixed.

Curiously, this issue is the exact opposite of the problem reported in #14718 (now closed).

bcmills (Member) commented Jan 2, 2018

> append does strictly more work, such as copying.

append has to copy, but make has to zero, and either of those operations may be hardware-accelerated. It's not obvious that either is strictly more work than the other.

Are you sure that the escape analysis is working as you expect? Since the escape variable is package-local the compiler could reasonably see through it (and hoist the allocations out of either or both of those loops).
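One way to check (a sketch, not from the original thread): the compiler's escape-analysis diagnostics report which allocations reach the heap, so both benchmarks can be inspected with

$ go test -gcflags=-m a_test.go 2>&1 | grep 'escapes to heap'

If both the make and the append allocations show up as "escapes to heap", the two loops really are comparing heap allocation plus zeroing against heap allocation plus copying.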

mdempsky (Member) commented Jan 2, 2018

Here's a benchmark of the underlying memory copying/clearing primitives (you'll need to put this in its own package directory, along with an empty .s file to work around #23311):

package main

import (
    "testing"
    "unsafe"
)

//go:linkname memclrNoHeapPointers runtime.memclrNoHeapPointers
func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)

//go:linkname memmove runtime.memmove
func memmove(to, from unsafe.Pointer, n uintptr)

const K = 6e5

var a1, a2 [K]int

func BenchmarkMemclr(b *testing.B) {
    for i := 0; i < b.N; i++ {
        memclrNoHeapPointers(unsafe.Pointer(&a1), unsafe.Sizeof(a1))
    }
}

func BenchmarkMemmove(b *testing.B) {
    for i := 0; i < b.N; i++ {
        memmove(unsafe.Pointer(&a1), unsafe.Pointer(&a2), unsafe.Sizeof(a1))
    }
}

On my laptop, the relative performance seems very sensitive to the exact value of K. For example, at K=6e5, I get:

BenchmarkMemclr-4           5000            322261 ns/op
BenchmarkMemmove-4          5000            305383 ns/op

But at K=1e7, I get:

BenchmarkMemclr-4            300           4485500 ns/op
BenchmarkMemmove-4           300           5060492 ns/op
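
To see where the crossover happens, here is a sketch (not from the thread) that extends the snippet above into a size sweep using sub-benchmarks; it needs the same empty .s file workaround, and the buffer size and sweep points below are arbitrary choices:

package main

import (
    "fmt"
    "testing"
    "unsafe"
)

//go:linkname memclrNoHeapPointers runtime.memclrNoHeapPointers
func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)

//go:linkname memmove runtime.memmove
func memmove(to, from unsafe.Pointer, n uintptr)

// 32 MB buffers, large enough for every size in the sweep below.
var dst, src [32 << 20]byte

func BenchmarkClearVsCopy(b *testing.B) {
    for _, n := range []uintptr{64 << 10, 1 << 20, 4 << 20, 8 << 20, 16 << 20, 32 << 20} {
        n := n
        b.Run(fmt.Sprintf("Memclr/%d", n), func(b *testing.B) {
            b.SetBytes(int64(n))
            for i := 0; i < b.N; i++ {
                memclrNoHeapPointers(unsafe.Pointer(&dst), n)
            }
        })
        b.Run(fmt.Sprintf("Memmove/%d", n), func(b *testing.B) {
            b.SetBytes(int64(n))
            for i := 0; i < b.N; i++ {
                memmove(unsafe.Pointer(&dst), unsafe.Pointer(&src), n)
            }
        })
    }
}

Reporting throughput via SetBytes gives GB/s figures directly comparable to the runtime's own Memmove benchmarks quoted later in this thread.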

mdempsky changed the title from "runtime: allocation using make is 40% slower than append(nil, ...)" to "runtime: memmove sometimes faster than memclrNoHeapPointers" on Jan 2, 2018

josharian (Contributor) commented Jan 11, 2018

TocarIP (Contributor) commented Feb 28, 2018

For the original benchmark, memmove and memclr use different strategies: memmove switches to non-temporal movs, while memclr uses regular movs. Changing the non-temporal mov threshold in memmove to match memclr makes append slower:

Make-6    1.58ms ± 1%  1.58ms ± 1%     ~     (p=0.912 n=10+10)
Append-6  1.36ms ± 1%  1.89ms ± 1%  +39.07%  (p=0.000 n=10+10)

However, for the memmove benchmarks in the runtime, switching to regular movs makes things slower for larger sizes:

Memmove/65536-6                 14.9GB/s ± 0%  14.9GB/s ± 0%   +0.16%  (p=0.028 n=9+10)
Memmove/1048576-6               8.67GB/s ± 1%  8.26GB/s ± 2%   -4.80%  (p=0.000 n=10+10)
Memmove/4194304-6               8.51GB/s ± 2%  8.20GB/s ± 3%   -3.74%  (p=0.000 n=10+10)
Memmove/8388608-6               8.55GB/s ± 2%  6.31GB/s ± 4%  -26.28%  (p=0.000 n=10+10)
Memmove/16777216-6              7.92GB/s ± 1%  4.33GB/s ± 2%  -45.30%  (p=0.000 n=10+10)
Memmove/67108864-6              6.56GB/s ± 2%  6.59GB/s ± 1%     ~     (p=0.315 n=10+9)

MemmoveUnalignedDst/65536-6     14.5GB/s ± 1%  14.5GB/s ± 0%     ~     (p=1.000 n=10+7)
MemmoveUnalignedDst/1048576-6   8.70GB/s ± 2%  8.14GB/s ± 1%   -6.48%  (p=0.000 n=10+9)
MemmoveUnalignedDst/4194304-6   8.64GB/s ± 2%  8.13GB/s ± 2%   -5.92%  (p=0.000 n=10+10)
MemmoveUnalignedDst/8388608-6   8.55GB/s ± 3%  6.24GB/s ± 3%  -27.00%  (p=0.000 n=10+10)
MemmoveUnalignedDst/16777216-6  7.93GB/s ± 3%  4.36GB/s ± 1%  -45.08%  (p=0.000 n=10+9)
MemmoveUnalignedDst/67108864-6  6.66GB/s ± 1%  6.76GB/s ± 2%   +1.49%  (p=0.000 n=9+10)

MemmoveUnalignedSrc/65536-6     14.5GB/s ± 1%  14.5GB/s ± 1%     ~     (p=0.796 n=10+10)
MemmoveUnalignedSrc/1048576-6   8.57GB/s ± 1%  8.20GB/s ± 2%   -4.29%  (p=0.000 n=9+10)
MemmoveUnalignedSrc/4194304-6   8.54GB/s ± 2%  8.19GB/s ± 2%   -4.18%  (p=0.000 n=10+10)
MemmoveUnalignedSrc/8388608-6   8.53GB/s ± 2%  6.25GB/s ± 4%  -26.66%  (p=0.000 n=10+10)
MemmoveUnalignedSrc/16777216-6  8.02GB/s ± 2%  4.36GB/s ± 2%  -45.67%  (p=0.000 n=10+10)
MemmoveUnalignedSrc/67108864-6  6.73GB/s ± 2%  6.82GB/s ± 2%   +1.32%  (p=0.035 n=10+10)
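
The ± and p= columns above are benchstat-style comparisons. As a sketch (not from the thread, assuming the golang.org/x/perf/cmd/benchstat tool), such a table is typically produced by running each configuration several times and diffing the results:

$ go test -bench='Memmove|Memclr' -count=10 runtime > old.txt
# rebuild with the modified non-temporal threshold, then:
$ go test -bench='Memmove|Memclr' -count=10 runtime > new.txt
$ benchstat old.txt new.txt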