cmd/compile: assigning large values does not use memmove #10362

Open
davecheney opened this Issue Apr 7, 2015 · 5 comments

davecheney commented Apr 7, 2015

Consider this piece of code:

package main

import "fmt"

func main() {
        f()
}

func f() {
        var a [200]int
        var b [200]int

        a = b
        b = a

        fmt.Println(&a, &b)
}

The assignments a = b and b = a, where the size of a and b is above the DUFFCOPY limit of 128 words, produce some very simplistic code:

        a = b
   10c60:       e1a01004        mov     r1, r4
   10c64:       e1a00005        mov     r0, r5
   10c68:       e2843e32        add     r3, r4, #800    ; 0x320
   10c6c:       e4912004        ldr     r2, [r1], #4
   10c70:       e4802004        str     r2, [r0], #4
   10c74:       e1530001        cmp     r3, r1
   10c78:       1afffffb        bne     10c6c <main.f+0x4c>
        b = a
   10c7c:       e1a01005        mov     r1, r5
   10c80:       e1a00004        mov     r0, r4
   10c84:       e2853e32        add     r3, r5, #800    ; 0x320
   10c88:       e4912004        ldr     r2, [r1], #4
   10c8c:       e4802004        str     r2, [r0], #4
   10c90:       e1530001        cmp     r3, r1
   10c94:       1afffffb        bne     10c88 <main.f+0x68>

Should sgen/stackcopy take the opportunity to set up a call to runtime.memmove for values larger than 128 words?
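
For readers who do not read ARM assembly, here is a rough Go-level sketch of the two strategies being compared. The function names wordCopy and bulkCopy are hypothetical and only for illustration. wordCopy mirrors the ldr/str/cmp/bne loop in the disassembly (200 ints of 4 bytes each gives the 800-byte bound seen in the add instruction); bulkCopy uses the built-in copy, which, as far as I know, the gc toolchain lowers to the runtime's memmove for slices like these.

```go
package main

import "fmt"

// wordCopy mirrors the loop in the disassembly: one word is loaded
// and stored per iteration until the end of the source is reached.
// (Hypothetical helper, not part of the compiler or runtime.)
func wordCopy(dst, src *[200]int) {
	for i := 0; i < len(src); i++ {
		dst[i] = src[i]
	}
}

// bulkCopy hands the whole range to the built-in copy, standing in
// for the proposed call to runtime.memmove.
// (Hypothetical helper, not part of the compiler or runtime.)
func bulkCopy(dst, src *[200]int) {
	copy(dst[:], src[:])
}

func main() {
	var a, b, c [200]int
	for i := range b {
		b[i] = i
	}
	wordCopy(&a, &b)
	bulkCopy(&c, &b)
	fmt.Println(a == b, c == b) // prints "true true"
}
```

Both functions produce the same result; the question in this issue is only which one the compiler should emit for a plain a = b.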


davecheney commented Apr 7, 2015

This benchmark shows the cliff when values pass the upper limit of DUFFCOPY:

http://paste.ubuntu.com/10762232/

root@labs-782e8a:~/src/duffbench# go test -bench=.                                                                                                                        
testing: warning: no tests to run
PASS
BenchmarkCopy1          300000000                3.89 ns/op
BenchmarkCopy4          50000000                40.0 ns/op
BenchmarkCopy16         20000000                77.5 ns/op
BenchmarkCopy32         20000000               139 ns/op
BenchmarkCopy64          5000000               255 ns/op
BenchmarkCopy128         3000000               538 ns/op
BenchmarkCopy129         3000000               745 ns/op   <<<<
BenchmarkCopy256         2000000              1088 ns/op
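
The benchmark source lives at the paste link and is not reproduced here. The following is only a guess at its general shape, assuming it assigns arrays on either side of the 128-word limit. All type, variable, and function names are hypothetical. It uses testing.Benchmark so it can run as an ordinary program.

```go
package main

import (
	"fmt"
	"testing"
)

// Array types straddling the 128-word DUFFCOPY limit.
// uintptr is used so that one element is one machine word.
type words128 [128]uintptr
type words129 [129]uintptr

// Package-level variables keep the compiler from discarding the copies.
var (
	src128, dst128 words128
	src129, dst129 words129
)

// copy128 and copy129 each perform the single assignment under test.
func copy128() { dst128 = src128 }
func copy129() { dst129 = src129 }

func benchCopy128(b *testing.B) {
	for i := 0; i < b.N; i++ {
		copy128()
	}
}

func benchCopy129(b *testing.B) {
	for i := 0; i < b.N; i++ {
		copy129()
	}
}

func main() {
	fmt.Println("Copy128", testing.Benchmark(benchCopy128))
	fmt.Println("Copy129", testing.Benchmark(benchCopy129))
}
```

Timings from this sketch will depend on the compiler version and architecture, and will not necessarily match the numbers above.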

josharian commented Apr 7, 2015

6g and 8g use REP with MOVSL/MOVSQ, which I believe @randall77 determined to be faster around that threshold. I would believe that the other architectures could benefit from a call to memmove or something similar. (This is a place where NEON should shine.)


rsc commented Apr 10, 2015

[Please don't use { } syntax in bug headings. It doesn't sort well.]

This may apply to some subset of the non-x86 systems.
The x86 systems are doing the right thing.

@rsc rsc changed the title from cmd/{5,6,7,8,9g}: assigning large values does not use memmove to cmd/gc: assigning large values does not use memmove Apr 10, 2015


minux commented Apr 10, 2015

@rsc rsc added this to the Unplanned milestone Apr 10, 2015


randall77 commented Apr 15, 2015

538->745 is hardly a "cliff". I'm surprised it is so close given the lack of anyone tuning this mechanism on arm. (Or did I miss someone doing that?)

minux is right, the moves generated here are sometimes used to marshal arguments to a function, so we can't call a function to do the marshaling. For other situations like your a=b example you could call memmove. It might take some work to distinguish those two cases, however. At the move generation point the marshaling has already been turned into a=b assignments.
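
At the source level, the two cases look roughly like this. It is a sketch for illustration only: the names big and sum are hypothetical, and the comments assume the stack-based calling convention described in this thread.

```go
package main

import "fmt"

// big is larger than the 128-word DUFFCOPY limit.
type big [200]int

// sum takes its argument by value, so the caller has to copy all
// 200 words into the outgoing argument area before the call.
func sum(v big) int {
	total := 0
	for _, x := range v {
		total += x
	}
	return total
}

func main() {
	var a, b big
	for i := range b {
		b[i] = 1
	}

	// Case 1: a plain assignment. Nothing else is in flight here,
	// so a call to memmove would be possible in principle.
	a = b

	// Case 2: argument marshaling. The copy of a into sum's
	// argument slot happens while the call is being set up, so
	// doing that copy with another function call is the problem.
	fmt.Println(sum(a)) // prints "200"
}
```

By the time moves are generated, both cases have the same a = b form, which is why telling them apart takes extra work.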

@rsc rsc changed the title from cmd/gc: assigning large values does not use memmove to cmd/compile: assigning large values does not use memmove Jun 8, 2015
