New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: some problems on optimization when takes care about of arrays in struct #20022

Open
zhaozhiqianghw opened this Issue Apr 18, 2017 · 7 comments

Comments

Projects
None yet
6 participants
@zhaozhiqianghw

zhaozhiqianghw commented Apr 18, 2017

As @ianlancetaylor and @ALTree mentioned, I changed the version from 1.7.4 to 1.8.1 to show the bug clearly.

What version of Go are you using (go version)?

go version go1.8.1 linux/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH=""
GORACE=""
GOROOT="/home/zhaozq/src/go"
GOTOOLDIR="/home/zhaozq/src/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build701526462=/tmp/go-build"
CXX="g++"
CGO_ENABLED="1"

What did you do?

I test several examples to test the performance of iterating an array. All of the version are compile by go build

version I

I put the array into a struct

package main

const NUM = 100

type T struct {
     arr [NUM]int
}

func sub(t *T) {
     for i := 0; i < NUM; i++ {
         t.arr[i] = 3
     }
}

func main() {
     var t T
     sub(&t)
}

The assemble of function sub is

000000000044d5b0 <main.sub>:
  44d5b0:       48 8b 44 24 08          mov    0x8(%rsp),%rax
  44d5b5:       31 c9                   xor    %ecx,%ecx
  44d5b7:       48 83 f9 64             cmp    $0x64,%rcx
  44d5bb:       7d 13                   jge    44d5d0 <main.sub+0x20>
  44d5bd:       84 00                   test   %al,(%rax)
  44d5bf:       48 c7 04 c8 03 00 00    movq   $0x3,(%rax,%rcx,8)
  44d5c6:       00
  44d5c7:       48 ff c1                inc    %rcx
  44d5ca:       48 83 f9 64             cmp    $0x64,%rcx
  44d5ce:       7c ed                   jl     44d5bd <main.sub+0xd>
  44d5d0:       c3                      retq

And the loop is

  44d5bd:       84 00                   test   %al,(%rax)
  44d5bf:       48 c7 04 c8 03 00 00    movq   $0x3,(%rax,%rcx,8)
  44d5c6:       00
  44d5c7:       48 ff c1                inc    %rcx
  44d5ca:       48 83 f9 64             cmp    $0x64,%rcx
  44d5ce:       7c ed                   jl     44d5bd <main.sub+0xd>

The line 44d5bdis quite strange. What's the usage of test here?? I think it's useless, and it will waste a cycle. We care about it, because we will use this kind of function a lot.

version II

We use a way to make the compiler do not what we are passing.

package main

import "unsafe"

const NUM = 100

type T struct {
     arr [NUM]int
}

func sub(p unsafe.Pointer) {
     for i := 0; i < NUM; i++ {
         *(*int)(p) = 3
         p = unsafe.Pointer(uintptr(p) + 8)
     }
}

func main() {
     var t T
     sub(unsafe.Pointer(&t))
}

The assembler of function sub

000000000044d5b0 <main.sub>:
  44d5b0:       48 8b 44 24 08          mov    0x8(%rsp),%rax
  44d5b5:       31 c9                   xor    %ecx,%ecx
  44d5b7:       48 83 f9 64             cmp    $0x64,%rcx
  44d5bb:       7d 14                   jge    44d5d1 <main.sub+0x21>
  44d5bd:       48 c7 00 03 00 00 00    movq   $0x3,(%rax)
  44d5c4:       48 ff c1                inc    %rcx
  44d5c7:       48 83 c0 08             add    $0x8,%rax
  44d5cb:       48 83 f9 64             cmp    $0x64,%rcx
  44d5cf:       7c ec                   jl     44d5bd <main.sub+0xd>
  44d5d1:       c3                      retq

The loop is

  44d5bd:       48 c7 00 03 00 00 00    movq   $0x3,(%rax)
  44d5c4:       48 ff c1                inc    %rcx
  44d5c7:       48 83 c0 08             add    $0x8,%rax
  44d5cb:       48 83 f9 64             cmp    $0x64,%rcx
  44d5cf:       7c ec                   jl     44d5bd <main.sub+0xd>

The number of operations in version II is the same as version I, but memory read times have been reduced, so this kind of version is faster.

version III

To reduce the numbers of operations in version II, I changed again.

package main

import "unsafe"

const NUM = 100

type T struct {
     arr [NUM]int
}

func sub(p unsafe.Pointer) {
     maxp := uintptr(p) + 8 * NUM
     for ; uintptr(p) < maxp; {
         *(*int)(p) = 3
         p = unsafe.Pointer(uintptr(p) + 8)
     }
}

func main() {
     var t T
     sub(unsafe.Pointer(&t))
}

The assemble of function sub

000000000044d5b0 <main.sub>:
  44d5b0:       48 8b 44 24 08          mov    0x8(%rsp),%rax
  44d5b5:       48 89 c1                mov    %rax,%rcx
  44d5b8:       48 81 c1 20 03 00 00    add    $0x320,%rcx
  44d5bf:       48 89 c2                mov    %rax,%rdx
  44d5c2:       48 39 ca                cmp    %rcx,%rdx
  44d5c5:       73 13                   jae    44d5da <main.sub+0x2a>
  44d5c7:       48 c7 00 03 00 00 00    movq   $0x3,(%rax)
  44d5ce:       48 83 c0 08             add    $0x8,%rax
  44d5d2:       48 89 c2                mov    %rax,%rdx
  44d5d5:       48 39 ca                cmp    %rcx,%rdx
  44d5d8:       72 ed                   jb     44d5c7 <main.sub+0x17>
  44d5da:       c3                      retq

The loop is

  44d5c7:       48 c7 00 03 00 00 00    movq   $0x3,(%rax)
  44d5ce:       48 83 c0 08             add    $0x8,%rax
  44d5d2:       48 89 c2                mov    %rax,%rdx
  44d5d5:       48 39 ca                cmp    %rcx,%rdx
  44d5d8:       72 ed                   jb     44d5c7 <main.sub+0x17>

The number of operations are not reduced. The reason is 44d5d2 and 44d5d2.

  44d5d2:       48 89 c2                mov    %rax,%rdx
  44d5d5:       48 39 ca                cmp    %rcx,%rdx```

Why not `cmp %rcx %rax`? Why introduce `%rdx`....?

### What did you expect to see?

Please focus version I and version III.
@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Apr 18, 2017

We don't use the issue tracker for discussions. If you just want to talk about this, please raise it on the mailing list golang-dev. If there is a specific bug here that you think should be addressed, please be precise: give one test case, and show why the output is sub-optimal. Right now I find this issue too complex--I don't know what is broken and what should be fixed. I also don't care about the performance of code compiled with -B. Thanks.

@ianlancetaylor ianlancetaylor changed the title from [performance] compile: some puzzles on optimization when takes care about of arrays in struct to cmd/compile: some puzzles on optimization when takes care about of arrays in struct Apr 18, 2017

@ALTree

This comment has been minimized.

Member

ALTree commented Apr 18, 2017

Also you should disassemble go1.8 code at least (even better: tip, the current master). Reports that point out deficiencies in the go1.7 compiler are not very useful because whoever process your report will have to re-check everything with tip just to understand if the issues you reported are still valid or not.

@zhaozhiqianghw

This comment has been minimized.

zhaozhiqianghw commented Apr 18, 2017

@ianlancetaylor Actually, I use -B to make the assemble easy to read. And the bug(I think it's a bug) is also there when remove -B. And I hope this can be fixed(by removing useless operations).

@zhaozhiqianghw

This comment has been minimized.

zhaozhiqianghw commented Apr 18, 2017

@ALTree Go 1.8 is the same as Go 1.7,4. I should put Go 1.8 instead of Go 1.7. But I really tried Go 1.8. This situation is always there.

@zhaozhiqianghw

This comment has been minimized.

zhaozhiqianghw commented Apr 18, 2017

@ianlancetaylor @ALTree Changed to 1.8.1, and show the problem more clearly.

@zhaozhiqianghw zhaozhiqianghw changed the title from cmd/compile: some puzzles on optimization when takes care about of arrays in struct to cmd/compile: some problems on optimization when takes care about of arrays in struct Apr 18, 2017

@randall77

This comment has been minimized.

Contributor

randall77 commented Apr 18, 2017

Both of these cases (II and IV) are still happening at tip.

In II, that extra instruction is a nil check. The compiler does not yet know that it can lift that nil check out of the loop. You could do something like

	_ = t.arr[0]

before the loop to lift that nil check out yourself.

In IV, the extra move is an unfortunate side-effect of how we track unsafe.Pointer <-> uintptr conversions. We are careful in the compiler to keep those two types separate to keep the pointer maps for the garbage collector accurate. This means we end up with an extra MOVQconvert that isn't necessary.
A rule like (CMPQ (MOVQconvert x) y) -> (CMPQ x y) would probably help.

@josharian

This comment has been minimized.

Contributor

josharian commented Aug 26, 2017

For IV (the extra move), see #21572.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment