Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: incomplete dead code elimination for conditional #33713

Open
marigonzes opened this issue Aug 19, 2019 · 13 comments

Comments

@marigonzes
Copy link

commented Aug 19, 2019

What version of Go are you using (go version)?

$ go version
go version devel +0dd120d Sun Aug 18 01:16:33 2019 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What did you do?

I compiled the following function and inspected the generated assembly (https://godbolt.org/z/q8MXUH):

func f(x string) int {
    var y int
    if x == "1" && len(x) == 2 {
        y = 1
    }

    return y
}

What did you expect to see?

I expected the generated code to be a simple return, as in this case: https://godbolt.org/z/o3uNbH.

        movq    $0, "".~r1+24(SP)
        ret

What did you see instead?

Instead, the generated code contains a redundant ´jump´ instruction:

        movq    "".x+16(SP), AX
        cmpq    AX, $1
        jne     f_pc11
f_pc11:
        pcdata  $1, $2
        xorps   X0, X0
        movups  X0, "".~r1+24(SP)
        ret

The same happens for this function https://godbolt.org/z/tDm4z2.

@martisch

This comment has been minimized.

Copy link
Member

commented Aug 19, 2019

Is this a manually constructed example or does this come up often in Go programs? Was it in hand written or generated code?

It will be very slow up to impossible to find all boolean tautologies. So there likely has to be some tradeoff how much time is spend in the compiler on finding these vs just leaving them as is. Which brings the question how often this comes up in what forms in non crafted examples and where.

Relevant discussion: #32713

@marigonzes

This comment has been minimized.

Copy link
Author

commented Aug 19, 2019

I probably didn't express my intent in a clear manner. I'm not saying these kinds of constructs should be treated as tautologies, but I happened to notice that this was happening in this case https://godbolt.org/z/o3uNbH.
This lead me to experiment with the compiler, ending up finding https://godbolt.org/z/q8MXUH and https://godbolt.org/z/tDm4z2, where the problem is not about boolean tautologies, but related to many redundant instructions being generated, given that the compiler knows the return value is always 0.

@martisch

This comment has been minimized.

Copy link
Member

commented Aug 19, 2019

As the patterns that could be optimised can vary widely and we should make sure we detect the ones (without to much overhead) that help with real world code (which ideally we use simplified as test cases) Do you know examples of similar patterns not optimised in existing codebases?

If I understand the code and reply correctly the deeper issue here is that the compiler misses some dead code elimination of code that has no effect on the return value and no side effects?

@marigonzes

This comment has been minimized.

Copy link
Author

commented Aug 19, 2019

I'm not really a Go developer, I just like the language and enjoy experimenting with the compiler. That said, I don't have any concrete example of this issue (apart from these ones), but I believe this is a shortcoming of dead code elimination, given that for https://godbolt.org/z/q8MXUH and https://godbolt.org/z/tDm4z2 the compiler produces some unnecessary instructions that end up having no effect in the outcome of the function (the return value is always 0).

@martisch

This comment has been minimized.

Copy link
Member

commented Aug 19, 2019

@martisch martisch changed the title cmd/compile: redundant jump for tautological string conditional cmd/compile: incomplete dead code elimination for conditional Aug 19, 2019

@marigonzes

This comment has been minimized.

Copy link
Author

commented Aug 19, 2019

Yeah, that is what I meant all along. Thanks for your help in renaming the issue accordingly, @martisch 😉

@randall77

This comment has been minimized.

Copy link
Contributor

commented Aug 19, 2019

I don't think we want to do anything explicit for cases like this. Unless someone can demonstrate that this occurs in real code.
But if this gets fixed as a side effect of some other improvement to optimizations (the prove pass, probably), then that would be fine.

@laboger

This comment has been minimized.

Copy link
Contributor

commented Aug 19, 2019

I believe this is a similar example. It comes from bytes.(*Buffer).WriteByte.

TEXT bytes.(*Buffer).WriteByte(SB) /home/boger/golang/fresh/go/src/bytes/buffer.go
func (b *Buffer) WriteByte(c byte) error {
  0xc3130               e87e0010                MOVD 16(R30),R3                         
  0xc3134               7c230840                CMPU R3,R1                              
  0xc3138               41800010                BLT 0xc3148                             
  0xc313c               7ca802a6                MOVD LR,R5                              
  0xc3140               4bfa5e41                CALL runtime.morestack_noctxt(SB)       
  0xc3144               4bffffec                BR bytes.(*Buffer).WriteByte(SB)        
  0xc3148               7fe802a6                MOVD LR,R31                             
  0xc314c               fbe1ffc9                MOVDU R31,-56(R1)                       
        b.lastRead = opInvalid
  0xc3150               e8c10058                MOVD 88(R1),R6          
  0xc3154               98060020                MOVB R0,32(R6)          
        if l := len(b.buf); n <= cap(b.buf)-l {
  0xc3158               e8e60008                MOVD 8(R6),R7           
  0xc315c               e8a60010                MOVD 16(R6),R5          
  0xc3160               7d072850                SUB R7,R5,R8            
  0xc3164               2c280001                CMP R8,$1               
  0xc3168               4180006c                BLT 0xc31d4             

This should be BLT 0xc31b8, since 0xc31d4 just sets a constant and then
branches to a constant compare of that value. The other constant value
set in this block is not used on the path it actually takes.

                b.buf = b.buf[:l+n]
  0xc316c               38870001                ADD R7,$1,R4            
  0xc3170               7c242840                CMPU R4,R5              
  0xc3174               41810074                BGT 0xc31e8             
  0xc3178               f8860008                MOVD R4,8(R6)           

Since a predecessor of 0xc3180 can be eliminated, the CMPW only has 1 predecessor which
is setting a constant. As a result the next 3 instructions could be eliminated.

  0xc317c               38600001                MOVD $1,R3              
        m, ok := b.tryGrowByReslice(1)
  0xc3180               7c030000                CMPW R3,R0              
        if !ok {
  0xc3184               41820034                BEQ 0xc31b8             
        b.buf[m] = c

  0xc3188               e8860008                MOVD 8(R6),R4           
  0xc318c               e8a60000                MOVD 0(R6),R5           
  0xc3190               7c272040                CMPU R7,R4              
  0xc3194               4080004c                BGE 0xc31e0             
  0xc3198               88610060                MOVBZ 96(R1),R3         
  0xc319c               7c6729ae                MOVB R3,(R5)(R7)        
        return nil
  0xc31a0               f8010068                MOVD R0,104(R1)         
  0xc31a4               f8010070                MOVD R0,112(R1)         
  0xc31a8               ebe10000                MOVD 0(R1),R31          
  0xc31ac               7fe803a6                MOVD R31,LR             
  0xc31b0               38210038                ADD R1,$56,R1           
  0xc31b4               4e800020                RET                     
                m = b.grow(1)
  0xc31b8               f8c10020                MOVD R6,32(R1)                  
  0xc31bc               38600001                MOVD $1,R3                      
  0xc31c0               f8610028                MOVD R3,40(R1)                  
  0xc31c4               4bfff65d                CALL bytes.(*Buffer).grow(SB)   
  0xc31c8               e8e10030                MOVD 48(R1),R7                  
         b.buf[m] = c
  0xc31cc               e8c10058                MOVD 88(R1),R6          
                m = b.grow(1)
  0xc31d0               4bffffb8                BR 0xc3188              

 This block could go away

  0xc31d4               7c070378                OR R0,R0,R7             
  0xc31d8               7c030378                OR R0,R0,R3             
        m, ok := b.tryGrowByReslice(1)
  0xc31dc               4bffffa4                BR 0xc3180              

        b.buf[m] = c
  0xc31e0               7ce33b78                OR R7,R7,R3                     
  0xc31e4               4bfa828d                CALL runtime.panicIndex(SB)     
                b.buf = b.buf[:l+n]
  0xc31e8               4bfa82c9                CALL runtime.panicSliceAcap(SB) 
  0xc31ec               7c000378                OR R0,R0,R0                     

I don't know enough about passes to know where this type of optimization might be done. This is from an objdump on ppc64le, but the x86 objdump looks similar with respect the unnecessary branches and compares I mentioned.

@randall77

This comment has been minimized.

Copy link
Contributor

commented Aug 19, 2019

Lynn's example feels like the shortcircuit pass should handle it. It optimizes things of the form:

b1:
  x = true
  goto b2

b2:
   v = phi(x, ...other values from other predecessors...)
   if v: goto b3 else b4

shortcircuit will make b1 go directly to b3. It might be worth investigating why shortcircuit doesn't apply, and what might need to be upgraded to make it apply.

Maybe the merged thing is not a boolean, but an int constant?

b1:
   x = 0
   goto b2
b2:
   v = phi(x, ...)
   w = cmp(v, 0)
   if w goto b3 else b4

We could also make b1 go directly to b3 in this situation.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Aug 19, 2019

I have a shortcircuit CL outstanding. Perhaps it helps.

@laboger

This comment has been minimized.

Copy link
Contributor

commented Aug 21, 2019

Is this the one you mean @josharian https://go-review.googlesource.com/c/go/+/178197?

I can give it a try. This happens in some std pkgs, I've seen it in runtime and strconv so far.

@laboger

This comment has been minimized.

Copy link
Contributor

commented Aug 21, 2019

The problem still occurs with this CL.
Another function with a similar problem is in strconv.(*decimal).Round. It also has a block with a single predecessor and successor that sets the constant that is being compare against a constant. But what caught my eye in this function was this sequence:

v31
00045 (344) CMPW	R4, $0
v85
00046 (344) MOVD	$1, R31
v85
00047 (344) ISEL	$2, R0, R31, R4
v62
00048 (+358) CMPW	R4, $0
b12
00049 (358) BNE	30

By itself, there should be no need for the ISEL and the CMP following it. However, the 2nd CMP has a phi operand, and the other phi operands are constant settings that should be short circuited.

This happens on ppc64le as well as AMD64 AFAICT.

I can post more detail, such as an objdump as above or more ssa output if helpful.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Aug 22, 2019

The problem still occurs with this CL.

Thanks for checking. I'm mostly AFK this cycle, but I will try to take a look at this once my CL above lands, since I've been wading through shortcircuit recently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.