New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: BCE/Prove do not take arithmetic into account #25197

Open
navytux opened this Issue May 1, 2018 · 7 comments

Comments

Projects
None yet
5 participants
@navytux
Contributor

navytux commented May 1, 2018

This is follow-up to #19714 where prove/BCE were taught to take transitivity into account, but originally-motivating example where indices are constructed as arithmetic expressions remains with bounds checks not eliminated:

package hex 

func Encode(dst, src []byte) {
        if len(dst) < len(src) * 2 { 
              panic("dst overflow")
        }

        for i, _ := range src {
                dst[i*2] = 0         // XXX BC not eliminated
                dst[i*2+1] = 0       // XXX BC not eliminated
        }   
}
$ gotip version
go version devel +030ac2c719 Tue May 1 05:02:43 2018 +0000 linux/amd64
$ gotip tool compile -S 3.go |go2asm
TEXT ·Encode(SB), $24-48 // 3.go:3
        // MOVQ    (TLS), CX (stack growth prologue)
        // CMPQ    SP, 16(CX)
        // JLS     158
        // SUBQ    $24, SP
        // MOVQ    BP, 16(SP) (BP save)
        // LEAQ    16(SP), BP (BP init)
        // FUNCDATA $0, gclocals·7578f313ff9d15b1ec5bd5c7e7ab3d8c(SB) (args)
        FUNCDATA   $1, gclocals·69c1753bd5f81501d95132d08af04464(SB) (locals)
        MOVQ       src+32(FP), AX       // 3.go:4
        MOVQ       AX, CX
        SHLQ       $1, AX
        MOVQ       dst+8(FP), DX
        CMPQ       DX, AX
        JLT        pc128
        MOVQ       dst+0(FP), AX
        XORL       BX, BX
        JMP        pc72                 // 3.go:8
pc63:
        MOVB       $0, 1(BX)(AX*1)      // 3.go:10
        LEAQ       1(SI), BX            // 3.go:8
pc72:
        CMPQ       BX, CX
        JGE        pc104
        MOVQ       BX, SI
        SHLQ       $1, BX               // 3.go:9
        CMPQ       BX, DX
        JCC        pc121                                // <-- NOTE
        MOVB       $0, (AX)(BX*1)
        LEAQ       1(SI)(SI*1), DI      // 3.go:10
        CMPQ       DI, DX
        JCS        pc63
        JMP        pc114                                // <-- NOTE
pc114:
        // PCDATA  $0, $1 (stack growth)
        // CALL    runtime.panicindex(SB)               // <-- NOTE
        // UNDEF
pc121:
        // PCDATA  $0, $1               // 3.go:9
        // CALL    runtime.panicindex(SB)               // <-- NOTE
        // UNDEF
pc128:
        // LEAQ    type.string(SB), AX  // 3.go:5
        // MOVQ    AX, (SP)
        // LEAQ    ·statictmp_0(SB), AX
        // MOVQ    AX, 8(SP)
        // PCDATA  $0, $1
        // CALL    runtime.gopanic(SB)
        // UNDEF
        // NOP
        // PCDATA  $0, $-1              // 3.go:3
        // CALL    runtime.morestack_noctxt(SB)
        // JMP     0

/cc @rasky, @josharian

@rasky

This comment has been minimized.

Member

rasky commented May 1, 2018

FWIW, this is going to be very hard and surely impossible to generalize within the context of the current prove pass.

@martisch

This comment has been minimized.

Member

martisch commented May 1, 2018

I do not think its safe to remove the bounds checks in the example given if e.g. len(src) is Int max. If i remember correctly wrap around of arithmetic operations is defined in go and the compiler can not assume they do not overflow. This could lead to dst being accessed with a negative index in the example.

@rasky

This comment has been minimized.

Member

rasky commented May 1, 2018

Yes, part of the complexity is making sure there are no overflows.

It's questionable that len(src)*2 can overflow for instance, it would mean that the slice has at least 2**63 elements, which is theoretically feasible only for one-byte elements (and I'm already assuming it's a mmap-ed virtual area, not something really returned by an allocation). I've had a short discussion with @dr2chase and @aclements about what could be assumed as a maximum length for a slice, but their suggestion was to avoid creating dependency on this in optimizers.

Anyway, an alternative would be to change the check in len(dst) / 2 < len(src) which has no overflows. So I'm not worried about this; I'm more worried about tracking values depending on induction vars, which is not something that prove is structured to do.

@martisch

This comment has been minimized.

Member

martisch commented May 1, 2018

On 32bit archs it only needs 1<<30+1 byte elements to overflow and the following program can be run:

GOARCH=386 go run main.go

package main

//go:noinline
func test(s []byte) {
	print("2*len is: ", len(s)*2, "\n") // 2*len is :-2147483646
}

func main() {
	s := make([]byte, 1<<30+1)
	test(s)
}

Only wanted to note that proving with arithmetic involved in go is more complex in any prove implementation than when it can be assumed that overflow is undefined behavior and that for the given example it is not safe (without further constraints).

@andybons andybons added this to the Unplanned milestone May 1, 2018

@navytux

This comment has been minimized.

Contributor

navytux commented May 2, 2018

@martisch, @rasky, indeed the pre-check as presented in my original example is not 100% correct because of potential len(src) * 2 overflow, ans as Giovanni says it has to be changed to the following for correctness:

package hex 

func Encode(dst, src []byte) {
        if len(dst) / 2 < len(src) { 
              panic("dst overflow")
        }

        for i, _ := range src {
                dst[i*2] = 0         // XXX BC not eliminated
                dst[i*2+1] = 0       // XXX BC not eliminated
        }   
}

my bad, thanks for spotting this.


About how to make deductions in such case, if I recall correctly, rational expressions has to be introduced, then we have rules:

  1. len(src) ∈ [0, maxint]
  2. len(dst) ∈ [0, maxint]
  3. len(dst) / 2 ≥ len(src)
  4. i ≥ 0
  5. ilen(src) - 1

=>

  • 2⋅i ∈ [0, 2⋅len(src) - 2]
  • 2⋅i + 1 ∈ [1, 2⋅len(src) - 1]

and the question is whether those expressions are within [0, len(dst) - 1] range.

Because we have connectivity between len(dst) and len(src) (rule 3), index expressions can be transformed to the following:

  • 2⋅i ∈ [0, 2⋅⌊len(dst) / 2⌋ - 2]
  • 2⋅i + 1 ∈ [1, 2⋅⌊len(dst) / 2⌋ - 1]

then we can omit the flooring in the upper limit because flooring makes expression only potentially a bit less, and by omiting it we can make the upper limit only a bit more which would still hold true, and then 2*/2 cancels:

  • 2⋅i ∈ [0, len(dst) - 2]
  • 2⋅i + 1 ∈ [1, len(dst) - 1]

which shows that both expressions are withing correct range for dst indexing.

@aclements

This comment has been minimized.

Member

aclements commented May 3, 2018

@navytux, that's true, but bounds-check elimination needs to strike a balance between coverage and performance. We could put a full SMT engine in there and it could figure out amazing things, but I assert [1] that it wouldn't get substantially better coverage of most code. It would also substantially slow down the compiler while mostly spending that build time on code that's not performance critical (simply because most code is not) [2].

[1] It would be interesting to actually verify this. You could imagine wiring together https://godoc.org/golang.org/x/tools/go/ssa with, say, https://godoc.org/github.com/aclements/go-z3 to get a sense of what's possible.

[2] A possible far-off but intriguing idea is to use profile-guided optimization not just to direct the compiler's optimization decisions, but to direct where the compiler focuses more expensive optimizations.

@navytux

This comment has been minimized.

Contributor

navytux commented May 3, 2018

@aclements thanks for feedback and for Z3 links. I agree there should be balance between compilation time and optimizations applied in order not to loose the fast edit-try property of Go. However currently whenever there is a need to process data slices with indices non-trivially, bounds checking kills the performance. An example of this is Fannkuch11 from Go1 benchmark set, where it is known that by making BCE more clever whole execution time could be reduced by ~ 30% (please see #24660 for details).

The idea to apply optimizations based on runtime feedback on where it is mostly needed is interesting.
Two points arise here:

  1. how far we are practically from doing this? and

  2. there are cases that don't have code locality, but still optimizations for them make overal sense - for example if we are talking about changing function prologue, there is no code locality in particular program that clearly shows that the time is spent in such and such place, but there is code locality in the compiler which emits generated prolugues and optimizing that would benefit the whole program. However runtime profile-guided feedback won't, at least directly, show that.

    I tend to think that bounds checking could be of similar nature, at least for some class of programs.

So maybe not a full SMT but something more simple would be still good to have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment