Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: inlining budget excludes cross-package calls #19261

Closed
bamiaux opened this issue Feb 23, 2017 · 12 comments

Comments

Projects
None yet
5 participants
@bamiaux
Copy link

commented Feb 23, 2017

Please answer these questions before submitting your issue. Thanks!

What did you do?

From https://github.com/bamiaux/iobit
I copied binary.BigEndian.Uint64 implementation, renamed it load64, and replaced the single call in Reader.get64 with the load64 function

What did you expect to see?

I expected no performance changes

What did you see instead?

Severe performance loss, every function using get64 is now not inlined while the code is strictly the same

benchmark                       old ns/op     new ns/op     delta
BenchmarkReads/byte-4           475           1009          +112.42%
BenchmarkReads/le16-4           260           582           +123.85%
BenchmarkReads/be16-4           254           504           +98.43%
BenchmarkReads/le32-4           173           329           +90.17%
BenchmarkReads/be32-4           109           235           +115.60%
BenchmarkReads/le64-4           128           265           +107.03%
BenchmarkReads/be64-4           103           251           +143.69%
BenchmarkReads/u8_7bits-4       533           1158          +117.26%
BenchmarkReads/i8_7bits-4       531           1207          +127.31%
BenchmarkReads/u16_15bits-4     253           586           +131.62%
BenchmarkReads/i16_15bits-4     274           586           +113.87%
BenchmarkReads/u32_31bits-4     128           311           +142.97%
BenchmarkReads/i32_31bits-4     126           316           +150.79%
BenchmarkReads/u64_63bits-4     157           264           +68.15%
BenchmarkReads/i64_63bits-4     157           270           +71.97%

Using go tool compile -m show that go now consider the get64 function not inlinable anymore
But the function from encoding.binary is inlinable

System details

go version go1.8 windows/amd64
GOARCH="amd64"
GOBIN=""
GOEXE=".exe"
GOHOSTARCH="amd64"
GOHOSTOS="windows"
GOOS="windows"
GOPATH="c:\dev\gopath"
GORACE=""
GOROOT="C:\apps\go-1.8"
GOTOOLDIR="C:\apps\go-1.8\pkg\tool\windows_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-m64 -mthreads -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
GOROOT/bin/go version: go version go1.8 windows/amd64
GOROOT/bin/go tool compile -V: compile version go1.8 X:framepointer
@ALTree

This comment has been minimized.

Copy link
Member

commented Feb 23, 2017

Can you post a selfcontained reproducer? I see no load64 function in github.com/bamiaux/iobit

Also if you compile with -m -m instead of a single -m it should say why your function is not inlineable.

@bamiaux

This comment has been minimized.

Copy link
Author

commented Feb 23, 2017

Sure, I will create a branch with the modification applied

@ALTree ALTree changed the title cmd/inline: weird inlining heuristic cmd/compile: weird inlining heuristic Feb 23, 2017

@bamiaux

This comment has been minimized.

Copy link
Author

commented Feb 23, 2017

With the master branch using binary.BigEndian.Uint64, I get

reader.go:99: can inline (*Reader).get64 as: method(*Reader) func(uint) uint64 { skip := min(r.idx >> 5 << 2, r.max); val := binary.BigEndian.Uint64(r.src[skip:]); val <<= r.idx - skip << 3; r.idx += bits; return val }
reader.go:100: inlining call to min func(uint, uint) uint { if a > b { return b }; return a }
reader.go:101: inlining call to binary.binary.bigEndian.Uint64 method(binary.bigEndian) func([]byte) uint64 { _ = binary.b·2[int(7)]; return uint64(binary.b·2[int(7)]) | uint64(binary.b·2[int(6)]) << uint(8) | uint64(binary.b·2[int(5)]) << uint(16) | uint64(binary.b·2[int(4)]) << uint(24) | uint64(binary.b·2[int(3)]) << uint(32) | uint64(binary.b·2[int(2)]) << uint(40) | uint64(binary.b·2[int(1)]) << uint(48) | uint64(binary.b·2[int(0)]) << uint(56) }
reader.go:107: can inline (*Reader).read32 as: method(*Reader) func(uint) uint64 { return r.get64(bits) >> (64 - bits) }

By copying this function locally into load64, I get

reader.go:87: inlining call to min func(uint, uint) uint { if a > b { return b }; return a }
reader.go:95: can inline load64 as: func([]byte) uint64 { _ = b[7]; return uint64(b[7]) | uint64(b[6]) << 8 | uint64(b[5]) << 16 | uint64(b[4]) << 24 | uint64(b[3]) << 32 | uint64(b[2]) << 40 | uint64(b[1]) << 48 | uint64(b[0]) << 56 }
reader.go:102: inlining call to min func(uint, uint) uint { if a > b { return b }; return a }
reader.go:103: inlining call to load64 func([]byte) uint64 { _ = b[7]; return uint64(b[7]) | uint64(b[6]) << 8 | uint64(b[5]) << 16 | uint64(b[4]) << 24 | uint64(b[3]) << 32 | uint64(b[2]) << 40 | uint64(b[1]) << 48 | uint64(b[0]) << 56 }
reader.go:109: cannot inline (*Reader).read32: non-leaf method

Note that the tool does not report anymore if get64 is inlined or not, is it lost ?

@josharian

This comment has been minimized.

Copy link
Contributor

commented Feb 23, 2017

There was a missing diagnostic message for inlining with -m -m in Go 1.8. Tip should do a better job explaining why get64 was not inlined. At first glance (on my phone), this does look funky, though.

@bamiaux

This comment has been minimized.

Copy link
Author

commented Feb 23, 2017

On tip with binary.BigEndian.Uint64

reader.go:99: can inline (*Reader).get64 as: method(*Reader) func(uint) uint64 { skip := min(r.idx >> 5 << 2, r.max); val := binary.BigEndian.Uint64(r.src[skip:]); val <<= r.idx - skip << 3; r.idx += bits; return val }
reader.go:100: inlining call to min func(uint, uint) uint { if a > b { return b }; return a }
reader.go:101: inlining call to binary.binary.bigEndian.Uint64 method(binary.bigEndian) func([]byte) uint64 { _ = binary.b·2[int(7)]; return uint64(binary.b·2[int(7)]) | uint64(binary.b·2[int(6)]) << uint(8) | uint64(binary.b·2[int(5)]) << uint(16) | uint64(binary.b·2[int(4)]) << uint(24) | uint64(binary.b·2[int(3)]) << uint(32) | uint64(binary.b·2[int(2)]) << uint(40) | uint64(binary.b·2[int(1)]) << uint(48) | uint64(binary.b·2[int(0)]) << uint(56) }
reader.go:107: can inline (*Reader).read32 as: method(*Reader) func(uint) uint64 { return r.get64(bits) >> (64 - bits) }

On tip with load64

reader.go:101: cannot inline (*Reader).get64: function too complex
reader.go:102: inlining call to min func(uint, uint) uint { if a > b { return b }; return a }
reader.go:103: inlining call to load64 func([]byte) uint64 { _ = b[7]; return uint64(b[7]) | uint64(b[6]) << 8 | uint64(b[5]) << 16 | uint64(b[4]) << 24 | uint64(b[3]) << 32 | uint64(b[2]) << 40 | uint64(b[1]) << 48 | uint64(b[0]) << 56 }
reader.go:109: cannot inline (*Reader).read32: non-leaf method
@ALTree

This comment has been minimized.

Copy link
Member

commented Feb 23, 2017

Here's a smaller reproducer:

package p

import "encoding/binary"

type Reader struct {
	src []byte
	idx uint
}

// load64 is an exact copy of binary.BigEndian.Uint64, except for the fact that
// binary.BigEndian.Uint64 is a method:
//   func (bigEndian) Uint64(b []byte) uint64
// while load64 is not.
func load64(b []byte) uint64 {
	_ = b[7]
	return uint64(b[7]) | uint64(b[6])<<8 | uint64(b[5])<<16 | uint64(b[4])<<24 |
		uint64(b[3])<<32 | uint64(b[2])<<40 | uint64(b[1])<<48 | uint64(b[0])<<56
}

func (r *Reader) get64(bits uint) uint64 {
	var skip uint
	val := load64(r.src[:])
	val <<= r.idx - skip<<3
	return val
}

func (r *Reader) get64_2(bits uint) uint64 {
	var skip uint
	val := binary.BigEndian.Uint64(r.src[:])
	val <<= r.idx - skip<<3
	return val
}

gotip build -gcflags="-m -m" test.go reports that get64 cannot be inlined, while get64_2 can:

./prova.go:10: can inline load64 as: func([]byte) uint64 { _ = b[7]; return uint64(b[7]) | uint64(b[6]) << 8 | uint64(b[5]) << 16 | uint64(b[4]) << 24 | uint64(b[3]) << 32 | uint64(b[2]) << 40 | uint64(b[1]) << 48 | uint64(b[0]) << 56 }
./prova.go:16: cannot inline (*Reader).get64: function too complex
./prova.go:18: inlining call to load64 func([]byte) uint64 { _ = b[7]; return uint64(b[7]) | uint64(b[6]) << 8 | uint64(b[5]) << 16 | uint64(b[4]) << 24 | uint64(b[3]) << 32 | uint64(b[2]) << 40 | uint64(b[1]) << 48 | uint64(b[0]) << 56 }
./prova.go:23: can inline (*Reader).get64_2 as: method(*Reader) func(uint) uint64 { var skip uint; skip = <N>; val := binary.BigEndian.Uint64(r.src[:]); val <<= r.idx - skip << 3; return val }
./prova.go:25: inlining call to binary.binary.bigEndian.Uint64 method(binary.bigEndian) func([]byte) uint64 { _ = binary.b·2[int(7)]; return uint64(binary.b·2[int(7)]) | uint64(binary.b·2[int(6)]) << uint(8) | uint64(binary.b·2[int(5)]) << uint(16) | uint64(binary.b·2[int(4)]) << uint(24) | uint64(binary.b·2[int(3)]) << uint(32) | uint64(binary.b·2[int(2)]) << uint(40) | uint64(binary.b·2[int(1)]) << uint(48) | uint64(binary.b·2[int(0)]) << uint(56) }
./prova.go:10: load64 b does not escape
./prova.go:16: (*Reader).get64 r does not escape
./prova.go:23: (*Reader).get64_2 r does not escape
@josharian

This comment has been minimized.

Copy link
Contributor

commented Feb 23, 2017

Thanks! Will look. May take a day or three.

@ALTree

This comment has been minimized.

Copy link
Member

commented Feb 23, 2017

Looks like for some reason the version that uses binary.BigEndian has complexity < 80, while the other one has complexity 83. If we change const MaxBudget in inl.go to 83, both are inlined.

The only difference is that binary.BigEndian.Uint64 is a method, while load64 is a function. Maybe a bug in how we subtract inlining budget points when handling methods(?).

Leaving for josharian.

@ALTree ALTree removed the WaitingForInfo label Feb 23, 2017

@bamiaux

This comment has been minimized.

Copy link
Author

commented Feb 23, 2017

It's more complex, in the following code, it's also a method and it doesn't work either

package p

import "encoding/binary"

type Reader struct {
	src []byte
	idx uint
}

var BigEndian bigEndian

type bigEndian struct{}

// load64 is an exact local copy of binary.BigEndian.Uint64
func (bigEndian) load64(b []byte) uint64 {
	_ = b[7]
	return uint64(b[7]) | uint64(b[6])<<8 | uint64(b[5])<<16 | uint64(b[4])<<24 |
		uint64(b[3])<<32 | uint64(b[2])<<40 | uint64(b[1])<<48 | uint64(b[0])<<56
}

func (r *Reader) get64(bits uint) uint64 {
	var skip uint
	val := BigEndian.load64(r.src[:])
	val <<= r.idx - skip<<3
	return val
}

func (r *Reader) get64_2(bits uint) uint64 {
	var skip uint
	val := binary.BigEndian.Uint64(r.src[:])
	val <<= r.idx - skip<<3
	return val
}
.\test.go:18: can inline bigEndian.load64 as: method(bigEndian) func([]byte) uint64 { _ = b[7]; return uint64(b[7]) | uint64(b[6]) << 8 | uint64(b[5]) << 16 | uint64(b[4]) << 24 | uint64(b[3]) << 32 | uint64(b[2]) << 40 | uint64(b[1]) << 48 | uint64(b[0]) << 56 }
.\test.go:24: cannot inline (*Reader).get64: function too complex
.\test.go:26: inlining call to bigEndian.load64 method(bigEndian) func([]byte) uint64 { _ = b[7]; return uint64(b[7]) | uint64(b[6]) << 8 | uint64(b[5]) << 16 | uint64(b[4]) << 24 | uint64(b[3]) << 32 | uint64(b[2]) << 40 | uint64(b[1]) << 48 | uint64(b[0]) << 56 }
.\test.go:31: can inline (*Reader).get64_2 as: method(*Reader) func(uint) uint64 { var skip uint; skip = <N>; val := binary.BigEndian.Uint64(r.src[:]); val <<= r.idx - skip << 3; return val }
.\test.go:33: inlining call to binary.binary.bigEndian.Uint64 method(binary.bigEndian) func([]byte) uint64 { _ = binary.b·2[int(7)]; return uint64(binary.b·2[int(7)]) | uint64(binary.b·2[int(6)]) << uint(8) | uint64(binary.b·2[int(5)]) << uint(16) | uint64(binary.b·2[int(4)]) << uint(24) | uint64(binary.b·2[int(3)]) << uint(32) | uint64(binary.b·2[int(2)]) << uint(40) | uint64(binary.b·2[int(1)]) << uint(48) | uint64(binary.b·2[int(0)]) << uint(56) }
.\test.go:18: bigEndian.load64 b does not escape
.\test.go:24: (*Reader).get64 r does not escape
.\test.go:31: (*Reader).get64_2 r does not escape
<autogenerated>:1: inlining call to bigEndian.load64 method(bigEndian) func([]byte) uint64 { _ = b[7]; return uint64(b[7]) | uint64(b[6]) << 8 | uint64(b[5]) << 16 | uint64(b[4]) << 24 | uint64(b[3]) << 32 | uint64(b[2]) << 40 | uint64(b[1]) << 48 | uint64(b[0]) << 56 }
<autogenerated>:1: (*bigEndian).load64 .this does not escape
<autogenerated>:1: (*bigEndian).load64 b does not escape
@josharian

This comment has been minimized.

Copy link
Contributor

commented Feb 25, 2017

The cost being applied for the call of binary.bigEndian.Uint64 is currently 0, as opposed to the (correct) cost being applied of the call of load64, which is 59. Since the same thing does not happen for a method in the same package, I suspect that the problem is the importer/exporter.

The relevant bit of code from inl.go:

	case OCALLMETH:
		t := n.Left.Type
		if t == nil {
			Fatalf("no function type for [%p] %+v\n", n.Left, n.Left)
		}
		if t.Nname() == nil {
			Fatalf("no function definition for [%p] %+v\n", t, t)
		}
		if inlfn := t.Nname().Func; inlfn.Inl.Len() != 0 {
			*budget -= inlfn.InlCost
			break
		}
		if Debug['l'] < 4 {
			*reason = "non-leaf method"
			return true
		}

In this case, inlfn.InlCost is incorrectly 0.

And some lines above, we find this:

	// hack, TODO, check for better way to link method nodes back to the thing with the ->inl
	// this is so export can find the body of a method
	fn.Type.SetNname(n)

I suspect that the exporter fails to record the InlCost attached in this roundabout way, or that the importer fails to restore it.

I don't know whether the fix is to make the importer/exporter fully handle this hack, or to remove the hack.

Over to you, @mdempsky or @griesemer.

@mdempsky

This comment has been minimized.

Copy link
Member

commented Oct 10, 2017

I don't think this issue is specific to methods. It's that we fail to set InlCost for imported functions/methods at all.

@ALTree ALTree changed the title cmd/compile: weird inlining heuristic cmd/compile: inlining budget excludes cross-package calls Oct 10, 2017

@gopherbot

This comment has been minimized.

Copy link

commented Oct 11, 2017

Change https://golang.org/cl/70151 mentions this issue: cmd/compile: record InlCost in export data

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.