Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: binary.PutUint64 performance degradation in 1.10 #24078

Closed
pascaldekloe opened this issue Feb 23, 2018 · 14 comments

Comments

Projects
None yet
6 participants
@pascaldekloe
Copy link
Contributor

commented Feb 23, 2018

Two little highly-optimised functions suffer from serious performance when compared to Go version 1.9.

name          old time/op    new time/op    delta
PutUint64-12    4.09ns ± 0%    5.79ns ± 0%  +41.62%  (p=0.000 n=20+19)
Uint64-12       4.24ns ± 0%    6.54ns ± 0%  +54.35%  (p=0.000 n=20+19)

The assembler output from go build -gcflags=-S is exactly the same so I don't know where to look for. If anybody can give me some pointers I'd be happy to build a clean test case.

Here follows a snippet from https://github.com/pascaldekloe/flit for context.

// PutUint64 encodes an integer into buf and returns the serial size.
// If the buffer is smaller than 9 bytes, PutUint64 may panic.
func PutUint64(buf []byte, v uint64) (n int) {
	if v < 128 {
		buf[0] = uint8(v)<<1 | 1
		return 1
	}
	if v >= uint64(1) << 56 {
		buf[0] = 0
		binary.LittleEndian.PutUint64(buf[1:], v)
		return 9
	}

	lz := bits.LeadingZeros64(v)
	// extra bytes = (bits - 1) / 7 = (63 - lz) / 7
	e := ((63 - lz) * 2454267027) >> 34

	v <<= 1
	v |= 1
	v <<= uint(e) & 63
	binary.LittleEndian.PutUint64(buf, v)

	return e + 1
}
@bcmills

This comment has been minimized.

Copy link
Member

commented Feb 23, 2018

The assembler output from go build -gcflags=-S is exactly the same so I don't know where to look for.

Last time I saw a performance issue like that, it turned out that the performance of the CPU was sensitive to the alignment of the function in memory.

That said, this function seems small enough that it is likely to be inlined. Do you have a larger benchmark that shows the same regressions, or do they only occur in microbenchmarks? (That is, what is the impact of the regression on the overall program, rather than just the benchmark?)

@mvdan

This comment has been minimized.

Copy link
Member

commented Feb 23, 2018

git diff go1.9..go1.10 -- src/encoding/binary shows no changes to the implementation, so this is likely a compiler issue if there is an issue at all.

Perhaps the function used to be inlined, but isn't being inlined anymore?

@ALTree

This comment has been minimized.

Copy link
Member

commented Feb 23, 2018

The assembler output from go build -gcflags=-S is exactly the same

This is not completely true. The 1.10 function is slightly larger.

1.9:

"".PutUint64 STEXT nosplit size=269 args=0x28 locals=0x8

1.10 (note size=):

"".PutUint64 STEXT nosplit size=276 args=0x28 locals=0x8

the reason the 1.10 is larger is that it has one more panicindex

0x010b 00267 (prova.go:14)	UNDEF
0x010d 00269 (prova.go:10)	PCDATA	$0, $1
0x010d 00269 (prova.go:10)	CALL	runtime.panicindex(SB)

at the bottom. 1.9 has three, 1.10 has four.

@mvdan mvdan changed the title Performance degradation in 1.10 cmd/compile: binary.PutUint64 performance degradation in 1.10 Feb 23, 2018

@pascaldekloe

This comment has been minimized.

Copy link
Contributor Author

commented Feb 23, 2018

Spot on @mvdan! Go 1.10 does no longer inline these functions into the Benchmark functions.

I was just about to apply this function and for as far as I know there are no users yet @bcmills. I care not so much for this specific case yet this could have been something not visible for most cases.

The panic case is not benchmarked @ALTree.

The help is much appreciated. Thanks a bunch.

@mvdan

This comment has been minimized.

Copy link
Member

commented Feb 23, 2018

I'm confused - are you closing this because Go 1.10 no longer inlines calls made in benchmark functions by design?

@ALTree

This comment has been minimized.

Copy link
Member

commented Feb 23, 2018

The panic case is not benchmarked

I know that. But the function size sometime matters in microbenchmarks.

Also why did you close this? Is the fact that the function is no longer inlined expected?

@pascaldekloe

This comment has been minimized.

Copy link
Contributor Author

commented Feb 23, 2018

Weather the compiler inlines a function or not can hardly be a bug right? And this might be triggered by the slightly larger size indeed. I was worried that some essential was seriously off.

@mvdan

This comment has been minimized.

Copy link
Member

commented Feb 23, 2018

I would expect that, given that PutUint64 hasn't changed between 1.9 and 1.10, the compiled code should be the same or better - not worse. Unless I'm missing something obvious?

One way or another, no harm in leaving this open in case there is actually a compiler regression somewhere.

@ALTree

This comment has been minimized.

Copy link
Member

commented Feb 23, 2018

Weather the compiler inlines a function or not can hardly be a bug right?

It absolutely can. We are definitely interested in understanding why your function is no longer inlined in 1.10.

@ALTree ALTree reopened this Feb 23, 2018

@ALTree

This comment has been minimized.

Copy link
Member

commented Feb 23, 2018

1.10 "-m -m" says

cannot inline PutUint64: function too complex: cost 216 exceeds budget 80

while 1.9 inlines it without complaining.

@mvdan

This comment has been minimized.

Copy link
Member

commented Feb 23, 2018

@ALTree

This comment has been minimized.

Copy link
Member

commented Feb 23, 2018

I believe this is caused by the fix for #19261.

1.9 inlines the function because it incorrectly computed a cost of 0 for cross-package calls like binary.LittleEndian.PutUint64. This was fixed for 1.10 in CL 70151, and since the cost of the binary methods calls is now computed correctly, OP's function is much more heavier and it's no longer marked as inlineable.

So I believe this is WAI.

@TocarIP

This comment has been minimized.

Copy link
Contributor

commented Feb 23, 2018

I think https://go-review.googlesource.com/c/go/+/95475 is relevant here
Edit:
It fixes PutUint32/16 degradation and not 64, so it isn't relevant here.

@ALTree

This comment has been minimized.

Copy link
Member

commented Mar 1, 2018

Closing this, since it is expected that the function is no longer inlined (i linked the fix that caused this above).

@ALTree ALTree closed this Mar 1, 2018

pascaldekloe added a commit to pascaldekloe/flit that referenced this issue Jul 28, 2018

@golang golang locked and limited conversation to collaborators Mar 1, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.