Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: optimize code in the encoding/binary package #51724

Closed
perillo opened this issue Mar 16, 2022 · 7 comments
Closed

cmd/compile: optimize code in the encoding/binary package #51724

perillo opened this issue Mar 16, 2022 · 7 comments
Labels
NeedsInvestigation
Milestone

Comments

@perillo
Copy link
Contributor

@perillo perillo commented Mar 16, 2022

The encoding/binary package uses portable bit operations to read and write integers.

Using the program:

package test

func readUint64BigEndian(b []byte) uint64 {
	_ = b[7] // bounds check hint to compiler; see golang.org/issue/14808

	return uint64(b[7]) | uint64(b[6])<<8 | uint64(b[5])<<16 | uint64(b[4])<<24 |
		uint64(b[3])<<32 | uint64(b[2])<<40 | uint64(b[1])<<48 | uint64(b[0])<<56
}

and building with:

GOAMD64=v3 go1.18 build -o test.o test.go

the resulting assembly is:

SUBQ $0x18, SP		
MOVQ BP, 0x10(SP)	
LEAQ 0x10(SP), BP	
MOVQ AX, 0x20(SP)	
CMPQ $0x7, BX		
JBE 0x9ac		
MOVBE 0(AX), AX		
MOVQ 0x10(SP), BP	
ADDQ $0x18, SP		
RET			
MOVL $0x7, AX		
MOVQ BX, CX		
CALL 0x9b9		[1:5]R_CALL:runtime.panicIndex	
NOPL

The Zig compiler, with the same code and ReleaseSafe build mode, is able to generate the assembly:

movbe   rax, qword ptr [rdi]
ret

See https://godbolt.org/z/MPbE3P9j8.

Note that the movbe instruction was added in AMD64 v3 (https://en.wikipedia.org/wiki/X86-64).

Zig currently uses the LLVM backend, so I don't know if the cost/complexity of this optimization is high.

@ALTree
Copy link
Member

@ALTree ALTree commented Mar 16, 2022

That zig code is not equivalent because you're passing a [8]u8, while the Go snippet uses a slice, which we need to boundcheck. This Go Function:

func readUint64BigEndian(b [8]byte) uint64 {
	return uint64(b[7]) | uint64(b[6])<<8 | uint64(b[5])<<16 | uint64(b[4])<<24 |
		uint64(b[3])<<32 | uint64(b[2])<<40 | uint64(b[1])<<48 | uint64(b[0])<<56
}

generates with GOAMD64=v3 go tool compile -S test.go:

0x0000 00000 (test.go:5)	MOVBEQ	"".b+8(SP), AX
0x0007 00007 (test.go:5)	RET

Which is the same as the LLVM output.

@perillo
Copy link
Contributor Author

@perillo perillo commented Mar 16, 2022

@ALTree, thank you for the clarification.

I did not notice the problem, since in Zig it is possible to convert a slice to an array; as an example:

var start: usize = 2;
var buf = [_]u8{ 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8 };
var b = buf[start..]; // A slice since start is not a constant
var bytes = b[0..8]; // An array since the indices are constants

It should be possible(?) to implement this in Go, but it will break compatibility.

@ALTree
Copy link
Member

@ALTree ALTree commented Mar 16, 2022

In the Go version that takes a slice we have to boundcheck it, but if you remove the instructions related to that, it's just a single MOVBEQ (AX), AX again (but I could be missing something). Is there anything else the compiler should do? (I mean for the binary part which is the topic of this issue).

@mdempsky mdempsky changed the title compile: optimize code in the encoding/binary package cmd/compile: optimize code in the encoding/binary package Mar 16, 2022
@heschi heschi added the NeedsInvestigation label Mar 16, 2022
@heschi heschi added this to the Backlog milestone Mar 16, 2022
@perillo
Copy link
Contributor Author

@perillo perillo commented Mar 17, 2022

I wrote this test code:

package test

import "encoding/binary"

func Parse(b []byte) uint64 {
	v := binary.BigEndian.Uint64(b[0:8])

	return v
}

func ParseUint64(b []byte, start, end int) uint64 {
	v := binary.BigEndian.Uint64(b[start:end])

	return v
}

func ParseUint64_2(b []byte, start, end int) uint64 {
	v := binary.BigEndian.Uint64(b[start:end][0:8])

	return v
}

The first and last functions (where slicing is done using constants), generate a single MOVBEQ instruction for the binary.BigEndian.Uint64 function, while the second function generates additional instructions.

I searched for real examples and found:
https://cs.opensource.google/go/x/net/+/27dd8689:http2/frame.go;l=764

The code can probably be improved with (not tested):

binary.BigEndian.Uint16(buf[i*6:][:2])

Zig, as a counter example, will report a compiler error so it is more easy to detect the problem.

NOTE:
The actual instruction I see is MOVBE, not MOVBEQ.
Also, for binary.BigEndian.UInt16, the Go compiler generates two instructions:

MOVZX 0(AX), AX
ROLW $0x8, AX

while the Zig compiler generates

movbe  ax,WORD PTR [rdi]

The source code is:

package test

import "encoding/binary"

func Parse(b []byte) uint16 {
	v := binary.BigEndian.Uint16(b[0:2])

	return v
}

Is there a reason for not using MOVBE?

@randall77
Copy link
Contributor

@randall77 randall77 commented Mar 17, 2022

Is there a reason for not using MOVBE?

No particular reason, 16-bit versions just hasn't been implemented yet.

@gopherbot
Copy link

@gopherbot gopherbot commented Mar 24, 2022

Change https://go.dev/cl/395474 mentions this issue: cmd/compile: add MOVBE index load/store

@gopherbot
Copy link

@gopherbot gopherbot commented Mar 30, 2022

Change https://go.dev/cl/396617 mentions this issue: cmd/compile: add MOVBEWstore support for GOAMD64>=3

gopherbot pushed a commit that referenced this issue Apr 3, 2022
This CL add MOVBE support for 16-bit version, but MOVBEWload is
excluded because it does not satisfy zero extented.

For #51724

Change-Id: I3fadf20bcbb9b423f6355e6a1e340107e8e621ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/396617
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation
Projects
None yet
Development

No branches or pull requests

5 participants