Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gofrontend/gccgo: Improve identity comparison #21618

legrosbuffle opened this issue Aug 25, 2017 · 1 comment


Copy link

commented Aug 25, 2017

The issue

gofrontend currently generates very inefficient code for T$equal and t1 == t2 when dealing with identity-comparable types (i.e. integers, pointers, and unpadded structs and arrays of such).

For example:

  • The following code does two 4-byte compares and a jump in T$equal, and calls memcmp in t1 == t2. For t1 == t2, gofrontend has a trick to turn this into a single 8-byte compare if the struct is aligned and size<=16 bytes, but it's not the case here.
type T struct {
            a int32
            b int32
  • the following code always generates a call to memcmp:
type T struct {
            a int32
            b int32
            c int32
            d int32

A solution

In 2016 gcc introduced the __builtin_memcmp_eq builtin that knows how to lower memcmp efficiently when the result is only used for equality comparison (i.e. equality with 0 instead of 3-way ordering). This is typically useful when the size is a constexpr (as is the case here).

The basic idea is to replace a larger chain of integer comparisons loaded from contiguous memory locations into a smaller chain of bigger integer comparisons. Benefits are twofold:

  • There are less jumps, and therefore less opportunities for mispredictions and I-cache misses.
  • The code is smaller, both because jumps are removed and because the encoding of a 2*n byte compare is smaller than that of two n-byte compares.

As a first step, I’m simply proposing to replace calls to runtime.memequal with calls to __builtin_memcmp_eq. This only improves the generated code.

In first second example above, this would change the generated code (gccgo -march=haswell -m64 -O3 -c test.go) from:

  • t1 == t2
   b:	8b 56 04             	mov    0x4(%rsi),%edx
   e:	8b 4f 04             	mov    0x4(%rdi),%ecx
  11:	31 c0                	xor    %eax,%eax
  13:	8b 36                	mov    (%rsi),%esi
  15:	39 37                	cmp    %esi,(%rdi)
  17:	74 07                	je     20 <go.test.DoStuff+0x20>
  19:	c3                   	retq   
  1a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  20:	39 d1                	cmp    %edx,%ecx
  22:	0f 94 c0             	sete   %al
  25:	c3                   	retq   
  • T$equal
  7b:   48 83 ec 08          	sub    $0x8,%rsp
  7f:	ba 08 00 00 00       	mov    $0x8,%edx
  84:	e8 00 00 00 00       	callq  89 <go.test.T$equal+0x19>
			85: R_X86_64_PC32	runtime.memequal-0x4
  89:	48 83 c4 08          	add    $0x8,%rsp
  8d:	c3                   	retq   

To (in both cases):

  7b:	48 8b 06             	mov    (%rsi),%rax
  7e:	48 39 07             	cmp    %rax,(%rdi)
  81:	0f 94 c0             	sete   %al
  84:	c3                   	retq

This is both smaller in terms of code size and much more efficient.

Going further

Simplifying gofrontend

This also allows removing any specific code for handling sizes smaller than 16 bytes since they are already handled by gcc.

More performance improvements

This should be extended to piecewise-identity-comparable structs. For example, the following structure should be compared with three builtin calls ({a}, {c,d,e}, and {f,g}) and a float compare.

type T struct {
	a int32
	b float32  // Floats are not identity-comparable
	c int32
	d int32
	e byte
	// Implicit _ [3]byte padding
	f int32
	g int32

@gopherbot gopherbot added this to the Gccgo milestone Aug 25, 2017


This comment has been minimized.

Copy link

commented Aug 25, 2017

Here's a proof of concept (it handles only T$equal and is untested).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
2 participants
You can’t perform that action at this time.