Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: No BCE for len(x)&y in slice expression #52563

Closed
greatroar opened this issue Apr 26, 2022 · 4 comments
Closed

cmd/compile: No BCE for len(x)&y in slice expression #52563

greatroar opened this issue Apr 26, 2022 · 4 comments
Labels
help wanted NeedsFix Performance
Milestone

Comments

@greatroar
Copy link

@greatroar greatroar commented Apr 26, 2022

What version of Go are you using (go version)?

$ go version
go version go1.18 linux/amd64
$ gotip version
go version devel go1.19-96c8cc7fea9 Sun Apr 24 01:22:21 2022 +0000 linux/amd64

What did you do?

const blocksize = 16
fullBlocks := len(p) &^ (blocksize - 1)
blk := p[:fullBlocks]
rem := p[fullBlocks:]

What did you expect to see?

I expected the bounds checks on the final two lines to be eliminated, since len(p)&x <= cap(p) always.

What did you see instead?

Bounds checks. This is similar to #38476, but presumably easier to tackle. GOSSAFUNC output shows one of these checks as

v6 = SliceLen <int> v5
v8 = Const64 <int> [-16]
v9 = And64 <int> v6 v8 (fullBlocks[int])
v12 = SliceCap <int> v5
v14 = IsSliceInBounds <bool> v9 v12
@greatroar
Copy link
Author

@greatroar greatroar commented Apr 26, 2022

#38717 is also similar, but for IsInBounds instead of IsSliceInBounds.

@randall77 randall77 added this to the Unplanned milestone Apr 26, 2022
@randall77 randall77 added the NeedsFix label Apr 26, 2022
@mengzhuo mengzhuo self-assigned this Apr 27, 2022
@greatroar
Copy link
Author

@greatroar greatroar commented Apr 29, 2022

I just noticed that the examples from #38476, ChaCha20 and Poly1305, would reduce to this if the compiler also knew that

x-(x&y) = x&^y

Jorropo added a commit to Jorropo/go that referenced this issue May 3, 2022
This allows to rewrite:
  AND BX, AX
  SUB AX, BX

Into (with GOAMD64=v3):
  ANDN AX, BX, AX

Which is twice as fast, and smaller.

Into (without GOAMD64=v3):
  NOT BX
  AND AX, BX

Which if AX's depency isn't resolved yet but BX is in most cases twice as
fast because now the NOT BX and AX's depency can execute in parallel.
In other cases, it has negligeable positive impact.

Other instructions set will also benefits from the pipelining adventage.
And also benefit from the smaller ANDN instruction if they have something
similar.

Help reduce ChaCha20 and Poly1305 to golang#52563
@gopherbot
Copy link

@gopherbot gopherbot commented May 3, 2022

Change https://go.dev/cl/403654 mentions this issue: cmd/compile: ssa rewrite x-(x&y) => x&^y

Jorropo added a commit to Jorropo/go that referenced this issue May 3, 2022
This allows to rewrite:
  AND BX, AX
  SUB AX, BX

Into (with GOAMD64=v3):
  ANDN AX, BX, AX

Which is twice as fast, and smaller.

Into (without GOAMD64=v3):
  NOT BX
  AND AX, BX

Which if AX's depency isn't resolved yet but BX is in most cases twice as
fast because now the NOT BX and AX's depency can execute in parallel.
In other cases, it has negligeable positive impact.

Other instructions set will also benefits from the pipelining adventage.
And also benefit from the smaller ANDN instruction if they have something
similar.

Help reduce ChaCha20 and Poly1305 to golang#52563
@mengzhuo mengzhuo removed their assignment May 5, 2022
@gopherbot
Copy link

@gopherbot gopherbot commented May 6, 2022

Change https://go.dev/cl/404315 mentions this issue: cmd/compile: teach prove about and operation

greatroar pushed a commit to greatroar/go-mph that referenced this issue May 8, 2022
This is about as fast on a 54763 corpus, but means the data structure
is no longer up to 2× too large. Go 1.18, amd64:

name   old time/op  new time/op  delta
MPH-8  1.33ms ± 1%  1.31ms ± 1%  -1.57%  (p=0.000 n=10+8)
Map-8  1.97ms ± 5%  1.92ms ± 1%  -2.66%  (p=0.001 n=9+10)

A fix for golang/go#52563 might undo this speed benefit, because with
the old code, a bounds check is inserted for

	xorshiftMult64(uint64(seed)+hash) & (size - 1)

that might get eliminated. reducerange causes a bounds check either way,
unless the compiler were amended to know about it as well.
greatroar added a commit to greatroar/go-mph that referenced this issue May 10, 2022
This is about as fast on a 54763 corpus, but means the data structure
is no longer up to 2× too large. Go 1.18, amd64:

name   old time/op  new time/op  delta
MPH-8  1.33ms ± 1%  1.31ms ± 1%  -1.57%  (p=0.000 n=10+8)
Map-8  1.97ms ± 5%  1.92ms ± 1%  -2.66%  (p=0.001 n=9+10)

A fix for golang/go#52563 might undo this speed benefit, because with
the old code, a bounds check is inserted for

	xorshiftMult64(uint64(seed)+hash) & (size - 1)

that might get eliminated. reducerange causes a bounds check either way,
unless the compiler were amended to know about it as well.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted NeedsFix Performance
Projects
None yet
Development

No branches or pull requests

4 participants