cmd/compile: open code unsafe.Slice #48798

Open
josharian opened this issue Oct 5, 2021 · 12 comments

Comments

@josharian (Contributor) commented Oct 5, 2021

golang.org/cl/352953 used unsafe.Slice in the runtime. I am about to revert that change because it causes a 4% performance regression in a stack copying benchmark. At least half of that is due to function call overhead. We might want to open code unsafe.Slice instead of making it a runtime call.

cc @mdempsky

@gopherbot commented Oct 5, 2021

Change https://golang.org/cl/354090 mentions this issue: Revert "runtime: use unsafe.Slice in getStackMap"

@ianlancetaylor (Contributor) commented Oct 5, 2021

Looking at the current implementation of unsafe.Slice, I see that it gives an error if the overall size of the slice is > maxAlloc, but I don't know why. One can imagine using unsafe.Slice on a huge chunk of memory that is larger than maxAlloc; that doesn't seem any less safe than any other use of unsafe.Slice.

@mdempsky (Member) commented Oct 5, 2021

I think my rationale was to ensure the slice didn't wrap around the address space, and I just copied that from growslice. But reflecting on it now, it doesn't actually achieve that. Perhaps there's no reason to cap the addressed memory range at maxAlloc.

Probably a better check would be mem > -uintptr(ptr) to prevent wrap around.

@mknyszek mknyszek added this to the Backlog milestone Oct 5, 2021
gopherbot pushed a commit that referenced this issue Oct 5, 2021
This reverts commit golang.org/cl/352953.

Reason for revert: unsafe.Slice is considerably slower.
Part of this is extra safety checks (good), but most of it
is the function call overhead. We should consider open-coding it (#48798).

Impact of this change:

name                   old time/op  new time/op  delta
StackCopyWithStkobj-8  12.1ms ± 5%  11.6ms ± 3%  -4.03%  (p=0.009 n=10+8)

Change-Id: Ib2448e3edac25afd8fb55ffbea073b8b11521bde
Reviewed-on: https://go-review.googlesource.com/c/go/+/354090
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@zx2c4 (Contributor) commented Oct 12, 2021

> I think my rationale was to ensure the slice didn't wrap around the address space, and I just copied that from growslice. But reflecting on it now, it doesn't actually achieve that. Perhaps there's no reason to cap the addressed memory range at maxAlloc.
>
> Probably a better check would be mem > -uintptr(ptr) to prevent wrap around.

It's already "unsafe". Can we just omit all checks and have the compiler emit ideal code that assumes its usage is correct, just like unsafe.Pointer casts? Then we can put all sorts of safety checks behind checkptr, again like unsafe.Pointer casts.

@gopherbot commented Oct 12, 2021

Change https://golang.org/cl/355252 mentions this issue: cmd/compile: speed up unsafe.Slice by omitting checks

@mdempsky (Member) commented Oct 12, 2021

> Can we just omit all checks and have the compiler emit ideal code that assumes its usage is correct, just like unsafe.Pointer casts?

In cases where we can statically determine that to be safe, I'm happy for the compiler to elide the bounds checks. I'm also happy to open code the checks so the function call overhead is reduced.

However, I'm skeptical that most Go code that would want to use unsafe.Slice is unable to afford the bounds checks. E.g., any subsequent indexing/slicing operations will still have bounds checks, and the code can evidently afford those; otherwise it would directly compute indices using unchecked unsafe.Pointer arithmetic.

Package unsafe is used by an unfortunately large percentage of Go programs, and many Go programmers (including expert Go programmers) do not carefully read the fine print on how to use unsafe.Pointer correctly. unsafe.Slice exists to help those users write correct code, not maximally performant code. Use cases that really can't afford the extra overhead can continue to use unsafe.Pointer and reflect.SliceHeader; those aren't going away.

@josharian (Contributor, Author) commented Oct 12, 2021

I think that we should start by open coding the checks. It's not much code, the function call overhead is much larger than the bounds-check overhead, and if the checks are open coded the compiler will have a chance to optimize them away. Then we can reassess whether there's still a performance concern and resume discussion if necessary.

@gopherbot commented Oct 12, 2021

Change https://golang.org/cl/355490 mentions this issue: unsafe: optimize Slice bounds checking

@gopherbot commented Oct 12, 2021

Change https://golang.org/cl/355489 mentions this issue: unsafe: allow unsafe.Slice up to end of address space

@mdempsky (Member) commented Oct 12, 2021

CL 355490 streamlines unsafe.Slice's bounds checks somewhat, so hopefully it's easier to open code. It would also be interesting to know whether it has any measurable impact on unsafe.Slice's overhead when used for stack copying.

@josharian (Contributor, Author) commented Oct 13, 2021

Great. I'm thousands of miles from my computer, so I'm afraid I can't benchmark for you now. But the benchmark is checked in: StackCopyWithStkobj.

@cuonglm (Member) commented Oct 13, 2021

@josharian The performance now seems equal:

name                   old time/op    new time/op    delta
StackCopyWithStkobj-8    8.66ms ± 4%    8.56ms ± 3%   ~     (p=0.481 n=10+10)

name                   old alloc/op   new alloc/op   delta
StackCopyWithStkobj-8     16.0B ± 0%     16.0B ± 0%   ~     (all equal)

name                   old allocs/op  new allocs/op  delta
StackCopyWithStkobj-8      1.00 ± 0%      1.00 ± 0%   ~     (all equal)

With this patch:

diff --git a/src/runtime/stack.go b/src/runtime/stack.go
index 284c6b3b84..aedec1765d 100644
--- a/src/runtime/stack.go
+++ b/src/runtime/stack.go
@@ -1328,7 +1328,9 @@ func getStackMap(frame *stkframe, cache *pcvalueCache, debug bool) (locals, args
                if p != nil {
                        n := *(*uintptr)(p)
                        p = add(p, goarch.PtrSize)
-                       *(*slice)(unsafe.Pointer(&objs)) = slice{array: noescape(p), len: int(n), cap: int(n)}
+                       r0 := (*stackObjectRecord)(noescape(p))
+                       objs = unsafe.Slice(r0, int(n))
+                       // *(*slice)(unsafe.Pointer(&objs)) = slice{array: noescape(p), len: int(n), cap: int(n)}
                        // Note: the noescape above is needed to keep
                        // getStackMap from "leaking param content:
                        // frame".  That leak propagates up to getgcmask, then

Update: Strangely, I now can't reproduce the difference on my M1; with and without unsafe.Slice, I get the same benchmark result.

gopherbot pushed a commit that referenced this issue Oct 13, 2021
Allow the user to construct slices that are larger than the Go heap as
long as they don't overflow the address space.

Updates #48798.

Change-Id: I659c8334d04676e1f253b9c3cd499eab9b9f989a
Reviewed-on: https://go-review.googlesource.com/c/go/+/355489
Trust: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
gopherbot pushed a commit that referenced this issue Oct 13, 2021
This reduces the number of branches to bounds check non-empty slices
from 5 to 3. It does also increase the number of branches to handle
empty slices from 1 to 3; but for non-panicking calls, they should all
be predictable.

Updates #48798.

Change-Id: I3ffa66857096486f4dee417e1a66eb8fdf7a3777
Reviewed-on: https://go-review.googlesource.com/c/go/+/355490
Trust: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>