-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
register ABI: allow selecting ABI when Go code refers to assembly routine #44065
Comments
(While I proposed the idea of using a pragma) I don't really like having many pragmas. And I'll note that the current approach (marking certain assembly functions as ABIInternal) "works". Personally I'd be fine just doing that. From discussions, one problem of marking assembly functions as ABIInternal is that they are not really "ABIInternal", so marking them ABIInternal is somewhat a "lie". They are also not really ABI0 (at least for some of them). If we are aiming for clarity, we may want to mark them ABISpecial, either as assembly marker or as a pragma on the declarations in Go side. One counter argument is that their specialness is implementation detail, i.e. "internal", so ABIInternal is fine. The |
A few notes: "once this pragma is used in the package, we can no longer call the routine in question from Go within the package" is something we could support, since that is actually the transition plan for getting to the new ABI -- enable it for testing with a pragma, till it works, then throw the big switch. We could keep the old plumbing (which is already working in development as them no-register case of a register ABI. This would not be hard; we could probably never call such a method with reflection, as a goroutine, etc, but we could probably ensure that with static checks, or at least vet checks. I'm torn on the too-many-pragmas question.
I'm a little interested in the unsafe thing -- we can certainly see constant strings in the compiler (certainly in SSA, we have optimizations on constants already), we could probably make that work, too. |
Change https://golang.org/cl/290029 mentions this issue: |
CL https://go-review.googlesource.com/c/go/+/290029 is an alternative approach: directly taking the address in assembly, and only pass the address to Go, as the Go side only needs the address. Then we don't need the ABI markers (because it is local to assembly now). |
Change https://golang.org/cl/302109 mentions this issue: |
In the runtime there are Windows-specific assembly routines that are address-taken via funcPC and are not intended to be called through a wrapper. Mark them as ABIInternal so that we don't grab the wrapper, because that will break in all sorts of contexts. For #40724. For #44065. Change-Id: I12a728786786f423e5b229f8622e4a80ec27a31c Reviewed-on: https://go-review.googlesource.com/c/go/+/302109 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
Change https://golang.org/cl/304231 mentions this issue: |
Change https://golang.org/cl/304232 mentions this issue: |
Change https://golang.org/cl/304230 mentions this issue: |
When ABI wrappers are used, there are cases where in Go code we need the PC of the defined function instead of the ABI wrapper. Currently we work around this by define such functions as ABIInternal, even if they do not actually follow the internal ABI. This CL introduces internal/abi.FuncPCABIxxx functions as compiler intrinsics, which return the underlying defined function's entry PC if the argument is a direct reference of a function of the expected ABI, and reject it if it is of a different ABI. As a proof of concept, change runtime.goexit back to ABI0 and use internal/abi.FuncPCABI0 to retrieve its PC. Updates #44065. Change-Id: I02286f0f9d99e6a3090f9e8169dbafc6804a2da6 Reviewed-on: https://go-review.googlesource.com/c/go/+/304232 Trust: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com>
sync.SeqCount relies on the following memory orderings: - All stores following BeginWrite() in program order happen after the atomic read-modify-write (RMW) of SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic loads being acquire-seqcst. - All stores preceding EndWrite() in program order happen before the RMW of SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic stores being release-seqcst. - All loads following BeginRead() in program order happen after the load of SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic loads being acquire-seqcst. - All loads preceding ReadOk() in program order happen before the load of SeqCount.epoch. The Go 1.19 memory model does not imply this property. The x86 memory model *does* imply this final property, and in practice the current Go compiler does not reorder memory accesses around the load of SeqCount.epoch, so sync.SeqCount behaves correctly on x86. However, on ARM64, the instruction that is actually emitted for the atomic load from SeqCount.epoch is LDAR: ``` gvisor/pkg/sentry/kernel/kernel.SeqAtomicTryLoadTaskGoroutineSchedInfo(): gvisor/pkg/sentry/kernel/seqatomic_taskgoroutineschedinfo_unsafe.go:34 56371c: f9400025 ldr x5, [x1] 563720: f9400426 ldr x6, [x1, #8] 563724: f9400822 ldr x2, [x1, #16] 563728: f9400c23 ldr x3, [x1, #24] gvisor/pkg/sentry/kernel/seqatomic_taskgoroutineschedinfo_unsafe.go:36 56372c: d503201f nop gvisor/pkg/sync/sync.(*SeqCount).ReadOk(): gvisor/pkg/sync/seqcount.go:107 563730: 88dffc07 ldar w7, [x0] 563734: 6b0400ff cmp w7, w4 ``` LDAR is explicitly documented as not implying the required memory ordering: https://developer.arm.com/documentation/den0024/latest/Memory-Ordering/Barriers/One-way-barriers Consequently, SeqCount.ReadOk() is incorrectly memory-ordered on weakly-ordered architectures. To fix this, we need to introduce an explicit memory fence. On ARM64, there is no way to implement the memory fence in question without resorting to assembly, so the implementation is straightforward. On x86, we introduce a compiler fence, since future compilers might otherwise reorder memory accesses to after atomic loads; the only apparent way to do so is also by using assembly, which unfortunately introduces overhead: - After the call to sync.MemoryFenceReads(), callers zero XMM15 and reload the runtime.g pointer from %fs:-8, reflecting the switch from ABI0 to ABIInternal. This is a relatively small cost. - Before the call to sync.MemoryFenceReads(), callers spill all registers to the stack, since ABI0 function calls clobber all registers. The cost of this depends on the state of the caller before the call, and is not reflected in BenchmarkSeqCountReadUncontended (which does not read any protected state between the calls to BeginRead() and ReadOk()). Both of these problems are caused by Go assembly functions being restricted to ABI0. Go provides a way to mark assembly functions as using ABIInternal instead, but restricts its use to functions in package runtime (golang/go#44065). runtime.publicationBarrier(), which is semantically "sync.MemoryFenceWrites()", is implemented as a compiler fence on x86; defining sync.MemoryFenceReads() as an alias for that function (using go:linkname) would mitigate the former problem, but not the latter. Thus, for simplicity, we define sync.MemoryFenceReads() in (ABI0) assembly, and have no choice but to eat the overhead. ("Fence" and "barrier" are often used interchangeably in this context; Linux uses "barrier" (e.g. `smp_rmb()`), while C++ uses "fence" (e.g. `std::atomic_memory_fence()`). We choose "fence" to reduce ambiguity with "write barriers", since Go is a GC'd language.) PiperOrigin-RevId: 572675753
sync.SeqCount relies on the following memory orderings: - All stores following BeginWrite() in program order happen after the atomic read-modify-write (RMW) of SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic loads being acquire-seqcst. - All stores preceding EndWrite() in program order happen before the RMW of SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic stores being release-seqcst. - All loads following BeginRead() in program order happen after the load of SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic loads being acquire-seqcst. - All loads preceding ReadOk() in program order happen before the load of SeqCount.epoch. The Go 1.19 memory model does not imply this property. The x86 memory model *does* imply this final property, and in practice the current Go compiler does not reorder memory accesses around the load of SeqCount.epoch, so sync.SeqCount behaves correctly on x86. However, on ARM64, the instruction that is actually emitted for the atomic load from SeqCount.epoch is LDAR: ``` gvisor/pkg/sentry/kernel/kernel.SeqAtomicTryLoadTaskGoroutineSchedInfo(): gvisor/pkg/sentry/kernel/seqatomic_taskgoroutineschedinfo_unsafe.go:34 56371c: f9400025 ldr x5, [x1] 563720: f9400426 ldr x6, [x1, #8] 563724: f9400822 ldr x2, [x1, #16] 563728: f9400c23 ldr x3, [x1, #24] gvisor/pkg/sentry/kernel/seqatomic_taskgoroutineschedinfo_unsafe.go:36 56372c: d503201f nop gvisor/pkg/sync/sync.(*SeqCount).ReadOk(): gvisor/pkg/sync/seqcount.go:107 563730: 88dffc07 ldar w7, [x0] 563734: 6b0400ff cmp w7, w4 ``` LDAR is explicitly documented as not implying the required memory ordering: https://developer.arm.com/documentation/den0024/latest/Memory-Ordering/Barriers/One-way-barriers Consequently, SeqCount.ReadOk() is incorrectly memory-ordered on weakly-ordered architectures. To fix this, we need to introduce an explicit memory fence. On ARM64, there is no way to implement the memory fence in question without resorting to assembly, so the implementation is straightforward. On x86, we introduce a compiler fence, since future compilers might otherwise reorder memory accesses to after atomic loads; the only apparent way to do so is also by using assembly, which unfortunately introduces overhead: - After the call to sync.MemoryFenceReads(), callers zero XMM15 and reload the runtime.g pointer from %fs:-8, reflecting the switch from ABI0 to ABIInternal. This is a relatively small cost. - Before the call to sync.MemoryFenceReads(), callers spill all registers to the stack, since ABI0 function calls clobber all registers. The cost of this depends on the state of the caller before the call, and is not reflected in BenchmarkSeqCountReadUncontended (which does not read any protected state between the calls to BeginRead() and ReadOk()). Both of these problems are caused by Go assembly functions being restricted to ABI0. Go provides a way to mark assembly functions as using ABIInternal instead, but restricts its use to functions in package runtime (golang/go#44065). runtime.publicationBarrier(), which is semantically "sync.MemoryFenceWrites()", is implemented as a compiler fence on x86; defining sync.MemoryFenceReads() as an alias for that function (using go:linkname) would mitigate the former problem, but not the latter. Thus, for simplicity, we define sync.MemoryFenceReads() in (ABI0) assembly, and have no choice but to eat the overhead. ("Fence" and "barrier" are often used interchangeably in this context; Linux uses "barrier" (e.g. `smp_rmb()`), while C++ uses "fence" (e.g. `std::atomic_memory_fence()`). We choose "fence" to reduce ambiguity with "write barriers", since Go is a GC'd language.) PiperOrigin-RevId: 573861378
[Note: this is a tracking issue, not a regular bug.]
In CL 260477 as part of the register ABI work (tracking issue 40724), the assembler was changed to add syntax for specifying the ABI (ABI0 vs ABIInternal) when defining a given assembler routine or referring to some other routine. This change made it possible to have finer control over "special" assembler routines that are defined and used in non-standard ways.
For example, a function such as
runtime.panicIndex
is called by the compiler using a custom ABI (not ABI0), hence in order to get the correct semantics, there had to be some way to insure that a call from Go code toruntime.panicIndex
did not go through an ABIInternal->ABI0 wrapper. Changing the assembly definition ofruntime.panicIndex
to have it be ABIInternal took care of this, but it is worth pointing out that declaringruntime.panicIndex
as ABIInternal is something of a fiction, since the compiler currently doesn't follow the new ABIInternal rules exactly when emitting calls to the function.The function
runtime.goexit
is another example: this function is never called, only used as a sentinel (e.g. the runtime takes its address, but never actually invokes it). When the runtime takes its address, however, the actual PC of the function itself it needed, as opposed to taking the address of the ABIInternal->ABI0 wrapper for the function. To resolve this problem, the assembly source forruntime.goexit
was changed the to give it a definition as ABIInternal-- this meant that references from the runtime Go code would pick up the function directly, as opposed to the ABIInternal->ABI0 wrapper for it. Again here things are a little misleading, since there will never be an ABIInternal call, or any other call, toruntime.goexit
.A third case comes up with syscall wrappers: for functions like the x509 assembly wrappers, if we take the address of the wrapper function from Go to pass it to a syscall, we have to insure that we get the function itself and not an ABIInternal->ABI0 wrapper.
To improve on this situation, it might make more sense to instead create some sort of mechanism at the Go level to allow control over which ABI is selected (this idea from @cherrymui). For example, something like
would be a way to say, "Compiler, I know that this function is an assembly routine, but when I refer to it, I want to get the ABI0 symbol and not the ABIInternal->ABI0 wrapper for the symbol". The one potential drawback here is that once this pragma is used in the package, we can no longer call the routine in question from Go within the package (if for some reason we need to have both direct calls and a reference that bypasses the ABI wrapper).
Another possibility would be to develop a pragma that applies to the referring expression itself, e.g.
An advantage of this second option would be that we could communicate to the compiler that a function uses an unknown or non-standard ABI, and should not be calllable directly from Go. Example:
Unsure as to how much work either of these options would be to implement. Also probably might make sense to restrict the use of these constructs to "runtime" packages (as we already do with the assembler feature).
The text was updated successfully, but these errors were encountered: