register ABI: allow selecting ABI when Go code refers to assembly routine #44065

thanm · 2021-02-02T15:36:04Z

[Note: this is a tracking issue, not a regular bug.]

In CL 260477 as part of the register ABI work (tracking issue 40724), the assembler was changed to add syntax for specifying the ABI (ABI0 vs ABIInternal) when defining a given assembler routine or referring to some other routine. This change made it possible to have finer control over "special" assembler routines that are defined and used in non-standard ways.

For example, a function such as runtime.panicIndex is called by the compiler using a custom ABI (not ABI0), so to get the correct semantics there had to be some way to ensure that a call from Go code to runtime.panicIndex did not go through an ABIInternal->ABI0 wrapper. Changing the assembly definition of runtime.panicIndex to ABIInternal took care of this, but it is worth pointing out that declaring runtime.panicIndex as ABIInternal is something of a fiction, since the compiler currently doesn't follow the new ABIInternal rules exactly when emitting calls to the function.

The function runtime.goexit is another example: this function is never called, only used as a sentinel (e.g. the runtime takes its address, but never actually invokes it). When the runtime takes its address, however, the actual PC of the function itself is needed, as opposed to the address of the ABIInternal->ABI0 wrapper for the function. To resolve this problem, the assembly source for runtime.goexit was changed to define it as ABIInternal -- this meant that references from the runtime Go code would pick up the function directly, as opposed to the ABIInternal->ABI0 wrapper for it. Again, things are a little misleading here, since there will never be an ABIInternal call, or any other call, to runtime.goexit.

A third case comes up with syscall wrappers: for functions like the x509 assembly wrappers, if we take the address of the wrapper function from Go to pass it to a syscall, we have to ensure that we get the function itself and not an ABIInternal->ABI0 wrapper.

To improve on this situation, it might make more sense to instead create some sort of mechanism at the Go level to allow control over which ABI is selected (this idea from @cherrymui). For example, something like

  //go:abi0
  func goexit() ...

would be a way to say, "Compiler, I know that this function is an assembly routine, but when I refer to it, I want to get the ABI0 symbol and not the ABIInternal->ABI0 wrapper for the symbol". The one potential drawback here is that once this pragma is used in the package, we can no longer call the routine in question from Go within the package (if for some reason we need to have both direct calls and a reference that bypasses the ABI wrapper).

Another possibility would be to develop a pragma that applies to the referring expression itself, e.g.

  // Take address of "runtime.someAssemblyFunc", ensuring that we get
  // the func itself and not the ABIInternal->ABI0 wrapper for the symbol
  myFuncPtr := unsafe.SelectABI(runtime.someAssemblyFunc, "ABI0")

An advantage of this second option would be that we could communicate to the compiler that a function uses an unknown or non-standard ABI, and should not be callable directly from Go. Example:

  // Here 'anotherAssemblyRoutine' uses the C++ ABI, so it should not be called directly from Go.
  myCppFuncPtr := unsafe.SelectABI(runtime.anotherAssemblyRoutine, "ABIUnspecified")

  // legal (we're just storing the address)
  C.somestruct.cfuncptr = myCppFuncPtr

  // illegal-- generates a compiler error
  myCppFuncPtr(101, nil)

I'm unsure how much work either of these options would be to implement. It probably also makes sense to restrict the use of these constructs to "runtime" packages (as we already do with the assembler feature).


cherrymui · 2021-02-02T15:57:35Z

(While I proposed the idea of using a pragma) I don't really like having many pragmas. And I'll note that the current approach (marking certain assembly functions as ABIInternal) "works". Personally I'd be fine just doing that.

From discussions, one problem with marking assembly functions as ABIInternal is that they are not really "ABIInternal", so marking them ABIInternal is somewhat of a "lie". They are also not really ABI0 (at least some of them). If we are aiming for clarity, we may want to mark them ABISpecial, either as an assembly marker or as a pragma on the declarations on the Go side. One counterargument is that their specialness is an implementation detail, i.e. "internal", so ABIInternal is fine.

The unsafe.SelectABI idea is interesting but I don't know how to implement it. runtime.anotherAssemblyRoutine is not quoted, so it is an identifier in the Go code. How does the compiler resolve that identifier? Do you still need a declaration? What is its result type? (Maybe unsafe.Pointer or uintptr? Then you can pass it to other places where the PC is expected, but cannot call it without explicitly casting it to a function pointer.) I would say this mechanism is for a limited set of special functions. We probably don't want to make it a public API (as in the unsafe package). Also, it probably doesn't make sense for other implementations of Go.
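To make the result-type question concrete, here is a minimal sketch of the "store it but don't call it" idea. The names codePtr, addrOf, nameOf and sentinel are hypothetical and exist only for this illustration; unsafe.SelectABI itself does not exist. The sketch uses reflect to obtain a code pointer, which for an assembly routine referenced from Go is exactly the kind of reference that may resolve to an ABI wrapper, so it shows the typing idea rather than a fix.

```go
package main

import (
	"fmt"
	"reflect"
	"runtime"
)

// codePtr is a hypothetical opaque result type. It can be stored, passed
// around and compared, but it is not a function type, so it cannot be called.
type codePtr uintptr

// addrOf returns the code pointer of fn and panics if fn is not a function.
// Assumption: this stands in for the proposed construct; it does not select an ABI.
func addrOf(fn any) codePtr {
	v := reflect.ValueOf(fn)
	if v.Kind() != reflect.Func {
		panic("addrOf: argument is not a function")
	}
	return codePtr(v.Pointer())
}

// nameOf reports the symbol name the runtime associates with p,
// or the empty string if the runtime does not know the address.
func nameOf(p codePtr) string {
	f := runtime.FuncForPC(uintptr(p))
	if f == nil {
		return ""
	}
	return f.Name()
}

// sentinel is never called; only its address is taken.
func sentinel() {}

func main() {
	p := addrOf(sentinel) // legal: we are just storing the address
	fmt.Printf("%#x %s\n", uintptr(p), nameOf(p))
	// p() // illegal: does not compile, codePtr is not a function type
}
```

With a uintptr-based type the "illegal call" case from the proposal falls out of ordinary type checking, at the cost of requiring an explicit unsafe conversion whenever a call really is intended.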

dr2chase · 2021-02-02T17:55:26Z

A few notes:

"once this pragma is used in the package, we can no longer call the routine in question from Go within the package"

is something we could support, since that is actually the transition plan for getting to the new ABI -- enable it for testing with a pragma until it works, then throw the big switch. We could keep the old plumbing (which is already working in development as the no-register case of a register ABI). This would not be hard; we could probably never call such a method with reflection, as a goroutine, etc., but we could probably ensure that with static checks, or at least vet checks.

I'm torn on the too-many-pragmas question.

- We hate knobs, and these are knobs.
- I don't think they are under the compatibility guarantee, which is good.
- Not adding more pragmas, but then overloading existing mechanisms to mean the thing we didn't want to add a pragma for, is, actually, adding a knob. Knobs are technical debt, but overloaded multifunction knobs are also technical debt.

I'm a little interested in the unsafe thing -- we can certainly see constant strings in the compiler (certainly in SSA, we have optimizations on constants already), we could probably make that work, too.

gopherbot · 2021-02-05T17:27:19Z

Change https://golang.org/cl/290029 mentions this issue: [dev.regabi] runtime: take address in assembly for syscall wrappers on darwin

cherrymui · 2021-02-05T17:29:40Z

CL https://go-review.googlesource.com/c/go/+/290029 is an alternative approach: take the address directly in assembly and pass only the address to Go, since the Go side only needs the address. Then we don't need the ABI markers (because the reference is now local to assembly).

gopherbot · 2021-03-18T17:53:43Z

Change https://golang.org/cl/302109 mentions this issue: runtime: mark Windows' address-taken asm routines as ABIInternal

In the runtime there are Windows-specific assembly routines that are address-taken via funcPC and are not intended to be called through a wrapper. Mark them as ABIInternal so that we don't grab the wrapper, because that will break in all sorts of contexts.

For #40724.
For #44065.

Change-Id: I12a728786786f423e5b229f8622e4a80ec27a31c
Reviewed-on: https://go-review.googlesource.com/c/go/+/302109
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>

gopherbot · 2021-03-23T23:11:46Z

Change https://golang.org/cl/304231 mentions this issue: cmd/compile, internal/abi: add FuncPC intrinsic

gopherbot · 2021-03-23T23:11:47Z

Change https://golang.org/cl/304232 mentions this issue: cmd/compile, internal/abi: add FuncPC intrinsic

gopherbot · 2021-03-23T23:11:47Z

Change https://golang.org/cl/304230 mentions this issue: cmd/compile, internal/abi: add FuncPC intrinsic

When ABI wrappers are used, there are cases where in Go code we need the PC of the defined function instead of the ABI wrapper. Currently we work around this by defining such functions as ABIInternal, even if they do not actually follow the internal ABI.

This CL introduces internal/abi.FuncPCABIxxx functions as compiler intrinsics, which return the underlying defined function's entry PC if the argument is a direct reference to a function of the expected ABI, and reject it if it is of a different ABI.

As a proof of concept, change runtime.goexit back to ABI0 and use internal/abi.FuncPCABI0 to retrieve its PC.

Updates #44065.

Change-Id: I02286f0f9d99e6a3090f9e8169dbafc6804a2da6
Reviewed-on: https://go-review.googlesource.com/c/go/+/304232
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
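The behavior described in this commit message can be pictured with a toy model. Everything below is hypothetical: the symbol struct, the goRef and funcPCABI0 functions, and the addresses are invented for illustration and are not the toolchain's data structures. In particular, the real intrinsic rejects a mismatched ABI at compile time, whereas this sketch can only return a run-time error.

```go
package main

import (
	"errors"
	"fmt"
)

type abi int

const (
	abi0 abi = iota
	abiInternal
)

// symbol is a hypothetical stand-in for what the toolchain knows about a function.
type symbol struct {
	name    string
	defABI  abi     // ABI the function body was defined with
	entry   uintptr // entry PC of the defined function
	wrapper uintptr // entry PC of the ABIInternal->ABI0 wrapper, 0 if none
}

// goRef models an ordinary reference from Go code: it sees the ABIInternal
// view of the symbol, which for an ABI0 definition is the wrapper.
func goRef(s symbol) uintptr {
	if s.defABI == abi0 && s.wrapper != 0 {
		return s.wrapper
	}
	return s.entry
}

var errWrongABI = errors.New("function is not defined as ABI0")

// funcPCABI0 models the intrinsic: it bypasses the wrapper and returns the
// entry PC of the defined function, rejecting functions of another ABI.
func funcPCABI0(s symbol) (uintptr, error) {
	if s.defABI != abi0 {
		return 0, errWrongABI
	}
	return s.entry, nil
}

func main() {
	goexit := symbol{name: "goexit", defABI: abi0, entry: 0x1000, wrapper: 0x2000}
	pc, err := funcPCABI0(goexit)
	fmt.Printf("plain reference: %#x, funcPCABI0: %#x, err: %v\n", goRef(goexit), pc, err)

	goFunc := symbol{name: "someGoFunc", defABI: abiInternal, entry: 0x3000}
	_, err = funcPCABI0(goFunc)
	fmt.Printf("funcPCABI0 on an ABIInternal function: %v\n", err)
}
```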

sync.SeqCount relies on the following memory orderings:

- All stores following BeginWrite() in program order happen after the atomic read-modify-write (RMW) of SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic loads being acquire-seqcst.
- All stores preceding EndWrite() in program order happen before the RMW of SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic stores being release-seqcst.
- All loads following BeginRead() in program order happen after the load of SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic loads being acquire-seqcst.
- All loads preceding ReadOk() in program order happen before the load of SeqCount.epoch. The Go 1.19 memory model does not imply this property.

The x86 memory model *does* imply this final property, and in practice the current Go compiler does not reorder memory accesses around the load of SeqCount.epoch, so sync.SeqCount behaves correctly on x86.

However, on ARM64, the instruction that is actually emitted for the atomic load from SeqCount.epoch is LDAR:

```
gvisor/pkg/sentry/kernel/kernel.SeqAtomicTryLoadTaskGoroutineSchedInfo():
gvisor/pkg/sentry/kernel/seqatomic_taskgoroutineschedinfo_unsafe.go:34
  56371c:       f9400025        ldr     x5, [x1]
  563720:       f9400426        ldr     x6, [x1, #8]
  563724:       f9400822        ldr     x2, [x1, #16]
  563728:       f9400c23        ldr     x3, [x1, #24]
gvisor/pkg/sentry/kernel/seqatomic_taskgoroutineschedinfo_unsafe.go:36
  56372c:       d503201f        nop
gvisor/pkg/sync/sync.(*SeqCount).ReadOk():
gvisor/pkg/sync/seqcount.go:107
  563730:       88dffc07        ldar    w7, [x0]
  563734:       6b0400ff        cmp     w7, w4
```

LDAR is explicitly documented as not implying the required memory ordering: https://developer.arm.com/documentation/den0024/latest/Memory-Ordering/Barriers/One-way-barriers

Consequently, SeqCount.ReadOk() is incorrectly memory-ordered on weakly-ordered architectures. To fix this, we need to introduce an explicit memory fence.

On ARM64, there is no way to implement the memory fence in question without resorting to assembly, so the implementation is straightforward. On x86, we introduce a compiler fence, since future compilers might otherwise reorder memory accesses to after atomic loads; the only apparent way to do so is also by using assembly, which unfortunately introduces overhead:

- After the call to sync.MemoryFenceReads(), callers zero XMM15 and reload the runtime.g pointer from %fs:-8, reflecting the switch from ABI0 to ABIInternal. This is a relatively small cost.
- Before the call to sync.MemoryFenceReads(), callers spill all registers to the stack, since ABI0 function calls clobber all registers. The cost of this depends on the state of the caller before the call, and is not reflected in BenchmarkSeqCountReadUncontended (which does not read any protected state between the calls to BeginRead() and ReadOk()).

Both of these problems are caused by Go assembly functions being restricted to ABI0. Go provides a way to mark assembly functions as using ABIInternal instead, but restricts its use to functions in package runtime (golang/go#44065). runtime.publicationBarrier(), which is semantically "sync.MemoryFenceWrites()", is implemented as a compiler fence on x86; defining sync.MemoryFenceReads() as an alias for that function (using go:linkname) would mitigate the former problem, but not the latter. Thus, for simplicity, we define sync.MemoryFenceReads() in (ABI0) assembly, and have no choice but to eat the overhead.

("Fence" and "barrier" are often used interchangeably in this context; Linux uses "barrier" (e.g. `smp_rmb()`), while C++ uses "fence" (e.g. `std::atomic_thread_fence()`). We choose "fence" to reduce ambiguity with "write barriers", since Go is a GC'd language.)

PiperOrigin-RevId: 572675753
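The reader/writer protocol that the commit message reasons about can be sketched as follows. This is a simplified, hypothetical sequence counter written for illustration, not gVisor's sync.SeqCount: the seqCount and pair types are invented here, writers are assumed to be serialized externally, and the protected fields are themselves atomics. That last choice keeps the sketch within the Go memory model and sidesteps the read fence the commit adds; the real code reads plain memory between BeginRead() and ReadOk(), which is why it needs the fence.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// seqCount is a simplified sequence counter. An odd epoch means a write
// is in progress. Assumption: at most one writer at a time.
type seqCount struct {
	epoch atomic.Uint32
}

// BeginWrite marks the start of a write by making epoch odd.
func (s *seqCount) BeginWrite() {
	if s.epoch.Add(1)%2 == 0 {
		panic("BeginWrite during an active write")
	}
}

// EndWrite marks the end of a write by making epoch even again.
func (s *seqCount) EndWrite() {
	if s.epoch.Add(1)%2 != 0 {
		panic("EndWrite without a matching BeginWrite")
	}
}

// BeginRead waits until no write is in progress and returns the epoch seen.
func (s *seqCount) BeginRead() uint32 {
	for {
		if e := s.epoch.Load(); e%2 == 0 {
			return e
		}
		runtime.Gosched()
	}
}

// ReadOk reports whether no write started since BeginRead returned e.
func (s *seqCount) ReadOk(e uint32) bool {
	return s.epoch.Load() == e
}

// pair is a hypothetical two-word value protected by a seqCount.
type pair struct {
	seq  seqCount
	a, b atomic.Uint64
}

// Store writes both words inside a write critical section.
func (p *pair) Store(a, b uint64) {
	p.seq.BeginWrite()
	p.a.Store(a)
	p.b.Store(b)
	p.seq.EndWrite()
}

// Load retries until it observes both words from the same write.
func (p *pair) Load() (a, b uint64) {
	for {
		e := p.seq.BeginRead()
		a, b = p.a.Load(), p.b.Load()
		if p.seq.ReadOk(e) {
			return a, b
		}
	}
}

func main() {
	var p pair
	p.Store(1, 2)
	fmt.Println(p.Load())
}
```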


cagedmantis added the NeedsInvestigation label Feb 3, 2021

seankhliao added this to the Backlog milestone Aug 20, 2022

seankhliao added the umbrella label Aug 20, 2022

copybara-service bot mentioned this issue Oct 11, 2023

Fix sync.SeqCount.ReadOk() on non-x86 architectures. google/gvisor#9507

Merged
