Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

register ABI: allow selecting ABI when Go code refers to assembly routine #44065

Open
thanm opened this issue Feb 2, 2021 · 8 comments
Open
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. umbrella
Milestone

Comments

@thanm
Copy link
Contributor

thanm commented Feb 2, 2021

[Note: this is a tracking issue, not a regular bug.]

In CL 260477 as part of the register ABI work (tracking issue 40724), the assembler was changed to add syntax for specifying the ABI (ABI0 vs ABIInternal) when defining a given assembler routine or referring to some other routine. This change made it possible to have finer control over "special" assembler routines that are defined and used in non-standard ways.

For example, a function such as runtime.panicIndex is called by the compiler using a custom ABI (not ABI0), hence in order to get the correct semantics, there had to be some way to insure that a call from Go code to runtime.panicIndex did not go through an ABIInternal->ABI0 wrapper. Changing the assembly definition of runtime.panicIndex to have it be ABIInternal took care of this, but it is worth pointing out that declaring runtime.panicIndex as ABIInternal is something of a fiction, since the compiler currently doesn't follow the new ABIInternal rules exactly when emitting calls to the function.

The function runtime.goexit is another example: this function is never called, only used as a sentinel (e.g. the runtime takes its address, but never actually invokes it). When the runtime takes its address, however, the actual PC of the function itself it needed, as opposed to taking the address of the ABIInternal->ABI0 wrapper for the function. To resolve this problem, the assembly source for runtime.goexit was changed the to give it a definition as ABIInternal-- this meant that references from the runtime Go code would pick up the function directly, as opposed to the ABIInternal->ABI0 wrapper for it. Again here things are a little misleading, since there will never be an ABIInternal call, or any other call, to runtime.goexit.

A third case comes up with syscall wrappers: for functions like the x509 assembly wrappers, if we take the address of the wrapper function from Go to pass it to a syscall, we have to insure that we get the function itself and not an ABIInternal->ABI0 wrapper.

To improve on this situation, it might make more sense to instead create some sort of mechanism at the Go level to allow control over which ABI is selected (this idea from @cherrymui). For example, something like

  //go:abi0
  func goexit() ...

would be a way to say, "Compiler, I know that this function is an assembly routine, but when I refer to it, I want to get the ABI0 symbol and not the ABIInternal->ABI0 wrapper for the symbol". The one potential drawback here is that once this pragma is used in the package, we can no longer call the routine in question from Go within the package (if for some reason we need to have both direct calls and a reference that bypasses the ABI wrapper).

Another possibility would be to develop a pragma that applies to the referring expression itself, e.g.

  // Take address of "runtime.someAssemblyFunc", insuring that we get
  // the func itself and not the ABIInternal->ABI0 wrapper for the symbol
  myFuncPtr := unsafe.SelectABI(runtime.someAssemblyFunc, "ABI0")

An advantage of this second option would be that we could communicate to the compiler that a function uses an unknown or non-standard ABI, and should not be calllable directly from Go. Example:

  // Here 'anotherAssemblyRoutine' uses the C++ ABI, so ti should not  be called directly from Go.
  myCppFuncPtr := unsafe.SelectABI(runtime.anotherAssemblyRoutine, "ABIUnspecified")

  // legal (we're just storing the address)
  C.somestruct.cfuncptr = myCppFuncPtr

  // illegal-- generates a compiler error
  myCppFuncPtr(101, nil)

Unsure as to how much work either of these options would be to implement. Also probably might make sense to restrict the use of these constructs to "runtime" packages (as we already do with the assembler feature).

@cherrymui
Copy link
Member

(While I proposed the idea of using a pragma) I don't really like having many pragmas. And I'll note that the current approach (marking certain assembly functions as ABIInternal) "works". Personally I'd be fine just doing that.

From discussions, one problem of marking assembly functions as ABIInternal is that they are not really "ABIInternal", so marking them ABIInternal is somewhat a "lie". They are also not really ABI0 (at least for some of them). If we are aiming for clarity, we may want to mark them ABISpecial, either as assembly marker or as a pragma on the declarations in Go side. One counter argument is that their specialness is implementation detail, i.e. "internal", so ABIInternal is fine.

The unsafe.SelectABI is interesting but I don't know how to implement it. runtime.anotherAssemblyRoutine is not quoted, so it is an identifier in the Go code. How does the compiler resolve that identifier? Do you still need a declaration? What is its result type? (Maybe unsafe.Pointer or uintptr? So you can pass it to other places where the PC is expected, but cannot call it without explicitly cast it to a function pointer.) I would say this mechanism is for a limited set of special functions. We probably don't want to make it a public API (as in the unsafe package). Also, it probably doesn't make sense for other implementations of Go.

@dr2chase
Copy link
Contributor

dr2chase commented Feb 2, 2021

A few notes:

"once this pragma is used in the package, we can no longer call the routine in question from Go within the package"

is something we could support, since that is actually the transition plan for getting to the new ABI -- enable it for testing with a pragma, till it works, then throw the big switch. We could keep the old plumbing (which is already working in development as them no-register case of a register ABI. This would not be hard; we could probably never call such a method with reflection, as a goroutine, etc, but we could probably ensure that with static checks, or at least vet checks.

I'm torn on the too-many-pragmas question.

  • we hate knobs, and these are knobs.
  • I don't think they are under the compatibility guarantee, which is good.
  • not adding more pragmas, but then overloading existing mechanisms to mean the thing we didn't want to add a pragma for, is, actually, adding a knob. Knobs are technical debt, but overloaded multifunction knobs are also technical debt.

I'm a little interested in the unsafe thing -- we can certainly see constant strings in the compiler (certainly in SSA, we have optimizations on constants already), we could probably make that work, too.

@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 3, 2021
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/290029 mentions this issue: [dev.regabi] runtime: take address in assembly for syscall wrappers on darwin

@cherrymui
Copy link
Member

CL https://go-review.googlesource.com/c/go/+/290029 is an alternative approach: directly taking the address in assembly, and only pass the address to Go, as the Go side only needs the address. Then we don't need the ABI markers (because it is local to assembly now).

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/302109 mentions this issue: runtime: mark Windows' address-taken asm routines as ABIInternal

gopherbot pushed a commit that referenced this issue Mar 19, 2021
In the runtime there are Windows-specific assembly routines that are
address-taken via funcPC and are not intended to be called through a
wrapper. Mark them as ABIInternal so that we don't grab the wrapper,
because that will break in all sorts of contexts.

For #40724.
For #44065.

Change-Id: I12a728786786f423e5b229f8622e4a80ec27a31c
Reviewed-on: https://go-review.googlesource.com/c/go/+/302109
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/304231 mentions this issue: cmd/compile, internal/abi: add FuncPC intrinsic

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/304232 mentions this issue: cmd/compile, internal/abi: add FuncPC intrinsic

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/304230 mentions this issue: cmd/compile, internal/abi: add FuncPC intrinsic

gopherbot pushed a commit that referenced this issue Apr 23, 2021
When ABI wrappers are used, there are cases where in Go code we
need the PC of the defined function instead of the ABI wrapper.
Currently we work around this by define such functions as
ABIInternal, even if they do not actually follow the internal ABI.

This CL introduces internal/abi.FuncPCABIxxx functions as compiler
intrinsics, which return the underlying defined function's entry
PC if the argument is a direct reference of a function of the
expected ABI, and reject it if it is of a different ABI.

As a proof of concept, change runtime.goexit back to ABI0 and use
internal/abi.FuncPCABI0 to retrieve its PC.

Updates #44065.

Change-Id: I02286f0f9d99e6a3090f9e8169dbafc6804a2da6
Reviewed-on: https://go-review.googlesource.com/c/go/+/304232
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
@seankhliao seankhliao added this to the Backlog milestone Aug 20, 2022
copybara-service bot pushed a commit to google/gvisor that referenced this issue Oct 11, 2023
sync.SeqCount relies on the following memory orderings:

- All stores following BeginWrite() in program order happen after the atomic
  read-modify-write (RMW) of SeqCount.epoch. In the Go 1.19 memory model, this
  is implied by atomic loads being acquire-seqcst.

- All stores preceding EndWrite() in program order happen before the RMW of
  SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic stores
  being release-seqcst.

- All loads following BeginRead() in program order happen after the load of
  SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic loads
  being acquire-seqcst.

- All loads preceding ReadOk() in program order happen before the load of
  SeqCount.epoch. The Go 1.19 memory model does not imply this property.

The x86 memory model *does* imply this final property, and in practice the
current Go compiler does not reorder memory accesses around the load of
SeqCount.epoch, so sync.SeqCount behaves correctly on x86.
However, on ARM64, the instruction that is actually emitted for the atomic load
from SeqCount.epoch is LDAR:

```
gvisor/pkg/sentry/kernel/kernel.SeqAtomicTryLoadTaskGoroutineSchedInfo():
gvisor/pkg/sentry/kernel/seqatomic_taskgoroutineschedinfo_unsafe.go:34
  56371c:       f9400025        ldr     x5, [x1]
  563720:       f9400426        ldr     x6, [x1, #8]
  563724:       f9400822        ldr     x2, [x1, #16]
  563728:       f9400c23        ldr     x3, [x1, #24]
gvisor/pkg/sentry/kernel/seqatomic_taskgoroutineschedinfo_unsafe.go:36
  56372c:       d503201f        nop
gvisor/pkg/sync/sync.(*SeqCount).ReadOk():
gvisor/pkg/sync/seqcount.go:107
  563730:       88dffc07        ldar    w7, [x0]
  563734:       6b0400ff        cmp     w7, w4
```

LDAR is explicitly documented as not implying the required memory ordering:
https://developer.arm.com/documentation/den0024/latest/Memory-Ordering/Barriers/One-way-barriers
Consequently, SeqCount.ReadOk() is incorrectly memory-ordered on weakly-ordered
architectures. To fix this, we need to introduce an explicit memory fence.

On ARM64, there is no way to implement the memory fence in question without
resorting to assembly, so the implementation is straightforward. On x86, we
introduce a compiler fence, since future compilers might otherwise reorder
memory accesses to after atomic loads; the only apparent way to do so is also
by using assembly, which unfortunately introduces overhead:

- After the call to sync.MemoryFenceReads(), callers zero XMM15 and reload the
  runtime.g pointer from %fs:-8, reflecting the switch from ABI0 to
  ABIInternal. This is a relatively small cost.

- Before the call to sync.MemoryFenceReads(), callers spill all registers to
  the stack, since ABI0 function calls clobber all registers. The cost of this
  depends on the state of the caller before the call, and is not reflected in
  BenchmarkSeqCountReadUncontended (which does not read any protected state
  between the calls to BeginRead() and ReadOk()).

Both of these problems are caused by Go assembly functions being restricted to
ABI0. Go provides a way to mark assembly functions as using ABIInternal
instead, but restricts its use to functions in package runtime
(golang/go#44065). runtime.publicationBarrier(),
which is semantically "sync.MemoryFenceWrites()", is implemented as a compiler
fence on x86; defining sync.MemoryFenceReads() as an alias for that function
(using go:linkname) would mitigate the former problem, but not the latter.
Thus, for simplicity, we define sync.MemoryFenceReads() in (ABI0) assembly, and
have no choice but to eat the overhead.

("Fence" and "barrier" are often used interchangeably in this context; Linux
uses "barrier" (e.g. `smp_rmb()`), while C++ uses "fence" (e.g.
`std::atomic_memory_fence()`). We choose "fence" to reduce ambiguity with
"write barriers", since Go is a GC'd language.)

PiperOrigin-RevId: 572675753
copybara-service bot pushed a commit to google/gvisor that referenced this issue Oct 16, 2023
sync.SeqCount relies on the following memory orderings:

- All stores following BeginWrite() in program order happen after the atomic
  read-modify-write (RMW) of SeqCount.epoch. In the Go 1.19 memory model, this
  is implied by atomic loads being acquire-seqcst.

- All stores preceding EndWrite() in program order happen before the RMW of
  SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic stores
  being release-seqcst.

- All loads following BeginRead() in program order happen after the load of
  SeqCount.epoch. In the Go 1.19 memory model, this is implied by atomic loads
  being acquire-seqcst.

- All loads preceding ReadOk() in program order happen before the load of
  SeqCount.epoch. The Go 1.19 memory model does not imply this property.

The x86 memory model *does* imply this final property, and in practice the
current Go compiler does not reorder memory accesses around the load of
SeqCount.epoch, so sync.SeqCount behaves correctly on x86.
However, on ARM64, the instruction that is actually emitted for the atomic load
from SeqCount.epoch is LDAR:

```
gvisor/pkg/sentry/kernel/kernel.SeqAtomicTryLoadTaskGoroutineSchedInfo():
gvisor/pkg/sentry/kernel/seqatomic_taskgoroutineschedinfo_unsafe.go:34
  56371c:       f9400025        ldr     x5, [x1]
  563720:       f9400426        ldr     x6, [x1, #8]
  563724:       f9400822        ldr     x2, [x1, #16]
  563728:       f9400c23        ldr     x3, [x1, #24]
gvisor/pkg/sentry/kernel/seqatomic_taskgoroutineschedinfo_unsafe.go:36
  56372c:       d503201f        nop
gvisor/pkg/sync/sync.(*SeqCount).ReadOk():
gvisor/pkg/sync/seqcount.go:107
  563730:       88dffc07        ldar    w7, [x0]
  563734:       6b0400ff        cmp     w7, w4
```

LDAR is explicitly documented as not implying the required memory ordering:
https://developer.arm.com/documentation/den0024/latest/Memory-Ordering/Barriers/One-way-barriers
Consequently, SeqCount.ReadOk() is incorrectly memory-ordered on weakly-ordered
architectures. To fix this, we need to introduce an explicit memory fence.

On ARM64, there is no way to implement the memory fence in question without
resorting to assembly, so the implementation is straightforward. On x86, we
introduce a compiler fence, since future compilers might otherwise reorder
memory accesses to after atomic loads; the only apparent way to do so is also
by using assembly, which unfortunately introduces overhead:

- After the call to sync.MemoryFenceReads(), callers zero XMM15 and reload the
  runtime.g pointer from %fs:-8, reflecting the switch from ABI0 to
  ABIInternal. This is a relatively small cost.

- Before the call to sync.MemoryFenceReads(), callers spill all registers to
  the stack, since ABI0 function calls clobber all registers. The cost of this
  depends on the state of the caller before the call, and is not reflected in
  BenchmarkSeqCountReadUncontended (which does not read any protected state
  between the calls to BeginRead() and ReadOk()).

Both of these problems are caused by Go assembly functions being restricted to
ABI0. Go provides a way to mark assembly functions as using ABIInternal
instead, but restricts its use to functions in package runtime
(golang/go#44065). runtime.publicationBarrier(),
which is semantically "sync.MemoryFenceWrites()", is implemented as a compiler
fence on x86; defining sync.MemoryFenceReads() as an alias for that function
(using go:linkname) would mitigate the former problem, but not the latter.
Thus, for simplicity, we define sync.MemoryFenceReads() in (ABI0) assembly, and
have no choice but to eat the overhead.

("Fence" and "barrier" are often used interchangeably in this context; Linux
uses "barrier" (e.g. `smp_rmb()`), while C++ uses "fence" (e.g.
`std::atomic_memory_fence()`). We choose "fence" to reduce ambiguity with
"write barriers", since Go is a GC'd language.)

PiperOrigin-RevId: 573861378
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. umbrella
Projects
None yet
Development

No branches or pull requests

6 participants