Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: provide Pinner API for object pinning #46787

Closed
ansiwen opened this issue Jun 16, 2021 · 117 comments
Closed

runtime: provide Pinner API for object pinning #46787

ansiwen opened this issue Jun 16, 2021 · 117 comments
Assignees
Labels
Milestone

Comments

@ansiwen
Copy link
Contributor

ansiwen commented Jun 16, 2021

Update, 2021-10-20: the latest proposal is the API in #46787 (comment).


Problem Statement

The pointer passing rules state:

Go code may pass a Go pointer to C provided the Go memory to which it points does not contain any Go pointers.

and

Go code may not store a Go pointer in C memory.

There are C APIs, most notably the iovec based ones for vectored I/O which expect an array of structs that describe buffers to read to or write from. The naive approach would be to allocate both the array and the buffers with C.malloc() and then either work on the C buffers directly or copy the content to Go buffers. In the case of Go bindings for a C API, which is assumably the most common use case for Cgo, the users of the bindings shouldn't have to deal with C types, which means that all data has to be copied into Go allocated buffers. This of course impairs the performance, especially for larger buffers. Therefore it would be desirable to have a safe possibility to let the C API write directly into the Go buffers. This, however, is not possible because

  • either the buffer array is allocated in C memory, but then the pointers of the Go buffers can't be stored in it. (Storing Go pointers in C memory is forbidden.)
  • or the buffer array is allocated in Go memory and the Go buffer pointers are stored in it. But then the pointer to that buffer array can't be passed to a C function. (Passing a Go pointer that points to memory containing other Go pointers to a C function is forbidden.)

Obviously, what is missing is a safe way to pin an arbitrary number of Go pointers in order to store them in C memory or in passed-to-C Go memory for the duration of a C call.

Workarounds

Break the rules and store the Go pointer in C memory

(click)

with something like

IovecCPtr.iov_base = unsafe.Pointer(myGoPtr)

but GODEBUG=cgocheck=2 would catch that.

However, you can circumvent cgocheck=2 with this casting trick:

*(*uintptr)(unsafe.Pointer(&IovecCPtr.iov_base)) = uintptr(myGoPtr)

This might work, as long as the GC is not moving the pointers, which might be a fact as of now, but is not guaranteed.

Break the rules and hide the Go pointer in Go memory

(click)

with something like

type iovecT struct {
  iov_base uintptr
  iov_len  C.size_t
}
iovec := make([]iovecT, numberOfBuffers)
for i := range iovec {
  bufferPtr := unsafe.Pointer(&bufferArray[i][0])
  iovec[i].iov_base = uintptr(bufferPtr)
  iovec[i].iov_len = C.size_t(len(bufferArray[i]))
}
n := C.my_iovec_read((*C.struct_iovec)(unsafe.Pointer(&iovec[0])), C.int(numberOfBuffers))

Again: This might work, as long as the GC is not moving the pointers. GODEBUG=cgocheck=2 wouldn't complain about this.

Break the rules and temporarily disable cgocheck

(click)

If hiding the Go pointer as a uintptr like in the last workaround is not possible, passing Go memory that contains Go pointers usually bails out because of the default cgocheck=1 setting. It is possible to disable temporarily cgocheck during a C call, which can especially useful, when the pointer have been "pinned" with one of the later workarounds. For example the _cgoCheckPtr() function, that is used in the generated Cgo code, can be shadowed in the local scope, which disables the check for the following C calls in the scope:

func ... {
  _cgoCheckPointer := func(interface{}, interface{}) {}
  C.my_c_function(x, y)
}

Maybe slightly more robust, is to export the runtime.dbgvars list:

type dbgVar struct {
	name  string
	value *int32
}

//go:linkname dbgvars runtime.dbgvars
var dbgvars []dbgVar

var cgocheck = func() *int32 {
	for i := range dbgvars {
		if dbgvars[i].name == "cgocheck" {
			return dbgvars[i].value
		}
	}
	panic("Couln't find cgocheck debug variable")
}()

func ... {
	before := *cgocheck
	*cgocheck = 0
	C.my_c_function(x, y)
	*cgocheck = before
}

Use a C function to store the Go pointer in C memory

(click)

The rules allow that a C function stores a Go pointer in C memory for the duration of the call. So, for each Go pointer a C function can be called in a Go routine, that stores the Go pointer in C memory and then calls a Go function callback that waits for a release signal. After the release signal is received, the Go callback returns to the C function, the C function clears the C memory from the Go pointer, and returns as well, finishing the Go routine.

This approach fully complies with the rules, but is quite expensive, because each Go routine that calls a C function creates a new thread, that means one thread per stored Go pointer.

Use the //go:uintptrescapes compiler directive

(click)

//go:uintptrescapes is a compiler directive that

specifies that the function's uintptr arguments may be pointer values that have been converted to uintptr and must be treated as such by the garbage collector.

So, similar to the workaround before, a Go function with this directive can be called in a Go routine, which simply waits for a release signal. When the signal is received, the function returns and sets the pointer free.

This seems already almost like a proper solution, so that I implemented a package with this approach, that allows to Pin() a Go pointer and Poke() it into C memory: PtrGuard

But there are still caveats. The compiler and the runtime (cgocheck=2) don't seem to know about which pointers are protected by the directive, because they still don't allow to pass Go memory containing these Go pointers to a C function, or to store the pointers in C memory. Therefore the two first workarounds are additionally necessary. Also there is the small overhead for the Go routine and the release signalling.

Proposal

It would make Cgo a lot more usable for C APIs with more complex pointer handling like iovec, if there would be a programmatic way to provide what //go:uintptrescapes provides already through the backdoor. There should be a possibility to pin an arbitrary amount of Go pointers in the current scope, so that they are allowed to be stored in C memory or be contained in Go memory that is passed to a C function within this scope, for example with a runtime.PtrEscapes() function. It's cumbersome, that it's required to abuse Go routines, channels and casting tricks in order provide bindings to such C APIs. As long as the Go GC is not moving pointers, it could be a trivial implementation, but it would encapsulate this knowledge and would give users a guarantee.

I know from the other issues and discussions around this topic that it's seen as dangerous if it is possible to pin an arbitrary amount of pointers. But

  1. it is possible to call an arbitrary amount of C or //go:uintptrescapes functions, therefore it is also possible to pin arbitrary amount of Go pointers already.
  2. it is necessary for some C APIs

Related issues: #32115, #40431

/cc @ianlancetaylor @rsc @seebs

edit: the first workaround had an incorrect statement.
edit 2: add workarounds for disabling cgocheck

@gopherbot gopherbot added this to the Proposal milestone Jun 16, 2021
@DeedleFake
Copy link

DeedleFake commented Jun 16, 2021

From what I can tell from the documentation for the new cgo.Handle, it's intended only for a situation where a pointer needs to be passed from Go to C and then back to Go without the C code doing anything with what it points to. As it passes a handle ID, not a real pointer, the C code can't actually get access to the actual data. Maybe a function could be provided on the C side that takes a handle ID and returns the original pointer, thus allowing the C code to access the data? Would that solve this issue?

Edit: Wait, that doesn't make sense. Could you just use Handle to make sure that it's held onto? Could the definition of Handle be extended to mean that the pointer itself is valid for the duration of the Handle's existence? In other words, this would be defined to be valid:

// void doSomethingWithAPointer(int *a);
import "C"

func main() {
  v := C.int(3)
  h := cgo.NewHandle(&v)
  doSomethingWithAPointer(&v) // Safe because the handle exists for that pointer.
  h.Delete()
}

Alternatively, if that's not feasible, what about a method on Handle that returns a valid pointer for the given value?

// Pointer returns a C pointer that points to the underlying value of the handle
// and is valid for the life of the handle.
func (h Handle) Pointer() C.uintptr_t

Disclaimer: I'm not familiar enough with the internals of either the Go garbage collector or Cgo to know if either of these even make sense.

@ansiwen
Copy link
Contributor Author

ansiwen commented Jun 16, 2021

@DeedleFake As you pointed out yourself, the cgo.Handle has a very different purpose. It's just a registry for a map from a C compatible arbitrary ID (uintptr) to an arbitrary Go value. It's purpose is to refer to a Go value in the C world, not to access it from there. It doesn't affect the behavior of the garbage collector, which could still freely move around the values in the Handle map, and would never delete them, since they are referenced by the map.

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Jun 16, 2021
@ianlancetaylor
Copy link
Contributor

An big advantage of the current cgo mechanisms, including go:uintptrescapes, is that the pointers are automatically unpinned when the cgo function returns. As far as I can see you didn't propose any particular mechanism for pinning pointers, but it would be very desirable to somehow ensure that the pointers are unpinned. Otherwise code could easily get into scenarios in which pointers remain pinned forever, which if Go ever implements a full moving garbage collector will cause the garbage collector to silently behave quite poorly. In other words, some APIs that could solve this problem will be be footguns: code that can easily cause a program to silently behave badly in ways that will be very hard to detect.

It's hard to say more without a specific API to discuss. If you suggested one, my apologies for missing it.

@ansiwen
Copy link
Contributor Author

ansiwen commented Jun 17, 2021

@ianlancetaylor thanks for taking the time to answer.

An big advantage of the current cgo mechanisms, including go:uintptrescapes, is that the pointers are automatically unpinned when the cgo function returns.

I agree, that is an advantage. However, with go routines it's trivial to fire-and-forget thousands of such function calls, that never return.

As far as I can see you didn't propose any particular mechanism for pinning pointers, but it would be very desirable to somehow ensure that the pointers are unpinned. Otherwise code could easily get into scenarios in which pointers remain pinned forever, which if Go ever implements a full moving garbage collector will cause the garbage collector to silently behave quite poorly. In other words, some APIs that could solve this problem will be be footguns: code that can easily cause a program to silently behave badly in ways that will be very hard to detect.

I didn't describe a specific API, that's true. I hoped that this could be developed here together once we agreed on the requirements. One of the requirements that I mentioned was, that the pinning happens only for the current scope. That implies automatic unpinning when the scope is left. Sorry that I didn't make that clear enough. So, to rephrase more compactly, the requirements would be:

  • possibility to pin pointers in the current scope (exactly as if they would be the argument of a C function call)
  • automatic unpinning when the current scope is left (the current function returns)
  • cgocheck knows about the pinning and does not complain

It's hard to say more without a specific API to discuss. If you suggested one, my apologies for missing it.

As stated above, I didn't want to suggest a specific API, but characteristics of it. In the end it could be a function like runtime.PtrEscapes(unsafe.Pointer). The usage could look like this:

func ReadFileIntoBufferArray(f *os.File, bufferArray [][]byte) int {
  numberOfBuffers := len(bufferArray)

  iovec := make([]C.struct_iovec, numberOfBuffers)

  for i := range iovec {
    bufferPtr := unsafe.Pointer(&bufferArray[i][0])
    runtime.PtrEscapes(bufferPtr) // <- pins the pointer and makes it known to escape to C
    iovec[i].iov_base = bufferPtr
    iovec[i].iov_len = C.size_t(len(bufferArray[i]))
  }

  n := C.readv(C.int(f.Fd()), &iovec[0], C.int(numberOfBuffers))
  // ^^^ cgocheck doesn't complain, because Go pointers in iovec are pinned
  return int(n) // <- all pinned pointers in iovec are unpinned
}

As long as the GC is not moving, runtime.PtrEscapes() is almost a no-op, it would basically only tell cgocheck not to bail out for these pointers. But users would have a guarantee, that if the GC becomes moving later, this function will take care of it.

Regarding footguns I'm pretty sure, that the workarounds, that have to be used at the moment to solve these problems, will cause more "programs to silently behave badly" than the potential abuse of a proper pinning method.

@bcmills
Copy link
Member

bcmills commented Jun 17, 2021

it would be very desirable to somehow ensure that the pointers are unpinned

Drawing from runtime.KeepAlive, one possibility might be something like:

package runtime

// Pin prevents the object to which p points from being relocated until
// the returned PointerPin either is unpinned or becomes unreachable.
func Pin[T any](p *T) PointerPin

type PointerPin struct {…}
func (p PointerPin) Unpin() {}

Then the example might look like:

func ReadFileIntoBufferArray(f *os.File, bufferArray [][]byte) int {
	numberOfBuffers := len(bufferArray)

	iovec := make([]C.struct_iovec, numberOfBuffers)

	for i := range iovec {
		bufferPtr := unsafe.Pointer(&bufferArray[i][0])
		defer runtime.Pin(bufferPtr).Unpin()
		iovec[i].iov_base = bufferPtr
		iovec[i].iov_len = C.size_t(len(bufferArray[i]))
	}

	n := C.readv(C.int(f.Fd()), &iovec[0], C.int(numberOfBuffers))
	return int(n)
}

A vet warning could verify that the result of runtime.Pin is used, to ensure that it is not accidentally released too early (see also #20803).

@phlogistonjohn
Copy link

@ansiwen when you write "automatic unpinning when the current scope is left (the current function returns)" the current scope you refer to is the scope of the Go function correct? In your example that would be ReadFileIntoBufferArray.
I'm trying to double check what the behavior would be regarding if we needed to make multiple calls into C using the same pointer.

@bcmills version also looks very natural flowing to me, and in that version it's clear that the pointer would be pinned until the defer at the end of ReadFileIntoBufferArray.

@ansiwen
Copy link
Contributor Author

ansiwen commented Jun 17, 2021

@ansiwen when you write "automatic unpinning when the current scope is left (the current function returns)" the current scope you refer to is the scope of the Go function correct? In your example that would be ReadFileIntoBufferArray.

@phlogistonjohn Yes, exactly.

@bcmills version also looks very natural flowing to me, and in that version it's clear that the pointer would be pinned until the defer at the end of ReadFileIntoBufferArray.

Yes, I also would prefer @bcmills version from a user's perspective, because it's more explicit and it's basically the same API that we use with PtrGuard.

I just don't know enough about the implications on the implementation side and effects on the Go internals, so I don't know what API would be more feasible. My proposal is about providing an official way to solve the described problem. I really don't care so much about the "form", that is how exactly the API looks like. Whatever works best with the current Go and Cgo implementation. 😊

@ansiwen
Copy link
Contributor Author

ansiwen commented Jun 18, 2021

@bcmills I guess, an argument @ianlancetaylor might bring up against your API proposal is, that it would allow to store the PointerPin value in a variable and keep them pinned for an unlimited time, so it would not "ensure that the pointers are unpinned". If the unpinning is implicit, it is more comparable to //go:uintptrescapes.

@ansiwen
Copy link
Contributor Author

ansiwen commented Jun 24, 2021

@ianlancetaylor

it would be very desirable to somehow ensure that the pointers are unpinned.

So, if you want to enforce the unpinning, the only strict RAII pattern in Go that I could come up with is using a scoped constructor like this API:

package runtime

// Pinner is the context for pinning pointers with Pin()
// can't be copied or constructed outside a Pinner scope
type Pinner struct {…}

// Pin prevents the object to which p points from being relocated until
// Pinner becomes invalid.
func (Pinner) Pin(p unsafe.Pointer) {...}

func WithPinner(func(Pinner)) {...}

which would be used like this:

func ReadFileIntoBufferArray(f *os.File, bufferArray [][]byte) int {
    numberOfBuffers := len(bufferArray)
    
    iovec := make([]C.struct_iovec, numberOfBuffers)

    var n C.ssize_t
    runtime.WithPinner(func (pinner runtime.Pinner) {
        for i := range iovec {
            bufferPtr := unsafe.Pointer(&bufferArray[i][0])
            pinner.Pin(bufferPtr)
            iovec[i].iov_base = bufferPtr
            iovec[i].iov_len = C.size_t(len(bufferArray[i]))
        }
        
        n = C.readv(C.int(f.Fd()), &iovec[0], C.int(numberOfBuffers))
    }) // <- All pinned pointers are released here and pinner is invalidated (in case it's copied out of scope).
    return int(n)
}

I personally would prefer a thinner API, where either it must be explicitly unpinned, like in the proposal of @bcmills, or - even better - the pinning implicitly creates a defer for the scope in which the pinning function has been called from. Given, that this will be implemented in the runtime package, I guess there are tricks and magic that can be used there.

@Merovius
Copy link
Contributor

@ansiwen Even with the func API you suggest, a user might store the argument in a closed-over variable, to have it survive the function. In general, as long as the pin is represented by some value, we can't prevent that value from being kept around. So I don't think your version has significant safety-benefits as to compared to @bcmills, while being less wieldy and also potentially heavier in runtime cost (the closure might make it easier for things to escape).

Personally, as long as the PointerPin has to be intentionally kept around, I think that's fine. I think the suggestion to unpin when the PointerPin becomes unreachable already makes it sufficiently hard to shoot yourself in the foot to tolerate the risk. And we might be able to use go vet for additional safety (like warning if the result of Pin is assigned to a global var or something).

@ansiwen
Copy link
Contributor Author

ansiwen commented Jun 30, 2021

@Merovius

@ansiwen Even with the func API you suggest, a user might store the argument in a closed-over variable, to have it survive the function. In general, as long as the pin is represented by some value, we can't prevent that value from being kept around. So I don't think your version has significant safety-benefits as to compared to @bcmills, while being less wieldy and also potentially heavier in runtime cost (the closure might make it easier for things to escape).

The "keeping-around" can easily be prevented by one pointer indirection that get's invalidated when the scope is left. You can have a look at my implementation of PtrGuard that even has test case for exactly the case of a scope escaping variable.

Personally, as long as the PointerPin has to be intentionally kept around, I think that's fine. I think the suggestion to unpin when the PointerPin becomes unreachable already makes it sufficiently hard to shoot yourself in the foot to tolerate the risk. And we might be able to use go vet for additional safety (like warning if the result of Pin is assigned to a global var or something).

Yeah, I agree, as I wrote before, I'm totally fine with both. It's just something I came up with to address @ianlancetaylor's concerns. I also think that the risks are "manageable", there are all kinds of other risks when dealing with runtime and/or unsafe packages after all.

@rsc rsc moved this from Incoming to Active in Proposals (old) Aug 4, 2021
@beoran
Copy link

beoran commented Aug 4, 2021

I think that the API proposed by @bcmills is the most useful one. Although there is a risk of forgetting to unpin a pointer, once Go gets a moving garby collector, for certain low level uses, certain blocks of memory will have to stay pinned for the duration of the program. Certainly for system calls in Linux, such as for the frame buffers. In other words, Pin and Unpin are also useful without cgo.

@hnes
Copy link

hnes commented Aug 17, 2021

Hi @rsc, any updates on this issue recently? I noticed it has been several days after the 2021-08-04's review meeting minutes.

@rsc
Copy link
Contributor

rsc commented Aug 18, 2021

The compiler/runtime team has been talking a bit about this but don't have any clear suggestions yet.

The big problem with pinning is that if we ever want a moving garbage collector in the future, pins will make it much more complex. That's why we've avoided it so far.

/cc @aclements

@ansiwen
Copy link
Contributor Author

ansiwen commented Aug 18, 2021

The big problem with pinning is that if we ever want a moving garbage collector in the future, pins will make it much more complex. That's why we've avoided it so far.

@rsc But my point in the description was, that we have pinning already when C functions are called with Go pointers or when the //go:uintptrescapes directive is used. So the situation is complex already, isn't it?

@beoran
Copy link

beoran commented Aug 18, 2021

@rsc I would say the converse is also true. If you are going to implement a moving garbage collector without support for pinning, that will make it much more complex to use Go for certain direct operating calls without cgo, e.g. on Linux.
In other words, as @ansiwen says, there's really no way to avoid that complexity. And therefore I think it would be better if Go supported it explicitly than through workarounds.

@ianlancetaylor
Copy link
Contributor

Unbounded pinning has the potential to be significantly worse than bounded pinning. If people accidentally or intentionally leave many pointers pinned, that can fragment the spaces that the GC uses, and make it very hard for a moving GC to make any progress at all. This can in principle happen with cgo today, but it is unlikely that many programs pass a bunch of pointers to a cgo function that never returns. When programmers control the pinning themselves, bugs are more likely. If the bug is in some imported third party library, the effect will be strange garbage collection behavior for the overall program. This will be hard to understand and hard to diagnose, and it will be hard to find the root cause. (One likely effect will be a set of tools similar to the memory profiler that track pinned pointers.)

It's also worth noting that we don't have a moving garbage collector today, so any problems that pinned pointers may introduce for a moving garbage collector will not be seen today. So if we ever do introduce a moving garbage collector, we will have a flag day of hard-to-diagnose garbage collection problems. This will make it that much harder to ever change the garbage collector in practice.

So I do not think the current situation is nearly as complex as the situation would be if we add unbounded pinning. This doesn't mean that we shouldn't add unbounded pinning. But I think that it does mean that the argument for it has to be something other than "we can already pin pointers today."

@beoran
Copy link

beoran commented Aug 19, 2021

@ianlancetaylor That is fair enough. But then it seems to me the best way ahead is to put this issue on hold until we can implement a prototype moving garbage collector.

There is always a workaround if there is no pinning available and that is to manually allocate memory directly from the OS so the GC doesn't know about it. It is not ideal but it can work.

@egonelbre
Copy link
Contributor

Yeah, one workaround that is missing from the discussion is hiding the C api allocation concerns, e.g. iovec could be implemented like:

package iovec

type Buffers struct {
	Data [][]byte

	data *C.uint8_t
	list *C.iovecT
}

func NewBuffers(sizes []int) *Buffers {
	...
	// C.malloc everything
	// cast from *C.uint8_t to []byte
}

func (buffers *Buffers) ReadFrom(f *os.File) error { ...

Or in other words, from the problem statement, it's unclear why it's required to use bufferArray [][]byte as the argument.

@ansiwen
Copy link
Contributor Author

ansiwen commented Aug 20, 2021

@ianlancetaylor

So I do not think the current situation is nearly as complex as the situation would be if we add unbounded pinning. This doesn't mean that we shouldn't add unbounded pinning. But I think that it does mean that the argument for it has to be something other than "we can already pin pointers today."

Let's separate the two questions "pinning yes/no" and "pinning bounded/unbounded".

pinning yes/no

I also proposed

  1. an API that allows bounded pinning (runtime.WithPinner()).
  2. the potential possibility of a runtime.Pin() with no return value and an implicit defer that automatically gets unpinned when the current function returns.

Both provide a similar behaviour as the //go:uintptrescapes directive, if that is what you mean with "bounded". What do you think of these options?

pinning bounded/unbounded

  1. when we will have a moving GC, there will always be also a possibility to pin pointer or pause the moving, so this needs to be implemented in any case. Is this correct?
  2. when people leave pointers pinned, the GC will behave like a non-moving GC, so there is no regression beyond our current status-quo, right? So, what exactly do you mean with "hard-to-diagnose garbage collection problems"?
  3. would the risk of many unpinned pointers not be similar to that of memory leaks, like with global dynamic data structures, that are possible now? I know, memory fragmentation is potentially worse than just allocating memory, but the effect would be similar: OOM errors.

For me personally the first question is more important. Bounded or unbounded, I think the existing and required ways of pinning should be made less hacky in their usage.

@egonelbre

Or in other words, from the problem statement, it's unclear why it's required to use bufferArray [][]byte as the argument.

The bufferArray [][]byte is just a placeholder for an arbitrary "native Go data structure". As the problem statement mentions, the goal is to avoid copying of the data. Especially vectored I/O is used for big amounts of data, so depending on the use case, you can't choose the target data structure by yourself, but it is provided by another library that you intend to use (let's say video processing for example). That would mean, that in all these cases you have to copy the data from your own C allocated data structure to the Go-allocated target data structure of your library, for no good reason.

@ianlancetaylor
Copy link
Contributor

when we will have a moving GC, there will always be also a possibility to pin pointer or pause the moving, so this needs to be implemented in any case. Is this correct?

In some manner, yes.

when people leave pointers pinned, the GC will behave like a non-moving GC, so there is no regression beyond our current status-quo, right? So, what exactly do you mean with "hard-to-diagnose garbage collection problems"?

A GC that is based on moving pointers is not the same as a GC that does not move pointers. A GC based on moving pointers may be completely blocked by a pinned pointer, whereas for a non-moving GC a pinned pointer is just the same as a live pointer.

would the risk of many unpinned pointers not be similar to that of memory leaks, like with global dynamic data structures, that are possible now? I know, memory fragmentation is potentially worse than just allocating memory, but the effect would be similar: OOM errors.

Same answer.

Again, all I am saying is that arguments based on "we already support pinned pointers, so it's OK to add more" are not good arguments. We need different arguments.

@hnes
Copy link

hnes commented Aug 21, 2021

How would we deal with the iovec struct during vectored I/O syscall if we have a GC that is based on moving pointers? Maybe the same solution could also be applied to the pointer pinning we are discussing?

A GC based on moving pointers may be completely blocked by a pinned pointer.

I'm afraid that would badly impact the GC latency or something else if it is true. Please consider the disk i/o syscall that may block a very long time.

@ansiwen
Copy link
Contributor Author

ansiwen commented Aug 21, 2021

@ianlancetaylor

when we will have a moving GC, there will always be also a possibility to pin pointer or pause the moving, so this needs to be implemented in any case. Is this correct?

In some manner, yes.

when people leave pointers pinned, the GC will behave like a non-moving GC, so there is no regression beyond our current status-quo, right? So, what exactly do you mean with "hard-to-diagnose garbage collection problems"?

A GC that is based on moving pointers is not the same as a GC that does not move pointers. A GC based on moving pointers may be completely blocked by a pinned pointer, whereas for a non-moving GC a pinned pointer is just the same as a live pointer.

Since you agreed that the pinning is required in the answer before, I don't understand how such an implementation could be used in Go.

Again, all I am saying is that arguments based on "we already support pinned pointers, so it's OK to add more" are not good arguments. We need different arguments.

I don't think "add more" is the right wording. It's more about exposing the pinning in a better way. And these are not arguments for doing it, but arguments against the supposed risks of doing it.

The argument for doing it should be clear by now: give people a zero-copy way to use APIs like iovec with Go data structures in a future proof way. At the moment, that's not possible.

In your answers you skipped the first part about the bounded pinning. If you have the time to comment on these too, I would be very interested. 😊

@ianlancetaylor
Copy link
Contributor

Since you agreed that the pinning is required in the answer before, I don't understand how such an implementation could be used in Go.

The current system for pinning pointers doesn't permit pointers to be pinned indefinitely, if we discount the unusual case of a C function that does not return.

I agree that other systems that somehow ensure that pointers can't be pinned indefinitely are better. (I don't think that an implicit defer is a good approach for Go, though.)

@aclements
Copy link
Member

I agree with @randall77 that the bit should definitely be per object, but thanks for considering that dimension.

One suggestion was to use the "specials" list in the span struct. However, that seems to add quite some complexity, which would also be less efficient and might not be necessary, because the two checks above only need to know, if an address is pinned or not, and not how many times.

This might just be because I'm used to working with span specials, but your other proposed solutions seem more complicated to me, or at least significantly less efficient. Note that in my proposed approach the pin checks are still a simple bitmap check. Only Pin/Unpin need to consult the specials.

Here's a simpler version of my proposal that wouldn't be quite as efficient, but sticks to a single-bit bitmap:

  • isPinned just checks the bit for the object being queried.
  • Pin checks the bit. If it's 0, it sets it to 1 and creates a special for tracking the count, which it also sets to 1. If it's 1, it looks up the special and increments the count.
  • Unpin looks up the special and decrements it for each pointer in the Pinner. If a special's count drops to 0, it deletes the special and clears the bit.

normal map with a mutex

I do like how simple this is, but I worry about the performance of a single global lock on the map. Perhaps we start with this and then optimize in a follow-up?

a specialized version of the lock-free sync.Map

sync.Map is poorly suited to this problem. Since every access to the map is also a write, this definitely wouldn't fall into the "read-mostly" usage where sync.Map performs well. sync.Map also performs well if goroutines are accessing disjoint keys, but accessing the same keys repeatedly. Pinner may fall into that category, but it depends on how the application uses it.

a lock-free balanced binary tree

I have implemented these. They are extremely hard to get right and don't usually have the performance properties you want them to have, especially if the tree is usually small.

@gopherbot
Copy link

Change https://go.dev/cl/465676 mentions this issue: unix: mitigate uintptr violations in PtraceIO on freebsd

@ansiwen
Copy link
Contributor Author

ansiwen commented Mar 7, 2023

I agree with @randall77 that the bit should definitely be per object, but thanks for considering that dimension.

One suggestion was to use the "specials" list in the span struct. However, that seems to add quite some complexity, which would also be less efficient and might not be necessary, because the two checks above only need to know, if an address is pinned or not, and not how many times.

This might just be because I'm used to working with span specials, but your other proposed solutions seem more complicated to me, or at least significantly less efficient. Note that in my proposed approach the pin checks are still a simple bitmap check. Only Pin/Unpin need to consult the specials.

@aclements thanks for your answer. Maybe there is a misunderstanding? My suggestion would have identical efficiency, IIUC, because both our proposals would use the bitmap for the "isPinned()" question. The difference would only be, if the pin counters are stored in the specials list, or in another data structure like a map, which is only accessed by Pin/Unpin functions. So efficiency regarding the gocheck code would be the same for bitmap+specials and bitmap+map. Right?

Here's a simpler version of my proposal that wouldn't be quite as efficient, but sticks to a single-bit bitmap:

  • isPinned just checks the bit for the object being queried.
  • Pin checks the bit. If it's 0, it sets it to 1 and creates a special for tracking the count, which it also sets to 1. If it's 1, it looks up the special and increments the count.
  • Unpin looks up the special and decrements it for each pointer in the Pinner. If a special's count drops to 0, it deletes the special and clears the bit.

My proposal would be exactly the same, only that it stores the counter in a map, not in the specials list.

If there is a good reason, I can do the specials-list approach, but I though a map-based solution would be less error prone.

normal map with a mutex

I do like how simple this is, but I worry about the performance of a single global lock on the map. Perhaps we start with this and then optimize in a follow-up?

That sounds good.

a specialized version of the lock-free sync.Map

sync.Map is poorly suited to this problem. Since every access to the map is also a write, this definitely wouldn't fall into the "read-mostly" usage where sync.Map performs well. sync.Map also performs well if goroutines are accessing disjoint keys, but accessing the same keys repeatedly. Pinner may fall into that category, but it depends on how the application uses it.

Right.

a lock-free balanced binary tree

I have implemented these. They are extremely hard to get right and don't usually have the performance properties you want them to have, especially if the tree is usually small.

Yeah, I agree.

So the last open question is then: bitmap+(counter in mutex+map) or bitmap+(counter in specials-list).

As I said before, I think mutex+map is less error prone and the performance difference for Pin/Unpin should not matter. But I'm happy to be convinced otherwise and go for the specials-list.

Thanks

@ansiwen
Copy link
Contributor Author

ansiwen commented Mar 26, 2023

So, I went ahead and implemented multi-pinning with both approaches, a version with a global mutexed map, and then a version based on specials. Both seem to work fine, so I pushed the specials-based version, because that apeared to be @aclements preference. I still have several questions about some code which I added as comments in the latest patch. PTAL: https://go-review.googlesource.com/c/go/+/367296

@DeedleFake
Copy link

You mentioned being unable to use compare-and-swap with a uint8 in a comment. Can you use an atomic.Pointer[uint8], or would that not work in this case?

@ansiwen
Copy link
Contributor Author

ansiwen commented Mar 26, 2023

You mentioned being unable to use compare-and-swap with a uint8 in a comment. Can you use an atomic.Pointer[uint8], or would that not work in this case?

No, a atomic.Pointer[uint8] is still a pointer, which is 64bit in most cases. I would need something that atomically compares-and-swaps only 8 bit.

@ansiwen
Copy link
Contributor Author

ansiwen commented Mar 26, 2023

Here are some benchmarks. Certainly they are not representative for real-world-code, but there are some differences. As expected the specials implementation has a higher baseline, but becomes more efficient, when there are multi-pinnings, especially parallel ones. Overall the difference is not big, and for the "common" use cases the map solution is even a bit better. However, the specials solution could be further tuned for the common cases.

specials:

BenchmarkPinnerPinUnpinBatch-12                 	   16455	     70933 ns/op	   25208 B/op	      12 allocs/op
BenchmarkPinnerPinUnpinBatchTiny-12             	   10000	    105707 ns/op X	   25208 B/op	      12 allocs/op
BenchmarkPinnerPinUnpinBatchDouble-12           	    1677	    680472 ns/op	   60024 B/op	      14 allocs/op
BenchmarkPinnerPinUnpin-12                      	10608369	       109.3 ns/op	      16 B/op	       2 allocs/op
BenchmarkPinnerPinUnpinTiny-12                  	12511864	        96.72 ns/op	       9 B/op	       2 allocs/op
BenchmarkPinnerPinUnpinDouble-12                	 4560363	       259.1 ns/op X	      32 B/op	       3 allocs/op
BenchmarkPinnerPinUnpinParallel-12              	40334288	        32.21 ns/op	      16 B/op	       2 allocs/op
BenchmarkPinnerPinUnpinParallelTiny-12          	48015036	        27.26 ns/op	       9 B/op	       2 allocs/op
BenchmarkPinnerPinUnpinParallelDouble-12        	 3771037	       335.6 ns/op X	      32 B/op	       3 allocs/op

map:

BenchmarkPinnerPinUnpinBatch-12                 	   16312	     61342 ns/op X	   25208 B/op	      12 allocs/op
BenchmarkPinnerPinUnpinBatchTiny-12             	    9375	    110195 ns/op 	   30881 B/op	      83 allocs/op
BenchmarkPinnerPinUnpinBatchDouble-12           	    3589	    304002 ns/op X	  151585 B/op	    1036 allocs/op
BenchmarkPinnerPinUnpin-12                      	11324712	        99.40 ns/op X	      16 B/op	       2 allocs/op
BenchmarkPinnerPinUnpinTiny-12                  	12946808	        87.23 ns/op X	       9 B/op	       2 allocs/op
BenchmarkPinnerPinUnpinDouble-12                	 3223006	       365.2 ns/op	     232 B/op	       6 allocs/op
BenchmarkPinnerPinUnpinParallel-12              	57211736	        21.13 ns/op X	      16 B/op	       2 allocs/op
BenchmarkPinnerPinUnpinParallelTiny-12          	59951550	        18.54 ns/op X	       9 B/op	       2 allocs/op
BenchmarkPinnerPinUnpinParallelDouble-12        	 1886768	       630.3 ns/op	      40 B/op	       4 allocs/op

@ansiwen
Copy link
Contributor Author

ansiwen commented Mar 28, 2023

Updated version beats the maps version in almost all scenarios:

specials (improved):

BenchmarkPinnerPinUnpinBatch-12                 	   17776	     68068 ns/op	   25208 B/op	      12 allocs/op
BenchmarkPinnerPinUnpinBatchTiny-12             	   10000	    102848 ns/op	   25208 B/op	      12 allocs/op
BenchmarkPinnerPinUnpinBatchDouble-12           	    1630	    683130 ns/op	   60024 B/op	      14 allocs/op
BenchmarkPinnerPinUnpin-12                      	12095858	        95.07 ns/op	      16 B/op	       2 allocs/op
BenchmarkPinnerPinUnpinTiny-12                  	12968074	        84.12 ns/op	       9 B/op	       2 allocs/op
BenchmarkPinnerPinUnpinDouble-12                	 4711479	       247.5 ns/op	      32 B/op	       3 allocs/op
BenchmarkPinnerPinUnpinParallel-12              	49601271	        20.32 ns/op	      16 B/op	       2 allocs/op
BenchmarkPinnerPinUnpinParallelTiny-12          	61096221	        18.05 ns/op	       9 B/op	       2 allocs/op
BenchmarkPinnerPinUnpinParallelDouble-12        	 3783782	       320.5 ns/op	      32 B/op	       3 allocs/op

@mknyszek
Copy link
Contributor

mknyszek commented Apr 28, 2023

Hey, sorry for the delay. I'm looking at your CL now but I'm having difficulty verifying that CL against the set of semantics that was actually accepted as part of this proposal. This is mostly because I'm trying to piece it together from this GitHub issue, which isn't going super well. 😅 If they are written down and I just missed it, please point me at it!

In the meantime, here's what the CL's semantics are, AFAICT:

  • Pinned objects will not be moved by the GC.
  • Pinned objects (and their referents) are kept alive via the Pinner.
  • If the Pinner gets collected, and it contains any pinned objects, the GC panics.
  • Directly pinned objects are allowed to have their addresses stored in C memory beyond the scope of a cgo call.
  • Objects referred to by pinned objects must also be pinned.
  • Objects are allowed to be pinned by more than one Pinner, and must be unpinned from each pinner.

Does this sound right? Am I missing anything? Thanks.

EDIT: I updated this on 2023-05-17 to clarify a couple points.

@gopherbot
Copy link

Change https://go.dev/cl/496193 mentions this issue: runtime: fix lockrank ordering for pinner implementation

gopherbot pushed a commit that referenced this issue May 19, 2023
The new Pinner API's implementation imposes some partial-orders that are
safe but previously did not exist between a mspanSpecial, mheapSpecial,
and mheap. Fix that up in the lock ranking.

For #46787.

Change-Id: I51cc8f7f069240caeb44d749bed43515634f4814
Reviewed-on: https://go-review.googlesource.com/c/go/+/496193
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
@gopherbot
Copy link

Change https://go.dev/cl/497615 mentions this issue: runtime: cache inner pinner on P

gopherbot pushed a commit that referenced this issue May 24, 2023
This change caches the *pinner on the P to pool it and reduce the chance
that a new allocation is made. It also makes the *pinner no longer drop
its refs array on unpin, also to avoid reallocating.

The Pinner benchmark results before and after this CL are attached at
the bottom of the commit message.

Note that these results are biased toward the current change because of
the last two benchmark changes. Reusing the pinner in the benchmark
itself achieves similar performance before this change. The benchmark
results thus basically just confirm that this change does cache the
inner pinner in a useful way. Using the previous benchmarks there's
actually a slight regression from the extra check in the cache, however
the long pole is still setPinned itself.

name                                old time/op    new time/op    delta
PinnerPinUnpinBatch-8                 42.2µs ± 2%    41.5µs ± 1%      ~     (p=0.056 n=5+5)
PinnerPinUnpinBatchDouble-8            367µs ± 1%     350µs ± 1%    -4.67%  (p=0.008 n=5+5)
PinnerPinUnpinBatchTiny-8              108µs ± 0%     102µs ± 1%    -6.22%  (p=0.008 n=5+5)
PinnerPinUnpin-8                       592ns ± 8%      40ns ± 1%   -93.29%  (p=0.008 n=5+5)
PinnerPinUnpinTiny-8                   693ns ± 9%      39ns ± 1%   -94.31%  (p=0.008 n=5+5)
PinnerPinUnpinDouble-8                 843ns ± 5%     124ns ± 3%   -85.24%  (p=0.008 n=5+5)
PinnerPinUnpinParallel-8              1.11µs ± 5%    0.00µs ± 0%   -99.55%  (p=0.008 n=5+5)
PinnerPinUnpinParallelTiny-8          1.12µs ± 8%    0.00µs ± 1%   -99.55%  (p=0.008 n=5+5)
PinnerPinUnpinParallelDouble-8        1.79µs ± 4%    0.58µs ± 6%   -67.36%  (p=0.008 n=5+5)
PinnerIsPinnedOnPinned-8              5.78ns ± 0%    5.80ns ± 1%      ~     (p=0.548 n=5+5)
PinnerIsPinnedOnUnpinned-8            4.99ns ± 1%    4.98ns ± 0%      ~     (p=0.841 n=5+5)
PinnerIsPinnedOnPinnedParallel-8      0.71ns ± 0%    0.71ns ± 0%      ~     (p=0.175 n=5+5)
PinnerIsPinnedOnUnpinnedParallel-8    0.67ns ± 1%    0.66ns ± 0%      ~     (p=0.167 n=5+5)

name                                old alloc/op   new alloc/op   delta
PinnerPinUnpinBatch-8                 20.1kB ± 0%    20.0kB ± 0%    -0.32%  (p=0.008 n=5+5)
PinnerPinUnpinBatchDouble-8           52.7kB ± 0%    52.7kB ± 0%    -0.12%  (p=0.008 n=5+5)
PinnerPinUnpinBatchTiny-8             20.1kB ± 0%    20.0kB ± 0%    -0.32%  (p=0.008 n=5+5)
PinnerPinUnpin-8                       64.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinTiny-8                   64.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinDouble-8                 64.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinParallel-8               64.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinParallelTiny-8           64.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinParallelDouble-8         64.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
PinnerIsPinnedOnPinned-8               0.00B          0.00B           ~     (all equal)
PinnerIsPinnedOnUnpinned-8             0.00B          0.00B           ~     (all equal)
PinnerIsPinnedOnPinnedParallel-8       0.00B          0.00B           ~     (all equal)
PinnerIsPinnedOnUnpinnedParallel-8     0.00B          0.00B           ~     (all equal)

name                                old allocs/op  new allocs/op  delta
PinnerPinUnpinBatch-8                   9.00 ± 0%      8.00 ± 0%   -11.11%  (p=0.008 n=5+5)
PinnerPinUnpinBatchDouble-8             11.0 ± 0%      10.0 ± 0%    -9.09%  (p=0.008 n=5+5)
PinnerPinUnpinBatchTiny-8               9.00 ± 0%      8.00 ± 0%   -11.11%  (p=0.008 n=5+5)
PinnerPinUnpin-8                        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinTiny-8                    1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinDouble-8                  1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinParallel-8                1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinParallelTiny-8            1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinParallelDouble-8          1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
PinnerIsPinnedOnPinned-8                0.00           0.00           ~     (all equal)
PinnerIsPinnedOnUnpinned-8              0.00           0.00           ~     (all equal)
PinnerIsPinnedOnPinnedParallel-8        0.00           0.00           ~     (all equal)
PinnerIsPinnedOnUnpinnedParallel-8      0.00           0.00           ~     (all equal)

For #46787.

Change-Id: I0cdfad77b189c425868944a4faeff3d5b97417b9
Reviewed-on: https://go-review.googlesource.com/c/go/+/497615
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ansiwen <ansiwen@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
@gopherbot
Copy link

Change https://go.dev/cl/498116 mentions this issue: cmd/cgo: update pointer passing rules for runtime.Pinner

gopherbot pushed a commit that referenced this issue May 30, 2023
With the introduction of runtime.Pinner, we need to update the cgo
pointer passing rules to accomodate the new functionality. These rule
changes are easier to describe if the rest of the pointer passing rules
are described in terms of pinning as well (Go memory is implicitly
pinned when a pointer to it is passed to a C function, and implicitly
unpinned when that function returns).

For #46787.

Change-Id: I263f03412bc9165f19c9ada72fb005ed0483a8ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/498116
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
@dmitshur dmitshur modified the milestones: Backlog, Go1.21 Jun 4, 2023
Nasfame pushed a commit to golangFame/go that referenced this issue Jun 4, 2023
With the introduction of runtime.Pinner, we need to update the cgo
pointer passing rules to accomodate the new functionality. These rule
changes are easier to describe if the rest of the pointer passing rules
are described in terms of pinning as well (Go memory is implicitly
pinned when a pointer to it is passed to a C function, and implicitly
unpinned when that function returns).

For golang#46787.

Change-Id: I263f03412bc9165f19c9ada72fb005ed0483a8ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/498116
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
@gopherbot
Copy link

Change https://go.dev/cl/527702 mentions this issue: runtime: clarify error when returning unpinned pointers

gopherbot pushed a commit that referenced this issue Nov 8, 2023
With the introduction of runtime.Pinner, returning a pointer to a pinned
struct that then points to an unpinned Go pointer is correctly caught.

However, the error message remained as "cgo result has Go pointer",
which should be updated to acknowledge that Go pointers to pinned
memory are allowed.

This also updates the comments for cgoCheckArg and cgoCheckResult
to similarly clarify.

Updates #46787

Change-Id: I147bb09e87dfb70a24d6d43e4cf84e8bcc2aff48
GitHub-Last-Rev: 706facb
GitHub-Pull-Request: #62606
Reviewed-on: https://go-review.googlesource.com/c/go/+/527702
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Accepted
Development

No branches or pull requests