cmd/cgo: do not let Go pointers end up in C #8310

rsc · 2014-07-01T20:21:46Z

We need some way to prevent Go pointers from crossing the boundary into C.

People write code like (calling a C function that takes a T* to fill in).

var x C.T
C.f(&x)

One option is to disallow this - a Go pointer &x is being passed to C.

Another option is to have cgo's wrapper for C.f do a shallow copy into an allocated C
pointer, then run f, then copy the data back into the original Go data, then free the
allocated C pointer, then return.

If the second option handles 99% of the pointers being passed, then we probably should
do it. On the other hand, the free might create a dangling pointer that could confuse
people. At least it's a dangling pointer into the C heap and not the Go heap. :-)

griesemer · 2014-07-01T21:45:02Z

Comment 2:

Just for reference, and not meant as implementation suggestion: Java's JNI passes
pointers as "handles": A JNI-wrapper (a little stub, generated on-the fly for each call
signature) pushes the arguments on the stack again in C (reverse) order. For each Java
object, the stack address of the corresponding Java pointer is pushed. In C, that stack
address (which is immutable for the duration of the C call) acts as the handle via which
the underlying Java object is accessed. This permits GC to move objects around, but
comes at the cost of an extra (hopefully inlined) function call and indirection for each
Java data structure access from within C. It also requires the respective C code to be
written specifically with JNI in mind.

ianlancetaylor · 2014-07-01T21:54:12Z

Comment 3:

I think JNI is a good example of something we don't want.

randall77 · 2014-07-02T14:38:14Z

Comment 4:

Another option is to pin pointers that are passed into C.f for the duration of C.f. 
This allows the C code to use raw pointers, but does not allow the C code to squirrel
the pointer away past the immediate call.
There's no way to enforce the no-squirrel rule, unfortunately.
This requires a mechanism to pin objects in the heap.  If/when we have a copying
collector this will be a pain, but might be necessary for other reasons (runtime
internals?).

rsc · 2014-07-02T15:20:41Z

Comment 5:

There's no way to enforce the no-squirrel rule, and also if we have any kind of write
barrier on pointer updates then the C code would not respect that, which could be
problematic. I think it will be much simpler in the long run if we keep C from ever
seeing a Go pointer.

dvyukov · 2014-07-02T15:36:34Z

Comment 6:

Updating Go objects from C code never worked. C code runs concurrently with GC. GC needs
to see consistent picture of the world. If C updates Go objects, GC corrupts heap.

rsc · 2014-07-02T17:23:06Z

Comment 7:

All the more reason to not let Go pointers end up in C.

dominikh · 2014-07-02T18:39:14Z

Comment 8:

Does this issue affect all Go pointers? The following is a somewhat common pattern in
cgo land: `x := make([]byte, 4096); C.foo(&x[0], len(x))`

minux · 2014-07-02T18:42:53Z

Comment 9:

Re #8, it will affect *all* Go pointers crossing language boundary.

bryanturley · 2014-07-10T15:00:19Z

Comment 10:

I am confused, Is this for an upcoming copying collector?
Would the basic read/write syscalls need to be double buffered?
I think the problem isn't stated clearly.

ianlancetaylor · 2014-07-10T15:30:51Z

Comment 11:

One goal is to make it possible to write a copying collector.  Another goal is to avoid
the possibility of the Go garbage collector collecting a pointer that has been passed to
C code.

gopherbot · 2014-07-31T23:05:01Z

Comment 12:

Please keep in mind leveldb, sqlite etc where one might actually want to store binary
blobs of data that e.g. came from a network socket. There has to be a better way than an
extra copy round for everything, e.g. for when you know the data pointed to does not
contain further pointers.

andlabs · 2014-08-03T21:41:06Z

Comment 13:

The copying proposal would be bad for code like my ui package, which wants to store a Go
pointer on the C side for the purposes of giving it back to Go when an event comes in.
dominikh says he also has callback functions which work similarly.
I don't know how you intend to deal with this,  unless it's something through
unsafe.Pointer, but then you would need to use unsafe.Pointer for C-side types allocated
in Go, which is also ugly.

alexbrainman · 2014-08-03T23:13:40Z

Comment 14:

@pietro10, windows standard packages pass pointers to Go objects into Windows kernel in
a few places too. So the situation is known.
Alex

RLH · 2014-08-04T12:54:48Z

Comment 15:

Passing a Go heap pointer to C can break today's 1.3 GC which does not scan
C stacks / variables.

ianlancetaylor · 2014-08-04T14:36:49Z

Comment 16:

In the long run there just isn't any way that we can support passing a Go pointer into
C.  We want to make it possible to write a moving garbage collector, but that means that
the garbage collector has to be able to change all pointers.  The GC won't be able to
see pointers stored in C code.  So we can't permit that.
To pass a pointer into C and get it back you will have to do something like use a
map[uintptr]*X.  When you want to pass the pointer down, convert to uintptr and store
the pointer in the map.  When you get it back, use the map to get the real Go pointer.

alexbrainman · 2014-08-04T23:21:30Z

Comment 17:

Ian, would you be kind enough to show small program to make it clear for me? Thank you.
Alex

ianlancetaylor · 2014-08-04T23:44:31Z

Comment 18:

var ptrs map[uintptr]*T
func F(p *T) {
        u := uintptr(unsafe.Pointer(p))
        if ptrs[u] != nil {
                panic("need more sophisticated scheme")
        }
        ptrs[u] = p
        C.store(u)
}
func G() *T {
        u := C.fetch()
        return ptrs[u]
}

alexbrainman · 2014-08-05T00:53:48Z

Comment 19:

Thank you for the code, Ian. But I don't see how your proposal helps with original issue
raised by rsc:
var x C.T
C.f(&x)
What will you do here?
Also, there is still issue of Go objects been moved by GC. For example, in windows net
package we pass Go buffers (just like in #8) to kernel. The syscall returns immediately,
while kernel proceed to write to passed buffer. We keep pointer to buffer in Go, until
kernel finished writing (just like your proposal suggests). But that approach will fail,
if GC will move the buffer.
Alex

minux · 2014-08-05T01:00:06Z

Comment 20:

For those kind of requirements, we probably should just allocate the
buffer from C land, and not from the Go heap.

ianlancetaylor · 2014-08-05T02:35:59Z

Comment 21:

The point of this issue is to resolve exactly what we will promise people who use cgo,
kernel syscalls, and Windows functions.  See the original issue description.

alexbrainman · 2014-08-05T02:36:41Z

Comment 22:

@minux, allocating the buffer from C land will work. But, I think, it would be expensive
to do it as is. We'll end-up writing custom "external" memory allocators for these.
Alex

ianlancetaylor · 2014-08-05T02:48:39Z

Comment 23:

We don't literally have to allocate by calling C code.  We just have to allocate using
an allocator that the Go GC does not examine.
I think the open question here is how much of that cgo can or should do for you
automatically.

minux · 2014-08-05T02:50:15Z

Comment 24:

perhaps we can use sync.Pool to manage buffers allocated on C heaps, then they
could be made pretty cheap to use.
One main downside is that we can't return C buffers as is to Go land, otherwise
it becomes impossible to manage their life time and might lead to memory leaks,
and this necessitates at least one copying from C buffer to Go buffer.

alexbrainman · 2014-08-05T02:58:17Z

Comment 25:

> We don't literally have to allocate by calling C code.  We just have to allocate using
an allocator that the Go GC does not examine.
If this is provided by Go runtime or similar, then it should work just fine. I assume we
are talking about memory that needs to live after syscall returns. You would have to
find a separate solution for any code that just uses Go memory during syscall, just like:
var x C.T
C.f(&x)
Alex

btracey · 2014-08-05T14:53:54Z

Comment 26:

I don't understand what is meant by a pointer "crossing the boundary" to C. If Go
maintains the pointer while C is doing stuff with the underlying data, is that okay? In
github.com/gonum/blas/cblas [0], we send []float64 to C by sending unsafe.Pointer(&x[0])
to C. The x variable stays alive on the Go side, but the values it points to are
modified by the C code. Is the plan to still support this?
[0] https://github.com/gonum/blas/blob/master/cblas/blas.go

ianlancetaylor · 2014-08-05T16:11:40Z

Comment 27:

tracey.brendan: see comment #16.  We haven't yet decided what the exact rules will be,
but the general case of passing a pointer to C will not work even if the pointer is
maintained on the Go side.  I don't know whether the use case you describe will work or
not, but it's clearly an important case to keep in mind.

btracey · 2014-08-05T16:26:35Z

Comment 28:

Okay, thanks Ian. I wasn't sure if "into C" meant gone from Go's eyes, or at all.

andlabs · 2014-08-05T18:48:47Z

Comment 29:

In my opinion map[token]pointer seems really inelegant; I tried to actually remove
similar (map[C pointer]Go pointer with more data) from package ui's rewrite (redo/
folder). I'm willing to hear counterarguments, though.
Also by moving GC do you mean only stack pointers will move or will heap pointers move
too?

minux · 2014-08-06T03:25:57Z

Comment 30:

stack pointers are already moving with the copying stack in 1.3.
moving GC means that the heap pointers can also move (e.g.
compacting GC)

gopherbot · 2014-08-12T20:51:18Z

Comment 31 by voidlogic7:

This is going to break a huge number of existing cgo bindings. I understand this is
necessary to support things like copying/compacting GC... but in light how much existing
(and in production) Go software this is going to break I imagine whatever release this
becomes reality in, many people will be stuck on the previous release quite a while as
they scramble to replace offending dependencies.
Since the common case is buffers ([]byte, etc) being passed to C, filled and handed
back, what about introducing a cgo function to make this situation easier:
buf := C.make([]byte, 2048)
defer C.release(buf)
C.someFunction((...)buf.Data)
C.make would be like make, except the GC would be unaware of the resulting memory
(different memory arena from normal GC'd memory?).
C.release would be like 'free' but rather then freeing memory it just makes the GC aware
of it and eligible for later collection. (These functions are meant to be Go functions
so the above code only crosses the boundary once. Also notice no extra copying is
necessary to go on to use buf inside of Go objects [unless the GC decides to copy a
"released" slice into the normal GC space later of course])
I hope this would make it trivial to "fix" most broken cgo dependencies and at the same
time stay out of the way of GC advancement.
The alternative to something like this is either making two cgo calls (one to perform
the action and make the allocation) and another to free the buffer later OR to create
some kind of pooling object that allocates the pool on the C side and returns batches
later on the Go side.
This solution isn't perfect but I think its better than the alternatives. In some ways
this is like pinning without the burdons of supporting arbitrary one of pinning in the
primary GC arena.

rsc · 2014-08-12T20:57:15Z

Comment 32:

Disabling comments until we post our new proposal. We are still working on the wording.

Labels changed: added restrict-addissuecomment-commit.

griesemer · 2014-10-01T21:00:24Z

Comment 33:

Labels changed: added repo-main.

ianlancetaylor · 2015-08-11T22:39:35Z

cmd/cgo: do not let Go pointers end up in C #8310

cmd/cgo: do not let Go pointers end up in C #8310

Comments

rsc commented Jul 1, 2014

griesemer commented Jul 1, 2014

ianlancetaylor commented Jul 1, 2014

randall77 commented Jul 2, 2014

rsc commented Jul 2, 2014

dvyukov commented Jul 2, 2014

rsc commented Jul 2, 2014

dominikh commented Jul 2, 2014

minux commented Jul 2, 2014

bryanturley commented Jul 10, 2014

ianlancetaylor commented Jul 10, 2014

gopherbot commented Jul 31, 2014

andlabs commented Aug 3, 2014

alexbrainman commented Aug 3, 2014

RLH commented Aug 4, 2014

ianlancetaylor commented Aug 4, 2014

alexbrainman commented Aug 4, 2014

ianlancetaylor commented Aug 4, 2014

alexbrainman commented Aug 5, 2014

minux commented Aug 5, 2014

ianlancetaylor commented Aug 5, 2014

alexbrainman commented Aug 5, 2014

ianlancetaylor commented Aug 5, 2014

minux commented Aug 5, 2014

alexbrainman commented Aug 5, 2014

btracey commented Aug 5, 2014

ianlancetaylor commented Aug 5, 2014

btracey commented Aug 5, 2014

andlabs commented Aug 5, 2014

minux commented Aug 6, 2014

gopherbot commented Aug 12, 2014

rsc commented Aug 12, 2014

griesemer commented Oct 1, 2014

ianlancetaylor commented Aug 11, 2015

ianlancetaylor commented Aug 31, 2015

ianlancetaylor commented Jan 7, 2016