Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gl.FenceSync returns invalid unsafe.Pointer, causing crash #71

Closed
dominikh opened this issue May 9, 2017 · 26 comments
Closed

gl.FenceSync returns invalid unsafe.Pointer, causing crash #71

dominikh opened this issue May 9, 2017 · 26 comments

Comments

@dominikh
Copy link

dominikh commented May 9, 2017

gl.FenceSync returns the created fence as an unsafe.Pointer. However, values returned by glFenceSync aren't necessarily valid pointers to memory owned by the driver.

On my system (nvidia 378.13 on Linux), for example, glFenceSync returns increasing numbers, starting at 1. The issue with that is that the Go runtime expects pointers in unsafe.Pointer to be valid, either pointing to memory owned by the Go memory allocator, or valid, non-Go memory. Pointers such as "1" are neither, which causes runtime.writebarrierptr to abort the program with a "bad pointer in write barrier" error. Similarly, the driver might return a value that looks like a valid pointer to Go memory, preventing said Go memory from being garbage collected as long as the unsafe.Pointer is alive.

To simulate the issue, compile https://play.golang.org/p/614oIx_Y1D and run it with GOGC=0 (to force frequent GCs, triggering the problem faster.). You should see the following:

$ GOGC=0 ./foo
runtime: writebarrierptr *0x4fa230 = 0x1
fatal error: bad pointer in write barrier

runtime stack:
runtime.throw(0x4a81ca, 0x1c)
	/usr/lib/go/src/runtime/panic.go:596 +0x95
runtime.writebarrierptr.func1()
	/usr/lib/go/src/runtime/mbarrier.go:208 +0xbd
runtime.systemstack(0x4fab00)
	/usr/lib/go/src/runtime/asm_amd64.s:327 +0x79
runtime.mstart()
	/usr/lib/go/src/runtime/proc.go:1132

goroutine 1 [running]:
runtime.systemstack_switch()
	/usr/lib/go/src/runtime/asm_amd64.s:281 fp=0xc420048f00 sp=0xc420048ef8
runtime.writebarrierptr(0x4fa230, 0x1)
	/usr/lib/go/src/runtime/mbarrier.go:209 +0x96 fp=0xc420048f38 sp=0xc420048f00
main.main()
	/tmp/foo.go:15 +0x9b fp=0xc420048f88 sp=0xc420048f38
runtime.main()
	/usr/lib/go/src/runtime/proc.go:185 +0x20a fp=0xc420048fe0 sp=0xc420048f88
runtime.goexit()
	/usr/lib/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc420048fe8 sp=0xc420048fe0

I would recommend returning a uintptr instead (and, similarly, accepting a uintptr in gl.ClientWaitSync and gl.WaitSync).

@dmitshur
Copy link
Member

dmitshur commented May 9, 2017

Thanks for reporting. This sounds somewhat similar to #31 that we've had to resolve for Go 1.6. Except I'm guessing that gl.FenceSync is much more rarely used, and it's quite possible you're the first person who's tried to use it and ran into this.

I would recommend returning a uintptr instead (and, similarly, accepting a uintptr in gl.ClientWaitSync and gl.WaitSync).

I need to refresh my memory on this (if you happen to have the relevant doc link handy, please share, otherwise I'll have to look for it), but that's likely the right fix. Thanks. Do you want to send a PR (to go-gl/glow, the generator that generates these bindings)?

@dmitshur
Copy link
Member

dmitshur commented May 9, 2017

and, similarly, accepting a uintptr in gl.ClientWaitSync and gl.WaitSync

gl.IsSync too. I think it's everything that has GLsync type in the signature.

@dominikh
Copy link
Author

dominikh commented May 9, 2017

It's not quite the same as #31, but does fall into the category of not playing loose with pointer types.

Regarding documentation: https://www.khronos.org/opengl/wiki/Sync_Object and the Go spec would be relevant here. The former specifies that GLsync is an opaque type typedef struct __GLsync *GLsync, ergo a pointer. And uintptr is defined as an unsigned integer large enough to store the uninterpreted bits of a pointer value.

Good point re gl.IsSync – I haven't looked up the whole list of GLsync-related functions.

Do you want to send a PR (to go-gl/glow, the generator that generates these bindings)?

I can do that if my proposed solution is accepted.

@errcw
Copy link
Member

errcw commented May 9, 2017

Right, it should be straightforward to update type.go in glow to change the Go type of GLsync, and I agree that's the right path forward. It's unfortunate it's not a backwards-compatible change, but given that it's required, I don't think there is a better way.

@dmitshur
Copy link
Member

dmitshur commented May 9, 2017

Reopening because we still need to regenerate these bindings with the latest glow. (See https://github.com/go-gl/gl#generating and #54 for an example of a regenerate PR.)

@dmitshur dmitshur reopened this May 9, 2017
@dmitshur dmitshur self-assigned this May 9, 2017
dmitshur added a commit that referenced this issue May 9, 2017
Regenerate all bindings after generator change in go-gl/glow#79.

Done with latest version of glow:

	go generate -tags=gen github.com/go-gl/gl

Resolves #71.
@errcw
Copy link
Member

errcw commented May 9, 2017

It seems that the straightforward fix to change the generated type to unitptr failed (see PR #72). I'll copy my commentary there to here.

Go is unhappy converting between uintptr and a C pointer type. Which is a little strange because we're already doing the same thing for GLhandleARB which maps to void * on Mac OS. Maybe void * is special compared to an anonymous struct pointer?

It might be safe to do a just-in-time conversion from uintptr to unsafe.Pointer (but the GC may still catch this at exactly the wrong moment)? Alternatively we could change the C function signature and cast the value there, though that's not yet well supported by glow.

@dominikh
Copy link
Author

dominikh commented May 9, 2017

I can't say for sure if that conversion would be safe or not. I'd like to err on the side of "not safe" if arguments are placed on the stack before the call.

I'm currently checking if there is another way around this that doesn't require extending glow.

@dmitshur dmitshur removed their assignment May 9, 2017
@dominikh
Copy link
Author

dominikh commented May 9, 2017

@errcw So… void* is special because cgo represents void* as unsafe.Pointer – and unsafe.Pointer can be converted to uintptr. All other pointer types are represented in a type-safe manner in Go, and Go will not let you convert a typed pointer to uintptr, unless you go through unsafe.Pointer first. Otherwise, Go's type system wouldn't be safe.

Doing x = uintptr(unsafe.Pointer(fence)) would be safe, however I don't believe the inverse conversion to be safe in the context of a function call.

I don't see any other solution than the one proposed by you, which adds C shims to do the conversion for us. One idea that I had was a type GLSync C.GLsync, but unfortunately Go does the "right" thing, treats it as a pointer, and crashes just like with an unsafe.Pointer.

@errcw
Copy link
Member

errcw commented May 9, 2017

So, in theory, we can mangle the C parameter types used for both the GL function typedefs and the corresponding cgo entry points, because we simply need the types to align throughout the generated code and be compatible with the function pointers, rather than being exactly faithful to the GL definitions.

In type.go we could hack CType to return a different value for GLsync? (I'd try it myself but I'm not on a computer where I can compile anything... Otherwise I can try later this week.)

@dominikh
Copy link
Author

dominikh commented May 9, 2017

Which type would you use instead? We can't use void*, because if our fence ID by chance looked like a Go pointer to another Go pointer, cgo wouldn't allow us to make the call (see https://play.golang.org/p/oRap4q_Nw9 for an example). uintptr_t exists, but as far as I understand C, that type has to be at least large enough to hold any pointer. It may be larger.

@errcw
Copy link
Member

errcw commented May 9, 2017

In a cursory search I failed to find conclusive evidence whether uintptr_t could be wider than a pointer. I suspect it's true in theory but false in practice, which might be sufficient for us to choose it as a type.

Another alternative we could consider is to change the GLsync C typedef from an anonymous struct to void* which should allow the cast.

@dominikh
Copy link
Author

dominikh commented May 9, 2017

Like I said in my previous message, using void* wouldn't be of any help. That would still require going through unsafe.Pointer and be treated like an actual pointer, with all the checks and requirements that come with that. It's not a suitable type for arbitrary numbers.

@errcw
Copy link
Member

errcw commented May 10, 2017

I'm not sure we'd have to route through unsafe.Pointer? Today GLhandleARB is typedef void* on Apple platforms, and we use unitptr as the corresponding Go type. See, e.g., CompileShaderARB, where we happily convert (C.GLhandleARB)(shaderObj). Unless I misunderstand and you're suggesting that the cgo compiler understands the conversion is actually for a pointer type, and will catch this transgression at runtime?

@dmitshur
Copy link
Member

@errcw, was this issue closed intentionally?

@errcw
Copy link
Member

errcw commented May 12, 2017

Alas, no, I was just resyncing my fork to try to work on it.

@errcw errcw reopened this May 12, 2017
@dominikh
Copy link
Author

@errcw What I am referring to is that a C function void fn(void *arg) will have the signature C.fn(arg unsafe.Pointer) in Go, i.e. the following code doesn't compile:

package main

/*
#include <stdio.h>

void dhfoo(void* arg) {
  printf("%p", arg);
}
*/
import "C"

func main() {
	x := uintptr(1234)
	C.dhfoo(x)
}

because of

/tmp/foo.go:14: cannot use x (type uintptr) as type unsafe.Pointer in argument to func literal

Of course you can convert a uintptr to an unsafe.Pointer, and that's what (C.GLhandleARB)(shaderObj) is doing, but that brings you back to square one:

package main

/*
typedef void *Magic;
*/
import "C"
import "fmt"

var X C.Magic

func main() {
	X = C.Magic(uintptr(1))
	for {
		fmt.Println(X)
	}
}

results in

$ GOGC=0 ./foo
runtime: writebarrierptr *0x6fb380 = 0x1
fatal error: bad pointer in write barrier

and

_cgo_gotypes.go:type _Ctype_Magic unsafe.Pointer

@errcw
Copy link
Member

errcw commented Jun 4, 2017

All right, I have a proposed workaround in glow #85. It relies on uintptr_t having the same width as struct __GLsync * but based on my research (and StackOverflow) it appears this is a reasonable assumption to make in practice. It also relies on uintptr_t having a definition, but the OpenGL typedefs already rely on ptrdiff_t so it should be reasonable to also rely on uintptr_t.

I verified it works as intended on macOS but that's the only platform I currently have available to build and test on. If somebody else is interested in verifying on another platform that'd be great.

@dominikh
Copy link
Author

dominikh commented Jun 5, 2017

I should be able to give the patch a test later this week.

@dmitshur
Copy link
Member

Friendly ping @dominikh.

@dominikh
Copy link
Author

I've finally gotten around to testing the change. It compiles, doesn't cause runtime panics, and I've successfully used a fence.

@errcw
Copy link
Member

errcw commented Feb 18, 2018

Thanks! I've merged in my fix. Ideally we now regenerate go-gl/gl with the updated code.

@dominikh
Copy link
Author

@errcw it might be worth mentioning that when I regenerated the 4.5-core bindings with glow, some function signatures changed from accepting int32 to accepting int. I don't know if that's expected or not, but it'd break consumers of the API.

@errcw
Copy link
Member

errcw commented Feb 18, 2018

Alas. Well, because of #80, I'm expecting a more broadly breaking API change will be necessary, so hopefully we can break everyone only once.

@dmitshur
Copy link
Member

We've regenerated with latest glow and merged #93. I believe this should be resolved. Can you please confirm?

@dmitshur
Copy link
Member

dmitshur commented Oct 21, 2018

Friendly ping @dominikh. You said on February 15:

I've finally gotten around to testing the change. It compiles, doesn't cause runtime panics, and I've successfully used a fence.

If that's still the case, I think we can close this issue as resolved (via PR go-gl/glow#85 to glow, and PR #93 to regenerate gl).

@dominikh
Copy link
Author

Like I said, it's working :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants