Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SEGFAULT using glib.Value #61

Closed
RSWilli opened this issue Nov 29, 2023 · 13 comments · Fixed by #62
Closed

SEGFAULT using glib.Value #61

RSWilli opened this issue Nov 29, 2023 · 13 comments · Fixed by #62
Assignees
Labels
bug Something isn't working SEGFAULT

Comments

@RSWilli
Copy link
Member

RSWilli commented Nov 29, 2023

When setting a property on a custom bin I sometimes get a SIGSEGV.

The error in the stack trace is reported to be at https://github.com/go-gst/go-glib/blob/main/glib/gvalue.go#L52 in the called C function in https://github.com/go-gst/go-glib/blob/b2d34240bcb44a1b341b2f094eb2ab33e1a2aa98/glib/glib.go.h#L126-L128

I will investigate and provide further details.

Currently I suspect that the reason is the conversion from C to golang, and if timed correctly it may interfere with some GC magic.

@RSWilli RSWilli added bug Something isn't working SEGFAULT labels Nov 29, 2023
@RSWilli RSWilli self-assigned this Nov 29, 2023
@RSWilli
Copy link
Member Author

RSWilli commented Nov 29, 2023

I think this issue can be ultimately retraced to https://pkg.go.dev/unsafe#Pointer , especially the casting of uintptr to unsafe.Pointer.

The docs state that, although the casting is safe to to under certain circumstances, saving the value of the uintptr in a variable before casting it back is invalid. It may work, but if the go GC steps in and moves the underlying data, then the pointer address wont get moved with it.

We do have a couple instances of code like this in the bindings: https://github.com/go-gst/go-gst/blob/46f74cb97a436a19871ba0e65f63c7a219e992c1/gst/pbutils/discoverer.go#L70C1-L73 (even the original gotk3 does some casts like this)

The ultimate reason for using the unsafe package in the bindings is to escape the type checks, although the correct pointer type for this is unsafe.Pointer IMO. The use cases for uintptr listed in the docs do (mostly) not apply to us.

This can only be fixed by breaking a lot of the current glib API. I can also only test very small parts of the resulting code, since I'm nowhere near using every little bit of the features in my code.

The reason this is a big change, is that currently all type marshalers use uintptr as an input:

tm := []glib.TypeMarshaler{
{
T: glib.Type(C.gst_discoverer_get_type()),
F: func(p uintptr) (interface{}, error) {
c := C.g_value_get_object(uintptrToGVal(p))
return &Discoverer{toGObject(unsafe.Pointer(c))}, nil
},
},
{
T: glib.Type(C.gst_discoverer_info_get_type()),
F: func(p uintptr) (interface{}, error) {
c := C.g_value_get_object(uintptrToGVal(p))
return &DiscovererInfo{toGObject(unsafe.Pointer(c))}, nil
},
},
{
T: glib.Type(C.gst_discoverer_stream_info_get_type()),
F: func(p uintptr) (interface{}, error) {
c := C.g_value_get_object(uintptrToGVal(p))
return &DiscovererStreamInfo{toGObject(unsafe.Pointer(c))}, nil
},
},
{
T: glib.Type(C.gst_discoverer_audio_info_get_type()),
F: func(p uintptr) (interface{}, error) {
c := C.g_value_get_object(uintptrToGVal(p))
return &DiscovererAudioInfo{&DiscovererStreamInfo{toGObject(unsafe.Pointer(c))}}, nil
},
},
{
T: glib.Type(C.gst_discoverer_video_info_get_type()),
F: func(p uintptr) (interface{}, error) {
c := C.g_value_get_object(uintptrToGVal(p))
return &DiscovererVideoInfo{&DiscovererStreamInfo{toGObject(unsafe.Pointer(c))}}, nil
},
},
{
T: glib.Type(C.gst_discoverer_subtitle_info_get_type()),
F: func(p uintptr) (interface{}, error) {
c := C.g_value_get_object(uintptrToGVal(p))
return &DiscovererSubtitleInfo{&DiscovererStreamInfo{toGObject(unsafe.Pointer(c))}}, nil
},
},
{
T: glib.Type(C.gst_discoverer_container_info_get_type()),
F: func(p uintptr) (interface{}, error) {
c := C.g_value_get_object(uintptrToGVal(p))
return &DiscovererContainerInfo{&DiscovererStreamInfo{toGObject(unsafe.Pointer(c))}}, nil
},
.

I will try to remove all instances of uintptr across the bindings, and document all other use cases explicitly.

FYI @tinyzimmer @danjenkins

@RSWilli
Copy link
Member Author

RSWilli commented Nov 29, 2023

may also relate to #60

@RSWilli RSWilli changed the title SEGFAULT in custom element SetProperty Invalid uses of uintptr across the bindings Nov 29, 2023
@RSWilli
Copy link
Member Author

RSWilli commented Nov 29, 2023

as mentioned in #51 the reason uintptr is so widespread in the bindings is because gotk3 has made the same mistake. The type marshalers were always implemented with uintptr, athough nothing prevents us from just replacing it with unsafe.Pointer

@RSWilli RSWilli changed the title Invalid uses of uintptr across the bindings SEGFAULT using glib.Value Dec 1, 2023
@RSWilli
Copy link
Member Author

RSWilli commented Dec 1, 2023

I still get Segfaults when I use glib.Value even after #62

@tinyzimmer @danjenkins could you try to run one of your gstreamer applications with GOMEMLIMIT="10KiB" GOGC=1 (put memlimit lower than your requirements) please? I think this reproduces the error, since the segfault that happens in production for me happens pretty much instant when these params are passed.

@RSWilli
Copy link
Member Author

RSWilli commented Dec 1, 2023

the above reproduction is not complete, it doesn't crash every time.

if a GC is forced during the goObjectSetProperty then it will always crash for me. This function is called when properties on the custom elements are set from C. In my case the are set from go through C back to go. There must be a pointer to a golang managed object beeing passed to C somewhere.

//export goObjectSetProperty
func goObjectSetProperty(obj *C.GObject, propID C.guint, val *C.GValue, param *C.GParamSpec) {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered")
			panic(r)
		}
	}()
	runtime.GC() // forcing a GC here crashes with SIGSEGV
	WithPointerTransferOriginal(unsafe.Pointer(obj), func(object *Object, subclass GoObjectSubclass) {
		iface := subclass.(interface{ SetProperty(*Object, uint, *Value) })
		iface.SetProperty(object, uint(propID-1), ValueFromNative(unsafe.Pointer(val)))
	})
}

I'll look deeper into it. Help appreciated.

@RSWilli
Copy link
Member Author

RSWilli commented Dec 4, 2023

I pinned the Issue down to the finalizer on the glib.Value ValueInit.

This only happens because I am registering and using a custom element in the same application.

func ValueInit(t Type) (*Value, error) {
	c := C._g_value_init(C.GType(t))
	if c == nil {
		return nil, nilPtrErr
	}
	v := &Value{c}
	// runtime.SetFinalizer(v, (*Value).unset) <- uncommenting this crashes if the GC intervenes
	return v, nil
}

The steps that are happening are:

  1. the SetProperty is called on the element.
  2. SetProperty allocates a glib.Value containing a C native *C.GValue
  3. g_object_set_property is called with the native *C.GValue
  4. the glib.Value goes out of scope and is marked for GC, along with freeing the native C struct
  5. gstreamer calls goObjectSetProperty, the glue function between the C base class and the custom element implemented in golang
  6. the goObjectSetProperty calls ValueFromNative(unsafe.Pointer(val)), creating a new golang wrapper glib.Value around the passed pointer to the c struct (which has been freed)
  7. Now calling GoValue() on the new glib.Value tries to extract the contained value, but results in a SIGSEGV

@sdroege could you provide some information regarding the internals of g_object_set_property, is there any multithreading happening internally while passing the data to the element? If not, then this could be fixed by only freeing after the g_object_set_property is done.

FYI @danjenkins @tinyzimmer

@RSWilli
Copy link
Member Author

RSWilli commented Dec 4, 2023

It looks like a simple runtime.KeepAlive(value) was enough to fix the problem. Implemented in #62. I will test it for a bit in production before providing more feedback on the PR

@danjenkins
Copy link
Contributor

Sorry I've not replied to these @RSWilli - was pretty much unavailable last week due to illness. Well done on figuring it out - looks like a pain to track down.

@RSWilli
Copy link
Member Author

RSWilli commented Dec 4, 2023

@danjenkins yeah it was.

Now after finding the fix I think this may also be a problem in other parts of the bindings. Essentially, when we do not keep the value alive, the finalizer can run before the C call is done and free the C value. This can happen with all intermediate values that wrap some C object, where a user does not have a reference to.

@danjenkins
Copy link
Contributor

That would make sense. But at the same time if this was a problem elsewhere I'd expect to have hit a problem by now?

@RSWilli
Copy link
Member Author

RSWilli commented Dec 4, 2023

That depends entirely on your setup and your use case. The GC runs in parallel so it always was a race condition. If your GC never decides to eagerly clean up the heap, then you will never run into this issue. I run my prod environment with GOMEMLIMIT=20MiB, which instructs the GC to stay under 20MiB of RAM usage. Additionally the docker container my app runs in is limited to 350MB RAM.

I also frequently update the gstreamer pipeline (so I call SetProperty often), which gives me more chances at hitting the race condition, although even then I only hit it once every few hours.

@RSWilli
Copy link
Member Author

RSWilli commented Dec 4, 2023

But at the same time if this was a problem elsewhere I'd expect to have hit a problem by now?

@danjenkins as said above, you can more easily detect these rare errors when you run your app with the following two env vars set:

  • GOMEMLIMIT=10KiB (set it really low) this instructs the GC to try to enforce this as a upper limit
  • GOGC=1 this essentially instructs the GC to perform a GC sweep every time something allocates (more info here: https://tip.golang.org/doc/gc-guide)

Combined this effectively means that your application will not have any uncollected garbage memory lying around.

image

These two settings dramatically increase the CPU consumption of the GC, so they are not recommended for production in this combination

@RSWilli
Copy link
Member Author

RSWilli commented Dec 5, 2023

Whilst thinking about more cases where this issue could pop up, the following came into mind:

https://github.com/go-gst/go-gst/blob/main/gst/gst_element_factory.go#L40-L76

The temporary GValues could be cleaned up during usage in CGO, so they should be kept alive.

I'm going to add a runtime.KeepAlive statement there, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working SEGFAULT
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants