JVM crashes on setting callback for GTK3 signals #281
I suspect there's an alignment or width issue with the arguments, but we need to dig deeper to know for sure. Can you provide an example, perhaps as a small repository, that I can build and use to reproduce this? |
Sure, you can see the repository at https://github.com/praj-foss/jnr-demo. The target code is present under the Also, there are some changes: I used On a side note, I'm actually writing JNR examples for my blog and I'd be happy to contribute to the official docs/examples. Please let me know if I can be of any help. |
I have managed to reproduce on MacOS and @enebo is confirming that it reproduces on Linux. If you are good with C libraries, getting a debug build of GTK3 and seeing where it segfaults would clearly be a great help.
That would be fantastic! We do not get a lot of time to document the library, and our uses of JNR are pretty stable and do not require much maintenance so we rarely run into the edge cases users like you will see. |
Interestingly, setting the callback to null, so it would be passed in as a null pointer, produces a different result: gtk catches the null handler and asserts:
Seems to indicate that it is not necessarily the callback getting nulled out, since it should catch that. Bad memory location? Already collected and not honoring our attempts to keep the handler referenced? |
This investigation is hampered by the fact that it seems the We have not had other reports of callbacks leading to SEGV so I am left speculating why this function seems to be getting a bad pointer. |
Disabling jnr-ffi's x86_64 ASM generation does not appear to improve the situation, assuming it is being passed through. However... I looked closer at the error dumps and I'm seeing RAX set to this implausible value:
Unknown indeed. The hex |
This seems to be the source of the bogus pointer value:
I believe this would indicate that either the DefaultObjectReferenceManager is not working properly, or this code is not using it properly. |
@praj-foss Ok, this may be a flaw in how you are using the API, but I do not know enough about GTK to be certain. I modified your final code to not use the pointer value returned, and it seems to get much further... far enough to trigger a different, probably MacOS-specific error:
diff --git a/gtk3/src/main/java/in/praj/demo/Gtk3App.java b/gtk3/src/main/java/in/praj/demo/Gtk3App.java
index 8195466..eaaf284 100644
--- a/gtk3/src/main/java/in/praj/demo/Gtk3App.java
+++ b/gtk3/src/main/java/in/praj/demo/Gtk3App.java
@@ -23,17 +23,18 @@ public class Gtk3App {
lib.gtk_get_major_version(), lib.gtk_get_minor_version(), lib.gtk_get_micro_version());
var application = lib.gtk_application_new("in.praj.demo.Gtk3App", 0);
- var onActivate = refs.add((LibGtk3.GCallback) (gobject, data) -> {
+ LibGtk3.GCallback callback = (gobject, data) -> {
var window = lib.gtk_application_window_new(gobject);
var button = lib.gtk_button_new_with_label("Click me");
lib.gtk_container_add(window, button);
lib.gtk_widget_show_all(window);
- });
+ };
+ var callbackKey = refs.add(callback);
- lib.g_signal_connect_data(application, "activate", onActivate, null, null, 0);
+ lib.g_signal_connect_data(application, "activate", callback, null, null, 0);
lib.g_application_run(application, 0, null);
- refs.remove(onActivate);
+ refs.remove(callbackKey);
lib.g_object_unref(application);
}
}
diff --git a/gtk3/src/main/java/in/praj/demo/LibGtk3.java b/gtk3/src/main/java/in/praj/demo/LibGtk3.java
index 72e3e3a..1c5f7ab 100644
--- a/gtk3/src/main/java/in/praj/demo/LibGtk3.java
+++ b/gtk3/src/main/java/in/praj/demo/LibGtk3.java
@@ -13,7 +13,7 @@ public interface LibGtk3 {
@u_int64_t long g_signal_connect_data(
Pointer instance,
String detailed_signal,
- Pointer c_handler,
+ GCallback c_handler,
Pointer data,
Pointer destroy_data,
int connect_flags);
From the very little I know about GUI development on MacOS, this appears to be a problem further down the pipeline when it attempts to actually display something. Perhaps you can try my diff on Linux and see if it works better? I believe the value returned by the DefaultObjectReferenceManager is intended to just be an opaque reference to the object value, not a new or better pointer to the object in question. In this case, the resulting value is a bogus pointer starting with "0xCAFEBABE" bytes, leading to the peculiar RAX I mentioned above. |
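To make the "opaque reference" point concrete, here is a minimal plain-Java sketch. It does not use jnr-ffi at all: `OpaqueRefRegistry`, `Callback`, and the `0xCAFEBABE` tag layout are hypothetical stand-ins I made up for illustration, assuming only that a reference manager hands back a key that it alone can resolve. It is not how DefaultObjectReferenceManager is actually implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a native callback type; not part of jnr-ffi. */
interface Callback {
    void invoke(long instance, long data);
}

/**
 * Simplified, hypothetical reference manager. It hands out opaque keys that
 * only this registry can resolve; a key is NOT the address of callable code.
 */
final class OpaqueRefRegistry {
    // Illustrative tag in the upper 32 bits, mimicking the prefix seen in the crash dumps.
    static final long TAG = 0xCAFEBABEL << 32;

    private final Map<Long, Object> refs = new ConcurrentHashMap<>();
    private final AtomicLong next = new AtomicLong(1);

    /** Stores a strong reference to obj and returns an opaque key for it. */
    long add(Object obj) {
        long key = TAG | next.getAndIncrement();
        refs.put(key, obj);
        return key;
    }

    /** Resolves a key back to the stored object, or null if unknown. */
    Object get(long key) {
        return refs.get(key);
    }

    /** Drops the stored reference; returns true if the key was present. */
    boolean remove(long key) {
        return refs.remove(key) != null;
    }
}

final class OpaqueKeyDemo {
    public static void main(String[] args) {
        OpaqueRefRegistry refs = new OpaqueRefRegistry();
        Callback callback = (instance, data) -> System.out.println("activated");

        // Registering keeps the callback strongly reachable while in use.
        long key = refs.add(callback);
        System.out.printf("key = 0x%016X%n", key);

        // Correct: hand the callback object itself to whatever will invoke it.
        callback.invoke(0L, 0L);

        // The key only means something to the registry that issued it.
        Callback resolved = (Callback) refs.get(key);
        System.out.println(resolved == callback);

        // Release the reference once the other side no longer needs the callback.
        refs.remove(key);
    }
}
```

The shape matches the diff above: the callback itself goes to `g_signal_connect_data`, and the key is kept only so it can be passed to `refs.remove(...)` afterwards.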
So I tried the diff here on Linux and it does crash differently now: hs_err_pid5098.log. Unfortunately, I'm still pretty inexperienced in both GTK and C/C++, so I couldn't figure out much from the logs. I do believe it has something to do with how GTK and GObject-system work internally since the normal way of creating JNR callbacks works fine in simpler use-cases. I went through the official hello-world example of gtk3 and found that I missed implementing
app = gtk_application_new ("org.gtk.example", G_APPLICATION_FLAGS_NONE);
g_signal_connect (app, "activate", G_CALLBACK (activate), NULL);
status = g_application_run (G_APPLICATION (app), argc, argv);
g_object_unref (app);
From the docs:
I'll look into that soon and post an update. |
I used the preprocessor output from gcc and added the necessary functions in
// Before preprocessing
int status = g_application_run(G_APPLICATION(app), argc, argv);
// After preprocessing
int status = g_application_run(((((GApplication*) g_type_check_instance_cast ((GTypeInstance*) ((app)), ((g_application_get_type ())) )))), argc, argv);
public interface LibGtk3 {
    // ...
    @u_int64_t long g_application_get_type();
    Pointer g_type_check_instance_cast(Pointer inst, @u_int64_t long type);
}
// Inside main method
lib.g_application_run(
    lib.g_type_check_instance_cast(application, lib.g_application_get_type()), 0, null);
Now I'm pretty much clueless. The only thing I've not implemented is the pointer type-casting done by the macros, as I'm using the normal |
I hate to chime in with this, but WFM. If I apply @headius's diff, gtk3:run works for me on:
I get a Click me button in a frame popping up on my screen. I also got this to work with graalvm ce 21.2 (openjdk version "11.0.12" 2021-07-20). I am on Fedora Core 34. @praj-foss Can you do two things: 1) Update to the latest version of graalvm. Let's just hope there is a bug in graal that was fixed. 2) Install openjdk and verify it fails on that VM. |
@praj-foss I had not noticed that 21.3 is out, so I will get that and see if it also works. |
I have pushed a branch with my change, which has been confirmed on @enebo's Fedora system and my MacOS system (the latter works after passing https://github.com/headius/jnr-demo/tree/patched At this point I don't see any bug on the jnr-ffi side. @praj-foss let us know if you are still unable to run this and we'll have a look at your latest error. |
I also downloaded graal ce 21.1.0 and it works with @headius patch. |
@headius @enebo I downloaded Graalvm 21.3 (JDK 11) and Temurin JDK 16.0.2 and tried to run the patched repo, but it's still crashing the same: hs_err_pid6764.log. I even tried the |
I tried running the app on two different machines: one with ubuntu 21.10 with openjdk 17, where it crashed similarly, and another with opensuse leap 15.2 with openjdk 11 and a slightly older gtk3 release, where it ran perfectly. I'm assuming something breaks on the new gtk3 release. So I'll close this issue for now. Thanks, everyone! |
@praj-foss Thanks for following up and figuring this out! Please let us know if you file an issue with the GTK folks because I'd like to know that we're not doing anything wrong. I assume they will have better luck investigating why it crashes at that particular point. |
@headius Sure! I'd like to do some more research on it though I'm not a C/C++ dev at all. Can you tell me how to debug the JNR/native calls? I came across this article which described how to use
So what's the proper way to debug JNR here? |
For that we would need to build a jffi binary with debug symbols. I'm not sure if the build is set up for that but can look into it this week. I will say that your crasher that fails inside jffi should probably still be treated as a bug. It may be something about your platform that jffi is not handling correctly. |
I believe this diff followed by running
diff --git a/jni/GNUmakefile b/jni/GNUmakefile
index cfe570a..4a8a061 100755
--- a/jni/GNUmakefile
+++ b/jni/GNUmakefile
@@ -61,7 +61,7 @@ LIBNAME = jffi
# Compiler/linker flags from:
# http://weblogs.java.net/blog/kellyohair/archive/2006/01/compilation_of_1.html
JFLAGS = -fno-omit-frame-pointer -fno-strict-aliasing -DNDEBUG
-OFLAGS = -O2 $(JFLAGS)
+OFLAGS = -Og -g $(JFLAGS)
# MacOS headers aren't completely warning free, so turn them off
WERROR = -Werror
Could you open a new issue for the crash within JFFI itself? I believe this issue has been resolved by fixing the client code, but this other crasher is a new mystery. |
@headius I've reopened this issue in JFFI. Check out: jnr/jffi#118 |
Hello there!
I'm currently learning JNR by trying out various Linux libraries, most recently GTK3. I used this example as a reference and wrote the new demo that can be found here. But it crashes badly when I try to run it (using ./gradlew gtk3:run). Here's the crash log: hs_err_pid7667.log. I use GraalVM 21.1.0 as my JDK 11, on an x86_64 Linux machine (opensuse tumbleweed). My installed GTK version is 3.24.30-2.3.
I can see that it crashes on line 31 of Gtk3App.java where I call from Java
The onActivate is a lambda looking like this:
which is supposed to act like a function pointer similar to on_app_activate from my C reference:
I also had a look at #231 and read the suggestions there to define onActivate as a public static final variable, but it still didn't stop the crash. I don't have much idea about why it's crashing; my previous example seemed to work fine with callbacks. It might be an issue specific to GTK3 and its thread management or using GraalVM as JDK, but again I have zero ideas. Please try running the example if you're on a Linux machine and let me know where the problem is.