Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGSEGV in Message.getSubject() #26

Open
wardev opened this issue Apr 1, 2023 · 8 comments
Open

SIGSEGV in Message.getSubject() #26

wardev opened this issue Apr 1, 2023 · 8 comments

Comments

@wardev
Copy link

wardev commented Apr 1, 2023

As a Java developer I expect NASA's code to never segfault and kill the JVM. NullPointerExceptions are fine, segfaults are not.

Using GMSEC API version 4.6 I can produce a segfault every time by calling getSubject() on a received REPLY message. I can call getSubject() on received PUBLISH or REQUEST messages without any problems. Reproducer code and stack trace top are below.

I assume the problem is not actually in std::string::c_str(). My theory is that the issue is due to using reinterpret_cast<Message*>(jlong) in the native code on some long that is not the address of a message. How NASA's Java interface got an invalid pointer is beyond me.

Current thread (0x00007f08c0012800):  JavaThread "main" [_thread_in_native, id=289137, stack(0x00007f08c6ee0000,0x00007f08c6fe0000)]

siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000

Register to memory mapping:

RAX=0x771abb87bdb6526b is an unknown value
RBX={method} {0x00007f0878441d68} 'Message_GetSubject' '(JLgov/nasa/gsfc/gmsec/api/jni/JNIMessage;)Ljava/lang/String;' in 'gov/nasa/gsfc/gmsec/api/jni/gmsecJNI'
RCX=0x00007f08c6fdbeb0 is pointing into the stack for thread: 0x00007f08c0012800
RDX=0x00007f0828002930 is an unknown value
RSP=0x00007f08c6fdbc38 is pointing into the stack for thread: 0x00007f08c0012800
RBP=0x00007f08c6fdbc50 is pointing into the stack for thread: 0x00007f08c0012800
RSI=0x00007f08c6fdbea0 is pointing into the stack for thread: 0x00007f08c0012800
RDI=0x771abb87bdb6526b is an unknown value
R8 =0x00007f087850b928 is pointing into metadata
R9 =0x0000000000000009 is an unknown value
R10=0x00007f08b10183fb is at code_begin+859 in an Interpreter codelet
method entry point (kind = native)  [0x00007f08b10180a0, 0x00007f08b1018a40]  2464 bytes
R11=0x00007f08c7a2ca00: <offset 0xa2ca00> in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so at 0x00007f08c7000000
R12=0x0000000000000000 is an unknown value
R13={method} {0x00007f0878441d68} 'Message_GetSubject' '(JLgov/nasa/gsfc/gmsec/api/jni/JNIMessage;)Ljava/lang/String;' in 'gov/nasa/gsfc/gmsec/api/jni/gmsecJNI'
R14=0x00007f08c6fdbec0 is pointing into the stack for thread: 0x00007f08c0012800
R15=0x00007f08c0012800 is a thread

Stack: [0x00007f08c6ee0000,0x00007f08c6fe0000],  sp=0x00007f08c6fdbc38,  free space=1007k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libstdc++.so.6+0xf3c64]  std::string::c_str() const+0x4
C  [libGMSECAPI.so+0x314cea]  gmsec::api::Message::getSubject() const+0x1c
C  [libgmsec_jni.so+0x1083e3]  Java_gov_nasa_gsfc_gmsec_api_jni_gmsecJNI_Message_1GetSubject+0x6b
j  gov.nasa.gsfc.gmsec.api.jni.gmsecJNI.Message_GetSubject(JLgov/nasa/gsfc/gmsec/api/jni/JNIMessage;)Ljava/lang/String;+0
j  gov.nasa.gsfc.gmsec.api.jni.JNIMessage.getSubject()Ljava/lang/String;+5
j  gov.nasa.gsfc.gmsec.api.Message.getSubject()Ljava/lang/String;+4
public class SigsegvTest {


    private static BoltServer boltServer;

    @BeforeAll
    public static void setup() throws InterruptedException {
        // load GMSEC_API native code
        GmsecLoader.loadGmsec();
        // start bolt server
        boltServer = new BoltServer();
        new Thread(boltServer, "boltServer").start();
        boltServer.waitUntilStarted();
    }

    @AfterAll
    public static void cleanup() {
        boltServer.stop();
        GmsecLoader.cleanGmsec();
    }

    @TempDir
    Path schemaDir;

    @Test
    public void testSegfault() throws GMSEC_Exception, IOException, InterruptedException {
        // setup
        Log.registerHandler(new LogHandler() {
            @Override
            public void onMessage(LogEntry logEntry) {
                System.err.format("%s.%09d [%s] %s:%s %s\n",
                        logEntry.time.getSeconds(), logEntry.time.getNanoseconds(),
                        logEntry.level,
                        logEntry.fileName, logEntry.lineNumber, logEntry.message);
            }
        });
        Log.setReportingLevel(LogLevel.VERBOSE);
        Config config = new Config();
        config.addValue("connectiontype", "gmsec_bolt");
        config.addValue("server", "localhost");
        config.addValue("mw-multi-resp", "true");
        config.addValue("GMSEC-MSG-CONTENT-VALIDATE", "false");
        config.addValue("GMSEC-MSG-CONTENT-VALIDATE-ALL", "false");
        config.addValue("GMSEC-MSG-CONTENT-VALIDATE-SEND", "false");
        config.addValue("GMSEC-MSG-CONTENT-VALIDATE-RECV", "false");
        config.addValue("GMSEC-VALIDATION-LEVEL", "NO_ENFORCEMENT");
        config.addValue("GMSEC-SCHEMA-LEVEL", "0");
        config.addValue("GMSEC_LEGACY_SCHEMA_FILES", "true");
        config.addValue("GMSEC-SPECIFICATION-VERSION", "201600");
        // I wish I could turn off message validation entirely, but I can't
        // so fake a directory of schemas
        Files.createDirectories(schemaDir.resolve("2016.00"));
        try (Writer out = Files.newBufferedWriter(schemaDir.resolve("2016.00").resolve(".DIRECTORY.xml"))) {
            out.write("<DIRECTORY NAME=\"2016.00\"><SCHEMA NAME=\"HEADER\" LEVEL-0=\"GMSEC\" DEFINITION=\"\"/></DIRECTORY>");
        }
        try (Writer out = Files.newBufferedWriter(schemaDir.resolve("2016.00").resolve("GMSEC.HEADER.xml"))) {
            out.write("<SCHEMA NAME=\"GMSEC.HEADER\"></SCHEMA>");
        }
        config.addValue("GMSEC-SCHEMA-PATH", schemaDir.toAbsolutePath().toString());
        String subject = "GMSEC.1.2.3.4.5.6";
        List<Message> messages = new ArrayList<>();

        // subscribe
        final ConnectionManager cm1 = new ConnectionManager(config);
        cm1.initialize();
        cm1.startAutoDispatch();
        Thread.sleep(500);
        cm1.subscribe("GMSEC.+", new ConnectionManagerCallback() {
            @Override
            public void onMessage(ConnectionManager connectionManager, Message message) {
                // save request message
                messages.add(message);
                try {
                    if (message.getKind().equals(MessageKind.REQUEST)) {
                        // create and send two replies
                        Message reply = new Message(message.getSubject(), MessageKind.REPLY);
                        connectionManager.reply(message, reply);
                        /*
                        reply = new Message(message.getSubject(), MessageKind.REPLY);
                        connectionManager.reply(message, reply);
                         */
                    }
                } catch (GMSEC_Exception e) {
                    throw new RuntimeException(e);
                }
            }
        });
     
        // send request

        try {
            final ConnectionManager cm2 = new ConnectionManager(config);
            cm2.initialize();
            Thread.sleep(500);
            Message request = new Message(subject, MessageKind.REQUEST);
            cm2.request(
                    request,
                    60000,
                    new ConnectionManagerReplyCallback() {
                        @Override
                        public void onReply(ConnectionManager connectionManager,
                                            Message request,
                                            Message reply) {
                            // save message
                            messages.add(reply);
                        }

                        @Override
                        public void onEvent(ConnectionManager connectionManager,
                                            Status status,
                                            ConnectionEvent connectionEvent) {
                            System.err.println("" + status + " " + connectionEvent);
                        }
                    },
                    -1);

        } catch (GMSEC_Exception | InterruptedException e) {
            throw new RuntimeException(e);
        }

        // inelegant wait for messages to arrive
        Thread.sleep(5000);

        // verify saved messages
        // check waited long enough for the messages to arrive
        assertThat(messages.size(), is(2));
        // check received request message has a subject
        assertThat(messages.get(0).getSubject(), not(nullValue()));
        // check received reply message has a subject
        // this causes a SIGSEGV
        assertThat(messages.get(1).getSubject(), not(nullValue()));
    }

}
@dmwhitne-583
Copy link
Contributor

Within your callback's onMessage() and onReply() methods, you are storing the provided Message object within your own ArrayList container. Please copy the Message object before stowing it. From the onReply() JavaDoc:

DO NOT STORE the Message objects for use beyond the scope of the callback. Otherwise, make a copy of the Message object(s).

Try something like:
messages.add( new Message(reply) );

@wardev
Copy link
Author

wardev commented Apr 3, 2023

Yes, thanks, I figured that out. I still think it is an error that this code causes a seg fault. Throwing a GMSEC_Exception would be fine because that can be handled via normal error handling methods. But with a seg fault there is no chance to handle the error and it immediately kills the process.

It looks like one of the preconditions for using reinterpret_cast is that the memory is a object of the type being cast to. Based on the documentation you quote and the test case provided that is provably false for Message. Shouldn't the code then use a checked cast, such as dynamic_cast?

@dmwhitne-583
Copy link
Contributor

The GMSEC API attempted to honor your client application's request to retrieve the message subject, however it has no idea that the underlying pointer to a C++ Message object that is being referenced through the Java-based Message object (stored in your ArrayList) has already been freed/destroyed. The attempt to access a destroyed object is what leads to a seg-fault.

A dynamic_cast is used for down-casting an object, for example a Field object to a StringField object. Whereas MistMessage (sp?) is a subclass of Message, the former type of object is not provided in a callback. In other words, MistMessage ceases to exist once it is transmitted across the bus. The object provided in the callback is-a Message, nothing else.

@wardev
Copy link
Author

wardev commented Apr 4, 2023

Thanks for the explanation. That makes sense. I'm not used to dealing with the details of memory allocation. So how would you modify GMSEC API so that it can't seg fault? Would that be storing the message passed to the call back on the heap? Using automatic reference counting?

@dmwhitne-583
Copy link
Contributor

When dealing with programming languages such as C and C++, there's always the chance for a seg-fault. A developer has to be careful/cognizant of what they are doing. Whereas C++ offers containers that can maintain a reference count for a shared pointer, this is moot when the referenced object is shared across a language binding, such as JNI (to support Java). There is no way to know whether a Java app still has a reference to the object allocated in C++ land. For this reason, Message object(s) passed to a callback are limited to the scope of the callback itself, and hence why a copy should be made in situations where it is needed for a longer period.

A common reason to make a copy of a Message delivered to a callback is to defer processing until a later time, perhaps in another thread. The callback could be summoned hundreds (maybe thousands) of times per second, and in such cases it would be unwise to process the Message within the callback itself. Some apps stow a copy of the Message object within a synchronized queue that is shared with another thread.

@wardev
Copy link
Author

wardev commented Apr 4, 2023

Thanks, that is similar to the use case I'm implementing. It just seems quite dangerous that forgetting a simple copy kills the whole app in violent way, without closing streams, etc.

There is no way to know whether a Java app still has a reference to the object allocated in C++ land.

It is possible and looking at the NASA API code it is already implemented. For example, JNIMessage has the following methods:

	protected void finalize() throws Throwable
	{
		try {
			delete();
		}
		finally {
			super.finalize();
		}
	}


	public synchronized void delete()
	{
		if (swigCPtr != 0 && swigCMemOwn)
		{
			gmsecJNI.delete_Message(swigCPtr, this);
			swigCMemOwn = false;
		}

		swigCPtr = 0;
	}

The finalize() method is called by the JVM when the object is about to be garbage collected. It seems that JNIMessage uses this opportunity to let the native code know that it is done with the native memory for the particular Message.

This suggests one solution is to give the JVM "ownership" (set swigCMemOwn=true) for every Message passed to the JVM.

It seems that this could also be used to implement reference counting by decrementing a reference count when finalize() is called instead of deleting the Message.

@dmwhitne-583
Copy link
Contributor

dmwhitne-583 commented Apr 5, 2023

That's a good idea, however it goes against the paradigm used in the core API (in C++), and in other language bindings, most of which are generated using SWIG. This is not to say the change could not be made.

Before ownership can be passed to the JVM, a copy of the Message object will still need to be made. This will impact performance, depending of course on the size of the Message. And there's always a possibility where a single Message object may be destined to be delivered to multiple callbacks in a single app, thus incurring multiple copies, and yet additional performance hits.

git diff java/src/gmsecJNI_Jenv.cpp
diff --git a/java/src/gmsecJNI_Jenv.cpp b/java/src/gmsecJNI_Jenv.cpp
index 7ad6956c..7f938f97 100644
--- a/java/src/gmsecJNI_Jenv.cpp
+++ b/java/src/gmsecJNI_Jenv.cpp
@@ -285,7 +285,7 @@ JavaVM* AutoJEnv::getVM()
 
 jobject gmsec::api5::jni::createJavaMessage(JNIEnv* jenv, const gmsec::api5::Message& message)
 {
-       jobject jniMessage = jenv->NewObject(Cache::getCache().classJNIMessage, Cache::getCache().methodMessageInitJZ, JNI_POINTER_TO_JLONG((&message)), JNI_FALSE);
+       jobject jniMessage = jenv->NewObject(Cache::getCache().classJNIMessage, Cache::getCache().methodMessageInitJZ, JNI_POINTER_TO_JLONG((new Message(message))), JNI_TRUE);
        jobject jMessage = jenv->NewObject(Cache::getCache().classMessage, Cache::getCache().methodMessageInit, jniMessage);
 
        if (!gmsec::api5::jni::jvmOk(jenv, "createJavaMessage: new Message") || !jMessage)

Not every application will need to stow a Message object delivered to a callback; some just access particular fields to handle some simple task. I suppose this may be one reason the design paradigm was chosen -- honestly, I cannot recall. When I started development on the GMSEC API, callbacks in the Java binding were not even supported.

P.S. Note that GMSEC API 5.0 is available (on NASA GitHub). Support for API 4.x will continue to be provided, but do not expect a new release in that series.

@wardev
Copy link
Author

wardev commented Apr 6, 2023

That's great, thanks for the ideas. I understand there is a tradeoff between performance and safety. :)

I'll take a look at 5.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants