Incorrect String Conversion in JNI to C Bindings #136

zpv · 2023-08-24T22:15:46Z

What

Incorrect handling of string conversion between Java strings and C strings in the JNI bindings. The conversion does not correctly handle characters that are encoded as 4 bytes in UTF-8 (code points above U+FFFF, such as many emojis), which Java stores internally as UTF-16 surrogate pairs.
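To make the mismatch concrete, a minimal C sketch that computes both representations of U+1F600 (grinning face): the single 4-byte standard UTF-8 sequence and the two UTF-16 code units a Java String holds. The helper names utf8_encode and utf16_surrogates are illustrative only and are not part of the LDK bindings.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value (assumed valid: <= 0x10FFFF and not a
 * surrogate) as standard UTF-8. Returns the number of bytes written (1-4). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    } else if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    } else if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* Supplementary character: the 4-byte form at the heart of this issue. */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Split a supplementary code point (>= 0x10000) into the high/low UTF-16
 * surrogate pair that a Java String stores for it. */
static void utf16_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo) {
    uint32_t v = cp - 0x10000;
    *hi = (uint16_t)(0xD800 + (v >> 10));
    *lo = (uint16_t)(0xDC00 + (v & 0x3FF));
}
```

For U+1F600, utf8_encode returns 4 and writes F0 9F 98 80, while utf16_surrogates yields D83D and DE00.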

ldk-garbagecollected/java_strings.py

Lines 465 to 487 in 3e33cfd

    static inline jstring str_ref_to_java(JNIEnv *env, const char* chars, size_t len) {
        // Sadly we need to create a temporary because Java can't accept a char* without a 0-terminator
        char* conv_buf = MALLOC(len + 1, "str conv buf");
        memcpy(conv_buf, chars, len);
        conv_buf[len] = 0;
        jstring ret = (*env)->NewStringUTF(env, conv_buf);
        FREE(conv_buf);
        return ret;
    }
    static inline LDKStr java_to_owned_str(JNIEnv *env, jstring str) {
        uint64_t str_len = (*env)->GetStringUTFLength(env, str);
        char* newchars = MALLOC(str_len + 1, "String chars");
        const char* jchars = (*env)->GetStringUTFChars(env, str, NULL);
        memcpy(newchars, jchars, str_len);
        newchars[str_len] = 0;
        (*env)->ReleaseStringUTFChars(env, str, jchars);
        LDKStr res = {
            .chars = newchars,
            .len = str_len,
            .chars_is_owned = true
        };
        return res;
    }

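The JNI functions used here (GetStringUTFLength, GetStringUTFChars and NewStringUTF) operate on Modified UTF-8 (MUTF-8), not standard UTF-8.
Treating Java's MUTF-8 output as UTF-8 on the C side, and passing C's UTF-8 to NewStringUTF as if it were MUTF-8, corrupts every character whose encoding differs between the two formats.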
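Per the JNI specification, Modified UTF-8 differs from standard UTF-8 in two ways: U+0000 is written as the two bytes C0 80, and a supplementary character is written as its two UTF-16 surrogates, each encoded separately as a 3-byte sequence. The sketch below (hypothetical helper names, not LDK code) encodes UTF-16 code units that way, so U+1F600 comes out as the 6 bytes ED A0 BD ED B8 80 rather than the 4-byte UTF-8 F0 9F 98 80. Strict UTF-8 decoders (Rust's std::str::from_utf8, for example) reject encoded surrogates, and conversely the 4-byte sequences produced on the Rust side are not valid Modified UTF-8 input for NewStringUTF.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one UTF-16 code unit the way Modified UTF-8 does: U+0000 becomes
 * C0 80, and every other unit (including each half of a surrogate pair)
 * is encoded on its own in 1 to 3 bytes. Returns bytes written. */
static size_t mutf8_encode_unit(uint16_t u, unsigned char out[3]) {
    if (u != 0 && u < 0x80) {
        out[0] = (unsigned char)u;
        return 1;
    } else if (u < 0x800) {  /* also handles u == 0 -> C0 80 */
        out[0] = (unsigned char)(0xC0 | (u >> 6));
        out[1] = (unsigned char)(0x80 | (u & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (u >> 12));
    out[1] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (u & 0x3F));
    return 3;
}

/* Encode a Java String's UTF-16 contents as Modified UTF-8.
 * out must have room for 3 * len bytes. Returns bytes written. */
static size_t mutf8_encode(const uint16_t *units, size_t len, unsigned char *out) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += mutf8_encode_unit(units[i], out + n);
    return n;
}
```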

Possible Fix?

https://stackoverflow.com/a/32215302

Use GetStringChars() to get the original UTF-16 encoded characters, and then create your own UTF-8 byte array from that. The conversion from UTF-16 to UTF-8 is a very simple algorithm to implement by hand, or you can use any pre-existing implementation provided by your platform or 3rd party libraries.
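A minimal sketch of the first approach in plain C, assuming jchar is compatible with uint16_t (the JNI headers define it as an unsigned 16-bit type). It also covers the reverse direction, since str_ref_to_java could likewise build UTF-16 and pass it to JNI's NewString instead of NewStringUTF. The function names, the U+FFFD replacement for unpaired surrogates, and the buffer sizing are this sketch's own choices, not necessarily what the eventual LDK fix does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wiring into java_to_owned_str (shown as a comment because it
 * needs a JVM; MALLOC is the project's allocator macro):
 *
 *   jsize units = (*env)->GetStringLength(env, str);
 *   const jchar *u16 = (*env)->GetStringChars(env, str, NULL);
 *   char *buf = MALLOC(3 * (size_t)units + 1, "String chars");
 *   size_t len = utf16_to_utf8((const uint16_t *)u16, (size_t)units,
 *                              (unsigned char *)buf);
 *   buf[len] = 0;
 *   (*env)->ReleaseStringChars(env, str, u16);
 */

/* UTF-16 -> standard UTF-8 (Java -> C). out needs room for 3 * len bytes.
 * Unpaired surrogates become U+FFFD (an assumed policy). Returns bytes written. */
static size_t utf16_to_utf8(const uint16_t *in, size_t len, unsigned char *out) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            /* High + low surrogate: combine into one supplementary code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* unpaired surrogate */
        }
        if (cp < 0x80) {
            out[n++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            out[n++] = (unsigned char)(0xC0 | (cp >> 6));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[n++] = (unsigned char)(0xE0 | (cp >> 12));
            out[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (unsigned char)(0xF0 | (cp >> 18));
            out[n++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return n;
}

/* Standard UTF-8 -> UTF-16 (C -> Java, output usable with NewString).
 * Assumes well-formed UTF-8 input. out needs room for len code units.
 * Returns the number of UTF-16 code units written. */
static size_t utf8_to_utf16(const unsigned char *in, size_t len, uint16_t *out) {
    size_t i = 0, n = 0;
    while (i < len) {
        uint32_t cp;
        unsigned char b = in[i];
        if (b < 0x80) {
            cp = b; i += 1;
        } else if (b < 0xE0) {
            cp = ((uint32_t)(b & 0x1F) << 6) | (in[i + 1] & 0x3F); i += 2;
        } else if (b < 0xF0) {
            cp = ((uint32_t)(b & 0x0F) << 12) | ((uint32_t)(in[i + 1] & 0x3F) << 6)
               | (in[i + 2] & 0x3F); i += 3;
        } else {
            cp = ((uint32_t)(b & 0x07) << 18) | ((uint32_t)(in[i + 1] & 0x3F) << 12)
               | ((uint32_t)(in[i + 2] & 0x3F) << 6) | (in[i + 3] & 0x3F); i += 4;
        }
        if (cp >= 0x10000) {
            /* Supplementary character: emit a surrogate pair. */
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 + (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        } else {
            out[n++] = (uint16_t)cp;
        }
    }
    return n;
}
```

Sizing the UTF-8 buffer as 3 * len covers the worst case: a lone UTF-16 unit expands to at most 3 bytes, and a surrogate pair (2 units) becomes 4. In the other direction, UTF-8 never needs more UTF-16 units than it has bytes.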

Have your JNI code call back into Java to invoke the String.getBytes(String charsetName) method to encode the jstring object to a UTF-8 byte array.


TheBlueMatt · 2023-08-24T22:21:05Z

I'm fine with either proposed solution. The second is easier to reason about, but it's a chunk more code and criss-crosses different parts of the codebase. If the first solution is as simple as random StackOverflow commenters think, that's probably simpler.

TheBlueMatt added this to the 0.0.117 milestone Aug 24, 2023

TheBlueMatt closed this as completed in 32973ea Oct 10, 2023
