Incorrect String Conversion in JNI to C Bindings #136

Closed
zpv opened this issue Aug 24, 2023 · 1 comment

zpv commented Aug 24, 2023

What

The JNI bindings convert incorrectly between Java strings and C strings. The conversion mishandles characters that take 4 bytes in UTF-8, i.e. characters outside the Basic Multilingual Plane, such as many emojis.

```c
static inline jstring str_ref_to_java(JNIEnv *env, const char* chars, size_t len) {
    // Sadly we need to create a temporary because Java can't accept a char* without a 0-terminator
    char* conv_buf = MALLOC(len + 1, "str conv buf");
    memcpy(conv_buf, chars, len);
    conv_buf[len] = 0;
    // NewStringUTF expects Modified UTF-8, but conv_buf holds standard UTF-8
    jstring ret = (*env)->NewStringUTF(env, conv_buf);
    FREE(conv_buf);
    return ret;
}
static inline LDKStr java_to_owned_str(JNIEnv *env, jstring str) {
    // Both the length and the bytes below are Modified UTF-8, not standard UTF-8
    uint64_t str_len = (*env)->GetStringUTFLength(env, str);
    char* newchars = MALLOC(str_len + 1, "String chars");
    const char* jchars = (*env)->GetStringUTFChars(env, str, NULL);
    memcpy(newchars, jchars, str_len);
    newchars[str_len] = 0;
    (*env)->ReleaseStringUTFChars(env, str, jchars);
    LDKStr res = {
        .chars = newchars,
        .len = str_len,
        .chars_is_owned = true
    };
    return res;
}
```

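  • The conversion functions currently used, GetStringUTFChars and NewStringUTF (along with GetStringUTFLength), work with Java's Modified UTF-8 (MUTF-8), not standard UTF-8: GetStringUTFChars returns MUTF-8 bytes, GetStringUTFLength counts MUTF-8 bytes, and NewStringUTF expects MUTF-8 input.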

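  • Treating MUTF-8 as standard UTF-8 (and vice versa) corrupts some characters. MUTF-8 encodes each character outside the Basic Multilingual Plane as a UTF-16 surrogate pair, i.e. two 3-byte sequences (6 bytes), where standard UTF-8 uses a single 4-byte sequence. It also encodes U+0000 as the two bytes 0xC0 0x80.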
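To make the difference concrete, here is a small standalone C sketch that encodes a single code point both ways. The helper names `utf8_encode` and `mutf8_encode` are hypothetical and not part of the bindings. For U+1F600 (😀), standard UTF-8 gives 4 bytes, while MUTF-8, which is what GetStringUTFChars returns, gives 6.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: standard UTF-8 encoding of one code point.
 * Supplementary code points (>= U+10000) become a single 4-byte sequence.
 * Does not reject surrogate values; mutf8_encode relies on that. */
static size_t utf8_encode(uint32_t cp, unsigned char *out) {
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: Modified UTF-8 encoding of one code point.
 * U+0000 becomes C0 80, and a supplementary code point is split into a
 * UTF-16 surrogate pair whose halves are each encoded as 3 bytes. */
static size_t mutf8_encode(uint32_t cp, unsigned char *out) {
    if (cp == 0) {
        out[0] = 0xC0;
        out[1] = 0x80;
        return 2;
    }
    if (cp < 0x10000)
        return utf8_encode(cp, out);
    uint32_t v = cp - 0x10000;
    size_t n = utf8_encode(0xD800 + (v >> 10), out);      /* high surrogate */
    return n + utf8_encode(0xDC00 + (v & 0x3FF), out + n); /* low surrogate */
}
```

For U+1F600 this yields F0 9F 98 80 in UTF-8 but ED A0 BD ED B8 80 in MUTF-8, so a Java string holding only that emoji would reach the C side as 6 bytes of encoded surrogates. Standard UTF-8 decoders, including Rust's `str` validation, reject such sequences.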

Possible Fix?

https://stackoverflow.com/a/32215302

  • Use GetStringChars() to get the original UTF-16 encoded characters, and then create your own UTF-8 byte array from that. The conversion from UTF-16 to UTF-8 is a very simple algorithm to implement by hand, or you can use any pre-existing implementation provided by your platform or 3rd party libraries.

  • Have your JNI code call back into Java to invoke the String.getBytes(String charsetName) method to encode the jstring object to a UTF-8 byte array, e.g.:
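A minimal sketch of the first approach (UTF-16 to UTF-8 by hand), written in plain C so it can be tested outside a JVM. It assumes the input is the jchar array returned by GetStringChars; jchar is an unsigned 16-bit type, modeled here as uint16_t. The function name `utf16_to_utf8` and the choice to replace unpaired surrogates with U+FFFD are assumptions, not existing code in the bindings.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts `len` UTF-16 code units to standard UTF-8.
 * `out` must have room for 3 * len bytes: a single unit expands to at most
 * 3 bytes and a surrogate pair (2 units) to exactly 4.
 * Unpaired surrogates (which Java strings may contain) become U+FFFD.
 * Returns the number of bytes written; no 0-terminator is appended. */
static size_t utf16_to_utf8(const uint16_t *in, size_t len, unsigned char *out) {
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            /* High + low surrogate: combine into one supplementary code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* unpaired surrogate: substitute the replacement character */
        }
        if (cp < 0x80) {
            out[o++] = (unsigned char)cp; /* U+0000 stays a single 0x00 byte */
        } else if (cp < 0x800) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

In java_to_owned_str, this would sit between GetStringLength/GetStringChars and ReleaseStringChars, writing into a buffer of 3 * GetStringLength(env, str) bytes (plus one if the 0-terminator is kept) and using the returned count as .len. Unlike MUTF-8, U+0000 comes out as a single 0x00 byte, which is fine because LDKStr carries an explicit length. The reverse direction (str_ref_to_java) would similarly decode the UTF-8 bytes into UTF-16 and pass them to NewString instead of NewStringUTF.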

@TheBlueMatt (Collaborator)

I'm fine with either proposed solution. The second is easier to reason about, but it's a chunk more code and criss-crosses different parts of the codebase. If the first solution is as simple as random StackOverflow commenters think, that's probably simpler.

TheBlueMatt added this to the 0.0.117 milestone Aug 24, 2023