Incorrect handling of string conversion between Java strings and C strings in the JNI bindings. The conversion does not correctly handle characters that are encoded as 4 bytes in UTF-8 (e.g. some emojis).
Use GetStringChars() to get the original UTF-16 encoded characters, and then create your own UTF-8 byte array from that. The conversion from UTF-16 to UTF-8 is a very simple algorithm to implement by hand, or you can use any pre-existing implementation provided by your platform or third-party libraries.
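A minimal sketch of the hand-written conversion described above, in plain C so it can be compiled without a JVM. The function name utf16_to_utf8 and the choice to replace unpaired surrogates with U+FFFD are assumptions of this sketch, not something taken from the bindings. In real JNI code the input would be the jchar array returned by GetStringChars() (jchar is an unsigned 16-bit type), with the length taken from GetStringLength().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the bindings): converts `len` UTF-16 code
 * units to standard UTF-8. `out` must have room for at least 3 * len bytes.
 * Returns the number of bytes written; no NUL terminator is appended.
 * Unpaired surrogates become U+FFFD (an assumption of this sketch). */
static size_t utf16_to_utf8(const uint16_t *in, size_t len, uint8_t *out) {
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: combine with the following low surrogate. */
            if (i + 1 < len && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                i++;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* Low surrogate without a preceding high one. */
        }

        if (cp < 0x80) {
            out[o++] = (uint8_t)cp;
        } else if (cp < 0x800) {
            out[o++] = (uint8_t)(0xC0 | (cp >> 6));
            out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (uint8_t)(0xE0 | (cp >> 12));
            out[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else {
            /* Supplementary plane: the 4-byte form that MUTF-8 never emits. */
            out[o++] = (uint8_t)(0xF0 | (cp >> 18));
            out[o++] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

Because the output is length-delimited rather than NUL-terminated, an embedded U+0000 survives as a single 0x00 byte; the caller is expected to carry the returned length alongside the buffer.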
Have your JNI code call back into Java to invoke the String.getBytes(String charsetName) method to encode the jstring object to a UTF-8 byte array.
I'm fine with either proposed solution. The second is easier to reason about, but it's a chunk more code and criss-crosses different parts of the codebase. If the first solution is as simple as random Stack Overflow commenters think, that's probably the simpler option.
What
Incorrect handling of string conversion between Java strings and C strings in the JNI bindings. The conversion does not correctly handle characters that are encoded as 4 bytes in UTF-8 (e.g. some emojis).
ldk-garbagecollected/java_strings.py
Lines 465 to 487 in 3e33cfd
The current conversion functions, GetStringUTFChars and NewStringUTF, expect Modified UTF-8 (MUTF-8) and not regular UTF-8. Treating Java's MUTF-8 as C's standard UTF-8 (and vice versa) corrupts data for certain UTF-8 characters.
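To make the mismatch concrete, here is a small C sketch that encodes a single code point both ways. The function names encode_utf8 and encode_mutf8 are hypothetical and written only for illustration; they follow the publicly documented rules that MUTF-8 writes U+0000 as the two bytes C0 80 and writes supplementary characters as a surrogate pair with each half encoded in 3 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard UTF-8 for one code point (assumes cp is a valid scalar value,
 * at most U+10FFFF). Returns the number of bytes written (1 to 4). */
static size_t encode_utf8(uint32_t cp, uint8_t *out) {
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}

/* Modified UTF-8 for one code point. Returns bytes written (1 to 6). */
static size_t encode_mutf8(uint32_t cp, uint8_t *out) {
    if (cp == 0) {
        /* MUTF-8 never emits a raw 0x00 byte. */
        out[0] = 0xC0;
        out[1] = 0x80;
        return 2;
    }
    if (cp < 0x10000) {
        return encode_utf8(cp, out); /* Identical to UTF-8 in this range. */
    }
    /* Split into a UTF-16 surrogate pair, then encode each half as 3 bytes. */
    uint32_t v = cp - 0x10000;
    uint32_t hi = 0xD800 + (v >> 10);
    uint32_t lo = 0xDC00 + (v & 0x3FF);
    size_t n = encode_utf8(hi, out);
    return n + encode_utf8(lo, out + n);
}
```

For U+1F600 (a 4-byte emoji), encode_utf8 produces F0 9F 98 80 while encode_mutf8 produces ED A0 BD ED B8 80. A standard UTF-8 decoder on the C side rejects or mangles the second form, and NewStringUTF is not specified to accept the first.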
Possible Fix?
https://stackoverflow.com/a/32215302