
Strings

Eugene Gershnik edited this page Oct 9, 2021 · 4 revisions

Passing strings between C++ code and Java using JNI is very tricky to get right. SimpleJNI tries to make it easy and straightforward while avoiding common problems.

The very first thing you should know about JNI and strings is that all the raw JNI methods with UTF in their name are dangerous and should be avoided. These methods operate on so-called modified UTF-8, which is unlikely to be what the rest of your C++ code means by UTF-8. Things will appear to work just fine until you try to pass an emoji from Java to C++ or back, at which point they will mysteriously fail. The only safe way to pass strings in JNI is to use UTF-16, which is Java's internal string representation. SimpleJNI supports that and also exposes its own UTF-8 interface that uses standard UTF-8 to pass strings back and forth.

In SimpleJNI you operate on Java strings as on any other Java object, using the jstring JNI type. A helper class java_string exposes various utility methods that cover all the necessary operations. You can create a Java string from C++ data in UTF-8 or UTF-16 form:

JNIEnv * env = ...;
//From UTF-8 C string
local_java_ref<jstring> str1 = java_string_create(env, "abcd");
//From UTF-8 C++ string
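local_java_ref<jstring> str2 = java_string_create(env, std::string("abcd"));
//From UTF-16 string (length is given in UTF-16 code units).
//char16_t and jchar are distinct types in typical jni.h headers, hence reinterpret_cast
local_java_ref<jstring> str3 = java_string_create(env, reinterpret_cast<const jchar *>(u"abcd"), 4);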
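
To make the earlier warning about modified UTF-8 concrete, here is a small sketch that encodes a single code point both ways. It is illustrative only and not part of SimpleJNI; the names append_bmp, standard_utf8 and modified_utf8 are hypothetical. It relies on the two documented differences of modified UTF-8: characters above U+FFFF are written as two separately encoded surrogates, and U+0000 is written as the two bytes C0 80.

```cpp
#include <string>

// Hypothetical helper: append a value in [0, 0xFFFF] using the 1-3 byte UTF-8 patterns.
// When nul_as_two_bytes is set, U+0000 becomes C0 80 as modified UTF-8 requires.
inline void append_bmp(std::string & out, char32_t c, bool nul_as_two_bytes)
{
    if (c == 0 && nul_as_two_bytes) {
        out += static_cast<char>(0xC0);
        out += static_cast<char>(0x80);
    } else if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
}

// Standard UTF-8: code points above U+FFFF take a single 4-byte sequence.
inline std::string standard_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x10000) {
        append_bmp(out, cp, false);
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Modified UTF-8: code points above U+FFFF are split into a UTF-16
// surrogate pair and each surrogate is encoded as its own 3-byte sequence.
inline std::string modified_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x10000) {
        append_bmp(out, cp, true);
    } else {
        char32_t v = cp - 0x10000;
        append_bmp(out, 0xD800 + (v >> 10), true);
        append_bmp(out, 0xDC00 + (v & 0x3FF), true);
    }
    return out;
}
```

For U+1F600 (an emoji), standard_utf8 yields the 4 bytes F0 9F 98 80 while modified_utf8 yields the 6 bytes ED A0 BD ED B8 80. Plain ASCII text encodes identically in both, which is why the problem tends to stay hidden until such characters show up.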

Given a Java string object, you can obtain its length and copy its characters as follows:

jstring str = ...;
//Length in UTF-16 code units
jsize len = java_string_get_length(env, str);
std::vector<jchar> buffer(len);
//&buffer[0] is only valid for a non-empty vector
if (len)
    java_string_get_region(env, str, 0, len, &buffer[0]);

SimpleJNI also supports usage equivalent to raw JNI's GetStringChars/ReleaseStringChars via a java_string_access RAII wrapper.

auto str = java_string_create(env, "hello");
java_string_access access(env, str);
for(jsize i = 0, count = access.size(); i < count; ++i)
    jchar c = access[i]; //elements are UTF-16 code units, not char

std::copy(access.begin(), access.end(), ...somewhere...);

However, usage of java_string_access is rarely a good idea. First, performance-wise, GetStringChars never guarantees that you will get access to the underlying characters: its specification says that a copy might be made, and it often is. Second, the buffer you get from java_string_access is read-only (remember, Java strings are immutable). In practice you often end up needing to mutate the contents, and at that point you will need to make a second copy. Third, you have no control over the memory allocation of the buffer in case the copy is made. If you want to use your own memory arena or allocator - tough luck. With all this in mind, the simplest and most straightforward approach is the one that uses java_string_get_region: make your own buffer and copy into it.
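
The "own buffer" approach can be sketched without a JVM as follows. Everything here is a stand-in for illustration: fake_java_string and fake_get_region are hypothetical and merely mimic the shape of the java_string_get_region call shown earlier, and copy_and_uppercase_ascii is an invented name. The point is only that the caller picks the allocator and is free to mutate the single copy in place.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a Java string so the sketch runs without a JVM.
struct fake_java_string { std::u16string chars; };

// Hypothetical stand-in for a get-region call: copies len UTF-16 code units,
// starting at index start, into memory owned by the caller.
inline void fake_get_region(const fake_java_string & s, std::size_t start,
                            std::size_t len, char16_t * dest)
{
    std::copy_n(s.chars.data() + start, len, dest);
}

// Copy the string once into a buffer that uses the caller's allocator,
// then modify that buffer in place (here: uppercase ASCII letters only).
template<class Alloc = std::allocator<char16_t>>
std::vector<char16_t, Alloc> copy_and_uppercase_ascii(const fake_java_string & s,
                                                      const Alloc & alloc = Alloc())
{
    std::vector<char16_t, Alloc> buffer(s.chars.size(), alloc);
    if (!buffer.empty())
        fake_get_region(s, 0, buffer.size(), buffer.data());
    for (char16_t & c : buffer)
        if (c >= u'a' && c <= u'z')
            c = static_cast<char16_t>(c - (u'a' - u'A'));
    return buffer;
}
```

With a real Java string the same pattern applies: size the buffer from the string length, fill it with one region copy, and work on it directly, with no second copy and no hidden allocation.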

Finally, because it is so common, there is a helper that converts a Java string to std::string (in UTF-8 encoding):

auto str = java_string_create(env, "hello");
std::string cpp_string = java_string_to_cpp(env, str);

UTF conversions

Since SimpleJNI itself and its users often need to convert between UTF-8, UTF-16 and UTF-32, and there is no standard C++ facility that does so (well, not without a huge overhead), SimpleJNI includes simple conversion algorithms:

template<typename InIt, typename Out>
Out utf32_to_utf16(InIt first, InIt last, Out dest);
template<typename InIt, typename OutIt>
OutIt utf16_to_utf32(InIt first, InIt last, OutIt dest);
template<typename InIt, typename Out>
Out utf8_to_utf16(InIt first, InIt last, Out dest);
template<typename InIt, typename OutIt>
OutIt utf16_to_utf8(InIt first, InIt last, OutIt dest);

These never fail. If a conversion cannot be made, they replace invalid characters with U+FFFD and continue as far as possible. This might not be the right behavior in a security-sensitive context, so if that is important to you, please:

  1. Do not use these algorithms in such a context, and
  2. Do not use java_string methods accepting UTF-8. Use the ones that take jchars and perform conversions yourself using a security-tuned conversion library.
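
The difference between the two behaviors can be sketched as below. This is not SimpleJNI's implementation and not a substitute for a vetted conversion library; the names on_invalid, append_utf8 and utf16_to_utf8_sketch are hypothetical. It converts UTF-16 to standard UTF-8 and, on an unpaired surrogate, either substitutes U+FFFD and continues or stops and reports failure, depending on the chosen policy.

```cpp
#include <cstddef>
#include <string>

enum class on_invalid { replace, fail }; // hypothetical policy switch

// Append a valid Unicode scalar value to out as standard UTF-8.
inline void append_utf8(std::string & out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Returns false only when policy is on_invalid::fail and the input
// contains an unpaired surrogate; out then holds the output produced so far.
inline bool utf16_to_utf8_sketch(const std::u16string & in, std::string & out,
                                 on_invalid policy)
{
    out.clear();
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Well-formed surrogate pair: combine into one code point.
            char32_t lo = in[++i];
            append_utf8(out, 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00));
        } else if (u >= 0xD800 && u <= 0xDFFF) {
            // Unpaired surrogate: invalid UTF-16.
            if (policy == on_invalid::fail)
                return false;
            append_utf8(out, 0xFFFD);
        } else {
            append_utf8(out, u);
        }
    }
    return true;
}
```

With on_invalid::replace the function always succeeds, which is convenient for display text; with on_invalid::fail the caller gets a chance to reject the input outright, which is usually what a security-sensitive path wants.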