Strings

Eugene Gershnik edited this page Aug 31, 2024 · 10 revisions

Passing strings between C++ code and Java using JNI is very tricky to get right. SimpleJNI tries to make it easy and straightforward while avoiding common problems.

The very first thing you should know about JNI and strings is that all the raw JNI methods with UTF in their name are dangerous and should be avoided. These methods operate on so-called modified UTF-8, which is unlikely to be what the rest of your C++ code means by UTF-8. Things will appear to work just fine until you try to pass emoji (or any other character outside the Basic Multilingual Plane) from Java to C++ or back, at which point they will mysteriously fail. The only safe way to pass strings in JNI is by using UTF-16, which is Java's internal string representation. SimpleJNI supports that and also exposes its own UTF-8 interface that uses standard UTF-8 to pass strings back and forth.
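To make the difference concrete, here is a minimal standalone sketch (not part of SimpleJNI) that encodes a single code point both ways. The function names `encode_standard_utf8` and `encode_modified_utf8` are hypothetical and exist only for this illustration. Modified UTF-8 differs from the standard form in two places: U+0000 is written as the two bytes C0 80, and code points above U+FFFF are written as a UTF-16 surrogate pair with each surrogate encoded as three bytes.

```cpp
#include <string>

// Hypothetical helper: encode one code point as standard UTF-8.
// Assumes cp <= 0x10FFFF. Surrogate values are encoded mechanically via the
// 3-byte branch, which the modified encoder below relies on.
std::string encode_standard_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: encode one code point the way JNI's "UTF" methods do.
std::string encode_modified_utf8(char32_t cp) {
    if (cp == 0)
        return std::string("\xC0\x80", 2);      // NUL never appears as a 0 byte
    if (cp < 0x10000)
        return encode_standard_utf8(cp);        // identical to standard UTF-8
    // Supplementary code point: split into a UTF-16 surrogate pair,
    // then encode each surrogate as its own 3-byte sequence.
    char32_t v = cp - 0x10000;
    char32_t high = 0xD800 + (v >> 10);
    char32_t low = 0xDC00 + (v & 0x3FF);
    return encode_standard_utf8(high) + encode_standard_utf8(low);
}
```

For U+1F600 (a common emoji) the standard encoder produces the four bytes F0 9F 98 80, while the modified encoder produces the six bytes ED A0 BD ED B8 80. A strict standard UTF-8 decoder rejects the latter, which is why such strings break when they cross the JNI boundary through the UTF methods.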

In SimpleJNI you operate on Java strings as on any other Java object, using the jstring JNI type. A helper class java_string exposes various utility methods that cover all the necessary operations.

You can create a Java string from C++ data in a few ways:

JNIEnv * env = ...;

//From a null-terminated C string

//Plain char is always assumed to be in UTF-8
local_java_ref<jstring> str = java_string_create(env, "abcd");
//you can also use char8_t strings, if supported by your compiler 
local_java_ref<jstring> str = java_string_create(env, u8"abcd");
//and char16_t strings
local_java_ref<jstring> str = java_string_create(env, u"abcd");
//or, using jchar * type if you have it from somewhere else
const jchar * jcstr = ...;
local_java_ref<jstring> str = java_string_create(env, jcstr);

//All of the above also support explicit size
local_java_ref<jstring> str = java_string_create(env, "abcd", 2);
//... and so on for other variants

//From a C++ object

//In C++17 you can only use a UTF-8 C++ string
local_java_ref<jstring> str = java_string_create(env, std::string("abcd"));

//In C++20 with ranges available you can use any contiguous range (e.g. std::basic_string, std::vector,
//std::basic_string_view, std::span, std::array etc.) of all the character types above
local_java_ref<jstring> str = java_string_create(env, std::vector<char>{'a', 'b', 'c', 'd'});
local_java_ref<jstring> str = java_string_create(env, std::u8string_view(u8"abcd"));
//... etc

Given a Java string object, you can obtain its length and extract its characters as follows:

local_java_ref<jstring> str = ...;
jsize len = java_string_get_length(env, str);
std::vector<jchar> buffer(len);
if (len)
    java_string_get_region(env, str, 0, len, buffer.data());

//In C++20 with ranges available you can simply use any contiguous range as a destination
java_string_get_region(env, str, 0, buffer);

SimpleJNI also supports usage equivalent to raw JNI's GetStringChars/ReleaseStringChars via a java_string_access RAII wrapper.

auto str = java_string_create(env, "hello");
java_string_access access(env, str);
//elements are UTF-16 code units (jchar), not char
for(jchar c: access)
    ...
//or via indices
for(jsize i = 0, count = access.size(); i < count; ++i) {
    jchar c = access[i];
    ...
}
//or via algorithms
std::copy(access.begin(), access.end(), ...somewhere...);
//Or using C++20 ranges. java_string_access is a contiguous range
std::ranges::copy(access, ...somewhere...);

However, usage of java_string_access is not always a good idea. First, performance-wise GetStringChars never guarantees that you will get access to the underlying characters - its specification says that a copy might be made and it often is. Second, the buffer you get from java_string_access is read-only (remember, Java strings are immutable). In practice you often end up needing to mutate the contents and at that point you will need to make a second copy. Third, you have no control over memory allocation of the buffer in case the copy is made. If you want to use your own memory arena or allocator - tough luck. With all this in mind the simplest and most straightforward approach is the one that uses java_string_get_region - make your own buffer and copy into it.

Finally, because it is so common, there is a helper that converts a Java string to std::string (in UTF-8 encoding):

auto str = java_string_create(env, "hello");
std::string cpp_string = java_string_to_cpp(env, str);

UTF conversions

Since SimpleJNI itself and its users often need to convert between UTF-8, UTF-16 and UTF-32, and there is no standard C++ facility that does so (well, not without a huge overhead), SimpleJNI includes simple conversion algorithms:

template<typename InIt, typename Out>
Out utf32_to_utf16(InIt first, InIt last, Out dest);
template<typename InIt, typename OutIt>
OutIt utf16_to_utf32(InIt first, InIt last, OutIt dest);
template<typename InIt, typename Out>
Out utf8_to_utf16(InIt first, InIt last, Out dest);
template<typename InIt, typename OutIt>
OutIt utf16_to_utf8(InIt first, InIt last, OutIt dest);

These never fail. If a conversion cannot be made, they replace invalid characters with U+FFFD and continue as far as possible. This might not be the right behavior in a security-sensitive context, so if that is important to you, please:

  1. Do not use these algorithms in such a context, and
  2. Do not use java_string methods accepting UTF-8. Use the ones that take jchars or char16_t and perform the conversions yourself using a security-tuned conversion library.
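To illustrate what "replace with U+FFFD and continue" can look like, here is a minimal standalone sketch of a UTF-16 to UTF-32 conversion with the same iterator-based shape as the signatures above. It is not SimpleJNI's implementation. The name `sketch_utf16_to_utf32` is hypothetical, and the exact replacement policy (one U+FFFD per unpaired surrogate) is an assumption made for this example.

```cpp
#include <iterator>
#include <string>

// Hypothetical sketch: decode UTF-16 code units into UTF-32 code points.
// Never fails; every unpaired surrogate becomes a single U+FFFD.
template<typename InIt, typename OutIt>
OutIt sketch_utf16_to_utf32(InIt first, InIt last, OutIt dest) {
    constexpr char32_t replacement = 0xFFFD;
    while (first != last) {
        char32_t unit = static_cast<char16_t>(*first);
        ++first;
        if (unit < 0xD800 || unit > 0xDFFF) {
            // Not a surrogate: the code unit is the code point.
            *dest++ = unit;
        } else if (unit <= 0xDBFF && first != last) {
            // High surrogate with at least one more unit available.
            char32_t next = static_cast<char16_t>(*first);
            if (next >= 0xDC00 && next <= 0xDFFF) {
                ++first;  // valid pair: consume the low surrogate too
                *dest++ = 0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00);
            } else {
                // High surrogate not followed by a low one. The next unit
                // is left in place and decoded on the following iteration.
                *dest++ = replacement;
            }
        } else {
            // Lone low surrogate, or a high surrogate at the very end.
            *dest++ = replacement;
        }
    }
    return dest;
}
```

A call such as `sketch_utf16_to_utf32(in.begin(), in.end(), std::back_inserter(out))` with a `std::u16string in` and a `std::u32string out` silently turns malformed input into U+FFFD. Two different malformed inputs can therefore decode to the same output, which is the kind of behavior a security-sensitive caller may want to reject outright instead.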