-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mobile: Rewrite Java/JNI stream callbacks #34234
Conversation
c4c1365
to
687fa1c
Compare
/retest |
6593e0e
to
5534231
Compare
Signed-off-by: Fredy Wijaya <fredyw@google.com>
5534231
to
381fa2c
Compare
/retest |
/assign @abeyad |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is awesome!
mUserCurrentReadBuffer = null; // Avoid the reference to a potentially large buffer. | ||
int dataRead = data.remaining(); | ||
// It is important to copy the `data` into the `userBuffer` outside the thread execution | ||
// because the `data` is backed by a direct `ByteBuffer` and it will be destroyed upon once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/destroyed upon once/destroyed once
Also, was this a problem before this PR? If not, why is it a problem now? I think I'm missing the correlation between this particular change and the rest of the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/destroyed upon once/destroyed once
Fixed the typo.
Also, was this a problem before this PR? If not, why is it a problem now? I think I'm missing the correlation between this particular change and the rest of the PR
With the new change, we are passing the direct ByteBuffer
into the callback without any copy (no more envoy_data
(in C++) to byte[]
(in Java)). The direct ByteBuffer
is backed by the native C++ Envoy::Buffer::Instance
. That also means it's tied to the lifetime of Envoy::Buffer::Instance
. In Cronvoy, we run the onData
on a separate thread, so we have to make a copy of the ByteBuffer
, which is what the existing code does (the old is also making a copy). The new code is simply moving the copy outside the thread because making a copy inside the thread poses a risk of having a dangling ByteBuffer
.
int dataRead = data.remaining(); | ||
// It is important to copy the `data` into the `userBuffer` outside the thread execution | ||
// because the `data` is backed by a direct `ByteBuffer` and it will be destroyed upon once | ||
// the `onData` completes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so is this the only place we'll have buffer copies once everything is in the final desired end state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup!
mobile/library/jni/jni_helper.h
Outdated
* See https://developer.android.com/training/articles/perf-jni#jclass,-jmethodid,-and-jfieldid | ||
* | ||
* Another reason for caching the `jclass` object is to able to find a non-built-in class when the | ||
* native code creates a thread and then attaching it with `AttachCurrentThread`, i.e. calling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/attaching/attaches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* Another reason for caching the `jclass` object is to able to find a non-built-in class when the | ||
* native code creates a thread and then attaching it with `AttachCurrentThread`, i.e. calling | ||
* `getThreadLocalEnv()->getEnv()->FindClass`. This is because there are no stack frames from the | ||
* application. When calling `FindClass` from the thread, the `JavaVM` will start in the "system" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this comment, very helpful to know! One thing that wasn't clear is what stack frames have to do with the reset of the explanation about FindClass
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From https://developer.android.com/training/articles/perf-jni#faq:-why-didnt-findclass-find-my-class
If the JNIEnv
is attached to the current thread, there's no stack frame and FindClass
needs the stack frame to find the class. For built-in classes, e.g. java.util.*
, etc this is fine because we look in the system classloader, but for non-built-in classes, this will result in a class not found. Specific to this PR, we need to find EnvoyStreamIntel
and EnvoyFinalStreamIntel
, which are custom classes. In the next PR, I will update the rest of the code to cache the jclass
-es since that's also the recommendation from https://developer.android.com/training/articles/perf-jni#jclass,-jmethodid,-and-jfieldid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another note for this, on Android each app has a separate ClassLoader
(for sandboxing purposes). The old code does this:
envoy/mobile/library/jni/jni_utility.cc
Lines 24 to 33 in 266f9a3
LocalRefUniquePtr<jclass> findClass(const char* class_name) { | |
JniHelper jni_helper(JniHelper::getThreadLocalEnv()); | |
LocalRefUniquePtr<jclass> class_loader = jni_helper.findClass("java/lang/ClassLoader"); | |
jmethodID find_class_method = jni_helper.getMethodId(class_loader.get(), "loadClass", | |
"(Ljava/lang/String;)Ljava/lang/Class;"); | |
LocalRefUniquePtr<jstring> str_class_name = jni_helper.newStringUtf(class_name); | |
LocalRefUniquePtr<jclass> clazz = jni_helper.callObjectMethod<jclass>( | |
getClassLoader(), find_class_method, str_class_name.get()); | |
return clazz; | |
} |
It basically tries to read any custom code from the app ClassLoader
, which is what the third recommendation says in https://developer.android.com/training/articles/perf-jni#faq:-why-didnt-findclass-find-my-class
Cache a reference to the ClassLoader object somewhere handy, and issue loadClass calls directly. This requires some effort.
My new solution is to actually go with the first recommendation with caching (caching is also better for performance as mentioned in https://developer.android.com/training/articles/perf-jni#jclass,-jmethodid,-and-jfieldid).
Do your FindClass lookups once, in JNI_OnLoad, and cache the class references for later use. Any FindClass calls made as part of executing JNI_OnLoad will use the class loader associated with the function that called System.loadLibrary (this is a special rule, provided to make library initialization more convenient). If your app code is loading the library, FindClass will use the correct class loader.
* | ||
* https://docs.oracle.com/en/java/javase/17/docs/specs/jni/functions.html#findclass | ||
*/ | ||
[[nodiscard]] jclass findClassFromCache(const char* class_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why isn't this static if addClassToCache
is static?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong reason for it not to be static
other than it's a bit nicer being able to call jni_helper.findClass
similar like the rest of the code. The JniHelper
is meant to be a tiny wrapper around JNI API and findClass
is a function in the JNI API, e.g. (env->FindClass
) so having jni_helper.findClass
makes it more consistent with the JNI API.
In the next PR, I plan to remove findClassFromCache
and simply replace findClass
implementation with the cache implementation, so the existing code doesn't need to change.
@@ -10,6 +12,7 @@ constexpr jint JNI_VERSION = JNI_VERSION_1_6; | |||
constexpr const char* THREAD_NAME = "EnvoyMain"; | |||
std::atomic<JavaVM*> java_vm_cache_; | |||
thread_local JNIEnv* jni_env_cache_ = nullptr; | |||
absl::flat_hash_map<absl::string_view, jclass> JCLASS_CACHES; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how large would this map grow? I'm concerned about keeping it unbounded in size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We control the cache via addClassToCache
in the JNI_OnLoad
. Right now we only cache 2 classes, but in the future it'll be whatever Envoy Mobile needs, but it'll be around 10-20.
auto java_stream_callbacks_class = jni_helper.getObjectClass(java_stream_callbacks_global_ref); | ||
auto java_on_headers_method_id = jni_helper.getMethodId( | ||
java_stream_callbacks_class.get(), "onHeaders", | ||
"(Ljava/util/Map;ZLio/envoyproxy/envoymobile/engine/types/EnvoyStreamIntel;)V"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that we're looking a lot of methods up now by their exact argument types, whereas before it seems we'd look up the same methods without that... is this something required by the new jclass cache? I think I'm not following the new logic here, but I knew very little about JNI so I'm mostly asking for educational purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at the old code, basically we are passing byte[] key, byte[] value
(
envoy/mobile/library/java/io/envoyproxy/envoymobile/engine/JvmCallbackContext.java
Lines 25 to 27 in 266f9a3
void passHeader(byte[] key, byte[] value, boolean start) { | |
bridgeUtility.passHeader(key, value, start); | |
} |
Map
(envoy/mobile/library/java/io/envoyproxy/envoymobile/engine/JvmBridgeUtility.java
Lines 57 to 62 in 266f9a3
Map<String, List<String>> retrieveHeaders() { | |
final Map<String, List<String>> headers = headerAccumulator; | |
headerAccumulator = null; | |
headerCount = 0; | |
return headers; | |
} |
The old code uses intermediate types and has more translation layers, which this PR is trying to eliminate. For the onHeader
case, it will first create a pair of byte[]
s and then convert them into Java Map
, which I personally think is wasteful and not to mention the code isn't very easy to read.
override fun onHeaders( | ||
headers: Map<String, List<String>>, | ||
endStream: Boolean, | ||
streamIntel: EnvoyStreamIntel | ||
) { | ||
callbacks.onHeaders?.invoke(ResponseHeaders(headers), endStream, StreamIntel(streamIntel)) | ||
if (executor != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it looks like we're making the executor optional - what's the advantage of that? Seems to put more onus on the caller. What about having a default executor of DirectExecutor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are defaulting it to the direct executor, we always have to make a copy since there's no way we can omit the copy or else we'll risk having a dangling ByteBuffer
. For Cronvoy, this would result in two copies (one from here and the other one on the Cronvoy side) and this is what we want to avoid. Basically we want to eliminate the copy when it's not necessary, so making it optional is a better design IMO.
Another side note, the C++ API doesn't actually enforce some kind of threading model. It's basically up to the callback implementer to decide whether they need some kind of threading. Although the onus is on the caller, it provides more flexibility IMO.
Signed-off-by: Fredy Wijaya <fredyw@google.com>
/retest |
1 similar comment
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the explanations!
This PR rewrites the Java/JNI implementation of the stream callbacks by removing a lot of indirection. This should reduce the number of translation layers, copies, as well as making the code easier to read and maintain. Prior to this PR, the filter and callback implementation was somewhat shared, but it makes the code confusing to read because there's some logic that's only relevant for the filters and vice versa. With the new implementation, the stream callback implementation is separate from the filter implementation, but some common helpers have been written so that the code can still be shared between the stream callback and filter implementations. In the next PR, I will update the filter implementation, so that we can finally get rid of the C wrapper types.
Risk Level: medium
Testing: unit & integration tests
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: mobile