RuntimeExecutor #196

Open
ejanzer opened this issue Jan 31, 2020 · 26 comments
Labels
💡 Proposal · 👓 Transparency

Comments

@ejanzer
Collaborator

ejanzer commented Jan 31, 2020

Introduction

We're currently rewriting core pieces of React Native's infrastructure to function without the bridge. As part of that effort, we want to re-think how we create and manage access to the JS runtime.

Right now, the runtime is managed by a JSExecutor, which is created by a JSExecutorFactory, which is passed in to the bridge when it is constructed. Each type of JS VM has its own executor factory (JSC, Hermes, etc.). The JSExecutor is owned by the NativeToJSBridge, which ensures that all operations on the JS runtime take place on the JS message queue thread. The NativeToJSBridge is then owned by the Instance, which interfaces with platform-specific code.

The downside of this approach is that accessing the jsi::Runtime is difficult. Adding a new method that accesses the JS runtime and exposing it to platform-specific code requires touching a half dozen classes, or more. So what ends up happening instead is that we access the runtime in unsafe ways, for convenience. For example, on Android we expose the raw pointer to the jsi::Runtime through CatalystInstance.getJavaScriptContextHolder(), which then allows us to access the jsi::Runtime without the safety of the JS message queue thread, which is used to prevent concurrent access. This has already caused problems for us.

The Core of It

Our goal with this proposal is to build the simplest possible abstraction to expose safe access to a jsi::Runtime. Once we have this abstraction, we can build higher-level, more versatile interfaces on top of it, which will be exposed to product code (and maybe used within React Native core, as well). The abstraction that we’ve decided to use is what we’re calling RuntimeExecutor.

RuntimeExecutor is already used by Fabric today. It's defined in RuntimeExecutor.h:

/*
 * Takes a function and calls it with a reference to a Runtime. The function
 * will be called when it is safe to do so (i.e. it ensures non-concurrent
 * access) and may be invoked asynchronously, depending on the implementation.
 * If you need to access a Runtime, it's encouraged to use a RuntimeExecutor
 * instead of storing a pointer to the Runtime itself, which makes it more
 * difficult to ensure that the Runtime is being accessed safely.
 */
using RuntimeExecutor =
  std::function<void(std::function<void(jsi::Runtime &runtime)> &&callback)>;

The RuntimeExecutor must make the following guarantees (note: these are slightly different from what's described in the docblock above):

  • The callback will only be invoked when it has exclusive access to the jsi::Runtime.
  • The callback will be invoked asynchronously (i.e. calling the RuntimeExecutor will not block the current thread).

The RuntimeExecutor will be created by a class that implements the interface JSEngineInstance. (This interface is not on GitHub yet, but it basically just defines a method, getRuntimeExecutor, that returns a RuntimeExecutor). Each JS VM will need its own implementation of this interface (HermesInstance, JSCInstance, etc.).

The JSEngineInstance will be the one responsible for the synchronization that guarantees exclusive access to the runtime. It will create the RuntimeExecutor and pass it to React Native core when the React Native instance is initialized.

JSEngineInstance  →  RuntimeExecutor →  React Native Core
HermesInstance                          Fabric
JSCInstance                             TurboModules
...

In theory, the JSEngineInstance can use whatever mechanism it wants to guarantee exclusive access to the runtime, as long as it satisfies the above conditions. In practice, however, we will continue to use the JS message queue thread for this. This is important because the assumption that we’re using this thread is baked into React Native in various places (see discussion).

Here is an example implementation of RuntimeExecutor, from Fabric’s Binding.cpp:

  RuntimeExecutor runtimeExecutor =
      [runtime, sharedJSMessageQueueThread](
          std::function<void(facebook::jsi::Runtime & runtime)> &&callback) {
        sharedJSMessageQueueThread->runOnQueue(
            [runtime, callback = std::move(callback)]() {
              callback(*runtime);
            });
      };

Why Make This Change

The main reason to make this change is that it will guarantee safe (exclusive) access to the jsi::Runtime, which we don't currently have in places where we store a jsi::Runtime *. This will prevent crashes caused by accessing the runtime simultaneously from multiple threads.

With RuntimeExecutor, it will also be easier to lazily initialize the jsi::Runtime than it is today. Right now we don’t have a lot of control over when we pay the cost of initializing the JSExecutorFactory and the runtime. With this model, it will be easier to only initialize the runtime when it’s actually needed.

Because we are now delegating threading/synchronization to the JSEngineInstance, it will also be possible for different VMs to use different threading models. For example, on iOS 11 and below (IIRC, needs citation), JSC can only be accessed from the same thread it was created on; other JS VMs don’t have this limitation. By delegating the threading model to the JSEngineInstance, we could have greater flexibility in which thread(s) we use for JS execution.

Discussion points

  • Theoretically, the JSEngineInstance can use whatever mechanism it wants to guarantee exclusive access to the runtime. However, there is an assumption baked into RN that all JS is executed on the JS message queue thread; any RuntimeExecutor that violates that assumption will probably run into a lot of issues. This is a limitation that we want to move away from in the future, but for now it’s important that all RuntimeExecutors continue to use the JS message queue thread, at least on Android and iOS.
  • RuntimeExecutor is a very simple abstraction, and is not ideal for usage in product code. However, we can write additional abstractions on top of it (see appendix) which can provide more flexible APIs. We still need to discuss exactly what this higher-level abstraction will look like, and where/how it will be used.
  • Synchronous JS execution: At some point in the future, we want to support synchronous JS execution for some specific use cases (e.g. high priority UI events). RuntimeExecutor provides a way to ensure proper threading (a valid thread to execute the callback) and proper ownership management (validity of the pointer to jsi::Runtime). However RuntimeExecutor does not provide any other high-level concepts such as serial queues or guaranteed sync or async execution of the callback. Those features can be built on top of RuntimeExecutor in a platform/VM-agnostic manner.
  • ABI safety: We believe RuntimeExecutor is as safe as JSI is. The ABI stability of std::function is guaranteed by C++ standard library; the rest is just jsi::Runtime. All actual complexity of implementation design is hidden behind stable std::function interface.

Appendix: Possible Implementations

  static void executeAsynchronously(
      RuntimeExecutor &runtimeExecutor,
      std::function<void(jsi::Runtime &runtime)> &&callback,
      std::shared_ptr<std::thread> thread = {}) {
    // This is the first, not the most performant, implementation.
    // What we can do better:
    // 1. Schedule normally first, check that the call was sync, and only then spin up a new thread.
    // 2. Use the given thread (if provided) to save a thread allocation in cases where it makes sense.
    // Note: the thread must be detached (a joinable std::thread temporary
    // calls std::terminate on destruction), and the captured reference to
    // runtimeExecutor must outlive the thread.
    std::thread{[callback = std::move(callback), &runtimeExecutor]() mutable {
      runtimeExecutor(std::move(callback));
    }}.detach();
  }
  
  static void executeSynchronously_CAN_DEADLOCK(RuntimeExecutor &runtimeExecutor, std::function<void(jsi::Runtime &runtime)> &&callback) {
    std::mutex mutex;
    mutex.lock();

    runtimeExecutor([callback=std::move(callback), &mutex](jsi::Runtime &runtime) {
      callback(runtime);
      mutex.unlock();
    });

    mutex.lock();
  }

  static void executeSynchronouslyOnSameThread_CAN_DEADLOCK(RuntimeExecutor &runtimeExecutor, std::function<void(jsi::Runtime &runtime)> &&callback) {
    // Note: We need the third mutex to get back to the main thread before
    // the lambda is finished (because all mutexes are allocated on the stack).

    std::mutex mutex1;
    std::mutex mutex2;
    std::mutex mutex3;

    mutex1.lock();
    mutex2.lock();
    mutex3.lock();

    jsi::Runtime *runtimePtr;

    runtimeExecutor([&](jsi::Runtime &runtime) {
      runtimePtr = &runtime;
      mutex1.unlock();
      // `callback` is called somewhere here.
      mutex2.lock();
      mutex3.unlock();
    });

    mutex1.lock();
    callback(*runtimePtr);
    mutex2.unlock();
    mutex3.lock();
  }
@ejanzer
Collaborator Author

ejanzer commented Jan 31, 2020

Adding the people who have contributed to this proposal/discussion at FB: @shergin, @RSNara, @mhorowitz @fkgozali

And some people who might be interested: @vmoroz @acoates-ms

This proposal isn't something we're completely committed to, and we still have some details to iron out, but I wanted to open this up for discussion, especially with the maintainers of other platforms.

@vmoroz

vmoroz commented Jan 31, 2020

Talking about the std::function ABI safety: it is not ABI safe, because the C++ standard does not specify its layout; it only specifies its API. If, for example, a V8 engine that exposes JSI is compiled using Clang, that DLL cannot be used from MSVC-compiled code, because MSVC has a different implementation of std::function. In practice, we often cannot use C++ standard classes even with the same compiler if modules are compiled using different Release vs Debug configurations, because the Debug version often has additional fields.

In general, no C++ standard library classes are ABI safe except for some trivial cases.

The industry usually uses:

  • plain C functions with specified calling conventions such as __stdcall.
  • plain C structs or arrays where the byte layout and alignment are well defined.

E.g. the Chakra JS engine API is C-based and thus ABI safe.

Another alternative for ABI-safe APIs is to use v-tables. A v-table is essentially an array with function pointers. Most compilers use the same predefined v-table layout to keep compatibility with Microsoft's COM to be able to use Windows API.

Thus, in addition to C-style APIs we can define ABI-safe APIs based on C++ abstract virtual interfaces. We can either make them COM-compatible and derive all interfaces from IUnknown, or use custom-defined rules for the interface shape and lifetime management. IUnknown gives us out-of-the-box shared ref-counted semantics and ABI-safe dynamic cast (QueryInterface), and I personally like it for its simplicity and power. (There is no need to bring in the rest of COM.)

In that case instead of std::function, the ABI-safe API should use an interface that has Invoke method. To simplify the usage, we can also wrap it up in a simple header-only smart pointer such as std::unique_ptr or ComPtr and call it abi::function.

@shergin

shergin commented Jan 31, 2020

@vmoroz As a person who wrote that sentence, I absolutely agree with you.

Personally, I don't think we should aim for ABI stability in our C++ code at all. If some customers need ABI stability, I think we/they have to have plain C wrapper for that.

Yeah, that's a controversial opinion that divided the whole C++ community into two camps. From what I see, Google (via Titus Winters) pushes against the stability. I am not aware of an official Facebook position, but everything I see inside (CI, repos, build configurations, culture) clearly indicates that Facebook is on the same page with Google.

Does Microsoft value ABI stability?

(For anyone who is interested in the topic, I would recommend this talk to get some context about how challenging it is to preserve ABI stability (from the libc++ authors): https://www.youtube.com/watch?v=DZ93lP1I7wU )

@shergin

shergin commented Jan 31, 2020

I think I would change the sentence

The callback will be invoked asynchronously (i.e. calling the RuntimeExecutor will not block the current thread).

to

The callback might be invoked asynchronously (i.e. calling the RuntimeExecutor may not block the current thread).

The reason is that the implementation should do whatever it believes is the most efficient thing for the particular moment (considering the current thread, load, and so on), whereas the caller should express the desired behavior (sync/async/sync-same-thread/async-specified-thread/and-so-on) via additional helper functions (see the appendix).
Specifying all constraints and particular behavior for RuntimeExecutor will penalize all the other use-cases where we do not care about those particular constraints (one size does not fit all). At the same time, defining the exact behavior of RuntimeExecutor will not allow avoiding those additional wrappers.

@RSNara

RSNara commented Feb 1, 2020

The callback might be invoked asynchronously (i.e. calling the RuntimeExecutor may not block the current thread).

Specifying all constraints and particular behavior for RuntimeExecutor will penalize all the other use-cases where we do not care about those particular constraints (one size does not fit all). At the same time, defining the exact behavior of RuntimeExecutor will not allow avoiding those additional wrappers.

I think the more specific we make the definition of RuntimeExecutor, the more useful it becomes as an abstraction. Something that is maybe synchronous, and maybe asynchronous is dangerous. Something that is maybe asynchronous on the same thread, or maybe asynchronous on a third party thread (i.e: JS thread) is dangerous. I'm not sure how we can use RuntimeExecutor with confidence if we leave these details vague. How do I know that the same piece of code won't deadlock if we pass in a different RuntimeExecutor?

Could you elaborate a bit more on why you think this is a good idea? I think my doubt stems from a lack of understanding of what you're trying to convey.

Edit:

Oh, I guess we won't be using RuntimeExecutor directly, but instead via these APIs that force it to be asynchronous or synchronous? But even then, why do we even need these utilities? executeSynchronouslyOnSameThread_CAN_DEADLOCK will always deadlock on a synchronous RuntimeExecutor. This isn't true if we make sure that RuntimeExecutor is always async on the current or some third-party thread. Also, executeAsynchronously wouldn't be necessary if we specify that RuntimeExecutor is always async. 🤔

@kelset kelset added 👓 Transparency This label identifies a subject on which the core has been already discussing prior to the repo 💡 Proposal This label identifies a proposal labels Feb 3, 2020
@shergin

shergin commented Feb 3, 2020

@RSNara
Those are fair points!
I think the idea of guaranteed asynchronous execution is actually so non-trivial in C++ (a concurrent environment) that it's dangerous to even try to guarantee it. 🤯😭
Yeah, in the original design and implementation of RuntimeExecutor, it was always async but here's why I changed my mind...

Something that is maybe synchronous, and maybe asynchronous is dangerous.

What do you mean by that? When I first thought about this problem, I imagined this example:

var x = 1;
someTrulyAsyncAPI(function() { x = 3; });
x = 2;
console.log(x); // Must print `2`.

That's absolutely understandable for any JavaScript engineer. That's why we have nextTick and Promise API is always async. Let's rewrite that in C++.

auto x = std::make_shared<int>(1);
someTrulyAsyncAPI([=]() { *x = 3; });
*x = 2;
assert(*x == 2); // Should be `2`...? Not really. :(

I think all use-case scenarios fall into three categories:

  • We don't care about sync/async nature of the calls (majority);
  • We want to have it sync and executed on the same thread in a non-blocking, re-entrant manner (Fabric sync event dispatching);
  • We want to have some sort of queue that we want to manage (imaginary, not real use case).

If someone thinks that some code depends on async execution, I suspect it's already broken.
There are several optimizations and implementation aspects that will drastically benefit from removing any asynchronous execution guarantees; I will describe them later.

I guess we won't be using RuntimeExecutor directly, but instead via these APIs that force it to be asynchronous or synchronous?

Yeah, kinda. I just expect that the vast majority of use-cases do/should not need to have any guarantees.

executeSynchronouslyOnSameThread_CAN_DEADLOCK will always deadlock on a synchronous RuntimeExecutor.

True. We (I) should fix it.

Added:
But why on Earth would we want RuntimeExecutor to execute the callback synchronously when the caller does not strictly need it? There are several possible use-cases:

  • Imagine that we have some sort of recursive call and the JS mutex is already acquired (and/or we are already on the JavaScript thread). In this case, scheduling asynchronous code execution can be wasteful.
  • By allowing this possibility, we open opportunities to implement the sync wrapper more efficiently.
  • Some heuristic might decide that running this particular block synchronously is cheaper and do it.
  • Again, IMHO the strongest argument for this feature is that the opposite is simply unimplementable (we cannot guarantee the order of execution of code that is placed right after the call to RuntimeExecutor, which is essentially what "async" execution means).

@shergin

shergin commented Feb 4, 2020

I think we should replace std::mutex with std::recursive_mutex in executeSynchronouslyOnSameThread_CAN_DEADLOCK to make it not deadlock in the case of a sync callback. Another option is to check std::this_thread::get_id(), but I am not sure that will work with Java threads.

@RSNara

RSNara commented Feb 5, 2020

tl;dr: Sorry for the rambling below. I think we should do 2.

Raman: Something that is maybe synchronous, and maybe asynchronous is dangerous.

Valentin: What do you mean by that?

Suppose that we have two implementations of RuntimeExecutor: one that is optimized and sometimes executes code synchronously, and another that isn't, and always executes asynchronously. Then code that works with one implementation might not necessarily work with the other. This is not an issue if we only ever expect to have one implementation of RuntimeExecutor; then we could simply depend on that implementation's behaviour.

If, however, RuntimeExecutor is supposed to be a generic interface, then I think simply allowing the implementation to decide when it's async vs sync makes the RuntimeExecutor abstraction unreliable. Without knowledge of whether the work will be executed asynchronously or synchronously, we'll have to write code that works in both cases, which is hard if the work performs side-effects. Worse, if the behaviour of RuntimeExecutor is ambiguous, we might end up (accidentally) writing code that depends on a particular implementation's behaviours. This is just bad no matter how you slice it.

Imagine that we have some sort of recursive call and the JS mutex is already acquired (or/and we already on JavaScript thread). In this case, scheduling asynchronous code execution can be wasteful.

🤔... You're right. If we always make it async, then this would be problematic. Always-async runtime executor doesn't work.

So it looks like we have two options:

  1. Let RuntimeExecutor implementations decide when they're sync vs async. (Still not convinced that this is the right approach).
  2. Expand the specification of RuntimeExecutor to also include a JS thread. With this addition, we can define the set of circumstances under which all RuntimeExecutor implementations are async, and do the same for sync. Something like this, but probably more refined: "There exists a thread A that is the only thread allowed to execute JavaScript (is this true?). jsi::Runtime is always accessed on this thread. If you call RuntimeExecutor on thread A, RuntimeExecutor will always do the work synchronously. Otherwise, it should do the work at a later time on thread A."

@shergin

shergin commented Feb 6, 2020

Suppose that we have two implementations of RuntimeExecutor: one that is optimized, and sometimes executes code synchronously, and the other, which isn't, and always executes asynchronously. Then, code that works with one implementation might not necessarily work with the other. This is not an issue if we only ever expect to have one implementation of RuntimeExecutor.

My point is that those cases are indistinguishable: if some implementation runs the code sometimes synchronously, it can have the exact same side-effects as an always-async one, and vice versa. If some code relies on assumptions related to that, it's probably already broken.

I think another guarantee that RuntimeExecutor should support (and does support), and which we expect from async execution, is sequentiality: if the same thread schedules two blocks A and B, they must be executed in that order.

@zackargyle

Raman: Something that is maybe synchronous, and maybe asynchronous is dangerous.

@RSNara, for what it's worth, this is already a core aspect of React itself. Calls to setState may or may not be synchronous depending on where/when they are executed. It becomes a matter of education which, for something like the core of RN, is probably less of an issue than for an API used by product engineers.

@dvicory

dvicory commented Feb 12, 2020

I want to preface my comment by saying I've only recently done a deeper look into React Native's internals and my C++ is very rusty. But I can talk more from the perspective of an end-user, for a use case that I would want to accomplish from the proposed API.

I want to be able to delegate certain native lifecycle methods (in this instance, for WebView) to JavaScript. However, many of these lifecycle methods require returning a value that dictates behavior to the underlying platform. Since we don't know the value to return synchronously, we have to fake it as a workaround. An example is deciding whether to load a URL. Natively, we say to block loading it, then fire an event to JS, allow the JS to make a decision, and then JS sends the command to native to load the URL. This is completely disconnected from the original event that started it all: "should I load this URL?". However, not all lifecycle methods can be worked around in this manner, due to the unique side effects that you want to produce by returning directly and synchronously, for which there is no after-the-fact substitute. Subtle bugs can also be introduced with this workaround. Before JSI, I tried locking and sending the event, waiting to unlock based on the JS command to come in, but quickly found that to cause a deadlock. I set it aside, thinking it wasn't really solvable with the current React Native architecture.

Enter JSI: I started to dig more into JSI and was finally able to do what I wanted with the raw jsi::Runtime pointer. It allows me to get a function in the runtime, synchronously call it and get the result, and use that for the native lifecycle method to return. We haven't used it in production yet, so I haven't found issues with non-safe access yet, though I wouldn't be surprised if that were to be an issue.

I'd love to see improved APIs for working with JSI. I wanted to get my particular experience out there so that this use case can be addressed. And perhaps it is already covered by the ideas put forth here. 😅

@ejanzer
Collaborator Author

ejanzer commented Feb 12, 2020

@dvicory Thanks for your comment! That use case is definitely one we want to support - we need to be able to synchronously call into JS (and get a return value) for proper Android back button handling, too. I think we'll support this with the explicitly synchronous API built on top of RuntimeExecutor, like @shergin described above.

@RSNara / @shergin : It seems to me that there are 3 potential problems with the sometimes-synchronous RuntimeExecutor:

  1. The synchronous abstraction will sometimes be inefficient, because it will do unnecessary locking. This seems unavoidable with this approach, although I'm not sure if it's a big deal.
  2. The synchronous abstraction will sometimes result in deadlocks, if it's called from the same thread it uses for JS. It sounds like this should be addressed by using std::recursive_mutex, hopefully?
  3. The possibility of bugs caused by functions written under the assumption that they will be called asynchronously. I've been trying to think of the ways that this could cause a problem, and I think this should only really be a problem in cases similar to the setTimeout issue - JS calls a native function, which calls a JS callback:
MyNativeModule.doSomething(() => { console.log('Done!'); });
console.log('Start');

With the always-async bridge, you can (probably?) be certain that 'Start' will always be logged before 'Done!', because the JS callback will be enqueued after the current JS execution block. With the sometimes-synchronous RuntimeExecutor, it's possible that 'Done!' will come before 'Start' if MyNativeModule decides to invoke the JS callback in the body of doSomething, if the native module is called from the JS thread.

So this is potentially a problem, if native module methods are invoked synchronously in the JS thread (although IIUC that's not the default behavior for void functions).

Outside of this example, I'm struggling to think of examples of when this would cause problems for us - Java and Obj-C are both multi-threaded, so it seems to me that any code that's thread-safe should not be making assumptions about where/when JS is being executed. The only places where people might reasonably be making those assumptions are in JS.

If that's true, then the only thing we really need to sort out is the JS-native-JS issue, and there may be other ways to solve that than by changing the definition of RuntimeExecutor?

On the other hand, I'm also not sure if there's an important reason why the JS queue shouldn't be part of the spec. Even if we want to synchronously execute JS from the main thread, I can't think of a compelling reason why we would need to execute JS on the main thread. And the possibility of using thread pools, etc. doesn't really make sense here, since you need exclusive access to the runtime anyway. It just seems like it would be nice to allow the provider of the RuntimeExecutor to do what they want, and not be tied to a specific thread.

@shergin

shergin commented Feb 13, 2020

  1. The inefficiency of sync calls: I am not sure I explicitly mentioned that; I am proposing that RuntimeExecutor might call the callback synchronously if it believes that's the most efficient way to call it. A sync call does not always mean a slow/inefficient call.

  2. Deadlocks: These are implementation details of RuntimeExecutor and the helper methods. I believe it's solvable. A recursive mutex is the simplest solution, but there are probably more complex ones that are a tiny bit more performant.

  3. The callback of the next tick: That's a great example. I think the actual behavior of the module -- calling the callback in a sync, async, or unspecified manner -- should be up to the module. If the module needs the call to be async, it can do this using the executeAsynchronously helper function.

a compelling reason why we need to execute JS on the main thread

In Fabric we (will) do that to deliver some events synchronously (e.g. TextInput::onChange, to build a reliable controlled TextInput).

IMHO, this is all about the idea of building something small and extendable, instead of something complex and immediately versatile. Any constraint that we impose on RuntimeExecutor should serve some need that is impossible to achieve otherwise.

@ejanzer
Collaborator Author

ejanzer commented Feb 18, 2020

The inefficiency of sync calls: I am not sure I explicitly mentioned that, I am proposing that RuntimeExecutor might call the callback synchronously if it believes it's the most efficient way to call it. Sync call does not always mean slow/inefficient call.

For this example, I was thinking about executeSynchronously - if the RuntimeExecutor would have called the function synchronously anyway (presumably using locking of its own), then the locking in the implementation of executeSynchronously would be unnecessary. If we knew whether the RuntimeExecutor would call the function synchronously or asynchronously, then it seems like our implementations of executeSynchronously and executeAsynchronously could be more efficient.

@mhorowitz

I've finally had a chance to catch up on this discussion, and I have a few thoughts.

  • I am skeptical that any analysis will be able to dynamically determine whether any call should be made synchronously or asynchronously. This sounds like the halting problem.

  • It is easier to give more functionality than to take it away.

  • It is easier to reason about simple interfaces than complex interfaces.

  • JS and C++ have exceptions, and the APIs should specify what happens when an exception bubbles to a language boundary. None of the above discussion mentions this.

  • We can iterate by adding more complex APIs if/when we determine we need them, and implement shims if needed. For example, we can say X always executes synchronously. Then later, we can say Y can execute synchronously or asynchronously. We can tell people Y is better, but if X is all there is, Y can call X, and it still works. We can't do the reverse.

  • There are actually several relevant APIs here.

    1. An API for adding support for a new JS engine.
    2. An API for C++ native module authors to call JS, or be used from JS.
    3. An API for JS code (module or app) to call native code.

    They have different requirements, and the numbers of each will be very different, too. If we paint ourselves into a corner and need to refactor all the engine impls at some point, that is doable. If we paint ourselves into a corner and need to refactor all of the RN apps... we have a big problem. Let's try to decouple these as much as possible, at least to start off. This probably means some extra layers, which might not be ideally efficient, but we can implement richer APIs later which are more efficient, with perhaps safety or resource tradeoffs.

My intuition is we should start simple and explicit:

  • For an engine, RuntimeExecutor always executes synchronously. If the bridge wants to do stuff asynchronously, the bridge can do queue management. (This is the status quo.)
  • Native module authors get called from a thread where they can't block JS execution, and are provided an API to call into JS which is guaranteed safe, and can't mess things up. (This is more or less the status quo for the current Instance.) If this API is async, then everything gets queued (at least conceptually). If it is sync, then you need mutexes, or some more sophisticated pattern to run code safely. The existing pattern is async, so that's probably the best starting point due to familiarity. This does incur some efficiency cost.
  • JS code is inherently single-threaded, and there are already patterns JS developers are familiar with for asynchrony: function calls are always synchronous; setTimeout and Promise callback execution are never synchronous. There is not (that I know of), a standard API which is maybe-synchronous. Further, JS developers may not know what's native. So my intuition is for JS developers, the initial API just looks like a normal function, which runs and returns a value (which maybe is always undefined to start).

This all is, of course, limiting, but that's the point. As we port code over to the new architecture, we will have more understanding of additional use cases we might add:

  • We can make the RuntimeExecutor interface richer to allow coordination around locking and queuing between the engine and RN.
  • Native modules which don't want to block the calling thread can use continuation-passing without any assistance from the bridge. It falls to the native module to call the continuation function safely (if there's a safe API, then it's easy to reason about the code correctness if it's used).
  • Native modules which want to return data synchronously can define methods which are called synchronously from JS, and can return values directly to JS. This creates the potential for blocking JS execution, which callers of this API will need to deal with.
  • We can provide a less safe API for native modules called synchronously from JS to call sync back into JS. We can choose (using multiple APIs, or static config, or dynamic config) whether it does runtime checks to catch unsafe behavior, or just executes C++ UB if the developer errs.
  • The above API could include watchdog functionality (at a cost), or not.
  • We can provide APIs for native module to create Promises. This would work well for I/O intensive modules which never want to block the calling thread, and potentially create performance benefits.

We can have clearly-labelled experimental APIs to experiment with things we really don't understand well yet.

  • We can experiment with APIs which are aware of the queue/event loop. This might include using the engine's event loop (if it has one), and might allow for writing more efficient async native method calls. RN can provide a generic event loop if the engine's is not appropriate for some reason.
  • We can experiment with APIs which give hints about latency. The obvious use case here is calling into JS from the UI thread, and wanting to minimize latency, perhaps at the cost of delaying GC, deferring queued work, etc.
  • We can experiment with maybe-sync APIs, until we find something we like.

I'm probably missing use cases.

The overall strategy here is to start with simple, safe, orthogonal APIs to make existing simple patterns work. We don't conflate different APIs. We give ourselves a path to more complex patterns which might require more coupling between layers, and/or more complex reasoning about correctness. For simple apps and modules with simple needs, we can keep the older APIs around, and old code just keeps working. Let's not try to get everything perfect from the beginning.

@grabbou
Member

grabbou commented Feb 20, 2020

Hey,

Thank you for forming this great proposal and being so explicit about upcoming changes! I have some very basic questions around the idea presented here - I am trying to wrap my head around the latest architecture changes. I've spoken with @ejanzer on Discord and she recommended I ask these questions here - there might be other developers wondering the same, so let's go.

I guess my main question is: why do we even need to write a new abstraction such as RuntimeExecutor?

Our goal with this proposal is to build the simplest possible abstraction to expose safe access to a jsi::Runtime

Shouldn't it be possible to provide thread-safe access to jsi::Runtime via existing JSExecutor?

The proposal says:

Right now, the runtime is managed by a JSExecutor, which is created by a JSExecutorFactory, which is passed in to the bridge when it is constructed

and later

on Android we expose the raw pointer to the jsi::Runtime through CatalystInstance.getJavaScriptContextHolder(), which then allows us to access the jsi::Runtime without the safety of the JS message queue thread

I guess my main question is: what is wrong with (or a no-go in) the current architecture that would require a whole new set of classes and abstractions? I would say that accessing a thread-safe JSExecutor vs. the jsi::Runtime directly would be similar?

Also, I would like to understand how the API looks on iOS right now and whether it's as hacky as on Android.

@ejanzer
Collaborator Author

ejanzer commented Feb 25, 2020

@grabbou:

I guess my main question is: what is wrong with (or a no-go in) the current architecture that would require a whole new set of classes and abstractions? I would say that accessing a thread-safe JSExecutor vs. the jsi::Runtime directly would be similar?

The biggest difference between this approach and reusing JSExecutor is that using the jsi::Runtime directly gives you access to all of JSI, whereas JSExecutor exposes a (relatively) very limited API. In order for JSExecutor to be useful, we would either have to write a whole bunch of functions to handle all the different ways we plan to use the runtime in Fabric, TurboModules, and the rest of core, or we need to directly expose the jsi::Runtime. We think it makes more sense to do the latter.

You could ask why we can't just build this on top of our existing abstractions, though - why not have JSExecutor provide a RuntimeExecutor? We still might do that for backward compatibility purposes, but for the bridgeless rewrite we think it makes sense to create some new abstractions for a few reasons:

  1. It's a lot easier for us to iterate on things without having to worry about breaking production apps.
  2. Since it's likely that the full migration to bridgeless mode will take some time, having separate abstractions will make it easier for us to remove unused code from apps that have migrated to bridgeless mode without breaking other apps that still use the bridge.
  3. This rewrite is an opportunity for us to think about how we do things in RN, and what our APIs look like, without being limited by what currently exists. A lot of RN's existing infrastructure was built around the assumption that all communication between JS and native code is async, and some of those APIs might not make as much sense in a world where we can invoke JS synchronously (not that we do, yet, but we want to at some point).

Also, I would like to understand how does the API look on iOS right now and whether it's as hacky as Android.

I'm not as familiar with how this works on iOS - I think we pass around the bridge everywhere instead of the runtime pointer, which may not be as dangerous but does present its own challenges for bridgeless mode.

@ejanzer
Collaborator Author

ejanzer commented Feb 25, 2020

@mhorowitz I think you're right - we're conflating (at least) two different things here. The API that we expose to third party developers that want to add support for a new JS VM does not have to be the same API that we expose within RN core and to third party code that needs access to the runtime - to your point, the API that we use for accessing the runtime needs to have some logic for handling exceptions, thread management, etc. That logic should probably be shared between all different JS VMs; we don't want to have to implement it multiple times, or ask third party developers to implement it for the VMs they want to support. In particular, the logic for deferring queued work - to be able to insert or prepend a high priority task - would be a pretty significant change, and probably not something we want to require third party developers to implement for the JS engines they want to use.

I think this proposal should probably be split into two different APIs:

  1. An API for adding support for a new JS engine (a synchronous RuntimeExecutor)
  2. An API to safely access the jsi::Runtime, to be used in RN core as well as some third party code (an asynchronous RuntimeExecutor)

The first API should only be used by RN core to create the abstraction for the second API. In the future, we can build additional APIs on top of these - like the ones you listed, as well as:

  • Internal APIs that synchronously (blocking) invoke a JS function and do something with its return value (e.g. back button)
  • Internal APIs that allow scheduling high priority tasks in JS (interrupt ongoing work, defer queued tasks, etc.)

For this proposal, I'd like to figure out what goes in the specs for the first two abstractions. Should they both have the same RuntimeExecutor function signature, with the understanding that one is sync and unsafe, and the other is async and safe? Or should we call them different things, for clarity? If the first one is always synchronous, should it still guarantee exclusive access via locking? Or should we rely on the second abstraction to provide exclusive access through the JS message queue thread (or whatever mechanism we want in the future)?

  1. JSEngine::RuntimeExecutor - creates the jsi::Runtime and a RuntimeExecutor that is always synchronous.
  2. ReactInstance::RuntimeExecutor - wraps the JSEngine::RuntimeExecutor with one that is always async*, and handles exceptions. This is what gets used internally in RN Core, Fabric, TurboModules, etc.

*Always async because it enqueues the callable on the JS message queue thread, even when it's called from that thread - I believe this is the current behavior.

@shergin

shergin commented Feb 27, 2020

Marc,
I think we are on the same page, and we share the same values of simplicity and iterative approach.

I do like your idea about the always-sync RuntimeExecutor by default as the simplest possible concept. That would work for me too. The reason why I advocate for unspecified behavior for RuntimeExecutor is to actually allow us to make it sync in the future. The problem is that currently, with the existing dispatching infrastructure (several of them, one per platform), it's impossible to implement an always-sync RuntimeExecutor efficiently, so the idea is to implement it somehow, and then build what we need on top of that. So, having a RuntimeExecutor with unspecified dispatching, we can have tools (functions like executeSynchronouslyOnSameThread(runtimeExecutor, callback)) that allow configuring the dispatching mode.

We do need to figure out exception propagation, indeed. I don't have concrete ideas about that besides "let's propagate them". (I think the question is: how exactly.)

Emily,
Yeah, we need to specify the customers for that API. To me, the audience for RuntimeExecutor is the internals of a few core systems (such as UI/Fabric and TM) and maybe some very tricky 3rd party modules that need such low-level access (e.g. a JS-facing native implementation of Promise).
I think the more problem-oriented APIs with custom dispatching logic will be specific to those few core systems. E.g. TM will probably have some queues & callbacks tied to specific threads associated with specific modules, Fabric already has pretty complex and very specific queueing for UI events, and so on.

@grabbou
Member

grabbou commented Mar 4, 2020

Thank you @ejanzer for answering my questions! The three points that you have mentioned are exactly the context I was looking for. I really appreciate this.

@jbrodriguez

Any news about this? Am I wrong in thinking that this is a stepping stone for proper Turbo Modules on RN? Even proper concurrent mode? It's pretty clear that external events wreak havoc on plans; I'm just kind of thinking out loud.

@ejanzer
Collaborator Author

ejanzer commented Jun 15, 2020

@jbrodriguez No updates at the moment, I'm still tinkering with this to find the right API for what we need. This proposal isn't blocking any work on Fabric or TurboModules, which both have their own way of accessing the runtime through the bridge. We want to make this change for bridgeless RN, but that'll come after Fabric + TM.

marcinwasowicz added a commit to CommE2E/comm that referenced this issue Sep 26, 2022
Summary: In https://phab.comm.dev/D4650 we enabled informative JSError to be thrown when SQLite query fails on databaseThread of CommCoreModule. I tested it only on emulators and it worked. However on physical devices I found out that I cannot access jsi::Runtime passed to function from auxiliary thread. Context is here react-native-community/discussions-and-proposals#196. This diff updates the code so that we only catch C++ error in database thread and transform it to JSError on the main thread. This change also solves the crash described here: https://linear.app/comm/issue/ENG-1714/ashoat-experienced-5-crashes-in-one-day-on-build-142

Test Plan: Place a temporary 'throw std::system_error(ECANCELED, std::generic_category(), "error");' at the beginning of SQLiteWueryExecutor::getAllMessages. Build and start the app. Without this diff it will crash. With this diff it will throw an informative JSError.

Reviewers: tomek, atul, jon

Reviewed By: tomek

Subscribers: ashoat, abosh

Differential Revision: https://phab.comm.dev/D4976
jakub-kedra-swm pushed a commit to CommE2E/comm that referenced this issue Sep 28, 2022
@mattiaferrari02

mattiaferrari02 commented Feb 8, 2024

Hi guys, just here to ask what's the current state of this. With JSI, do we have a way to use the runtime in a thread-safe manner?
In my case I have a WebSocket implementation in C++ that I'm using in my native module; this implementation exposes only a method to handle its notifications, passing a void * as its context. Is there a way to communicate with the JS runtime from there?

@maksimlya

Hi guys, just here to ask what's the current state of this. With JSI, do we have a way to use the runtime in a thread-safe manner? In my case I have a WebSocket implementation in C++ that I'm using in my native module; this implementation exposes only a method to handle its notifications, passing a void * as its context. Is there a way to communicate with the JS runtime from there?

Hi, did you find a way? I need something similar (receiving data from TCP sockets in C++ and passing it to JS asynchronously).

@mattiaferrari02

mattiaferrari02 commented Apr 25, 2024

@maksimlya hi, in the end I found a guy on YouTube who showed how to do that. Basically you have to implement your own thread pool to queue the async work, make use of the jsCallInvoker, and manually implement the Promise JS object on the C++ side.
Here's the link; he also has his source code on GitHub.

https://youtu.be/SC9PwcKw20o?si=gkt2K_OqrcMwUuZ4
I'm also tagging the guy, @ospfranco, if you want more explanation from him. His videos about React Native JSI are great, and they are basically the only source of documentation that I could find about JSI.
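For what it's worth, the core of that pattern — hopping from a C++ background thread to the JS thread before touching the runtime — looks roughly like this. Everything below uses stand-in types (`onSocketData`, the `CallInvoker` struct, and its `post` member are illustrative); only the `invokeAsync(std::function<void()>&&)` shape mirrors the real `facebook::react::CallInvoker`:

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Stand-ins so the pattern is self-contained; in a real module these are
// jsi::Runtime and facebook::react::CallInvoker, whose invokeAsync runs
// the given std::function<void()> on the JS thread.
struct Runtime {};
struct CallInvoker {
  std::function<void(std::function<void()>&&)> post;
  void invokeAsync(std::function<void()>&& work) { post(std::move(work)); }
};

// Hypothetical socket callback, invoked on a C++ networking thread. It
// must not touch the runtime directly; instead it schedules the JS-facing
// work through the call invoker.
void onSocketData(std::shared_ptr<CallInvoker> jsCallInvoker,
                  Runtime& runtime,
                  std::function<void(Runtime&, const std::string&)> jsCallback,
                  std::string payload) {
  jsCallInvoker->invokeAsync(
      [&runtime, jsCallback = std::move(jsCallback),
       payload = std::move(payload)]() {
        // Safe here: this lambda runs on the JS thread. In a real module
        // this is where you'd call a stored jsi::Function or resolve a
        // Promise you constructed on the C++ side.
        jsCallback(runtime, payload);
      });
}
```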

@maksimlya

@maksimlya hi, in the end I found a guy on YouTube who showed how to do that. Basically you have to implement your own thread pool to queue the async work, make use of the jsCallInvoker, and manually implement the Promise JS object on the C++ side. Here's the link; he also has his source code on GitHub.

https://youtu.be/SC9PwcKw20o?si=gkt2K_OqrcMwUuZ4 I'm also tagging the guy, @ospfranco, if you want more explanation from him. His videos about React Native JSI are great, and they are basically the only source of documentation that I could find about JSI.

Thanks a lot! I've already done everything regarding the thread pool and async work; I just couldn't figure out how to return the result to JS from a different thread. Will check it out.
