FlutterEngineGroup: Hot Reloads hang and Hot Restarts start isolates in different IsolateGroups #124546
Comments
If I remove the second isolate instantiation (the `createAndRunEngine` call), hot reload and hot restart behave as expected.
Adds more context and separates this issue from #119589
Thanks for the detailed report @dballance
@darshankawar - a few things here:

I have verified that all isolates are being restarted on a restart - just in the wrong `IsolateGroup`.

In our use case, we've been operating by passing a `SendPort` between the isolates, which depends on them sharing a group.

From my viewpoint - this is a regression. However, we did move to a newer Flutter version recently, so it may have been latent before.

This is the rub: I don't believe that this is a tenable workaround. Isolate communication for Isolates in the same `IsolateGroup` can pass (mostly) arbitrary objects; across groups it cannot.
Further - this all works appropriately at runtime. If I were to deploy this app, without utilizing the "serializing" workaround from the above comment, it would work as expected. I can stop / start the same engine repeatedly without issue.

It's only hot restarts / hot reloads that don't seem to handle the multi-engine setup correctly.
Thanks for the detailed feedback and update. Based on the report and above comment, I am going ahead and keeping this issue open for the team's thoughts.
/cc @aam @mkustermann
So there are a few independent issues here that I'd like to respond to.
When the Dart VM is asked to perform a hot-reload, it will hot-reload all isolates within a group. That means before and after the reload they remain in the same group and continue to be able to send (mostly) arbitrary objects to each other. If it hangs, we should investigate why and find a fix for it.
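A minimal sketch of that same-group property, using plain `dart:isolate` outside Flutter (the `Job` class is made up for illustration): isolates created with `Isolate.spawn` live in the spawner's `IsolateGroup`, so they can exchange arbitrary user-defined objects, not just JSON-like primitives.

```dart
import 'dart:isolate';

// A user-defined object (not a primitive) to demonstrate that isolates
// in the same IsolateGroup can exchange arbitrary objects.
class Job {
  final String name;
  final List<int> payload;
  Job(this.name, this.payload);
}

Future<void> main() async {
  final receivePort = ReceivePort();

  // Isolate.spawn creates the child in the *same* IsolateGroup as the
  // caller, so the Job instance below can be sent without any manual
  // serialization. Across groups, only simple messages would be allowed.
  await Isolate.spawn((SendPort out) {
    out.send(Job('resize', [1, 2, 3]));
  }, receivePort.sendPort);

  final job = await receivePort.first as Job;
  print('${job.name}: ${job.payload}'); // resize: [1, 2, 3]
  receivePort.close();
}
```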
My understanding of hot-restart is that it's supposed to discard the Dart application state. That means it will lose all state (including global/static variable state, ...). The engine will then start a fresh main isolate.

Now that would work for isolates the Dart application spawns itself. The situation is more difficult if the spawning of isolates is not triggered by the Dart application but by native code (e.g. Kotlin), as the VM has no way to bring those back after a restart.

For applications where it's possible, it's always preferred if the Dart application is the main driver of things.

If not possible, it may be beneficial if the native code (e.g. Kotlin) has an API it can use to get informed of hot-restarts and then perform some action (e.g. launching some helper isolates again). I'm not familiar enough with Flutter engine APIs to know if there's something like this. @gaaclarke may know?
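To illustrate the "Dart as the main driver" preference with a minimal sketch (plain `dart:isolate`, no Flutter APIs; `backgroundMain` is a hypothetical helper entrypoint): when `main()` itself spawns the helper isolate, a hot restart that re-runs `main()` naturally re-creates the helper in the same group.

```dart
import 'dart:isolate';

// Hypothetical background entrypoint: signals readiness back to main.
void backgroundMain(SendPort toMain) {
  // ... background work would go here ...
  toMain.send('ready');
}

Future<void> main() async {
  // Because main() spawns the helper itself, a hot restart - which
  // discards all Dart state and re-runs main() - automatically brings
  // the helper back, in the same IsolateGroup as the main isolate.
  final fromBackground = ReceivePort();
  await Isolate.spawn(backgroundMain, fromBackground.sendPort);
  print(await fromBackground.first); // prints "ready"
  fromBackground.close();
  // runApp(...) would follow here in a real Flutter app.
}
```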
It should be noted that this works fine with hot-reload. But hot restart will make old registered ports in `IsolateNameServer` stale. So for communication to work after hot-restart, all parties have to re-create ports and re-register them with `IsolateNameServer`.
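A sketch of that re-registration pattern, assuming Flutter's `IsolateNameServer` from `dart:ui` (the port name `background-channel` and the helper names are made up for illustration):

```dart
import 'dart:isolate';
import 'dart:ui'; // IsolateNameServer lives in dart:ui

// Hypothetical well-known name both sides agree on.
const portName = 'background-channel';

// Call this on every (re)start of the owning isolate - e.g. at the top
// of its entrypoint - so a stale registration left over from before a
// hot restart is replaced with a live port.
ReceivePort reRegisterPort() {
  final port = ReceivePort();
  // Drop any stale mapping first: registerPortWithName returns false
  // if the name is already taken.
  IsolateNameServer.removePortNameMapping(portName);
  IsolateNameServer.registerPortWithName(port.sendPort, portName);
  return port;
}

// The other side looks the port up again after each restart instead of
// caching a SendPort it obtained earlier.
SendPort? lookupBackgroundPort() =>
    IsolateNameServer.lookupPortByName(portName);
```

The key point is that neither side caches a `SendPort` across restarts: the owner re-registers on every entrypoint run, and the consumer re-looks it up before sending.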
The multiple_flutters/multiple_flutters_android sample cannot be launched via `flutter run` directly, which makes this harder to reproduce.

@dballance Would you mind creating a small git repo with easy to follow instructions that reproduces this? It would be quite helpful.
🙏🏻
Yes - in our production application we have the main isolate reach out via a platform channel to start the background engine. The service will not start another engine if one is already running. The other tidbit here is that we can't have the background isolate be started every time the app starts.

During dev we can probably force the issue though - since it's not the normal runtime environment. I will test this thought with the repro and provide results.
Yes, we have some novel code that handles this for us to set up the cross-isolate communication.
Absolutely - will try to get to it today and will provide the link here. Greatly appreciate the team's time and effort here!
Repro: https://github.com/dballance/flutter-issue-124546
Going to try real quick to just kill and restart the existing engine to see if that helps.
Well - the above hope that forcing the existing engine to be killed and restarted would help doesn't pan out. Running the repro as is produces the following: if I naively just tell the service to stop and recreate the engine

```kotlin
private fun startFlutterNativeView() {
    // if (flutterEngine != null) return
    if (flutterEngine != null) {
        stopFlutterNativeView()
    }
    ...
}
```

the app crashes.
logcat crash buffer isn't helpful:
I think this might have actually worked once or twice - but I can't be sure that it did AND that a new engine was actually created cleanly.
@dballance Thank you very much - I can reproduce the hot-reload issue.

Here's what seems to happen: When using flutter Kotlin/Java APIs to create engines, the scheduling of the engine's isolates is done by the flutter engine (in contrast to isolates spawned via `Isolate.spawn`, which the Dart VM schedules on its own thread pool).

Now the Dart VM hot-reload is currently implemented by sending all isolates in a group a message that they should participate in a hot-reload. When receiving such a message an isolate will tell the hot-reloading thread that it's ready and waits until reload is done. Now this works if all isolates can concurrently execute (which is the case for isolates spawned via e.g. `Isolate.spawn`) - but not if several engine isolates are multiplexed on the same OS thread, where the isolate waiting for the reload blocks the other from ever handling the message.

We've had plans to make this part of hot-reloading in the Dart VM more efficient and avoid relying on sending messages, which would then also make the issue go away. I'll try to implement that.

The fact that independent engine isolates are multiplexed on the same OS thread (if true - @gaaclarke may know) seems a bit concerning - as they cannot utilize multiple cores. Especially in this use case, since only one is a UI isolate and the other is a background isolate, it's hard to see why this restriction is there.
@mkustermann - Appreciate the update here. I'm heartened to hear that this isn't expected behavior and that the usecase is one that is supported.

I will note - although it may require a branch to rework back to the original setup - that there may be more information gleaned from running on 3.0.5. I'm guessing there is a plethora of engine changes in the diff between 3.0.5 and 3.7.x - but it might offer some more insight.

Happy to help if you need anything else!
The current hot-reload implementation [0] will perform a reload by first sending OOB messages to all isolates and waiting until those OOB messages are handled. The handler of the OOB message will block the thread (and unschedule the isolate) and notify the thread performing reload that it's ready. This requires that all isolates within a group can actually run & block. This is the case for the VM implementation of isolates (as they run on an unlimited-size thread pool). Though flutter seems to multiplex several engine isolates on the same OS thread. Reloading can then result in one engine isolate performing reload waiting for another to act on the OOB message (which it will not do, as it's multiplexed on the same thread as the former).

Now that we have a more flexible safepointing mechanism (introduced in [1]) we can utilize it for hot reloading by introducing a new "reloading" safepoint level.

Reload safepoints
-----------------

We introduce a new safepoint level (SafepointLevel::kGCAndDeoptAndReload). Being at a "reload safepoint" implies being at a "deopt safepoint", which implies being at a "gc safepoint". Code has to explicitly opt into making safepoint checks participate / check into "reload safepoints" using [ReloadParticipationScope]. We do that at certain well-defined places where reload is possible (e.g. event loop boundaries, descheduling of isolates, OOB message processing, ...). While running under [NoReloadScope] we disable checking into "reload safepoints".

Initiator of hot-reload
-----------------------

When a mutator initiates a reload operation (e.g. as part of a `ReloadSources` `vm-service` API call) it will use a [ReloadSafepointOperationScope] to get all other mutators to a safepoint. For mutators that aren't already at a "reload safepoint", we'll notify them via an OOB message (instead of scheduling kVMInterrupt). While waiting for all mutators to check into a "reload safepoint", the thread is itself at a safepoint (as other mutators may perform lower-level safepoint operations - e.g. GC, Deopt, ...). Once all mutators are at a "reload safepoint" the thread will take ownership of all safepoint levels.

Other mutators
--------------

Mutators can be at a "reload safepoint" already (e.g. isolate is not scheduled). If they try to exit the safepoint they will block until the reload operation is finished.

Mutators that are not at a "reload safepoint" (e.g. executing Dart or VM code) will be sent an OOB message indicating they should check into a "reload safepoint". We assume mutators make progress until they can process the OOB message.

Mutators may run under a [NoReloadScope] when handling the OOB message. In that case they will not check into the "reload safepoint" and simply ignore the message. To ensure the thread will eventually check in, the destructor [~NoReloadScope] will check & send itself a new OOB message indicating reload should happen, eventually getting the mutator to process the OOB message (which is a well-defined place where we can check into the reload safepoint).

Non-isolate mutators such as the background compiler do not react to OOB messages. This means that either those mutators have to be stopped (e.g. bg compiler) before initiating a reload safepoint operation, the threads have to explicitly opt into participating in reload safepoints, or the threads have to deschedule themselves eventually.

Misc
----

Owning a reload safepoint operation implies also owning the deopt & gc safepoint operations. Yet some code would like to ensure it actually runs under a [DeoptSafepointOperationScope]/[GCSafepointOperationScope]. => `Thread::OwnsGCSafepoint()` handles that.

While performing hot-reload we may exercise common code (e.g. kernel loader, ...) that acquires safepoint locks. Normally it's disallowed to acquire safepoint locks while holding a safepoint operation (since mutators may be stopped at places where they hold locks, creating deadlock scenarios). => We explicitly opt code into participating in reload safepointing requests; those well-defined places aren't holding safepoint locks. => `Thread::CanAcquireSafepointLocks()` will return `true` despite owning a reload operation. (But if one also holds a deopt/gc safepoint operation it will return `false`.)

Example where this matters: As part of hot-reload, we load kernel which may create new symbols. The symbol creation code may acquire the symbol lock and `InsertNewOrGet()` a symbol. This is safe, as other mutators don't hold the symbol lock at reload safepoints. The same cannot be said for Deopt/GC safepoint operations - as they can interrupt code at many more places where there's no guarantee that no locks are held.

[0] https://dart-review.googlesource.com/c/sdk/+/187461
[1] https://dart-review.googlesource.com/c/sdk/+/196927

Issue flutter/flutter#124546
TEST=Newly added Reload_* tests.
Change-Id: I6842d7d2b284d043cc047fd702b7c5c7dd1fa3c5
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/296183
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
It has been quite a ride, but I believe I've now landed everything on the main branch needed to fix the hot-reloading issue for multi-engine flutter scenarios.
Just now saw this comment - very excited @mkustermann! Hopefully other work will allow me to test this later this week!
@mkustermann - I've validated that hot reload no longer hangs in VSCode when using the tip of flutter repo (via a devcontainer).
🍻 Do you know if there are any plans to address the hot-restart "separate isolate groups" behavior as well? Would it be useful to open a second issue for hot restart, referencing this issue, and close this one out?
Glad it fixes your issue, @dballance !
This part is unrelated to the Dart VM. There are a few flutter-specific parts that one may want to look into as part of this:
@zanderso Could you find someone to look into those things?
Sorry, I'm not quite following what the "separate isolate groups" problem is, but if you can gather the necessary details into a new issue, I'll make sure the team discusses it.
This issue has all the context already. The easiest way to reproduce is to go to an older version of flutter where the observatory URL is still surfaced (e.g. b0d04ea) - as DevTools doesn't seem to show isolate group / isolate related information. Then follow the instructions here #124546 (comment) to run the example. Look at observatory to see there's one isolate group with two isolates inside. Perform a hot-restart and now there are two isolate groups, each with one isolate inside. See more detailed pictures of this behavior in #124546 (comment).

=> Performing a hot-restart should start both isolates in the same `IsolateGroup`, as happens on a cold start.
Just a note here - I finally got around to implementing the iOS side. Overall, it's more or less the same as the Android side in terms of implementation.
The behavior is the same as observed on Android: hot restarts end up with separate isolate groups, and hot reloads are fixed by the work above.

With this - I got our app upgraded to 3.10.2 and happily running the multi-engine setup.

Happy to update the sample with the iOS implementation if it will help. Would love any updates on when hot restarts may be handled as well. Appreciate the effort of the team! 🍻
Adjusting labels for new triage process.
Looks like most of the work here is in the engine and not the Dart VM, so removing dart:dependency label. |
I think this has to do with the call that launches the root isolate when the engine restarts. We probably need some way to check if an engine was spawned from a group and handle that case differently. @gaaclarke might know better.
Specifically, the code here: https://github.com/flutter/engine/blob/1df6e462d333c37a11c6b46f20ae2480e75f85b3/shell/common/engine.cc#L206 seems wrong if the engine was spawned.
Problem

When using a `FlutterEngineGroup` to provide multiple engines, both hot reload and hot restart appear broken.

In my case, I use a `FlutterEngineGroup` to provide an engine much like the multiple_flutters example. I create one engine and provide the result to `provideFlutterEngine` in my main `FlutterActivity`. This utilizes the `createAndRunDefaultEngine` method of `FlutterEngineGroup` to run the main engine. A secondary engine is created via `createAndRunEngine` to another entrypoint in the same bundle of code.

Everything works as expected when initially starting, but hot reload hangs, and hot restart starts the isolates in a different `IsolateGroup` - which impacts cross-Isolate communication.

Steps to Reproduce

Steps to reproduce the behavior:

1. Adapt `flutter/samples/multiple_flutters`. In my instance, I add a `FlutterEngineGroup` via `App.kt` and then provide two flutter engines: one in my `FlutterActivity` via `provideFlutterEngine` and another in a foreground service (effectively running the isolate "headless").
2. Run the app via `flutter run`.
3. Note the single `IsolateGroup` in the VM Observatory.
4. Trigger a hot restart (`ctrl-shift-r` if running via `flutter run`).
5. Note the two `IsolateGroup`s in the VM Observatory (the once-grouped Isolates are now running in separate groups).

Optionally - to trigger a failure in Hot Reload - simply change any file and trigger a reload. The reload will hang.

Expected results: Hot Reload works, and Hot Restarts start isolates in the same `IsolateGroup`.

Actual results: Hot Reload hangs. Hot Restart works, but starts the isolates in different `IsolateGroup`s, which is different than non-debug runtime behavior.

Code sample

Adapted from the `flutter/samples/multiple_flutters` sample.

main.dart
App.kt
MainActivity.kt
ForegroundService.kt

Logs