Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FlutterEngineGroup: Hot Reloads hang and Hot Restarts start isolates in different IsolateGroups #124546

Open
dballance opened this issue Apr 10, 2023 · 23 comments
Labels
P2 Important issues not at the top of the work list t: hot reload Reloading code during "flutter run" team-engine Owned by Engine team triaged-engine Triaged by Engine team

Comments

@dballance
Copy link

dballance commented Apr 10, 2023

Problem

When using a FlutterEngineGroup to provide multiple engines, both hot reload and hot restart appear broken.

In my case, I use a FlutterEngineGroup to provide an engine much like the multiple_flutters example.

I create one engine and provide the result to provideFlutterEngine in my main FlutterActivity. This utilizes createAndRunDefaultEngine method of FlutterEngineGroup to run the main engine.

A secondary engine is created via createAndRunEngine to another entrypoint in the same bundle of code.

Everything works as expected when initially starting, but:

  • Saving a file and hot reloading causes hot reloading to hang
  • Running hot restart does work, but creates the isolates in two different IsolateGroup - which impacts cross Isolate communication.

Steps to Reproduce

Steps to reproduce the behavior:

  1. Setup two flutter engines much like the flutter/samples/multiple_flutters. In my instance, I add a FlutterEngineGroup via App.kt and then provide two flutter engines. One in my FlutterActivity via provideFlutterEngine and another in a foreground service (effectively running the isolate "headless"
  2. Run the app via vscode OR flutter run
  3. Observe that there is a single IsolateGroup in the VM Observatory
  4. Click "restart" in vscode or ctrl-shift-r if running via flutter run
  5. Observe that there are now two IsolateGroups in the VM Observatory (the once grouped Isolates are now running in separate groups)

Optionally - to trigger a failure in Hot Reload - simply change any file and trigger a reload. The reload will hang.

Expected results: Hot Reload works, and Hot Restarts start isolates in the same IsolateGroup

Actual results: Hot Reload hangs. Hot Restart works, but starts the isolates in different IsolateGroup, which is different than non-debug runtime behavior.

Code sample Adapted from the `flutter/samples/mutliple_flutters` sample.

main.dart

@pragma('vm:entry-point')
void main() {
   print('Main running');
   runApp(const MyApp(color: Colors.green));
}

@pragma('vm:entry-point')
void serviceMain() {
   print('Service running'); 
}

App.kt

class App : Application() {
    lateinit var engines: FlutterEngineGroup

    override fun onCreate() {
        super.onCreate()
        engines = FlutterEngineGroup(this)
    }
}

MainActivity.kt

class MainActivity: FlutterActivity() {

  override fun provideFlutterEngine(context: Context): FlutterEngine? {
    return (applicationContext as App).engines.createAndRunDefaultEngine(context);
  }


  override fun configureFlutterEngine(flutterEngine: FlutterEngine) {
    GeneratedPluginRegister.registerGeneratedPlugins(flutterEngine)
    
    // Start foreground service and isolate
    ForegroundService.startService(this@MainActivity)

  }
}

ForegroundService.kt

/// Somewhere in the Foreground Service
val appBundlePath =
            FlutterInjector.instance().flutterLoader().findAppBundlePath()

val entryPoint = DartExecutor.DartEntrypoint(appBundlePath, "package:multiple_flutter_module/main.dart", "serviceMain")

flutterEngine = (applicationContext as App).engines.createAndRunEngine(this, entryPoint)
In the below logs - I reloaded, it succeeded. Then I reloaded with a change - and `Reassembling application` is never logged. I then terminated the app (as the UI was non-responsive). Logs
[   +2 ms] Flutter run key commands.
[   +1 ms] r Hot reload. 🔥🔥🔥
[        ] R Hot restart.
[        ] h List all available interactive commands.
[        ] d Detach (terminate "flutter run" but leave application running).
[        ] c Clear the screen
[        ] q Quit (terminate the application on the device).
[        ] 💪 Running with sound null safety 💪
[   +1 ms] An Observatory debugger and profiler on Pixel 5 is available at: http://127.0.0.1:34907/UaJa0wTOFA0=/
[  +61 ms] The Flutter DevTools debugger and profiler on Pixel 5 is available at: http://127.0.0.1:9102?uri=http://127.0.0.1:34907/UaJa0wTOFA0=/
[+4348 ms] Skipping target: gen_localizations
[        ] Skipping target: gen_dart_plugin_registrant
[        ] Skipping target: _composite
[        ] complete
[        ] Performing hot reload...
[  +17 ms] Scanned through 592 files in 4ms
[        ] Syncing files to device Pixel 5...
[        ] Compiling dart to kernel with 0 updated files
[        ] Processing bundle.
[        ] <- recompile package:tip_test_isolates/main.dart 44d0762b-9ab5-4789-8c92-3ad83e05dbfa
[        ] <- 44d0762b-9ab5-4789-8c92-3ad83e05dbfa
[        ] Bundle processing done.
[  +18 ms] Updating files.
[ +128 ms] DevFS: Sync finished
[        ] Syncing files to device Pixel 5... (completed in 149ms)
[        ] Synced 0.0MB.
[  +61 ms] Reassembling application
[ +156 ms] Hot reload performed in 372ms.
[   +1 ms] Performing hot reload... (completed in 387ms)
[   +2 ms] Reloaded 0 libraries in 392ms (compile: 19 ms, reload: 0 ms, reassemble: 214 ms).
[+8372 ms] Skipping target: gen_localizations
[        ] Skipping target: gen_dart_plugin_registrant
[        ] Skipping target: _composite
[        ] complete
[        ] Performing hot reload...
[  +13 ms] Scanned through 592 files in 4ms
[        ] Syncing files to device Pixel 5...
[   +2 ms] Compiling dart to kernel with 1 updated files
[        ] Processing bundle.
[        ] <- recompile package:tip_test_isolates/main.dart b2190e91-93d1-4657-8306-e09af61ea038
[        ] package:tip_test_isolates/foo.dart
[        ] <- b2190e91-93d1-4657-8306-e09af61ea038
[        ] Bundle processing done.
[  +70 ms] Updating files.
[  +20 ms] DevFS: Sync finished
[        ] Syncing files to device Pixel 5... (completed in 94ms)
[        ] Synced 0.0MB.
[+4405 ms] Application finished.
No issues found! (ran in 1.0s)
[✓] Flutter (Channel stable, 3.7.10, on Linux Mint 20.3 5.4.0-146-generic, locale en_US.UTF-8)
    • Flutter version 3.7.10 on channel stable at /opt/flutter
    • Upstream repository https://github.com/flutter/flutter.git
    • Framework revision 4b12645012 (7 days ago), 2023-04-03 17:46:48 -0700
    • Engine revision ec975089ac
    • Dart version 2.19.6
    • DevTools version 2.20.1

[✓] Android toolchain - develop for Android devices (Android SDK version 30.0.3)
    • Android SDK at /home/dballance/Android/Sdk
    • Platform android-33, build-tools 30.0.3
    • ANDROID_SDK_ROOT = /home/dballance/Android/Sdk
    • Java binary at: /opt/android-studio/jbr/bin/java
    • Java version OpenJDK Runtime Environment (build 11.0.15+0-b2043.56-8887301)
    • All Android licenses accepted.

[✓] Chrome - develop for the web
    • CHROME_EXECUTABLE = /opt/brave.com/brave/brave-browser

[✗] Linux toolchain - develop for Linux desktop
    ✗ clang++ is required for Linux development.
      It is likely available from your distribution (e.g.: apt install clang), or can be downloaded from https://releases.llvm.org/
    • cmake version 3.16.3
    • ninja version 1.10.0
    • pkg-config version 0.29.1

[✓] Android Studio (version 2022.1)
    • Android Studio at /opt/android-studio
    • Flutter plugin version 73.0.1
    • Dart plugin version 221.6103.1
    • Java version OpenJDK Runtime Environment (build 11.0.15+0-b2043.56-8887301)

[✓] VS Code (version 1.77.1)
    • VS Code at /usr/share/code
    • Flutter extension version 3.62.0

[✓] Connected device (3 available)
    • Pixel 5 (mobile) • 13291FDD4000UB • android-arm64  • Android 13 (API 33)
    • Linux (desktop)  • linux          • linux-x64      • Linux Mint 20.3 5.4.0-146-generic
    • Chrome (web)     • chrome         • web-javascript • Brave Browser 112.1.50.114

[✓] HTTP Host Availability
    • All required HTTP hosts are available

! Doctor found issues in 1 category.
@dballance dballance changed the title FlutterEngineGroup: Hot Reloads hang and Hot Restarts start isolates in different IsolateGroups FlutterEngineGroup: Hot Reloads hang and Hot Restarts start isolates in different IsolateGroups Apr 10, 2023
@dballance
Copy link
Author

dballance commented Apr 10, 2023

If I remove the second isolate instantiation (the createAndRunEngine instantiated entrypoint) - reload works appropriately.

[        ] Performing hot reload...
[  +37 ms] Scanned through 592 files in 23ms
[   +1 ms] Syncing files to device Pixel 5...
[   +5 ms] Compiling dart to kernel with 1 updated files
[        ] Processing bundle.
[        ] <- recompile package:tip_test_isolates/main.dart 6ece343c-3c3a-44ff-9187-30ce6304037a
[        ] package:tip_test_isolates/foo.dart
[        ] <- 6ece343c-3c3a-44ff-9187-30ce6304037a
[        ] Bundle processing done.
[  +34 ms] Updating files.
[  +25 ms] DevFS: Sync finished
[        ] Syncing files to device Pixel 5... (completed in 67ms)
[        ] Synced 0.0MB.
[ +164 ms] <- accept
[   +2 ms] reloaded 2 of 825 libraries
[  +37 ms] Reassembling application
[ +114 ms] Hot reload performed in 411ms.
[   +1 ms] Performing hot reload... (completed in 427ms)
[        ] Reloaded 2 of 825 libraries in 435ms (compile: 36 ms, reload: 167 ms, reassemble: 150 ms).

@dballance
Copy link
Author

Adds more context and separates the issue from: #119589

@darshankawar darshankawar added the in triage Presently being triaged by the triage team label Apr 11, 2023
@darshankawar
Copy link
Member

Thanks for the detailed report @dballance
Please check this related / similar issue and underlying comments which might help for further reference:

#114005 (comment)
#114005 (comment)
#114005 (comment)

@darshankawar darshankawar added the waiting for customer response The Flutter team cannot make further progress on this issue until the original reporter responds label Apr 11, 2023
@dballance
Copy link
Author

@darshankawar - a few things here:

#114005 (comment)

I have verified that all isolates are being restarted on a restart - just in the wrong IsolateGroup. Also, hot reload doesn't complete. So, unable to verify anything assumed in this comment - simply because the reload doesn't work.

#114005 (comment)

In our usecase, we've been operating by passing a SendPort pair via IsolateNameServer for ~3 years now. One isolate acts as a "server" and advertises a "connect" SendPort used to initiate a connection. But then both Isolates place a "connection specific" SendPort into the IsolateNameServer to create an IsolateChannel to allow a stream of events between both isolates. This works fine in 3.0.5, and is broken (in with hot restarts and hot reloads) in development ONLY in 3.7.x+. If I start the app in a release build - everything works appropriately. So the issue lies in flutter_tools and / or dart-lang/sdk and how it hot reloads and hot restarts isolates from the same FlutterEngineGroup.

From my viewpoint - this is a regression.

However, we did move to FlutterEngineGroup with 3.7.x since the previously working method of simply starting another FlutterEngine no longer starts Isolates in the same IsolateGroup.

#114005 (comment)

This is the rub. I don't believe that this is a tenable workaround. Isolate communication for Isolates in the same IsolateGroup has clearly defined constraints and benefits around memory. See: https://api.dart.dev/stable/2.19.6/dart-isolate/SendPort/send.html

If the sender and receiver isolate share the same code (e.g. isolates created via Isolate.spawn), the transitive object graph of message can contain any object, with the following exceptions:

Further - this all works appropriately at runtime. If I was to deploy this app, without utilizing the "serializing" workaround from the above comment, it will work as expected. I can stop / start the same MainActivity hosted Isolate over and over and it will correctly be started again in the same IsolateGroup.

It's only hot restarts / hot reloads that don't seem to handle FlutterEngineGroup semantics and start the Isolates again in the same IsolateGroup.

@github-actions github-actions bot removed the waiting for customer response The Flutter team cannot make further progress on this issue until the original reporter responds label Apr 11, 2023
@darshankawar
Copy link
Member

Thanks for the detailed feedback and update. Based on the report and above comment, I am going ahead and keep this issue open for team's thoughts.

@darshankawar darshankawar added engine flutter/engine repository. See also e: labels. dependency: dart Dart team may need to help us t: hot reload Reloading code during "flutter run" platform-android Android applications specifically and removed in triage Presently being triaged by the triage team labels Apr 12, 2023
@mraleph
Copy link
Member

mraleph commented Apr 12, 2023

/cc @aam @mkustermann

@mkustermann
Copy link
Member

So there's a few independent issues here, that I'd like to respond to.

Hot Reload hangs.

When the Dart VM is asked to perform a hot-reload, it will hot-reload all isolates within a group. That means before and after the reload they remain in same group and continue to be able to send (mostly) arbitrary objects to each other.

If it hangs, we should investigate why and find a fix for it.

Hot Restart works, but starts the isolates in different IsolateGroup, which is different than non-debug runtime behavior.

My understanding of hot-restart is that it's supposed to discarding the dart application state. That means it will loose all state (including global/static variable state, ...). The engine will then start main() again (based on the newest source code). As part of this it would make sense that all isolates of an isolate group are killed - as they are also application state.

Now that would work for the main() function and isolates it spawns. It would start again and possibly start any isolates it needs.

The situation is more difficult if the spawning of isolates is not triggered by the Dart application (something triggered by main()) - but rather by native code (e.g. Kotlin) - as that is not restarted with hot-restarts. So it wouldn't know that a hot-restart has killed old isolates and it may need to perform a (re)start of any isolates it needs.

For applications where it's possible, it's always preferred if the Dart application is the main driver of things. So main() may start isolates itself, it may even call out to e.g. Kotlin which then may start isolates. Using this approach, ensures that if a hot-restart will invoke main() again, everything else will also be setup correctly.

If not possible it may be beneficial if the native code (e.g. Kotlin) has an API it can use to get informed of hot-restarts and then to perform some action (e.g. launching some helper isolates again). Not familiar with Flutter engine APIs to know if there's something like this. @gaaclarke may know?

In our usecase, we've been operating by passing a SendPort pair via IsolateNameServer for ~3 years now. One isolate acts as a "server" and advertises a "connect" SendPort used to initiate a connection. But then both Isolates place a "connection specific" SendPort into the IsolateNameServer to create an IsolateChannel to allow a stream of events between both isolates.

It should be noted that this works fine with hot-reload. But hot restart will make old registered ports in IsolateNameServer not work anymore, since hot-restart will clear out any application state, including any open ports, and re-launching the application code, which will make new send ports that aren't in the IsolateNameServer.

So for communication to work after hot-restart, all parties have to re-create ports and re-register them with IsolateNameServer.

Steps to reproduce the behavior:

Setup two flutter engines much like the flutter/samples/multiple_flutters.
...
Run the app via vscode OR flutter run

The multiple_flutters/multiple_flutters_android cannot be launched via flutter run - it seems it has to be launched via e.g. Android Studio.

@dballance Would you mind creating a small git repo with easy to follow instructions that reproduces this? It would be quite helpful.

@dballance
Copy link
Author

If it hangs, we should investigate why and find a fix for it.

🙏🏻

My understanding of hot-restart is that it's supposed to discarding the dart application state. That means it will loose all state (including global/static variable state, ...). The engine will then start main() again (based on the newest source code). As part of this it would make sense that all isolates of an isolate group are killed - as they are also application state.

Now that would work for the main() function and isolates it spawns. It would start again and possibly start any isolates it needs.

The situation is more difficult if the spawning of isolates is not triggered by the Dart application (something triggered by main()) - but rather by native code (e.g. Kotlin) - as that is not restarted with hot-restarts. So it wouldn't know that a hot-restart has killed old isolates and it may need to perform a (re)start of any isolates it needs.

For applications where it's possible, it's always preferred if the Dart application is the main driver of things. So main() may start isolates itself, it may even call out to e.g. Kotlin which then may start isolates. Using this approach, ensures that if a hot-restart will invoke main() again, everything else will also be setup correctly.

If not possible it may be beneficial if the native code (e.g. Kotlin) has an API it can use to get informed of hot-restarts and then to perform some action (e.g. launching some helper isolates again). Not familiar with Flutter engine APIs to know if there's something like this. @gaaclarke may know?

Yes - In our production application we have the main isolate reach out via a MethodChannel to start the background isolate that "lives" within an Android ForegroundService. We used to do this via passing a Dart Callback id from Dart > Kotlin over the method channel. However, in our effort to move from 3.0.5 to 3.7.x (small team - had big deliverables - so just getting to it now), I think we can simplify to the newer FlutterEngineGroup entrypoint config. Would look like this - val entryPoint = DartExecutor.DartEntrypoint(appBundlePath, "package:multiple_flutter_module/main.dart", "serviceMain") - and I think would remove the need to utilize the dart callback being passed.

The service will not start another FlutterEngine if it already has a running one - so that could be preventing the isolate from being restarted. This worked previously but may not work now when utilizing a FlutterEngineGroup to build / start the FlutterEngine. I had a thought while typing this that the callback changing (dart recompiled, so a new ref) might have spun up the FlutterEngine again - but the above guard discarded this as an option in my mind.

The other tidbit here is that we can't have the background isolate be started every time MainActivity starts (which launches the "main" isolate) - because the background isolate may be in the middle of important processing that shouldn't be canceled / restarted. For context - the app here is a "Push-to-Talk" app that acts like a radio for First Responders. A typical flow is hearing audio while backgrounded (background isolate is connected and processing an ongoing Push-to-Talk call), and then opening the UI to press a "Push to Talk" button to talk back. If we restarted the background isolate again when the UI was opened - it would interrupt the ongoing call handling.

During dev we can probably force the issue though - since it's not the normal runtime environment. I will test this thought with the repro and provide results.

It should be noted that this works fine with hot-reload. But hot restart will make old registered ports in IsolateNameServer not work anymore, since hot-restart will clear out any application state, including any open ports, and re-launching the application code, which will make new send ports that aren't in the IsolateNameServer.

So for communication to work after hot-restart, all parties have to re-create ports and re-register them with IsolateNameServer.

Yes, we have some novel code that handles this for us to setup the IsolateChannel. I'm unsure that I want to include it in the repro below simply because I don't want it to muddy the waters for y'all - but will be happy to share it. The gist is that we do clean up the IsolateNameServer when launching the background isolate - and the connection over the IsolateChannel uses a new ReceivePort / SendPort pair if the connection is ever interrupted - such as the "foreground isolate" stopping because the user closed the MainActivity.

@dballance Would you mind creating a small git repo with easy to follow instructions that reproduces this? It would be quite helpful.

Absolutely - will try to get to it today and will provide the link here.

Greatly appreciate the team's time and effort here!

@dballance
Copy link
Author

dballance commented Apr 13, 2023

Repro: https://github.com/dballance/flutter-issue-124546

  • Tapping to count sends the count, wrapped in an arbitrary object, to the background isolate.
  • An "ack" is sent back to the main isolate when the background has processed the value - showing bidirectional comms.
  • You can "terminate" the UI by swiping up from the running apps tray, and the background will continue to tick and report it's current value every 5 secs. This will continue to work (open and terminate as many times as you want - everything keeps running) until you either hot reload or hot restart.
  • Hot reload will hang if you uncomment the log statement in main, save, and trigger a reload (r if using flutter run)
  • Hot restart will succeed, but the Isolates are started in different IsolateGroups and we get the below:
E/flutter (17884): [ERROR:flutter/runtime/dart_vm_initializer.cc(41)] Unhandled Exception: Invalid argument: is a regular instance: Instance of 'Value'
E/flutter (17884): #0      _SendPort._sendInternal (dart:isolate-patch/isolate_patch.dart:249:43)
E/flutter (17884): #1      _SendPort.send (dart:isolate-patch/isolate_patch.dart:230:5)
E/flutter (17884): #2      main (package:demo/main.dart:40:12)
E/flutter (17884): <asynchronous suspension>

Going to try real quick to just kill and restart the ForegroundService hosted FlutterEngine on restarts to see if that changes the isolate spinup.

@dballance
Copy link
Author

dballance commented Apr 13, 2023

Well - the above hope that forcing the existing FlutterEngine to shut down on restart in debug isn't looking promising. Took some screenshots of the Observatory when running normally and then the logs / crash when changing to close the existing FlutterEngine and open another.

Running the repro as is produces the following:

  • On Boot
    Screenshot from 2023-04-12 23-47-43

  • After Hot Restart
    Screenshot from 2023-04-12 23-48-15

If I naively just tell the ForegroundService flutter engine to stop, on hot restart it hangs and eventually crashes.

 private fun startFlutterNativeView() {
    // if (flutterEngine != null) return
    if (flutterEngine != null) {
        stopFlutterNativeView()
    }
  ...
 }
[ +429 ms] I/flutter (23064): INFO · 2023-04-13 00:09:30.400476 · Main · Started
[        ] I/flutter (23064): INFO · 2023-04-13 00:09:30.408270 · Main · Starting background isolate
[  +11 ms] I/MainActivity(23064): Starting service
[        ] I/ForegroundService(23064): Starting foreground service.
[        ] I/ForegroundService(23064): Stopping foreground service isolate.
[+20383 ms] I/om.example.demo(23064): Thread[6,tid=23070,WaitingInMainSignalCatcherLoop,Thread*=0xb400007b93d89a60,peer=0x135006c8,"Signal Catcher"]: reacting to signal 3
[        ] I/om.example.demo(23064): 
[ +207 ms] I/om.example.demo(23064): Wrote stack traces to tombstoned
[+1947 ms] F/crash_dump64(23130): crash_dump.cpp:485] failed to attach to thread 626: Permission denied
[  +23 ms] F/crash_dump64(23133): crash_dump.cpp:485] failed to attach to thread 640: Permission denied
[+3705 ms] Service protocol connection closed.
[        ] Lost connection to device.
[   +1 ms] DevFS: Deleting filesystem on the device (file:///data/user/0/com.example.demo/code_cache/demoYAAOCD/demo/)
[ +253 ms] Ignored error while cleaning up DevFS: TimeoutException after 0:00:00.250000: Future not completed
[   +7 ms] "flutter run" took 34,391ms.
[  +71 ms] ensureAnalyticsSent: 63ms
[        ] Running 1 shutdown hook
[   +8 ms] Shutdown hooks complete
[        ] exiting with code 0

logcat crash buffer isn't helpful:

04-13 00:09:52.985   640   640 F libc    : crash_dump helper failed to exec, or was killed

I think this might have actually worked once or twice - but I can't be sure that it did AND that a new IsolateGroup was established properly with both Isolates.

@mkustermann
Copy link
Member

Repro: https://github.com/dballance/flutter-issue-124546

@dballance Thank you very much - I can reproduce the hot-reload issue.

Here's what seems to happen: When using flutter Kotlin/Java APIs to create engines, the scheduling of the engine's isolates is done by the flutter engine (in contrast to isolates spawned via Isolate.spawn() API - which are managed by the Dart VM). Now it appears that the flutter engine implementation will not allow those engine isolates to execute in parallel: they seem to be multiplexed on the same OS thread (i.e. only one isolate can run at a given time).

Now the Dart VM hot-reload is currently implemented by sending all isolates in a group a message that they should participate in a hot-reload. When receiving such a message an isolate will tell the hot-reloading thread that it's ready and waits until reload is done. Now this works if all isolates can concurrently execute (which is the case for isolates spawned via e.g. Isolate.spawn(), but doesn't seem to be the case for engine isolates)

We've had plans to make this part of hot-reloading in the Dart VM more efficient and avoid relying on sending messages, which would then also make the issue go away. I'll try to implement that.

The fact that independent engine isolates are multiplexed on the same OS thread (if true - @gaaclarke may know) seems a bit concerning - as they cannot utilize multiple cores. Especially in this use case, since only one is an UI isolate and the other is a background isolate, it's hard to see why this restriction is there.

@dballance
Copy link
Author

dballance commented Apr 14, 2023

@mkustermann - Appreciate the update here. I'm heartened to hear that this isn't expected behavior and that the usecase is one that is supported.

I will note - although it may require a branch to rework back to the original setup - that there may be more information gleamed from running on 3.0.5. I'm guessing there is a plethora of engine changes in the diff between 3.0.5 and 3.7.x - but might have some more insight.

Happy to help if you need anything else!

@mkustermann mkustermann self-assigned this Apr 17, 2023
@chinmaygarde chinmaygarde added the P1 High-priority issues at the top of the work list label Apr 17, 2023
copybara-service bot pushed a commit to dart-lang/sdk that referenced this issue Apr 21, 2023
The current hot-reload implementation [0] will perform a reload by
first sending OOB messages to all isolates and waiting until those OOB
messages are being handled. The handler of the OOB message will block
the thread (and unschedule isolate) and notify the thread performing
reload it's ready.

This requires that all isolates within a group can actually run & block.
This is the case for the VM implementation of isolates (as they are
run an unlimited size thread pool).

Though flutter seems to multiplex several engine isolates on the same OS
thread. Reloading can then result in one engine isolate performing
reload waiting for another to act on the OOB message (which it will not
do as it's multiplexed on the same thread as the former).

Now that we have a more flexible safepointing mechanism (introduced in
[1]) we can utilize for hot reloading by introducing a new "reloading"
safepoint level.

Reload safepoints
-----------------------

We introduce a new safepoint level (SafepointLevel::kGCAndDeoptAndReload).

Being at a "reload safepoint" implies being at a "deopt safepoint"
which implies being at a "gc safepoint".

Code has to explicitly opt-into making safepoint checks participate /
check into "reload safepoints" using [ReloadParticipationScope]. We do
that at certain well-defined places where reload is possible (e.g. event
loop boundaries, descheduling of isolates, OOM message processing, ...).

While running under [NoReloadScope] we disable checking into "reload
safepoints".

Initiator of hot-reload
-----------------------

When a mutator initiates a reload operation (e.g. as part of a
`ReloadSources` `vm-service` API call) it will use a
[ReloadSafepointOperationScope] to get all other mutators to a
safepoint.

For mutators that aren't already at a "reload safepoint", we'll
notify them via an OOB message (instead of scheduling kVMInterrupt).

While waiting for all mutators to check into a "reload safepoint", the
thread is itself at a safepoint (as other mutators may perform lower
level safepoint operations - e.g. GC, Deopt, ...)

Once all mutators are at a "reload safepoint" the thread will take
ownership of all safepoint levels.

Other mutators
-----------------------

Mutators can be at a "reload safepoint" already (e.g. isolate is not
scheduled). If they try to exit safepoint they will block until the
reload operation is finished.

Mutators that are not at a "reload safepoint" (e.g. executing Dart or VM
code) will be sent an OOB message indicating it should check into a
"reload safepoint". We assume mutators make progress until they can
process OOB message.

Mutators may run under a [NoReloadScope] when handling the OOM message.
In that case they will not check into the "reload safepoint" and simply
ignore the message. To ensure the thread will eventually check-in,
we'll make the destructor of [~NoReloadScope] check & send itself a new OOB
message indicating reload should happen. Eventually getting the mutator
to process the OOM message (which is a well-defined place where we can
check into the reload safepoint).

Non-isolate mutators such as the background compiler do not react to OOB
messages. This means that either those mutators have to be stopped (e.g.
bg compiler) before initiating a reload safepoint operation, the
threads have to explicitly opt-into participating in reload safepoints
or the threads have to deschedule themselves eventually.

Misc
----

Owning a reload safepoint operation implies also owning the deopt &
gc safepoint operation. Yet some code would like to ensure it actually
runs under a [DeoptSafepointOperatoinScope]/[GCSafepointOperationScope].
=> The `Thread::OwnsGCSafepoint()` handles that.

While performing hot-reload we may exercise common code (e.g. kernel
loader, ...) that acquires safepoint locks. Normally it's disallows to
acquire safepoint locks while holding a safepoint operation (since
mutators may be stopped at places where they hold locks, creating
deadlock scenarios).
=> We explicitly opt code into participating in reload safepointing
requests. Those well-defined places aren't holding safepoint locks.
=> The `Thread::CanAcquireSafepointLocks()` will return `true` despite
owning a reload operation. (But if one also holds deopt/gc safepoint
operation it will return false)

Example where this matters: As part of hot-reload, we load kernel which
may create new symbols. The symbol creation code may acquire the symbol
lock and `InsertNewOrGet()` a symbol. This is safe as other mutators
don't hold the symbol lock at reload safepoints. The same cannot be said
for Deopt/GC safepoint operations - as they can interrupt code at many
more places where there's no guarantee that no locks are held.

[0] https://dart-review.googlesource.com/c/sdk/+/187461
[1] https://dart-review.googlesource.com/c/sdk/+/196927

Issue flutter/flutter#124546

TEST=Newly added Reload_* tests.

Change-Id: I6842d7d2b284d043cc047fd702b7c5c7dd1fa3c5
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/296183
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
@mkustermann
Copy link
Member

It has been quite a ride but I believe I've landed now everything on the main branch needed to fix the hot-reloading issue for multi-engine flutter scenarios:

@dballance
Copy link
Author

Just now saw this comment - very exited @mkustermann! Hopefully other work will allow me to test this later this week!

@dballance
Copy link
Author

@mkustermann - I've validated that hot reload no longer hangs in VSCode when using the tip of flutter repo (via a devcontainer).

[✓] Flutter (Channel master, 3.11.0-1.0.pre.43, on Ubuntu 22.04.2 LTS 5.4.0-147-generic, locale C.UTF-8)
    • Flutter version 3.11.0-1.0.pre.43 on channel master at /home/vscode/flutter
    • Upstream repository https://github.com/flutter/flutter.git
    • Framework revision 15386d9d17 (67 minutes ago), 2023-05-03 21:35:35 -0400
    • Engine revision ee65be7d8f
    • Dart version 3.1.0 (build 3.1.0-70.0.dev)
    • DevTools version 2.23.1

🍻

Do you know if there are any plans to address hot restart "separate isolate groups" as well? Would it be useful to open a second issue for hot restart, referencing this issue, and close this one out?

@mkustermann
Copy link
Member

I've validated that hot reload no longer hangs in VSCode when using the tip of flutter repo (via a devcontainer).

Glad it fixes your issue, @dballance !

Do you know if there are any plans to address hot restart "separate isolate groups" as well? Would it be useful to open a second issue for hot restart, referencing this issue, and close this one out?

This part is unrelated to the Dart VM. There's a few flutter specific parts that one may want to look into as part of this:

  • Multiple flutter engine isolates seem to be multiplexed on the same thread and can therefore not utilize multi-core HW (even if only one of them is a UI isolate - as it seems to be the case here)
  • The Java/Kotlin/ObjC/Swift embedding APIs of Flutter Engine may need to expose APIs in development mode that allow native code to react to hot restart events (as native code may need to re-initialize various state on hot-restarts)
  • Restarting a flutter app should probably not only shutdown the UI isolate but the entire isolate group (and all isolates within it). Unclear if it does that atm.
  • It may also make sense to clear various global state (e.g. isolate name server registry, plugin callback registry)

@zanderso Could you find someone to look into those things?

@mkustermann mkustermann removed their assignment May 10, 2023
@zanderso
Copy link
Member

Sorry I'm not quite following what the "separate isolate groups" problem is, but if you can gather the necessary details into a new issue, I'll make sure the team discusses it.

@mkustermann
Copy link
Member

Sorry I'm not quite following what the "separate isolate groups" problem is, but if you can gather the necessary details into a new issue, I'll make sure the team discusses it.

This issue has all the context already:

The easiest way to reproduce is to go to an older version of flutter where observatory URL is still surfaced (e.g. b0d04ea) - as DevTools doesn't seem to show isolate group / isolate related information.

Then follow the instructions here #124546 (comment) to run the example. Look at observatory to see there's one isolate group with two isolates inside. Perform a hot-restart and now there's two isolate groups each with one isolate inside. See more detailed pictures of this behavior in #124546 (comment)

=> Performing a hot restart results in one isolate running new code (application code when restart was issued) and another isolate running old code (application code before the restart), which is probably not what a hot restart is supposed to do.
=> The fact that the isolates run different application code means they cannot properly communicate with each other (application code before restart may have different classes/functions/... than the code after restart)

@dballance
Copy link
Author

Just a note here - I finally got around to implementing the iOS side. Overall, it's more or less the same as the Android side in terms of implementation.

Details of how the iOS impl is structured

In my FlutterAppDelegate - I create FlutterEngineGroup and utilize the init method on my impl to establish two FlutterEngines.

I then reference my UIApplication inside a custom FlutterViewController init block, get a reference to the engine - run it - and pass it to the super.init(engine: manager.engine, nibName: nil, bundle: nil) call. Utilizing the required coder init callback, I then call this view controller's init to get everything in place.

I then set the ViewController as the viewcontroller for main scene.

From here I continue on as normal and setup everything else in the didFinishLaunchingWithOptions callback.

This works almost exactly how the Android implementation works and is probably a little cleaner than the previous implementation used in Flutter 3.0.5.

The behavior is the same as observed on Android. Hot restarts end up with separate isolate groups and hot reloads are fixed by the work above.

With this - I got our app upgraded to 3.10.2 and happily running the the beta channel for dev to utilize hot reloads.

Happy to update the sample with the iOS implementation if it will help. Would love any updates on when hot restarts may be handled as well. Appreciate the effort of the team! 🍻

@flutter-triage-bot flutter-triage-bot bot added multiteam-retriage-candidate team-android Owned by Android platform team triaged-android Triaged by Android platform team labels Jul 8, 2023
@a-siva a-siva added triaged-engine Triaged by Engine team team-engine Owned by Engine team and removed team-android Owned by Android platform team triaged-android Triaged by Android platform team triaged-engine Triaged by Engine team labels Aug 1, 2023
@zanderso zanderso added P2 Important issues not at the top of the work list and removed P1 High-priority issues at the top of the work list engine flutter/engine repository. See also e: labels. platform-android Android applications specifically labels Aug 1, 2023
@zanderso
Copy link
Member

zanderso commented Aug 1, 2023

Adjusting labels for new triage process.

@a-siva
Copy link
Contributor

a-siva commented Jan 25, 2024

Looks like most of the work here is in the engine and not the Dart VM, so removing dart:dependency label.

@dnfield
Copy link
Contributor

dnfield commented Feb 12, 2024

I think this has to do with the call to RuntimeController::Clone instead of RuntimeController::Spawn in Engine::Restart.

We probably need some way to check if an Engine object was spawned from another or not, and if it was we need to Spawn the runtime controller instead of cloning it when restarting.

@gaaclarke might know better.

@dnfield
Copy link
Contributor

dnfield commented Feb 12, 2024

Specifically, the code here: https://github.com/flutter/engine/blob/1df6e462d333c37a11c6b46f20ae2480e75f85b3/shell/common/engine.cc#L206 seems wrong if the engine was spawned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 Important issues not at the top of the work list t: hot reload Reloading code during "flutter run" team-engine Owned by Engine team triaged-engine Triaged by Engine team
Projects
None yet
Development

No branches or pull requests

9 participants