
Engine should not load the VM service using the application's snapshot #91382

Closed
rmacnak-google opened this issue Oct 6, 2021 · 14 comments · Fixed by flutter/engine#29245
Assignees
Labels
c: performance (Relates to speed or footprint issues; see "perf:" labels), customer: money (g3), engine (flutter/engine repository; see also e: labels), P0 (Critical issues such as a build break or regression), perf: memory (Performance issues related to memory), perf: startup (Performance issues related to app startup time), waiting for PR to land (fixed) (A fix is in flight)

Comments

@rmacnak-google
Contributor

rmacnak-google commented Oct 6, 2021

Internal: b/202972820

Currently, to load the vm-service isolate, the engine uses the application's snapshot. This has a number of drawbacks:

  • The vm-service code is included in the application snapshot
  • The engine must wait for the first application to be known before it can initialize the Dart VM
  • The memory footprint of the vm-service isolate is as high as the initial memory footprint of the application; in the limit, the memory footprint of large applications is twice as large as needed

The engine should instead build a snapshot specifically for the vm-service, and either link it into the engine or bundle it as another asset. Cf. the "dart_runner" embedder on Fuchsia, which has already made this change.
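As a rough illustration of the proposal, the change amounts to which snapshot buffers the service isolate is created from. This is pseudocode, not buildable engine code: the Dart_CreateIsolateGroup shape follows the Dart embedding API (dart_api.h), but all of the variable names and the kVmServiceSnapshot* symbols are hypothetical.

```c
// Today: the service isolate is deserialized from the application's own
// (large) snapshot, so the app snapshot must also carry vm-service code.
Dart_CreateIsolateGroup(/*script_uri=*/"vm-service", /*name=*/"vm-service",
                        app_isolate_snapshot_data,
                        app_isolate_snapshot_instructions,
                        flags, group_data, isolate_data, &error);

// Proposed: a dedicated, much smaller snapshot built only from the
// vm-service sources, linked into the engine or bundled as an asset.
Dart_CreateIsolateGroup(/*script_uri=*/"vm-service", /*name=*/"vm-service",
                        kVmServiceSnapshotData,         // hypothetical symbols
                        kVmServiceSnapshotInstructions,
                        flags, group_data, isolate_data, &error);
```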

@chinmaygarde

@rmacnak-google added the engine, c: performance, perf: memory, and perf: startup labels on Oct 6, 2021
@zanderso
Member

@jason-simmons Do you have cycles to take a look at this?

@chinmaygarde
Member

@zanderso Is this prioritized correctly at P2? This should not affect release modes.

@zanderso
Member

@chinmaygarde My understanding from @a-siva is that this is blocking investigation of release-mode memory issues for customer: money. @a-siva Do I have that right?

@chinmaygarde
Member

Ah, got it. Sounds good to me.

@a-siva
Contributor

a-siva commented Oct 12, 2021

@chinmaygarde My understanding from @a-siva is that this is blocking investigation of release-mode memory issues for customer: money. @a-siva Do I have that right?

Not release mode, but a number of memory-footprint benchmarks and investigations are currently being done in profile mode, and reducing the service isolate's overhead would be a good thing.

@dnfield
Contributor

dnfield commented Oct 12, 2021

For a little more context: running with the service isolate on lower-end devices is consuming so much memory that the devices are grinding to a halt, or the OOM killer is taking down processes that are necessary for profiling, benchmarking, etc.

@chinmaygarde
Member

This is just for my clarification, but I was under the impression that the snapshots were immutable memory-mapped arenas. It seems like there are copies going on under the hood after isolate initialization. Is that the case? If it is, can the engine release the memory-mapped arenas after isolate launch? We currently keep these around until isolate shutdown (based on my old assumption).

@rmacnak-google
Contributor Author

There are indeed some copies going on.

Snapshots are immutable, but they are a mixture of (A) memory that is used directly at the address the snapshot is loaded at, and (B) memory that is deserialized into the Dart heap. (A) cannot be released until all isolates in the group have shut down. In principle, (B) can mostly be released after isolate initialization, but there are exceptions: if isolate groups are not enabled, (B) will be accessed again on Isolate.spawn; if split snapshots are used, parts of (B) are accessed again; and certain typed data objects have their backing stores scattered in (B). If the VM could be promised the snapshot is file-backed, it could madvise(MADV_DONTNEED) the relevant regions of (B), but without this promise the madvise is destructive. I suspect this is always true in the engine, but it is not always true in the command line VM.

When both the main and service isolates are loaded from the same snapshot, (A) is shared but the deserialization of (B) is not. For customer: money, the extra deserialization of (B) from using the main snapshot to load the service isolate is much greater (40+ MB) than the lost sharing of (A) that a separate service snapshot would cost (~2-3 MB).

@dnfield
Contributor

dnfield commented Oct 13, 2021

I suspect this is always true in the engine

Is there a bug tracking making this happen for the engine? Sounds like it would save some memory usage.

@a-siva
Contributor

a-siva commented Oct 13, 2021

I suspect this is always true in the engine

Is there a bug tracking making this happen for the engine? Sounds like it would save some memory usage.

This is something @rmacnak-google is proposing. We are going to try the change locally on the app mentioned above to see whether it works and has an impact on RSS, at which point I will open an issue to track it.

@rmacnak-google
Contributor Author

Is there a bug tracking making this happen for the engine? Sounds like it would save some memory usage.

dart-lang/sdk#44019

@chinmaygarde
Member

The engine cannot guarantee that the snapshot will always be file-backed. Sometimes these are buffers dlsym-ed from a dynamic library; for custom embedders, they could even be on the heap. IMO, the madvise in the VM is not going to be easy to reason about.

@rmacnak-google: It seems like even within a single snapshot there are two arenas with different usage semantics, (A) and (B). Could the engine just give the VM two smaller snapshots instead? It could reclaim (A) on isolate initialization and pass ownership of (B) to the VM (or the VM could copy it and the engine could reclaim the original). That way, each arena would have uniform usage. We can do this in addition to the original ask in this issue.

@zanderso
Member

@jason-simmons has a prototype of this going, and is investigating whether there are any improvements in a big app on a low-end device.

jason-simmons added a commit to jason-simmons/flutter_engine that referenced this issue Oct 19, 2021
jason-simmons added a commit to jason-simmons/flutter_engine that referenced this issue Oct 21, 2021
jason-simmons added a commit to jason-simmons/flutter_engine that referenced this issue Oct 22, 2021
@github-actions

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.

The github-actions bot locked this issue as resolved and limited conversation to collaborators on Nov 10, 2021
The flutter-triage-bot added the P0 label (Critical issues such as a build break or regression) and removed the P2 label on Jun 28, 2023