Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: mmap asar files #24470

Merged
merged 10 commits into from Aug 4, 2020
Merged

refactor: mmap asar files #24470

merged 10 commits into from Aug 4, 2020

Conversation

nornagon
Copy link
Member

@nornagon nornagon commented Jul 8, 2020

Description of Change

In preparation for being able to load asar files from a dll/dylib, which are implicitly memory mapped.

Checklist

  • PR description included and stakeholders cc'd
  • npm test passes
  • PR title follows semantic commit guidelines
  • This is NOT A BREAKING CHANGE. Breaking changes may not be merged to master until 11-x-y is branched.

Release Notes

Notes: none

@electron-cation electron-cation bot added the new-pr 🌱 PR opened in the last 24 hours label Jul 8, 2020
@deepak1556
Copy link
Member

deepak1556 commented Jul 8, 2020

First of all, interesting exploration 👍

QQ: wouldn't this increase the memory footprint of the main process considering the entire asar archive is being kept in memory ?

@nornagon
Copy link
Member Author

nornagon commented Jul 9, 2020

QQ: wouldn't this increase the memory footprint of the main process considering the entire asar archive is being kept in memory ?

This isn't how mmap works. The OS can page out chunks of the mmap'd file.

@nornagon
Copy link
Member Author

nornagon commented Jul 9, 2020

Some relevant commentary from Sublime: https://www.sublimetext.com/blog/articles/use-mmap-with-care

@deepak1556
Copy link
Member

thanks for the context, follow-up question why does base::MemoryMappedFile provide to load a specific region of file into mmap https://source.chromium.org/chromium/chromium/src/+/master:base/files/memory_mapped_file.h;l=102 and there are some uses of that in the codebase, what is the advantage over loading the whole file ? would it be useful for asar archive which can be separated to regions.

@nornagon
Copy link
Member Author

nornagon commented Jul 9, 2020

Not 100% sure, but I think the main reason to map a subregion is to limit usage of virtual address space, which is important mainly on 32-bit systems. It could also be a convenience method to avoid having to pass around an offset everywhere along with the mmap'd file.

@MarshallOfSound
Copy link
Member

@nornagon Have you performance tested this at all? Can we see if there's an increase / decrease in perf of reading a large amount of small files sequentially. Or reading large files. etc.

@deepak1556
Copy link
Member

Thinking a bit more, can you run this experiment behind a base::FeatureList flag, seems like a good candidate.

@nornagon
Copy link
Member Author

nornagon commented Jul 9, 2020

Thinking a bit more, can you run this experiment behind a base::FeatureList flag, seems like a good candidate.

Hm, I'm not sure I understand why this would qualify for that. There's no change in functionality.

@deepak1556
Copy link
Member

In preparation for being able to load asar files from a dll/dylib

wouldn't this require anymore implementation changes, having both code paths then would help in comparison as well as a way to opt in.

There's no change in functionality.

Hmm but the underlying implementation has changed, this is similar to blink enabling its new rendering engine LayoutNG which had no user facing changes but was behind feature flag to experiment any regressions in perf or other stuff.

@nornagon
Copy link
Member Author

nornagon commented Jul 9, 2020

In preparation for being able to load asar files from a dll/dylib

wouldn't this require anymore implementation changes, having both code paths then would help in comparison as well as a way to opt in.

You'll be able to opt in to loading from a dylib by putting your app in a dylib and enabling the relevant fuse. That part will always remain optional, and no feature flag will be necessary.

There's no change in functionality.

Hmm but the underlying implementation has changed, this is similar to blink enabling its new rendering engine LayoutNG which had no user facing changes but was behind feature flag to experiment any regressions in perf or other stuff.

LayoutNG was a major rewrite of a massive and extraordinarily complex system. This is not. I think the risk is substantially and meaningfully lower here.

I don't think we should ask that all refactors in Electron come with a feature flag and a deprecation cycle. If there are concrete reasons to suspect that this may cause issues, then I'd be happy to test & address them prior to merging. Otherwise, let's treat this as any other refactor: roll it out, watch for reports of breakage, and fix as necessary.

The only thing I'm cautious about is potential issues when loading from a network drive that goes away (as mentioned in the sublime post). Previously, that would have resulted in an I/O failure (with unknown effects on the app; it would be up to the app to handle it). After this change, I think it will result in SIGBUS.

There might be other issues when using mmap as opposed to fs.read(), but we won't find them until we enable this by default anyway, so putting it behind a flag will just delay the gathering of such information and force us to maintain both implementations in parallel for longer.

I think flags are best suited for large, relatively isolated functionality which app developers have a reason to want to opt into (e.g. it enables a feature for them, so they'll switch on the flag in order to use it). The value of this refactor by itself isn't obvious, so I don't see any reason a developer would choose to enable it.

@electron-cation electron-cation bot removed the new-pr 🌱 PR opened in the last 24 hours label Jul 9, 2020
shell/common/api/electron_api_asar.cc Outdated Show resolved Hide resolved
shell/common/api/electron_api_asar.cc Outdated Show resolved Hide resolved
lib/common/asar.js Outdated Show resolved Hide resolved
shell/common/api/electron_api_asar.cc Outdated Show resolved Hide resolved
@zcbenz
Copy link
Member

zcbenz commented Jul 23, 2020

In preparation for being able to load asar files from a dll/dylib, which are implicitly memory mapped.

I'm not sure what does loading asar files from dll means, can you explain a bit more about what it is and when it would be used?

Mmap is very useful for editors when editing very large files, where app has to constantly read parts of files. But in Electron apps most files in asar archive are only read for once, I'm not sure if there would be benefits on performance. So I'm a bit hesitate with a refactoring that brings little benefits and increases stability risks. It would be great if you can explain more about the new features it would enable, and probably do some benchmarks on performance.

@nornagon
Copy link
Member Author

I'm not sure what does loading asar files from dll means, can you explain a bit more about what it is and when it would be used?

The idea is to allow protecting script sources with codesigning. Currently, .asar files are not considered "code" by codesigning verification systems in macOS and Windows:

  • macOS's Gatekeeper system validates non-code resources at first launch, but not thereafter. However, enabling the hardened runtime on a binary can prevent it from loading any unsigned dylibs, regardless of whether it's "first launch" or not.
    • The level of protection that this would add is debatable. If an attacker can modify files on disk, they can just as easily remove the code signature and hardened runtime flags from the main executable as edit the .asar. In current versions of macOS, unsigned binaries run without any user prompt. There are, however, some minor restrictions that apply to unsigned apps: for instance, unsigned apps cannot access the keychain.
    • I've heard that on modern macOS, Gatekeeper does "periodic" checks in addition to first-launch checks. I'm not sure what triggers this.
    • There have been some signs from Apple that in future versions of macOS, unsigned code will not run by default.
  • On Windows, AppLocker can restrict which publishers' signatures are respected for all loaded DLLs. It has no system for requiring that non-DLL files are signed, though. Putting resources in a DLL on Windows would allow AppLocker to apply its restrictions to those resources.
  • I don't know enough about codesigning on Linux to comment as to whether this approach would be relevant on that platform. It seems there are several systems and ongoing development. e.g. IMA appraisal.

The goal here is to allow Electron apps to take advantage of the resource-integrity systems provided by the underlying operating system. Native apps have access to this by default, but given that the code of Electron apps is loaded dynamically from a non-"code" file, these systems do not effectively apply to Electron apps currently.

My plan for supporting this is roughly:

  1. Introduce the ability to load ASAR files from a new kind of bundle file, .asar.dll (or .asar.dylib on macOS). These would be loaded with LoadLibrary / dlopen, and would expose a two well-known symbols: a pointer to a byte sequence, and the size of that byte sequence.
  2. Add app.asar.dll to the front of the "path search" code when searching for app code.
  3. Add a fuse which turns off the path search code and sets it to only ever load from app.asar.dll.

The ultimate goal is to provide apps with tools that allow them to do just as well as the OS will permit them to in attempting to prevent unvalidated code from running.

@zcbenz
Copy link
Member

zcbenz commented Jul 24, 2020

Thank for the clarification, code signing asar files would be very exciting and I think it is a fair use case. 👍

@nornagon
Copy link
Member Author

@zcbenz please take another look.

@nornagon nornagon requested a review from zcbenz July 28, 2020 23:44
@nornagon
Copy link
Member Author

nornagon commented Aug 4, 2020

all 4 (!) ci failures are unrelated; merging.

@nornagon nornagon merged commit 01a2e23 into master Aug 4, 2020
@release-clerk
Copy link

release-clerk bot commented Aug 4, 2020

No Release Notes

@nornagon nornagon deleted the asar-mmap branch August 4, 2020 18:48
nornagon added a commit that referenced this pull request Mar 11, 2021
@nomagick nomagick mentioned this pull request Mar 15, 2021
3 tasks
nornagon added a commit that referenced this pull request Mar 15, 2021
trop bot pushed a commit that referenced this pull request Mar 15, 2021
trop bot pushed a commit that referenced this pull request Mar 15, 2021
trop bot pushed a commit that referenced this pull request Mar 15, 2021
zcbenz pushed a commit that referenced this pull request Mar 16, 2021
This reverts commit 01a2e23.

Co-authored-by: Jeremy Rose <nornagon@nornagon.net>
zcbenz pushed a commit that referenced this pull request Mar 16, 2021
This reverts commit 01a2e23.

Co-authored-by: Jeremy Rose <nornagon@nornagon.net>
zcbenz pushed a commit that referenced this pull request Mar 16, 2021
This reverts commit 01a2e23.

Co-authored-by: Jeremy Rose <nornagon@nornagon.net>
mlaurencin pushed a commit that referenced this pull request Mar 18, 2021
@gpetrov
Copy link

gpetrov commented May 18, 2021

@nornagon packaging asar files in dlls/gylibs and code signing them will be really an amazing feature! Can't wait till it is available!

I presume that the asar.dll files will be loaded by Electron only when the code signature checks out and refused otherwise.

Loading from a memory mapped files, should really give also a welcome speed boost as well, specially when dealing with tons of small node_modules files.

Also another welcome addition will be also to be able to encrypt the asar files, so next to code signing also code protection is available. When packing an asar archive we can already specify a stream transform, now we just need to at the loading as well.

@gpetrov
Copy link

gpetrov commented May 19, 2021

A small but powerful addition will be also the just un-gzip asar dll files runtime on load. This will save storage as well and make asar files much smaller.

Also if the asar memory can be directly smartly mapped with the node_modules cache, so run time memory is saved. You don't want to load a 100MB asar file with node_modules, that will take then twice the memory because of the asar mmap file self and all the require modules cache... but that might be quite a technical challenge.

@nornagon
Copy link
Member Author

I presume that the asar.dll files will be loaded by Electron only when the code signature checks out and refused otherwise.

This is up to the OS. On Windows, code signatures are not checked by default, but there are some 3rd-party tools like AppLocker which can be employed to harden the system.

Loading from a memory mapped files, should really give also a welcome speed boost as well, specially when dealing with tons of small node_modules files.

There is no appreciable speed difference between reading from an asar and reading from a memory-mapped file, as far as I can tell.

encrypt the asar files

We have no plans to add encryption of any sort.

un-gzip asar dll files runtime on load

I don't believe this is useful to implement in Electron. Enable NTFS compression if you want this.

directly smartly mapped with the node_modules cache

I'm not sure what you mean by this.

@gpetrov
Copy link

gpetrov commented May 19, 2021

This is up to the OS. On Windows, code signatures are not checked by default, but there are some 3rd-party tools like AppLocker which can be employed to harden the system.

Well we can't really rely on the OS as we don't have control of it. Specially on windows indeed there is no code signing validation per default or it can be easily switched off.

So loading and validating the app and its asar file is really a responsibility of the app self. It has being asked before many times: #24260 and encryption just for that as well #24681 #2570

maybe something like the content verification with own public key, like in NWJS:
https://nwjs.readthedocs.io/en/latest/For%20Users/Advanced/Content%20Verification/

but then combined with Electron fuses @MarshallOfSound or other more easier way to embed the public key in the electron binary without the need to recompile it and also have a way from the asar content to verify back that the main binary is also not tempered and bot use the same code signing key.

un-gzip asar dll files runtime on load
I don't believe this is useful to implement in Electron. Enable NTFS compression if you want this.

Again we don't have control of the user system, so can just let them enable NTFS compression. Just like loading files from a website the browser accepts gzip and unzip on the fly - this should be also possible when loading asar files. NWJS also supports having zip file as a content package to load. But unfortunately it unzips it fully first and is not that smart as asar to do that on the fly.

directly smartly mapped with the node_modules cache > I'm not sure what you mean by this.

well the node modules code resides in the asar file, so when the asar file is loaded, it will get in memory (well maybe not completely depending on the mmap implementation) but then when a node module is required, the require builds a global cache in memory of all required modules, so those don't have to be loaded again. So you can actually the code of the same module twice - once in the asar memory file and once in the require cache https://nodejs.org/api/modules.html#modules_caching

@fabiospampinato
Copy link
Contributor

un-gzip asar dll files runtime on load

I don't believe this is useful to implement in Electron. Enable NTFS compression if you want this.

@nornagon Could you expand on why you think this wouldn't be useful? One complaint I hear fairly often from people is that Electron apps are just big, if there's something that can deliver double-digits improvements in this area I don't think Electron should just dismiss the opportunity like that.

Also I don't think it's reasonable to say to people "Yeah we could improve on this, but just switch to a filesystem that supports compressed partitions", that's obviously not going to work, people don't even know what a filesystem is.

I have opened an issue about this here. IMHO working with compressed asar archives may speed things up too, since the bottleneck for almost all machine is going to be reading from disk rather than decompressing.

Other than the potential difficulties of switching to compressed asar archives it seems to me like the potential pros clearly outweigh the potential cons 🤷‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants