arm64 build crashes if user has used x64 build previously #27206

Closed
KishanBagaria opened this issue Jan 6, 2021 · 19 comments

@KishanBagaria
Contributor

Issue Details

  • Electron Version:
    • 11.1.1
  • Operating System:
    • macOS 11.1

Actual Behavior

We shipped an Electron app only for x64 architecture initially. It has some native dependencies. Some users used it on their M1 / Apple silicon Macs with Rosetta 2.

Recently, we built the app for the arm64 arch with electron-builder and shipped an update to those users. We detected whether the process was translated via the sysctl.proc_translated sysctl / app.runningUnderRosettaTranslation. It wasn't a universal binary, just arm64.
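
For reference, that check can be reproduced from a shell; a minimal sketch (app.runningUnderRosettaTranslation is the equivalent Electron API):

sysctl -in sysctl.proc_translated
# prints 1 when the process runs under Rosetta 2 translation, 0 when native;
# -i suppresses the error on Intel Macs, where this sysctl key doesn't exist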

Now those users are unable to open the app – it crashes on launch.

If they rename the app or move it, it works.

To Reproduce

  1. A user uses Foo.app (x64 only) with Rosetta 2.

  2. Some time later, Foo.app (x64) auto-updates to Foo.app (arm64, not universal binary). App path stays the same: /Applications/Foo.app
    Instead of auto-updating, the user can also delete the x64 version and replace it with the arm64 version.

  3. After the update, Foo.app (arm64) crashes on launch.

  4. If the user renames Foo.app to anything else (Foo2.app) or moves it to a different location, it starts working.

Screenshots

[screenshot: CleanShot 2020-12-29 at 02 07 52]

Additional Information

If the user launches the app through Terminal by running /Applications/Foo.app/Contents/MacOS/Foo, it runs and works fine.

This issue also happens if the user was using the arm64-only version initially and replaced it with the x64-only version. The x64 version will crash on launch in that instance.

We got reports from three different users about this (one using an M1 Mac Mini, another an M1 MacBook Pro). We were able to reproduce and confirm this on a clean Mac Mini.

@ggreco

ggreco commented Jan 8, 2021

Have you tried cleaning up ~/Library/Application\ Support/YourAPP ?

Maybe Electron saves some binary file there that breaks the launch under a different arch.

@KishanBagaria
Contributor Author

KishanBagaria commented Jan 8, 2021

Yeah, I did try removing that directory and a few others – no change.

@jviotti
Contributor

jviotti commented Apr 29, 2021

According to https://apple.stackexchange.com/a/414517, "Cache files of Rosetta 2 are located at both /System/Library/dyld/aot_shared_cache and /var/db/oah/".

Can you try wiping those out and see if it makes a difference? My guess is that Rosetta 2 is caching the fact that an application needs to be translated, using its app name as an identifier, and that might be stored somewhere in that cache.

For non-system binaries such as third-party x86_64 binaries, the files with aot extension under /var/db/oah contain the translated results.

I guess /var/db/oah is mainly the one we should be taking a look at. What are the contents of this directory in your setup, before wiping it out?
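
A best-effort way to look for those AOT artifacts, assuming SIP lets you read the directory (it may not):

sudo find /var/db/oah -name '*.aot' 2>/dev/null
# lists Rosetta's cached AOT translations; pipe through grep to filter
# for your app's binary name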

@jviotti
Contributor

jviotti commented Apr 29, 2021

This website (https://ffri.github.io/ProjectChampollion/part1/) provides some interesting background on Rosetta 2. This part is particularly interesting:

Since this is the first time we run hello.out, the oahd cannot find the corresponding AOT file. So, it creates a new AOT file. If the same binary in the same path has already been executed and the AOT file has been created, the oahd uses it.

I think it might be that oahd is blindly using its pre-compiled file (keyed on the file name?) without checking whether the binary was changed to support the native architecture.

The file names look like this, apparently:

/var/db/oah/16c6785d8fdab5ee2435f23dc2962ceda2e76042ea2ad1517687c5bb7358bf00/065b3f057e68a5474d378306e41d8b1e3e8e612b9cf9010b76449e02b607d7f0/hello.out.aot

I'm not sure what the hexadecimal strings mean, but they could be some sort of checksums of the absolute path of the file? Actually:

The names of these folders are SHA-256 hash values that are calculated from both the contents of the file in x86_64 code and the path where it was executed.

Can you find a directory here that contains files for your application? What happens if you delete them?
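
Purely as an illustration of that hashing idea (the exact input format oahd hashes is undocumented, and the binary path below is hypothetical):

shasum -a 256 /Applications/Foo.app/Contents/MacOS/Foo                   # over the file contents
printf '%s' /Applications/Foo.app/Contents/MacOS/Foo | shasum -a 256     # over the executed path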

This issue also happens if the user was using the arm64-only version initially and replaced it with the x64-only version. The x64 version will crash on launch in that instance.

That is confusing though. Could it be that Rosetta 2 also caches the fact that a binary does not need to be translated, and therefore also blindly skips the translation here?

@jviotti
Contributor

jviotti commented May 13, 2021

@KishanBagaria Does it make a difference if the arm64 application sets the LSRequiresNativeExecution (to YES) and/or the LSArchitecturePriority options (to prefer arm64 and arm64e) in its Info.plist?

I believe this will make the operating system ignore the Rosetta 2 cache and execute the app natively as normal without any errors.
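
For example, a sketch of setting those keys with PlistBuddy (the app path is illustrative, and the bundle must be re-signed after editing Info.plist):

/usr/libexec/PlistBuddy -c 'Add :LSRequiresNativeExecution bool true' /Applications/Foo.app/Contents/Info.plist
/usr/libexec/PlistBuddy -c 'Add :LSArchitecturePriority array' /Applications/Foo.app/Contents/Info.plist
/usr/libexec/PlistBuddy -c 'Add :LSArchitecturePriority:0 string arm64' /Applications/Foo.app/Contents/Info.plist
codesign --force --deep --sign - /Applications/Foo.app   # ad-hoc re-sign so the modified bundle still launches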

Let me know how it goes!

@KishanBagaria
Contributor Author

Cannot access /System/Library/dyld/aot_shared_cache or /var/db/oah/ – doing so will likely require disabling System Integrity Protection and/or other workarounds.


Same issue after setting

  • LSRequiresNativeExecution = true
  • LSRequiresNativeExecution = true and LSArchitecturePriority = arm64e
  • LSRequiresNativeExecution = true and LSArchitecturePriority = arm64

(Verified with cat /Applications/Texts.app/Contents/Info.plist)

@KishanBagaria
Contributor Author

The issue might be related to ffi-napi – the app ran when it wasn't imported but crashed as soon as require('ffi-napi') was present.

@jviotti
Contributor

jviotti commented May 14, 2021

@KishanBagaria Could it be that the native library you are loading with ffi-napi is still an Intel x64? What do you get if you run lipo -archs on it?
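
For instance (the path to the compiled add-on is a guess; adjust to wherever your build puts the .node file):

lipo -archs node_modules/ffi-napi/build/Release/ffi_bindings.node
# expect "arm64" for the native build; "x86_64" here would explain the crash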

@KishanBagaria
Contributor Author

KishanBagaria commented May 14, 2021

We aren't loading the library. The code is literally just require('ffi-napi'), and that crashes.

@jviotti
Contributor

jviotti commented May 15, 2021

@KishanBagaria That module is a native C++ add-on. They also publish pre-built binaries (see https://github.com/node-ffi-napi/node-ffi-napi/releases/tag/v4.0.3), but not for arm64.

What happens if you force re-compilation of the native add-on by setting npm_config_build_from_source=true, the right target arch (i.e. npm_config_target_arch=arm64) and the right Electron target, runtime, and dist URL? See https://github.com/electron/electron/blob/master/docs/tutorial/using-native-node-modules.md
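
Roughly, per that guide, a from-source rebuild would look like this (the Electron version shown is illustrative):

export npm_config_build_from_source=true
export npm_config_target_arch=arm64
export npm_config_runtime=electron
export npm_config_target=11.4.6                            # your Electron version
export npm_config_disturl=https://electronjs.org/headers
npm rebuild ffi-napi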

@KishanBagaria
Contributor Author

KishanBagaria commented May 15, 2021

We've already rebuilt that add-on and others for Electron 11 using electron-rebuild. It wouldn't work on arm64 machines without doing that and would throw a different error. The arm64 build is fully functional as long as the user hasn't replaced the x64 build with it.
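
(For reference, that rebuild was along these lines; the arch and version flags below are illustrative and should match your target:)

npx electron-rebuild --arch=arm64 --version=11.4.6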

Edit: the ffi-napi issue I mentioned is actually node-ffi-napi/node-ffi-napi#125 which we independently discovered while trying to make a minimal repro for this issue. This issue may or may not be linked to ffi-napi.

@KishanBagaria
Contributor Author

Running the app in Debugtron logs this right before crashing. The last line (mach_vm_read(..., ...): (os/kern) protection failure) isn't emitted when the app is renamed (and doesn't crash).

@jviotti
Contributor

jviotti commented May 15, 2021

So just to summarize our findings so far:

  • The issue is only reproducible when the user auto-updates the x64 build to the arm64 one, or deletes the x64 build from /Applications and puts the arm64 one in the same place
  • The issue goes away if you don't require ffi-napi
  • The native add-on from ffi-napi is indeed an arm64 build
  • Forcing the OS to execute the app as arm64 using LSRequiresNativeExecution and LSArchitecturePriority didn't make a difference

The hypothesis is that the issue is somehow tied to the file system path the application is copied to, since renaming the app makes it all work again.

Running the app in Debugtron logs this right before crashing. The last line (mach_vm_read(..., ...): (os/kern) protection failure) isn't emitted when the app is renamed (and doesn't crash).

Looks like that error is coming from Crashpad, and it's just masking the original issue. I believe Crashpad attempts to read the symbol table of the crashed process in order to give better information about the error, and that is what's failing here. You don't see that same line when renaming the app because the crash doesn't occur in the same place, and therefore Crashpad is never involved.

Is there a chance you can run this on a debug Electron build, so that we can better see where the error is coming from?

@KishanBagaria
Contributor Author

The issue is only reproducible when the user auto-updates the x64 build to the arm64 one, or deletes the x64 build from /Applications and puts the arm64 one in the same place
Forcing the OS to execute the app as arm64 using LSRequiresNativeExecution and LSArchitecturePriority didn't make a difference
The native add-on from ffi-napi is indeed an arm64 build

This is correct.

The issue goes away if you don't require ffi-napi

It's unknown whether ffi-napi is related to this issue. While trying to figure this out, we found node-ffi-napi/node-ffi-napi#125, which is a totally different issue.

The hypothesis is that the issue is somehow tied to the file system path the application is copied to, since renaming the app makes it all work again.

Yep.

Is there a chance you can run this on a debug Electron build, so that we can better see where the error is coming from?

Sure, how do we do that? Do you have a link to download the debug build? We're using Electron v11.4.6.

@cliqer

cliqer commented Jun 6, 2021

Just to add to this issue, which we have also been fighting with:
When we build (using electron-builder) an x64 executable on an arm64 machine, and the executable contains native modules and tries to spawn a new child process, the app crashes at the very moment of the spawn, and the debug log shows:

System Integrity Protection: enabled

Crashed Thread:        0  CrBrowserMain  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGBUS)
Exception Codes:       KERN_PROTECTION_FAILURE at 0x0000007838042e6c
Exception Note:        EXC_CORPSE_NOTIFY

Termination Signal:    Bus error: 10
Termination Reason:    Namespace SIGNAL, Code 0xa
Terminating Process:   exc handler [93335]

VM Regions Near 0x7838042e6c:
    Memory Tag 255             7808800000-7809000000   [ 8192K] rw-/rwx SM=ZER  
--> Memory Tag 255             7809000000-7900000000   [  3.9G] ---/rwx SM=ZER  
    MALLOC_NANO              600000000000-600008000000 [128.0M] rw-/rwx SM=PRV  

Thread 0 Crashed:: CrBrowserMain  Dispatch queue: com.apple.main-thread
0   com.github.Electron.framework 	0x000000010980dca3 v8::internal::Runtime::SetObjectProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::StoreOrigin, v8::Maybe<v8::internal::ShouldThrow>) + 4579

The very same app, when built on an x64 Intel Mac, works perfectly, and likewise an arm64 build works when built on an M1.
As such, multi-platform compiling and universal builds do not work until we can fix this.

@josteph

josteph commented Apr 14, 2022

This is still happening with Electron 18.0.1 and electron-builder 24.0.5.

@github-actions
Contributor

github-actions bot commented Oct 6, 2022

This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. If you have any new additional information—in particular, if this is still reproducible in the latest version of Electron or in the beta—please include it with your comment!

@github-actions github-actions bot added the stale label Oct 6, 2022
@github-actions
Contributor

github-actions bot commented Nov 5, 2022

This issue has been closed due to inactivity, and will not be monitored. If this is a bug and you can reproduce this issue on a supported version of Electron please open a new issue and include instructions for reproducing the issue.

@github-actions github-actions bot closed this as not planned Nov 5, 2022
@anis-dr

anis-dr commented Feb 23, 2023

I have the same issue
