Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 64bit apk freezes on arm64 hardware #8665

Closed
yenda opened this issue Jul 29, 2019 · 52 comments · Fixed by #8943

Comments

@yenda
Copy link
Member

commented Jul 29, 2019

Problem

When building the app for 64bit arch as required by Google Play Store, we get freezes on arm64 hardware.
The freezes are random. There is no obvious indicator of a freeze in the logs besides the fact that there is no more js logging statements appearing, or no js logs at all in logcat if the freeze occurs during splash screen.

Implementation

We have tried:

Dismissed cause:

  • status-go is compiled for all arch already with gomobile so we consider it is not the issue (will try with debug level logs though to make sure it isn't an arch issue on go side)

We need to try:

  • considering that it seems like there is no logs for status-go whatsoever in 64bits arch, we need to investigate the reason behind that
  • upgrading react-native and using hermes js engine (which is also said to be much faster)

Acceptance Criteria

No freezes of the 64bits apk on 64bits devices

@yenda yenda self-assigned this Jul 29, 2019

@yenda

This comment has been minimized.

Copy link
Member Author

commented Jul 29, 2019

@flexsurfer

This comment has been minimized.

Copy link
Member

commented Jul 29, 2019

here are the logs when it gets stuck on logo screen https://gist.github.com/flexsurfer/94d5d67eb3880d2c2a62708b50c660b7
these logs when it doesn't get stuck
https://gist.github.com/flexsurfer/b5824a0a1828ad28fc2d7aefa33e6074

@yenda yenda changed the title Fix 64bit apk crash on arm64 hardware Fix 64bit apk freezes on arm64 hardware Jul 29, 2019

@flexsurfer

This comment has been minimized.

Copy link
Member

commented Jul 29, 2019

logs when it gets stuck after interacting with the UI https://gist.github.com/flexsurfer/b5be0623e19c19e401fc16f63be62f71

@oskarth

This comment has been minimized.

Copy link
Member

commented Jul 30, 2019

Did anyone else with React Native or X other technology see similar issues?

Has someone tried to setup a minimally failing test case? What's the current best hypothesis?

Also cc @PombeirP

@oskarth

This comment has been minimized.

Copy link
Member

commented Jul 30, 2019

Could this be related facebook/react-native#24260? Can we add version numbers, platform and environment details to original issue?

@flexsurfer

This comment has been minimized.

Copy link
Member

commented Jul 30, 2019

i couldn't find any reports with similar issues, but here facebook/react-native#25494 (comment) , ppl still experience problems and author of 59.10 fix suggests to try 60.4 and new engine, we're trying to check this currently with @yenda but nix makes things slower, maybe @PombeirP could help with that

@yenda

This comment has been minimized.

Copy link
Member Author

commented Jul 30, 2019

Tested as well without realm and with latest jsc a183490

Still freezes

Next steps:

  • we can try upgrading as many libs as possible, possibly focusing on those who are using native code, though I don't have much hope for that because there is 0 errors in the logs when the freezes occur
  • upgrading react-native and using hermes #8672

Things that we could explore:

  • trying to get more logs
@PombeirP

This comment has been minimized.

Copy link
Member

commented Jul 30, 2019

here are the logs when it gets stuck on logo screen https://gist.github.com/flexsurfer/94d5d67eb3880d2c2a62708b50c660b7
these logs when it doesn't get stuck
https://gist.github.com/flexsurfer/b5824a0a1828ad28fc2d7aefa33e6074

Comparing these logs, what stands out is that there are no entries for ReactNativeJS/StatusModule/RNKeychainManager on the 1st case.

logs when it gets stuck after interacting with the UI https://gist.github.com/flexsurfer/b5be0623e19c19e401fc16f63be62f71

In this one, there are entries for ReactNativeJS, but not for StatusModule/RNKeychainManager.

Comparing lib/arm64-v8a/libstatus-logs.so and lib/armeabi-v7a/libstatus-logs.so, the 64-bit version of the lib suspiciously takes <6KB while the 32-bit takes almost 14KB. I'll be investigating that, maybe the 64-bit version is currently just a skeleton.

@oskarth

This comment has been minimized.

Copy link
Member

commented Jul 30, 2019

we can try upgrading as many libs as possible, possibly focusing on those who are using native code, though I don't have much hope for that because there is 0 errors in the logs when the freezes occur

Honestly I think this is a suboptimal approach to debugging the problem. The problem with it is it that it is a positive one, where you add stuff. The only time this would make sense imo is if there's a clear hypothesis, for example based on similar bug reports, that this would resolve the issue.

A better approach to isolate the error instead is the negative one, where we try to find a minimally failing test case. This is also what would be the most useful once we find out what the upstream (?) bug is.

@PombeirP

This comment has been minimized.

Copy link
Member

commented Jul 30, 2019

I'm comparing the readelf output from both the 32-bit lib and the 64-bit lib to check the import/export tables, and you can definitely see that quite a bit is missing (although the main export JNI_OnLoad is indeed there):

image

In addition to comparing the full objdump logs, I'll try adding code that crashes the app in JNI_OnLoad to confirm that it is indeed getting called for arm64-v8a.

@PombeirP

This comment has been minimized.

Copy link
Member

commented Jul 30, 2019

Even more telling is dumping the contents of the .rodata ELF section (readelf -p.rodata):

image

I'll try to find out where the missing strings are coming from and that should inform us of what input is missing in the arm64 library.

@PombeirP

This comment has been minimized.

Copy link
Member

commented Jul 31, 2019

I created a local PR build and from the logs can see that status-go is working (search for GoLog): https://gist.githubusercontent.com/PombeirP/380cf5da9e7e2d4eb4da79e3ca9620fd/raw/87b96bb4a389bf7e9633c1209b608628a15d3707/Status.log

@PombeirP

This comment has been minimized.

Copy link
Member

commented Jul 31, 2019

OK, some more data. In this run, I was able to create an account and see the chat screens. I then went to add a public chat, and it hung on the public chat list. Even though the UI was hung, I could still see signals from status-go being processed by the status-react Java code, e.g.:

07-31 11:54:18.813 11840 12008 D StatusModule: Signal event: {"type":"wallet","event":{"type":"newblock","blockNumber":8257691,"accounts":[]}}

From subsequent runs, I gathered some additional observations:

  • hangs can occur as soon as the splash screen is shown, or as late as adding a public chat;
  • since native/status-go code is still running, the obvious culprit would seem to be the JS thread being locked up for some reason;
  • status-logs isn't at the root of this issue, I've made a build that didn't load the lib, but it still hung on the splash screen (and other places in the app, depending on the run);
  • I could eventually post a message to #status-core, without the app locking up (by chance);
  • native animations and text inputs, of course, run even when the JS UI is hung.
@yenda

This comment has been minimized.

Copy link
Member Author

commented Jul 31, 2019

So as we are suspecting the JSC, we are going to switch it with Hermes to see if the problem persists

@PombeirP

This comment has been minimized.

Copy link
Member

commented Jul 31, 2019

Currently trying to build any non-trivial RN sample app which is built on 0.59.x, such as from this list, to see if they suffer from the same issue or if we can modify them to reproduce the issue, but it's proving challenging to find one that actually builds and runs...

@flexsurfer

This comment has been minimized.

Copy link
Member

commented Jul 31, 2019

maybe it would be better to remove all from status-react, create a few simple screens and try

@jakubgs

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

I think @flexsurfer has the right idea. Building the app using our current stack but with almost everything removed to see if it fails the same way, and if it doesn't keep re-adding screens/whatever else to see what triggers the freeze.

@flexsurfer

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

keep re-adding screens/whatever else to see what triggers the freeze.

this may be challenging and time-consuming

One more quick idea is to build release v13 for 64, because in current develop we have all performance optimizations, js modules etc

@flexsurfer

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

Error libgnustl_shared.so is 32 bit instead of 64-bit (android) #4478
fix libgnustl_shared.so is 32 bit instead of 64-bit b50cbd0
so before this commit we had 64-bit working version?
no: 4b7360d#diff-dc46f9f5dfe204e394fb41395ccd03cf

@flexsurfer

This comment has been minimized.

@yenda

This comment has been minimized.

Copy link
Member Author

commented Aug 1, 2019

@flexsurfer don't waste your time going back, previous versions of react-native didn't even support 64bits

first version of react-native to support 64bits was 0.59.1
we made a big jump for 56 to 59 in this commit 24a978d#diff-66a880bc35034360e66422980df146b0 which was part of 0.12 https://github.com/status-im/status-react/blob/0.12.0-mobile/mobile_files/package.json.orig

The v13 build seems to fail for a different reason:
09:25:17 From https://github.com/status-im/react-native-status-keycard 09:25:17 * [new branch] ios -> origin/ios 09:25:17 ! [rejected] v2.5.5 -> v2.5.5 (would clobber existing tag) 09:25:17 info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.

@PombeirP

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

If I build v13 locally I get a different error:

> Task :app:processReleaseManifest FAILED
/home/pedro/src/github.com/status-im/status-react/android/app/src/release/AndroidManifest.xml:22:18-91 Error:
        Attribute application@appComponentFactory value=(android.support.v4.app.CoreComponentFactory) from [com.android.support:support-compat:28.0.0] AndroidManifest.xml:22:18-91
        is also present at [androidx.core:core:1.0.0] AndroidManifest.xml:22:18-86 value=(androidx.core.app.CoreComponentFactory).
        Suggestion: add 'tools:replace="android:appComponentFactory"' to <application> element at AndroidManifest.xml:26:5-102:19 to override.
@flexsurfer

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

ok, i have a news , after removing react-native-status module app works
two commits:
35a665b
1e55253

build: https://status-im-prs.ams3.digitaloceanspaces.com/StatusIm-190801-105131-1e5525-pr.apk

@flexsurfer

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

@PombeirP could you take a look?

@PombeirP

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

@flexsurfer It's hard to say whether it fixes the issue or not, since with this change most of the screens were removed, so there is not much that can be done inside the app to exercise the functionality. Perhaps the only indicator is whether it stops locking up in the splash screen, but I didn't get that often in my P20 Pro. Normally it hung when creating an account or joining a public chat. I'll try to see whether I'm unable to repro the hang at splash screen.

@flexsurfer

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

ok it works with native module, but with this change a32e604

@PombeirP

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

It doesn't prove beyond a reasonable doubt that the problem is inside react-native-status (it could be due to some weird interaction), but it does give us some clues to investigate. The question is what part of react-native-status might be causing this. If we disable all calls to status-go instead of removing react-native-status altogether, we'll still have a very crippled app but at least we're able to check if the app still hangs at startup or not.

@PombeirP

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

I'm looking at gomobile to see if there's any weird arm64 stuff going on, like this, and whether our Reproducible Android gomobile build differs from the status-go Jenkins build.

@PombeirP

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

Did another experiment, to rule out whether the issue was with the Nix status-go build: I modified the Nix recipe to build status-go (with gomobile) so that it just takes the latest status-go manual build from Jenkins. I was able to progress all the way to add a public chat in one go, but it still hung at that point.

@PombeirP

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

Also, fwiw @flexsurfer I did the following experiments, all of which still resulted in hangs:

  • leaving setMobileSignalHandler in and commenting out the contents of StatusModule.handleSignal;
  • commenting out the contents of SetMobileSignalHandler on the Go side.

It's as if as soon as we call into Statusgo, we're prone to this hang. (will test the hypothesis by calling Statusgo.appStateChange("wifi") instead of Statusgo.setMobileSignalHandler(this)).

@flexsurfer

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

it's not only setMobileSignalHandler, any call to status go may lead to hang
this is latest develop without go calls, and it works
https://status-im-prs.ams3.digitaloceanspaces.com/StatusIm-190801-160041-8ebd2a-pr.apk
this is the change 8ebd2a1

@flexsurfer

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

but only with this one it still hangs 06d8d33

@PombeirP

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

Yup, calling Statusgo.appStateChange("wifi") instead of Statusgo.setMobileSignalHandler(this) still caused the app to hang (only on arm64), so it sounds like as soon as we call into Statusgo, we're prone to these hangs. Now the question is: why is that the case?

@PombeirP

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

I'll try upgrading to latest gomobile. It seems to have some changes relevant to arm64.

UPDATE: Upgrading gomobile didn't seem to fix the hang, unfortunately.

@PombeirP

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

Maybe we should build a proof-of-concept RN app which calls a simple method in Go through gomobile? I'll be on holiday tomorrow, but I can work on that on Monday.

@PombeirP

This comment has been minimized.

Copy link
Member

commented Aug 5, 2019

I was able to create a minimal RN sample that reproduces the hang. It still uses status-go as the import, so now the next step is to use a minimal gomobile project.

@PombeirP

This comment has been minimized.

Copy link
Member

commented Aug 5, 2019

Built the simplest Go project with a single exported method DoSomething, and that still causes the RN sample app to hang at startup. Once I clean it up I'll have what I need to create a bug report in https://github.com/golang/go/issues.

@PombeirP

This comment has been minimized.

Copy link
Member

commented Aug 5, 2019

Repro app is available here. @flexsurfer I'd appreciate it if you could follow the build instructions to ensure that others can also reproduce it before I create an issue in the golang repository.

@oskarth

This comment has been minimized.

Copy link
Member

commented Aug 6, 2019

For clarity, what about:

  1. Simplest Android app (no RN) and Go?
  2. Simplest Android app and no Go?
@PombeirP

This comment has been minimized.

Copy link
Member

commented Aug 6, 2019

For clarity, what about:

1. Simplest Android app (no RN) and Go?

2. Simplest Android app and no Go?

That'd be useful, but since the way that this bug manifests itself is through a hang of the JavaScript UI thread, I'm not sure how we'd reproduce it in a completely native app. We could get lucky and witness a hang of the native main thread, but might be a long shot. Still worth trying though.

@PombeirP

This comment has been minimized.

Copy link
Member

commented Aug 6, 2019

I'd like to also generate a native arm64-v8a through some method other than Gomobile (maybe xgo?), in order to prove that the issue really comes from the way they generate the library.

@PombeirP

This comment has been minimized.

Copy link
Member

commented Aug 6, 2019

I've created a sample app derived from an Android sample (BasicTransition) which calls the same gomobile library, but it doesn't seem to suffer any problems after invoking DoSomething() on that library.

I've also tried using xgo to build an aar which contains an arm64 library, but for some reason xgo only seems to want to build 32-bit targets in the aar. Maybe @adambabik knows more about this.

@oskarth

This comment has been minimized.

Copy link
Member

commented Aug 6, 2019

Sounds good.

Built the simplest Go project with a single exported method DoSomething, and that still causes the RN sample app to hang at startup. Once I clean it up I'll have what I need to create a bug report in https://github.com/golang/go/issues.

Seems useful, perhaps people upstream have further thoughts. I notice there are a bunch of issues related to arm64 and mobile https://github.com/golang/go/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+arm64+mobile+ but don't know if any of them are related. Maybe gdb and backtrace would help? On the Go side obviously.

Perhaps it'd make sense to open an upstream issue in both Go and RN repo?

@PombeirP

This comment has been minimized.

Copy link
Member

commented Aug 6, 2019

I've built the sample using RN 0.60.4 and these are the findings when running the debug app:

  • Hermes disabled:
    • arm64-v8a: it still hung when asked to reload JS;
    • armeabi-v7a: did not hang when asked to reload JS;
  • Hermes enabled:
    • arm64-v8a: did not hang when asked to reload JS;
    • armeabi-v7a: did not hang when asked to reload JS;

So while this does not inform us regarding the root cause of the interaction (i.e. is this an issue with RN or gomobile?), it gives us a path forward for the future releases.

@flexsurfer

This comment has been minimized.

Copy link
Member

commented Aug 6, 2019

Hermes is still not stable and there are bugs like this facebook/react-native#25864, maybe we could try to upgrade app to 0.60.4, enable Hermes and test it on different Android devices? @annadanchenko @PombeirP wdyt?

@yenda

This comment has been minimized.

Copy link
Member Author

commented Aug 6, 2019

@flexsurfer that was the plan #8672 the update isn't trivial though

@PombeirP

This comment has been minimized.

Copy link
Member

commented Aug 6, 2019

I'm taking a stab at upgrading to Hermes.

@PombeirP

This comment has been minimized.

Copy link
Member

commented Aug 7, 2019

Reported bug in facebook/react-native#25970

@rachelhamlin

This comment has been minimized.

Copy link
Member

commented Aug 8, 2019

@annadanchenko looking at progress on this issue, I'm guessing we won't be ready to release v0.14 with these items in time for our iOS cut-off. wdyt? Should we prepare comms with @j-zerah to explain the need to re-release v0.13 for iOS?

I would suggest we call it v0.13.1 (assuming 0.13 was latest).

@flexsurfer

This comment has been minimized.

Copy link
Member

commented Aug 19, 2019

i'll try to build simple clojure RN app with hermes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.