Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 64bit apk freezes on arm64 hardware #8665

Closed
yenda opened this issue Jul 29, 2019 · 52 comments · Fixed by #8943 or #14041
Closed

Fix 64bit apk freezes on arm64 hardware #8665

yenda opened this issue Jul 29, 2019 · 52 comments · Fixed by #8943 or #14041

Comments

@yenda
Copy link
Contributor

yenda commented Jul 29, 2019

Problem

When building the app for 64bit arch as required by Google Play Store, we get freezes on arm64 hardware.
The freezes are random. There is no obvious indicator of a freeze in the logs besides the fact that there is no more js logging statements appearing, or no js logs at all in logcat if the freeze occurs during splash screen.

Implementation

We have tried:

Dismissed cause:

  • status-go is compiled for all arch already with gomobile so we consider it is not the issue (will try with debug level logs though to make sure it isn't an arch issue on go side)

We need to try:

  • considering that it seems like there is no logs for status-go whatsoever in 64bits arch, we need to investigate the reason behind that
  • upgrading react-native and using hermes js engine (which is also said to be much faster)

Acceptance Criteria

No freezes of the 64bits apk on 64bits devices

@yenda yenda self-assigned this Jul 29, 2019
@yenda
Copy link
Contributor Author

yenda commented Jul 29, 2019

@flexsurfer
Copy link
Member

here are the logs when it gets stuck on logo screen https://gist.github.com/flexsurfer/94d5d67eb3880d2c2a62708b50c660b7
these logs when it doesn't get stuck
https://gist.github.com/flexsurfer/b5824a0a1828ad28fc2d7aefa33e6074

@yenda yenda changed the title Fix 64bit apk crash on arm64 hardware Fix 64bit apk freezes on arm64 hardware Jul 29, 2019
@flexsurfer
Copy link
Member

logs when it gets stuck after interacting with the UI https://gist.github.com/flexsurfer/b5be0623e19c19e401fc16f63be62f71

@oskarth
Copy link
Contributor

oskarth commented Jul 30, 2019

Did anyone else with React Native or X other technology see similar issues?

Has someone tried to setup a minimally failing test case? What's the current best hypothesis?

Also cc @pombeirp

@oskarth
Copy link
Contributor

oskarth commented Jul 30, 2019

Could this be related facebook/react-native#24260? Can we add version numbers, platform and environment details to original issue?

@flexsurfer
Copy link
Member

i couldn't find any reports with similar issues, but here facebook/react-native#25494 (comment) , ppl still experience problems and author of 59.10 fix suggests to try 60.4 and new engine, we're trying to check this currently with @yenda but nix makes things slower, maybe @pombeirp could help with that

@yenda
Copy link
Contributor Author

yenda commented Jul 30, 2019

Tested as well without realm and with latest jsc a183490

Still freezes

Next steps:

Things that we could explore:

  • trying to get more logs

@pedropombeiro
Copy link
Contributor

pedropombeiro commented Jul 30, 2019

here are the logs when it gets stuck on logo screen https://gist.github.com/flexsurfer/94d5d67eb3880d2c2a62708b50c660b7
these logs when it doesn't get stuck
https://gist.github.com/flexsurfer/b5824a0a1828ad28fc2d7aefa33e6074

Comparing these logs, what stands out is that there are no entries for ReactNativeJS/StatusModule/RNKeychainManager on the 1st case.

logs when it gets stuck after interacting with the UI https://gist.github.com/flexsurfer/b5be0623e19c19e401fc16f63be62f71

In this one, there are entries for ReactNativeJS, but not for StatusModule/RNKeychainManager.

Comparing lib/arm64-v8a/libstatus-logs.so and lib/armeabi-v7a/libstatus-logs.so, the 64-bit version of the lib suspiciously takes <6KB while the 32-bit takes almost 14KB. I'll be investigating that, maybe the 64-bit version is currently just a skeleton.

@oskarth
Copy link
Contributor

oskarth commented Jul 30, 2019

we can try upgrading as many libs as possible, possibly focusing on those who are using native code, though I don't have much hope for that because there is 0 errors in the logs when the freezes occur

Honestly I think this is a suboptimal approach to debugging the problem. The problem with it is it that it is a positive one, where you add stuff. The only time this would make sense imo is if there's a clear hypothesis, for example based on similar bug reports, that this would resolve the issue.

A better approach to isolate the error instead is the negative one, where we try to find a minimally failing test case. This is also what would be the most useful once we find out what the upstream (?) bug is.

@pedropombeiro
Copy link
Contributor

pedropombeiro commented Jul 30, 2019

I'm comparing the readelf output from both the 32-bit lib and the 64-bit lib to check the import/export tables, and you can definitely see that quite a bit is missing (although the main export JNI_OnLoad is indeed there):

image

In addition to comparing the full objdump logs, I'll try adding code that crashes the app in JNI_OnLoad to confirm that it is indeed getting called for arm64-v8a.

@pedropombeiro
Copy link
Contributor

pedropombeiro commented Jul 30, 2019

Even more telling is dumping the contents of the .rodata ELF section (readelf -p.rodata):

image

I'll try to find out where the missing strings are coming from and that should inform us of what input is missing in the arm64 library.

@pedropombeiro
Copy link
Contributor

I created a local PR build and from the logs can see that status-go is working (search for GoLog): https://gist.githubusercontent.com/PombeirP/380cf5da9e7e2d4eb4da79e3ca9620fd/raw/87b96bb4a389bf7e9633c1209b608628a15d3707/Status.log

@pedropombeiro
Copy link
Contributor

OK, some more data. In this run, I was able to create an account and see the chat screens. I then went to add a public chat, and it hung on the public chat list. Even though the UI was hung, I could still see signals from status-go being processed by the status-react Java code, e.g.:

07-31 11:54:18.813 11840 12008 D StatusModule: Signal event: {"type":"wallet","event":{"type":"newblock","blockNumber":8257691,"accounts":[]}}

From subsequent runs, I gathered some additional observations:

  • hangs can occur as soon as the splash screen is shown, or as late as adding a public chat;
  • since native/status-go code is still running, the obvious culprit would seem to be the JS thread being locked up for some reason;
  • status-logs isn't at the root of this issue, I've made a build that didn't load the lib, but it still hung on the splash screen (and other places in the app, depending on the run);
  • I could eventually post a message to #status-core, without the app locking up (by chance);
  • native animations and text inputs, of course, run even when the JS UI is hung.

@yenda
Copy link
Contributor Author

yenda commented Jul 31, 2019

So as we are suspecting the JSC, we are going to switch it with Hermes to see if the problem persists

@pedropombeiro
Copy link
Contributor

Currently trying to build any non-trivial RN sample app which is built on 0.59.x, such as from this list, to see if they suffer from the same issue or if we can modify them to reproduce the issue, but it's proving challenging to find one that actually builds and runs...

@flexsurfer
Copy link
Member

flexsurfer commented Jul 31, 2019

maybe it would be better to remove all from status-react, create a few simple screens and try

@jakubgs
Copy link
Member

jakubgs commented Aug 1, 2019

I think @flexsurfer has the right idea. Building the app using our current stack but with almost everything removed to see if it fails the same way, and if it doesn't keep re-adding screens/whatever else to see what triggers the freeze.

@flexsurfer
Copy link
Member

keep re-adding screens/whatever else to see what triggers the freeze.

this may be challenging and time-consuming

One more quick idea is to build release v13 for 64, because in current develop we have all performance optimizations, js modules etc

@flexsurfer
Copy link
Member

flexsurfer commented Aug 1, 2019

Error libgnustl_shared.so is 32 bit instead of 64-bit (android) #4478
fix libgnustl_shared.so is 32 bit instead of 64-bit b50cbd0
so before this commit we had 64-bit working version?
no: 4b7360d#diff-dc46f9f5dfe204e394fb41395ccd03cf

@flexsurfer
Copy link
Member

@yenda
Copy link
Contributor Author

yenda commented Aug 1, 2019

@flexsurfer don't waste your time going back, previous versions of react-native didn't even support 64bits

first version of react-native to support 64bits was 0.59.1
we made a big jump for 56 to 59 in this commit 24a978d#diff-66a880bc35034360e66422980df146b0 which was part of 0.12 https://github.com/status-im/status-react/blob/0.12.0-mobile/mobile_files/package.json.orig

The v13 build seems to fail for a different reason:
09:25:17 From https://github.com/status-im/react-native-status-keycard 09:25:17 * [new branch] ios -> origin/ios 09:25:17 ! [rejected] v2.5.5 -> v2.5.5 (would clobber existing tag) 09:25:17 info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.

@pedropombeiro
Copy link
Contributor

If I build v13 locally I get a different error:

> Task :app:processReleaseManifest FAILED
/home/pedro/src/github.com/status-im/status-react/android/app/src/release/AndroidManifest.xml:22:18-91 Error:
        Attribute application@appComponentFactory value=(android.support.v4.app.CoreComponentFactory) from [com.android.support:support-compat:28.0.0] AndroidManifest.xml:22:18-91
        is also present at [androidx.core:core:1.0.0] AndroidManifest.xml:22:18-86 value=(androidx.core.app.CoreComponentFactory).
        Suggestion: add 'tools:replace="android:appComponentFactory"' to <application> element at AndroidManifest.xml:26:5-102:19 to override.

@flexsurfer
Copy link
Member

flexsurfer commented Aug 1, 2019

ok, i have a news , after removing react-native-status module app works
two commits:
35a665b
1e55253

build: https://status-im-prs.ams3.digitaloceanspaces.com/StatusIm-190801-105131-1e5525-pr.apk

@flexsurfer
Copy link
Member

@pombeirp could you take a look?

@pedropombeiro
Copy link
Contributor

@flexsurfer It's hard to say whether it fixes the issue or not, since with this change most of the screens were removed, so there is not much that can be done inside the app to exercise the functionality. Perhaps the only indicator is whether it stops locking up in the splash screen, but I didn't get that often in my P20 Pro. Normally it hung when creating an account or joining a public chat. I'll try to see whether I'm unable to repro the hang at splash screen.

@pedropombeiro
Copy link
Contributor

Did another experiment, to rule out whether the issue was with the Nix status-go build: I modified the Nix recipe to build status-go (with gomobile) so that it just takes the latest status-go manual build from Jenkins. I was able to progress all the way to add a public chat in one go, but it still hung at that point.

@pedropombeiro
Copy link
Contributor

pedropombeiro commented Aug 1, 2019

Also, fwiw @flexsurfer I did the following experiments, all of which still resulted in hangs:

  • leaving setMobileSignalHandler in and commenting out the contents of StatusModule.handleSignal;
  • commenting out the contents of SetMobileSignalHandler on the Go side.

It's as if as soon as we call into Statusgo, we're prone to this hang. (will test the hypothesis by calling Statusgo.appStateChange("wifi") instead of Statusgo.setMobileSignalHandler(this)).

@flexsurfer
Copy link
Member

flexsurfer commented Aug 1, 2019

it's not only setMobileSignalHandler, any call to status go may lead to hang
this is latest develop without go calls, and it works
https://status-im-prs.ams3.digitaloceanspaces.com/StatusIm-190801-160041-8ebd2a-pr.apk
this is the change 8ebd2a1

@flexsurfer
Copy link
Member

but only with this one it still hangs 06d8d33

@pedropombeiro
Copy link
Contributor

Yup, calling Statusgo.appStateChange("wifi") instead of Statusgo.setMobileSignalHandler(this) still caused the app to hang (only on arm64), so it sounds like as soon as we call into Statusgo, we're prone to these hangs. Now the question is: why is that the case?

@pedropombeiro
Copy link
Contributor

pedropombeiro commented Aug 1, 2019

I'll try upgrading to latest gomobile. It seems to have some changes relevant to arm64.

UPDATE: Upgrading gomobile didn't seem to fix the hang, unfortunately.

@pedropombeiro
Copy link
Contributor

pedropombeiro commented Aug 1, 2019

Maybe we should build a proof-of-concept RN app which calls a simple method in Go through gomobile? I'll be on holiday tomorrow, but I can work on that on Monday.

@pedropombeiro
Copy link
Contributor

I was able to create a minimal RN sample that reproduces the hang. It still uses status-go as the import, so now the next step is to use a minimal gomobile project.

@pedropombeiro
Copy link
Contributor

Built the simplest Go project with a single exported method DoSomething, and that still causes the RN sample app to hang at startup. Once I clean it up I'll have what I need to create a bug report in https://github.com/golang/go/issues.

@pedropombeiro
Copy link
Contributor

Repro app is available here. @flexsurfer I'd appreciate it if you could follow the build instructions to ensure that others can also reproduce it before I create an issue in the golang repository.

@oskarth
Copy link
Contributor

oskarth commented Aug 6, 2019

For clarity, what about:

  1. Simplest Android app (no RN) and Go?
  2. Simplest Android app and no Go?

@pedropombeiro
Copy link
Contributor

For clarity, what about:

1. Simplest Android app (no RN) and Go?

2. Simplest Android app and no Go?

That'd be useful, but since the way that this bug manifests itself is through a hang of the JavaScript UI thread, I'm not sure how we'd reproduce it in a completely native app. We could get lucky and witness a hang of the native main thread, but might be a long shot. Still worth trying though.

@pedropombeiro
Copy link
Contributor

I'd like to also generate a native arm64-v8a through some method other than Gomobile (maybe xgo?), in order to prove that the issue really comes from the way they generate the library.

@pedropombeiro
Copy link
Contributor

I've created a sample app derived from an Android sample (BasicTransition) which calls the same gomobile library, but it doesn't seem to suffer any problems after invoking DoSomething() on that library.

I've also tried using xgo to build an aar which contains an arm64 library, but for some reason xgo only seems to want to build 32-bit targets in the aar. Maybe @adambabik knows more about this.

@oskarth
Copy link
Contributor

oskarth commented Aug 6, 2019

Sounds good.

Built the simplest Go project with a single exported method DoSomething, and that still causes the RN sample app to hang at startup. Once I clean it up I'll have what I need to create a bug report in https://github.com/golang/go/issues.

Seems useful, perhaps people upstream have further thoughts. I notice there are a bunch of issues related to arm64 and mobile https://github.com/golang/go/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+arm64+mobile+ but don't know if any of them are related. Maybe gdb and backtrace would help? On the Go side obviously.

Perhaps it'd make sense to open an upstream issue in both Go and RN repo?

@pedropombeiro
Copy link
Contributor

I've built the sample using RN 0.60.4 and these are the findings when running the debug app:

  • Hermes disabled:
    • arm64-v8a: it still hung when asked to reload JS;
    • armeabi-v7a: did not hang when asked to reload JS;
  • Hermes enabled:
    • arm64-v8a: did not hang when asked to reload JS;
    • armeabi-v7a: did not hang when asked to reload JS;

So while this does not inform us regarding the root cause of the interaction (i.e. is this an issue with RN or gomobile?), it gives us a path forward for the future releases.

@flexsurfer
Copy link
Member

Hermes is still not stable and there are bugs like this facebook/react-native#25864, maybe we could try to upgrade app to 0.60.4, enable Hermes and test it on different Android devices? @annadanchenko @pombeirp wdyt?

@yenda
Copy link
Contributor Author

yenda commented Aug 6, 2019

@flexsurfer that was the plan #8672 the update isn't trivial though

@pedropombeiro
Copy link
Contributor

I'm taking a stab at upgrading to Hermes.

@pedropombeiro
Copy link
Contributor

Reported bug in facebook/react-native#25970

@rachelhamlin
Copy link
Contributor

@annadanchenko looking at progress on this issue, I'm guessing we won't be ready to release v0.14 with these items in time for our iOS cut-off. wdyt? Should we prepare comms with @j-zerah to explain the need to re-release v0.13 for iOS?

I would suggest we call it v0.13.1 (assuming 0.13 was latest).

@flexsurfer
Copy link
Member

i'll try to build simple clojure RN app with hermes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants