Crash: Signal 10 w/ RxJava flatMap() #987

Closed
deadpixelsociety opened this Issue May 18, 2015 · 18 comments


@deadpixelsociety

When running my app on a device (iPhone 4S/iOS 8.3) I get the following error in my RoboVM console:

[ERROR] AppLauncher failed with an exception:
[ERROR] java.lang.RuntimeException: The app crashed: Terminated due to signal 10. Check the device logs in Xcode (Window->Devices) for more info.
[ERROR] at org.robovm.libimobiledevice.util.AppLauncher.pipeStdOut(AppLauncher.java:829)
[ERROR] at org.robovm.libimobiledevice.util.AppLauncher.launchInternal(AppLauncher.java:734)
[ERROR] at org.robovm.libimobiledevice.util.AppLauncher.launch(AppLauncher.java:1052)
[ERROR] at org.robovm.compiler.target.ios.AppLauncherProcess$1.run(AppLauncherProcess.java:67)

Crash log: https://www.dropbox.com/s/r56q90500rjifb4/Main%20%205-18-15%2C%204-12%20PM.crash?dl=0

This app works on the simulator (4S/8.3) but fails on an actual device.

Able to reproduce the issue in a sample project: https://www.dropbox.com/s/3w43lhy9odsrj1d/untitled.zip?dl=0

Not sure if it's a RxJava issue or something RoboVM related that's just being exposed here, but I thought I'd bring it to your attention.

Thanks!

@badlogic badlogic added the bug label May 19, 2015
@badlogic
Contributor

Thanks for the repro project. Will take a look asap.

@ashleyj
Contributor
ashleyj commented May 19, 2015

Perhaps related to #766

@ntherning
Contributor

Our sun.misc.Unsafe.putOrdered*() methods are not implemented properly. They just delegate to sun.misc.Unsafe.put*Volatile(). This works on all platforms except ARM 32-bit, which will fail if we try to do an atomic store of a long to a memory location that isn't 8-byte aligned. The app will crash, which I believe is what happens here.

When I fix the sun.misc.Unsafe.putOrdered*() methods to work as intended, using a full memory barrier before the store, the provided reproduction case succeeds and works as expected.
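
For readers unfamiliar with ordered stores: as far as I know, `AtomicLong.lazySet()` is the public API that OpenJDK 6 through 8 implement on top of `Unsafe.putOrderedLong()`. Below is a minimal sketch of the single-writer index publication pattern that relies on it. The class and field names are hypothetical and are not taken from RxJava or RoboVM.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not RxJava or RoboVM code.
final class OrderedStoreSketch {
    // Index written by exactly one producer thread.
    private static final AtomicLong producerIndex = new AtomicLong();

    // Publishes a new index with an ordered ("lazy") store.
    // Earlier writes by this thread are not reordered after the store,
    // but the store itself is cheaper than a full volatile write.
    static long publish(long next) {
        producerIndex.lazySet(next);
        // A read on the same thread always observes its own store.
        return producerIndex.get();
    }
}
```

The point of the sketch is only that an ordered store needs ordering with respect to earlier writes, not the full atomic volatile store that the old delegation performed.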

@ntherning ntherning added this to the 1.3 milestone May 19, 2015
@ntherning ntherning self-assigned this May 19, 2015
@ntherning ntherning added a commit that referenced this issue May 19, 2015
@ntherning ntherning Fixed up the implementations of sun.misc.Unsafe.putOrderedInt(),
sun.misc.Unsafe.putOrderedLong() and sun.misc.Unsafe.putOrderedObject() to
work as expected and to avoid putOrderedLong() crashing under certain
circumstances on ARM 32-bit. (#987)
269fa33
@ntherning
Contributor

I believe this is fixed now. Please try with the next nightly build (20150520) and let us know if it works. See http://docs.robovm.com/advanced-topics/nightlies.html for instructions on how to use nightly builds.

@deadpixelsociety

Just got to test this today using the newest nightlies (via the 20150521 intellij plugin snapshot) and the issue seems to still be there.

Crash log: https://www.dropbox.com/s/04m8jq82y974qtf/OnTimeMobile%20%205-21-15%2C%2011-04%20AM.crash?dl=0

I uninstalled the plugin, cleared out .robovm, .robovm-sdks and any relevant .gradle directories. Reinstalled the nightly plugin, sync'd and built my project to the same results.

@badlogic
Contributor

I assume that's for your full-fledged project. Just to rule out any issues
with the IDEA plugin packaging, can you try your repro sample on your end?

@deadpixelsociety

I can confirm that the original repro project does work as expected on-device now. However my actual project does similar things but with threading included. I'm able to reproduce crashes when adding an 'observeOn(Schedulers.newThread())' call to my Rx call chain. Should we continue this here or should I close this and open a new issue with a new log/repro?

Thanks!

@badlogic
Contributor

Let's continue here. If you could modify your repro so it starts crashing
again, that'd be most wonderful :)


@ntherning
Contributor

Thanks! Had a look at the crash log and below is the thread which is the likely cause of the crash. The crash log says thread 0 but that's often incorrect. I suspect it's an alignment issue. ARM 32-bit crashes if one tries to read non 8-byte aligned 64-bit values atomically. Next I'll try the repro app.

Thread 13:
0   Main                            0x000448a4 Java_sun_misc_Unsafe_getLongVolatile + 12
1   Main                            0x00613b40 [J]sun.misc.Unsafe.getLongVolatile(Ljava/lang/Object;J)J + 36
2   Main                            0x00607194 [J]rx.internal.util.unsafe.SpscArrayQueue.lvConsumerIndex()J (/SpscArrayQueue.java:192)
3   Main                            0x00607062 [J]rx.internal.util.unsafe.SpscArrayQueue.size()I (/SpscArrayQueue.java:168)
4   Main                            0x00221452 [J]java.util.AbstractCollection.isEmpty()Z (/AbstractCollection.java:184)
5   Main                            0x005ea78c [J]rx.internal.operators.OperatorObserveOn$ObserveOnSubscriber.pollQueue()V + 216
6   Main                            0x005ea9e2 [J]rx.internal.operators.OperatorObserveOn$ObserveOnSubscriber$2.call()V + 18
7   Main                            0x005ff8d2 [J]rx.internal.schedulers.ScheduledAction.run()V (/ScheduledAction.java:55)
8   Main                            0x00272dba [J]java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; + 22
9   Main                            0x0027c4e2 [J]java.util.concurrent.FutureTask.run()V (/FutureTask.java:237)
10  Main                            0x00282e10 [J]java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)V + 16
11  Main                            0x00282dd4 [J]java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V + 112
12  Main                            0x00286192 [J]java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (/ThreadPoolExecutor.java:1112)
13  Main                            0x00287c30 [J]java.util.concurrent.ThreadPoolExecutor$Worker.run()V + 20
14  Main                            0x001b5812 [J]java.lang.Thread.run()V (/Thread.java:837)
15  Main                            0x006272f0 _call0 + 44
16  Main                            0x0061fe12 callVoidMethod + 98
17  Main                            0x0061fbd8 rvmCallVoidInstanceMethodA + 216
18  Main                            0x00626acc startThreadEntryPoint + 256
19  Main                            0x006386b6 GC_inner_start_routine + 82
20  Main                            0x006363d2 GC_call_with_stack_base + 26
21  Main                            0x00639694 GC_start_routine + 28
22  libsystem_pthread.dylib         0x394d9de8 _pthread_body + 136
23  libsystem_pthread.dylib         0x394d9d5a _pthread_start + 114
24  libsystem_pthread.dylib         0x394d7b04 thread_start + 4
@ntherning
Contributor

Yep, looks like the crash is due to an unaligned ldrexd in getLongVolatile(). getLongVolatile() is called with an offset equal to 412, which is not 8-byte aligned. On ARM 32-bit we only make sure volatile long fields are 8-byte aligned. RxJava probably calls getLongVolatile() with the offset of a non-volatile long field, which happens to be only 4-byte aligned. The only fix as I see it is to make all long fields on ARM 32-bit 8-byte aligned regardless of whether they are volatile or not, since we cannot know beforehand which fields getLongVolatile() will be called for. Objects might consume a little more memory for the extra padding, but that shouldn't be too much of a problem.
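
The alignment arithmetic behind this can be sketched in a few lines. This is only an illustration with hypothetical helper names, not RoboVM's actual field layout code, and it assumes the object itself starts at an 8-byte aligned address.

```java
// Hypothetical helpers illustrating the alignment check; not RoboVM code.
final class AlignmentSketch {
    // True if offset is a multiple of alignment (alignment must be a power of two).
    static boolean isAligned(long offset, long alignment) {
        return (offset & (alignment - 1)) == 0;
    }

    // Rounds offset up to the next multiple of alignment (power of two).
    // This is the kind of padding that 8-byte aligning every long field implies.
    static long alignUp(long offset, long alignment) {
        return (offset + alignment - 1) & ~(alignment - 1);
    }

    public static void main(String[] args) {
        System.out.println(isAligned(412, 8)); // false: 412 = 51 * 8 + 4
        System.out.println(isAligned(412, 4)); // true: 4-byte aligned only
        System.out.println(alignUp(412, 8));   // 416
    }
}
```

So a long field at offset 412 would move to offset 416 under the proposed layout, at the cost of 4 bytes of padding.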

@ntherning ntherning added a commit that referenced this issue May 22, 2015
@ntherning ntherning Always 8-byte align long fields on ARM 32-bit to make sure any long field can
be used with Unsafe.getLongVolatile()/Unsafe.putLongVolatile() regardless of
whether the field is volatile/final or not. (#987)
1fab8d6
@ntherning
Contributor

Ok, fixed the second issue. Please retest and let us know.

@deadpixelsociety

I will do that. Probably will be Tuesday before I get that chance. Thanks!

@deadpixelsociety

I feel like I'm becoming a problem child. Sorry!

The above issue is -fixed-. However I am able to reproduce a new crash case using the same basic setup. If this is still related to the original problem(s) or not I can't say. If you need me to move this to a new issue at any point please let me know!

Crash log: https://www.dropbox.com/s/coc0fpnp4v6lked/Main%20%205-26-15%2C%204-19%20PM.crash?dl=0
Repro: https://www.dropbox.com/s/7swhqod5eyfeirb/untitled_05262015.zip?dl=0

@ntherning
Contributor

It's related. Another unaligned access. We need to figure out why RxJava calls getLongVolatile() with unaligned offsets on RoboVM but not on Android. The same code runs fine on Android ARM 32-bit devices and still uses Unsafe. We've also verified that Unsafe.getLongVolatile() crashes on Android 32-bit devices when called with an unaligned offset. So RxJava passes properly aligned offsets on Android but not on RoboVM. We'll have to postpone further investigations to the next release I'm afraid.

@ntherning ntherning modified the milestone: 1.4, 1.3 May 27, 2015
@badlogic
Contributor

They fetch an array base offset on Android and add the actual offset to
that. Maybe our base offset calc in Unsafe is wrong?
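
To make that hypothesis concrete, here is a rough sketch of how an element offset is typically derived from the values returned by `Unsafe.arrayBaseOffset()` and `Unsafe.arrayIndexScale()`. The helper names and the base offsets 12 and 16 are made up for illustration; they are not values measured on RoboVM or Android. The sketch also assumes the array object itself is 8-byte aligned.

```java
// Hypothetical illustration of array element offset arithmetic; not RxJava or RoboVM code.
final class ArrayOffsetSketch {
    // Offset of element `index`, given the array's base offset and per-element scale.
    static long elementOffset(long arrayBaseOffset, long index, int indexScale) {
        return arrayBaseOffset + index * indexScale;
    }

    // With an 8-byte element scale, every element is 8-byte aligned
    // exactly when the base offset is.
    static boolean longElementsAligned(long arrayBaseOffset) {
        return arrayBaseOffset % 8 == 0;
    }

    public static void main(String[] args) {
        // Made-up base offsets, chosen only to show both outcomes.
        System.out.println(elementOffset(16, 3, 8)); // 40, 8-byte aligned
        System.out.println(elementOffset(12, 3, 8)); // 36, only 4-byte aligned
    }
}
```

If the base offset reported for a long array were only 4-byte aligned, every element offset computed this way would be misaligned for a 64-bit atomic access.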

@ntherning ntherning modified the milestone: 1.4, 1.5 Jun 17, 2015
@ntherning ntherning modified the milestone: 1.5, 1.6 Jul 3, 2015
@ntherning ntherning modified the milestone: 1.6, 1.7 Aug 4, 2015
@ntherning ntherning removed their assignment Aug 20, 2015
@ntherning ntherning modified the milestone: 1.7, 1.8, 1.9 Sep 2, 2015
@ntherning ntherning modified the milestone: 1.10, 1.9 Sep 14, 2015
@JohnColanduoni

In the meantime, you can use a custom build of RxJava with the isUnsafeAvailable check (located here) patched either to always return false, or to return false only when running on RoboVM on 32-bit ARM.
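
A rough sketch of what such a guard could look like follows. The helper name is hypothetical, and the property values it matches ("RoboVM" in `java.vm.name`, an `os.arch` starting with "arm" and not containing "64") are assumptions I have not verified against a RoboVM runtime, so check what your device actually reports before relying on it.

```java
import java.util.Locale;

// Hypothetical guard; not the actual RxJava patch.
final class UnsafeGuardSketch {
    // Returns false only for the combination that crashes in this issue:
    // RoboVM on 32-bit ARM. The matched strings are unverified assumptions.
    static boolean shouldUseUnsafe(String vmName, String osArch) {
        String vm = vmName == null ? "" : vmName.toLowerCase(Locale.ROOT);
        String arch = osArch == null ? "" : osArch.toLowerCase(Locale.ROOT);
        boolean roboVm = vm.contains("robovm");
        boolean arm32 = arch.startsWith("arm") && !arch.contains("64");
        return !(roboVm && arm32);
    }

    public static void main(String[] args) {
        // Typical call site: feed in the running VM's system properties.
        System.out.println(shouldUseUnsafe(
                System.getProperty("java.vm.name"),
                System.getProperty("os.arch")));
    }
}
```

A patched isUnsafeAvailable could then return false whenever this helper does, which makes RxJava fall back to its non-Unsafe code paths.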

@ntherning ntherning modified the milestone: 1.11, 1.10 Nov 2, 2015
@badlogic badlogic self-assigned this Nov 3, 2015
@ntherning ntherning added the ready label Nov 3, 2015
@badlogic badlogic modified the milestone: 1.12, 1.10 Nov 23, 2015
@ntherning ntherning assigned ntherning and unassigned badlogic Dec 15, 2015
@ntherning ntherning modified the milestone: 1.13, 1.12 Dec 15, 2015
@ntherning ntherning added in progress and removed ready labels Dec 17, 2015
@ntherning
Contributor

This has been fixed in the latest nightly build.

@ntherning ntherning closed this Dec 20, 2015
@ntherning ntherning removed the in progress label Dec 20, 2015