Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

watchOS apps crash on launch if built with LLVM #13454

Closed
rolfbjarne opened this issue Mar 14, 2019 · 17 comments · Fixed by #14362
Closed

watchOS apps crash on launch if built with LLVM #13454

rolfbjarne opened this issue Mar 14, 2019 · 17 comments · Fixed by #14362

Comments

@rolfbjarne
Copy link
Member

rolfbjarne commented Mar 14, 2019

Steps to Reproduce

  1. Build xamarin-macios/master
  2. Create new iOS project in VSfM and add a new watchApp project to it.
  3. Select the Release|Device configuration, and run.

Current Behavior

Crash at launch.

Application output isn't very helpful:

[...]
Xamarin.Hosting: Launched after 1/10 attempt(s).
Xamarin.Hosting: Process '548' exited.
Watch application 'com.rolfkvinge.watch-2018-10-launch-crash-test.watchkitapp' terminated.

A lldb session doesn't help much either: https://gist.github.com/rolfbjarne/42efa81e405c86f6eafc278483dadec0.

Expected Behavior

No crash.

On which platforms did you notice this

[x] watchOS (only armv7k, doesn't repro on arm64_32 with the new arm64_32 support).
[ ] macOS
[ ] Linux
[ ] Windows

Version Used:

xamarin-macios: master (xamarin/xamarin-macios@96ae30a)

@marek-safar
Copy link
Member

@lewurm could you please check this

@marek-safar marek-safar added this to the 2019-02 (6.0.xx) milestone Mar 14, 2019
@lewurm
Copy link
Contributor

lewurm commented Mar 18, 2019

@rolfbjarne provided some more logs.

Crash caught in the debugger: https://gist.github.com/rolfbjarne/e333fb4b7704c3c311a7cf7c631cd864

Rolf added some more logging around the crash site:

the crash occurs just after returning from the call to MethodBase.GetMethodFromHandleNoGenericCheck here:

a[i] = (RuntimeMethodInfo) MethodBase.GetMethodFromHandleNoGenericCheck (mh, refh);

instrumented like this: https://gist.github.com/rolfbjarne/1ca6318058ac8507742e7b8f2df5296f, this is the output:

RuntimeType.GetMethodsByName (): 2
RuntimeType.GetMethodsByName (): 4
RuntimeType.GetMethodsByName (): 5
RuntimeType.GetMethodsByName (): 6
RuntimeType.GetMethodsByName (): 7
RuntimeType.GetMethodsByName (): 8
MethodBase.GetMethodFromHandleNoGenericCheck (,) 1
MethodBase.GetMethodFromHandleNoGenericCheck (System.RuntimeMethodHandle, System.RuntimeTypeHandle)
MethodBase.GetMethodFromHandleNoGenericCheck (,) 1 done
[crash]

lldb session: https://gist.github.com/rolfbjarne/c45adc8c37e9831e7d56b6604e4f3f4d:

  1. there's a Sleep in the instrumented code: https://gist.github.com/rolfbjarne/1ca6318058ac8507742e7b8f2df5296f#file-instrumented-cs-L35, I ctrl-c in that sleep.
  2. I put a breakpoint on the address where execution will return in GetMethodsByName: https://gist.github.com/rolfbjarne/c45adc8c37e9831e7d56b6604e4f3f4d#file-lldb-txt-L73
  3. Continue
  4. When the breakpoint is hit in GetMethodsByName, I repeat re re -a and stepi until eventually the watch decides I'm being too slow and kills me (before the crash location is reached)
  5. At the end I disass the GetMethodsByName method

Here's a different log, where I just stepi until I reach the crash site: https://gist.github.com/rolfbjarne/3e65fc7ee983746bd65303384c2bd1ee#file-lldb-txt-L133

Here are the bitcode files in case someone wants to look at it:
https://www.dropbox.com/sh/h651vqdiu0psy6q/AAChTRvj0Frqfm1LB_CJCcSna?dl=0

I looked into all the logs, but I didn't discover anything suspicious 🙁

As a reference, here is the slack conversation with more context: https://xamarinhq.slack.com/archives/C03CCJNR7/p1552567685389000

@vargaz do you have any idea what that could be?

@marek-safar
Copy link
Member

@vargaz any thoughts on that?

@marek-safar marek-safar changed the title [2018-10] watchOS apps crash on launch if built with LLVM watchOS apps crash on launch if built with LLVM Apr 2, 2019
@marek-safar
Copy link
Member

It might be related to #13006

@lewurm
Copy link
Contributor

lewurm commented Apr 4, 2019

FWIW log output with MONO_LOG_LEVEL=debug MONO_LOG_MASK=all: https://gist.github.com/rolfbjarne/02c8964cf4dd3ea490ab9c10159afdbc

@chamons
Copy link
Contributor

chamons commented Apr 24, 2019

Friendly reminder/poke that this needs to happen in the d16-2 timeframe, preferably before 2019-02 lands, else we were be breaking watch then.

@lewurm lewurm self-assigned this Apr 25, 2019
@lewurm
Copy link
Contributor

lewurm commented May 2, 2019

I can reproduce on my Apple Watch Series 3. Additionally, I get this message on the output as well:

critical: ../../../../../mono/eglib/gptrarray.c:88: assertion 'array != NULL' failed

@lewurm
Copy link
Contributor

lewurm commented May 2, 2019

with some counting tricks, I got this back trace:

(lldb) bt
* thread #1, name = 'tid_303', queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x00b5bf08 mini`eglib_debug_bp at gptrarray.c:86:2
    frame #1: 0x00b5bf2c mini`monoeg_g_ptr_array_free(array=0x00000000, free_seg=1) at gptrarray.c:96:3
    frame #2: 0x00a300ea mini`ves_icall_Mono_RuntimeGPtrArrayHandle_GPtrArrayFree(ptr_array=0x00000000, error=0x00dc2394) at icall.c:1550:2
    frame #3: 0x00a4142c mini`ves_icall_Mono_RuntimeGPtrArrayHandle_GPtrArrayFree_raw(a0=0x00000000, error=0x00dc2394) at icall-def.h:155:1
    frame #4: 0x0040749e mini`aot_wrapper___Mono_Mono_dot_RuntimeGPtrArrayHandle__GPtrArrayFree_pinvoke_void_cl24_Mono_2eRuntimeStructs_2fGPtrArray_2a_void_cl24_Mono_2eRuntimeStructs_2fGPtrArray_2a_ + 80
    frame #5: 0x0040753e mini`mscorlib_Mono_RuntimeGPtrArrayHandle_DestroyAndFree_Mono_RuntimeGPtrArrayHandle_ + 100
    frame #6: 0x0040790c mini`mscorlib_Mono_SafeGPtrArrayHandle_Dispose + 60
    frame #7: 0x004b6a82 mini`mscorlib_System_RuntimeType_GetConstructors_internal_System_Reflection_BindingFlags_System_RuntimeType + 806

I need to dechiper the handles code, but wild guess: There was a big commit in the handles code after 2018-10 branch off (so it's in 2019-02/2019-04 only): 44db64c So that might be the root cause.

@vargaz
Copy link
Contributor

vargaz commented May 2, 2019

It looks like the NULL value comes from managed code. Its supposed to be the return value of
ves_icall_RuntimeType_GetConstructors_native ().

@lambdageek
Copy link
Member

lambdageek commented May 2, 2019

@lewurm could be this call to GetConstructors_native is returning null

using (var h = new Mono.SafeGPtrArrayHandle (GetConstructors_native (bindingAttr))) {

I notice that the Mono.SafeGPtrArrayHandle always ends up calling GPtrArrayFree even if the underlying pointer is null.

So if it's reasonable for GetConstructors_native to return null, the SafeGPtraArrayHandle should be fixed up, if not, we need to figure out why the constructors icall is returning null

@vargaz
Copy link
Contributor

vargaz commented May 2, 2019

It can only return null here:
mono_error_set_for_class_failure (error, klass);
return NULL;
If set_for_class_failure () doesn't actually set a failure, or if the wrappers don't convert the error to an exception.

@lewurm
Copy link
Contributor

lewurm commented May 2, 2019

according to my logging we take this exit

return res_array;

I added this before the return

printf ("\tnormal exit for class: %s\n", mono_type_full_name (&klass->_byval_arg))

and prints

        normal exit for class: System.Collections.Generic.GenericEqualityComparer`1<intptr>

@lewurm
Copy link
Contributor

lewurm commented May 3, 2019

"nice", I don't get that assertion 'array != NULL' failed message anymore. Now the code segfaults slightly earlier. So it appears we have somewhere code that smashes things on the stack:

Process 230 stopped
* thread #1, name = 'tid_303', queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
    frame #0: 0x0032229c mini`mscorlib_System_RuntimeType_GetConstructors_internal_System_Reflection_BindingFlags_System_RuntimeType + 528
mini`mscorlib_System_RuntimeType_GetConstructors_internal_System_Reflection_BindingFlags_System_RuntimeType:
->  0x32229c <+528>: bl     0x3c1baa                  ; mscorlib_System_Reflection_RuntimeMethodInfo_GetMethodFromHandleNoGenericCheck_System_RuntimeMethodHandle_System_RuntimeTypeHandle
    0x3222a0 <+532>: str    r0, [r7, #-140]
    0x3222a4 <+536>: b      0x3222a6                  ; <+538>
    0x3222a6 <+538>: movs   r0, #0x0
Target 0: (mini) stopped.
(lldb) register read sp r7
      sp = 0x00c2e430
      r7 = 0x00c2e4e8
(lldb) c
Process 230 resuming
Process 230 stopped
* thread #1, name = 'tid_303', queue = 'com.apple.main-thread', stop reason = breakpoint 3.1
    frame #0: 0x003222a0 mini`mscorlib_System_RuntimeType_GetConstructors_internal_System_Reflection_BindingFlags_System_RuntimeType + 532
mini`mscorlib_System_RuntimeType_GetConstructors_internal_System_Reflection_BindingFlags_System_RuntimeType:
->  0x3222a0 <+532>: str    r0, [r7, #-140]
    0x3222a4 <+536>: b      0x3222a6                  ; <+538>
    0x3222a6 <+538>: movs   r0, #0x0
    0x3222a8 <+540>: ldr    r1, [r7, #-140]
Target 0: (mini) stopped.
(lldb) register read sp r7
      sp = 0x00c2e430
      r7 = 0x00000000
(lldb) x/20i 0x3c1baa
    0x3c1baa: 0xb580       push   {r7, lr}
    0x3c1bac: 0x466f       mov    r7, sp
    0x3c1bae: 0xb08a       sub    sp, #0x28

lldb timeouts on the watch prevent me to single step through it 😡
I'm currently starring at https://github.com/mono/mono/pull/11294/files it moved some initialization to wrappers that have been in runtime code before

@lewurm
Copy link
Contributor

lewurm commented May 3, 2019

This bitcode

 %System.RuntimeTypeHandle = type { i8, i8, i8, i8 }
 %System.RuntimeMethodHandle = type { i8, i8, i8, i8 }
 [...]
define internal i32* @mscorlib_System_Reflection_RuntimeMethodInfo_GetMethodFromHandleNoGenericCheck_System_RuntimeMethodHandle_System_RuntimeTypeHandle([1 x i64] %arg_0,         [1 x i64] %arg_1) #5 {
BB0:
  %0 = alloca %System.RuntimeTypeHandle, align 4
  %1 = alloca %System.RuntimeMethodHandle, align 4
  %2 = alloca double, align 8
  %3 = alloca double, align 8
  %4 = alloca %System.RuntimeTypeHandle, align 4
  %5 = alloca %System.RuntimeMethodHandle, align 4
  %6 = bitcast %System.RuntimeMethodHandle* %1 to [1 x i64]*
  store [1 x i64] %arg_0, [1 x i64]* %6
  %7 = bitcast %System.RuntimeTypeHandle* %0 to [1 x i64]*
  store [1 x i64] %arg_1, [1 x i64]* %7
  br label %INIT_BB1

INIT_BB1:                                         ; preds = %BB0
  %is_inited = load i8, i8* getelementptr inbounds ([10868 x i8], [10868 x i8]* @mono_inited, i32 0, i32 4990)
  %8 = call i8 @llvm.expect.i8(i8 %is_inited, i8 1)

turns into that machine code

_mscorlib_System_Reflection_RuntimeMethodInfo_GetMethodFromHandleNoGenericCheck_System_RuntimeMethodHandle_System_RuntimeTypeHandle:
00151aaa    80 b5   push    {r7, lr}
00151aac    6f 46   mov r7, sp
00151aae    8a b0   sub sp, #0x28
00151ab0    08 90   str r0, [sp, #0x20]
00151ab2    09 91   str r1, [sp, #0x24]
00151ab4    0a 93   str r3, [sp, #0x28]  # <--- this smashes the stack slot where r7 is stored
00151ab6    09 92   str r2, [sp, #0x24]
00151ab8    41 f2 7c 30     movw    r0, #0x137c
00151abc    c0 f2 00 00     movt    r0, _init_method
00151ac0    4e f2 24 31     movw    r1, :lower16:((_mono_inited-1383112)-4)
00151ac4    c0 f2 d3 01     movt    r1, :upper16:((_mono_inited-1383112)-4)
 [...]

So this is the C# code for it:

internal static MethodBase GetMethodFromHandleNoGenericCheck (RuntimeMethodHandle handle, RuntimeTypeHandle reflectedType)
{
return GetMethodFromHandleInternalType_native (handle.Value, reflectedType.Value, false);
}

Why are RuntimeMethodHandle and RuntimeTypeHandle each i64 in the argument list in the LLVM code? As far as I understand they are a struct with an IntPtr in it and the definition in LLVM IR confirms that: It's 4xi8 = i32 which maps to pointer size on armv7k.

@vargaz does that ring a bell?

@vargaz
Copy link
Contributor

vargaz commented May 3, 2019

I assume they are passed as LLVMArgAsIArgs, which is passed as a set of IntPtr()-s:
static LLVMTypeRef
IntPtrType (void)
{
return TARGET_SIZEOF_VOID_P == 8 ? LLVMInt64Type () : LLVMInt32Type ();
}
Is TARGET_SIZEOF_VOID_P 4 or 8 on armv7k ?

@lewurm
Copy link
Contributor

lewurm commented May 3, 2019

TARGET_SIZEOF_VOID_P should be 4, right?

I think this is wrong: #12992 :(

@lewurm
Copy link
Contributor

lewurm commented May 3, 2019

diff --git a/mono/mini/mini-arm.c b/mono/mini/mini-arm.c
index 82c05db8462..f4ebd7655c6 100644
--- a/mono/mini/mini-arm.c
+++ b/mono/mini/mini-arm.c
@@ -2375,7 +2375,7 @@ mono_arch_get_llvm_call_info (MonoCompile *cfg, MonoMethodSignature *sig)
                         * passing structs with sizes up to 8 bytes in a single register.
                         * On armv7k slotsize=8 boils down to the same generated native
                         * code by LLVM, so it's okay. */
-                       slotsize = 8;
+                       slotsize = 4;
 #else
                        slotsize = eabi_supported && ainfo->align == 8 ? 8 : 4;
 #endif

this fixes it, but I've to think about a proper solution so it doesn't break release mode of arm64_32.

lewurm added a commit to lewurm/mono that referenced this issue May 6, 2019
We can't use `slotsize=8` on `armv7k` as it leads to stack corruption.

Note that `mtouch` already calls the AOT compiler with `--aot=mtriple=arm64_32-ios`, so no changes are required there.

Regression introduced by mono#12992

Fixes mono#13454
vargaz pushed a commit that referenced this issue May 7, 2019
We can't use `slotsize=8` on `armv7k` as it leads to stack corruption.

Note that `mtouch` already calls the AOT compiler with `--aot=mtriple=arm64_32-ios`, so no changes are required there.

Regression introduced by #12992

Fixes #13454
monojenkins pushed a commit to monojenkins/mono that referenced this issue May 7, 2019
We can't use `slotsize=8` on `armv7k` as it leads to stack corruption.

Note that `mtouch` already calls the AOT compiler with `--aot=mtriple=arm64_32-ios`, so no changes are required there.

Regression introduced by mono#12992

Fixes mono#13454
monojenkins pushed a commit to monojenkins/mono that referenced this issue May 7, 2019
We can't use `slotsize=8` on `armv7k` as it leads to stack corruption.

Note that `mtouch` already calls the AOT compiler with `--aot=mtriple=arm64_32-ios`, so no changes are required there.

Regression introduced by mono#12992

Fixes mono#13454
monojenkins added a commit that referenced this issue May 7, 2019
[2019-04] [arm] fix armv7k regression on struct passing

We can't use `slotsize=8` on `armv7k` as it leads to stack corruption.

Note that `mtouch` already calls the AOT compiler with `--aot=mtriple=arm64_32-ios`, so no changes are required there.

Regression introduced by #12992

Fixes #13454


Backport of #14362.

/cc @lewurm
akoeplinger pushed a commit that referenced this issue May 7, 2019
We can't use `slotsize=8` on `armv7k` as it leads to stack corruption.

Note that `mtouch` already calls the AOT compiler with `--aot=mtriple=arm64_32-ios`, so no changes are required there.

Regression introduced by #12992

Fixes #13454
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants