Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

watchOS apps crash on launch if built with LLVM #13454

Closed
rolfbjarne opened this issue Mar 14, 2019 · 17 comments · Fixed by #14362

Comments

@rolfbjarne
Copy link
Member

@rolfbjarne rolfbjarne commented Mar 14, 2019

Steps to Reproduce

  1. Build xamarin-macios/master
  2. Create new iOS project in VSfM and add a new watchApp project to it.
  3. Select the Release|Device configuration, and run.

Current Behavior

Crash at launch.

Application output isn't very helpful:

[...]
Xamarin.Hosting: Launched after 1/10 attempt(s).
Xamarin.Hosting: Process '548' exited.
Watch application 'com.rolfkvinge.watch-2018-10-launch-crash-test.watchkitapp' terminated.

A lldb session doesn't help much either: https://gist.github.com/rolfbjarne/42efa81e405c86f6eafc278483dadec0.

Expected Behavior

No crash.

On which platforms did you notice this

[x] watchOS (only armv7k, doesn't repro on arm64_32 with the new arm64_32 support).
[ ] macOS
[ ] Linux
[ ] Windows

Version Used:

xamarin-macios: master (xamarin/xamarin-macios@96ae30a)

@marek-safar

This comment has been minimized.

Copy link
Member

@marek-safar marek-safar commented Mar 14, 2019

@lewurm could you please check this

@marek-safar marek-safar added this to the 2019-02 (6.0.xx) milestone Mar 14, 2019
@lewurm

This comment has been minimized.

Copy link
Member

@lewurm lewurm commented Mar 18, 2019

@rolfbjarne provided some more logs.

Crash caught in the debugger: https://gist.github.com/rolfbjarne/e333fb4b7704c3c311a7cf7c631cd864

Rolf added some more logging around the crash site:

the crash occurs just after returning from the call to MethodBase.GetMethodFromHandleNoGenericCheck here:

a[i] = (RuntimeMethodInfo) MethodBase.GetMethodFromHandleNoGenericCheck (mh, refh);

instrumented like this: https://gist.github.com/rolfbjarne/1ca6318058ac8507742e7b8f2df5296f, this is the output:

RuntimeType.GetMethodsByName (): 2
RuntimeType.GetMethodsByName (): 4
RuntimeType.GetMethodsByName (): 5
RuntimeType.GetMethodsByName (): 6
RuntimeType.GetMethodsByName (): 7
RuntimeType.GetMethodsByName (): 8
MethodBase.GetMethodFromHandleNoGenericCheck (,) 1
MethodBase.GetMethodFromHandleNoGenericCheck (System.RuntimeMethodHandle, System.RuntimeTypeHandle)
MethodBase.GetMethodFromHandleNoGenericCheck (,) 1 done
[crash]

lldb session: https://gist.github.com/rolfbjarne/c45adc8c37e9831e7d56b6604e4f3f4d:

  1. there's a Sleep in the instrumented code: https://gist.github.com/rolfbjarne/1ca6318058ac8507742e7b8f2df5296f#file-instrumented-cs-L35, I ctrl-c in that sleep.
  2. I put a breakpoint on the address where execution will return in GetMethodsByName: https://gist.github.com/rolfbjarne/c45adc8c37e9831e7d56b6604e4f3f4d#file-lldb-txt-L73
  3. Continue
  4. When the breakpoint is hit in GetMethodsByName, I repeat re re -a and stepi until eventually the watch decides I'm being too slow and kills me (before the crash location is reached)
  5. At the end I disass the GetMethodsByName method

Here's a different log, where I just stepi until I reach the crash site: https://gist.github.com/rolfbjarne/3e65fc7ee983746bd65303384c2bd1ee#file-lldb-txt-L133

Here are the bitcode files in case someone wants to look at it:
https://www.dropbox.com/sh/h651vqdiu0psy6q/AAChTRvj0Frqfm1LB_CJCcSna?dl=0

I looked into all the logs, but I didn't discover anything suspicious 🙁

As a reference, here is the slack conversation with more context: https://xamarinhq.slack.com/archives/C03CCJNR7/p1552567685389000

@vargaz do you have any idea what that could be?

@marek-safar

This comment has been minimized.

Copy link
Member

@marek-safar marek-safar commented Mar 21, 2019

@vargaz any thoughts on that?

@marek-safar marek-safar changed the title [2018-10] watchOS apps crash on launch if built with LLVM watchOS apps crash on launch if built with LLVM Apr 2, 2019
@marek-safar

This comment has been minimized.

Copy link
Member

@marek-safar marek-safar commented Apr 2, 2019

It might be related to #13006

@lewurm

This comment has been minimized.

Copy link
Member

@lewurm lewurm commented Apr 4, 2019

FWIW log output with MONO_LOG_LEVEL=debug MONO_LOG_MASK=all: https://gist.github.com/rolfbjarne/02c8964cf4dd3ea490ab9c10159afdbc

@chamons

This comment has been minimized.

Copy link
Member

@chamons chamons commented Apr 24, 2019

Friendly reminder/poke that this needs to happen in the d16-2 timeframe, preferably before 2019-02 lands, else we were be breaking watch then.

@lewurm lewurm self-assigned this Apr 25, 2019
@lewurm

This comment has been minimized.

Copy link
Member

@lewurm lewurm commented May 2, 2019

I can reproduce on my Apple Watch Series 3. Additionally, I get this message on the output as well:

critical: ../../../../../mono/eglib/gptrarray.c:88: assertion 'array != NULL' failed
@lewurm

This comment has been minimized.

Copy link
Member

@lewurm lewurm commented May 2, 2019

with some counting tricks, I got this back trace:

(lldb) bt
* thread #1, name = 'tid_303', queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x00b5bf08 mini`eglib_debug_bp at gptrarray.c:86:2
    frame #1: 0x00b5bf2c mini`monoeg_g_ptr_array_free(array=0x00000000, free_seg=1) at gptrarray.c:96:3
    frame #2: 0x00a300ea mini`ves_icall_Mono_RuntimeGPtrArrayHandle_GPtrArrayFree(ptr_array=0x00000000, error=0x00dc2394) at icall.c:1550:2
    frame #3: 0x00a4142c mini`ves_icall_Mono_RuntimeGPtrArrayHandle_GPtrArrayFree_raw(a0=0x00000000, error=0x00dc2394) at icall-def.h:155:1
    frame #4: 0x0040749e mini`aot_wrapper___Mono_Mono_dot_RuntimeGPtrArrayHandle__GPtrArrayFree_pinvoke_void_cl24_Mono_2eRuntimeStructs_2fGPtrArray_2a_void_cl24_Mono_2eRuntimeStructs_2fGPtrArray_2a_ + 80
    frame #5: 0x0040753e mini`mscorlib_Mono_RuntimeGPtrArrayHandle_DestroyAndFree_Mono_RuntimeGPtrArrayHandle_ + 100
    frame #6: 0x0040790c mini`mscorlib_Mono_SafeGPtrArrayHandle_Dispose + 60
    frame #7: 0x004b6a82 mini`mscorlib_System_RuntimeType_GetConstructors_internal_System_Reflection_BindingFlags_System_RuntimeType + 806

I need to dechiper the handles code, but wild guess: There was a big commit in the handles code after 2018-10 branch off (so it's in 2019-02/2019-04 only): 44db64c So that might be the root cause.

@vargaz

This comment has been minimized.

Copy link
Member

@vargaz vargaz commented May 2, 2019

It looks like the NULL value comes from managed code. Its supposed to be the return value of
ves_icall_RuntimeType_GetConstructors_native ().

@lambdageek

This comment has been minimized.

Copy link
Member

@lambdageek lambdageek commented May 2, 2019

@lewurm could be this call to GetConstructors_native is returning null

using (var h = new Mono.SafeGPtrArrayHandle (GetConstructors_native (bindingAttr))) {

I notice that the Mono.SafeGPtrArrayHandle always ends up calling GPtrArrayFree even if the underlying pointer is null.

So if it's reasonable for GetConstructors_native to return null, the SafeGPtraArrayHandle should be fixed up, if not, we need to figure out why the constructors icall is returning null

@vargaz

This comment has been minimized.

Copy link
Member

@vargaz vargaz commented May 2, 2019

It can only return null here:
mono_error_set_for_class_failure (error, klass);
return NULL;
If set_for_class_failure () doesn't actually set a failure, or if the wrappers don't convert the error to an exception.

@lewurm

This comment has been minimized.

Copy link
Member

@lewurm lewurm commented May 2, 2019

according to my logging we take this exit

return res_array;

I added this before the return

printf ("\tnormal exit for class: %s\n", mono_type_full_name (&klass->_byval_arg))

and prints

        normal exit for class: System.Collections.Generic.GenericEqualityComparer`1<intptr>
@lewurm

This comment has been minimized.

Copy link
Member

@lewurm lewurm commented May 3, 2019

"nice", I don't get that assertion 'array != NULL' failed message anymore. Now the code segfaults slightly earlier. So it appears we have somewhere code that smashes things on the stack:

Process 230 stopped
* thread #1, name = 'tid_303', queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
    frame #0: 0x0032229c mini`mscorlib_System_RuntimeType_GetConstructors_internal_System_Reflection_BindingFlags_System_RuntimeType + 528
mini`mscorlib_System_RuntimeType_GetConstructors_internal_System_Reflection_BindingFlags_System_RuntimeType:
->  0x32229c <+528>: bl     0x3c1baa                  ; mscorlib_System_Reflection_RuntimeMethodInfo_GetMethodFromHandleNoGenericCheck_System_RuntimeMethodHandle_System_RuntimeTypeHandle
    0x3222a0 <+532>: str    r0, [r7, #-140]
    0x3222a4 <+536>: b      0x3222a6                  ; <+538>
    0x3222a6 <+538>: movs   r0, #0x0
Target 0: (mini) stopped.
(lldb) register read sp r7
      sp = 0x00c2e430
      r7 = 0x00c2e4e8
(lldb) c
Process 230 resuming
Process 230 stopped
* thread #1, name = 'tid_303', queue = 'com.apple.main-thread', stop reason = breakpoint 3.1
    frame #0: 0x003222a0 mini`mscorlib_System_RuntimeType_GetConstructors_internal_System_Reflection_BindingFlags_System_RuntimeType + 532
mini`mscorlib_System_RuntimeType_GetConstructors_internal_System_Reflection_BindingFlags_System_RuntimeType:
->  0x3222a0 <+532>: str    r0, [r7, #-140]
    0x3222a4 <+536>: b      0x3222a6                  ; <+538>
    0x3222a6 <+538>: movs   r0, #0x0
    0x3222a8 <+540>: ldr    r1, [r7, #-140]
Target 0: (mini) stopped.
(lldb) register read sp r7
      sp = 0x00c2e430
      r7 = 0x00000000
(lldb) x/20i 0x3c1baa
    0x3c1baa: 0xb580       push   {r7, lr}
    0x3c1bac: 0x466f       mov    r7, sp
    0x3c1bae: 0xb08a       sub    sp, #0x28

lldb timeouts on the watch prevent me to single step through it 😡
I'm currently starring at https://github.com/mono/mono/pull/11294/files it moved some initialization to wrappers that have been in runtime code before

@lewurm

This comment has been minimized.

Copy link
Member

@lewurm lewurm commented May 3, 2019

This bitcode

 %System.RuntimeTypeHandle = type { i8, i8, i8, i8 }
 %System.RuntimeMethodHandle = type { i8, i8, i8, i8 }
 [...]
define internal i32* @mscorlib_System_Reflection_RuntimeMethodInfo_GetMethodFromHandleNoGenericCheck_System_RuntimeMethodHandle_System_RuntimeTypeHandle([1 x i64] %arg_0,         [1 x i64] %arg_1) #5 {
BB0:
  %0 = alloca %System.RuntimeTypeHandle, align 4
  %1 = alloca %System.RuntimeMethodHandle, align 4
  %2 = alloca double, align 8
  %3 = alloca double, align 8
  %4 = alloca %System.RuntimeTypeHandle, align 4
  %5 = alloca %System.RuntimeMethodHandle, align 4
  %6 = bitcast %System.RuntimeMethodHandle* %1 to [1 x i64]*
  store [1 x i64] %arg_0, [1 x i64]* %6
  %7 = bitcast %System.RuntimeTypeHandle* %0 to [1 x i64]*
  store [1 x i64] %arg_1, [1 x i64]* %7
  br label %INIT_BB1

INIT_BB1:                                         ; preds = %BB0
  %is_inited = load i8, i8* getelementptr inbounds ([10868 x i8], [10868 x i8]* @mono_inited, i32 0, i32 4990)
  %8 = call i8 @llvm.expect.i8(i8 %is_inited, i8 1)

turns into that machine code

_mscorlib_System_Reflection_RuntimeMethodInfo_GetMethodFromHandleNoGenericCheck_System_RuntimeMethodHandle_System_RuntimeTypeHandle:
00151aaa    80 b5   push    {r7, lr}
00151aac    6f 46   mov r7, sp
00151aae    8a b0   sub sp, #0x28
00151ab0    08 90   str r0, [sp, #0x20]
00151ab2    09 91   str r1, [sp, #0x24]
00151ab4    0a 93   str r3, [sp, #0x28]  # <--- this smashes the stack slot where r7 is stored
00151ab6    09 92   str r2, [sp, #0x24]
00151ab8    41 f2 7c 30     movw    r0, #0x137c
00151abc    c0 f2 00 00     movt    r0, _init_method
00151ac0    4e f2 24 31     movw    r1, :lower16:((_mono_inited-1383112)-4)
00151ac4    c0 f2 d3 01     movt    r1, :upper16:((_mono_inited-1383112)-4)
 [...]

So this is the C# code for it:

internal static MethodBase GetMethodFromHandleNoGenericCheck (RuntimeMethodHandle handle, RuntimeTypeHandle reflectedType)
{
return GetMethodFromHandleInternalType_native (handle.Value, reflectedType.Value, false);
}

Why are RuntimeMethodHandle and RuntimeTypeHandle each i64 in the argument list in the LLVM code? As far as I understand they are a struct with an IntPtr in it and the definition in LLVM IR confirms that: It's 4xi8 = i32 which maps to pointer size on armv7k.

@vargaz does that ring a bell?

@vargaz

This comment has been minimized.

Copy link
Member

@vargaz vargaz commented May 3, 2019

I assume they are passed as LLVMArgAsIArgs, which is passed as a set of IntPtr()-s:
static LLVMTypeRef
IntPtrType (void)
{
return TARGET_SIZEOF_VOID_P == 8 ? LLVMInt64Type () : LLVMInt32Type ();
}
Is TARGET_SIZEOF_VOID_P 4 or 8 on armv7k ?

@lewurm

This comment has been minimized.

Copy link
Member

@lewurm lewurm commented May 3, 2019

TARGET_SIZEOF_VOID_P should be 4, right?

I think this is wrong: #12992 :(

@lewurm

This comment has been minimized.

Copy link
Member

@lewurm lewurm commented May 3, 2019

diff --git a/mono/mini/mini-arm.c b/mono/mini/mini-arm.c
index 82c05db8462..f4ebd7655c6 100644
--- a/mono/mini/mini-arm.c
+++ b/mono/mini/mini-arm.c
@@ -2375,7 +2375,7 @@ mono_arch_get_llvm_call_info (MonoCompile *cfg, MonoMethodSignature *sig)
                         * passing structs with sizes up to 8 bytes in a single register.
                         * On armv7k slotsize=8 boils down to the same generated native
                         * code by LLVM, so it's okay. */
-                       slotsize = 8;
+                       slotsize = 4;
 #else
                        slotsize = eabi_supported && ainfo->align == 8 ? 8 : 4;
 #endif

this fixes it, but I've to think about a proper solution so it doesn't break release mode of arm64_32.

lewurm added a commit to lewurm/mono that referenced this issue May 6, 2019
We can't use `slotsize=8` on `armv7k` as it leads to stack corruption.

Note that `mtouch` already calls the AOT compiler with `--aot=mtriple=arm64_32-ios`, so no changes are required there.

Regression introduced by mono#12992

Fixes mono#13454
@vargaz vargaz closed this in #14362 May 7, 2019
vargaz added a commit that referenced this issue May 7, 2019
We can't use `slotsize=8` on `armv7k` as it leads to stack corruption.

Note that `mtouch` already calls the AOT compiler with `--aot=mtriple=arm64_32-ios`, so no changes are required there.

Regression introduced by #12992

Fixes #13454
monojenkins added a commit to monojenkins/mono that referenced this issue May 7, 2019
We can't use `slotsize=8` on `armv7k` as it leads to stack corruption.

Note that `mtouch` already calls the AOT compiler with `--aot=mtriple=arm64_32-ios`, so no changes are required there.

Regression introduced by mono#12992

Fixes mono#13454
monojenkins added a commit to monojenkins/mono that referenced this issue May 7, 2019
We can't use `slotsize=8` on `armv7k` as it leads to stack corruption.

Note that `mtouch` already calls the AOT compiler with `--aot=mtriple=arm64_32-ios`, so no changes are required there.

Regression introduced by mono#12992

Fixes mono#13454
monojenkins added a commit that referenced this issue May 7, 2019
[2019-04] [arm] fix armv7k regression on struct passing

We can't use `slotsize=8` on `armv7k` as it leads to stack corruption.

Note that `mtouch` already calls the AOT compiler with `--aot=mtriple=arm64_32-ios`, so no changes are required there.

Regression introduced by #12992

Fixes #13454


Backport of #14362.

/cc @lewurm
akoeplinger added a commit that referenced this issue May 7, 2019
We can't use `slotsize=8` on `armv7k` as it leads to stack corruption.

Note that `mtouch` already calls the AOT compiler with `--aot=mtriple=arm64_32-ios`, so no changes are required there.

Regression introduced by #12992

Fixes #13454
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.