Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code generated with --fdmd-trace-functions segfaults when threads are used #3447

Open
AndrejMitrovic opened this issue May 27, 2020 · 11 comments

Comments

@AndrejMitrovic
Copy link
Contributor

~/dev/d master * $ ldc2 --version                                                                                                                                              130 ↵
LDC - the LLVM D compiler (1.21.0):
  based on DMD v2.091.1 and LLVM 10.0.0
  built with LDC - the LLVM D compiler (1.21.0)
  Default target: x86_64-apple-darwin18.6.0
  Host CPU: skylake
  http://dlang.org - http://wiki.dlang.org/LDC

Test-case:

import std.parallelism;

void main ()
{
    foreach (i; parallel([1, 2]))
    {
    }
}

Run:

$ ldc2 --fdmd-trace-functions test.d
$ ./test
[1]    27046 illegal hardware instruction  ./test

Here's LLDB:

(lldb) r
Process 27066 launched: '/Users/andrejmitrovic/dev/d/test' (x86_64)
Process 27066 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x000000010016d31c test`_D2rt8monitor_9lockMutexFNbNiPS4core3sys5posixQk5types15pthread_mutex_tZv + 12
test`_D2rt8monitor_9lockMutexFNbNiPS4core3sys5posixQk5types15pthread_mutex_tZv:
->  0x10016d31c <+12>: ud2
    0x10016d31e <+14>: nop

test`_d_monitorexit:
    0x10016d320 <+0>:  push   rax
    0x10016d321 <+1>:  add    rdi, 0x8
Target 0: (test) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x000000010016d31c test`_D2rt8monitor_9lockMutexFNbNiPS4core3sys5posixQk5types15pthread_mutex_tZv + 12
    frame #1: 0x0000000100164b07 test`_d_criticalenter + 39
    frame #2: 0x000000010016f467 test`trace_init + 23
    frame #3: 0x00000001001704a3 test`trace_pro + 51
    frame #4: 0x0000000100002436 test`_D4core6atomic__T8atomicOpVAyaa2_2b3dTmTiZQzFNaNbNiNeKOmiZm + 38
    frame #5: 0x0000000100164cf6 test`rt_init + 22
    frame #6: 0x00000001001652cd test`_D2rt6dmain212_d_run_main2UAAamPUQgZiZ6runAllMFZv + 13
    frame #7: 0x000000010016511f test`_d_run_main2 + 399
    frame #8: 0x0000000100164f7d test`_d_run_main + 141
    frame #9: 0x0000000100001c45 test`main + 37
    frame #10: 0x00007fff77e4b3d5 libdyld.dylib`start + 1
    frame #11: 0x00007fff77e4b3d5 libdyld.dylib`start + 1
@AndrejMitrovic
Copy link
Contributor Author

Could this be related? https://github.com/dlang/dmd/blob/2fea187ca09988e493759c538b8aeeb139aad6e2/src/dmd/glue.d#L1097-L1098

Because this code sample doesn't segfault with DMD, only with LDC.

@kinke
Copy link
Member

kinke commented May 27, 2020

This is not a segfault, but some assert(0) hit in druntime by the looks of it. - The comment about TLS seems totally unrelated to the DMD code. LDC implements this in the same way wrt. the trace_pro and _c_trace_epi calls.

@kinke
Copy link
Member

kinke commented May 27, 2020

From your call stack, https://github.com/dlang/druntime/blob/69d120f5ef5a3176963503496ffaef6fd59b3fd4/src/rt/dmain2.d#L183 seems to be a problem, as that atomicOp is already instrumented, but the code apparently relies on _d_critical_init() a few lines below to be invoked before any of that (because trace_init uses a synchronized block).

@AndrejMitrovic
Copy link
Contributor Author

Yep sorry for the misnomer in the title.

@kinke
Copy link
Member

kinke commented May 27, 2020

So it looks like the linker prefers an (instrumented) atomicOp instantiation from your code, not one from (uninstrumented) druntime. This might be due to COMDATs breaking the one-definition-rule (dropped with LDC v1.22 for ELF targets). The call should probably be inlined as well, at least for a release druntime; that might be due to missing pragma(inline, true) and/or require #3442 first.

@kinke
Copy link
Member

kinke commented May 27, 2020

core.atomic and core.internal.atomic use pragma(inline, true) for all functions, so I'd think that this would be fixed with #3442. I'll probably check it after work.

@kinke
Copy link
Member

kinke commented May 27, 2020

Works on Linux x64 with v1.21 according to run.dlang.io, so I won't be able to test this myself.

@AndrejMitrovic
Copy link
Contributor Author

AndrejMitrovic commented May 28, 2020

I'll try it myself once I set up building LDC on the Mac. Thanks for all the pointers!

@kinke
Copy link
Member

kinke commented May 28, 2020

You don't need to build LDC yourself, you can just download the macOS artifact from https://dev.azure.com/ldc-developers/ldc/_build/results?buildId=1700&view=artifacts&type=publishedArtifacts (the latest CI run for #3442).

@AndrejMitrovic
Copy link
Contributor Author

It worked!

@AndrejMitrovic
Copy link
Contributor Author

Well, it worked for the simple test-case, but in a more elaborate one it failed. But I haven't reduced it for the second case yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants