Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OS X: runnable/testabi.d segmentation fault due to unaligned data #1252

Open
smolt opened this issue Jan 16, 2016 · 8 comments
Open

OS X: runnable/testabi.d segmentation fault due to unaligned data #1252

smolt opened this issue Jan 16, 2016 · 8 comments

Comments

@smolt
Copy link
Member

smolt commented Jan 16, 2016

edit: workaround in druntime, waiting on Apple radar 24221680.

This is the crash plaguing the OS X release build on Travis right now (commit d3da239).

First of all to leave no surprises, the problem lies outside of LDC as the same crash can be crafted with C code (I'll show it later). So what to do? Document the problem, suggest a workaround, and file a bug report with Apple as I think it is a linker problem.

This problem comes and goes based on the alignment of data being linked. At one time the crash happened when running the 32-bit OS X release version of testabi.d. Now it has moved over to the 64-bit release. It might only happen when LDC has optimizer enabled, but this is not an optimizer error. With clang, you don't even need -O to get the same problem.

Let's look at a session showing runnable/testabi.d crash.

$ ldmd2 ldc/tests/d2/dmd-testsuite/runnable/testabi.d -O -g
$ ./testabi 
Segmentation fault: 11
$ lldb testabi
(lldb) target create "testabi"
Current executable set to 'testabi' (x86_64).
(lldb) r
Process 46379 launched: '/Users/dan/projects/ldc-master/testabi' (x86_64)
Process 46379 stopped
* thread #1: tid = 0x27d5c1, 0x0000000100004f4d testabi`_D7testabi7D_test1FZv + 11981 at testabi.d:245, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x0000000100004f4d testabi`_D7testabi7D_test1FZv + 11981 at testabi.d:245
   242          }
   243          assert( pass );
   244  
-> 245          results_1[0..5] = 0;
   246          foreach( n, T; R_T )
   247          {
   248                  test1_call_out!(T)(n);
(lldb) disass --line
testabi`_D7testabi7D_test1FZv + 11968 at testabi.d:245
   244  
   245          results_1[0..5] = 0;
   246          foreach( n, T; R_T )
testabi`_D7testabi7D_test1FZv:
    0x100004f40 <+11968>: leaq   0x58c79(%rip), %rdi       ; _D7testabi9results_1G85i
    0x100004f47 <+11975>: callq  *(%rdi)
    0x100004f49 <+11977>: pxor   %xmm0, %xmm0
->  0x100004f4d <+11981>: movdqa %xmm0, (%rax)
(lldb) p/x $rax
(unsigned long) $0 = 0x0000000100801818

What is going on here is that the optimizer chose a nice movdqa instruction to zero the data in results_1 (a static array) and requires 16-byte alignment, yet rax is only 8-bye aligned.

LDC asks for the correct alignment though (verified by peeking at LLVM IR and the assembly code):

.tbss __D7testabi9results_1G85i$tlv$init, 340, 4

The 4 is alignment as a power of 2, so LDC is in the clear. Compiling with -c and running nm on the .o file show a 16-byte aligned relative address for results_1. otool shows __thread_bss section as 16-byte aligned. Yet the final linked executable has result_1 only 8-byte aligned. Sure seems like a linker problem.

What appears to be going on is the linker creates the thread local layout by concatenating the __thread_data and __thread_bss sections. If __thread_data does not have same alignment requirements, then there can be problems. In this case otool shows __thread_data only needs 8-byte alignment.

tls.c is a simple yet specially crafted C program that illustrates all of this:

#include <stdio.h>
#include <string.h>

char b = 42;             // data section
__thread char tb = 42;   // __thread_data
__thread char zb16[16];  // __thread_bss, clang asks for 16-byte alignment

int main()
{
    printf("%p\n", zb16);
    memset(zb16, 0, sizeof(zb16));  // clang turns into movaps
}

Proof that zb16 is 16-byte aligned by clang:

$ clang tls_fail.c -S -emit-llvm
$ grep 'zb16.*align' tls_fail.ll
@zb16 = thread_local global [16 x i8] zeroinitializer, align 16

Then running tls_fail has same segmentation fault ending. -O not even needed.

$ clang tls_fail.c -g -o tls_fail
$ ./tls_fail
0x7f9b59c04bf7
Segmentation fault: 11
$ lldb tls_fail
(lldb) target create "tls_fail"
Current executable set to 'tls_fail' (x86_64).
(lldb) r
Process 46530 launched: '/Users/dan/projects/ldc-master/tls_fail' (x86_64)
0x100104bf7
Process 46530 stopped
* thread #1: tid = 0x27fa55, 0x0000000100000f6d tls_fail`main + 77 at tls_fail.c:11, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x0000000100000f6d tls_fail`main + 77 at tls_fail.c:11
   8    int main()
   9    {
   10       printf("%p\n", zb16);
-> 11       memset(zb16, 0, sizeof(zb16));  // clang turns into movaps
   12   }
(lldb) disass --line
tls_fail`main + 59 at tls_fail.c:11
   10       printf("%p\n", zb16);
   11       memset(zb16, 0, sizeof(zb16));  // clang turns into movaps
   12   }
tls_fail`main:
    0x100000f5b <+59>: leaq   0xcf(%rip), %rdi          ; zb16
    0x100000f62 <+66>: movl   %eax, -0x14(%rbp)
    0x100000f65 <+69>: movl   %ecx, -0x18(%rbp)
    0x100000f68 <+72>: callq  *(%rdi)
    0x100000f6a <+74>: xorps  %xmm0, %xmm0
->  0x100000f6d <+77>: movaps %xmm0, (%rax)
(lldb) p/x $rax
(unsigned long) $0 = 0x0000000100104bf7

Dumping the symbol addresses with nm shows how the variables are properly aligned as a whole.

$ nm -n tls_fail
                 U __tlv_bootstrap
                 U _printf
                 U dyld_stub_binder
0000000100000000 T __mh_execute_header
0000000100000f20 T _main
0000000100001018 D _b
0000000100001019 S _tb
0000000100001031 S _zb16
0000000100001049 s _tb$tlv$init
0000000100001050 s _zb16$tlv$init

The $tlv$init symbols are the template for each thread local. Here _zb16$tlv$init is aligned to 16-bytes. But when the thread local data layout is created for _tb$tlv$init (__thread_data) and _zb16$tlv$init (__thread_bss), it is offset from beginning of __thread_data 0x1049 meaning zb16 0x1050 - 0x1049 = 7 offset (see printf of 0x100104bf7 above).

Workaround? It seems having a maximally aligned member in __thread_data might do the trick. Maybe we can just have LDC emit a dummy member in the meantime.

#include <stdio.h>
#include <string.h>

char b = 42;             // data section
__thread char tb = 42;   // __thread_data
__thread char dummy __attribute((aligned(16))) = 1;  // force __thread_data to be 16-byte aligned
__thread char zb16[16];  // __thread_bss, clang asks for 16-byte alignment

int main()
{
    printf("%p\n", zb16);
    memset(zb16, 0, sizeof(zb16));  // clang turns into movaps
}
@smolt
Copy link
Member Author

smolt commented Jan 16, 2016

One workaround is using the existing dummyTlsSymbol in rt.sections_ldc. Just add lots of alignment and initialize so it ends up in __thread_data section:

private align(16) ubyte dummyTlsSymbol = 42;

runnable/testabi.d now passes.

@kinke
Copy link
Member

kinke commented Jan 16, 2016

Very nice and detailed analysis, thx for investigating, it was a nice read! :)

@dnadlinger
Copy link
Member

@smolt: Could you please do the change to dummyTlsSymbol (with a short comment and reference to this issue)?

@smolt
Copy link
Member Author

smolt commented Jan 17, 2016

Yes, it was fortunate that we already had a dummy variable for this purpose.

Are there any x86 instructions that require greater that 16-byte alignment?

smolt added a commit to smolt/druntime that referenced this issue Jan 17, 2016
Align and initialize dummyTlsSymbol to force __thread_data section
alignment to 16-bytes.  This is needed to obey __thread_bss section
alignments.
@smolt
Copy link
Member Author

smolt commented Jan 17, 2016

Filed 24221680 with Apple.

smolt added a commit to ldc-developers/druntime that referenced this issue Jan 17, 2016
smolt added a commit to smolt/ldc that referenced this issue Jan 17, 2016
Workaround issue ldc-developers#1252 runnable/testabi.d segmentation fault on OS X.
redstar added a commit to redstar/ldc that referenced this issue Jan 30, 2016
@smolt
Copy link
Member Author

smolt commented Apr 12, 2016

Question to all: should this issue be closed because there is a workaround in place or should it stay open because there is an outstanding Apple radar. It would be nice to eventually remove the workaround, as small as it is, but that may be a long time. Perhaps an appropriate label so this Issue can be ignored in queries of what to work on next?

@kinke
Copy link
Member

kinke commented Jun 18, 2016

Upstream = Apple in this case ;)

@smolt
Copy link
Member Author

smolt commented Oct 23, 2016

Apple fixed bug 24221680 as of Xcode 8.0. The workaround in druntime could be removed at some future date when Xcode 7 has disappeared.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants