try/catch not work correctly with i686 dwarf exceptions #8

Closed
SquallATF opened this issue Oct 1, 2018 · 35 comments

Comments

@SquallATF

#include <iostream>
#include <locale>
#include <stdexcept>
#include <typeinfo>

int main() {
  try {
    std::locale loc("test");
  } catch (std::runtime_error& e) {
    std::cerr << "Caught " << e.what() << std::endl;
    std::cerr << "Type " << typeid(e).name() << std::endl;
  } catch (...) {
    std::cerr << "catch all" << std::endl;
  }

  return 0;
}

Simple test program.
Result with static libcxx:

Caught collate_byname<char>::collate_byname failed to construct for test
Type St13runtime_error

Result with shared libcxx:

terminating with uncaught exception of type std::runtime_error: collate_byname<char>::collate_byname failed to construct for test
@mstorsjo
Owner

mstorsjo commented Oct 1, 2018

Thanks, I'll have a look into this. Is this on i686 or x86_64?

@SquallATF
Author

x86_64 with SEH.
I did not test i686 shared.
i686 static with DWARF has the same issue.

@mstorsjo
Owner

mstorsjo commented Oct 1, 2018

Ok, I'll look into it. I do have regular testing with a simple test program in https://github.com/mstorsjo/llvm-mingw/blob/master/test/hello-exception.cpp, but either that's too simple or there's something else going on. Will look into it.

@mstorsjo
Owner

mstorsjo commented Oct 1, 2018

I can't reproduce any such issue with x86_64 with SEH, it all runs fine - I tested both with and without optimization and with or without -static. Can you provide more details on how to reproduce it there, with the exact command for compiling it, and the actual compiled exe?

With i686 with dwarf I can reproduce the issue though, both with shared and static libcxx.

By switching i686 to SJLJ exceptions (adding -fsjlj-exceptions in clang-target-wrapper.sh, then rebuilding at least libcxx/libcxxabi/libunwind) it runs fine though. I'll see if I can find any reason for the i686 dwarf issue.
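After a change like this it can help to confirm which exception handling model a rebuilt toolchain actually uses. The following is a minimal sketch, assuming a GCC or Clang MinGW target where `__USING_SJLJ_EXCEPTIONS__` is predefined for SJLJ and `__SEH__` for SEH (worth verifying with `-dM -E` on the toolchain in question). The helper name `exception_model` is made up for illustration and is not part of llvm-mingw.

```cpp
#include <cassert>  // only needed for the usage line shown below
#include <string>

// Hypothetical helper (not part of llvm-mingw): names the exception handling
// model this translation unit was compiled for, based on predefined macros.
inline std::string exception_model() {
#if defined(__USING_SJLJ_EXCEPTIONS__)
  return "sjlj";
#elif defined(__SEH__)
  return "seh";
#else
  // Assumption: on GCC/Clang MinGW targets, neither macro being defined
  // means table-driven DWARF unwinding is in use.
  return "dwarf";
#endif
}
```

In a regression test built with -fsjlj-exceptions, a line such as `assert(exception_model() == "sjlj");` would then catch a wrapper script that silently dropped the flag.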

@SquallATF
Author

SquallATF commented Oct 1, 2018

I tried again and found that the error is random: sometimes it works correctly, sometimes not. With -static it always works correctly.
exceptiontest
hello-exception.cpp gdb display
image
backtrace

#0  0x00000000008886ec in std::__1::basic_ostream<char, std::__1::char_traits<char> >::sentry::sentry(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) () from E:\Work\cpptest\libc++.dll
#1  0x0000000140003a84 in std::__1::__put_character_sequence<char, std::__1::char_traits<char> > (__os=..., __str=0x140005037 "Hello world C++", __len=15)
    at E:\llvm7\x86_64-w64-mingw32\include\c++\v1/ostream:721
#2  0x0000000140004051 in std::__1::operator<< <std::__1::char_traits<char> >
    (__os=..., __str=0x140005037 "Hello world C++")
    at E:\llvm7\x86_64-w64-mingw32\include\c++\v1/ostream:864
#3  0x000000014000168e in main (argc=1, argv=0x521f20)
    at hello-exception.cpp:64

Backtrace after commenting out the cout line:

#0  0x0000000000832732 in __cxxabiv1::__class_type_info::can_catch(__cxxabiv1::__shim_type_info const*, void*&) const () from E:\Work\cpptest\libc++.dll
#1  0x0000000000835140 in __cxxabiv1::scan_eh_tab(__cxxabiv1::(anonymous namespace)::scan_results&, _Unwind_Action, bool, _Unwind_Exception*, _Unwind_Context*) () from E:\Work\cpptest\libc++.dll
#2  0x00000000008344bb in __cxxabiv1::__gxx_personality_imp(int, _Unwind_Action, unsigned long long, _Unwind_Exception*, _Unwind_Context*) ()
   from E:\Work\cpptest\libc++.dll
#3  0x0000000180002119 in _GCC_specific_handler ()
   from E:\Work\cpptest\libunwind.dll
#4  0x00000000008343d5 in __gxx_personality_seh0 ()
   from E:\Work\cpptest\libc++.dll
#5  0x00007ffa9f3bed2d in ntdll!.chkstk () from C:\WINDOWS\SYSTEM32\ntdll.dll
#6  0x00007ffa9f326c86 in ntdll!RtlWalkFrameChain ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#7  0x00007ffa9f3bdc5e in ntdll!KiUserExceptionDispatcher ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#8  0x00007ffa9bd0a388 in RaiseException ()
   from C:\WINDOWS\System32\KernelBase.dll
#9  0x000000018000254a in _Unwind_RaiseException ()
   from E:\Work\cpptest\libunwind.dll
#10 0x0000000000833d95 in __cxa_throw () from E:\Work\cpptest\libc++.dll

@mstorsjo
Owner

mstorsjo commented Oct 1, 2018

That's strange... I haven't been able to trigger the error with x86_64/SEH once yet. Is it deterministic so that it always works when linking statically and sometimes works/sometimes fails when linking shared? Or does it seem like that's unrelated?

If you have an exe which shows this behaviour, can you share it so I can try if it happens for me as well? Is it built just with x86_64-w64-mingw32-g++ exception-locale.cpp -o exception-locale.exe like I do currently, or something else?

@mstorsjo
Owner

mstorsjo commented Oct 1, 2018

Are you ok with me including this example in my set of regression tests, like the other ones? What should I write as author for the copyright line?

@SquallATF
Author

I got g++ from x86_64-8.1.0-release-win32-seh-rt_v6-rev0.7z and built hello-exception.cpp with it; it always works correctly.
I copied my clang build to a Windows 7 virtual machine, and it always works correctly there too.
Here is my build: hello-exception.zip
My work environment is Windows 10 10.0.17134.285.

@SquallATF
Author

Are you ok with me including this example in my set of regression tests, like the other ones? What should I write as author for the copyright line?

That's OK. You can put my nickname.

@mstorsjo
Owner

mstorsjo commented Oct 1, 2018

Ok, I see that your build often fails, and I see the issues you describe if you run it in gdb.

Attached is my current build of it (from latest master branch, built within a clean docker container using Dockerfile.dev), and this one seems to run fine for me, both normally and in gdb.

hello-exception.zip

mstorsjo added a commit that referenced this issue Oct 1, 2018
This reverts commit ca43ab5.

It was reported in #8 that
the dwarf unwinding on i686 doesn't always work.

This commit adds a testcase for the case that didn't work previously.
@SquallATF
Author

Yes, your build always works fine. I will try to find out why my build has this issue.

@SquallATF
Author

I rebuilt the trunk version, and it does not have this issue. I can confirm that it's a bug in my own 7.0 backport version.

@mstorsjo
Owner

mstorsjo commented Oct 2, 2018

Ok, thanks for confirming!

I would have expected the 7.0 branch to work for SEH though (for llvm/clang/lld), except for libunwind where the SEH support was added a couple weeks after the 7.0 release was branched. (Originally I used libgcc for SEH exceptions, see 6bff543.)

As I changed i686 to sjlj exceptions, I guess the rest of this issue is fixed then?

@SquallATF
Author

I used the latest libunwind on 7.0. I backported multiple trunk patches to my own 7.0 version, so maybe something is wrong with my backport.

SquallATF changed the title from "try/cache not work correctly with shared build libcxx" to "try/cache not work correctly with i686 dwarf exceptions" on Oct 16, 2018
@SquallATF
Author

I tried debugging the i686 DWARF exceptions and built debug versions of libc++.dll and libunwind.dll, but the debug versions work correctly. I set LIBUNWIND_PRINT_UNWINDING=1 to print the unwind debug log.

release libc++.dll with debug libunwind.dll

libunwind: unwind_phase1(ex_ojb=0047D570): pc=0x1003be38, start_ip=0x1003bde0, func=.anonymous., lsda=0x0, personality=0x0
libunwind: unwind_phase1(ex_ojb=0047D570): pc=0x1000fee8, start_ip=0x1000fec0, func=.anonymous., lsda=0x100ef8bc, personality=0x1003c3e0
libunwind: unwind_phase1(ex_ojb=0047D570): calling personality function 1003C3E0
libunwind: unwind_phase1(ex_ojb=0047D570): _URC_CONTINUE_UNWIND
libunwind: unwind_phase1(ex_ojb=0047D570): pc=0x1000ff85, start_ip=0x1000ff00, func=.anonymous., lsda=0x100ef8cc, personality=0x1003c3e0
libunwind: unwind_phase1(ex_ojb=0047D570): calling personality function 1003C3E0
libunwind: unwind_phase1(ex_ojb=0047D570): _URC_CONTINUE_UNWIND
libunwind: unwind_phase1(ex_ojb=0047D570): pc=0x1000d5f6, start_ip=0x1000d500, func=.anonymous., lsda=0x100ef134, personality=0x1003c3e0
libunwind: unwind_phase1(ex_ojb=0047D570): calling personality function 1003C3E0
libunwind: unwind_phase1(ex_ojb=0047D570): _URC_HANDLER_FOUND
libunwind: unwind_phase2(ex_ojb=0047D570)
libunwind: unwind_phase2(ex_ojb=0047D570): start_ip=0x1003bde0, func=.anonymous., sp=0x19fdb8, lsda=0x0, personality=0x0
libunwind: unwind_phase2(ex_ojb=0047D570): start_ip=0x1000fec0, func=.anonymous., sp=0x19fdd0, lsda=0x100ef8bc, personality=0x1003c3e0
libunwind: unwind_phase2(ex_ojb=0047D570): _URC_CONTINUE_UNWIND
libunwind: unwind_phase2(ex_ojb=0047D570): start_ip=0x1000ff00, func=.anonymous., sp=0x19fde8, lsda=0x100ef8cc, personality=0x1003c3e0
libunwind: unwind_phase2(ex_ojb=0047D570): _URC_INSTALL_CONTEXT
libunwind: unwind_phase2(ex_ojb=0047D570): re-entering user code with ip=0x1000ff85, sp=0x19fde8
libunwind: unwind_phase2(ex_ojb=0047D570)
libunwind: unwind_phase2(ex_ojb=0047D570): start_ip=0x1000ff00, func=.anonymous., sp=0x19fde4, lsda=0x100ef8cc, personality=0x1003c3e0
libunwind: unwind_phase2(ex_ojb=0047D570): _URC_CONTINUE_UNWIND
libunwind: unwind_phase2(ex_ojb=0047D570): start_ip=0x1000d500, func=.anonymous., sp=0x19fe08, lsda=0x100ef134, personality=0x1003c3e0
libunwind: unwind_phase2(ex_ojb=0047D570): _URC_INSTALL_CONTEXT
libunwind: unwind_phase2(ex_ojb=0047D570): re-entering user code with ip=0x1000de5e, sp=0x19fe10
libunwind: unwind_phase1(ex_ojb=0047D570): pc=0x1003c0f0, start_ip=0x1003c0a0, func=.anonymous., lsda=0x100f11c4, personality=0x1003c3e0
libunwind: unwind_phase1(ex_ojb=0047D570): calling personality function 1003C3E0
libunwind: unwind_phase1(ex_ojb=0047D570): _URC_CONTINUE_UNWIND
libunwind: unwind_phase1(ex_ojb=0047D570): pc=0x1000decf, start_ip=0x1000d500, func=.anonymous., lsda=0x100ef134, personality=0x1003c3e0
libunwind: unwind_phase1(ex_ojb=0047D570): calling personality function 1003C3E0
libunwind: unwind_phase1(ex_ojb=0047D570): _URC_CONTINUE_UNWIND
libunwind: unwind_phase1(ex_ojb=0047D570): unw_step() reached bottom => _URC_END_OF_STACK
terminating with uncaught exception of type std::runtime_error: collate_byname<char>::collate_byname failed to construct for test

debug libc++.dll with debug libunwind.dll

libunwind: unwind_phase1(ex_ojb=00629F90): pc=0x1006d393, start_ip=0x1006d300, func=.anonymous., lsda=0x0, personality=0x0
libunwind: unwind_phase1(ex_ojb=00629F90): pc=0x100223d3, start_ip=0x10022380, func=.anonymous., lsda=0x101a027c, personality=0x1006de00
libunwind: unwind_phase1(ex_ojb=00629F90): calling personality function 1006DE00
libunwind: unwind_phase1(ex_ojb=00629F90): _URC_CONTINUE_UNWIND
libunwind: unwind_phase1(ex_ojb=00629F90): pc=0x10022651, start_ip=0x100223f0, func=.anonymous., lsda=0x101a028c, personality=0x1006de00
libunwind: unwind_phase1(ex_ojb=00629F90): calling personality function 1006DE00
libunwind: unwind_phase1(ex_ojb=00629F90): _URC_CONTINUE_UNWIND
libunwind: unwind_phase1(ex_ojb=00629F90): pc=0x10018aee, start_ip=0x10018800, func=.anonymous., lsda=0x1019fe80, personality=0x1006de00
libunwind: unwind_phase1(ex_ojb=00629F90): calling personality function 1006DE00
libunwind: unwind_phase1(ex_ojb=00629F90): _URC_HANDLER_FOUND
libunwind: unwind_phase2(ex_ojb=00629F90)
libunwind: unwind_phase2(ex_ojb=00629F90): start_ip=0x1006d300, func=.anonymous., sp=0x19f0fc, lsda=0x0, personality=0x0
libunwind: unwind_phase2(ex_ojb=00629F90): start_ip=0x10022380, func=.anonymous., sp=0x19f120, lsda=0x101a027c, personality=0x1006de00
libunwind: unwind_phase2(ex_ojb=00629F90): _URC_CONTINUE_UNWIND
libunwind: unwind_phase2(ex_ojb=00629F90): start_ip=0x100223f0, func=.anonymous., sp=0x19f148, lsda=0x101a028c, personality=0x1006de00
libunwind: unwind_phase2(ex_ojb=00629F90): _URC_INSTALL_CONTEXT
libunwind: unwind_phase2(ex_ojb=00629F90): re-entering user code with ip=0x10022664, sp=0x19f148
libunwind: unwind_phase2(ex_ojb=00629F90)
libunwind: unwind_phase2(ex_ojb=00629F90): start_ip=0x100223f0, func=.anonymous., sp=0x19f148, lsda=0x101a028c, personality=0x1006de00
libunwind: unwind_phase2(ex_ojb=00629F90): _URC_CONTINUE_UNWIND
libunwind: unwind_phase2(ex_ojb=00629F90): start_ip=0x10018800, func=.anonymous., sp=0x19f22c, lsda=0x1019fe80, personality=0x1006de00
libunwind: unwind_phase2(ex_ojb=00629F90): _URC_INSTALL_CONTEXT
libunwind: unwind_phase2(ex_ojb=00629F90): re-entering user code with ip=0x1001b953, sp=0x19f22c
libunwind: unwind_phase1(ex_ojb=00629F90): pc=0x1006d9d9, start_ip=0x1006d960, func=.anonymous., lsda=0x0, personality=0x0
libunwind: unwind_phase1(ex_ojb=00629F90): pc=0x1001bcfc, start_ip=0x10018800, func=.anonymous., lsda=0x1019fe80, personality=0x1006de00
libunwind: unwind_phase1(ex_ojb=00629F90): calling personality function 1006DE00
libunwind: unwind_phase1(ex_ojb=00629F90): _URC_CONTINUE_UNWIND
libunwind: unwind_phase1(ex_ojb=00629F90): pc=0x10020de9, start_ip=0x10020d20, func=.anonymous., lsda=0x101a017c, personality=0x1006de00
libunwind: unwind_phase1(ex_ojb=00629F90): calling personality function 1006DE00
libunwind: unwind_phase1(ex_ojb=00629F90): _URC_CONTINUE_UNWIND
libunwind: unwind_phase1(ex_ojb=00629F90): pc=0x401435, start_ip=0x401410, func=.anonymous., lsda=0x407000, personality=0x402e38
libunwind: unwind_phase1(ex_ojb=00629F90): calling personality function 00402E38
libunwind: unwind_phase1(ex_ojb=00629F90): _URC_HANDLER_FOUND
libunwind: unwind_phase2(ex_ojb=00629F90)
libunwind: unwind_phase2(ex_ojb=00629F90): start_ip=0x1006d960, func=.anonymous., sp=0x19f20c, lsda=0x0, personality=0x0
libunwind: unwind_phase2(ex_ojb=00629F90): start_ip=0x10018800, func=.anonymous., sp=0x19f22c, lsda=0x1019fe80, personality=0x1006de00
libunwind: unwind_phase2(ex_ojb=00629F90): _URC_INSTALL_CONTEXT
libunwind: unwind_phase2(ex_ojb=00629F90): re-entering user code with ip=0x1001bcc8, sp=0x19f22c
libunwind: unwind_phase2(ex_ojb=00629F90)
libunwind: unwind_phase2(ex_ojb=00629F90): start_ip=0x10018800, func=.anonymous., sp=0x19f22c, lsda=0x1019fe80, personality=0x1006de00
libunwind: unwind_phase2(ex_ojb=00629F90): _URC_CONTINUE_UNWIND
libunwind: unwind_phase2(ex_ojb=00629F90): start_ip=0x10020d20, func=.anonymous., sp=0x19fdfc, lsda=0x101a017c, personality=0x1006de00
libunwind: unwind_phase2(ex_ojb=00629F90): _URC_INSTALL_CONTEXT
libunwind: unwind_phase2(ex_ojb=00629F90): re-entering user code with ip=0x10020ea6, sp=0x19fdfc
libunwind: unwind_phase2(ex_ojb=00629F90)
libunwind: unwind_phase2(ex_ojb=00629F90): start_ip=0x10020d20, func=.anonymous., sp=0x19fdfc, lsda=0x101a017c, personality=0x1006de00
libunwind: unwind_phase2(ex_ojb=00629F90): _URC_CONTINUE_UNWIND
libunwind: unwind_phase2(ex_ojb=00629F90): start_ip=0x401410, func=.anonymous.,sp=0x19fe84, lsda=0x407000, personality=0x402e38
libunwind: unwind_phase2(ex_ojb=00629F90): _URC_INSTALL_CONTEXT
libunwind: unwind_phase2(ex_ojb=00629F90): re-entering user code with ip=0x40144b, sp=0x19fe84
Caught collate_byname<char>::collate_byname failed to construct for test
Type St13runtime_error

@mstorsjo
Owner

Yes, changing debug vs release mode will change things drastically. DWARF for i686 should generally work, more or less.

My hypothesis is that this is related to DW_CFA_GNU_args_size. This is an opcode which is generated very seldom, but seems to be generated mostly in cases with a 4 byte aligned stack. There was another issue with this opcode before, which was discussed and later fixed in https://reviews.llvm.org/D38680. (Before this, hello-exception.cpp didn't work in DWARF mode.)

As an example, have a look at the output from this:

$ i686-w64-mingw32-g++ hello-exception.cpp -S -o - -fdwarf-exceptions
...
__Z7recursei:                           # @_Z7recursei
Lfunc_begin1:
        .cfi_startproc
        .cfi_personality 0, ___gxx_personality_v0
        .cfi_lsda 0, Lexception1
# %bb.0:
        pushl   %ebp
        .cfi_def_cfa_offset 8
        .cfi_offset %ebp, -8
        movl    %esp, %ebp
        .cfi_def_cfa_register %ebp
        subl    $52, %esp
        movl    8(%ebp), %eax
        movl    8(%ebp), %ecx
        leal    -8(%ebp), %edx

The .cfi... directives produce DWARF opcodes that describe the function and explain where and how to find things in order to be able to unwind the call stack.

Now if I build the same example with -O2:

$ i686-w64-mingw32-g++ hello-exception.cpp -S -o - -fdwarf-exceptions -O2
...
__Z7recursei:                           # @_Z7recursei
Lfunc_begin0:
        .cfi_startproc
        .cfi_personality 0, ___gxx_personality_v0
        .cfi_lsda 0, Lexception0
# %bb.0:
        pushl   %edi
        .cfi_def_cfa_offset 8
        pushl   %esi
        .cfi_def_cfa_offset 12
        .cfi_offset %esi, -12
        .cfi_offset %edi, -8
        movl    12(%esp), %edi
        .cfi_escape 0x2e, 0x08
        pushl   %edi
        .cfi_adjust_cfa_offset 4
        pushl   $L_.str.5
        .cfi_adjust_cfa_offset 4
        calll   _printf
        addl    $8, %esp

In this case, the compiler didn't store the old stack pointer in ebp, but instead continuously keeps track of what the offset is to the return pointer on the stack. The .cfi_escape 0x2e, 0x08 directive is what produces a DW_CFA_GNU_args_size opcode. I'm not sure if that is what actually causes the issues in this case, or something else.
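To make the `.cfi_escape 0x2e, 0x08` bytes above easier to read, here is a small sketch that decodes them. It relies on two facts about the DWARF call frame encoding: DW_CFA_GNU_args_size has opcode value 0x2e, and its operand is an unsigned LEB128 number. The function names are hypothetical helpers written for this illustration; they are not taken from libunwind or LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// DWARF call frame opcode value of the GNU extension discussed above.
constexpr std::uint8_t kDwCfaGnuArgsSize = 0x2e;

// Decodes one unsigned LEB128 value starting at bytes[pos] and advances pos
// past it. Each byte contributes 7 bits; the high bit marks continuation.
inline std::uint64_t decode_uleb128(const std::vector<std::uint8_t>& bytes,
                                    std::size_t& pos) {
  std::uint64_t result = 0;
  unsigned shift = 0;
  while (true) {
    assert(pos < bytes.size() && "truncated ULEB128");
    assert(shift < 64 && "ULEB128 value too large");
    const std::uint8_t byte = bytes[pos++];
    result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) break;
    shift += 7;
  }
  return result;
}

// Hypothetical helper: if the escape bytes are exactly one
// DW_CFA_GNU_args_size instruction, stores its operand (the size of the
// pushed arguments in bytes) in size_out and returns true.
inline bool decode_args_size(const std::vector<std::uint8_t>& escape,
                             std::uint64_t& size_out) {
  if (escape.size() < 2 || escape[0] != kDwCfaGnuArgsSize) return false;
  std::size_t pos = 1;
  size_out = decode_uleb128(escape, pos);
  return pos == escape.size();
}
```

For the bytes in the listing this yields an argument size of 8, which appears to match the two 4-byte pushl instructions before calll _printf and the addl $8, %esp after it.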

To debug the issue, you'll need to trace through the exact generated code for all functions between the throw and the catch, analyze what libunwind does when it is stepping through the functions, and at what point it gets out of sync with the real frames.

From your copypasted response, for the case when it fails, it looks like it is correctly unwinding functions for a while, but then it finally declares unw_step() reached bottom => _URC_END_OF_STACK. This looks like the place where libunwind misinterpreted the call stack (most probably read the return address from the wrong place on the stack), and instead of continuing on to the next frame thought that it had finished.
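In a long LIBUNWIND_PRINT_UNWINDING log, one way to find the point described above is to scan for the end-of-stack message and remember the last pc= value seen before it. This is only a sketch: the helper name is hypothetical, and the parsing assumes the exact line format shown in the logs earlier in this thread.

```cpp
#include <cassert>  // only needed for checks by the caller
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the "pc" value of the last frame visited
// before libunwind reported _URC_END_OF_STACK, or an empty string if the log
// never reaches that message.
inline std::string last_pc_before_end_of_stack(
    const std::vector<std::string>& log) {
  const std::string key = "pc=";
  std::string last_pc;
  for (const std::string& line : log) {
    if (line.find("_URC_END_OF_STACK") != std::string::npos) return last_pc;
    const std::size_t start = line.find(key);
    if (start == std::string::npos) continue;  // line carries no pc field
    const std::size_t value = start + key.size();
    const std::size_t end = line.find(',', value);
    last_pc = line.substr(
        value, end == std::string::npos ? std::string::npos : end - value);
  }
  return std::string();
}
```

The returned address is the frame whose unwind info (or the frame just above it) is the first candidate to inspect, since that is where the unwinder stopped finding further frames.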

The backtrace doesn't contain function names, so it's a bit hard to debug from there. Pass -Wl,-Map,map.txt to the linker when linking libc++.dll and libunwind.dll. You may also want to pass -Wl,--image-base,0x<unique> for libc++.dll and libunwind.dll, so that they target different, unique addresses and are actually loaded at the intended address, which makes the map file match. Alternatively, you might want to test this with static linking instead, which might be easier (a single map file, which should be correct).
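Once a map file is available, turning a pc or start_ip value from the unwind log into a function name is a nearest-preceding-symbol lookup. The sketch below assumes the symbol start addresses have already been extracted from the map file into a table; the type and function names, and any symbol names used with it, are hypothetical.

```cpp
#include <cassert>  // only needed for checks by the caller
#include <cstdint>
#include <map>
#include <string>

// Hypothetical symbol table: start address -> symbol name, as it could be
// extracted from a linker map file (the extraction itself is not shown).
using SymbolTable = std::map<std::uint32_t, std::string>;

// Returns the name of the symbol with the greatest start address <= pc,
// or "<unknown>" if pc lies before the first known symbol.
inline std::string symbol_for_pc(const SymbolTable& symbols, std::uint32_t pc) {
  auto it = symbols.upper_bound(pc);  // first symbol starting after pc
  if (it == symbols.begin()) return "<unknown>";
  --it;
  return it->second;
}
```

This only gives meaningful answers if the module is really loaded at the base address the map file was generated for, which is why the image base matters here.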

@mstorsjo
Owner

mstorsjo commented Dec 3, 2018

Just for the record; I tried debugging this a little bit, by hooking up libgcc to handle the exceptions instead of libunwind, and the same issue persists just as before. Therefore, I would say that the bug is in the unwind data produced by Clang, not in libunwind this time.

@SquallATF
Author

Just for the record; I tried debugging this a little bit, by hooking up libgcc to handle the exceptions instead of libunwind, and the same issue persists just as before. Therefore, I would say that the bug is in the unwind data produced by Clang, not in libunwind this time.

I think you are right. I tried building the exception test with g++ and linking it to libunwind, and the test cases run correctly.

@mstorsjo
Owner

mstorsjo commented Dec 3, 2018

Just for the record; I tried debugging this a little bit, by hooking up libgcc to handle the exceptions instead of libunwind, and the same issue persists just as before. Therefore, I would say that the bug is in the unwind data produced by Clang, not in libunwind this time.

I think you are right. I tried building the exception test with g++ and linking it to libunwind, and the test cases run correctly.

That's a good datapoint, but wouldn't be a definite proof either way.

The compiler has got great freedom to generate code in lots of different ways, all of which produce the right result when you run it, and it only has to produce unwind info that matches the code that it generated. So any other compiler which happens to generate code differently, not using whatever unwind construct that happens to be broken/wrong, would run fine. And any option to clang to adjust the generated code could also happen to avoid the bug.

But in this case, neither libunwind nor libgcc can unwind correctly with the unwind data produced by clang/llvm, so it's pretty definite that the unwind info itself is broken.

@SquallATF
Author

Just for the record; I tried debugging this a little bit, by hooking up libgcc to handle the exceptions instead of libunwind, and the same issue persists just as before. Therefore, I would say that the bug is in the unwind data produced by Clang, not in libunwind this time.

I think you are right. I tried building the exception test with g++ and linking it to libunwind, and the test cases run correctly.

That's a good datapoint, but wouldn't be a definite proof either way.

The compiler has got great freedom to generate code in lots of different ways, all of which produce the right result when you run it, and it only has to produce unwind info that matches the code that it generated. So any other compiler which happens to generate code differently, not using whatever unwind construct that happens to be broken/wrong, would run fine. And any option to clang to adjust the generated code could also happen to avoid the bug.

But in this case, neither libunwind nor libgcc can unwind correctly with the unwind data produced by clang/llvm, so it's pretty definite that the unwind info itself is broken.

Sorry, my test was wrong.
The first test used g++ testexception.cpp -lunwind, so only 3 functions were linked to libunwind; libstdc++ was still linked to gcc_s.
Then I tried building libc++ and libc++abi with g++ and tested with

g++ -g -nostdinc++ testexception.cpp -IK:\mingw\x86_64-w64-mingw32\include\c++\v1 -LK:\mingw\x86_64-w64-mingw32\lib  -lunwind  -nodefaultlibs -lc++ -lc++abi  -lmingw32 -lgcc -liconv -lmoldname -lmingwex -lmsvcrt -ladvapi32 -lshell32 -luser32 -lkernel32 -v -lunwind  -lmingwex -lmingw32 -lpthread -lgcc_s -lpsapi

and the result still crashes; the output printed is

terminating with uncaught exception of type std::runtime_error: collate_byname<char>::collate_byname failed to construct for test

If I link without libunwind

g++ -g -nostdinc++ testexception.cpp -IK:\mingw\x86_64-w64-mingw32\include\c++\v1 -LK:\mingw\x86_64-w64-mingw32\lib  -nodefaultlibs -lc++ -lc++abi  -lmingw32 -lgcc -liconv -lmoldname -lmingwex -lmsvcrt -ladvapi32 -lshell32 -luser32 -lkernel32  -lmingwex -lmingw32 -lpthread -lgcc_s

it prints

Caught collate_byname<char>::collate_byname failed to construct for test
Type St13runtime_error

and then it crashes; gdb says the crash is at std::runtime_error::~runtime_error. I think gcc_s unwinding works more correctly than libunwind. Or libunwind is not compatible with gcc.

@mstorsjo
Owner

mstorsjo commented Dec 3, 2018

The first test used g++ testexception.cpp -lunwind, so only 3 functions were linked to libunwind; libstdc++ was still linked to gcc_s.
Then I tried building libc++ and libc++abi with g++ and tested with

g++ -g -nostdinc++ testexception.cpp -IK:\mingw\x86_64-w64-mingw32\include\c++\v1 -LK:\mingw\x86_64-w64-mingw32\lib  -lunwind  -nodefaultlibs -lc++ -lc++abi  -lmingw32 -lgcc -liconv -lmoldname -lmingwex -lmsvcrt -ladvapi32 -lshell32 -luser32 -lkernel32 -v -lunwind  -lmingwex -lmingw32 -lpthread -lgcc_s -lpsapi

and the result still crashes; the output printed is

terminating with uncaught exception of type std::runtime_error: collate_byname<char>::collate_byname failed to construct for test

In a similar setup, does libunwind work with my simpler testcase in hello-exception.cpp? Getting all of the .eh_frame sections right when mixing gcc and clang compiled object files can be a bit problematic as I discovered while testing this; when linking with LLD I had to use https://reviews.llvm.org/D55209 to make them work together. If not, the test doesn't say much else than that you don't have them correctly hooked up.

Although, as I tried to explain, it's not necessarily the same issue in itself, this only says that libunwind doesn't handle the unwind info generated by the compiler for these functions. The unwind info generated by g++ vs clang for libc++ can be wildly different and exercise completely different features of the dwarf interpreter.

If I link without libunwind

g++ -g -nostdinc++ testexception.cpp -IK:\mingw\x86_64-w64-mingw32\include\c++\v1 -LK:\mingw\x86_64-w64-mingw32\lib  -nodefaultlibs -lc++ -lc++abi  -lmingw32 -lgcc -liconv -lmoldname -lmingwex -lmsvcrt -ladvapi32 -lshell32 -luser32 -lkernel32  -lmingwex -lmingw32 -lpthread -lgcc_s

it prints

Caught collate_byname<char>::collate_byname failed to construct for test
Type St13runtime_error

and then it crashes; gdb says the crash is at std::runtime_error::~runtime_error.

That's pretty strange. I guess this is an issue with your setup of libc++ compiled with g++ then?

I think gcc_s unwinding works more correctly than libunwind.

Yes, that's most probably so - libgcc's dwarf unwinding is in very wide use, while libunwind is used on a much smaller scale so far.

or libunwind is not compatible with gcc

There's a very important difference in how they find the unwind data:

libgcc uses a special pair of crtbegin.o and crtend.o, which call __register_frame_info to register the current DLL/EXE's .eh_frame section to libgcc (which can either be a DLL or linked statically), and then when unwinding, the libgcc instance browses the .eh_frame sections that are registered to find the unwind data. This means that if you link libgcc statically, it won't find unwind info from a different DLL since each DLL will register the unwind info only to the libgcc embedded in that DLL.

libunwind works differently; it uses EnumProcessModules to list all loaded DLLs and the current EXE and then dives into the list of sections to find a section named .eh_frame. This means that even if you link libunwind statically, it will find .eh_frame from all loaded DLLs.

However, there's one problem with it. .eh_frame is 9 chars, while the section headers only store 8 chars. If you link with LLD, it will truncate the section name, so that it actually is .eh_fram only, and libunwind will find this. If you link with binutils ld, it will write the full name .eh_frame to the string table and just refer to it in the section header (where the section name is written like /4 or so, where 4 is a byte offset into the string table). But the string table is not loaded when a DLL/EXE is loaded into memory, so in these cases, it's impossible for libunwind to actually figure out where the .eh_frame section is.

So libunwind won't find the .eh_frame section if the DLL is linked with binutils ld. This probably is the root cause you're seeing? If you want to test things further, you can look in src/AddressSpace.hpp in libunwind and make it look for sections with a specific name, if you know exactly what the raw name of the .eh_frame section is in your module. But that can vary from one DLL to another though...
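The section-name limitation described above can be illustrated with a few lines of string handling. This is a sketch of the naming rules only, assuming the 8-byte name field and the "/<decimal offset>" string table reference form mentioned in the explanation; the function names are hypothetical and this is not the code from src/AddressSpace.hpp.

```cpp
#include <cassert>  // only needed for checks by the caller
#include <cstddef>
#include <string>

// Size of the name field in a PE/COFF section header.
constexpr std::size_t kSectionNameFieldSize = 8;

// What ends up in the header field if the linker simply truncates a long
// name, as described above for LLD.
inline std::string truncated_section_name(const std::string& name) {
  return name.substr(0, kSectionNameFieldSize);
}

// True if the raw header field is an indirect reference of the form
// "/<decimal offset>" into the string table, as described above for
// binutils ld.
inline bool is_string_table_reference(const std::string& raw) {
  if (raw.size() < 2 || raw[0] != '/') return false;
  for (std::size_t i = 1; i < raw.size(); ++i) {
    if (raw[i] < '0' || raw[i] > '9') return false;
  }
  return true;
}

// Hypothetical matcher: with only the raw header field available at runtime,
// ".eh_frame" can be recognised in its truncated form but not behind a
// string table reference.
inline bool looks_like_eh_frame(const std::string& raw) {
  return raw == truncated_section_name(".eh_frame");
}
```

With these rules, ".eh_fram" is recognised while "/4" is not, which is the situation described for modules linked with binutils ld.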

So that means that your tests probably don't say anything about libunwind yet.

@jcelerier

jcelerier commented Dec 5, 2018

I'm having a problem that I guess is similar (on x86_64).

An exception is thrown here: https://github.com/grame-cncm/faust/blob/master-dev/compiler/parser/sourcereader.cpp#L360

and caught here : https://github.com/grame-cncm/faust/blob/master-dev/compiler/libcode.cpp#L2089

If I do a catch throw with gdb, I can see the exception being thrown :

Thread 1 hit Catchpoint 1 (exception thrown), 0x0000000141db6cb0 in __cxa_throw ()
(gdb) bt
#0  0x0000000141db6cb0 in __cxa_throw ()
#1  0x000000014017f314 in yyerror (
    msg=0x14cc90 "syntax error, unexpected WIRE")
    at C:/dev/faust/compiler/errors\errormsg.cpp:64
#2  0x00000001404a8ca8 in yyparse () at faustparser.cpp:3161
#3  0x00000001404bcc0c in SourceReader::parseLocal (this=0x53dd70,
    fname=0x5529b0 "C:\\score-sdk\\faust-debug\\faust\\bin/foo.dsp")
    at C:/dev/faust/compiler/parser\sourcereader.cpp:350
#4  0x00000001404bcb2b in SourceReader::parseFile (this=0x53dd70,
    fname=0x552618 "foo.dsp")
    at C:/dev/faust/compiler/parser\sourcereader.cpp:328
#5  0x00000001404bdb9b in SourceReader::getList (this=0x53dd70,
    fname=0x552618 "foo.dsp")
    at C:/dev/faust/compiler/parser\sourcereader.cpp:421
#6  0x00000001404bf67f in SourceReader::expandRec (this=0x53dd70,
    ldef=0x552840, visited=..., lresult=0x54e4c0)
    at C:/dev/faust/compiler/parser\sourcereader.cpp:477
#7  0x00000001404bf0ea in SourceReader::expandList (this=0x53dd70,
    ldef=0x552840) at C:/dev/faust/compiler/parser\sourcereader.cpp:463
#8  0x00000001403f1da0 in parseSourceFiles ()
    at C:/dev/faust/compiler\libcode.cpp:1286
#9  0x00000001403db4ee in compileFaustFactoryAux (argc=3, argv=0x4ff640,
    name=0x141df144b <_ZNSt3__1L19piecewise_constructE+1> "FaustDSP",
    dsp_content=0x0, generate=true) at C:/dev/faust/compiler\libcode.cpp:2018
#10 0x00000001403d9e08 in compileFaustFactory (argc=3, argv=0x4ff640,
    name=0x141df144b <_ZNSt3__1L19piecewise_constructE+1> "FaustDSP",
    dsp_content=0x0, error_msg=..., generate=true)
    at C:/dev/faust/compiler\libcode.cpp:2094
#11 0x00000001403ffa7a in main (argc=3, argv=0x4ff640)
    at C:/dev/faust/compiler\main.cpp:45

but if I continue, at some point while unwinding I get a segfault:

Thread 1 received signal SIGSEGV, Segmentation fault.  
0x00007ffd2491820c in ntdll!RtlLookupFunctionEntry ()
from C:\WINDOWS\SYSTEM32\ntdll.dll

(gdb) bt
#0  0x00007ffd2491820c in ntdll!RtlLookupFunctionEntry ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#1  0x00007ffd2491739c in ntdll!RtlUnwindEx ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#2  0x00000000001725e6 in _Unwind_Resume ()
   from C:\msys64\mingw64\bin\libunwind.dll
#3  0x00000001404bf12a in SourceReader::expandList (this=0x53dd70,
    ldef=0x552840) at C:/dev/faust/compiler/parser\sourcereader.cpp:464
#4  0x00000001403f1da0 in parseSourceFiles ()
    at C:/dev/faust/compiler\libcode.cpp:1286
#5  0x00000001403db4ee in compileFaustFactoryAux (argc=3, argv=0x4ff640,
    name=0x141df144b <_ZNSt3__1L19piecewise_constructE+1> "FaustDSP",
    dsp_content=0x0, generate=true) at C:/dev/faust/compiler\libcode.cpp:2018
#6  0x00000001403d9e08 in compileFaustFactory (argc=3, argv=0x4ff640,
    name=0x141df144b <_ZNSt3__1L19piecewise_constructE+1> "FaustDSP",
    dsp_content=0x0, error_msg=..., generate=true)
    at C:/dev/faust/compiler\libcode.cpp:2094
#7  0x00000001403ffa7a in main (argc=3, argv=0x4ff640)
    at C:/dev/faust/compiler\main.cpp:45

(of course the whole codebase is built with the same options, and it's all in the same executable)

Hope it helps!

@SquallATF
Author

I have redone the g++ test with libc++ and libunwind. I found that the .eh_frame section produced by g++ is named /4, so I used a PE editor to rename that section to .eh_fram. After the rename I ran the test program, and the result is the same as when linking with libgcc_s, so libunwind is correct.
I built a debug version of libc++, and exception handling is correct with both shared and static builds. Then I tried rebuilding libc++ with CMAKE_BUILD_TYPE=RelWithDebInfo, but no luck, it still crashes. Next I tried to find more examples of crashing exceptions, but so far I have only found the case of constructing std::locale incorrectly.

@SquallATF
Author

I tried setting __attribute__((optnone)) on locale::__imp::__imp at locale.cpp#L230, and with that the exception test works correctly.
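For readers who want to try the same kind of bisection on their own code, here is a minimal sketch of disabling optimization for a single function. `optnone` is a Clang function attribute; the macro name and both functions are hypothetical stand-ins, only loosely modelled on a constructor that catches, cleans up and rethrows, and are not the libc++ code.

```cpp
#include <cassert>  // only needed for checks by the caller
#include <stdexcept>
#include <string>

// optnone is Clang-specific; expand to nothing elsewhere so the sketch still
// compiles with other compilers.
#if defined(__clang__)
#define SKETCH_OPTNONE __attribute__((optnone))
#else
#define SKETCH_OPTNONE
#endif

// Hypothetical stand-in for a function that fails part-way through,
// catches the exception to clean up, and then rethrows it.
SKETCH_OPTNONE void construct_or_rethrow(const std::string& name) {
  try {
    throw std::runtime_error("failed to construct for " + name);
  } catch (...) {
    // Cleanup of partially constructed state would go here.
    throw;
  }
}

// Returns true if the rethrown exception reaches the caller's handler.
inline bool rethrow_is_caught(const std::string& name) {
  try {
    construct_or_rethrow(name);
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}
```

If marking one function this way makes a failing test pass, that narrows the search to the unwind info generated for the optimized version of that function.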

@mstorsjo
Owner

mstorsjo commented Dec 6, 2018

I tried setting __attribute__((optnone)) on locale::__imp::__imp at locale.cpp#L230, and with that the exception test works correctly.

That's a great observation. What does the compiler output (-S) for this function look like?

@mstorsjo
Owner

mstorsjo commented Dec 6, 2018

i'm having a problem that I guess is similar (on x86_64).

Your issue is totally unrelated (the current issue is about DWARF, yours is with SEH); can you file it separately?

Can you produce a reduced testcase that is easy for me to reproduce?

If you replace your libunwind.dll with http://martin.st/temp/libgcc/libunwind.dll (which is built from libgcc), does it still behave the same?

@SquallATF
Author

SquallATF commented Dec 6, 2018

I tried setting __attribute__((optnone)) on locale::__imp::__imp at locale.cpp#L230, and with that the exception test works correctly.

That's a great observation. What does the compiler output (-S) for this function look like?

Here is the compiler output; the function name is _ZNSt3__16locale5__impC2ERKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEEj
locale.zip (filename correction: optO2.s was actually built with O3)

I have debugged the catch code.
The first catch at locale.cpp#L265-L271 works, but after the throw at L270 the exception is uncaught.

optnone

Ltmp1761:
	movl	%ebp, (%esp)
	movl	%esi, %ecx
	calll	__ZNSt3__16locale5__imp7installINS_14collate_bynameIcEEEEvPT_
	subl	$4, %esp
Ltmp1762:
	jmp	LBB278_19
LBB278_19:                              # %invoke.cont24
Ltmp1763:
	movl	$16, (%esp)
	calll	__Znwj
Ltmp1764:
........................
Ltmp1896:
	movl	%eax, 8(%esp)
	movl	%edx, 12(%esp)
	jmp	LBB278_97
........................
LBB278_97:                              # %catch
	movl	8(%esp), %eax
	movl	%eax, (%esp)
	calll	___cxa_begin_catch
	movl	$0, 16(%esp)
LBB278_98:                              # %for.cond140
                                        # =>This Inner Loop Header: Depth=1
	movl	16(%esp), %ebp
	movl	%esi, %ecx
	addl	$8, %ecx
	calll	__ZNKSt3__16vectorIPNS_6locale5facetENS_15__sso_allocatorIS3_Lj28EEEE4sizeEv
	cmpl	%eax, %ebp
	jb	LBB278_100
# %bb.99:                               # %for.cond.cleanup144
	jmp	LBB278_106
LBB278_100:                             # %for.body145
                                        #   in Loop: Header=BB278_98 Depth=1
	movl	%esi, %ecx
	addl	$8, %ecx
	movl	16(%esp), %eax
	movl	%eax, (%esp)
	calll	__ZNSt3__16vectorIPNS_6locale5facetENS_15__sso_allocatorIS3_Lj28EEEEixEj
	subl	$4, %esp
# %bb.101:                              # %invoke.cont148
                                        #   in Loop: Header=BB278_98 Depth=1
	cmpl	$0, (%eax)
	je	LBB278_104
# %bb.102:                              # %if.then151
                                        #   in Loop: Header=BB278_98 Depth=1
	movl	%esi, %ecx
	addl	$8, %ecx
	movl	16(%esp), %eax
	movl	%eax, (%esp)
	calll	__ZNSt3__16vectorIPNS_6locale5facetENS_15__sso_allocatorIS3_Lj28EEEEixEj
	subl	$4, %esp
# %bb.103:                              # %invoke.cont153
                                        #   in Loop: Header=BB278_98 Depth=1
	movl	(%eax), %ecx
	calll	__ZNSt3__114__shared_count16__release_sharedEv
LBB278_104:                             # %if.end156
                                        #   in Loop: Header=BB278_98 Depth=1
	jmp	LBB278_105
LBB278_105:                             # %for.inc157
                                        #   in Loop: Header=BB278_98 Depth=1
	movl	16(%esp), %eax
	addl	$1, %eax
	movl	%eax, 16(%esp)
	jmp	LBB278_98
LBB278_106:                             # %for.end159
Ltmp1897:
	calll	___cxa_rethrow
Ltmp1898:
	jmp	LBB278_115
LBB278_107:                             # %lpad160
Ltmp1899:
	movl	%eax, 8(%esp)
	movl	%edx, 12(%esp)
# %bb.108:                              # %ehcleanup
Ltmp1900:
	calll	___cxa_end_catch
........................
	.uleb128 Ltmp1761-Lfunc_begin77 # >> Call Site 5 <<
	.uleb128 Ltmp1764-Ltmp1761      #   Call between Ltmp1761 and Ltmp1764
	.uleb128 Ltmp1896-Lfunc_begin77 #     jumps to Ltmp1896
	.byte	1                       #   On action: 1
........................
	.uleb128 Ltmp1897-Lfunc_begin77 # >> Call Site 45 <<
	.uleb128 Ltmp1898-Ltmp1897      #   Call between Ltmp1897 and Ltmp1898
	.uleb128 Ltmp1899-Lfunc_begin77 #     jumps to Ltmp1899
	.byte	0                       #   On action: cleanup

O3

Ltmp1754:
	.cfi_escape 0x2e, 0x04
	movl	%esi, %ecx
	pushl	%edi
	.cfi_adjust_cfa_offset 4
	calll	__ZNSt3__16locale5__imp7installINS_14collate_bynameIcEEEEvPT_
	.cfi_adjust_cfa_offset -4
Ltmp1755:
# %bb.15:                               # %invoke.cont24
Ltmp1756:
	.cfi_escape 0x2e, 0x04
	pushl	$16
	.cfi_adjust_cfa_offset 4
	calll	__Znwj
	addl	$4, %esp
	.cfi_adjust_cfa_offset -4
Ltmp1757:
........................
Ltmp1872:
	movl	%eax, (%esp)            # 4-byte Spill
LBB275_117:                             # %catch
	.cfi_escape 0x2e, 0x04
	pushl	(%esp)                  # 4-byte Folded Reload
	.cfi_adjust_cfa_offset 4
	calll	___cxa_begin_catch
	addl	$4, %esp
	.cfi_adjust_cfa_offset -4
	movl	8(%esi), %ecx
	cmpl	%ecx, 12(%esi)
	je	LBB275_123
# %bb.118:                              # %for.body145.preheader
	xorl	%edi, %edi
	.p2align	4, 0x90
LBB275_119:                             # %for.body145
                                        # =>This Inner Loop Header: Depth=1
	movl	(%ecx,%edi,4), %ecx
	testl	%ecx, %ecx
	je	LBB275_122
# %bb.120:                              # %if.then151
                                        #   in Loop: Header=BB275_119 Depth=1
	movl	$-1, %eax
	lock		xaddl	%eax, 4(%ecx)
	testl	%eax, %eax
	jne	LBB275_122
# %bb.121:                              # %if.then.i185
                                        #   in Loop: Header=BB275_119 Depth=1
	movl	(%ecx), %eax
	.cfi_escape 0x2e, 0x00
	calll	*8(%eax)
LBB275_122:                             # %for.inc157
                                        #   in Loop: Header=BB275_119 Depth=1
	movl	8(%esi), %ecx
	movl	12(%esi), %eax
	incl	%edi
	subl	%ecx, %eax
	sarl	$2, %eax
	cmpl	%eax, %edi
	jb	LBB275_119
LBB275_123:                             # %for.cond.cleanup144
Ltmp1885:
	.cfi_escape 0x2e, 0x00
	calll	___cxa_rethrow
Ltmp1886:
# %bb.134:                              # %unreachable
LBB275_124:                             # %lpad160
Ltmp1887:
	movl	%eax, %ebp
Ltmp1888:
	.cfi_escape 0x2e, 0x00
	calll	___cxa_end_catch
............................
	.uleb128 Ltmp1754-Lfunc_begin76 # >> Call Site 4 <<
	.uleb128 Ltmp1757-Ltmp1754      #   Call between Ltmp1754 and Ltmp1757
	.uleb128 Ltmp1872-Lfunc_begin76 #     jumps to Ltmp1872
	.byte	1                       #   On action: 1
............................
	.uleb128 Ltmp1885-Lfunc_begin76 # >> Call Site 39 <<
	.uleb128 Ltmp1886-Ltmp1885      #   Call between Ltmp1885 and Ltmp1886
	.uleb128 Ltmp1887-Lfunc_begin76 #     jumps to Ltmp1887
	.byte	0                       #   On action: cleanup

@mstorsjo
Owner

mstorsjo commented Dec 7, 2018

I have debugged the catch code.
The first catch at locale.cpp#L265-L271 works, but after the throw at L270 the exception is uncaught.

That's great progress! I haven't had time to try to read the compiler output yet to see if there's something strange in it, but knowing that the tricky thing is a rethrow in a catch clause, are you able to create a smaller minimal testcase for debugging the issue further?
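
As a starting point for such a testcase, the shape of the code in question (a constructor that catches, releases what it already installed, and rethrows) can be sketched as below. `Resource`, `Holder` and `try_construct` are hypothetical names, not libc++ code, and nothing here is confirmed to reproduce the crash; it only mirrors the catch/cleanup-loop/rethrow structure visible in the assembly.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a facet-like object owned by the constructor.
struct Resource {
    bool released = false;
};

// Mimics the constructor's shape: install items, and if a later step
// throws, release everything installed so far and rethrow.
struct Holder {
    std::vector<Resource*> items;

    Holder(std::vector<Resource>& pool, const std::string& name) {
        try {
            for (Resource& r : pool) {
                items.push_back(&r);
            }
            if (name != "C") {
                throw std::runtime_error("failed to construct for " + name);
            }
        } catch (...) {
            // Cleanup loop inside the handler, then rethrow; on Itanium-ABI
            // targets the bare `throw;` becomes a call to __cxa_rethrow.
            for (Resource* r : items) {
                r->released = true;
            }
            throw;
        }
    }
};

// Returns the message seen by the outer handler, or "" if nothing was thrown.
std::string try_construct(std::vector<Resource>& pool,
                          const std::string& name) {
    try {
        Holder h(pool, name);
    } catch (const std::runtime_error& e) {
        return e.what();
    }
    return "";
}
```

When exception handling works, `try_construct(pool, "test")` returns the error message and every `Resource` in `pool` is marked released; a miscompiled rethrow would instead terminate the process before the outer handler runs.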

@SquallATF
Author

I have debugged the catch code.
The first catch at locale.cpp#L265-L271 works, but after the throw at L270 the exception is uncaught.

That's great progress! I haven't had time to try to read the compiler output yet to see if there's something strange in it, but knowing that the tricky thing is a rethrow in a catch clause, are you able to create a smaller minimal testcase for debugging the issue further?

I tried many ways to simulate this throw case but could not reproduce it.

@mstorsjo
Owner

mstorsjo commented Dec 7, 2018

I have debugged the catch code.
The first catch at locale.cpp#L265-L271 works, but after the throw at L270 the exception is uncaught.

That's great progress! I haven't had time to try to read the compiler output yet to see if there's something strange in it, but knowing that the tricky thing is a rethrow in a catch clause, are you able to create a smaller minimal testcase for debugging the issue further?

I tried many ways to simulate this throw case but could not reproduce it.

Ok, that's a shame - thanks for trying!

@SquallATF
Author

New discovery:
Adding __attribute__((optnone)) to locale::classic() makes the crash go away. locale.cpp#L239 invokes locale::classic(), but disabling optimization on locale::__imp::make_classic() didn't help.

optnone1.zip

With optimization disabled for locale::classic(), the locale::__imp::__imp constructor makes a real call to locale::classic():

Ltmp1749:
# %bb.1:                                # %invoke.cont3
Ltmp1751:
	.cfi_escape 0x2e, 0x00
	calll	__ZNSt3__16locale7classicEv
Ltmp1752:
# %bb.2:                                # %invoke.cont5
	movl	(%eax), %eax
	cmpl	%esi, %eax
	je	LBB271_4
# %bb.3:                                # %if.then.i

With optimization enabled for locale::classic(), the call is inlined into the locale::__imp::__imp constructor:

# %bb.1:                                # %invoke.cont3
	movb	__ZGVZNSt3__16locale7classicEvE1c, %al
	testb	%al, %al
	je	LBB275_2
LBB275_5:                               # %invoke.cont5
	movl	__ZZNSt3__16locale5__imp12make_classicEvE3buf, %eax
	cmpl	%esi, %eax
	je	LBB275_7
LBB275_6:                               # %if.then.i

I can't dig any deeper into the cause because of my limited knowledge.
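
To make the inlined assembly easier to read: the two symbols it touches demangle to the guard variable for a function-local static `c` in locale::classic() and to a function-local static `buf` in locale::__imp::make_classic(). A rough sketch of that pattern is shown below, with hypothetical names (`Impl`, `make_classic_like`, `classic_like`, `init_count`) and not the actual libc++ source.

```cpp
// Hypothetical type; only the shape mirrors the symbols in the assembly.
struct Impl {
    int id;
};

// Counts how often the initializer function runs.
inline int& init_count() {
    static int n = 0;
    return n;
}

Impl& make_classic_like() {
    static Impl buf{42};  // analogous to make_classic()::buf
    ++init_count();
    return buf;
}

const Impl& classic_like() {
    // `c` is initialized on first use. The compiler emits a hidden guard
    // variable for it and tests that guard on every call; when this
    // function is inlined, the guard test appears directly in the caller.
    static const Impl& c = make_classic_like();
    return c;
}
```

This matches the optimized listing, where the inlined code first tests the guard byte and then loads from `buf` directly instead of calling the function.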

@mstorsjo
Owner

mstorsjo commented Dec 7, 2018

Btw, did you see my last comment on the related llvm patch review? The patch has 2 tests, are they duplicates? Should the binary test be left out?

@mstorsjo
Owner

I got time to look closer at this issue, and I managed to narrow it down to a few LLVM optimizations that don't work properly together, namely X86 call frame optimization and tail merging. See https://bugs.llvm.org/show_bug.cgi?id=40012 for details. The issue can be avoided by compiling with -mllvm -no-x86-call-frame-opt or -mllvm -enable-tail-merge=0.

@SquallATF changed the title from "try/cache not work correctly with i686 dwarf exceptions" to "try/catch not work correctly with i686 dwarf exceptions" on Dec 22, 2018
@mstorsjo
Owner

mstorsjo commented May 4, 2019

This issue was fixed upstream in LLVM in https://reviews.llvm.org/D61252 (SVN r359496), which now is included in the pinned LLVM version here, and in the latest prebuilt release (that I just made).

@mstorsjo closed this as completed on May 4, 2019
@jcelerier

thanks for the info !
