Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CrashRecoveryTest fails when built with clang-cl on 32-bit windows #44042

Closed
zmodem opened this issue Jan 28, 2020 · 19 comments
Closed

CrashRecoveryTest fails when built with clang-cl on 32-bit windows #44042

zmodem opened this issue Jan 28, 2020 · 19 comments
Labels
bugzilla Issues migrated from bugzilla

Comments

@zmodem
Copy link
Collaborator

zmodem commented Jan 28, 2020

Bugzilla Link 44697
Resolution FIXED
Resolved on Feb 11, 2020 02:08
Version trunk
OS Linux
Blocks #43900
CC @dyung,@jmorse,@MatzeB,@rnk,@stella.stamenova

Extended Description

C:\src\llvm.monorepo\build.32_2019_selfhost>set CC=c:\src\llvm.monorepo\build.32_2019\bin\clang-cl.exe
C:\src\llvm.monorepo\build.32_2019_selfhost>set CXX=%CC%
C:\src\llvm.monorepo\build.32_2019_selfhost>cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_ENABLE_PROJECTS="llvm" ..\llvm

C:\src\llvm.monorepo\build.32_2019_selfhost>ninja SupportTests

C:\src\llvm.monorepo\build.32_2019_selfhost>unittests\Support\SupportTests.exe --gtest_filter=CrashRecovery*
Note: Google Test filter = CrashRecovery*
[==========] Running 5 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 5 tests from CrashRecoveryTest
[ RUN ] CrashRecoveryTest.Basic
unknown file: error: SEH exception with code 0x3221225477 thrown in â─δë±Φ�.
[ FAILED ] CrashRecoveryTest.Basic (2 ms)
[ RUN ] CrashRecoveryTest.Cleanup
unknown file: error: SEH exception with code 0x3221225477 thrown in â─δë±Φ�.
[ FAILED ] CrashRecoveryTest.Cleanup (0 ms)
[ RUN ] CrashRecoveryTest.DumpStackCleanup
0x011568E0 (0x01158560 0x011568E0 0x00000001 0x023CF598)
0x014B2ED2 (0x01561420 0x045B50A0 0x0453F620 0x01158514)
0x0157D1C1 (0x045B50A0 0x01562380 0x015F46E5 0x01562380)
0x015622F5 (0x5E30635E 0x00000000 0x0454D190 0x0453F620)
0x01563C35 (0x0156BD2C 0x005B1000 0x0454D190 0x00000000)
0x0156B405 (0x00E56001 0x00007FFF 0x015F4D63 0x023CF800)
0x0157D581 (0x0453F620 0x0156B170 0x015F4D7C 0x00000000)
unknown file: error: SEH exception with code 0x3221225477 thrown in â─δë±Φ�.
[ FAILED ] CrashRecoveryTest.DumpStackCleanup (32 ms)
[ RUN ] CrashRecoveryTest.RaiseException
unknown file: error: SEH exception with code 0x3221225477 thrown in â─δë±Φ�.
[ FAILED ] CrashRecoveryTest.RaiseException (0 ms)
[ RUN ] CrashRecoveryTest.CallOutputDebugString
[ OK ] CrashRecoveryTest.CallOutputDebugString (0 ms)
[----------] 5 tests from CrashRecoveryTest (41 ms total)

[----------] Global test environment tear-down
[==========] 5 tests from 1 test case ran. (48 ms total)
[ PASSED ] 1 test.
[ FAILED ] 4 tests, listed below:
[ FAILED ] CrashRecoveryTest.Basic
[ FAILED ] CrashRecoveryTest.Cleanup
[ FAILED ] CrashRecoveryTest.DumpStackCleanup
[ FAILED ] CrashRecoveryTest.RaiseException

4 FAILED TESTS

I'm guessing this might be a problem with clang-cl's SEH support rather than with CrashRecovery, but maybe it was uncovered because of the new in-process-cc1 stuff.

@zmodem
Copy link
Collaborator Author

zmodem commented Jan 28, 2020

Bisection points to this change:

commit a1f1699
Author: Alexandre Ganea alexandre.ganea@ubisoft.com
Date: Sat Jan 11 15:27:07 2020 -0500

[Support] Optionally call signal handlers when a function wrapped by the the CrashRecoveryContext fails

This patch allows for handling a failure inside a CrashRecoveryContext in the same way as the global exception/signal handler. A failure will have the same side-effect, such as cleanup of temporarty file, printing callstack, calling relevant signal handlers, and finally returning an exception code. This is an optional feature, disabled by default.
This is a support patch for D69825.

Differential Revision: https://reviews.llvm.org/D70568

llvm/include/llvm/Support/CrashRecoveryContext.h | 8 +++
llvm/include/llvm/Support/Signals.h | 9 ++++
llvm/lib/Support/CrashRecoveryContext.cpp | 65 +++++++++++++++++++-----
llvm/lib/Support/Unix/Signals.inc | 16 ++++++
llvm/lib/Support/Windows/Signals.inc | 57 +++++++--------------
llvm/unittests/Support/CrashRecoveryTest.cpp | 30 +++++++++++
6 files changed, 132 insertions(+), 53 deletions(-)

But since the test passes when built with MSVC, I'm guessing the problem is really in clang's SEH support? Maybe something regressed there, and this change uncovered it? Also it's interesting that it seems to work in 64-bit builds but not 32-bit.

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 28, 2020

Would you have a chance to try using VEH instead?

https://github.com/llvm/llvm-project/blob/master/llvm/lib/Support/CrashRecoveryContext.cpp#L177

Change to #if defined(_MSC_VER) && !defined(clang)

@zmodem
Copy link
Collaborator Author

zmodem commented Jan 28, 2020

It seems we fail to catch the exception. And it seems the problem is the GetExceptionCode() call in the exception handler. If I comment out that, we catch the exceptions successfully (but the test doesn't pass because we don't set RetCode correctly):

//RetCode = GetExceptionCode();

Would you have a chance to try using VEH instead?

I could try, but we want SEH to work too :-)

@dyung
Copy link
Collaborator

dyung commented Jan 28, 2020

We also saw this test failing on our internal Windows build bot after the change in comment 1.

@zmodem
Copy link
Collaborator Author

zmodem commented Jan 28, 2020

Here's a reduced repro:

#include <excpt.h>
#include <stdio.h>

static void nullDeref() { *(volatile int *)0x10 = 0; }

static bool InvokeFunctionCall(void (*Fn)(void), bool DumpStackAndCleanup, int &RetCode) {
__try {
Fn();
} __except (EXCEPTION_EXECUTE_HANDLER) {
RetCode = GetExceptionCode();
return false;
}
return true;
}

int main() {
int RetCode = 0;
__try {
InvokeFunctionCall(&nullDeref, false, RetCode);
} __except(EXCEPTION_EXECUTE_HANDLER) {
printf("i'm not supposed to run!\n");
}
printf("RetCode: %d\n", RetCode);
return 0;
}

C:\src\llvm.monorepo\build.32_2019>bin\clang-cl /O2 \src\tmp\a.cc && a.exe
i'm not supposed to run!
RetCode: 0

Reid, when you have a moment, can you take a look?

@zmodem
Copy link
Collaborator Author

zmodem commented Jan 29, 2020

I've pushed 31e0769 to unbreak the test for now, but I hope we can get the SEH problem fixed before the 10 release.

@rnk
Copy link
Collaborator

rnk commented Jan 30, 2020

There is a second crash in the __except block because a load is hoisted above an instruction that sets up EBP:

LBB2_1: # %__except.ret
movl 16(%ebp), %ecx # Load RetCode from parameters (bad)
movl -24(%ebp), %esp # Restore ESP
addl $12, %ebp # Restore EBP
movl -44(%ebp), %eax
movl $0, -40(%ebp) # 4-byte Folded Spill
movl %eax, (%ecx) # Crash trying to store RetCode
jmp LBB2_2

Some pass needs to know that it can't hoist a reload from a frame index above these instructions that re-establish the stack frame.

@rnk
Copy link
Collaborator

rnk commented Jan 30, 2020

The register allocator inserts this load:

*** IR Dump After Greedy Register Allocator ***:

Machine code for function ?InvokeFunctionCall@@YA_NP6AXXZ_NAAH@Z: NoPHIs, TracksLiveness

...

304B bb.1.__except.ret (landing-pad):
; predecessors: %bb.0
successors: %bb.2(0x80000000); %bb.2(100.00%)

308B %13:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load 4 from %fixed-stack.0)
312B %11:gr32 = COPY %13:gr32
320B EH_RESTORE

We need a better way to "glue" EH_RESTORE to the top of the BB, or instead insert EH_RESTORE later.

@rnk
Copy link
Collaborator

rnk commented Jan 30, 2020

This actually repros in clang 4.0 according to godbolt, so we've had this bug since forever.

@rnk
Copy link
Collaborator

rnk commented Jan 31, 2020

Pending fix: https://reviews.llvm.org/D73752

@stella.stamenova
Copy link
Mannequin

stella.stamenova mannequin commented Jan 31, 2020

I can see this test fail on our internal Windows buildbot even with the fix in https://reviews.llvm.org/D73752.

In our case it reproduces in Release (but not Debug) builds on Windows for x64 and the only test failing is CrashRecoveryTest.DumpStackCleanup.

@rnk
Copy link
Collaborator

rnk commented Jan 31, 2020

I can see this test fail on our internal Windows buildbot even with the fix
in https://reviews.llvm.org/D73752.

In our case it reproduces in Release (but not Debug) builds on Windows for
x64 and the only test failing is CrashRecoveryTest.DumpStackCleanup.

What compiler and version is the internal buildbot using, exactly? All released versions of clang-cl miscompile this testcase, so unless your bot builds clang and then rebuilds LLVM with the just-built clang and runs this test, you will still see the failure.

@stella.stamenova
Copy link
Mannequin

stella.stamenova mannequin commented Jan 31, 2020

I can see this test fail on our internal Windows buildbot even with the fix
in https://reviews.llvm.org/D73752.

In our case it reproduces in Release (but not Debug) builds on Windows for
x64 and the only test failing is CrashRecoveryTest.DumpStackCleanup.

What compiler and version is the internal buildbot using, exactly? All
released versions of clang-cl miscompile this testcase, so unless your bot
builds clang and then rebuilds LLVM with the just-built clang and runs this
test, you will still see the failure.

We build the llvm-project with MSVC:

-- The C compiler identification is MSVC 19.16.27035.0
-- The CXX compiler identification is MSVC 19.16.27035.0

And then I assume that the test cases get built with the just-built clang as that should be the default, right?

@rnk
Copy link
Collaborator

rnk commented Jan 31, 2020

I can repro that failure:
https://reviews.llvm.org/P8188

The code crashes a second time here:
EXPECT_NE(CRC.RetCode, 0);

0:000> u eip
SupportTests!CrashRecoveryTest_DumpStackCleanup_Test::TestBody+0x7e8 [C:\src\llvm-project\llvm\unittests\Support\CrashRecoveryTest.cpp @ 86]:
00007ff6`72b9ebc8 397580 cmp dword ptr [rbp-80h],esi

0:000> r rbp
rbp=0000000000000000

RBP should presumably not be null at this point. That seems like it could be a compiler bug, but maybe we did something odd to make that happen.

@zmodem
Copy link
Collaborator Author

zmodem commented Feb 5, 2020

Pending fix: https://reviews.llvm.org/D73752

That one landed, but since it's fixing an older problem and we have a good work-around on the branch, I'll pass on it for 10.x.

Comment 14 makes it sound like there is another problem too though?

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 5, 2020

Hans: There this also: https://reviews.llvm.org/D73809

@zmodem
Copy link
Collaborator Author

zmodem commented Feb 5, 2020

Hans: There this also: https://reviews.llvm.org/D73809

Thanks! I see you put some comments on it, so I'll follow along and merge when it's all done.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 5, 2020

I think you can merge it as it is for 10.0, the patch fixes the bug. My comments are nice-to-haves to clean up the code, we can do that later.

@zmodem
Copy link
Collaborator Author

zmodem commented Feb 11, 2020

I think you can merge it as it is for 10.0, the patch fixes the bug. My
comments are nice-to-haves to clean up the code, we can do that later.

Okay, pushed to 10.x as dbe9c3a

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

4 participants