New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[StackColoring] Handle SEH catch object stack slots conservatively #66988
Conversation
@llvm/pr-subscribers-backend-x86 ChangesThe write to the SEH catch object happens before cleanuppads are executed, while the first reference to the object will typically be in a catchpad. If we make use of first-use analysis, we may end up allocating an alloca used inside the cleanuppad and the catch object at the same stack offset, which would be incorrect. https://reviews.llvm.org/D86673 was a previous attempt to fix it. It used the heuristic "a slot loaded in a WinEH pad and never written" to detect catch objects. However, because it checks for more than one load (while probably more than zero was intended), the fix does not actually work. The general approach also seems dubious to me, so this patch reverts that change entirely, and instead marks all catch object slots as conservative (i.e. excluded from first-use analysis) based on the WinEHFuncInfo. As far as I can tell we don't need any heuristics here, we know exactly which slots are affected. Fixes #66984. Full diff: https://github.com/llvm/llvm-project/pull/66988.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/StackColoring.cpp b/llvm/lib/CodeGen/StackColoring.cpp
index 4d69b6104f68372..d90c3e95f477944 100644
--- a/llvm/lib/CodeGen/StackColoring.cpp
+++ b/llvm/lib/CodeGen/StackColoring.cpp
@@ -370,37 +370,6 @@ STATISTIC(EscapedAllocas, "Number of allocas that escaped the lifetime region");
// If in RPO ordering chosen to walk the CFG we happen to visit the b[k]
// before visiting the memcpy block (which will contain the lifetime start
// for "b" then it will appear that 'b' has a degenerate lifetime.
-//
-// Handle Windows Exception with LifetimeStartOnFirstUse:
-// -----------------
-//
-// There was a bug for using LifetimeStartOnFirstUse in win32.
-// class Type1 {
-// ...
-// ~Type1(){ write memory;}
-// }
-// ...
-// try{
-// Type1 V
-// ...
-// } catch (Type2 X){
-// ...
-// }
-// For variable X in catch(X), we put point pX=&(&X) into ConservativeSlots
-// to prevent using LifetimeStartOnFirstUse. Because pX may merged with
-// object V which may call destructor after implicitly writing pX. All these
-// are done in C++ EH runtime libs (through CxxThrowException), and can't
-// obviously check it in IR level.
-//
-// The loader of pX, without obvious writing IR, is usually the first LOAD MI
-// in EHPad, Some like:
-// bb.x.catch.i (landing-pad, ehfunclet-entry):
-// ; predecessors: %bb...
-// successors: %bb...
-// %n:gr32 = MOV32rm %stack.pX ...
-// ...
-// The Type2** %stack.pX will only be written in EH runtime libs, so we
-// check the StoreSlots to screen it out.
namespace {
@@ -462,9 +431,6 @@ class StackColoring : public MachineFunctionPass {
/// slots lifetime-start-on-first-use is disabled).
BitVector ConservativeSlots;
- /// Record the FI slots referenced by a 'may write to memory'.
- BitVector StoreSlots;
-
/// Number of iterations taken during data flow analysis.
unsigned NumIterations;
@@ -660,13 +626,10 @@ unsigned StackColoring::collectMarkers(unsigned NumSlot) {
InterestingSlots.resize(NumSlot);
ConservativeSlots.clear();
ConservativeSlots.resize(NumSlot);
- StoreSlots.clear();
- StoreSlots.resize(NumSlot);
// number of start and end lifetime ops for each slot
SmallVector<int, 8> NumStartLifetimes(NumSlot, 0);
SmallVector<int, 8> NumEndLifetimes(NumSlot, 0);
- SmallVector<int, 8> NumLoadInCatchPad(NumSlot, 0);
// Step 1: collect markers and populate the "InterestingSlots"
// and "ConservativeSlots" sets.
@@ -722,13 +685,6 @@ unsigned StackColoring::collectMarkers(unsigned NumSlot) {
if (! BetweenStartEnd.test(Slot)) {
ConservativeSlots.set(Slot);
}
- // Here we check the StoreSlots to screen catch point out. For more
- // information, please refer "Handle Windows Exception with
- // LifetimeStartOnFirstUse" at the head of this file.
- if (MI.mayStore())
- StoreSlots.set(Slot);
- if (MF->getWinEHFuncInfo() && MBB->isEHPad() && MI.mayLoad())
- NumLoadInCatchPad[Slot] += 1;
}
}
}
@@ -739,14 +695,23 @@ unsigned StackColoring::collectMarkers(unsigned NumSlot) {
return 0;
}
- // 1) PR27903: slots with multiple start or end lifetime ops are not
+ // PR27903: slots with multiple start or end lifetime ops are not
// safe to enable for "lifetime-start-on-first-use".
- // 2) And also not safe for variable X in catch(X) in windows.
for (unsigned slot = 0; slot < NumSlot; ++slot) {
- if (NumStartLifetimes[slot] > 1 || NumEndLifetimes[slot] > 1 ||
- (NumLoadInCatchPad[slot] > 1 && !StoreSlots.test(slot)))
+ if (NumStartLifetimes[slot] > 1 || NumEndLifetimes[slot] > 1)
ConservativeSlots.set(slot);
}
+
+ // The write to the catch object by the personality function is not propely
+ // modeled in IR: It happens before any cleanuppads are executed, even if the
+ // first mention of the catch object is in a catchpad. As such, mark catch
+ // object slots as conservative, so they are excluded from first-use analysis.
+ if (WinEHFuncInfo *EHInfo = MF->getWinEHFuncInfo())
+ for (WinEHTryBlockMapEntry &TBME : EHInfo->TryBlockMap)
+ for (WinEHHandlerType &H : TBME.HandlerArray)
+ if (H.CatchObj.FrameIndex != std::numeric_limits<int>::max())
+ ConservativeSlots.set(H.CatchObj.FrameIndex);
+
LLVM_DEBUG(dumpBV("Conservative slots", ConservativeSlots));
// Step 2: compute begin/end sets for each block
diff --git a/llvm/test/CodeGen/X86/stack-coloring-seh.ll b/llvm/test/CodeGen/X86/stack-coloring-seh.ll
index 3995bfa8101d69d..892c81a12dc1acf 100644
--- a/llvm/test/CodeGen/X86/stack-coloring-seh.ll
+++ b/llvm/test/CodeGen/X86/stack-coloring-seh.ll
@@ -3,7 +3,7 @@
@type_info = external global ptr
-; FIXME: This is a miscompile.
+; Make sure %a1 and %a2 don't share the same stack offset.
define void @pr66984(ptr %arg) personality ptr @__CxxFrameHandler3 {
; CHECK-LABEL: pr66984:
; CHECK: # %bb.0: # %bb
@@ -12,7 +12,7 @@ define void @pr66984(ptr %arg) personality ptr @__CxxFrameHandler3 {
; CHECK-NEXT: pushl %ebx
; CHECK-NEXT: pushl %edi
; CHECK-NEXT: pushl %esi
-; CHECK-NEXT: subl $20, %esp
+; CHECK-NEXT: subl $24, %esp
; CHECK-NEXT: movl %esp, -28(%ebp)
; CHECK-NEXT: movl $-1, -16(%ebp)
; CHECK-NEXT: leal -24(%ebp), %eax
@@ -31,7 +31,7 @@ define void @pr66984(ptr %arg) personality ptr @__CxxFrameHandler3 {
; CHECK-NEXT: $ehgcr_0_4:
; CHECK-NEXT: movl -24(%ebp), %eax
; CHECK-NEXT: movl %eax, %fs:0
-; CHECK-NEXT: addl $20, %esp
+; CHECK-NEXT: addl $24, %esp
; CHECK-NEXT: popl %esi
; CHECK-NEXT: popl %edi
; CHECK-NEXT: popl %ebx
@@ -47,7 +47,7 @@ define void @pr66984(ptr %arg) personality ptr @__CxxFrameHandler3 {
; CHECK-NEXT: pushl %ebp
; CHECK-NEXT: addl $12, %ebp
; CHECK-NEXT: movl %esp, -28(%ebp)
-; CHECK-NEXT: movl -32(%ebp), %ecx
+; CHECK-NEXT: movl -36(%ebp), %ecx
; CHECK-NEXT: movl $2, -16(%ebp)
; CHECK-NEXT: calll _cleanup
; CHECK-NEXT: movl $LBB0_3, %eax
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good.
ConservativeSlots.set(slot); | ||
} | ||
|
||
// The write to the catch object by the personality function is not propely | ||
// modeled in IR: It happens before any cleanuppads are executed, even if the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think proper modelling would mean passing the catch object as a bundled argument to every invoke that unwinds to the catchpad. Every invoke would mention all catch objects of every catch it could unwind to. It would fix the issue, but I worry that the extra clutter would actually make the IR harder for humans to read and understand.
If we find the time to work on the model, catchswitches are another area for improvement. At the time, we didn't have callbr
, so we added catchswitch
as an unwind edge multiplexer. Having multiple successor edges coming off of an invoke which unwinds to a catchswitch serving multiple catch blocks is more accurate, and would eliminate BBs with no insertion point, which are an ongoing source of WinEH bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also do some sort of fixup during ISel, I think? The analysis here is looking at MachineInstrs, so we could add frameindex references to the call instructions.
(I think it's generally a good idea to try to get the MachineIR into the form we want, then consider if we want to make corresponding IR changes.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would definitely work from a modelling perspective, and it is what we do for the CFG. We add in all the missing edges from the invokes. I'm not clear on how to add no-op FrameIndex operands to the call instructions, but that's a decent direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect to see problems in passes other than StackColoring?
The somewhat special situation here is that while the invoke effectively writes to the stack slot, code can only rely on that write inside the catchpad. So if we have something like this:
*e = x
invoke (unmodeled potential write to e)
cleanuppad
v = *e
catchpad(e)
and then v = *e
gets GVNd to x
because the write at the invoke is not modeled, I would not really consider that a miscompile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've filed #67110 to keep track of this idea.
@@ -3,7 +3,7 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit of a nit, but "seh" in the test name usually refers to either:
- Windows call frame info, the unwind opcodes, powered by the .seh_ assembler directive family
- Functionality related to
__try
, non-call exceptions, or EHa, which makes C++ cleanups run for non-call exceptions
For issues related to MSVC C++ EH and _CxxFrameHandler3, I usually just called it "wineh", maybe "wineh-cxx", but you can see we have varying opinions in the test suite currently:
$ git grep -l CxxFrameHand ../llvm/test/CodeGen/X86/
../llvm/test/CodeGen/X86/branchfolding-catchpads.ll
../llvm/test/CodeGen/X86/catchpad-dynamic-alloca.ll
../llvm/test/CodeGen/X86/catchpad-lifetime.ll
../llvm/test/CodeGen/X86/catchpad-realign-savexmm.ll
../llvm/test/CodeGen/X86/catchpad-regmask.ll
../llvm/test/CodeGen/X86/catchpad-reuse.ll
../llvm/test/CodeGen/X86/catchpad-weight.ll
../llvm/test/CodeGen/X86/catchret-fallthrough.ll
../llvm/test/CodeGen/X86/catchret-regmask.ll
../llvm/test/CodeGen/X86/cfguard-checks-funclet.ll
../llvm/test/CodeGen/X86/cleanuppad-inalloca.ll
../llvm/test/CodeGen/X86/cleanuppad-large-codemodel.ll
../llvm/test/CodeGen/X86/cleanuppad-realign.ll
../llvm/test/CodeGen/X86/deopt-bundles.ll
../llvm/test/CodeGen/X86/ehcontguard.ll
../llvm/test/CodeGen/X86/funclet-layout.ll
../llvm/test/CodeGen/X86/machine-licm-vs-wineh.mir
../llvm/test/CodeGen/X86/noreturn-call-win64.ll
../llvm/test/CodeGen/X86/pop-stack-cleanup-msvc.ll
../llvm/test/CodeGen/X86/pr26757.ll
../llvm/test/CodeGen/X86/pr27501.ll
../llvm/test/CodeGen/X86/pr42064.ll
../llvm/test/CodeGen/X86/pr48064.mir
../llvm/test/CodeGen/X86/regalloc-spill-at-ehpad.ll
../llvm/test/CodeGen/X86/seh-unwind-inline-asm-codegen.ll
../llvm/test/CodeGen/X86/stack-coloring-seh.ll
../llvm/test/CodeGen/X86/tail-dup-catchret.ll
../llvm/test/CodeGen/X86/tail-merge-wineh.ll
../llvm/test/CodeGen/X86/win-catchpad-csrs.ll
../llvm/test/CodeGen/X86/win-catchpad-nested-cxx.ll
../llvm/test/CodeGen/X86/win-catchpad-varargs.ll
../llvm/test/CodeGen/X86/win-catchpad.ll
../llvm/test/CodeGen/X86/win-cleanuppad.ll
../llvm/test/CodeGen/X86/win-funclet-cfi.ll
../llvm/test/CodeGen/X86/win32-eh-available-externally.ll
../llvm/test/CodeGen/X86/win32-eh-states.ll
../llvm/test/CodeGen/X86/win32-eh.ll
../llvm/test/CodeGen/X86/win64-eh-empty-block-2.mir
../llvm/test/CodeGen/X86/win64-funclet-preisel-intrinsics.ll
../llvm/test/CodeGen/X86/win64-funclet-savexmm.ll
../llvm/test/CodeGen/X86/win64_eh_leaf2.ll
../llvm/test/CodeGen/X86/windows-seh-EHa-CppCatchDotDotDot.ll
../llvm/test/CodeGen/X86/windows-seh-EHa-CppCondiTemps.ll
../llvm/test/CodeGen/X86/windows-seh-EHa-CppDtors01.ll
../llvm/test/CodeGen/X86/wineh-no-ehpads.ll
There isn't a huge amount of consistency here, do whatever you feel is best.
The write to the SEH catch object happens before cleanuppads are executed, while the first reference to the object will typically be in a catchpad. If we make use of first-use analysis, we may end up allocating an alloca used inside the cleanuppad and the catch object at the same stack offset, which would be incorrect. https://reviews.llvm.org/D86673 was a previous attempt to fix it. It used the heuristic "a slot loaded in a WinEH pad and never written" to detect catch objects. However, because it checks for more than one load (while probably more than zero was intended), the fix does not actually work. The general approach also seems dubious to me, so this patch reverts that change entirely, and instead marks all catch object slots as conservative (i.e. excluded from first-use analysis) based on the WinEHFuncInfo. As far as I can tell we don't need any heuristics here, we know exactly which slots are affected. Fixes llvm#66984.
a47fa1e
to
b3cb4f0
Compare
This is a followup to #66988. The implementation there did not account for the possibility of the catch object frame index referrring to a fixed object, which is the case on win64.
The write to the SEH catch object happens before cleanuppads are executed, while the first reference to the object will typically be in a catchpad.
If we make use of first-use analysis, we may end up allocating an alloca used inside the cleanuppad and the catch object at the same stack offset, which would be incorrect.
https://reviews.llvm.org/D86673 was a previous attempt to fix it. It used the heuristic "a slot loaded in a WinEH pad and never written" to detect catch objects. However, because it checks for more than one load (while probably more than zero was intended), the fix does not actually work.
The general approach also seems dubious to me, so this patch reverts that change entirely, and instead marks all catch object slots as conservative (i.e. excluded from first-use analysis) based on the WinEHFuncInfo. As far as I can tell we don't need any heuristics here, we know exactly which slots are affected.
Fixes #66984.