Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MemOperands not updated during ISel argument copy elision, may lead to miscompile #89060

Closed
mikaelholmen opened this issue Apr 17, 2024 · 6 comments
Assignees
Labels
bug Indicates an unexpected problem or unintended behavior llvm:SelectionDAG SelectionDAGISel as well miscompilation

Comments

@mikaelholmen
Copy link
Collaborator

Reproduce with: 06eedff

llc bbi-94617_3.ll -o - -debug

bbi-94617_3.ll.gz

If we look at the debug printouts, in "Initial selection DAG:" we get

SelectionDAG has 29 nodes:
  t0: ch,glue = EntryToken
    t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t3: i16 = truncate t2
    t5: i32,ch = CopyFromReg t0, Register:i32 %1
  t6: i16 = truncate t5
    t8: i32,ch = CopyFromReg t0, Register:i32 %2
  t9: i16 = truncate t8
    t11: i32,ch = CopyFromReg t0, Register:i32 %3
  t12: i16 = truncate t11
    t14: i32,ch = CopyFromReg t0, Register:i32 %4
  t15: i16 = truncate t14
    t17: i32,ch = CopyFromReg t0, Register:i32 %5
  t18: i16 = truncate t17
  t21: i16,ch = load<(load (s16) from %fixed-stack.0)> t0, FrameIndex:i64<-1>, undef:i64
  t23: i64 = Constant<0>
      t24: ch = store<(store (s16) into @v_781, align 1)> t21:1, t21, GlobalAddress:i64<ptr @v_781> 0, undef:i64
    t26: ch = store<(store (s16) into %ir.param.addr, align 1)> t24, Constant:i16<666>, FrameIndex:i64<-1>, undef:i64
  t28: ch = X86ISD::RET_GLUE t26, TargetConstant:i32<0>

Especially consider

t21: i16,ch = load<(load (s16) from %fixed-stack.0)> t0, FrameIndex:i64<-1>, undef:i64

and

t26: ch = store<(store (s16) into %ir.param.addr, align 1)> t24, Constant:i16<666>, FrameIndex:i64<-1>, undef:i64

The load and store both access "FrameIndex:i64<-1>", but the MemOperands look like

load (s16) from %fixed-stack.0

and

store (s16) into %ir.param.addr, align 1

so they access exactly the same memory on the stack, but according to the MemOperands it looks like they access different objects? I think this is wrong.

If we look further at the debug printouts at "MI Scheduling" we get

SU(0):   %6:gr16 = MOV16rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s16) from %fixed-stack.0, align 16)
  # preds left       : 0
  # succs left       : 1
  # rdefs left       : 0
  Latency            : 4
  Depth              : 0
  Height             : 4
  Successors:
    SU(2): Data Latency=4 Reg=%6
  Single Issue       : false;
[...]
SU(2):   MOV16mr %7:gr64, 1, $noreg, 0, $noreg, %6:gr16 :: (store (s16) into @v_781, align 1)
  # preds left       : 2
  # succs left       : 0
  # rdefs left       : 0
  Latency            : 1
  Depth              : 4
  Height             : 0
  Predecessors:
    SU(1): Data Latency=4 Reg=%7
    SU(0): Data Latency=4 Reg=%6
  Single Issue       : false;
SU(3):   MOV16mi %fixed-stack.0, 1, $noreg, 0, $noreg, 666 :: (store (s16) into %ir.param.addr, align 16)
  # preds left       : 0
  # succs left       : 0
  # rdefs left       : 0
  Latency            : 1
  Depth              : 0
  Height             : 0
  Single Issue       : false;

So there is no dependency whatsoever between the load, SU(0) and the store SU(3), even if they access the same memory.

On my downstream target this then lead to a miscompile when the scheduler choose to change the order of the load and store.

I think the problem is that SelectionDAGISel::LowerArguments and tryToElideArgumentCopy does rewrites and change the FI used in some instructions, but they leave the MemOperands unchanged.

This seems to be old but it recently lead to a miscompile for my target when the load and store suddenly executed in the wrong order.

@mikaelholmen mikaelholmen added bug Indicates an unexpected problem or unintended behavior miscompilation labels Apr 17, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 17, 2024

@llvm/issue-subscribers-bug

Author: None (mikaelholmen)

Reproduce with: 06eedff ``` llc bbi-94617_3.ll -o - -debug ``` [bbi-94617_3.ll.gz](https://github.com/llvm/llvm-project/files/15010509/bbi-94617_3.ll.gz)

If we look at the debug printouts, in "Initial selection DAG:" we get

SelectionDAG has 29 nodes:
  t0: ch,glue = EntryToken
    t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t3: i16 = truncate t2
    t5: i32,ch = CopyFromReg t0, Register:i32 %1
  t6: i16 = truncate t5
    t8: i32,ch = CopyFromReg t0, Register:i32 %2
  t9: i16 = truncate t8
    t11: i32,ch = CopyFromReg t0, Register:i32 %3
  t12: i16 = truncate t11
    t14: i32,ch = CopyFromReg t0, Register:i32 %4
  t15: i16 = truncate t14
    t17: i32,ch = CopyFromReg t0, Register:i32 %5
  t18: i16 = truncate t17
  t21: i16,ch = load&lt;(load (s16) from %fixed-stack.0)&gt; t0, FrameIndex:i64&lt;-1&gt;, undef:i64
  t23: i64 = Constant&lt;0&gt;
      t24: ch = store&lt;(store (s16) into @<!-- -->v_781, align 1)&gt; t21:1, t21, GlobalAddress:i64&lt;ptr @<!-- -->v_781&gt; 0, undef:i64
    t26: ch = store&lt;(store (s16) into %ir.param.addr, align 1)&gt; t24, Constant:i16&lt;666&gt;, FrameIndex:i64&lt;-1&gt;, undef:i64
  t28: ch = X86ISD::RET_GLUE t26, TargetConstant:i32&lt;0&gt;

Especially consider

t21: i16,ch = load&lt;(load (s16) from %fixed-stack.0)&gt; t0, FrameIndex:i64&lt;-1&gt;, undef:i64

and

t26: ch = store&lt;(store (s16) into %ir.param.addr, align 1)&gt; t24, Constant:i16&lt;666&gt;, FrameIndex:i64&lt;-1&gt;, undef:i64

The load and store both access "FrameIndex:i64<-1>", but the MemOperands look like

load (s16) from %fixed-stack.0

and

store (s16) into %ir.param.addr, align 1

so they access exactly the same memory on the stack, but according to the MemOperands it looks like they access different objects? I think this is wrong.

If we look further at the debug printouts at "MI Scheduling" we get

SU(0):   %6:gr16 = MOV16rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s16) from %fixed-stack.0, align 16)
  # preds left       : 0
  # succs left       : 1
  # rdefs left       : 0
  Latency            : 4
  Depth              : 0
  Height             : 4
  Successors:
    SU(2): Data Latency=4 Reg=%6
  Single Issue       : false;
[...]
SU(2):   MOV16mr %7:gr64, 1, $noreg, 0, $noreg, %6:gr16 :: (store (s16) into @<!-- -->v_781, align 1)
  # preds left       : 2
  # succs left       : 0
  # rdefs left       : 0
  Latency            : 1
  Depth              : 4
  Height             : 0
  Predecessors:
    SU(1): Data Latency=4 Reg=%7
    SU(0): Data Latency=4 Reg=%6
  Single Issue       : false;
SU(3):   MOV16mi %fixed-stack.0, 1, $noreg, 0, $noreg, 666 :: (store (s16) into %ir.param.addr, align 16)
  # preds left       : 0
  # succs left       : 0
  # rdefs left       : 0
  Latency            : 1
  Depth              : 0
  Height             : 0
  Single Issue       : false;

So there is no dependency whatsoever between the load, SU(0) and the store SU(3), even if they access the same memory.

On my downstream target this then lead to a miscompile when the scheduler choose to change the order of the load and store.

I think the problem is that SelectionDAGISel::LowerArguments and tryToElideArgumentCopy does rewrites and change the FI used in some instructions, but they leave the MemOperands unchanged.

This seems to be old but it recently lead to a miscompile for my target when the load and store suddenly executed in the wrong order.

@mikaelholmen mikaelholmen added the llvm:SelectionDAG SelectionDAGISel as well label Apr 17, 2024
@bjope
Copy link
Collaborator

bjope commented Apr 21, 2024

One defensive appoach might be to mark the fixed stack object as "isAliased" when doing the argument copy elison. Kind of like this (we also need to implement MachineFrameInfo::setIsAliasedObjectIndex):

diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index c9080d4fee35..be730515c767 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -11225,6 +11225,7 @@ static void tryToElideArgumentCopy(
   });
   MFI.RemoveStackObject(OldIndex);
   MFI.setIsImmutableObjectIndex(FixedIndex, false);
+  MFI.setIsAliasedObjectIndex(FixedIndex, true);
   AllocaIndex = FixedIndex;
   ArgCopyElisionFrameIndexMap.insert({OldIndex, FixedIndex});
   for (SDValue ArgVal : ArgVals)

I think that at least in theory that should make the FixedStackPseudoSourceValue appear as if it aliases with any IR reference.

A more advanced solution would need to eliminate all references to the IR variable somehow (e.g. when lowering loads/stores etc) to make sure we only find MachinePointerInfo:s that use the corresponding FixedStackPseudoSourceValue. That seems to be more complicated, and I have no idea if it is worth the trouble.

On downstream application C code and benchmarks I could not see any negative impact of the more defensive solution, while it seemed to fix the miscompile found in fuzzy testing. But maybe thing are different for other languages/targets/applications.

@bjope
Copy link
Collaborator

bjope commented Apr 21, 2024

Just for reference as I talked about it in the previous comment:
The reasoning behind MachineFrameInfo::isAliased seems to originate from this commit

commit 0815a05fd77d2ee71eb552c91e7351242d1b6aa8
Author: Hal Finkel <hfinkel@anl.gov>
Date:   Sat Aug 16 00:17:02 2014 +0000

    Make isAliased property for fixed-offset stack objects adjustable
    
    We used to assume that any fixed-offset stack object was not aliased. This
    meant that no IR value could point to the memory contained in such an object.
    This is a reasonable default, but is not a universally-correct
    target-independent fact. For example, on PowerPC (both Darwin and non-Darwin),
    some byval arguments are allocated at fixed offsets by the ABI. These, however,
    certainly can be pointed to by IR values. This change moves the 'isAliased'
    logic out of FixedStackPseudoSourceValue and into MFI, and allows the isAliased
    property to be overridden for fixed-offset objects.
    
    This will be used by an upcoming commit to the PowerPC backend to fix PR20280.
    
    No functionality change intended (the behavior of
    FixedStackPseudoSourceValue::isAliased has been made more conservative for
    callers that don't pass an MFI object, but I don't see any in-tree callers that
    do that).
    
    llvm-svn: 215794

@bjope
Copy link
Collaborator

bjope commented Apr 22, 2024

I think this could be a reproducer showing the same bug. Using hexagon as in-tree target:

; RUN: llc --mtriple hexagon-- -O3 %s

define i32 @f(i32, i32, i32, i32, i32, i32,i32, i32, i32, i32, i32, i32, i32, i32 %q1, i32 %a, i32 %q2) {
  %alloca = alloca i32
  store i32 %a, ptr %alloca
  store i32 666, ptr %alloca
  %x = sub i32 %q1, %q2
  %y = xor i32 %x, %a
  ret i32 %y
}

Print after all gives this:

# *** IR Dump After Hexagon DAG->DAG Pattern Instruction Selection (hexagon-isel) ***:
# Machine code for function f: IsSSA, TracksLiveness
Frame Objects:
  fi#-10: size=4, align=4, fixed, at location [SP+44]
  fi#-9: size=4, align=8, fixed, at location [SP+40]
  fi#-8: size=4, align=4, fixed, at location [SP+36]
  fi#-7: size=4, align=8, fixed, at location [SP+32]
  fi#-6: size=4, align=4, fixed, at location [SP+28]
  fi#-5: size=4, align=8, fixed, at location [SP+24]
  fi#-4: size=4, align=4, fixed, at location [SP+20]
  fi#-3: size=4, align=8, fixed, at location [SP+16]
  fi#-2: size=4, align=4, fixed, at location [SP+12]
  fi#-1: size=4, align=8, fixed, at location [SP+8]
  fi#0: dead

bb.0 (%ir-block.13):
  %6:intregs = L2_loadri_io %fixed-stack.1, 0 :: (load (s32) from %fixed-stack.1, align 8)
  %7:intregs = L2_loadri_io %fixed-stack.0, 0 :: (load (s32) from %fixed-stack.0)
  %8:intregs = L2_loadri_io %fixed-stack.2, 0 :: (load (s32) from %fixed-stack.2)
  S4_storeiri_io %fixed-stack.1, 0, 666 :: (store (s32) into %ir.alloca, align 8)
  %9:intregs = A2_sub killed %8:intregs, killed %7:intregs
  %10:intregs = A2_xor killed %9:intregs, killed %6:intregs
  $r0 = COPY %10:intregs
  PS_jmpret $r31, implicit-def dead $pc, implicit $r0

# End machine code for function f.

  ...

# *** IR Dump After Stack Frame Layout Analysis (stack-frame-layout) ***:
# Machine code for function f: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten, FailsVerification, TracksDebugUserValues
Frame Objects:
  fi#-10: size=4, align=4, fixed, at location [SP+44]
  fi#-9: size=4, align=8, fixed, at location [SP+40]
  fi#-8: size=4, align=4, fixed, at location [SP+36]
  fi#-7: size=4, align=8, fixed, at location [SP+32]
  fi#-6: size=4, align=4, fixed, at location [SP+28]
  fi#-5: size=4, align=8, fixed, at location [SP+24]
  fi#-4: size=4, align=4, fixed, at location [SP+20]
  fi#-3: size=4, align=8, fixed, at location [SP+16]
  fi#-2: size=4, align=4, fixed, at location [SP+12]
  fi#-1: size=4, align=8, fixed, at location [SP+8]
  fi#0: dead

bb.0 (%ir-block.13):
  BUNDLE implicit-def $r0, implicit $r29 {
    renamable $r0 = L2_loadri_io $r29, 36 :: (load (s32) from %fixed-stack.0)
    S4_storeiri_io $r29, 32, 666 :: (store (s32) into %ir.alloca, align 8)
  }
  BUNDLE implicit-def $r1, implicit-def $r2, implicit killed $r29 {
    renamable $r1 = L2_loadri_io $r29, 28 :: (load (s32) from %fixed-stack.2)
    renamable $r2 = L2_loadri_io killed $r29, 32 :: (load (s32) from %fixed-stack.1, align 8)
  }
  renamable $r0 = A2_sub killed renamable $r1, renamable $r0
  BUNDLE implicit-def dead $r0, implicit-def dead $pc, implicit $r0, implicit killed $r2, implicit killed $r31 {
    renamable $r0 = A2_xor renamable $r0, killed renamable $r2
    PS_jmpret killed $r31, implicit-def dead $pc, implicit internal killed $r0
  }

# End machine code for function f.

So at the end the store S4_storeiri_io $r29, 32, 666 has been re-orderd with the load L2_loadri_io killed $r29, 32. So $r2 will be 666 when doing the A2_xor. But it should really be the value of the input argument %a.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 23, 2024

@llvm/issue-subscribers-backend-hexagon

Author: None (mikaelholmen)

Reproduce with: 06eedff ``` llc bbi-94617_3.ll -o - -debug ``` [bbi-94617_3.ll.gz](https://github.com/llvm/llvm-project/files/15010509/bbi-94617_3.ll.gz)

If we look at the debug printouts, in "Initial selection DAG:" we get

SelectionDAG has 29 nodes:
  t0: ch,glue = EntryToken
    t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t3: i16 = truncate t2
    t5: i32,ch = CopyFromReg t0, Register:i32 %1
  t6: i16 = truncate t5
    t8: i32,ch = CopyFromReg t0, Register:i32 %2
  t9: i16 = truncate t8
    t11: i32,ch = CopyFromReg t0, Register:i32 %3
  t12: i16 = truncate t11
    t14: i32,ch = CopyFromReg t0, Register:i32 %4
  t15: i16 = truncate t14
    t17: i32,ch = CopyFromReg t0, Register:i32 %5
  t18: i16 = truncate t17
  t21: i16,ch = load&lt;(load (s16) from %fixed-stack.0)&gt; t0, FrameIndex:i64&lt;-1&gt;, undef:i64
  t23: i64 = Constant&lt;0&gt;
      t24: ch = store&lt;(store (s16) into @<!-- -->v_781, align 1)&gt; t21:1, t21, GlobalAddress:i64&lt;ptr @<!-- -->v_781&gt; 0, undef:i64
    t26: ch = store&lt;(store (s16) into %ir.param.addr, align 1)&gt; t24, Constant:i16&lt;666&gt;, FrameIndex:i64&lt;-1&gt;, undef:i64
  t28: ch = X86ISD::RET_GLUE t26, TargetConstant:i32&lt;0&gt;

Especially consider

t21: i16,ch = load&lt;(load (s16) from %fixed-stack.0)&gt; t0, FrameIndex:i64&lt;-1&gt;, undef:i64

and

t26: ch = store&lt;(store (s16) into %ir.param.addr, align 1)&gt; t24, Constant:i16&lt;666&gt;, FrameIndex:i64&lt;-1&gt;, undef:i64

The load and store both access "FrameIndex:i64<-1>", but the MemOperands look like

load (s16) from %fixed-stack.0

and

store (s16) into %ir.param.addr, align 1

so they access exactly the same memory on the stack, but according to the MemOperands it looks like they access different objects? I think this is wrong.

If we look further at the debug printouts at "MI Scheduling" we get

SU(0):   %6:gr16 = MOV16rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s16) from %fixed-stack.0, align 16)
  # preds left       : 0
  # succs left       : 1
  # rdefs left       : 0
  Latency            : 4
  Depth              : 0
  Height             : 4
  Successors:
    SU(2): Data Latency=4 Reg=%6
  Single Issue       : false;
[...]
SU(2):   MOV16mr %7:gr64, 1, $noreg, 0, $noreg, %6:gr16 :: (store (s16) into @<!-- -->v_781, align 1)
  # preds left       : 2
  # succs left       : 0
  # rdefs left       : 0
  Latency            : 1
  Depth              : 4
  Height             : 0
  Predecessors:
    SU(1): Data Latency=4 Reg=%7
    SU(0): Data Latency=4 Reg=%6
  Single Issue       : false;
SU(3):   MOV16mi %fixed-stack.0, 1, $noreg, 0, $noreg, 666 :: (store (s16) into %ir.param.addr, align 16)
  # preds left       : 0
  # succs left       : 0
  # rdefs left       : 0
  Latency            : 1
  Depth              : 0
  Height             : 0
  Single Issue       : false;

So there is no dependency whatsoever between the load, SU(0) and the store SU(3), even if they access the same memory.

On my downstream target this then lead to a miscompile when the scheduler choose to change the order of the load and store.

I think the problem is that SelectionDAGISel::LowerArguments and tryToElideArgumentCopy does rewrites and change the FI used in some instructions, but they leave the MemOperands unchanged.

This seems to be old but it recently lead to a miscompile for my target when the load and store suddenly executed in the wrong order.

bjope added a commit to bjope/llvm-project that referenced this issue Apr 23, 2024
Adding test case related to
  llvm#89060

It shows that after argument copy elison the scheduler may reorder
a load of the input argument and a store to the same fixed stack
entry (the fixed stack entry that is reused for the local variable).
bjope added a commit to bjope/llvm-project that referenced this issue Apr 23, 2024
This is a fix for miscompiles reported in
  llvm#89060

After argument copy elison the IR value for the eliminated alloca
is aliasing with the fixed stack object. This patch is making sure
that we mark the fixed stack object as being aliased with IR values
to avoid that for example schedulers are reordering accesses to
the fixed stack object. This could otherwise happen when there is a
mix of MemOperands refering the shared fixed stack slow via both
the IR value for the elided alloca, and via a fixed stack pseudo
source value (as would be the case when lowering the arguments).
@bjope bjope self-assigned this Apr 23, 2024
bjope added a commit that referenced this issue Apr 23, 2024
Adding test case related to
  #89060

It shows that after argument copy elison the scheduler may reorder
a load of the input argument and a store to the same fixed stack
entry (the fixed stack entry that is reused for the local variable).
bjope added a commit to bjope/llvm-project that referenced this issue Apr 23, 2024
This is a fix for miscompiles reported in
  llvm#89060

After argument copy elison the IR value for the eliminated alloca
is aliasing with the fixed stack object. This patch is making sure
that we mark the fixed stack object as being aliased with IR values
to avoid that for example schedulers are reordering accesses to
the fixed stack object. This could otherwise happen when there is a
mix of MemOperands refering the shared fixed stack slow via both
the IR value for the elided alloca, and via a fixed stack pseudo
source value (as would be the case when lowering the arguments).
bjope added a commit that referenced this issue Apr 23, 2024
…89712)

This is a fix for miscompiles reported in
  #89060

After argument copy elison the IR value for the eliminated alloca
is aliasing with the fixed stack object. This patch is making sure
that we mark the fixed stack object as being aliased with IR values
to avoid that for example schedulers are reordering accesses to
the fixed stack object. This could otherwise happen when there is a
mix of MemOperands refering the shared fixed stack slow via both
the IR value for the elided alloca, and via a fixed stack pseudo
source value (as would be the case when lowering the arguments).
@bjope
Copy link
Collaborator

bjope commented Apr 23, 2024

Solved by d8b253b

@bjope bjope closed this as completed Apr 23, 2024
AtariDreams pushed a commit to AtariDreams/llvm-project that referenced this issue May 4, 2024
…lvm#89712)

This is a fix for miscompiles reported in
  llvm#89060

After argument copy elison the IR value for the eliminated alloca
is aliasing with the fixed stack object. This patch is making sure
that we mark the fixed stack object as being aliased with IR values
to avoid that for example schedulers are reordering accesses to
the fixed stack object. This could otherwise happen when there is a
mix of MemOperands refering the shared fixed stack slow via both
the IR value for the elided alloca, and via a fixed stack pseudo
source value (as would be the case when lowering the arguments).

(cherry picked from commit d8b253b)
tstellar pushed a commit to AtariDreams/llvm-project that referenced this issue May 9, 2024
…lvm#89712)

This is a fix for miscompiles reported in
  llvm#89060

After argument copy elison the IR value for the eliminated alloca
is aliasing with the fixed stack object. This patch is making sure
that we mark the fixed stack object as being aliased with IR values
to avoid that for example schedulers are reordering accesses to
the fixed stack object. This could otherwise happen when there is a
mix of MemOperands refering the shared fixed stack slow via both
the IR value for the elided alloca, and via a fixed stack pseudo
source value (as would be the case when lowering the arguments).

(cherry picked from commit d8b253b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior llvm:SelectionDAG SelectionDAGISel as well miscompilation
Projects
None yet
Development

No branches or pull requests

5 participants