Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PowerPC] don't eliminate the signext if the input is zero extended #84419

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

chenzheng1030
Copy link
Collaborator

@b is -1, so 1 is expected to store to @g in block g.exit
But with trunk llvm, 0 is stored instead.

At line https://github.com/chenzheng1030/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp#L542-L554
optimizes:

    t39: i64 = LDtoc<Mem:(load (s64) from got)> TargetGlobalAddress:i64<ptr @b> 0, Register:i64 $x2
  t4: i32,ch = LWZ<Mem:(dereferenceable load (s32) from @b, !tbaa !3)> TargetConstant:i64<0>, t39, t0
  t48: i64 = EXTSW_32_64 t4

        t7: ch = CopyToReg t0, Register:i64 %0, t48

  t2: i64,ch = CopyFromReg t0, Register:i64 %0
          t5: i32 = EXTRACT_SUBREG t2, TargetConstant:i32<1>

to:

  %3:gprc = LWZ 0, killed %2:g8rc_and_g8rc_nox0 :: (dereferenceable load (s32) from @b, !tbaa !3)
  %24:gprc = COPY %3:gprc

Fixes #74915

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 8, 2024

@llvm/pr-subscribers-backend-powerpc

Author: Chen Zheng (chenzheng1030)

Changes

@<!-- -->b is -1, so 1 is expected to store to @<!-- -->g in block g.exit
But with trunk llvm, 0 is stored instead.

At line https://github.com/chenzheng1030/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp#L542-L554
optimizes:

    t39: i64 = LDtoc&lt;Mem:(load (s64) from got)&gt; TargetGlobalAddress:i64&lt;ptr @<!-- -->b&gt; 0, Register:i64 $x2
  t4: i32,ch = LWZ&lt;Mem:(dereferenceable load (s32) from @<!-- -->b, !tbaa !3)&gt; TargetConstant:i64&lt;0&gt;, t39, t0
  t48: i64 = EXTSW_32_64 t4

        t7: ch = CopyToReg t0, Register:i64 %0, t48

  t2: i64,ch = CopyFromReg t0, Register:i64 %0
          t5: i32 = EXTRACT_SUBREG t2, TargetConstant:i32&lt;1&gt;

to:

  %3:gprc = LWZ 0, killed %2:g8rc_and_g8rc_nox0 :: (dereferenceable load (s32) from @<!-- -->b, !tbaa !3)
  %24:gprc = COPY %3:gprc

Fixes #74915


Full diff: https://github.com/llvm/llvm-project/pull/84419.diff

2 Files Affected:

  • (modified) llvm/lib/Target/PowerPC/PPCInstrInfo.cpp (+5)
  • (modified) llvm/test/CodeGen/PowerPC/pr74951.ll (+4-4)
diff --git a/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp b/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
index 5d37e929f8755e..7e0012c694d6b5 100644
--- a/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
+++ b/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
@@ -1041,6 +1041,11 @@ bool PPCInstrInfo::isCoalescableExtInstr(const MachineInstr &MI,
   case PPC::EXTSW_32:
   case PPC::EXTSW_32_64:
     SrcReg = MI.getOperand(1).getReg();
+    // On 64-bit targets, extension can not be eliminated if the input is zero
+    // extended. The input before zero extention may be a negative value.
+    if (Subtarget.isPPC64() &&
+        isZeroExtended(SrcReg, &MI.getMF()->getRegInfo()))
+      return false;
     DstReg = MI.getOperand(0).getReg();
     SubIdx = PPC::sub_32;
     return true;
diff --git a/llvm/test/CodeGen/PowerPC/pr74951.ll b/llvm/test/CodeGen/PowerPC/pr74951.ll
index c1b2e3ee0dd68b..c4c98fe8705864 100644
--- a/llvm/test/CodeGen/PowerPC/pr74951.ll
+++ b/llvm/test/CodeGen/PowerPC/pr74951.ll
@@ -11,12 +11,12 @@ define noundef signext i32 @main() {
 ; CHECK-LABEL: main:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    ld r3, L..C0(r2) # @b
-; CHECK-NEXT:    lwz r3, 0(r3)
-; CHECK-NEXT:    andi. r4, r3, 65535
+; CHECK-NEXT:    lwz r4, 0(r3)
+; CHECK-NEXT:    andi. r3, r4, 65535
+; CHECK-NEXT:    extsw r3, r4
 ; CHECK-NEXT:    bne cr0, L..BB0_4
 ; CHECK-NEXT:  # %bb.1: # %lor.rhs.i.i
-; CHECK-NEXT:    extsw r4, r3
-; CHECK-NEXT:    neg r5, r4
+; CHECK-NEXT:    neg r5, r3
 ; CHECK-NEXT:    rldicl r5, r5, 1, 63
 ; CHECK-NEXT:    xori r5, r5, 1
 ; CHECK-NEXT:    cmpw r4, r5

@bzEq
Copy link
Collaborator

bzEq commented Mar 8, 2024

I'm not really catching where miscompilation occurs, could you please elaborate? To my understanding, the original transformation looks correct, https://alive2.llvm.org/ce/z/aHPwPK.

// extended. The input before zero extention may be a negative value.
if (Subtarget.isPPC64() &&
isZeroExtended(SrcReg, &MI.getMF()->getRegInfo()))
return false;
DstReg = MI.getOperand(0).getReg();
SubIdx = PPC::sub_32;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this transformation is incorrect when SrcReg is i16 or narrower. If SrcReg is i32, this transforamtion is still valid.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this transformation is incorrect when SrcReg is i16 or narrower. If SrcReg is i32, this transforamtion is still valid.

There will be no i16 at this instremit phase(after isel).

The bug is in the description

    t39: i64 = LDtoc<Mem:(load (s64) from got)> TargetGlobalAddress:i64<ptr @b> 0, Register:i64 $x2
  t4: i32,ch = LWZ<Mem:(dereferenceable load (s32) from @b, !tbaa !3)> TargetConstant:i64<0>, t39, t0
  t48: i64 = EXTSW_32_64 t4

        t7: ch = CopyToReg t0, Register:i64 %0, t48

  t2: i64,ch = CopyFromReg t0, Register:i64 %0
          t5: i32 = EXTRACT_SUBREG t2, TargetConstant:i32<1>

to:

  %3:gprc = LWZ 0, killed %2:g8rc_and_g8rc_nox0 :: (dereferenceable load (s32) from @b, !tbaa !3)
  %24:gprc = COPY %3:gprc

The specific thing on PPC is EXTSW_32_64(LWZ). If LWZ loads a negative value. Then higher 32bit of the EXTSW_32_64 result will be all 1 instead of 0.

(Due to such specialness on PPC, the optimization is guarded under a target hook TII->isCoalescableExtInstr().)

After the COPY optimization:

%3:gprc = LWZ 0, killed %2:g8rc_and_g8rc_nox0 :: (dereferenceable load (s32) from @b, !tbaa !3)
`%24:gprc  = COPY %3:gprc`, the higher 32-bit is undefined. In the final assembly, the higher 32-bits are all 0 from LWZ.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Invalid code generated by Powerpc64 -O1, -Oz
3 participants