Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MachineCopyPropagation] When the source of PreviousCopy is undef, we cannot replace sub register #74682

Closed
wants to merge 3 commits into from

Conversation

DianQK
Copy link
Member

@DianQK DianQK commented Dec 7, 2023

Fixes #74680.

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 7, 2023

@llvm/pr-subscribers-llvm-regalloc

Author: DianQK (DianQK)

Changes

Fixes #74680.


Full diff: https://github.com/llvm/llvm-project/pull/74682.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineCopyPropagation.cpp (+3)
  • (added) llvm/test/CodeGen/AArch64/machine-cp-undef.mir (+145)
diff --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index a032b31a1fc7c..fdbdaa0a6d641 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -443,6 +443,9 @@ static bool isNopCopy(const MachineInstr &PreviousCopy, MCRegister Src,
     return true;
   if (!TRI->isSubRegister(PreviousSrc, Src))
     return false;
+  // When the source of PreviousCopy is undef, we cannot replace sub register.
+  if (CopyOperands->Source->isUndef())
+    return false;
   unsigned SubIdx = TRI->getSubRegIndex(PreviousSrc, Src);
   return SubIdx == TRI->getSubRegIndex(PreviousDef, Def);
 }
diff --git a/llvm/test/CodeGen/AArch64/machine-cp-undef.mir b/llvm/test/CodeGen/AArch64/machine-cp-undef.mir
new file mode 100644
index 0000000000000..015b01fcc30b0
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/machine-cp-undef.mir
@@ -0,0 +1,145 @@
+# RUN: llc -o - %s -O3 -mcpu=apple-m1 -mtriple=arm64-apple-macos | FileCheck %s
+
+--- |
+  target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
+  target triple = "arm64-apple-macosx11.0.0"
+  
+  ; Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+  define fastcc noundef zeroext i1 @foo(i32 %self.0.val, i8 %self.4.val) unnamed_addr #0 {
+  start:
+    switch i32 %self.0.val, label %bb2 [
+      i32 0, label %bb9
+      i32 1, label %bb4
+      i32 2, label %bb5
+      i32 3, label %bb1
+    ]
+  
+  bb2:                                              ; preds = %start
+    unreachable
+  
+  bb4:                                              ; preds = %start
+    br label %bb9
+  
+  bb5:                                              ; preds = %start
+    br label %bb9
+  
+  bb1:                                              ; preds = %start
+    %trunc.not = icmp eq i8 %self.4.val, 0
+    br label %bb9
+  
+  bb9:                                              ; preds = %bb1, %bb5, %bb4, %start
+    %_0.0.off0 = phi i1 [ true, %bb5 ], [ true, %bb4 ], [ false, %start ], [ %trunc.not, %bb1 ]
+    ret i1 %_0.0.off0
+  }
+  
+  attributes #0 = { mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) "frame-pointer"="non-leaf" "target-cpu"="apple-m1" }
+  
+  !llvm.module.flags = !{!0, !1}
+  !llvm.ident = !{!2}
+  
+  !0 = !{i32 8, !"PIC Level", i32 2}
+  !1 = !{i32 7, !"PIE Level", i32 2}
+  !2 = !{!"rustc version 1.76.0-nightly (0e2dac837 2023-12-04)"}
+
+...
+---
+name:            foo
+alignment:       4
+exposesReturnsTwice: false
+legalized:       false
+regBankSelected: false
+selected:        false
+failedISel:      false
+tracksRegLiveness: true
+hasWinCFI:       false
+callsEHReturn:   false
+callsUnwindInit: false
+hasEHCatchret:   false
+hasEHScopes:     false
+hasEHFunclets:   false
+isOutlined:      false
+debugInstrRef:   false
+failsVerification: false
+tracksDebugUserValues: true
+registers:       []
+liveins:
+  - { reg: '$w0', virtual-reg: '' }
+  - { reg: '$w1', virtual-reg: '' }
+frameInfo:
+  isFrameAddressTaken: false
+  isReturnAddressTaken: false
+  hasStackMap:     false
+  hasPatchPoint:   false
+  stackSize:       0
+  offsetAdjustment: 0
+  maxAlignment:    1
+  adjustsStack:    false
+  hasCalls:        false
+  stackProtector:  ''
+  functionContext: ''
+  maxCallFrameSize: 0
+  cvBytesOfCalleeSavedRegisters: 0
+  hasOpaqueSPAdjustment: false
+  hasVAStart:      false
+  hasMustTailInVarArgFunc: false
+  hasTailCall:     false
+  localFrameSize:  0
+  savePoint:       ''
+  restorePoint:    ''
+fixedStack:      []
+stack:           []
+entry_values:    []
+callSites:       []
+debugValueSubstitutions: []
+constants:       []
+machineFunctionInfo: {}
+jumpTable:
+  kind:            label-difference32
+  entries:
+    - id:              0
+      blocks:          [ '%bb.5', '%bb.2', '%bb.3', '%bb.4' ]
+body:             |
+  bb.0.start:
+    successors: %bb.1(0x80000000)
+    liveins: $w0, $w1
+  
+    ; CHECK: mov	w8, w0
+    ; CHECK-NOT: mov	x8, x0
+    ; CHECK: ldrb	w11, [x9, x8]
+    renamable $w8 = COPY $w0
+    renamable $w8 = ORRWrs $wzr, killed renamable $w8, 0, implicit-def $x8
+    renamable $w0 = COPY $wzr
+  
+  bb.1.start:
+    successors: %bb.5(0x20000000), %bb.2(0x20000000), %bb.3(0x20000000), %bb.4(0x20000000)
+    liveins: $w0, $w1, $x8
+  
+    renamable $x9 = MOVaddrJT target-flags(aarch64-page) %jump-table.0, target-flags(aarch64-pageoff, aarch64-nc) %jump-table.0
+    early-clobber renamable $x10, dead early-clobber renamable $x11 = JumpTableDest32 killed renamable $x9, killed renamable $x8, %jump-table.0
+    BR killed renamable $x10
+  
+  bb.2.bb4:
+    successors: %bb.5(0x80000000)
+  
+    renamable $w0 = MOVi32imm 1
+    B %bb.5
+  
+  bb.3.bb5:
+    successors: %bb.5(0x80000000)
+  
+    renamable $w0 = MOVi32imm 1
+    B %bb.5
+  
+  bb.4.bb1:
+    successors: %bb.5(0x80000000)
+    liveins: $w1
+  
+    dead $wzr = ANDSWri killed renamable $w1, 7, implicit-def $nzcv
+    renamable $w0 = CSINCWr $wzr, $wzr, 1, implicit killed $nzcv
+  
+  bb.5.bb9:
+    liveins: $w0
+  
+    RET_ReallyLR implicit $w0
+
+...

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 7, 2023

@llvm/pr-subscribers-backend-aarch64

Author: DianQK (DianQK)

Changes

Fixes #74680.


Full diff: https://github.com/llvm/llvm-project/pull/74682.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineCopyPropagation.cpp (+3)
  • (added) llvm/test/CodeGen/AArch64/machine-cp-undef.mir (+145)
diff --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index a032b31a1fc7c..fdbdaa0a6d641 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -443,6 +443,9 @@ static bool isNopCopy(const MachineInstr &PreviousCopy, MCRegister Src,
     return true;
   if (!TRI->isSubRegister(PreviousSrc, Src))
     return false;
+  // When the source of PreviousCopy is undef, we cannot replace sub register.
+  if (CopyOperands->Source->isUndef())
+    return false;
   unsigned SubIdx = TRI->getSubRegIndex(PreviousSrc, Src);
   return SubIdx == TRI->getSubRegIndex(PreviousDef, Def);
 }
diff --git a/llvm/test/CodeGen/AArch64/machine-cp-undef.mir b/llvm/test/CodeGen/AArch64/machine-cp-undef.mir
new file mode 100644
index 0000000000000..015b01fcc30b0
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/machine-cp-undef.mir
@@ -0,0 +1,145 @@
+# RUN: llc -o - %s -O3 -mcpu=apple-m1 -mtriple=arm64-apple-macos | FileCheck %s
+
+--- |
+  target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
+  target triple = "arm64-apple-macosx11.0.0"
+  
+  ; Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+  define fastcc noundef zeroext i1 @foo(i32 %self.0.val, i8 %self.4.val) unnamed_addr #0 {
+  start:
+    switch i32 %self.0.val, label %bb2 [
+      i32 0, label %bb9
+      i32 1, label %bb4
+      i32 2, label %bb5
+      i32 3, label %bb1
+    ]
+  
+  bb2:                                              ; preds = %start
+    unreachable
+  
+  bb4:                                              ; preds = %start
+    br label %bb9
+  
+  bb5:                                              ; preds = %start
+    br label %bb9
+  
+  bb1:                                              ; preds = %start
+    %trunc.not = icmp eq i8 %self.4.val, 0
+    br label %bb9
+  
+  bb9:                                              ; preds = %bb1, %bb5, %bb4, %start
+    %_0.0.off0 = phi i1 [ true, %bb5 ], [ true, %bb4 ], [ false, %start ], [ %trunc.not, %bb1 ]
+    ret i1 %_0.0.off0
+  }
+  
+  attributes #0 = { mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) "frame-pointer"="non-leaf" "target-cpu"="apple-m1" }
+  
+  !llvm.module.flags = !{!0, !1}
+  !llvm.ident = !{!2}
+  
+  !0 = !{i32 8, !"PIC Level", i32 2}
+  !1 = !{i32 7, !"PIE Level", i32 2}
+  !2 = !{!"rustc version 1.76.0-nightly (0e2dac837 2023-12-04)"}
+
+...
+---
+name:            foo
+alignment:       4
+exposesReturnsTwice: false
+legalized:       false
+regBankSelected: false
+selected:        false
+failedISel:      false
+tracksRegLiveness: true
+hasWinCFI:       false
+callsEHReturn:   false
+callsUnwindInit: false
+hasEHCatchret:   false
+hasEHScopes:     false
+hasEHFunclets:   false
+isOutlined:      false
+debugInstrRef:   false
+failsVerification: false
+tracksDebugUserValues: true
+registers:       []
+liveins:
+  - { reg: '$w0', virtual-reg: '' }
+  - { reg: '$w1', virtual-reg: '' }
+frameInfo:
+  isFrameAddressTaken: false
+  isReturnAddressTaken: false
+  hasStackMap:     false
+  hasPatchPoint:   false
+  stackSize:       0
+  offsetAdjustment: 0
+  maxAlignment:    1
+  adjustsStack:    false
+  hasCalls:        false
+  stackProtector:  ''
+  functionContext: ''
+  maxCallFrameSize: 0
+  cvBytesOfCalleeSavedRegisters: 0
+  hasOpaqueSPAdjustment: false
+  hasVAStart:      false
+  hasMustTailInVarArgFunc: false
+  hasTailCall:     false
+  localFrameSize:  0
+  savePoint:       ''
+  restorePoint:    ''
+fixedStack:      []
+stack:           []
+entry_values:    []
+callSites:       []
+debugValueSubstitutions: []
+constants:       []
+machineFunctionInfo: {}
+jumpTable:
+  kind:            label-difference32
+  entries:
+    - id:              0
+      blocks:          [ '%bb.5', '%bb.2', '%bb.3', '%bb.4' ]
+body:             |
+  bb.0.start:
+    successors: %bb.1(0x80000000)
+    liveins: $w0, $w1
+  
+    ; CHECK: mov	w8, w0
+    ; CHECK-NOT: mov	x8, x0
+    ; CHECK: ldrb	w11, [x9, x8]
+    renamable $w8 = COPY $w0
+    renamable $w8 = ORRWrs $wzr, killed renamable $w8, 0, implicit-def $x8
+    renamable $w0 = COPY $wzr
+  
+  bb.1.start:
+    successors: %bb.5(0x20000000), %bb.2(0x20000000), %bb.3(0x20000000), %bb.4(0x20000000)
+    liveins: $w0, $w1, $x8
+  
+    renamable $x9 = MOVaddrJT target-flags(aarch64-page) %jump-table.0, target-flags(aarch64-pageoff, aarch64-nc) %jump-table.0
+    early-clobber renamable $x10, dead early-clobber renamable $x11 = JumpTableDest32 killed renamable $x9, killed renamable $x8, %jump-table.0
+    BR killed renamable $x10
+  
+  bb.2.bb4:
+    successors: %bb.5(0x80000000)
+  
+    renamable $w0 = MOVi32imm 1
+    B %bb.5
+  
+  bb.3.bb5:
+    successors: %bb.5(0x80000000)
+  
+    renamable $w0 = MOVi32imm 1
+    B %bb.5
+  
+  bb.4.bb1:
+    successors: %bb.5(0x80000000)
+    liveins: $w1
+  
+    dead $wzr = ANDSWri killed renamable $w1, 7, implicit-def $nzcv
+    renamable $w0 = CSINCWr $wzr, $wzr, 1, implicit killed $nzcv
+  
+  bb.5.bb9:
+    liveins: $w0
+  
+    RET_ReallyLR implicit $w0
+
+...

@DianQK DianQK requested a review from nikic December 7, 2023 09:18
@DianQK DianQK force-pushed the machine-cp-undef branch 2 times, most recently from c35fa40 to efbd7f1 Compare December 7, 2023 14:56
… cannot replace sub register

The `postrapseudos` may replace `mov w8, w0` with `mov x8, x0`.
@davemgreen
Copy link
Collaborator

Hello. I think that if you removed undef from the first instruction the result would still be incorrect. With:

$x8 = ORRXrs $xzr, $x0, 0, implicit $w0
$w8 = ORRWrs $wzr, $w0, 0, implicit-def $x8

The second instruction will zero-extend the w0 register to x8. It would be OK to remove the first instruction (it is dead), it is not OK to remove the second if something is relying on the top bits being zero. I assume that's what goes wrong in your case? The top bits are not zero into the function?

@DianQK
Copy link
Member Author

DianQK commented Dec 8, 2023

Hello. I think that if you removed undef from the first instruction the result would still be incorrect. With:

$x8 = ORRXrs $xzr, $x0, 0, implicit $w0
$w8 = ORRWrs $wzr, $w0, 0, implicit-def $x8

I'm also curious about it, but this transformation has been around for a long time, so I'm assuming it's correct. I think there exists some other pass solution to this problem here.

The top bits are not zero into the function?

Yes. Please see the link https://llvm.godbolt.org/z/zv3vz8zcs.

_call_from_main:
        sub     sp, sp, #32
        stp     x29, x30, [sp, #16]
        add     x29, sp, #16
        mov     w8, #1
        str     w8, [sp, #4] ;;; We save `0x00000001` on stack
        add     x0, sp, #4 ;;; Save the address
        bl      _device_create_texture
        ldp     x29, x30, [sp, #16]
        add     sp, sp, #32
        ret
_device_create_texture:
        sub     sp, sp, #32
        stp     x29, x30, [sp, #16]
        add     x29, sp, #16
        ldr     x8, [x0] ;;; Load 8-byte, get `0x6f60400000000001`
        str     x8, [sp]
        ldr     w9, [x0, #8]
        str     w9, [sp, #8]
        ubfx    x1, x8, #32, #8
        mov     x0, x8
        bl      __ZN12repro_11790213TextureFormat17required_features17h5e4d5de64322154bE
        ..
        ret
__ZN12repro_11790213TextureFormat17required_features17h5e4d5de64322154bE:
        mov     x8, x0 ;;;  x8/x0 is `0x6f60400000000001`
        mov     w0, #0
        adrp    x9, LJTI10_0@PAGE
        add     x9, x9, LJTI10_0@PAGEOFF
        adr     x10, LBB10_1
        ldrb    w11, [x9, x8] ;;; We expect x8 is `0x0000000000000001`

In this function, we can't guarantee that x8/x0 is 0x1, we have to do a transformation mov w8, w8.
So I think this delete subregister copy instruction is based on a convention similar to this one. We can delete mov w8, w0 on this:

mov x8, x0
mov w8, w0
mov w8, w8

If I use -O0, I can get the result:

__ZN12repro_11790213TextureFormat17required_features17h5e4d5de64322154bE:
        sub     sp, sp, #16
        str     w1, [sp, #4]
        subs    w8, w0, #0
        mov     w8, w8 ; This looks like a transformation that guarantees that x8 is `0x0000000000000001`.

After that I found the following code in lowerCopy.

// This instruction is reading and writing X registers. This may upset
// the register scavenger and machine verifier, so we need to indicate
// that we are reading an undefined value from SrcRegX, but a proper
// value from SrcReg.
BuildMI(MBB, I, DL, get(AArch64::ORRXrr), DestRegX)
.addReg(AArch64::XZR)
.addReg(SrcRegX, RegState::Undef)
.addReg(SrcReg, RegState::Implicit | getKillRegState(KillSrc));

If we lower the code to mov x8, x0 instead of mov w8, w0, this may be somewhat unsafe. So this code leaves an undef.
My PR was answering this comment and undef.

@DianQK
Copy link
Member Author

DianQK commented Dec 11, 2023

@nikic Would you approve of this? Or do we need to add other codegen participants to review?
@davemgreen Any other comments?

@davemgreen
Copy link
Collaborator

I don't believe the undef is the issue - I think the issue is that AArch64InstrInfo::isCopyInstrImpl is saying that a W-reg orr is a copy, even if it is really a zextend because the entire X output register is depended upon.

Can you try and add something to isCopyInstImpl instead, that says: If the register is virtual then it should not be a subreg, and if the register is physical then there should not be an implicit def of the X reg. Would that solve your issue, and would it cause other problems in AArch64 codegen? Thanks

@DianQK
Copy link
Member Author

DianQK commented Dec 12, 2023

I don't believe the undef is the issue - I think the issue is that AArch64InstrInfo::isCopyInstrImpl is saying that a W-reg orr is a copy, even if it is really a zextend because the entire X output register is depended upon.

Thanks for the explanation.

Can you try and add something to isCopyInstrImpl instead, that says: If the register is virtual then it should not be a subreg, and if the register is physical then there should not be an implicit def of the X reg. Would that solve your issue, and would it cause other problems in AArch64 codegen? Thanks

I'm not sure if I'm misunderstanding anything. But I tried to file the #75184 PR.

It breaks the original commit 5372658. It looks like it's because we're generating the mov instruction for the ORRWsr. (At this point this is a copy instruction?) I'm not sure exactly what is going wrong. Maybe we need to add a new method to TargetRegisterInfo?

Or are you going to fix this issue? I'm not familiar with MIR.
Alternatively, we can add implicit-def to MachineCopyPropagation? But that doesn't seem right either.

@davemgreen
Copy link
Collaborator

Thanks. It sounds like there are not a lot of code changes, which is a good sign. I didn't expect the debug problems though.

I'll try and take a look at the patch. Perhaps you are right that we need a new method for the debug info to use.

@DianQK
Copy link
Member Author

DianQK commented Dec 13, 2023

I'll try and take a look at the patch. Perhaps you are right that we need a new method for the debug info to use.

Based on

`TargetInstrInfo::isCopyInstrImpl` must be implemented to recognise any
instructions that are copy-like -- `LiveDebugValues` uses this to identify when
values move between registers.
, perhaps we could add an isCopyLikeInstr method? This should be a good name while changing very little.I 'll give it a try later.

@DianQK DianQK marked this pull request as draft December 13, 2023 15:09
@DianQK
Copy link
Member Author

DianQK commented Dec 13, 2023

I tried adding isCopyLikeInstr to #75184. All known test cases have passed.

@DianQK DianQK closed this Dec 14, 2023
@DianQK DianQK deleted the machine-cp-undef branch December 14, 2023 02:17
DianQK added a commit that referenced this pull request Dec 14, 2023
nikic pushed a commit to rust-lang/llvm-project that referenced this pull request Dec 14, 2023
MingcongBai pushed a commit to AOSC-Tracking/llvm-project that referenced this pull request Mar 26, 2024
qihangkong pushed a commit to rvgpu/rvgpu-llvm that referenced this pull request Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong register used on apple-m1
4 participants