-
Notifications
You must be signed in to change notification settings - Fork 12.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PowerPC] Add special handling for arguments that are smaller than pointer size. #119003
Conversation
pointer size. When arguments are passed in memory instead of registers we currently load the entire pointer size even though the argument may be smaller. For exmaple if the pointer size if i32 then we use a load word even if the argument is only an i8. This patch clears the bits that are not required to ensure that we are getting the correct value even if the load is larger.
|
@llvm/pr-subscribers-backend-powerpc Author: Stefan Pintilie (stefanp-ibm) ChangesWhen arguments are passed in memory instead of registers we currently load the entire pointer size even though the argument may be smaller. For exmaple if the pointer size if i32 then we use a load word even if the argument is only an i8. This patch clears the bits that are not required to ensure that we are getting the correct value even if the load is larger. Full diff: https://github.com/llvm/llvm-project/pull/119003.diff 3 Files Affected:
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index 14e09d502b696b..e3f607bb02f04b 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -7244,6 +7244,8 @@ SDValue PPCTargetLowering::LowerFormalArguments_AIX(
MVT LocVT = VA.getLocVT();
MVT ValVT = VA.getValVT();
ISD::ArgFlagsTy Flags = Ins[VA.getValNo()].Flags;
+
+ EVT ArgVT = Ins[VA.getValNo()].ArgVT;
// For compatibility with the AIX XL compiler, the float args in the
// parameter save area are initialized even if the argument is available
// in register. The caller is required to initialize both the register
@@ -7291,7 +7293,17 @@ SDValue PPCTargetLowering::LowerFormalArguments_AIX(
SDValue FIN = DAG.getFrameIndex(FI, PtrVT);
SDValue ArgValue =
DAG.getLoad(ValVT, dl, Chain, FIN, MachinePointerInfo());
- InVals.push_back(ArgValue);
+
+ // While the ABI specifies the the higher bits of the load should be
+ // zeroed out this is not always the case. For safety this code will zero
+ // extend the loaded value if the size of the argument type is smaller
+ // then the load.
+ if (!ArgVT.isVector() && !ValVT.isVector() &&
+ ArgVT.getScalarSizeInBits() < ValVT.getScalarSizeInBits()) {
+ SDValue ArgValueExt = DAG.getZeroExtendInReg(ArgValue, dl, ArgVT);
+ InVals.push_back(ArgValueExt);
+ } else
+ InVals.push_back(ArgValue);
};
// Vector arguments to VaArg functions are passed both on the stack, and
diff --git a/llvm/test/CodeGen/PowerPC/aix-cc-abi-mir.ll b/llvm/test/CodeGen/PowerPC/aix-cc-abi-mir.ll
index 00d1c471c2fa7c..efc1bd2d47a8a8 100644
--- a/llvm/test/CodeGen/PowerPC/aix-cc-abi-mir.ll
+++ b/llvm/test/CodeGen/PowerPC/aix-cc-abi-mir.ll
@@ -1102,12 +1102,12 @@ define i64 @test_ints_stack(i32 %i1, i32 %i2, i32 %i3, i32 %i4, i32 %i5, i32 %i6
; 32BIT-NEXT: {{ $}}
; 32BIT-NEXT: renamable $r11 = LWZ 0, %fixed-stack.0 :: (load (s32) from %fixed-stack.0)
; 32BIT-NEXT: renamable $r12 = LWZ 0, %fixed-stack.4 :: (load (s32) from %fixed-stack.4)
- ; 32BIT-NEXT: renamable $r0 = LWZ 0, %fixed-stack.1 :: (load (s32) from %fixed-stack.1, align 8)
+ ; 32BIT-NEXT: renamable $r0 = LBZ 3, %fixed-stack.1 :: (load (s8) from %fixed-stack.1 + 3, basealign 4)
; 32BIT-NEXT: renamable $r31 = LWZ 4, %fixed-stack.3 :: (load (s32) from %fixed-stack.3 + 4, basealign 16)
; 32BIT-NEXT: renamable $r30 = LWZ 0, %fixed-stack.3 :: (load (s32) from %fixed-stack.3, align 16)
; 32BIT-NEXT: renamable $r29 = LWZ 0, %fixed-stack.5 :: (load (s32) from %fixed-stack.5, align 8)
- ; 32BIT-NEXT: renamable $r28 = LWZ 0, %fixed-stack.6 :: (load (s32) from %fixed-stack.6)
- ; 32BIT-NEXT: renamable $r27 = LWZ 0, %fixed-stack.7 :: (load (s32) from %fixed-stack.7, align 16)
+ ; 32BIT-NEXT: renamable $r28 = LBZ 3, %fixed-stack.6 :: (load (s8) from %fixed-stack.6 + 3, basealign 4)
+ ; 32BIT-NEXT: renamable $r27 = LHZ 2, %fixed-stack.7 :: (load (s16) from %fixed-stack.7 + 2, basealign 4)
; 32BIT-NEXT: renamable $r26 = LWZ 4, %fixed-stack.9 :: (load (s32) from %fixed-stack.9 + 4, basealign 8)
; 32BIT-NEXT: renamable $r25 = LWZ 0, %fixed-stack.9 :: (load (s32) from %fixed-stack.9, align 8)
; 32BIT-NEXT: renamable $r3 = nsw ADD4 killed renamable $r3, killed renamable $r4
@@ -1143,13 +1143,13 @@ define i64 @test_ints_stack(i32 %i1, i32 %i2, i32 %i3, i32 %i4, i32 %i5, i32 %i6
; 64BIT: bb.0.entry:
; 64BIT-NEXT: liveins: $x3, $x4, $x5, $x6, $x7, $x8, $x9, $x10
; 64BIT-NEXT: {{ $}}
- ; 64BIT-NEXT: renamable $r11 = LWZ 0, %fixed-stack.1, implicit-def $x11 :: (load (s32) from %fixed-stack.1)
+ ; 64BIT-NEXT: renamable $r11 = LBZ 3, %fixed-stack.1, implicit-def $x11 :: (load (s8) from %fixed-stack.1 + 3, basealign 4)
; 64BIT-NEXT: renamable $x12 = LWZ8 0, %fixed-stack.4 :: (load (s32) from %fixed-stack.4)
- ; 64BIT-NEXT: renamable $x0 = LWA 0, %fixed-stack.0 :: (load (s32) from %fixed-stack.0)
- ; 64BIT-NEXT: renamable $x2 = LD 0, %fixed-stack.2 :: (load (s64) from %fixed-stack.2)
- ; 64BIT-NEXT: renamable $x31 = LWA 0, %fixed-stack.3 :: (load (s32) from %fixed-stack.3)
- ; 64BIT-NEXT: renamable $r30 = LWZ 0, %fixed-stack.5, implicit-def $x30 :: (load (s32) from %fixed-stack.5)
- ; 64BIT-NEXT: renamable $x29 = LWA 0, %fixed-stack.6 :: (load (s32) from %fixed-stack.6)
+ ; 64BIT-NEXT: renamable $r0 = LBZ 3, %fixed-stack.5, implicit-def $x0 :: (load (s8) from %fixed-stack.5 + 3, basealign 4)
+ ; 64BIT-NEXT: renamable $x2 = LWA 0, %fixed-stack.0 :: (load (s32) from %fixed-stack.0)
+ ; 64BIT-NEXT: renamable $x31 = LD 0, %fixed-stack.2 :: (load (s64) from %fixed-stack.2)
+ ; 64BIT-NEXT: renamable $x30 = LWA 0, %fixed-stack.3 :: (load (s32) from %fixed-stack.3)
+ ; 64BIT-NEXT: renamable $x29 = LHZ8 2, %fixed-stack.6
; 64BIT-NEXT: renamable $x28 = LD 0, %fixed-stack.7 :: (load (s64) from %fixed-stack.7, align 16)
; 64BIT-NEXT: renamable $r3 = nsw ADD4 renamable $r3, renamable $r4, implicit killed $x4, implicit killed $x3
; 64BIT-NEXT: renamable $r3 = nsw ADD4 killed renamable $r3, renamable $r5, implicit killed $x5
@@ -1161,12 +1161,12 @@ define i64 @test_ints_stack(i32 %i1, i32 %i2, i32 %i3, i32 %i4, i32 %i5, i32 %i6
; 64BIT-NEXT: renamable $x3 = EXTSW_32_64 killed renamable $r3
; 64BIT-NEXT: renamable $x3 = nsw ADD8 killed renamable $x3, killed renamable $x28
; 64BIT-NEXT: renamable $x3 = nsw ADD8 killed renamable $x3, killed renamable $x29
- ; 64BIT-NEXT: renamable $x3 = nsw ADD8 killed renamable $x3, killed renamable $x30
+ ; 64BIT-NEXT: renamable $x3 = nsw ADD8 killed renamable $x3, killed renamable $x0
; 64BIT-NEXT: renamable $x3 = nsw ADD8 killed renamable $x3, killed renamable $x12
+ ; 64BIT-NEXT: renamable $x3 = nsw ADD8 killed renamable $x3, killed renamable $x30
; 64BIT-NEXT: renamable $x3 = nsw ADD8 killed renamable $x3, killed renamable $x31
- ; 64BIT-NEXT: renamable $x3 = nsw ADD8 killed renamable $x3, killed renamable $x2
; 64BIT-NEXT: renamable $x3 = nsw ADD8 killed renamable $x3, killed renamable $x11
- ; 64BIT-NEXT: renamable $x3 = nsw ADD8 killed renamable $x3, killed renamable $x0
+ ; 64BIT-NEXT: renamable $x3 = nsw ADD8 killed renamable $x3, killed renamable $x2
; 64BIT-NEXT: BLR8 implicit $lr8, implicit $rm, implicit $x3
entry:
%add = add nsw i32 %i1, %i2
@@ -1611,8 +1611,8 @@ define i32 @mix_callee(double %d1, double %d2, double %d3, double %d4, i8 zeroex
; 32BIT-NEXT: liveins: $f1, $f2, $f3, $f4
; 32BIT-NEXT: {{ $}}
; 32BIT-NEXT: renamable $r3 = LWZ 0, %fixed-stack.3 :: (load (s32) from %fixed-stack.3)
- ; 32BIT-NEXT: renamable $r4 = LWZ 0, %fixed-stack.5 :: (load (s32) from %fixed-stack.5)
- ; 32BIT-NEXT: renamable $r5 = LWZ 0, %fixed-stack.6 :: (load (s32) from %fixed-stack.6, align 8)
+ ; 32BIT-NEXT: renamable $r4 = LHZ 2, %fixed-stack.5 :: (load (s16) from %fixed-stack.5 + 2, basealign 4)
+ ; 32BIT-NEXT: renamable $r5 = LBZ 3, %fixed-stack.6 :: (load (s8) from %fixed-stack.6 + 3, basealign 4)
; 32BIT-NEXT: renamable $r6 = LWZ 0, %fixed-stack.2 :: (load (s32) from %fixed-stack.2, align 8)
; 32BIT-NEXT: renamable $r7 = LIS 17200
; 32BIT-NEXT: STW killed renamable $r7, 0, %stack.1 :: (store (s32) into %stack.1, align 8)
diff --git a/llvm/test/CodeGen/PowerPC/aix-cc-abi.ll b/llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
index 433d4273444660..3d5107139cc04f 100644
--- a/llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
+++ b/llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
@@ -1196,13 +1196,13 @@ define i64 @test_ints_stack(i32 %i1, i32 %i2, i32 %i3, i32 %i4, i32 %i5, i32 %i6
; ASM32PWR4-NEXT: add 3, 3, 10
; ASM32PWR4-NEXT: srawi 5, 11, 31
; ASM32PWR4-NEXT: srawi 8, 3, 31
-; ASM32PWR4-NEXT: lwz 4, 64(1)
+; ASM32PWR4-NEXT: lhz 4, 66(1)
; ASM32PWR4-NEXT: lwz 7, 56(1)
; ASM32PWR4-NEXT: stw 31, -4(1) # 4-byte Folded Spill
; ASM32PWR4-NEXT: srawi 31, 12, 31
; ASM32PWR4-NEXT: addc 3, 3, 6
; ASM32PWR4-NEXT: adde 7, 8, 7
-; ASM32PWR4-NEXT: lwz 6, 68(1)
+; ASM32PWR4-NEXT: lbz 6, 71(1)
; ASM32PWR4-NEXT: srawi 8, 4, 31
; ASM32PWR4-NEXT: addc 3, 3, 4
; ASM32PWR4-NEXT: adde 7, 7, 8
@@ -1216,7 +1216,7 @@ define i64 @test_ints_stack(i32 %i1, i32 %i2, i32 %i3, i32 %i4, i32 %i5, i32 %i6
; ASM32PWR4-NEXT: lwz 7, 80(1)
; ASM32PWR4-NEXT: adde 4, 4, 31
; ASM32PWR4-NEXT: addc 3, 3, 0
-; ASM32PWR4-NEXT: lwz 6, 88(1)
+; ASM32PWR4-NEXT: lbz 6, 91(1)
; ASM32PWR4-NEXT: adde 4, 4, 7
; ASM32PWR4-NEXT: addc 3, 3, 6
; ASM32PWR4-NEXT: lwz 31, -4(1) # 4-byte Folded Reload
@@ -1228,29 +1228,29 @@ define i64 @test_ints_stack(i32 %i1, i32 %i2, i32 %i3, i32 %i4, i32 %i5, i32 %i6
; ASM64PWR4-LABEL: test_ints_stack:
; ASM64PWR4: # %bb.0: # %entry
; ASM64PWR4-NEXT: add 3, 3, 4
-; ASM64PWR4-NEXT: ld 4, 112(1)
+; ASM64PWR4-NEXT: std 2, -8(1) # 8-byte Folded Spill
; ASM64PWR4-NEXT: add 3, 3, 5
; ASM64PWR4-NEXT: add 3, 3, 6
; ASM64PWR4-NEXT: add 3, 3, 7
-; ASM64PWR4-NEXT: lwa 12, 124(1)
+; ASM64PWR4-NEXT: ld 4, 112(1)
; ASM64PWR4-NEXT: add 3, 3, 8
; ASM64PWR4-NEXT: add 3, 3, 9
+; ASM64PWR4-NEXT: lhz 5, 126(1)
; ASM64PWR4-NEXT: add 3, 3, 10
; ASM64PWR4-NEXT: extsw 3, 3
-; ASM64PWR4-NEXT: lwz 5, 132(1)
; ASM64PWR4-NEXT: add 3, 3, 4
-; ASM64PWR4-NEXT: add 3, 3, 12
-; ASM64PWR4-NEXT: std 2, -8(1) # 8-byte Folded Spill
+; ASM64PWR4-NEXT: lbz 2, 135(1)
; ASM64PWR4-NEXT: add 3, 3, 5
-; ASM64PWR4-NEXT: lwz 2, 140(1)
-; ASM64PWR4-NEXT: lwa 11, 148(1)
+; ASM64PWR4-NEXT: lwz 0, 140(1)
; ASM64PWR4-NEXT: add 3, 3, 2
+; ASM64PWR4-NEXT: lwa 11, 148(1)
+; ASM64PWR4-NEXT: add 3, 3, 0
; ASM64PWR4-NEXT: add 3, 3, 11
; ASM64PWR4-NEXT: ld 4, 152(1)
-; ASM64PWR4-NEXT: lwz 0, 164(1)
+; ASM64PWR4-NEXT: lbz 12, 167(1)
; ASM64PWR4-NEXT: add 3, 3, 4
; ASM64PWR4-NEXT: lwa 5, 172(1)
-; ASM64PWR4-NEXT: add 3, 3, 0
+; ASM64PWR4-NEXT: add 3, 3, 12
; ASM64PWR4-NEXT: add 3, 3, 5
; ASM64PWR4-NEXT: ld 2, -8(1) # 8-byte Folded Reload
; ASM64PWR4-NEXT: blr
@@ -1720,11 +1720,11 @@ entry:
define i32 @mix_callee(double %d1, double %d2, double %d3, double %d4, i8 zeroext %c1, i16 signext %s1, i64 %ll1, i32 %i1, i32 %i2, i32 %i3) {
; ASM32PWR4-LABEL: mix_callee:
; ASM32PWR4: # %bb.0: # %entry
-; ASM32PWR4-NEXT: lwz 4, 60(1)
+; ASM32PWR4-NEXT: lhz 4, 62(1)
; ASM32PWR4-NEXT: lis 8, 17200
; ASM32PWR4-NEXT: fadd 1, 1, 2
; ASM32PWR4-NEXT: fadd 1, 1, 3
-; ASM32PWR4-NEXT: lwz 5, 56(1)
+; ASM32PWR4-NEXT: lbz 5, 59(1)
; ASM32PWR4-NEXT: lwz 3, 68(1)
; ASM32PWR4-NEXT: add 4, 5, 4
; ASM32PWR4-NEXT: lwz 5, L..C34(2) # %const.0
|
| ValVT.isInteger() && | ||
| ArgVT.getScalarSizeInBits() < ValVT.getScalarSizeInBits()) { | ||
| SDValue ArgValueTrunc = DAG.getNode(ISD::TRUNCATE, dl, ArgVT, ArgValue); | ||
| // DAG.getLoad(ArgVT, dl, Chain, FIN, MachinePointerInfo()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can remove commented line
| InVals.push_back(ArgValueExt); | ||
| } else { | ||
| InVals.push_back(ArgValue); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
braces not needed here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RolandF77, the coding guidelines have been updated to discourage mixed use of braces in an if/else chain.
| // While the ABI specifies the the higher bits of the load should be | ||
| // zeroed out this is not always the case. For safety this code will zero | ||
| // extend the loaded value if the size of the argument type is smaller | ||
| // then the load. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment should be updated to include sign extension
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| } else | ||
| InVals.push_back(ArgValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coding guidelines discourage } else now: https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements
| } else | |
| InVals.push_back(ArgValue); | |
| } else { | |
| InVals.push_back(ArgValue); | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean this example here:
// Use braces for the `else if` and `else` block to keep it uniform with the
// `if` block.
if (isa<FunctionDecl>(D)) {
verifyFunctionDecl(D);
handleFunctionDecl(D);
} else if (isa<GlobalVarDecl>(D)) {
handleGlobalVarDecl(D);
} else {
handleOtherDecl(D);
}
Okay, I will change is back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any coverage for i8 signext?
| InVals.push_back(ArgValue); | ||
|
|
||
| // While the ABI specifies that the higher bits of the load should be | ||
| // zeroed out or sign extended this is not always the case. For safety |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: Missing comma before "this is".
Also, this line exceeds 80 characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this so quickly Stefan. 2 minor comments but otherwise LGTM.
| @@ -7291,7 +7294,21 @@ SDValue PPCTargetLowering::LowerFormalArguments_AIX( | |||
| SDValue FIN = DAG.getFrameIndex(FI, PtrVT); | |||
| SDValue ArgValue = | |||
| DAG.getLoad(ValVT, dl, Chain, FIN, MachinePointerInfo()); | |||
| InVals.push_back(ArgValue); | |||
|
|
|||
| // While the ABI specifies that the higher bits of the load should be | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
real minor nit - a suggested slight rewording -
// While the ABI specifies the argument type is sign or zero extended out to register width
// not all code is compliant. We truncate and re-extend to be more forgiving of these callers
// when the argument type is smaller then register width.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More typo/grammar fixes:
While the ABI specifies the argument type is (sign or zero) extended out to register width, not all code is compliant. We truncate and re-extend to be more forgiving of these callers when the argument type is smaller than register width.
| @@ -1185,69 +1185,69 @@ define i64 @test_ints_stack(i32 %i1, i32 %i2, i32 %i3, i32 %i4, i32 %i5, i32 %i6 | |||
| ; ASM32PWR4-LABEL: test_ints_stack: | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a signext i8 argument to this test as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i1 zeroext is another case that seems to be missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i1 zeroextis another case that seems to be missing.
It does seem that this PR departs from other AIX compilers in that it actually clears all but the least significant bit. Other AIX compilers treat this case the same as the i8 zeroext case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does seem that this PR departs from other AIX compilers in that it actually clears all but the least significant bit. Other AIX compilers treat this case the same as the
i8 zeroextcase.
This has been resolved with the changes; thanks @stefanp-ibm!
When arguments are passed in memory instead of registers we currently load the entire pointer size even though the argument may be smaller. For exmaple if the pointer size if i32 then we use a load word even if the argument is only an i8. This patch clears the bits that are not required to ensure that we are getting the correct value even if the load is larger.