-
Notifications
You must be signed in to change notification settings - Fork 15k
[Clang] Use under for padded vectors before storing. #164821
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang Author: Florian Hahn (fhahn) ChangesCurrently Clang usually leaves padding bits uninitialized, which means When expanding stores of vector types to include padding, the padding This interacts badly with coercion of arguments and return values, where Not sure if there's a better way, but I think we have a number of places Full diff: https://github.com/llvm/llvm-project/pull/164821.diff 2 Files Affected:
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 301d5770cf78f..a581cf821092f 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -2300,6 +2300,9 @@ void CodeGenFunction::EmitStoreOfScalar(llvm::Value *Value, Address Addr,
SmallVector<int, 16> Mask(NewVecTy->getNumElements(), -1);
std::iota(Mask.begin(), Mask.begin() + VecTy->getNumElements(), 0);
Value = Builder.CreateShuffleVector(Value, Mask, "extractVec");
+ // The extra lanes will be poison. Freeze the whole vector to make sure
+ // the padding memory is not poisoned, which may break coercion.
+ Value = Builder.CreateFreeze(Value);
SrcTy = NewVecTy;
}
if (Addr.getElementType() != SrcTy)
diff --git a/clang/test/CodeGen/AArch64/ext-vector-coercion.c b/clang/test/CodeGen/AArch64/ext-vector-coercion.c
new file mode 100644
index 0000000000000..44638fac1560a
--- /dev/null
+++ b/clang/test/CodeGen/AArch64/ext-vector-coercion.c
@@ -0,0 +1,43 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
+// RUN: %clang_cc1 -fenable-matrix -triple arm64-apple-macosx %s -emit-llvm -disable-llvm-passes -o - | FileCheck %s
+
+typedef float float3 __attribute__((ext_vector_type(3)));
+struct Vec3 {
+ union {
+ struct {
+ float x;
+ float y;
+ float z;
+ };
+ float vec __attribute__((ext_vector_type(3)));
+ };
+};
+
+// CHECK-LABEL: define i128 @add(
+// CHECK-SAME: i128 [[A_COERCE:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[RETVAL:%.*]] = alloca [[STRUCT_VEC3:%.*]], align 16
+// CHECK-NEXT: [[A:%.*]] = alloca [[STRUCT_VEC3]], align 16
+// CHECK-NEXT: [[COERCE_DIVE:%.*]] = getelementptr inbounds nuw [[STRUCT_VEC3]], ptr [[A]], i32 0, i32 0
+// CHECK-NEXT: store i128 [[A_COERCE]], ptr [[COERCE_DIVE]], align 16
+// CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw [[STRUCT_VEC3]], ptr [[A]], i32 0, i32 0
+// CHECK-NEXT: [[LOADVECN:%.*]] = load <4 x float>, ptr [[TMP0]], align 16
+// CHECK-NEXT: [[EXTRACTVEC:%.*]] = shufflevector <4 x float> [[LOADVECN]], <4 x float> poison, <3 x i32> <i32 0, i32 1, i32 2>
+// CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw [[STRUCT_VEC3]], ptr [[A]], i32 0, i32 0
+// CHECK-NEXT: [[LOADVECN1:%.*]] = load <4 x float>, ptr [[TMP1]], align 16
+// CHECK-NEXT: [[EXTRACTVEC2:%.*]] = shufflevector <4 x float> [[LOADVECN1]], <4 x float> poison, <3 x i32> <i32 0, i32 1, i32 2>
+// CHECK-NEXT: [[ADD:%.*]] = fadd <3 x float> [[EXTRACTVEC]], [[EXTRACTVEC2]]
+// CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds nuw [[STRUCT_VEC3]], ptr [[RETVAL]], i32 0, i32 0
+// CHECK-NEXT: [[EXTRACTVEC3:%.*]] = shufflevector <3 x float> [[ADD]], <3 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 poison>
+// CHECK-NEXT: [[TMP3:%.*]] = freeze <4 x float> [[EXTRACTVEC3]]
+// CHECK-NEXT: store <4 x float> [[TMP3]], ptr [[TMP2]], align 16
+// CHECK-NEXT: [[COERCE_DIVE4:%.*]] = getelementptr inbounds nuw [[STRUCT_VEC3]], ptr [[RETVAL]], i32 0, i32 0
+// CHECK-NEXT: [[TMP4:%.*]] = load i128, ptr [[COERCE_DIVE4]], align 16
+// CHECK-NEXT: ret i128 [[TMP4]]
+//
+struct Vec3 add(struct Vec3 a) {
+ struct Vec3 res;
+ res.vec = a.vec + a.vec;
+ return res;
+}
+
|
clang/lib/CodeGen/CGExpr.cpp
Outdated
| Value = Builder.CreateShuffleVector(Value, Mask, "extractVec"); | ||
| // The extra lanes will be poison. Freeze the whole vector to make sure | ||
| // the padding memory is not poisoned, which may break coercion. | ||
| Value = Builder.CreateFreeze(Value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of freezing, can we shuffle in undef instead? Or does that end up being worse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to go with either. Not having the freeze is likely to be slightly better (e.g. we may be able to prove some lanes of the non-padding lanes could be poison), I just thought initially we might want to stay consistent w.r.t. shuffle with poison.
Updated now
fd3df1e to
ae3e4af
Compare
You can test this locally with the following command:git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef([^a-zA-Z0-9_-]|$)|UndefValue::get)' 'HEAD~1' HEAD clang/test/CodeGen/AArch64/ext-vector-coercion.c clang/lib/CodeGen/CGExpr.cpp clang/test/CodeGenCXX/matrix-vector-bit-int.cppThe following files introduce new uses of undef:
Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields In tests, avoid using For example, this is considered a bad practice: define void @fn() {
...
br i1 undef, ...
}Please use the following instead: define void @fn(i1 %cond) {
...
br i1 %cond, ...
}Please refer to the Undefined Behavior Manual for more information. |
Currently Clang usually leaves padding bits uninitialized, which means they are undef at the moment. When expanding stores of vector types to include padding, the padding lanes will be poison, hence the padding bits will be poison. This interacts badly with coercion of arguments and return values, where 3 x float vectors will be loaded as i128 integer; poisoning the padding bits will make the whole value poison.
ae3e4af to
dd821bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Currently Clang usually leaves padding bits uninitialized, which means they are undef at the moment. When expanding stores of vector types to include padding, the padding lanes will be poison, hence the padding bits will be poison. This interacts badly with coercion of arguments and return values, where 3 x float vectors will be loaded as i128 integer; poisoning the padding bits will make the whole value poison. Not sure if there's a better way, but I think we have a number of places that currently rely on the padding being undef, not poison. PR: llvm/llvm-project#164821
Currently Clang usually leaves padding bits uninitialized, which means they are undef at the moment. When expanding stores of vector types to include padding, the padding lanes will be poison, hence the padding bits will be poison. This interacts badly with coercion of arguments and return values, where 3 x float vectors will be loaded as i128 integer; poisoning the padding bits will make the whole value poison. Not sure if there's a better way, but I think we have a number of places that currently rely on the padding being undef, not poison. PR: llvm#164821 rdar://162655662 (cherry-picked from 5378584)
Currently Clang usually leaves padding bits uninitialized, which means they are undef at the moment. When expanding stores of vector types to include padding, the padding lanes will be poison, hence the padding bits will be poison. This interacts badly with coercion of arguments and return values, where 3 x float vectors will be loaded as i128 integer; poisoning the padding bits will make the whole value poison. Not sure if there's a better way, but I think we have a number of places that currently rely on the padding being undef, not poison. PR: llvm#164821
Currently Clang usually leaves padding bits uninitialized, which means
they are undef at the moment.
When expanding stores of vector types to include padding, the padding
lanes will be poison, hence the padding bits will be poison.
This interacts badly with coercion of arguments and return values, where
3 x float vectors will be loaded as i128 integer; poisoning the padding
bits will make the whole value poison.
Not sure if there's a better way, but I think we have a number of places
that currently rely on the padding being undef, not poison.