-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Clang] Emit TBAA info for enums in C #73326
Conversation
When emitting TBAA information for enums in C code we currently just treat the data as an 'omnipotent char'. However, with C strict aliasing this means we fail to optimise certain cases. For example, in the SPEC2017 xz benchmark there are structs that contain arrays of enums, and clang pessmistically assumes that accesses to those enums could alias with other struct members that have a different type. According to https://en.cppreference.com/w/c/language/enum enums should be treated as 'int' types unless explicitly specified (C23) or if 'int' would not be large enough to hold all the enumerated values. In the latter case the compiler is free to choose a suitable integer that would hold all such values. When compiling C code this patch generates TBAA information for the enum by using an equivalent integer of the size clang has already chosen for the enum. I have ignored C++ for now because the rules are more complex. New test added here: clang/test/CodeGen/tbaa.c
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-clang-codegen Author: David Sherwood (david-arm) ChangesWhen emitting TBAA information for enums in C code we currently just treat the data as an 'omnipotent char'. However, with C strict aliasing this means we fail to optimise certain cases. For example, in the SPEC2017 xz benchmark there are structs that contain arrays of enums, and clang pessmistically assumes that accesses to those enums could alias with other struct members that have a different type. According to https://en.cppreference.com/w/c/language/enum enums should be treated as 'int' types unless When compiling C code this patch generates TBAA New test added here: clang/test/CodeGen/tbaa.c Full diff: https://github.com/llvm/llvm-project/pull/73326.diff 2 Files Affected:
diff --git a/clang/lib/CodeGen/CodeGenTBAA.cpp b/clang/lib/CodeGen/CodeGenTBAA.cpp
index 8705d3d65f1a573..f59d3d422d5209d 100644
--- a/clang/lib/CodeGen/CodeGenTBAA.cpp
+++ b/clang/lib/CodeGen/CodeGenTBAA.cpp
@@ -196,11 +196,14 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
// Enum types are distinct types. In C++ they have "underlying types",
// however they aren't related for TBAA.
if (const EnumType *ETy = dyn_cast<EnumType>(Ty)) {
+ if (!Features.CPlusPlus)
+ return getTypeInfo(Context.getIntTypeForBitwidth(Size * 8, 0));
+
// In C++ mode, types have linkage, so we can rely on the ODR and
// on their mangled names, if they're external.
// TODO: Is there a way to get a program-wide unique name for a
// decl with local linkage or no linkage?
- if (!Features.CPlusPlus || !ETy->getDecl()->isExternallyVisible())
+ if (!ETy->getDecl()->isExternallyVisible())
return getChar();
SmallString<256> OutName;
diff --git a/clang/test/CodeGen/tbaa.c b/clang/test/CodeGen/tbaa.c
new file mode 100644
index 000000000000000..0ab81f60a71941c
--- /dev/null
+++ b/clang/test/CodeGen/tbaa.c
@@ -0,0 +1,116 @@
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -no-struct-path-tbaa -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefixes=PATH
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O0 -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefix=NO-TBAA
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -relaxed-aliasing -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefix=NO-TBAA
+// Test TBAA metadata generated by front-end.
+//
+// NO-TBAA-NOT: !tbaa
+
+typedef unsigned char uint8_t;
+typedef unsigned short uint16_t;
+typedef unsigned int uint32_t;
+typedef unsigned long long uint64_t;
+
+typedef enum {
+ RED_AUTO_32,
+ GREEN_AUTO_32,
+ BLUE_AUTO_32
+} EnumAuto32;
+
+typedef enum {
+ RED_AUTO_64,
+ GREEN_AUTO_64,
+ BLUE_AUTO_64 = 0x100000000ull
+} EnumAuto64;
+
+typedef enum : uint16_t {
+ RED_16,
+ GREEN_16,
+ BLUE_16
+} Enum16;
+
+typedef enum : uint8_t {
+ RED_8,
+ GREEN_8,
+ BLUE_8
+} Enum8;
+
+uint32_t g0(EnumAuto32 *E, uint32_t *val) {
+// CHECK-LABEL: define{{.*}} i32 @g0(
+// CHECK: store i32 5, ptr %{{.*}}, align 4, !tbaa [[TAG_i32:!.*]]
+// CHECK: store i32 0, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+// CHECK: load i32, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+// PATH-LABEL: define{{.*}} i32 @g0(
+// PATH: store i32 5, ptr %{{.*}}, align 4, !tbaa [[TAG_i32:!.*]]
+// PATH: store i32 0, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+// PATH: load i32, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+ *val = 5;
+ *E = RED_AUTO_32;
+ return *val;
+}
+
+uint64_t g1(EnumAuto64 *E, uint64_t *val) {
+// CHECK-LABEL: define{{.*}} i64 @g1(
+// CHECK: store i64 5, ptr %{{.*}}, align 8, !tbaa [[TAG_i64:!.*]]
+// CHECK: store i64 0, ptr %{{.*}}, align 8, !tbaa [[TAG_long:!.*]]
+// CHECK: load i64, ptr %{{.*}}, align 8, !tbaa [[TAG_i64]]
+// PATH-LABEL: define{{.*}} i64 @g1(
+// PATH: store i64 5, ptr %{{.*}}, align 8, !tbaa [[TAG_i64:!.*]]
+// PATH: store i64 0, ptr %{{.*}}, align 8, !tbaa [[TAG_long:!.*]]
+// PATH: load i64, ptr %{{.*}}, align 8, !tbaa [[TAG_i64]]
+ *val = 5;
+ *E = RED_AUTO_64;
+ return *val;
+}
+
+uint16_t g2(Enum16 *E, uint16_t *val) {
+// CHECK-LABEL: define{{.*}} i16 @g2(
+// CHECK: store i16 5, ptr %{{.*}}, align 2, !tbaa [[TAG_i16:!.*]]
+// CHECK: store i16 0, ptr %{{.*}}, align 2, !tbaa [[TAG_i16]]
+// CHECK: load i16, ptr %{{.*}}, align 2, !tbaa [[TAG_i16]]
+// PATH-LABEL: define{{.*}} i16 @g2(
+// PATH: store i16 5, ptr %{{.*}}, align 2, !tbaa [[TAG_i16:!.*]]
+// PATH: store i16 0, ptr %{{.*}}, align 2, !tbaa [[TAG_i16]]
+// PATH: load i16, ptr %{{.*}}, align 2, !tbaa [[TAG_i16]]
+ *val = 5;
+ *E = RED_16;
+ return *val;
+}
+
+uint8_t g3(Enum8 *E, uint8_t *val) {
+// CHECK-LABEL: define{{.*}} i8 @g3(
+// CHECK: store i8 5, ptr %{{.*}}, align 1, !tbaa [[TAG_i8:!.*]]
+// CHECK: store i8 0, ptr %{{.*}}, align 1, !tbaa [[TAG_i8]]
+// CHECK: load i8, ptr %{{.*}}, align 1, !tbaa [[TAG_i8]]
+// PATH-LABEL: define{{.*}} i8 @g3(
+// PATH: store i8 5, ptr %{{.*}}, align 1, !tbaa [[TAG_i8:!.*]]
+// PATH: store i8 0, ptr %{{.*}}, align 1, !tbaa [[TAG_i8]]
+// PATH: load i8, ptr %{{.*}}, align 1, !tbaa [[TAG_i8]]
+ *val = 5;
+ *E = RED_8;
+ return *val;
+}
+
+// CHECK: [[TYPE_char:!.*]] = !{!"omnipotent char", [[TAG_c_tbaa:!.*]],
+// CHECK: [[TAG_c_tbaa]] = !{!"Simple C/C++ TBAA"}
+// CHECK: [[TAG_i32]] = !{[[TYPE_i32:!.*]], [[TYPE_i32]], i64 0}
+// CHECK: [[TYPE_i32]] = !{!"int", [[TYPE_char]],
+// CHECK: [[TAG_i64]] = !{[[TYPE_i64:!.*]], [[TYPE_i64]], i64 0}
+// CHECK: [[TYPE_i64]] = !{!"long long", [[TYPE_char]],
+// CHECK: [[TAG_long]] = !{[[TYPE_long:!.*]], [[TYPE_long]], i64 0}
+// CHECK: [[TYPE_long]] = !{!"long", [[TYPE_char]],
+// CHECK: [[TAG_i16]] = !{[[TYPE_i16:!.*]], [[TYPE_i16]], i64 0}
+// CHECK: [[TYPE_i16]] = !{!"short", [[TYPE_char]],
+// CHECK: [[TAG_i8]] = !{[[TYPE_i8:!.*]], [[TYPE_char]], i64 0}
+
+// PATH: [[TYPE_char:!.*]] = !{!"omnipotent char", [[TAG_c_tbaa:!.*]],
+// PATH: [[TAG_c_tbaa]] = !{!"Simple C/C++ TBAA"}
+// PATH: [[TAG_i32]] = !{[[TYPE_i32:!.*]], [[TYPE_i32]], i64 0}
+// PATH: [[TYPE_i32]] = !{!"int", [[TYPE_char]],
+// PATH: [[TAG_i64]] = !{[[TYPE_i64:!.*]], [[TYPE_i64]], i64 0}
+// PATH: [[TYPE_i64]] = !{!"long long", [[TYPE_char]],
+// PATH: [[TAG_long]] = !{[[TYPE_long:!.*]], [[TYPE_long]], i64 0}
+// PATH: [[TYPE_long]] = !{!"long", [[TYPE_char]],
+// PATH: [[TAG_i16]] = !{[[TYPE_i16:!.*]], [[TYPE_i16]], i64 0}
+// PATH: [[TYPE_i16]] = !{!"short", [[TYPE_char]],
+// PATH: [[TAG_i8]] = !{[[TYPE_i8:!.*]], [[TYPE_char]], i64 0}
|
@llvm/pr-subscribers-clang Author: David Sherwood (david-arm) ChangesWhen emitting TBAA information for enums in C code we currently just treat the data as an 'omnipotent char'. However, with C strict aliasing this means we fail to optimise certain cases. For example, in the SPEC2017 xz benchmark there are structs that contain arrays of enums, and clang pessmistically assumes that accesses to those enums could alias with other struct members that have a different type. According to https://en.cppreference.com/w/c/language/enum enums should be treated as 'int' types unless When compiling C code this patch generates TBAA New test added here: clang/test/CodeGen/tbaa.c Full diff: https://github.com/llvm/llvm-project/pull/73326.diff 2 Files Affected:
diff --git a/clang/lib/CodeGen/CodeGenTBAA.cpp b/clang/lib/CodeGen/CodeGenTBAA.cpp
index 8705d3d65f1a573..f59d3d422d5209d 100644
--- a/clang/lib/CodeGen/CodeGenTBAA.cpp
+++ b/clang/lib/CodeGen/CodeGenTBAA.cpp
@@ -196,11 +196,14 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
// Enum types are distinct types. In C++ they have "underlying types",
// however they aren't related for TBAA.
if (const EnumType *ETy = dyn_cast<EnumType>(Ty)) {
+ if (!Features.CPlusPlus)
+ return getTypeInfo(Context.getIntTypeForBitwidth(Size * 8, 0));
+
// In C++ mode, types have linkage, so we can rely on the ODR and
// on their mangled names, if they're external.
// TODO: Is there a way to get a program-wide unique name for a
// decl with local linkage or no linkage?
- if (!Features.CPlusPlus || !ETy->getDecl()->isExternallyVisible())
+ if (!ETy->getDecl()->isExternallyVisible())
return getChar();
SmallString<256> OutName;
diff --git a/clang/test/CodeGen/tbaa.c b/clang/test/CodeGen/tbaa.c
new file mode 100644
index 000000000000000..0ab81f60a71941c
--- /dev/null
+++ b/clang/test/CodeGen/tbaa.c
@@ -0,0 +1,116 @@
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -no-struct-path-tbaa -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefixes=PATH
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O0 -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefix=NO-TBAA
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -relaxed-aliasing -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefix=NO-TBAA
+// Test TBAA metadata generated by front-end.
+//
+// NO-TBAA-NOT: !tbaa
+
+typedef unsigned char uint8_t;
+typedef unsigned short uint16_t;
+typedef unsigned int uint32_t;
+typedef unsigned long long uint64_t;
+
+typedef enum {
+ RED_AUTO_32,
+ GREEN_AUTO_32,
+ BLUE_AUTO_32
+} EnumAuto32;
+
+typedef enum {
+ RED_AUTO_64,
+ GREEN_AUTO_64,
+ BLUE_AUTO_64 = 0x100000000ull
+} EnumAuto64;
+
+typedef enum : uint16_t {
+ RED_16,
+ GREEN_16,
+ BLUE_16
+} Enum16;
+
+typedef enum : uint8_t {
+ RED_8,
+ GREEN_8,
+ BLUE_8
+} Enum8;
+
+uint32_t g0(EnumAuto32 *E, uint32_t *val) {
+// CHECK-LABEL: define{{.*}} i32 @g0(
+// CHECK: store i32 5, ptr %{{.*}}, align 4, !tbaa [[TAG_i32:!.*]]
+// CHECK: store i32 0, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+// CHECK: load i32, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+// PATH-LABEL: define{{.*}} i32 @g0(
+// PATH: store i32 5, ptr %{{.*}}, align 4, !tbaa [[TAG_i32:!.*]]
+// PATH: store i32 0, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+// PATH: load i32, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+ *val = 5;
+ *E = RED_AUTO_32;
+ return *val;
+}
+
+uint64_t g1(EnumAuto64 *E, uint64_t *val) {
+// CHECK-LABEL: define{{.*}} i64 @g1(
+// CHECK: store i64 5, ptr %{{.*}}, align 8, !tbaa [[TAG_i64:!.*]]
+// CHECK: store i64 0, ptr %{{.*}}, align 8, !tbaa [[TAG_long:!.*]]
+// CHECK: load i64, ptr %{{.*}}, align 8, !tbaa [[TAG_i64]]
+// PATH-LABEL: define{{.*}} i64 @g1(
+// PATH: store i64 5, ptr %{{.*}}, align 8, !tbaa [[TAG_i64:!.*]]
+// PATH: store i64 0, ptr %{{.*}}, align 8, !tbaa [[TAG_long:!.*]]
+// PATH: load i64, ptr %{{.*}}, align 8, !tbaa [[TAG_i64]]
+ *val = 5;
+ *E = RED_AUTO_64;
+ return *val;
+}
+
+uint16_t g2(Enum16 *E, uint16_t *val) {
+// CHECK-LABEL: define{{.*}} i16 @g2(
+// CHECK: store i16 5, ptr %{{.*}}, align 2, !tbaa [[TAG_i16:!.*]]
+// CHECK: store i16 0, ptr %{{.*}}, align 2, !tbaa [[TAG_i16]]
+// CHECK: load i16, ptr %{{.*}}, align 2, !tbaa [[TAG_i16]]
+// PATH-LABEL: define{{.*}} i16 @g2(
+// PATH: store i16 5, ptr %{{.*}}, align 2, !tbaa [[TAG_i16:!.*]]
+// PATH: store i16 0, ptr %{{.*}}, align 2, !tbaa [[TAG_i16]]
+// PATH: load i16, ptr %{{.*}}, align 2, !tbaa [[TAG_i16]]
+ *val = 5;
+ *E = RED_16;
+ return *val;
+}
+
+uint8_t g3(Enum8 *E, uint8_t *val) {
+// CHECK-LABEL: define{{.*}} i8 @g3(
+// CHECK: store i8 5, ptr %{{.*}}, align 1, !tbaa [[TAG_i8:!.*]]
+// CHECK: store i8 0, ptr %{{.*}}, align 1, !tbaa [[TAG_i8]]
+// CHECK: load i8, ptr %{{.*}}, align 1, !tbaa [[TAG_i8]]
+// PATH-LABEL: define{{.*}} i8 @g3(
+// PATH: store i8 5, ptr %{{.*}}, align 1, !tbaa [[TAG_i8:!.*]]
+// PATH: store i8 0, ptr %{{.*}}, align 1, !tbaa [[TAG_i8]]
+// PATH: load i8, ptr %{{.*}}, align 1, !tbaa [[TAG_i8]]
+ *val = 5;
+ *E = RED_8;
+ return *val;
+}
+
+// CHECK: [[TYPE_char:!.*]] = !{!"omnipotent char", [[TAG_c_tbaa:!.*]],
+// CHECK: [[TAG_c_tbaa]] = !{!"Simple C/C++ TBAA"}
+// CHECK: [[TAG_i32]] = !{[[TYPE_i32:!.*]], [[TYPE_i32]], i64 0}
+// CHECK: [[TYPE_i32]] = !{!"int", [[TYPE_char]],
+// CHECK: [[TAG_i64]] = !{[[TYPE_i64:!.*]], [[TYPE_i64]], i64 0}
+// CHECK: [[TYPE_i64]] = !{!"long long", [[TYPE_char]],
+// CHECK: [[TAG_long]] = !{[[TYPE_long:!.*]], [[TYPE_long]], i64 0}
+// CHECK: [[TYPE_long]] = !{!"long", [[TYPE_char]],
+// CHECK: [[TAG_i16]] = !{[[TYPE_i16:!.*]], [[TYPE_i16]], i64 0}
+// CHECK: [[TYPE_i16]] = !{!"short", [[TYPE_char]],
+// CHECK: [[TAG_i8]] = !{[[TYPE_i8:!.*]], [[TYPE_char]], i64 0}
+
+// PATH: [[TYPE_char:!.*]] = !{!"omnipotent char", [[TAG_c_tbaa:!.*]],
+// PATH: [[TAG_c_tbaa]] = !{!"Simple C/C++ TBAA"}
+// PATH: [[TAG_i32]] = !{[[TYPE_i32:!.*]], [[TYPE_i32]], i64 0}
+// PATH: [[TYPE_i32]] = !{!"int", [[TYPE_char]],
+// PATH: [[TAG_i64]] = !{[[TYPE_i64:!.*]], [[TYPE_i64]], i64 0}
+// PATH: [[TYPE_i64]] = !{!"long long", [[TYPE_char]],
+// PATH: [[TAG_long]] = !{[[TYPE_long:!.*]], [[TYPE_long]], i64 0}
+// PATH: [[TYPE_long]] = !{!"long", [[TYPE_char]],
+// PATH: [[TAG_i16]] = !{[[TYPE_i16:!.*]], [[TYPE_i16]], i64 0}
+// PATH: [[TYPE_i16]] = !{!"short", [[TYPE_char]],
+// PATH: [[TAG_i8]] = !{[[TYPE_i8:!.*]], [[TYPE_char]], i64 0}
|
clang/lib/CodeGen/CodeGenTBAA.cpp
Outdated
@@ -196,11 +196,14 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) { | |||
// Enum types are distinct types. In C++ they have "underlying types", | |||
// however they aren't related for TBAA. | |||
if (const EnumType *ETy = dyn_cast<EnumType>(Ty)) { | |||
if (!Features.CPlusPlus) | |||
return getTypeInfo(Context.getIntTypeForBitwidth(Size * 8, 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this is entirely correct so would appreciate some guidance here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe ETy->getDecl()->getIntegerType()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion - thanks!
Gentle ping! |
In C23, the enumeration type is compatible with its underlying type, which is either |
Hi @AaronBallman, yes the problem I found with always choosing
gcc chooses a |
The C notion of "compatibility" with regard to enums is a bit weird... clang and gcc disagree whether That said, this patch is pretty conservative; it's hard for me to imagine treating an enum as the underlying type could cause any issues (unless code is assuming an enum has universal aliasing). I guess this should be compatible with older clang versions, so we don't need to rev the version of TBAA data. |
This seems conservatively correct, yeah. My reading is that we could also use the underlying type as a parent type for the TBAA metadata: enums are compatible with their underlying type, but two enums with the same underlying type are not compatible with each other. But this rule should also work. |
Do you think it's worth adding something to the Clang release note? |
Yes, please! |
Done. Hope the documentation I added makes sense! |
I thought the suggestion was to add a few lines to https://github.com/llvm/llvm-project/blob/main/clang/docs/ReleaseNotes.rst |
Yes you're right! For some reason I got mixed up with the LangRef, but I guess adding something to the LangRef does no harm either. I'll put something in the release notes too. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks broadly good to me. I suggest reverting the LangRef change because it doesn't add any new information relevant to LLVM IR.
@@ -196,6 +196,9 @@ C Language Changes | |||
number of elements in the flexible array member. This information can improve | |||
the results of the array bound sanitizer and the | |||
``__builtin_dynamic_object_size`` builtin. | |||
- Enums will now be represented in TBAA metadata using their actual underlying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps worth saying Enums in C...
since the behaviour for C++ is different and remains unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is actually under the heading C Language Changes
so I think that should be clear enough. Is that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me.
When emitting TBAA information for enums in C code we currently just treat the data as an 'omnipotent char'. However, with C strict aliasing this means we fail to optimise certain cases. For example, in the SPEC2017 xz benchmark there are structs that contain arrays of enums, and clang pessmistically assumes that accesses to those enums could alias with other struct members that have a different type.
According to
https://en.cppreference.com/w/c/language/enum
enums should be treated as 'int' types unless explicitly specified (C23) or if 'int' would not be large enough to hold all the enumerated values. In the latter case the compiler is free to choose a suitable integer that would hold all such values.
When compiling C code this patch generates TBAA information for the enum by using an equivalent integer of the size clang has already chosen for the enum. I have ignored C++ for now because the rules are more complex.
New test added here:
clang/test/CodeGen/tbaa.c