Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Clang] Emit TBAA info for enums in C #73326

Merged
merged 5 commits into from
Dec 8, 2023
Merged

Conversation

david-arm
Copy link
Contributor

@david-arm david-arm commented Nov 24, 2023

When emitting TBAA information for enums in C code we currently just treat the data as an 'omnipotent char'. However, with C strict aliasing this means we fail to optimise certain cases. For example, in the SPEC2017 xz benchmark there are structs that contain arrays of enums, and clang pessmistically assumes that accesses to those enums could alias with other struct members that have a different type.

According to

https://en.cppreference.com/w/c/language/enum

enums should be treated as 'int' types unless explicitly specified (C23) or if 'int' would not be large enough to hold all the enumerated values. In the latter case the compiler is free to choose a suitable integer that would hold all such values.

When compiling C code this patch generates TBAA information for the enum by using an equivalent integer of the size clang has already chosen for the enum. I have ignored C++ for now because the rules are more complex.

New test added here:

clang/test/CodeGen/tbaa.c

When emitting TBAA information for enums in C code we
currently just treat the data as an 'omnipotent char'.
However, with C strict aliasing this means we fail to
optimise certain cases. For example, in the SPEC2017
xz benchmark there are structs that contain arrays of
enums, and clang pessmistically assumes that accesses
to those enums could alias with other struct members
that have a different type.

According to

https://en.cppreference.com/w/c/language/enum

enums should be treated as 'int' types unless
explicitly specified (C23) or if 'int' would not be
large enough to hold all the enumerated values. In the
latter case the compiler is free to choose a suitable
integer that would hold all such values.

When compiling C code this patch generates TBAA
information for the enum by using an equivalent integer
of the size clang has already chosen for the enum. I
have ignored C++ for now because the rules are more
complex.

New test added here:

  clang/test/CodeGen/tbaa.c
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen labels Nov 24, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 24, 2023

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-clang-codegen

Author: David Sherwood (david-arm)

Changes

When emitting TBAA information for enums in C code we currently just treat the data as an 'omnipotent char'. However, with C strict aliasing this means we fail to optimise certain cases. For example, in the SPEC2017 xz benchmark there are structs that contain arrays of enums, and clang pessmistically assumes that accesses to those enums could alias with other struct members that have a different type.

According to

https://en.cppreference.com/w/c/language/enum

enums should be treated as 'int' types unless
explicitly specified (C23) or if 'int' would not be large enough to hold all the enumerated values. In the latter case the compiler is free to choose a suitable integer that would hold all such values.

When compiling C code this patch generates TBAA
information for the enum by using an equivalent integer of the size clang has already chosen for the enum. I have ignored C++ for now because the rules are more complex.

New test added here:

clang/test/CodeGen/tbaa.c


Full diff: https://github.com/llvm/llvm-project/pull/73326.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CodeGenTBAA.cpp (+4-1)
  • (added) clang/test/CodeGen/tbaa.c (+116)
diff --git a/clang/lib/CodeGen/CodeGenTBAA.cpp b/clang/lib/CodeGen/CodeGenTBAA.cpp
index 8705d3d65f1a573..f59d3d422d5209d 100644
--- a/clang/lib/CodeGen/CodeGenTBAA.cpp
+++ b/clang/lib/CodeGen/CodeGenTBAA.cpp
@@ -196,11 +196,14 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
   // Enum types are distinct types. In C++ they have "underlying types",
   // however they aren't related for TBAA.
   if (const EnumType *ETy = dyn_cast<EnumType>(Ty)) {
+    if (!Features.CPlusPlus)
+      return getTypeInfo(Context.getIntTypeForBitwidth(Size * 8, 0));
+
     // In C++ mode, types have linkage, so we can rely on the ODR and
     // on their mangled names, if they're external.
     // TODO: Is there a way to get a program-wide unique name for a
     // decl with local linkage or no linkage?
-    if (!Features.CPlusPlus || !ETy->getDecl()->isExternallyVisible())
+    if (!ETy->getDecl()->isExternallyVisible())
       return getChar();
 
     SmallString<256> OutName;
diff --git a/clang/test/CodeGen/tbaa.c b/clang/test/CodeGen/tbaa.c
new file mode 100644
index 000000000000000..0ab81f60a71941c
--- /dev/null
+++ b/clang/test/CodeGen/tbaa.c
@@ -0,0 +1,116 @@
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -no-struct-path-tbaa -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefixes=PATH
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O0 -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefix=NO-TBAA
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -relaxed-aliasing -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefix=NO-TBAA
+// Test TBAA metadata generated by front-end.
+//
+// NO-TBAA-NOT: !tbaa
+
+typedef unsigned char uint8_t;
+typedef unsigned short uint16_t;
+typedef unsigned int uint32_t;
+typedef unsigned long long uint64_t;
+
+typedef enum {
+  RED_AUTO_32,
+  GREEN_AUTO_32,
+  BLUE_AUTO_32
+} EnumAuto32;
+
+typedef enum {
+  RED_AUTO_64,
+  GREEN_AUTO_64,
+  BLUE_AUTO_64 = 0x100000000ull
+} EnumAuto64;
+
+typedef enum : uint16_t {
+  RED_16,
+  GREEN_16,
+  BLUE_16
+} Enum16;
+
+typedef enum : uint8_t {
+  RED_8,
+  GREEN_8,
+  BLUE_8
+} Enum8;
+
+uint32_t g0(EnumAuto32 *E, uint32_t *val) {
+// CHECK-LABEL: define{{.*}} i32 @g0(
+// CHECK: store i32 5, ptr %{{.*}}, align 4, !tbaa [[TAG_i32:!.*]]
+// CHECK: store i32 0, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+// CHECK: load i32, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+// PATH-LABEL: define{{.*}} i32 @g0(
+// PATH: store i32 5, ptr %{{.*}}, align 4, !tbaa [[TAG_i32:!.*]]
+// PATH: store i32 0, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+// PATH: load i32, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+  *val = 5;
+  *E = RED_AUTO_32;
+  return *val;
+}
+
+uint64_t g1(EnumAuto64 *E, uint64_t *val) {
+// CHECK-LABEL: define{{.*}} i64 @g1(
+// CHECK: store i64 5, ptr %{{.*}}, align 8, !tbaa [[TAG_i64:!.*]]
+// CHECK: store i64 0, ptr %{{.*}}, align 8, !tbaa [[TAG_long:!.*]]
+// CHECK: load i64, ptr %{{.*}}, align 8, !tbaa [[TAG_i64]]
+// PATH-LABEL: define{{.*}} i64 @g1(
+// PATH: store i64 5, ptr %{{.*}}, align 8, !tbaa [[TAG_i64:!.*]]
+// PATH: store i64 0, ptr %{{.*}}, align 8, !tbaa [[TAG_long:!.*]]
+// PATH: load i64, ptr %{{.*}}, align 8, !tbaa [[TAG_i64]]
+  *val = 5;
+  *E = RED_AUTO_64;
+  return *val;
+}
+
+uint16_t g2(Enum16 *E, uint16_t *val) {
+// CHECK-LABEL: define{{.*}} i16 @g2(
+// CHECK: store i16 5, ptr %{{.*}}, align 2, !tbaa [[TAG_i16:!.*]]
+// CHECK: store i16 0, ptr %{{.*}}, align 2, !tbaa [[TAG_i16]]
+// CHECK: load i16, ptr %{{.*}}, align 2, !tbaa [[TAG_i16]]
+// PATH-LABEL: define{{.*}} i16 @g2(
+// PATH: store i16 5, ptr %{{.*}}, align 2, !tbaa [[TAG_i16:!.*]]
+// PATH: store i16 0, ptr %{{.*}}, align 2, !tbaa [[TAG_i16]]
+// PATH: load i16, ptr %{{.*}}, align 2, !tbaa [[TAG_i16]]
+  *val = 5;
+  *E = RED_16;
+  return *val;
+}
+
+uint8_t g3(Enum8 *E, uint8_t *val) {
+// CHECK-LABEL: define{{.*}} i8 @g3(
+// CHECK: store i8 5, ptr %{{.*}}, align 1, !tbaa [[TAG_i8:!.*]]
+// CHECK: store i8 0, ptr %{{.*}}, align 1, !tbaa [[TAG_i8]]
+// CHECK: load i8, ptr %{{.*}}, align 1, !tbaa [[TAG_i8]]
+// PATH-LABEL: define{{.*}} i8 @g3(
+// PATH: store i8 5, ptr %{{.*}}, align 1, !tbaa [[TAG_i8:!.*]]
+// PATH: store i8 0, ptr %{{.*}}, align 1, !tbaa [[TAG_i8]]
+// PATH: load i8, ptr %{{.*}}, align 1, !tbaa [[TAG_i8]]
+  *val = 5;
+  *E = RED_8;
+  return *val;
+}
+
+// CHECK: [[TYPE_char:!.*]] = !{!"omnipotent char", [[TAG_c_tbaa:!.*]],
+// CHECK: [[TAG_c_tbaa]] = !{!"Simple C/C++ TBAA"}
+// CHECK: [[TAG_i32]] = !{[[TYPE_i32:!.*]], [[TYPE_i32]], i64 0}
+// CHECK: [[TYPE_i32]] = !{!"int", [[TYPE_char]],
+// CHECK: [[TAG_i64]] = !{[[TYPE_i64:!.*]], [[TYPE_i64]], i64 0}
+// CHECK: [[TYPE_i64]] = !{!"long long", [[TYPE_char]],
+// CHECK: [[TAG_long]] = !{[[TYPE_long:!.*]], [[TYPE_long]], i64 0}
+// CHECK: [[TYPE_long]] = !{!"long", [[TYPE_char]],
+// CHECK: [[TAG_i16]] = !{[[TYPE_i16:!.*]], [[TYPE_i16]], i64 0}
+// CHECK: [[TYPE_i16]] = !{!"short", [[TYPE_char]],
+// CHECK: [[TAG_i8]] = !{[[TYPE_i8:!.*]], [[TYPE_char]], i64 0}
+
+// PATH: [[TYPE_char:!.*]] = !{!"omnipotent char", [[TAG_c_tbaa:!.*]],
+// PATH: [[TAG_c_tbaa]] = !{!"Simple C/C++ TBAA"}
+// PATH: [[TAG_i32]] = !{[[TYPE_i32:!.*]], [[TYPE_i32]], i64 0}
+// PATH: [[TYPE_i32]] = !{!"int", [[TYPE_char]],
+// PATH: [[TAG_i64]] = !{[[TYPE_i64:!.*]], [[TYPE_i64]], i64 0}
+// PATH: [[TYPE_i64]] = !{!"long long", [[TYPE_char]],
+// PATH: [[TAG_long]] = !{[[TYPE_long:!.*]], [[TYPE_long]], i64 0}
+// PATH: [[TYPE_long]] = !{!"long", [[TYPE_char]],
+// PATH: [[TAG_i16]] = !{[[TYPE_i16:!.*]], [[TYPE_i16]], i64 0}
+// PATH: [[TYPE_i16]] = !{!"short", [[TYPE_char]],
+// PATH: [[TAG_i8]] = !{[[TYPE_i8:!.*]], [[TYPE_char]], i64 0}

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 24, 2023

@llvm/pr-subscribers-clang

Author: David Sherwood (david-arm)

Changes

When emitting TBAA information for enums in C code we currently just treat the data as an 'omnipotent char'. However, with C strict aliasing this means we fail to optimise certain cases. For example, in the SPEC2017 xz benchmark there are structs that contain arrays of enums, and clang pessmistically assumes that accesses to those enums could alias with other struct members that have a different type.

According to

https://en.cppreference.com/w/c/language/enum

enums should be treated as 'int' types unless
explicitly specified (C23) or if 'int' would not be large enough to hold all the enumerated values. In the latter case the compiler is free to choose a suitable integer that would hold all such values.

When compiling C code this patch generates TBAA
information for the enum by using an equivalent integer of the size clang has already chosen for the enum. I have ignored C++ for now because the rules are more complex.

New test added here:

clang/test/CodeGen/tbaa.c


Full diff: https://github.com/llvm/llvm-project/pull/73326.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CodeGenTBAA.cpp (+4-1)
  • (added) clang/test/CodeGen/tbaa.c (+116)
diff --git a/clang/lib/CodeGen/CodeGenTBAA.cpp b/clang/lib/CodeGen/CodeGenTBAA.cpp
index 8705d3d65f1a573..f59d3d422d5209d 100644
--- a/clang/lib/CodeGen/CodeGenTBAA.cpp
+++ b/clang/lib/CodeGen/CodeGenTBAA.cpp
@@ -196,11 +196,14 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
   // Enum types are distinct types. In C++ they have "underlying types",
   // however they aren't related for TBAA.
   if (const EnumType *ETy = dyn_cast<EnumType>(Ty)) {
+    if (!Features.CPlusPlus)
+      return getTypeInfo(Context.getIntTypeForBitwidth(Size * 8, 0));
+
     // In C++ mode, types have linkage, so we can rely on the ODR and
     // on their mangled names, if they're external.
     // TODO: Is there a way to get a program-wide unique name for a
     // decl with local linkage or no linkage?
-    if (!Features.CPlusPlus || !ETy->getDecl()->isExternallyVisible())
+    if (!ETy->getDecl()->isExternallyVisible())
       return getChar();
 
     SmallString<256> OutName;
diff --git a/clang/test/CodeGen/tbaa.c b/clang/test/CodeGen/tbaa.c
new file mode 100644
index 000000000000000..0ab81f60a71941c
--- /dev/null
+++ b/clang/test/CodeGen/tbaa.c
@@ -0,0 +1,116 @@
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -no-struct-path-tbaa -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefixes=PATH
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O0 -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefix=NO-TBAA
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -relaxed-aliasing -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefix=NO-TBAA
+// Test TBAA metadata generated by front-end.
+//
+// NO-TBAA-NOT: !tbaa
+
+typedef unsigned char uint8_t;
+typedef unsigned short uint16_t;
+typedef unsigned int uint32_t;
+typedef unsigned long long uint64_t;
+
+typedef enum {
+  RED_AUTO_32,
+  GREEN_AUTO_32,
+  BLUE_AUTO_32
+} EnumAuto32;
+
+typedef enum {
+  RED_AUTO_64,
+  GREEN_AUTO_64,
+  BLUE_AUTO_64 = 0x100000000ull
+} EnumAuto64;
+
+typedef enum : uint16_t {
+  RED_16,
+  GREEN_16,
+  BLUE_16
+} Enum16;
+
+typedef enum : uint8_t {
+  RED_8,
+  GREEN_8,
+  BLUE_8
+} Enum8;
+
+uint32_t g0(EnumAuto32 *E, uint32_t *val) {
+// CHECK-LABEL: define{{.*}} i32 @g0(
+// CHECK: store i32 5, ptr %{{.*}}, align 4, !tbaa [[TAG_i32:!.*]]
+// CHECK: store i32 0, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+// CHECK: load i32, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+// PATH-LABEL: define{{.*}} i32 @g0(
+// PATH: store i32 5, ptr %{{.*}}, align 4, !tbaa [[TAG_i32:!.*]]
+// PATH: store i32 0, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+// PATH: load i32, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+  *val = 5;
+  *E = RED_AUTO_32;
+  return *val;
+}
+
+uint64_t g1(EnumAuto64 *E, uint64_t *val) {
+// CHECK-LABEL: define{{.*}} i64 @g1(
+// CHECK: store i64 5, ptr %{{.*}}, align 8, !tbaa [[TAG_i64:!.*]]
+// CHECK: store i64 0, ptr %{{.*}}, align 8, !tbaa [[TAG_long:!.*]]
+// CHECK: load i64, ptr %{{.*}}, align 8, !tbaa [[TAG_i64]]
+// PATH-LABEL: define{{.*}} i64 @g1(
+// PATH: store i64 5, ptr %{{.*}}, align 8, !tbaa [[TAG_i64:!.*]]
+// PATH: store i64 0, ptr %{{.*}}, align 8, !tbaa [[TAG_long:!.*]]
+// PATH: load i64, ptr %{{.*}}, align 8, !tbaa [[TAG_i64]]
+  *val = 5;
+  *E = RED_AUTO_64;
+  return *val;
+}
+
+uint16_t g2(Enum16 *E, uint16_t *val) {
+// CHECK-LABEL: define{{.*}} i16 @g2(
+// CHECK: store i16 5, ptr %{{.*}}, align 2, !tbaa [[TAG_i16:!.*]]
+// CHECK: store i16 0, ptr %{{.*}}, align 2, !tbaa [[TAG_i16]]
+// CHECK: load i16, ptr %{{.*}}, align 2, !tbaa [[TAG_i16]]
+// PATH-LABEL: define{{.*}} i16 @g2(
+// PATH: store i16 5, ptr %{{.*}}, align 2, !tbaa [[TAG_i16:!.*]]
+// PATH: store i16 0, ptr %{{.*}}, align 2, !tbaa [[TAG_i16]]
+// PATH: load i16, ptr %{{.*}}, align 2, !tbaa [[TAG_i16]]
+  *val = 5;
+  *E = RED_16;
+  return *val;
+}
+
+uint8_t g3(Enum8 *E, uint8_t *val) {
+// CHECK-LABEL: define{{.*}} i8 @g3(
+// CHECK: store i8 5, ptr %{{.*}}, align 1, !tbaa [[TAG_i8:!.*]]
+// CHECK: store i8 0, ptr %{{.*}}, align 1, !tbaa [[TAG_i8]]
+// CHECK: load i8, ptr %{{.*}}, align 1, !tbaa [[TAG_i8]]
+// PATH-LABEL: define{{.*}} i8 @g3(
+// PATH: store i8 5, ptr %{{.*}}, align 1, !tbaa [[TAG_i8:!.*]]
+// PATH: store i8 0, ptr %{{.*}}, align 1, !tbaa [[TAG_i8]]
+// PATH: load i8, ptr %{{.*}}, align 1, !tbaa [[TAG_i8]]
+  *val = 5;
+  *E = RED_8;
+  return *val;
+}
+
+// CHECK: [[TYPE_char:!.*]] = !{!"omnipotent char", [[TAG_c_tbaa:!.*]],
+// CHECK: [[TAG_c_tbaa]] = !{!"Simple C/C++ TBAA"}
+// CHECK: [[TAG_i32]] = !{[[TYPE_i32:!.*]], [[TYPE_i32]], i64 0}
+// CHECK: [[TYPE_i32]] = !{!"int", [[TYPE_char]],
+// CHECK: [[TAG_i64]] = !{[[TYPE_i64:!.*]], [[TYPE_i64]], i64 0}
+// CHECK: [[TYPE_i64]] = !{!"long long", [[TYPE_char]],
+// CHECK: [[TAG_long]] = !{[[TYPE_long:!.*]], [[TYPE_long]], i64 0}
+// CHECK: [[TYPE_long]] = !{!"long", [[TYPE_char]],
+// CHECK: [[TAG_i16]] = !{[[TYPE_i16:!.*]], [[TYPE_i16]], i64 0}
+// CHECK: [[TYPE_i16]] = !{!"short", [[TYPE_char]],
+// CHECK: [[TAG_i8]] = !{[[TYPE_i8:!.*]], [[TYPE_char]], i64 0}
+
+// PATH: [[TYPE_char:!.*]] = !{!"omnipotent char", [[TAG_c_tbaa:!.*]],
+// PATH: [[TAG_c_tbaa]] = !{!"Simple C/C++ TBAA"}
+// PATH: [[TAG_i32]] = !{[[TYPE_i32:!.*]], [[TYPE_i32]], i64 0}
+// PATH: [[TYPE_i32]] = !{!"int", [[TYPE_char]],
+// PATH: [[TAG_i64]] = !{[[TYPE_i64:!.*]], [[TYPE_i64]], i64 0}
+// PATH: [[TYPE_i64]] = !{!"long long", [[TYPE_char]],
+// PATH: [[TAG_long]] = !{[[TYPE_long:!.*]], [[TYPE_long]], i64 0}
+// PATH: [[TYPE_long]] = !{!"long", [[TYPE_char]],
+// PATH: [[TAG_i16]] = !{[[TYPE_i16:!.*]], [[TYPE_i16]], i64 0}
+// PATH: [[TYPE_i16]] = !{!"short", [[TYPE_char]],
+// PATH: [[TAG_i8]] = !{[[TYPE_i8:!.*]], [[TYPE_char]], i64 0}

@@ -196,11 +196,14 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
// Enum types are distinct types. In C++ they have "underlying types",
// however they aren't related for TBAA.
if (const EnumType *ETy = dyn_cast<EnumType>(Ty)) {
if (!Features.CPlusPlus)
return getTypeInfo(Context.getIntTypeForBitwidth(Size * 8, 0));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this is entirely correct so would appreciate some guidance here!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe ETy->getDecl()->getIntegerType() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion - thanks!

@david-arm
Copy link
Contributor Author

Gentle ping!

@AaronBallman
Copy link
Collaborator

enums should be treated as 'int' types unless explicitly specified (C23) or if 'int' would not be large enough to hold all the enumerated values. In the latter case the compiler is free to choose a suitable integer that would hold all such values.

In C23, the enumeration type is compatible with its underlying type, which is either char or a standard or extended signed or unsigned integer type (C23 6.7.2.2p2). The enumeration member type (C23 6.7.2.2p11) is specified by p12 (when building the enumeration itself) and p15 and p16 (once the enumeration is complete). If there's no fixed underlying type, the members are compatible with either int or the enumerated type, and if there is a fixed underlying type, the members type is the enumerated type. Much of this is implementation-defined behavior. It sounds like our implementation was assuming compatibility with char and not int because of the aliasing behavior, and now you'd like to change it to be int so we don't have the aliasing problems?

@david-arm
Copy link
Contributor Author

Hi @AaronBallman, yes the problem I found with always choosing char as the alias type is that LLVM will just assume that enum types alias with absolutely everything. This is a conservative approach that works fine, but it does prevent important type-based alias optimisations from happening. GCC seems to take advantage of the fact that enums without any explicit type should be compatible with an int (or a suitable integer that the compiler is free to choose that can contain all enumerated values). Consequently it can be more aggressive in optimisations, which is why I started looking at the C specification to understand the rules. If my understanding is correct, I think the important thing is that the type-based alias information should match the actual underlying type we've chosen for the enum. It could be, for example that in the case of an enum like this:

enum Foo {
  A = 1,
  B = 0x100000000ull
};

gcc chooses a long long and clang chooses a long since there seems to be wiggle room in the specification. I think that's fine provided the compiler is self-consistent. In either case both long and long long would not alias with int or short, which is still an improvement on treating it as a char in TBAA info.

@efriedma-quic
Copy link
Collaborator

The C notion of "compatibility" with regard to enums is a bit weird... clang and gcc disagree whether enum e : int; enum e2: int; enum e x; int x; enum e2 x; is valid because 6.2.7 doesn't actually define what's supposed to happen. Probably this should be illegal, but the standard isn't really clear here. Maybe it should be clarified to have a notion of "redeclaration compatibility" which is separate from general type compatibility.

That said, this patch is pretty conservative; it's hard for me to imagine treating an enum as the underlying type could cause any issues (unless code is assuming an enum has universal aliasing).

I guess this should be compatible with older clang versions, so we don't need to rev the version of TBAA data.

@rjmccall
Copy link
Contributor

rjmccall commented Dec 5, 2023

This seems conservatively correct, yeah. My reading is that we could also use the underlying type as a parent type for the TBAA metadata: enums are compatible with their underlying type, but two enums with the same underlying type are not compatible with each other. But this rule should also work.

@paulwalker-arm
Copy link
Collaborator

Do you think it's worth adding something to the Clang release note?

@AaronBallman
Copy link
Collaborator

Do you think it's worth adding something to the Clang release note?

Yes, please!

@llvmbot llvmbot added the llvm:ir label Dec 7, 2023
@david-arm
Copy link
Contributor Author

Do you think it's worth adding something to the Clang release note?

Done. Hope the documentation I added makes sense!

@momchil-velikov
Copy link
Collaborator

I thought the suggestion was to add a few lines to https://github.com/llvm/llvm-project/blob/main/clang/docs/ReleaseNotes.rst

@david-arm
Copy link
Contributor Author

I thought the suggestion was to add a few lines to https://github.com/llvm/llvm-project/blob/main/clang/docs/ReleaseNotes.rst

Yes you're right! For some reason I got mixed up with the LangRef, but I guess adding something to the LangRef does no harm either. I'll put something in the release notes too. :)

Copy link
Collaborator

@paulwalker-arm paulwalker-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks broadly good to me. I suggest reverting the LangRef change because it doesn't add any new information relevant to LLVM IR.

llvm/docs/LangRef.rst Outdated Show resolved Hide resolved
@@ -196,6 +196,9 @@ C Language Changes
number of elements in the flexible array member. This information can improve
the results of the array bound sanitizer and the
``__builtin_dynamic_object_size`` builtin.
- Enums will now be represented in TBAA metadata using their actual underlying
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps worth saying Enums in C... since the behaviour for C++ is different and remains unchanged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is actually under the heading C Language Changes so I think that should be clear enough. Is that ok?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me.

@david-arm david-arm merged commit c1cfa17 into llvm:main Dec 8, 2023
5 checks passed
@david-arm david-arm deleted the enum_tbaa branch December 12, 2023 09:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen clang Clang issues not falling into any other category llvm:ir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants