Skip to content

Conversation

qmfrederik
Copy link
Contributor

Type encodings are part of symbol names in the Objective C ABI. Replace characters which are reseved in symbol names:

  • ELF: avoid including '@' characters in type encodings
  • Windows: avoid including '=' characters in type encodings

@qmfrederik
Copy link
Contributor Author

/cc @davidchisnall

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Jan 11, 2024
@llvmbot
Copy link
Member

llvmbot commented Jan 11, 2024

@llvm/pr-subscribers-clang

Author: Frederik Carlier (qmfrederik)

Changes

Type encodings are part of symbol names in the Objective C ABI. Replace characters which are reseved in symbol names:

  • ELF: avoid including '@' characters in type encodings
  • Windows: avoid including '=' characters in type encodings

Full diff: https://github.com/llvm/llvm-project/pull/77797.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CGObjCGNU.cpp (+18-13)
  • (modified) clang/test/CodeGenObjC/encode-test-6.m (+36-8)
diff --git a/clang/lib/CodeGen/CGObjCGNU.cpp b/clang/lib/CodeGen/CGObjCGNU.cpp
index cd1a0b6a130ff0..6e76765cea3db1 100644
--- a/clang/lib/CodeGen/CGObjCGNU.cpp
+++ b/clang/lib/CodeGen/CGObjCGNU.cpp
@@ -1431,12 +1431,25 @@ class CGObjCGNUstep2 : public CGObjCGNUstep {
                                 const std::string &TypeEncoding) override {
     return GetConstantSelector(Sel, TypeEncoding);
   }
+  std::string GetSymbolNameForTypeEncoding(const std::string &TypeEncoding) {
+    std::string MangledTypes = std::string(TypeEncoding);
+    // @ is used as a special character in ELF symbol names (used for symbol
+    // versioning), so mangle the name to not include it.  Replace it with a
+    // character that is not a valid type encoding character (and, being
+    // non-printable, never will be!)
+    if (CGM.getTriple().isOSBinFormatELF())
+      std::replace(MangledTypes.begin(), MangledTypes.end(),
+        '@', '\1');
+    // = in dll exported names causes lld to fail when linking on Windows.
+    if (CGM.getTriple().isOSWindows())
+      std::replace(MangledTypes.begin(), MangledTypes.end(),
+        '=', '\2');
+    return MangledTypes;
+  }
   llvm::Constant  *GetTypeString(llvm::StringRef TypeEncoding) {
     if (TypeEncoding.empty())
       return NULLPtr;
-    std::string MangledTypes = std::string(TypeEncoding);
-    std::replace(MangledTypes.begin(), MangledTypes.end(),
-      '@', '\1');
+    std::string MangledTypes = GetSymbolNameForTypeEncoding(std::string(TypeEncoding));
     std::string TypesVarName = ".objc_sel_types_" + MangledTypes;
     auto *TypesGlobal = TheModule.getGlobalVariable(TypesVarName);
     if (!TypesGlobal) {
@@ -1453,13 +1466,7 @@ class CGObjCGNUstep2 : public CGObjCGNUstep {
   }
   llvm::Constant *GetConstantSelector(Selector Sel,
                                       const std::string &TypeEncoding) override {
-    // @ is used as a special character in symbol names (used for symbol
-    // versioning), so mangle the name to not include it.  Replace it with a
-    // character that is not a valid type encoding character (and, being
-    // non-printable, never will be!)
-    std::string MangledTypes = TypeEncoding;
-    std::replace(MangledTypes.begin(), MangledTypes.end(),
-      '@', '\1');
+    std::string MangledTypes = GetSymbolNameForTypeEncoding(TypeEncoding);
     auto SelVarName = (StringRef(".objc_selector_") + Sel.getAsString() + "_" +
       MangledTypes).str();
     if (auto *GV = TheModule.getNamedGlobal(SelVarName))
@@ -1671,9 +1678,7 @@ class CGObjCGNUstep2 : public CGObjCGNUstep {
                                         const ObjCIvarDecl *Ivar) override {
     std::string TypeEncoding;
     CGM.getContext().getObjCEncodingForType(Ivar->getType(), TypeEncoding);
-    // Prevent the @ from being interpreted as a symbol version.
-    std::replace(TypeEncoding.begin(), TypeEncoding.end(),
-      '@', '\1');
+    TypeEncoding = GetSymbolNameForTypeEncoding(TypeEncoding);
     const std::string Name = "__objc_ivar_offset_" + ID->getNameAsString()
       + '.' + Ivar->getNameAsString() + '.' + TypeEncoding;
     return Name;
diff --git a/clang/test/CodeGenObjC/encode-test-6.m b/clang/test/CodeGenObjC/encode-test-6.m
index 261eb7fb3368b2..1eb6808c7d12f8 100644
--- a/clang/test/CodeGenObjC/encode-test-6.m
+++ b/clang/test/CodeGenObjC/encode-test-6.m
@@ -1,5 +1,14 @@
 // RUN: %clang_cc1 -triple x86_64-apple-darwin -emit-llvm -o %t %s
-// RUN: FileCheck < %t %s
+// RUN: FileCheck -check-prefix CHECK-DWARF < %t %s
+
+// RUN: %clang_cc1 -triple x86_64-w64-windows-gnu -emit-llvm -fobjc-runtime=gnustep-2.0 -o %t %s
+// RUN: FileCheck -check-prefix CHECK-MINGW < %t %s
+
+// RUN: %clang_cc1 -triple x86_64-windows-msvc -emit-llvm -fobjc-runtime=gnustep-2.0 -o %t %s
+// RUN: FileCheck -check-prefix CHECK-MSVC < %t %s
+
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -fobjc-runtime=gnustep-2.0 -o %t %s
+// RUN: FileCheck -check-prefix CHECK-ELF < %t %s
 
 typedef struct {} Z;
 
@@ -13,8 +22,17 @@ -(void)bar:(Z)a {}
 -(void)foo:(Z)a: (char*)b : (Z)c : (double) d {}
 @end
 
-// CHECK: private unnamed_addr constant [14 x i8] c"v16@0:8{?=}16
-// CHECK: private unnamed_addr constant [26 x i8] c"v32@0:8{?=}16*16{?=}24d24
+// CHECK-DWARF: private unnamed_addr constant [14 x i8] c"v16@0:8{?=}16
+// CHECK-DWARF: private unnamed_addr constant [26 x i8] c"v32@0:8{?=}16*16{?=}24d24
+
+// CHECK-MINGW: @".objc_sel_types_v16@0:8{?\02}16" = linkonce_odr hidden constant [14 x i8] c"v16@0:8{?=}16\00"
+// CHECK-MINGW: @".objc_sel_types_v32@0:8{?\02}16*16{?\02}24d24" = linkonce_odr hidden constant [26 x i8] c"v32@0:8{?=}16*16{?=}24d24\00"
+
+// CHECK-MSVC: @".objc_sel_types_v20@0:8{?\02}16" = linkonce_odr hidden constant [14 x i8] c"v20@0:8{?=}16\00"
+// CHECK-MSVC: @".objc_sel_types_v40@0:8{?\02}16*20{?\02}28d32" = linkonce_odr hidden constant [26 x i8] c"v40@0:8{?=}16*20{?=}28d32\00"
+
+// CHECK-ELF: @".objc_sel_types_v16\010:8{?=}16" = linkonce_odr hidden constant [14 x i8] c"v16@0:8{?=}16\00"
+// CHECK-ELF: @".objc_sel_types_v32\010:8{?=}16*16{?=}24d24" = linkonce_odr hidden constant [26 x i8] c"v32@0:8{?=}16*16{?=}24d24\00"
 
 @interface NSObject @end
 
@@ -31,7 +49,10 @@ @implementation BABugExample
 @synthesize property = _property;
 @end
 
-// CHECK: private unnamed_addr constant [8 x i8] c"@16
+// CHECK-DWARF: private unnamed_addr constant [8 x i8] c"@16
+// CHECK-MINGW: @".objc_sel_types_@16@0:8" = linkonce_odr hidden constant [8 x i8] c"@16@0:8\00"
+// CHECK-MSVC: @".objc_sel_types_@16@0:8" = linkonce_odr hidden constant [8 x i8] c"@16@0:8\00"
+// CHECK-ELF @".objc_sel_types_\0116\010:8" = linkonce_odr hidden constant [8 x i8] c"@16@0:8\00"
 
 @class SCNCamera;
 typedef SCNCamera C3DCamera;
@@ -48,7 +69,10 @@ @implementation SCNCamera
     C3DCameraStorage _storage;
 }
 @end
-// CHECK: private unnamed_addr constant [39 x i8] c"{?=\22presentationInstance\22@\22SCNCamera\22}\00"
+// CHECK-DWARF: private unnamed_addr constant [39 x i8] c"{?=\22presentationInstance\22@\22SCNCamera\22}\00"
+// CHECK-MINGW: @"__objc_ivar_offset_SCNCamera._storage.{?\02@}"
+// CHECK-MSVC: @"__objc_ivar_offset_SCNCamera._storage.{?\02@}"
+// CHECK-ELF: @"__objc_ivar_offset_SCNCamera._storage.{?=\01}"
 
 int i;
 typeof(@encode(typeof(i))) e = @encode(typeof(i));
@@ -56,6 +80,10 @@ @implementation SCNCamera
 {
     return e;
 }
-// CHECK: @e ={{.*}} global [2 x i8] c"i\00", align 1
-// CHECK: define{{.*}} ptr @Test()
-// CHECK: ret ptr @e
+// CHECK-DWARF: @e ={{.*}} global [2 x i8] c"i\00", align 1
+// CHECK-DWARF: define{{.*}} ptr @Test()
+// CHECK-DWARF: ret ptr @e
+
+// CHECK-MSVC: @e = dso_local global [2 x i8] c"i\00", align 1
+// CHECK-MINGW: @e = dso_local global [2 x i8] c"i\00", align 1
+// CHECK-ELF: @e = global [2 x i8] c"i\00", align 1

@llvmbot
Copy link
Member

llvmbot commented Jan 11, 2024

@llvm/pr-subscribers-clang-codegen

Author: Frederik Carlier (qmfrederik)

Changes

Type encodings are part of symbol names in the Objective C ABI. Replace characters which are reseved in symbol names:

  • ELF: avoid including '@' characters in type encodings
  • Windows: avoid including '=' characters in type encodings

Full diff: https://github.com/llvm/llvm-project/pull/77797.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CGObjCGNU.cpp (+18-13)
  • (modified) clang/test/CodeGenObjC/encode-test-6.m (+36-8)
diff --git a/clang/lib/CodeGen/CGObjCGNU.cpp b/clang/lib/CodeGen/CGObjCGNU.cpp
index cd1a0b6a130ff0..6e76765cea3db1 100644
--- a/clang/lib/CodeGen/CGObjCGNU.cpp
+++ b/clang/lib/CodeGen/CGObjCGNU.cpp
@@ -1431,12 +1431,25 @@ class CGObjCGNUstep2 : public CGObjCGNUstep {
                                 const std::string &TypeEncoding) override {
     return GetConstantSelector(Sel, TypeEncoding);
   }
+  std::string GetSymbolNameForTypeEncoding(const std::string &TypeEncoding) {
+    std::string MangledTypes = std::string(TypeEncoding);
+    // @ is used as a special character in ELF symbol names (used for symbol
+    // versioning), so mangle the name to not include it.  Replace it with a
+    // character that is not a valid type encoding character (and, being
+    // non-printable, never will be!)
+    if (CGM.getTriple().isOSBinFormatELF())
+      std::replace(MangledTypes.begin(), MangledTypes.end(),
+        '@', '\1');
+    // = in dll exported names causes lld to fail when linking on Windows.
+    if (CGM.getTriple().isOSWindows())
+      std::replace(MangledTypes.begin(), MangledTypes.end(),
+        '=', '\2');
+    return MangledTypes;
+  }
   llvm::Constant  *GetTypeString(llvm::StringRef TypeEncoding) {
     if (TypeEncoding.empty())
       return NULLPtr;
-    std::string MangledTypes = std::string(TypeEncoding);
-    std::replace(MangledTypes.begin(), MangledTypes.end(),
-      '@', '\1');
+    std::string MangledTypes = GetSymbolNameForTypeEncoding(std::string(TypeEncoding));
     std::string TypesVarName = ".objc_sel_types_" + MangledTypes;
     auto *TypesGlobal = TheModule.getGlobalVariable(TypesVarName);
     if (!TypesGlobal) {
@@ -1453,13 +1466,7 @@ class CGObjCGNUstep2 : public CGObjCGNUstep {
   }
   llvm::Constant *GetConstantSelector(Selector Sel,
                                       const std::string &TypeEncoding) override {
-    // @ is used as a special character in symbol names (used for symbol
-    // versioning), so mangle the name to not include it.  Replace it with a
-    // character that is not a valid type encoding character (and, being
-    // non-printable, never will be!)
-    std::string MangledTypes = TypeEncoding;
-    std::replace(MangledTypes.begin(), MangledTypes.end(),
-      '@', '\1');
+    std::string MangledTypes = GetSymbolNameForTypeEncoding(TypeEncoding);
     auto SelVarName = (StringRef(".objc_selector_") + Sel.getAsString() + "_" +
       MangledTypes).str();
     if (auto *GV = TheModule.getNamedGlobal(SelVarName))
@@ -1671,9 +1678,7 @@ class CGObjCGNUstep2 : public CGObjCGNUstep {
                                         const ObjCIvarDecl *Ivar) override {
     std::string TypeEncoding;
     CGM.getContext().getObjCEncodingForType(Ivar->getType(), TypeEncoding);
-    // Prevent the @ from being interpreted as a symbol version.
-    std::replace(TypeEncoding.begin(), TypeEncoding.end(),
-      '@', '\1');
+    TypeEncoding = GetSymbolNameForTypeEncoding(TypeEncoding);
     const std::string Name = "__objc_ivar_offset_" + ID->getNameAsString()
       + '.' + Ivar->getNameAsString() + '.' + TypeEncoding;
     return Name;
diff --git a/clang/test/CodeGenObjC/encode-test-6.m b/clang/test/CodeGenObjC/encode-test-6.m
index 261eb7fb3368b2..1eb6808c7d12f8 100644
--- a/clang/test/CodeGenObjC/encode-test-6.m
+++ b/clang/test/CodeGenObjC/encode-test-6.m
@@ -1,5 +1,14 @@
 // RUN: %clang_cc1 -triple x86_64-apple-darwin -emit-llvm -o %t %s
-// RUN: FileCheck < %t %s
+// RUN: FileCheck -check-prefix CHECK-DWARF < %t %s
+
+// RUN: %clang_cc1 -triple x86_64-w64-windows-gnu -emit-llvm -fobjc-runtime=gnustep-2.0 -o %t %s
+// RUN: FileCheck -check-prefix CHECK-MINGW < %t %s
+
+// RUN: %clang_cc1 -triple x86_64-windows-msvc -emit-llvm -fobjc-runtime=gnustep-2.0 -o %t %s
+// RUN: FileCheck -check-prefix CHECK-MSVC < %t %s
+
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -fobjc-runtime=gnustep-2.0 -o %t %s
+// RUN: FileCheck -check-prefix CHECK-ELF < %t %s
 
 typedef struct {} Z;
 
@@ -13,8 +22,17 @@ -(void)bar:(Z)a {}
 -(void)foo:(Z)a: (char*)b : (Z)c : (double) d {}
 @end
 
-// CHECK: private unnamed_addr constant [14 x i8] c"v16@0:8{?=}16
-// CHECK: private unnamed_addr constant [26 x i8] c"v32@0:8{?=}16*16{?=}24d24
+// CHECK-DWARF: private unnamed_addr constant [14 x i8] c"v16@0:8{?=}16
+// CHECK-DWARF: private unnamed_addr constant [26 x i8] c"v32@0:8{?=}16*16{?=}24d24
+
+// CHECK-MINGW: @".objc_sel_types_v16@0:8{?\02}16" = linkonce_odr hidden constant [14 x i8] c"v16@0:8{?=}16\00"
+// CHECK-MINGW: @".objc_sel_types_v32@0:8{?\02}16*16{?\02}24d24" = linkonce_odr hidden constant [26 x i8] c"v32@0:8{?=}16*16{?=}24d24\00"
+
+// CHECK-MSVC: @".objc_sel_types_v20@0:8{?\02}16" = linkonce_odr hidden constant [14 x i8] c"v20@0:8{?=}16\00"
+// CHECK-MSVC: @".objc_sel_types_v40@0:8{?\02}16*20{?\02}28d32" = linkonce_odr hidden constant [26 x i8] c"v40@0:8{?=}16*20{?=}28d32\00"
+
+// CHECK-ELF: @".objc_sel_types_v16\010:8{?=}16" = linkonce_odr hidden constant [14 x i8] c"v16@0:8{?=}16\00"
+// CHECK-ELF: @".objc_sel_types_v32\010:8{?=}16*16{?=}24d24" = linkonce_odr hidden constant [26 x i8] c"v32@0:8{?=}16*16{?=}24d24\00"
 
 @interface NSObject @end
 
@@ -31,7 +49,10 @@ @implementation BABugExample
 @synthesize property = _property;
 @end
 
-// CHECK: private unnamed_addr constant [8 x i8] c"@16
+// CHECK-DWARF: private unnamed_addr constant [8 x i8] c"@16
+// CHECK-MINGW: @".objc_sel_types_@16@0:8" = linkonce_odr hidden constant [8 x i8] c"@16@0:8\00"
+// CHECK-MSVC: @".objc_sel_types_@16@0:8" = linkonce_odr hidden constant [8 x i8] c"@16@0:8\00"
+// CHECK-ELF @".objc_sel_types_\0116\010:8" = linkonce_odr hidden constant [8 x i8] c"@16@0:8\00"
 
 @class SCNCamera;
 typedef SCNCamera C3DCamera;
@@ -48,7 +69,10 @@ @implementation SCNCamera
     C3DCameraStorage _storage;
 }
 @end
-// CHECK: private unnamed_addr constant [39 x i8] c"{?=\22presentationInstance\22@\22SCNCamera\22}\00"
+// CHECK-DWARF: private unnamed_addr constant [39 x i8] c"{?=\22presentationInstance\22@\22SCNCamera\22}\00"
+// CHECK-MINGW: @"__objc_ivar_offset_SCNCamera._storage.{?\02@}"
+// CHECK-MSVC: @"__objc_ivar_offset_SCNCamera._storage.{?\02@}"
+// CHECK-ELF: @"__objc_ivar_offset_SCNCamera._storage.{?=\01}"
 
 int i;
 typeof(@encode(typeof(i))) e = @encode(typeof(i));
@@ -56,6 +80,10 @@ @implementation SCNCamera
 {
     return e;
 }
-// CHECK: @e ={{.*}} global [2 x i8] c"i\00", align 1
-// CHECK: define{{.*}} ptr @Test()
-// CHECK: ret ptr @e
+// CHECK-DWARF: @e ={{.*}} global [2 x i8] c"i\00", align 1
+// CHECK-DWARF: define{{.*}} ptr @Test()
+// CHECK-DWARF: ret ptr @e
+
+// CHECK-MSVC: @e = dso_local global [2 x i8] c"i\00", align 1
+// CHECK-MINGW: @e = dso_local global [2 x i8] c"i\00", align 1
+// CHECK-ELF: @e = global [2 x i8] c"i\00", align 1

Copy link

github-actions bot commented Jan 11, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@qmfrederik qmfrederik force-pushed the mingw-type-encoding-structure branch from 5511454 to e71dd48 Compare January 11, 2024 17:07
Comment on lines 2 to 11
// RUN: FileCheck -check-prefix CHECK-DWARF < %t %s

// RUN: %clang_cc1 -triple x86_64-w64-windows-gnu -emit-llvm -fobjc-runtime=gnustep-2.0 -o %t %s
// RUN: FileCheck -check-prefix CHECK-MINGW < %t %s

// RUN: %clang_cc1 -triple x86_64-windows-msvc -emit-llvm -fobjc-runtime=gnustep-2.0 -o %t %s
// RUN: FileCheck -check-prefix CHECK-MSVC < %t %s

// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -fobjc-runtime=gnustep-2.0 -o %t %s
// RUN: FileCheck -check-prefix CHECK-ELF < %t %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these need to use a temporary file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, no need for a temp file here. Fixed that.

Type encodings are part of symbol names in the Objective C ABI.  Replace
characters which are reseved in symbol names:

- ELF: avoid including '@' characters in type encodings
- Windows: avoid including '=' characters in type encodings
@qmfrederik qmfrederik force-pushed the mingw-type-encoding-structure branch from e71dd48 to 34933c7 Compare January 11, 2024 20:02
@davidchisnall davidchisnall merged commit 3168192 into llvm:main Jan 12, 2024
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
Type encodings are part of symbol names in the Objective C ABI. Replace
characters which are reseved in symbol names:

- ELF: avoid including '@' characters in type encodings
- Windows: avoid including '=' characters in type encodings
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants