[MC][AArch64] Segregate constant pool caches by size. #86832

Merged
merged 1 commit into llvm:main on Mar 28, 2024

Conversation

statham-arm

If you write a 32- and a 64-bit LDR instruction that both refer to the same constant or symbol using the `=` syntax:

  ldr w0, =something
  ldr x1, =something

then the first call to `ConstantPool::addEntry` will insert the constant into its cache of existing entries, and the second will find the cached entry and reuse it. The result is a 64-bit load from a 32-bit constant pool slot, which reads nonsense into the upper half of the target register.

In this patch I've done the simplest fix: include the size of the constant pool entry as part of the key used to index the cache. So now 32- and 64-bit constant loads will never share a constant pool entry.
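
To make the keying change concrete, here is a minimal self-contained sketch of the idea, using made-up names (`ToyPool`, `.Ltmp` labels) rather than the actual LLVM classes:

```cpp
// Toy sketch of the keying change; this is not the LLVM code itself.
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <utility>

struct ToyPool {
  // Old scheme (buggy): a map keyed by value alone, so a 4-byte and an
  // 8-byte request for the same value alias one slot.
  // New scheme: key by (value, size) so they never alias.
  std::map<std::pair<int64_t, unsigned>, std::string> Cache;
  unsigned NextSlot = 0;

  std::string addEntry(int64_t Value, unsigned Size) {
    auto Key = std::make_pair(Value, Size);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second; // reuse only an entry of the same size
    std::string Label = ".Ltmp" + std::to_string(NextSlot++);
    Cache[Key] = Label;
    return Label;
  }
};

int main() {
  ToyPool P;
  std::cout << P.addEntry(1234567890, 4) << '\n'; // .Ltmp0 (32-bit slot)
  std::cout << P.addEntry(1234567890, 8) << '\n'; // .Ltmp1 (distinct 64-bit slot)
  std::cout << P.addEntry(1234567890, 4) << '\n'; // .Ltmp0 again (cache hit)
}
```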

There's scope for doing this better, in principle: you could imagine merging the two slots with appropriate overlap, so that the 32-bit load reads the least-significant word (LSW) of the 64-bit value. But that's much more complicated: you have to take endianness into account, and maybe also adjust the size of an existing entry. This is the simplest fix that restores correctness.
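
To illustrate why that's hairy, here is a hedged sketch of the sub-slot offset such a merged scheme would need to compute; nothing like this exists in the patch:

```cpp
// Hypothetical sketch only: the patch deliberately does NOT do this.
// If a 32-bit load were to share a 64-bit slot, the assembler would have
// to point it at the least-significant word, whose position depends on
// endianness.
#include <iostream>

unsigned subSlotOffset(unsigned WideSize, unsigned NarrowSize, bool BigEndian) {
  // Little-endian: low-order bytes sit at the start of the slot (offset 0).
  // Big-endian: low-order bytes sit at the end of the slot.
  return BigEndian ? WideSize - NarrowSize : 0;
}

int main() {
  std::cout << subSlotOffset(8, 4, /*BigEndian=*/false) << '\n'; // 0
  std::cout << subSlotOffset(8, 4, /*BigEndian=*/true) << '\n';  // 4
}
```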


llvmbot commented Mar 27, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Simon Tatham (statham-arm)

Full diff: https://github.com/llvm/llvm-project/pull/86832.diff

3 Files Affected:

  • (modified) llvm/include/llvm/MC/ConstantPools.h (+7-2)
  • (modified) llvm/lib/MC/ConstantPools.cpp (+5-4)
  • (added) llvm/test/MC/AArch64/constant-pool-sizes.s (+25)
diff --git a/llvm/include/llvm/MC/ConstantPools.h b/llvm/include/llvm/MC/ConstantPools.h
index 7eac75362effdc..ff21ccda07a836 100644
--- a/llvm/include/llvm/MC/ConstantPools.h
+++ b/llvm/include/llvm/MC/ConstantPools.h
@@ -43,8 +43,13 @@ struct ConstantPoolEntry {
 class ConstantPool {
   using EntryVecTy = SmallVector<ConstantPoolEntry, 4>;
   EntryVecTy Entries;
-  std::map<int64_t, const MCSymbolRefExpr *> CachedConstantEntries;
-  DenseMap<const MCSymbol *, const MCSymbolRefExpr *> CachedSymbolEntries;
+
+  // Caches of entries that already exist, indexed by their contents
+  // and also the size of the constant.
+  std::map<std::pair<int64_t, unsigned>, const MCSymbolRefExpr *>
+      CachedConstantEntries;
+  DenseMap<std::pair<const MCSymbol *, unsigned>, const MCSymbolRefExpr *>
+      CachedSymbolEntries;
 
 public:
   // Initialize a new empty constant pool
diff --git a/llvm/lib/MC/ConstantPools.cpp b/llvm/lib/MC/ConstantPools.cpp
index f895cc6413d74f..824d2463f30fc5 100644
--- a/llvm/lib/MC/ConstantPools.cpp
+++ b/llvm/lib/MC/ConstantPools.cpp
@@ -43,14 +43,15 @@ const MCExpr *ConstantPool::addEntry(const MCExpr *Value, MCContext &Context,
 
   // Check if there is existing entry for the same constant. If so, reuse it.
   if (C) {
-    auto CItr = CachedConstantEntries.find(C->getValue());
+    auto CItr = CachedConstantEntries.find(std::make_pair(C->getValue(), Size));
     if (CItr != CachedConstantEntries.end())
       return CItr->second;
   }
 
   // Check if there is existing entry for the same symbol. If so, reuse it.
   if (S) {
-    auto SItr = CachedSymbolEntries.find(&(S->getSymbol()));
+    auto SItr =
+        CachedSymbolEntries.find(std::make_pair(&(S->getSymbol()), Size));
     if (SItr != CachedSymbolEntries.end())
       return SItr->second;
   }
@@ -60,9 +61,9 @@ const MCExpr *ConstantPool::addEntry(const MCExpr *Value, MCContext &Context,
   Entries.push_back(ConstantPoolEntry(CPEntryLabel, Value, Size, Loc));
   const auto SymRef = MCSymbolRefExpr::create(CPEntryLabel, Context);
   if (C)
-    CachedConstantEntries[C->getValue()] = SymRef;
+    CachedConstantEntries[std::make_pair(C->getValue(), Size)] = SymRef;
   if (S)
-    CachedSymbolEntries[&(S->getSymbol())] = SymRef;
+    CachedSymbolEntries[std::make_pair(&(S->getSymbol()), Size)] = SymRef;
   return SymRef;
 }
 
diff --git a/llvm/test/MC/AArch64/constant-pool-sizes.s b/llvm/test/MC/AArch64/constant-pool-sizes.s
new file mode 100644
index 00000000000000..279402af025f34
--- /dev/null
+++ b/llvm/test/MC/AArch64/constant-pool-sizes.s
@@ -0,0 +1,25 @@
+// RUN: llvm-mc -triple aarch64-none-linux-gnu %s | FileCheck %s
+
+  ldr w0, =symbol
+  ldr x1, =symbol
+
+  ldr w2, =1234567890
+  ldr x3, =1234567890
+
+// CHECK:             ldr     w0, .Ltmp0
+// CHECK:             ldr     x1, .Ltmp1
+// CHECK:             ldr     w2, .Ltmp2
+// CHECK:             ldr     x3, .Ltmp3
+
+// CHECK:             .p2align        2, 0x0
+// CHECK-NEXT:.Ltmp0:
+// CHECK-NEXT:        .word   symbol
+// CHECK:             .p2align        3, 0x0
+// CHECK-NEXT:.Ltmp1:
+// CHECK-NEXT:        .xword  symbol
+// CHECK:             .p2align        2, 0x0
+// CHECK-NEXT:.Ltmp2:
+// CHECK-NEXT:        .word   1234567890
+// CHECK:             .p2align        3, 0x0
+// CHECK-NEXT:.Ltmp3:
+// CHECK-NEXT:        .xword  1234567890


llvmbot commented Mar 27, 2024

@llvm/pr-subscribers-mc

@efriedma-quic left a comment

LGTM

If we want to optimize on AArch64 specifically, we could teach the assembler to rewrite `ldr x0, =smallconstant` to `ldr w0, =smallconstant`. But it's probably not worth the effort.
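
As a hedged sketch of the legality check such a rewrite would need (the helper below is hypothetical, not anything in LLVM):

```cpp
// Hypothetical helper, not part of this patch or any existing rewrite:
// on AArch64, a write to a W register zero-extends into the full X
// register, so `ldr x0, =C` could only be narrowed to `ldr w0, =C`
// when C survives zero-extension from 32 bits.
#include <cstdint>

constexpr bool canNarrowTo32BitLoad(uint64_t C) {
  return C <= UINT32_MAX; // upper 32 bits must end up zero
}

static_assert(canNarrowTo32BitLoad(1234567890), "fits in 32 bits");
static_assert(!canNarrowTo32BitLoad(0x100000000ULL), "needs a 64-bit slot");
```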


AtariDreams commented Mar 27, 2024

> LGTM
>
> If we want to optimize on AArch64 specifically, we could teach the assembler to rewrite `ldr x0, =smallconstant` to `ldr w0, =smallconstant`. But it's probably not worth the effort.

NASM does this for x86, and I think something similar exists for AArch64 too.

(NASM is also nice in that it optimizes `mov rdx, 7` to `mov edx, 7` by default, among other similar rewrites.)

@rengolin left a comment

You do end up with a little bloat in your constant pool, but that's better than reading garbage. :)

I'd treat the fusion of these entries as a future optimization, since endianness is not a trivial thing to handle, and this fixes an actual correctness bug.

@statham-arm statham-arm merged commit 88b10f3 into llvm:main Mar 28, 2024
7 checks passed
@AtariDreams

/cherry-pick 88b10f3


llvmbot commented Mar 30, 2024

> /cherry-pick 88b10f3

Error: Command failed due to missing milestone.

Labels: backend:AArch64, mc