Skip to content

Conversation

@sedymrak
Copy link
Contributor

@sedymrak sedymrak commented Nov 4, 2025

In this PR we are proposing to change LLDB codebase so that LLDB is able to print values of integer registers that have more than 64-bits (even if the number of bits is not equal to 128).

@llvmbot llvmbot added the lldb label Nov 4, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 4, 2025

@llvm/pr-subscribers-lldb

Author: Matej Košík (sedymrak)

Changes

These changes generalize LLDB codebase so that it is able to print integer registers that do not have the conventional size (neither 8-bits, nor 16-bits nor 32-bits nor 64-bits).


Full diff: https://github.com/llvm/llvm-project/pull/166363.diff

3 Files Affected:

  • (modified) lldb/include/lldb/Utility/RegisterValue.h (+4-4)
  • (modified) lldb/source/Utility/RegisterValue.cpp (+25-13)
  • (modified) lldb/unittests/Utility/RegisterValueTest.cpp (+56-13)
diff --git a/lldb/include/lldb/Utility/RegisterValue.h b/lldb/include/lldb/Utility/RegisterValue.h
index 49aaf68be17fc..0226fe36a7990 100644
--- a/lldb/include/lldb/Utility/RegisterValue.h
+++ b/lldb/include/lldb/Utility/RegisterValue.h
@@ -46,7 +46,7 @@ class RegisterValue {
     eTypeUInt16,
     eTypeUInt32,
     eTypeUInt64,
-    eTypeUInt128,
+    eTypeUIntBig,  // an integer that has more than 64 bits
     eTypeFloat,
     eTypeDouble,
     eTypeLongDouble,
@@ -69,7 +69,7 @@ class RegisterValue {
     m_scalar = inst;
   }
 
-  explicit RegisterValue(llvm::APInt inst) : m_type(eTypeUInt128) {
+  explicit RegisterValue(llvm::APInt inst) : m_type(eTypeUIntBig) {
     m_scalar = llvm::APInt(std::move(inst));
   }
 
@@ -178,7 +178,7 @@ class RegisterValue {
   }
 
   void operator=(llvm::APInt uint) {
-    m_type = eTypeUInt128;
+    m_type = eTypeUIntBig;
     m_scalar = llvm::APInt(std::move(uint));
   }
 
@@ -218,7 +218,7 @@ class RegisterValue {
   }
 
   void SetUInt128(llvm::APInt uint) {
-    m_type = eTypeUInt128;
+    m_type = eTypeUIntBig;
     m_scalar = std::move(uint);
   }
 
diff --git a/lldb/source/Utility/RegisterValue.cpp b/lldb/source/Utility/RegisterValue.cpp
index 12c349a143c0f..9501951c094e6 100644
--- a/lldb/source/Utility/RegisterValue.cpp
+++ b/lldb/source/Utility/RegisterValue.cpp
@@ -127,7 +127,7 @@ bool RegisterValue::GetScalarValue(Scalar &scalar) const {
   case eTypeUInt16:
   case eTypeUInt32:
   case eTypeUInt64:
-  case eTypeUInt128:
+  case eTypeUIntBig:
   case eTypeFloat:
   case eTypeDouble:
   case eTypeLongDouble:
@@ -196,7 +196,7 @@ Status RegisterValue::SetValueFromData(const RegisterInfo &reg_info,
       SetUInt32(src.GetMaxU32(&src_offset, src_len));
     else if (reg_info.byte_size <= 8)
       SetUInt64(src.GetMaxU64(&src_offset, src_len));
-    else if (reg_info.byte_size <= 16) {
+    else if (reg_info.byte_size == 16) {
       uint64_t data1 = src.GetU64(&src_offset);
       uint64_t data2 = src.GetU64(&src_offset);
       if (src.GetByteOrder() == eByteOrderLittle) {
@@ -207,6 +207,18 @@ Status RegisterValue::SetValueFromData(const RegisterInfo &reg_info,
         int128.x[1] = data1;
       }
       SetUInt128(llvm::APInt(128, 2, int128.x));
+    } else {
+      std::vector<uint8_t> bytes(src_len, 0);
+      for (size_t i = 0; i < src_len; i++)
+        bytes[i] = src.GetU8(&src_offset);
+      if (src.GetByteOrder() == eByteOrderBig)
+        std::reverse(bytes.begin(), bytes.end());
+      // The number of 64-bit wide words that are stored in "src".
+      size_t size64 = (reg_info.byte_size - 1) / 8 + 1;
+      bytes.resize(size64 * sizeof(uint64_t), 0);
+      llvm::APInt uint = llvm::APInt::getZero(src_len * 8);
+      llvm::LoadIntFromMemory(uint, bytes.data(), src_len);
+      SetUInt128(uint);
     }
     break;
   case eEncodingIEEE754:
@@ -442,7 +454,7 @@ bool RegisterValue::SignExtend(uint32_t sign_bitpos) {
   case eTypeUInt16:
   case eTypeUInt32:
   case eTypeUInt64:
-  case eTypeUInt128:
+  case eTypeUIntBig:
     return m_scalar.SignExtend(sign_bitpos);
   case eTypeFloat:
   case eTypeDouble:
@@ -465,7 +477,7 @@ bool RegisterValue::CopyValue(const RegisterValue &rhs) {
   case eTypeUInt16:
   case eTypeUInt32:
   case eTypeUInt64:
-  case eTypeUInt128:
+  case eTypeUIntBig:
   case eTypeFloat:
   case eTypeDouble:
   case eTypeLongDouble:
@@ -581,7 +593,7 @@ llvm::APInt RegisterValue::GetAsUInt128(const llvm::APInt &fail_value,
   case eTypeUInt16:
   case eTypeUInt32:
   case eTypeUInt64:
-  case eTypeUInt128:
+  case eTypeUIntBig:
   case eTypeFloat:
   case eTypeDouble:
   case eTypeLongDouble:
@@ -614,7 +626,7 @@ float RegisterValue::GetAsFloat(float fail_value, bool *success_ptr) const {
     break;
   case eTypeUInt32:
   case eTypeUInt64:
-  case eTypeUInt128:
+  case eTypeUIntBig:
   case eTypeFloat:
   case eTypeDouble:
   case eTypeLongDouble:
@@ -634,7 +646,7 @@ double RegisterValue::GetAsDouble(double fail_value, bool *success_ptr) const {
 
   case eTypeUInt32:
   case eTypeUInt64:
-  case eTypeUInt128:
+  case eTypeUIntBig:
   case eTypeFloat:
   case eTypeDouble:
   case eTypeLongDouble:
@@ -655,7 +667,7 @@ long double RegisterValue::GetAsLongDouble(long double fail_value,
 
   case eTypeUInt32:
   case eTypeUInt64:
-  case eTypeUInt128:
+  case eTypeUIntBig:
   case eTypeFloat:
   case eTypeDouble:
   case eTypeLongDouble:
@@ -674,7 +686,7 @@ const void *RegisterValue::GetBytes() const {
   case eTypeUInt16:
   case eTypeUInt32:
   case eTypeUInt64:
-  case eTypeUInt128:
+  case eTypeUIntBig:
   case eTypeFloat:
   case eTypeDouble:
   case eTypeLongDouble:
@@ -696,7 +708,7 @@ uint32_t RegisterValue::GetByteSize() const {
     return 2;
   case eTypeUInt32:
   case eTypeUInt64:
-  case eTypeUInt128:
+  case eTypeUIntBig:
   case eTypeFloat:
   case eTypeDouble:
   case eTypeLongDouble:
@@ -747,7 +759,7 @@ bool RegisterValue::operator==(const RegisterValue &rhs) const {
     case eTypeUInt16:
     case eTypeUInt32:
     case eTypeUInt64:
-    case eTypeUInt128:
+    case eTypeUIntBig:
     case eTypeFloat:
     case eTypeDouble:
     case eTypeLongDouble:
@@ -772,7 +784,7 @@ bool RegisterValue::ClearBit(uint32_t bit) {
   case eTypeUInt16:
   case eTypeUInt32:
   case eTypeUInt64:
-  case eTypeUInt128:
+  case eTypeUIntBig:
     if (bit < (GetByteSize() * 8)) {
       return m_scalar.ClearBit(bit);
     }
@@ -812,7 +824,7 @@ bool RegisterValue::SetBit(uint32_t bit) {
   case eTypeUInt16:
   case eTypeUInt32:
   case eTypeUInt64:
-  case eTypeUInt128:
+  case eTypeUIntBig:
     if (bit < (GetByteSize() * 8)) {
       return m_scalar.SetBit(bit);
     }
diff --git a/lldb/unittests/Utility/RegisterValueTest.cpp b/lldb/unittests/Utility/RegisterValueTest.cpp
index 6239dbe21634a..48a920b11e4e0 100644
--- a/lldb/unittests/Utility/RegisterValueTest.cpp
+++ b/lldb/unittests/Utility/RegisterValueTest.cpp
@@ -57,13 +57,11 @@ TEST(RegisterValueTest, GetScalarValue) {
                    APInt(128, 0x7766554433221100)));
 }
 
-static const Scalar etalon128(APInt(128, 0xffeeddccbbaa9988ull) << 64 |
-                              APInt(128, 0x7766554433221100ull));
-
-void TestSetValueFromData128(void *src, const lldb::ByteOrder endianness) {
+void TestSetValueFromData(const Scalar &etalon, void *src, size_t src_byte_size,
+                          const lldb::ByteOrder endianness) {
   RegisterInfo ri{"uint128_register",
                   nullptr,
-                  16,
+                  static_cast<uint32_t>(src_byte_size),
                   0,
                   lldb::Encoding::eEncodingUint,
                   lldb::Format::eFormatDefault,
@@ -71,26 +69,71 @@ void TestSetValueFromData128(void *src, const lldb::ByteOrder endianness) {
                   nullptr,
                   nullptr,
                   nullptr};
-  DataExtractor src_extractor(src, 16, endianness, 8);
+  DataExtractor src_extractor(src, src_byte_size, endianness, 8);
   RegisterValue rv;
   EXPECT_TRUE(rv.SetValueFromData(ri, src_extractor, 0, false).Success());
   Scalar s;
   EXPECT_TRUE(rv.GetScalarValue(s));
-  EXPECT_EQ(s, etalon128);
+  EXPECT_EQ(s, etalon);
+}
+
+static const Scalar etalon72(APInt(72, 0x0000000000000008ull) << 1 * 64 |
+                             APInt(72, 0x0706050403020100ull) << 0 * 64);
+
+// Test that the "RegisterValue::SetValueFromData" method works correctly
+// with 72-bit little-endian data that represents an integer.
+TEST(RegisterValueTest, SetValueFromData_72_le) {
+  uint8_t src[] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08};
+  TestSetValueFromData(etalon72, src, 9, lldb::ByteOrder::eByteOrderLittle);
+}
+
+// Test that the "RegisterValue::SetValueFromData" method works correctly
+// with 72-bit big-endian data that represents an integer.
+TEST(RegisterValueTest, SetValueFromData_72_be) {
+  uint8_t src[] = {0x08, 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00};
+  TestSetValueFromData(etalon72, src, 9, lldb::ByteOrder::eByteOrderBig);
 }
 
+static const Scalar etalon128(APInt(128, 0x0f0e0d0c0b0a0908ull) << 1 * 64 |
+                              APInt(128, 0x0706050403020100ull) << 0 * 64);
+
 // Test that the "RegisterValue::SetValueFromData" method works correctly
 // with 128-bit little-endian data that represents an integer.
 TEST(RegisterValueTest, SetValueFromData_128_le) {
-  uint8_t src[] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
-                   0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff};
-  TestSetValueFromData128(src, lldb::ByteOrder::eByteOrderLittle);
+  uint8_t src[] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+                   0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
+  TestSetValueFromData(etalon128, src, 16, lldb::ByteOrder::eByteOrderLittle);
 }
 
 // Test that the "RegisterValue::SetValueFromData" method works correctly
 // with 128-bit big-endian data that represents an integer.
 TEST(RegisterValueTest, SetValueFromData_128_be) {
-  uint8_t src[] = {0xff, 0xee, 0xdd, 0xcc, 0xbb, 0xaa, 0x99, 0x88,
-                   0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11, 0x00};
-  TestSetValueFromData128(src, lldb::ByteOrder::eByteOrderBig);
+  uint8_t src[] = {0x0f, 0x0e, 0x0d, 0x0c, 0x0b, 0x0a, 0x09, 0x08,
+                   0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00};
+  TestSetValueFromData(etalon128, src, 16, lldb::ByteOrder::eByteOrderBig);
+}
+
+static const Scalar etalon256(APInt(256, 0x1f1e1d1c1b1a1918ull) << 3 * 64 |
+                              APInt(256, 0x1716151413121110ull) << 2 * 64 |
+                              APInt(256, 0x0f0e0d0c0b0a0908ull) << 1 * 64 |
+                              APInt(256, 0x0706050403020100ull) << 0 * 64);
+
+// Test that the "RegisterValue::SetValueFromData" method works correctly
+// with 256-bit little-endian data that represents an integer.
+TEST(RegisterValueTest, SetValueFromData_256_le) {
+  uint8_t src[] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+                   0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+                   0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+                   0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f};
+  TestSetValueFromData(etalon256, src, 32, lldb::ByteOrder::eByteOrderLittle);
+}
+
+// Test that the "RegisterValue::SetValueFromData" method works correctly
+// with 256-bit big-endian data that represents an integer.
+TEST(RegisterValueTest, SetValueFromData_256_be) {
+  uint8_t src[] = {0x1f, 0x1e, 0x1d, 0x1c, 0x1b, 0x1a, 0x19, 0x18,
+                   0x17, 0x16, 0x15, 0x14, 0x13, 0x12, 0x11, 0x10,
+                   0x0f, 0x0e, 0x0d, 0x0c, 0x0b, 0x0a, 0x09, 0x08,
+                   0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00};
+  TestSetValueFromData(etalon256, src, 32, lldb::ByteOrder::eByteOrderBig);
 }

@sedymrak sedymrak changed the title 2025 11 01 fix register value set value from data method generalize LLDB codebase so that it is able to print integer register of non-standard sizes Nov 4, 2025
@github-actions
Copy link

github-actions bot commented Nov 4, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLDB PRs should start with the "[lldb]" tag. Also, "generalize LLDB codebase" seems a bit excessive for this change.Something like "[lldb] Support arbitrary precision integer registers" would be more precise and to the point.

eTypeUInt32,
eTypeUInt64,
eTypeUInt128,
eTypeUIntBig, // an integer that has more than 64 bits
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Big" sounds kind of vague. Maybe eTypeUIntAP for "arbitrary precision" (and match LLVM) would be more descriptive?

Comment on lines 220 to 223
void SetUInt128(llvm::APInt uint) {
m_type = eTypeUInt128;
m_type = eTypeUIntBig;
m_scalar = std::move(uint);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rename this to just SetUInt or if we go with my suggestion SetUIntAP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. That makes sense. I've pushed a commit that changes the name of this enum value, as you suggest, to eTypeUInt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried that eTypeUInt sounds slightly too generic. Why would I pick eTypeUInt64 if I can just pick eTypeUInt? I was hoping that the AP suffix would help clarify what it's supposed to be used for, similar to what "Big" did in your original patch. WDYT?

Copy link
Contributor Author

@sedymrak sedymrak Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure which of the two possibilties:

  • eTypeUInt
  • eTypeUIntAP

is better.

However, before answering the above question, we might consider clarifying another problem:

"Do we need RegisterValue::m_type (and RegisterValue::Type) at all?"

If we removed RegisterValue::m_type, wouldn't we still be able to recover all the relevant information from the primary source of information? (I guess RegisterValue::m_scalar)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For someone with no LLVM background, eTypeUIntN would be ok too. Ultimately either one needs a comment where it's defined to explain that one should use the specific sizes if you have exactly that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have realized (after some more testing) that what I have written above makes no sense. Please ignore my comment above.

According to my investigation, the enum value we are talking about is used only when we need to represent registers whose size is strictly larger than 64 bits.

I am not sure what would be the best name that would the most succintly indicate precisely this.

These alternatives come tome my mind:

  • eTypeUInt65Plus
  • eTypeUInt65OrMore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For someone with no LLVM background, eTypeUIntN would be ok too. Ultimately either one needs a comment where it's defined to explain that one should use the specific sizes if you have exactly that.

Consistently with your suggestion, I have pushed a commit that renames this enum value to eTypeIntN.
The new comment reflects the purpose of this enum.

@sedymrak sedymrak changed the title generalize LLDB codebase so that it is able to print integer register of non-standard sizes [lldb] Support arbitrary precision integer registers Nov 4, 2025
@sedymrak
Copy link
Contributor Author

sedymrak commented Nov 4, 2025

LLDB PRs should start with the "[lldb]" tag. Also, "generalize LLDB codebase" seems a bit excessive for this change.Something like "[lldb] Support arbitrary precision integer registers" would be more precise and to the point.

Thank you. I wasn't quite sure what title should I use. I've updated the current PR title according to your suggestion. I'll try to keep in mind the [lldb] convention.

Comment on lines 220 to 223
void SetUInt128(llvm::APInt uint) {
m_type = eTypeUInt128;
m_type = eTypeUIntBig;
m_scalar = std::move(uint);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried that eTypeUInt sounds slightly too generic. Why would I pick eTypeUInt64 if I can just pick eTypeUInt? I was hoping that the AP suffix would help clarify what it's supposed to be used for, similar to what "Big" did in your original patch. WDYT?

Copy link
Collaborator

@DavidSpickett DavidSpickett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me whether this supports arbitrary sizes where the register is for example, 6 bits (I saw a DSP once with these). "arbitrary" would mean yes, this is supported but please confirm.

If you want to limit this to byte multiples and only > 64-bit total then fine as long as that's clear in comments and testing.

Copy link
Collaborator

@DavidSpickett DavidSpickett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to do BE/LE for all tests.

More bikeshedding to do on the name, and there are some methods in lldb/include/lldb/Utility/RegisterValue.h that deal with UInt128.

Certainly the GetValueAsUInt128 method should fail if the length is > 128, as there are existing callers of that. Same for getting as 32-bit if its 33 and so on.

For setting, hard to say without a use case but I presume you have one. Have you been using this for targets and had to set registers of these sizes? Could be that internally they are constructed mostly from memory buffers so this isn't an issue.

@sedymrak sedymrak force-pushed the 2025-11-01-fix-RegisterValue-SetValueFromData-method branch from e42fc84 to 2a1bb4c Compare November 6, 2025 12:59
@sedymrak
Copy link
Contributor Author

sedymrak commented Nov 6, 2025

It's not clear to me whether this supports arbitrary sizes where the register is for example, 6 bits (I saw a DSP once with these). "arbitrary" would mean yes, this is supported but please confirm.

If you want to limit this to byte multiples and only > 64-bit total then fine as long as that's clear in comments and testing.

RegisterValue is able to store the value of a register that has any number of bits. It always uses at least as many bits as necessary. However, when RegisterValue object is initialized with RegisterValue::SetValueFromData method, the RegisterValue object may used more bits than it is necessary. It has no other option because the information concerning the exact number of bits is not provided. It (i.e. the RegisterValue object) is told only the number of bytes.

So, for example, LLDB converts:

  • 6-bit values to 8-bit values
  • 33-bit values to 64-bit values
  • 65-bit values to 72-bit values.

This is demonstrated in the tests.

So eTypeUIntN registers do not break the convention estabilished by:

  • eTypeUInt8
  • eTypeUInt16
  • eTypeUInt32
  • eTypeUInt64

Hopefully, I haven't misinterpreted your question.

@DavidSpickett
Copy link
Collaborator

No you didn't, I hadn't made the connection that this is all set from multiples of 8 bits anyway. So this change is as arbitrary as we can get without adding a specific N bit setter for this. Which is fine, no need to add something no one will use.

@sedymrak sedymrak changed the title [lldb] Support arbitrary precision integer registers [lldb] Support integer registers with more than 64 bits. Nov 7, 2025
@DavidSpickett
Copy link
Collaborator

DavidSpickett commented Nov 10, 2025

All my comments have been addressed. @JDevlieghere want to take a final look and tell us if you are ok with the new UIntN naming?

@JDevlieghere
Copy link
Member

All my comments have been addressed. @JDevlieghere want to take a final look and tell us if you are ok with the new UIntN naming?

I (still) think for anyone familiar with LLVM, "AP" would be more descriptive, but I can see the case for "N" as that's what we use in math to represent a natural number. Given that both of you prefer this naming I'm happy to go with the latter.

@JDevlieghere JDevlieghere merged commit 1fb8e3d into llvm:main Nov 18, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants