[lldb][NFCI] Refactor AppleObjCClassDescriptorV2 method_t parsing functions #163291

felipepiovezan · 2025-10-13T23:50:07Z

This rewrites ClassDescriptorV2::method_t::Read (and the loop calling that function) in an NFCI way to perform a couple of things:

Cleanup code with indirect style. For example, the old loop would have default-constructor a unique_ptr<method_t>, which was reused on every iteration of the loop. It called method_t::Read on each iteration, and never checked the return value prior to invoking the callback. In other words, if Read failed, the callback was called on random values.
Exposed memory reads that could benefit from the MultiMemoryRead packet proposed in 1.

…ctions This rewrites ClassDescriptorV2::method_t::Read (and the loop calling that function) in an NFCI way to perform a couple of things: 1. Cleanup code with indirect style. For example, the old loop would have default-constructor a `unique_ptr<method_t>`, which was *reused* on every iteration of the loop. It called `method_t::Read` on each iteration, and never checked the return value prior to invoking the callback. In other words, if `Read` failed, the callback was called on random values. 2. Exposed memory reads that could benefit from the MultiMemoryRead packet proposed in [1]. [1]: https://discourse.llvm.org/t/rfc-a-new-vectorized-memory-read-packet/88441

llvmbot · 2025-10-13T23:50:43Z

@llvm/pr-subscribers-lldb

Author: Felipe de Azevedo Piovezan (felipepiovezan)

Changes

This rewrites ClassDescriptorV2::method_t::Read (and the loop calling that function) in an NFCI way to perform a couple of things:

Cleanup code with indirect style. For example, the old loop would have default-constructor a unique_ptr<method_t>, which was reused on every iteration of the loop. It called method_t::Read on each iteration, and never checked the return value prior to invoking the callback. In other words, if Read failed, the callback was called on random values.
Exposed memory reads that could benefit from the MultiMemoryRead packet proposed in 1.

Full diff: https://github.com/llvm/llvm-project/pull/163291.diff

2 Files Affected:

(modified) lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.cpp (+56-28)
(modified) lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.h (+6-1)

diff --git a/lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.cpp b/lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.cpp
index cc0c9e728964e..6d8f41aef1ffc 100644
--- a/lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.cpp
+++ b/lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.cpp
@@ -14,6 +14,7 @@
 #include "lldb/Utility/LLDBLog.h"
 #include "lldb/Utility/Log.h"
 #include "lldb/lldb-enumerations.h"
+#include "llvm/ADT/Sequence.h"
 
 using namespace lldb;
 using namespace lldb_private;
@@ -266,22 +267,47 @@ bool ClassDescriptorV2::method_list_t::Read(Process *process,
   return true;
 }
 
-bool ClassDescriptorV2::method_t::Read(Process *process, lldb::addr_t addr,
-                                       lldb::addr_t relative_selector_base_addr,
-                                       bool is_small, bool has_direct_sel) {
-  size_t ptr_size = process->GetAddressByteSize();
-  size_t size = GetSize(process, is_small);
+llvm::SmallVector<ClassDescriptorV2::method_t, 0>
+ClassDescriptorV2::ReadMethods(llvm::ArrayRef<lldb::addr_t> addresses,
+                               lldb::addr_t relative_selector_base_addr,
+                               bool is_small, bool has_direct_sel) const {
+  lldb_private::Process *process = m_runtime.GetProcess();
+  if (!process)
+    return {};
 
-  DataBufferHeap buffer(size, '\0');
-  Status error;
+  const size_t size = method_t::GetSize(process, is_small);
+  const size_t num_methods = addresses.size();
 
-  process->ReadMemory(addr, buffer.GetBytes(), size, error);
-  if (error.Fail()) {
-    return false;
+  llvm::SmallVector<uint8_t, 0> buffer(num_methods * size, 0);
+  llvm::DenseSet<uint32_t> failed_indices;
+
+  for (auto [idx, addr] : llvm::enumerate(addresses)) {
+    Status error;
+    process->ReadMemory(addr, buffer.data() + idx * size, size, error);
+    if (error.Fail())
+      failed_indices.insert(idx);
   }
 
-  DataExtractor extractor(buffer.GetBytes(), size, process->GetByteOrder(),
-                          ptr_size);
+  llvm::SmallVector<method_t, 0> methods;
+  methods.reserve(num_methods);
+  for (auto [idx, addr] : llvm::enumerate(addresses)) {
+    if (failed_indices.contains(idx))
+      continue;
+    DataExtractor extractor(buffer.data() + idx * size, size,
+                            process->GetByteOrder(),
+                            process->GetAddressByteSize());
+    methods.push_back(method_t());
+    methods.back().Read(extractor, process, addr, relative_selector_base_addr,
+                        is_small, has_direct_sel);
+  }
+
+  return methods;
+}
+
+bool ClassDescriptorV2::method_t::Read(DataExtractor &extractor,
+                                       Process *process, lldb::addr_t addr,
+                                       lldb::addr_t relative_selector_base_addr,
+                                       bool is_small, bool has_direct_sel) {
   lldb::offset_t cursor = 0;
 
   if (is_small) {
@@ -291,11 +317,11 @@ bool ClassDescriptorV2::method_t::Read(Process *process, lldb::addr_t addr,
 
     m_name_ptr = addr + nameref_offset;
 
+    Status error;
     if (!has_direct_sel) {
       // The SEL offset points to a SELRef. We need to dereference twice.
-      m_name_ptr = process->ReadUnsignedIntegerFromMemory(m_name_ptr, ptr_size,
-                                                          0, error);
-      if (!error.Success())
+      m_name_ptr = process->ReadPointerFromMemory(m_name_ptr, error);
+      if (error.Fail())
         return false;
     } else if (relative_selector_base_addr != LLDB_INVALID_ADDRESS) {
       m_name_ptr = relative_selector_base_addr + nameref_offset;
@@ -308,13 +334,13 @@ bool ClassDescriptorV2::method_t::Read(Process *process, lldb::addr_t addr,
     m_imp_ptr = extractor.GetAddress_unchecked(&cursor);
   }
 
+  Status error;
   process->ReadCStringFromMemory(m_name_ptr, m_name, error);
-  if (error.Fail()) {
+  if (error.Fail())
     return false;
-  }
 
   process->ReadCStringFromMemory(m_types_ptr, m_types, error);
-  return !error.Fail();
+  return error.Success();
 }
 
 bool ClassDescriptorV2::ivar_list_t::Read(Process *process, lldb::addr_t addr) {
@@ -447,17 +473,19 @@ ClassDescriptorV2::GetMethodList(Process *process,
 bool ClassDescriptorV2::ProcessMethodList(
     std::function<bool(const char *, const char *)> const &instance_method_func,
     ClassDescriptorV2::method_list_t &method_list) const {
-  lldb_private::Process *process = m_runtime.GetProcess();
-  auto method = std::make_unique<method_t>();
-  lldb::addr_t relative_selector_base_addr =
-      m_runtime.GetRelativeSelectorBaseAddr();
-  for (uint32_t i = 0, e = method_list.m_count; i < e; ++i) {
-    method->Read(process, method_list.m_first_ptr + (i * method_list.m_entsize),
-                 relative_selector_base_addr, method_list.m_is_small,
-                 method_list.m_has_direct_selector);
-    if (instance_method_func(method->m_name.c_str(), method->m_types.c_str()))
+  auto idx_to_method_addr = [&](uint32_t idx) {
+    return method_list.m_first_ptr + (idx * method_list.m_entsize);
+  };
+  llvm::SmallVector<addr_t> addresses = llvm::to_vector(llvm::map_range(
+      llvm::seq<uint32_t>(method_list.m_count), idx_to_method_addr));
+
+  llvm::SmallVector<method_t, 0> methods =
+      ReadMethods(addresses, m_runtime.GetRelativeSelectorBaseAddr(),
+                  method_list.m_is_small, method_list.m_has_direct_selector);
+
+  for (const auto &method : methods)
+    if (instance_method_func(method.m_name.c_str(), method.m_types.c_str()))
       break;
-  }
   return true;
 }
 
diff --git a/lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.h b/lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.h
index 920a5eba20abd..78b33113b59da 100644
--- a/lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.h
+++ b/lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.h
@@ -172,11 +172,16 @@ class ClassDescriptorV2 : public ObjCLanguageRuntime::ClassDescriptor {
              + field_size; // IMP imp;
     }
 
-    bool Read(Process *process, lldb::addr_t addr,
+    bool Read(DataExtractor &extractor, Process *process, lldb::addr_t addr,
               lldb::addr_t relative_selector_base_addr, bool is_small,
               bool has_direct_sel);
   };
 
+  llvm::SmallVector<method_t, 0>
+  ReadMethods(llvm::ArrayRef<lldb::addr_t> addresses,
+              lldb::addr_t relative_selector_base_addr, bool is_small,
+              bool has_direct_sel) const;
+
   struct ivar_list_t {
     uint32_t m_entsize;
     uint32_t m_count;

Michael137 · 2025-10-14T10:32:00Z

lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.cpp

+  for (auto [idx, addr] : llvm::enumerate(addresses)) {
+    Status error;
+    process->ReadMemory(addr, buffer.data() + idx * size, size, error);
+    if (error.Fail())
+      failed_indices.insert(idx);
  }


Presumably this is where the multi-read packet will come in?

That's why we're doing two passes over the addresses?

Yup, that's correct!

Michael137

SGTM
Collecting all the addresses first and then doing the reads in bulk makes sense

JDevlieghere · 2025-10-14T17:06:48Z

lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.cpp

-                                       bool is_small, bool has_direct_sel) {
-  size_t ptr_size = process->GetAddressByteSize();
-  size_t size = GetSize(process, is_small);
+llvm::SmallVector<ClassDescriptorV2::method_t, 0>


[begin bikeshedding]
Is SmallVector worth it when returning by value? If not, should we take an out parameter and have a sane default for the stack allocated number of elements? Either way, I think it would be helpful to make this a typedef so you don't have to think about the number of elements in the template.
[end bikeshedding]

Is SmallVector worth it when returning by value?

A SmallVector<T, 0> is supposed to be the exact same size as an std::vector, but with a more llvm-friendly API.

Because the small size is set to zero here, the programmer is expressing something: this is expected to always contain a lot of elements.

? If not, should we take an out parameter and have a sane default for the stack allocated number of elements?

Doing this is doing what the compiler already does for us based on most ABIs: https://godbolt.org/z/9Yacveo4o

Either way, I think it would be helpful to make this a typedef so you don't have to think about the number of elements in the template.

I'm generally opposed to this kind of typedef, it adds overhead when reading code, as readers always have to go lookup what the typedef this, to then just find out "oh, this is just a vector, why is this a typedef?!".

I understand it can be annoying to have to spell out the SmallSize, but it really is a small price and there is an expressed intent with this (as mentioned above) (also worth noting that a codebase more friendly to auto would not have that issue 👀 )

Doing this is doing what the compiler already does for us based on most ABIs: https://godbolt.org/z/9Yacveo4o

I wasn't talking about RVO, I meant taking a SmallVectorImpl and letting the caller decide what amount of elements it expects. But that only makes sense if there's a meaningful value. If we knew that most Objective-C classes have less than say 8 methods, that would make the SmallVector an even better choice. It sounds like here you're just doing it for the llvm interface (which is a good enough reason in and by itself).

I'm generally opposed to this kind of typedef, it adds overhead when reading code, as readers always have to go lookup what the typedef this, to then just find out "oh, this is just a vector, why is this a typedef?!".

The argument in favor is that it's abstracting it away on purpose: you decided that the vector "always contain a lot of elements" or at least an unpredictable number of elements (that's how I read that 0) and so you don't have to worry about. It's more about not having to think about the size rather than having to spell it out.

also worth noting that a codebase more friendly to auto would not have that issue 👀

This sounds contradictory to your previous statement: unless you have an IDE that show the types of auto inline, this adds the same amount of (if not more) overhead that you seem to be arguing against.

Anyway, I don't feel strongly. I would have made it a typedef. I won't stand in the way if you don't.

While I will keep this one as is because I think the API is communicating something useful ("this is always big"), I will give the type alias a try on a different PR, which I'll tag you on.

…ctions (llvm#163291) This rewrites ClassDescriptorV2::method_t::Read (and the loop calling that function) in an NFCI way to perform a couple of things: 1. Cleanup code with indirect style. For example, the old loop would have default-constructor a `unique_ptr<method_t>`, which was *reused* on every iteration of the loop. It called `method_t::Read` on each iteration, and never checked the return value prior to invoking the callback. In other words, if `Read` failed, the callback was called on random values. 2. Exposed memory reads that could benefit from the MultiMemoryRead packet proposed in [1]. [1]: https://discourse.llvm.org/t/rfc-a-new-vectorized-memory-read-packet/88441 (cherry picked from commit bed17c0)

felipepiovezan requested review from Michael137 and jasonmolenda October 13, 2025 23:50

felipepiovezan requested a review from JDevlieghere as a code owner October 13, 2025 23:50

llvmbot added the lldb label Oct 13, 2025

felipepiovezan force-pushed the felipe/multimem_objc_refactor branch from 4658fdf to dbfdfc8 Compare October 13, 2025 23:50

Michael137 reviewed Oct 14, 2025

View reviewed changes

Michael137 approved these changes Oct 14, 2025

View reviewed changes

JDevlieghere reviewed Oct 14, 2025

View reviewed changes

felipepiovezan merged commit bed17c0 into llvm:main Oct 14, 2025
13 of 18 checks passed

felipepiovezan deleted the felipe/multimem_objc_refactor branch October 14, 2025 21:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[lldb][NFCI] Refactor AppleObjCClassDescriptorV2 method_t parsing functions #163291

[lldb][NFCI] Refactor AppleObjCClassDescriptorV2 method_t parsing functions #163291

Uh oh!

felipepiovezan commented Oct 13, 2025

Uh oh!

llvmbot commented Oct 13, 2025

Uh oh!

Michael137 Oct 14, 2025

Uh oh!

Michael137 Oct 14, 2025

Uh oh!

felipepiovezan Oct 14, 2025

Uh oh!

Michael137 left a comment •

edited

Loading

Uh oh!

JDevlieghere Oct 14, 2025

Uh oh!

felipepiovezan Oct 14, 2025 •

edited

Loading

Uh oh!

JDevlieghere Oct 14, 2025

Uh oh!

felipepiovezan Oct 14, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[lldb][NFCI] Refactor AppleObjCClassDescriptorV2 method_t parsing functions #163291

[lldb][NFCI] Refactor AppleObjCClassDescriptorV2 method_t parsing functions #163291

Uh oh!

Conversation

felipepiovezan commented Oct 13, 2025

Uh oh!

llvmbot commented Oct 13, 2025

Uh oh!

Michael137 Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

Michael137 Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

felipepiovezan Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

Michael137 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JDevlieghere Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

felipepiovezan Oct 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JDevlieghere Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

felipepiovezan Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Michael137 left a comment •

edited

Loading

felipepiovezan Oct 14, 2025 •

edited

Loading