[lldb] Implement Process::ReadMemoryRanges #163651

felipepiovezan · 2025-10-15T22:33:02Z

This commit introduces a base-class implementation for a method that reads memory from multiple ranges at once. This implementation simply calls the underlying ReadMemoryFromInferior method on each requested range, intentionally bypassing the memory caching mechanism (though this may be easily changed in the future).

Process implementations that can perform this operation more efficiently - e.g. with the MultiMemPacket described in [1] - are expected to override this method.

As an example, this commit changes AppleObjCClassDescriptorV2 to use the new API.

Note about the API

In the RFC, we discussed having the API return some kind of class ReadMemoryRangesResult. However, while writing such a class, it became clear that it was merely wrapping a vector, without providing anything useful. For example, this class:

struct ReadMemoryRangesResult {
  ReadMemoryRangesResult(
      llvm::SmallVector<llvm::MutableArrayRef<uint8_t>> ranges)
      : ranges(std::move(ranges)) {}

  llvm::ArrayRef<llvm::MutableArrayRef<uint8_t>> getRanges() const {
    return ranges;
  }

private:
  llvm::SmallVector<llvm::MutableArrayRef<uint8_t>> ranges;
};

As can be seen in the added test and in the added use-case (AppleObjCClassDescriptorV2), users of this API will just iterate over the vector of memory buffers. So they want a return type that can be iterated over, and the vector seems more natural than creating a new class and defining iterators for it.

Likewise, in the RFC, we discussed wrapping the result into an Expected. Upon experimenting with the code, this feels like it limits what the API is able to do, because the base class implementation never needs to fail the entire result: it is the individual reads that may fail, and this is expressed through a zero-length result. Any derived class overriding ReadMemoryRanges should also never produce a top-level failure: if it did, it could just fall back to the base class implementation, which would produce a better result.

The choice of having the caller allocate a buffer and pass it to Process::ReadMemoryRanges mostly follows conventions already established in the Process class.
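To make the return convention concrete, here is a minimal standalone sketch of the contract described above. It deliberately avoids LLDB and LLVM headers, so every name in it is an assumption made for illustration: `Slice` stands in for `llvm::MutableArrayRef<uint8_t>`, `MemRange` for `Range<lldb::addr_t, size_t>`, and `ReadFn` is a hypothetical callback playing the role of `ReadMemoryFromInferior`. It is not the code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Stand-in for llvm::MutableArrayRef<uint8_t> (hypothetical, not LLDB API).
struct Slice {
  uint8_t *data;
  size_t size;
};

// Stand-in for Range<lldb::addr_t, size_t> (hypothetical).
struct MemRange {
  uint64_t base;
  size_t size;
};

// Hypothetical single-range reader: returns the number of bytes read from the
// start of the range, or 0 on failure.
using ReadFn = std::function<size_t(uint64_t addr, uint8_t *dst, size_t len)>;

// Returns one slice per requested range, in order. A zero-length slice means
// that particular read failed; there is no top-level error.
std::vector<Slice> ReadRanges(const std::vector<MemRange> &ranges,
                              std::vector<uint8_t> &buffer,
                              const ReadFn &read) {
  size_t total = 0;
  for (const MemRange &r : ranges)
    total += r.size;
  // Caller error: report a length of 0 for every range.
  if (buffer.size() < total)
    return std::vector<Slice>(ranges.size(), Slice{nullptr, 0});

  std::vector<Slice> results;
  results.reserve(ranges.size());
  uint8_t *cursor = buffer.data();
  for (const MemRange &r : ranges) {
    size_t n = read(r.base, cursor, r.size);
    results.push_back(Slice{cursor, n});
    cursor += n;
  }
  return results;
}

// Caller side: count the ranges that were read in full, ignoring partial
// and failed reads.
size_t CountCompleteReads(const std::vector<MemRange> &ranges,
                          const std::vector<Slice> &results) {
  size_t complete = 0;
  for (size_t i = 0; i < ranges.size() && i < results.size(); ++i)
    if (results[i].size == ranges[i].size)
      ++complete;
  return complete;
}
```

Because a failed read is just a zero-length entry, the caller can pair each result with its request by position and skip the ones whose size does not match, which is the pattern the AppleObjCClassDescriptorV2 change follows.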

llvmbot · 2025-10-15T22:33:35Z

@llvm/pr-subscribers-lldb

Author: Felipe de Azevedo Piovezan (felipepiovezan)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/163651.diff

4 Files Affected:

(modified) lldb/include/lldb/Target/Process.h (+19)
(modified) lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.cpp (+12-11)
(modified) lldb/source/Target/Process.cpp (+28)
(modified) lldb/unittests/Target/MemoryTest.cpp (+62)

diff --git a/lldb/include/lldb/Target/Process.h b/lldb/include/lldb/Target/Process.h
index dc75d98acea70..7a260323b5a3d 100644
--- a/lldb/include/lldb/Target/Process.h
+++ b/lldb/include/lldb/Target/Process.h
@@ -1571,6 +1571,25 @@ class Process : public std::enable_shared_from_this<Process>,
   virtual size_t ReadMemory(lldb::addr_t vm_addr, void *buf, size_t size,
                             Status &error);
 
+  /// Read from multiple memory ranges and write the results into buffer.
+  ///
+  /// \param[in] ranges
+  ///     A collection of ranges (base address + size) to read from.
+  ///
+  /// \param[out] buffer
+  ///     A buffer where the read memory will be written to. It must be at least
+  ///     as long as the sum of the sizes of each range.
+  ///
+  /// \return
+  ///     A vector of MutableArrayRef, where each MutableArrayRef is a slice of
+  ///     the input buffer into which the memory contents were copied. The size
+  ///     of the slice indicates how many bytes were read successfully. Partial
+  ///     reads are always performed from the start of the requested range,
+  ///     never from the middle or end.
+  virtual llvm::SmallVector<llvm::MutableArrayRef<uint8_t>>
+  ReadMemoryRanges(llvm::ArrayRef<Range<lldb::addr_t, size_t>> ranges,
+                   llvm::MutableArrayRef<uint8_t> buffer);
+
   /// Read of memory from a process.
   ///
   /// This function has the same semantics of ReadMemory except that it
diff --git a/lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.cpp b/lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.cpp
index 6d8f41aef1ffc..7ce4cbf4c61a4 100644
--- a/lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.cpp
+++ b/lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.cpp
@@ -279,22 +279,23 @@ ClassDescriptorV2::ReadMethods(llvm::ArrayRef<lldb::addr_t> addresses,
   const size_t num_methods = addresses.size();
 
   llvm::SmallVector<uint8_t, 0> buffer(num_methods * size, 0);
-  llvm::DenseSet<uint32_t> failed_indices;
 
-  for (auto [idx, addr] : llvm::enumerate(addresses)) {
-    Status error;
-    process->ReadMemory(addr, buffer.data() + idx * size, size, error);
-    if (error.Fail())
-      failed_indices.insert(idx);
-  }
+  llvm::SmallVector<Range<addr_t, size_t>> mem_ranges =
+      llvm::to_vector(llvm::map_range(llvm::seq(num_methods), [&](size_t idx) {
+        return Range<addr_t, size_t>(addresses[idx], size);
+      }));
+
+  llvm::SmallVector<llvm::MutableArrayRef<uint8_t>> read_results =
+      process->ReadMemoryRanges(mem_ranges, buffer);
 
   llvm::SmallVector<method_t, 0> methods;
   methods.reserve(num_methods);
-  for (auto [idx, addr] : llvm::enumerate(addresses)) {
-    if (failed_indices.contains(idx))
+  for (auto [addr, memory] : llvm::zip(addresses, read_results)) {
+    // Ignore partial reads.
+    if (memory.size() != size)
       continue;
-    DataExtractor extractor(buffer.data() + idx * size, size,
-                            process->GetByteOrder(),
+
+    DataExtractor extractor(memory.data(), size, process->GetByteOrder(),
                             process->GetAddressByteSize());
     methods.push_back(method_t());
     methods.back().Read(extractor, process, addr, relative_selector_base_addr,
diff --git a/lldb/source/Target/Process.cpp b/lldb/source/Target/Process.cpp
index 3176852f0b724..9692a7fece7e5 100644
--- a/lldb/source/Target/Process.cpp
+++ b/lldb/source/Target/Process.cpp
@@ -1971,6 +1971,34 @@ size_t Process::ReadMemory(addr_t addr, void *buf, size_t size, Status &error) {
   }
 }
 
+llvm::SmallVector<llvm::MutableArrayRef<uint8_t>>
+Process::ReadMemoryRanges(llvm::ArrayRef<Range<lldb::addr_t, size_t>> ranges,
+                          llvm::MutableArrayRef<uint8_t> buffer) {
+  llvm::SmallVector<llvm::MutableArrayRef<uint8_t>> results;
+
+  for (auto [addr, len] : ranges) {
+    // This is either a programmer error, or a protocol violation.
+    // In production builds, gracefully fail.
+    assert(buffer.size() >= len);
+    if (buffer.size() < len) {
+      results.push_back(buffer.take_front(0));
+      continue;
+    }
+
+    Status status;
+    size_t num_bytes_read =
+        ReadMemoryFromInferior(addr, buffer.data(), len, status);
+    // FIXME: ReadMemoryFromInferior promises to return 0 in case of errors, but
+    // it doesn't; it never checks for errors.
+    if (status.Fail())
+      num_bytes_read = 0;
+    results.push_back(buffer.take_front(num_bytes_read));
+    buffer = buffer.drop_front(num_bytes_read);
+  }
+
+  return results;
+}
+
 void Process::DoFindInMemory(lldb::addr_t start_addr, lldb::addr_t end_addr,
                              const uint8_t *buf, size_t size,
                              AddressRanges &matches, size_t alignment,
diff --git a/lldb/unittests/Target/MemoryTest.cpp b/lldb/unittests/Target/MemoryTest.cpp
index 4a96730e00464..1317dd27b256e 100644
--- a/lldb/unittests/Target/MemoryTest.cpp
+++ b/lldb/unittests/Target/MemoryTest.cpp
@@ -17,6 +17,8 @@
 #include "lldb/Utility/ArchSpec.h"
 #include "lldb/Utility/DataBufferHeap.h"
 #include "gtest/gtest.h"
+#include <cstdint>
+#include <random>
 
 using namespace lldb_private;
 using namespace lldb;
@@ -225,3 +227,63 @@ TEST_F(MemoryTest, TesetMemoryCacheRead) {
                                                        // instead of using an
                                                        // old cache
 }
+
+/// A process class that reads `lower_byte(address)` for each `address` it
+/// reads.
+class DummyReaderProcess : public Process {
+public:
+  size_t DoReadMemory(lldb::addr_t vm_addr, void *buf, size_t size,
+                      Status &error) override {
+    uint8_t *buffer = static_cast<uint8_t*>(buf);
+    for(size_t addr = vm_addr; addr < vm_addr + size; addr++)
+      buffer[addr - vm_addr] = addr;
+    return size;
+  }
+  // Boilerplate, nothing interesting below.
+  DummyReaderProcess(lldb::TargetSP target_sp, lldb::ListenerSP listener_sp)
+      : Process(target_sp, listener_sp) {}
+  bool CanDebug(lldb::TargetSP, bool) override { return true; }
+  Status DoDestroy() override { return {}; }
+  void RefreshStateAfterStop() override {}
+  bool DoUpdateThreadList(ThreadList &, ThreadList &) override { return false; }
+  llvm::StringRef GetPluginName() override { return "Dummy"; }
+};
+
+TEST_F(MemoryTest, TestReadMemoryRanges) {
+  ArchSpec arch("x86_64-apple-macosx-");
+
+  Platform::SetHostPlatform(PlatformRemoteMacOSX::CreateInstance(true, &arch));
+
+  DebuggerSP debugger_sp = Debugger::CreateInstance();
+  ASSERT_TRUE(debugger_sp);
+
+  TargetSP target_sp = CreateTarget(debugger_sp, arch);
+  ASSERT_TRUE(target_sp);
+
+  ListenerSP listener_sp(Listener::MakeListener("dummy"));
+  ProcessSP process_sp =
+      std::make_shared<DummyReaderProcess>(target_sp, listener_sp);
+  ASSERT_TRUE(process_sp);
+
+  DummyProcess *process = static_cast<DummyProcess *>(process_sp.get());
+  process->SetMaxReadSize(1024);
+
+  llvm::SmallVector<uint8_t, 0> buffer(1024, 0);
+
+  // Read 8 ranges of 128 bytes, starting at random addresses
+  std::mt19937 rng(42);
+  std::uniform_int_distribution<addr_t> distribution(1, 100000);
+  llvm::SmallVector<Range<addr_t, size_t>> ranges;
+  for (unsigned i = 0; i < 1024; i += 128)
+    ranges.emplace_back(distribution(rng), 128);
+
+  llvm::SmallVector<llvm::MutableArrayRef<uint8_t>> read_results =
+      process->ReadMemoryRanges(ranges, buffer);
+
+  for (auto [range, memory] : llvm::zip(ranges, read_results)) {
+    ASSERT_EQ(memory.size(), 128u);
+    addr_t range_base = range.GetRangeBase();
+    for (auto [idx, byte] : llvm::enumerate(memory))
+      ASSERT_EQ(byte, static_cast<uint8_t>(range_base + idx));
+  }
+}

github-actions · 2025-10-15T22:34:44Z

✅ With the latest revision this PR passed the C/C++ code formatter.

lldb/include/lldb/Target/Process.h

DavidSpickett

What's the thinking for doing this in "ptrace style" where the caller allocates a buffer and gives it to the function, vs. allocating the buffer on behalf of the caller?

The latter would make it impossible to get the buffer size wrong, but on the other hand, the caller would know best how to allocate a given size buffer (stack, heap, and so on).

If the main use case (and the only one right now) is a small number of bytes from a large set of addresses, we're probably talking stack sized allocations not heap. So allowing the caller to set that up would be logical.

I also wondered about a MultiMemWrite: would it make sense to have a symmetric API for that hypothetical packet? If the caller allocates the buffer, the two would be symmetric. And you could do a "gather read" from N places and "scatter write" to N other places.

(no real use case for this, just theory crafting)

It's an internal API so we can always swap later but please include your thinking in the PR description even if it's an arbitrary choice, might be interesting in future.

DavidSpickett · 2025-10-17T09:38:53Z

lldb/source/Target/Process.cpp

+  for (auto [addr, len] : ranges) {
+    // This is either a programmer error, or a protocol violation.
+    // In production builds, gracefully fail.
+    assert(buffer.size() >= len);


I think your algorithm here is:

- For each range:
  - Take that much from the front of buffer, making buffer smaller.
  - So if at any point there aren't "len" bytes left to take, the caller made a mistake and didn't provide enough buffer.

I was confused at first because I read the assert backwards. We expect that the remaining buffer size will be >= len of the current range, if it's not then it fails.

But then I see below, take_front(0). And now I'm confused again.

I think some comments would help, like:
- when the remaining buffer is > range we do this
- when it's exactly the same we do this
- when it's less than we decide that's an error so we do this
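The three cases asked about here can be sketched in isolation. This is only an illustration of the per-range check in the first revision of the patch, under the assumption that the read itself fully succeeds. `Remaining`, `Slice` and `TakeForRange` are hypothetical names, and unlike the patch the sketch does not `assert`, so the error path can be exercised in any build mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical view over the not-yet-used tail of the caller's buffer.
struct Remaining {
  uint8_t *data;
  size_t size;
};

// Hypothetical result slice: where the bytes went and how many were read.
struct Slice {
  uint8_t *data;
  size_t size;
};

// Reserve `len` bytes for one range, pretending the read fully succeeded.
//  - remaining.size >  len: slice of `len` bytes, leftover stays available.
//  - remaining.size == len: slice of `len` bytes, nothing left afterwards.
//  - remaining.size <  len: caller error; zero-length slice, buffer untouched.
Slice TakeForRange(Remaining &remaining, size_t len) {
  if (remaining.size < len)
    return Slice{remaining.data, 0};
  Slice out{remaining.data, len};
  remaining.data += len;
  remaining.size -= len;
  return out;
}
```

The zero-length slice in the error case still points at the current start of the remaining buffer, which is what `buffer.take_front(0)` expresses in the patch.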

DavidSpickett · 2025-10-17T09:40:30Z

lldb/source/Target/Process.cpp

+    // In production builds, gracefully fail.
+    assert(buffer.size() >= len);
+    if (buffer.size() < len) {
+      results.push_back(buffer.take_front(0));


For instance, this is a bit cryptic without a comment.

I suppose it's because there has to be some pointer into the original buffer, but you are returning 0 bytes for this. Perhaps you can use buffer.data instead to make it clear we do not intend to move the start of the buffer at all?

Seeing 0 here makes me wonder if you meant to type another number.

Here I should have just done emplace_back(nullptr, 0) to be honest, would have made the point more evident.

DavidSpickett · 2025-10-17T09:42:39Z

lldb/unittests/Target/MemoryTest.cpp

+
+  llvm::SmallVector<uint8_t, 0> buffer(1024, 0);
+
+  // Read 8 ranges of 128 bytes, starting at random addresses


Random in tests rings alarm bells for me but hopefully once I read the rest I'll understand why you need this.

After reading this, I think I understand. The addresses and sizes you put in are somewhat arbitrary.

I think I would prefer to see a few "carefully chosen" sequences instead, or as well. Boundary conditions at least.

This is a relatively simple algorithm to test but if it did fail, I'm not sure the failure reporting would be good enough to know what inputs it was given. Maybe GoogleTest will surprise me though.

> Random in tests rings alarm bells for me but hopefully once I read the rest I'll understand why you need this.

Worth mentioning that this is deterministic, as it uses a fixed seed. So every platform and every run will use the exact same data.

I can write some arbitrary sequences instead, it was just simpler to generate the sequence this way

Ok part of this is me being lazy, but future me and anyone else will be even more lazy, so add a comment that it is in fact a fixed seed.

A fixed arbitrary seed that is.
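The point about the fixed seed can be checked with a small standalone sketch. `DrawRaw` is a hypothetical helper, not part of the test in this PR. One caveat, stated as my understanding rather than as a claim about the test: the C++ standard specifies the output sequence of the `std::mt19937` engine, but leaves the algorithm of `std::uniform_int_distribution` to the implementation, so the distributed values are reproducible for a given standard library and may differ between libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Draw `count` raw 32-bit values from a Mersenne Twister seeded with `seed`.
// The engine's raw output is specified by the standard, so the same seed
// yields the same sequence on every run.
std::vector<uint32_t> DrawRaw(uint32_t seed, size_t count) {
  std::mt19937 rng(seed);
  std::vector<uint32_t> values;
  values.reserve(count);
  for (size_t i = 0; i < count; ++i)
    values.push_back(static_cast<uint32_t>(rng()));
  return values;
}
```

Two calls with the same seed return identical vectors, which is the property the comment in the test should call out.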

DavidSpickett · 2025-10-17T09:44:06Z

lldb/unittests/Target/MemoryTest.cpp

                                                       // old cache
 }
+
+/// A process class that reads `lower_byte(address)` for each `address` it


What is lower_byte? Do you mean literally it returns the least significant byte of the address? This looks like a function call the way you wrote it.

(the idea is fine overall just the way it's written)

DavidSpickett · 2025-10-17T09:45:23Z

lldb/unittests/Target/MemoryTest.cpp

+                      Status &error) override {
+    uint8_t *buffer = static_cast<uint8_t *>(buf);
+    for (size_t addr = vm_addr; addr < vm_addr + size; addr++)
+      buffer[addr - vm_addr] = addr;


This is a truncation, right? I'd add a static cast to make that clear and keep the compiler happy.

I would also move the comment about the lower byte to here, or repeat it.
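A minimal sketch of what the suggested change could look like, written as a free function so it compiles without LLDB. `FillWithLowByteOfAddress` is a hypothetical name, and `uint64_t` is used here in place of `lldb::addr_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fill `buf` so that the byte "read" from address A is the least significant
// byte of A. The narrowing is intentional, hence the explicit cast.
size_t FillWithLowByteOfAddress(uint64_t vm_addr, void *buf, size_t size) {
  uint8_t *bytes = static_cast<uint8_t *>(buf);
  for (size_t i = 0; i < size; ++i)
    bytes[i] = static_cast<uint8_t>(vm_addr + i);
  return size;
}
```

For example, reading 4 bytes starting at `0x1FE` produces `0xFE, 0xFF, 0x00, 0x01`.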

felipepiovezan · 2025-10-19T23:19:08Z

> What's the thinking for doing this in "ptrace style" where the caller allocates a buffer and gives it to the function, vs. allocating the buffer on behalf of the caller?

No real motivation other than "the other APIs in Process do this, so this must have been discussed in the past", I'll add this to the commit/PR messages. (I'm assuming that's fine, as nobody objected to it in the RFC, but I have no strong opinions).

DavidSpickett · 2025-10-20T09:37:01Z

> (I'm assuming that's fine, as nobody objected to it in the RFC, but I have no strong opinions)

I'm fine with it. "when in Rome" is a decent argument when there's no particular reason to go one way or the other.

This commit introduces a base-class implementation for a method that reads memory from multiple ranges at once. This implementation simply calls the underlying `ReadMemoryFromInferior` method on each requested range, intentionally bypassing the memory caching mechanism (though this may be easily changed in the future).

`Process` implementations that can perform this operation more efficiently - e.g. with the MultiMemPacket described in [1] - are expected to override this method.

As an example, this commit changes AppleObjCClassDescriptorV2 to use the new API.

Note about the API
------------------

In the RFC, we discussed having the API return some kind of class `ReadMemoryRangesResult`. However, while writing such a class, it became clear that it was merely wrapping a vector, without providing anything useful. For example, this class:

```
struct ReadMemoryRangesResult {
  ReadMemoryRangesResult(
      llvm::SmallVector<llvm::MutableArrayRef<uint8_t>> ranges)
      : ranges(std::move(ranges)) {}

  llvm::ArrayRef<llvm::MutableArrayRef<uint8_t>> getRanges() const {
    return ranges;
  }

private:
  llvm::SmallVector<llvm::MutableArrayRef<uint8_t>> ranges;
};
```

As can be seen in the added test and in the added use-case (AppleObjCClassDescriptorV2), users of this API will just iterate over the vector of memory buffers. So they want a return type that can be iterated over, and the vector seems more natural than creating a new class and defining iterators for it.

Likewise, in the RFC, we discussed wrapping the result into an `Expected`. Upon experimenting with the code, this feels like it limits what the API is able to do, because the base class implementation never needs to fail the entire result: it is the individual reads that may fail, and this is expressed through a zero-length result. Any derived class overriding `ReadMemoryRanges` should also never produce a top-level failure: if it did, it could just fall back to the base class implementation, which would produce a better result.

The choice of having the caller allocate a buffer and pass it to `Process::ReadMemoryRanges` mostly follows conventions already established in the Process class.

[1]: https://discourse.llvm.org/t/rfc-a-new-vectorized-memory-read-packet/

felipepiovezan · 2025-10-20T16:52:50Z

Addressed review comments

felipepiovezan · 2025-10-20T16:54:21Z

I moved some of the assertions out of the main loop to simplify the algorithm a bit.

felipepiovezan · 2025-10-20T20:01:50Z

Btw, the ProcessGDBRemote impl is here: #164311
I haven't added any reviewers over there yet because that PR depends on this one; I'll do it once I merge this PR.

felipepiovezan · 2025-10-20T22:17:24Z

The self-hosted runner lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.

The AArch64 bot seems to be having some infra issues.

jasonmolenda

LGTM, I think you addressed David's feedback in the commit today.

DavidSpickett

I agree with all the behaviour as is, needs a few more tests.

DavidSpickett · 2025-10-21T10:20:02Z

lldb/source/Target/Process.cpp

+  // In production builds, gracefully fail by returning empty chunks.
+  assert(buffer.size() >= total_ranges_len);
+  if (buffer.size() < total_ranges_len)
+    return llvm::SmallVector<llvm::MutableArrayRef<uint8_t>>(ranges.size());


Add an explicit zero value here, it took me ages to parse what this meant (which is a "C++ be like" sort of issue, but still, make it clearer).

Also you say "chunks" but these are not chunks, we don't return chunks as such we return a description of a chunk. I suggest "returning a length of 0 for all ranges".
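To illustrate why the one-argument constructor is hard to read, here is a sketch using `std::vector` and a hypothetical `Slice` type in place of `llvm::SmallVector` and `llvm::MutableArrayRef<uint8_t>`. Both functions build the same value; the second is one possible explicit spelling and not necessarily the one the PR ended up with.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for llvm::MutableArrayRef<uint8_t>; like it, a
// default-constructed Slice is empty.
struct Slice {
  uint8_t *data;
  size_t size;
  Slice() : data(nullptr), size(0) {}
  Slice(uint8_t *d, size_t s) : data(d), size(s) {}
};

// Terse: n default-constructed slices. The "length 0" is implicit.
std::vector<Slice> EmptyResultsTerse(size_t num_ranges) {
  return std::vector<Slice>(num_ranges);
}

// Explicit: the fill value spells out "no data, length 0" for every range.
std::vector<Slice> EmptyResultsExplicit(size_t num_ranges) {
  const Slice zero_length(nullptr, 0);
  return std::vector<Slice>(num_ranges, zero_length);
}
```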

felipepiovezan · 2025-10-21T15:12:50Z

Added death tests, removed references to "chunks" in the comments, rewrote return llvm::SmallVector<llvm::MutableArrayRef<uint8_t>>(ranges.size()); into something more palatable

DavidSpickett · 2025-10-21T15:20:51Z

lldb/unittests/Target/MemoryTest.cpp

+
+  llvm::SmallVector<uint8_t, 0> too_short_buffer(10, 0);
+  llvm::SmallVector<Range<addr_t, size_t>> ranges = {{0x12345, 128}};
+  ASSERT_DEATH(


I'm not sure whether ASSERT_DEATH incorporates a check whether assertions are on, so please check that if you didn't already.

Though I would like to see the non-assert path anyway. So you'll need an ifdef for that.

DavidSpickett

I would like to see the non-assert side of the ASSERT_DEATH paths tested (in non-asserts builds, not in the same build of course).

Otherwise LGTM so in the interests of time I'm approving and I'm sure you'll figure it out.

felipepiovezan · 2025-10-21T17:53:37Z

Added release-mode testing. I found out that gtest has an ASSERT_DEBUG_DEATH that just executes the statements in release builds (without expecting failures); that's how they recommend doing it (followed by some ifdefs).

felipepiovezan · 2025-10-21T19:33:52Z

The AArch64 bot is still failing with the message below, so I am going to bypass it.

The self-hosted runner lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.

DavidSpickett · 2025-10-22T09:42:08Z

The unit tests are failing on Arm 32-bit, I'll figure it out.

Tests added by #163651. Use lldb::addr_t (which is always 64-bit) for all addresses so that we don't calculate an invalid address on 32-bit and segfault, as happened on Linaro's Arm 32-bit buildbot.
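The general hazard can be sketched as follows. This is not a reconstruction of the actual failure on the buildbot, which the thread does not spell out; it only shows how an offset computed from an address goes wrong once the address has been stored in a 32-bit variable, as `size_t addr = vm_addr` would do on a 32-bit host. Both function names are hypothetical, and `uint32_t` stands in for a 32-bit `size_t`.

```cpp
#include <cassert>
#include <cstdint>

// Offset of (base + delta) from base when the address stays 64-bit.
uint64_t OffsetWith64BitAddr(uint64_t base, uint64_t delta) {
  uint64_t addr = base + delta;
  return addr - base;
}

// Same computation, but the running address is stored in a 32-bit variable.
uint64_t OffsetWith32BitAddr(uint64_t base, uint64_t delta) {
  uint32_t addr = static_cast<uint32_t>(base + delta); // drops the upper bits
  // `addr` is promoted back to 64-bit here; the subtraction wraps around if
  // `base` did not fit in 32 bits.
  return addr - base;
}
```

With a base that fits in 32 bits the two functions agree; with a base above 4 GiB the second one returns an enormous offset that would index far outside any buffer.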

This commit makes use of the newly created MultiMemRead packet to provide an efficient implementation of MultiMemRead inside ProcessGDBRemote.

Testing is tricky, but it is accomplished two ways:

1. Some Objective-C tests would fail if this were implemented incorrectly, as there is already an in-tree use of the base class implementation of MultiMemRead, which is now getting replaced by the derived class.
2. One Objective-C test is modified so that we ensure the packet is being sent by looking at the packet logs. While not the most elegant solution, it is a strategy adopted in other tests as well. This gets around the fact that we cannot instantiate / unittest a mock ProcessGDBRemote.

Depends on #163651

felipepiovezan · 2025-10-22T16:31:25Z

> The unit tests are failing on Arm 32-bit, I'll figure it out.

thank you!

This commit introduces a base-class implementation for a method that reads memory from multiple ranges at once. This implementation simply calls the underlying `ReadMemoryFromInferior` method on each requested range, intentionally bypassing the memory caching mechanism (though this may be easily changed in the future). `Process` implementations that can be perform this operation more efficiently - e.g. with the MultiMemPacket described in [1] - are expected to override this method. As an example, this commit changes AppleObjCClassDescriptorV2 to use the new API. Note about the API ------------------ In the RFC, we discussed having the API return some kind of class `ReadMemoryRangesResult`. However, while writing such a class, it became clear that it was merely wrapping a vector, without providing anything useful. For example, this class: ``` struct ReadMemoryRangesResult { ReadMemoryRangesResult( llvm::SmallVector<llvm::MutableArrayRef<uint8_t>> ranges) : ranges(std::move(ranges)) {} llvm::ArrayRef<llvm::MutableArrayRef<uint8_t>> getRanges() const { return ranges; } private: llvm::SmallVector<llvm::MutableArrayRef<uint8_t>> ranges; }; ``` As can be seen in the added test and in the added use-case (AppleObjCClassDescriptorV2), users of this API will just iterate over the vector of memory buffers. So they want a return type that can be iterated over, and the vector seems more natural than creating a new class and defining iterators for it. Likewise, in the RFC, we discussed wrapping the result into an `Expected`. Upon experimenting with the code, this feels like it limits what the API is able to do as the base class implementation never needs to fail the entire result, it's the individual reads that may fail and this is expressed through a zero-length result. Any derived classes overriding `ReadMemoryRanges` should also never produce a top level failure: if they did, they can just fall back to the base class implementation, which would produce a better result. 
The choice of having the caller allocate a buffer and pass it to `Process::ReadMemoryRanges` is done mostly to follow conventions already done in the Process class. [1]: https://discourse.llvm.org/t/rfc-a-new-vectorized-memory-read-packet/ (cherry picked from commit f2cb997) (cherry picked from commit bccb899)

…164311) This commit makes use of the newly created MultiMemRead packet to provide an efficient implementation of MultiMemRead inside ProcessGDBRemote. Testing is tricky, but it is accomplished two ways: 1. Some Objective-C tests would fail if this were implemented incorrectly, as there is already an in-tree use of the base class implementation of MultiMemRead, which is now getting replaced by the derived class. 2. One Objective-C test is modified so that we ensure the packet is being sent by looking at the packet logs. While not the most elegant solution, it is a strategy adopted in other tests as well. This gets around the fact that we cannot instantiate / unittest a mock ProcessGDBRemote. Depends on llvm#163651 (cherry picked from commit 276bccd) (cherry picked from commit 8e1255f)

felipepiovezan requested a review from JDevlieghere as a code owner October 15, 2025 22:33

llvmbot added the lldb label Oct 15, 2025

felipepiovezan requested review from jasonmolenda and jimingham October 15, 2025 22:33

felipepiovezan requested a review from DavidSpickett October 15, 2025 22:33

felipepiovezan force-pushed the felipe/multimem_baseline_process_impl branch from 4f47055 to d6f54b4 Compare October 15, 2025 22:35

jasonmolenda reviewed Oct 15, 2025

lldb/include/lldb/Target/Process.h

DavidSpickett reviewed Oct 17, 2025

felipepiovezan added 2 commits October 20, 2025 04:52

fixup! Address review comments

58d7eda

felipepiovezan force-pushed the felipe/multimem_baseline_process_impl branch from d6f54b4 to 58d7eda Compare October 20, 2025 16:52

felipepiovezan mentioned this pull request Oct 20, 2025

[lldb] Implement ProcessGDBRemote support for ReadMemoryRanges #164311

Merged

felipepiovezan requested review from DavidSpickett and jasonmolenda October 20, 2025 22:15

jasonmolenda approved these changes Oct 21, 2025

DavidSpickett requested changes Oct 21, 2025

fixup! Remove dead code from test

1bcac34

DavidSpickett reviewed Oct 21, 2025

lldb/source/Target/Process.cpp

felipepiovezan added 3 commits October 21, 2025 07:41

fixup! Add test for reading fewer bytes than requested

52aae17

fixup! rewrite empty response for clarity

d498717

fixup! add death tests

1b470fb

felipepiovezan force-pushed the felipe/multimem_baseline_process_impl branch from a10d92a to 1b470fb Compare October 21, 2025 15:11

fixup! remove chunk references

c1434e4

felipepiovezan requested a review from DavidSpickett October 21, 2025 15:12

fixup! clang-format

6e13b80

DavidSpickett reviewed Oct 21, 2025

DavidSpickett approved these changes Oct 21, 2025

fixup! use ASSERT_DEBUG_DEATH

f5e4358

felipepiovezan merged commit f2cb997 into llvm:main Oct 21, 2025
9 of 10 checks passed

felipepiovezan deleted the felipe/multimem_baseline_process_impl branch October 21, 2025 19:34


		llvm::SmallVector<uint8_t, 0> buffer(1024, 0);

		// Read 8 ranges of 128 bytes, starting at random addresses

Conversation

felipepiovezan commented Oct 15, 2025 • edited

llvmbot commented Oct 15, 2025

github-actions bot commented Oct 15, 2025 • edited

DavidSpickett left a comment

felipepiovezan commented Oct 19, 2025

DavidSpickett commented Oct 20, 2025

felipepiovezan commented Oct 20, 2025 • edited

felipepiovezan commented Oct 20, 2025

felipepiovezan commented Oct 20, 2025

felipepiovezan commented Oct 20, 2025

jasonmolenda left a comment

DavidSpickett left a comment

felipepiovezan commented Oct 21, 2025

DavidSpickett left a comment

felipepiovezan commented Oct 21, 2025

felipepiovezan commented Oct 21, 2025