[lldb][NFC] Change ObjectFile's DataExtractor to a shared ptr #170066

jasonmolenda · 2025-12-01T02:04:04Z

ObjectFile has an m_data DataExtractor ivar which may be default constructed initially, or initialized with a DataBuffer passed in to its ctor. If the DataExtractor does not get a DataBuffer source passed in, the subclass will initialize it with access to the object file's data. When a DataBuffer is passed in to the base class ctor, the DataExtractor only has its buffer initialized; ObjectFile doesn't yet know the address size and endianness to fully initialize the DataExtractor.

This patch changes ObjectFile to instead have a DataExtractorSP ivar which is always initialized with at least a default-constructed DataExtractor object in the base class ctor. The next patch I will be writing is to change the ObjectFile ctor to take an optional DataExtractorSP, so the caller can pass a DataExtractor subclass -- the VirtualizeDataExtractor being added via #168802 -- instead of a DataBuffer which is trivially saved into the DataExtractor.
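A minimal sketch of why a pointer ivar matters for that plan, assuming toy types: a value member can only hold the base class (a subclass copied into it is sliced), while a shared pointer keeps the dynamic type. `ToyExtractor`, `RemappingExtractor`, `ByValueHolder`, and `ByPointerHolder` are hypothetical names for illustration and are not LLDB classes.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Toy base class; the real DataExtractor has a much larger interface.
struct ToyExtractor {
  virtual ~ToyExtractor() = default;
  // Translate an offset before reading; the base class is the identity.
  virtual uint64_t Translate(uint64_t offset) const { return offset; }
};

// Hypothetical subclass that remaps offsets. Not the real class from the
// other PR, just a stand-in for "some DataExtractor subclass".
struct RemappingExtractor : ToyExtractor {
  uint64_t bias;
  explicit RemappingExtractor(uint64_t b) : bias(b) {}
  uint64_t Translate(uint64_t offset) const override { return offset + bias; }
};

// Value member: copying a subclass in keeps only the base part (slicing).
struct ByValueHolder {
  ToyExtractor m_data;
  explicit ByValueHolder(const ToyExtractor &e) : m_data(e) {}
};

// Pointer member: the dynamic type survives. A null argument falls back to a
// default-constructed extractor, so the pointer always holds an object.
struct ByPointerHolder {
  std::shared_ptr<ToyExtractor> m_data_sp;
  explicit ByPointerHolder(std::shared_ptr<ToyExtractor> e = nullptr)
      : m_data_sp(e ? std::move(e) : std::make_shared<ToyExtractor>()) {
    assert(m_data_sp && "invariant: the pointer always holds an object");
  }
};
```

With these toy types, a `RemappingExtractor` stored in `ByValueHolder` behaves like the base class, while the same object passed to `ByPointerHolder` keeps its override.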

The change is otherwise mechanical; all m_data. changed to m_data_sp-> and all the places where m_data was passed in for a by-ref call were changed to *m_data_sp.get(). The shared pointer is always initialized to contain an object.
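The mechanical rewrite described above can be sketched with toy types. This is an illustration only: `ToyExtractor`, `ReadAddrSize`, `BeforeObjectFile`, and `AfterObjectFile` are hypothetical names, and the real classes carry far more state.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Toy stand-in for the extractor; only the shape of the calls matters here.
struct ToyExtractor {
  uint32_t addr_size = 0;
  void SetAddressByteSize(uint32_t n) { addr_size = n; }
  uint32_t GetAddressByteSize() const { return addr_size; }
};

// A helper that takes the extractor by reference, like a header parser would.
inline uint32_t ReadAddrSize(const ToyExtractor &data) {
  return data.GetAddressByteSize();
}

// Before: the extractor is a value member.
struct BeforeObjectFile {
  ToyExtractor m_data;
  uint32_t Parse() {
    m_data.SetAddressByteSize(8); // member access with '.'
    return ReadAddrSize(m_data);  // passed by reference directly
  }
};

// After: the extractor lives behind a shared pointer that is never null.
struct AfterObjectFile {
  std::shared_ptr<ToyExtractor> m_data_sp = std::make_shared<ToyExtractor>();
  uint32_t Parse() {
    assert(m_data_sp && "invariant: always initialized with an object");
    m_data_sp->SetAddressByteSize(8); // '.' becomes '->'
    return ReadAddrSize(*m_data_sp);  // dereference for by-ref calls
  }
};
```

For a `std::shared_ptr`, `*m_data_sp` and `*m_data_sp.get()` refer to the same object, so either spelling works for the by-reference call sites.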

I built & ran the testsuite on macOS and on aarch64-Ubuntu (thanks for getting the Linux testsuite to run on SME-only systems David). All of the ObjectFile subclasses I modified compile cleanly, but I haven't tested them beyond any unit tests they may have (probably breakpad).

rdar://148939795


llvmbot · 2025-12-01T02:04:43Z

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes

ObjectFile has an m_data DataExtractor ivar which may be default constructed initially, or initialized with a DataBuffer passed in to a ctor. If not, subclasses will provide the DataExtractor with a buffer source. When a DataBuffer is passed in to the base class ctor, the DataExtractor only has its buffer initialized; we don't yet know the address size and endianness to fully initialize the DataExtractor.

This patch changes ObjectFile to instead have a DataExtractorSP ivar which is always initialized with at least a default-constructed DataExtractor object in the base class ctor. The next patch I will be writing is to change the ObjectFile ctor which accepts a DataBuffer to instead accept a DataExtractorSP, so the caller can initialize it with a DataExtractor subclass -- the VirtualizeDataExtractor being added in #168802.

The change is otherwise mechanical; all m_data. changed to m_data_up-> and all the places where m_data was passed in for a by-ref call were changed to *m_data_up.get(). The unique pointer is always initialized to contain an object.

I can't remember off hand if I'm making a mistake using a unique_ptr here, given that the ctor may take a DataExtractor as an argument. The caller will have to do std::move(extractor_up) when it calls the ObjectFile ctor for correct behavior. Even though a unique_ptr makes sense internal to ObjectFile, given that it can be passed as an argument, should I use the more straightforward shared_ptr? An ObjectFile only has one of them, so the extra storage for the refcount isn't important.
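The trade-off raised in this paragraph can be sketched with toy types. This is a hedged illustration, not LLDB code: `ToyExtractor`, `UniqueOwner`, `SharedOwner`, and the two `CreateViaPlugin...` functions are hypothetical stand-ins for the object file and an intermediary factory layer.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct ToyExtractor {
  int address_size = 0;
};

// Variant 1: the object file owns the extractor through a unique_ptr.
struct UniqueOwner {
  std::unique_ptr<ToyExtractor> m_data_up;
  explicit UniqueOwner(std::unique_ptr<ToyExtractor> up)
      : m_data_up(std::move(up)) {}
};

// Hypothetical intermediary layer: every hop has to std::move the pointer.
inline UniqueOwner CreateViaPluginUnique(std::unique_ptr<ToyExtractor> up) {
  return UniqueOwner(std::move(up));
}

// Variant 2: a shared_ptr can simply be copied at each hop.
struct SharedOwner {
  std::shared_ptr<ToyExtractor> m_data_sp;
  explicit SharedOwner(std::shared_ptr<ToyExtractor> sp)
      : m_data_sp(std::move(sp)) {}
};

inline SharedOwner
CreateViaPluginShared(const std::shared_ptr<ToyExtractor> &sp) {
  return SharedOwner(sp);
}

// Returns true if both variants behave as the comments describe.
inline bool DemoOwnership() {
  auto up = std::make_unique<ToyExtractor>();
  UniqueOwner a = CreateViaPluginUnique(std::move(up));
  // After the move, the caller's unique_ptr is empty.
  assert(up == nullptr);

  auto sp = std::make_shared<ToyExtractor>();
  SharedOwner b = CreateViaPluginShared(sp);
  // The caller still holds a reference to the same object.
  return a.m_data_up != nullptr && sp == b.m_data_sp && sp.use_count() == 2;
}
```

The cost of the shared variant is one control block per extractor, which matches the point above that an ObjectFile only has one of them.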

I built & ran the testsuite on macOS and on aarch64-Ubuntu (thanks for getting the Linux testsuite to run on SME-only systems David). All of the ObjectFile subclasses I modified compile cleanly, but I haven't tested them beyond any unit tests they may have (probably breakpad).

rdar://148939795

Patch is 54.04 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/170066.diff

11 Files Affected:

(modified) lldb/include/lldb/Symbol/ObjectFile.h (+6-3)
(modified) lldb/include/lldb/lldb-forward.h (+1)
(modified) lldb/source/Expression/ObjectFileJIT.cpp (+6-4)
(modified) lldb/source/Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.cpp (+4-4)
(modified) lldb/source/Plugins/ObjectFile/COFF/ObjectFileCOFF.cpp (+2-2)
(modified) lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp (+14-12)
(modified) lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp (+126-117)
(modified) lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp (+45-41)
(modified) lldb/source/Plugins/ObjectFile/XCOFF/ObjectFileXCOFF.cpp (+1-1)
(modified) lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp (+6-6)
(modified) lldb/source/Symbol/ObjectFile.cpp (+12-10)

diff --git a/lldb/include/lldb/Symbol/ObjectFile.h b/lldb/include/lldb/Symbol/ObjectFile.h
index 1de08a8576507..23653fd43d863 100644
--- a/lldb/include/lldb/Symbol/ObjectFile.h
+++ b/lldb/include/lldb/Symbol/ObjectFile.h
@@ -418,7 +418,7 @@ class ObjectFile : public std::enable_shared_from_this<ObjectFile>,
   /// Attempts to parse the object header.
   ///
   /// This function is used as a test to see if a given plug-in instance can
-  /// parse the header data already contained in ObjectFile::m_data. If an
+  /// parse the header data already contained in ObjectFile::m_data_up. If an
   /// object file parser does not recognize that magic bytes in a header,
   /// false should be returned and the next plug-in can attempt to parse an
   /// object file.
@@ -786,8 +786,11 @@ class ObjectFile : public std::enable_shared_from_this<ObjectFile>,
   lldb::addr_t m_length; ///< The length of this object file if it is known (can
                          ///be zero if length is unknown or can't be
                          ///determined).
-  DataExtractor
-      m_data; ///< The data for this object file so things can be parsed lazily.
+  lldb::DataExtractorUP
+      m_data_up; ///< The data for this object file so things
+                 ///< can be parsed lazily.  This unique pointer
+                 ///< will always have a DataExtractor object,
+                 ///< although it may only be default-constructed.
   lldb::ProcessWP m_process_wp;
   /// Set if the object file only exists in memory.
   const lldb::addr_t m_memory_addr;
diff --git a/lldb/include/lldb/lldb-forward.h b/lldb/include/lldb/lldb-forward.h
index c8e2e97953aa4..09aa036d27875 100644
--- a/lldb/include/lldb/lldb-forward.h
+++ b/lldb/include/lldb/lldb-forward.h
@@ -342,6 +342,7 @@ typedef std::shared_ptr<lldb_private::CompileUnit> CompUnitSP;
 typedef std::shared_ptr<lldb_private::DataBuffer> DataBufferSP;
 typedef std::shared_ptr<lldb_private::WritableDataBuffer> WritableDataBufferSP;
 typedef std::shared_ptr<lldb_private::DataExtractor> DataExtractorSP;
+typedef std::unique_ptr<lldb_private::DataExtractor> DataExtractorUP;
 typedef std::shared_ptr<lldb_private::Debugger> DebuggerSP;
 typedef std::weak_ptr<lldb_private::Debugger> DebuggerWP;
 typedef std::shared_ptr<lldb_private::Disassembler> DisassemblerSP;
diff --git a/lldb/source/Expression/ObjectFileJIT.cpp b/lldb/source/Expression/ObjectFileJIT.cpp
index e4a613551d22e..f64b0fb463e8a 100644
--- a/lldb/source/Expression/ObjectFileJIT.cpp
+++ b/lldb/source/Expression/ObjectFileJIT.cpp
@@ -73,8 +73,8 @@ ObjectFileJIT::ObjectFileJIT(const lldb::ModuleSP &module_sp,
     : ObjectFile(module_sp, nullptr, 0, 0, DataBufferSP(), 0), m_delegate_wp() {
   if (delegate_sp) {
     m_delegate_wp = delegate_sp;
-    m_data.SetByteOrder(delegate_sp->GetByteOrder());
-    m_data.SetAddressByteSize(delegate_sp->GetAddressByteSize());
+    m_data_up->SetByteOrder(delegate_sp->GetByteOrder());
+    m_data_up->SetAddressByteSize(delegate_sp->GetAddressByteSize());
   }
 }
 
@@ -85,12 +85,14 @@ bool ObjectFileJIT::ParseHeader() {
   return false;
 }
 
-ByteOrder ObjectFileJIT::GetByteOrder() const { return m_data.GetByteOrder(); }
+ByteOrder ObjectFileJIT::GetByteOrder() const {
+  return m_data_up->GetByteOrder();
+}
 
 bool ObjectFileJIT::IsExecutable() const { return false; }
 
 uint32_t ObjectFileJIT::GetAddressByteSize() const {
-  return m_data.GetAddressByteSize();
+  return m_data_up->GetAddressByteSize();
 }
 
 void ObjectFileJIT::ParseSymtab(Symtab &symtab) {
diff --git a/lldb/source/Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.cpp b/lldb/source/Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.cpp
index 33673f139b49a..15e34f99454a3 100644
--- a/lldb/source/Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.cpp
+++ b/lldb/source/Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.cpp
@@ -130,13 +130,13 @@ void ObjectFileBreakpad::CreateSections(SectionList &unified_section_list) {
 
   std::optional<Record::Kind> current_section;
   offset_t section_start;
-  llvm::StringRef text = toStringRef(m_data.GetData());
+  llvm::StringRef text = toStringRef(m_data_up->GetData());
   uint32_t next_section_id = 1;
   auto maybe_add_section = [&](const uint8_t *end_ptr) {
     if (!current_section)
       return; // We have been called before parsing the first line.
 
-    offset_t end_offset = end_ptr - m_data.GetDataStart();
+    offset_t end_offset = end_ptr - m_data_up->GetDataStart();
     auto section_sp = std::make_shared<Section>(
         GetModule(), this, next_section_id++,
         ConstString(toString(*current_section)), eSectionTypeOther,
@@ -162,8 +162,8 @@ void ObjectFileBreakpad::CreateSections(SectionList &unified_section_list) {
     maybe_add_section(line.bytes_begin());
     // And start a new one.
     current_section = next_section;
-    section_start = line.bytes_begin() - m_data.GetDataStart();
+    section_start = line.bytes_begin() - m_data_up->GetDataStart();
   }
   // Finally, add the last section.
-  maybe_add_section(m_data.GetDataEnd());
+  maybe_add_section(m_data_up->GetDataEnd());
 }
diff --git a/lldb/source/Plugins/ObjectFile/COFF/ObjectFileCOFF.cpp b/lldb/source/Plugins/ObjectFile/COFF/ObjectFileCOFF.cpp
index 1121f696637b6..9def78644150a 100644
--- a/lldb/source/Plugins/ObjectFile/COFF/ObjectFileCOFF.cpp
+++ b/lldb/source/Plugins/ObjectFile/COFF/ObjectFileCOFF.cpp
@@ -300,8 +300,8 @@ bool ObjectFileCOFF::ParseHeader() {
 
   std::lock_guard<std::recursive_mutex> guard(module->GetMutex());
 
-  m_data.SetByteOrder(eByteOrderLittle);
-  m_data.SetAddressByteSize(GetAddressByteSize());
+  m_data_up->SetByteOrder(eByteOrderLittle);
+  m_data_up->SetAddressByteSize(GetAddressByteSize());
 
   return true;
 }
diff --git a/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp b/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
index 3968715a6d215..323881a202bb0 100644
--- a/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
+++ b/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
@@ -804,7 +804,7 @@ ByteOrder ObjectFileELF::GetByteOrder() const {
 }
 
 uint32_t ObjectFileELF::GetAddressByteSize() const {
-  return m_data.GetAddressByteSize();
+  return m_data_up->GetAddressByteSize();
 }
 
 AddressClass ObjectFileELF::GetAddressClass(addr_t file_addr) {
@@ -845,7 +845,7 @@ size_t ObjectFileELF::SectionIndex(const SectionHeaderCollConstIter &I) const {
 
 bool ObjectFileELF::ParseHeader() {
   lldb::offset_t offset = 0;
-  return m_header.Parse(m_data, &offset);
+  return m_header.Parse(*m_data_up.get(), &offset);
 }
 
 UUID ObjectFileELF::GetUUID() {
@@ -881,7 +881,7 @@ UUID ObjectFileELF::GetUUID() {
         return UUID();
 
       core_notes_crc =
-          CalculateELFNotesSegmentsCRC32(m_program_headers, m_data);
+          CalculateELFNotesSegmentsCRC32(m_program_headers, *m_data_up.get());
 
       if (core_notes_crc) {
         // Use 8 bytes - first 4 bytes for *magic* prefix, mainly to make it
@@ -892,7 +892,7 @@ UUID ObjectFileELF::GetUUID() {
       }
     } else {
       if (!m_gnu_debuglink_crc)
-        m_gnu_debuglink_crc = calc_crc32(0, m_data);
+        m_gnu_debuglink_crc = calc_crc32(0, *m_data_up.get());
       if (m_gnu_debuglink_crc) {
         // Use 4 bytes of crc from the .gnu_debuglink section.
         u32le data(m_gnu_debuglink_crc);
@@ -1078,7 +1078,8 @@ size_t ObjectFileELF::GetProgramHeaderInfo(ProgramHeaderColl &program_headers,
 
 // ParseProgramHeaders
 bool ObjectFileELF::ParseProgramHeaders() {
-  return GetProgramHeaderInfo(m_program_headers, m_data, m_header) != 0;
+  return GetProgramHeaderInfo(m_program_headers, *m_data_up.get(), m_header) !=
+         0;
 }
 
 lldb_private::Status
@@ -1668,8 +1669,8 @@ ObjectFileELF::StripLinkerSymbolAnnotations(llvm::StringRef symbol_name) const {
 
 // ParseSectionHeaders
 size_t ObjectFileELF::ParseSectionHeaders() {
-  return GetSectionHeaderInfo(m_section_headers, m_data, m_header, m_uuid,
-                              m_gnu_debuglink_file, m_gnu_debuglink_crc,
+  return GetSectionHeaderInfo(m_section_headers, *m_data_up.get(), m_header,
+                              m_uuid, m_gnu_debuglink_file, m_gnu_debuglink_crc,
                               m_arch_spec);
 }
 
@@ -3678,7 +3679,8 @@ ArchSpec ObjectFileELF::GetArchitecture() {
       if (H.p_type != PT_NOTE || H.p_offset == 0 || H.p_filesz == 0)
         continue;
       DataExtractor data;
-      if (data.SetData(m_data, H.p_offset, H.p_filesz) == H.p_filesz) {
+      if (data.SetData(*m_data_up.get(), H.p_offset, H.p_filesz) ==
+          H.p_filesz) {
         UUID uuid;
         RefineModuleDetailsFromNote(data, m_arch_spec, uuid);
       }
@@ -3833,10 +3835,10 @@ llvm::ArrayRef<ELFProgramHeader> ObjectFileELF::ProgramHeaders() {
 }
 
 DataExtractor ObjectFileELF::GetSegmentData(const ELFProgramHeader &H) {
-  // Try and read the program header from our cached m_data which can come from
-  // the file on disk being mmap'ed or from the initial part of the ELF file we
-  // read from memory and cached.
-  DataExtractor data = DataExtractor(m_data, H.p_offset, H.p_filesz);
+  // Try and read the program header from our cached m_data_up which can come
+  // from the file on disk being mmap'ed or from the initial part of the ELF
+  // file we read from memory and cached.
+  DataExtractor data = DataExtractor(*m_data_up.get(), H.p_offset, H.p_filesz);
   if (data.GetByteSize() == H.p_filesz)
     return data;
   if (IsInMemory()) {
diff --git a/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp b/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
index 2218c23db5a95..4b2e82643efb0 100644
--- a/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
+++ b/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
@@ -1012,35 +1012,35 @@ bool ObjectFileMachO::ParseHeader() {
   std::lock_guard<std::recursive_mutex> guard(module_sp->GetMutex());
   bool can_parse = false;
   lldb::offset_t offset = 0;
-  m_data.SetByteOrder(endian::InlHostByteOrder());
+  m_data_up->SetByteOrder(endian::InlHostByteOrder());
   // Leave magic in the original byte order
-  m_header.magic = m_data.GetU32(&offset);
+  m_header.magic = m_data_up->GetU32(&offset);
   switch (m_header.magic) {
   case MH_MAGIC:
-    m_data.SetByteOrder(endian::InlHostByteOrder());
-    m_data.SetAddressByteSize(4);
+    m_data_up->SetByteOrder(endian::InlHostByteOrder());
+    m_data_up->SetAddressByteSize(4);
     can_parse = true;
     break;
 
   case MH_MAGIC_64:
-    m_data.SetByteOrder(endian::InlHostByteOrder());
-    m_data.SetAddressByteSize(8);
+    m_data_up->SetByteOrder(endian::InlHostByteOrder());
+    m_data_up->SetAddressByteSize(8);
     can_parse = true;
     break;
 
   case MH_CIGAM:
-    m_data.SetByteOrder(endian::InlHostByteOrder() == eByteOrderBig
-                            ? eByteOrderLittle
-                            : eByteOrderBig);
-    m_data.SetAddressByteSize(4);
+    m_data_up->SetByteOrder(endian::InlHostByteOrder() == eByteOrderBig
+                                ? eByteOrderLittle
+                                : eByteOrderBig);
+    m_data_up->SetAddressByteSize(4);
     can_parse = true;
     break;
 
   case MH_CIGAM_64:
-    m_data.SetByteOrder(endian::InlHostByteOrder() == eByteOrderBig
-                            ? eByteOrderLittle
-                            : eByteOrderBig);
-    m_data.SetAddressByteSize(8);
+    m_data_up->SetByteOrder(endian::InlHostByteOrder() == eByteOrderBig
+                                ? eByteOrderLittle
+                                : eByteOrderBig);
+    m_data_up->SetAddressByteSize(8);
     can_parse = true;
     break;
 
@@ -1049,12 +1049,13 @@ bool ObjectFileMachO::ParseHeader() {
   }
 
   if (can_parse) {
-    m_data.GetU32(&offset, &m_header.cputype, 6);
+    m_data_up->GetU32(&offset, &m_header.cputype, 6);
 
     ModuleSpecList all_specs;
     ModuleSpec base_spec;
-    GetAllArchSpecs(m_header, m_data, MachHeaderSizeFromMagic(m_header.magic),
-                    base_spec, all_specs);
+    GetAllArchSpecs(m_header, *m_data_up.get(),
+                    MachHeaderSizeFromMagic(m_header.magic), base_spec,
+                    all_specs);
 
     for (unsigned i = 0, e = all_specs.GetSize(); i != e; ++i) {
       ArchSpec mach_arch =
@@ -1068,7 +1069,7 @@ bool ObjectFileMachO::ParseHeader() {
       if (SetModulesArchitecture(mach_arch)) {
         const size_t header_and_lc_size =
             m_header.sizeofcmds + MachHeaderSizeFromMagic(m_header.magic);
-        if (m_data.GetByteSize() < header_and_lc_size) {
+        if (m_data_up->GetByteSize() < header_and_lc_size) {
           DataBufferSP data_sp;
           ProcessSP process_sp(m_process_wp.lock());
           if (process_sp) {
@@ -1080,7 +1081,7 @@ bool ObjectFileMachO::ParseHeader() {
               continue;
           }
           if (data_sp)
-            m_data.SetData(data_sp);
+            m_data_up->SetData(data_sp);
         }
       }
       return true;
@@ -1094,7 +1095,7 @@ bool ObjectFileMachO::ParseHeader() {
 }
 
 ByteOrder ObjectFileMachO::GetByteOrder() const {
-  return m_data.GetByteOrder();
+  return m_data_up->GetByteOrder();
 }
 
 bool ObjectFileMachO::IsExecutable() const {
@@ -1114,7 +1115,7 @@ bool ObjectFileMachO::IsKext() const {
 }
 
 uint32_t ObjectFileMachO::GetAddressByteSize() const {
-  return m_data.GetAddressByteSize();
+  return m_data_up->GetAddressByteSize();
 }
 
 AddressClass ObjectFileMachO::GetAddressClass(lldb::addr_t file_addr) {
@@ -1297,13 +1298,13 @@ bool ObjectFileMachO::IsStripped() {
         const lldb::offset_t load_cmd_offset = offset;
 
         llvm::MachO::load_command lc = {};
-        if (m_data.GetU32(&offset, &lc.cmd, 2) == nullptr)
+        if (m_data_up->GetU32(&offset, &lc.cmd, 2) == nullptr)
           break;
         if (lc.cmd == LC_DYSYMTAB) {
           m_dysymtab.cmd = lc.cmd;
           m_dysymtab.cmdsize = lc.cmdsize;
-          if (m_data.GetU32(&offset, &m_dysymtab.ilocalsym,
-                            (sizeof(m_dysymtab) / sizeof(uint32_t)) - 2) ==
+          if (m_data_up->GetU32(&offset, &m_dysymtab.ilocalsym,
+                                (sizeof(m_dysymtab) / sizeof(uint32_t)) - 2) ==
               nullptr) {
             // Clear m_dysymtab if we were unable to read all items from the
             // load command
@@ -1326,14 +1327,14 @@ ObjectFileMachO::EncryptedFileRanges ObjectFileMachO::GetEncryptedFileRanges() {
   llvm::MachO::encryption_info_command encryption_cmd;
   for (uint32_t i = 0; i < m_header.ncmds; ++i) {
     const lldb::offset_t load_cmd_offset = offset;
-    if (m_data.GetU32(&offset, &encryption_cmd, 2) == nullptr)
+    if (m_data_up->GetU32(&offset, &encryption_cmd, 2) == nullptr)
       break;
 
     // LC_ENCRYPTION_INFO and LC_ENCRYPTION_INFO_64 have the same sizes for the
     // 3 fields we care about, so treat them the same.
     if (encryption_cmd.cmd == LC_ENCRYPTION_INFO ||
         encryption_cmd.cmd == LC_ENCRYPTION_INFO_64) {
-      if (m_data.GetU32(&offset, &encryption_cmd.cryptoff, 3)) {
+      if (m_data_up->GetU32(&offset, &encryption_cmd.cryptoff, 3)) {
         if (encryption_cmd.cryptid != 0) {
           EncryptedFileRanges::Entry entry;
           entry.SetRangeBase(encryption_cmd.cryptoff);
@@ -1562,7 +1563,7 @@ void ObjectFileMachO::ProcessSegmentCommand(
   llvm::MachO::segment_command_64 load_cmd;
   memcpy(&load_cmd, &load_cmd_, sizeof(load_cmd_));
 
-  if (!m_data.GetU8(&offset, (uint8_t *)load_cmd.segname, 16))
+  if (!m_data_up->GetU8(&offset, (uint8_t *)load_cmd.segname, 16))
     return;
 
   ModuleSP module_sp = GetModule();
@@ -1586,11 +1587,11 @@ void ObjectFileMachO::ProcessSegmentCommand(
       add_section = false;
     }
   }
-  load_cmd.vmaddr = m_data.GetAddress(&offset);
-  load_cmd.vmsize = m_data.GetAddress(&offset);
-  load_cmd.fileoff = m_data.GetAddress(&offset);
-  load_cmd.filesize = m_data.GetAddress(&offset);
-  if (!m_data.GetU32(&offset, &load_cmd.maxprot, 4))
+  load_cmd.vmaddr = m_data_up->GetAddress(&offset);
+  load_cmd.vmsize = m_data_up->GetAddress(&offset);
+  load_cmd.fileoff = m_data_up->GetAddress(&offset);
+  load_cmd.filesize = m_data_up->GetAddress(&offset);
+  if (!m_data_up->GetU32(&offset, &load_cmd.maxprot, 4))
     return;
 
   SanitizeSegmentCommand(load_cmd, cmd_idx);
@@ -1681,16 +1682,16 @@ void ObjectFileMachO::ProcessSegmentCommand(
   const uint32_t num_u32s = load_cmd.cmd == LC_SEGMENT ? 7 : 8;
   for (segment_sect_idx = 0; segment_sect_idx < load_cmd.nsects;
        ++segment_sect_idx) {
-    if (m_data.GetU8(&offset, (uint8_t *)sect64.sectname,
-                     sizeof(sect64.sectname)) == nullptr)
+    if (m_data_up->GetU8(&offset, (uint8_t *)sect64.sectname,
+                         sizeof(sect64.sectname)) == nullptr)
       break;
-    if (m_data.GetU8(&offset, (uint8_t *)sect64.segname,
-                     sizeof(sect64.segname)) == nullptr)
+    if (m_data_up->GetU8(&offset, (uint8_t *)sect64.segname,
+                         sizeof(sect64.segname)) == nullptr)
       break;
-    sect64.addr = m_data.GetAddress(&offset);
-    sect64.size = m_data.GetAddress(&offset);
+    sect64.addr = m_data_up->GetAddress(&offset);
+    sect64.size = m_data_up->GetAddress(&offset);
 
-    if (m_data.GetU32(&offset, &sect64.offset, num_u32s) == nullptr)
+    if (m_data_up->GetU32(&offset, &sect64.offset, num_u32s) == nullptr)
       break;
 
     if (IsSharedCacheBinary() && !IsInMemory()) {
@@ -1855,8 +1856,8 @@ void ObjectFileMachO::ProcessDysymtabCommand(
     const llvm::MachO::load_command &load_cmd, lldb::offset_t offset) {
   m_dysymtab.cmd = load_cmd.cmd;
   m_dysymtab.cmdsize = load_cmd.cmdsize;
-  m_data.GetU32(&offset, &m_dysymtab.ilocalsym,
-                (sizeof(m_dysymtab) / sizeof(uint32_t)) - 2);
+  m_data_up->GetU32(&offset, &m_dysymtab.ilocalsym,
+                    (sizeof(m_dysymtab) / sizeof(uint32_t)) - 2);
 }
 
 void ObjectFileMachO::CreateSections(SectionList &unified_section_list) {
@@ -1875,7 +1876,7 @@ void ObjectFileMachO::CreateSections(SectionList &unified_section_list) {
   llvm::MachO::load_command load_cmd;
   for (uint32_t i = 0; i < m_header.ncmds; ++i) {
     const lldb::offset_t load_cmd_offset = offset;
-    if (m_data.GetU32(&offset, &load_cmd, 2) == nullptr)
+    if (m_data_up->GetU32(&offset, &load_cmd, 2) == nullptr)
       break;
 
     if (load_cmd.cmd == LC_SEGMENT || load_cmd.cmd == LC_SEGMENT_64)
@@ -2240,13 +2241,13 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
     const lldb::offset_t cmd_offset = offset;
     // Read in the load command and load command size
     llvm::MachO::load_command lc;
-    if (m_data.GetU32(&offset, &lc, 2) == nullptr)
+    if (m_data_up->GetU32(&offset, &lc, 2) == nullptr)
       break;
     // Watch for the symbol table load command
     switch (lc.cmd) {
     case LC_SYMTAB: {
       llvm::MachO::symtab_command lc_obj;
-      if (m_data.GetU32(&offset, &lc_obj.symoff, 4)) {
+      if (m_data_up->GetU32(&offset, &lc_obj.symoff, 4)) {
         lc_obj.cmd = lc.cmd;
         lc_obj.cmdsize = lc.cmdsize;
         symtab_load_command = lc_obj;
@@ -2256,7 +2257,7 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
     case LC_DYLD_INFO:
     case LC_DYLD_INFO_ONLY: {
       llvm::MachO::dyld_info_command lc_obj;
-      if (m_data.GetU32(&offset, &lc_obj.rebase_off, 10)) {
+      if (m_data_up->GetU32(&offset, &lc_obj.rebase_off, 10)) {
         lc_obj.cmd = lc.cmd;
         lc_obj.cmdsize = lc.cmdsize;
         dyld_info = lc_obj;
@@ -2268,8 +2269,8 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
     case LC_REEXPORT_DYLIB:
     case LC_LOADFVMLIB:
     case LC_LOAD_UPWARD_DYLIB: {
-      uint32_t name_offset = cmd_offset + m_data.GetU32(&offset);
-      const char *path = m_data.PeekCStr(name_offset);
+      uint32_t name_offset = cmd_offset + m_data_up->GetU32(&offset);
+      const char *path = m_data_up->PeekCStr(name_offset);
       if (path) {
         FileSpec file_spec(path);
         // Strip the path if there is @rpath, @executable, etc so we just use
@@ -2289,19 +2290,19 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
       llvm::MachO::linkedit_data_command lc_obj;
       lc_obj.cmd = lc.cmd;
       lc_obj.cmdsize = lc.cmdsize;
-      if (m_data.GetU32(&offset, &lc_obj.dataoff, 2))
+      if (m_data_up->GetU32(&offset, &lc_obj.dataoff, 2))
         exports_trie_load_command = lc_obj;
     } break;
     c...
[truncated]


DavidSpickett · 2025-12-01T10:33:31Z

> I built & ran the testsuite on macOS and on aarch64-Ubuntu (thanks for getting the Linux testsuite to run on SME-only systems David).

Good to know. Do you know what Linux kernel version you were using?

jasonmolenda · 2025-12-01T18:03:46Z

> > I built & ran the testsuite on macOS and on aarch64-Ubuntu (thanks for getting the Linux testsuite to run on SME-only systems David).
>
> Good to know. Do you know what Linux kernel version you were using?

Ubuntu 25.10, running in a VM on my M4 mac. I did a simple cmake + ninja check-lldb with no issues.

bulbazord · 2025-12-01T20:54:34Z

lldb/include/lldb/Symbol/ObjectFile.h

+      m_data_sp; ///< The data for this object file so things
+                 ///< can be parsed lazily.  This shared pointer
+                 ///< will always have a DataExtractor object,
+                 ///< although it may only be default-constructed.


If this is meant to always be non-null, maybe you could use Jonas's new NonNullSharedPtr? You could catch invariant violations at creation time instead of dereference time.

jasonmolenda · 2025-12-01

Ah, perfect. Yeah I wasn't thrilled about how this wasn't enforced. Updated.
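The idea in this thread, checking the non-null invariant when the pointer is created rather than when it is dereferenced, can be sketched as follows. `ToyNonNullSharedPtr` is a hypothetical wrapper written for illustration; LLDB's actual NonNullSharedPtr may differ in interface and in how it handles a null input.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical wrapper, not LLDB's class: it refuses a null shared_ptr at
// construction time, so later dereferences need no null check.
template <typename T> class ToyNonNullSharedPtr {
public:
  // The invariant is checked once, when the wrapper is created.
  explicit ToyNonNullSharedPtr(std::shared_ptr<T> sp) : m_sp(std::move(sp)) {
    assert(m_sp && "ToyNonNullSharedPtr constructed from a null pointer");
  }

  T &operator*() const { return *m_sp; }
  T *operator->() const { return m_sp.get(); }

private:
  std::shared_ptr<T> m_sp;
};

struct ToyExtractor {
  unsigned address_size = 0;
};

// Builds a wrapper around a fresh extractor and reads a field back.
inline unsigned DemoNonNull() {
  ToyNonNullSharedPtr<ToyExtractor> data_nsp(std::make_shared<ToyExtractor>());
  data_nsp->address_size = 8;
  return (*data_nsp).address_size;
}
```

In this sketch a violation trips the assertion in the constructor, which is where the bad pointer was introduced, instead of at some later dereference.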


JDevlieghere

Looks very mechanical. I used _nsp instead of _sp for the NonNullSharedPtr based on code-review feedback. We probably want to do the same here for consistency.

JDevlieghere · 2025-12-01T22:28:44Z

🚢 it

jasonmolenda requested a review from JDevlieghere as a code owner December 1, 2025 02:04

llvmbot added the lldb label Dec 1, 2025

Change the ObjectFile ivar over to a shared pointer. I realized that the callers will need to pass a unique pointer through the Plugin interface intermediary and the unique pointers are going to be more trouble than it's worth I believe.

04ef7eb

jasonmolenda changed the title ~~[lldb][NFC] Change ObjectFile's DataExtractor to a unique ptr~~ [lldb][NFC] Change ObjectFile's DataExtractor to a shared ptr Dec 1, 2025

bulbazord reviewed Dec 1, 2025

Make m_data_sp a NonNullSharedPtr to enforce that the shared pointer always has an object.

a7e6265

JDevlieghere approved these changes Dec 1, 2025

Rename ivar to m_data_nsp to reflect it is always non-null

1165609

jasonmolenda merged commit ae68377 into llvm:main Dec 1, 2025
10 checks passed

jasonmolenda deleted the objectfile-change-to-dataextractor-up branch December 1, 2025 22:38
