Skip to content

Conversation

@jasonmolenda
Copy link
Collaborator

@jasonmolenda jasonmolenda commented Dec 1, 2025

ObjectFile has an m_data DataExtractor ivar which may be default constructed initially, or initialized with a DataBuffer passed in to its ctor. If the DataExtractor does not get a DataBuffer source passed in, the subclass will initialize it with access to the object file's data. When a DataBuffer is passed in to the base class ctor, the DataExtractor only has its buffer initialized; ObjectFile doesn't yet know the address size and endianness to fully initialize the DataExtractor.

This patch changes ObjectFile to instead have a DataExtractorSP ivar which is always initialized with at least a default-constructed DataExtractor object in the base class ctor. The next patch I will be writing is to change the ObjectFile ctor to take an optional DataExtractorSP, so the caller can pass a DataExtractor subclass -- the VirtualizeDataExtractor being added via
#168802
instead of a DataBuffer which is trivially saved into the DataExtractor.

The change is otherwise mechanical; all m_data. changed to m_data_sp-> and all the places where m_data was passed in for a by-ref call were changed to *m_data_sp.get(). The shared pointer is always initialized to contain an object.

I built & ran the testsuite on macOS and on aarch64-Ubuntu (thanks for getting the Linux testsuite to run on SME-only systems David). All of the ObjectFile subclasses I modifed compile cleanly, but I haven't tested them beyond any unit tests they may have (prob breakpad).

rdar://148939795

ObjectFile has an m_data DataExtractor ivar which may be default
constructed initially, or initialized with a DataBuffer passed in
to a ctor.  Subclasses will provide the DataExtrator with a Buffer
source if not.  When a DataBuffer is passed in to the base class
ctor, the DataExtractor only has its buffer initalized; we don't
yet know the address size and endianness to fully initialize the
DataExtractor.

This patch changes ObjectFile to instead have a DataExtractorSP
ivar which is always initialized with at least a default-constructed
DataExtractor object in the base class ctor.  The next patch
I will be writing is to change the ObjectFile ctor which accepts
a DataBuffer to instead accept a DataExtractorSP, so the caller
can intialize it with a DataExtractor subclass -- the
VirtualizeDataExtractor being added in
llvm#168802

The change is otherwise mechanical; all `m_data.` changed to
`m_data_up->` and all the places where `m_data` was passed in for
a by-ref call were changed to `*m_data_up.get()`.  The unique
pointer is always initialized to contain an object.

I can't remember off hand if I'm making a mistake using a unique_ptr
here, given that the ctor may take a DataExtractor as an argument.
The caller will have to do std::move(extractor_up) when it calls
the ObjectFile ctor for correct behavior.  Even though a unique_ptr
makes sense internal to ObjectFile, given that it can be passed as
an argument, should I use the more straightforward shared_ptr?  An
ObjectFile only has one of them, so the extra storage for the
refcount isn't important.

I built & ran the testsuite on macOS and on aarch64-Ubuntu (thanks
for getting the Linux testsuite to run on SME-only systems David).
All of the ObjectFile subclasses I modifed compile cleanly, but I
haven't tested them beyond any unit tests they may have (prob breakpad).

rdar://148939795
@llvmbot
Copy link
Member

llvmbot commented Dec 1, 2025

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes

ObjectFile has an m_data DataExtractor ivar which may be default constructed initially, or initialized with a DataBuffer passed in to a ctor. Subclasses will provide the DataExtrator with a Buffer source if not. When a DataBuffer is passed in to the base class ctor, the DataExtractor only has its buffer initalized; we don't yet know the address size and endianness to fully initialize the DataExtractor.

This patch changes ObjectFile to instead have a DataExtractorSP ivar which is always initialized with at least a default-constructed DataExtractor object in the base class ctor. The next patch I will be writing is to change the ObjectFile ctor which accepts a DataBuffer to instead accept a DataExtractorSP, so the caller can intialize it with a DataExtractor subclass -- the VirtualizeDataExtractor being added in
#168802

The change is otherwise mechanical; all m_data. changed to m_data_up-> and all the places where m_data was passed in for a by-ref call were changed to *m_data_up.get(). The unique pointer is always initialized to contain an object.

I can't remember off hand if I'm making a mistake using a unique_ptr here, given that the ctor may take a DataExtractor as an argument. The caller will have to do std::move(extractor_up) when it calls the ObjectFile ctor for correct behavior. Even though a unique_ptr makes sense internal to ObjectFile, given that it can be passed as an argument, should I use the more straightforward shared_ptr? An ObjectFile only has one of them, so the extra storage for the refcount isn't important.

I built & ran the testsuite on macOS and on aarch64-Ubuntu (thanks for getting the Linux testsuite to run on SME-only systems David). All of the ObjectFile subclasses I modifed compile cleanly, but I haven't tested them beyond any unit tests they may have (prob breakpad).

rdar://148939795


Patch is 54.04 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/170066.diff

11 Files Affected:

  • (modified) lldb/include/lldb/Symbol/ObjectFile.h (+6-3)
  • (modified) lldb/include/lldb/lldb-forward.h (+1)
  • (modified) lldb/source/Expression/ObjectFileJIT.cpp (+6-4)
  • (modified) lldb/source/Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.cpp (+4-4)
  • (modified) lldb/source/Plugins/ObjectFile/COFF/ObjectFileCOFF.cpp (+2-2)
  • (modified) lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp (+14-12)
  • (modified) lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp (+126-117)
  • (modified) lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp (+45-41)
  • (modified) lldb/source/Plugins/ObjectFile/XCOFF/ObjectFileXCOFF.cpp (+1-1)
  • (modified) lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp (+6-6)
  • (modified) lldb/source/Symbol/ObjectFile.cpp (+12-10)
diff --git a/lldb/include/lldb/Symbol/ObjectFile.h b/lldb/include/lldb/Symbol/ObjectFile.h
index 1de08a8576507..23653fd43d863 100644
--- a/lldb/include/lldb/Symbol/ObjectFile.h
+++ b/lldb/include/lldb/Symbol/ObjectFile.h
@@ -418,7 +418,7 @@ class ObjectFile : public std::enable_shared_from_this<ObjectFile>,
   /// Attempts to parse the object header.
   ///
   /// This function is used as a test to see if a given plug-in instance can
-  /// parse the header data already contained in ObjectFile::m_data. If an
+  /// parse the header data already contained in ObjectFile::m_data_up. If an
   /// object file parser does not recognize that magic bytes in a header,
   /// false should be returned and the next plug-in can attempt to parse an
   /// object file.
@@ -786,8 +786,11 @@ class ObjectFile : public std::enable_shared_from_this<ObjectFile>,
   lldb::addr_t m_length; ///< The length of this object file if it is known (can
                          ///be zero if length is unknown or can't be
                          ///determined).
-  DataExtractor
-      m_data; ///< The data for this object file so things can be parsed lazily.
+  lldb::DataExtractorUP
+      m_data_up; ///< The data for this object file so things
+                 ///< can be parsed lazily.  This unique pointer
+                 ///< will always have a DataExtractor object,
+                 ///< although it may only be default-constructed.
   lldb::ProcessWP m_process_wp;
   /// Set if the object file only exists in memory.
   const lldb::addr_t m_memory_addr;
diff --git a/lldb/include/lldb/lldb-forward.h b/lldb/include/lldb/lldb-forward.h
index c8e2e97953aa4..09aa036d27875 100644
--- a/lldb/include/lldb/lldb-forward.h
+++ b/lldb/include/lldb/lldb-forward.h
@@ -342,6 +342,7 @@ typedef std::shared_ptr<lldb_private::CompileUnit> CompUnitSP;
 typedef std::shared_ptr<lldb_private::DataBuffer> DataBufferSP;
 typedef std::shared_ptr<lldb_private::WritableDataBuffer> WritableDataBufferSP;
 typedef std::shared_ptr<lldb_private::DataExtractor> DataExtractorSP;
+typedef std::unique_ptr<lldb_private::DataExtractor> DataExtractorUP;
 typedef std::shared_ptr<lldb_private::Debugger> DebuggerSP;
 typedef std::weak_ptr<lldb_private::Debugger> DebuggerWP;
 typedef std::shared_ptr<lldb_private::Disassembler> DisassemblerSP;
diff --git a/lldb/source/Expression/ObjectFileJIT.cpp b/lldb/source/Expression/ObjectFileJIT.cpp
index e4a613551d22e..f64b0fb463e8a 100644
--- a/lldb/source/Expression/ObjectFileJIT.cpp
+++ b/lldb/source/Expression/ObjectFileJIT.cpp
@@ -73,8 +73,8 @@ ObjectFileJIT::ObjectFileJIT(const lldb::ModuleSP &module_sp,
     : ObjectFile(module_sp, nullptr, 0, 0, DataBufferSP(), 0), m_delegate_wp() {
   if (delegate_sp) {
     m_delegate_wp = delegate_sp;
-    m_data.SetByteOrder(delegate_sp->GetByteOrder());
-    m_data.SetAddressByteSize(delegate_sp->GetAddressByteSize());
+    m_data_up->SetByteOrder(delegate_sp->GetByteOrder());
+    m_data_up->SetAddressByteSize(delegate_sp->GetAddressByteSize());
   }
 }
 
@@ -85,12 +85,14 @@ bool ObjectFileJIT::ParseHeader() {
   return false;
 }
 
-ByteOrder ObjectFileJIT::GetByteOrder() const { return m_data.GetByteOrder(); }
+ByteOrder ObjectFileJIT::GetByteOrder() const {
+  return m_data_up->GetByteOrder();
+}
 
 bool ObjectFileJIT::IsExecutable() const { return false; }
 
 uint32_t ObjectFileJIT::GetAddressByteSize() const {
-  return m_data.GetAddressByteSize();
+  return m_data_up->GetAddressByteSize();
 }
 
 void ObjectFileJIT::ParseSymtab(Symtab &symtab) {
diff --git a/lldb/source/Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.cpp b/lldb/source/Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.cpp
index 33673f139b49a..15e34f99454a3 100644
--- a/lldb/source/Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.cpp
+++ b/lldb/source/Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.cpp
@@ -130,13 +130,13 @@ void ObjectFileBreakpad::CreateSections(SectionList &unified_section_list) {
 
   std::optional<Record::Kind> current_section;
   offset_t section_start;
-  llvm::StringRef text = toStringRef(m_data.GetData());
+  llvm::StringRef text = toStringRef(m_data_up->GetData());
   uint32_t next_section_id = 1;
   auto maybe_add_section = [&](const uint8_t *end_ptr) {
     if (!current_section)
       return; // We have been called before parsing the first line.
 
-    offset_t end_offset = end_ptr - m_data.GetDataStart();
+    offset_t end_offset = end_ptr - m_data_up->GetDataStart();
     auto section_sp = std::make_shared<Section>(
         GetModule(), this, next_section_id++,
         ConstString(toString(*current_section)), eSectionTypeOther,
@@ -162,8 +162,8 @@ void ObjectFileBreakpad::CreateSections(SectionList &unified_section_list) {
     maybe_add_section(line.bytes_begin());
     // And start a new one.
     current_section = next_section;
-    section_start = line.bytes_begin() - m_data.GetDataStart();
+    section_start = line.bytes_begin() - m_data_up->GetDataStart();
   }
   // Finally, add the last section.
-  maybe_add_section(m_data.GetDataEnd());
+  maybe_add_section(m_data_up->GetDataEnd());
 }
diff --git a/lldb/source/Plugins/ObjectFile/COFF/ObjectFileCOFF.cpp b/lldb/source/Plugins/ObjectFile/COFF/ObjectFileCOFF.cpp
index 1121f696637b6..9def78644150a 100644
--- a/lldb/source/Plugins/ObjectFile/COFF/ObjectFileCOFF.cpp
+++ b/lldb/source/Plugins/ObjectFile/COFF/ObjectFileCOFF.cpp
@@ -300,8 +300,8 @@ bool ObjectFileCOFF::ParseHeader() {
 
   std::lock_guard<std::recursive_mutex> guard(module->GetMutex());
 
-  m_data.SetByteOrder(eByteOrderLittle);
-  m_data.SetAddressByteSize(GetAddressByteSize());
+  m_data_up->SetByteOrder(eByteOrderLittle);
+  m_data_up->SetAddressByteSize(GetAddressByteSize());
 
   return true;
 }
diff --git a/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp b/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
index 3968715a6d215..323881a202bb0 100644
--- a/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
+++ b/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
@@ -804,7 +804,7 @@ ByteOrder ObjectFileELF::GetByteOrder() const {
 }
 
 uint32_t ObjectFileELF::GetAddressByteSize() const {
-  return m_data.GetAddressByteSize();
+  return m_data_up->GetAddressByteSize();
 }
 
 AddressClass ObjectFileELF::GetAddressClass(addr_t file_addr) {
@@ -845,7 +845,7 @@ size_t ObjectFileELF::SectionIndex(const SectionHeaderCollConstIter &I) const {
 
 bool ObjectFileELF::ParseHeader() {
   lldb::offset_t offset = 0;
-  return m_header.Parse(m_data, &offset);
+  return m_header.Parse(*m_data_up.get(), &offset);
 }
 
 UUID ObjectFileELF::GetUUID() {
@@ -881,7 +881,7 @@ UUID ObjectFileELF::GetUUID() {
         return UUID();
 
       core_notes_crc =
-          CalculateELFNotesSegmentsCRC32(m_program_headers, m_data);
+          CalculateELFNotesSegmentsCRC32(m_program_headers, *m_data_up.get());
 
       if (core_notes_crc) {
         // Use 8 bytes - first 4 bytes for *magic* prefix, mainly to make it
@@ -892,7 +892,7 @@ UUID ObjectFileELF::GetUUID() {
       }
     } else {
       if (!m_gnu_debuglink_crc)
-        m_gnu_debuglink_crc = calc_crc32(0, m_data);
+        m_gnu_debuglink_crc = calc_crc32(0, *m_data_up.get());
       if (m_gnu_debuglink_crc) {
         // Use 4 bytes of crc from the .gnu_debuglink section.
         u32le data(m_gnu_debuglink_crc);
@@ -1078,7 +1078,8 @@ size_t ObjectFileELF::GetProgramHeaderInfo(ProgramHeaderColl &program_headers,
 
 // ParseProgramHeaders
 bool ObjectFileELF::ParseProgramHeaders() {
-  return GetProgramHeaderInfo(m_program_headers, m_data, m_header) != 0;
+  return GetProgramHeaderInfo(m_program_headers, *m_data_up.get(), m_header) !=
+         0;
 }
 
 lldb_private::Status
@@ -1668,8 +1669,8 @@ ObjectFileELF::StripLinkerSymbolAnnotations(llvm::StringRef symbol_name) const {
 
 // ParseSectionHeaders
 size_t ObjectFileELF::ParseSectionHeaders() {
-  return GetSectionHeaderInfo(m_section_headers, m_data, m_header, m_uuid,
-                              m_gnu_debuglink_file, m_gnu_debuglink_crc,
+  return GetSectionHeaderInfo(m_section_headers, *m_data_up.get(), m_header,
+                              m_uuid, m_gnu_debuglink_file, m_gnu_debuglink_crc,
                               m_arch_spec);
 }
 
@@ -3678,7 +3679,8 @@ ArchSpec ObjectFileELF::GetArchitecture() {
       if (H.p_type != PT_NOTE || H.p_offset == 0 || H.p_filesz == 0)
         continue;
       DataExtractor data;
-      if (data.SetData(m_data, H.p_offset, H.p_filesz) == H.p_filesz) {
+      if (data.SetData(*m_data_up.get(), H.p_offset, H.p_filesz) ==
+          H.p_filesz) {
         UUID uuid;
         RefineModuleDetailsFromNote(data, m_arch_spec, uuid);
       }
@@ -3833,10 +3835,10 @@ llvm::ArrayRef<ELFProgramHeader> ObjectFileELF::ProgramHeaders() {
 }
 
 DataExtractor ObjectFileELF::GetSegmentData(const ELFProgramHeader &H) {
-  // Try and read the program header from our cached m_data which can come from
-  // the file on disk being mmap'ed or from the initial part of the ELF file we
-  // read from memory and cached.
-  DataExtractor data = DataExtractor(m_data, H.p_offset, H.p_filesz);
+  // Try and read the program header from our cached m_data_up which can come
+  // from the file on disk being mmap'ed or from the initial part of the ELF
+  // file we read from memory and cached.
+  DataExtractor data = DataExtractor(*m_data_up.get(), H.p_offset, H.p_filesz);
   if (data.GetByteSize() == H.p_filesz)
     return data;
   if (IsInMemory()) {
diff --git a/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp b/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
index 2218c23db5a95..4b2e82643efb0 100644
--- a/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
+++ b/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
@@ -1012,35 +1012,35 @@ bool ObjectFileMachO::ParseHeader() {
   std::lock_guard<std::recursive_mutex> guard(module_sp->GetMutex());
   bool can_parse = false;
   lldb::offset_t offset = 0;
-  m_data.SetByteOrder(endian::InlHostByteOrder());
+  m_data_up->SetByteOrder(endian::InlHostByteOrder());
   // Leave magic in the original byte order
-  m_header.magic = m_data.GetU32(&offset);
+  m_header.magic = m_data_up->GetU32(&offset);
   switch (m_header.magic) {
   case MH_MAGIC:
-    m_data.SetByteOrder(endian::InlHostByteOrder());
-    m_data.SetAddressByteSize(4);
+    m_data_up->SetByteOrder(endian::InlHostByteOrder());
+    m_data_up->SetAddressByteSize(4);
     can_parse = true;
     break;
 
   case MH_MAGIC_64:
-    m_data.SetByteOrder(endian::InlHostByteOrder());
-    m_data.SetAddressByteSize(8);
+    m_data_up->SetByteOrder(endian::InlHostByteOrder());
+    m_data_up->SetAddressByteSize(8);
     can_parse = true;
     break;
 
   case MH_CIGAM:
-    m_data.SetByteOrder(endian::InlHostByteOrder() == eByteOrderBig
-                            ? eByteOrderLittle
-                            : eByteOrderBig);
-    m_data.SetAddressByteSize(4);
+    m_data_up->SetByteOrder(endian::InlHostByteOrder() == eByteOrderBig
+                                ? eByteOrderLittle
+                                : eByteOrderBig);
+    m_data_up->SetAddressByteSize(4);
     can_parse = true;
     break;
 
   case MH_CIGAM_64:
-    m_data.SetByteOrder(endian::InlHostByteOrder() == eByteOrderBig
-                            ? eByteOrderLittle
-                            : eByteOrderBig);
-    m_data.SetAddressByteSize(8);
+    m_data_up->SetByteOrder(endian::InlHostByteOrder() == eByteOrderBig
+                                ? eByteOrderLittle
+                                : eByteOrderBig);
+    m_data_up->SetAddressByteSize(8);
     can_parse = true;
     break;
 
@@ -1049,12 +1049,13 @@ bool ObjectFileMachO::ParseHeader() {
   }
 
   if (can_parse) {
-    m_data.GetU32(&offset, &m_header.cputype, 6);
+    m_data_up->GetU32(&offset, &m_header.cputype, 6);
 
     ModuleSpecList all_specs;
     ModuleSpec base_spec;
-    GetAllArchSpecs(m_header, m_data, MachHeaderSizeFromMagic(m_header.magic),
-                    base_spec, all_specs);
+    GetAllArchSpecs(m_header, *m_data_up.get(),
+                    MachHeaderSizeFromMagic(m_header.magic), base_spec,
+                    all_specs);
 
     for (unsigned i = 0, e = all_specs.GetSize(); i != e; ++i) {
       ArchSpec mach_arch =
@@ -1068,7 +1069,7 @@ bool ObjectFileMachO::ParseHeader() {
       if (SetModulesArchitecture(mach_arch)) {
         const size_t header_and_lc_size =
             m_header.sizeofcmds + MachHeaderSizeFromMagic(m_header.magic);
-        if (m_data.GetByteSize() < header_and_lc_size) {
+        if (m_data_up->GetByteSize() < header_and_lc_size) {
           DataBufferSP data_sp;
           ProcessSP process_sp(m_process_wp.lock());
           if (process_sp) {
@@ -1080,7 +1081,7 @@ bool ObjectFileMachO::ParseHeader() {
               continue;
           }
           if (data_sp)
-            m_data.SetData(data_sp);
+            m_data_up->SetData(data_sp);
         }
       }
       return true;
@@ -1094,7 +1095,7 @@ bool ObjectFileMachO::ParseHeader() {
 }
 
 ByteOrder ObjectFileMachO::GetByteOrder() const {
-  return m_data.GetByteOrder();
+  return m_data_up->GetByteOrder();
 }
 
 bool ObjectFileMachO::IsExecutable() const {
@@ -1114,7 +1115,7 @@ bool ObjectFileMachO::IsKext() const {
 }
 
 uint32_t ObjectFileMachO::GetAddressByteSize() const {
-  return m_data.GetAddressByteSize();
+  return m_data_up->GetAddressByteSize();
 }
 
 AddressClass ObjectFileMachO::GetAddressClass(lldb::addr_t file_addr) {
@@ -1297,13 +1298,13 @@ bool ObjectFileMachO::IsStripped() {
         const lldb::offset_t load_cmd_offset = offset;
 
         llvm::MachO::load_command lc = {};
-        if (m_data.GetU32(&offset, &lc.cmd, 2) == nullptr)
+        if (m_data_up->GetU32(&offset, &lc.cmd, 2) == nullptr)
           break;
         if (lc.cmd == LC_DYSYMTAB) {
           m_dysymtab.cmd = lc.cmd;
           m_dysymtab.cmdsize = lc.cmdsize;
-          if (m_data.GetU32(&offset, &m_dysymtab.ilocalsym,
-                            (sizeof(m_dysymtab) / sizeof(uint32_t)) - 2) ==
+          if (m_data_up->GetU32(&offset, &m_dysymtab.ilocalsym,
+                                (sizeof(m_dysymtab) / sizeof(uint32_t)) - 2) ==
               nullptr) {
             // Clear m_dysymtab if we were unable to read all items from the
             // load command
@@ -1326,14 +1327,14 @@ ObjectFileMachO::EncryptedFileRanges ObjectFileMachO::GetEncryptedFileRanges() {
   llvm::MachO::encryption_info_command encryption_cmd;
   for (uint32_t i = 0; i < m_header.ncmds; ++i) {
     const lldb::offset_t load_cmd_offset = offset;
-    if (m_data.GetU32(&offset, &encryption_cmd, 2) == nullptr)
+    if (m_data_up->GetU32(&offset, &encryption_cmd, 2) == nullptr)
       break;
 
     // LC_ENCRYPTION_INFO and LC_ENCRYPTION_INFO_64 have the same sizes for the
     // 3 fields we care about, so treat them the same.
     if (encryption_cmd.cmd == LC_ENCRYPTION_INFO ||
         encryption_cmd.cmd == LC_ENCRYPTION_INFO_64) {
-      if (m_data.GetU32(&offset, &encryption_cmd.cryptoff, 3)) {
+      if (m_data_up->GetU32(&offset, &encryption_cmd.cryptoff, 3)) {
         if (encryption_cmd.cryptid != 0) {
           EncryptedFileRanges::Entry entry;
           entry.SetRangeBase(encryption_cmd.cryptoff);
@@ -1562,7 +1563,7 @@ void ObjectFileMachO::ProcessSegmentCommand(
   llvm::MachO::segment_command_64 load_cmd;
   memcpy(&load_cmd, &load_cmd_, sizeof(load_cmd_));
 
-  if (!m_data.GetU8(&offset, (uint8_t *)load_cmd.segname, 16))
+  if (!m_data_up->GetU8(&offset, (uint8_t *)load_cmd.segname, 16))
     return;
 
   ModuleSP module_sp = GetModule();
@@ -1586,11 +1587,11 @@ void ObjectFileMachO::ProcessSegmentCommand(
       add_section = false;
     }
   }
-  load_cmd.vmaddr = m_data.GetAddress(&offset);
-  load_cmd.vmsize = m_data.GetAddress(&offset);
-  load_cmd.fileoff = m_data.GetAddress(&offset);
-  load_cmd.filesize = m_data.GetAddress(&offset);
-  if (!m_data.GetU32(&offset, &load_cmd.maxprot, 4))
+  load_cmd.vmaddr = m_data_up->GetAddress(&offset);
+  load_cmd.vmsize = m_data_up->GetAddress(&offset);
+  load_cmd.fileoff = m_data_up->GetAddress(&offset);
+  load_cmd.filesize = m_data_up->GetAddress(&offset);
+  if (!m_data_up->GetU32(&offset, &load_cmd.maxprot, 4))
     return;
 
   SanitizeSegmentCommand(load_cmd, cmd_idx);
@@ -1681,16 +1682,16 @@ void ObjectFileMachO::ProcessSegmentCommand(
   const uint32_t num_u32s = load_cmd.cmd == LC_SEGMENT ? 7 : 8;
   for (segment_sect_idx = 0; segment_sect_idx < load_cmd.nsects;
        ++segment_sect_idx) {
-    if (m_data.GetU8(&offset, (uint8_t *)sect64.sectname,
-                     sizeof(sect64.sectname)) == nullptr)
+    if (m_data_up->GetU8(&offset, (uint8_t *)sect64.sectname,
+                         sizeof(sect64.sectname)) == nullptr)
       break;
-    if (m_data.GetU8(&offset, (uint8_t *)sect64.segname,
-                     sizeof(sect64.segname)) == nullptr)
+    if (m_data_up->GetU8(&offset, (uint8_t *)sect64.segname,
+                         sizeof(sect64.segname)) == nullptr)
       break;
-    sect64.addr = m_data.GetAddress(&offset);
-    sect64.size = m_data.GetAddress(&offset);
+    sect64.addr = m_data_up->GetAddress(&offset);
+    sect64.size = m_data_up->GetAddress(&offset);
 
-    if (m_data.GetU32(&offset, &sect64.offset, num_u32s) == nullptr)
+    if (m_data_up->GetU32(&offset, &sect64.offset, num_u32s) == nullptr)
       break;
 
     if (IsSharedCacheBinary() && !IsInMemory()) {
@@ -1855,8 +1856,8 @@ void ObjectFileMachO::ProcessDysymtabCommand(
     const llvm::MachO::load_command &load_cmd, lldb::offset_t offset) {
   m_dysymtab.cmd = load_cmd.cmd;
   m_dysymtab.cmdsize = load_cmd.cmdsize;
-  m_data.GetU32(&offset, &m_dysymtab.ilocalsym,
-                (sizeof(m_dysymtab) / sizeof(uint32_t)) - 2);
+  m_data_up->GetU32(&offset, &m_dysymtab.ilocalsym,
+                    (sizeof(m_dysymtab) / sizeof(uint32_t)) - 2);
 }
 
 void ObjectFileMachO::CreateSections(SectionList &unified_section_list) {
@@ -1875,7 +1876,7 @@ void ObjectFileMachO::CreateSections(SectionList &unified_section_list) {
   llvm::MachO::load_command load_cmd;
   for (uint32_t i = 0; i < m_header.ncmds; ++i) {
     const lldb::offset_t load_cmd_offset = offset;
-    if (m_data.GetU32(&offset, &load_cmd, 2) == nullptr)
+    if (m_data_up->GetU32(&offset, &load_cmd, 2) == nullptr)
       break;
 
     if (load_cmd.cmd == LC_SEGMENT || load_cmd.cmd == LC_SEGMENT_64)
@@ -2240,13 +2241,13 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
     const lldb::offset_t cmd_offset = offset;
     // Read in the load command and load command size
     llvm::MachO::load_command lc;
-    if (m_data.GetU32(&offset, &lc, 2) == nullptr)
+    if (m_data_up->GetU32(&offset, &lc, 2) == nullptr)
       break;
     // Watch for the symbol table load command
     switch (lc.cmd) {
     case LC_SYMTAB: {
       llvm::MachO::symtab_command lc_obj;
-      if (m_data.GetU32(&offset, &lc_obj.symoff, 4)) {
+      if (m_data_up->GetU32(&offset, &lc_obj.symoff, 4)) {
         lc_obj.cmd = lc.cmd;
         lc_obj.cmdsize = lc.cmdsize;
         symtab_load_command = lc_obj;
@@ -2256,7 +2257,7 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
     case LC_DYLD_INFO:
     case LC_DYLD_INFO_ONLY: {
       llvm::MachO::dyld_info_command lc_obj;
-      if (m_data.GetU32(&offset, &lc_obj.rebase_off, 10)) {
+      if (m_data_up->GetU32(&offset, &lc_obj.rebase_off, 10)) {
         lc_obj.cmd = lc.cmd;
         lc_obj.cmdsize = lc.cmdsize;
         dyld_info = lc_obj;
@@ -2268,8 +2269,8 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
     case LC_REEXPORT_DYLIB:
     case LC_LOADFVMLIB:
     case LC_LOAD_UPWARD_DYLIB: {
-      uint32_t name_offset = cmd_offset + m_data.GetU32(&offset);
-      const char *path = m_data.PeekCStr(name_offset);
+      uint32_t name_offset = cmd_offset + m_data_up->GetU32(&offset);
+      const char *path = m_data_up->PeekCStr(name_offset);
       if (path) {
         FileSpec file_spec(path);
         // Strip the path if there is @rpath, @executable, etc so we just use
@@ -2289,19 +2290,19 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
       llvm::MachO::linkedit_data_command lc_obj;
       lc_obj.cmd = lc.cmd;
       lc_obj.cmdsize = lc.cmdsize;
-      if (m_data.GetU32(&offset, &lc_obj.dataoff, 2))
+      if (m_data_up->GetU32(&offset, &lc_obj.dataoff, 2))
         exports_trie_load_command = lc_obj;
     } break;
     c...
[truncated]

that the callers will need to pass a unique pointer through the
Plugin interface intermediary and the unique pointers are going to
be more trouble than it's worth I believe.
@jasonmolenda jasonmolenda changed the title [lldb][NFC] Change ObjectFile's DataExtractor to a unique ptr [lldb][NFC] Change ObjectFile's DataExtractor to a shared ptr Dec 1, 2025
@DavidSpickett
Copy link
Collaborator

I built & ran the testsuite on macOS and on aarch64-Ubuntu (thanks for getting the Linux testsuite to run on SME-only systems David).

Good to know. Do you know what Linux kernel version you were using?

@jasonmolenda
Copy link
Collaborator Author

jasonmolenda commented Dec 1, 2025

I built & ran the testsuite on macOS and on aarch64-Ubuntu (thanks for getting the Linux testsuite to run on SME-only systems David).

Good to know. Do you know what Linux kernel version you were using?

Ubuntu 25.10, running in a VM on my M4 mac. I did a simple cmake + ninja check-lldb with no issues.

m_data_sp; ///< The data for this object file so things
///< can be parsed lazily. This shared pointer
///< will always have a DataExtractor object,
///< although it may only be default-constructed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is meant to always be non-null, maybe you could use Jonas's new NonNullSharedPtr? You could catch invariant violations at creation time instead of dereference time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, perfect. Yeah I wasn't thrilled about how this wasn't enforced. Updated.

that the shared pointer always has an object.
Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very mechanical. I used _nsp instead of _sp for the NonNullSharedPtr based on code-review feedback. We probably want to do the same here for consistency.

@JDevlieghere
Copy link
Member

🚢 it

@jasonmolenda jasonmolenda merged commit ae68377 into llvm:main Dec 1, 2025
10 checks passed
@jasonmolenda jasonmolenda deleted the objectfile-change-to-dataextractor-up branch December 1, 2025 22:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants