Skip to content

Conversation

@bulbazord
Copy link
Member

This change is motivated by a recent crash report I received related to re-exported symbols and ConstStrings. I was unable to reproduce the crash so this fix is speculative.

Symbols of type eSymbolTypeReExported have metadata associated with them to resolve them to the correct place. The first is the re-exported name, in case the name differs between the library re-exporting it and the library defining it. The second is the library that actually defines the symbol being re-exported.

LLDB currently stores this metadata in an unsafe fashion. To store the re-export name, it takes the address of a ConstString's underlying storage and puts it in the Symbol's AddressRange's base Address. The same technique is applied for the library path except it is placed in the AddressRange's size member.

The intended way of preventing any potential memory corruption is to call Symbol::ValueIsAddress before accessing or modifying its AddressRange information. If done correctly, this allows you to save 2 pointer's worth of space per Symbol object. However, I do not believe that the saved space is worth the risk of getting this wrong.

rdar://166452748

This change is motivated by a recent crash report I received related to
re-exported symbols and ConstStrings. I was unable to reproduce the
crash so this fix is speculative.

Symbols of type eSymbolTypeReExported have metadata associated with them
to resolve them to the correct place. The first is the re-exported name,
in case the name differs between the library re-exporting it and the
library defining it. The second is the library that actually defines the
symbol being re-exported.

LLDB currently stores this metadata in an unsafe fashion. To store the
re-export name, it takes the address of a ConstString's underlying
storage and puts it in the Symbol's AddressRange's base Address. The
same technique is applied for the library path except it is placed in
the AddressRange's size member.

The intended way of preventing any potential memory corruption is to
call `Symbol::ValueIsAddress` before accessing or modifying its
AddressRange information. If done correctly, this allows you to save 2
pointer's worth of space per Symbol object. However, I do not believe
that the saved space is worth the risk of getting this wrong.

rdar://166452748
@llvmbot
Copy link
Member

llvmbot commented Dec 16, 2025

@llvm/pr-subscribers-lldb

Author: Alex Langford (bulbazord)

Changes

This change is motivated by a recent crash report I received related to re-exported symbols and ConstStrings. I was unable to reproduce the crash so this fix is speculative.

Symbols of type eSymbolTypeReExported have metadata associated with them to resolve them to the correct place. The first is the re-exported name, in case the name differs between the library re-exporting it and the library defining it. The second is the library that actually defines the symbol being re-exported.

LLDB currently stores this metadata in an unsafe fashion. To store the re-export name, it takes the address of a ConstString's underlying storage and puts it in the Symbol's AddressRange's base Address. The same technique is applied for the library path except it is placed in the AddressRange's size member.

The intended way of preventing any potential memory corruption is to call Symbol::ValueIsAddress before accessing or modifying its AddressRange information. If done correctly, this allows you to save 2 pointer's worth of space per Symbol object. However, I do not believe that the saved space is worth the risk of getting this wrong.

rdar://166452748


Full diff: https://github.com/llvm/llvm-project/pull/172565.diff

2 Files Affected:

  • (modified) lldb/include/lldb/Symbol/Symbol.h (+12-3)
  • (modified) lldb/source/Symbol/Symbol.cpp (+15-44)
diff --git a/lldb/include/lldb/Symbol/Symbol.h b/lldb/include/lldb/Symbol/Symbol.h
index b994c34e46493..90a7303de78ff 100644
--- a/lldb/include/lldb/Symbol/Symbol.h
+++ b/lldb/include/lldb/Symbol/Symbol.h
@@ -13,6 +13,7 @@
 #include "lldb/Core/Mangled.h"
 #include "lldb/Core/Section.h"
 #include "lldb/Symbol/SymbolContextScope.h"
+#include "lldb/Utility/FileSpec.h"
 #include "lldb/Utility/Stream.h"
 #include "lldb/Utility/UserID.h"
 #include "lldb/lldb-enumerations.h"
@@ -154,9 +155,11 @@ class Symbol : public SymbolContextScope {
     return m_mangled;
   }
 
-  ConstString GetReExportedSymbolName() const;
+  ConstString GetReExportedSymbolName() const { return m_reexport_name; }
 
-  FileSpec GetReExportedSymbolSharedLibrary() const;
+  FileSpec GetReExportedSymbolSharedLibrary() const {
+    return m_reexport_library;
+  }
 
   void SetReExportedSymbolName(ConstString name);
 
@@ -312,7 +315,7 @@ class Symbol : public SymbolContextScope {
   // modules we've already seen to make sure we don't get caught in a cycle.
 
   Symbol *ResolveReExportedSymbolInModuleSpec(
-      Target &target, ConstString &reexport_name,
+      Target &target, ConstString reexport_name,
       lldb_private::ModuleSpec &module_spec,
       lldb_private::ModuleList &seen_modules) const;
 
@@ -347,6 +350,12 @@ class Symbol : public SymbolContextScope {
   AddressRange m_addr_range; // Contains the value, or the section offset
                              // address when the value is an address in a
                              // section, and the size (if any)
+  /// Stores the re-exported name if this symbol is of type
+  /// eSymbolTypeReExported.
+  ConstString m_reexport_name;
+  /// Stores the re-exported shared library if this symbol is of type
+  /// eSymbolTypeReExported.
+  FileSpec m_reexport_library;
   uint32_t m_flags = 0; // A copy of the flags from the original symbol table,
                         // the ObjectFile plug-in can interpret these
 };
diff --git a/lldb/source/Symbol/Symbol.cpp b/lldb/source/Symbol/Symbol.cpp
index 40497dbccc5c3..1c64d413686dd 100644
--- a/lldb/source/Symbol/Symbol.cpp
+++ b/lldb/source/Symbol/Symbol.cpp
@@ -46,7 +46,8 @@ Symbol::Symbol(uint32_t symID, llvm::StringRef name, SymbolType type,
       m_demangled_is_synthesized(false),
       m_contains_linker_annotations(contains_linker_annotations),
       m_is_weak(false), m_type(type), m_mangled(name),
-      m_addr_range(section_sp, offset, size), m_flags(flags) {}
+      m_addr_range(section_sp, offset, size), m_reexport_name(),
+      m_reexport_library(), m_flags(flags) {}
 
 Symbol::Symbol(uint32_t symID, const Mangled &mangled, SymbolType type,
                bool external, bool is_debug, bool is_trampoline,
@@ -61,7 +62,7 @@ Symbol::Symbol(uint32_t symID, const Mangled &mangled, SymbolType type,
       m_demangled_is_synthesized(false),
       m_contains_linker_annotations(contains_linker_annotations),
       m_is_weak(false), m_type(type), m_mangled(mangled), m_addr_range(range),
-      m_flags(flags) {}
+      m_reexport_name(), m_reexport_library(), m_flags(flags) {}
 
 Symbol::Symbol(const Symbol &rhs)
     : SymbolContextScope(rhs), m_uid(rhs.m_uid), m_type_data(rhs.m_type_data),
@@ -73,7 +74,8 @@ Symbol::Symbol(const Symbol &rhs)
       m_demangled_is_synthesized(rhs.m_demangled_is_synthesized),
       m_contains_linker_annotations(rhs.m_contains_linker_annotations),
       m_is_weak(rhs.m_is_weak), m_type(rhs.m_type), m_mangled(rhs.m_mangled),
-      m_addr_range(rhs.m_addr_range), m_flags(rhs.m_flags) {}
+      m_addr_range(rhs.m_addr_range), m_reexport_name(rhs.m_reexport_name),
+      m_reexport_library(rhs.m_reexport_library), m_flags(rhs.m_flags) {}
 
 const Symbol &Symbol::operator=(const Symbol &rhs) {
   if (this != &rhs) {
@@ -93,6 +95,8 @@ const Symbol &Symbol::operator=(const Symbol &rhs) {
     m_type = rhs.m_type;
     m_mangled = rhs.m_mangled;
     m_addr_range = rhs.m_addr_range;
+    m_reexport_name = rhs.m_reexport_name;
+    m_reexport_library = rhs.m_reexport_library;
     m_flags = rhs.m_flags;
   }
   return *this;
@@ -170,45 +174,14 @@ ConstString Symbol::GetDisplayName() const {
   return GetMangled().GetDisplayDemangledName();
 }
 
-ConstString Symbol::GetReExportedSymbolName() const {
-  if (m_type == eSymbolTypeReExported) {
-    // For eSymbolTypeReExported, the "const char *" from a ConstString is used
-    // as the offset in the address range base address. We can then make this
-    // back into a string that is the re-exported name.
-    intptr_t str_ptr = m_addr_range.GetBaseAddress().GetOffset();
-    if (str_ptr != 0)
-      return ConstString((const char *)str_ptr);
-    else
-      return GetName();
-  }
-  return ConstString();
-}
-
-FileSpec Symbol::GetReExportedSymbolSharedLibrary() const {
-  if (m_type == eSymbolTypeReExported) {
-    // For eSymbolTypeReExported, the "const char *" from a ConstString is used
-    // as the offset in the address range base address. We can then make this
-    // back into a string that is the re-exported name.
-    intptr_t str_ptr = m_addr_range.GetByteSize();
-    if (str_ptr != 0)
-      return FileSpec((const char *)str_ptr);
-  }
-  return FileSpec();
-}
-
 void Symbol::SetReExportedSymbolName(ConstString name) {
   SetType(eSymbolTypeReExported);
-  // For eSymbolTypeReExported, the "const char *" from a ConstString is used
-  // as the offset in the address range base address.
-  m_addr_range.GetBaseAddress().SetOffset((uintptr_t)name.GetCString());
+  m_reexport_name = name;
 }
 
 bool Symbol::SetReExportedSymbolSharedLibrary(const FileSpec &fspec) {
   if (m_type == eSymbolTypeReExported) {
-    // For eSymbolTypeReExported, the "const char *" from a ConstString is used
-    // as the offset in the address range base address.
-    m_addr_range.SetByteSize(
-        (uintptr_t)ConstString(fspec.GetPath().c_str()).GetCString());
+    m_reexport_library = fspec;
     return true;
   }
   return false;
@@ -292,12 +265,11 @@ void Symbol::Dump(Stream *s, Target *target, uint32_t index,
         "                                                         0x%8.8x %s",
         m_flags, name.AsCString(""));
 
-    ConstString reexport_name = GetReExportedSymbolName();
     intptr_t shlib = m_addr_range.GetByteSize();
     if (shlib)
-      s->Printf(" -> %s`%s\n", (const char *)shlib, reexport_name.GetCString());
+      s->Printf(" -> %s`%s\n", (const char *)shlib, m_reexport_name.GetCString());
     else
-      s->Printf(" -> %s\n", reexport_name.GetCString());
+      s->Printf(" -> %s\n", m_reexport_name.GetCString());
   } else {
     const char *format =
         m_size_is_sibling
@@ -431,7 +403,7 @@ void Symbol::DumpSymbolContext(Stream *s) {
 lldb::addr_t Symbol::GetByteSize() const { return m_addr_range.GetByteSize(); }
 
 Symbol *Symbol::ResolveReExportedSymbolInModuleSpec(
-    Target &target, ConstString &reexport_name, ModuleSpec &module_spec,
+    Target &target, ConstString reexport_name, ModuleSpec &module_spec,
     ModuleList &seen_modules) const {
   ModuleSP module_sp;
   if (module_spec.GetFileSpec()) {
@@ -481,13 +453,12 @@ Symbol *Symbol::ResolveReExportedSymbolInModuleSpec(
 }
 
 Symbol *Symbol::ResolveReExportedSymbol(Target &target) const {
-  ConstString reexport_name(GetReExportedSymbolName());
-  if (reexport_name) {
+  if (m_reexport_name) {
     ModuleSpec module_spec;
     ModuleList seen_modules;
-    module_spec.GetFileSpec() = GetReExportedSymbolSharedLibrary();
+    module_spec.GetFileSpec() = m_reexport_library;
     if (module_spec.GetFileSpec()) {
-      return ResolveReExportedSymbolInModuleSpec(target, reexport_name,
+      return ResolveReExportedSymbolInModuleSpec(target, m_reexport_name,
                                                  module_spec, seen_modules);
     }
   }

@github-actions
Copy link

github-actions bot commented Dec 16, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that using the offset and the byte size to encode a string pointer is a recipe for disaster. That said, out of all the objects in LLDB, Symbol is probably one of the most, if not the most, important to keep as small as possible. Given we have so many instances, I think it's worthwhile to optimize.

In its current state, this PR increases the Symbol size by more than two pointers. ConstString is 8 bytes, but FileSpec holds another two (16 bytes), plus some metadata (let's say 8 bytes). That's an additional 32 bytes per Symbol.

Can we store a unique pointer to a struct that holds these fields? That way only symbols of type eSymbolTypeReExported pay the additional cost (and a little extra for the indirection), while all the other symbols (which I'm assuming is the vast majority) only grow 8 bytes.

I also think it would be worthwhile to compare the memory usage of lldb before and after the patch with something representative like the clang binary.

@bulbazord
Copy link
Member Author

Made the change. I'm working on collecting memory footprints before/after my change.

@bulbazord
Copy link
Member Author

I built LLDB in release mode and loaded the just-built clang. LLDB's memory footprint grew approximately 13.57mb. I don't think clang has a lot of re-exported symbols so I don't think my measurement might accurately capture the impact there. I'll look around for a library that re-exports a lot of symbols for a better idea of the total impact.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants