Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make DWARFUnitVector threadsafe. #71487

Merged
merged 2 commits into from
Nov 8, 2023
Merged

Conversation

clayborg
Copy link
Collaborator

@clayborg clayborg commented Nov 7, 2023

The DWARFUnitVector class lives inside of the DWARFContextState. Prior to this fix a non const reference was being handed out to clients. When fetching the DWO units, there used to be a "bool Lazy" parameter that could be passed that would allow the DWARFUnitVector to parse individual units on the fly. There were two major issues with this approach:

  • not thread safe and causes crashes
  • the accessor would check if DWARFUnitVector was empty and if not empty it would return a partially filled in DWARFUnitVector if it was constructed with "Lazy = true"

This patch fixes the issues by always fully parsing the DWARFUnitVector when it is requested and only hands out a "const DWARFUnitVector &". This allows the thread safety mechanism built into the DWARFContext class to work corrrectly, and avoids the issue where if someone construct DWARFUnitVector with "Lazy = true", and then calls an API that partially fills in the DWARFUnitVector with individual entries, and then someone accesses the DWARFUnitVector, they would get a partial and incomplete listing of the DWARF units for the DWOs.

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 7, 2023

@llvm/pr-subscribers-debuginfo

Author: Greg Clayton (clayborg)

Changes

The DWARFUnitVector class lives inside of the DWARFContextState. Prior to this fix a non const reference was being handed out to clients. When fetching the DWO units, there used to be a "bool Lazy" parameter that could be passed that would allow the DWARFUnitVector to parse individual units on the fly. There were two major issues with this approach:

  • not thread safe and causes crashes
  • the accessor would check if DWARFUnitVector was empty and if not empty it would return a partially filled in DWARFUnitVector if it was constructed with "Lazy = true"

This patch fixes the issues by always fully parsing the DWARFUnitVector when it is requested and only hands out a "const DWARFUnitVector &". This allows the thread safety mechanism built into the DWARFContext class to work corrrectly, and avoids the issue where if someone construct DWARFUnitVector with "Lazy = true", and then calls an API that partially fills in the DWARFUnitVector with individual entries, and then someone accesses the DWARFUnitVector, they would get a partial and incomplete listing of the DWARF units for the DWOs.


Full diff: https://github.com/llvm/llvm-project/pull/71487.diff

5 Files Affected:

  • (modified) llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h (+10-13)
  • (modified) llvm/include/llvm/DebugInfo/DWARF/DWARFUnit.h (+5-13)
  • (modified) llvm/lib/DebugInfo/DWARF/DWARFContext.cpp (+12-12)
  • (modified) llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp (+44-71)
  • (modified) llvm/tools/llvm-dwarfutil/DebugInfoLinker.cpp (+1-1)
diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
index fa98cbcfc4d997f..c775437c0c7c883 100644
--- a/llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
+++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
@@ -69,8 +69,8 @@ class DWARFContext : public DIContext {
   public:
     DWARFContextState(DWARFContext &DC) : D(DC) {}
     virtual ~DWARFContextState() = default;
-    virtual DWARFUnitVector &getNormalUnits() = 0;
-    virtual DWARFUnitVector &getDWOUnits(bool Lazy = false) = 0;
+    virtual const DWARFUnitVector &getNormalUnits() = 0;
+    virtual const DWARFUnitVector &getDWOUnits() = 0;
     virtual const DWARFDebugAbbrev *getDebugAbbrevDWO() = 0;
     virtual const DWARFUnitIndex &getCUIndex() = 0;
     virtual const DWARFUnitIndex &getTUIndex() = 0;
@@ -119,11 +119,8 @@ class DWARFContext : public DIContext {
   std::function<void(Error)> WarningHandler = WithColor::defaultWarningHandler;
 
   /// Read compile units from the debug_info.dwo section (if necessary)
-  /// and type units from the debug_types.dwo section (if necessary)
-  /// and store them in DWOUnits.
-  /// If \p Lazy is true, set up to parse but don't actually parse them.
-  enum { EagerParse = false, LazyParse = true };
-  DWARFUnitVector &getDWOUnits(bool Lazy = false);
+  /// and type units from the debug_types.dwo section (if necessary).
+  const DWARFUnitVector &getDWOUnits();
 
   std::unique_ptr<const DWARFObject> DObj;
 
@@ -167,7 +164,7 @@ class DWARFContext : public DIContext {
 
   /// Get units from .debug_info in this context.
   unit_iterator_range info_section_units() {
-    DWARFUnitVector &NormalUnits = State->getNormalUnits();
+    const DWARFUnitVector &NormalUnits = State->getNormalUnits();
     return unit_iterator_range(NormalUnits.begin(),
                                NormalUnits.begin() +
                                    NormalUnits.getNumInfoUnits());
@@ -179,7 +176,7 @@ class DWARFContext : public DIContext {
 
   /// Get units from .debug_types in this context.
   unit_iterator_range types_section_units() {
-    DWARFUnitVector &NormalUnits = State->getNormalUnits();
+    const DWARFUnitVector &NormalUnits = State->getNormalUnits();
     return unit_iterator_range(
         NormalUnits.begin() + NormalUnits.getNumInfoUnits(), NormalUnits.end());
   }
@@ -194,13 +191,13 @@ class DWARFContext : public DIContext {
 
   /// Get all normal compile/type units in this context.
   unit_iterator_range normal_units() {
-    DWARFUnitVector &NormalUnits = State->getNormalUnits();
+    const DWARFUnitVector &NormalUnits = State->getNormalUnits();
     return unit_iterator_range(NormalUnits.begin(), NormalUnits.end());
   }
 
   /// Get units from .debug_info..dwo in the DWO context.
   unit_iterator_range dwo_info_section_units() {
-    DWARFUnitVector &DWOUnits = State->getDWOUnits();
+    const DWARFUnitVector &DWOUnits = State->getDWOUnits();
     return unit_iterator_range(DWOUnits.begin(),
                                DWOUnits.begin() + DWOUnits.getNumInfoUnits());
   }
@@ -211,7 +208,7 @@ class DWARFContext : public DIContext {
 
   /// Get units from .debug_types.dwo in the DWO context.
   unit_iterator_range dwo_types_section_units() {
-    DWARFUnitVector &DWOUnits = State->getDWOUnits();
+    const DWARFUnitVector &DWOUnits = State->getDWOUnits();
     return unit_iterator_range(DWOUnits.begin() + DWOUnits.getNumInfoUnits(),
                                DWOUnits.end());
   }
@@ -227,7 +224,7 @@ class DWARFContext : public DIContext {
 
   /// Get all units in the DWO context.
   unit_iterator_range dwo_units() {
-    DWARFUnitVector &DWOUnits = State->getDWOUnits();
+    const DWARFUnitVector &DWOUnits = State->getDWOUnits();
     return unit_iterator_range(DWOUnits.begin(), DWOUnits.end());
   }
 
diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFUnit.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFUnit.h
index 7084081ce61a43a..55ffd61188654bc 100644
--- a/llvm/include/llvm/DebugInfo/DWARF/DWARFUnit.h
+++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFUnit.h
@@ -124,22 +124,18 @@ bool isCompileUnit(const std::unique_ptr<DWARFUnit> &U);
 /// Describe a collection of units. Intended to hold all units either from
 /// .debug_info and .debug_types, or from .debug_info.dwo and .debug_types.dwo.
 class DWARFUnitVector final : public SmallVector<std::unique_ptr<DWARFUnit>, 1> {
-  std::function<std::unique_ptr<DWARFUnit>(uint64_t, DWARFSectionKind,
-                                           const DWARFSection *,
-                                           const DWARFUnitIndex::Entry *)>
-      Parser;
   int NumInfoUnits = -1;
 
 public:
   using UnitVector = SmallVectorImpl<std::unique_ptr<DWARFUnit>>;
-  using iterator = typename UnitVector::iterator;
-  using iterator_range = llvm::iterator_range<typename UnitVector::iterator>;
+  using iterator = typename UnitVector::const_iterator;
+  using iterator_range = llvm::iterator_range<typename UnitVector::const_iterator>;
 
   using compile_unit_range =
       decltype(make_filter_range(std::declval<iterator_range>(), isCompileUnit));
 
   DWARFUnit *getUnitForOffset(uint64_t Offset) const;
-  DWARFUnit *getUnitForIndexEntry(const DWARFUnitIndex::Entry &E);
+  DWARFUnit *getUnitForIndexEntry(const DWARFUnitIndex::Entry &E) const;
 
   /// Read units from a .debug_info or .debug_types section.  Calls made
   /// before finishedInfoUnits() are assumed to be for .debug_info sections,
@@ -153,11 +149,7 @@ class DWARFUnitVector final : public SmallVector<std::unique_ptr<DWARFUnit>, 1>
   /// sections.  Caller must not mix calls to addUnitsForSection and
   /// addUnitsForDWOSection.
   void addUnitsForDWOSection(DWARFContext &C, const DWARFSection &DWOSection,
-                             DWARFSectionKind SectionKind, bool Lazy = false);
-
-  /// Add an existing DWARFUnit to this UnitVector. This is used by the DWARF
-  /// verifier to process unit separately.
-  DWARFUnit *addUnit(std::unique_ptr<DWARFUnit> Unit);
+                             DWARFSectionKind SectionKind);
 
   /// Returns number of all units held by this instance.
   unsigned getNumUnits() const { return size(); }
@@ -177,7 +169,7 @@ class DWARFUnitVector final : public SmallVector<std::unique_ptr<DWARFUnit>, 1>
                     const DWARFSection *RS, const DWARFSection *LocSection,
                     StringRef SS, const DWARFSection &SOS,
                     const DWARFSection *AOS, const DWARFSection &LS, bool LE,
-                    bool IsDWO, bool Lazy, DWARFSectionKind SectionKind);
+                    bool IsDWO, DWARFSectionKind SectionKind);
 };
 
 /// Represents base address of the CU.
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
index 088dffeaa2b9f6d..5079a6b92a0cbd4 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
@@ -286,7 +286,7 @@ class ThreadUnsafeDWARFContextState : public DWARFContext::DWARFContextState {
       DWARFContext::DWARFContextState(DC),
       DWPName(std::move(DWP)) {}
 
-  DWARFUnitVector &getNormalUnits() override {
+  const DWARFUnitVector &getNormalUnits() override {
     if (NormalUnits.empty()) {
       const DWARFObject &DObj = D.getDWARFObj();
       DObj.forEachInfoSections([&](const DWARFSection &S) {
@@ -300,16 +300,16 @@ class ThreadUnsafeDWARFContextState : public DWARFContext::DWARFContextState {
     return NormalUnits;
   }
 
-  DWARFUnitVector &getDWOUnits(bool Lazy) override {
+  const DWARFUnitVector &getDWOUnits() override {
     if (DWOUnits.empty()) {
       const DWARFObject &DObj = D.getDWARFObj();
 
       DObj.forEachInfoDWOSections([&](const DWARFSection &S) {
-        DWOUnits.addUnitsForDWOSection(D, S, DW_SECT_INFO, Lazy);
+        DWOUnits.addUnitsForDWOSection(D, S, DW_SECT_INFO);
       });
       DWOUnits.finishedInfoUnits();
       DObj.forEachTypesDWOSections([&](const DWARFSection &S) {
-        DWOUnits.addUnitsForDWOSection(D, S, DW_SECT_EXT_TYPES, Lazy);
+        DWOUnits.addUnitsForDWOSection(D, S, DW_SECT_EXT_TYPES);
       });
     }
     return DWOUnits;
@@ -633,13 +633,13 @@ class ThreadSafeState : public ThreadUnsafeDWARFContextState {
   ThreadSafeState(DWARFContext &DC, std::string &DWP) :
       ThreadUnsafeDWARFContextState(DC, DWP) {}
 
-  DWARFUnitVector &getNormalUnits() override {
+  const DWARFUnitVector &getNormalUnits() override {
     std::unique_lock<std::recursive_mutex> LockGuard(Mutex);
     return ThreadUnsafeDWARFContextState::getNormalUnits();
   }
-  DWARFUnitVector &getDWOUnits(bool Lazy) override {
+  const DWARFUnitVector &getDWOUnits() override {
     std::unique_lock<std::recursive_mutex> LockGuard(Mutex);
-    return ThreadUnsafeDWARFContextState::getDWOUnits(Lazy);
+    return ThreadUnsafeDWARFContextState::getDWOUnits();
   }
   const DWARFUnitIndex &getCUIndex() override {
     std::unique_lock<std::recursive_mutex> LockGuard(Mutex);
@@ -1335,7 +1335,7 @@ void DWARFContext::dump(
 
 DWARFTypeUnit *DWARFContext::getTypeUnitForHash(uint16_t Version, uint64_t Hash,
                                                 bool IsDWO) {
-  DWARFUnitVector &DWOUnits = State->getDWOUnits();
+  const DWARFUnitVector &DWOUnits = State->getDWOUnits();
   if (const auto &TUI = getTUIndex()) {
     if (const auto *R = TUI.getFromHash(Hash))
       return dyn_cast_or_null<DWARFTypeUnit>(
@@ -1350,7 +1350,7 @@ DWARFTypeUnit *DWARFContext::getTypeUnitForHash(uint16_t Version, uint64_t Hash,
 }
 
 DWARFCompileUnit *DWARFContext::getDWOCompileUnitForHash(uint64_t Hash) {
-  DWARFUnitVector &DWOUnits = State->getDWOUnits(LazyParse);
+  const DWARFUnitVector &DWOUnits = State->getDWOUnits();
 
   if (const auto &CUI = getCUIndex()) {
     if (const auto *R = CUI.getFromHash(Hash))
@@ -1497,8 +1497,8 @@ void DWARFContext::clearLineTableForUnit(DWARFUnit *U) {
   return State->clearLineTableForUnit(U);
 }
 
-DWARFUnitVector &DWARFContext::getDWOUnits(bool Lazy) {
-  return State->getDWOUnits(Lazy);
+const DWARFUnitVector &DWARFContext::getDWOUnits() {
+  return State->getDWOUnits();
 }
 
 DWARFCompileUnit *DWARFContext::getCompileUnitForOffset(uint64_t Offset) {
@@ -1526,7 +1526,7 @@ DWARFCompileUnit *DWARFContext::getCompileUnitForDataAddress(uint64_t Address) {
   //
   // So, we walk the CU's and their child DI's manually, looking for the
   // specific global variable.
-  for (std::unique_ptr<DWARFUnit> &CU : compile_units()) {
+  for (const std::unique_ptr<DWARFUnit> &CU : compile_units()) {
     if (CU->getVariableForAddress(Address)) {
       return static_cast<DWARFCompileUnit *>(CU.get());
     }
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp b/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
index 9f455fa7e96a7ef..d20ad804f05bda5 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
@@ -46,20 +46,17 @@ void DWARFUnitVector::addUnitsForSection(DWARFContext &C,
   addUnitsImpl(C, D, Section, C.getDebugAbbrev(), &D.getRangesSection(),
                &D.getLocSection(), D.getStrSection(),
                D.getStrOffsetsSection(), &D.getAddrSection(),
-               D.getLineSection(), D.isLittleEndian(), false, false,
-               SectionKind);
+               D.getLineSection(), D.isLittleEndian(), false, SectionKind);
 }
 
 void DWARFUnitVector::addUnitsForDWOSection(DWARFContext &C,
                                             const DWARFSection &DWOSection,
-                                            DWARFSectionKind SectionKind,
-                                            bool Lazy) {
+                                            DWARFSectionKind SectionKind) {
   const DWARFObject &D = C.getDWARFObj();
   addUnitsImpl(C, D, DWOSection, C.getDebugAbbrevDWO(), &D.getRangesDWOSection(),
                &D.getLocDWOSection(), D.getStrDWOSection(),
                D.getStrOffsetsDWOSection(), &D.getAddrSection(),
-               D.getLineDWOSection(), C.isLittleEndian(), true, Lazy,
-               SectionKind);
+               D.getLineDWOSection(), C.isLittleEndian(), true, SectionKind);
 }
 
 void DWARFUnitVector::addUnitsImpl(
@@ -67,53 +64,48 @@ void DWARFUnitVector::addUnitsImpl(
     const DWARFDebugAbbrev *DA, const DWARFSection *RS,
     const DWARFSection *LocSection, StringRef SS, const DWARFSection &SOS,
     const DWARFSection *AOS, const DWARFSection &LS, bool LE, bool IsDWO,
-    bool Lazy, DWARFSectionKind SectionKind) {
+    DWARFSectionKind SectionKind) {
   DWARFDataExtractor Data(Obj, Section, LE, 0);
-  // Lazy initialization of Parser, now that we have all section info.
-  if (!Parser) {
-    Parser = [=, &Context, &Obj, &Section, &SOS,
-              &LS](uint64_t Offset, DWARFSectionKind SectionKind,
-                   const DWARFSection *CurSection,
-                   const DWARFUnitIndex::Entry *IndexEntry)
+  auto Parser = [=, &Context, &Obj, &Section, &SOS,
+                 &LS](uint64_t Offset, DWARFSectionKind SectionKind,
+                      const DWARFSection *CurSection,
+                      const DWARFUnitIndex::Entry *IndexEntry)
         -> std::unique_ptr<DWARFUnit> {
-      const DWARFSection &InfoSection = CurSection ? *CurSection : Section;
-      DWARFDataExtractor Data(Obj, InfoSection, LE, 0);
-      if (!Data.isValidOffset(Offset))
-        return nullptr;
-      DWARFUnitHeader Header;
-      if (Error ExtractErr =
-              Header.extract(Context, Data, &Offset, SectionKind)) {
-        Context.getWarningHandler()(std::move(ExtractErr));
-        return nullptr;
-      }
-      if (!IndexEntry && IsDWO) {
-        const DWARFUnitIndex &Index = getDWARFUnitIndex(
-            Context, Header.isTypeUnit() ? DW_SECT_EXT_TYPES : DW_SECT_INFO);
-        if (Index) {
-          if (Header.isTypeUnit())
-            IndexEntry = Index.getFromHash(Header.getTypeHash());
-          else if (auto DWOId = Header.getDWOId())
-            IndexEntry = Index.getFromHash(*DWOId);
-        }
-        if (!IndexEntry)
-          IndexEntry = Index.getFromOffset(Header.getOffset());
+    const DWARFSection &InfoSection = CurSection ? *CurSection : Section;
+    DWARFDataExtractor Data(Obj, InfoSection, LE, 0);
+    if (!Data.isValidOffset(Offset))
+      return nullptr;
+    DWARFUnitHeader Header;
+    if (Error ExtractErr =
+            Header.extract(Context, Data, &Offset, SectionKind)) {
+      Context.getWarningHandler()(std::move(ExtractErr));
+      return nullptr;
+    }
+    if (!IndexEntry && IsDWO) {
+      const DWARFUnitIndex &Index = getDWARFUnitIndex(
+          Context, Header.isTypeUnit() ? DW_SECT_EXT_TYPES : DW_SECT_INFO);
+      if (Index) {
+        if (Header.isTypeUnit())
+          IndexEntry = Index.getFromHash(Header.getTypeHash());
+        else if (auto DWOId = Header.getDWOId())
+          IndexEntry = Index.getFromHash(*DWOId);
       }
-      if (IndexEntry && !Header.applyIndexEntry(IndexEntry))
-        return nullptr;
-      std::unique_ptr<DWARFUnit> U;
-      if (Header.isTypeUnit())
-        U = std::make_unique<DWARFTypeUnit>(Context, InfoSection, Header, DA,
-                                             RS, LocSection, SS, SOS, AOS, LS,
-                                             LE, IsDWO, *this);
-      else
-        U = std::make_unique<DWARFCompileUnit>(Context, InfoSection, Header,
-                                                DA, RS, LocSection, SS, SOS,
-                                                AOS, LS, LE, IsDWO, *this);
-      return U;
-    };
-  }
-  if (Lazy)
-    return;
+      if (!IndexEntry)
+        IndexEntry = Index.getFromOffset(Header.getOffset());
+    }
+    if (IndexEntry && !Header.applyIndexEntry(IndexEntry))
+      return nullptr;
+    std::unique_ptr<DWARFUnit> U;
+    if (Header.isTypeUnit())
+      U = std::make_unique<DWARFTypeUnit>(Context, InfoSection, Header, DA,
+                                            RS, LocSection, SS, SOS, AOS, LS,
+                                            LE, IsDWO, *this);
+    else
+      U = std::make_unique<DWARFCompileUnit>(Context, InfoSection, Header,
+                                              DA, RS, LocSection, SS, SOS,
+                                              AOS, LS, LE, IsDWO, *this);
+    return U;
+  };
   // Find a reasonable insertion point within the vector.  We skip over
   // (a) units from a different section, (b) units from the same section
   // but with lower offset-within-section.  This keeps units in order
@@ -136,15 +128,6 @@ void DWARFUnitVector::addUnitsImpl(
   }
 }
 
-DWARFUnit *DWARFUnitVector::addUnit(std::unique_ptr<DWARFUnit> Unit) {
-  auto I = llvm::upper_bound(*this, Unit,
-                             [](const std::unique_ptr<DWARFUnit> &LHS,
-                                const std::unique_ptr<DWARFUnit> &RHS) {
-                               return LHS->getOffset() < RHS->getOffset();
-                             });
-  return this->insert(I, std::move(Unit))->get();
-}
-
 DWARFUnit *DWARFUnitVector::getUnitForOffset(uint64_t Offset) const {
   auto end = begin() + getNumInfoUnits();
   auto *CU =
@@ -158,7 +141,7 @@ DWARFUnit *DWARFUnitVector::getUnitForOffset(uint64_t Offset) const {
 }
 
 DWARFUnit *
-DWARFUnitVector::getUnitForIndexEntry(const DWARFUnitIndex::Entry &E) {
+DWARFUnitVector::getUnitForIndexEntry(const DWARFUnitIndex::Entry &E) const {
   const auto *CUOff = E.getContribution(DW_SECT_INFO);
   if (!CUOff)
     return nullptr;
@@ -174,17 +157,7 @@ DWARFUnitVector::getUnitForIndexEntry(const DWARFUnitIndex::Entry &E) {
   if (CU != end && (*CU)->getOffset() <= Offset)
     return CU->get();
 
-  if (!Parser)
-    return nullptr;
-
-  auto U = Parser(Offset, DW_SECT_INFO, nullptr, &E);
-  if (!U)
-    return nullptr;
-
-  auto *NewCU = U.get();
-  this->insert(CU, std::move(U));
-  ++NumInfoUnits;
-  return NewCU;
+  return nullptr;
 }
 
 DWARFUnit::DWARFUnit(DWARFContext &DC, const DWARFSection &Section,
diff --git a/llvm/tools/llvm-dwarfutil/DebugInfoLinker.cpp b/llvm/tools/llvm-dwarfutil/DebugInfoLinker.cpp
index 02a94596ec7644d..48c1a433317e668 100644
--- a/llvm/tools/llvm-dwarfutil/DebugInfoLinker.cpp
+++ b/llvm/tools/llvm-dwarfutil/DebugInfoLinker.cpp
@@ -55,7 +55,7 @@ class ObjFileAddressMap : public AddressMapBase {
     }
 
     // Check CU address ranges for tombstone value.
-    for (std::unique_ptr<DWARFUnit> &CU : Context.compile_units()) {
+    for (const std::unique_ptr<DWARFUnit> &CU : Context.compile_units()) {
       Expected<llvm::DWARFAddressRangesVector> ARanges =
           CU->getUnitDIE().getAddressRanges();
       if (!ARanges) {

Copy link

github-actions bot commented Nov 7, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

The DWARFUnitVector class lives inside of the DWARFContextState. Prior to this fix a non const reference was being handed out to clients. When fetching the DWO units, there used to be a "bool Lazy" parameter that could be passed that would allow the DWARFUnitVector to parse individual units on the fly. There were two major issues with this approach:
- not thread safe and causes crashes
- the accessor would check if DWARFUnitVector was empty and if not empty it would return a partially filled in DWARFUnitVector if it was constructed with "Lazy = true"

This patch fixes the issues by always fully parsing the DWARFUnitVector when it is requested and only hands out a "const DWARFUnitVector &". This allows the thread safety mechanism built into the DWARFContext class to work corrrectly, and avoids the issue where if someone construct DWARFUnitVector with "Lazy = true", and then calls an API that partially fills in the DWARFUnitVector with individual entries, and then someone accesses the DWARFUnitVector, they would get a partial and incomplete listing of the DWARF units for the DWOs.
@clayborg
Copy link
Collaborator Author

clayborg commented Nov 7, 2023

Fixed clang format issues

@ayermolo
Copy link
Contributor

ayermolo commented Nov 7, 2023

I think this might lead to memory foot print increase in BOLT. Will double check later today.

@dwblaikie
Copy link
Collaborator

Could you check the history and describe why the lazy feature was added (I'd guess I probably added it for symbolizing performance, or for other DWP usage performance) and why this change ensures those needs are still addressed?

not thread safe and causes crashes

Does it cause crashes in non-multithreaded situations? If not, then it may be hard to justify pessimizing the non-multithreaded usage to fix the threaded usage without some more nuance about how to address these different needs.

@clayborg
Copy link
Collaborator Author

clayborg commented Nov 7, 2023

Could you check the history and describe why the lazy feature was added (I'd guess I probably added it for symbolizing performance, or for other DWP usage performance) and why this change ensures those needs are still addressed?

It will regress on the performance gain that they added, but this change, while it got preformance, broke the API contract that is expected from DWARFUnitVector. If you search for all of the calls to getDWOUnits(...), nearly all of the calls specify false for Lazy. So if you let the .dwp file initialize its DWARFUnitVector with Lazy = true, and then call DWARFUnitVector::getUnitForIndexEntry(...), and then call any function that actually wants all of the units, like calling this function:

  unit_iterator_range dwo_info_section_units() {
    DWARFUnitVector &DWOUnits = State->getDWOUnits();
    return unit_iterator_range(DWOUnits.begin(),
                               DWOUnits.begin() + DWOUnits.getNumInfoUnits());
  }

  const DWARFUnitVector &getDWOUnitsVector() {
    return State->getDWOUnits();
  }

You will get a partial list of only the units that were manually added by the call to DWARFUnitVector::getUnitForIndexEntry(...). So the API really doesn't work, or if it does work, it only works for special cases if used in just the righy way.

not thread safe and causes crashes

Does it cause crashes in non-multithreaded situations? If not, then it may be hard to justify pessimizing the non-multithreaded usage to fix the threaded usage without some more nuance about how to address these different needs.

Yeah. I can modify this change to specify Lazy = true only if multi-threading is not enabled, but the class itself doesn't work as it currently is created. the code would need to be modified so if the user needs all of the units, (they call DWARFContextState:: getDWOUnits(Lazy = false), then it should fully populate the list.

So if I want to just fix the bug where llvm-gsymutil crashes I can do so easily, but this API isn't safe to use in all situations as I mention above. Let me know what approach you would prefer

@ayermolo
Copy link
Contributor

ayermolo commented Nov 7, 2023

Should be fine from BOLT perspective.

@clayborg
Copy link
Collaborator Author

clayborg commented Nov 7, 2023

The issue is this function:

  DWARFUnitVector &getDWOUnits(bool Lazy) override {
    if (DWOUnits.empty()) {
      const DWARFObject &DObj = D.getDWARFObj();

      DObj.forEachInfoDWOSections([&](const DWARFSection &S) {
        DWOUnits.addUnitsForDWOSection(D, S, DW_SECT_INFO, Lazy);
      });
      DWOUnits.finishedInfoUnits();
      DObj.forEachTypesDWOSections([&](const DWARFSection &S) {
        DWOUnits.addUnitsForDWOSection(D, S, DW_SECT_EXT_TYPES, Lazy);
      });
    }
    return DWOUnits;
  }

If someone calls this function with Lazy = true, then DWOUnits won't be populated. But if they call it with true, then it will be populated. Users can then ask the DWARFUnitVector to get the DWO for a given hash using DWARFUnitVector::getUnitForIndexEntry(...). And that will populate this list with only one entry. The code like this will causes issues:

DWARFUnitVector &DWOUnits = DC->getDWOUnits(/*Lazy=*/true);
DWARFUnit *DU = DWOUnits.getUnitForIndexEntry(...); // This will populate the DWOUnits with only one entry that matches

Then if you do:

DWARFUnitVector &DWOUnits = DC->getDWOUnits(/*Lazy=*/false);

We now have a DWARFUnitVector that isn't empty, because it has one entry for the DWARFUnit from the above code and the fact that Lazy is now false won't cause it to repopulate...

@ayermolo
Copy link
Contributor

ayermolo commented Nov 7, 2023

Would be interesting to figure out what that history is.
On bolt side it will simplify some things.
Also that vector ends up being populated one at a time if whatever tool uses DWARFContext invokes getNonSkeltonUnitDIE through parseDWO-->getDWOCompileUnitForHash.

@clayborg
Copy link
Collaborator Author

clayborg commented Nov 7, 2023

Would be interesting to figure out what that history is. On bolt side it will simplify some things. Also that vector ends up being populated one at a time if whatever tool uses DWARFContext invokes getNonSkeltonUnitDIE through parseDWO-->getDWOCompileUnitForHash.

It seems clear someone added the ability to get a single .dwo file using a DWARFUnitIndex::Entry in the following API:

DWARFCompileUnit *DWARFContext::getDWOCompileUnitForHash(uint64_t Hash);

But this ruiined the ability to actually use the DWOUnits for anytthing else. So I would prefer something that works correctly for all clients over a patch got some performance at the expense of the API working correctly.

@clayborg
Copy link
Collaborator Author

clayborg commented Nov 7, 2023

I am happy to just fix it for llvm-gsymutil by not allowing lazy parsing if we have multi-threading available, but it would be nice to fix this correctly so it works for everyone.

@clayborg clayborg requested a review from pogo59 November 7, 2023 22:30
@clayborg
Copy link
Collaborator Author

clayborg commented Nov 7, 2023

Adding Paul Robinson as a reviewer in case he has any input.

@clayborg
Copy link
Collaborator Author

clayborg commented Nov 8, 2023

The change was introduced with this patch:

commit e79dda31e9ed6cfc97d06ad87976fb16e53f3780
Author: David Blaikie <dblaikie@gmail.com>
Date:   Tue Sep 19 18:36:11 2017 +0000

    dwarfdump/symbolizer: Avoid loading unneeded CUs from a DWP
    
    When symbolizing large binaries, parsing every CU in a DWP file is a
    significant performance penalty. Instead, use the index to only load the
    CUs that are needed.
    
    llvm-svn: 313659

@clayborg
Copy link
Collaborator Author

clayborg commented Nov 8, 2023

So I simplified this patch to just not do lazy parsing when we need multi-threading. This will solve the problem without affecting performance for single threaded applications. Lemme know your thoughts.

Copy link
Collaborator

@dwblaikie dwblaikie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds OK to me.

Bonus points if you added test coverage for this - though I guess the multithreaded path isn't tested in llvm-dwarfdump? (we could add a not-intended-for-consumers flag that enables multithreaded mode, but with a single thread (or, heck, maybe with multiple threads for parsing or something) - then you could demonstrate that the non-lazy functionality is used by inserting some invalid DWARF and showing that in multithreaded mode it hits a failure due to non-laziness, but it isn't hit in single threaded mode) - Though perhaps the laziness is only accessible through llvm-symbolizer, not llvm-dwarfdump, so the multithreading would have to be there.

@clayborg clayborg merged commit 5aa934e into llvm:main Nov 8, 2023
3 checks passed
@ayermolo
Copy link
Contributor

ayermolo commented Nov 8, 2023

Can Lazy argument be removed from &getDWOUnits(bool Lazy), since we always now pass false to underlying call.

@clayborg
Copy link
Collaborator Author

clayborg commented Nov 9, 2023

Can Lazy argument be removed from &getDWOUnits(bool Lazy), since we always now pass false to underlying call.

We don't always pass false. Only in the threadsafe code do we pass false. In the single threaded code we still pass true for Lazy since it is ok.

@ayermolo
Copy link
Contributor

ayermolo commented Nov 9, 2023

Can Lazy argument be removed from &getDWOUnits(bool Lazy), since we always now pass false to underlying call.

We don't always pass false. Only in the threadsafe code do we pass false. In the single threaded code we still pass true for Lazy since it is ok.

ah right it's a virtual function. Nevermind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants