[DebugNames] Implement per-entry abbreviation support #76731

felipepiovezan · 2024-01-02T16:06:08Z

(ignore the first commit, this depends on #76296)

Prior to this commit, the choice of forms to use in debug_names abbreviations
was made independently of entries themselves. For example, if we wanted entries
to have a "IDX_compile_unit", all entries needed to use the same form (e.g
data2) for such IDX, even if a specific entry could fit its IDX in a single byte
(e.g. data1).

This commit changes that by creating an `AbbreviationContents` data structure to
handle the encoding of abbreviations on a per-entry basis. We don't need the
full generality of abbreviations from the debug_info section -- abbreviations in
debug_names are more limited --, but this commit is desirable for two reasons:

1. Upcoming patches may want to use different forms for different entries (there
is ongoing work for IDX_parent that will benefit from this)
2. This allows for space savings by using smaller forms.

A few tests had to be updated, as this commit changes the abbreviation number
(and therefore the order in which abbreviations are printed). Notably, this
_decreases_ the abbreviation number used, which may lead to smaller ULEB
encodings. A future patch should just renumber these to start with 1.

This template is no longer used.

llvmbot · 2024-01-02T16:06:38Z

@llvm/pr-subscribers-debuginfo

Author: Felipe de Azevedo Piovezan (felipepiovezan)

Changes

(ignore the first commit, this depends on #76296)

Prior to this commit, the choice of forms to use in debug_names abbreviations
was made independently of entries themselves. For example, if we wanted entries
to have a "IDX_compile_unit", all entries needed to use the same form (e.g
data2) for such IDX, even if a specific entry could fit its IDX in a single byte
(e.g. data1).

This commit changes that by creating an `AbbreviationContents` data structure to
handle the encoding of abbreviations on a per-entry basis. We don't need the
full generality of abbreviations from the debug_info section -- abbreviations in
debug_names are more limited --, but this commit is desirable for two reasons:

1. Upcoming patches may want to use different forms for different entries (there
is ongoing work for IDX_parent that will benefit from this)
2. This allows for space savings by using smaller forms.

A few tests had to be updated, as this commit changes the abbreviation number
(and therefore the order in which abbreviations are printed). Notably, this
_decreases_ the abbreviation number used, which may lead to smaller ULEB
encodings. A future patch should just renumber these to start with 1.

Patch is 24.75 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/76731.diff

6 Files Affected:

(modified) llvm/include/llvm/CodeGen/AccelTable.h (+3-3)
(modified) llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp (+169-89)
(modified) llvm/lib/DWARFLinker/DWARFStreamer.cpp (+2-5)
(modified) llvm/lib/DWARFLinkerParallel/DWARFEmitterImpl.cpp (+2-5)
(modified) llvm/test/DebugInfo/Generic/debug-names-many-cu.ll (+9-4)
(modified) llvm/test/DebugInfo/X86/debug-names-types.ll (+12-12)

diff --git a/llvm/include/llvm/CodeGen/AccelTable.h b/llvm/include/llvm/CodeGen/AccelTable.h
index 6eb09f32f9f951..9098a1a821bf47 100644
--- a/llvm/include/llvm/CodeGen/AccelTable.h
+++ b/llvm/include/llvm/CodeGen/AccelTable.h
@@ -305,9 +305,9 @@ class DWARF5AccelTable : public AccelTable<DWARF5AccelTableData> {
   TUVectorTy TUSymbolsOrHashes;
 
 public:
-  struct UnitIndexAndEncoding {
+  struct UnitIndex {
     unsigned Index;
-    DWARF5AccelTableData::AttributeEncoding Encoding;
+    bool IsType;
   };
   /// Returns type units that were constructed.
   const TUVectorTy &getTypeUnitsSymbols() { return TUSymbolsOrHashes; }
@@ -366,7 +366,7 @@ void emitDWARF5AccelTable(AsmPrinter *Asm, DWARF5AccelTable &Contents,
 void emitDWARF5AccelTable(
     AsmPrinter *Asm, DWARF5AccelTable &Contents,
     ArrayRef<std::variant<MCSymbol *, uint64_t>> CUs,
-    llvm::function_ref<std::optional<DWARF5AccelTable::UnitIndexAndEncoding>(
+    llvm::function_ref<std::optional<DWARF5AccelTable::UnitIndex>(
         const DWARF5AccelTableData &)>
         getIndexForEntry);
 
diff --git a/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp b/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
index 30ea7eef3a12ba..6d4959e9c72084 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
@@ -179,14 +179,126 @@ class AppleAccelTableWriter : public AccelTableWriter {
 #endif
 };
 
+/// Helper enum to encode whether something is a Type or Compile unit (or none)
+/// with the minimal number of bits.
+enum class TypeOrCompileUnit : uint8_t { Compile = 0, Type = 1, None = 2 };
+
+/// Helper enum to encode DW_FORM_data{1,2,4,8} with the minimal number of
+/// bits.
+enum class FormDataLength : uint8_t {
+  Data1 = 0,
+  Data2 = 1,
+  Data4 = 2,
+  Data8 = 3
+};
+
+/// Maps `Index` into the smallest DW_FORM_data{1,2,4,8} that can represent it,
+/// and returns the corresponding `FormDataLength`.
+static FormDataLength getFormForIndex(uint32_t Index) {
+  dwarf::Form Form = DIEInteger::BestForm(false /*IsSigned*/, Index);
+  switch (Form) {
+  case dwarf::Form::DW_FORM_data1:
+    return FormDataLength::Data1;
+  case dwarf::Form::DW_FORM_data2:
+    return FormDataLength::Data2;
+  case dwarf::Form::DW_FORM_data4:
+    return FormDataLength::Data4;
+  case dwarf::Form::DW_FORM_data8:
+    return FormDataLength::Data8;
+  default:
+    llvm_unreachable("invalid getFormForIndex");
+  }
+}
+
+/// Maps a `FormDataLength` back to the corresponding DW_FORM_data{1,2,4,8}
+static dwarf::Form toDwarfDataForm(FormDataLength DataLength) {
+  switch (DataLength) {
+  case FormDataLength::Data1:
+    return dwarf::Form::DW_FORM_data1;
+  case FormDataLength::Data2:
+    return dwarf::Form::DW_FORM_data2;
+  case FormDataLength::Data4:
+    return dwarf::Form::DW_FORM_data4;
+  case FormDataLength::Data8:
+    return dwarf::Form::DW_FORM_data8;
+  }
+  llvm_unreachable("invalid toDwarfDataForm");
+}
+
+/// Converts `UnitType` and `UnitForm` into an `AttributeEncoding` and push it
+/// into `Ans`, if UnitType != None.
+void pushTypeOrCompileUnitEncondings(
+    SmallVectorImpl<DWARF5AccelTableData::AttributeEncoding> &Vec,
+    TypeOrCompileUnit UnitType, FormDataLength UnitForm) {
+  switch (UnitType) {
+  case TypeOrCompileUnit::Compile:
+    Vec.push_back({dwarf::DW_IDX_compile_unit, toDwarfDataForm(UnitForm)});
+    break;
+  case TypeOrCompileUnit::Type:
+    Vec.push_back({dwarf::DW_IDX_type_unit, toDwarfDataForm((UnitForm))});
+    break;
+  case TypeOrCompileUnit::None:
+    break;
+  }
+  return;
+}
+
+/// Represent the contents of an Abbreviation Entry for a DWARF5AccelTable, so
+/// that a set of such entries may be created.
+/// All Abbreviations have some common content:
+///    1. IDX_die_offset, with form ref4.
+///    2. IDX_{type, compile}_unit with a form DW_FORM_data{1,2,4,8}.
+/// This class doesn't encode such common elements; instead, it only encodes
+/// contents that may change from one abbreviation to another, as those define
+/// the uniqueness of each abbreviation.
+/// Common elements still show up in accessor methods (e.g. `getAttrEncodings`).
+struct AbbreviationContents {
+  uint16_t Tag : 16;
+  TypeOrCompileUnit UnitType : 2;
+  FormDataLength UnitForm : 2;
+
+  AbbreviationContents(uint16_t Tag, TypeOrCompileUnit UnitType,
+                       uint32_t UnitIndex)
+      : Tag(Tag), UnitType(UnitType), UnitForm(getFormForIndex(UnitIndex)) {}
+  AbbreviationContents(uint16_t Tag)
+      : Tag(Tag), UnitType(TypeOrCompileUnit::None),
+        UnitForm(getFormForIndex(0)) {}
+
+  uint32_t getUniqueCode() const {
+    uint32_t Bitfield = static_cast<uint32_t>(UnitType);
+    Bitfield |= static_cast<uint32_t>(UnitForm) << 2;
+    Bitfield |= static_cast<uint32_t>(Tag) << 4;
+    return Bitfield;
+  }
+
+  SmallVector<DWARF5AccelTableData::AttributeEncoding, 2>
+  getAttrEncodings() const {
+    SmallVector<DWARF5AccelTableData::AttributeEncoding, 2> Ans;
+    pushTypeOrCompileUnitEncondings(Ans, UnitType, UnitForm);
+    Ans.push_back({dwarf::DW_IDX_die_offset, dwarf::DW_FORM_ref4});
+    return Ans;
+  }
+};
+
+struct AbbreviationContentsInfo {
+  static AbbreviationContents getEmptyKey() {
+    return AbbreviationContents(0);
+  };
+  static AbbreviationContents getTombstoneKey() {
+    return AbbreviationContents(~0);
+  };
+  static uint32_t getHashValue(AbbreviationContents Contents) {
+    return Contents.getUniqueCode();
+  }
+  static bool isEqual(AbbreviationContents LHS, AbbreviationContents RHS) {
+    return LHS.Tag == RHS.Tag && LHS.UnitType == RHS.UnitType;
+  }
+};
+
 /// Class responsible for emitting a DWARF v5 Accelerator Table. The only
 /// public function is emit(), which performs the actual emission.
 ///
-/// The class is templated in its data type. This allows us to emit both dyamic
-/// and static data entries. A callback abstract the logic to provide a CU
-/// index for a given entry, which is different per data type, but identical
-/// for every entry in the same table.
-template <typename DataT>
+/// A callback abstracts the logic to provide a CU index for a given entry.
 class Dwarf5AccelTableWriter : public AccelTableWriter {
   struct Header {
     uint16_t Version = 5;
@@ -211,12 +323,11 @@ class Dwarf5AccelTableWriter : public AccelTableWriter {
   };
 
   Header Header;
-  DenseMap<uint32_t, SmallVector<DWARF5AccelTableData::AttributeEncoding, 2>>
-      Abbreviations;
+  DenseSet<AbbreviationContents, AbbreviationContentsInfo> Abbreviations;
   ArrayRef<std::variant<MCSymbol *, uint64_t>> CompUnits;
   ArrayRef<std::variant<MCSymbol *, uint64_t>> TypeUnits;
-  llvm::function_ref<std::optional<DWARF5AccelTable::UnitIndexAndEncoding>(
-      const DataT &)>
+  llvm::function_ref<std::optional<DWARF5AccelTable::UnitIndex>(
+      const DWARF5AccelTableData &)>
       getIndexForEntry;
   MCSymbol *ContributionEnd = nullptr;
   MCSymbol *AbbrevStart = Asm->createTempSymbol("names_abbrev_start");
@@ -232,7 +343,7 @@ class Dwarf5AccelTableWriter : public AccelTableWriter {
   void emitBuckets() const;
   void emitStringOffsets() const;
   void emitAbbrevs() const;
-  void emitEntry(const DataT &Entry) const;
+  void emitEntry(const DWARF5AccelTableData &Entry) const;
   void emitData() const;
 
 public:
@@ -240,8 +351,8 @@ class Dwarf5AccelTableWriter : public AccelTableWriter {
       AsmPrinter *Asm, const AccelTableBase &Contents,
       ArrayRef<std::variant<MCSymbol *, uint64_t>> CompUnits,
       ArrayRef<std::variant<MCSymbol *, uint64_t>> TypeUnits,
-      llvm::function_ref<
-          std::optional<DWARF5AccelTable::UnitIndexAndEncoding>(const DataT &)>
+      llvm::function_ref<std::optional<DWARF5AccelTable::UnitIndex>(
+          const DWARF5AccelTableData &)>
           getIndexForEntry,
       bool IsSplitDwarf);
 
@@ -370,8 +481,7 @@ DWARF5AccelTableData::DWARF5AccelTableData(const DIE &Die,
                                            const bool IsTU)
     : OffsetVal(&Die), DieTag(Die.getTag()), UnitID(UnitID), IsTU(IsTU) {}
 
-template <typename DataT>
-void Dwarf5AccelTableWriter<DataT>::Header::emit(Dwarf5AccelTableWriter &Ctx) {
+void Dwarf5AccelTableWriter::Header::emit(Dwarf5AccelTableWriter &Ctx) {
   assert(CompUnitCount > 0 && "Index must have at least one CU.");
 
   AsmPrinter *Asm = Ctx.Asm;
@@ -400,46 +510,33 @@ void Dwarf5AccelTableWriter<DataT>::Header::emit(Dwarf5AccelTableWriter &Ctx) {
   Asm->OutStreamer->emitBytes({AugmentationString, AugmentationStringSize});
 }
 
-static uint32_t constexpr LowerBitSize = dwarf::DW_IDX_type_hash;
-static uint32_t getTagFromAbbreviationTag(const uint32_t AbbrvTag) {
-  return AbbrvTag >> LowerBitSize;
+static AbbreviationContents constructAbbreviationContents(
+    const unsigned Tag,
+    const std::optional<DWARF5AccelTable::UnitIndex> &EntryRet) {
+  if (!EntryRet)
+    return AbbreviationContents(Tag);
+  return AbbreviationContents(Tag,
+                              EntryRet->IsType ? TypeOrCompileUnit::Type
+                                               : TypeOrCompileUnit::Compile,
+                              EntryRet->Index);
 }
 
-/// Constructs a unique AbbrevTag that captures what a DIE accesses.
-/// Using this tag we can emit a unique abbreviation for each DIE.
-static uint32_t constructAbbreviationTag(
-    const unsigned Tag,
-    const std::optional<DWARF5AccelTable::UnitIndexAndEncoding> &EntryRet) {
-  uint32_t AbbrvTag = 0;
-  if (EntryRet)
-    AbbrvTag |= 1 << EntryRet->Encoding.Index;
-  AbbrvTag |= 1 << dwarf::DW_IDX_die_offset;
-  AbbrvTag |= Tag << LowerBitSize;
-  return AbbrvTag;
-}
-template <typename DataT>
-void Dwarf5AccelTableWriter<DataT>::populateAbbrevsMap() {
+void Dwarf5AccelTableWriter::populateAbbrevsMap() {
   for (auto &Bucket : Contents.getBuckets()) {
     for (auto *Hash : Bucket) {
       for (auto *Value : Hash->Values) {
-        std::optional<DWARF5AccelTable::UnitIndexAndEncoding> EntryRet =
-            getIndexForEntry(*static_cast<const DataT *>(Value));
-        unsigned Tag = static_cast<const DataT *>(Value)->getDieTag();
-        uint32_t AbbrvTag = constructAbbreviationTag(Tag, EntryRet);
-        if (Abbreviations.count(AbbrvTag) == 0) {
-          SmallVector<DWARF5AccelTableData::AttributeEncoding, 2> UA;
-          if (EntryRet)
-            UA.push_back(EntryRet->Encoding);
-          UA.push_back({dwarf::DW_IDX_die_offset, dwarf::DW_FORM_ref4});
-          Abbreviations.try_emplace(AbbrvTag, UA);
-        }
+        std::optional<DWARF5AccelTable::UnitIndex> EntryRet =
+            getIndexForEntry(*static_cast<const DWARF5AccelTableData *>(Value));
+        unsigned Tag =
+            static_cast<const DWARF5AccelTableData *>(Value)->getDieTag();
+        auto AbbrevContents = constructAbbreviationContents(Tag, EntryRet);
+        Abbreviations.insert(AbbrevContents);
       }
     }
   }
 }
 
-template <typename DataT>
-void Dwarf5AccelTableWriter<DataT>::emitCUList() const {
+void Dwarf5AccelTableWriter::emitCUList() const {
   for (const auto &CU : enumerate(CompUnits)) {
     Asm->OutStreamer->AddComment("Compilation unit " + Twine(CU.index()));
     if (std::holds_alternative<MCSymbol *>(CU.value()))
@@ -449,8 +546,7 @@ void Dwarf5AccelTableWriter<DataT>::emitCUList() const {
   }
 }
 
-template <typename DataT>
-void Dwarf5AccelTableWriter<DataT>::emitTUList() const {
+void Dwarf5AccelTableWriter::emitTUList() const {
   for (const auto &TU : enumerate(TypeUnits)) {
     Asm->OutStreamer->AddComment("Type unit " + Twine(TU.index()));
     if (std::holds_alternative<MCSymbol *>(TU.value()))
@@ -462,8 +558,7 @@ void Dwarf5AccelTableWriter<DataT>::emitTUList() const {
   }
 }
 
-template <typename DataT>
-void Dwarf5AccelTableWriter<DataT>::emitBuckets() const {
+void Dwarf5AccelTableWriter::emitBuckets() const {
   uint32_t Index = 1;
   for (const auto &Bucket : enumerate(Contents.getBuckets())) {
     Asm->OutStreamer->AddComment("Bucket " + Twine(Bucket.index()));
@@ -472,8 +567,7 @@ void Dwarf5AccelTableWriter<DataT>::emitBuckets() const {
   }
 }
 
-template <typename DataT>
-void Dwarf5AccelTableWriter<DataT>::emitStringOffsets() const {
+void Dwarf5AccelTableWriter::emitStringOffsets() const {
   for (const auto &Bucket : enumerate(Contents.getBuckets())) {
     for (auto *Hash : Bucket.value()) {
       DwarfStringPoolEntryRef String = Hash->Name;
@@ -484,17 +578,15 @@ void Dwarf5AccelTableWriter<DataT>::emitStringOffsets() const {
   }
 }
 
-template <typename DataT>
-void Dwarf5AccelTableWriter<DataT>::emitAbbrevs() const {
+void Dwarf5AccelTableWriter::emitAbbrevs() const {
   Asm->OutStreamer->emitLabel(AbbrevStart);
   for (const auto &Abbrev : Abbreviations) {
     Asm->OutStreamer->AddComment("Abbrev code");
-    uint32_t Tag = getTagFromAbbreviationTag(Abbrev.first);
-    assert(Tag != 0);
-    Asm->emitULEB128(Abbrev.first);
-    Asm->OutStreamer->AddComment(dwarf::TagString(Tag));
-    Asm->emitULEB128(Tag);
-    for (const auto &AttrEnc : Abbrev.second) {
+    assert(Abbrev.Tag != 0);
+    Asm->emitULEB128(Abbrev.getUniqueCode());
+    Asm->OutStreamer->AddComment(dwarf::TagString(Abbrev.Tag));
+    Asm->emitULEB128(Abbrev.Tag);
+    for (const auto &AttrEnc : Abbrev.getAttrEncodings()) {
       Asm->emitULEB128(AttrEnc.Index, dwarf::IndexString(AttrEnc.Index).data());
       Asm->emitULEB128(AttrEnc.Form,
                        dwarf::FormEncodingString(AttrEnc.Form).data());
@@ -506,19 +598,15 @@ void Dwarf5AccelTableWriter<DataT>::emitAbbrevs() const {
   Asm->OutStreamer->emitLabel(AbbrevEnd);
 }
 
-template <typename DataT>
-void Dwarf5AccelTableWriter<DataT>::emitEntry(const DataT &Entry) const {
-  std::optional<DWARF5AccelTable::UnitIndexAndEncoding> EntryRet =
-      getIndexForEntry(Entry);
-  uint32_t AbbrvTag = constructAbbreviationTag(Entry.getDieTag(), EntryRet);
-  auto AbbrevIt = Abbreviations.find(AbbrvTag);
-  assert(AbbrevIt != Abbreviations.end() &&
+void Dwarf5AccelTableWriter::emitEntry(
+    const DWARF5AccelTableData &Entry) const {
+  std::optional<DWARF5AccelTable::UnitIndex> EntryRet = getIndexForEntry(Entry);
+  auto AbbrevContents = constructAbbreviationContents(Entry.getDieTag(), EntryRet);
+  assert(Abbreviations.contains(AbbrevContents) &&
          "Why wasn't this abbrev generated?");
-  assert(getTagFromAbbreviationTag(AbbrevIt->first) == Entry.getDieTag() &&
-         "Invalid Tag");
-  Asm->emitULEB128(AbbrevIt->first, "Abbreviation code");
+  Asm->emitULEB128(AbbrevContents.getUniqueCode(), "Abbreviation code");
 
-  for (const auto &AttrEnc : AbbrevIt->second) {
+  for (const auto &AttrEnc : AbbrevContents.getAttrEncodings()) {
     Asm->OutStreamer->AddComment(dwarf::IndexString(AttrEnc.Index));
     switch (AttrEnc.Index) {
     case dwarf::DW_IDX_compile_unit:
@@ -537,27 +625,26 @@ void Dwarf5AccelTableWriter<DataT>::emitEntry(const DataT &Entry) const {
   }
 }
 
-template <typename DataT> void Dwarf5AccelTableWriter<DataT>::emitData() const {
+void Dwarf5AccelTableWriter::emitData() const {
   Asm->OutStreamer->emitLabel(EntryPool);
   for (auto &Bucket : Contents.getBuckets()) {
     for (auto *Hash : Bucket) {
       // Remember to emit the label for our offset.
       Asm->OutStreamer->emitLabel(Hash->Sym);
       for (const auto *Value : Hash->Values)
-        emitEntry(*static_cast<const DataT *>(Value));
+        emitEntry(*static_cast<const DWARF5AccelTableData *>(Value));
       Asm->OutStreamer->AddComment("End of list: " + Hash->Name.getString());
       Asm->emitInt8(0);
     }
   }
 }
 
-template <typename DataT>
-Dwarf5AccelTableWriter<DataT>::Dwarf5AccelTableWriter(
+Dwarf5AccelTableWriter::Dwarf5AccelTableWriter(
     AsmPrinter *Asm, const AccelTableBase &Contents,
     ArrayRef<std::variant<MCSymbol *, uint64_t>> CompUnits,
     ArrayRef<std::variant<MCSymbol *, uint64_t>> TypeUnits,
-    llvm::function_ref<
-        std::optional<DWARF5AccelTable::UnitIndexAndEncoding>(const DataT &)>
+    llvm::function_ref<std::optional<DWARF5AccelTable::UnitIndex>(
+        const DWARF5AccelTableData &)>
         getIndexForEntry,
     bool IsSplitDwarf)
     : AccelTableWriter(Asm, Contents, false),
@@ -570,7 +657,7 @@ Dwarf5AccelTableWriter<DataT>::Dwarf5AccelTableWriter(
   populateAbbrevsMap();
 }
 
-template <typename DataT> void Dwarf5AccelTableWriter<DataT>::emit() {
+void Dwarf5AccelTableWriter::emit() {
   Header.emit(*this);
   emitCUList();
   emitTUList();
@@ -631,20 +718,14 @@ void llvm::emitDWARF5AccelTable(
       Asm->getObjFileLowering().getDwarfDebugNamesSection());
 
   Contents.finalize(Asm, "names");
-  dwarf::Form CUIndexForm =
-      DIEInteger::BestForm(/*IsSigned*/ false, CompUnits.size() - 1);
-  dwarf::Form TUIndexForm =
-      DIEInteger::BestForm(/*IsSigned*/ false, TypeUnits.size() - 1);
-  Dwarf5AccelTableWriter<DWARF5AccelTableData>(
+  Dwarf5AccelTableWriter(
       Asm, Contents, CompUnits, TypeUnits,
       [&](const DWARF5AccelTableData &Entry)
-          -> std::optional<DWARF5AccelTable::UnitIndexAndEncoding> {
+          -> std::optional<DWARF5AccelTable::UnitIndex> {
         if (Entry.isTU())
-          return {{TUIndex[Entry.getUnitID()],
-                   {dwarf::DW_IDX_type_unit, TUIndexForm}}};
+          return {{TUIndex[Entry.getUnitID()], true /*IsType*/}};
         if (CUIndex.size() > 1)
-          return {{CUIndex[Entry.getUnitID()],
-                   {dwarf::DW_IDX_compile_unit, CUIndexForm}}};
+          return {{CUIndex[Entry.getUnitID()], false /*IsType*/}};
         return std::nullopt;
       },
       DD.useSplitDwarf())
@@ -662,13 +743,12 @@ void DWARF5AccelTable::addTypeUnitSignature(DwarfTypeUnit &U) {
 void llvm::emitDWARF5AccelTable(
     AsmPrinter *Asm, DWARF5AccelTable &Contents,
     ArrayRef<std::variant<MCSymbol *, uint64_t>> CUs,
-    llvm::function_ref<std::optional<DWARF5AccelTable::UnitIndexAndEncoding>(
+    llvm::function_ref<std::optional<DWARF5AccelTable::UnitIndex>(
         const DWARF5AccelTableData &)>
         getIndexForEntry) {
   std::vector<std::variant<MCSymbol *, uint64_t>> TypeUnits;
   Contents.finalize(Asm, "names");
-  Dwarf5AccelTableWriter<DWARF5AccelTableData>(Asm, Contents, CUs, TypeUnits,
-                                               getIndexForEntry, false)
+  Dwarf5AccelTableWriter(Asm, Contents, CUs, TypeUnits, getIndexForEntry, false)
       .emit();
 }
 
diff --git a/llvm/lib/DWARFLinker/DWARFStreamer.cpp b/llvm/lib/DWARFLinker/DWARFStreamer.cpp
index cd649c328ed930..80a215bb216cdf 100644
--- a/llvm/lib/DWARFLinker/DWARFStreamer.cpp
+++ b/llvm/lib/DWARFLinker/DWARFStreamer.cpp
@@ -306,18 +306,15 @@ void DwarfStreamer::emitDebugNames(DWARF5AccelTable &Table) {
   }
 
   Asm->OutStreamer->switchSection(MOFI->getDwarfDebugNamesSection());
-  dwarf::Form Form = DIEInteger::BestForm(/*IsSigned*/ false,
-                                          (uint64_t)UniqueIdToCuMap.size() - 1);
   /// llvm-dwarfutil doesn't support type units + .debug_names right now.
   // FIXME: add support for type units + .debug_names. For now the behavior is
   // unsuported.
   emitDWARF5AccelTable(
       Asm.get(), Table, CompUnits,
       [&](const DWARF5AccelTableData &Entry)
-          -> std::optional<DWARF5AccelTable::UnitIndexAndEncoding> {
+          -> std::optional<DWARF5AccelTable::UnitIndex> {
         if (UniqueIdToCuMap.size() > 1)
-          return {{UniqueIdToCuMap[Entry.getUnitID()],
-                   {dwarf::DW_IDX_compile_unit, Form}}};
+          return {{UniqueIdToCuMap[Entry.getUnitID()], false /*IsType*/}};
         return std::nullopt;
       });
 }
diff --git a/llvm/lib/DWARFLinkerParallel/DWARFEmitterImpl.cpp b/llvm/lib/DWARFLinkerParallel/DWARFEmitterImpl.cpp
index 355cfae3a64629..7ae7a5db3a9ddc 100644
--- a/llvm/lib/DWARFLinkerParallel/DWARFEmitterImpl.cpp
+++ b/llvm/lib/DWARFLinkerParallel/DWARFEmitterImpl.cpp
@@ -230,17 +230,14 @@ void DwarfEmitterImpl::emitDebugNames(DWARF5AccelTable &Table,
     return;
 
   Asm->OutStreamer->switchSection(MOFI->getDwarfDebugNamesSection());
-  dwarf::Form Form =
-      DIEInteger::BestForm(/*IsSigned*/ false, (uint64_t)CUidToIdx.size() - 1);
   // FIXME: add support for type units + .debug_names. For now the behavior is
   // unsuported.
   emitDWARF5AccelTable(
       Asm.get(), Table, CUOffsets,
       [&](const DWARF5AccelTableData &Entry)
-          -> std::optional<DWARF5AccelTable::UnitIndexAndEncoding> {
+          -> std::optional<DWARF5AccelTable::UnitIndex> {
         if (...
[truncated]

github-actions · 2024-01-02T16:08:30Z

✅ With the latest revision this PR passed the C/C++ code formatter.

Prior to this commit, the choice of forms to use in debug_names abbreviations was made independently of entries themselves. For example, if we wanted entries to have a "IDX_compile_unit", all entries needed to use the same form (e.g data2) for such IDX, even if a specific entry could fit its IDX in a single byte (e.g. data1). This commit changes that by creating an `AbbreviationContents` data structure to handle the encoding of abbreviations on a per-entry basis. We don't need the full generality of abbreviations from the debug_info section -- abbreviations in debug_names are more limited --, but this commit is desirable for two reasons: 1. Upcoming patches may want to use different forms for different entries (there is ongoing work for IDX_parent that will benefit from this) 2. This allows for space savings by using smaller forms. A few tests had to be updated, as this commit changes the abbreviation number (and therefore the order in which abbreviations are printed). Notably, this _decreases_ the abbreviation number used, which may lead to smaller ULEB encodings. A future patch should just renumber these to start with 1.

felipepiovezan · 2024-01-02T20:50:39Z

I think I am going to put this on hold for now, I might be able to bend the existing implementation to support this without too much code.

[AsmPrinter][Dwarf5][nfc] Remove template from AccelTable class

acdf40c

This template is no longer used.

felipepiovezan requested a review from ayermolo January 2, 2024 16:06

llvmbot added the debuginfo label Jan 2, 2024

felipepiovezan requested review from dwblaikie and JDevlieghere January 2, 2024 16:06

felipepiovezan force-pushed the felipe/per-entry-abbrev branch from 3c9ea1e to 8636efe Compare January 2, 2024 16:10

felipepiovezan closed this Jan 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DebugNames] Implement per-entry abbreviation support #76731

[DebugNames] Implement per-entry abbreviation support #76731

felipepiovezan commented Jan 2, 2024

llvmbot commented Jan 2, 2024

github-actions bot commented Jan 2, 2024 •

edited

felipepiovezan commented Jan 2, 2024

[DebugNames] Implement per-entry abbreviation support #76731

[DebugNames] Implement per-entry abbreviation support #76731

Conversation

felipepiovezan commented Jan 2, 2024

llvmbot commented Jan 2, 2024

github-actions bot commented Jan 2, 2024 • edited

felipepiovezan commented Jan 2, 2024

github-actions bot commented Jan 2, 2024 •

edited