Skip to content

Conversation

@Nerixyz
Copy link
Contributor

@Nerixyz Nerixyz commented Nov 3, 2025

When anonymous unions are used in a struct or vice versa, their fields are merged into the parent record when using PDB. LLDB tries to recreate the original definition of the record with the anonymous unions/structs.

For tagged unions (like std::optional) where the tag followed the anonymous union, the result was suboptimal:

// input:
struct Foo {
  union {
    Bar b;
    char c;
  };
  bool tag;
};

// reconstructed:
struct Foo {
  union {
    Bar b;
    struct {
      char c;
      bool tag;
    };
  };
};

Once the algorithm is in some nested union, it can't get out.

In the above case, we can get to the correct reconstructed record if we always add fields that don't overlap others in the root struct. So when we see tag, we'll see that it comes after all other fields, so it's possible to add it in the root Foo.

@Nerixyz Nerixyz requested a review from ZequanWu November 3, 2025 21:58
@Nerixyz Nerixyz requested a review from JDevlieghere as a code owner November 3, 2025 21:58
@llvmbot llvmbot added the lldb label Nov 3, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 3, 2025

@llvm/pr-subscribers-lldb

Author: nerix (Nerixyz)

Changes

When anonymous unions are used in a struct or vice versa, their fields are merged into the parent record when using PDB. LLDB tries to recreate the original definition of the record with the anonymous unions/structs.

For tagged unions (like std::optional) where the tag followed the anonymous union, the result was suboptimal:

// input:
struct Foo {
  union {
    Bar b;
    char c;
  };
  bool tag;
};

// reconstructed:
struct Foo {
  union {
    Bar b;
    struct {
      char c;
      bool tag;
    };
  };
};

Once the algorithm is in some nested union, it can't get out.

In the above case, we can get to the correct reconstructed record if we always add fields that don't overlap others in the root struct. So when we see tag, we'll see that it comes after all other fields, so it's possible to add it in the root Foo.

I only did this for structs at the top level, because I assume that most uses of anonymous structs/unions will be shallow (most often only one level).


Full diff: https://github.com/llvm/llvm-project/pull/166243.diff

2 Files Affected:

  • (modified) lldb/source/Plugins/SymbolFile/NativePDB/UdtRecordCompleter.cpp (+12-1)
  • (modified) lldb/unittests/SymbolFile/NativePDB/UdtRecordCompleterTests.cpp (+54-1)
diff --git a/lldb/source/Plugins/SymbolFile/NativePDB/UdtRecordCompleter.cpp b/lldb/source/Plugins/SymbolFile/NativePDB/UdtRecordCompleter.cpp
index 1c575e90bd72c..1458079483278 100644
--- a/lldb/source/Plugins/SymbolFile/NativePDB/UdtRecordCompleter.cpp
+++ b/lldb/source/Plugins/SymbolFile/NativePDB/UdtRecordCompleter.cpp
@@ -442,6 +442,10 @@ void UdtRecordCompleter::Record::ConstructRecord() {
 
   // The end offset to a vector of field/struct that ends at the offset.
   std::map<uint64_t, std::vector<Member *>> end_offset_map;
+  auto is_last_end_offset = [&](auto it) {
+    return it != end_offset_map.end() && ++it == end_offset_map.end();
+  };
+
   for (auto &pair : fields_map) {
     uint64_t offset = pair.first;
     auto &fields = pair.second;
@@ -463,7 +467,14 @@ void UdtRecordCompleter::Record::ConstructRecord() {
       if (iter->second.empty())
         continue;
       parent = iter->second.back();
-      iter->second.pop_back();
+
+      // For structs, if the new fields come after the already added ones
+      // without overlap, go back to the root struct.
+      if (parent != &record && iter->first <= offset &&
+          record.kind == Member::Struct && is_last_end_offset(iter))
+        parent = &record;
+      else
+        iter->second.pop_back();
     }
     // If it's a field, then the field is inside a union, so we can safely
     // increase its size by converting it to a struct to hold multiple fields.
diff --git a/lldb/unittests/SymbolFile/NativePDB/UdtRecordCompleterTests.cpp b/lldb/unittests/SymbolFile/NativePDB/UdtRecordCompleterTests.cpp
index 17284b61b9a6e..3432e03bbf853 100644
--- a/lldb/unittests/SymbolFile/NativePDB/UdtRecordCompleterTests.cpp
+++ b/lldb/unittests/SymbolFile/NativePDB/UdtRecordCompleterTests.cpp
@@ -99,7 +99,7 @@ Member *AddField(Member *member, StringRef name, uint64_t byte_offset,
       std::make_unique<Member>(name, byte_offset * 8, byte_size * 8,
                                clang::QualType(), lldb::eAccessPublic, 0);
   field->kind = kind;
-  field->base_offset = base_offset;
+  field->base_offset = base_offset * 8;
   member->fields.push_back(std::move(field));
   return member->fields.back().get();
 }
@@ -111,6 +111,9 @@ TEST_F(UdtRecordCompleterRecordTests, TestAnonymousUnionInStruct) {
   CollectMember("m2", 0, 4);
   CollectMember("m3", 0, 1);
   CollectMember("m4", 0, 8);
+  CollectMember("m5", 8, 8);
+  CollectMember("m6", 16, 4);
+  CollectMember("m7", 16, 8);
   ConstructRecord();
 
   // struct {
@@ -120,6 +123,11 @@ TEST_F(UdtRecordCompleterRecordTests, TestAnonymousUnionInStruct) {
   //       m3;
   //       m4;
   //   };
+  //   m5;
+  //   union {
+  //       m6;
+  //       m7;
+  //   };
   // };
   Record record;
   record.start_offset = 0;
@@ -128,6 +136,10 @@ TEST_F(UdtRecordCompleterRecordTests, TestAnonymousUnionInStruct) {
   AddField(u, "m2", 0, 4, Member::Field);
   AddField(u, "m3", 0, 1, Member::Field);
   AddField(u, "m4", 0, 8, Member::Field);
+  AddField(&record.record, "m5", 8, 8, Member::Field);
+  Member *u2 = AddField(&record.record, "", 16, 0, Member::Union);
+  AddField(u2, "m6", 16, 4, Member::Field);
+  AddField(u2, "m7", 16, 8, Member::Field);
   EXPECT_EQ(WrappedRecord(this->record), WrappedRecord(record));
 }
 
@@ -243,3 +255,44 @@ TEST_F(UdtRecordCompleterRecordTests, TestNestedUnionStructInUnion) {
   AddField(s2, "m4", 2, 4, Member::Field);
   EXPECT_EQ(WrappedRecord(this->record), WrappedRecord(record));
 }
+
+TEST_F(UdtRecordCompleterRecordTests, TestNestedUnionStructInUnion2) {
+  SetKind(Member::Kind::Union);
+  CollectMember("m1", 0, 4);
+  CollectMember("m2", 0, 2);
+  CollectMember("m3", 0, 2);
+  CollectMember("m4", 2, 4);
+  CollectMember("m5", 6, 2);
+  CollectMember("m6", 6, 2);
+  CollectMember("m7", 8, 2);
+  ConstructRecord();
+
+  // union {
+  //   m1;
+  //   m2;
+  //   struct {
+  //       m3;
+  //       m4;
+  //       union {
+  //           m5;
+  //           struct {
+  //               m6;
+  //               m7;
+  //           };
+  //       };
+  //   };
+  // };
+  Record record;
+  record.start_offset = 0;
+  AddField(&record.record, "m1", 0, 4, Member::Field);
+  AddField(&record.record, "m2", 0, 2, Member::Field);
+  Member *s1 = AddField(&record.record, "", 0, 0, Member::Struct);
+  AddField(s1, "m3", 0, 2, Member::Field);
+  AddField(s1, "m4", 2, 4, Member::Field);
+  Member *u1 = AddField(s1, "", 6, 0, Member::Union);
+  AddField(u1, "m5", 6, 2, Member::Field);
+  Member *s2 = AddField(u1, "", 0, 0, Member::Struct, 6);
+  AddField(s2, "m6", 6, 2, Member::Field);
+  AddField(s2, "m7", 8, 2, Member::Field);
+  EXPECT_EQ(WrappedRecord(this->record), WrappedRecord(record));
+}

}

TEST_F(UdtRecordCompleterRecordTests, TestNestedUnionStructInUnion2) {
SetKind(Member::Kind::Union);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this testing this change? This change only changed the behaviour when the root record is a struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's testing that we don't do this "going up one level" everywhere. In this test, m7 does not overlap m5, so it could be placed in the outer struct:

union {
  m1;
  m2;
  struct {
      m3;
      m4;
      union {
          m5;
          m6;
      };
      m7;
  };
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't putting m7 into the struct, where m3 and m4 defined, better than making m6 and m7 into a new struct since it's one less layer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wanted to capture the current state with the test. But now that I think about it, we could do something similar for unions...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this.

Copy link
Contributor

@ZequanWu ZequanWu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 476 to 478
lldbassert(record.kind == Member::Union &&
"Current record must be a union");
lldbassert(!record.fields.empty());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per https://lldb.llvm.org/resources/contributing.html#error-handling, we'd rather not add new uses of lldbassert. If these are proper preconditions, they can be converted to regular asserts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These asserts should fall under

An assertion should be placed to assert invariants that the developer is convinced will always hold, regardless what an end-user does with LLDB.

The first assert holds because we only ever get here for unions and structs. The fields should never be empty, because at this point, the offset is larger than the starting one / if the fields were empty, offset == start_offset.

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@Nerixyz Nerixyz merged commit 3c162ba into llvm:main Nov 5, 2025
10 checks passed
@Nerixyz Nerixyz deleted the fix/lldb-npdb-tagged-anon-union branch November 7, 2025 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants