Skip to content

Conversation

Nerixyz
Copy link
Contributor

@Nerixyz Nerixyz commented Sep 7, 2025

This adds a method on the PublicsStream to look up symbols using their address (segment + offset).
It's largely a reimplementation of NearestSym from the reference. However, we don't return the nearest symbol, but the exact symbol.
Still, in case of ICF, we return the symbol that's first in the address map. Users can then use the returned offset to read the next records to check if multiple symbols overlap, if desired.

From #149701.

@llvmbot
Copy link
Member

llvmbot commented Sep 7, 2025

@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-platform-windows

Author: nerix (Nerixyz)

Changes

This adds a method on the PublicsStream to look up symbols using their address (segment + offset).
It's largely a reimplementation of NearestSym from the reference. However, we don't return the nearest symbol, but the exact symbol.
Still, in case of ICF, we return the symbol that's first in the address map. Users can then use the returned offset to read the next records to check if multiple symbols overlap, if desired.

From #149701.


Full diff: https://github.com/llvm/llvm-project/pull/157361.diff

6 Files Affected:

  • (modified) llvm/include/llvm/DebugInfo/PDB/Native/PublicsStream.h (+18)
  • (modified) llvm/lib/DebugInfo/PDB/Native/PublicsStream.cpp (+91)
  • (modified) llvm/unittests/DebugInfo/PDB/CMakeLists.txt (+1)
  • (added) llvm/unittests/DebugInfo/PDB/Inputs/PublicSymbols.cpp (+46)
  • (added) llvm/unittests/DebugInfo/PDB/Inputs/PublicSymbols.pdb ()
  • (added) llvm/unittests/DebugInfo/PDB/PublicsStreamTest.cpp (+61)
diff --git a/llvm/include/llvm/DebugInfo/PDB/Native/PublicsStream.h b/llvm/include/llvm/DebugInfo/PDB/Native/PublicsStream.h
index 2cb4bee8ca5df..c5fdad057e867 100644
--- a/llvm/include/llvm/DebugInfo/PDB/Native/PublicsStream.h
+++ b/llvm/include/llvm/DebugInfo/PDB/Native/PublicsStream.h
@@ -18,9 +18,13 @@ namespace llvm {
 namespace msf {
 class MappedBlockStream;
 }
+namespace codeview {
+class PublicSym32;
+}
 namespace pdb {
 struct PublicsStreamHeader;
 struct SectionOffset;
+class SymbolStream;
 
 class PublicsStream {
 public:
@@ -42,6 +46,20 @@ class PublicsStream {
     return SectionOffsets;
   }
 
+  /// Find a public symbol by a segment and offset.
+  ///
+  /// In case there is more than one symbol (for example due to ICF), the first
+  /// one is returned.
+  ///
+  /// \return If a symbol was found, the symbol at the provided address is
+  ///     returned as well as the index of this symbol in the address map. If
+  ///     the binary was linked with ICF, there might be more symbols with the
+  ///     same address after the returned one. If no symbol is found,
+  ///     `std::nullopt` is returned.
+  LLVM_ABI std::optional<std::pair<codeview::PublicSym32, size_t>>
+  findByAddress(const SymbolStream &Symbols, uint16_t Segment,
+                uint32_t Offset) const;
+
 private:
   std::unique_ptr<msf::MappedBlockStream> Stream;
   GSIHashTable PublicsTable;
diff --git a/llvm/lib/DebugInfo/PDB/Native/PublicsStream.cpp b/llvm/lib/DebugInfo/PDB/Native/PublicsStream.cpp
index c350e0e0b3e19..984e6e70adba2 100644
--- a/llvm/lib/DebugInfo/PDB/Native/PublicsStream.cpp
+++ b/llvm/lib/DebugInfo/PDB/Native/PublicsStream.cpp
@@ -22,9 +22,12 @@
 //===----------------------------------------------------------------------===//
 
 #include "llvm/DebugInfo/PDB/Native/PublicsStream.h"
+#include "llvm/DebugInfo/CodeView/SymbolDeserializer.h"
+#include "llvm/DebugInfo/CodeView/SymbolRecord.h"
 #include "llvm/DebugInfo/MSF/MappedBlockStream.h"
 #include "llvm/DebugInfo/PDB/Native/RawError.h"
 #include "llvm/DebugInfo/PDB/Native/RawTypes.h"
+#include "llvm/DebugInfo/PDB/Native/SymbolStream.h"
 #include "llvm/Support/BinaryStreamReader.h"
 #include "llvm/Support/Error.h"
 #include <cstdint>
@@ -96,3 +99,91 @@ Error PublicsStream::reload() {
                                 "Corrupted publics stream.");
   return Error::success();
 }
+
+static uint32_t compareSegmentOffset(uint16_t LhsSegment, uint32_t LhsOffset,
+                                     uint16_t RhsSegment, uint32_t RhsOffset) {
+  if (LhsSegment == RhsSegment)
+    return LhsOffset - RhsOffset;
+  return LhsSegment - RhsSegment;
+}
+
+static uint32_t compareSegmentOffset(uint16_t LhsSegment, uint32_t LhsOffst,
+                                     const codeview::PublicSym32 &Rhs) {
+  return compareSegmentOffset(LhsSegment, LhsOffst, Rhs.Segment, Rhs.Offset);
+}
+
+// This is a reimplementation of NearestSym:
+// https://github.com/microsoft/microsoft-pdb/blob/805655a28bd8198004be2ac27e6e0290121a5e89/PDB/dbi/gsi.cpp#L1492-L1581
+std::optional<std::pair<codeview::PublicSym32, size_t>>
+PublicsStream::findByAddress(const SymbolStream &Symbols, uint16_t Segment,
+                             uint32_t Offset) const {
+  // The address map is sorted by address, so we do binary search.
+  // Each element is an offset into the symbols for a public symbol.
+  auto Lo = AddressMap.begin();
+  auto Hi = AddressMap.end();
+  Hi -= 1;
+
+  while (Lo < Hi) {
+    auto Cur = Lo + ((Hi - Lo + 1) / 2);
+    auto Sym = Symbols.readRecord(Cur->value());
+    if (Sym.kind() != codeview::S_PUB32)
+      return std::nullopt; // this is most likely corrupted debug info
+
+    auto Psym =
+        codeview::SymbolDeserializer::deserializeAs<codeview::PublicSym32>(Sym);
+    if (!Psym) {
+      consumeError(Psym.takeError());
+      return std::nullopt;
+    }
+
+    uint32_t Cmp = compareSegmentOffset(Segment, Offset, *Psym);
+    if (Cmp < 0) {
+      Cur -= 1;
+      Hi = Cur;
+    } else if (Cmp == 0)
+      Lo = Hi = Cur;
+    else
+      Lo = Cur;
+  }
+
+  auto Sym = Symbols.readRecord(Lo->value());
+  if (Sym.kind() != codeview::S_PUB32)
+    return std::nullopt; // this is most likely corrupted debug info
+
+  auto MaybePsym =
+      codeview::SymbolDeserializer::deserializeAs<codeview::PublicSym32>(Sym);
+  if (!MaybePsym) {
+    consumeError(MaybePsym.takeError());
+    return std::nullopt;
+  }
+  codeview::PublicSym32 Psym = std::move(*MaybePsym);
+
+  uint32_t Cmp = compareSegmentOffset(Segment, Offset, Psym);
+  if (Cmp != 0)
+    return std::nullopt;
+
+  // We found a symbol. Due to ICF, multiple symbols can have the same
+  // address, so return the first one
+  while (Lo != AddressMap.begin()) {
+    --Lo;
+    Sym = Symbols.readRecord(Lo->value());
+    if (Sym.kind() != codeview::S_PUB32)
+      return std::nullopt;
+    MaybePsym =
+        codeview::SymbolDeserializer::deserializeAs<codeview::PublicSym32>(Sym);
+    if (!MaybePsym) {
+      consumeError(MaybePsym.takeError());
+      return std::nullopt;
+    }
+
+    if (MaybePsym->Segment != Segment || MaybePsym->Offset != Offset) {
+      ++Lo;
+      break;
+    }
+
+    Psym = std::move(*MaybePsym);
+  }
+
+  std::ptrdiff_t IterOffset = Lo - AddressMap.begin();
+  return std::pair{Psym, static_cast<size_t>(IterOffset)};
+}
diff --git a/llvm/unittests/DebugInfo/PDB/CMakeLists.txt b/llvm/unittests/DebugInfo/PDB/CMakeLists.txt
index ba2a732848f4d..b1b9d2d98c944 100644
--- a/llvm/unittests/DebugInfo/PDB/CMakeLists.txt
+++ b/llvm/unittests/DebugInfo/PDB/CMakeLists.txt
@@ -11,6 +11,7 @@ add_llvm_unittest_with_input_files(DebugInfoPDBTests
   StringTableBuilderTest.cpp
   PDBApiTest.cpp
   PDBVariantTest.cpp
+  PublicsStreamTest.cpp
   )
 
 target_link_libraries(DebugInfoPDBTests PRIVATE LLVMTestingSupport)
diff --git a/llvm/unittests/DebugInfo/PDB/Inputs/PublicSymbols.cpp b/llvm/unittests/DebugInfo/PDB/Inputs/PublicSymbols.cpp
new file mode 100644
index 0000000000000..0aeab04543caf
--- /dev/null
+++ b/llvm/unittests/DebugInfo/PDB/Inputs/PublicSymbols.cpp
@@ -0,0 +1,46 @@
+// clang-format off
+
+// Compile with
+// cl /Z7 /GR- /GS- PublicSymbols.cpp -c /Gy
+// link .\PublicSymbols.obj /DEBUG /NODEFAULTLIB /out:PublicSymbols.exe /ENTRY:main /OPT:ICF
+// llvm-pdbutil pdb2yaml --publics-stream PublicSymbols.pdb > PublicSymbols.yaml
+// llvm-pdbutil yaml2pdb PublicSymbols.yaml
+// 
+// rm PublicSymbols.exe && rm PublicSymbols.obj && rm PublicSymbols.yaml
+
+int foobar(int i){ return i + 1; }
+// these should be merged with ICF
+int dup1(int i){ return i + 2; }
+int dup2(int i){ return i + 2; }
+int dup3(int i){ return i + 2; }
+
+class AClass {
+public:
+    void AMethod(int, char*) {}
+    static bool Something(char c) {
+        return c == ' ';
+    }
+};
+
+struct Base {
+    virtual ~Base() = default;
+};
+struct Derived : public Base {};
+struct Derived2 : public Base {};
+struct Derived3 : public Derived2, public Derived {};
+
+int AGlobal;
+
+void operator delete(void *,unsigned __int64) {}
+
+int main() {
+    foobar(1);
+    dup1(1);
+    dup2(1);
+    dup3(1);
+    AClass a;
+    a.AMethod(1, nullptr);
+    AClass::Something(' ');
+    Derived3 d3;
+    return AGlobal;
+}
diff --git a/llvm/unittests/DebugInfo/PDB/Inputs/PublicSymbols.pdb b/llvm/unittests/DebugInfo/PDB/Inputs/PublicSymbols.pdb
new file mode 100644
index 0000000000000..ffa3275d58d7b
Binary files /dev/null and b/llvm/unittests/DebugInfo/PDB/Inputs/PublicSymbols.pdb differ
diff --git a/llvm/unittests/DebugInfo/PDB/PublicsStreamTest.cpp b/llvm/unittests/DebugInfo/PDB/PublicsStreamTest.cpp
new file mode 100644
index 0000000000000..708b42a3c51b1
--- /dev/null
+++ b/llvm/unittests/DebugInfo/PDB/PublicsStreamTest.cpp
@@ -0,0 +1,61 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/DebugInfo/PDB/Native/PublicsStream.h"
+#include "llvm/DebugInfo/CodeView/SymbolRecord.h"
+#include "llvm/DebugInfo/PDB/Native/PDBFile.h"
+#include "llvm/Support/BinaryByteStream.h"
+#include "llvm/Support/MemoryBuffer.h"
+
+#include "llvm/Testing/Support/SupportHelpers.h"
+
+#include "gtest/gtest.h"
+
+using namespace llvm;
+using namespace llvm::pdb;
+
+extern const char *TestMainArgv0;
+
+static std::string getExePath() {
+  SmallString<128> InputsDir = unittest::getInputFileDirectory(TestMainArgv0);
+  llvm::sys::path::append(InputsDir, "PublicSymbols.pdb");
+  return std::string(InputsDir);
+}
+
+TEST(PublicsStreamTest, FindByAddress) {
+  std::string ExePath = getExePath();
+  auto Buffer = MemoryBuffer::getFile(ExePath, /*IsText=*/false,
+                                  /*RequiresNullTerminator=*/false);
+  ASSERT_TRUE(bool(Buffer));
+  auto Stream = std::make_unique<MemoryBufferByteStream>(
+      std::move(*Buffer), llvm::endianness::little);
+
+  BumpPtrAllocator Alloc;
+  PDBFile File(ExePath, std::move(Stream), Alloc);
+  ASSERT_FALSE(bool(File.parseFileHeaders()));
+  ASSERT_FALSE(bool(File.parseStreamData()));
+
+  auto Publics = File.getPDBPublicsStream();
+  ASSERT_TRUE(bool(Publics));
+  auto Symbols = File.getPDBSymbolStream();
+  ASSERT_TRUE(bool(Symbols));
+
+  auto VTableDerived = Publics->findByAddress(*Symbols, 2, 8);
+  ASSERT_TRUE(VTableDerived.has_value());
+  // both derived and derived2 have their vftables there - but derived2 is first (due to ICF)
+  ASSERT_EQ(VTableDerived->first.Name, "??_7Derived2@@6B@");
+  ASSERT_EQ(VTableDerived->second, 26);
+
+  ASSERT_FALSE(Publics->findByAddress(*Symbols, 2, 7).has_value());
+  ASSERT_FALSE(Publics->findByAddress(*Symbols, 2, 9).has_value());
+
+  auto GlobalSym = Publics->findByAddress(*Symbols, 3, 0);
+  ASSERT_TRUE(GlobalSym.has_value());
+  ASSERT_EQ(GlobalSym->first.Name, "?AGlobal@@3HA");
+  ASSERT_EQ(GlobalSym->second, 30);
+}

Copy link

github-actions bot commented Sep 7, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@Nerixyz Nerixyz force-pushed the feat/llvm-pdb-pubsym branch from d76099b to 76a14bb Compare September 7, 2025 20:07
@Nerixyz Nerixyz requested a review from rnk September 7, 2025 20:32
[&](support::ulittle32_t Cur, auto Addr) {
auto Sym = Symbols.readRecord(Cur.value());
if (Sym.kind() != codeview::S_PUB32)
return false; // stop here, this is most likely corrupted debug info
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you able to craft a small test that can exercice this codepath?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a check. I'm not sure if it's fine to just modify the backing data there, but the test works.

Copy link
Member

@aganea aganea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for all the changes!

@Nerixyz Nerixyz merged commit c82d09c into llvm:main Sep 10, 2025
9 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 10, 2025

LLVM Buildbot has detected a new failure on builder ppc64le-lld-multistage-test running on ppc64le-lld-multistage-test while building llvm at step 7 "test-build-stage1-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/168/builds/16070

Here is the relevant piece of the build log for the reference
Step 7 (test-build-stage1-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: tools/llvm-exegesis/RISCV/rvv/filter.test' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage1/bin/llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -benchmark-phase=assemble-measured-code --mode=inverse_throughput --opcode-name=PseudoVNCLIPU_WX_M1_MASK     --riscv-filter-config='vtype = {VXRM: rod, AVL: VLMAX, SEW: e(8|16), Policy: ta/mu}' --max-configs-per-opcode=1000 --min-instructions=10 | /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage1/bin/FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/tools/llvm-exegesis/RISCV/rvv/filter.test # RUN: at line 1
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage1/bin/llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -benchmark-phase=assemble-measured-code --mode=inverse_throughput --opcode-name=PseudoVNCLIPU_WX_M1_MASK '--riscv-filter-config=vtype = {VXRM: rod, AVL: VLMAX, SEW: e(8|16), Policy: ta/mu}' --max-configs-per-opcode=1000 --min-instructions=10
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage1/bin/FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/tools/llvm-exegesis/RISCV/rvv/filter.test
PseudoVNCLIPU_WX_M1_MASK: Failed to produce any snippet via: instruction has tied variables, avoiding Read-After-Write issue, picking random def and use registers not aliasing each other, for uses, one unique register for each position
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage1/bin/FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/tools/llvm-exegesis/RISCV/rvv/filter.test

--

********************


@Nerixyz
Copy link
Contributor Author

Nerixyz commented Sep 10, 2025

This failure looks unrelated and in the next build, the test passed: https://lab.llvm.org/buildbot/#/builders/168/builds/16071

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants