Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[memprof] Add call stack IDs to IndexedAllocationInfo #85888

Merged

Conversation

kazutakahirata
Copy link
Contributor

The indexed MemProf file has a huge amount of redundancy. In a large
internal application, 82% of call stacks, stored in
IndexedAllocationInfo::CallStack, are duplicates.

We should work toward deduplicating call stacks by referring to them
with unique IDs with actual call stacks stored in a separate data
structure, much like we refer to memprof::Frame with memprof::FrameId.

At the same time, we need to facilitate a graceful transition from the
current version of the MemProf format to the next. We should be able
to read (but not write) the current version of the MemProf file even
after we move onto the next one.

With those goals in mind, I propose to have an integer ID next to
CallStack in IndexedAllocationInfo to refer to a call stack in a
succinct manner. We'll gradually increase the areas of the compiler
where IDs and call stacks have one-to-one correspondence and
eventually remove the existing CallStack field.

This patch adds call stack ID, named CSId, to IndexedAllocationInfo
and teaches the raw profile reader to compute unique call stack IDs
and store them in the new field. It does not introduce any user of
the call stack IDs yet, except in verifyFunctionProfileData.

The indexed MemProf file has a huge amount of redundancy.  In a large
internal application, 82% of call stacks, stored in
IndexedAllocationInfo::CallStack, are duplicates.

We should work toward deduplicating call stacks by referring to them
with unique IDs with actual call stacks stored in a separate data
structure, much like we refer to memprof::Frame with memprof::FrameId.

At the same time, we need to facilitate a graceful transition from the
current version of the MemProf format to the next.  We should be able
to read (but not write) the current version of the MemProf file even
after we move onto the next one.

With those goals in mind, I propose to have an integer ID next to
CallStack in IndexedAllocationInfo to refer to a call stack in a
succinct manner.  We'll gradually increase the areas of the compiler
where IDs and call stacks have one-to-one correspondence and
eventually remove the existing CallStack field.

This patch adds call stack ID, named CSId, to IndexedAllocationInfo
and teaches the raw profile reader to compute unique call stack IDs
and store them in the new field.  It does not introduce any user of
the call stack IDs yet, except in verifyFunctionProfileData.
@llvmbot llvmbot added the PGO Profile Guided Optimizations label Mar 20, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 20, 2024

@llvm/pr-subscribers-pgo

Author: Kazu Hirata (kazutakahirata)

Changes

The indexed MemProf file has a huge amount of redundancy. In a large
internal application, 82% of call stacks, stored in
IndexedAllocationInfo::CallStack, are duplicates.

We should work toward deduplicating call stacks by referring to them
with unique IDs with actual call stacks stored in a separate data
structure, much like we refer to memprof::Frame with memprof::FrameId.

At the same time, we need to facilitate a graceful transition from the
current version of the MemProf format to the next. We should be able
to read (but not write) the current version of the MemProf file even
after we move onto the next one.

With those goals in mind, I propose to have an integer ID next to
CallStack in IndexedAllocationInfo to refer to a call stack in a
succinct manner. We'll gradually increase the areas of the compiler
where IDs and call stacks have one-to-one correspondence and
eventually remove the existing CallStack field.

This patch adds call stack ID, named CSId, to IndexedAllocationInfo
and teaches the raw profile reader to compute unique call stack IDs
and store them in the new field. It does not introduce any user of
the call stack IDs yet, except in verifyFunctionProfileData.


Full diff: https://github.com/llvm/llvm-project/pull/85888.diff

6 Files Affected:

  • (modified) llvm/include/llvm/ProfileData/MemProf.h (+17-2)
  • (modified) llvm/include/llvm/ProfileData/RawMemProfReader.h (+2)
  • (modified) llvm/lib/ProfileData/MemProf.cpp (+29)
  • (modified) llvm/lib/ProfileData/RawMemProfReader.cpp (+14-1)
  • (modified) llvm/unittests/ProfileData/InstrProfTest.cpp (+1-1)
  • (modified) llvm/unittests/ProfileData/MemProfTest.cpp (+3-2)
diff --git a/llvm/include/llvm/ProfileData/MemProf.h b/llvm/include/llvm/ProfileData/MemProf.h
index 37c19094bc2a63..bc8601733f6d26 100644
--- a/llvm/include/llvm/ProfileData/MemProf.h
+++ b/llvm/include/llvm/ProfileData/MemProf.h
@@ -1,6 +1,7 @@
 #ifndef LLVM_PROFILEDATA_MEMPROF_H_
 #define LLVM_PROFILEDATA_MEMPROF_H_
 
+#include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/STLFunctionalExtras.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/IR/GlobalValue.h"
@@ -252,18 +253,26 @@ struct Frame {
   }
 };
 
+// A type representing the index into the table of call stacks.
+using CallStackId = uint64_t;
+
 // Holds allocation information in a space efficient format where frames are
 // represented using unique identifiers.
 struct IndexedAllocationInfo {
   // The dynamic calling context for the allocation in bottom-up (leaf-to-root)
   // order. Frame contents are stored out-of-line.
+  // TODO: Remove once we fully transition to CSId.
   llvm::SmallVector<FrameId> CallStack;
+  // Conceptually the same as above.  We are going to keep both CallStack and
+  // CallStackId while we are transitioning from CallStack to CallStackId.
+  CallStackId CSId = 0;
   // The statistics obtained from the runtime for the allocation.
   PortableMemInfoBlock Info;
 
   IndexedAllocationInfo() = default;
-  IndexedAllocationInfo(ArrayRef<FrameId> CS, const MemInfoBlock &MB)
-      : CallStack(CS.begin(), CS.end()), Info(MB) {}
+  IndexedAllocationInfo(ArrayRef<FrameId> CS, CallStackId CSId,
+                        const MemInfoBlock &MB)
+      : CallStack(CS.begin(), CS.end()), CSId(CSId), Info(MB) {}
 
   // Returns the size in bytes when this allocation info struct is serialized.
   size_t serializedSize() const {
@@ -622,6 +631,12 @@ class FrameLookupTrait {
     return Frame::deserialize(D);
   }
 };
+
+// Verify that the set of CallStackIds and the set of call stacks have
+// one-to-one correspondence.
+void verifyFunctionProfileData(
+    const llvm::MapVector<GlobalValue::GUID, IndexedMemProfRecord>
+        &FunctionProfileData);
 } // namespace memprof
 } // namespace llvm
 
diff --git a/llvm/include/llvm/ProfileData/RawMemProfReader.h b/llvm/include/llvm/ProfileData/RawMemProfReader.h
index 6aa5caec65f791..f84d13c48b4850 100644
--- a/llvm/include/llvm/ProfileData/RawMemProfReader.h
+++ b/llvm/include/llvm/ProfileData/RawMemProfReader.h
@@ -97,6 +97,8 @@ class MemProfReader {
   }
   // A mapping from FrameId (a hash of the contents) to the frame.
   llvm::DenseMap<FrameId, Frame> IdToFrame;
+  // A vector of all unique call stacks, indexed by CallStackId.
+  llvm::SmallVector<llvm::SmallVector<FrameId>> CallStacks;
   // A mapping from function GUID, hash of the canonical function symbol to the
   // memprof profile data for that function, i.e allocation and callsite info.
   llvm::MapVector<GlobalValue::GUID, IndexedMemProfRecord> FunctionProfileData;
diff --git a/llvm/lib/ProfileData/MemProf.cpp b/llvm/lib/ProfileData/MemProf.cpp
index 0461f0e9f84078..0c9c5d3dad086d 100644
--- a/llvm/lib/ProfileData/MemProf.cpp
+++ b/llvm/lib/ProfileData/MemProf.cpp
@@ -117,5 +117,34 @@ Expected<MemProfSchema> readMemProfSchema(const unsigned char *&Buffer) {
   return Result;
 }
 
+// Verify that the set of CallStackIds and the set of call stacks have
+// one-to-one correspondence.  This function is intended to help transition from
+// CallStack to CSId in IndexedAllocationInfo.
+void verifyFunctionProfileData(
+    const llvm::MapVector<GlobalValue::GUID, IndexedMemProfRecord>
+        &FunctionProfileData) {
+  DenseMap<CallStackId, SmallVector<FrameId>> CSIdToCS;
+  std::map<llvm::SmallVector<FrameId>, CallStackId> CSToCSId;
+  for (const auto &[GUID, Record] : FunctionProfileData) {
+    (void)GUID;
+    for (const auto &AS : Record.AllocSites) {
+      auto Result = CSToCSId.insert({AS.CallStack, AS.CSId});
+      if (!Result.second) {
+        assert(Result.first->second == AS.CSId);
+      }
+      auto Result2 = CSIdToCS.insert({AS.CSId, AS.CallStack});
+      if (!Result2.second) {
+        const auto &Other = Result2.first->second;
+        assert(Other.size() == AS.CallStack.size());
+        (void)Other;
+        for (size_t I = 0, E = AS.CallStack.size(); I != E; ++I) {
+          assert(Other[I] == AS.CallStack[I]);
+        }
+      }
+    }
+  }
+  assert(CSIdToCS.size() == CSToCSId.size());
+}
+
 } // namespace memprof
 } // namespace llvm
diff --git a/llvm/lib/ProfileData/RawMemProfReader.cpp b/llvm/lib/ProfileData/RawMemProfReader.cpp
index 0e2b8668bab72c..43b498287cd375 100644
--- a/llvm/lib/ProfileData/RawMemProfReader.cpp
+++ b/llvm/lib/ProfileData/RawMemProfReader.cpp
@@ -402,6 +402,9 @@ Error RawMemProfReader::mapRawProfileToRecords() {
   llvm::MapVector<GlobalValue::GUID, llvm::SetVector<LocationPtr>>
       PerFunctionCallSites;
 
+  // Hold a mapping from callstack to its CallStackID.
+  std::map<llvm::SmallVector<FrameId>, CallStackId> CallStackToCallStackId;
+
   // Convert the raw profile callstack data into memprof records. While doing so
   // keep track of related contexts so that we can fill these in later.
   for (const auto &Entry : CallstackProfileData) {
@@ -445,6 +448,12 @@ Error RawMemProfReader::mapRawProfileToRecords() {
       Callstack.append(Frames.begin(), Frames.end());
     }
 
+    auto InsertResult =
+        CallStackToCallStackId.insert({Callstack, CallStacks.size()});
+    if (InsertResult.second)
+      CallStacks.push_back(Callstack);
+    CallStackId CSId = InsertResult.first->second;
+
     // We attach the memprof record to each function bottom-up including the
     // first non-inline frame.
     for (size_t I = 0; /*Break out using the condition below*/; I++) {
@@ -452,7 +461,7 @@ Error RawMemProfReader::mapRawProfileToRecords() {
       auto Result =
           FunctionProfileData.insert({F.Function, IndexedMemProfRecord()});
       IndexedMemProfRecord &Record = Result.first->second;
-      Record.AllocSites.emplace_back(Callstack, Entry.second);
+      Record.AllocSites.emplace_back(Callstack, CSId, Entry.second);
 
       if (!F.IsInlineFrame)
         break;
@@ -470,6 +479,10 @@ Error RawMemProfReader::mapRawProfileToRecords() {
     }
   }
 
+#ifdef EXPENSIVE_CHECKS
+  verifyFunctionProfileData(FunctionProfileData);
+#endif
+
   return Error::success();
 }
 
diff --git a/llvm/unittests/ProfileData/InstrProfTest.cpp b/llvm/unittests/ProfileData/InstrProfTest.cpp
index cd4552a039b36b..aac2f53565cbdd 100644
--- a/llvm/unittests/ProfileData/InstrProfTest.cpp
+++ b/llvm/unittests/ProfileData/InstrProfTest.cpp
@@ -366,7 +366,7 @@ IndexedMemProfRecord makeRecord(
     const MemInfoBlock &Block = MemInfoBlock()) {
   llvm::memprof::IndexedMemProfRecord MR;
   for (const auto &Frames : AllocFrames)
-    MR.AllocSites.emplace_back(Frames, Block);
+    MR.AllocSites.emplace_back(Frames, 0, Block);
   for (const auto &Frames : CallSiteFrames)
     MR.CallSites.push_back(Frames);
   return MR;
diff --git a/llvm/unittests/ProfileData/MemProfTest.cpp b/llvm/unittests/ProfileData/MemProfTest.cpp
index f5e4a4aff2ed17..3aca09ed36e3f6 100644
--- a/llvm/unittests/ProfileData/MemProfTest.cpp
+++ b/llvm/unittests/ProfileData/MemProfTest.cpp
@@ -280,7 +280,7 @@ TEST(MemProf, RecordSerializationRoundTrip) {
   IndexedMemProfRecord Record;
   for (const auto &ACS : AllocCallStacks) {
     // Use the same info block for both allocation sites.
-    Record.AllocSites.emplace_back(ACS, Info);
+    Record.AllocSites.emplace_back(ACS, 0, Info);
   }
   Record.CallSites.assign(CallSites);
 
@@ -376,7 +376,8 @@ TEST(MemProf, BaseMemProfReader) {
   Block.AllocCount = 1U, Block.TotalAccessDensity = 4,
   Block.TotalLifetime = 200001;
   std::array<FrameId, 2> CallStack{F1.hash(), F2.hash()};
-  FakeRecord.AllocSites.emplace_back(/*CS=*/CallStack, /*MB=*/Block);
+  FakeRecord.AllocSites.emplace_back(/*CS=*/CallStack, /*CSId=*/0,
+                                     /*MB=*/Block);
   ProfData.insert({F1.hash(), FakeRecord});
 
   MemProfReader Reader(FrameIdMap, ProfData);

@@ -445,14 +448,20 @@ Error RawMemProfReader::mapRawProfileToRecords() {
Callstack.append(Frames.begin(), Frames.end());
}

auto InsertResult =
CallStackToCallStackId.insert({Callstack, CallStacks.size()});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, it might be better to use a content hash (I.e. hash of the frame IDs) as the call stack ID. You can refactor and reuse the frame ID hashing implementation here [1] which is suitable for IDs meant to be persisted to disk. Factoring it out would also mean we can reuse it for @teresajohnson's use case of tagging callstacks with statistics.

[1]

inline FrameId hash() const {
auto HashCombine = [](auto Value, size_t Seed) {
std::hash<decltype(Value)> Hasher;
// The constant used below is the 64 bit representation of the fractional
// part of the golden ratio. Used here for the randomness in their bit
// pattern.
return Hasher(Value) + 0x9e3779b97f4a7c15 + (Seed << 6) + (Seed >> 2);
};
size_t Result = 0;
Result ^= HashCombine(Function, Result);
Result ^= HashCombine(LineOffset, Result);
Result ^= HashCombine(Column, Result);
Result ^= HashCombine(IsInlineFrame, Result);
return static_cast<FrameId>(Result);
}
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just switched to the content hash for call stacks in the latest iteration. Since computing a hash value requires no data structure to speak of (unlike the sequential CallStackId), I've updated the tests to call hashCallStack instead of the placeholder 0.

For now, I am using the same hash function as in computeStackId in MemProfiler.cpp, but I can switch to std::hash also.

@@ -97,6 +97,8 @@ class MemProfReader {
}
// A mapping from FrameId (a hash of the contents) to the frame.
llvm::DenseMap<FrameId, Frame> IdToFrame;
// A vector of all unique call stacks, indexed by CallStackId.
llvm::SmallVector<llvm::SmallVector<FrameId>> CallStacks;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unused and should be deleted - I think this was from the earlier solution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. Deleted in the latest iteration.

CallStackId hashCallStack(ArrayRef<FrameId> CS) {
llvm::HashBuilder<llvm::TruncatedBLAKE3<8>, llvm::endianness::little>
HashBuilder;
for (FrameId F : CS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan to move the FrameId to also use BLAKE3? (Doesn't need to be in this patch, but wondering if we can do that so we get more consistency with the compiler's hashes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't thinking about that, but yes, we could move the FrameID to also use BLAKE3.

Copy link

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link

✅ With the latest revision this PR passed the Python code formatter.

@kazutakahirata kazutakahirata merged commit 74799f4 into llvm:main Mar 24, 2024
4 checks passed
@kazutakahirata kazutakahirata deleted the pr_memprof_callstacks_coexist_prep branch March 24, 2024 02:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants