Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Sample Profile Loader] Fix potential invalidated reference #73181

Merged
merged 2 commits into from
Nov 29, 2023

Conversation

huangjd
Copy link
Contributor

@huangjd huangjd commented Nov 22, 2023

There is a potential issue in ProfiledCallGraph where pointers to ProfiledCallGraphNode are used to construct edges, while ProfiledCallGraphNode instances are being added to a hash map ProfiledFunctions simultaneously. If rehash happens, those pointers are invalidated, resulting in undefined behavior/crash. Previously (before md5phase2) ProfiledFunctions is a llvm::StringMap, which also have the same issue theoretically when rehashing but was not observed. This patch fixes this potential issue by using a backing buffer for ProrfiledCallGraphNode that does not relocate.

@huangjd huangjd added the PGO Profile Guided Optimizations label Nov 22, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 22, 2023

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-pgo

Author: William Junda Huang (huangjd)

Changes

There is a potential issue in ProfiledCallGraph where pointers to ProfiledCallGraphNode are used to construct edges, while ProfiledCallGraphNode instances are being added to a hash map ProfiledFunctions simultaneously. If rehash happens, those pointers are invalidated, resulting in undefined behavior/crash. Previously (before md5phase2) ProfiledFunctions is a llvm::StringMap, which also have the same issue theoretically when rehashing but was not observed. This patch fixes this potential issue by using a backing buffer for ProrfiledCallGraphNode that does not relocate.


Full diff: https://github.com/llvm/llvm-project/pull/73181.diff

1 Files Affected:

  • (modified) llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h (+12-8)
diff --git a/llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h b/llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
index 3f986caeb547287..3d844cc92aa0a8e 100644
--- a/llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
+++ b/llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
@@ -140,8 +140,11 @@ class ProfiledCallGraph {
     if (!ProfiledFunctions.count(Name)) {
       // Link to synthetic root to make sure every node is reachable
       // from root. This does not affect SCC order.
-      ProfiledFunctions[Name] = ProfiledCallGraphNode(Name);
-      Root.Edges.emplace(&Root, &ProfiledFunctions[Name], 0);
+      // Store the pointer of the node because the map can be rehashed.
+      auto &Node =
+          ProfiledCallGraphNodeList.emplace_back(ProfiledCallGraphNode(Name));
+      ProfiledFunctions[Name] = &Node;
+      Root.Edges.emplace(&Root, ProfiledFunctions[Name], 0);
     }
   }
 
@@ -152,9 +155,9 @@ class ProfiledCallGraph {
     auto CalleeIt = ProfiledFunctions.find(CalleeName);
     if (CalleeIt == ProfiledFunctions.end())
       return;
-    ProfiledCallGraphEdge Edge(&ProfiledFunctions[CallerName],
-                               &CalleeIt->second, Weight);
-    auto &Edges = ProfiledFunctions[CallerName].Edges;
+    ProfiledCallGraphEdge Edge(ProfiledFunctions[CallerName],
+                               CalleeIt->second, Weight);
+    auto &Edges = ProfiledFunctions[CallerName]->Edges;
     auto EdgeIt = Edges.find(Edge);
     if (EdgeIt == Edges.end()) {
       Edges.insert(Edge);
@@ -193,7 +196,7 @@ class ProfiledCallGraph {
       return;
 
     for (auto &Node : ProfiledFunctions) {
-      auto &Edges = Node.second.Edges;
+      auto &Edges = Node.second->Edges;
       auto I = Edges.begin();
       while (I != Edges.end()) {
         if (I->Weight <= Threshold)
@@ -205,8 +208,9 @@ class ProfiledCallGraph {
   }
 
   ProfiledCallGraphNode Root;
-  HashKeyMap<std::unordered_map, FunctionId, ProfiledCallGraphNode>
-      ProfiledFunctions;
+  // backing buffer for ProfiledCallGraphNodes.
+  std::list<ProfiledCallGraphNode> ProfiledCallGraphNodeList;
+  llvm::DenseMap<FunctionId, ProfiledCallGraphNode*> ProfiledFunctions;
 };
 
 } // end namespace sampleprof

Copy link

github-actions bot commented Nov 22, 2023

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 227654e87117d7705c78e0ad1d4d9261bbc6af55 b10add58c92716b52027f229c73da6920c8cf1f0 -- llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
View the diff from clang-format here.
diff --git a/llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h b/llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
index 5381ada37f..eb1e69b718 100644
--- a/llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
+++ b/llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
@@ -155,8 +155,8 @@ private:
     auto CalleeIt = ProfiledFunctions.find(CalleeName);
     if (CalleeIt == ProfiledFunctions.end())
       return;
-    ProfiledCallGraphEdge Edge(ProfiledFunctions[CallerName],
-                               CalleeIt->second, Weight);
+    ProfiledCallGraphEdge Edge(ProfiledFunctions[CallerName], CalleeIt->second,
+                               Weight);
     auto &Edges = ProfiledFunctions[CallerName]->Edges;
     auto EdgeIt = Edges.find(Edge);
     if (EdgeIt == Edges.end()) {
@@ -210,7 +210,7 @@ private:
   ProfiledCallGraphNode Root;
   // backing buffer for ProfiledCallGraphNodes.
   std::list<ProfiledCallGraphNode> ProfiledCallGraphNodeList;
-  HashKeyMap<llvm::DenseMap, FunctionId, ProfiledCallGraphNode*>
+  HashKeyMap<llvm::DenseMap, FunctionId, ProfiledCallGraphNode *>
       ProfiledFunctions;
 };
 

Copy link
Contributor

@david-xl david-xl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how to prevent similar problems (adding while iterating)?

@huangjd
Copy link
Contributor Author

huangjd commented Nov 23, 2023

how to prevent similar problems (adding while iterating)?

There's no general solution, as this is a "feature" in C++ that the user is responsible for checking that. I searched other usages of hash maps in SampleProfile* to make sure there's no more use case like that (or use map which does not invalidate reference).

The problem can be detected with memory sanitizer.

Developers should be careful especially working with graphs, because it is likely to involve pointers onto a map.

@WenleiHe
Copy link
Member

Previously (before md5phase2) ProfiledFunctions is a llvm::StringMap, which also have the same issue theoretically when rehashing but was not observed.

This is again the consequence of you breaking reference stability with the new container. Before your change, there is no issue with StringMap -- it guarantees reference stability and rehash does not cause the underlying StringMapEntryBase to be reallocated. The code works with the assumption of reference stability.

ProfiledFunctions;
// backing buffer for ProfiledCallGraphNodes.
std::list<ProfiledCallGraphNode> ProfiledCallGraphNodeList;
llvm::DenseMap<FunctionId, ProfiledCallGraphNode*> ProfiledFunctions;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why moving away from HashKeyMap? Don't you still want the unification between MD5 and string name in this case?

@david-xl
Copy link
Contributor

Previously (before md5phase2) ProfiledFunctions is a llvm::StringMap, which also have the same issue theoretically when rehashing but was not observed.

This is again the consequence of you breaking reference stability with the new container. Before your change, there is no issue with StringMap -- it guarantees reference stability and rehash does not cause the underlying StringMapEntryBase to be reallocated. The code works with the assumption of reference stability.

In all fairness, the old code was probably written without reference stability in mind -- the StringMap might be implemented without that guarantee. I think making code not depending on that behavior can be more robust and immune to breakage in the future.

@WenleiHe
Copy link
Member

Previously (before md5phase2) ProfiledFunctions is a llvm::StringMap, which also have the same issue theoretically when rehashing but was not observed.

This is again the consequence of you breaking reference stability with the new container. Before your change, there is no issue with StringMap -- it guarantees reference stability and rehash does not cause the underlying StringMapEntryBase to be reallocated. The code works with the assumption of reference stability.

In all fairness, the old code was probably written without reference stability in mind -- the StringMap might be implemented without that guarantee. I think making code not depending on that behavior can be more robust and immune to breakage in the future.

I'm pretty sure that's not the case. FWIW, I remember instances of such cases where originally in prototype backing container is used just so we can have reference stability. But such backing container was later removed/optimized away because other container provide stability guarantee.

As for guarantees, LLVM is fairly explicit, document points out DenseMap doesn't have reference stability, and StringMap has it.

My main point is, certain changes are just riskier, and removing reference stability is one of them. Breakage is fine, let's just acknowledge it and fix forward, rather than keep trying to attribute breakages to some "existing bug".

@david-xl
Copy link
Contributor

Previously (before md5phase2) ProfiledFunctions is a llvm::StringMap, which also have the same issue theoretically when rehashing but was not observed.

This is again the consequence of you breaking reference stability with the new container. Before your change, there is no issue with StringMap -- it guarantees reference stability and rehash does not cause the underlying StringMapEntryBase to be reallocated. The code works with the assumption of reference stability.

In all fairness, the old code was probably written without reference stability in mind -- the StringMap might be implemented without that guarantee. I think making code not depending on that behavior can be more robust and immune to breakage in the future.

I'm pretty sure that's not the case. FWIW, I remember instances of such cases where originally in prototype backing container is used just so we can have reference stability. But such backing container was later removed/optimized away because other container provide stability guarantee.

As for guarantees, LLVM is fairly explicit, document points out DenseMap doesn't have reference stability, and StringMap has it.

My main point is, certain changes are just riskier, and removing reference stability is one of them. Breakage is fine, let's just acknowledge it and fix forward, rather than keep trying to attribute breakages to some "existing bug".

Many good points, thanks.

DenseMap does document invalidation of iterator on insertion, but StringMap does not document the opposite (though that property can be inferred from the implementation?): https://llvm.org/docs/ProgrammersManual.html#llvm-adt-stringmap-h

Having a backing container that guarantees the behavior (and with good unit test for it) to support exposing element address sounds like a good idea if it is so desired program pattern, but removing such dependency might be better.

@huangjd
Copy link
Contributor Author

huangjd commented Nov 28, 2023

Previously (before md5phase2) ProfiledFunctions is a llvm::StringMap, which also have the same issue theoretically when rehashing but was not observed.

This is again the consequence of you breaking reference stability with the new container. Before your change, there is no issue with StringMap -- it guarantees reference stability and rehash does not cause the underlying StringMapEntryBase to be reallocated. The code works with the assumption of reference stability.

In all fairness, the old code was probably written without reference stability in mind -- the StringMap might be implemented without that guarantee. I think making code not depending on that behavior can be more robust and immune to breakage in the future.

I'm pretty sure that's not the case. FWIW, I remember instances of such cases where originally in prototype backing container is used just so we can have reference stability. But such backing container was later removed/optimized away because other container provide stability guarantee.
As for guarantees, LLVM is fairly explicit, document points out DenseMap doesn't have reference stability, and StringMap has it.
My main point is, certain changes are just riskier, and removing reference stability is one of them. Breakage is fine, let's just acknowledge it and fix forward, rather than keep trying to attribute breakages to some "existing bug".

Many good points, thanks.

DenseMap does document invalidation of iterator on insertion, but StringMap does not document the opposite (though that property can be inferred from the implementation?): https://llvm.org/docs/ProgrammersManual.html#llvm-adt-stringmap-h

Having a backing container that guarantees the behavior (and with good unit test for it) to support exposing element address sounds like a good idea if it is so desired program pattern, but removing such dependency might be better.

After looking into the source code StringMap's implementation seems to have reference stability, however it is neither explicitly mentioned in (https://llvm.org/docs/ProgrammersManual.html) nor clearly stated in the source code's comments. In this case I think it's better to use a backing container, in case the implementation of the map is changed.

@huangjd huangjd merged commit 68106bd into llvm:main Nov 29, 2023
2 of 3 checks passed
@huangjd huangjd deleted the md5phase2_bug2 branch November 29, 2023 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:transforms PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants