
[DomTree] Reduce number of hash table lookups (NFC) #73097

Merged: 1 commit into llvm:main on Nov 27, 2023

Conversation

@nikic (Contributor) commented Nov 22, 2023

Inside runSemiNCA(), create a direct mapping from node number to node info, so we can save the node number -> node pointer -> node info lookup in many cases.

To allow this in more cases, change Label to a node number instead of node pointer.

I've limited this to runSemiNCA() for now, because we have the convenient property there that no new node infos will be added, so we don't have to worry about pointer invalidation.

This gives a pretty nice compile-time improvement: http://llvm-compile-time-tracker.com/compare.php?from=a3908d33b17cb9655d336039bf6a9bd798930eb4&to=ba6ac65bc746eecefe105a7bd11ad3dcaba0f246&stat=instructions%3Au
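
A minimal sketch of the shortened lookup chain, using the names from the patch (simplified, not a literal excerpt):

    // Before: two steps, the second one a DenseMap hash lookup.
    NodePtr V = NumToNode[i];        // node number -> node pointer
    InfoRec &VInfo = NodeToInfo[V];  // node pointer -> node info (hashes V)

    // After: a single vector index through a side table built once on
    // entry to runSemiNCA().
    InfoRec &VInfo2 = *NumToInfo[i]; // node number -> node info directly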

@llvmbot (Collaborator) commented Nov 22, 2023

@llvm/pr-subscribers-llvm-support

Author: Nikita Popov (nikic)

Full diff: https://github.com/llvm/llvm-project/pull/73097.diff

1 file affected:

  • (modified) llvm/include/llvm/Support/GenericDomTreeConstruction.h (+15-15)
diff --git a/llvm/include/llvm/Support/GenericDomTreeConstruction.h b/llvm/include/llvm/Support/GenericDomTreeConstruction.h
index d1adcad339c81b2..188f1b44903b3e5 100644
--- a/llvm/include/llvm/Support/GenericDomTreeConstruction.h
+++ b/llvm/include/llvm/Support/GenericDomTreeConstruction.h
@@ -65,7 +65,7 @@ struct SemiNCAInfo {
     unsigned DFSNum = 0;
     unsigned Parent = 0;
     unsigned Semi = 0;
-    NodePtr Label = nullptr;
+    unsigned Label = 0;
     NodePtr IDom = nullptr;
     SmallVector<NodePtr, 2> ReverseChildren;
   };
@@ -189,8 +189,7 @@ struct SemiNCAInfo {
 
       // Visited nodes always have positive DFS numbers.
       if (BBInfo.DFSNum != 0) continue;
-      BBInfo.DFSNum = BBInfo.Semi = ++LastNum;
-      BBInfo.Label = BB;
+      BBInfo.DFSNum = BBInfo.Semi = BBInfo.Label = ++LastNum;
       NumToNode.push_back(BB);
 
       constexpr bool Direction = IsReverse != IsPostDom;  // XOR.
@@ -237,8 +236,9 @@ struct SemiNCAInfo {
   //
   // For each vertex V, its Label points to the vertex with the minimal sdom(U)
   // (Semi) in its path from V (included) to NodeToInfo[V].Parent (excluded).
-  NodePtr eval(NodePtr V, unsigned LastLinked,
-               SmallVectorImpl<InfoRec *> &Stack) {
+  unsigned eval(NodePtr V, unsigned LastLinked,
+                SmallVectorImpl<InfoRec *> &Stack,
+                ArrayRef<InfoRec *> NumToInfo) {
     InfoRec *VInfo = &NodeToInfo[V];
     if (VInfo->Parent < LastLinked)
       return VInfo->Label;
@@ -247,17 +247,17 @@ struct SemiNCAInfo {
     assert(Stack.empty());
     do {
       Stack.push_back(VInfo);
-      VInfo = &NodeToInfo[NumToNode[VInfo->Parent]];
+      VInfo = NumToInfo[VInfo->Parent];
     } while (VInfo->Parent >= LastLinked);
 
     // Path compression. Point each vertex's Parent to the root and update its
     // Label if any of its ancestors (PInfo->Label) has a smaller Semi.
     const InfoRec *PInfo = VInfo;
-    const InfoRec *PLabelInfo = &NodeToInfo[PInfo->Label];
+    const InfoRec *PLabelInfo = NumToInfo[PInfo->Label];
     do {
       VInfo = Stack.pop_back_val();
       VInfo->Parent = PInfo->Parent;
-      const InfoRec *VLabelInfo = &NodeToInfo[VInfo->Label];
+      const InfoRec *VLabelInfo = NumToInfo[VInfo->Label];
       if (PLabelInfo->Semi < VLabelInfo->Semi)
         VInfo->Label = PInfo->Label;
       else
@@ -270,18 +270,20 @@ struct SemiNCAInfo {
   // This function requires DFS to be run before calling it.
   void runSemiNCA(DomTreeT &DT, const unsigned MinLevel = 0) {
     const unsigned NextDFSNum(NumToNode.size());
+    SmallVector<InfoRec *, 8> NumToInfo = {nullptr};
+    NumToInfo.reserve(NextDFSNum);
     // Initialize IDoms to spanning tree parents.
     for (unsigned i = 1; i < NextDFSNum; ++i) {
       const NodePtr V = NumToNode[i];
       auto &VInfo = NodeToInfo[V];
       VInfo.IDom = NumToNode[VInfo.Parent];
+      NumToInfo.push_back(&VInfo);
     }
 
     // Step #1: Calculate the semidominators of all vertices.
     SmallVector<InfoRec *, 32> EvalStack;
     for (unsigned i = NextDFSNum - 1; i >= 2; --i) {
-      NodePtr W = NumToNode[i];
-      auto &WInfo = NodeToInfo[W];
+      auto &WInfo = *NumToInfo[i];
 
       // Initialize the semi dominator to point to the parent node.
       WInfo.Semi = WInfo.Parent;
@@ -294,7 +296,7 @@ struct SemiNCAInfo {
         if (TN && TN->getLevel() < MinLevel)
           continue;
 
-        unsigned SemiU = NodeToInfo[eval(N, i + 1, EvalStack)].Semi;
+        unsigned SemiU = NumToInfo[eval(N, i + 1, EvalStack, NumToInfo)]->Semi;
         if (SemiU < WInfo.Semi) WInfo.Semi = SemiU;
       }
     }
@@ -304,8 +306,7 @@ struct SemiNCAInfo {
     // Note that the parents were stored in IDoms and later got invalidated
     // during path compression in Eval.
     for (unsigned i = 2; i < NextDFSNum; ++i) {
-      const NodePtr W = NumToNode[i];
-      auto &WInfo = NodeToInfo[W];
+      auto &WInfo = *NumToInfo[i];
       const unsigned SDomNum = NodeToInfo[NumToNode[WInfo.Semi]].DFSNum;
       NodePtr WIDomCandidate = WInfo.IDom;
       while (NodeToInfo[WIDomCandidate].DFSNum > SDomNum)
@@ -325,8 +326,7 @@ struct SemiNCAInfo {
     assert(NumToNode.size() == 1 && "SNCAInfo must be freshly constructed");
 
     auto &BBInfo = NodeToInfo[nullptr];
-    BBInfo.DFSNum = BBInfo.Semi = 1;
-    BBInfo.Label = nullptr;
+    BBInfo.DFSNum = BBInfo.Semi = BBInfo.Label = 1;
 
     NumToNode.push_back(nullptr);  // NumToNode[1] = nullptr;
   }
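
Why the restriction to runSemiNCA() matters: NodeToInfo is a DenseMap, and DenseMap gives no reference stability across insertions. A hypothetical standalone illustration (not from the patch):

    #include "llvm/ADT/DenseMap.h"

    void invalidationHazard(llvm::DenseMap<void *, int> &NodeToInfo,
                            void *A, void *B) {
      int *AInfo = &NodeToInfo[A]; // cache a pointer into the map
      NodeToInfo[B] = 42;          // inserting may rehash and move entries
      // *AInfo may now dangle. runSemiNCA() adds no new node infos, which
      // is what makes caching InfoRec pointers in NumToInfo safe there.
    }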

@@ -304,8 +306,7 @@ struct SemiNCAInfo {
     // Note that the parents were stored in IDoms and later got invalidated
     // during path compression in Eval.
     for (unsigned i = 2; i < NextDFSNum; ++i) {
-      const NodePtr W = NumToNode[i];
-      auto &WInfo = NodeToInfo[W];
+      auto &WInfo = *NumToInfo[i];
       const unsigned SDomNum = NodeToInfo[NumToNode[WInfo.Semi]].DFSNum;
@nikic (Contributor, Author) commented:
If you're wondering why I didn't change this to NumToInfo[WInfo.Semi] as well, the reason is that WInfo.Semi can be 0 for PDTs here. Because the PDT virtual root nullptr happens to be the same as the NumToNode sentinel, this ends up looking up the info for the virtual root.

I could easily work around this here, but I didn't do this because I think this indicates a (probably harmless) bug somewhere else, and Semi is supposed to be 1 (the virtual root) rather than 0 here.
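
Spelling out the aliasing, using the addVirtualRoot() initialization shown in the diff:

    // For a post-dominator tree, after addVirtualRoot():
    //   NumToNode[0] == nullptr   // the initial sentinel entry
    //   NumToNode[1] == nullptr   // the PDT virtual root
    // So a stray Semi of 0 and the intended Semi of 1 both resolve to
    // NodeToInfo[nullptr], i.e. the virtual root's info, which is why the
    // old NumToNode-based lookup happened to do the right thing.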

@nikic (Contributor, Author) commented:

I've put up a potential fix for this issue at #73148.

@kuhar (Member) left a comment:

Awesome, thanks for looking into this @nikic. I remember that a large percentage of the total domtree computation/update time has always been spent inside dense maps.

Just to confirm, here we are effectively trading more memory usage (one new vector) for better runtime performance (fewer hash map lookups)? Is the memory usage difference insignificant enough, or could we have someone with the opposite constraints come and suggest a reverse change?

@nikic (Contributor, Author) commented Nov 22, 2023

> Just to confirm, here we are effectively trading more memory usage (one new vector) for better runtime performance (fewer hash map lookups)? Is the memory usage difference insignificant enough, or could we have someone with the opposite constraints come and suggest a reverse change?

The additional memory usage here is basically one pointer per node and only used temporarily, so I don't think it will have any measurable impact on peak memory usage. (Far below the noise threshold for my max-rss measurements.)
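
(For scale, an illustrative figure: on a 64-bit host, one pointer per node means a function with 100,000 reachable blocks pays roughly 100,000 × 8 bytes = 800 KB, held only while the tree is (re)computed.)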

@dwblaikie (Collaborator) commented:

(wonder if this generic code could use unit tests - this change could then be tested with an observer hash function that counts the number of calls, for instance)

@kuhar (Member) left a comment:

Thanks for looking into this, the change makes sense to me, especially with the benchmark results linked.

I see @dwblaikie's point about demonstrating that this actually helps with cache lookups, but I think this hits the point of diminishing returns fairly quickly. In the past I tried to do something similar and count the number of DFS nodes visited as a proxy for overall performance. I quickly ran into problems with LLVM's Statistic class, because it's difficult to properly anchor it in highly generic code like this, and I ended up hacking it together locally and never upstreaming those counters.
With this kind of complexity in mind, I would side with David if this PR were not backed by any wall-time measurements, but it does improve wall time, so I don't have further concerns here.
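
For context, LLVM's Statistic counters are declared with the STATISTIC macro, which defines a static variable tied to the current DEBUG_TYPE; that works cleanly in a .cpp file but awkwardly in a template header like GenericDomTreeConstruction.h. An illustrative sketch (the counter name is hypothetical):

    #include "llvm/ADT/Statistic.h"

    #define DEBUG_TYPE "dom-tree-builder"
    // Fine in a .cpp file: one counter in one translation unit.
    STATISTIC(NumNodesVisited, "Number of DFS nodes visited");
    #undef DEBUG_TYPE

    // In a header included from many translation units, every TU would get
    // its own static copy of the counter, and defining DEBUG_TYPE in the
    // header would clash with the including file's own DEBUG_TYPE.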

@dwblaikie (Collaborator) commented:

> I see @dwblaikie's point about demonstrating that this actually helps with cache lookups, but I think this hits the point of diminishing returns fairly quickly.

I think if the data structure already had unit test infrastructure/coverage, it might be a more reasonable request (& that's part of the problem - the threshold to add that sort of infrastructure now is high (rather than when the data structure was added) & so any individual improvement/change usually doesn't motivate the work, unfortunately)

> In the past I tried to do something similar and count the number of DFS nodes visited as a proxy for overall performance. I quickly ran into problems with LLVM's Statistic class, because it's difficult to properly anchor it in highly generic code like this, and I ended up hacking it together locally and never upstreaming those counters.

Oh, yeah, I don't think I'd use the statistics stuff for this - I was thinking of a custom mock DomTree type, which could have a custom NodePtr type with a custom hasher that counts the number of times the type is hashed. A small unit test could use such a mock to count and check the number of hashes. (probably a bit brittle, but if the interface usage is small enough, it might be a sufficiently stable number to have OK signal/noise)

(no changes required from me - just idle thoughts in case someone feels sufficiently compelled to add some extra testing in here along these lines)
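
A rough sketch of that mock (the names CountingNode and HashCount are hypothetical; a real DomTree mock would need more plumbing):

    #include "llvm/ADT/DenseMapInfo.h"

    // A node key whose hashing is observable.
    struct CountingNode {
      void *Ptr;
    };

    static unsigned HashCount = 0; // bumped on every hash of a CountingNode

    template <> struct llvm::DenseMapInfo<CountingNode> {
      static CountingNode getEmptyKey() {
        return {llvm::DenseMapInfo<void *>::getEmptyKey()};
      }
      static CountingNode getTombstoneKey() {
        return {llvm::DenseMapInfo<void *>::getTombstoneKey()};
      }
      static unsigned getHashValue(const CountingNode &N) {
        ++HashCount; // the observable a unit test would assert on
        return llvm::DenseMapInfo<void *>::getHashValue(N.Ptr);
      }
      static bool isEqual(const CountingNode &A, const CountingNode &B) {
        return A.Ptr == B.Ptr;
      }
    };

    // A unit test could then construct a domtree over CountingNode keys and
    // check that HashCount stays within an expected bound.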

@nikic (Contributor, Author) commented Nov 25, 2023

I don't think adding tests of that kind would be appropriate -- we should only test public API behavior, not implementation details. How many hash lookups the domtree implementation performs is a pure implementation detail that we don't (and shouldn't) make any guarantees about. This isn't even something that affects the asymptotic computational complexity (which could be considered an API guarantee); it "only" affects the constant factor.

@dwblaikie (Collaborator) commented:

I get that it's somewhat an implementation detail (observable through the public API, but not guaranteed). On the whole, though, I think it could probably be a fairly stable test with useful signal; but maybe changes that affect the amount of hashing are more frequent, and it's a less stable property than I'm estimating.

@aeubanks (Contributor) commented:

I agree with nikic that this doesn't need an explicit test. "Regressions" against this sort of implementation detail should be monitored with compile-time tracking, not functional tests, since that's ultimately what we care about in the context of this change. IMO it's reasonable to assume that the existing domtree test coverage covers everything here, especially since this change didn't add any branches.

@nikic merged commit 553f885 into llvm:main on Nov 27, 2023
4 checks passed