
[DomTree] Reduce number of hash table lookups (NFC) #73097

Merged: 1 commit into llvm:main on Nov 27, 2023

Conversation

@nikic (Contributor) commented Nov 22, 2023

Inside runSemiNCA(), create a direct mapping from node number to node info, so we can save the node number -> node pointer -> node info lookup in many cases.

To allow this in more cases, change Label to a node number instead of node pointer.

I've limited this to runSemiNCA() for now, because we have the convenient property there that no new node infos will be added, so we don't have to worry about pointer invalidation.

This gives a pretty nice compile-time improvement: http://llvm-compile-time-tracker.com/compare.php?from=a3908d33b17cb9655d336039bf6a9bd798930eb4&to=ba6ac65bc746eecefe105a7bd11ad3dcaba0f246&stat=instructions%3Au
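
A minimal sketch of the shortened lookup chain, using the names from the patch (simplified, not a literal excerpt):

    // Before: two steps, the second one a DenseMap hash lookup.
    NodePtr V = NumToNode[i];        // node number -> node pointer
    InfoRec &VInfo = NodeToInfo[V];  // node pointer -> node info (hashes V)

    // After: a single vector index through a side table built once on
    // entry to runSemiNCA().
    InfoRec &VInfo2 = *NumToInfo[i]; // node number -> node info directly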

@llvmbot (Collaborator) commented Nov 22, 2023

@llvm/pr-subscribers-llvm-support

Author: Nikita Popov (nikic)

Full diff: https://github.com/llvm/llvm-project/pull/73097.diff

1 file affected:

  • (modified) llvm/include/llvm/Support/GenericDomTreeConstruction.h (+15-15)
diff --git a/llvm/include/llvm/Support/GenericDomTreeConstruction.h b/llvm/include/llvm/Support/GenericDomTreeConstruction.h
index d1adcad339c81b2..188f1b44903b3e5 100644
--- a/llvm/include/llvm/Support/GenericDomTreeConstruction.h
+++ b/llvm/include/llvm/Support/GenericDomTreeConstruction.h
@@ -65,7 +65,7 @@ struct SemiNCAInfo {
     unsigned DFSNum = 0;
     unsigned Parent = 0;
     unsigned Semi = 0;
-    NodePtr Label = nullptr;
+    unsigned Label = 0;
     NodePtr IDom = nullptr;
     SmallVector<NodePtr, 2> ReverseChildren;
   };
@@ -189,8 +189,7 @@ struct SemiNCAInfo {
 
       // Visited nodes always have positive DFS numbers.
       if (BBInfo.DFSNum != 0) continue;
-      BBInfo.DFSNum = BBInfo.Semi = ++LastNum;
-      BBInfo.Label = BB;
+      BBInfo.DFSNum = BBInfo.Semi = BBInfo.Label = ++LastNum;
       NumToNode.push_back(BB);
 
       constexpr bool Direction = IsReverse != IsPostDom;  // XOR.
@@ -237,8 +236,9 @@ struct SemiNCAInfo {
   //
   // For each vertex V, its Label points to the vertex with the minimal sdom(U)
   // (Semi) in its path from V (included) to NodeToInfo[V].Parent (excluded).
-  NodePtr eval(NodePtr V, unsigned LastLinked,
-               SmallVectorImpl<InfoRec *> &Stack) {
+  unsigned eval(NodePtr V, unsigned LastLinked,
+                SmallVectorImpl<InfoRec *> &Stack,
+                ArrayRef<InfoRec *> NumToInfo) {
     InfoRec *VInfo = &NodeToInfo[V];
     if (VInfo->Parent < LastLinked)
       return VInfo->Label;
@@ -247,17 +247,17 @@ struct SemiNCAInfo {
     assert(Stack.empty());
     do {
       Stack.push_back(VInfo);
-      VInfo = &NodeToInfo[NumToNode[VInfo->Parent]];
+      VInfo = NumToInfo[VInfo->Parent];
     } while (VInfo->Parent >= LastLinked);
 
     // Path compression. Point each vertex's Parent to the root and update its
     // Label if any of its ancestors (PInfo->Label) has a smaller Semi.
     const InfoRec *PInfo = VInfo;
-    const InfoRec *PLabelInfo = &NodeToInfo[PInfo->Label];
+    const InfoRec *PLabelInfo = NumToInfo[PInfo->Label];
     do {
       VInfo = Stack.pop_back_val();
       VInfo->Parent = PInfo->Parent;
-      const InfoRec *VLabelInfo = &NodeToInfo[VInfo->Label];
+      const InfoRec *VLabelInfo = NumToInfo[VInfo->Label];
       if (PLabelInfo->Semi < VLabelInfo->Semi)
         VInfo->Label = PInfo->Label;
       else
@@ -270,18 +270,20 @@ struct SemiNCAInfo {
   // This function requires DFS to be run before calling it.
   void runSemiNCA(DomTreeT &DT, const unsigned MinLevel = 0) {
     const unsigned NextDFSNum(NumToNode.size());
+    SmallVector<InfoRec *, 8> NumToInfo = {nullptr};
+    NumToInfo.reserve(NextDFSNum);
     // Initialize IDoms to spanning tree parents.
     for (unsigned i = 1; i < NextDFSNum; ++i) {
       const NodePtr V = NumToNode[i];
       auto &VInfo = NodeToInfo[V];
       VInfo.IDom = NumToNode[VInfo.Parent];
+      NumToInfo.push_back(&VInfo);
     }
 
     // Step #1: Calculate the semidominators of all vertices.
     SmallVector<InfoRec *, 32> EvalStack;
     for (unsigned i = NextDFSNum - 1; i >= 2; --i) {
-      NodePtr W = NumToNode[i];
-      auto &WInfo = NodeToInfo[W];
+      auto &WInfo = *NumToInfo[i];
 
       // Initialize the semi dominator to point to the parent node.
       WInfo.Semi = WInfo.Parent;
@@ -294,7 +296,7 @@ struct SemiNCAInfo {
         if (TN && TN->getLevel() < MinLevel)
           continue;
 
-        unsigned SemiU = NodeToInfo[eval(N, i + 1, EvalStack)].Semi;
+        unsigned SemiU = NumToInfo[eval(N, i + 1, EvalStack, NumToInfo)]->Semi;
         if (SemiU < WInfo.Semi) WInfo.Semi = SemiU;
       }
     }
@@ -304,8 +306,7 @@ struct SemiNCAInfo {
     // Note that the parents were stored in IDoms and later got invalidated
     // during path compression in Eval.
     for (unsigned i = 2; i < NextDFSNum; ++i) {
-      const NodePtr W = NumToNode[i];
-      auto &WInfo = NodeToInfo[W];
+      auto &WInfo = *NumToInfo[i];
       const unsigned SDomNum = NodeToInfo[NumToNode[WInfo.Semi]].DFSNum;
       NodePtr WIDomCandidate = WInfo.IDom;
       while (NodeToInfo[WIDomCandidate].DFSNum > SDomNum)
@@ -325,8 +326,7 @@ struct SemiNCAInfo {
     assert(NumToNode.size() == 1 && "SNCAInfo must be freshly constructed");
 
     auto &BBInfo = NodeToInfo[nullptr];
-    BBInfo.DFSNum = BBInfo.Semi = 1;
-    BBInfo.Label = nullptr;
+    BBInfo.DFSNum = BBInfo.Semi = BBInfo.Label = 1;
 
     NumToNode.push_back(nullptr);  // NumToNode[1] = nullptr;
   }
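
Why the restriction to runSemiNCA() matters: NodeToInfo is a DenseMap, and DenseMap gives no reference stability across insertions. A hypothetical standalone illustration (not from the patch):

    #include "llvm/ADT/DenseMap.h"

    void invalidationHazard(llvm::DenseMap<void *, int> &NodeToInfo,
                            void *A, void *B) {
      int *AInfo = &NodeToInfo[A]; // cache a pointer into the map
      NodeToInfo[B] = 42;          // inserting may rehash and move entries
      // *AInfo may now dangle. runSemiNCA() adds no new node infos, which
      // is what makes caching InfoRec pointers in NumToInfo safe there.
    }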

@@ -304,8 +306,7 @@ struct SemiNCAInfo {
     // Note that the parents were stored in IDoms and later got invalidated
     // during path compression in Eval.
     for (unsigned i = 2; i < NextDFSNum; ++i) {
-      const NodePtr W = NumToNode[i];
-      auto &WInfo = NodeToInfo[W];
+      auto &WInfo = *NumToInfo[i];
       const unsigned SDomNum = NodeToInfo[NumToNode[WInfo.Semi]].DFSNum;
@nikic (Contributor, Author) commented:
If you're wondering why I didn't change this to NumToInfo[WInfo.Semi] as well, the reason is that WInfo.Semi can be 0 for PDTs here. Because the PDT virtual root nullptr happens to be the same as the NumToNode sentinel, this ends up looking up the info for the virtual root.

I could easily work around this here, but I didn't do this because I think this indicates a (probably harmless) bug somewhere else, and Semi is supposed to be 1 (the virtual root) rather than 0 here.
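
Spelling out the aliasing, using the addVirtualRoot() initialization shown in the diff:

    // For a post-dominator tree, after addVirtualRoot():
    //   NumToNode[0] == nullptr   // the initial sentinel entry
    //   NumToNode[1] == nullptr   // the PDT virtual root
    // So a stray Semi of 0 and the intended Semi of 1 both resolve to
    // NodeToInfo[nullptr], i.e. the virtual root's info, which is why the
    // old NumToNode-based lookup happened to do the right thing.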

@nikic (Contributor, Author) commented:

I've put up a potential fix for this issue at #73148.

@kuhar (Member) left a comment:

Awesome, thanks for looking into this @nikic. I remember that a large percentage of the total domtree computation/update time has always been spent inside dense maps.

Just to confirm, here we are effectively trading more memory usage (one new vector) for better runtime performance (fewer hash map lookups)? Is the memory usage difference insignificant enough, or could we have someone with the opposite constraints come and suggest a reverse change?

@nikic (Contributor, Author) commented Nov 22, 2023

> Just to confirm, here we are effectively trading more memory usage (one new vector) for better runtime performance (fewer hash map lookups)? Is the memory usage difference insignificant enough, or could we have someone with the opposite constraints come and suggest a reverse change?

The additional memory usage here is basically one pointer per node and only used temporarily, so I don't think it will have any measurable impact on peak memory usage. (Far below the noise threshold for my max-rss measurements.)
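
(For scale, an illustrative figure: on a 64-bit host, one pointer per node means a function with 100,000 reachable blocks pays roughly 100,000 × 8 bytes = 800 KB, held only while the tree is (re)computed.)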

@dwblaikie (Collaborator) commented:

(wonder if this generic code could use unit tests - this change could then be tested with an observer hash function that counts the number of calls, for instance)

@kuhar (Member) left a comment:

Thanks for looking into this, the change makes sense to me, especially with the benchmark results linked.

I see @dwblaikie's point about demonstrating that this actually helps with cache lookups, but I think this hits the point of diminishing returns fairly quickly. In the past I tried to do something similar and count the number of DFS nodes visited as a proxy for overall performance. I quickly ran into problems with LLVM's Statistic class, because it's difficult to properly anchor it in highly generic code like this, and I ended up hacking it together locally and never upstreaming those counters.
With this kind of complexity in mind, I would side with David if this PR were not backed by any wall-time measurements, but it does improve wall time, so I don't have further concerns here.
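
For context, LLVM's Statistic counters are declared with the STATISTIC macro, which defines a static variable tied to the current DEBUG_TYPE; that works cleanly in a .cpp file but awkwardly in a template header like GenericDomTreeConstruction.h. An illustrative sketch (the counter name is hypothetical):

    #include "llvm/ADT/Statistic.h"

    #define DEBUG_TYPE "dom-tree-builder"
    // Fine in a .cpp file: one counter in one translation unit.
    STATISTIC(NumNodesVisited, "Number of DFS nodes visited");
    #undef DEBUG_TYPE

    // In a header included from many translation units, every TU would get
    // its own static copy of the counter, and defining DEBUG_TYPE in the
    // header would clash with the including file's own DEBUG_TYPE.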

@dwblaikie (Collaborator) commented:

> I see @dwblaikie's point about demonstrating that this actually helps with cache lookups, but I think this hits the point of diminishing returns fairly quickly.

I think if the data structure already had unit test infrastructure/coverage, it might be a more reasonable request (& that's part of the problem - the threshold to add that sort of infrastructure now is high (rather than when the data structure was added) & so any individual improvement/change usually doesn't motivate the work, unfortunately)

> In the past I tried to do something similar and count the number of DFS nodes visited as a proxy for overall performance. I quickly ran into problems with LLVM's Statistic class, because it's difficult to properly anchor it in highly generic code like this, and I ended up hacking it together locally and never upstreaming those counters.

Oh, yeah, I don't think I'd use the statistics stuff for this - I was thinking of a custom mock DomTree type, which could have a custom NodePtr type with a custom hasher that counts the number of times the type is hashed. A small unit test could use such a mock to count and check the number of hashes. (probably a bit brittle, but if the interface usage is small enough, it might be a sufficiently stable number to have OK signal/noise)

(no changes required from me - just idle thoughts in case someone feels sufficiently compelled to add some extra testing in here along these lines)
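
A rough sketch of that mock (the names CountingNode and HashCount are hypothetical; a real DomTree mock would need more plumbing):

    #include "llvm/ADT/DenseMapInfo.h"

    // A node key whose hashing is observable.
    struct CountingNode {
      void *Ptr;
    };

    static unsigned HashCount = 0; // bumped on every hash of a CountingNode

    template <> struct llvm::DenseMapInfo<CountingNode> {
      static CountingNode getEmptyKey() {
        return {llvm::DenseMapInfo<void *>::getEmptyKey()};
      }
      static CountingNode getTombstoneKey() {
        return {llvm::DenseMapInfo<void *>::getTombstoneKey()};
      }
      static unsigned getHashValue(const CountingNode &N) {
        ++HashCount; // the observable a unit test would assert on
        return llvm::DenseMapInfo<void *>::getHashValue(N.Ptr);
      }
      static bool isEqual(const CountingNode &A, const CountingNode &B) {
        return A.Ptr == B.Ptr;
      }
    };

    // A unit test could then construct a domtree over CountingNode keys and
    // check that HashCount stays within an expected bound.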

@nikic (Contributor, Author) commented Nov 25, 2023

I don't think adding tests of that kind would be appropriate -- we should only test public API behavior, not implementation details. How many hash lookups the domtree implementation performs is a pure implementation detail that we don't (and shouldn't) make any guarantees about. This isn't even something that affects the asymptotic computational complexity (which could be considered an API guarantee); it "only" affects the constant factor.

@dwblaikie (Collaborator) commented:

I get that it's somewhat an implementation detail (observable through the public API, but not guaranteed). On the whole, though, I think it could probably be a fairly stable test with useful signal; but maybe changes that affect the amount of hashing are more frequent, and it's a less stable property than I'm estimating.

@aeubanks (Contributor) commented:

I agree with nikic that this doesn't need an explicit test. "Regressions" against this sort of implementation detail should be monitored with compile-time tracking, not functional tests, since that's ultimately what we care about in the context of this change. IMO it's reasonable to assume that the existing domtree test coverage covers everything here, especially since this change didn't add any branches.

@nikic merged commit 553f885 into llvm:main on Nov 27, 2023
4 checks passed