Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADT] Update SCC iterator documentation #84268

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

zyd2001
Copy link
Contributor

@zyd2001 zyd2001 commented Mar 7, 2024

Current 'scc_iterator' uses a partial Tarjan's algorithm to find SCCs for CFG-like graph and does not work with generic graph with multiple entry points. The documentation only states it finds all SCCs using Tarjan's algorithm. This PR updates the doc to reflect that.

Old:
Current scc_iterator in LLVM performs DFS from only one entry node. It works fine for CFG, but for a more general graph with multiple "entry nodes" or has multiple unconnected components, you won't get expected result using scc_iterator on that graph. In my project I need to find SCCs in some kind of dependency graph and scc_iterator doesn't meet my need, so I decide to make a more general SCC iterator.
For this PR, I didn't change any original code, and I just add a new template iterator graph_scc_iterator in llvm/ADT/SCCIterator.h which implements Tarjan's algorithm. It basically has the same interface as scc_iterator but requires nodes_iterator from the graph.
The code is rough for now, if you think this is worth to be included, I can add more comments and update the code.

Copy link

github-actions bot commented Mar 7, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@zyd2001 zyd2001 marked this pull request as ready for review March 7, 2024 03:39
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 7, 2024

@llvm/pr-subscribers-llvm-adt

Author: Yida Zhang (zyd2001)

Changes

Current scc_iterator in LLVM performs DFS from only one entry node. It works fine for CFG, but for a more general graph with multiple "entry nodes" or has multiple unconnected components, you won't get expected result using scc_iterator on that graph. In my project I need to find SCCs in some kind of dependency graph and scc_iterator doesn't meet my need, so I decide to make a more general SCC iterator.
For this PR, I didn't change any original code, and I just add a new template iterator graph_scc_iterator in llvm/ADT/SCCIterator.h which implements Tarjan's algorithm. It basically has the same semantics as scc_iterator but requires nodes_iterator from the graph.
The code is rough for now, if you think this is worth to be included, I can add more comments to the code.


Full diff: https://github.com/llvm/llvm-project/pull/84268.diff

1 Files Affected:

  • (modified) llvm/include/llvm/ADT/SCCIterator.h (+140)
diff --git a/llvm/include/llvm/ADT/SCCIterator.h b/llvm/include/llvm/ADT/SCCIterator.h
index e743ae7c11edbc..57722789492726 100644
--- a/llvm/include/llvm/ADT/SCCIterator.h
+++ b/llvm/include/llvm/ADT/SCCIterator.h
@@ -31,6 +31,7 @@
 #include <iterator>
 #include <queue>
 #include <set>
+#include <stack>
 #include <unordered_map>
 #include <unordered_set>
 #include <vector>
@@ -377,6 +378,145 @@ scc_member_iterator<GraphT, GT>::scc_member_iterator(
   assert(InputNodes.size() == Nodes.size() && "missing nodes in MST");
   std::reverse(Nodes.begin(), Nodes.end());
 }
+
+template <class GraphT, class GT = GraphTraits<GraphT>>
+class graph_scc_iterator
+    : public iterator_facade_base<
+          graph_scc_iterator<GraphT, GT>, std::forward_iterator_tag,
+          const std::vector<typename GT::NodeRef>, ptrdiff_t> {
+  using NodeRef = typename GT::NodeRef;
+  using NodesIter = typename GT::nodes_iterator;
+  using ChildIter = typename GT::ChildIteratorType;
+  using reference = typename graph_scc_iterator::reference;
+
+  struct NodeEntry {
+    int index = 0;
+    int lowlink;
+    bool onStack = false;
+  };
+
+  struct StackEntry {
+    NodeRef n;
+    ChildIter currentChild;
+    NodeRef lastChild;
+    bool visited;
+  };
+
+  std::vector<NodeRef> currentSCC;
+  std::stack<StackEntry> recursionStack;
+  std::stack<NodeRef> S;
+  DenseMap<NodeRef, NodeEntry> nodeInfo;
+  int index = 1;
+
+  // current node in the outer loop for all nodes
+  NodesIter currentNode;
+  NodesIter nodeEndIter;
+
+  graph_scc_iterator(const GraphT &g, bool e = false)
+      : currentNode(GT::nodes_begin(&g)), nodeEndIter(GT::nodes_end(&g)) {
+    if (e) {
+      currentNode = nodeEndIter;
+      return;
+    }
+    computeNext();
+  }
+
+  void computeNext() {
+    currentSCC.clear();
+    if (recursionStack.empty()) {
+      // not in DFS process
+      for (; currentNode != nodeEndIter; currentNode++) {
+        NodeRef v = *currentNode;
+        if (nodeInfo[v].index == 0) {
+          recursionStack.emplace(
+              StackEntry{v, GT::child_begin(v), nullptr, false});
+          currentNode++;
+          break;
+        }
+      }
+    }
+
+    while (!recursionStack.empty()) {
+      StackEntry &stackEntry = recursionStack.top();
+      NodeRef v = stackEntry.n;
+      ChildIter &child = stackEntry.currentChild;
+      NodeEntry &vEntry = nodeInfo[v];
+      if (!stackEntry.visited) {
+        // first time
+        vEntry.index = index;
+        vEntry.lowlink = index;
+        index++;
+        S.push(v);
+        vEntry.onStack = true;
+        stackEntry.visited = true;
+      } else {
+        assert(nodeInfo.count(stackEntry.lastChild));
+        vEntry.lowlink =
+            std::min(vEntry.lowlink, nodeInfo[stackEntry.lastChild].lowlink);
+      }
+
+      bool inserted = false;
+      for (; child != GT::child_end(v); child++) {
+        NodeRef w = *child;
+        NodeEntry &wEntry = nodeInfo[w];
+        if (wEntry.index == 0) {
+          inserted = true;
+          recursionStack.emplace(
+              StackEntry{w, GT::child_begin(w), nullptr, false});
+          stackEntry.lastChild = w;
+          child++;
+          break;
+        } else if (wEntry.onStack) {
+          vEntry.lowlink = std::min(vEntry.lowlink, wEntry.index);
+        }
+      }
+
+      if (inserted)
+        continue;
+
+      recursionStack.pop();
+      if (vEntry.lowlink == vEntry.index) {
+        NodeRef w;
+        do {
+          w = S.top();
+          S.pop();
+          nodeInfo[w].onStack = false;
+          currentSCC.push_back(w);
+        } while (w != v);
+        return;
+      }
+    }
+  }
+
+public:
+  static graph_scc_iterator begin(const GraphT &g) {
+    return graph_scc_iterator(g);
+  }
+  static graph_scc_iterator end(const GraphT &g) {
+    return graph_scc_iterator(g, true);
+  }
+
+  bool operator!=(const graph_scc_iterator &x) const {
+    return currentNode != x.currentNode || currentSCC != x.currentSCC;
+  }
+
+  graph_scc_iterator &operator++() {
+    computeNext();
+    return *this;
+  }
+
+  reference operator*() const { return currentSCC; }
+};
+/// Construct the begin iterator for a deduced graph type T.
+template <class T> graph_scc_iterator<T> graph_scc_begin(const T &G) {
+  return graph_scc_iterator<T>::begin(G);
+}
+
+/// Construct the end iterator for a deduced graph type T.
+template <class T> graph_scc_iterator<T> graph_scc_end(const T &G) {
+  return graph_scc_iterator<T>::end(G);
+}
+
 } // end namespace llvm
 
 #endif // LLVM_ADT_SCCITERATOR_H

@zyd2001 zyd2001 changed the title SCC iterator for general graph [ADT] SCC iterator for general graph Mar 7, 2024
Comment on lines 499 to 502
bool operator!=(const graph_scc_iterator &x) const {
return currentNode != x.currentNode || currentSCC != x.currentSCC;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally prefer non-member operator overloads when possible (you can still make it an inline friend definition if having the definition inside the class definition is nicer/keeps related things close together, etc) - it ensures type conversions are equally considered for the LHS and RHS of the operator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it. Thanks.

Copy link
Member

@kuhar kuhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the PR description you said:

In my project I need to find SCCs in some kind of dependency graph and scc_iterator doesn't meet my need, so I decide to make a more general SCC iterator.

Does the llvm project need it though? Do you plan to upstream future code that requires general graphs, or know of some code that could be cleaned up using it?

@zyd2001
Copy link
Contributor Author

zyd2001 commented Mar 9, 2024

I don't have plan to upstream code for now. I'm not sure whether currently llvm project need this, but I do see in LazyCallGraph there is a generic implementation of Tarjan's DFS called buildGenericSCCs. Maybe this iterator can be used there.
Also, I believe this implementation can replace the old one since it is just more generic implementation and should produce the same result as the old scc_iterator.
However, I think the problem is that the current documentation and interface of SCC Iterator is misleading. It says "Enumerate the SCCs of a directed graph in reverse topological order of the SCC DAG" and the interface just accepts a graph. But it only processes the graph from a single entry node. My colleague and I used it to compute SCC on generic graphs without knowing it is not the correct way to use. We found this problem only after we got weird output and dig into the code.

@kuhar
Copy link
Member

kuhar commented Mar 9, 2024

@zyd2001 Thanks for the context. The way I see it is that it's a nice improvement but it comes at the expense of future maintenance. If there are not uses in the code base, it is likely to suffer from 'bit rot'.

However, I think the problem is that the current documentation and interface of SCC Iterator is misleading. It says "Enumerate the SCCs of a directed graph in reverse topological order of the SCC DAG" and the interface just accepts a graph.

What do you think about improving the documentation and/or adding an assertion instead?

@zyd2001
Copy link
Contributor Author

zyd2001 commented Mar 12, 2024

I think we can say in the doc that scc_iterator is for CFG like graph since it only performs Tarjar's DFS based algorithm from single entry point, so it won't produce correct result for generic graph.

@zyd2001
Copy link
Contributor Author

zyd2001 commented Apr 7, 2024

I reverted the change and modify the doc of scc_iterator.

Copy link
Member

@kuhar kuhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but please update the PR title and description before landing.

@zyd2001 zyd2001 changed the title [ADT] SCC iterator for general graph [ADT] Update SCC iterator documentation Apr 8, 2024
@zyd2001
Copy link
Contributor Author

zyd2001 commented Apr 8, 2024

I changed the title and description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants