pantsbuild · stuhood · Sep 1, 2020 · Aug 25, 2020 · Aug 20, 2020 · Aug 28, 2020
diff --git a/src/rust/engine/rule_graph/README.md b/src/rust/engine/rule_graph/README.md
@@ -33,7 +33,7 @@ If a user makes a request to the engine that does not have a corresponding `Quer
 
 `Params` are typed, comparable (`eq`/`hash`) values that represent both the inputs to `Rules`, and the building block of the runtime memoization key for a `Rule`. The set of `Params` (unique by type) that are consumed to create a `Rule`'s inputs (plus the `Rule`'s own identity) make up the memoization key for a runtime instance of the `Rule`.
 
-It's important to note though that the `Params` in a `Rule` instance's identity will _not_ always become the positional arguments to that `Rule`: in many cases, a `Param` will be consumed by a `Rule`'s dependencies in order to produce an output value that becomes either a positional argument to the `Rule` as it starts, or the result of a `Get` while a coroutine `Rule` runs.
+`Param`s are eventually used as positional args to `Rule`s, but it's important to note that the `Param`s in a `Rule` instance's identity/memoization-key will not always become the positional arguments to _that_ `Rule`: in many cases, a `Param` will be used by a `Rule`'s transitive dependencies in order to produce an output value that becomes either a positional argument to the `Rule` as it starts, or the result of a `Get` while a coroutine `Rule` runs.
 
 The `Param`s that are available to a `Rule` are made available by the `Rule`'s dependees (its "callers"), but similar to how `Rule`s are not called by name, neither are all of their `Param`s passed explicitly at each use site. A `Rule` will be used to compute the output value for a `DependencyKey`: i.e., a positional argument, `Get` result, or `Query` result. Of these usage sites, only `Query` specifies the complete set of `Params` that will be available: the other two usages (positional arguments and `Get`s) are able to use any Param that will be "in scope" at the use site.
 
@@ -63,7 +63,7 @@ The construction algorithm is broken up into phases:
 2. [live_param_labeled](https://github.com/pantsbuild/pants/blob/3a188a1e06d8c27ff86d8c311ff1b2bdea0d39ff/src/rust/engine/rule_graph/src/builder.rs#L749-L754) - Run [live variable analysis](https://en.wikipedia.org/wiki/Live_variable_analysis) on the polymorphic graph to compute the initial set of `Params` used by each node in the graph. Because nodes in the polymorphic graph have references to all possible sources of a particular dependency type, the computed set is conservative (i.e., overly large).
     * For example: if a `Rule` `x` has exactly one `DependencyKey`, but there are two potential dependencies to provide that `DependencyKey` with input `Param`s `{A,B}` and `{B,C}` (respectively), then at this phase the input `Param`s for `x` must be the union of all possibilities: `{A,B,C}`.
     * If we were to stop `RuleGraph` construction at this phase, it would be necessary to do a form of [dynamic dispatch](https://en.wikipedia.org/wiki/Dynamic_dispatch) at runtime to decide which source of a dependency to use based on the `Param`s that were currently in scope. And the sets of `Param`s used in the memoization key for each `Rule` would still be overly large, causing excess invalidation.
-3. [monomorphize](https://github.com/pantsbuild/pants/blob/3a188a1e06d8c27ff86d8c311ff1b2bdea0d39ff/src/rust/engine/rule_graph/src/builder.rs#L325-L353) - "Monomorphize" the polymorphic graph by using the in/out sets initialized during live variable analysis to partition nodes (and their dependees) for each valid combination of their dependencies. Unfortunately, this phase remains the most complicated: while it is implemented using an algorithm similar to live variable analysis, the fit isn't perfect, and so there is a special case to break out of fruitless loops: see the "splits" TODO in `fn monomorphize`.
+3. [monomorphize](https://github.com/pantsbuild/pants/blob/3a188a1e06d8c27ff86d8c311ff1b2bdea0d39ff/src/rust/engine/rule_graph/src/builder.rs#L325-L353) - "Monomorphize" the polymorphic graph by using the in/out sets initialized during live variable analysis to partition nodes (and their dependees) for each valid combination of their dependencies. Combinations of depedencies that would be invalid (because a dependency's in-set does not consume a provided `Param`, or consumes a `Param` that isn't present in the out-set) are not generated, which causes some pruning of the graph to happen during this phase. Unfortunately, this phase remains the most complicated: while it is implemented using an algorithm similar to live variable analysis, the fit isn't perfect, and so there is a special case to break out of fruitless loops: see the "splits" TODO in `fn monomorphize`.
     * Continuing the example from above: the goal of monomorphize is to create one copy of `Rule` `x` per legal combination of its `DependencyKey`. Assuming that both of `x`'s dependencies remain legal (i.e. that all of `{A,B,C}` are still in scope in the dependees of `x`, etc), then two copies of `x` will be created: one that uses the first dependency and has an in-set of `{A,B}`, and another that uses the second dependency and has an in-set of `{B,C}`.
 4. [prune_edges](https://github.com/pantsbuild/pants/blob/3a188a1e06d8c27ff86d8c311ff1b2bdea0d39ff/src/rust/engine/rule_graph/src/builder.rs#L836-L845) - Once the monomorphic graph has [converged](https://en.wikipedia.org/wiki/Data-flow_analysis#Convergence), each node in the graph will ideally have exactly one source of each `DependencyKey` (with the exception of `Query`s, which are not monomorphized). This phase validates that, and chooses the smallest input `Param` set to use for each `Query`. In cases where a node has more that one dependency per `DependencyKey`, it is because given a particular set of input `Params` there was more than one valid way to compute a dependency. This can happen either because there were too many `Param`s in scope, or because there were multiple `Rule`s with the same `Param` requirements.
     * This phase is the second phase that renders errors. If a node has too many sources of a `DependencyKey`, this phase will recurse to attempt to locate the node in the `Rule` graph where the ambiguity was introduced. Likewise, if a node has no source of a `DependencyKey`, this phase will recurse on deleted nodes (which are preserved by `monomorphize`) to attempt to locate the bottom-most `Rule` that was missing a `DependencyKey`.
@@ -83,6 +83,7 @@ async def fibonacci(n: int) -> Fibonacci:
     x, y = await Get(Fib, int(n - 2)), await Get(Fib, int(n - 1))
     return Fibonacci(x.val + y.val)
 ```
+
 ```
 @rule
 async def is_even(n: int) -> IsEven:
@@ -97,22 +98,79 @@ But as the set of rules in Pants has grown, more cycles have made their way into
 
 The special case mentioned above to break out of monomorphize if it is not converging allows us to successfully construct graphs for most of these dozen-node cases (relatively quickly: 4000 node visits to construct a 450 node output graph), but larger cases can take exponentially longer, and the special case causes us to break out with errors in cases that should be resolvable.
 
-### Issues
+### Example
 
-#### Monotonicity vs Global Information
+One minimal problematic testcase (`fn wide_cycle` in `src/rust/engine/rule_graph/src/tests.rs`) looks like:
+```
+@rule
+async def sao() -> A:
+    .. = Get(AWO, AS, ..)
+    .. = Get(AWO, FS, ..)
 
-It is unclear precisely what factors the monomorphize phase should use on a node-by-node basis to decide whether a node should be split or whether to noop. The implementation started out similar to live variable analysis, which is expected to be monotonic and converge (meaning that the in/out sets would get smaller/larger over time in a lattice). The [special casing we've introduced](https://github.com/pantsbuild/pants/blob/3a188a1e06d8c27ff86d8c311ff1b2bdea0d39ff/src/rust/engine/rule_graph/src/builder.rs#L386-L397) around "giving up" when a node would attempt a split that we had previously recorded attempts to use global information to decide that further splitting is fruitless, but it's possible that there is a cycle-safe way to inspect a set of nodes that have stabilized?
+@rule
+async def awofs(fs: FS) -> AWO:
+    ...
+
+@rule
+async def awoas(as: AS, a: A) -> AWO:
+    ...
+```
+... which results in a polymorphic graph that looks like:
+```
+// initial polymorphic:
+digraph {
+    0 [label="Query(A for )"]
+    1 [label="Rule(\"sao\", [DependencyKey(\"AWO\", Some(\"AS\")), DependencyKey(\"AWO\", Some(\"FS\"))])"]
+    2 [label="Rule(\"awofs\", [DependencyKey(\"FS\", None)])"]
+    3 [label="Rule(\"awoas\", [DependencyKey(\"AS\", None), DependencyKey(\"A\", None)])"]
+    4 [label="Param(AS)"]
+    5 [label="Param(FS)"]
+    0 -> 1 [label="DependencyKey(\"A\", None)"]
+    1 -> 2 [label="DependencyKey(\"AWO\", Some(\"AS\"))"]
+    1 -> 3 [label="DependencyKey(\"AWO\", Some(\"AS\"))"]
+    1 -> 2 [label="DependencyKey(\"AWO\", Some(\"FS\"))"]
+    1 -> 3 [label="DependencyKey(\"AWO\", Some(\"FS\"))"]
+    3 -> 4 [label="DependencyKey(\"AS\", None)"]
+    3 -> 1 [label="DependencyKey(\"A\", None)"]
+    2 -> 5 [label="DependencyKey(\"FS\", None)"]
+}
+```
 
-We've also explored converting our irreducible loops to reducible loops using [Janssen/Corporaal](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.46.3925&rep=rep1&type=pdf), but the reducible graph does not appear to converge more quickly.
+If you were to hand-solve this graph, you'd end up with:
+```
+// hand solved
+digraph {
+    0 [label="Query(A for )"]
+    1 [label="Rule(\"sao\", [DependencyKey(\"AWO\", Some(\"AS\")), DependencyKey(\"AWO\", Some(\"FS\"))])"]
+    2 [label="Rule(\"awofs\", [DependencyKey(\"FS\", None)])"]
+    3 [label="Rule(\"awoas\", [DependencyKey(\"AS\", None), DependencyKey(\"A\", None)])"]
+    4 [label="Param(AS)"]
+    5 [label="Param(FS)"]
+    0 -> 1 [label="DependencyKey(\"A\", None)"]
+    // Eliminated because the target does not consume `AS`.
+    1 -> 2 [label="DependencyKey(\"AWO\", Some(\"AS\"))" color="red"]
+    1 -> 3 [label="DependencyKey(\"AWO\", Some(\"AS\"))"]
+    1 -> 2 [label="DependencyKey(\"AWO\", Some(\"FS\"))"]
+    // Eliminated because although the target does transitively consume `FS`
+    // (via awoas->sao->awofs->FS) it consumes the instance that is provided on
+    // the `sao->awofs` edge, rather than the instance provided here.
+    1 -> 3 [label="DependencyKey(\"AWO\", Some(\"FS\"))" color="red"]
+    3 -> 4 [label="DependencyKey(\"AS\", None)"]
+    3 -> 1 [label="DependencyKey(\"A\", None)"]
+    2 -> 5 [label="DependencyKey(\"FS\", None)"]
+}
+```
+
+But (likely) due to how we special case (see the "splits" TODO) the breaking of cycles, the graph does not end up properly pruned.
 
-#### Pruning Choices Early
+### Possible Angles
 
-Because of how the in/out sets are adjusted during monomorphize, we're not able to prune the graph as we would be able to if we knew for certain that all of a node's transitive dependencies had already converged. As mentioned in the `Param`s section, we have a constraint that a "provided" `Param` (the input to a `Get`) must be consumed by a subgraph.
+#### Monotonicity vs Global Information
 
-But we can only apply these constraints "directly": i.e., when we're actually visiting a node with a `Get`, we _can_ eliminate dependencies whose in-sets (`Param`s transitively consumed by dependencies) show that the provided `Param` is not consumed in that subgraph. Likewise, we _can_ prune a direct dependency on a `Param` node if the out-set (`Param`s provided by transitive dependees) shows that it is not present.
+It is unclear precisely what factors the monomorphize phase should use on a node-by-node basis to decide whether a node should be split or whether to noop. The implementation started out similar to live variable analysis, which is expected to be monotonic and converge (meaning that the in/out sets would get smaller/larger over time in a lattice). The [special casing we've introduced](https://github.com/pantsbuild/pants/blob/3a188a1e06d8c27ff86d8c311ff1b2bdea0d39ff/src/rust/engine/rule_graph/src/builder.rs#L386-L397) around "giving up" when a node would attempt a split that we had previously recorded attempts to use global information to decide that further splitting is fruitless, but it's possible that there is a cycle-safe way to inspect a set of nodes that have stabilized?
 
-But we _cannot_ use the out-set to prune choices based on their in-sets in the current implementation. Both the in-set and out-set shrink as nodes are split, and it's possible that the in-set of a dependency will shrink later and become satisfiable. We can only use transitive information if we know that all reachable dependencies have already converged... and that is non-trivial in the presence of cycles. It's possible that nodes need to be in one of multiple states depending on whether they depend transitively on themselves? Or perhaps some property of a [dominator tree](https://en.wikipedia.org/wiki/Dominator_(graph_theory)) would help here? 
+We've also explored converting our irreducible loops to reducible loops using [Janssen/Corporaal](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.46.3925&rep=rep1&type=pdf), but the reducible graph does not appear to converge more quickly. It's possible that some property of a [dominator tree](https://en.wikipedia.org/wiki/Dominator_(graph_theory)) would help here? Being able to differentiate "back" edges might allow for better local information.
 
 #### Adjusting Phases
 
-It's possible that rather than generating a polymorphic graph as a first step, that there is a viable strategy for directly generating a monomorphic graph without regard for requirements, and then running live variable analysis on that before pruning. This would likely generate a much larger initial graph (because until live variable analysis, no pruning would be possible), but the simplicity might be worth it.
+It's possible that rather than generating a polymorphic graph as a first step, that there is a viable strategy for directly generating a monomorphic graph without regard for requirements, and then running live variable analysis on that before pruning. The challenge with this though seems to be the same as the challenge with dynamically splitting nodes during monomorphize: deciding how many times to split the nodes of a cycle.