Tolerate dependency cycles when using the v2 engine #10046

Eric-Arellano · 2020-06-13T03:08:05Z

Often, languages are able to handle dependency cycles. For example, in Python, you can have import cycles in certain situations.

But, Pants has never worked with cycles between targets, particularly when resolving transitive targets, e.g. dependencies --transitive.

Turns out, we can tolerate cycles by changing the algorithm to resolve transitive dependencies to use iteration, rather than recursion. The algorithm is inspired by BFS, but uses batching via MultiGet for better concurrency.

Any rules that use recursion will still fail. Right now, we don't have any. But, when implementing things like Java compilation, we will need to be careful that those too can handle dependency cycles.

Fails with v1

Even if we changed LegacyTransitiveHydratedTarget to tolerate cycles, some v1 goals like test would still fail because they expect an acyclic graph. So, to keep a good error message + to error more eagerly, we do not touch LegacyTransitiveHydratedTarget.

Remaining followup: warn on cycles

Some codebases may prefer to keep the current behavior of erroring on dep cycles. We should add an option that will allow you to ignore, warn, or error on dep cycles.

To be helpful, this error message must trace the full path of the cycle, e.g.

src/python/pants/util:strutil -> src/python/pants/util:strutil 
src/python/pants/util:strutil -> src/python/pants/util:dirutil -> src/python/pants/util:strutil

A great error message would collect all dep cycles before erroring, whereas the previous behavior before this PR would error upon the first cycle encountered.

Result on performance

Running multitime -n 10 ./v2 --no-enable-pantsd dependencies2 --transitive ::.

Before:

1: ./v2 --no-enable-pantsd dependencies2 --transitive ::
            Mean        Std.Dev.    Min         Median      Max
real        6.086       0.087       5.960       6.059       6.227
user        6.121       0.080       5.994       6.104       6.264
sys         3.846       0.076       3.686       3.889       3.928

After:

            Mean        Std.Dev.    Min         Median      Max
real        6.173       0.126       5.909       6.200       6.320
user        6.061       0.120       5.790       6.094       6.186
sys         3.829       0.100       3.628       3.856       3.949

# Delete this line to force CI to run Clippy and the Rust tests. [ci skip-rust-tests] # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]

Eric-Arellano · 2020-06-13T18:30:26Z

I tried getting the cycle detection to work and got the structure of the code, but the performance is twice as bad (13.3 user seconds for dependencies2 --transitive :: vs. 6.1).

The main intuition I had is that we would have a separate resolution for each distinct target root. Why? We need to know the precise path of the cycle. This requires keeping track of each distinct target root, then their dependencies, then their dependencies' dependencies, and so on, so that we can trace the full path.

This resulted in this code:

@dataclass(frozen=True)
class _TransitiveTargetRequest:
    target: Target


@dataclass(frozen=True)
class _TransitiveTarget:
    closure: FrozenOrderedSet[Target]
    cycles: FrozenOrderedSet[Tuple[Target, ...]]


@rule
async def transitive_target(request: _TransitiveTargetRequest) -> _TransitiveTarget:
    count = 0
    visited: Dict[int, FrozenOrderedSet[Target]] = {}
    queued = FrozenOrderedSet([request.target])
    while queued:
        visited[count] = queued
        count += 1
        direct_dependencies = await MultiGet(
            Get[Targets](DependenciesRequest(tgt.get(Dependencies))) for tgt in queued
        )
        queued = FrozenOrderedSet(itertools.chain.from_iterable(direct_dependencies)).difference(
            itertools.chain.from_iterable(visited.values())
        )
    # print({k: [tgt.address.spec for tgt in v] for k, v in visited.items()})
    return _TransitiveTarget(
        closure=FrozenOrderedSet(itertools.chain.from_iterable(visited.values())),
        cycles=FrozenOrderedSet(),
    )


@rule
async def transitive_targets(targets: Targets) -> TransitiveTargets:
    """Find all the targets transitively depended upon by the target roots.

    This uses iteration, rather than recursion, so that we can tolerate dependency cycles. Unlike a
    traditional BFS algorithm, we batch each round of traversals via `MultiGet` for improved
    performance / concurrency.
    """
    transitive_per_target_root = await MultiGet(
        Get[_TransitiveTarget](_TransitiveTargetRequest(tgt)) for tgt in targets
    )
    return TransitiveTargets(
        tuple(targets),
        FrozenOrderedSet(
            itertools.chain.from_iterable(tt.closure for tt in transitive_per_target_root)
        ),
    )

But this results in much worse performance. We need many more iterations and to revisit the same nodes multiple times.

--

Instead of having a global option like --cycle-error-behavior, how do you all feel about adding a backend pants.backend.project_info.lint.cycle_detection that hooks up to ./pants lint?

I like that it avoids adding yet another goal. It also avoids the annoyance of a warning showing up every single time you run ./pants test when you have --cycle-error-behavior=warn. You will only see the cycle when you explicitly run ./pants lint.

# Delete this line to force CI to run Clippy and the Rust tests. [ci skip-rust-tests] # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]

This reverts commit 9fe4c4e. Turns out, v1 cannot handle dep cycles when running `test`. [ci skip-rust-tests] [ci skip-jvm-tests]

This reverts commit 0622a95. [ci skip-rust-tests] [ci skip-jvm-tests]

Eric-Arellano · 2020-06-27T03:22:49Z

Stu is working on an alternative design that allows us to still tolerate cycles, but to go through a little more ceremony to do it so that people are very conscious when introducing cycles.

…in file-addresses. (#10409) ### Problem After more deeply investigating the issues with #10230 (demonstrated in #10393 and its revert), it doesn't seem like the right idea to rely on the engine's cycle detection (which the implementation of #10230 showed should primarily be used for deadlock detection) to expose high level cycle tolerance for the build graph. ### Solution Move to iterative transitive target expansion (à la #10046), and cycle detect afterward using DFS with a stack of visited entries. ### Result Fixes #10059, and closes #10229. A followup will back out portions of #10230 (but not all of it, as there were a number of other improvements mixed in). [ci skip-rust-tests]

Eric-Arellano added 4 commits June 12, 2020 18:24

Tolerate dependency cycles with v2

a744577

# Delete this line to force CI to run Clippy and the Rust tests. [ci skip-rust-tests] # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]

Tolerate dependency cycles with v1 too

0622a95

# Delete this line to force CI to run Clippy and the Rust tests. [ci skip-rust-tests] # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]

Add tests

407c1b4

# Delete this line to force CI to run Clippy and the Rust tests. [ci skip-rust-tests] # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]

Declare dep cycle in engine/

9fe4c4e

# Delete this line to force CI to run Clippy and the Rust tests. [ci skip-rust-tests] # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]

Eric-Arellano requested review from stuhood, benjyw, illicitonion and cosmicexplorer June 13, 2020 03:08

Eric-Arellano changed the title ~~Tolerate dependency cycles~~ Tolerate dependency cycles when using the v2 engine Jun 13, 2020

Eric-Arellano added 3 commits June 13, 2020 11:40

Fix fmt

5ad192e

# Delete this line to force CI to run Clippy and the Rust tests. [ci skip-rust-tests] # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]

Revert "Declare dep cycle in engine/"

c1a34aa

This reverts commit 9fe4c4e. Turns out, v1 cannot handle dep cycles when running `test`. [ci skip-rust-tests] [ci skip-jvm-tests]

Revert "Tolerate dependency cycles with v1 too"

4658736

This reverts commit 0622a95. [ci skip-rust-tests] [ci skip-jvm-tests]

Eric-Arellano closed this Jun 27, 2020

Eric-Arellano deleted the tolerate-cycles branch June 27, 2020 03:22

stuhood mentioned this pull request Jul 21, 2020

Implement cycle detection in transitive_targets, and tolerate cycles in file-addresses. #10409

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tolerate dependency cycles when using the v2 engine #10046

Tolerate dependency cycles when using the v2 engine #10046

Eric-Arellano commented Jun 13, 2020 •

edited

Eric-Arellano commented Jun 13, 2020 •

edited

Eric-Arellano commented Jun 27, 2020

Tolerate dependency cycles when using the v2 engine #10046

Tolerate dependency cycles when using the v2 engine #10046

Conversation

Eric-Arellano commented Jun 13, 2020 • edited

Fails with v1

Remaining followup: warn on cycles

Result on performance

Eric-Arellano commented Jun 13, 2020 • edited

Eric-Arellano commented Jun 27, 2020

Eric-Arellano commented Jun 13, 2020 •

edited

Eric-Arellano commented Jun 13, 2020 •

edited