Inlining of PathGlob expansion causes excess invalidation with the daemon #4394
Comments
This is primarily annoying when changing files in the root of the repo, which is relatively rare. And since it doesn't affect correctness, going to defer to |
@jsirois : One thing to be cautious of: @illicitonion will be changing this code a bit in order to integrate it with #4585. The memoization that is done in this codepath will continue to be necessary to provide good incremental performance in the context of the engine though, so that part will not change. @illicitonion : Worth being aware of this ticket, as it explains a bit of why the |
This will be resolved by #5106. |
assuming fixed by #5106 (not sure why github didn't auto-close based on the annotation) - thanks @illicitonion ! |
So, it looks like this is still an issue, in that touching a file in the root of the repository will invalidate the listing there, which will invalidate all `Snapshots`. Fixing #4558 seems like the least contorted way to resolve this, so will re-open this until we resolve #4558. cc @jsirois |
…nged (#6059)

### Problem

As described in #4558, we currently completely delete `Node`s from the `Graph` when their inputs have changed.

One concrete case where this is problematic is that all `Snapshots` in the graph end up with a dependency on the `scandir` outputs of all of their parent directories, because we need to expand symlinks recursively from the root when consuming a `Path` (in order to see whether any path component on the way down is a symlink). This means that changes anywhere above a `Snapshot` invalidate that `Snapshot`, and changes at the root of the repo invalidate _all_ `Snapshots` (although 99% of the syscalls they depend on are not invalidated, having no dependencies of their own).

But this case is just one of many cases affected by the current implementation: there are many other times where we re-compute more than we should due to the current `Node` invalidation strategy.

### Solution

Implement node "dirtying", as described on #4558.

There are a few components to this work:

* In addition to being `Entry::clear`ed (which will force a `Node` to re-run), a `Node` may be `Entry::dirty`ed. A "dirty" `Node` is eligible to be "cleaned" if its dependencies have not changed since it was dirtied.
* Each `Node` records a `Generation` value that acts as proxy for "my output has changed". The `Node` locally maintains this value, and when a Node re-runs for any reason (either due to being `dirtied` or `cleared`), it compares its new output value to its old output value to determine whether to increment the `Generation`.
* Each `Node` records the `Generation` values of the dependencies that it used to run, at the point when it runs. When a dirtied `Node` is deciding whether to re-run, it compares the previous generation values of its dependencies to their current dependency values: if they are equal, then the `Node` can be "cleaned": ie, its previous value can be used without re-running it.

This patch also expands the testing of `Graph` to differentiate dirtying a `Node` from clearing it, and confirms that the correct `Nodes` re-run in each of those cases.

### Result

Cleaning all `Nodes` involved in `./pants list ::` after touching `pants.ini` completes 6 times faster than recomputing them from scratch (56 seconds vs 336 seconds in our repository). More gains are likely possible by implementing the performance improvement(s) described on #6013.

Fixes #4558 and fixes #4394.
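
To make the dirtying scheme from the Solution section concrete, here is a minimal sketch of generation-based cleaning. It is not the engine's actual code: `Graph`, `Entry`, `State`, `invalidate`, `ensure` and the node names are hypothetical, node outputs are plain `u64`s, and leaf "syscall" nodes read their value from an `inputs` map. The changed leaf is cleared (forced to re-run) and its transitive dependents are dirtied; a dirtied node is cleaned without re-running if the generations of its dependencies match the ones it recorded.

```rust
use std::collections::HashMap;

// All names below are hypothetical; this is an illustration, not the pants engine API.
type NodeId = &'static str;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum State {
    Clean,
    Dirty,   // may be cleaned if dependency generations are unchanged
    Cleared, // must re-run
}

struct Entry {
    state: State,
    value: Option<u64>, // previous output, kept so a re-run can compare against it
    generation: u64,    // proxy for "my output has changed"
    deps: Vec<NodeId>,
    seen: Vec<u64>,     // dependency generations observed at the last run
    compute: Option<fn(&[u64]) -> u64>, // None marks a leaf that reads `inputs`
}

struct Graph {
    entries: HashMap<NodeId, Entry>,
    inputs: HashMap<NodeId, u64>,
    runs: Vec<NodeId>, // log of nodes that actually re-ran
}

impl Graph {
    fn new() -> Graph {
        Graph { entries: HashMap::new(), inputs: HashMap::new(), runs: Vec::new() }
    }

    fn insert(&mut self, id: NodeId, deps: Vec<NodeId>, compute: Option<fn(&[u64]) -> u64>) {
        let entry = Entry { state: State::Cleared, value: None, generation: 0, deps, seen: Vec::new(), compute };
        self.entries.insert(id, entry);
    }

    fn add_input(&mut self, id: NodeId, value: u64) {
        self.inputs.insert(id, value);
        self.insert(id, Vec::new(), None);
    }

    fn add_node(&mut self, id: NodeId, deps: Vec<NodeId>, compute: fn(&[u64]) -> u64) {
        self.insert(id, deps, Some(compute));
    }

    fn set_input(&mut self, id: NodeId, value: u64) {
        self.inputs.insert(id, value);
    }

    // Clear the changed node and dirty everything that transitively depends on it.
    fn invalidate(&mut self, changed: NodeId) {
        self.entries.get_mut(changed).unwrap().state = State::Cleared;
        let mut stack = vec![changed];
        while let Some(id) = stack.pop() {
            let dependents: Vec<NodeId> = self.entries.iter()
                .filter(|(_, e)| e.deps.contains(&id) && e.state == State::Clean)
                .map(|(k, _)| *k)
                .collect();
            for d in dependents {
                self.entries.get_mut(d).unwrap().state = State::Dirty;
                stack.push(d);
            }
        }
    }

    // Bring a node up to date and return its generation.
    fn ensure(&mut self, id: NodeId) -> u64 {
        let (state, deps, seen, compute) = {
            let e = &self.entries[id];
            (e.state, e.deps.clone(), e.seen.clone(), e.compute)
        };
        if state == State::Clean {
            return self.entries[id].generation;
        }
        let current: Vec<u64> = deps.iter().map(|d| self.ensure(*d)).collect();
        if state == State::Dirty && current == seen {
            // Dependencies did not change: reuse the previous value without re-running.
            self.entries.get_mut(id).unwrap().state = State::Clean;
            return self.entries[id].generation;
        }
        let new_value = match compute {
            Some(f) => {
                let args: Vec<u64> = deps.iter().map(|d| self.entries[*d].value.unwrap()).collect();
                f(&args)
            }
            None => self.inputs[id],
        };
        self.runs.push(id);
        let e = self.entries.get_mut(id).unwrap();
        if e.value != Some(new_value) {
            e.generation += 1; // only a changed output bumps the generation
            e.value = Some(new_value);
        }
        e.seen = current;
        e.state = State::Clean;
        e.generation
    }

    fn get(&mut self, id: NodeId) -> u64 {
        self.ensure(id);
        self.entries[id].value.unwrap()
    }
}

// Example compute function: combine dependency outputs by summing them.
fn sum(values: &[u64]) -> u64 {
    values.iter().sum()
}
```

In this sketch, invalidating a leaf whose re-run produces the same output (for example, a root listing after a file is touched but not added or removed) leaves its generation untouched, so a dependent `snapshot` node is cleaned rather than re-run. Changing the leaf's input bumps its generation and forces the dependent to re-run.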
Inlining of `PathGlobsExpansion` causes `Snapshots` to be more fragile than necessary. In particular, because `VFS::expand` recursively walks from the root down to the relevant directories without memoizing `PathGlobsExpansion`s, any change in the root invalidates all `Snapshots` from a changed path up to the root of the repo (but not the directory listings and stats).

This significantly affects warm performance in the presence of the daemon.

We should experiment with memoizing (portions of?) `PathGlobsExpansion` in the `Graph` (similar to what we do for `ReadLink`/`Scandir`), and see how much it affects cold/warm performance.

NB: As a wildcard: it's possible that fixing #4558 would be a better approach than increasing memoization would be.
The relevant codepath centers around:

* `pants/src/rust/engine/fs/src/lib.rs`, lines 570 to 576 in cdba694
* `pants/src/rust/engine/src/nodes.rs`, lines 913 to 940 in 01bcda6

The `VFS` in question is in `pants/src/rust/engine/src/nodes.rs` (line 73 in 01bcda6), which memoizes `ReadLink` and `Scandir` calls in the `Graph` (which later allows for filesystem invalidation).

The effect of memoizing `ReadLink` and `Scandir` (but none of the rest of the other steps of a `PathGlobsExpansion`) is that a `Snapshot` `Node` in the `Graph` is flat: the `Snapshot` depends directly on the syscalls, and to recompute it the entire `VFS::expand` call reruns.

When we fix this, it would be good to get a simple benchmark in place on the rust side. But it's also important to directly measure the performance of:

* `./pants list ::`
* `./pants list ::` after touching one file at the root of the repo
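
The flat dependency structure described above can be illustrated with a small sketch. It assumes, for illustration only, that a `Snapshot` of a path depends directly on the directory listing of every ancestor of that path, with the empty relative path standing for the repo root. The function names `scandir_deps` and `invalidated` are hypothetical and do not exist in the engine.

```rust
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

// Hypothetical helper: the directories whose listings must be read in order to
// resolve `path` from the repo root, checking each component on the way down.
// For a relative path, `ancestors()` ends with the empty path, used here as the root.
fn scandir_deps(path: &Path) -> BTreeSet<PathBuf> {
    path.ancestors().skip(1).map(|p| p.to_path_buf()).collect()
}

// Hypothetical helper: with flat dependencies, a snapshot is invalidated whenever
// the listing of any one of its ancestor directories is invalidated.
fn invalidated<'a>(snapshots: &[&'a Path], changed_dir: &Path) -> Vec<&'a Path> {
    snapshots
        .iter()
        .copied()
        .filter(|p| scandir_deps(p).contains(changed_dir))
        .collect()
}
```

Under these assumptions, invalidating the root listing (`Path::new("")`) selects every snapshot, while invalidating a deeper directory such as `src/rust` selects only the snapshots beneath it. This is the asymmetry that makes touching a file at the root of the repo the expensive case to measure.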