paths: allow finding paths between addresses that expand to multiple targets #19482
lgtm.
I liked the version with the logic in a dedicated rule, it made the goal rule easier to see what was going on end-to-end imo, but no big deal either way I think.
Thanks for taking a look, Andreas, very kind of you!
Would you prefer to have (A) OR (B) one for loop (with one rule that will find all paths for each ? I personally believe it's most readable in (B) with a single
(A) feels perhaps more efficient, as we do want to inline
…iple targets - add a rule to get paths between individual targets
…iple targets - add multiple rules to find paths
Alrighty, this is done then! Thank you so much for taking a look!
The `paths` goal currently expects a single `from` and a single `to` address. This works great for finding all paths in the dependency graph between two individual targets, e.g. a `python_source` or a `python_requirement`. Attempting to find all paths between multiple targets in a directory, however, would fail with an error:
This can be solved by running the `paths` goal multiple times, of course, by listing the targets in a directory and then using the `paths` goal in a for loop (Bash/Python). This is very expensive, however, as making a Pants call is known to have an overhead (pantsd/scheduler/etc). It's a whole lot cheaper to get all the paths in a single Pants call.
Being able to find all paths between multiple targets would help answer the following questions:
One can find out if the module is accessing the package by doing `dependencies --transitive` for the module of interest and seeing if there are any modules from the package of concern. If there's a single module listed, then doing just another manual `paths` invocation is fine. However, if there are multiple modules listed, you'd have to run `paths` for each of them, which is, again, very expensive.
For example, you may want to estimate the refactoring effort (to make two packages not depend on each other), and for this you need to see how many paths exist between two packages. Currently, obtaining the dependency graph for individual files in one shot (as `{moduleA: [deps], moduleB: [deps]}`) is not possible with Pants (I've solved this with a plugin), so you need to run dependencies transitively for every file of interest, which again has an overhead (one file - one Pants call). Running `pants paths --from=src/libraryA:: --to=src/appB::` would provide all the necessary information in one shot.
After the changes made, these commands produce identical results (i.e. it doesn't matter whether a directory or an address is passed, with single or double colon notation):
Running this command consistently takes about 45-50s with the nested for loops (the final suggested implementation).
I've experimented with using `MultiGet` with multiple rules (please see commits 2 and 3), but the performance was identical. I therefore suggest keeping the for loops in place for simplicity. Of course, I'm happy to switch to a full or partial rule-based implementation, if desired.
The paths are still printed in ascending order of length; however, they are grouped. I think if a user dumps multiple paths, the order no longer has any meaning, so we don't have to sort them in any particular way and should keep the current behavior in place.
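To illustrate the nested for loop idea and the grouped, length-ordered output described above, here is a minimal standalone sketch on a toy dependency graph. It is not the code from this PR and does not use the Pants rules API. The graph contents, the file names, and the helpers `all_paths` and `paths_between_groups` are hypothetical, and the sketch assumes that "grouped" means one group per from/to pair.

```python
from typing import Dict, Iterator, List

# Toy dependency graph in the {moduleA: [deps], moduleB: [deps]} shape.
# All file names here are made up for illustration.
Graph = Dict[str, List[str]]

GRAPH: Graph = {
    "src/libraryA/core.py": ["src/libraryA/base.py", "src/appB/util.py"],
    "src/libraryA/base.py": ["src/appB/util.py"],
    "src/appB/util.py": [],
}


def all_paths(graph: Graph, src: str, dst: str) -> Iterator[List[str]]:
    """Yield every simple (cycle-free) path from src to dst using an iterative DFS."""
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            yield path
            continue
        for dep in graph.get(node, []):
            if dep not in path:  # do not revisit a node already on this path
                stack.append((dep, path + [dep]))


def paths_between_groups(
    graph: Graph, froms: List[str], tos: List[str]
) -> List[List[List[str]]]:
    """Nested for loops over every (from, to) pair.

    Returns one group per pair that has at least one path; within a group,
    paths are sorted by ascending length. Groups themselves are not sorted.
    """
    grouped = []
    for src in froms:
        for dst in tos:
            found = sorted(all_paths(graph, src, dst), key=len)
            if found:
                grouped.append(found)
    return grouped


if __name__ == "__main__":
    groups = paths_between_groups(
        GRAPH,
        froms=["src/libraryA/core.py", "src/libraryA/base.py"],
        tos=["src/appB/util.py"],
    )
    for group in groups:
        for path in group:
            print(" -> ".join(path))
        print()
```

In this sketch, the whole graph is walked in a single process, which mirrors the motivation for answering a multi-target query in one call instead of once per pair.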
An example of the output:
As a little stress test experiment, I've run: