-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pipeline: show: outs: eliminate extra edges in DAG #3217
pipeline: show: outs: eliminate extra edges in DAG #3217
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to connect outputs to deps, this is not changed here. The code still connects outs to outs.
@Suor I initially looked at the issue description on Discord and tried to reproduce the same issue. This fixes that diamond problem, where A generates B and C, and B doesn't go on to be a dependency of the ultimate target but is still in the graph. I'll try to reproduce the more general issue and see what fixes it. |
@Suor there. It's getting pretty nested and I could use itertools.product() to make those two for loops just one, but it's probably unnecessarily complicated. |
Wait, that implementation is incorrect. |
This should do it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we need a test to clear this out.
tests/func/test_pipeline.py
Outdated
) | ||
|
||
args = ["pipeline", "show", "--outs", "--dot", final_stage.path] | ||
with tmp_dir.chdir(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are already in tmp_dir, so no need to chdir. Or am I missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct! I didn't know about that. Updating.
tests/func/test_pipeline.py
Outdated
final_stage.path, commands=False, outs=True | ||
) | ||
|
||
assert set(nodes) == set(["final", "derived1", "base"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for converting list to set, set(["final", "derived1", "base"])
== {"final", "derived1", "base"}
tests/func/test_pipeline.py
Outdated
deps=["derived1"], outs=["final"], cmd="echo final > final" | ||
) | ||
|
||
args = ["pipeline", "show", "--outs", "--dot", final_stage.path] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are calling __build_graph
anyway, why would we provide those args? You can simply initialize command with empty list. :)
tests/func/test_pipeline.py
Outdated
@@ -98,6 +99,28 @@ def test_dot_commands(self): | |||
self.assertEqual(ret, 0) | |||
|
|||
|
|||
def test_disconnected_stage(tmp_dir, dvc, caplog): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
caplog seems to be unused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Little suggestions for tests.
Another thing, not necessarily in this change: @efiop, do you think that we should keep the name of the method __build_graph
? It has special meaning in python, and at the same time we do not leverage the functionality it provides (preventing method name collisions with child classes), so maybe its better to just replace it with _build_graph
and access it by command._build_graph?
Yeah, that hurts my eyes too, it was introduced a few years back by one of the contributors and we didn't touch that since. We should change it, but not necessarily in this PR (though, might as well) 🙂 |
@efiop that is very tactful of you, to say "one of our contributors" and not "you did that, remember?" :D |
Wait, I didn't mean that, though. I don't think it was you, was it? 👀 EDIT: oh, it was https://github.com/iterative/dvc/pull/1141/files . I didn't mean to shame anyone. But at least we are now clear that we shouldn't have done that, so that is what really matters. We've all come a long way 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some cosmetic things, but LGTM overall.
dvc/command/pipeline.py
Outdated
|
||
return nodes, edges, networkx.is_tree(G) | ||
if outs and not commands: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't this two mutually exclusive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes they seem to be!
I'll remove the |
I believe I've addressed all of the feedback. Additionally, I found a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @fabiosantoscode ! 🙏
This fixes extra nodes edges appearing in
pipeline show --outs
, by using networkx to traverse the graph instead of listing all edges.I spent a lot of time trying to figure out how to test this, so I thought I'd either leave it untested or take suggestions from the PR review.
Fixes #1744
❗ Have you followed the guidelines in the Contributing to DVC list?
📖 Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.
❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏