Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pipeline: show: doesn't show all DAGs in the projects when ran without arguments #2392

Closed
drorata opened this issue Aug 13, 2019 · 17 comments · Fixed by #3905
Closed

pipeline: show: doesn't show all DAGs in the projects when ran without arguments #2392

drorata opened this issue Aug 13, 2019 · 17 comments · Fixed by #3905
Assignees
Labels
enhancement Enhances DVC p2-medium Medium priority, should be done, but less important ui user interface / interaction

Comments

@drorata
Copy link

drorata commented Aug 13, 2019

As per discussion with @pared we realized that the behavior of dvc pipeline show --ascii contradicts the documentation. Current behavior looks for Dvcfile and fails if it is not found.

The expected behavior of dvc pipeline show --ascii is that it will find all .dvc files in the workspace and plot all pipelines. It does make sense IMHO that if a .dvc is provided then the yielded graph is showing the steps up to the file's corresponding stage.

@pared
Copy link
Contributor

pared commented Aug 13, 2019

Some context
During update of help message for targets argument we introduced same help message for dvc pipeliens show as for other commands (add, checkout, commit). The problem is that it does not work similiar way as the other commands. As of today (0.54.1) if target has not been provided dvc pipelines show will look for Dvcfile. We should reconsider the behaviour, as Dvcfile is created in specific circumstances.

We need to either:

  • modify behaviour to look for last possible stage (as it is intuitive behaviour to show whole pipeline) and modify help message accordingly
  • roll back help message for pipelines show

@pared
Copy link
Contributor

pared commented Aug 13, 2019

In case of first approach we need to consider what to do in case like this:

             get_data.dvc
           /               \
   test_data.dvc      train_data.dvc

@drorata
Copy link
Author

drorata commented Aug 13, 2019

I think that the most natural behavior is:

  • if a target foo.dvc is provide then all steps preceding it will be shown
  • if no target is provided then all .dvc files will be discovered and the corresponding pipeline(s) will be plotted.

If this approach will be picked, then the case mentioned by @pared should yield:

             get_data.dvc
           /              
   test_data.dvc

assuming the target test_data.dvc was provided.

@pared
Copy link
Contributor

pared commented Aug 13, 2019

@drorata showing all paipelines seems reasonable in case of no target

@efiop efiop added enhancement Enhances DVC p3-nice-to-have It should be done this or next sprint p2-medium Medium priority, should be done, but less important and removed p3-nice-to-have It should be done this or next sprint labels Aug 14, 2019
@drorata
Copy link
Author

drorata commented Aug 21, 2019

Maybe I'm confused, but, consider the following example. Assume we have a data file data.txt:

foo
bar
hello
world

and we build the following steps:

dvc run -f split.dvc -d data.txt -o first_2.txt -o last_2.txt "head -n 2 data.txt > first_2.txt && tail -n 2 data.txt > last_2.txt"
dvc run -f add_ts.dvc -d first_2.txt -o first_2_w_ts.txt "cp first_2.txt first_2_w_ts.txt && echo $(date) | tee -a first_2_w_ts.txt"
dvc run -f add_sig.dvc -d last_2.txt -o last_2_w_sig.txt "cp last_2.txt last_2_w_sig.txt && echo 'this is my signature' >> last_2_w_sig.txt"
dvc run -f combine.dvc -d first_2_w_ts.txt -d last_2_w_sig.txt -o combined.txt "cat first_2_w_ts.txt last_2_w_sig.txt > combined.txt"
dvc run -f lines_count.dvc -d combined.txt -m lines_count "wc -l combined.txt > lines_count"

Now the result of

dvc pipeline show --ascii lines_count.dvc | less
dvc pipeline show --ascii combine.dvc | less
dvc pipeline show --ascii split.dvc | less

is always the same --- showing the whole pipeline. This is yet another behavior --- independently of the specified .dvc file the result is the overall pipeline.

@efiop
Copy link
Member

efiop commented Aug 21, 2019

@drorata dvc pipeline show --ascii stage.dvc is showing the full pipeline that stage.dvc is a part of. Our docs are not right. About the Dvcfile, pipeline show needs to know the target stage, so it could isolate a DAG that it belongs to and show it. Otherwise you might get a few disconnected DAGs, that are not trivial to visualize in any form. E.g. pipeline show(no flags) shows the order in which the pipeline will be ran, and so for multiple DAGs in your project we'll have to probably print it one after another with some newline between them to separate them. Same with --ascii, the navigation won't be trivial.

@efiop efiop changed the title dvc doesn't discover all stages when Dvcfile is omitted in the pipeline sub-command pipeline: show: doesn't show all DAGs in the projects when ran without arguments Aug 21, 2019
@shcheklein
Copy link
Member

Related #2391

@shcheklein
Copy link
Member

May be I'm confused as well, but I think it's reasonable to change the behavior to show only the part of the pipeline up to the target. What are the problems we can expect here, @efiop ?

@efiop
Copy link
Member

efiop commented Aug 21, 2019

@shcheklein yep, we could do that by default and have --full or something to behave like we do right now.

@drorata
Copy link
Author

drorata commented Aug 21, 2019

May be I'm confused as well, but I think it's reasonable to change the behavior to show only the part of the pipeline up to the target. What are the problems we can expect here, @efiop ?

Let's for the time being, forget about the discrepancy between the actual behavior and the docs and focus on the desired behavior. I also agree that if a target was provided, then the DAG containing this target should be rendered from the target (including it) all the way up to (all) the source(s). Obviously, if the target is a "last" step of the DAG, then (almost, see the next item) the whole DAG is to be rendered.

BTW, this raises a question what to do with "siblings" (a term which is not well defined in a DAG's context) of a target? I'd suggest sticking to a concise approach as I described above.

If a target is specified and the flag --full is used, then I guess the expected behavior is that the complete DAG containing the target will be rendered.

Another case is when a target is not provided at all. What would be rendered then? I believe it is desired to return all the possible DAGs. Indeed there could be several disconnected DAGs, but still, it is worthy to have them. Maybe, in this case, it won't be very helpful to actually render them, but having an explicit description of them (like dot) is great. As this is kind of complex case, it could be that

dvc pipeline show

would yield an error saying something like:

If you want to obtain all available DAGs run dvc pipeline show --all

An additional issue which was raised is the Dvcfile and how to handle it. Here I leave everything to you guys as I don't really understand when and how to use Dvcfiles.

@shcheklein shcheklein added the ui user interface / interaction label Aug 21, 2019
@drorata
Copy link
Author

drorata commented Sep 12, 2019

Isn't this related to #2453?

@shcheklein shcheklein changed the title pipeline: show: doesn't show all DAGs in the projects when ran without arguments [HACKTOBERFEST] pipeline: show: doesn't show all DAGs in the projects when ran without arguments Oct 3, 2019
@shcheklein shcheklein pinned this issue Oct 3, 2019
@shcheklein shcheklein unpinned this issue Nov 3, 2019
@shcheklein shcheklein changed the title [HACKTOBERFEST] pipeline: show: doesn't show all DAGs in the projects when ran without arguments pipeline: show: doesn't show all DAGs in the projects when ran without arguments Nov 21, 2019
@mastaer
Copy link

mastaer commented Dec 27, 2019

i would love this feature :)

@mastaer
Copy link

mastaer commented Dec 28, 2019

from dvc.repo import Repo as DVCRepo
dvcrepo = DVCRepo('.')
Gs = dvcrepo.pipelines
nodes = []
for G in Gs:
   nodes = nodes + [n for n in G.nodes()]
self.args.targets = nodes

, Right?

@efiop
Copy link
Member

efiop commented Dec 29, 2019

@mastaer Yes, that line is the cause. But the solution is a bit deeper, and probably would look like making _show show all of those stages when given None as a target. We would also probably need to separate those pipelines with some divider, so it would make more sense. And if we are adopting such behavior for pipeline show, we'll probably need to do the same for pipeline show --ascii and possibly other flags. Could go about it 1by1, if you would like to contribute a patch 🙂 but ideally need to adopt a common behavior.

@efiop
Copy link
Member

efiop commented Apr 29, 2020

Related #3661

@efiop efiop added this to To do in DVC Sprint 5 - 19 May 2019 via automation Apr 29, 2020
@drorata
Copy link
Author

drorata commented Apr 30, 2020

I this comment @efiop was not sure about the behavior of version 0.66. Here is a small example:

Build the following DAG (in a fresh environment with version 0.66 installed):

dvc run -d my_data.txt -o head.txt "head -n 2 my_data.txt > head.txt"
dvc run -d my_data.txt -o tail.txt "tail -n 3 my_data.txt > tail.txt"
dvc run -d tail.txt -o tail_count "wc -l tail.txt > tail_count"
dvc run -d head.txt -o head_count "wc -l head.txt > head_count"

Then:

# dvc pipeline show --ascii head.txt.dvc
Checking for updates...
                 .-----------------.
                 | my_data.txt.dvc |
                 `-----------------'
                 **               **
              ***                   ***
            **                         **
 .--------------.                  .--------------.
 | head.txt.dvc |                  | tail.txt.dvc |
 `--------------'                  `--------------'
         *                                 *
         *                                 *
         *                                 *
.----------------.                .----------------.
| head_count.dvc |                | tail_count.dvc |
`----------------'                `----------------'

Now, if you update to version 0.93 then:

+-----------------+
| my_data.txt.dvc |
+-----------------+
          *
          *
          *
  +--------------+
  | head.txt.dvc |
  +--------------+

is the output of dvc pipeline show --ascii head.txt.dvc.

@efiop efiop removed this from To do in DVC Sprint 5 - 19 May 2019 May 19, 2020
@efiop efiop added this to To do in DVC Sprint 19 May - 2 June 2019 via automation May 19, 2020
@efiop efiop self-assigned this May 27, 2020
@efiop efiop moved this from To do to In progress in DVC Sprint 19 May - 2 June 2019 May 27, 2020
@efiop efiop moved this from In progress to Review in progress in DVC Sprint 19 May - 2 June 2019 May 29, 2020
@efiop
Copy link
Member

efiop commented May 29, 2020

@drorata @mastaer Guys, we are looking into reworking this and have created a draft PR #3905 that introduces dvc dag command that tries to take into account all of the amazing feedback you've guys provided. Would appreciate if you could take a look and tell what you think. 🙏

DVC Sprint 19 May - 2 June 2019 automation moved this from Review in progress to Done May 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC p2-medium Medium priority, should be done, but less important ui user interface / interaction
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

5 participants