cmd ref: improve run and commit / how to add outs/deps without re-running stage? #460

Closed
maximerischard opened this issue Jun 27, 2019 · 6 comments · Fixed by #1840 or #1914
Labels
A: docs (Area: user documentation, gatsby-theme-iterative)
C: ref (Content of /doc/*-reference)
type: enhancement (Something is not clear, small updates, improvement suggestions)

Comments

@maximerischard

maximerischard commented Jun 27, 2019

Often when I run a command with dvc run, I realise that I have forgotten to specify one of the outputs. I would therefore like to update the DVC file with an additional output, but without re-running the (potentially expensive) command.

With the help of @efiop on the Discourse forum, I was able to figure out that this can be achieved with the following steps:

  1. dvc run with the additional output and the --no-exec flag
  2. dvc commit to add the new output to the DVC cache, compute its checksum and add it to the DVC-file.
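The two steps above can be scripted. This is a minimal Python sketch, assuming `dvc` is on the PATH; the function name is hypothetical, and it uses only the `-d`, `-o` and `--no-exec` options of `dvc run` plus `dvc commit` with an optional target. Depending on the DVC version, re-declaring an existing stage may ask for confirmation or need an extra overwrite option, which is not shown here.

```python
import subprocess
from typing import List, Optional, Sequence


def add_deps_outs_without_rerun(
    command: Sequence[str],
    deps: Sequence[str] = (),
    outs: Sequence[str] = (),
    commit_target: Optional[str] = None,
    dry_run: bool = True,
) -> List[List[str]]:
    """Build (and optionally run) the two-step workflow from this issue.

    Hypothetical helper, not part of DVC:
    1. `dvc run --no-exec` re-declares the stage with the extra deps/outs
       without executing the (potentially expensive) command.
    2. `dvc commit` checksums the outputs already on disk and records them.
    """
    run_cmd = ["dvc", "run", "--no-exec"]
    for dep in deps:
        run_cmd += ["-d", dep]
    for out in outs:
        run_cmd += ["-o", out]
    run_cmd += list(command)  # the stage command goes last

    commit_cmd = ["dvc", "commit"]
    if commit_target is not None:
        commit_cmd.append(commit_target)  # limit the commit to one DVC-file

    steps = [run_cmd, commit_cmd]
    if not dry_run:
        for step in steps:
            subprocess.run(step, check=True)  # stop at the first failure
    return steps
```

For example, `add_deps_outs_without_rerun(["python", "train.py"], deps=["data.csv"], outs=["model.pkl", "features.csv"])` only returns the two argument lists; pass `dry_run=False` to actually execute them inside a DVC repository.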

This works perfectly, but looking at the documentation, it wasn't obvious that this is what dvc commit would do. In particular, the opening line reads “Record changes to the repository by updating DVC-files and saving outputs to cache.” It wasn't clear to me that “updating DVC-files” meant recomputing the checksums of the outputs.

In the step-by-step explanation of what dvc commit does:

What commit means is that DVC:

  • Computes a checksum for the file/directory.
  • Enters the checksum and file name into the DVC-file.
  • Tells the SCM to ignore the file/directory (e.g. add entry to .gitignore). Note that if the workspace was initialized with no SCM support (dvc init --no-scm), this does not happen.
  • Adds the file/directory to the DVC cache.
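To make the four bullets concrete, here is a toy Python model of committing a single file output. It is not DVC's implementation: the in-memory `stage` dict stands in for the DVC-file, the function names are hypothetical, directories are not handled, and the file is copied into the cache rather than linked as DVC may do. It does follow DVC in recording md5 checksums and in laying out the cache by the first two hex characters of the hash.

```python
import hashlib
import os
import shutil
from typing import Dict, List


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """md5 checksum of one file, read in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def commit_output(stage: Dict, out_path: str, repo_root: str,
                  cache_dir: str, use_scm: bool = True) -> str:
    """Toy model of the commit steps for ONE file output (hypothetical)."""
    full_path = os.path.join(repo_root, out_path)

    # 1. Compute a checksum for the file.
    checksum = file_md5(full_path)

    # 2. Enter the checksum and file name into the stage record,
    #    updating an existing entry or adding a new one.
    outs: List[Dict] = stage.setdefault("outs", [])
    for entry in outs:
        if entry["path"] == out_path:
            entry["md5"] = checksum
            break
    else:
        outs.append({"path": out_path, "md5": checksum})

    # 3. Tell the SCM to ignore the file (skipped when use_scm is False,
    #    mirroring a workspace initialized with `dvc init --no-scm`).
    if use_scm:
        gitignore = os.path.join(repo_root, ".gitignore")
        text = ""
        if os.path.exists(gitignore):
            with open(gitignore) as f:
                text = f.read()
        line = "/" + out_path
        if line not in text.splitlines():
            with open(gitignore, "a") as f:
                if text and not text.endswith("\n"):
                    f.write("\n")
                f.write(line + "\n")

    # 4. Copy the file into a content-addressed cache: <cache>/<md5[:2]>/<md5[2:]>.
    dest = os.path.join(cache_dir, checksum[:2], checksum[2:])
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    if not os.path.exists(dest):
        shutil.copyfile(full_path, dest)
    return checksum
```

Calling `commit_output` twice on an unchanged file leaves a single `outs` entry, a single `.gitignore` line and a single cache copy, which is why re-committing after `dvc run --no-exec` is harmless for outputs that were already recorded.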

I would suggest rewording the first bullet as “computes the checksum of each output file/directory, as well as the checksum of the DVC-file itself” (if my understanding is correct). The second bullet should read “enters the checksums of the outputs and of the DVC-file into the DVC-file”. I'm actually still unsure what is meant by “enters the file name”. Aren't all file names already present in the DVC-file?

UPDATE (From #612)
Dependencies can also be added to a stage without re-running it, using the same steps described above.

@jorgeorpinel changed the title from “Improve documentation of dvc commit / how to add output without re-running stage?” to “cmd ref: improve dvc run doc / how to add output without re-running stage?” Jun 27, 2019
@jorgeorpinel changed the title from “cmd ref: improve dvc run doc / how to add output without re-running stage?” to “cmd ref: improve run and commit / how to add output without re-running stage?” Jun 27, 2019
@ryokugyu
Contributor

Already started working on it, @jorgeorpinel.

@jorgeorpinel
Contributor

This would make a great note in the command descriptions and possibly a new example, at least in dvc run.

@shcheklein shcheklein added command-reference A: docs Area: user documentation (gatsby-theme-iterative) type: enhancement Something is not clear, small updates, improvement suggestions labels Jun 27, 2019
@shcheklein
Member

I agree. It could even be a section in the user guide: editing the pipeline.

And of course, dvc commit should be improved. @maximerischard you can make those changes easily with the Edit on GitHub button, btw :)

@dashohoxha dashohoxha mentioned this issue Oct 25, 2019
@imhardikj changed the title from “cmd ref: improve run and commit / how to add output without re-running stage?” to “cmd ref: improve run and commit / how to add output and dependencies without re-running stage?” Oct 30, 2020
@jorgeorpinel
Contributor

Per #612 (comment), this wasn't fully addressed by #1840. @imhardikj can you confirm which part of this is still pending? Thanks

@jorgeorpinel jorgeorpinel reopened this Nov 6, 2020
@imhardikj
Contributor

The "How to add missing dependency to existing pipeline." part is still remaining.
The same method as in https://dvc.org/doc/user-guide/how-to/add-output-to-stage will be used for this.
Should this be a separate doc or part of "add output to stage" doc itself?

@jorgeorpinel changed the title from “cmd ref: improve run and commit / how to add output and dependencies without re-running stage?” to “cmd ref: improve run and commit / how to add outs/deps without re-running stage?” Nov 9, 2020
@jorgeorpinel
Contributor

@imhardikj the same doc should be enough. I see you opened #1913, let's follow up there. Thanks
