Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repro: Disable run-cache if --no-commit. #8718

Merged
merged 1 commit into from Jan 9, 2023

Conversation

daavoo
Copy link
Contributor

@daavoo daavoo commented Dec 20, 2022

No description provided.

@daavoo daavoo linked an issue Dec 20, 2022 that may be closed by this pull request
@daavoo daavoo force-pushed the 4428-dvc-repro-incorrectly-saving-directory-to-cache branch from ac9f77b to 80a3cc7 Compare December 20, 2022 18:27
@daavoo
Copy link
Contributor Author

daavoo commented Dec 20, 2022

@dberenbaum In the issue there was also a point about dvc repro -f should disable restoring the run or outputs. but I found this is already working as expected (unless I misunderstand the point)

@daavoo daavoo added the A: run-cache Related to the run-cache feature label Dec 20, 2022
@codecov
Copy link

codecov bot commented Dec 20, 2022

Codecov Report

Base: 93.54% // Head: 93.55% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (a1729ac) compared to base (0e5674f).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8718   +/-   ##
=======================================
  Coverage   93.54%   93.55%           
=======================================
  Files         457      457           
  Lines       36236    36237    +1     
  Branches     5259     5260    +1     
=======================================
+ Hits        33896    33900    +4     
+ Misses       1835     1832    -3     
  Partials      505      505           
Impacted Files Coverage Δ
dvc/stage/__init__.py 93.99% <100.00%> (+0.01%) ⬆️
tests/func/test_data_cloud.py 99.41% <100.00%> (ø)
tests/func/test_repro.py 99.04% <100.00%> (ø)
tests/unit/repo/experiments/conftest.py 95.31% <0.00%> (+4.68%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@daavoo daavoo force-pushed the 4428-dvc-repro-incorrectly-saving-directory-to-cache branch 2 times, most recently from dd52da5 to b8f964d Compare December 20, 2022 19:48
@daavoo daavoo requested a review from a team December 22, 2022 10:07
@dberenbaum
Copy link
Contributor

Thanks @daavoo! It's been a very long time since I looked into this issue, and I'm no longer sure about it, and I'm hesitant to put it in a minor version release.

Let's say I'm using experiments, and my training stage outputs both a model that's cached in dvc and metrics that aren't. With the recent name changes, we made it possible to have duplicate experiments, but it didn't seem so bad because duplicate stages would still be cached and skipped. With this PR, I will end up duplicating all the computation of this training stage.

It also makes dvc much more reliant on dvc.lock. Before this PR, It's possible to have a workflow where you treat dvc.lock as ephemeral and don't commit or share it. As long as you share the run-cache, it's still easy to recover everything. After this PR, there's no way to recover the outputs from a stage with cache: false outputs unless you have dvc.lock.

Should dvc be saving the git hash for objects that have cache: false instead of the dvc md5? In the typical case for cache: false where those files are being tracked by git, it seems like it's still useful to have the run-cache but there's no need to add those files to the dvc cache since they can be found in git objects.

What do you think?

@daavoo
Copy link
Contributor Author

daavoo commented Dec 27, 2022

What do you think?

I think there are a lot of things to discuss there 😅

WDYT about the first commit/part of the P.R. (repro: Disable run-cache if --no-commit)?

I think we can start by discussing/merging that and continue discussing cache: false behavior separately.

@daavoo daavoo force-pushed the 4428-dvc-repro-incorrectly-saving-directory-to-cache branch from b8f964d to eee8627 Compare December 27, 2022 16:23
@daavoo daavoo changed the title stage: Revisit run-cache usage for repro --no-commit and cache: false outputs. stage: Revisit run-cache usage for repro --no-commit Dec 27, 2022
@daavoo daavoo changed the title stage: Revisit run-cache usage for repro --no-commit repro: Disable run-cache if --no-commit. Dec 27, 2022
@daavoo daavoo force-pushed the 4428-dvc-repro-incorrectly-saving-directory-to-cache branch from eee8627 to 47130ad Compare December 27, 2022 18:06
@daavoo daavoo force-pushed the 4428-dvc-repro-incorrectly-saving-directory-to-cache branch from 47130ad to a03270a Compare January 3, 2023 11:24
@daavoo daavoo force-pushed the 4428-dvc-repro-incorrectly-saving-directory-to-cache branch from a03270a to 656aaea Compare January 9, 2023 10:46
@daavoo daavoo enabled auto-merge (rebase) January 9, 2023 10:46
@daavoo daavoo added the enhancement Enhances DVC label Jan 9, 2023
As described in #4428 (comment).

Add `run_cache` argument to `Stage.save`. Set it to `False` if `no_commit`.
@daavoo daavoo force-pushed the 4428-dvc-repro-incorrectly-saving-directory-to-cache branch from 656aaea to a1729ac Compare January 9, 2023 11:21
@daavoo daavoo merged commit 1d5de9c into main Jan 9, 2023
@daavoo daavoo deleted the 4428-dvc-repro-incorrectly-saving-directory-to-cache branch January 9, 2023 11:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: run-cache Related to the run-cache feature enhancement Enhances DVC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DVC Repro incorrectly saving directory to cache
2 participants