Skip to content

Conversation

@efiop
Copy link
Contributor

@efiop efiop commented Sep 7, 2020

This option allows run-cache to pull the cache for the outputs
when restoring a stage instead of pulling everything at once during
dvc pull --run-cache.

Fixes #4223

iterative/dvc.org#1841

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

@efiop efiop force-pushed the fix-4223 branch 3 times, most recently from 26beef8 to 7bc41fe Compare September 7, 2020 21:40
@peper0
Copy link
Contributor

peper0 commented Oct 1, 2020

Are there any time estimates on merging this feature? I'm waiting for it ;)

@efiop
Copy link
Contributor Author

efiop commented Oct 1, 2020

@peper0 Planning to get back to this in the upcoming weeks. Considering pulling by default to have a more intuitive and convenient behaviour. Thank you for your interest!

@efiop efiop force-pushed the fix-4223 branch 2 times, most recently from 3a42d6f to 4df15b3 Compare October 5, 2020 22:42
This option allows run-cache to pull the cache for the outputs
when restoring a stage instead of pulling everything at once during
`dvc pull --run-cache`.

Fixes iterative#4223
efiop added a commit to iterative/dvc.org that referenced this pull request Oct 5, 2020
@efiop efiop changed the title [WIP] repro: support --pull repro: support --pull Oct 5, 2020
@efiop
Copy link
Contributor Author

efiop commented Oct 5, 2020

Ok, i think it is better to move on with this explicit option and then we'll turn it on by default in the future.

shcheklein pushed a commit to iterative/dvc.org that referenced this pull request Oct 6, 2020
* docs: repro: add --pull

Per iterative/dvc#4538

* update message
@efiop efiop merged commit 7424f22 into iterative:master Oct 6, 2020
@efiop efiop added the feature is a feature label Oct 7, 2020
@jorgeorpinel
Copy link
Contributor

I just noticed this. Looks great but I have a concern:

dvc pull --run-cache has an option name that obviously refers to the run-cache. But dvc repro --pull doesn't seem like an intuitive name that would refer to the run-cache. To me it sounds like it's related to remote storage (which is confusing since repro and pull are basically alternatives). why not repro --run-cache (or something else)?

@jorgeorpinel
Copy link
Contributor

Another 2 alternatives: --pull-cache, --pull-run-cache

@efiop
Copy link
Contributor Author

efiop commented Oct 9, 2020

--pull-cache

This one sounds a bit tautological, as what else are we pulling anyway.

--pull-run-cache

This one doesn't correspond to what is going on, as run-cache itself is pulled in dvc pull --run-cache right now.

@jorgeorpinel Ideally both run-cache and cache pulling should be automatic with a --no-pull option to opt-out for both repro and checkout. Some variation of that will happen in the near future, as the overall experience is not yet polished. So let's just keep --pull for now, as it is not really critically different from suggested --pull-cache/run-cache.

@efiop efiop deleted the fix-4223 branch October 9, 2020 23:02
@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Oct 11, 2020

--pull-cache is a bit tautological, as what else are we pulling anyway.

What do you mean @efiop? You pull (fetch, really) from remote storage TO the cache right? Oh, do you mean we alwyas "pull the cache (from remote storage)"? Yeah true.

But do you agree that --pull is too vague/ confusing?

I think that just "pull" means to the user: "pull tracked data from remote storage (to the cache)" but we are trying to say "pull the run cache (from remote storage)", right?

--pull-run-cache doesn't correspond to what is going on, run-cache itself is pulled in dvc pull --run-cache right now.

I don't understand this. Didn't this change introduce a new way to pull the run-cache? And it's already merged so it's available rn. Can you please clarify what repro --pull does then? According to the PR it "allows run-cache to pull ... the outputs when restoring a stage (during repro) instead of ... dvc pull --run-cache." — which is why I suggested --pull-run-cache as an alternative: it's unambiguous it and uses all the right terms (pull and run-cache).

But if I'm missing something, do you have a suggestion for a better option name? Thanks

@efiop
Copy link
Contributor Author

efiop commented Oct 11, 2020

@jorgeorpinel dvc pull --run-cache fetches run-cache: small files in .dvc/cache/runs. dvc repro --pull pulls regular files, hashes for which might've been restored from the existing run-cache, so kinda like regular dvc pull, hence why --pull seems ok for now.

@jorgeorpinel
Copy link
Contributor

dvc repro --pull pulls regular files, hashes for which might've been restored from the existing run-cache, so kinda like regular dvc pull

Woah I don't understand this 😅 is there an example somewhere? Sorry...

Will comment on the docs PR actually, since it's more an issue with the explanation I think. Thanks!

@jorgeorpinel
Copy link
Contributor

Hello, me again!

I commented on iterative/dvc.org#1841 (review): is is possible to get an example of the usage of this option to double check the explanation? (Please reply on the docs PR is possible.)

Cc @skshetry (since you commented about this on the support chat recently)

Thanks in advance

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature is a feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make dvc pull --run-cache allow targets for a granular pull process

3 participants