sync: skip if remote doesn't match output remote #10365

Merged: 2 commits, merged Mar 25, 2024

dberenbaum (Contributor)

Fixes #8298.

Take, for example, an output like:

outs:
- md5: d3b07384d113edec49eaa6238ad5ff00
  size: 4
  hash: md5
  path: foo
  remote: foo_remote

Previously, dvc push/pull/fetch -r bar_remote would still push/pull/fetch output foo to or from foo_remote. With this PR, these commands skip output foo.
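The difference between the two behaviors can be sketched in Python. This is a minimal model, assuming that each output carries only an optional remote name. Output, plan_old and plan_new are hypothetical names chosen for illustration, not DVC's internal API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Output:
    """Hypothetical stand-in for one entry under `outs:` in a .dvc file."""
    path: str
    remote: Optional[str] = None  # the optional per-output `remote:` field


def plan_old(outs: List[Output], cli_remote: Optional[str],
             default_remote: Optional[str]) -> List[Tuple[str, Optional[str]]]:
    # Behavior before the PR: a per-output remote always wins, and
    # -r/--remote only replaces the default remote.
    fallback = cli_remote or default_remote
    return [(o.path, o.remote or fallback) for o in outs]


def plan_new(outs: List[Output], cli_remote: Optional[str],
             default_remote: Optional[str]) -> List[Tuple[str, Optional[str]]]:
    # Behavior described in the PR: with -r given, outputs pinned to a
    # different remote are skipped instead of transferred.
    plan = []
    for o in outs:
        if cli_remote is None:
            # No -r: each output uses its own remote, else the default.
            plan.append((o.path, o.remote or default_remote))
        elif o.remote is None or o.remote == cli_remote:
            plan.append((o.path, cli_remote))
        # else: o.remote names another remote, so this output is skipped
    return plan


outs = [Output("foo", remote="foo_remote"), Output("bar")]
print(plan_old(outs, "bar_remote", "default"))  # [('foo', 'foo_remote'), ('bar', 'bar_remote')]
print(plan_new(outs, "bar_remote", "default"))  # [('bar', 'bar_remote')]
```

In this model, calling plan_new without a remote still sends foo to foo_remote. The two calls diverge only when -r names a remote that differs from an output's own remote.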

codecov bot commented Mar 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.69%. Comparing base (6f60388) to head (1830e95).

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #10365   +/-   ##
=======================================
  Coverage   90.68%   90.69%           
=======================================
  Files         501      501           
  Lines       38825    38853   +28     
  Branches     5620     5623    +3     
=======================================
+ Hits        35210    35238   +28     
  Misses       2968     2968           
  Partials      647      647           

skshetry (Member) commented Mar 22, 2024

As far as I remember, this is intentional. --remote only changes the default remote.

We expect dvc pull or dvc pull -r remote to pull all the necessary files. After this change, it might pull partially, right?

dberenbaum (Contributor, Author) commented Mar 22, 2024

> As far as I remember, this is intentional.

I think it is undefined. I don't see any discussion or tests that show it was intentional. Two pages of the docs seem to indicate that dvc will only pull from the specified remote.

> --remote only changes the default remote.

This PR doesn't change non-default remotes specified in .dvc files. It respects them by skipping those outputs when the user asks to pull only from a specific remote.

> We expect dvc pull or dvc pull -r remote to pull all the necessary files. After this change, it might pull partially, right?

Yes, and this is why I think it's a worthwhile change even if the current behavior is intended. Current behavior has no product benefit that I can see, while this change makes -r more useful. dvc pull already pulls from all the specified remotes, and to change the default remote you can easily modify the config or create a local config. However, there is no way to pull data only from a specific remote, which is what #8298 asks for. Is there any product reason we would not want this behavior?

shcheklein (Member)

The change (at least the direction) makes total sense to me. Thanks @dberenbaum!

skshetry (Member) commented Mar 25, 2024

> I think it is undefined. I don't see any discussion or tests that show it was intentional.

We have always treated dvc pull as an alias for dvc pull -r $(dvc remote default). From the user's perspective, the two have always been equivalent.

> Yes, and this is why I think it's a worthwhile change even if the current behavior is intended. Current behavior has no product benefit that I can see ...

I understand the motivation/use case. We had too many bugs last year with regard to remotes, and I am a bit worried about changing the semantics of a command at this time. Also, this change could be considered as an incompatible change(?). But I'll leave the decision up to you.

dberenbaum (Contributor, Author)

> Also, this change could be considered as an incompatible change(?).

It changes behavior, but I don't think we need to consider it a breaking change. Regardless of any internal understanding of the expected behavior, I don't see any docs or external understanding that the current behavior is expected rather than undefined, and indeed we have 2 people in #8298 who expected the opposite.

dberenbaum enabled auto-merge (rebase) on March 25, 2024 16:59
Linked issue (may be closed by merging this pull request): pull: how to only download data from a specified remote?
3 participants