Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dvc update: support worktree update #8649

Merged
merged 4 commits into from
Dec 6, 2022
Merged

Conversation

pmrowla
Copy link
Contributor

@pmrowla pmrowla commented Dec 1, 2022

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Will close #8382

  • When using a worktree remote, dvc update <target>... will update the specified targets to get the latest modifications from the remote
  • If a file is unmodified (and etags match), the version ID from the local .dvc file will not be modified even if the latest remote version ID does not match (to prevent merge conflicts and be consistent with dvc push for worktrees)
  • For a file output, if the file does not exist in the remote, dvc update will output a warning and will not make local modifications
  • For a dir output, if the directory does not exist in the remote, or is empty in the remote (and contains no actual files within the dir), dvc update will output a warning and will not make local modifications
  • For dir outputs that are not empty in the remote, all other file updates within the dir will be reflected in the update (i.e. newly added files, deleted files, modified files)
  • dvc update for worktrees will only download new and modified files from the remote (it does not re-download an entire remote directory like we do for updating imports)

Also note that dvc update will only update one target at a time in sequence. When passed multiple targets it updates one target at a time, rather than attempting to batch all of the updates together (this is the same behavior as updating imports with multiple targets)

@pmrowla pmrowla added A: data-sync Related to dvc get/fetch/import/pull/push A: cloud-versioning Related to cloud-versioned remotes labels Dec 1, 2022
@pmrowla pmrowla self-assigned this Dec 1, 2022
@codecov
Copy link

codecov bot commented Dec 1, 2022

Codecov Report

Base: 94.12% // Head: 93.98% // Decreases project coverage by -0.14% ⚠️

Coverage data is based on head (15e7acd) compared to base (74a13a4).
Patch coverage: 17.52% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8649      +/-   ##
==========================================
- Coverage   94.12%   93.98%   -0.15%     
==========================================
  Files         435      435              
  Lines       33463    33523      +60     
  Branches     4711     4729      +18     
==========================================
+ Hits        31496    31505       +9     
- Misses       1538     1587      +49     
- Partials      429      431       +2     
Impacted Files Coverage Ξ”
dvc/commands/update.py 88.88% <ΓΈ> (ΓΈ)
dvc/repo/worktree.py 10.08% <8.45%> (-0.18%) ⬇️
dvc/repo/update.py 50.00% <37.50%> (-21.43%) ⬇️
dvc/repo/index.py 95.60% <100.00%> (ΓΈ)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

β˜” View full report at Codecov.
πŸ“’ Do you have feedback about the report comment? Let us know in this issue.

@dberenbaum
Copy link
Contributor

dberenbaum commented Dec 1, 2022

Basic functionality looks good. Need to clarify a few flags:

  • --to-remote/--remote: I can't see how this makes sense. Let's disable it/throw an error for a cloud-versioned remote.
  • --no-download: It should update the .dvc file but not download locally (seems to ignore the flag right now).
  • --rev: For directories, it should throw an error. For files, it should ideally update to the specified revision, but this can be a follow-up PR/discussion (fine to throw an error for now).

@pmrowla
Copy link
Contributor Author

pmrowla commented Dec 2, 2022

  • --no-download: It should update the .dvc file but not download locally (seems to ignore the flag right now).

I'm not sure we can actually implement no-download right now with the way cloud versioning is currently implemented. It works for imports because the etag/version/url for the import source is stored separately as a dep, and we just completely remove the output information for the import stage entirely. The problem with supporting this for worktrees is that we have to collect both the cloud/remote etag/version and the local md5 (which requires downloading) for the newly updated output. I guess we will need to just clear the md5 field for worktree update --no-download and then have fetch account for that possibility.

I've been thinking about this a little bit and it seems like we probably need to change how we handle worktree remotes in the .dvc file. What we really have in this scenario is a local out, and then one more external data locations for each remote. The external locations function like an import's deps entry, but in this case you could have multiple possible locations (if you configure multiple remotes), and we can also write (push) to the external location as well (so it's not a read-only dependency).

related: #8356

  • --rev: For directories, it should throw an error. For files, it should ideally update to the specified revision, but this can be a follow-up PR/discussion (fine to throw an error for now).

I'm not sure supporting --rev makes sense for worktrees. It's not an import with a pinned dependency version. If my current workspace and the latest versions on a remote are always supposed to be mirrors of each other, it doesn't really make sense that I would update to some intermediate (non-latest) version from the remote.

@dberenbaum
Copy link
Contributor

Makes sense, @pmrowla! Can we throw an error that they are not supported for version_aware/worktree remotes for all those flags?

@pmrowla pmrowla merged commit f5699ce into iterative:main Dec 6, 2022
@pmrowla pmrowla deleted the worktree-update branch December 6, 2022 06:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: cloud-versioning Related to cloud-versioned remotes A: data-sync Related to dvc get/fetch/import/pull/push
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cloud versioning: pull non-DVC cloud updates
2 participants