
dvc update after importing an older version of a file #3336

Closed
elleobrien opened this issue Feb 14, 2020 · 9 comments
Labels: awaiting response, question

Comments

@elleobrien

elleobrien commented Feb 14, 2020

I'm working with this repository to import a data file. There are currently two tags at different commits, which you can see under "Releases".

When I run

$ dvc import --rev v.20.0 https://github.com/iterative/aita_dataset aita_clean.csv
I am able to download the first tagged version of the dataset. Then when I want to update the data file to its most recent version, I run

$ dvc update aita_clean.csv.dvc

nothing happens (the file size is unchanged, and I think the checksum is the same).

If, however, I go into aita_clean.csv.dvc and remove the line rev: v.20.0 from the contents below:

locked: true
deps:
- path: aita_clean.csv
  repo:
    url: https://github.com/iterative/aita_dataset
    rev: v.20.0
    rev_lock: c09235672eb7720f09e1cc4cc055f5c4f3f5286d
outs:
- md5: 0e5d7dbfc480cf7dcfaf5e341c3dde05
  path: aita_clean.csv
  cache: true
  metric: false
  persist: false

I am then able to successfully run dvc update and get the latest version of the file.

It seems that if I use dvc import to get an older version of a file, dvc update considers it to be current. Is that by design?

@efiop
Member

efiop commented Feb 14, 2020

@andronovhopf Did the tag move, though? When you remove rev, dvc uses the default branch instead, which is why it updates. But I suspect that the tag itself didn't move between your runs.

@elleobrien
Author

elleobrien commented Feb 14, 2020

@efiop I'm pretty sure the tag didn't move. Since the tags are on the default branch (master), I'm not sure I understand.

I'm still learning about tags. I think I assumed that, since the tags are all on the master branch, I could just "zoom forward" in time from a previous tag to the latest commit?

@efiop
Member

efiop commented Feb 15, 2020

@andronovhopf Ah, got it :) A tag is just an alias for a specific commit. If the tag didn't move, the commit is the same, so update sees that nothing has changed. Adding commits to master doesn't move tags unless you move them manually, so dvc is behaving correctly here. What you are probably looking for is dvc update --rev v.21.0, right? If so, we do have a ticket for it, but we've been kind of slacking on the implementation. 🙁 I'll try to create a quick patch to unblock you. For now, a workaround is to re-import with the new tag version, e.g.:

dvc import --rev v.21.0 https://github.com/iterative/aita_dataset aita_clean.csv
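To see why a tag alone never picks up new commits, here is a minimal sketch in a throwaway local Git repository. The repository, the file data.csv, and the commit messages are hypothetical stand-ins for aita_dataset; the tag name v.20.0 is reused only for readability. It shows that committing on the branch leaves the tag on its original commit, and that only an explicit git tag -f moves it.

```shell
#!/bin/sh
set -e

# Hypothetical scratch repo standing in for a dataset repository
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "demo"
git config user.email "demo@example.com"

# First version of the data, tagged (lightweight tag on this commit)
echo "v1" > data.csv
git add data.csv
git commit -q -m "first version"
git tag v.20.0
tagged=$(git rev-parse v.20.0)          # commit the tag resolves to

# New commit on the same branch; the tag is NOT updated
echo "v2" > data.csv
git commit -q -am "second version"
head=$(git rev-parse HEAD)
still_tagged=$(git rev-parse v.20.0)    # still the first commit

echo "tag v.20.0 -> $still_tagged"
echo "branch tip -> $head"

# Only a manual, forced re-tag moves the alias to the new commit
git tag -f v.20.0
moved=$(git rev-parse v.20.0)
echo "after git tag -f, v.20.0 -> $moved"
```

In the .dvc file above, rev holds the tag name and rev_lock holds a commit hash, which appears to be the commit the tag resolved to at import time. Since v.20.0 keeps resolving to the same commit, dvc update has nothing new to fetch.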

@efiop
Member

efiop commented Feb 15, 2020

@andronovhopf For the record: #2849. I've escalated the priority there. Also, here is some additional info on a very similar case: #2696.

@elleobrien
Author

elleobrien commented Feb 15, 2020 via email

@efiop
Member

efiop commented Feb 15, 2020

> In particular, what I was hoping for was dvc update to go to whatever the latest commit on the branch was, even if that commit doesn't have a tag, since dvc update --rev v.20.1 might not point to the most recent commit any longer. Does that make sense?

I do understand what you mean, but technically a tag might belong to multiple branches, so we don't really know which branch to follow. I suppose you were trying to emulate this type of workflow:

  1. work on aita_dataset (say we are on master, at commit 1111)
  2. import aita_clean.csv from aita_dataset from master
  3. work some more on aita_dataset, so that aita_dataset changes (we are still on master, but now at commit 2222)
  4. update aita_clean.csv to 2222

That is a totally valid case, and it works as is right now. But the way you are emulating it is incorrect: you don't move the tags, so the commit stays the same.

@elleobrien
Author

elleobrien commented Feb 15, 2020 via email

@efiop
Member

efiop commented Feb 15, 2020

@andronovhopf Sure, feel free to ping us about it 🙂 I'll close this ticket for now in favour of the ones mentioned above. Please feel free to reopen.

@efiop efiop closed this as completed Feb 15, 2020
@efiop
Member

efiop commented Feb 15, 2020

@andronovhopf FYI: thanks to outstanding work by @skshetry 🎖️, we now have dvc update --rev support and will release a new version with it very soon.
