dvc import: misleading warnings and outputs #2839

Closed
3 of 4 tasks
dmpetrov opened this issue Nov 23, 2019 · 3 comments
Labels
p2-medium (Medium priority, should be done, but less important), research, ui (user interface / interaction)

Comments

@dmpetrov
Member

dmpetrov commented Nov 23, 2019

$ dvc -V
0.70.0+3de46d

One data file was updated in the imported repo. However, it was not the file I was trying to update.

$ dvc update data/data.xml.dvc
WARNING: DVC-file 'data/data.xml.dvc' changed.
WARNING: Stage 'data/data.xml.dvc' changed.
Importing 'data.xml (https://github.com/dmpetrov/dataset)' -> 'data/data.xml'
Output 'data/data.xml' didn't change. Skipping saving.
  • UI issue 1: Nothing should be imported, since there are no changes, yet the output still says: Importing 'data.xml (https://github.com/dmpetrov/dataset)' -> 'data/data.xml'
  • UI issue 2: The warning is duplicated: "DVC-file 'data/data.xml.dvc' changed." and "Stage 'data/data.xml.dvc' changed."

PS: The updated file was dir1/file2, which corresponds to the stage file dir1.dvc.

EDITED:

  • Issue 3 (not UI): It turned out that data/data.xml.dvc was actually modified by the update command: the repo checksum changed, together with the DVC-file checksum, even though the file that changed upstream belongs to dir1.dvc (whose checksum was not updated). Is this expected?

  • Issue 4 (not UI): When I update a stage that actually changed, I still see the "... didn't change" message, even though the outputs changed and the cache was updated. This also happens when I import a single artifact from a repo (I've checked in another project). Output:

$ dvc update dir1.dvc
WARNING: DVC-file 'dir1.dvc' changed.
WARNING: Stage 'dir1.dvc' changed.
Importing 'dir1 (https://github.com/dmpetrov/dataset)' -> 'dir1'
Output 'dir1' didn't change. Skipping saving.
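Issues 3 and 4 both come down to telling apart two kinds of change in an import stage file: the pinned revision of the source repo moving versus the imported output itself changing. One rough way to see which of the two `dvc update` produced is to compare copies of the .dvc file taken before and after the command. This is a minimal sketch, assuming the import .dvc layout with `deps -> repo -> rev_lock` and `outs -> md5`; it uses line-based awk parsing rather than a real YAML parser, and the helper names (`rev_lock_of`, `out_md5_of`, `classify_update`) and sample hashes are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers: classify what changed between two copies of an
# import stage file. Assumes deps -> repo -> rev_lock and outs -> md5 keys;
# parsing is line-based (awk), not a YAML parser.

# Print the first rev_lock value found in the file.
rev_lock_of() {
  awk '/rev_lock:/ { sub(/.*rev_lock: */, ""); print; exit }' "$1"
}

# Print the first md5 value that appears after the top-level "outs:" key.
out_md5_of() {
  awk '/^outs:/ { in_outs = 1 }
       in_outs && /md5:/ { sub(/.*md5: */, ""); print; exit }' "$1"
}

# Report whether the output changed, only the pinned revision moved, or nothing did.
classify_update() {
  local before="$1" after="$2"
  if [ "$(out_md5_of "$before")" != "$(out_md5_of "$after")" ]; then
    echo "output changed"
  elif [ "$(rev_lock_of "$before")" != "$(rev_lock_of "$after")" ]; then
    echo "rev_lock changed, output unchanged"
  else
    echo "no change"
  fi
}

# Demo with made-up values mirroring issue 3: the revision moves, the data does not.
tmp=$(mktemp -d)
cat > "$tmp/before.dvc" <<'EOF'
deps:
- path: data
  repo:
    url: ../remote_repo
    rev_lock: aaaa1111
outs:
- md5: 6137cde4
  path: data
EOF
sed 's/aaaa1111/bbbb2222/' "$tmp/before.dvc" > "$tmp/after.dvc"
classify_update "$tmp/before.dvc" "$tmp/after.dvc"   # prints: rev_lock changed, output unchanged
rm -rf "$tmp"
```

In a real project this could be used as `cp data.dvc /tmp/before.dvc && dvc update data.dvc && classify_update /tmp/before.dvc data.dvc`. A "rev_lock changed, output unchanged" result corresponds to the situation in issue 3, while "output changed" is the case where a "didn't change" message (issue 4) would be misleading.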
@dmpetrov dmpetrov added the ui user interface / interaction label Nov 23, 2019
@dmpetrov dmpetrov changed the title "dvc import without target file changes: misleading warnings and outputs" to "dvc import: misleading warnings and outputs" Nov 23, 2019
@pared
Contributor

pared commented Nov 25, 2019

Reproduction script:

#!/bin/bash

rm -rf repo remote_repo storage git_repo
mkdir repo remote_repo storage git_repo

maindir=$(pwd)
pushd remote_repo
git init >> /dev/null && dvc init -q

set -x
set -e
echo data >> data
echo data2 >> data2
dvc remote add -d str $maindir/storage
dvc add -q data data2
git add .dvc/config data.dvc .gitignore data2.dvc
git commit -m "add data"
dvc push -q

popd
pushd repo

git init >> /dev/null && dvc init -q
dvc import ../remote_repo data

popd
pushd remote_repo

#edit data2 only
echo data_new >> data2
dvc add data2
git add data2.dvc
git commit -m "modify data2"

dvc push

popd 
pushd repo
#data supposedly changed
dvc update data.dvc

@efiop efiop added the p1-important Important, aka current backlog of things to do label Nov 26, 2019
@pared pared added the research label Nov 26, 2019
@dberenbaum
Collaborator

This is much improved since the issue was opened.

The output of the script @pared provided in the old version was:

WARNING: DVC-file 'data.dvc' changed.
WARNING: Stage 'data.dvc' changed.
Importing 'data (../remote_repo)' -> 'data'
Output 'data' didn't change. Skipping saving.

Currently, it is:

Importing 'data (../remote_repo)' -> 'data'

It looks like everything other than issue 3 above has been addressed. I'm not sure what the expected behavior is when the repo has been updated but the specified file has not changed. @pared @efiop Any thoughts on this?

@dberenbaum dberenbaum added the p2-medium (Medium priority, should be done, but less important) label and removed the p1-important (Important, aka current backlog of things to do) label Feb 18, 2022
@efiop
Contributor

efiop commented Dec 8, 2023

Point 3 seems to be expected, since that's the level of granularity dvc update works at. Closing.

@efiop efiop closed this as completed Dec 8, 2023