-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RepoTree: work with dirty repos #4005
Conversation
The failing test might be related to #3991 CC @pmrowla https://travis-ci.com/github/iterative/dvc/jobs/346909743#L8052 |
dvc/remote/base.py
Outdated
if not ( | ||
tree.isdvc(path_info, strict=False) and tree.fetch | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what has been causing that test failure. We rely on a side-effect here, that might not work if the tree is dirty. So now we try to open first, which might download the cache, and then check if we indeed have the cache now and if not - try to save it to cache. CC @pmrowla
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks kind of strange now, since we're calling changed_cache
twice, before and after tree.open
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that relying on this side effect the way it was originally written is correct, since we should only be calling open
specifically when we need to download the cache file (which was checked when we call changed_cache
before calling open
)
With your changes to DvcTree/RepoTree
, but without this change to _save_file
, tests/func/test_update.py::test_update_import_after_remote_updates_to_dvc
fails, but I think that is a bug in the test.
If I make this change in that test:
with erepo_dir.branch("branch", new=False), erepo_dir.chdir():
erepo_dir.scm.repo.index.remove(["version"])
erepo_dir.dvc_gen("version", "updated")
- erepo_dir.scm.add(["version", "version.dvc"])
+ erepo_dir.scm.add(["version.dvc"])
erepo_dir.scm.commit("upgrade to DVC tracking")
new_rev = erepo_dir.scm.get_rev()
it passes with the original _save_file
implementation (but with your changes to prefer dirty version in DvcTree).
I think the issue is that scm.add()
calls git.index.add(<path>)
directly, which is not the same as git add <path>
. It looks like git.index.add
doesn't respect gitignores, so the original test actually makes both version
and version.dvc
into git-tracked files.
In that test, when we open("version")
to copy/fetch it into cache, repo.tree.exists("version")
should be false, since it's a DVC out from our clean erepo clone tree, which does not have any dirty/modified copy of version
at all. version.dvc
should exist in the git tree, but not .gitignored DVC out file version
.
I think the issue w/scm.add()
not respecting .gitignore can be verified with
def test_gittree_add_ignore(tmp_dir, scm):
tmp_dir.gen({"foo": "foo", ".gitignore": "foo"})
with tmp_dir.chdir():
scm.add(["foo", ".gitignore"])
scm.commit("init")
tree = scm.get_tree("HEAD")
assert not tree.exists(tmp_dir / "foo")
(test will fail in master)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great explanation! You are absolutely right, we don't have to touch that line for that particular test right now.
The double if changed_cache
is just our way of verifying the side-effect to tell if the cache was actually downloaded and there is nothing we need to do here.
Also, it seems like wiith the working tree we don't really go through that branch of execution (because of the if tree
), so my change might not even affect it. So that also might be something that is just not abstracted yet.
I'll remove that change in favor of fixing the test itself, and will keep it in mind for the future.
Thanks!
56378fc
to
7569772
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comments on remote/base.py
Currently `RepoTree.open` will throw FileNotFoundError if you have a file that is tracked by dvc, but which has no md5 or a cache for which is missing, even though you have the file itself in your workspace. This corresponds very nicely to the `--no-commit` or `--no-exec` scenario, where you might not want to commit your artifacts to cache just yet. Fixes iterative#3974
@@ -52,9 +52,9 @@ def test_update_import_after_remote_updates_to_dvc(tmp_dir, dvc, erepo_dir): | |||
new_rev = None | |||
with erepo_dir.branch("branch", new=False), erepo_dir.chdir(): | |||
erepo_dir.scm.repo.index.remove(["version"]) | |||
erepo_dir.dvc_gen("version", "updated") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #4015
Reverts an issue caused by iterative#4005
Reverts an issue caused by #4005
Currently
RepoTree.open
will throw FileNotFoundError if you have afile that is tracked by dvc, but which has no md5 or a cache for which
is missing, even though you have the file itself in your workspace.
This corresponds very nicely to the
--no-commit
or--no-exec
scenario, where you might not want to commit your artifacts to cache
just yet.
Fixes #3974
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here. If the CLI API is changed, I have updated tab completion scripts.
❌ I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)
Thank you for the contribution - we'll try to review it as soon as possible. 🙏