repo: move dvcignore from repo to tree #2974
b238c05 to 90e39c6
Note to self: what about resetting dvcignore of CleanTree?
dvc/repo/add.py
Outdated
Why not use repo.tree.walk?
dvc/repo/brancher.py
Outdated
Guess need to wrap tree from down below too, right?
@pared ^
@efiop Do you mean the one obtained with get_tree?
@pared Yes. Will fix https://github.com/iterative/dvc/pull/2974/files#r360827242 too
dvc/scm/tree.py
Outdated
Maybe better keep it in the dvcignore.py? Not a strong opinion, just wondering, since it is more of a wrapper than a thing that is actually related to scm.
dvc/scm/tree.py
Outdated
Still using dvc_walk with dvcignore, which you probably shouldn't since you have CleanTree now. Also, maybe we could get rid of dvc_walk altogether, since we now have CleanTree?
dvc/state.py
Outdated
Maybe we could pass it a tree instead? Seems to be natural. Maybe I'm missing something though.
The principles of the CleanTree approach are:
- all the code outside the dvcignore module should know only about CleanTree, i.e. it should not know about filter, etc. The only way to use it is to wrap a tree that has .walk() with a CleanTree, then use its public methods.
- .walk() and .walk_files() should never accept dvcignore
- there should be no walk_files() nor dvc_walk() functions, only methods
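The wrapping principle in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the WorkingTree stand-in, the constructor signature, and the use of fnmatch on bare names are all assumptions made to keep the example self-contained.

```python
import fnmatch
import os


class WorkingTree:
    """Hypothetical stand-in for a tree that walks the local filesystem."""

    def walk(self, top):
        # os.walk yields (root, dirs, files); dirs may be pruned in place.
        yield from os.walk(top)


class CleanTree:
    """Hypothetical wrapper: callers see only filtered results."""

    def __init__(self, tree, patterns):
        self.tree = tree  # the wrapped tree knows nothing about ignores
        self.patterns = list(patterns)  # assumed: plain glob patterns

    def _ignored(self, name):
        return any(fnmatch.fnmatch(name, p) for p in self.patterns)

    def walk(self, top):
        # Note: no dvcignore argument; filtering is a property of the wrapper.
        for root, dirs, files in self.tree.walk(top):
            # Prune in place so the wrapped walk does not descend further.
            dirs[:] = [d for d in dirs if not self._ignored(d)]
            files = [f for f in files if not self._ignored(f)]
            yield root, dirs, files

    def walk_files(self, top):
        # A method, not a free function, built on top of walk().
        for root, _, files in self.walk(top):
            for name in files:
                yield os.path.join(root, name)
```

With this shape, code outside the ignore module only ever calls `CleanTree(tree, ...).walk_files(path)` and never passes ignore rules around.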
Also, the question is when we should or should not filter with .dvcignore files. Now we always do that for local remotes, which might not make sense at all, be a useful feature, or lead to surprising behavior. Say we have .dvcignore:
*.backup
work-data  # Just excluding some dir in a repo root
It excludes *.backup both within a repo and in external deps, which is a useful feature. It also excludes work-data in repo root, repo nested dirs (not intended) and external deps, like a network share (a surprising behavior).
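The nested-directory effect described here follows from gitignore-style matching, where a pattern without a slash is compared against names at every depth. The sketch below is a deliberately simplified approximation (the is_ignored helper and the sample paths are hypothetical, and this is not DVC's actual matcher); it only covers slash-free patterns.

```python
import fnmatch
from pathlib import PurePosixPath


def is_ignored(path, patterns):
    """Simplified check: test each slash-free pattern against every
    component of the path, so a match can happen at any depth."""
    parts = PurePosixPath(path).parts
    return any(
        fnmatch.fnmatch(part, pattern)
        for pattern in patterns
        for part in parts
    )


patterns = ["*.backup", "work-data"]

# Intended: the directory in the repo root is excluded.
root_hit = is_ignored("work-data/a.csv", patterns)
# Not intended: a nested directory with the same name is excluded too.
nested_hit = is_ignored("src/work-data/a.csv", patterns)
```

In gitignore syntax, a leading slash (`/work-data`) anchors the pattern to the directory that holds the ignore file, which is the usual way to avoid the nested match.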
Also consider this dir structure:
/some-dir
    /repo
        .dvcignore
        /repo-data
        /ignored-data
    /external-dep
        ...
Here .dvcignore entries will affect collecting files in /external-dep, which is in an outer scope to the file.
Some remarks below illustrate the principles at the start.
dvc/scm/tree.py
Outdated
This is not about scm. This is about ignore; it should be able to wrap WorkingTree, GitTree and Remote as long as they provide those methods.
This one should also provide .walk_files().
dvc/scm/tree.py
Outdated
The wrapped tree should know nothing about dvcignores.
dvc/remote/local.py
Outdated
We have a bug here: this function is still broken by design. It takes two arguments that need to be consistent, but the dvcignore tree might be of some branch while we list the working dir.
Another issue is that dvcignores should not work outside the repo (or should they?), but they are used here. We should carefully distinguish these two cases; right now they look mixed.
dvc/remote/local.py
Outdated
Same as above, this works by coincidence.
dvc/scm/git/__init__.py
Outdated
Scm/git should know nothing about clean trees.
dvc/scm/git/tree.py
Outdated
No **kwargs. You are leaving a possibility of still ignoring passed dvcignore.
dvc/state.py
Outdated
This should accept a tree, not dvcignore.
Did you prepare some reproduction script? I prepared a little test, and both statements seem not to be valid. If you analyze ... Also, as to ignoring external dependencies: AFAIK we do not support ignoring them (some context is here: #2161).
@pared you are right, external deps/outs are not affected; we only get surprising behavior by ignoring nested dirs, but this is what git also does. Checked current behavior with a couple of tests in #3003. The principles still stand. We still need ... BTW, in the PR you've linked here I've stated this design vulnerability issue :) #2161 (review)
@Suor sure, I do not oppose removing dvc_walk and walk_files, just wanted to make sure we are on the same page.
1053606 to 81ebd85
dvc/utils/fs.py
Outdated
But only to WorkingTree, right?
yep
@pared Please remove this FIXME, since you've created an issue already
LGTM, just need to remove that "FIXME".
Good job. Some cleanup and protections below.
dvc/remote/local.py
Outdated
Need to assert that tree is a working tree or a clean working tree. We want to be sure that we never walk the git tree here.
dvc/ignore.py
Outdated
Need to check that path is not dvcignored to be entirely safe.
I don't think that we should check if it's ignored; a path can both be a file and be ignored at the same time. What is the reasoning for that?
Because if a file is ignored, it should not be visible, i.e. the tree should pretend it doesn't exist. This is not an issue for now because we only open files that we walk, probably.
I see at least one issue though: dvc pipeline show ignored.dvc will actually instantiate this ignored stage, and we will get a puzzling KeyError later.
We may postpone this though because this might be solved in various ways.
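One way to read the "pretend it doesn't exist" suggestion is sketched below. Everything here is an assumption for illustration: DictTree is a made-up in-memory tree, and the CleanTree shown is reduced to exists() and open(); the thread does not settle whether the real wrapper should behave this way.

```python
import fnmatch
import io


class DictTree:
    """Hypothetical in-memory tree, only here to make the sketch runnable."""

    def __init__(self, files):
        self.files = dict(files)

    def exists(self, path):
        return path in self.files

    def open(self, path):
        return io.StringIO(self.files[path])


class CleanTree:
    """Hypothetical wrapper that hides ignored paths from every method."""

    def __init__(self, tree, patterns):
        self.tree = tree
        self.patterns = list(patterns)

    def _ignored(self, path):
        return any(
            fnmatch.fnmatch(part, pattern)
            for part in path.split("/")
            for pattern in self.patterns
        )

    def exists(self, path):
        return not self._ignored(path) and self.tree.exists(path)

    def open(self, path):
        if self._ignored(path):
            # Fail early and clearly instead of surfacing a KeyError later.
            raise FileNotFoundError(path)
        return self.tree.open(path)


tree = CleanTree(
    DictTree({"bar.dvc": "outs: []", "ignored.dvc": "outs: []"}),
    ["ignored.dvc"],
)
```

Under this reading, opening `ignored.dvc` through the wrapper fails immediately with FileNotFoundError, while `bar.dvc` opens normally.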
dvc/remote/local.py
Outdated
Same assert here.
dvc/scm/git/__init__.py
Outdated
SCM code should not know anything about clean trees.
If diff is not dvcignore-aware, we will get a different tree for a branch when using diff than when traversing it with a dvcignore-wrapped GitTree. I don't see a way to make GitTree completely unaware of dvcignore.
Not really, we can wrap this into CleanTree later in a repo method.
dvc/utils/fs.py
Outdated
Need to assert that tree is a working tree/clean working tree. Passing git tree here will break the abstraction - we walk git, but os.stat() working tree.
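The requested guard could look roughly like this. The class bodies, the is_working_tree helper, and the get_mtime function are hypothetical names chosen for the sketch; the point is only that a function which calls os.stat() must refuse a tree that does not represent the working directory.

```python
import os


class WorkingTree:
    """Hypothetical: represents files on the local filesystem."""


class GitTree:
    """Hypothetical: represents files as stored in a git revision."""


class CleanTree:
    """Hypothetical ignore-aware wrapper around another tree."""

    def __init__(self, tree):
        self.tree = tree


def is_working_tree(tree):
    # Accept a bare working tree or a clean tree wrapping one.
    if isinstance(tree, CleanTree):
        tree = tree.tree
    return isinstance(tree, WorkingTree)


def get_mtime(path, tree):
    # os.stat() reads the working directory, so walking a git tree here
    # would mix two different views of the repository.
    assert is_working_tree(tree)
    return os.stat(path).st_mtime
```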
tests/func/test_ignore.py
Outdated
not in is a very unreliable test, please use == instead. Why can't you just:
assert _files_set(".", tree) == {"./bar"}
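A short illustration of why the membership check is weaker than equality. The file names other than "./bar" and both helper functions are hypothetical; the sets stand in for whatever the walk under test returns.

```python
def check_not_in(files):
    # Weak: only proves that one specific path is absent.
    return "./foo" not in files


def check_equal(files):
    # Strong: pins down the complete expected result.
    return files == {"./bar"}


broken_walk = set()                # a walk that wrongly yields nothing
leaky_walk = {"./bar", "./baz"}    # a walk that yields an unexpected file
correct_walk = {"./bar"}
```

The membership check accepts all three results, including the two broken ones, while the equality check accepts only the correct one.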
Need to fix a couple of asserts.
dvc/remote/local.py
Outdated
Need to update this assert the same way as in walk_files().
dvc/utils/fs.py
Outdated
Need to update this assert the same way as in walk_files().
Have you followed the guidelines in the Contributing to DVC list?
Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.
Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.
Thank you for the contribution - we'll try to review it as soon as possible.
Fixes #2914