LocalRemoteTree: use repo tree as work_tree with local outputs #4125
        
          
tests/func/test_ignore.py (outdated)
          
        
I did consider writing a test for LocalRemoteTree only, but in the end we want to be sure that one can actually ignore something in an added directory.
I think the real underlying issue in #4110 is that we still use "remotes" in DVC outputs rather than purely using trees.
This causes problems for LocalOutput, since we treat regular local DVC output paths the same way as a local external dependency (we handle them both using LocalRemoteTree) even though regular output paths should be repo tree paths rather than remote paths.
So for a regular dvc added dir output, right now when we run
    def save_info(self):
        return self.remote.save_info(self.path_info)
we eventually walk the LocalRemoteTree to determine which files should go in the dir cache and to generate our dir hash. This is what causes the bug with dvcignore not being applied to that directory, and why wrapping the remote work tree with CleanTree fixes the bug.
I think what we should really be doing is treating regular outputs separately from local external dependencies.
We do want to use CleanTree for regular outputs, since we are dealing with actual DVC repo paths (I'm not sure whether that means we use a RepoTree or wrapping LocalRemoteTree work tree w/CleanTree for regular outputs).
But for other types of local "remote" paths (including external dependencies) we should not need to use CleanTree.
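To make the difference concrete, here is a minimal sketch of how the file list that feeds a directory hash changes once the walking tree is wrapped in an ignore filter. `PlainTree` and `IgnoringTree` are hypothetical stand-ins for the roles played by LocalRemoteTree and CleanTree in this discussion; they are not DVC classes, and the pattern matching is a simplification (basename globbing only).

```python
import fnmatch
import os
import tempfile


class PlainTree:
    """Hypothetical unfiltered tree: yields every file under `top`."""

    def walk_files(self, top):
        for root, _dirs, files in os.walk(top):
            for name in files:
                yield os.path.relpath(os.path.join(root, name), top)


class IgnoringTree:
    """Hypothetical wrapper that drops files matching ignore patterns."""

    def __init__(self, tree, patterns):
        self.tree = tree
        self.patterns = patterns

    def walk_files(self, top):
        for rel in self.tree.walk_files(top):
            name = os.path.basename(rel)
            if not any(fnmatch.fnmatch(name, pat) for pat in self.patterns):
                yield rel


def collect_dir_entries(tree, top):
    """Sorted file list that a directory hash would be computed from."""
    return sorted(tree.walk_files(top))


def demo():
    with tempfile.TemporaryDirectory() as top:
        for name in ("data.csv", "scratch.tmp"):
            with open(os.path.join(top, name), "w") as fobj:
                fobj.write(name)
        plain = collect_dir_entries(PlainTree(), top)
        clean = collect_dir_entries(IgnoringTree(PlainTree(), ["*.tmp"]), top)
    return plain, clean
```

With the unwrapped tree, the ignored file still ends up in the listing, which mirrors the behavior reported in the linked issue.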
        
          
dvc/ignore.py (outdated)
          
        
This works as a temporary solution, but it still seems unintuitive.
It seems to me that dvcignore should only ever apply to a DVC repo, and by definition anything outside the repo root directory should not exist in the clean tree.
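As a sketch of that rule, assuming a clean tree is defined purely by a repo root plus a set of ignored names: a path outside the root is simply not part of the tree, regardless of the ignore rules. `is_in_repo` and `in_clean_tree` are hypothetical helpers, not functions from dvc/ignore.py.

```python
import os


def is_in_repo(repo_root, path):
    """Return True if `path` is the repo root or lies beneath it."""
    root = os.path.abspath(repo_root)
    target = os.path.abspath(path)
    return os.path.commonpath([root, target]) == root


def in_clean_tree(repo_root, path, ignored_names):
    """Hypothetical membership test for a repo-scoped, ignore-aware tree."""
    if not is_in_repo(repo_root, path):
        # Outside the repo root: by definition not part of the clean tree.
        return False
    # Inside the repo: ignore rules decide (simplified to basename matching).
    return os.path.basename(path) not in ignored_names
```

The check uses `os.path.commonpath` rather than a string prefix comparison so that a sibling directory such as `/repo2` is not mistaken for something inside `/repo`.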
        
          
dvc/remote/local.py (outdated)
          
        
Same as before, this works as part of a temporary fix but I'm not sure it makes sense as a real solution.
For local remotes and the DVC local cache, we should not need to wrap the work tree with CleanTree. A local remote should only contain files which have been pushed to it, and we should not care about filtering/ignoring anything inside the remote. Likewise, for cache, we should not care about filtering anything inside .dvc/cache.
The main point of this tree refactor was that local remotes and local cache should only be dealing with paths inside the actual remote or cache. We run into issues when we start mixing remote/cache/repo tree paths and treating them all as just "local filesystem paths".
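A rough sketch of that separation, where only a repo tree applies ignore filtering while cache and remote trees list whatever they contain. `TreeRole` and `list_files` are hypothetical names used for illustration only; they do not correspond to anything in DVC.

```python
from enum import Enum


class TreeRole(Enum):
    """Hypothetical labels for the three kinds of local paths discussed."""

    REPO = "repo"
    CACHE = "cache"
    REMOTE = "remote"


def list_files(role, names, ignored):
    """Filter only for repo trees; cache and remote contents pass through."""
    if role is TreeRole.REPO:
        return [name for name in names if name not in ignored]
    return list(names)
```

Keeping the role explicit avoids treating all three as interchangeable "local filesystem paths".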
These changes do fix the user issue, and CleanTree used to allow paths outside the repo root before the remote tree refactoring, so I'd be ok with merging this as a bug fix for now. But long term I think we should keep the other things I mentioned in mind.
e311fa1 to 8000d80
8000d80 to ce61e28
@pmrowla I tried to make the behavior less hackish, and determine what tree should be used on  I do believe that the current behavior is still hackish.
        
          
dvc/output/local.py (outdated)
          
        
Could we set this remote in __init__? It just feels a bit weird introducing a cached property and a _remote mix in both here and the base class for this hack.
We could, but in that case we would set remote inside the parent constructor and then override it here. That's what I wanted to avoid by creating the cached_property.
@pared but we can set it after super(), no? Or pass the new instance to the constructor.
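The three options in this thread can be sketched side by side. All class names and the string "remotes" below are hypothetical placeholders, and `functools.cached_property` (Python 3.8+) is used here for self-containment, which may not be the decorator used in the PR itself.

```python
from functools import cached_property


class LazyOutput:
    """Option A: the remote is built lazily, once, on first access."""

    def __init__(self, path):
        self.path = path
        self.build_count = 0

    @cached_property
    def remote(self):
        self.build_count += 1
        return self._make_remote()

    def _make_remote(self):
        return "generic-remote"


class LazyLocalOutput(LazyOutput):
    def _make_remote(self):
        return "repo-tree-remote"


class EagerOutput:
    """Base for options B and C: the remote is chosen in __init__."""

    def __init__(self, path, remote=None):
        self.path = path
        self.remote = remote if remote is not None else "generic-remote"


class AfterSuperLocalOutput(EagerOutput):
    """Option B: let the parent set a remote, then override it."""

    def __init__(self, path):
        super().__init__(path)
        self.remote = "repo-tree-remote"


class InjectedLocalOutput(EagerOutput):
    """Option C: pass the desired remote to the parent constructor."""

    def __init__(self, path):
        super().__init__(path, remote="repo-tree-remote")
```

Option B briefly constructs a remote that is then thrown away, which is the cost the cached property avoids; option C avoids it too, at the price of widening the parent constructor's signature.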
Had a similar issue with config and it really hurts. Thinking that maybe we should tackle #4050 right away and get rid of that is_working_tree mess once and for all. Otherwise, we might be introducing many more hidden issues than we are solving (though in this case this is clearly very serious). Let's discuss this during planning today; maybe someone will be able to look deeply into this part of trees ASAP. If not, we'll have to merge the hack, indeed.
ce61e28 to aaea4e2
aaea4e2 to 99ad242
    
I have followed the Contributing to DVC checklist.
If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)
Thank you for the contribution - we'll try to review it as soon as possible.
EDIT:
Fixes #4110
Fixes #4197