Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RepoTree: add support for subrepo traversal #4381

Merged
merged 1 commit into from
Aug 15, 2020

Conversation

skshetry
Copy link
Member

@skshetry skshetry commented Aug 12, 2020

Right now, no actual "functionality" is surfaced as
subrepos is always set to False by default for now.

Subrepos are collected dynamically and trie is created to track those repos inside RepoTree, and this does not incur any cost
as .dvcignore is shared among subrepos and RepoTree just utilizes them.

Possible alternative design

  1. Create "MotherOfTree" that will be prefix-based and can switch between those just based on prefix. So, some separate tree class that works on top of RepoTree or GitTree for submodules.
    cons: Hard to traverse, especially heirarchial traversal required for dvc repos, etc.

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Right now, no actual "functionality" is surfaced as
`subrepos` is always set to False by default for now.
@skshetry skshetry self-assigned this Aug 12, 2020
@skshetry skshetry added this to In progress in DVC 11 - 25 August 2020 via automation Aug 12, 2020
@skshetry skshetry moved this from In progress to Review in progress in DVC 11 - 25 August 2020 Aug 12, 2020
@skshetry
Copy link
Member Author

@efiop, I cannot split this into smaller PR, though more than half of it is tests.

tests/unit/repo/test_repo_tree.py Show resolved Hide resolved
tests/unit/repo/test_repo_tree.py Show resolved Hide resolved
dvc/repo/tree.py Show resolved Hide resolved
Copy link
Contributor

@pmrowla pmrowla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design-wise I think this looks a lot better than the first PR, but the some of the function naming was a bit confusing at first.

It wasn't entirely clear to me that _get_repo() and _get_tree_pairs() were returning Repo objs and tree pairs for subrepos (when the specified paths are inside a subrepo) until I read through everything a few times. Even one-liner comments for those methods explaining things would probably work.

Copy link
Member

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Peter, looks much better than the previous one. Though it is clear that some concepts are still under development, so let's treat this PR as an evolutionary and cleanup more on top.

@efiop efiop merged commit 0b1d4f1 into iterative:master Aug 15, 2020
DVC 11 - 25 August 2020 automation moved this from Review in progress to Done Aug 15, 2020
@efiop efiop mentioned this pull request Aug 15, 2020
2 tasks
@skshetry skshetry deleted the repo-tree branch August 15, 2020 14:33
@skshetry
Copy link
Member Author

Tests are failing due to #4229, as now we create a ".dvcignore" on every dvc init. So these 3 tests should add ".dvcignore" in it's expectations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants