Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support subrepos for get/list/api/import #4465

Merged
merged 8 commits into from
Sep 3, 2020

Conversation

skshetry
Copy link
Member

@skshetry skshetry commented Aug 25, 2020

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

This adds support for get/list/api/import/ls on a subrepo.

Also fixes #3180, by adding a granular url support for dvc.api.get_url
And, of course it fixes #3369.

@skshetry skshetry self-assigned this Aug 25, 2020
@skshetry skshetry added this to In progress in DVC 11 - 25 August 2020 via automation Aug 25, 2020
tests/remotes/__init__.py Outdated Show resolved Hide resolved
dvc/dependency/repo.py Outdated Show resolved Hide resolved
dvc/external_repo.py Outdated Show resolved Hide resolved
dvc/tree/repo.py Outdated Show resolved Hide resolved
dvc/dependency/repo.py Outdated Show resolved Hide resolved
@efiop efiop moved this from In progress to Review in progress in DVC 11 - 25 August 2020 Aug 25, 2020
@efiop efiop moved this from Review in progress to In progress in DVC 11 - 25 August 2020 Aug 25, 2020
dvc/external_repo.py Outdated Show resolved Hide resolved
Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good change, "configuring" external repo from the start feels much better than using nested with statements which change existing instance.

@efiop
Copy link
Member

efiop commented Aug 27, 2020

Looks good! But I see that a lot of the changes are not actually about subrepos and could be introduced without mentioning subrepos at all (e.g. that get_hash() piece).

@skshetry
Copy link
Member Author

@efiop, subrepo support is already there. This PR is more about enabling it, which is not possible without changing APIs of External(Git)Repo, which in turn affects dependency/repo.py and friends.

@skshetry skshetry marked this pull request as ready for review August 27, 2020 16:41
@skshetry skshetry changed the title [WIP] Refactor external repo, support subrepos Refactor external repo, support subrepos Aug 27, 2020
dvc/api.py Outdated Show resolved Hide resolved
@skshetry

This comment has been minimized.

This adds support for get/list/api/import/ls on a subrepo.

Also fixes iterative#3180, by adding a granular url support for `dvc.api.get_url`
And, of course it fixes iterative#3369.
@skshetry skshetry changed the title Refactor external repo, support subrepos support subrepos for get/list/api/import/ls Sep 2, 2020
metadata = _repo.repo_tree.metadata(path_info)

if not metadata.is_dvc:
raise OutputNotFoundError(path, repo)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked into this, we used to throw OutputNotFoundError before as well, so this does not change anything.

Regarding UrlNotDvcRepoError, it's not precise to throw now as a git-repo can have subrepo inside of it.

@@ -28,30 +31,44 @@
logger = logging.getLogger(__name__)


class IsADVCRepoError(DvcException):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thrown when the path is a root of the dvc repo during get/import.

with pytest.raises(InvalidArgumentError) as exc:
Repo().run(**kwargs)
dvc.run(**kwargs)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, we used to wait for 5s for lock timeout, so this used to work. But, since we reduced the timeout, this has become flakey.

@skshetry skshetry requested a review from pared September 2, 2020 07:58
@skshetry
Copy link
Member Author

skshetry commented Sep 2, 2020

@efiop @pared @pmrowla, this is ready for a review. The only issue right now is with dvc ls as it currently throws error when the path is not found in remote or if it cannot connect to a remote. This is a pre-existing issue (exaggerated by subrepo support), which I'll try fixing in a separate PR.

Also, I have separated tests and implementation commits to make it easier for review.

@skshetry skshetry added this to In progress in DVC 25 August - 8 September 2020 via automation Sep 2, 2020
@skshetry skshetry moved this from In progress to Review in progress in DVC 25 August - 8 September 2020 Sep 2, 2020
@skshetry skshetry added the minor label Sep 2, 2020
Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some minor things.

tests/unit/test_external_repo.py Outdated Show resolved Hide resolved
tests/unit/test_external_repo.py Outdated Show resolved Hide resolved
Comment on lines 76 to +80
def open(
self, path, mode="r", encoding="utf-8"
self, path, mode="r", encoding=None
): # pylint: disable=arguments-differ
assert mode in {"r", "rb"}
encoding = encoding or "utf-8"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe api.open should use utf-8 by default?

Copy link
Member

@efiop efiop Sep 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pared Ideally - no, it should use your default system encoding. This is how things like pathlib's Path do it. I believe, Saugat just changed this for a better practice for optional args and to match BaseTree.open (there are still some discrepancies in tree methods, that I left untouched when adding these trees).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it should use default system encoding, I just tried to make api.open() work without changing any behavior here.
Something to fix it later.

* use `spy` instead of `wraps`
* remove dvc fixture in a test
* set `cache_types` on Git repo
@efiop efiop merged commit 4564e8b into iterative:master Sep 3, 2020
DVC 25 August - 8 September 2020 automation moved this from Review in progress to Done Sep 3, 2020
@skshetry skshetry deleted the external_repo branch September 3, 2020 10:47
skshetry added a commit to iterative/dvc.org that referenced this pull request Sep 8, 2020
@skshetry skshetry changed the title support subrepos for get/list/api/import/ls support subrepos for get/list/api/import Sep 8, 2020
shcheklein pushed a commit to iterative/dvc.org that referenced this pull request Sep 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC feature is a feature
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

dvc get/import support for subrepos api: get_url does not support links for granular file
4 participants