Skip to content

import/get/api: external repo and cached_clone are problematic #3124

@Suor

Description

@Suor

There are several issues with external_repo(), cached_clone() and code using them - get, import and api stuff:

  • there are 3 parallel code paths doing very close things, we add the same features for them, fix the same bugs, essentially do triple work
  • external_repo() and cached_clone() provide two entry points to conceptually the same thing, calling code need to make dup calls, i.e. cached_clone() and then external_repo() if that happens to be a dvc repo.
  • _external_repo() and cached_clone() share cache, which leads to _external_repo() not making proper initialization under some circumstances
  • tests are insufficent:
    • we don't properly test urls, these are either dirs or rarely file://,
    • we don't test caching behavior
  • cached_clone() should really be a context manager or be integrated back into external_repo()
  • some bugs are tricky to fix under all these circumstances, i.e. summon api: caching during interactive sessions #3062

So we need a common abstraction extending external_repo() to work with non-dvc repos/branches, which might be reused across get, import and api. We then need to refactor the latter to use that new abstraction.

Metadata

Metadata

Assignees

Labels

p1-importantImportant, aka current backlog of things to dorefactoringFactoring and re-factoring

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions