split scm from dvc #6836

@skshetry

Description

We want to split scm out of dvc into a separate package of its own. Ideally, we'd like it to have a proper and stable API, though I'd prefer that we split it as-is first.

I want to split scm out of dvc using git filter-repo so that the history stays intact.

However, I still see a few issues that we need to fix first:

  • DVC uses logging.info calls for UI output, and dvc.scm still contains some of them.
    We may need to remove these logging.info calls and make scm communicate through proper exceptions and return values. (Important)
  • Migrate the exceptions, since they subclass DvcException and are part of our UI framework.
  • scm uses Tqdm progress bars for the clone operation. We need to update it to use callbacks instead. (Medium priority)
  • Migrate the tests and the fixtures they use. We need to either copy them or figure out a way to share the fixtures.
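
For item 3, a callback-based clone could look roughly like the minimal sketch below, assuming a plain callable progress hook that receives (completed, total) counts. The names `clone` and `ProgressCallback` are hypothetical, not an existing scmrepo API, and the loop only stands in for the real backend fetching objects.

```python
from typing import Callable, List, Optional

# Hypothetical progress hook: called with (completed, total) units of work.
ProgressCallback = Callable[[int, int], None]


def clone(url: str, objects: List[str], progress: Optional[ProgressCallback] = None) -> List[str]:
    """Hypothetical clone that reports progress through a callback, not Tqdm.

    `objects` stands in for whatever a real backend would fetch from `url`.
    """
    total = len(objects)
    fetched = []
    for done, obj in enumerate(objects, start=1):
        fetched.append(obj)  # placeholder for the real fetch
        if progress is not None:
            progress(done, total)
    return fetched


# The caller decides how to render progress; here we just record the updates.
updates = []
result = clone(
    "https://example.com/repo.git",
    ["a", "b", "c"],
    progress=lambda done, total: updates.append((done, total)),
)
print(updates)  # [(1, 3), (2, 3), (3, 3)]
```

With this shape, DVC could pass a callback that updates its own Tqdm bar, so the progress-bar dependency stays on the DVC side.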

Since I want to split these out of DVC with history intact, we should fix 1 and 2 in DVC itself first and then split scm out.
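
A minimal sketch of what fixing 1 and 2 could mean, assuming scm gets its own exception base class that no longer subclasses DvcException, and DVC wraps scm errors and does all user-facing logging itself. `SCMError`, `RevError`, `checkout`, and `dvc_checkout` are hypothetical names, and `DvcException` is stubbed here only so the snippet runs on its own.

```python
import logging

logger = logging.getLogger(__name__)


# --- scm side (hypothetical names): no DVC imports, no UI logging ---
class SCMError(Exception):
    """Base class for scm errors; deliberately does not subclass DvcException."""


class RevError(SCMError):
    """Raised when a revision cannot be resolved."""


def checkout(branch: str, branches: set) -> str:
    """Return the checked-out branch instead of reporting it via logger.info."""
    if branch not in branches:
        raise RevError(f"unknown branch '{branch}'")
    return branch


# --- DVC side ---
class DvcException(Exception):
    """Stub standing in for DVC's real DvcException."""


def dvc_checkout(branch: str, branches: set) -> None:
    try:
        name = checkout(branch, branches)
    except SCMError as exc:
        # Translate scm errors into DVC's UI exception, keeping the original as the cause.
        raise DvcException(str(exc)) from exc
    # Only DVC decides what to show the user.
    logger.info("Switched to branch '%s'", name)
```

Keeping the UI logging in DVC lets scm stay usable as a plain library, while DVC's user-facing output can stay the same.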

filter-repo how-to

I have used the following snippet to split scm before:

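# work in a fresh temporary directory; filter-repo expects a fresh clone
cd "$(mktemp -d)"
git clone git@github.com:iterative/dvc.git .
# keep only the scm-related paths, prefix every tag with "scmrepo-", and move dvc/ to scmrepo/
git filter-repo --path dvc/scm --path dvc/tree/git.py --path dvc/fs/git.py \
  --tag-rename '':'scmrepo-' --path-rename dvc:scmrepo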

The following structure is generated:

tree .
.
├── scmrepo
│   ├── fs
│   │   └── git.py
│   └── scm
│       ├── __init__.py
│       ├── base.py
│       └── git
│           ├── __init__.py
│           ├── backend
│           │   ├── __init__.py
│           │   ├── base.py
│           │   ├── dulwich.py
│           │   ├── gitpython.py
│           │   └── pygit2.py
│           ├── objects.py
│           └── stash.py

Metadata

Labels

git: Related to git and git backends
p1-important: Important, aka current backlog of things to do
refactoring: Factoring and re-factoring

Type

No type

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests