[WIP] Remote repositories for versioning #26

Closed
wants to merge 133 commits into from

Conversation

zhiltsov-max
Contributor

@zhiltsov-max zhiltsov-max commented Sep 25, 2020

Summary

Related to #130, #131

Key changes:

  • Added integration with Git and DVC for versioning.
  • Added support for a remote repository for dataset configuration.
  • Added support for remote data sources in project datasets, implemented with DVC. Supported source types: HTTP, S3, Git, DVC.
  • A project now consists of a number of data sources, each either local or remote.
  • Removed support for a project's own datasets.
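
To illustrate the local/remote distinction, here is a minimal sketch that classifies a source URL by its scheme. It is not Datumaro code: the `classify_source` function and the `REMOTE_SCHEMES` set are hypothetical names introduced for illustration, and the real implementation delegates URL handling to DVC. Git and DVC repository URLs are not distinguished here, and Windows drive paths are not handled.

```python
from urllib.parse import urlparse

# Hypothetical set of schemes treated as remote (an assumption, not Datumaro's list).
REMOTE_SCHEMES = {"http", "https", "s3", "remote"}


def classify_source(url: str) -> str:
    """Return "remote" for URLs with a known remote scheme, otherwise "local"."""
    scheme = urlparse(url).scheme.lower()
    return "remote" if scheme in REMOTE_SCHEMES else "local"


if __name__ == "__main__":
    for url in ("path/to.json", "s3://net.loc", "remote://r1/path/to.xml"):
        print(url, "->", classify_source(url))
```

A plain path has an empty scheme and is treated as local, while `remote://r1/...` refers to a named remote, as in the `datum source add` examples in the comments.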

CLI changes:

  • Modifying operations on project data (transform, filter, export) are now recorded and can be reproduced later with the build command.
  • Updated the config file structure. Old projects can still be read, but they will be saved with the new version.
  • Updated installation: to install Datumaro with Git and DVC support, add the [VCS] suffix: pip install <url>[VCS]
  • Added versioning commands to the CLI: tag, pull, push, checkout, commit
  • Added the remote CLI context for interacting with the data remotes of a project
  • Added the repo CLI context for interacting with the Git repositories bound to a project
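
The record-and-rebuild idea can be sketched as follows. This is a conceptual illustration only, not the implementation in this PR: the `Pipeline` class and its `record` and `build` methods are hypothetical, and the actual change tracks operations through the project config and DVC rather than through in-memory callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Item = Dict[str, str]
Dataset = List[Item]
Step = Tuple[str, Callable[[Dataset], Dataset]]


@dataclass
class Pipeline:
    """Hypothetical log of modifying operations applied to a dataset."""

    steps: List[Step] = field(default_factory=list)

    def record(self, name: str, func: Callable[[Dataset], Dataset]) -> None:
        # Store the operation instead of applying it immediately.
        self.steps.append((name, func))

    def build(self, source: Dataset) -> Dataset:
        # Replay every recorded operation, in order, on a copy of the source.
        data = list(source)
        for _name, func in self.steps:
            data = func(data)
        return data


source = [{"id": "a", "label": "cat"}, {"id": "b", "label": "dog"}]

pipeline = Pipeline()
pipeline.record("filter", lambda d: [i for i in d if i["label"] == "cat"])
pipeline.record("transform", lambda d: [{**i, "id": i["id"].upper()} for i in d])

result = pipeline.build(source)
print(result)
```

Because only the steps are stored, the same result can be rebuilt from the original source at any time, which is the property the build command relies on.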

Library changes:

  • The Project class has changed significantly; however, most existing code should work with minimal or no changes.
  • Projects that are not bound to a local disk are considered detached. In this mode, a Project can only interact with locally available data (no remotes), which is mostly how it worked before these changes. No versioning capabilities are available in this mode.
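
The detached mode described above can be sketched like this. The `ProjectSketch` class, its `commit` method, and `DetachedProjectError` are hypothetical names used only for illustration; they are not the actual Project API from this PR.

```python
from pathlib import Path
from typing import Optional


class DetachedProjectError(RuntimeError):
    """Raised when a versioning operation is requested on a detached project."""


class ProjectSketch:
    """Hypothetical stand-in for a project that may or may not be bound to disk."""

    def __init__(self, root: Optional[Path] = None) -> None:
        self.root = root

    @property
    def detached(self) -> bool:
        # A project with no directory on disk is considered detached.
        return self.root is None

    def commit(self, message: str) -> str:
        # Versioning needs a working directory, so detached projects refuse it.
        if self.detached:
            raise DetachedProjectError("versioning is unavailable in detached mode")
        return f"committed in {self.root}: {message}"


in_memory = ProjectSketch()
on_disk = ProjectSketch(Path("my_project"))
print(in_memory.detached, on_disk.detached)
```

Failing early with a dedicated error keeps in-memory usage working as before while making it clear why versioning calls are rejected.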

How to test

Checklist

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below)
# Copyright (C) 2020 Intel Corporation
#
# SPDX-License-Identifier: MIT

@zhiltsov-max
Contributor Author

@nmanovic, implemented:

datum create

# addition (url format here: https://dvc.org/doc/command-reference/import-url)
# with auto remotes:
datum add path/ -f image_dir
datum add path/to.json -f coco_instances

# with manual remotes:
datum remote add s3://net.loc -n r1
datum source add remote://r1/path/to.xml  -f cvat

datum filter (not checked)/transform # copying variant
datum export
datum build

datum commit

datum source *
datum remote *

@nmanovic

@zhiltsov-max, should we close the PR?

@zhiltsov-max
Contributor Author

zhiltsov-max commented Jun 25, 2021

It will be continued after the first one as "remote sources support".

@leeyh20

leeyh20 commented Sep 17, 2021

What happened to this PR? Versioning sounds like a very good idea for dataset management.

@zhiltsov-max
Contributor Author

@leeyh20, it is split into 2 parts - this one with remotes and #238 with local commands.

@zhiltsov-max zhiltsov-max changed the title from "[WIP] Versioning" to "[WIP] Remote repositories for versioning" on Oct 12, 2021
@JaviFuentes94

Any update on this?

@zhiltsov-max
Contributor Author

@JaviFuentes94, not yet - currently, we have no resources for this task. We welcome ideas and suggestions on this functionality, though. Could you describe your use cases?

@wonjuleee wonjuleee closed this Nov 29, 2022
@wonjuleee wonjuleee deleted the zm/versioning branch November 29, 2022 01:02
5 participants