What is Intake’s dataset versioning story? #382

Open
talebzeghmi opened this issue Jul 12, 2019 · 1 comment

@talebzeghmi
Contributor

  • Ability to get latest version of a dataset.
  • Ability to search for all versions of a dataset.
  • Story on major and minor versioning of datasets.
  • Is it semantic versioning?
  • Should there be a naming scheme if the version is in the name?

The goal of this issue is to capture the Gitter conversation. Thank you!

@martindurant
Member

Copying my comments:

The simplest answers are:

  • use the metadata of a catalogue or data source to specify a given definition’s version
  • or distribute data via conda packages, which have their own internal versioning system (and dependency resolution)
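
To make the metadata option concrete, here is a minimal sketch of how "latest version" and "all versions" lookups could work if each catalogue entry recorded its version in its metadata. It uses plain dicts shaped like catalogue entries rather than Intake itself; the `dataset` and `version` metadata keys, the entry names, and the helper functions are all assumptions for illustration, and the dotted-integer ordering is just one possible answer to the semantic-versioning question.

```python
from typing import Dict, List, Tuple

# Hypothetical catalogue entries: each one carries a free-form "metadata"
# mapping, and by convention (an assumption) it records "dataset" and "version".
CATALOG: Dict[str, dict] = {
    "sales_1_0_0": {
        "driver": "csv",
        "args": {"urlpath": "data/sales-1.0.0.csv"},
        "metadata": {"dataset": "sales", "version": "1.0.0"},
    },
    "sales_1_2_0": {
        "driver": "csv",
        "args": {"urlpath": "data/sales-1.2.0.csv"},
        "metadata": {"dataset": "sales", "version": "1.2.0"},
    },
    "sales_1_10_0": {
        "driver": "csv",
        "args": {"urlpath": "data/sales-1.10.0.csv"},
        "metadata": {"dataset": "sales", "version": "1.10.0"},
    },
    "weather_0_1_0": {
        "driver": "csv",
        "args": {"urlpath": "data/weather-0.1.0.csv"},
        "metadata": {"dataset": "weather", "version": "0.1.0"},
    },
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'MAJOR.MINOR.PATCH' into a tuple so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def entries_for(catalog: Dict[str, dict], dataset: str) -> Dict[str, dict]:
    """Return only the entries whose metadata names the given dataset."""
    return {
        name: entry
        for name, entry in catalog.items()
        if entry.get("metadata", {}).get("dataset") == dataset
    }


def versions_of(catalog: Dict[str, dict], dataset: str) -> List[str]:
    """List every recorded version of a dataset, oldest first."""
    found = [e["metadata"]["version"] for e in entries_for(catalog, dataset).values()]
    return sorted(found, key=parse_version)


def latest_entry(catalog: Dict[str, dict], dataset: str) -> str:
    """Return the name of the entry with the highest version."""
    candidates = entries_for(catalog, dataset)
    if not candidates:
        raise KeyError(f"no entries for dataset {dataset!r}")
    return max(candidates, key=lambda n: parse_version(candidates[n]["metadata"]["version"]))
```

Comparing parsed tuples rather than raw strings matters here: as strings, "1.10.0" would sort before "1.2.0". Keeping the version in metadata instead of only in the entry name also means the naming scheme can change without breaking the lookup.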

Also, Intake aims to provide datafile versioning via the cache and persist mechanisms (you get time snapshots in the local copies of remote files/resources), but this has not been implemented or even well planned yet. It could use help!

Intake can also generally interact with any data service provider (e.g., git or S3 for very simple examples, DAT and others for more complex situations). These providers have versioning of their own, and Intake can present their versions to the user for selection. Making such options available explicitly as “versions”, rather than as generic arguments passed to a driver, is also a future project.
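
As a rough illustration of the "generic argument" approach, the sketch below treats the version as an ordinary parameter with a default and a list of allowed values, and substitutes it into the location handed to a driver. The `VersionedSource` class, the URL, and the version labels are hypothetical and not part of Intake; it only shows the shape of the idea, with the provider (here imagined as a git host serving files by ref) doing the actual versioning.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class VersionedSource:
    """Hypothetical wrapper: a location template plus the versions a user may pick."""

    url_template: str       # must contain a "{version}" placeholder
    allowed: Sequence[str]  # versions offered to the user for selection
    default: str            # version used when none is requested

    def resolve(self, version: Optional[str] = None) -> str:
        """Return the concrete URL for the requested (or default) version."""
        chosen = self.default if version is None else version
        if chosen not in self.allowed:
            raise ValueError(
                f"unknown version {chosen!r}; choose one of {list(self.allowed)}"
            )
        return self.url_template.format(version=chosen)


# Example values are placeholders, not a real repository.
source = VersionedSource(
    url_template="https://example.com/repo/raw/{version}/data.csv",
    allowed=("v1.0", "v1.1", "main"),
    default="main",
)
```

Calling `source.resolve()` gives the default location, and `source.resolve("v1.0")` pins a specific one. The drawback mentioned above is visible here: nothing marks `version` as special, so tooling cannot discover or list versions without knowing this convention.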
