What is Intake’s dataset versioning story? #382

talebzeghmi · 2019-07-12T18:56:53Z

Ability to get latest version of a dataset.
Ability to search for all versions of a dataset.
Story on major and minor versioning of datasets.
Is it semantic versioning?
Should there be a naming scheme if the version is in the name?

The goal is to capture the Gitter conversation. Thank you!

martindurant · 2019-07-12T18:58:44Z

Copying my comments:

The simplest answers are:

use the metadata of a catalogue or data source to specify a given definition’s version

or distribute data via conda packages, which have their own internal versioning system (and dependency resolution)
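A minimal sketch of the metadata approach, which also touches the "latest version" and "search all versions" questions above. It assumes each catalog entry carries two metadata keys we chose for illustration, `dataset` and `version`; Intake does not define these. It also assumes versions follow a `MAJOR.MINOR.PATCH` (semantic versioning style) scheme. The entries are plain dicts shaped like the `sources:` section of an Intake YAML catalog (driver, args, metadata), so the sketch runs without Intake installed; all entry names and paths are hypothetical.

```python
from __future__ import annotations

# Hypothetical catalog entries mirroring an Intake YAML catalog's `sources:`
# section. The `dataset` and `version` metadata keys are our own convention.
CATALOG = {
    "sales_v1_0": {"driver": "csv", "args": {"urlpath": "data/sales_1.0.0.csv"},
                   "metadata": {"dataset": "sales", "version": "1.0.0"}},
    "sales_v1_10": {"driver": "csv", "args": {"urlpath": "data/sales_1.10.0.csv"},
                    "metadata": {"dataset": "sales", "version": "1.10.0"}},
    "sales_v1_2": {"driver": "csv", "args": {"urlpath": "data/sales_1.2.0.csv"},
                   "metadata": {"dataset": "sales", "version": "1.2.0"}},
    "sales_v2_0": {"driver": "csv", "args": {"urlpath": "data/sales_2.0.0.csv"},
                   "metadata": {"dataset": "sales", "version": "2.0.0"}},
    "weather": {"driver": "csv", "args": {"urlpath": "data/weather.csv"},
                "metadata": {"dataset": "weather", "version": "0.3.1"}},
}


def parse_version(text: str) -> tuple[int, ...]:
    """Split 'MAJOR.MINOR.PATCH' into ints so '1.10.0' sorts after '1.2.0'."""
    return tuple(int(part) for part in text.split("."))


def versions_of(catalog: dict, dataset: str) -> list[tuple[str, str]]:
    """Return (entry_name, version) pairs for one dataset, oldest first."""
    found = [
        (name, entry["metadata"]["version"])
        for name, entry in catalog.items()
        if entry.get("metadata", {}).get("dataset") == dataset
    ]
    return sorted(found, key=lambda pair: parse_version(pair[1]))


def latest(catalog: dict, dataset: str, major: int | None = None) -> str | None:
    """Name of the newest entry, optionally restricted to one major version."""
    candidates = versions_of(catalog, dataset)
    if major is not None:
        candidates = [c for c in candidates if parse_version(c[1])[0] == major]
    return candidates[-1][0] if candidates else None
```

Keeping the version in metadata rather than only in the entry name means the naming scheme stays free-form while lookups compare versions numerically; pinning `major` gives a "latest compatible" selection.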

Also, Intake aims to provide datafile versioning via the cache and persist mechanisms (you get time snapshots in the local copies of remote files/resources), but this has not been implemented or even well planned yet. It could use help!
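Since the cache/persist snapshotting described above is not implemented, the following is only a hypothetical sketch of what "time snapshots in the local copies" could mean: each copy of a file goes into its own timestamp-named subdirectory of a cache directory, so older copies survive and the newest can be picked by sorting. The helpers `snapshot`, `list_snapshots` and `latest_snapshot` are illustrative names, not Intake functions, and the layout is an assumption.

```python
import shutil
import time
from pathlib import Path


def snapshot(src: Path, cache_dir: Path) -> Path:
    """Copy `src` into a new timestamp-named subdirectory of `cache_dir`."""
    stamp = time.time_ns()
    dest_dir = cache_dir / f"{stamp:020d}"  # zero-padded so names sort chronologically
    # If two snapshots land on the same clock tick, bump the stamp until unique.
    while dest_dir.exists():
        stamp += 1
        dest_dir = cache_dir / f"{stamp:020d}"
    dest_dir.mkdir(parents=True)
    return Path(shutil.copy2(src, dest_dir / src.name))


def list_snapshots(cache_dir: Path, name: str) -> list[Path]:
    """All cached copies of the file `name`, oldest first."""
    if not cache_dir.exists():
        return []
    return sorted(p / name for p in cache_dir.iterdir() if (p / name).is_file())


def latest_snapshot(cache_dir: Path, name: str):
    """Newest cached copy of `name`, or None if there is none."""
    snaps = list_snapshots(cache_dir, name)
    return snaps[-1] if snaps else None
```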

Intake can also generally interact with any data service provider (e.g., git or S3 for very simple examples, DAT and others for more complex situations), which have versioning of their own, and present the versions to the user for selection. Making such options available explicitly as “versions” rather than generic arguments passed to a driver is also a future project.

martindurant mentioned this issue Jul 16, 2019: Intake integration ranaroussi/pystore#18 (Open)
