
use-cases: new case study based on Versioning tutorial dataset #674

Closed
jorgeorpinel opened this issue Oct 4, 2019 · 1 comment · Fixed by #679
Labels
A: docs (Area: user documentation, gatsby-theme-iterative) · C: cases (Content of /doc/use-cases) · p1-important (Active priorities to deal within next sprints)

Comments

@jorgeorpinel
Contributor

jorgeorpinel commented Oct 4, 2019

In https://dvc.org/doc/tutorials/versioning

Then we'll create a use case based on this dataset, following the comments below (from Ivan):

For the “Datasets registry”/“Data registry” (high-level usage scenario), we can use cats-and-dogs as an example: at first it lives in S3 in some random bucket, and it evolves as a new ZIP with new images.

Replace the ZIP archive with an actual directory of images, with two revisions of it, so that you can dvc import it and then use dvc update to get the latest version (2x images).

In the use case, just mention how ugly it looks when datasets come and evolve outside of DVC's control. The better way can be to organize a datasets registry, which can enable reusability, Git-like tracking of changes, versioning, etc.

In this case the ZIP is not actually needed, plus it means you are duplicating 1000K images.
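To make the duplication point concrete, here is a toy Python sketch (not DVC code) comparing two ways of storing two dataset versions. It assumes each version is either shipped as a whole new archive, or stored as a directory whose files are cached by the MD5 of their contents, which is similar in spirit to how DVC's cache stores files. The image contents and counts below are made up; the second version simply doubles the first, as suggested above.

```python
import hashlib


def archive_storage(versions):
    """Bytes stored when each version ships as its own archive
    (compression ignored): every file is stored again in every version."""
    return sum(len(data) for files in versions for data in files.values())


def content_addressed_storage(versions):
    """Bytes stored when files are cached by the MD5 of their contents,
    so a file unchanged between versions is stored only once."""
    cache = {}
    for files in versions:
        for data in files.values():
            cache.setdefault(hashlib.md5(data).hexdigest(), data)
    return sum(len(data) for data in cache.values())


# Hypothetical dataset: version 1 has 1,000 tiny "images" (8 bytes each);
# version 2 keeps all of them and adds 1,000 more (2x images).
v1 = {f"images/{i:05d}.jpg": b"img%05d" % i for i in range(1000)}
v2 = {f"images/{i:05d}.jpg": b"img%05d" % i for i in range(2000)}

print(archive_storage([v1, v2]))            # 24000
print(content_addressed_storage([v1, v2]))  # 16000
```

With archives, the first 1,000 images are paid for twice; with per-file, content-addressed storage, the second version only costs its new files.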

jorgeorpinel added the A: docs, p1-important, and use-cases labels Oct 4, 2019
jorgeorpinel changed the title from “use-cases: new case study based on Versioning tutorial datasets” to “use-cases: new case study based on Versioning tutorial dataset” Oct 6, 2019
jorgeorpinel added a commit that referenced this issue Oct 7, 2019
…ts...

in preparation for new data registry use case (#674)
@jorgeorpinel
Contributor Author

From @shcheklein:

Maybe as part of this case study we can briefly show an evolution: first it was wget (an ad-hoc path in S3, no tracking, no guarantee that the data stays the same, etc.), then dvc get introduced tracking (you can see who changed what), and now it's guaranteed that the data doesn't change. Finally, dvc import + directories: using directories properly (easier to manage, potential storage benefits) and keeping the connection to the source.

(In the context of possibly going back to using wget directly from S3, as opposed to using dvc get on ZIP data files, in the versioning tutorial [and Get Started].)
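The difference between the dvc get and dvc import stages can be modeled in a small, hypothetical Python sketch. None of these names belong to DVC: Registry, ImportedDataset, get, import_ and update are illustrative stand-ins. The model only captures the key idea: get produces a plain copy of one revision, while import also records the source revision (in real DVC this lives in the .dvc file that dvc import creates), which is what lets update move to the latest revision later. The wget stage has nothing to model, since an ad-hoc S3 path carries no revision at all.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical stand-in for a Git+DVC data registry: every commit
    records a full snapshot of the dataset (path -> content)."""
    revisions: list = field(default_factory=list)

    def commit(self, files):
        self.revisions.append(dict(files))
        return len(self.revisions) - 1  # revision id of the new commit

    def latest(self):
        return len(self.revisions) - 1


@dataclass
class ImportedDataset:
    """Models the outcome of "dvc import": a local copy plus the source
    revision it was taken from, so the link to the registry is kept."""
    registry: Registry
    rev: int
    files: dict


def get(registry, rev=None):
    """Models "dvc get": a plain copy of one revision, with no link back."""
    rev = registry.latest() if rev is None else rev
    return dict(registry.revisions[rev])


def import_(registry, rev=None):
    """Models "dvc import": copy a revision and remember where it came from."""
    rev = registry.latest() if rev is None else rev
    return ImportedDataset(registry, rev, dict(registry.revisions[rev]))


def update(imported):
    """Models "dvc update": move an import to the registry's latest revision."""
    imported.rev = imported.registry.latest()
    imported.files = dict(imported.registry.revisions[imported.rev])
    return imported


# Usage: the registry gains a second revision with twice as many images.
registry = Registry()
registry.commit({f"cats/{i}.jpg": b"cat" for i in range(2)})
data = import_(registry)
registry.commit({f"cats/{i}.jpg": b"cat" for i in range(4)})

copy = get(registry, rev=0)  # one-off snapshot; nothing records its origin
print(data.rev, len(data.files))  # 0 2
update(data)
print(data.rev, len(data.files))  # 1 4
```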

dashohoxha mentioned this issue Oct 25, 2019
10 tasks
iesahin added the C: cases (Content of /doc/use-cases) label Oct 21, 2021
2 participants