-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Misc. updates #1901
Misc. updates #1901
Conversation
jorgeorpinel
commented
Oct 31, 2020
•
edited
Loading
edited
- Misc. copy edits e.g. per add: document behavior for adding symlinked targets #1890 (review)
- Term reviews around "layers" and "components" (DVC features)
- Fixes to https://dvc.org/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory
- DVC uses Git as the underlying layer for data, pipelines, an experiment | ||
versioning, instead of a custom web application. | ||
- DVC integrates with Git, so it remains the underlying version control tool for | ||
data, pipelines, an experiments, instead of reinventing the wheel. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this gives it a quite different meaning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes for 2 reasons.
- A correction: DVC doesn't use Git, it integrates with Git.
- "instead of a custom web application"? It's not clear why we're talking about web apps here IMO. I guess in reference to MLFlow's interface? Any other examples? Maybe we should mention them directly like we mention git-lfs for ex.
In any case, "uses Git for versioning instead of a web app" still doesn't make much sense. Are web apps used for versioning? But maybe I'm missing something or not understanding the original intention of this text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I'm simplifying it to:
- DVC integrates with Git as the underlying version control tool for data,
pipelines, an experiments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I was referring to the instead of reinventing the wheel.
part mostly (integrates or uses - not that important, uses sounds friendlier though)
a custom web application
is not the best either, agreed. What it was trying to say that DVC saves info into Git vs some external app or external database or external API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK changed back to "uses" although it's slightly incorrect... But you're right, easier to read/ friendlier. I also updated the other part of the explanation. PTAL
@@ -308,21 +317,10 @@ data/images/ | |||
$ dvc add data/images | |||
``` | |||
|
|||
When running `dvc add` on this directory of images, a `data/images.dvc` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change? it looks more or less good for me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this chunk because it was explaining dvc add
/.dvc
files, not the cache structure.
I can put it back if you prefer.
content/docs/start/index.md
Outdated
large files, datasets, machine learning models versioning and efficient | ||
sharing. We'll show how to use a regular Git workflow, without storing large | ||
files with Git. Think "Git for data". | ||
- [**Data versioning**](/doc/start/data-versioning) is the base layer of DVC. It |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I don't understand the reason for this change. Not obvious, sounds repetitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main change here is "core part" -> "base layer". I was reviewing the usage of terms around "feature". We used "layers" before but the concept of "components" (Composable principle) is more appropriate (this came from reading Hashicorp's Tao).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was more repetitive before: "Data versioning is the core part of DVC for large file versioning..."
But let me re-work this to fix the language with a minimal change... ⌛
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK changed it again. I separated the part about "efficient sharing" because it was making the first sentence too complicated. PTAL
p.s. I think the repetitiveness was inserted recently by a SEO PR now that I think about it... Jeremy wanted to boost the term "ML model versioning". I guess it's too forced to have that in this doc.
Please enter the commit message for your changes. Lines starting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, except some extra doc