add metadata fields: label, type to data #8232
@skshetry looks amazing! Thank you for going the extra mile with
Not right now. Let's wait for more feedback.
Good question. It might be safer to roll it out under
It is needed for discovery. Your
An additional question.
For example, a user needs summary stats like:
$ dvc add data.csv --desc "User data" --type data --labels user,online \
    --custom-type summary=`Rscript -e 'summary (as.numeric (readLines ("stdin")))' < data.csv`
$ cat data.csv.dvc
outs:
- md5: d3b07384d113edec49eaa6238ad5ff00
  size: 237894
  path: data.csv
  desc: User data
  type: data
  labels:
  - user
  - online
  custom:
    summary: "
      Min. 1st Qu. Median Mean 3rd Qu. Max.
      1.00 2.25 3.50 3.50 4.75 6.00 "
$ dvc add model.pkl --desc "My model" --type model --labels get-started,dataset-registry \
    --custom-type final-accuracy=0.82745
10abb31 to 4aa8a28
I have dropped
This should be ready for a review.
@dmpetrov Let's separate the summary stats elsewhere since it goes into other topics like metrics. Seems like this already supports arbitrary metadata under
@skshetry Looks great! A few minor UI notes: I don't think we can have
In other places like
Similarly, comma-separated
Finally, do the descriptions all need to end in
No strong opinion here from me. I don't think I know enough yet about this usage to make an informed decision. I'm fine with the current implementation for now.
Looks like a bug in Python: python/cpython#53584, which consumes all positional arguments. Indeed, looks like we have to go with multiple
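The "specify the flag multiple times" approach discussed here can be sketched with plain argparse. This is only an illustration under stated assumptions, not DVC's actual parser: the program name `add-sketch` and the helper names `build_parser`, `flatten_labels`, and `parse_labels` are hypothetical. Each `--labels` occurrence takes exactly one value (`action="append"`), so the option cannot swallow the positional targets, and comma-separated values are split afterwards.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for an "add"-style command; not DVC's real parser.
    parser = argparse.ArgumentParser(prog="add-sketch")
    parser.add_argument("targets", nargs="+")
    parser.add_argument("--type")
    # action="append" consumes exactly one value per occurrence, so the
    # option never takes over the positional targets that follow it.
    parser.add_argument("--labels", action="append", metavar="<labels>")
    return parser


def flatten_labels(values):
    # Split every occurrence on commas and drop empty pieces.
    labels = []
    for value in values or []:
        labels.extend(item.strip() for item in value.split(",") if item.strip())
    return labels


def parse_labels(argv):
    args = build_parser().parse_args(argv)
    return args.targets, flatten_labels(args.labels)


targets, labels = parse_labels(
    ["model.pkl", "--labels", "model,get-started", "--labels", "dataset-registry"]
)
print(targets, labels)
```

Placing `--labels` before the target, as in `parse_labels(["--labels", "user,online", "data.csv"])`, still leaves `data.csv` as the only target in this sketch.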
No strong opinion here, happy to go with
That's how it was in
Note that we do use
Hi @skshetry! We're getting back to this with @dberenbaum since we're thinking about removing
You implemented this:
$ dvc data ls
Path                            Type     Labels                              Description
data.xml                        data     data-registry,get-started           imported code
data/data.xml                   data     data-registry,get-started           imported
foo                             mytype1  model,get-started,dataset-registry  foo
scripts/innosetup/dvc.ico       -        -
scripts/innosetup/dvc_left.bmp  -        -
scripts/innosetup/dvc_up.bmp    -        -
$ dvc data ls --type data --labels model,data-registry
Path                            Type     Labels                              Description
data.xml                        data     data-registry,get-started           imported code
data/data.xml                   data     data-registry,get-started           imported
Why not simply
E.g.,
Good point @aguschin. The problem is that
I should note that this command is hidden/undocumented until we figure out what we need.
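For readers following the `dvc data ls --type data --labels model,data-registry` output quoted earlier, the filtering it appears to perform can be sketched as below. This is inferred from that sample output only and is an assumption, not the actual implementation: an entry seems to match when its type equals the requested type and it carries at least one of the requested labels. The `Entry` class and `filter_entries` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    # Hypothetical record mirroring the Path/Type/Labels columns.
    path: str
    type: Optional[str] = None
    labels: List[str] = field(default_factory=list)


def filter_entries(entries, type_=None, labels=None):
    # Assumed semantics: the type must match exactly (when given), and at
    # least one requested label must be present (when labels are given).
    wanted = set(labels or [])
    result = []
    for entry in entries:
        if type_ is not None and entry.type != type_:
            continue
        if wanted and not wanted.intersection(entry.labels):
            continue
        result.append(entry)
    return result


entries = [
    Entry("data.xml", "data", ["data-registry", "get-started"]),
    Entry("data/data.xml", "data", ["data-registry", "get-started"]),
    Entry("foo", "mytype1", ["model", "get-started", "dataset-registry"]),
    Entry("scripts/innosetup/dvc.ico"),
]

matches = filter_entries(entries, type_="data", labels=["model", "data-registry"])
print([entry.path for entry in matches])
```

Under these assumed semantics, `foo` is excluded by its type even though it has the `model` label, which agrees with the quoted output.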
Related: #8214, Closes #8243
This PR:
- Adds labels - a list type, and type - a string type to .dvc schema.
- Adds --type flag in add/import/import-url.
- Adds --labels flag in add/import/import-url. You can specify the flag multiple times, and can also specify labels as a comma-separated list.
  $ dvc add model.pkl --labels model,get-started --labels dataset-registry
- type/labels are preserved on rewrites/overwrites.
Example .dvc file